Was set defensively without knowing the actual GPU arch; if ROCm
supports the card natively, the override is at best a no-op and at
worst masks the real arch. Add it back with the right value if the
service actually fails to detect the GPU.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Runs llama.cpp's ROCm build under DynamicUser, with the HF model cache
in StateDirectory (survives systemctl clean) and KV slot saves in
CacheDirectory. Listens on :8000.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>