Commit graph

3 commits

Author SHA1 Message Date
a95417da8b feat(halo): use unsloth/Qwen3.6-27B-GGUF:UD-Q8_K_XL
2026-05-06 13:02:20 +02:00
da88a9b2d6 fix(halo): drop speculative HSA_OVERRIDE_GFX_VERSION from llama-server
Was set defensively without knowing the actual GPU arch; if ROCm
supports the card natively, the override is at best a no-op and at
worst masks the real arch. Add it back with the right value if the
service actually fails to detect the GPU.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 11:42:17 +02:00
b11e5c8356 feat(halo): add llama-server systemd unit for Qwen3.6-35B-A3B
Runs llama.cpp's ROCm build under DynamicUser, with the HF model cache
in StateDirectory (survives systemctl clean) and KV slot saves in
CacheDirectory. Listens on :8000.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 11:42:17 +02:00
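
For orientation, the unit described in these commits might look roughly like the sketch below. The unit itself is not shown in this log, so this is an assumption-laden illustration and not the repository's actual file. The binary path, the directory name `llama-server`, the bind address, and the use of `LLAMA_CACHE` to point the model download cache at the state directory are all hypothetical. The directives `DynamicUser=`, `StateDirectory=`, and `CacheDirectory=` are standard systemd options, and `-hf`, `--host`, `--port`, and `--slot-save-path` are existing llama-server flags.

```ini
# Hypothetical llama-server.service, reconstructed from the commit messages only.
[Unit]
Description=llama.cpp server (ROCm build)
After=network-online.target
Wants=network-online.target

[Service]
# Run as a transient, unprivileged user allocated by systemd.
DynamicUser=yes
# Creates /var/lib/llama-server; "systemctl clean" leaves it alone by default,
# so downloaded model files are kept.
StateDirectory=llama-server
# Creates /var/cache/llama-server; "systemctl clean" removes it by default.
CacheDirectory=llama-server
# Assumption: the model cache is redirected via llama.cpp's LLAMA_CACHE variable.
Environment=LLAMA_CACHE=/var/lib/llama-server
# Assumption: the override dropped in da88a9b2d6 would be re-added like this,
# with a value matching the real GPU arch, only if detection fails.
# Environment=HSA_OVERRIDE_GFX_VERSION=<value>
# Assumed binary path and bind address; model tag taken from a95417da8b.
ExecStart=/usr/local/bin/llama-server \
    -hf unsloth/Qwen3.6-27B-GGUF:UD-Q8_K_XL \
    --host 0.0.0.0 --port 8000 \
    --slot-save-path /var/cache/llama-server/
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Keeping the model files under `StateDirectory=` and the KV slot saves under `CacheDirectory=` means a `systemctl clean llama-server` (under the assumed unit name) would discard the slot saves without forcing a fresh download of the model.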