nixcfg/overlays
Harald Hoyer f62e8ac470 perf(llama-cpp-rocm): tune for Strix Halo (gfx1151)
- Restrict rocmGpuTargets to gfx1151 (Radeon 8060S, RDNA 3.5) — smaller
  closure, faster compile, no wasted device kernels.
- Enable GGML_HIP_ROCWMMA_FATTN: rocWMMA-backed flash attention is a
  major win on RDNA3+ for the GPU-offloaded attention path.
- Enable GGML_HIP_GRAPHS to lower per-token launch overhead.
- Add rocwmma to buildInputs to satisfy the WMMA path.
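
The four changes above could be expressed roughly as the overlay below. This is
a sketch, not the actual diff: it assumes the package is derived from nixpkgs'
llama-cpp, that its derivation accepts rocmSupport and rocmGpuTargets as
override arguments, and that rocWMMA is available as rocmPackages.rocwmma. The
attribute name llama-cpp-rocm is taken from the commit scope; the real overlay
in unstable/ may be structured differently.

```nix
# Hypothetical overlay sketch; names not found in the commit message are assumptions.
final: prev: {
  llama-cpp-rocm =
    (prev.llama-cpp.override {
      rocmSupport = true;
      # Build device kernels for Strix Halo only (Radeon 8060S, RDNA 3.5).
      rocmGpuTargets = "gfx1151";
    }).overrideAttrs
      (old: {
        # rocWMMA provides the headers used by the WMMA flash-attention path.
        buildInputs = (old.buildInputs or [ ]) ++ [ final.rocmPackages.rocwmma ];
        cmakeFlags = (old.cmakeFlags or [ ]) ++ [
          # rocWMMA-backed flash attention.
          (final.lib.cmakeBool "GGML_HIP_ROCWMMA_FATTN" true)
          # HIP graphs, to lower per-token launch overhead.
          (final.lib.cmakeBool "GGML_HIP_GRAPHS" true)
        ];
      });
}
```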

llama-server on halo runs with -ngl 99 --flash-attn on, so these flags
target the hot path. CPU-side AVX-512 was skipped intentionally — Zen 5
has it, but with full GPU offload the CPU paths barely run.
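
For context, the runtime flags mentioned above might be set like this if halo
runs llama-server through the NixOS services.llama-cpp module. That is an
assumption, as is the model path, which is a placeholder; only the two flags
come from this commit message.

```nix
# Hypothetical host configuration sketch; only the extraFlags values come from the commit message.
{ pkgs, ... }:
{
  services.llama-cpp = {
    enable = true;
    # Assumes the overlay exposes the tuned build under this attribute name.
    package = pkgs.llama-cpp-rocm;
    # Placeholder path, not taken from the repository.
    model = "/var/lib/llama-cpp/models/model.gguf";
    extraFlags = [
      "-ngl"
      "99" # offload all layers to the GPU
      "--flash-attn"
      "on" # exercise the flash-attention path the build flags target
    ];
  };
}
```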
2026-05-06 09:13:54 +02:00
extern nix fmt 2026-02-24 13:25:42 +01:00
inetutils-darwin-fix chore: nix fmt 2026-05-03 14:57:49 +02:00
mods chore: nix fmt 2026-05-03 14:57:49 +02:00
unstable perf(llama-cpp-rocm): tune for Strix Halo (gfx1151) 2026-05-06 09:13:54 +02:00