feat(halo): serve multiple llama models via models.ini preset

Replace the per-model llama-server units with a single service that uses llama-server's --models-preset (models.ini) and --models-max 2, so the 35B-A3B and 27B models are loaded on demand from one config. Drop the now-redundant 27B / 27B-MTP / coder-next variant files and the unused CacheDirectory + slot-save-path KV-slot handling.
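
For orientation, a minimal sketch of what a single preset-driven unit could look like. This is not the contents of the llama-server.nix added by this commit: the section names, model paths, host, and port are hypothetical placeholders, and it assumes `pkgs.llama-cpp` provides `bin/llama-server` with router-mode support for `--models-preset` and `--models-max`.

```nix
# Hypothetical sketch only; names and paths below are placeholders.
{ pkgs, ... }:
let
  # One INI section per model; keys mirror llama-server CLI options.
  modelsIni = pkgs.writeText "models.ini" ''
    [model-35b-a3b]
    model = /var/lib/llama/models/35B-A3B.gguf

    [model-27b]
    model = /var/lib/llama/models/27B.gguf
  '';
in
{
  systemd.services.llama-server = {
    description = "llama-server (multi-model, preset-driven)";
    wantedBy = [ "multi-user.target" ];
    after = [ "network.target" ];
    serviceConfig = {
      # --models-max 2 lets both models stay resident at once;
      # each is loaded on first request rather than at startup.
      ExecStart = ''
        ${pkgs.llama-cpp}/bin/llama-server \
          --models-preset ${modelsIni} \
          --models-max 2 \
          --host 127.0.0.1 \
          --port 8080
      '';
      Restart = "on-failure";
    };
  };
}
```

With a layout like this, clients would pick a model by its section name in the request's `model` field, and adding a model means adding a section rather than a new unit file.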
2026-05-20 00:19:27 +02:00
commit 0edf975c30
parent ae068cfd84
6 changed files with 34 additions and 199 deletions
--- a/systems/x86_64-linux/halo/default.nix
+++ b/systems/x86_64-linux/halo/default.nix
@@ -10,8 +10,7 @@ with lib.metacfg;
    ./hardware-configuration.nix
    #./xremap.nix
    ./wyoming.nix
-    #./llama-server-coder-next.nix
-    ./llama-server-27B-MTP.nix
+    ./llama-server.nix
  ];

  boot.lanzaboote.pkiBundle = "/var/lib/sbctl";