feat(halo): serve multiple llama models via models.ini preset
Replace the per-model llama-server units with a single service that uses llama-server's --models-preset (models.ini) and --models-max 2, so the 35B-A3B and 27B models are loaded on demand from one config. Drop the now-redundant 27B, 27B-MTP, and coder-next variant files, along with the unused CacheDirectory and --slot-save-path KV-slot handling.
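For illustration, a minimal sketch of what a single-service setup along these lines might look like. Only --models-preset and --models-max 2 are taken from the commit message. The unit layout, the section names, the `model` key, the model paths, and the use of pkgs.llama-cpp are assumptions made for the example, not the contents of the actual llama-server.nix, and the sketch assumes a llama.cpp build recent enough to support these flags.

```nix
{ pkgs, ... }:
let
  # Hypothetical preset file with one section per model.
  # Section names, the key name, and the paths are placeholders (assumptions).
  modelsIni = pkgs.writeText "models.ini" ''
    [35B-A3B]
    model = /var/lib/llama/models/35B-A3B.gguf

    [27B]
    model = /var/lib/llama/models/27B.gguf
  '';
in
{
  # One service replaces the former per-model units.
  systemd.services.llama-server = {
    description = "llama-server (multi-model, preset-driven)";
    wantedBy = [ "multi-user.target" ];
    after = [ "network.target" ];
    serviceConfig = {
      # Build the command line as a single string for ExecStart.
      ExecStart = builtins.concatStringsSep " " [
        "${pkgs.llama-cpp}/bin/llama-server"
        "--models-preset ${modelsIni}"
        "--models-max 2"
      ];
      Restart = "on-failure";
    };
  };
}
```

Keeping the preset in the Nix store via pkgs.writeText means the service restarts whenever the model list changes, which is one plausible reason to prefer a single config over separate unit files.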
parent ae068cfd84
commit 0edf975c30
6 changed files with 34 additions and 199 deletions
@@ -10,8 +10,7 @@ with lib.metacfg;
     ./hardware-configuration.nix
     #./xremap.nix
     ./wyoming.nix
-    #./llama-server-coder-next.nix
-    ./llama-server-27B-MTP.nix
+    ./llama-server.nix
   ];
 
   boot.lanzaboote.pkiBundle = "/var/lib/sbctl";