Preload Qwen3.6-27B and Qwen3.6-35B-A3B at startup (load-on-startup) so that both are warm immediately under --models-max 2, set parallel = 1 in [*] as the fallback for every other model, and adjust per-model context size and draft depth.

| Name | | |
|---|---|---|
| default.nix | ||
| hardware-configuration.nix | ||
| llama-server.nix | ||
| models.ini | ||
| sound.nix | ||
| wyoming.nix | ||
| xremap.nix | ||
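
As a rough illustration of the change described in the commit message, models.ini might contain something like the sketch below. Only `load-on-startup`, `parallel = 1`, and the `[*]` section are named in the commit message. The section names, the `ctx-size` and `draft-max` keys (assumed here to mirror the llama-server flags of the same name), and all numeric values are placeholders rather than the repository's actual settings. Model path or repo entries are omitted.

```ini
; Hypothetical sketch of models.ini, not the actual file contents.

; Fallback applied to any model that has no section of its own.
[*]
parallel = 1

; Section names are placeholders for however the models are registered.
[Qwen3.6-27B]
; Load this model when the server starts instead of on first request.
load-on-startup = true
; Placeholder context size and draft depth.
ctx-size = 32768
draft-max = 8

[Qwen3.6-35B-A3B]
load-on-startup = true
; Placeholder context size and draft depth.
ctx-size = 65536
draft-max = 4
```

With both sections marked to load on startup, the server would presumably need to be started with --models-max 2 (as the commit message states) so that neither model is evicted to make room for the other.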