Add a multilingual bge-m3 embedding model to the llama-server preset and raise --models-max to 2 so it stays co-resident with the coder model. This gives the RAG stack a local embeddings endpoint without a second service, keeping all inference on halo. Embedding-specific overrides (ubatch-size, context, pooling) are pinned since the global defaults would truncate or misconfigure embedding requests.
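As a rough illustration of how a RAG client might use the new endpoint, the sketch below posts texts to llama-server's OpenAI-compatible `/v1/embeddings` route and compares two vectors. The port (`8080`), the model alias (`bge-m3`), and the helper names are assumptions for illustration only; none of them are taken from this commit, and the actual alias depends on how the model is named in `models.ini`.

```python
import json
import math
import urllib.request

# Assumed values: the commit names the host "halo" but not the port or model alias.
BASE_URL = "http://halo:8080"
MODEL = "bge-m3"


def build_embedding_request(texts, model=MODEL, base_url=BASE_URL):
    """Build a POST request for the OpenAI-compatible /v1/embeddings route."""
    payload = json.dumps({"model": model, "input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(texts, timeout=30.0):
    """Send the request and return one embedding vector per input text."""
    request = build_embedding_request(texts)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-style response: {"data": [{"index": i, "embedding": [...]}, ...]}
    ordered = sorted(body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in ordered]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

With the server running, something like `query_vec, doc_vec = embed(["query text", "document text"])` followed by `cosine(query_vec, doc_vec)` would give a similarity score for retrieval ranking.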
| File |
|---|
| default.nix |
| hardware-configuration.nix |
| llama-server.nix |
| models.ini |
| sound.nix |
| wyoming.nix |
| xremap.nix |