halo's llama-server now runs in router mode where the model field selects a preset (coder/fast/bge-m3); the old "halo-8000" name is no longer valid. Use the fast MoE model for the Talk bot's responses.