nixcfg

Author	SHA1	Message	Date
Harald Hoyer	bc0d79db57	feat(rag): route the fast model and use it for synthesis by default. Expose halo's [fast] MoE preset through the LiteLLM gateway and make it the rag CLI's default chat model (overridable via RAG_CHAT_MODEL), so query synthesis is quicker than with the larger coder model.	2026-05-22 09:15:59 +02:00
Harald Hoyer	116d04665d	fix(sgx): move LiteLLM off port 4000 to avoid uptime-kuma clash. Uptime Kuma already binds 4000, so the gateway never got the port and requests hit the wrong service. Move LiteLLM to 4001 and update the rag CLI default endpoint to match.	2026-05-22 07:08:26 +02:00
Harald Hoyer	95668b71a7	feat(sgx): add CLI RAG stack (Qdrant + embeddings gateway + rag tool). Stand up document retrieval as shared, client-agnostic primitives rather than locking it inside Open WebUI: Qdrant as the LAN-reachable vector store; LiteLLM gains a bge-m3 route so sgx:4000 also serves /v1/embeddings; a thin `rag` CLI (ingest/query, optional coder synthesis) usable from any machine and from scripts. Embeddings and synthesis run on halo via the gateway; the CLI is configured entirely through RAG_* env vars.	2026-05-22 00:35:54 +02:00
Harald Hoyer	fdefdf31b2	feat(litellm): add LiteLLM gateway on sgx fronting halo's llama-server. Exposes an OpenAI-compatible endpoint on sgx:4000 (LAN-reachable) that routes the `coder` model to halo's llama-server, so clients get a stable gateway with per-key auth instead of hardcoding halo's address. Master key is sourced from a sops-encrypted env file.	2026-05-21 23:15:47 +02:00

4 commits