nixcfg

Author	SHA1	Message	Date
Harald Hoyer	89c82cf804	feat(rag): install on m4 and amd, default endpoints to sgx. Add the rag CLI to the m4 and amd hosts and point its default API_BASE and QDRANT_URL at sgx (where the gateway and Qdrant run) instead of localhost. The services live on sgx, so a localhost default only worked there; sgx resolves to itself on sgx, so this default is correct on every host and leaves only RAG_API_KEY to set.	2026-05-22 09:12:22 +02:00
Harald Hoyer	6fd6060dd7	fix(rag): send explicit encoding_format to avoid llama.cpp null error. When encoding_format is unset, LiteLLM forwards it to the backend as JSON null, and llama.cpp's embeddings endpoint rejects it with a 500 ("type must be string, but is null"). Pin encoding_format="float" so the gateway always relays a string.	2026-05-22 08:34:42 +02:00
Harald Hoyer	116d04665d	fix(sgx): move LiteLLM off port 4000 to avoid uptime-kuma clash. Uptime Kuma already binds 4000, so the gateway never got the port and requests hit the wrong service. Move LiteLLM to 4001 and update the rag CLI default endpoint to match.	2026-05-22 07:08:26 +02:00
Harald Hoyer	95668b71a7	feat(sgx): add CLI RAG stack (Qdrant + embeddings gateway + rag tool). Stand up document retrieval as shared, client-agnostic primitives rather than locking it inside Open WebUI: (1) Qdrant as the LAN-reachable vector store; (2) LiteLLM gains a bge-m3 route so sgx:4000 also serves /v1/embeddings; (3) a thin `rag` CLI (ingest/query, optional coder synthesis) usable from any machine and from scripts. Embeddings and synthesis run on halo via the gateway; the CLI is configured entirely through RAG_* env vars.	2026-05-22 00:35:54 +02:00

4 commits
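
Taken together, the four commits describe a client that defaults its endpoints to sgx and always sends a string `encoding_format`. The following is a minimal sketch of that behavior, not the actual `rag` tool: the variable names `RAG_API_BASE` and `RAG_QDRANT_URL`, the `http` scheme, and Qdrant's port 6333 are assumptions (the log only names API_BASE, QDRANT_URL, RAG_API_KEY, and LiteLLM's port 4001), and both helper functions are hypothetical.

```python
import json
import os


def resolve_endpoints(env=None):
    """Return endpoint settings, falling back to sgx-based defaults.

    The RAG_API_BASE / RAG_QDRANT_URL names and the Qdrant port are
    assumptions for illustration; only RAG_API_KEY is named in the log.
    """
    env = os.environ if env is None else env
    return {
        # LiteLLM gateway moved to port 4001 in commit 116d04665d.
        "api_base": env.get("RAG_API_BASE", "http://sgx:4001/v1"),
        # 6333 is assumed here; the log does not state Qdrant's port.
        "qdrant_url": env.get("RAG_QDRANT_URL", "http://sgx:6333"),
        # No default: the key is the one value each host must set.
        "api_key": env.get("RAG_API_KEY", ""),
    }


def embeddings_request_body(texts, model="bge-m3"):
    """Build a JSON body for POST {api_base}/embeddings.

    encoding_format is always the string "float", so the gateway never
    relays JSON null to the backend (the failure fixed in 6fd6060dd7).
    """
    return json.dumps(
        {"model": model, "input": list(texts), "encoding_format": "float"}
    )
```

With an empty environment, `resolve_endpoints({})` yields the sgx defaults, so a host such as m4 or amd would only need to export `RAG_API_KEY`.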