Commit graph

4 commits

Author SHA1 Message Date
89c82cf804 feat(rag): install on m4 and amd, default endpoints to sgx
Add the rag CLI to the m4 and amd hosts and point its default API_BASE
and QDRANT_URL at sgx (where the gateway and Qdrant run) instead of
localhost. The services live on sgx, so a localhost default only worked
there. The hostname sgx also resolves on sgx itself, so the new default is
correct on every host and leaves only RAG_API_KEY to set.
2026-05-22 09:12:22 +02:00
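
A minimal sketch of how such defaults might be resolved, not the actual rag source. The variable names RAG_API_BASE and RAG_QDRANT_URL are assumptions inferred from the "RAG_* env vars" wording, the /v1 path suffix is assumed, port 4001 is taken from the earlier port-move commit, and 6333 is Qdrant's stock REST port rather than anything the log states.

```python
import os

# Assumed defaults: the host "sgx" comes from the commit message, 4001 from the
# LiteLLM port move, and 6333 is Qdrant's standard REST port (an assumption here).
DEFAULT_API_BASE = "http://sgx:4001/v1"
DEFAULT_QDRANT_URL = "http://sgx:6333"


def load_config(env=None):
    """Resolve endpoints from RAG_* variables, falling back to the sgx defaults."""
    env = os.environ if env is None else env
    api_key = env.get("RAG_API_KEY")
    if not api_key:
        # The key is the one setting without a usable default.
        raise RuntimeError("RAG_API_KEY must be set; it has no default")
    return {
        "api_base": env.get("RAG_API_BASE", DEFAULT_API_BASE),      # hypothetical name
        "qdrant_url": env.get("RAG_QDRANT_URL", DEFAULT_QDRANT_URL),  # hypothetical name
        "api_key": api_key,
    }
```

With defaults like these, a host such as m4 or amd would only need RAG_API_KEY exported, while any endpoint can still be overridden per machine.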
6fd6060dd7 fix(rag): send explicit encoding_format to avoid llama.cpp null error
When encoding_format is unset, LiteLLM forwards it to the backend as JSON
null, and llama.cpp's embeddings endpoint rejects it with a 500
("type must be string, but is null"). Pin encoding_format="float" so the
gateway always relays a string.
2026-05-22 08:34:42 +02:00
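
The fix above can be illustrated with a small request builder. This is a sketch under assumptions, not the CLI's code: the function name, the /embeddings path under the API base, and the bge-m3 model name as the request's model field are illustrative. The one detail taken directly from the commit is that encoding_format is always sent as the string "float".

```python
import json
import urllib.request


def build_embedding_request(api_base, api_key, texts, model="bge-m3"):
    """Build (but do not send) an OpenAI-style embeddings request."""
    payload = {
        "model": model,
        "input": texts,
        # Always send a string here. Leaving the field out is what let the
        # gateway relay JSON null, which the backend rejected with a 500.
        "encoding_format": "float",
    }
    return urllib.request.Request(
        url=api_base.rstrip("/") + "/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it; keeping construction separate from sending makes the payload easy to inspect.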
116d04665d fix(sgx): move LiteLLM off port 4000 to avoid uptime-kuma clash
Uptime Kuma already binds 4000, so the gateway never got the port and
requests hit the wrong service. Move LiteLLM to 4001 and update the rag
CLI default endpoint to match.
2026-05-22 07:08:26 +02:00
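
A clash like the one described above can be caught before a service is assigned a port. The helper below is a hypothetical pre-flight check, not part of the repository: it simply tries to bind the port and reports whether the bind succeeded.

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can bind host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use": another service holds the port.
            return False
    return True
```

On a host where another service already holds 4000, port_is_free(4000) would be expected to return False. The check is only a snapshot, since another process can still claim the port between the check and the real bind.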
95668b71a7 feat(sgx): add CLI RAG stack (Qdrant + embeddings gateway + rag tool)
Stand up document retrieval as shared, client-agnostic primitives rather
than locking it inside Open WebUI:

- Qdrant as the LAN-reachable vector store
- LiteLLM gains a bge-m3 route so sgx:4000 also serves /v1/embeddings
- a thin `rag` CLI (ingest/query, optional coder synthesis) usable from
  any machine and from scripts

Embeddings and synthesis run on halo via the gateway; the CLI is
configured entirely through RAG_* env vars.
2026-05-22 00:35:54 +02:00
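
The query half of such a stack can be sketched as a vector search against Qdrant's REST API. Everything specific here is an assumption for illustration: the collection name "docs", the function name, and the base URL are hypothetical, and the query vector is presumed to come from the gateway's /v1/embeddings route. The sketch uses Qdrant's points/search endpoint and only builds the request.

```python
import json
import urllib.request


def build_search_request(qdrant_url, query_vector, collection="docs", limit=5):
    """Build (but do not send) a Qdrant vector search request."""
    payload = {
        "vector": query_vector,  # embedding of the user's question
        "limit": limit,          # number of nearest chunks to return
        "with_payload": True,    # include stored text and metadata in the hits
    }
    url = f"{qdrant_url.rstrip('/')}/collections/{collection}/points/search"
    return urllib.request.Request(
        url=url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The returned hits would then be printed directly or handed to a synthesis model, matching the optional synthesis step the commit mentions.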