Commit graph

5 commits

SHA1 Message Date
2b1bba0703 feat(rag): show cited sources and exit cleanly on interrupt/errors
- After a synthesized answer, print a legend mapping the [n] tags the
  model actually cited to their source files (on stderr, so piping the
  answer stays clean). Only cited tags are listed, not every retrieved
  chunk.
- Catch KeyboardInterrupt/BrokenPipeError for a quiet abort and collapse
  other exceptions to a one-line message; set RAG_DEBUG=1 for the full
  traceback.
2026-05-22 09:12:22 +02:00
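A minimal sketch of the two behaviours in 2b1bba0703. It assumes retrieved chunks are numbered [1], [2], ... in the prompt and that `sources[n - 1]` is the file chunk n came from. The helper names (`cited_tags`, `print_legend`, `run`) and the exit code 130 for Ctrl-C are hypothetical choices and do not come from the rag CLI.

```python
import os
import re
import sys
import traceback

# Hypothetical: chunks appear in the prompt as [1], [2], ...
CITE_RE = re.compile(r"\[(\d+)\]")


def cited_tags(answer: str, n_chunks: int) -> list[int]:
    """Return the [n] tags the answer actually uses, in order of first use."""
    seen: list[int] = []
    for match in CITE_RE.finditer(answer):
        n = int(match.group(1))
        # Skip numbers that don't correspond to any retrieved chunk.
        if 1 <= n <= n_chunks and n not in seen:
            seen.append(n)
    return seen


def print_legend(answer: str, sources: list[str], out=None) -> None:
    """List only cited tags, on stderr by default so stdout holds just the answer."""
    out = sys.stderr if out is None else out
    tags = cited_tags(answer, len(sources))
    if tags:
        print("\nSources:", file=out)
        for n in tags:
            print(f"  [{n}] {sources[n - 1]}", file=out)


def run(main) -> int:
    """Wrap a CLI entry point: quiet aborts, one-line errors, RAG_DEBUG=1 for tracebacks."""
    try:
        main()
        return 0
    except KeyboardInterrupt:
        return 130  # assumed convention: 128 + SIGINT
    except BrokenPipeError:
        # The reader (e.g. `head`) closed the pipe. Point stdout at devnull so
        # the interpreter's final flush does not raise a second time.
        devnull = os.open(os.devnull, os.O_WRONLY)
        os.dup2(devnull, sys.stdout.fileno())
        return 1
    except Exception as exc:
        if os.environ.get("RAG_DEBUG") == "1":
            traceback.print_exc()
        else:
            print(f"rag: error: {exc}", file=sys.stderr)
        return 1
```

A caller would end with `sys.exit(run(main))`. Printing the legend to stderr means `rag query ... > answer.txt` captures only the synthesized text.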
89c82cf804 feat(rag): install on m4 and amd, default endpoints to sgx
Add the rag CLI to the m4 and amd hosts and point its default API_BASE
and QDRANT_URL at sgx, where the gateway and Qdrant run, instead of
localhost. A localhost default only worked on sgx itself; since sgx also
resolves to itself there, the new default is correct on every host and
leaves RAG_API_KEY as the only variable to set.
2026-05-22 09:12:22 +02:00
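The defaulting rule in 89c82cf804 can be sketched as below. The variable names RAG_API_BASE and RAG_QDRANT_URL are assumptions that follow the RAG_* convention from 95668b71a7. Port 4001 for the gateway comes from 116d04665d; port 6333 (Qdrant's stock HTTP port) is assumed, since the log does not say which port Qdrant uses on sgx.

```python
import os
from dataclasses import dataclass

# Assumed defaults: gateway on sgx:4001, Qdrant on its stock port 6333.
DEFAULT_API_BASE = "http://sgx:4001"
DEFAULT_QDRANT_URL = "http://sgx:6333"


@dataclass(frozen=True)
class Config:
    api_base: str
    qdrant_url: str
    api_key: str


def load_config(env=None) -> Config:
    """Read RAG_* settings, falling back to sgx endpoints that work on any host."""
    env = os.environ if env is None else env
    key = env.get("RAG_API_KEY")
    if not key:
        # The one setting with no sensible shared default.
        raise SystemExit("rag: RAG_API_KEY is not set")
    return Config(
        api_base=env.get("RAG_API_BASE", DEFAULT_API_BASE),
        qdrant_url=env.get("RAG_QDRANT_URL", DEFAULT_QDRANT_URL),
        api_key=key,
    )
```

Because the hostname rather than localhost is baked in, the same default resolves correctly on m4, amd, and sgx, and an override is only needed for testing against another instance.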
6fd6060dd7 fix(rag): send explicit encoding_format to avoid llama.cpp null error
When encoding_format is unset, LiteLLM forwards it to the backend as JSON
null, and llama.cpp's embeddings endpoint rejects it with a 500
("type must be string, but is null"). Pin encoding_format="float" so the
gateway always relays a string.
2026-05-22 08:34:42 +02:00
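A sketch of the pinned request body from 6fd6060dd7, assuming the CLI calls the OpenAI-compatible `/v1/embeddings` route on the gateway directly with the standard library. The model name "bge-m3" matches the route added in 95668b71a7, but the exact route name and the helper names are assumptions.

```python
import json
import urllib.request


def embeddings_request(api_base: str, api_key: str, texts: list[str],
                       model: str = "bge-m3") -> urllib.request.Request:
    """Build a POST to the gateway's /v1/embeddings with encoding_format pinned."""
    body = {
        "model": model,
        "input": texts,
        # Always send a string. Left unset, LiteLLM relays the field to
        # llama.cpp as null and the embeddings endpoint answers with a 500.
        "encoding_format": "float",
    }
    return urllib.request.Request(
        f"{api_base.rstrip('/')}/v1/embeddings",
        data=json.dumps(body).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def embed(api_base: str, api_key: str, texts: list[str]) -> list[list[float]]:
    """Send the request and return vectors in input order."""
    with urllib.request.urlopen(embeddings_request(api_base, api_key, texts)) as resp:
        data = json.load(resp)
    # OpenAI-style response: {"data": [{"index": i, "embedding": [...]}, ...]}
    return [d["embedding"] for d in sorted(data["data"], key=lambda d: d["index"])]
```

Setting the field explicitly on the client side is the narrowest fix: it needs no change to LiteLLM or llama.cpp and is harmless for backends that already default to floats.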
116d04665d fix(sgx): move LiteLLM off port 4000 to avoid uptime-kuma clash
Uptime Kuma already binds 4000, so the gateway never got the port and
requests hit the wrong service. Move LiteLLM to 4001 and update the rag
CLI default endpoint to match.
2026-05-22 07:08:26 +02:00
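A clash like the one fixed in 116d04665d can be spotted before deploying by trying to bind the port, which is what the gateway itself does at startup. This is a hypothetical helper, not part of the repository; probing with `connect()` instead would only detect services that are accepting connections.

```python
import socket


def port_in_use(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if some other socket already holds `port` on `host`."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            # No SO_REUSEADDR, so this fails if anything is already bound.
            s.bind((host, port))
        except OSError:
            return True
        return False
```

Run on sgx, `port_in_use(4000)` would have reported Uptime Kuma's listener, while a free 4001 returns False.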
95668b71a7 feat(sgx): add CLI RAG stack (Qdrant + embeddings gateway + rag tool)
Stand up document retrieval as shared, client-agnostic primitives rather
than locking it inside Open WebUI:

- Qdrant as the LAN-reachable vector store
- LiteLLM gains a bge-m3 route so sgx:4000 also serves /v1/embeddings
- a thin `rag` CLI (ingest/query, optional coder synthesis) usable from
  any machine and from scripts

Embeddings and synthesis run on halo via the gateway; the CLI is
configured entirely through RAG_* env vars.
2026-05-22 00:35:54 +02:00
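A minimal sketch of the ingest side of the stack in 95668b71a7: split a document into overlapping chunks and build the JSON body for Qdrant's `PUT /collections/{name}/points` upsert. The chunk size, overlap, payload fields, and UUID scheme are assumptions for illustration; the log does not describe how the rag CLI chunks or stores documents. A matching collection would be created with `{"vectors": {"size": 1024, "distance": "Cosine"}}`, 1024 being bge-m3's dense embedding width.

```python
import uuid


def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap`."""
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    # Stop once a window would add nothing beyond the previous one.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def upsert_body(chunks: list[str], vectors: list[list[float]], source: str) -> dict:
    """Build a Qdrant upsert body; one point per chunk, keyed by file and index."""
    if len(chunks) != len(vectors):
        raise ValueError("one vector per chunk expected")
    return {
        "points": [
            {
                # Deterministic IDs: re-ingesting a file overwrites the points
                # that share the same chunk index.
                "id": str(uuid.uuid5(uuid.NAMESPACE_URL, f"{source}#{i}")),
                "vector": vec,
                "payload": {"source": source, "chunk": i, "text": chunk},
            }
            for i, (chunk, vec) in enumerate(zip(chunks, vectors))
        ]
    }


def search_body(query_vector: list[float], k: int = 5) -> dict:
    """Body for POST /collections/{name}/points/search, returning payloads."""
    return {"vector": query_vector, "limit": k, "with_payload": True}
```

Keeping `source` in each payload is what lets a query map every retrieved chunk back to its file, which a later step can number as [n] for the model.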