harald/nixcfg

Commit graph

Author	SHA1	Message	Date

Harald Hoyer

6fd6060dd7

fix(rag): send explicit encoding_format to avoid llama.cpp null error

When encoding_format is unset, LiteLLM forwards it to the backend as JSON
null, and llama.cpp's embeddings endpoint rejects it with a 500
("type must be string, but is null"). Pin encoding_format="float" so the
gateway always relays a string.

2026-05-22 08:34:42 +02:00
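
The fix described in this commit can be illustrated with a minimal sketch of a client that always sets `encoding_format` to a string. This is not the repository's actual `rag` code: the function names are hypothetical, the host `sgx` and port 4001 are taken from the commit messages, and using `bge-m3` as the model alias is an assumption based on the route name.

```python
import json
import urllib.request

# Assumed endpoint: host and port come from the commit messages; the
# /v1/embeddings path is the OpenAI-compatible route the gateway serves.
EMBEDDINGS_URL = "http://sgx:4001/v1/embeddings"


def build_embedding_payload(texts, model="bge-m3"):
    """Return an OpenAI-style embeddings request body (hypothetical helper)."""
    return {
        "model": model,  # assumed alias for the bge-m3 route
        "input": list(texts),
        # Always a string, never None, so no JSON null reaches the backend.
        "encoding_format": "float",
    }


def make_request(texts, url=EMBEDDINGS_URL):
    """Build, but do not send, the HTTP POST request."""
    body = json.dumps(build_embedding_payload(texts)).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request would be a matter of passing the result of `make_request([...])` to `urllib.request.urlopen`; it is left out here so the sketch does not depend on a reachable gateway.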

Harald Hoyer

116d04665d

fix(sgx): move LiteLLM off port 4000 to avoid uptime-kuma clash

Uptime Kuma already binds 4000, so the gateway never got the port and
requests hit the wrong service. Move LiteLLM to 4001 and update the rag
CLI default endpoint to match.

2026-05-22 07:08:26 +02:00
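
A port clash like the one described in this commit can be detected before a service is assigned a port. The following is a rough sketch, not part of the repository: `port_in_use` is a hypothetical helper, and it only probes one local address, so it has to run on the host in question.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if binding (host, port) fails, i.e. something holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically EADDRINUSE: another process already bound the port.
            return True
        return False
```

For example, `port_in_use(4000)` on the affected host would be expected to return `True` while Uptime Kuma is running, assuming it listens on an address that overlaps with the probed one.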

Harald Hoyer

95668b71a7

feat(sgx): add CLI RAG stack (Qdrant + embeddings gateway + rag tool)

Stand up document retrieval as shared, client-agnostic primitives rather
than locking it inside Open WebUI:

- Qdrant as the LAN-reachable vector store
- LiteLLM gains a bge-m3 route so sgx:4000 also serves /v1/embeddings
- a thin `rag` CLI (ingest/query, optional coder synthesis) usable from
  any machine and from scripts

Embeddings and synthesis run on halo via the gateway; the CLI is
configured entirely through RAG_* env vars.

2026-05-22 00:35:54 +02:00
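
Configuration driven purely by `RAG_*` environment variables might look like the sketch below. The commit does not list the variable names, so `RAG_EMBED_URL`, `RAG_QDRANT_URL`, and `RAG_COLLECTION` are hypothetical, as are the defaults: the gateway default uses port 4001 from the later fix, 6333 is Qdrant's standard HTTP port, and placing Qdrant on `sgx` is an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class RagConfig:
    embed_url: str   # OpenAI-compatible gateway base URL
    qdrant_url: str  # vector store base URL
    collection: str  # Qdrant collection name


def load_config(env=None):
    """Read settings from RAG_* variables (names are illustrative only)."""
    env = os.environ if env is None else env
    return RagConfig(
        embed_url=env.get("RAG_EMBED_URL", "http://sgx:4001/v1"),
        qdrant_url=env.get("RAG_QDRANT_URL", "http://sgx:6333"),
        collection=env.get("RAG_COLLECTION", "docs"),
    )
```

Passing a plain dict instead of `os.environ` keeps the loader easy to exercise from scripts, for instance `load_config({"RAG_COLLECTION": "notes"})`.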

3 commits