Uptime Kuma already binds port 4000, so the LiteLLM gateway never got
the port and requests hit the wrong service. Move LiteLLM to 4001 and
update the rag CLI default endpoint to match.

Stand up document retrieval as shared, client-agnostic primitives rather
than locking it inside Open WebUI:
- Qdrant as the LAN-reachable vector store
- LiteLLM gains a bge-m3 route so sgx:4000 also serves /v1/embeddings
- a thin `rag` CLI (ingest/query, optional coder synthesis) usable from
  any machine and from scripts
Embeddings and synthesis run on halo via the gateway; the CLI is
configured entirely through RAG_* env vars.
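
A minimal sketch of how a client could resolve its settings from the
environment and call the gateway's OpenAI-compatible /v1/embeddings
route. The variable names RAG_ENDPOINT and RAG_EMBED_MODEL, the helper
names, and the http://sgx:4001 default are assumptions for illustration,
not the CLI's actual interface; authentication is left out.

```python
import json
import os
import urllib.request


def load_config(env=None):
    """Read settings from RAG_* variables (names are hypothetical)."""
    env = os.environ if env is None else env
    return {
        # Assumed default: the gateway host on the relocated port 4001.
        "endpoint": env.get("RAG_ENDPOINT", "http://sgx:4001").rstrip("/"),
        "embed_model": env.get("RAG_EMBED_MODEL", "bge-m3"),
    }


def build_embedding_request(texts, config):
    """Build a POST request in the OpenAI-style embeddings format."""
    body = json.dumps(
        {"model": config["embed_model"], "input": list(texts)}
    ).encode("utf-8")
    return urllib.request.Request(
        config["endpoint"] + "/v1/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(texts, config=None):
    """Send the request and return one vector per input text."""
    config = config or load_config()
    with urllib.request.urlopen(build_embedding_request(texts, config)) as resp:
        payload = json.load(resp)
    return [item["embedding"] for item in payload["data"]]
```

Keeping request construction separate from the network call lets the
endpoint and model resolution be checked without a running gateway.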