Add the rag CLI to the m4 and amd hosts and point its default API_BASE
and QDRANT_URL at sgx (where the gateway and Qdrant run) instead of
localhost. The services live on sgx, so a localhost default only worked
there. The name sgx also resolves on sgx itself, so the new default is
correct on every host and leaves only RAG_API_KEY to set.
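
A minimal sketch of how such defaults might resolve. The hostname and
RAG_API_KEY come from the message above; the names RAG_API_BASE and
RAG_QDRANT_URL, the /v1 suffix, and the ports are assumptions for
illustration (4001 follows the port move described further down, 6333 is
Qdrant's default HTTP port).

```python
import os

# Assumed defaults: only the host name "sgx" is taken from the commit text.
DEFAULT_API_BASE = "http://sgx:4001/v1"
DEFAULT_QDRANT_URL = "http://sgx:6333"


def resolve_config(env=None):
    """Return the effective settings, preferring RAG_* overrides."""
    env = os.environ if env is None else env
    return {
        # Hypothetical variable names; the real CLI may spell them differently.
        "api_base": env.get("RAG_API_BASE", DEFAULT_API_BASE),
        "qdrant_url": env.get("RAG_QDRANT_URL", DEFAULT_QDRANT_URL),
        # No default on purpose: the key is the one value each host must set.
        "api_key": env.get("RAG_API_KEY"),
    }
```

With an empty environment, everything except the API key resolves to sgx.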
When encoding_format is unset, LiteLLM forwards it to the backend as JSON
null, and llama.cpp's embeddings endpoint rejects the request with a 500
("type must be string, but is null"). Pin encoding_format="float" so the
gateway always relays a string.
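
As an illustration of the pin, a client-side request body could be built
like this. The helper name and the bge-m3 default are hypothetical; only
the encoding_format="float" value is taken from the message above.

```python
import json


def build_embedding_request(texts, model="bge-m3", encoding_format="float"):
    """Build an OpenAI-style /v1/embeddings body with encoding_format set."""
    if encoding_format is None:
        # A None here would serialize to JSON null, the case the fix avoids.
        raise ValueError("encoding_format must be a string, not None")
    return {
        "model": model,
        "input": list(texts),
        "encoding_format": encoding_format,
    }


body = json.dumps(build_embedding_request(["hello world"]))
```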
LiteLLM 1.75.5 in nixpkgs 25.11 crashes on /v1/embeddings with
"RuntimeError: Stream consumed" (the proxy reads the request body twice).
Source litellm from nixos-unstable (1.83.14) where this is resolved, so
the gateway can proxy embeddings to halo's bge-m3 for any client.
Uptime Kuma already binds 4000, so the gateway never got the port and
requests hit the wrong service. Move LiteLLM to 4001 and update the rag
CLI default endpoint to match.
Stand up document retrieval as shared, client-agnostic primitives rather
than locking it inside Open WebUI:
- Qdrant as the LAN-reachable vector store
- LiteLLM gains a bge-m3 route so sgx:4000 also serves /v1/embeddings
- a thin `rag` CLI (ingest/query, optional coder synthesis) usable from
  any machine and from scripts
Embeddings and synthesis run on halo via the gateway; the CLI is
configured entirely through RAG_* env vars.
Add a multilingual bge-m3 embedding model to the llama-server preset and
raise --models-max to 2 so it stays co-resident with the coder model.
This gives the RAG stack a local embeddings endpoint without a second
service, keeping all inference on halo. Embedding-specific overrides
(ubatch-size, context, pooling) are pinned since the global defaults
would truncate or misconfigure embedding requests.
opencode reformats files in the background after each write, which
desyncs the model's view of a file between edits and breaks follow-up
edits whose old_string no longer matches. Set formatter = false; the
formatter tools stay on PATH via home.packages for manual use.
Documents Rust language, stdlib, and Cargo features stabilized after the
~2025 training cutoff (1.85–1.95, 2024 edition) so agents generate
current syntax instead of relying on a stale mental model.
The Ollama/OpenAI connection env vars are PersistentConfig: they are read
from the environment only on first launch and thereafter owned by Open
WebUI's DB. They no longer reflected the live backend, so remove them and
document that connections are configured through the admin UI.
Exposes an OpenAI-compatible endpoint on sgx:4000 (LAN-reachable) that
routes the `coder` model to halo's llama-server, so clients get a stable
gateway with per-key auth instead of hardcoding halo's address. Master
key is sourced from a sops-encrypted env file.
Add a 0.74 confidence threshold so speculative drafting stops early
once the draft model's predicted token probability drops below it,
favoring shorter, higher-acceptance draft sequences.
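
The early-stop rule can be sketched as follows. This is a toy model of
the idea, not llama.cpp's implementation: the function name is
hypothetical, the n_max default is illustrative, and it assumes the
first low-confidence token is dropped rather than kept.

```python
def draft_length(draft_probs, n_max=5, p_min=0.74):
    """Count how many draft tokens are kept before confidence drops."""
    kept = 0
    for prob in draft_probs[:n_max]:
        if prob < p_min:
            # Stop drafting at the first token below the threshold.
            break
        kept += 1
    return kept
```

For probabilities 0.9, 0.8, 0.7, 0.95 only the first two tokens are
drafted, because the third falls below 0.74.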
Switch the coder model from Q6_K to the UD-Q8_K_XL quant for better
output quality, and raise spec-draft-n-max from 4 to 5 to allow longer
speculative draft sequences.
Rename the Qwen3.6-27B model section to "coder" so it matches the
opencode provider config, and add ngram-simple to the speculative
decoding chain alongside draft-mtp.
Re-add home.packages so nixfmt, prettier, shfmt, ruff, taplo and stylua
are available for interactive use, alongside the store-path-pinned
references in the generated config.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Build opencode's config.json with pkgs.formats.json instead of shipping
a static file, pinning each formatter command to its store-path binary
via lib.getExe. Drops the standalone config/opencode/config.json.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add formatter entries for nix, prettier (md/yaml/json/web), shell,
python, toml and lua, and install the matching tools via the opencode
home module so they are available wherever opencode runs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Serve only Qwen3.6-27B; remove the unused 35B-A3B preset.
Tuning:
- Move model-specific keys (spec-type, sampling temp/top-p/top-k/min-p)
  out of the [*] defaults into [Qwen3.6-27B] so they no longer leak onto
  other models; draft-mtp in particular only works on MTP-weighted models.
- Drop the duplicate parallel key from [*].
- Bump ubatch-size 256 -> 512 for faster iGPU prefill on Strix Halo.
- Add threads-batch = 16 to use all cores for prefill while keeping
  generation at threads = 8 under full GPU offload.
Resolves the URL through the Odesli public API (api.song.link) and
replies with the canonical song.link page plus per-platform deep links
(Spotify, Apple Music, YouTube/YT Music, Tidal, Deezer, Amazon Music,
SoundCloud). Country is pinned to DE.
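
A rough sketch of the lookup and reply formatting, without the network
call. The endpoint path, the query parameter names, and the response
fields (pageUrl, linksByPlatform, and the platform keys) reflect my
understanding of the Odesli API and should be treated as assumptions;
the helper names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint path for the public Odesli API.
API_ENDPOINT = "https://api.song.link/v1-alpha.1/links"

# Assumed platform keys in the response, mapped to display labels.
PLATFORM_LABELS = {
    "spotify": "Spotify",
    "appleMusic": "Apple Music",
    "youtube": "YouTube",
    "youtubeMusic": "YT Music",
    "tidal": "Tidal",
    "deezer": "Deezer",
    "amazonMusic": "Amazon Music",
    "soundcloud": "SoundCloud",
}


def build_lookup_url(track_url, country="DE"):
    """Return the lookup URL with the country pinned (DE by default)."""
    query = urlencode({"url": track_url, "userCountry": country})
    return f"{API_ENDPOINT}?{query}"


def format_reply(payload):
    """Render the canonical page link followed by per-platform links."""
    lines = [payload["pageUrl"]]
    links = payload.get("linksByPlatform", {})
    for key, label in PLATFORM_LABELS.items():
        if key in links:
            lines.append(f"{label}: {links[key]['url']}")
    return "\n".join(lines)
```

Platforms missing from the response are skipped, so the reply lists only
links that actually resolved.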
Preload Qwen3.6-27B and Qwen3.6-35B-A3B at startup (load-on-startup)
so both are warm immediately under --models-max 2, set parallel = 1
as the [*] fallback for any other model, and adjust per-model context
size and draft depth.
Replace the per-model llama-server units with a single service that
uses llama-server's --models-preset (models.ini) and --models-max 2,
so the 35B-A3B and 27B models are loaded on demand from one config.
Drop the now-redundant 27B / 27B-MTP / coder-next variant files and
the unused CacheDirectory + slot-save-path KV-slot handling.
A skeptical PR review skill that defaults to REJECT. Encodes the
staff-engineer adversarial stance: lead with problems, assume bugs
exist, require severity+location+fix+test per finding, mandate an
execution trace, and end with an explicit verdict.
Includes base-branch detection (gh pr view → upstream → heuristic →
ask) so the review never silently diffs against the wrong base.
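
The fallback chain could look roughly like this. It is a sketch under
assumptions, not the skill's actual logic: "upstream" is read here as
the remote's default branch, the heuristic simply probes origin/main and
origin/master, and all function names are hypothetical.

```python
import subprocess


def run(cmd):
    """Return stripped stdout, or None if the command fails or is missing."""
    try:
        out = subprocess.run(cmd, capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    return out.stdout.strip() or None


def from_pr():
    # Base branch of the pull request for the current branch, if any.
    return run(["gh", "pr", "view", "--json", "baseRefName", "--jq", ".baseRefName"])


def from_upstream():
    # One reading of "upstream": the default branch recorded for origin.
    return run(["git", "symbolic-ref", "--short", "refs/remotes/origin/HEAD"])


def from_heuristic():
    # Fall back to the first conventional branch name that exists.
    for name in ("main", "master"):
        if run(["git", "rev-parse", "--verify", "--quiet", f"origin/{name}"]):
            return f"origin/{name}"
    return None


def ask_user():
    return input("Base branch to diff against: ").strip() or None


def detect_base(resolvers=(from_pr, from_upstream, from_heuristic, ask_user)):
    """Try each resolver in order and return the first non-empty answer."""
    for resolve in resolvers:
        base = resolve()
        if base:
            return base
    # Never guess silently: fail loudly if nothing resolved.
    raise RuntimeError("no base branch resolved; refusing to guess")
```

Passing the resolvers as a parameter keeps the order explicit and lets
the chain be exercised with stubs instead of a real repository.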
Apple's built-in ssh-agent has no sk-api/libfido2 support and refuses
signing operations for ed25519-sk / ecdsa-sk hardware keys. Enable the
existing metacfg.security.ssh module (which runs pkgs.openssh's
ssh-agent under launchd) via the common darwin suite, and export
SSH_AUTH_SOCK from environment.shellInit so bash, zsh, and fish (via
/etc/fish/foreign-env/shellInit) all point at the nix-managed socket.
Add obsidian-skills as a flake input (flake = false) and map each
skill subdirectory into ~/.agents/skills/<skill>, alongside the
existing local skills. Updates flow through `nix flake update
obsidian-skills`.
- harald@sgx-nixos: orphan, no matching NixOS system and no
  home.stateVersion set, so it failed standalone evaluation
- harald@sgx-azure: referenced metacfg.tools.direnv.enable but no
  modules/home/tools/direnv exists, causing eval failure