nixcfg

Author	SHA1	Message	Date
Harald Hoyer	a1b55fe2ec	chore(halo): model update	2026-05-22 00:21:23 +02:00
Harald Hoyer	9986d286b1	refactor(openwebui): drop stale backend env vars now managed via UI The Ollama/OpenAI connection env vars are PersistentConfig: read only on first launch and thereafter owned by Open WebUI's DB. They no longer reflected the live backend, so remove them and document that connections are configured through the admin UI.	2026-05-21 23:15:47 +02:00
Harald Hoyer	fdefdf31b2	feat(litellm): add LiteLLM gateway on sgx fronting halo's llama-server Exposes an OpenAI-compatible endpoint on sgx:4000 (LAN-reachable) that routes the `coder` model to halo's llama-server, so clients get a stable gateway with per-key auth instead of hardcoding halo's address. Master key is sourced from a sops-encrypted env file.	2026-05-21 23:15:47 +02:00
Harald Hoyer	ccd8750899	chore(halo): set spec-draft-p-min for coder model Add a 0.74 confidence threshold so speculative drafting stops early once the draft model's predicted token probability drops below it, favoring shorter, higher-acceptance draft sequences.	2026-05-21 23:15:09 +02:00
Harald Hoyer	3a070413e4	chore(halo): upgrade coder model to Q8 quant and bump spec draft Switch the coder model from Q6_K to the UD-Q8_K_XL quant for better output quality, and raise spec-draft-n-max from 4 to 5 to allow longer speculative draft sequences.	2026-05-21 23:11:00 +02:00
Harald Hoyer	689389ebf8	chore(halo): rename model to coder and add ngram-simple speculation Rename the Qwen3.6-27B model section to "coder" so it matches the opencode provider config, and add ngram-simple to the speculative decoding chain alongside draft-mtp.	2026-05-21 22:07:57 +02:00
Harald Hoyer	ee396ffd42	chore(halo): more parallel	2026-05-21 20:54:08 +02:00
Harald Hoyer	70da67555f	chore(halo): llama.cpp update	2026-05-21 20:46:06 +02:00
Harald Hoyer	1376ab0ba0	chore(halo): reduce ubatch size	2026-05-21 08:47:39 +02:00
Harald Hoyer	6c5ce8742c	fix(halo): only one model	2026-05-20 14:23:42 +02:00
Harald Hoyer	5ee2f65337	chore(halo): tune llama models.ini and drop 35B-A3B model Serve only Qwen3.6-27B; remove the unused 35B-A3B preset. Tuning: - Move model-specific keys (spec-type, sampling temp/top-p/top-k/min-p) out of the [*] defaults into [Qwen3.6-27B] so they no longer leak onto other models; draft-mtp in particular only works on MTP-weighted models. - Drop the duplicate parallel key from [*]. - Bump ubatch-size 256 -> 512 for faster iGPU prefill on Strix Halo. - Add threads-batch = 16 to use all cores for prefill while keeping generation at threads = 8 under full GPU offload.	2026-05-20 14:23:42 +02:00
Harald Hoyer	5b44e037a1	feat(halo): add `song <URL>` command to convert via song.link Resolves the URL through the Odesli public API (api.song.link) and replies with the canonical song.link page plus per-platform deep links (Spotify, Apple Music, YouTube/YT Music, Tidal, Deezer, Amazon Music, SoundCloud). Country is pinned to DE.	2026-05-20 09:42:11 +02:00
Harald Hoyer	ac70c57c15	chore(halo): preload both llama models and tune preset Preload Qwen3.6-27B and Qwen3.6-35B-A3B at startup (load-on-startup) so both are warm immediately under --models-max 2, set parallel = 1 as the [*] fallback for any other model, and adjust per-model context size and draft depth.	2026-05-20 07:14:26 +02:00
Harald Hoyer	31e491e314	Revert "fix(halo): 27 only" This reverts commit `72e7bf613f`.	2026-05-20 07:05:27 +02:00
Harald Hoyer	72e7bf613f	fix(halo): 27 only	2026-05-20 02:14:08 +02:00
Harald Hoyer	807a3d0d8e	fix(halo): context	2026-05-20 01:21:10 +02:00
Harald Hoyer	0edf975c30	feat(halo): serve multiple llama models via models.ini preset Replace the per-model llama-server units with a single service that uses llama-server's --models-preset (models.ini) and --models-max 2, so the 35B-A3B and 27B models are loaded on demand from one config. Drop the now-redundant 27B / 27B-MTP / coder-next variant files and the unused CacheDirectory + slot-save-path KV-slot handling.	2026-05-20 00:23:50 +02:00
Harald Hoyer	ae068cfd84	feat(mx): increase halo bot timeout	2026-05-19 23:52:46 +02:00
Harald Hoyer	b4063fda66	feat(halo): MTP --parallel 2	2026-05-19 23:48:53 +02:00
Harald Hoyer	8bd096ff8d	feat(halo): inc. mtp to 6	2026-05-19 06:40:13 +02:00
Harald Hoyer	492362fa31	feat(amd): enable Wake-on-LAN on enp7s0	2026-05-16 13:40:25 +02:00
Harald Hoyer	38d2d4f4ae	fix(halo): q6_k with mtp 2	2026-05-15 07:47:43 +02:00
Harald Hoyer	1e3b2fc9a7	feat(halo): unsloth MTP	2026-05-13 19:42:54 +02:00
Harald Hoyer	42c52bd87f	refactor(mx): drive opencode bot via direct chat-completions API The bot no longer shells out to `opencode run`. Instead it POSTs to the OpenAI-compatible /chat/completions endpoint exposed by llama-server on halo.hoyer.tail:8000 directly. This removes the Bun/sqlite cold-start overhead per request, drops the pkgs.opencode runtime dependency, and eliminates the ExecStartPre dance that materialized config.json into the service's $HOME. Conversation history is now stored as a proper OpenAI `messages` list with system/user/assistant roles, instead of the XML blob that was inlined into a single `opencode run` argument. The interactive opencode setup (config/opencode/config.json) is unchanged — only the bot stops depending on it. The module gains a `modelBaseUrl` option; `model` is now the bare model name (`halo-8000`) without the provider/ prefix that the opencode CLI required.	2026-05-13 16:38:58 +02:00
Harald Hoyer	d8e8293c0e	feat(mx): add Nextcloud Talk opencode bot pointing at halo.hoyer.tail:8000 Mirrors the existing nextcloud-claude-bot setup but invokes `opencode run` against the local `halo-8000` provider/model. The bot listens on 127.0.0.1:8086, is exposed via the `/_opencode-bot/` location on nc.hoyer.xyz, and uses `@Halo` as its mention trigger in group chats. The opencode config (config/opencode/config.json) is installed into the service's $HOME/.config/opencode/ on each start, so the bot picks up the same provider definition the user uses interactively. The model map keys are renamed to `halo-8000` / `halo-8001` so the canonical `provider/model` reference works without an alias indirection.	2026-05-13 15:08:18 +02:00
Harald Hoyer	dadfb07914	fix(halo): set `--alias halo-8000`	2026-05-13 14:52:49 +02:00
Harald Hoyer	f9a2e0d301	chore(x1,amd): disable cratedocs-mcp service Keep it enabled only on sgx.	2026-05-13 11:35:59 +02:00
Harald Hoyer	4ce7bcf354	fix(mx): make tailscale exit-node advertisement actually apply tailscale set is strict about boolean flags and silently ignores --advertise-exit-node without =true. Result: the tailscaled-set unit ran cleanly but AdvertiseRoutes stayed null. Spell the value out so the flag takes effect.	2026-05-13 09:28:20 +02:00
Harald Hoyer	67b7c3a9fd	feat(headscale): add ACL policy, isolate mx, make mx an exit node Introduces a headscale ACL policy (file-mode) plus matching client config: - New systems/x86_64-linux/attic/headscale-policy.hujson: * tag:llm restricts a node to talking only to halo:8000 * all other harald@ nodes have full mesh access to each other * harald@ nodes can route internet traffic via approved exit nodes * autoApprovers.exitNode = [tag:llm] auto-approves the exit route advertised by any tag:llm node (currently mx) - attic headscale.nix: wire policy.mode = "file" / policy.path to the .hujson above. - mx default.nix: enable useRoutingFeatures = "server" (needed for IP forwarding) and add extraSetFlags = ["--advertise-exit-node"] so the flag is reapplied on every activation, not just initial login. Operational steps after deploy: headscale nodes tag -i 10 -t tag:llm	2026-05-13 09:06:40 +02:00
Harald Hoyer	87bdaf15da	fix(attic): keep headscale domain as headscale.hoyer.xyz Avoid breaking existing clients and the registered OIDC redirect URI by keeping the original domain. Only the host backing it changes (mx -> attic); DNS just needs to be repointed.	2026-05-13 08:48:37 +02:00
Harald Hoyer	12c25bcde8	refactor(attic): move headscale from mx to attic Headscale is moving off the mx mailserver onto the attic cache host. The new public URL is https://headscale.hoyer.world. - Switch from useACMEHost = "hoyer.xyz" (mx wildcard DNS-01) to enableACME = true, since attic only has HTTP-01 configured. - Move headscale port to 8081 to avoid clashing with atticd on 8080. - Drop the 192.168.178.254 LAN nameserver from dns.nameservers.global, which isn't reachable from the Hetzner instance. Operational steps still required on attic: - Provision /var/lib/headscale/client_secret - Migrate the headscale state DB from mx - Point headscale.hoyer.world DNS at attic - Update the Nextcloud OIDC client's redirect URI	2026-05-13 08:42:46 +02:00
Harald Hoyer	e440bf39fd	feat(halo): llama-server-27B-MTP.nix	2026-05-12 16:16:15 +02:00
Harald Hoyer	ca4ee90828	feat(halo): coder next	2026-05-11 12:22:34 +02:00
Harald Hoyer	7b04b55ce8	feat(halo): cache-ram 0	2026-05-10 20:50:08 +02:00
Harald Hoyer	04342222a2	fix(halo): 27b	2026-05-10 20:46:12 +02:00
Harald Hoyer	689cdec28d	feat(halo): activate qwen 27b	2026-05-10 20:44:38 +02:00
Harald Hoyer	bef528e26a	feat(halo): use qwen-35b-a3b	2026-05-10 20:44:38 +02:00
Harald Hoyer	d47bb6e15b	feat(halo): add different llama servers	2026-05-07 14:54:48 +02:00
Harald Hoyer	b548126fb8	fix(halo): fix systemd description for llama	2026-05-07 14:40:18 +02:00
Harald Hoyer	02b3c73376	fix(halo): fix systemd description for llama	2026-05-06 14:03:28 +02:00
Harald Hoyer	7ebd97629d	feat(halo): use am17an/Qwen3.6-27B-MTP-GGUF:Q8_0 with MTP spec	2026-05-06 14:01:31 +02:00
Harald Hoyer	a95417da8b	feat(halo): use unsloth/Qwen3.6-27B-GGUF:UD-Q8_K_XL	2026-05-06 13:02:20 +02:00
Harald Hoyer	3a1cb7487a	refactor(opencode): extract serve service into shared NixOS module New `metacfg.services.opencode` module under modules/nixos/services/opencode/ with options for port, user, homeDir, sopsFile, and extraPackages. User and homeDir default off `metacfg.user`. Host configs for amd and sgx reduce to enabling the module and pointing at their respective sops file. Service PATH gains jq, yq-go, python3, gh, gnutar, gzip, unzip, wget, diffutils, patch, file, tree, bun, uv, ast-grep, claude-code, and tmux for agent ergonomics. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 13:43:27 +02:00
Harald Hoyer	da88a9b2d6	fix(halo): drop speculative HSA_OVERRIDE_GFX_VERSION from llama-server Was set defensively without knowing the actual GPU arch; if ROCm supports the card natively, the override is at best a no-op and at worst masks the real arch. Add it back with the right value if the service actually fails to detect the GPU. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 11:42:17 +02:00
Harald Hoyer	b11e5c8356	feat(halo): add llama-server systemd unit for Qwen3.6-35B-A3B Runs llama.cpp's ROCm build under DynamicUser, with the HF model cache in StateDirectory (survives systemctl clean) and KV slot saves in CacheDirectory. Listens on :8000. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 11:42:17 +02:00
Harald Hoyer	624a72737c	fix(opencode): narrow LD_LIBRARY_PATH to libstdc++ only The full nix-ld library list shadowed nix's own curl, breaking libnixstore.so with "CURL_OPENSSL_4 not found". The prebuilt node watcher binding only needs libstdc++/libgcc_s, so use stdenv.cc.cc.lib and let nix-built tools resolve their own deps via RUNPATH.	2026-05-04 08:58:37 +02:00
Harald Hoyer	0d5fb73022	fix(amd): opencode	2026-05-03 16:31:02 +02:00
Harald Hoyer	5693009488	fix(opencode): set LD_LIBRARY_PATH for prebuilt node bindings The file watcher binding (and other node-precompiled .node modules loaded via dlopen) failed with "libstdc++.so.6: cannot open shared object file" because systemd services don't inherit the user shell's LD path. Reuse the nix-ld library list so the service sees the same common libraries unwrapped binaries get globally.	2026-05-03 16:29:24 +02:00
Harald Hoyer	441df05d86	fix(opencode): add git and dev tools to service PATH The opencode-serve unit ran with systemd's minimal default PATH, so shell commands invoked by the agent (git, make, nix, node, rg, etc.) were not found. Set systemd.services.opencode-serve.path on both sgx and amd to a common dev toolset.	2026-05-03 16:09:31 +02:00
Harald Hoyer	0e723e2da8	feat(amd): add opencode web server at opencode.amd.hoyer.world Mirror of the sgx opencode setup: systemd service on port 4196 fronted by nginx with a per-host ACME cert (DNS-01 via internetbs). Adds amd key + path rule to .sops.yaml so secrets under .secrets/amd/ encrypt for the host.	2026-05-03 15:55:15 +02:00

437 commits