Commit graph

415 commits

Author SHA1 Message Date
1e3b2fc9a7 feat(halo): unsloth MTP 2026-05-13 19:42:54 +02:00
42c52bd87f refactor(mx): drive opencode bot via direct chat-completions API
The bot no longer shells out to `opencode run`. Instead it POSTs to the
OpenAI-compatible /chat/completions endpoint exposed by llama-server on
halo.hoyer.tail:8000 directly. This removes the Bun/sqlite cold-start
overhead per request, drops the pkgs.opencode runtime dependency, and
eliminates the ExecStartPre dance that materialized config.json into the
service's $HOME.

Conversation history is now stored as a proper OpenAI `messages` list
with system/user/assistant roles, instead of the XML blob that was
inlined into a single `opencode run` argument. The interactive opencode
setup (config/opencode/config.json) is unchanged — only the bot stops
depending on it.

The module gains a `modelBaseUrl` option; `model` is now the bare model
name (`halo-8000`) without the provider/ prefix that the opencode CLI
required.
2026-05-13 16:38:58 +02:00
d8e8293c0e feat(mx): add Nextcloud Talk opencode bot pointing at halo.hoyer.tail:8000
Mirrors the existing nextcloud-claude-bot setup but invokes `opencode run`
against the local `halo-8000` provider/model. The bot listens on
127.0.0.1:8086, is exposed via the `/_opencode-bot/` location on
nc.hoyer.xyz, and uses `@Halo` as its mention trigger in group chats.

The opencode config (config/opencode/config.json) is installed into the
service's $HOME/.config/opencode/ on each start, so the bot picks up the
same provider definition the user uses interactively. The model map keys
are renamed to `halo-8000` / `halo-8001` so the canonical
`provider/model` reference works without an alias indirection.
2026-05-13 15:08:18 +02:00
dadfb07914 fix(halo): set --alias halo-8000 2026-05-13 14:52:49 +02:00
f9a2e0d301 chore(x1,amd): disable cratedocs-mcp service
Keep it enabled only on sgx.
2026-05-13 11:35:59 +02:00
4ce7bcf354 fix(mx): make tailscale exit-node advertisement actually apply
tailscale set is strict about boolean flags and silently ignores
--advertise-exit-node without =true. Result: the tailscaled-set unit
ran cleanly but AdvertiseRoutes stayed null. Spell the value out so the
flag takes effect.
2026-05-13 09:28:20 +02:00
67b7c3a9fd feat(headscale): add ACL policy, isolate mx, make mx an exit node
Introduces a headscale ACL policy (file-mode) plus matching client config:

- New systems/x86_64-linux/attic/headscale-policy.hujson:
  * tag:llm restricts a node to talking only to halo:8000
  * all other harald@ nodes have full mesh access to each other
  * harald@ nodes can route internet traffic via approved exit nodes
  * autoApprovers.exitNode = [tag:llm] auto-approves the exit route
    advertised by any tag:llm node (currently mx)

- attic headscale.nix: wire policy.mode = "file" / policy.path to
  the .hujson above.

- mx default.nix: enable useRoutingFeatures = "server" (needed for IP
  forwarding) and add extraSetFlags = ["--advertise-exit-node"] so the
  flag is reapplied on every activation, not just initial login.

Operational steps after deploy:
  headscale nodes tag -i 10 -t tag:llm
2026-05-13 09:06:40 +02:00
87bdaf15da fix(attic): keep headscale domain as headscale.hoyer.xyz
Avoid breaking existing clients and the registered OIDC redirect URI by
keeping the original domain. Only the host backing it changes (mx -> attic);
DNS just needs to be repointed.
2026-05-13 08:48:37 +02:00
12c25bcde8 refactor(attic): move headscale from mx to attic
Headscale is moving off the mx mailserver onto the attic cache host.
The new public URL is https://headscale.hoyer.world.

- Switch from useACMEHost = "hoyer.xyz" (mx wildcard DNS-01) to
  enableACME = true, since attic only has HTTP-01 configured.
- Move headscale port to 8081 to avoid clashing with atticd on 8080.
- Drop the 192.168.178.254 LAN nameserver from dns.nameservers.global,
  which isn't reachable from the Hetzner instance.

Operational steps still required on attic:
- Provision /var/lib/headscale/client_secret
- Migrate the headscale state DB from mx
- Point headscale.hoyer.world DNS at attic
- Update the Nextcloud OIDC client's redirect URI
2026-05-13 08:42:46 +02:00
e440bf39fd feat(halo): llama-server-27B-MTP.nix 2026-05-12 16:16:15 +02:00
ca4ee90828 feat(halo): coder next 2026-05-11 12:22:34 +02:00
7b04b55ce8 feat(halo): cache-ram 0 2026-05-10 20:50:08 +02:00
04342222a2 fix(halo): 27b 2026-05-10 20:46:12 +02:00
689cdec28d feat(halo): activate qwen 27b 2026-05-10 20:44:38 +02:00
bef528e26a feat(halo): use qwen-35b-a3b 2026-05-10 20:44:38 +02:00
d47bb6e15b feat(halo): add different llama servers 2026-05-07 14:54:48 +02:00
b548126fb8 fix(halo): fix systemd description for llama 2026-05-07 14:40:18 +02:00
02b3c73376 fix(halo): fix systemd description for llama 2026-05-06 14:03:28 +02:00
7ebd97629d feat(halo): use am17an/Qwen3.6-27B-MTP-GGUF:Q8_0 with MTP spec 2026-05-06 14:01:31 +02:00
a95417da8b feat(halo): use unsloth/Qwen3.6-27B-GGUF:UD-Q8_K_XL 2026-05-06 13:02:20 +02:00
3a1cb7487a refactor(opencode): extract serve service into shared NixOS module
New `metacfg.services.opencode` module under modules/nixos/services/opencode/
with options for port, user, homeDir, sopsFile, and extraPackages. User and
homeDir default off `metacfg.user`. Host configs for amd and sgx reduce to
enabling the module and pointing at their respective sops file.

Service PATH gains jq, yq-go, python3, gh, gnutar, gzip, unzip, wget,
diffutils, patch, file, tree, bun, uv, ast-grep, claude-code, and tmux for
agent ergonomics.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 13:43:27 +02:00
da88a9b2d6 fix(halo): drop speculative HSA_OVERRIDE_GFX_VERSION from llama-server
Was set defensively without knowing the actual GPU arch; if ROCm
supports the card natively, the override is at best a no-op and at
worst masks the real arch. Add it back with the right value if the
service actually fails to detect the GPU.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 11:42:17 +02:00
b11e5c8356 feat(halo): add llama-server systemd unit for Qwen3.6-35B-A3B
Runs llama.cpp's ROCm build under DynamicUser, with the HF model cache
in StateDirectory (survives systemctl clean) and KV slot saves in
CacheDirectory. Listens on :8000.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 11:42:17 +02:00
624a72737c fix(opencode): narrow LD_LIBRARY_PATH to libstdc++ only
The full nix-ld library list shadowed nix's own curl, breaking
libnixstore.so with "CURL_OPENSSL_4 not found". The prebuilt node
watcher binding only needs libstdc++/libgcc_s, so use stdenv.cc.cc.lib
and let nix-built tools resolve their own deps via RUNPATH.
2026-05-04 08:58:37 +02:00
0d5fb73022 fix(amd): opencode 2026-05-03 16:31:02 +02:00
5693009488 fix(opencode): set LD_LIBRARY_PATH for prebuilt node bindings
The file watcher binding (and other node-precompiled .node modules
loaded via dlopen) failed with "libstdc++.so.6: cannot open shared
object file" because systemd services don't inherit the user shell's
LD path. Reuse the nix-ld library list so the service sees the same
common libraries unwrapped binaries get globally.
2026-05-03 16:29:24 +02:00
441df05d86 fix(opencode): add git and dev tools to service PATH
The opencode-serve unit ran with systemd's minimal default PATH, so
shell commands invoked by the agent (git, make, nix, node, rg, etc.)
were not found. Set systemd.services.opencode-serve.path on both sgx
and amd to a common dev toolset.
2026-05-03 16:09:31 +02:00
0e723e2da8 feat(amd): add opencode web server at opencode.amd.hoyer.world
Mirror of the sgx opencode setup: systemd service on port 4196 fronted
by nginx with a per-host ACME cert (DNS-01 via internetbs). Adds amd
key + path rule to .sops.yaml so secrets under .secrets/amd/ encrypt
for the host.
2026-05-03 15:55:15 +02:00
01f42c0851 feat(sops): trigger service restarts on secret rotation
Wire up restartUnits on secrets whose consumers cache them in memory
(daemons read at startup), so sops-nix restarts the affected unit on
activation when the decrypted content changes:

- firefly: app_key → phpfpm-firefly-iii;
  auto_import_secret + access_token → phpfpm-firefly-iii-data-importer
- searx: secret_key → uwsgi
- opencode: web password → opencode-serve
- mail: sasl_passwd → postfix
- forgejo: gitea_dbpass → forgejo; runner-token → gitea-runner-default

Secrets read on demand by oneshots/timers (firefly sparda_pin, ntfy
token, restic backup creds, acme dns creds, wg conf) are left as-is.
2026-05-03 15:23:40 +02:00
0989b8ae46 feat(sgx): add opencode web server 2026-05-03 14:57:49 +02:00
f74928ce5f chore: nix fmt 2026-05-03 14:57:49 +02:00
c99ea665d4 feat(sgx): add opencode 2026-05-03 13:47:39 +02:00
b2027bd283 sgx/network: open TCP 8000-8999 in firewall
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 13:47:39 +02:00
e96bf83dfd feat(halo): add python313Packages.huggingface-hub 2026-05-03 09:00:13 +02:00
73bf52dbaf sgx/firefly: bump fastcgi_read_timeout + PHP max_execution_time on both vhosts
Bulk imports of 100+ transactions per chunk hit the default 60s
fastcgi timeout on the main Firefly III vhost too — not just the
importer endpoint. The importer's per-transaction API call to Firefly's
/api/v1/transactions can take 20+s on a fresh DB without ANALYZE,
which compounds with the 30s PHP max_execution_time cap.

- nginx fastcgi_read_timeout=600s on both `firefly` and `firefly-import`
  vhosts
- php_admin_value[max_execution_time]=600 + memory_limit=512M on both
  PHP-FPM pools
- VANITY_URL on the importer now points to the main Firefly III URL
  (was wrongly pointing at the importer's own domain, breaking
  clickable transaction-show links in importer log messages)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 16:44:20 +02:00
491a7b38e4 sgx/firefly: switch Firefly III backend from sqlite to postgres
SQLite was slow under btrfs CoW, and the no-CoW migration path turned
out to be fragile (WAL deletion without checkpoint = data loss). Move
to PostgreSQL on Unix-socket peer auth — no password needed for the
local-host setup, NixOS provisions the database+user declaratively.

Drop the now-unused +C tmpfiles rule on the sqlite directory; the
leftover database.sqlite* files at /var/lib/firefly-iii/storage/database/
are harmless and can be removed manually after switch is verified.

Migration of existing Firefly III data is not preserved by this
commit — fresh-start path: re-register admin, re-issue PAT, re-POST
the bulk CSV through the importer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 21:49:08 +02:00
c1503b56aa sgx/firefly: disable btrfs CoW on Firefly III sqlite directory
Random-write SQLite traffic fragments CoW filesystems quickly. The `h`
tmpfiles directive sets +C on the database directory; new SQLite files
(WAL, SHM, recreated main DB) inherit no-CoW automatically. No-op on
non-btrfs filesystems.

Migration of existing files must be done manually with checkpoint-first:
  systemctl stop phpfpm-firefly-iii.service
  sqlite3 .../database.sqlite 'PRAGMA wal_checkpoint(TRUNCATE);'
  # then recreate main file inside the +C dir
  systemctl start phpfpm-firefly-iii.service

Skipping the wal_checkpoint and naively deleting .sqlite-wal will lose
all writes that haven't been checkpointed (PHP-FPM SIGTERM does not
trigger a checkpoint).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 19:31:24 +02:00
e0d2a2f50d feat(sgx): finish firefly-sparda-fetch — headless FinTS import
End-to-end verified: aqbanking-cli fetches Sparda Südwest transactions
via FinTS PIN/TAN + SecureGo+, exports CSV using a custom decimal-amount
profile, POSTs to firefly-iii-data-importer's autoupload endpoint, which
creates transactions in Firefly III via API.

Changes vs. previous WIP commit:
- firefly/access_token sops slot for the importer's Firefly III API auth
  (FIREFLY_III_ACCESS_TOKEN_FILE — was the missing piece causing 401s
  from the API after the autoupload secret authenticated)
- nginx fastcgi_read_timeout=600s on the importer vhost (prevents 504
  while PHP-FPM is still processing the batch)
- PHP-FPM max_execution_time=600s + memory_limit=512M on the importer
  pool (PHP's stock 30s aborts mid-import for batches > ~50 transactions)
- timer re-enabled, wantedBy=[timers.target]

Caveats baked into a code comment:
- Sparda online-banking PIN must be [A-Za-z0-9] only. aqbanking 6.8.2's
  -P pinfile mangles `:`, `+`, `'`, `?`, `@`, `%`, `*`; bank locks the
  access (3 soft / 9 hard strikes) on rejected attempts. Same applies
  whenever the sops secret is rotated.
- Bulk historical imports beyond the PSD2 90-day window need interactive
  SCA approval per ~30-day chunk and cannot run from the timer; the
  daily 35-day rolling window stays inside the no-SCA region.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 18:58:45 +02:00
74af9fd5ac wip(sgx): firefly-sparda-fetch service + timer (DISABLED)
End-to-end FinTS pipeline against Sparda Südwest is wired up but
disabled — aqbanking 6.8.2's `-P pinfile` flag does not consume the
file content correctly on this build (verified: pinfile bytes match
the manually-typed PIN exactly, yet the bank receives a wrong PIN).
Three rejected attempts locked the access at Sparda; do not re-arm
the timer until the auth path is replaced (likely python-fints).

What works:
- aqbanking config and FinTS dialog (manual PIN entry)
- getaccsepa workaround for HKCAZ "Mussfeld 9160" rejection
- custom CSV profile (decimal amounts + IBAN columns) wired in
- Firefly importer auto-upload settings + sops secret slot
- inbox + profile-symlink tmpfiles

What's broken:
- Headless PIN delivery via aqbanking-cli -P
- Timer left wantedBy=[] so it cannot fire post-deploy

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 08:44:19 +02:00
a5472c567a feat(amd): latest kernel ryzen kernel module 2026-04-27 10:40:23 +02:00
06d26311fd feat(sgx): add aqbanking + sparda PIN slot for FinTS bank sync
Lays the groundwork for Sparda-Bank Südwest transaction sync via
direct FinTS (no third-party data proxy). aqbanking-cli in the system
PATH, persistent state at /var/lib/firefly-aqbanking, sops slot for
the online-banking PIN. Initial enrollment must be done interactively
on the host; systemd timer for automated fetches comes in a follow-up.
2026-04-26 16:36:52 +02:00
f4eb0c5939 feat(sgx): add firefly-iii personal finance manager
Self-hosted Firefly III with data-importer, SQLite backend, behind
nginx with the existing internal.hoyer.world ACME cert.
2026-04-26 14:09:40 +02:00
4045aa1859 refactor(mx): extract disk check services into disk-check.nix
Share the check script via a parameterized mkDiskCheck function over
{ name, mountPoint, label } and iterate an attrset to emit the boot
and root services plus their daily timers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 10:28:27 +02:00
6d0186eadb chore: statix fix 2026-04-20 10:09:24 +02:00
a6736c2ac1 fix(sgx): treat rsync exit code 24 as success in backup
Files vanishing during transfer is expected for mail directories
where messages are constantly moved.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 15:39:08 +02:00
c986fa7808 fix(attic): fix nginx proxy cache bypass and add cache lock
Replace broken proxy_cache_bypass (was bypassing every request) with
proxy_cache_lock to coalesce concurrent requests for the same path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 13:38:32 +01:00
13a386fe98 feat(attic): add daily garbage collection timer
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 12:07:44 +01:00
510e3505a8 feat(attic): add nginx proxy cache to reduce S3 egress
Caches GET/HEAD responses up to 10 GB on disk with 30-day eviction.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 12:06:13 +01:00
f2afa78817 fix(attic): correct S3 bucket name to teepot-attic
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 11:56:07 +01:00
f039e4af1b feat(attic): enable HTTPS and configure Nginx with ACME
- Allow TCP ports 80 and 443 in the firewall for HTTP and HTTPS traffic.
- Enable Nginx with ACME integration for automatic SSL certificate management.
- Configure a virtual host with proxy settings and support for WebSocket traffic.
2026-03-25 11:18:02 +01:00