ZeroClaw Operations Runbook

This runbook is for operators responsible for availability, security posture, and incident response.

Last verified: February 18, 2026.

Scope

Use this document for day-2 operations:

  • starting and supervising the runtime
  • health checks and diagnostics
  • safe rollout and rollback
  • incident triage and recovery

For first-time installation, start from one-click-bootstrap.md.

Runtime Modes

| Mode | Command | When to use |
|---|---|---|
| Foreground runtime | zeroclaw daemon | local debugging, short-lived sessions |
| Foreground gateway only | zeroclaw gateway | webhook endpoint testing |
| User service | zeroclaw service install && zeroclaw service start | persistent operator-managed runtime |

Baseline Operator Checklist

  1. Validate configuration:
     zeroclaw status
  2. Verify diagnostics:
     zeroclaw doctor
     zeroclaw channel doctor
  3. Start the runtime in the foreground:
     zeroclaw daemon
  4. For a persistent user-session service, install and start it instead, then check its status:
     zeroclaw service install
     zeroclaw service start
     zeroclaw service status
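
The validation and diagnostic steps above can be chained so that a failure stops the sequence before the runtime is started. The following is a minimal sketch, not part of ZeroClaw itself: the function names `run_step` and `preflight` are hypothetical, and the sketch assumes each `zeroclaw` subcommand exits non-zero when it finds a problem, which this runbook does not confirm.

```shell
# Sketch: run the baseline checks in order and stop at the first failure.
# Assumption: zeroclaw subcommands exit non-zero when they detect a problem.

# Run one zeroclaw subcommand and report which step failed.
run_step() {
  echo "==> zeroclaw $*"
  if ! zeroclaw "$@"; then
    echo "preflight failed at: zeroclaw $*" >&2
    return 1
  fi
}

# Checklist steps 1 and 2; the && chain short-circuits on the first failure.
preflight() {
  run_step status &&
    run_step doctor &&
    run_step channel doctor &&
    echo "preflight passed"
}
```

A typical use would be `preflight && zeroclaw daemon`, so the daemon only starts after all three checks return successfully.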

Health and State Signals

| Signal | Command / File | Expected |
|---|---|---|
| Config validity | zeroclaw doctor | no critical errors |
| Channel connectivity | zeroclaw channel doctor | configured channels healthy |
| Runtime summary | zeroclaw status | expected provider/model/channels |
| Daemon heartbeat/state | ~/.zeroclaw/daemon_state.json | file updates periodically |
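
The heartbeat signal can be checked mechanically by looking at the state file's modification time. The sketch below is illustrative only: `state_is_fresh` is a hypothetical helper, and the 5-minute default is a placeholder, since this runbook does not state how often the daemon rewrites the file. It relies on `find -mmin`, which is available in GNU, BSD, and BusyBox `find` but is not part of POSIX.

```shell
# Sketch: report whether the daemon state file was modified recently.
# Assumption: the 5-minute default threshold is a placeholder; tune it to
# the update interval you actually observe.

state_is_fresh() {
  local state_file="${1:-$HOME/.zeroclaw/daemon_state.json}"
  local max_age_min="${2:-5}"

  if [ ! -f "$state_file" ]; then
    echo "missing: $state_file" >&2
    return 2
  fi

  # find prints the path only if it was modified less than max_age_min minutes ago.
  if [ -n "$(find "$state_file" -mmin "-$max_age_min")" ]; then
    echo "fresh: $state_file"
  else
    echo "stale: $state_file (no update in the last $max_age_min min)" >&2
    return 1
  fi
}
```

Return codes are 0 for fresh, 1 for stale, and 2 for a missing file, so a monitoring wrapper can tell "daemon never started" apart from "daemon stopped updating".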

Logs and Diagnostics

macOS / Windows (service wrapper logs)

  • ~/.zeroclaw/logs/daemon.stdout.log
  • ~/.zeroclaw/logs/daemon.stderr.log

Linux (systemd user service)

journalctl --user -u zeroclaw.service -f
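
For a one-off look at recent output rather than a live follow, a small wrapper can pick the right source per platform. This is a sketch under stated assumptions: `recent_logs` is a hypothetical helper, it treats every non-Linux system as using the wrapper log files listed above, and on Windows it only applies inside a POSIX-style shell. The `journalctl` options used (`--user`, `-u`, `-n`, `--no-pager`) are standard.

```shell
# Sketch: print the most recent log lines from the platform's log source.
# Assumption: anything other than Linux uses the service wrapper log files.

recent_logs() {
  local os="${1:-$(uname -s)}"   # override for testing, e.g. "Darwin"
  local lines="${2:-50}"
  local log_dir="$HOME/.zeroclaw/logs"

  case "$os" in
    Linux)
      # systemd user service: read the journal without paging.
      journalctl --user -u zeroclaw.service -n "$lines" --no-pager
      ;;
    *)
      # Service wrapper logs: tail prints a header before each file.
      tail -n "$lines" "$log_dir/daemon.stdout.log" "$log_dir/daemon.stderr.log"
      ;;
  esac
}
```

Calling `recent_logs` with no arguments detects the platform with `uname -s` and prints the last 50 lines.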

Incident Triage Flow (Fast Path)

  1. Snapshot system state:
     zeroclaw status
     zeroclaw doctor
     zeroclaw channel doctor
  2. Check service state:
     zeroclaw service status
  3. If the service is unhealthy, restart it cleanly:
     zeroclaw service stop
     zeroclaw service start
  4. If channels still fail, verify allowlists and credentials in ~/.zeroclaw/config.toml.
  5. If the gateway is involved, verify bind/auth settings ([gateway]) and local reachability.
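
During an incident it helps to capture the output of the first two steps in a file before restarting anything, so the pre-restart state is available for the write-up. The following is a minimal sketch: `snapshot_state` and the default `~/.zeroclaw/incidents` directory are hypothetical and not something ZeroClaw creates or reads.

```shell
# Sketch: save the output of the triage commands to a timestamped file.
# Assumption: the default output directory is a made-up location.

snapshot_state() {
  local out_dir="${1:-$HOME/.zeroclaw/incidents}"
  local stamp out_file
  stamp="$(date -u +%Y%m%dT%H%M%SZ)"
  out_file="$out_dir/snapshot-$stamp.txt"

  mkdir -p "$out_dir" || return 1

  # "|| true" keeps the capture going when a check fails; the failure
  # output is exactly what the snapshot is meant to preserve.
  {
    echo "## zeroclaw status";         zeroclaw status 2>&1 || true
    echo "## zeroclaw doctor";         zeroclaw doctor 2>&1 || true
    echo "## zeroclaw channel doctor"; zeroclaw channel doctor 2>&1 || true
    echo "## zeroclaw service status"; zeroclaw service status 2>&1 || true
  } >"$out_file"

  # Print the path so callers can attach it to the incident record.
  echo "$out_file"
}
```

Review the snapshot before sharing it outside the operations team, since diagnostic output may include details of the local configuration.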

Safe Change Procedure

When applying config changes:

  1. back up ~/.zeroclaw/config.toml
  2. apply one logical change at a time
  3. run zeroclaw doctor
  4. restart the daemon or service
  5. verify with zeroclaw status and zeroclaw channel doctor
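
The backup step can be sketched as a timestamped copy taken before each change. The helper name `backup_config` and the `.bak.<UTC timestamp>` suffix are assumptions made for illustration; ZeroClaw does not prescribe a backup naming scheme in this runbook.

```shell
# Sketch: copy the config to a timestamped backup next to the original.
# Assumption: the ".bak.<UTC timestamp>" suffix is an illustrative convention.

backup_config() {
  local config="${1:-$HOME/.zeroclaw/config.toml}"
  local backup
  backup="$config.bak.$(date -u +%Y%m%dT%H%M%SZ)"

  if [ ! -f "$config" ]; then
    echo "no config at $config" >&2
    return 1
  fi

  # -p preserves mode and timestamps, which matters because the file
  # holds credentials and should keep its restrictive permissions.
  cp -p "$config" "$backup" && echo "$backup"
}
```

Because the timestamp sorts lexicographically, the newest backup is always the last one in a directory listing, which keeps a later restore simple.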

Rollback Procedure

If a rollout regresses behavior:

  1. restore previous config.toml
  2. restart runtime (daemon or service)
  3. confirm recovery with zeroclaw doctor and zeroclaw channel doctor
  4. document incident root cause and mitigation
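
The restore step can be sketched as follows, assuming backups sit next to the config and are named `config.toml.bak.<UTC timestamp>`. Both that naming scheme and the helper `restore_latest_backup` are hypothetical. The sketch also keeps a copy of the regressed config as `config.toml.regressed`, an added convenience for the root-cause write-up rather than something this runbook requires.

```shell
# Sketch: restore the newest timestamped backup over the current config.
# Assumption: backups are named "<config>.bak.<UTC timestamp>".

restore_latest_backup() {
  local config="${1:-$HOME/.zeroclaw/config.toml}"
  local candidate latest=""

  # Globs expand in sorted order and the timestamp suffix sorts
  # chronologically, so the last existing match is the newest backup.
  for candidate in "$config".bak.*; do
    if [ -f "$candidate" ]; then
      latest="$candidate"
    fi
  done

  if [ -z "$latest" ]; then
    echo "no backups found for $config" >&2
    return 1
  fi

  # Keep the regressed config for the post-incident analysis.
  if [ -f "$config" ]; then
    cp -p "$config" "$config.regressed" || return 1
  fi

  cp -p "$latest" "$config" && echo "restored $latest"
}
```

After the restore, restart the runtime and rerun the diagnostics as described in steps 2 and 3.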