The dep-caching layer creates stub files (src/main.rs and
benches/agent_benchmarks.rs) to warm the cargo registry cache, then
removes them with `rm -rf src benches`. The subsequent real source copy
only restored `src/` — leaving `benches/` absent. Cargo's manifest
parser then failed to locate `benches/agent_benchmarks.rs` referenced
in Cargo.toml, aborting the release build with:
error: failed to parse manifest at `/app/Cargo.toml`
Caused by: can't find `agent_benchmarks` bench at
`benches/agent_benchmarks.rs`
Fix: add `COPY benches/ benches/` alongside the `COPY src/ src/` step
so the real bench source is present for the incremental release build.
- remove trailing whitespace in .env.example Z.AI block
- align documented model defaults/options with current onboard/provider behavior
- keep this PR docs-focused by reverting incidental workflow edits
Add comprehensive documentation for custom API endpoint configuration
to address missing documentation reported in issue #567.
Changes:
- Create docs/custom-providers.md with detailed guide for custom: and anthropic-custom: formats
- Add custom endpoint examples to README.md configuration section
- Add note about daemon requirement for channels in Quick Start
- Add reference link to custom providers guide
Addresses: #567
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Problem:
Every async fn in SqliteMemory acquired self.conn.lock() and ran
synchronous rusqlite queries directly on the Tokio runtime thread.
This blocks the async executor, preventing other tasks from making
progress — especially harmful under concurrent recall/store load.
Fix:
- Change conn from Mutex<Connection> to Arc<Mutex<Connection>> so
the connection handle can be cloned into spawn_blocking closures.
- Wrap all synchronous database operations (store, recall, get, list,
forget, count, health_check) in tokio::task::spawn_blocking.
- Split get_or_compute_embedding into three phases: cache check
(blocking), embedding computation (async I/O), cache store
(blocking) — ensuring no lock is held across await points.
- Apply the same pattern to the reindex method.
The async I/O (embedding computation) remains on the Tokio runtime
while all SQLite access runs on the blocking thread pool, preventing
executor starvation.
Ref: zeroclaw-labs/zeroclaw#710 (Item 4)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Problem:
The iMessage listener opened a new SQLite connection to the Messages
database on every ~3-second poll cycle via get_max_rowid() and
fetch_new_messages(), creating ~40 connection open/close cycles per
minute. Each cycle incurs filesystem syscalls, WAL header reads,
and potential page cache cold starts.
Fix:
Open a single read-only connection before the poll loop and reuse it
across iterations using the 'shuttle' pattern: the connection is moved
into each spawn_blocking closure and returned alongside the results,
then reassigned for the next iteration. This eliminates per-poll
connection overhead while preserving the spawn_blocking pattern that
keeps SQLite I/O off the Tokio runtime thread.
The standalone get_max_rowid() and fetch_new_messages() helper
functions are retained for use by tests and other callers.
Ref: zeroclaw-labs/zeroclaw#710 (Item 9)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Classify incoming user messages by keyword/pattern and route to the
appropriate model hint automatically, feeding into the existing
RouterProvider. Disabled by default; opt-in via [query_classification]
config section.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All five providers have HTTP clients but did not implement warmup(),
relying on the trait default no-op. This adds lightweight warmup calls
to establish TLS + HTTP/2 connection pools on startup, reducing
first-request latency. Each warmup is skipped when credentials are
absent, matching the OpenRouter pattern.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ZEROCLAW_WORKSPACE can now be either:
- Legacy path: /path/to/workspace (config at /path/to/.zeroclaw/config.toml)
- Parent path: /path/to (config at /path/to/config.toml, workspace at /path/to/workspace)
This maintains backward compatibility with Docker's legacy folder structure
while also supporting the new parent-dir layout.
Implements Anthropic's prompt caching API to enable significant cost
reduction (up to 90%) and latency improvements (up to 85%) for
requests with repeated content.
Key features:
- Auto-caching heuristics: large system prompts (>3KB), tool
definitions, and long conversations (>4 messages)
- Full backward compatibility: cache_control fields are optional
- Supports both string and block-array system prompt formats
- Cache control on all content types (text, tool_use, tool_result)
Implementation details:
- Added CacheControl, SystemPrompt, and SystemBlock structures
- Updated NativeContentOut and NativeToolSpec with cache_control
- Strategic cache breakpoint placement (last tool, last message)
- Comprehensive test coverage for serialization and heuristics
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
(cherry picked from commit fff04f4edb5e4cb7e581b1b16035da8cc2e55cef)
Add three onboarding improvements for the Astrai provider:
- Signup URL: users now see "Get your API key at: https://as-trai.com"
during onboarding instead of a blank prompt
- Curated model list: auto (best execution), GPT-4o, Claude Sonnet 4.5,
DeepSeek V3, Llama 3.3 70B
- Live model fetch: Astrai's OpenAI-compatible /v1/models endpoint is
now queried when an API key is present, matching other providers
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
## Problem
The test suite contained several categories of latent brittleness
identified in docs/testing-brittle-tests.md that would surface during
refactoring or cross-platform (Windows) CI execution:
1. Hardcoded Unix paths: \Path::new("/tmp")\ and \PathBuf::from("/tmp")\
used as workspace directories in agent tests, which fail on Windows
where /tmp does not exist.
2. Exact string match assertions: ~20 \ssert_eq!(response, "exact text")\
assertions in agent unit and e2e tests that break on any mock wording
change, even when the underlying orchestration behavior is correct.
3. Fragile error message string matching: \.contains("specific message")\
assertions coupled to internal error wording rather than testing the
error category or behavioral outcome.
## What Changed
### Hardcoded paths → platform-agnostic temp dirs (4 files, 7 locations)
- \src/agent/tests.rs\: Replaced all 4 instances of \Path::new("/tmp")\
and \PathBuf::from("/tmp")\ with \std::env::temp_dir()\ in
\make_memory()\, \uild_agent_with()\, \uild_agent_with_memory()\,
and \uild_agent_with_config()\ helpers.
- \ ests/agent_e2e.rs\: Replaced all 3 instances in \make_memory()\,
\uild_agent()\, and \uild_agent_xml()\ helpers.
### Exact string assertions → behavioral checks (2 files, ~20 locations)
- \src/agent/tests.rs\: Converted 10 \ssert_eq!(response, "...")\ to
\ssert!(!response.is_empty(), "descriptive message")\ across tests for
text pass-through, tool execution, tool failure recovery, XML dispatch,
mixed text+tool responses, multi-tool batch, and run_single delegation.
- \ ests/agent_e2e.rs\: Converted 9 exact-match assertions to behavioral
checks. Multi-turn test now uses \ssert_ne!(r1, r2)\ to verify
sequential responses are distinct without coupling to exact wording.
- Provider error propagation test simplified to \ssert!(result.is_err())\
without asserting on the error message string.
### Fragile error message assertions → structural checks (2 files)
- \src/tools/git_operations.rs\: Replaced fragile OR-branch string match
(\contains("git repository") || contains("Git command failed")\) with
structural assertions: checks \!result.success\, error is non-empty,
and error does NOT mention autonomy/read-only (verifying the failure
is git-related, not permission-related).
- \src/cron/scheduler.rs\: Replaced \contains("agent job failed:")\ with
\!success\ and \!output.is_empty()\ checks that verify failure behavior
without coupling to exact log format.
## What Was NOT Changed (and why)
- \src/agent/loop_.rs\ parser tests: Exact string assertions are the
contract for XML tool call parsing — the exact output IS the spec.
- \src/providers/reliable.rs\: Error message assertions test the error
format contract (provider/model attribution in failure messages).
- \src/service/mod.rs\: Already platform-gated with \#[cfg]\; XML escape
test is a formatting contract where exact match is appropriate.
- \src/config/schema.rs\: TOML test strings use /tmp as data values for
deserialization tests, not filesystem access; HOME tests already use
\std::env::temp_dir()\.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Three fixes for conversation quality issues:
1. loop_.rs and channels now read max_tool_iterations from AgentConfig
instead of using a hardcoded constant of 10, making it configurable.
2. Memory recall now filters entries below a configurable
min_relevance_score threshold (default 0.4), preventing unrelated
memories from bleeding into conversation context.
3. Default hybrid search weights rebalanced from 70/30 vector/keyword
to 40/60, reducing cross-topic semantic bleed.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Problem:
In record_run(), an INSERT into cron_runs followed by a pruning DELETE
ran as separate implicit transactions. If the INSERT succeeded but the
DELETE failed (e.g., due to disk pressure or lock contention), the run
table would grow unboundedly since the pruning step was lost while the
new row persisted.
Fix:
Wrap both statements in an explicit transaction using
conn.unchecked_transaction(). If either statement fails, the entire
transaction is rolled back, maintaining the invariant that the run
history stays bounded by max_run_history.
Ref: zeroclaw-labs/zeroclaw#710 (Item 5)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Problem:
The pruning query in record_run uses WHERE job_id = ?1 with
ORDER BY started_at DESC, but only single-column indexes exist
for job_id and started_at separately. SQLite must scan one index
and then sort or scan the other, which is suboptimal for the
combined filter + sort pattern used during pruning.
Fix:
Add a composite index CREATE INDEX IF NOT EXISTS
idx_cron_runs_job_started ON cron_runs(job_id, started_at).
This lets SQLite satisfy the WHERE job_id = ?1 ORDER BY
started_at DESC subquery in a single index scan without a
separate sort step. The existing single-column indexes are
retained for other queries that filter on only one column.
Ref: zeroclaw-labs/zeroclaw#710 (Item 7)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add `scripts/install.sh` — a single `curl | bash` installer that handles system deps, Rust, clone, build, and install
automatically.
- Update README Linux/macOS section with a "One-Line Installer (Recommended)" block above the existing manual steps.
1. Detects OS (Linux apt/dnf, macOS Xcode CLT)
2. Installs build deps + git via system package manager (sudo only here)
3. Installs Rust via rustup (skipped if already present)
4. Shallow-clones the repo to `/tmp/zeroclaw-install`
5. `cargo build --release --locked` + `cargo install --path . --force --locked`
6. Cleans up temp dir and prints next steps (`source ~/.cargo/env`, `zeroclaw onboard`)
The NativeToolDispatcher silently defaults to an empty object when tool
call arguments from the LLM fail to parse as JSON. The XML dispatcher
already logs a warning for the same case (line 68). Add a matching
tracing::warn with tool name and parse error for observability parity.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Problem: add_column_if_missing() checks PRAGMA table_info for column existence, then issues ALTER TABLE ADD COLUMN if not found. When two concurrent processes both pass the check before either executes the ALTER, the second process fails with a 'duplicate column name' error.
Fix: Catch the 'duplicate column name' SQLite error after the ALTER TABLE and treat it as a benign no-op. Also explicitly drop statement/rows handles before ALTER to release locks.
Ref: #710 (Item 8)
run_tool_call_loop used a hardcoded MAX_TOOL_ITERATIONS (10) and
trim_history/auto_compact_history used a hardcoded MAX_HISTORY_MESSAGES (50),
ignoring the user-configurable agent.max_tool_iterations and
agent.max_history_messages values in config.toml.
Meanwhile, agent.rs correctly reads from config — creating an inconsistency
where CLI single-shot mode respected config but the channel runtime and
interactive CLI loop silently ignored it.
Changes:
- Rename constants to DEFAULT_* to clarify they are fallback defaults
- Add max_tool_iterations parameter to run_tool_call_loop
- Add max_history parameter to trim_history and auto_compact_history
- Thread config.agent.max_tool_iterations through ChannelRuntimeContext
- Both CLI code paths now pass config values to run_tool_call_loop
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The Discord channel was setting msg.channel to the numeric Discord
channel ID instead of the literal string 'discord'. This caused
process_channel_message() to fail the channels_by_name lookup since
the map is keyed by channel name (e.g. 'discord', 'telegram', 'slack').
The result: the bot receives messages and generates LLM responses but
never sends them back -- target_channel resolves to None so the send
call is silently skipped.
Every other channel (telegram, slack, whatsapp, matrix, signal, irc,
imessage, lark, dingtalk, qq, email, mattermost) correctly sets this
field to its channel name string. Discord was the only one using the
platform-specific ID.
The GLM provider previously relied on the trait default for
chat_with_history, which only forwarded the last user message. This adds
a proper multi-turn implementation that sends the full conversation
history to the GLM API, matching the pattern used by OpenRouter, Ollama,
and other providers.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add memory.sqlite_open_timeout_secs config (None = wait indefinitely).
- When set, open the DB in a thread with recv_timeout; cap at 300s.
- Default remains None for backward compatibility.
- Document in README; add tests for timeout path and default.
Add start_typing/stop_typing overrides to TelegramChannel following the
same pattern as DiscordChannel: spawn a tokio task that sends
sendChatAction every 4 seconds (Telegram typing expires after 5s),
and abort it on stop_typing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The MemoryCategory::Custom variant already exists in the memory backend
but the memory_store tool only accepted core/daily/conversation. Now any
string is accepted as a category, passing through to Custom(name) for
non-builtin values.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>