Add two Mattermost channel enhancements:
1. thread_replies config option (default: false)
- When false, replies go to the channel root instead of threading.
- When true, replies thread on the original post.
- Existing thread replies always stay in-thread regardless of setting.
2. Typing indicator (start_typing/stop_typing)
- Implements the Channel trait's typing methods for Mattermost.
- Fires POST /api/v4/users/me/typing every 4s in a background task.
- Supports parent_id for threaded typing indicators.
- Aborts cleanly on stop_typing via JoinHandle.
Updated all MattermostChannel::new call sites (start_channels, scheduler)
and added 9 unit tests covering thread routing and edge cases.
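A minimal sketch of the typing loop, assuming a reqwest client and hypothetical field names (client/base_url/token are assumptions, not the actual struct layout):

```rust
// Sketch only: posts a typing event every 4s until the task is aborted.
fn start_typing(&self, channel_id: String, parent_id: Option<String>) -> tokio::task::JoinHandle<()> {
    let client = self.client.clone();
    let token = self.token.clone();
    let url = format!("{}/api/v4/users/me/typing", self.base_url);
    tokio::spawn(async move {
        // parent_id threads the indicator when replying in-thread.
        let body = serde_json::json!({ "channel_id": channel_id, "parent_id": parent_id });
        loop {
            let _ = client.post(&url).bearer_auth(&token).json(&body).send().await;
            tokio::time::sleep(std::time::Duration::from_secs(4)).await;
        }
    })
    // stop_typing calls .abort() on the returned JoinHandle.
}
```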
Reasoning/thinking models (Qwen3, GLM-4, DeepSeek, etc.) may return
output in `reasoning_content` instead of `content`. Add automatic
fallback for both OpenAI and OpenAI-compatible providers, including
streaming SSE support.
Changes:
- Add `reasoning_content` field to response structs in both providers
- Add `effective_content()` helper that prefers `content` but falls
back to `reasoning_content` when content is empty/null/missing
- Update all extraction sites to use `effective_content()`
- Add streaming SSE fallback for `reasoning_content` chunks
- Add 16 focused unit tests covering all edge cases
Tested end-to-end against GLM-4.7-flash via local LLM server.
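A minimal sketch of the fallback helper described above (the free-function shape is an assumption; the real helper may live on the response struct):

```rust
// Prefer `content`; fall back to `reasoning_content` when content is
// empty, null, or missing.
fn effective_content(content: Option<&str>, reasoning_content: Option<&str>) -> Option<String> {
    match content {
        Some(c) if !c.trim().is_empty() => Some(c.to_string()),
        _ => reasoning_content
            .filter(|r| !r.trim().is_empty())
            .map(str::to_string),
    }
}
```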
Channel messages (Telegram, Discord, etc.) previously had no multi-turn
context — each incoming message was processed with a fresh history
containing only the system prompt and the current user message.
This patch:
- Maintains a per-sender conversation history map (Arc<Mutex<HashMap>>)
- Restores prior turns when processing each new message
- Saves user + assistant turns after successful LLM response
- Caps history at 50 messages per sender to bound memory usage
Fixes the channel context continuity issue where the bot would respond
with 'I have no context' to every follow-up question.
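A sketch of the history map under the stated type shape (the tokio Mutex choice and the ChatMessage stand-in type are assumptions):

```rust
use std::collections::HashMap;
use std::sync::Arc;
use tokio::sync::Mutex;

// `ChatMessage` stands in for the crate's message type.
#[derive(Clone)]
struct ChatMessage {
    role: String,
    content: String,
}

type SenderHistory = Arc<Mutex<HashMap<String, Vec<ChatMessage>>>>;

const MAX_HISTORY: usize = 50; // cap per sender to bound memory usage

async fn save_turns(history: &SenderHistory, sender: &str, user: ChatMessage, assistant: ChatMessage) {
    let mut map = history.lock().await;
    let turns = map.entry(sender.to_string()).or_default();
    turns.push(user);
    turns.push(assistant);
    if turns.len() > MAX_HISTORY {
        // Drop the oldest turns first.
        let excess = turns.len() - MAX_HISTORY;
        turns.drain(..excess);
    }
}
```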
- Add strip_tool_call_tags() to finalize_draft to prevent Markdown
  parse failures from tool-call tags reaching the Telegram API
- Deduplicate parse_reply_target() call in update_draft (was called
twice, discarding thread_id both times)
- Replace body.as_object_mut().unwrap() mutation with separate
plain_body JSON literal (eliminates unwrap in runtime path)
- Clean up per-chat rate-limit HashMap entry in finalize_draft to
prevent unbounded growth over long uptimes
- Extract magic number 80 to STREAM_CHUNK_MIN_CHARS constant in
agent loop
- Fix silent parse failures: message_id.parse().unwrap_or(0) replaced
with match + tracing::warn on parse error (update_draft, finalize_draft)
- Fix UTF-8 panic: byte-based truncation replaced with char_indices()
  safe boundary detection for TELEGRAM_MAX_MESSAGE_LENGTH (see the
  sketch after this list)
- Fix global rate limiter: Mutex<Option<Instant>> replaced with
Mutex<HashMap<String, Instant>> for per-chat rate limiting so
concurrent conversations don't interfere with each other
- Document Block variant: clarify it's reserved for future use and
currently behaves the same as Partial
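A sketch of the boundary-safe truncation noted above (the constant name comes from this change; the function name is hypothetical):

```rust
const TELEGRAM_MAX_MESSAGE_LENGTH: usize = 4096;

fn truncate_for_telegram(text: &str) -> &str {
    if text.len() <= TELEGRAM_MAX_MESSAGE_LENGTH {
        return text;
    }
    // Take the last char start at or before the byte limit; slicing at an
    // arbitrary byte index can split a multi-byte char and panic.
    let end = text
        .char_indices()
        .map(|(i, _)| i)
        .take_while(|&i| i <= TELEGRAM_MAX_MESSAGE_LENGTH)
        .last()
        .unwrap_or(0);
    &text[..end]
}
```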
Previously on_delta sent the entire completed response as a single
message, defeating the purpose of the streaming draft updates. Now
the text is split into ~80-char chunks on whitespace boundaries
(UTF-8 safe via split_inclusive) and sent progressively through the
channel, so Telegram draft edits show text arriving incrementally.
The consumer in process_channel_message already accumulates chunks
and calls update_draft with the full text so far, and Telegram's
rate-limiting (draft_update_interval_ms) throttles editMessageText
calls to avoid API spam.
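A sketch of the chunker (the constant comes from the cleanup noted earlier; the helper name is hypothetical):

```rust
const STREAM_CHUNK_MIN_CHARS: usize = 80;

// split_inclusive keeps trailing whitespace, so concatenating the chunks
// reproduces the original text exactly (UTF-8 safe: no byte slicing).
fn chunk_on_whitespace(text: &str) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current = String::new();
    for piece in text.split_inclusive(char::is_whitespace) {
        current.push_str(piece);
        if current.chars().count() >= STREAM_CHUNK_MIN_CHARS {
            chunks.push(std::mem::take(&mut current));
        }
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```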
Wire the existing provider-layer streaming infrastructure through the
channel trait and agent loop so Telegram users see tokens arrive
progressively via editMessageText, instead of waiting for the full
response.
Changes:
- Add StreamMode enum (off/partial/block) and draft_update_interval_ms
to TelegramConfig (backward-compatible defaults: off, 1000ms)
- Add supports_draft_updates/send_draft/update_draft/finalize_draft to
Channel trait with no-op defaults (zero impact on existing channels)
- Implement draft methods on TelegramChannel using sendMessage +
editMessageText with rate limiting and Markdown fallback
- Add on_delta mpsc::Sender<String> parameter to run_tool_call_loop
(None preserves existing behavior)
- Wire streaming in process_channel_message: when channel supports
drafts, send initial draft, spawn updater task, finalize on completion
Edge cases handled:
- 4096-char limit: finalize draft and fall back to chunked send
- Broken Markdown: use no parse_mode during streaming, apply on finalize
- Edit failures: fall back to sending complete response as new message
- Rate limiting: configurable draft_update_interval_ms (default 1s)
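A sketch of the trait additions; the method and enum names come from the list above, but signatures and return types are assumptions:

```rust
#[derive(Clone, Copy, Default, serde::Deserialize)]
#[serde(rename_all = "lowercase")]
pub enum StreamMode {
    #[default]
    Off,
    Partial,
    Block, // reserved; currently behaves the same as Partial
}

#[async_trait::async_trait]
pub trait Channel: Send + Sync {
    fn supports_draft_updates(&self) -> bool {
        false // default: existing channels are unaffected
    }
    async fn send_draft(&self, _chat_id: &str, _text: &str) -> anyhow::Result<Option<String>> {
        Ok(None) // a draft message id on channels that support it
    }
    async fn update_draft(&self, _chat_id: &str, _draft_id: &str, _text: &str) -> anyhow::Result<()> {
        Ok(())
    }
    async fn finalize_draft(&self, _chat_id: &str, _draft_id: &str, _text: &str) -> anyhow::Result<()> {
        Ok(())
    }
}
```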
Add native web search capability that works regardless of LLM tool-calling
support. This is particularly useful for GLM models via Z.AI that don't
reliably support standard tool calling formats.
Features:
- DuckDuckGo provider (free, no API key required)
- Brave Search provider (optional, requires API key)
- Configurable max results and timeout
- Enabled by default
Configuration (config.toml):
[web_search]
enabled = true
provider = "duckduckgo"
max_results = 5
The tool allows agents to search the web for current information without
requiring proper tool calling support from the LLM.
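A sketch of how the [web_search] table above might deserialize; the serde shape and default helpers are assumptions beyond what the TOML example shows:

```rust
#[derive(serde::Deserialize)]
struct WebSearchConfig {
    #[serde(default = "default_true")]
    enabled: bool, // enabled by default
    #[serde(default = "default_provider")]
    provider: String, // "duckduckgo" (free) or "brave" (API key required)
    #[serde(default = "default_max_results")]
    max_results: usize,
}

fn default_true() -> bool { true }
fn default_provider() -> String { "duckduckgo".into() }
fn default_max_results() -> usize { 5 }
```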
Also includes CI workflow fix for first-interaction action inputs.
GLM models output tool calls in proprietary formats that ZeroClaw
doesn't natively support. This adds parsing for GLM-specific formats:
- browser_open/url>https://... -> shell tool with curl command
- shell/command>ls -> shell tool with command arg
- http_request/url>... -> http_request tool
- Plain URLs -> shell tool with curl command
Also adds:
- find_json_end() helper for parsing JSON objects
- Unclosed <toolcall> tag handling
- Unit tests for GLM-style parsing
The parsing is deliberately placed after XML and markdown code block
parsing, so it acts as a fallback for models that don't use standard
tool calling formats.
This enables GLM models (via Z.AI or other providers) to successfully
execute tools in ZeroClaw.
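A hypothetical illustration of the fallback mapping described above; the function name and argument shapes are illustrative stand-ins, not the actual parser:

```rust
// Map a parsed GLM-style (tool, argument) pair onto a native tool call.
fn map_glm_call(tool: &str, arg: &str) -> (String, serde_json::Value) {
    match tool {
        // browser_open/url>https://... -> shell tool running curl
        "browser_open" => ("shell".into(), serde_json::json!({ "command": format!("curl -sL {arg}") })),
        // http_request/url>... -> http_request tool
        "http_request" => ("http_request".into(), serde_json::json!({ "url": arg })),
        // shell/command>ls -> shell tool with command arg
        "shell" => ("shell".into(), serde_json::json!({ "command": arg })),
        // Plain URLs -> shell tool with curl command
        _ => ("shell".into(), serde_json::json!({ "command": format!("curl -sL {arg}") })),
    }
}
```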
fix(token): update token generation to use rand::rng()
Addresses a deprecation warning from the compiler:
❯ rspberrypi@localhost:~/zeroclaw$ cargo build --release --locked
warning: use of deprecated function `rand::thread_rng`: Renamed to `rng`
   --> src/security/pairing.rs:186:11
    |
186 |     rand::thread_rng().fill_bytes(&mut bytes);
    |           ^^^^^^^^^^
    |
    = note: `#[warn(deprecated)]` on by default
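The corresponding one-line fix (buffer size abridged for illustration):

```rust
use rand::RngCore;

// rand 0.9 renamed `thread_rng()` to `rng()`; behavior is unchanged.
let mut bytes = [0u8; 32];
rand::rng().fill_bytes(&mut bytes);
```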
The dep-caching layer creates stub files (src/main.rs and
benches/agent_benchmarks.rs) to warm the cargo registry cache, then
removes them with `rm -rf src benches`. The subsequent real source copy
only restored `src/` — leaving `benches/` absent. Cargo's manifest
parser then failed to locate `benches/agent_benchmarks.rs` referenced
in Cargo.toml, aborting the release build with:
error: failed to parse manifest at `/app/Cargo.toml`
Caused by: can't find `agent_benchmarks` bench at
`benches/agent_benchmarks.rs`
Fix: add `COPY benches/ benches/` alongside the `COPY src/ src/` step
so the real bench source is present for the incremental release build.
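The corrected copy step as described above (surrounding stages and the exact build command are assumptions):

```dockerfile
# Restore the real sources after the dep-caching layer's `rm -rf src benches`.
COPY src/ src/
COPY benches/ benches/   # makes benches/agent_benchmarks.rs visible to Cargo's manifest parser
RUN cargo build --release --locked
```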
- remove trailing whitespace in .env.example Z.AI block
- align documented model defaults/options with current onboard/provider behavior
- keep this PR docs-focused by reverting incidental workflow edits
Add comprehensive documentation for custom API endpoint configuration
to address missing documentation reported in issue #567.
Changes:
- Create docs/custom-providers.md with detailed guide for custom: and anthropic-custom: formats
- Add custom endpoint examples to README.md configuration section
- Add note about daemon requirement for channels in Quick Start
- Add reference link to custom providers guide
Addresses: #567
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Problem:
Every async fn in SqliteMemory acquired self.conn.lock() and ran
synchronous rusqlite queries directly on the Tokio runtime thread.
This blocks the async executor, preventing other tasks from making
progress — especially harmful under concurrent recall/store load.
Fix:
- Change conn from Mutex<Connection> to Arc<Mutex<Connection>> so
the connection handle can be cloned into spawn_blocking closures.
- Wrap all synchronous database operations (store, recall, get, list,
forget, count, health_check) in tokio::task::spawn_blocking.
- Split get_or_compute_embedding into three phases: cache check
(blocking), embedding computation (async I/O), cache store
(blocking) — ensuring no lock is held across await points.
- Apply the same pattern to the reindex method.
The async I/O (embedding computation) remains on the Tokio runtime
while all SQLite access runs on the blocking thread pool, preventing
executor starvation.
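A minimal sketch of the pattern for one method (table name, method shape, and error type are assumptions):

```rust
use std::sync::{Arc, Mutex};
use rusqlite::Connection;

async fn count(conn: Arc<Mutex<Connection>>) -> anyhow::Result<usize> {
    tokio::task::spawn_blocking(move || -> anyhow::Result<usize> {
        // The lock is acquired on the blocking pool, never held across .await.
        let conn = conn.lock().expect("poisoned lock");
        let n: i64 = conn.query_row("SELECT COUNT(*) FROM memory", [], |row| row.get(0))?;
        Ok(n as usize)
    })
    .await? // JoinError if the blocking task panicked
}
```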
Ref: zeroclaw-labs/zeroclaw#710 (Item 4)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Problem:
The iMessage listener opened a new SQLite connection to the Messages
database on every ~3-second poll cycle via get_max_rowid() and
fetch_new_messages(), creating ~40 connection open/close cycles per
minute. Each cycle incurs filesystem syscalls, WAL header reads,
and potential page cache cold starts.
Fix:
Open a single read-only connection before the poll loop and reuse it
across iterations using the 'shuttle' pattern: the connection is moved
into each spawn_blocking closure and returned alongside the results,
then reassigned for the next iteration. This eliminates per-poll
connection overhead while preserving the spawn_blocking pattern that
keeps SQLite I/O off the Tokio runtime thread.
The standalone get_max_rowid() and fetch_new_messages() helper
functions are retained for use by tests and other callers.
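A sketch of the shuttle pattern; fetch_new_messages_on is a hypothetical stand-in for the real helper, and error handling is abridged:

```rust
use rusqlite::{Connection, OpenFlags};

async fn poll_loop(db_path: std::path::PathBuf) -> anyhow::Result<()> {
    // One read-only connection, opened once before the loop.
    let mut conn = Connection::open_with_flags(&db_path, OpenFlags::SQLITE_OPEN_READ_ONLY)?;
    let mut last_rowid: i64 = 0;
    loop {
        let since = last_rowid;
        // Move the connection into the blocking closure; get it back with results.
        let (returned, rows) = tokio::task::spawn_blocking(move || {
            let rows: Vec<(i64, String)> = fetch_new_messages_on(&conn, since);
            (conn, rows)
        })
        .await?;
        conn = returned; // reused next iteration: no per-poll open/close
        for (rowid, _text) in rows {
            last_rowid = last_rowid.max(rowid); // dispatch each message here
        }
        tokio::time::sleep(std::time::Duration::from_secs(3)).await;
    }
}
```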
Ref: zeroclaw-labs/zeroclaw#710 (Item 9)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Classify incoming user messages by keyword/pattern and route to the
appropriate model hint automatically, feeding into the existing
RouterProvider. Disabled by default; opt-in via [query_classification]
config section.
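A hypothetical shape for the opt-in section; only the section name comes from this change, the keys are assumptions:

```toml
[query_classification]
enabled = true   # disabled by default when the section is absent
```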
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All five providers have HTTP clients but previously relied on the
trait's default no-op warmup(). This adds lightweight warmup calls
to establish TLS + HTTP/2 connection pools on startup, reducing
first-request latency. Each warmup is skipped when credentials are
absent, matching the OpenRouter pattern.
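An illustrative warmup for one provider; the endpoint, field names, and client type (reqwest assumed) are assumptions:

```rust
async fn warmup(&self) {
    let Some(key) = self.api_key.as_deref().filter(|k| !k.is_empty()) else {
        return; // skip when credentials are absent
    };
    // A cheap authenticated GET primes DNS, TLS, and the HTTP/2 pool.
    let _ = self
        .client
        .get(format!("{}/models", self.base_url))
        .bearer_auth(key)
        .send()
        .await;
}
```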
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ZEROCLAW_WORKSPACE can now be either:
- Legacy path: /path/to/workspace (config at /path/to/.zeroclaw/config.toml)
- Parent path: /path/to (config at /path/to/config.toml, workspace at /path/to/workspace)
This maintains backward compatibility with Docker's legacy folder structure
while also supporting the new parent-dir layout.
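A sketch of the dual-layout lookup (the function name is hypothetical; the two paths match the layouts listed above):

```rust
use std::path::{Path, PathBuf};

fn resolve_config(workspace_env: &Path) -> PathBuf {
    let parent_layout = workspace_env.join("config.toml");
    if parent_layout.exists() {
        // Parent path: config at /path/to/config.toml, workspace at /path/to/workspace
        parent_layout
    } else {
        // Legacy path: config at /path/to/.zeroclaw/config.toml
        workspace_env.join(".zeroclaw").join("config.toml")
    }
}
```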
Implements Anthropic's prompt caching API to enable significant cost
reduction (up to 90%) and latency improvements (up to 85%) for
requests with repeated content.
Key features:
- Auto-caching heuristics: large system prompts (>3KB), tool
definitions, and long conversations (>4 messages)
- Full backward compatibility: cache_control fields are optional
- Supports both string and block-array system prompt formats
- Cache control on all content types (text, tool_use, tool_result)
Implementation details:
- Added CacheControl, SystemPrompt, and SystemBlock structures
- Updated NativeContentOut and NativeToolSpec with cache_control
- Strategic cache breakpoint placement (last tool, last message)
- Comprehensive test coverage for serialization and heuristics
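For reference, a sketch of how the new structures might serialize; the derive details are assumptions, while the wire shape {"cache_control": {"type": "ephemeral"}} matches Anthropic's documented API:

```rust
#[derive(serde::Serialize)]
struct CacheControl {
    #[serde(rename = "type")]
    kind: String, // "ephemeral"
}

#[derive(serde::Serialize)]
struct SystemBlock {
    #[serde(rename = "type")]
    kind: String, // "text"
    text: String,
    // Omitted entirely when None, preserving backward compatibility.
    #[serde(skip_serializing_if = "Option::is_none")]
    cache_control: Option<CacheControl>,
}
```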
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
(cherry picked from commit fff04f4edb5e4cb7e581b1b16035da8cc2e55cef)
Add three onboarding improvements for the Astrai provider:
- Signup URL: users now see "Get your API key at: https://as-trai.com"
during onboarding instead of a blank prompt
- Curated model list: auto (best execution), GPT-4o, Claude Sonnet 4.5,
DeepSeek V3, Llama 3.3 70B
- Live model fetch: Astrai's OpenAI-compatible /v1/models endpoint is
now queried when an API key is present, matching other providers
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
## Problem
The test suite contained several categories of latent brittleness
identified in docs/testing-brittle-tests.md that would surface during
refactoring or cross-platform (Windows) CI execution:
1. Hardcoded Unix paths: `Path::new("/tmp")` and `PathBuf::from("/tmp")`
   used as workspace directories in agent tests, which fail on Windows
   where /tmp does not exist.
2. Exact string match assertions: ~20 `assert_eq!(response, "exact text")`
   assertions in agent unit and e2e tests that break on any mock wording
   change, even when the underlying orchestration behavior is correct.
3. Fragile error message string matching: `.contains("specific message")`
   assertions coupled to internal error wording rather than testing the
   error category or behavioral outcome.
## What Changed
### Hardcoded paths → platform-agnostic temp dirs (4 files, 7 locations)
- `src/agent/tests.rs`: Replaced all 4 instances of `Path::new("/tmp")`
  and `PathBuf::from("/tmp")` with `std::env::temp_dir()` in
  `make_memory()`, `build_agent_with()`, `build_agent_with_memory()`,
  and `build_agent_with_config()` helpers.
- `tests/agent_e2e.rs`: Replaced all 3 instances in `make_memory()`,
  `build_agent()`, and `build_agent_xml()` helpers.
### Exact string assertions → behavioral checks (2 files, ~20 locations)
- `src/agent/tests.rs`: Converted 10 `assert_eq!(response, "...")` to
  `assert!(!response.is_empty(), "descriptive message")` across tests for
  text pass-through, tool execution, tool failure recovery, XML dispatch,
  mixed text+tool responses, multi-tool batch, and run_single delegation.
- `tests/agent_e2e.rs`: Converted 9 exact-match assertions to behavioral
  checks. Multi-turn test now uses `assert_ne!(r1, r2)` to verify
  sequential responses are distinct without coupling to exact wording.
- Provider error propagation test simplified to `assert!(result.is_err())`
  without asserting on the error message string.
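For illustration, a conversion of the second kind; the exact mock wording shown in the "before" comment is hypothetical:

```rust
// Before: coupled to the mock provider's exact wording.
// assert_eq!(response, "Tool executed successfully: hello");

// After: asserts the behavioral outcome without wording coupling.
assert!(
    !response.is_empty(),
    "agent should return a non-empty response after tool execution"
);
```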
### Fragile error message assertions → structural checks (2 files)
- `src/tools/git_operations.rs`: Replaced fragile OR-branch string match
  (`contains("git repository") || contains("Git command failed")`) with
  structural assertions: checks `!result.success`, error is non-empty,
  and error does NOT mention autonomy/read-only (verifying the failure
  is git-related, not permission-related).
- `src/cron/scheduler.rs`: Replaced `contains("agent job failed:")` with
  `!success` and `!output.is_empty()` checks that verify failure behavior
  without coupling to exact log format.
without coupling to exact log format.
## What Was NOT Changed (and why)
- `src/agent/loop_.rs` parser tests: Exact string assertions are the
  contract for XML tool call parsing: the exact output IS the spec.
- `src/providers/reliable.rs`: Error message assertions test the error
  format contract (provider/model attribution in failure messages).
- `src/service/mod.rs`: Already platform-gated with `#[cfg]`; XML escape
  test is a formatting contract where exact match is appropriate.
- `src/config/schema.rs`: TOML test strings use /tmp as data values for
  deserialization tests, not filesystem access; HOME tests already use
  `std::env::temp_dir()`.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Three fixes for conversation quality issues:
1. loop_.rs and channels now read max_tool_iterations from AgentConfig
instead of using a hardcoded constant of 10, making it configurable.
2. Memory recall now filters entries below a configurable
min_relevance_score threshold (default 0.4), preventing unrelated
memories from bleeding into conversation context.
3. Default hybrid search weights rebalanced from 70/30 vector/keyword
to 40/60, reducing cross-topic semantic bleed.
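Hypothetical config keys inferred from the changes above; the section and key names are assumptions:

```toml
[memory]
min_relevance_score = 0.4  # recall entries scoring below this are dropped
vector_weight = 0.4        # hybrid search, rebalanced from 0.7
keyword_weight = 0.6       # rebalanced from 0.3
```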
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Problem:
In record_run(), an INSERT into cron_runs followed by a pruning DELETE
ran as separate implicit transactions. If the INSERT succeeded but the
DELETE failed (e.g., due to disk pressure or lock contention), the run
table would grow unboundedly since the pruning step was lost while the
new row persisted.
Fix:
Wrap both statements in an explicit transaction using
conn.unchecked_transaction(). If either statement fails, the entire
transaction is rolled back, maintaining the invariant that the run
history stays bounded by max_run_history.
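A sketch of the transactional pruning; the SQL shapes and parameters are assumptions, while unchecked_transaction() is the rusqlite API named above:

```rust
fn record_run(conn: &rusqlite::Connection, max_run_history: i64) -> rusqlite::Result<()> {
    // unchecked_transaction() works through &Connection; the caller must
    // not nest transactions on the same handle.
    let tx = conn.unchecked_transaction()?;
    tx.execute(
        "INSERT INTO cron_runs (job_id, status) VALUES (?1, ?2)",
        rusqlite::params!["job", "ok"],
    )?;
    // Prune in the same transaction: both persist, or neither does.
    tx.execute(
        "DELETE FROM cron_runs WHERE id NOT IN (
             SELECT id FROM cron_runs ORDER BY id DESC LIMIT ?1
         )",
        rusqlite::params![max_run_history],
    )?;
    tx.commit()
}
```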
Ref: zeroclaw-labs/zeroclaw#710 (Item 5)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>