Add two Mattermost channel enhancements:
1. thread_replies config option (default: false)
- When false, replies go to the channel root instead of threading.
- When true, replies thread on the original post.
- Existing thread replies always stay in-thread regardless of setting.
2. Typing indicator (start_typing/stop_typing)
- Implements the Channel trait's typing methods for Mattermost.
- Fires POST /api/v4/users/me/typing every 4s in a background task.
- Supports parent_id for threaded typing indicators.
- Aborts cleanly on stop_typing via JoinHandle.
Updated all MattermostChannel::new call sites (start_channels, scheduler)
and added 9 unit tests covering thread routing and edge cases.
Reasoning/thinking models (Qwen3, GLM-4, DeepSeek, etc.) may return
output in `reasoning_content` instead of `content`. Add automatic
fallback for both OpenAI and OpenAI-compatible providers, including
streaming SSE support.
Changes:
- Add `reasoning_content` field to response structs in both providers
- Add `effective_content()` helper that prefers `content` but falls
back to `reasoning_content` when content is empty/null/missing
- Update all extraction sites to use `effective_content()`
- Add streaming SSE fallback for `reasoning_content` chunks
- Add 16 focused unit tests covering all edge cases
Tested end-to-end against GLM-4.7-flash via local LLM server.
Channel messages (Telegram, Discord, etc.) previously had no multi-turn
context — each incoming message was processed with a fresh history
containing only the system prompt and the current user message.
This patch:
- Maintains a per-sender conversation history map (Arc<Mutex<HashMap>>)
- Restores prior turns when processing each new message
- Saves user + assistant turns after successful LLM response
- Caps history at 50 messages per sender to bound memory usage
Fixes the channel context continuity issue where the bot would respond
with 'I have no context' to every follow-up question.
- Add strip_tool_call_tags() to finalize_draft to prevent Markdown
parse failures from tool-call tags reaching Telegram API
- Deduplicate parse_reply_target() call in update_draft (was called
twice, discarding thread_id both times)
- Replace body.as_object_mut().unwrap() mutation with separate
plain_body JSON literal (eliminates unwrap in runtime path)
- Clean up per-chat rate-limit HashMap entry in finalize_draft to
prevent unbounded growth over long uptimes
- Extract magic number 80 to STREAM_CHUNK_MIN_CHARS constant in
agent loop
- Fix silent parse failures: message_id.parse().unwrap_or(0) replaced
with match + tracing::warn on parse error (update_draft, finalize_draft)
- Fix UTF-8 panic: byte-based truncation replaced with char_indices()
safe boundary detection for TELEGRAM_MAX_MESSAGE_LENGTH
- Fix global rate limiter: Mutex<Option<Instant>> replaced with
Mutex<HashMap<String, Instant>> for per-chat rate limiting so
concurrent conversations don't interfere with each other
- Document Block variant: clarify it's reserved for future use and
currently behaves the same as Partial
Previously on_delta sent the entire completed response as a single
message, defeating the purpose of the streaming draft updates. Now
the text is split into ~80-char chunks on whitespace boundaries
(UTF-8 safe via split_inclusive) and sent progressively through the
channel, so Telegram draft edits show text arriving incrementally.
The consumer in process_channel_message already accumulates chunks
and calls update_draft with the full text so far, and Telegram's
rate-limiting (draft_update_interval_ms) throttles editMessageText
calls to avoid API spam.
Wire the existing provider-layer streaming infrastructure through the
channel trait and agent loop so Telegram users see tokens arrive
progressively via editMessageText, instead of waiting for the full
response.
Changes:
- Add StreamMode enum (off/partial/block) and draft_update_interval_ms
to TelegramConfig (backward-compatible defaults: off, 1000ms)
- Add supports_draft_updates/send_draft/update_draft/finalize_draft to
Channel trait with no-op defaults (zero impact on existing channels)
- Implement draft methods on TelegramChannel using sendMessage +
editMessageText with rate limiting and Markdown fallback
- Add on_delta mpsc::Sender<String> parameter to run_tool_call_loop
(None preserves existing behavior)
- Wire streaming in process_channel_message: when channel supports
drafts, send initial draft, spawn updater task, finalize on completion
Edge cases handled:
- 4096-char limit: finalize draft and fall back to chunked send
- Broken Markdown: use no parse_mode during streaming, apply on finalize
- Edit failures: fall back to sending complete response as new message
- Rate limiting: configurable draft_update_interval_ms (default 1s)
Add native web search capability that works regardless of LLM tool-calling
support. This is particularly useful for GLM models via Z.AI that don't
reliably support standard tool calling formats.
Features:
- DuckDuckGo provider (free, no API key required)
- Brave Search provider (optional, requires API key)
- Configurable max results and timeout
- Enabled by default
Configuration (config.toml):
[web_search]
enabled = true
provider = "duckduckgo"
max_results = 5
The tool allows agents to search the web for current information without
requiring proper tool calling support from the LLM.
Also includes CI workflow fix for first-interaction action inputs.
GLM models output tool calls in proprietary formats that ZeroClaw
doesn't natively support. This adds parsing for GLM-specific formats:
- browser_open/url>https://... -> shell tool with curl command
- shell/command>ls -> shell tool with command arg
- http_request/url>... -> http_request tool
- Plain URLs -> shell tool with curl command
Also adds:
- find_json_end() helper for parsing JSON objects
- Unclosed <toolcall> tag handling
- Unit tests for GLM-style parsing
The parsing is deliberately placed after XML and markdown code block
parsing, so it acts as a fallback for models that don't use standard
tool calling formats.
This enables GLM models (via Z.AI or other providers) to successfully
execute tools in ZeroClaw.
fix(token): update token generation to use rand::rng()
Addresses warning coming from compiler:
❯ rspberrypi@localhost:~/zeroclaw$ cargo build --release --locked
warning: use of deprecated function rand::thread_rng: Renamed to rng
--> src/security/pairing.rs:186:11 |
186 | rand::thread_rng().fill_bytes(&mut bytes); | ^^^^^^^^^^
|
= note: #[warn(deprecated)] on by default
The dep-caching layer creates stub files (src/main.rs and
benches/agent_benchmarks.rs) to warm the cargo registry cache, then
removes them with `rm -rf src benches`. The subsequent real source copy
only restored `src/` — leaving `benches/` absent. Cargo's manifest
parser then failed to locate `benches/agent_benchmarks.rs` referenced
in Cargo.toml, aborting the release build with:
error: failed to parse manifest at `/app/Cargo.toml`
Caused by: can't find `agent_benchmarks` bench at
`benches/agent_benchmarks.rs`
Fix: add `COPY benches/ benches/` alongside the `COPY src/ src/` step
so the real bench source is present for the incremental release build.
- remove trailing whitespace in .env.example Z.AI block
- align documented model defaults/options with current onboard/provider behavior
- keep this PR docs-focused by reverting incidental workflow edits
Add comprehensive documentation for custom API endpoint configuration
to address missing documentation reported in issue #567.
Changes:
- Create docs/custom-providers.md with detailed guide for custom: and anthropic-custom: formats
- Add custom endpoint examples to README.md configuration section
- Add note about daemon requirement for channels in Quick Start
- Add reference link to custom providers guide
Addresses: #567
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Problem:
Every async fn in SqliteMemory acquired self.conn.lock() and ran
synchronous rusqlite queries directly on the Tokio runtime thread.
This blocks the async executor, preventing other tasks from making
progress — especially harmful under concurrent recall/store load.
Fix:
- Change conn from Mutex<Connection> to Arc<Mutex<Connection>> so
the connection handle can be cloned into spawn_blocking closures.
- Wrap all synchronous database operations (store, recall, get, list,
forget, count, health_check) in tokio::task::spawn_blocking.
- Split get_or_compute_embedding into three phases: cache check
(blocking), embedding computation (async I/O), cache store
(blocking) — ensuring no lock is held across await points.
- Apply the same pattern to the reindex method.
The async I/O (embedding computation) remains on the Tokio runtime
while all SQLite access runs on the blocking thread pool, preventing
executor starvation.
Ref: zeroclaw-labs/zeroclaw#710 (Item 4)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Problem:
The iMessage listener opened a new SQLite connection to the Messages
database on every ~3-second poll cycle via get_max_rowid() and
fetch_new_messages(), creating ~40 connection open/close cycles per
minute. Each cycle incurs filesystem syscalls, WAL header reads,
and potential page cache cold starts.
Fix:
Open a single read-only connection before the poll loop and reuse it
across iterations using the 'shuttle' pattern: the connection is moved
into each spawn_blocking closure and returned alongside the results,
then reassigned for the next iteration. This eliminates per-poll
connection overhead while preserving the spawn_blocking pattern that
keeps SQLite I/O off the Tokio runtime thread.
The standalone get_max_rowid() and fetch_new_messages() helper
functions are retained for use by tests and other callers.
Ref: zeroclaw-labs/zeroclaw#710 (Item 9)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Classify incoming user messages by keyword/pattern and route to the
appropriate model hint automatically, feeding into the existing
RouterProvider. Disabled by default; opt-in via [query_classification]
config section.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All five providers have HTTP clients but did not implement warmup(),
relying on the trait default no-op. This adds lightweight warmup calls
to establish TLS + HTTP/2 connection pools on startup, reducing
first-request latency. Each warmup is skipped when credentials are
absent, matching the OpenRouter pattern.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ZEROCLAW_WORKSPACE can now be either:
- Legacy path: /path/to/workspace (config at /path/to/.zeroclaw/config.toml)
- Parent path: /path/to (config at /path/to/config.toml, workspace at /path/to/workspace)
This maintains backward compatibility with Docker's legacy folder structure
while also supporting the new parent-dir layout.
Implements Anthropic's prompt caching API to enable significant cost
reduction (up to 90%) and latency improvements (up to 85%) for
requests with repeated content.
Key features:
- Auto-caching heuristics: large system prompts (>3KB), tool
definitions, and long conversations (>4 messages)
- Full backward compatibility: cache_control fields are optional
- Supports both string and block-array system prompt formats
- Cache control on all content types (text, tool_use, tool_result)
Implementation details:
- Added CacheControl, SystemPrompt, and SystemBlock structures
- Updated NativeContentOut and NativeToolSpec with cache_control
- Strategic cache breakpoint placement (last tool, last message)
- Comprehensive test coverage for serialization and heuristics
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
(cherry picked from commit fff04f4edb5e4cb7e581b1b16035da8cc2e55cef)
Add three onboarding improvements for the Astrai provider:
- Signup URL: users now see "Get your API key at: https://as-trai.com"
during onboarding instead of a blank prompt
- Curated model list: auto (best execution), GPT-4o, Claude Sonnet 4.5,
DeepSeek V3, Llama 3.3 70B
- Live model fetch: Astrai's OpenAI-compatible /v1/models endpoint is
now queried when an API key is present, matching other providers
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
## Problem
The test suite contained several categories of latent brittleness
identified in docs/testing-brittle-tests.md that would surface during
refactoring or cross-platform (Windows) CI execution:
1. Hardcoded Unix paths: \Path::new("/tmp")\ and \PathBuf::from("/tmp")\
used as workspace directories in agent tests, which fail on Windows
where /tmp does not exist.
2. Exact string match assertions: ~20 \ssert_eq!(response, "exact text")\
assertions in agent unit and e2e tests that break on any mock wording
change, even when the underlying orchestration behavior is correct.
3. Fragile error message string matching: \.contains("specific message")\
assertions coupled to internal error wording rather than testing the
error category or behavioral outcome.
## What Changed
### Hardcoded paths → platform-agnostic temp dirs (4 files, 7 locations)
- \src/agent/tests.rs\: Replaced all 4 instances of \Path::new("/tmp")\
and \PathBuf::from("/tmp")\ with \std::env::temp_dir()\ in
\make_memory()\, \uild_agent_with()\, \uild_agent_with_memory()\,
and \uild_agent_with_config()\ helpers.
- \ ests/agent_e2e.rs\: Replaced all 3 instances in \make_memory()\,
\uild_agent()\, and \uild_agent_xml()\ helpers.
### Exact string assertions → behavioral checks (2 files, ~20 locations)
- \src/agent/tests.rs\: Converted 10 \ssert_eq!(response, "...")\ to
\ssert!(!response.is_empty(), "descriptive message")\ across tests for
text pass-through, tool execution, tool failure recovery, XML dispatch,
mixed text+tool responses, multi-tool batch, and run_single delegation.
- \ ests/agent_e2e.rs\: Converted 9 exact-match assertions to behavioral
checks. Multi-turn test now uses \ssert_ne!(r1, r2)\ to verify
sequential responses are distinct without coupling to exact wording.
- Provider error propagation test simplified to \ssert!(result.is_err())\
without asserting on the error message string.
### Fragile error message assertions → structural checks (2 files)
- \src/tools/git_operations.rs\: Replaced fragile OR-branch string match
(\contains("git repository") || contains("Git command failed")\) with
structural assertions: checks \!result.success\, error is non-empty,
and error does NOT mention autonomy/read-only (verifying the failure
is git-related, not permission-related).
- \src/cron/scheduler.rs\: Replaced \contains("agent job failed:")\ with
\!success\ and \!output.is_empty()\ checks that verify failure behavior
without coupling to exact log format.
## What Was NOT Changed (and why)
- \src/agent/loop_.rs\ parser tests: Exact string assertions are the
contract for XML tool call parsing — the exact output IS the spec.
- \src/providers/reliable.rs\: Error message assertions test the error
format contract (provider/model attribution in failure messages).
- \src/service/mod.rs\: Already platform-gated with \#[cfg]\; XML escape
test is a formatting contract where exact match is appropriate.
- \src/config/schema.rs\: TOML test strings use /tmp as data values for
deserialization tests, not filesystem access; HOME tests already use
\std::env::temp_dir()\.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>