Implement chat_with_tools() on CompatibleProvider so OpenAI-compatible
endpoints (OpenRouter, local LLMs, etc.) can use structured tool calling
instead of prompt-injected tool descriptions.
Changes:
- CompatibleProvider: capabilities() reports native_tool_calling, new
chat_with_tools() sends tools in API request and parses tool_calls
from response, chat() bridges to chat_with_tools() when ToolSpecs
are provided
- RouterProvider: chat_with_tools() delegation with model hint resolution
- loop_.rs: expose tools_to_openai_format as pub(crate), add
tools_to_openai_format_from_specs for ToolSpec-based conversion
Adds 9 new tests and updates 1 existing test.
Add two Mattermost channel enhancements:
1. thread_replies config option (default: false)
- When false, replies go to the channel root instead of threading.
- When true, replies thread on the original post.
- Existing thread replies always stay in-thread regardless of setting.
2. Typing indicator (start_typing/stop_typing)
- Implements the Channel trait's typing methods for Mattermost.
- Fires POST /api/v4/users/me/typing every 4s in a background task.
- Supports parent_id for threaded typing indicators.
- Aborts cleanly on stop_typing via JoinHandle.
Updated all MattermostChannel::new call sites (start_channels, scheduler)
and added 9 unit tests covering thread routing and edge cases.
Reasoning/thinking models (Qwen3, GLM-4, DeepSeek, etc.) may return
output in `reasoning_content` instead of `content`. Add automatic
fallback for both OpenAI and OpenAI-compatible providers, including
streaming SSE support.
Changes:
- Add `reasoning_content` field to response structs in both providers
- Add `effective_content()` helper that prefers `content` but falls
back to `reasoning_content` when content is empty/null/missing
- Update all extraction sites to use `effective_content()`
- Add streaming SSE fallback for `reasoning_content` chunks
- Add 16 focused unit tests covering all edge cases
Tested end-to-end against GLM-4.7-flash via local LLM server.
Channel messages (Telegram, Discord, etc.) previously had no multi-turn
context — each incoming message was processed with a fresh history
containing only the system prompt and the current user message.
This patch:
- Maintains a per-sender conversation history map (Arc<Mutex<HashMap>>)
- Restores prior turns when processing each new message
- Saves user + assistant turns after successful LLM response
- Caps history at 50 messages per sender to bound memory usage
Fixes the channel context continuity issue where the bot would respond
with 'I have no context' to every follow-up question.
- Add strip_tool_call_tags() to finalize_draft to prevent Markdown
parse failures from tool-call tags reaching Telegram API
- Deduplicate parse_reply_target() call in update_draft (was called
twice, discarding thread_id both times)
- Replace body.as_object_mut().unwrap() mutation with separate
plain_body JSON literal (eliminates unwrap in runtime path)
- Clean up per-chat rate-limit HashMap entry in finalize_draft to
prevent unbounded growth over long uptimes
- Extract magic number 80 to STREAM_CHUNK_MIN_CHARS constant in
agent loop
- Fix silent parse failures: message_id.parse().unwrap_or(0) replaced
with match + tracing::warn on parse error (update_draft, finalize_draft)
- Fix UTF-8 panic: byte-based truncation replaced with char_indices()
safe boundary detection for TELEGRAM_MAX_MESSAGE_LENGTH
- Fix global rate limiter: Mutex<Option<Instant>> replaced with
Mutex<HashMap<String, Instant>> for per-chat rate limiting so
concurrent conversations don't interfere with each other
- Document Block variant: clarify it's reserved for future use and
currently behaves the same as Partial
Previously on_delta sent the entire completed response as a single
message, defeating the purpose of the streaming draft updates. Now
the text is split into ~80-char chunks on whitespace boundaries
(UTF-8 safe via split_inclusive) and sent progressively through the
channel, so Telegram draft edits show text arriving incrementally.
The consumer in process_channel_message already accumulates chunks
and calls update_draft with the full text so far, and Telegram's
rate-limiting (draft_update_interval_ms) throttles editMessageText
calls to avoid API spam.
Wire the existing provider-layer streaming infrastructure through the
channel trait and agent loop so Telegram users see tokens arrive
progressively via editMessageText, instead of waiting for the full
response.
Changes:
- Add StreamMode enum (off/partial/block) and draft_update_interval_ms
to TelegramConfig (backward-compatible defaults: off, 1000ms)
- Add supports_draft_updates/send_draft/update_draft/finalize_draft to
Channel trait with no-op defaults (zero impact on existing channels)
- Implement draft methods on TelegramChannel using sendMessage +
editMessageText with rate limiting and Markdown fallback
- Add on_delta mpsc::Sender<String> parameter to run_tool_call_loop
(None preserves existing behavior)
- Wire streaming in process_channel_message: when channel supports
drafts, send initial draft, spawn updater task, finalize on completion
Edge cases handled:
- 4096-char limit: finalize draft and fall back to chunked send
- Broken Markdown: use no parse_mode during streaming, apply on finalize
- Edit failures: fall back to sending complete response as new message
- Rate limiting: configurable draft_update_interval_ms (default 1s)
Add native web search capability that works regardless of LLM tool-calling
support. This is particularly useful for GLM models via Z.AI that don't
reliably support standard tool calling formats.
Features:
- DuckDuckGo provider (free, no API key required)
- Brave Search provider (optional, requires API key)
- Configurable max results and timeout
- Enabled by default
Configuration (config.toml):
[web_search]
enabled = true
provider = "duckduckgo"
max_results = 5
The tool allows agents to search the web for current information without
requiring proper tool calling support from the LLM.
Also includes CI workflow fix for first-interaction action inputs.
GLM models output tool calls in proprietary formats that ZeroClaw
doesn't natively support. This adds parsing for GLM-specific formats:
- browser_open/url>https://... -> shell tool with curl command
- shell/command>ls -> shell tool with command arg
- http_request/url>... -> http_request tool
- Plain URLs -> shell tool with curl command
Also adds:
- find_json_end() helper for parsing JSON objects
- Unclosed <toolcall> tag handling
- Unit tests for GLM-style parsing
The parsing is deliberately placed after XML and markdown code block
parsing, so it acts as a fallback for models that don't use standard
tool calling formats.
This enables GLM models (via Z.AI or other providers) to successfully
execute tools in ZeroClaw.
fix(token): update token generation to use rand::rng()
Addresses warning coming from compiler:
❯ rspberrypi@localhost:~/zeroclaw$ cargo build --release --locked
warning: use of deprecated function rand::thread_rng: Renamed to rng
--> src/security/pairing.rs:186:11 |
186 | rand::thread_rng().fill_bytes(&mut bytes); | ^^^^^^^^^^
|
= note: #[warn(deprecated)] on by default
The dep-caching layer creates stub files (src/main.rs and
benches/agent_benchmarks.rs) to warm the cargo registry cache, then
removes them with `rm -rf src benches`. The subsequent real source copy
only restored `src/` — leaving `benches/` absent. Cargo's manifest
parser then failed to locate `benches/agent_benchmarks.rs` referenced
in Cargo.toml, aborting the release build with:
error: failed to parse manifest at `/app/Cargo.toml`
Caused by: can't find `agent_benchmarks` bench at
`benches/agent_benchmarks.rs`
Fix: add `COPY benches/ benches/` alongside the `COPY src/ src/` step
so the real bench source is present for the incremental release build.
- remove trailing whitespace in .env.example Z.AI block
- align documented model defaults/options with current onboard/provider behavior
- keep this PR docs-focused by reverting incidental workflow edits
Add comprehensive documentation for custom API endpoint configuration
to address missing documentation reported in issue #567.
Changes:
- Create docs/custom-providers.md with detailed guide for custom: and anthropic-custom: formats
- Add custom endpoint examples to README.md configuration section
- Add note about daemon requirement for channels in Quick Start
- Add reference link to custom providers guide
Addresses: #567
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Problem:
Every async fn in SqliteMemory acquired self.conn.lock() and ran
synchronous rusqlite queries directly on the Tokio runtime thread.
This blocks the async executor, preventing other tasks from making
progress — especially harmful under concurrent recall/store load.
Fix:
- Change conn from Mutex<Connection> to Arc<Mutex<Connection>> so
the connection handle can be cloned into spawn_blocking closures.
- Wrap all synchronous database operations (store, recall, get, list,
forget, count, health_check) in tokio::task::spawn_blocking.
- Split get_or_compute_embedding into three phases: cache check
(blocking), embedding computation (async I/O), cache store
(blocking) — ensuring no lock is held across await points.
- Apply the same pattern to the reindex method.
The async I/O (embedding computation) remains on the Tokio runtime
while all SQLite access runs on the blocking thread pool, preventing
executor starvation.
Ref: zeroclaw-labs/zeroclaw#710 (Item 4)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Problem:
The iMessage listener opened a new SQLite connection to the Messages
database on every ~3-second poll cycle via get_max_rowid() and
fetch_new_messages(), creating ~40 connection open/close cycles per
minute. Each cycle incurs filesystem syscalls, WAL header reads,
and potential page cache cold starts.
Fix:
Open a single read-only connection before the poll loop and reuse it
across iterations using the 'shuttle' pattern: the connection is moved
into each spawn_blocking closure and returned alongside the results,
then reassigned for the next iteration. This eliminates per-poll
connection overhead while preserving the spawn_blocking pattern that
keeps SQLite I/O off the Tokio runtime thread.
The standalone get_max_rowid() and fetch_new_messages() helper
functions are retained for use by tests and other callers.
Ref: zeroclaw-labs/zeroclaw#710 (Item 9)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Classify incoming user messages by keyword/pattern and route to the
appropriate model hint automatically, feeding into the existing
RouterProvider. Disabled by default; opt-in via [query_classification]
config section.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All five providers have HTTP clients but did not implement warmup(),
relying on the trait default no-op. This adds lightweight warmup calls
to establish TLS + HTTP/2 connection pools on startup, reducing
first-request latency. Each warmup is skipped when credentials are
absent, matching the OpenRouter pattern.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ZEROCLAW_WORKSPACE can now be either:
- Legacy path: /path/to/workspace (config at /path/to/.zeroclaw/config.toml)
- Parent path: /path/to (config at /path/to/config.toml, workspace at /path/to/workspace)
This maintains backward compatibility with Docker's legacy folder structure
while also supporting the new parent-dir layout.