zeroclaw

Author	SHA1	Message	Date
Alex Gorevski	9a6fa76825	readd tests, remove markdown files	2026-02-18 14:42:39 +08:00
Chummy	2560399423	feat(observability): focus PR 596 on Prometheus backend	2026-02-18 12:06:05 +08:00
argenis de la rosa	eba544dbd4	feat(observability): implement Prometheus metrics backend with /metrics endpoint - Adds PrometheusObserver backend with counters, histograms, and gauges - Tracks agent starts/duration, tool calls, channel messages, heartbeat ticks, errors, request latency, tokens, sessions, queue depth - Adds GET /metrics endpoint to gateway for Prometheus scraping - Adds provider/model labels to AgentStart and AgentEnd events for better observability - Adds as_any() method to Observer trait for backend-specific downcast Metrics exposed: - zeroclaw_agent_starts_total (Counter) with provider/model labels - zeroclaw_agent_duration_seconds (Histogram) with provider/model labels - zeroclaw_tool_calls_total (Counter) with tool/success labels - zeroclaw_tool_duration_seconds (Histogram) with tool label - zeroclaw_channel_messages_total (Counter) with channel/direction labels - zeroclaw_heartbeat_ticks_total (Counter) - zeroclaw_errors_total (Counter) with component label - zeroclaw_request_latency_seconds (Histogram) - zeroclaw_tokens_used_last (Gauge) - zeroclaw_active_sessions (Gauge) - zeroclaw_queue_depth (Gauge) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-18 12:06:05 +08:00
Edvard	508fb53ac1	fix(provider): delegate native tool calling through ReliableProvider ReliableProvider wraps underlying providers with retry/fallback logic but did not delegate `supports_native_tools()` or `chat_with_tools()`. This caused the agent loop to fall back to prompt-based tool calling for all providers, even those with native tool support (OpenRouter, OpenAI, Anthropic). Models like Gemini 2.0 Flash would then output tool calls as text instead of structured API responses, breaking the tool execution loop entirely. Add `supports_native_tools()` delegation to the primary provider and `chat_with_tools()` with the same retry/fallback logic as the existing `chat_with_system()` and `chat_with_history()` methods. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-18 10:15:46 +08:00
Will Sarg	3c4ed2e28e	fix(providers): clarify reliable failure entries for custom providers (#594) * fix(workflows): standardize runner configuration for security jobs * ci(actionlint): add Blacksmith runner label to config Add blacksmith-2vcpu-ubuntu-2404 to actionlint self-hosted-runner labels config to suppress "unknown label" warnings during workflow linting. This label is used across all workflows after the Blacksmith migration. * fix(actionlint): adjust indentation for self-hosted runner labels * feat(security): enhance security workflow with CodeQL analysis steps * fix(security): update CodeQL action to version 4 for improved analysis * fix(security): remove duplicate permissions in security workflow * fix(security): revert CodeQL action to v3 for stability The v4 version was causing workflow file validation failures. Reverting to proven v3 version that is working on main branch. * fix(security): remove duplicate permissions causing workflow validation failure The permissions block had duplicate security-events and actions keys, which caused YAML validation errors and prevented workflow execution. Fixes: workflow file validation failures on main branch * fix(security): remove pull_request trigger to reduce costs * fix(security): restore PR trigger but skip codeql on PRs * fix(security): resolve YAML syntax error in security workflow * refactor(security): split CodeQL into dedicated scheduled workflow * fix(security): update workflow name to Rust Package Security Audit * fix(codeql): remove push trigger, keep schedule and on-demand only * feat(codeql): add CodeQL configuration file to ignore specific paths * Potential fix for code scanning alert no. 39: Hard-coded cryptographic value Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> * fix(ci): resolve auto-response workflow merge markers * fix(build): restore ChannelMessage reply_target usage * ci(workflows): run workflow sanity on workflow pushes for all branches * ci(workflows): rename auto-response workflow to PR Auto Responder * ci(workflows): require owner approval for workflow file changes * ci: add lint-first PR feedback gate * ci(workflows): split label policy checks from workflow sanity * ci(workflows): consolidate policy and rust workflow setup * ci: add safe pull request intake sanity checks * ci(security): switch audit to pinned rustsec audit-check * fix(providers): clarify reliable failure entries for custom providers --------- Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>	2026-02-17 13:53:03 -05:00
Chummy	0aa35eb669	fix(build): complete strict lint and test cleanup (replacement for #476)	2026-02-18 00:18:54 +08:00
Will Sarg	9e0958dee5	fix(ci): repair parking_lot migration regressions in PR #535	2026-02-17 09:10:40 -05:00
Will Sarg	ee05d62ce4	Merge branch 'main' into pr-484-clean	2026-02-17 08:54:24 -05:00
argenis de la rosa	1908af3248	fix(discord): use channel_id instead of sender for replies (fixes #483) fix(misc): complete parking_lot::Mutex migration (fixes #505) - DiscordChannel: store actual channel_id in ChannelMessage.channel instead of hardcoded "discord" string - channels/mod.rs: use msg.channel instead of msg.sender for replies - Migrate all std::sync::Mutex to parking_lot::Mutex: * src/security/audit.rs * src/memory/sqlite.rs * src/memory/response_cache.rs * src/memory/lucid.rs * src/channels/email_channel.rs * src/gateway/mod.rs * src/observability/traits.rs * src/providers/reliable.rs * src/providers/router.rs * src/agent/agent.rs - Remove all .lock().unwrap() and .map_err(PoisonError) patterns since parking_lot::Mutex never poisons Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 08:05:25 -05:00
Chummy	5d131a8903	fix(security): tighten provider credential log hygiene - remove as_deref credential routing path in provider factory - avoid raw provider error text in warmup/retry failure summaries - keep retry telemetry while reducing secret propagation risk	2026-02-17 19:19:06 +08:00
Argenis	69a9adde33	Merge PR #500: streaming support and security fixes - feat(streaming): add streaming support for LLM responses (fixes #211) - security(deps): remove vulnerable xmas-elf dependency via embuild (fixes #399) - fix: resolve merge conflicts and integrate chat_with_tools from main Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 05:05:57 -05:00
argenis de la rosa	4070131bb8	fix: apply cargo fmt to fix formatting issues Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 05:05:23 -05:00
argenis de la rosa	d94e78c621	feat(streaming): add streaming support for LLM responses (fixes #211) Implement Server-Sent Events (SSE) streaming for OpenAI-compatible providers: - Add StreamChunk, StreamOptions, and StreamError types to traits module - Add supports_streaming() and stream_chat_with_system() to Provider trait - Implement SSE parser for OpenAI streaming responses (data: {...} format) - Add streaming support to OpenAiCompatibleProvider - Add streaming support to ReliableProvider with error propagation - Add futures dependency for async stream support Features: - Token-by-token streaming for real-time feedback - Token counting option (estimated ~4 chars per token) - Graceful error handling and logging - Channel-based stream bridging for async compatibility Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 05:01:13 -05:00
Chummy	b2dd3582a4	fix(ci): align reliable tests with simple_chat contract	2026-02-17 01:01:56 +08:00
mai1015	b341fdb368	feat: add agent structure and improve tooling for provider	2026-02-17 01:01:56 +08:00
Chummy	3234159c6c	chore(clippy): clear warning backlog and harden conversions (#383)	2026-02-17 00:32:33 +08:00
Chummy	8bcb5efa8a	fix(ci): align reliable provider tests with ChatResponse	2026-02-16 22:06:40 +08:00
stawky	9a5db46cf7	feat(providers): model failover chain + API key rotation - Add model_fallbacks and api_keys to ReliabilityConfig - Implement per-model fallback chain in ReliableProvider - Add API key rotation on auth failures (401/403) - Add retry-after header parsing and exponential backoff - Integrate failover into chat_with_system and chat_with_history - 20 unit tests covering failover, rotation, and retry logic	2026-02-16 21:59:35 +08:00
chumyin	3b4a4de457	refactor(provider): unify Provider responses with ChatResponse - Switch Provider trait methods to return structured ChatResponse - Map OpenAI-compatible tool_calls into shared ToolCall type - Update reliable/router wrappers and provider tests for new interface - Make agent loop prefer structured tool calls with text fallback parsing - Adapt gateway replies to structured responses with safe tool-call fallback	2026-02-16 19:16:22 +08:00
Chummy	b442a07530	fix(memory): prevent autosave key collisions across runtime flows Fixes #221 - SQLite Memory Override bug. This PR resolves memory overwrite behavior in autosave paths by replacing fixed memory keys with unique keys, and improves short-horizon recall quality in channel runtime. Root Cause SQLite memory uses a unique constraint on `memories.key` and writes with `ON CONFLICT(key) DO UPDATE`. Several autosave paths reused fixed keys (or sender-stable keys), so newer messages overwrote earlier conversation entries. Changes - Channel runtime: autosave key changed from `channel_sender` to `channel_sender_messageId` - Added memory-context injection before provider calls (aligned with agent loop behavior) - Agent loop: autosave keys changed from fixed `user_msg`/`assistant_resp` to UUID-suffixed keys - Gateway: Webhook/WhatsApp autosave keys changed to UUID-suffixed keys All CI checks passing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 22:55:52 -05:00
Edvard Schøyen	89b1ec6fa2	feat: add multi-turn conversation history and tool execution * feat: add multi-turn conversation history and tool execution Major enhancement to the agent loop: Multi-turn conversation: - Add `ChatMessage` type with system/user/assistant constructors - Add `chat_with_history` method to Provider trait (default impl delegates to `chat_with_system` for backward compatibility) - Implement native `chat_with_history` on OpenRouter, Compatible, Reliable, and Router providers to send full message history - Interactive mode now maintains persistent history across turns Tool execution: - Agent loop now parses `<tool_call>` XML tags from LLM responses - Executes tools from the registry and feeds results back as `<tool_result>` messages - Agentic loop continues until LLM produces final text (no tool calls) - MAX_TOOL_ITERATIONS (10) safety limit prevents runaway loops - System prompt includes structured tool-use protocol with JSON schemas Types: - `ChatMessage`, `ChatResponse`, `ToolCall`, `ToolResultMessage`, `ConversationMessage`: full conversation modeling types Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address review comments on multi-turn + tool execution - Add history sliding window (MAX_HISTORY_MESSAGES=50) to prevent unbounded conversation history growth in interactive mode - Add 404→Responses API fallback in compatible.rs chat_with_history, matching chat_with_system behavior - Use super::api_error() for error sanitization in compatible.rs instead of raw error body (prevents secret leakage) - Add missing operational logs in reliable.rs chat_with_history: recovery, non-retryable, fallback switch warnings - Add trim_history tests Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address second round of review comments - Sanitize raw error text in compatible.rs chat_with_system using sanitize_api_error (prevents leaking secrets in error messages) - Add chat_with_history to MockProvider in reliable.rs tests so the retry/fallback path is exercised end-to-end - Add chat_with_history_retries_then_recovers and chat_with_history_falls_back tests - Log warning on malformed <tool_call> JSON instead of silent drop - Flush stdout after print! in agent_turn so output appears before tool execution on line-buffered terminals - Make interactive mode resilient to transient errors (continue loop instead of terminating session) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 14:43:02 -05:00
Edvard Schøyen	49bb20f961	fix(providers): use Bearer auth for Gemini CLI OAuth tokens * fix(providers): use Bearer auth for Gemini CLI OAuth tokens When credentials come from ~/.gemini/oauth_creds.json (Gemini CLI), send them as Authorization: Bearer header instead of ?key= query parameter. API keys from env vars or config continue using ?key=. Fixes #194 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor(gemini): harden OAuth bearer auth flow and tests * fix(gemini): granular auth source tracking and review fixes Build on chumyin's auth model refactor with: - Expand GeminiAuth to 4 variants (ExplicitKey/EnvGeminiKey/EnvGoogleKey/ OAuthToken) so auth_source() uses stored discriminant without re-reading env vars at call time - Add is_api_key()/credential() helpers on the enum - Upgrade expired OAuth token log from debug to warn - Add tests: provider_rejects_empty_key, auth_source_explicit_key, auth_source_none_without_credentials Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * style: apply rustfmt to fix CI lint failures Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: root <root@instance-20220913-1738.vcn09131738.oraclevcn.com> Co-authored-by: argenis de la rosa <theonlyhennygod@gmail.com>	2026-02-15 14:32:33 -05:00
Argenis	8694c2e2d2	fix(providers): skip retries on non-retryable HTTP errors (4xx) Skip retries on non-retryable HTTP client errors (4xx) to avoid wasting time on requests that will never succeed. - Added is_non_retryable() function to detect non-retryable errors - 4xx client errors (400, 401, 403, 404) are now non-retryable - Exceptions: 429 (rate limiting) and 408 (timeout) remain retryable - 5xx server errors remain retryable - Fallback logic now skips retries for non-retryable errors Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 10:11:32 -05:00
Argenis	1e19b12efd	fix(providers): warn on shared API key for fallbacks and warm up all providers (#130) - Warn when fallback providers share the same API key as primary (could fail if providers require different keys) - Warm up all providers instead of just the first, continuing on warmup failures Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 08:23:50 -05:00
Edvard	cc13fec16d	fix: add provider warmup to prevent cold-start timeout on first channel message The first API request after daemon startup consistently timed out (120s) when using channels (Telegram, Discord, etc.), requiring a retry before succeeding. This happened because the reqwest HTTP client's connection pool was cold — no TLS handshake, DNS resolution, or HTTP/2 negotiation had occurred yet. The fix adds a `warmup()` method to the Provider trait that establishes the connection pool on startup by hitting a lightweight endpoint (`/api/v1/auth/key` for OpenRouter). The channel server calls this immediately after creating the provider, before entering the message processing loop. Tested on Raspberry Pi 5 (aarch64) with OpenRouter + DeepSeek v3.2 via Telegram channel. Before: first message took 2-7 minutes (120s timeout + retries). After: first message responds in <30s with no retries. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-14 18:43:26 -05:00
argenis de la rosa	ec2d5cc93d	feat: enhance agent personality, tool guidance, and memory hygiene - Expand communication style presets (professional, expressive, custom) - Enrich SOUL.md with human-like tone and emoji-awareness guidance - Add crash recovery and sub-task scoping guidance to AGENTS.md scaffold - Add 'Use when / Don't use when' guidance to TOOLS.md and runtime prompts - Implement memory hygiene system with configurable archiving and retention - Add MemoryConfig options: hygiene_enabled, archive_after_days, purge_after_days, conversation_retention_days - Archive old daily memory and session files to archive subdirectories - Purge old archives and prune stale SQLite conversation rows - Add comprehensive tests for new features	2026-02-14 11:28:39 -05:00
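
The retry policy described in commit 8694c2e2d2 (4xx client errors are non-retryable, except 429 and 408; 5xx errors stay retryable) can be sketched as below. This is a minimal illustration and not the repository's code: the log does not show the real `is_non_retryable()` signature, so taking a bare `u16` status code is an assumption, as is treating every other 4xx code the same way as the four codes the commit lists.

```rust
/// Hypothetical sketch of the policy in commit 8694c2e2d2.
/// The real function may inspect an error value rather than a status code.
fn is_non_retryable(status: u16) -> bool {
    match status {
        // Request timeout and rate limiting are transient: keep retrying.
        408 | 429 => false,
        // Assumption: all remaining client errors fail the same way on retry.
        400..=499 => true,
        // 5xx server errors (and any other status) remain retryable.
        _ => false,
    }
}
```

With this shape, a caller would check the status before sleeping for a backoff interval, and move straight to the next fallback provider when the function returns `true`.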

26 commits
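
Commit d94e78c621 mentions an SSE parser for OpenAI-style `data: {...}` lines and a token estimate of roughly 4 characters per token. The sketch below shows one way those two pieces might look. The function names `sse_data_payloads` and `estimate_tokens` are hypothetical, the `[DONE]` sentinel follows the OpenAI streaming convention rather than anything stated in this log, and rounding the estimate up is an assumption.

```rust
/// Hypothetical helper: extract the payload of each `data:` line from an
/// SSE response body, stopping at the `[DONE]` sentinel.
/// JSON decoding of each payload is left to the caller.
fn sse_data_payloads(body: &str) -> Vec<&str> {
    body.lines()
        // Keep only `data:` fields; comments and other fields are skipped.
        .filter_map(|line| line.strip_prefix("data:"))
        .map(str::trim)
        // OpenAI-style streams end with `data: [DONE]`.
        .take_while(|payload| *payload != "[DONE]")
        .filter(|payload| !payload.is_empty())
        .collect()
}

/// Hypothetical helper: estimate tokens at about 4 characters per token,
/// rounding up (the rounding direction is an assumption).
fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}
```

A real implementation would parse chunks incrementally as they arrive from the network instead of working on a complete body, which this sketch does not attempt.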