zeroclaw/src
Edvard Schøyen 9b2f90018c
feat: add screenshot and image_info vision tools
* feat: add screenshot and image_info vision tools

Add two new tools for visual capabilities:

- `screenshot`: captures screen using platform-native commands
  (screencapture on macOS, gnome-screenshot/scrot/import on Linux),
  returns file path + base64-encoded PNG data
- `image_info`: reads image metadata (format, dimensions, size) from
  header bytes without external deps, optionally returns base64 data
  for future multimodal provider support
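
As an illustration of reading dimensions from header bytes without external dependencies, here is a minimal sketch for the PNG case. The function name `png_dimensions` is hypothetical and is not taken from this commit; the byte layout (8-byte signature, then an IHDR chunk holding big-endian width and height) comes from the PNG format itself.

```rust
/// Hypothetical helper (not the commit's actual code): returns
/// `(width, height)` when `bytes` begins with a PNG signature that is
/// immediately followed by an IHDR chunk.
fn png_dimensions(bytes: &[u8]) -> Option<(u32, u32)> {
    const SIGNATURE: [u8; 8] = [0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A];

    // Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type,
    // then 4 bytes of width and 4 bytes of height (24 bytes in total).
    if bytes.len() < 24 || !bytes.starts_with(&SIGNATURE) {
        return None;
    }
    if bytes[12..16] != *b"IHDR" {
        return None;
    }

    // PNG stores multi-byte integers in big-endian order.
    let width = u32::from_be_bytes([bytes[16], bytes[17], bytes[18], bytes[19]]);
    let height = u32::from_be_bytes([bytes[20], bytes[21], bytes[22], bytes[23]]);
    Some((width, height))
}
```

Only the first 24 bytes are inspected, so a caller could read a small prefix of the file rather than the whole image.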

Both tools are registered in the tool registry and agent system prompt.
Includes 24 inline tests covering format detection, dimension extraction,
schema validation, and execution edge cases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: resolve unused variable warning after rebase

Prefix unused `resolved_key` with underscore to suppress compiler
warning introduced by upstream changes. Update Cargo.lock.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review comments on vision tools

Security fixes:
- Fix JPEG parser infinite loop on malformed zero-length segments
- Add workspace path restriction to ImageInfoTool (prevents arbitrary
  file exfiltration via include_base64)
- Quote paths in Linux screenshot shell commands to prevent injection
- Add autonomy-level check in ScreenshotTool::execute
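
To show the kind of guard the JPEG fix refers to, here is a minimal sketch of a segment scanner that rejects zero-length segments. The name `jpeg_dimensions` and the exact control flow are assumptions, not the code from this commit. It also ignores standalone markers and fill bytes, which a complete parser would have to handle.

```rust
/// Hypothetical helper (not the commit's actual code): walks JPEG segments
/// until a start-of-frame (SOF) marker is found and returns `(width, height)`.
/// Returns `None` for truncated or malformed input.
fn jpeg_dimensions(bytes: &[u8]) -> Option<(u32, u32)> {
    // A JPEG stream begins with the SOI marker FF D8.
    if bytes.len() < 4 || bytes[0] != 0xFF || bytes[1] != 0xD8 {
        return None;
    }

    let mut pos = 2;
    while pos + 4 <= bytes.len() {
        if bytes[pos] != 0xFF {
            return None; // lost marker alignment
        }
        let marker = bytes[pos + 1];
        let seg_len = u16::from_be_bytes([bytes[pos + 2], bytes[pos + 3]]) as usize;

        // The length field counts its own two bytes, so a value below 2 is
        // malformed. Rejecting it here guarantees that `pos` always moves
        // forward, even in a parser that advances by the declared length.
        if seg_len < 2 {
            return None;
        }

        // SOF markers are C0..=CF, excluding C4 (DHT), C8 (JPG) and CC (DAC).
        let is_sof = matches!(marker, 0xC0..=0xCF) && !matches!(marker, 0xC4 | 0xC8 | 0xCC);
        if is_sof {
            // Payload: 1 byte precision, 2 bytes height, 2 bytes width.
            if pos + 9 > bytes.len() {
                return None;
            }
            let height = u16::from_be_bytes([bytes[pos + 5], bytes[pos + 6]]);
            let width = u16::from_be_bytes([bytes[pos + 7], bytes[pos + 8]]);
            return Some((u32::from(width), u32::from(height)));
        }

        // Skip the two marker bytes plus the declared segment length.
        pos += 2 + seg_len;
    }
    None
}
```

Returning `None` on a bad length, rather than trying to resynchronize, keeps the loop bounded by the input size.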

Robustness:
- Add file size guard in read_and_encode before loading into memory
- Wire resolve_api_key through all provider match arms (was dead code)
- Gate screenshot_command_exists test on macOS/Linux only
- Infer MIME type from file extension instead of hardcoding image/png
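
The size guard and the extension-based MIME inference could look roughly like the sketch below. Both function names, the extension table, and the fallback type are assumptions made for illustration; the commit's own `read_and_encode` may differ in its limit and error handling.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: maps a file extension to a MIME type instead of
/// assuming `image/png`. Unknown extensions fall back to a generic type.
fn mime_from_extension(path: &Path) -> &'static str {
    let ext = path
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase());
    match ext.as_deref() {
        Some("png") => "image/png",
        Some("jpg") | Some("jpeg") => "image/jpeg",
        Some("gif") => "image/gif",
        Some("webp") => "image/webp",
        Some("bmp") => "image/bmp",
        _ => "application/octet-stream",
    }
}

/// Hypothetical helper: checks the on-disk size before reading, so an
/// oversized file is rejected without being loaded into memory.
/// `max_bytes` is a caller-chosen limit, not a value from the commit.
fn read_with_size_guard(path: &Path, max_bytes: u64) -> io::Result<Vec<u8>> {
    let size = fs::metadata(path)?.len();
    if size > max_bytes {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("file is {} bytes, limit is {}", size, max_bytes),
        ));
    }
    fs::read(path)
}
```

The metadata check is advisory, since the file can change between the check and the read; a stricter variant would also cap the number of bytes actually read.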

Tests:
- Add JPEG dimension extraction test
- Add JPEG malformed zero-length segment test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: argenis de la rosa <theonlyhennygod@gmail.com>
2026-02-15 14:53:56 -05:00
| Name | Last commit | Date |
|---|---|---|
| agent | feat: add screenshot and image_info vision tools | 2026-02-15 14:53:56 -05:00 |
| channels | fix(providers): use Bearer auth for Gemini CLI OAuth tokens | 2026-02-15 14:32:33 -05:00 |
| config | feat: add OpenTelemetry tracing and metrics observer | 2026-02-15 14:46:49 -05:00 |
| cron | refactor: consolidate CLI command definitions to lib.rs | 2026-02-15 06:52:33 -05:00 |
| daemon | feat(channels): wire up email channel (IMAP/SMTP) into config and registration | 2026-02-15 10:58:30 -05:00 |
| doctor | fix: add missing port/host fields to GatewayConfig and apply_env_overrides method | 2026-02-14 16:05:13 -05:00 |
| gateway | feat(config): make config writes atomic with rollback-safe replacement (#190) | 2026-02-15 12:18:45 -05:00 |
| health | fix: add missing port/host fields to GatewayConfig and apply_env_overrides method | 2026-02-14 16:05:13 -05:00 |
| heartbeat | feat: enhance agent personality, tool guidance, and memory hygiene | 2026-02-14 11:28:39 -05:00 |
| integrations | fix: remove unused import and correct WhatsApp/Email registry status | 2026-02-15 14:28:44 -05:00 |
| memory | fix(providers): use Bearer auth for Gemini CLI OAuth tokens | 2026-02-15 14:32:33 -05:00 |
| observability | feat: add OpenTelemetry tracing and metrics observer | 2026-02-15 14:46:49 -05:00 |
| onboard | build: pin Rust toolchain to 1.92 for reliable builds | 2026-02-15 14:36:18 -05:00 |
| providers | feat: add screenshot and image_info vision tools | 2026-02-15 14:53:56 -05:00 |
| runtime | feat(config): make config writes atomic with rollback-safe replacement (#190) | 2026-02-15 12:18:45 -05:00 |
| security | feat(config): make config writes atomic with rollback-safe replacement (#190) | 2026-02-15 12:18:45 -05:00 |
| service | refactor: consolidate CLI command definitions to lib.rs | 2026-02-15 06:52:33 -05:00 |
| skillforge | fix(providers): use Bearer auth for Gemini CLI OAuth tokens | 2026-02-15 14:32:33 -05:00 |
| skills | fix(skills): prevent path traversal in skill remove command | 2026-02-15 08:15:41 -05:00 |
| tools | feat: add screenshot and image_info vision tools | 2026-02-15 14:53:56 -05:00 |
| tunnel | fix: resolve all clippy --all-targets warnings across 15 files | 2026-02-14 03:52:57 -05:00 |
| identity.rs | fix(providers): use Bearer auth for Gemini CLI OAuth tokens | 2026-02-15 14:32:33 -05:00 |
| lib.rs | feat: implement AIEOS identity support (#168) | 2026-02-15 11:46:02 -05:00 |
| main.rs | fix: add channel message timeouts, Telegram fallback, and fix identity/observer tests | 2026-02-15 12:31:40 -05:00 |
| migration.rs | refactor: consolidate CLI command definitions to lib.rs | 2026-02-15 06:52:33 -05:00 |
| util.rs | fix(channels): check response status in send() for Telegram, Slack, and Discord | 2026-02-15 09:48:58 -05:00 |