feat: add screenshot and image_info vision tools

* feat: add screenshot and image_info vision tools

Add two new tools for visual capabilities:

- `screenshot`: captures screen using platform-native commands
  (screencapture on macOS, gnome-screenshot/scrot/import on Linux),
  returns file path + base64-encoded PNG data
- `image_info`: reads image metadata (format, dimensions, size) from
  header bytes without external deps, optionally returns base64 data
  for future multimodal provider support

Both tools are registered in the tool registry and agent system prompt.
Includes 24 inline tests covering format detection, dimension extraction,
schema validation, and execution edge cases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: resolve unused variable warning after rebase

Prefix unused `resolved_key` with underscore to suppress compiler
warning introduced by upstream changes. Update Cargo.lock.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review comments on vision tools

Security fixes:
- Fix JPEG parser infinite loop on malformed zero-length segments
- Add workspace path restriction to ImageInfoTool (prevents arbitrary
  file exfiltration via include_base64)
- Quote paths in Linux screenshot shell commands to prevent injection
- Add autonomy-level check in ScreenshotTool::execute

Robustness:
- Add file size guard in read_and_encode before loading into memory
- Wire resolve_api_key through all provider match arms (was dead code)
- Gate screenshot_command_exists test on macOS/Linux only
- Infer MIME type from file extension instead of hardcoding image/png

Tests:
- Add JPEG dimension extraction test
- Add JPEG malformed zero-length segment test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: argenis de la rosa <theonlyhennygod@gmail.com>
This commit is contained in:
Edvard Schøyen 2026-02-15 14:53:56 -05:00 committed by GitHub
parent 0f6648ceb1
commit 9b2f90018c
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
7 changed files with 837 additions and 25 deletions

View file

@ -36,6 +36,9 @@ tracing-subscriber = { version = "0.3", default-features = false, features = ["f
# Observability - Prometheus metrics
prometheus = { version = "0.13", default-features = false }
# Base64 encoding (screenshots, image data)
base64 = "0.22"
# Error handling
anyhow = "1.0"
thiserror = "2.0"