feat(observability): implement Prometheus metrics backend with /metrics endpoint

- Adds PrometheusObserver backend with counters, histograms, and gauges
- Tracks agent starts/duration, tool calls, channel messages, heartbeat ticks, errors, request latency, tokens, sessions, queue depth
- Adds GET /metrics endpoint to gateway for Prometheus scraping
- Adds provider/model labels to AgentStart and AgentEnd events for better observability
- Adds as_any() method to Observer trait for backend-specific downcast
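
The as_any() downcast mentioned above can be sketched as follows. This is a minimal illustration using only the standard library, not the code from this commit: the trait is trimmed to two methods, and the `scrape_body` field, the `encode` method, `NoopObserver`, and the `metrics_body` helper are hypothetical names chosen for the example.

```rust
use std::any::Any;

// Hypothetical, trimmed-down stand-in for the real Observer trait.
trait Observer {
    fn name(&self) -> &str;
    // Exposes the concrete type behind `dyn Observer` so callers can downcast.
    fn as_any(&self) -> &dyn Any;
}

// Stand-in backend; the real one would hold a metrics registry.
struct PrometheusObserver {
    scrape_body: String, // hypothetical field holding pre-rendered metrics text
}

impl PrometheusObserver {
    // Hypothetical backend-specific method that is not part of the trait.
    fn encode(&self) -> String {
        self.scrape_body.clone()
    }
}

impl Observer for PrometheusObserver {
    fn name(&self) -> &str {
        "prometheus"
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

// A second backend with nothing to scrape.
struct NoopObserver;

impl Observer for NoopObserver {
    fn name(&self) -> &str {
        "noop"
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

// What a GET /metrics handler might do: return a body only when the
// configured observer is the Prometheus backend.
fn metrics_body(observer: &dyn Observer) -> Option<String> {
    observer
        .as_any()
        .downcast_ref::<PrometheusObserver>()
        .map(|prom| prom.encode())
}

fn main() {
    let prom = PrometheusObserver {
        scrape_body: "zeroclaw_heartbeat_ticks_total 3\n".to_string(),
    };
    let observers: Vec<Box<dyn Observer>> = vec![Box::new(prom), Box::new(NoopObserver)];
    for obs in &observers {
        println!("{}: {:?}", obs.name(), metrics_body(obs.as_ref()));
    }
}
```

The downcast keeps the trait itself backend-agnostic: only code that needs Prometheus-specific behavior has to know the concrete type, and every other backend simply yields `None`.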

Metrics exposed:
- zeroclaw_agent_starts_total (Counter) with provider/model labels
- zeroclaw_agent_duration_seconds (Histogram) with provider/model labels
- zeroclaw_tool_calls_total (Counter) with tool/success labels
- zeroclaw_tool_duration_seconds (Histogram) with tool label
- zeroclaw_channel_messages_total (Counter) with channel/direction labels
- zeroclaw_heartbeat_ticks_total (Counter)
- zeroclaw_errors_total (Counter) with component label
- zeroclaw_request_latency_seconds (Histogram)
- zeroclaw_tokens_used_last (Gauge)
- zeroclaw_active_sessions (Gauge)
- zeroclaw_queue_depth (Gauge)
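
To illustrate what a scrape of one of the labeled counters above might return, here is a small standard-library-only sketch that renders a counter in the Prometheus text exposition format. It is an assumption-laden toy, not the backend added by this commit (which presumably relies on a Prometheus client library): `LabeledCounter`, its methods, and the help string are hypothetical, and label values are not escaped.

```rust
use std::collections::BTreeMap;

// Toy labeled counter (hypothetical type). Label values are the map key,
// so output order is stable.
struct LabeledCounter {
    name: &'static str,
    help: &'static str,
    label_names: Vec<&'static str>,
    values: BTreeMap<Vec<String>, u64>,
}

impl LabeledCounter {
    fn new(name: &'static str, help: &'static str, label_names: &[&'static str]) -> Self {
        LabeledCounter {
            name,
            help,
            label_names: label_names.to_vec(),
            values: BTreeMap::new(),
        }
    }

    // Increment the series identified by `label_values` (one value per label name).
    fn inc(&mut self, label_values: &[&str]) {
        assert_eq!(label_values.len(), self.label_names.len());
        let key: Vec<String> = label_values.iter().map(|v| v.to_string()).collect();
        *self.values.entry(key).or_insert(0) += 1;
    }

    // Render HELP/TYPE headers followed by one sample line per label combination.
    // Note: this sketch does not escape quotes or backslashes in label values.
    fn render(&self) -> String {
        let mut out = format!(
            "# HELP {} {}\n# TYPE {} counter\n",
            self.name, self.help, self.name
        );
        for (key, count) in &self.values {
            let labels: Vec<String> = self
                .label_names
                .iter()
                .zip(key)
                .map(|(n, v)| format!("{}=\"{}\"", n, v))
                .collect();
            out.push_str(&format!("{}{{{}}} {}\n", self.name, labels.join(","), count));
        }
        out
    }
}

fn main() {
    let mut calls = LabeledCounter::new(
        "zeroclaw_tool_calls_total",
        "Total tool calls", // hypothetical help text
        &["tool", "success"],
    );
    calls.inc(&["shell", "true"]);
    calls.inc(&["shell", "true"]);
    calls.inc(&["browser", "false"]);
    print!("{}", calls.render());
}
```

Each distinct combination of label values becomes its own time series, which is why low-cardinality labels such as tool, channel, provider, and model are a reasonable fit here.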

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
argenis de la rosa 2026-02-17 14:01:37 -05:00 committed by Chummy
parent c04f2855e4
commit eba544dbd4
11 changed files with 575 additions and 228 deletions

@@ -556,9 +556,10 @@ pub async fn run(
     }
     agent.observer.record_event(&ObserverEvent::AgentEnd {
         provider: "cli".to_string(),
         model: "unknown".to_string(),
         duration: start.elapsed(),
         tokens_used: None,
         cost_usd: None,
     });
     Ok(())

@@ -1332,9 +1332,10 @@ pub async fn run(
     let duration = start.elapsed();
     observer.record_event(&ObserverEvent::AgentEnd {
         provider: provider_name.to_string(),
         model: model_name.to_string(),
         duration,
         tokens_used: None,
         cost_usd: None,
     });
     Ok(final_output)
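
For context, the sketch below shows how an observer might consume the AgentEnd event built in these hunks and aggregate durations per provider/model pair, which is the raw material for the `_count` and `_sum` series of a histogram. The field names come from the diff; the field types, the `HeartbeatTick` variant shape, and the `DurationStats` type are assumptions made for illustration only.

```rust
use std::collections::BTreeMap;
use std::time::Duration;

// Assumed event shape: field names match the diff, field types are guesses.
#[allow(dead_code)] // cost_usd is carried but not read in this sketch
enum ObserverEvent {
    AgentEnd {
        provider: String,
        model: String,
        duration: Duration,
        tokens_used: Option<u64>,
        cost_usd: Option<f64>,
    },
    HeartbeatTick,
}

// Hypothetical aggregator: (count, sum of seconds) per (provider, model).
#[derive(Default)]
struct DurationStats {
    by_labels: BTreeMap<(String, String), (u64, f64)>,
    tokens_used_last: Option<u64>,
}

impl DurationStats {
    fn record_event(&mut self, event: &ObserverEvent) {
        match event {
            ObserverEvent::AgentEnd {
                provider,
                model,
                duration,
                tokens_used,
                ..
            } => {
                let entry = self
                    .by_labels
                    .entry((provider.clone(), model.clone()))
                    .or_insert((0, 0.0));
                entry.0 += 1;
                entry.1 += duration.as_secs_f64();
                // Only overwrite the gauge-like value when the event carries one.
                if let Some(tokens) = tokens_used {
                    self.tokens_used_last = Some(*tokens);
                }
            }
            // Other events are ignored by this aggregator.
            ObserverEvent::HeartbeatTick => {}
        }
    }
}

fn main() {
    let mut stats = DurationStats::default();
    stats.record_event(&ObserverEvent::AgentEnd {
        provider: "cli".to_string(),
        model: "unknown".to_string(),
        duration: Duration::from_millis(1500),
        tokens_used: None,
        cost_usd: None,
    });
    stats.record_event(&ObserverEvent::HeartbeatTick);
    for ((provider, model), (count, sum)) in &stats.by_labels {
        println!("provider={provider} model={model} count={count} sum_seconds={sum}");
    }
}
```

Carrying provider and model on the end event, rather than only on AgentStart, means the duration can be labeled without the observer having to correlate the two events.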