The OpenAI Responses API requires assistant messages to use content type
"output_text" while user messages use "input_text". The prior implementation
used "input_text" for both roles, causing 400 errors on multi-turn history.
Extract build_responses_input() helper for testability and add 3 unit tests
covering role→content-type mapping, default instructions, and unknown roles.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Both providers only implemented chat_with_system, so the default
chat_with_history trait method was discarding all conversation history
except the last user message. This caused the Telegram bot to lose
context between messages.
Changes:
- OpenAiCodexProvider: extract send_responses_request helper, add
chat_with_history that maps full ChatMessage history to ResponsesInput
- GeminiProvider: extract send_generate_content helper, add
chat_with_history that maps ChatMessage history to Gemini Content
(with assistant→model role mapping)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the non-functional OpenAI-compatible stub with a purpose-built
Bedrock provider that implements AWS SigV4 signing from first principles
using hmac/sha2/hex crates — no AWS SDK dependency.
Key capabilities:
- SigV4 authentication (AKSK + optional session token)
- Converse API with native tool calling support
- Prompt caching via cachePoint heuristics
- Proper URI encoding for model IDs containing colons
- Resilient response parsing with unknown block type fallback
Also updates:
- Factory wiring and credential resolution bypass for AKSK auth
- Onboard wizard with Bedrock-specific model selection and guidance
- Provider reference docs with auth, region, and model ID details
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fallback providers in create_resilient_provider_with_options() were
created via create_provider_with_options() which passed the primary
provider's api_key as credential_override. This caused
resolve_provider_credential() to short-circuit on the override and
never check the fallback provider's own env var (e.g. DEEPSEEK_API_KEY
for a deepseek fallback), resulting in auth failures (401) when the
primary and fallback use different API services.
Switch to create_provider_with_url(fallback, None, None) so each
fallback resolves its own credential via provider-specific env vars.
This also enables custom: URL prefixes (e.g.
custom:http://host.docker.internal:1234/v1) to work as fallback
entries, which was previously impossible through the options path.
Add three focused tests covering independent credential resolution,
custom URL fallbacks, and mixed fallback chains.
MiniMax API rejects role: system in the messages array with error
2013 (invalid message role: system). In channel mode, the history
builder prepends a system message and optionally appends a second
one for delivery instructions, causing 400 errors on every channel
turn.
Additionally, MiniMax reasoning models embed chain-of-thought in
the content field as <think>...</think> blocks rather than using
the separate reasoning_content field, causing raw thinking output
to leak into user-visible responses.
Changes:
- Add merge_system_into_user flag to OpenAiCompatibleProvider;
when set, all system messages are concatenated and prepended to
the first user message before sending to the API
- Add new_merge_system_into_user() constructor used by MiniMax
- Add strip_think_tags() helper that removes <think>...</think>
blocks from response content before returning to the caller
- Apply strip_think_tags in effective_content() and
effective_content_optional() so all non-streaming paths are covered
- Update MiniMax factory registration to use new_merge_system_into_user
- Fix pre-existing rustfmt violation on apply_auth_header call
All other providers continue to use the default path unchanged.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The OpenAiProvider overrode chat() with native tool support but never
overrode chat_with_tools(), which is the method called by
run_tool_call_loop in channel mode (IRC/Discord/etc). The trait default
for chat_with_tools() silently drops the tools parameter, sending plain
ChatRequest with no tools — causing the model to never use native tool
calls in channel mode.
Add chat_with_tools() override that deserializes tool specs, uses
convert_messages() for proper tool_call_id handling, and sends
NativeChatRequest with tools and tool_choice.
Also add Deserialize derive to NativeToolSpec and NativeToolFunctionSpec
to support deserialization from OpenAI-format JSON.
- add scope-aware proxy schema and runtime wiring for providers/channels/tools
- add agent callable proxy_config tool for fast proxy setup
- standardize docs system with index, template, and playbooks
Route OVHcloud through OpenAiProvider (with proper tool_call_id
serialization) instead of OpenAiCompatibleProvider, fixing tool-call
round-trips against vLLM-based endpoints.
- Add base_url field and with_base_url() constructor to OpenAiProvider
- Replace all hardcoded api.openai.com URLs with self.base_url
- Pass api_url through for the openai provider arm
- Register ovhcloud/ovh provider with env var OVH_AI_ENDPOINTS_ACCESS_TOKEN
Implement chat_with_tools() on CompatibleProvider so OpenAI-compatible
endpoints (OpenRouter, local LLMs, etc.) can use structured tool calling
instead of prompt-injected tool descriptions.
Changes:
- CompatibleProvider: capabilities() reports native_tool_calling, new
chat_with_tools() sends tools in API request and parses tool_calls
from response, chat() bridges to chat_with_tools() when ToolSpecs
are provided
- RouterProvider: chat_with_tools() delegation with model hint resolution
- loop_.rs: expose tools_to_openai_format as pub(crate), add
tools_to_openai_format_from_specs for ToolSpec-based conversion
Adds 9 new tests and updates 1 existing test.
Reasoning/thinking models (Qwen3, GLM-4, DeepSeek, etc.) may return
output in `reasoning_content` instead of `content`. Add automatic
fallback for both OpenAI and OpenAI-compatible providers, including
streaming SSE support.
Changes:
- Add `reasoning_content` field to response structs in both providers
- Add `effective_content()` helper that prefers `content` but falls
back to `reasoning_content` when content is empty/null/missing
- Update all extraction sites to use `effective_content()`
- Add streaming SSE fallback for `reasoning_content` chunks
- Add 16 focused unit tests covering all edge cases
Tested end-to-end against GLM-4.7-flash via local LLM server.
All five providers have HTTP clients but did not implement warmup(),
relying on the trait default no-op. This adds lightweight warmup calls
to establish TLS + HTTP/2 connection pools on startup, reducing
first-request latency. Each warmup is skipped when credentials are
absent, matching the OpenRouter pattern.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implements Anthropic's prompt caching API to enable significant cost
reduction (up to 90%) and latency improvements (up to 85%) for
requests with repeated content.
Key features:
- Auto-caching heuristics: large system prompts (>3KB), tool
definitions, and long conversations (>4 messages)
- Full backward compatibility: cache_control fields are optional
- Supports both string and block-array system prompt formats
- Cache control on all content types (text, tool_use, tool_result)
Implementation details:
- Added CacheControl, SystemPrompt, and SystemBlock structures
- Updated NativeContentOut and NativeToolSpec with cache_control
- Strategic cache breakpoint placement (last tool, last message)
- Comprehensive test coverage for serialization and heuristics
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
(cherry picked from commit fff04f4edb5e4cb7e581b1b16035da8cc2e55cef)
The GLM provider previously relied on the trait default for
chat_with_history, which only forwarded the last user message. This adds
a proper multi-turn implementation that sends the full conversation
history to the GLM API, matching the pattern used by OpenRouter, Ollama,
and other providers.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Gemini CLI OAuth tokens are scoped for Google's internal Code Assist
API at cloudcode-pa.googleapis.com/v1internal, not the public
generativelanguage.googleapis.com/v1beta endpoint.
This commit:
- Routes OAuth requests to the correct internal endpoint
- Wraps the request payload with model metadata (internal API format)
- Keeps API key auth unchanged on the public endpoint
Fixes#578
ReliableProvider wraps underlying providers with retry/fallback logic
but did not delegate `supports_native_tools()` or `chat_with_tools()`.
This caused the agent loop to fall back to prompt-based tool calling
for all providers, even those with native tool support (OpenRouter,
OpenAI, Anthropic). Models like Gemini 2.0 Flash would then output
tool calls as text instead of structured API responses, breaking the
tool execution loop entirely.
Add `supports_native_tools()` delegation to the primary provider and
`chat_with_tools()` with the same retry/fallback logic as the existing
`chat_with_system()` and `chat_with_history()` methods.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>