feat: add multimodal image marker support with Ollama vision

2026-02-19 20:24:56 +08:00 · 2026-02-19 20:24:56 +08:00 · dcd0bf641d
commit dcd0bf641d
parent 63aacb09ff
21 changed files with 1152 additions and 78 deletions
--- a/docs/providers-reference.md
+++ b/docs/providers-reference.md
@ -56,6 +56,13 @@ credential is not reused for fallback providers.
 | `lmstudio` | `lm-studio` | Yes | (optional; local by default) |
 | `nvidia` | `nvidia-nim`, `build.nvidia.com` | No | `NVIDIA_API_KEY` |

+### Ollama Vision Notes
+
+- Provider ID: `ollama`
+- Vision input is supported through user message image markers: ``[IMAGE:<source>]``.
+- After multimodal normalization, ZeroClaw sends image payloads through Ollama's native `messages[].images` field.
+- If a non-vision provider is selected, ZeroClaw returns a structured capability error instead of silently ignoring images.
+
 ### Bedrock Notes

 - Provider ID: `bedrock` (alias: `aws-bedrock`)