docs: document Ollama image understanding

Peter Steinberger
2026-04-21 22:33:53 +01:00
parent f1f6214fd5
commit d2f68af615
3 changed files with 73 additions and 13 deletions


@@ -136,6 +136,9 @@ Rules:
- If the active primary image model already supports vision natively, OpenClaw
skips the `[Image]` summary block and passes the original image into the
model instead.
- Explicit `openclaw infer image describe --model <provider/model>` requests
are different: they run that image-capable provider/model directly, including
Ollama refs such as `ollama/qwen2.5vl:7b`.
- If `<capability>.enabled: true` but no models are configured, OpenClaw tries the
**active reply model** when its provider supports the capability.
@@ -157,6 +160,9 @@ working option**:
tried before the bundled fallback order.
- Image-only config providers with an image-capable model auto-register for
  media understanding even when they are not bundled vendor plugins.
- Ollama image understanding is available when selected explicitly, for
example through `agents.defaults.imageModel` or
`openclaw infer image describe --model ollama/<vision-model>`.
- Bundled fallback order:
- Audio: OpenAI → Groq → Deepgram → Google → Mistral
- Image: OpenAI → Anthropic → Google → MiniMax → MiniMax Portal → Z.AI
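The explicit-selection path above can be sketched as a config fragment. This is an illustrative assumption of the config shape, not a confirmed schema; only the `agents.defaults.imageModel` key and the `ollama/qwen2.5vl:7b` model ref come from the docs themselves:

```yaml
# Hypothetical OpenClaw config sketch: route image understanding to a
# local Ollama vision model by selecting it explicitly, bypassing the
# bundled fallback order. Any image-capable Ollama model ref should work.
agents:
  defaults:
    imageModel: ollama/qwen2.5vl:7b
```

The same selection can be made per-invocation via `openclaw infer image describe --model ollama/qwen2.5vl:7b`, which runs that provider/model directly.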