docs(providers): fill undocumented capability gaps (TTS, media understanding, embeddings, xSearch, env vars)

Vincent Koc
2026-04-12 12:06:18 +01:00
parent 3f32aa7582
commit 90fac50987
7 changed files with 225 additions and 0 deletions


@@ -227,6 +227,21 @@ OpenClaw supports Anthropic's prompt caching feature for API-key auth.
</Accordion>
<Accordion title="Media understanding (image and PDF)">
The bundled Anthropic plugin registers image and PDF understanding. OpenClaw
auto-resolves media capabilities from the configured Anthropic auth — no
additional config is needed.

| Property        | Value                 |
| --------------- | --------------------- |
| Default model   | `claude-opus-4-6`     |
| Supported input | Images, PDF documents |

When an image or PDF is attached to a conversation, OpenClaw automatically
routes it through the Anthropic media understanding provider.
</Accordion>
<Accordion title="1M context window (beta)">
Anthropic's 1M context window is beta-gated. Enable it per model:


@@ -90,6 +90,23 @@ openclaw models auth login --provider github-copilot --method device --set-defau
selects the correct transport based on the model ref.
</Accordion>
<Accordion title="Environment variable resolution order">
OpenClaw resolves Copilot auth from environment variables in the following
priority order:

| Priority | Variable               | Notes                              |
| -------- | ---------------------- | ---------------------------------- |
| 1        | `COPILOT_GITHUB_TOKEN` | Highest priority, Copilot-specific |
| 2        | `GH_TOKEN`             | GitHub CLI token (fallback)        |
| 3        | `GITHUB_TOKEN`         | Standard GitHub token (lowest)     |

When multiple variables are set, OpenClaw uses the highest-priority one.
The device-login flow (`openclaw models auth login-github-copilot`) stores
its token in the auth profile store and takes precedence over all environment
variables.
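The fallback order can be sketched in shell terms. This is an illustration of the precedence only, with made-up token values; the actual resolution happens inside OpenClaw.

```shell
# Hypothetical token values, for illustration only.
export GITHUB_TOKEN="ghp_fallback_example"        # priority 3 (lowest)
export GH_TOKEN="gho_cli_example"                 # priority 2
export COPILOT_GITHUB_TOKEN="ghu_copilot_example" # priority 1 (wins)

# Equivalent of OpenClaw's resolution: first set variable, by priority.
TOKEN="${COPILOT_GITHUB_TOKEN:-${GH_TOKEN:-$GITHUB_TOKEN}}"
echo "$TOKEN"   # prints ghu_copilot_example
```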
</Accordion>
<Accordion title="Token storage">
The login stores a GitHub token in the auth profile store and exchanges it
for a Copilot API token when OpenClaw runs. You do not need to manage the


@@ -381,6 +381,30 @@ For the full setup and behavior details, see [Ollama Web Search](/tools/ollama-s
Ollama is free and runs locally, so all model costs are set to $0. This applies to both auto-discovered and manually defined models.
</Accordion>
<Accordion title="Memory embeddings">
The bundled Ollama plugin registers a memory embedding provider for
[memory search](/concepts/memory). It uses the configured Ollama base URL
and API key.

| Property      | Value                                                                    |
| ------------- | ------------------------------------------------------------------------ |
| Default model | `nomic-embed-text`                                                       |
| Auto-pull     | Yes — the embedding model is pulled automatically if not present locally |

To select Ollama as the memory search embedding provider:

```json5
{
  agents: {
    defaults: {
      memorySearch: { provider: "ollama" },
    },
  },
}
```
</Accordion>
<Accordion title="Streaming configuration">
OpenClaw's Ollama integration uses the **native Ollama API** (`/api/chat`) by default, which fully supports streaming and tool calling simultaneously. No special configuration is needed.


@@ -237,6 +237,77 @@ OpenClaw adds a small OpenAI-specific prompt overlay for `openai/*` and `openai-
Values are case-insensitive at runtime, so `"Off"` and `"off"` both disable the overlay.
</Tip>
## Voice and speech
<AccordionGroup>
<Accordion title="Speech synthesis (TTS)">
The bundled `openai` plugin registers speech synthesis for the `messages.tts` surface.

| Setting      | Config path                                    | Default                                 |
| ------------ | ---------------------------------------------- | --------------------------------------- |
| Model        | `messages.tts.providers.openai.model`          | `gpt-4o-mini-tts`                       |
| Voice        | `messages.tts.providers.openai.voice`          | `coral`                                 |
| Speed        | `messages.tts.providers.openai.speed`          | (unset)                                 |
| Instructions | `messages.tts.providers.openai.instructions`   | (unset, `gpt-4o-mini-tts` only)         |
| Format       | `messages.tts.providers.openai.responseFormat` | `opus` for voice notes, `mp3` for files |
| API key      | `messages.tts.providers.openai.apiKey`         | Falls back to `OPENAI_API_KEY`          |
| Base URL     | `messages.tts.providers.openai.baseUrl`        | `https://api.openai.com/v1`             |

Available models: `gpt-4o-mini-tts`, `tts-1`, `tts-1-hd`. Available voices: `alloy`, `ash`, `ballad`, `cedar`, `coral`, `echo`, `fable`, `juniper`, `marin`, `onyx`, `nova`, `sage`, `shimmer`, `verse`.

```json5
{
  messages: {
    tts: {
      providers: {
        openai: { model: "gpt-4o-mini-tts", voice: "coral" },
      },
    },
  },
}
```
<Note>
Set `OPENAI_TTS_BASE_URL` to override the TTS base URL without affecting the chat API endpoint.
</Note>
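For example, to point only TTS at a separate gateway (the URL here is hypothetical):

```shell
# Hypothetical gateway URL, for illustration only; chat API traffic is unaffected.
export OPENAI_TTS_BASE_URL="https://tts-gateway.example.com/v1"
```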
</Accordion>
<Accordion title="Realtime transcription">
The bundled `openai` plugin registers realtime transcription for the Voice Call plugin.

| Setting          | Config path                                                          | Default                        |
| ---------------- | -------------------------------------------------------------------- | ------------------------------ |
| Model            | `plugins.entries.voice-call.config.streaming.providers.openai.model` | `gpt-4o-transcribe`            |
| Silence duration | `...openai.silenceDurationMs`                                        | `800`                          |
| VAD threshold    | `...openai.vadThreshold`                                             | `0.5`                          |
| API key          | `...openai.apiKey`                                                   | Falls back to `OPENAI_API_KEY` |
<Note>
Uses a WebSocket connection to `wss://api.openai.com/v1/realtime` with G.711 u-law audio.
</Note>
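Assembled from the config paths in the table above, a minimal transcription config looks like this (the values shown are the documented defaults):

```json5
{
  plugins: {
    entries: {
      "voice-call": {
        config: {
          streaming: {
            providers: {
              openai: { model: "gpt-4o-transcribe", silenceDurationMs: 800, vadThreshold: 0.5 },
            },
          },
        },
      },
    },
  },
}
```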
</Accordion>
<Accordion title="Realtime voice">
The bundled `openai` plugin registers realtime voice for the Voice Call plugin.

| Setting          | Config path                                                         | Default                        |
| ---------------- | ------------------------------------------------------------------- | ------------------------------ |
| Model            | `plugins.entries.voice-call.config.realtime.providers.openai.model` | `gpt-realtime`                 |
| Voice            | `...openai.voice`                                                   | `alloy`                        |
| Temperature      | `...openai.temperature`                                             | `0.8`                          |
| VAD threshold    | `...openai.vadThreshold`                                            | `0.5`                          |
| Silence duration | `...openai.silenceDurationMs`                                       | `500`                          |
| API key          | `...openai.apiKey`                                                  | Falls back to `OPENAI_API_KEY` |
<Note>
Supports Azure OpenAI via `azureEndpoint` and `azureDeployment` config keys. Supports bidirectional tool calling. Uses G.711 u-law audio format.
</Note>
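Assembled from the config paths in the table above, a minimal realtime voice config looks like this (the values shown are the documented defaults):

```json5
{
  plugins: {
    entries: {
      "voice-call": {
        config: {
          realtime: {
            providers: {
              openai: { model: "gpt-realtime", voice: "alloy", temperature: 0.8 },
            },
          },
        },
      },
    },
  },
}
```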
</Accordion>
</AccordionGroup>
## Advanced configuration
<AccordionGroup>


@@ -198,6 +198,21 @@ See [Video Generation](/tools/video-generation) for shared tool parameters, prov
## Advanced
<AccordionGroup>
<Accordion title="Image and video understanding">
The bundled Qwen plugin registers media understanding for images and video
on the **Standard** DashScope endpoints (not the Coding Plan endpoints).

| Property        | Value                |
| --------------- | -------------------- |
| Model           | `qwen-vl-max-latest` |
| Supported input | Images, video        |

Media understanding is auto-resolved from the configured Qwen auth — no
additional config is needed. Ensure you are using a Standard (pay-as-you-go)
endpoint for media understanding support.
</Accordion>
<Accordion title="Qwen 3.6 Plus availability">
`qwen3.6-plus` is available on the Standard (pay-as-you-go) Model Studio
endpoints:


@@ -132,6 +132,77 @@ Legacy aliases still normalize to the canonical bundled ids:
</Accordion>
<Accordion title="x_search configuration">
The bundled xAI plugin exposes `x_search` as an OpenClaw tool for searching
X (formerly Twitter) content via Grok.

Config path: `plugins.entries.xai.config.xSearch`

| Key               | Type    | Default         | Description                         |
| ----------------- | ------- | --------------- | ----------------------------------- |
| `enabled`         | boolean | —               | Enable or disable x_search          |
| `model`           | string  | `grok-4-1-fast` | Model used for x_search requests    |
| `inlineCitations` | boolean | —               | Include inline citations in results |
| `maxTurns`        | number  | —               | Maximum conversation turns          |
| `timeoutSeconds`  | number  | —               | Request timeout in seconds          |
| `cacheTtlMinutes` | number  | —               | Cache time-to-live in minutes       |

```json5
{
  plugins: {
    entries: {
      xai: {
        config: {
          xSearch: {
            enabled: true,
            model: "grok-4-1-fast",
            inlineCitations: true,
          },
        },
      },
    },
  },
}
```
</Accordion>
<Accordion title="Code execution configuration">
The bundled xAI plugin exposes `code_execution` as an OpenClaw tool for
remote code execution in xAI's sandbox environment.

Config path: `plugins.entries.xai.config.codeExecution`

| Key              | Type    | Default                   | Description                            |
| ---------------- | ------- | ------------------------- | -------------------------------------- |
| `enabled`        | boolean | `true` (if key available) | Enable or disable code execution       |
| `model`          | string  | `grok-4-1-fast`           | Model used for code execution requests |
| `maxTurns`       | number  | —                         | Maximum conversation turns             |
| `timeoutSeconds` | number  | —                         | Request timeout in seconds             |

<Note>
This is remote xAI sandbox execution, not local [`exec`](/tools/exec).
</Note>

```json5
{
  plugins: {
    entries: {
      xai: {
        config: {
          codeExecution: {
            enabled: true,
            model: "grok-4-1-fast",
          },
        },
      },
    },
  },
}
```
</Accordion>
<Accordion title="Known limits">
- Auth is API-key only today. There is no xAI OAuth or device-code flow in
OpenClaw yet.


@@ -134,6 +134,18 @@ GLM models are available as `zai/<model>` (example: `zai/glm-5`). The default bu
</Accordion>
<Accordion title="Image understanding">
The bundled Z.AI plugin registers image understanding.

| Property | Value      |
| -------- | ---------- |
| Model    | `glm-4.6v` |

Image understanding is auto-resolved from the configured Z.AI auth — no
additional config is needed.
</Accordion>
<Accordion title="Auth details">
- Z.AI uses Bearer auth with your API key.
- The `zai-api-key` onboarding choice auto-detects the matching Z.AI endpoint from the key prefix.