feat(agents): add video_generate tool

2026-05-01 03:20:24 +00:00 · 2026-04-05 18:42:08 +01:00
parent b5e87be7f0
commit 5790435975
26 changed files with 1249 additions and 35 deletions
--- a/docs/concepts/models.md
+++ b/docs/concepts/models.md
@@ -30,7 +30,7 @@ Related:
  falls back to `agents.defaults.imageModel`, then the resolved session/default
  model.
 - `agents.defaults.imageGenerationModel` is used by the shared image-generation capability. If omitted, `image_generate` can still infer an auth-backed provider default. It tries the current default provider first, then the remaining registered image-generation providers in provider-id order. If you set a specific provider/model, also configure that provider's auth/API key.
-- `agents.defaults.videoGenerationModel` is used by the shared video-generation capability. Unlike image generation, this does not infer a provider default today. Set an explicit `provider/model` such as `qwen/wan2.6-t2v`, and configure that provider's auth/API key too.
+- `agents.defaults.videoGenerationModel` is used by the shared video-generation capability. If omitted, `video_generate` can still infer an auth-backed provider default. It tries the current default provider first, then the remaining registered video-generation providers in provider-id order. If you set a specific provider/model, also configure that provider's auth/API key.
 - Per-agent defaults can override `agents.defaults.model` via `agents.list[].model` plus bindings (see [/concepts/multi-agent](/concepts/multi-agent)).

 ## Quick model policy
@@ -252,4 +252,5 @@ This applies whenever OpenClaw regenerates `models.json`, including command-driv
 - [Model Providers](/concepts/model-providers) — provider routing and auth
 - [Model Failover](/concepts/model-failover) — fallback chains
 - [Image Generation](/tools/image-generation) — image model configuration
+- [Video Generation](/tools/video-generation) — video model configuration
 - [Configuration Reference](/gateway/configuration-reference#agent-defaults) — model config keys
--- a/docs/gateway/configuration-reference.md
+++ b/docs/gateway/configuration-reference.md
@@ -1026,9 +1026,9 @@ Time format in system prompt. Default: `auto` (OS preference).
  - If you select a provider/model directly, configure the matching provider auth/API key too (for example `GEMINI_API_KEY` or `GOOGLE_API_KEY` for `google/*`, `OPENAI_API_KEY` for `openai/*`, `FAL_KEY` for `fal/*`).
  - If omitted, `image_generate` can still infer an auth-backed provider default. It tries the current default provider first, then the remaining registered image-generation providers in provider-id order.
 - `videoGenerationModel`: accepts either a string (`"provider/model"`) or an object (`{ primary, fallbacks }`).
-  - Used by the shared video-generation capability.
+  - Used by the shared video-generation capability and the built-in `video_generate` tool.
  - Typical values: `qwen/wan2.6-t2v`, `qwen/wan2.6-i2v`, `qwen/wan2.6-r2v`, `qwen/wan2.6-r2v-flash`, or `qwen/wan2.7-r2v`.
-  - Set this explicitly before using shared video generation. Unlike `imageGenerationModel`, the video-generation runtime does not infer a provider default yet.
+  - If omitted, `video_generate` can still infer an auth-backed provider default. It tries the current default provider first, then the remaining registered video-generation providers in provider-id order.
  - If you select a provider/model directly, configure the matching provider auth/API key too.
  - The bundled Qwen video-generation provider currently supports up to 1 output video, 1 input image, 4 input videos, 10 seconds duration, and provider-level `size`, `aspectRatio`, `resolution`, `audio`, and `watermark` options.
 - `pdfModel`: accepts either a string (`"provider/model"`) or an object (`{ primary, fallbacks }`).
@@ -1936,12 +1936,12 @@ Defaults for Talk mode (macOS/iOS/Android).

 Local onboarding defaults new local configs to `tools.profile: "coding"` when unset (existing explicit profiles are preserved).

-| Profile     | Includes                                                                                                      |
-| ----------- | ------------------------------------------------------------------------------------------------------------- |
-| `minimal`   | `session_status` only                                                                                         |
-| `coding`    | `group:fs`, `group:runtime`, `group:web`, `group:sessions`, `group:memory`, `cron`, `image`, `image_generate` |
-| `messaging` | `group:messaging`, `sessions_list`, `sessions_history`, `sessions_send`, `session_status`                     |
-| `full`      | No restriction (same as unset)                                                                                |
+| Profile     | Includes                                                                                                                        |
+| ----------- | ------------------------------------------------------------------------------------------------------------------------------- |
+| `minimal`   | `session_status` only                                                                                                           |
+| `coding`    | `group:fs`, `group:runtime`, `group:web`, `group:sessions`, `group:memory`, `cron`, `image`, `image_generate`, `video_generate` |
+| `messaging` | `group:messaging`, `sessions_list`, `sessions_history`, `sessions_send`, `session_status`                                       |
+| `full`      | No restriction (same as unset)                                                                                                  |

 ### Tool groups

@@ -1957,7 +1957,7 @@ Local onboarding defaults new local configs to `tools.profile: "coding"` when un
 | `group:messaging`  | `message`                                                                                                               |
 | `group:nodes`      | `nodes`                                                                                                                 |
 | `group:agents`     | `agents_list`                                                                                                           |
-| `group:media`      | `image`, `image_generate`, `tts`                                                                                        |
+| `group:media`      | `image`, `image_generate`, `video_generate`, `tts`                                                                      |
 | `group:openclaw`   | All built-in tools (excludes provider plugins)                                                                          |

 ### `tools.allow` / `tools.deny`
--- a/docs/gateway/sandbox-vs-tool-policy-vs-elevated.md
+++ b/docs/gateway/sandbox-vs-tool-policy-vs-elevated.md
@@ -98,7 +98,7 @@ Available groups:
 - `group:messaging`: `message`
 - `group:nodes`: `nodes`
 - `group:agents`: `agents_list`
-- `group:media`: `image`, `image_generate`, `tts`
+- `group:media`: `image`, `image_generate`, `video_generate`, `tts`
 - `group:openclaw`: all built-in OpenClaw tools (excludes provider plugins)

 ## Elevated: exec-only "run on host"
--- a/docs/providers/qwen.md
+++ b/docs/providers/qwen.md
@@ -123,6 +123,9 @@ Current bundled Qwen video-generation limits:
 - Up to **4** input videos
 - Up to **10 seconds** duration
 - Supports `size`, `aspectRatio`, `resolution`, `audio`, and `watermark`
+- Reference image/video mode currently requires **remote http(s) URLs**. Local
+  file paths are rejected up front because the DashScope video endpoint does not
+  accept uploaded local buffers for those references.

 See [Qwen / Model Studio](/providers/qwen_modelstudio) for endpoint-level detail
 and compatibility notes.
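
Review note on the `docs/providers/qwen.md` hunk above: the "rejected up front" behavior for non-remote references can be pictured with a small check like the one below. This is only a sketch under stated assumptions. The helper names `isRemoteHttpUrl` and `assertRemoteReferences` and the error text are hypothetical and are not taken from the commit; the actual provider code may validate differently.

```typescript
// Hypothetical helpers illustrating an up-front check for remote references.
// Names and error wording are assumptions, not OpenClaw's real implementation.

// Returns true only for absolute URLs whose scheme is http or https.
function isRemoteHttpUrl(ref: string): boolean {
  try {
    const { protocol } = new URL(ref);
    return protocol === "http:" || protocol === "https:";
  } catch {
    // Bare or relative file paths are not absolute URLs, so parsing throws.
    return false;
  }
}

// Throws before any request is made if a reference is not a remote URL.
function assertRemoteReferences(refs: string[]): void {
  const rejected = refs.filter((ref) => !isRemoteHttpUrl(ref));
  if (rejected.length > 0) {
    throw new Error(
      `Reference inputs must be remote http(s) URLs: ${rejected.join(", ")}`,
    );
  }
}
```

Failing before submission, rather than dropping the reference, matches the documented intent that local paths are not silently ignored.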
--- a/docs/tools/index.md
+++ b/docs/tools/index.md
@@ -53,25 +53,28 @@ OpenClaw has three layers that work together:

 These tools ship with OpenClaw and are available without installing any plugins:

-| Tool                                       | What it does                                                          | Page                                    |
-| ------------------------------------------ | --------------------------------------------------------------------- | --------------------------------------- |
-| `exec` / `process`                         | Run shell commands, manage background processes                       | [Exec](/tools/exec)                     |
-| `code_execution`                           | Run sandboxed remote Python analysis                                  | [Code Execution](/tools/code-execution) |
-| `browser`                                  | Control a Chromium browser (navigate, click, screenshot)              | [Browser](/tools/browser)               |
-| `web_search` / `x_search` / `web_fetch`    | Search the web, search X posts, fetch page content                    | [Web](/tools/web)                       |
-| `read` / `write` / `edit`                  | File I/O in the workspace                                             |                                         |
-| `apply_patch`                              | Multi-hunk file patches                                               | [Apply Patch](/tools/apply-patch)       |
-| `message`                                  | Send messages across all channels                                     | [Agent Send](/tools/agent-send)         |
-| `canvas`                                   | Drive node Canvas (present, eval, snapshot)                           |                                         |
-| `nodes`                                    | Discover and target paired devices                                    |                                         |
-| `cron` / `gateway`                         | Manage scheduled jobs; inspect, patch, restart, or update the gateway |                                         |
-| `image` / `image_generate`                 | Analyze or generate images                                            |                                         |
-| `tts`                                      | One-shot text-to-speech conversion                                    | [TTS](/tools/tts)                       |
-| `sessions_*` / `subagents` / `agents_list` | Session management, status, and sub-agent orchestration               | [Sub-agents](/tools/subagents)          |
-| `session_status`                           | Lightweight `/status`-style readback and session model override       | [Session Tools](/concepts/session-tool) |
+| Tool                                       | What it does                                                          | Page                                        |
+| ------------------------------------------ | --------------------------------------------------------------------- | ------------------------------------------- |
+| `exec` / `process`                         | Run shell commands, manage background processes                       | [Exec](/tools/exec)                         |
+| `code_execution`                           | Run sandboxed remote Python analysis                                  | [Code Execution](/tools/code-execution)     |
+| `browser`                                  | Control a Chromium browser (navigate, click, screenshot)              | [Browser](/tools/browser)                   |
+| `web_search` / `x_search` / `web_fetch`    | Search the web, search X posts, fetch page content                    | [Web](/tools/web)                           |
+| `read` / `write` / `edit`                  | File I/O in the workspace                                             |                                             |
+| `apply_patch`                              | Multi-hunk file patches                                               | [Apply Patch](/tools/apply-patch)           |
+| `message`                                  | Send messages across all channels                                     | [Agent Send](/tools/agent-send)             |
+| `canvas`                                   | Drive node Canvas (present, eval, snapshot)                           |                                             |
+| `nodes`                                    | Discover and target paired devices                                    |                                             |
+| `cron` / `gateway`                         | Manage scheduled jobs; inspect, patch, restart, or update the gateway |                                             |
+| `image` / `image_generate`                 | Analyze or generate images                                            | [Image Generation](/tools/image-generation) |
+| `video_generate`                           | Generate videos                                                       | [Video Generation](/tools/video-generation) |
+| `tts`                                      | One-shot text-to-speech conversion                                    | [TTS](/tools/tts)                           |
+| `sessions_*` / `subagents` / `agents_list` | Session management, status, and sub-agent orchestration               | [Sub-agents](/tools/subagents)              |
+| `session_status`                           | Lightweight `/status`-style readback and session model override       | [Session Tools](/concepts/session-tool)     |

 For image work, use `image` for analysis and `image_generate` for generation or editing. If you target `openai/*`, `google/*`, `fal/*`, or another non-default image provider, configure that provider's auth/API key first.

+For video work, use `video_generate`. If you target `qwen/*` or another non-default video provider, configure that provider's auth/API key first.
+
 `session_status` is the lightweight status/readback tool in the sessions group.
 It answers `/status`-style questions about the current session and can
 optionally set a per-session model override; `model=default` clears that
@@ -121,12 +124,12 @@ config. Deny always wins over allow.
 `tools.profile` sets a base allowlist before `allow`/`deny` is applied.
 Per-agent override: `agents.list[].tools.profile`.

-| Profile     | What it includes                                                                                              |
-| ----------- | ------------------------------------------------------------------------------------------------------------- |
-| `full`      | No restriction (same as unset)                                                                                |
-| `coding`    | `group:fs`, `group:runtime`, `group:web`, `group:sessions`, `group:memory`, `cron`, `image`, `image_generate` |
-| `messaging` | `group:messaging`, `sessions_list`, `sessions_history`, `sessions_send`, `session_status`                     |
-| `minimal`   | `session_status` only                                                                                         |
+| Profile     | What it includes                                                                                                                |
+| ----------- | ------------------------------------------------------------------------------------------------------------------------------- |
+| `full`      | No restriction (same as unset)                                                                                                  |
+| `coding`    | `group:fs`, `group:runtime`, `group:web`, `group:sessions`, `group:memory`, `cron`, `image`, `image_generate`, `video_generate` |
+| `messaging` | `group:messaging`, `sessions_list`, `sessions_history`, `sessions_send`, `session_status`                                       |
+| `minimal`   | `session_status` only                                                                                                           |

 ### Tool groups

@@ -144,7 +147,7 @@ Use `group:*` shorthands in allow/deny lists:
 | `group:messaging`  | message                                                                                                   |
 | `group:nodes`      | nodes                                                                                                     |
 | `group:agents`     | agents_list                                                                                               |
-| `group:media`      | image, image_generate, tts                                                                                |
+| `group:media`      | image, image_generate, video_generate, tts                                                                |
 | `group:openclaw`   | All built-in OpenClaw tools (excludes plugin tools)                                                       |

 `sessions_history` returns a bounded, safety-filtered recall view. It strips
--- a/docs/tools/video-generation.md
+++ b/docs/tools/video-generation.md
@@ -0,0 +1,109 @@
+---
+summary: "Generate videos using configured providers such as Qwen"
+read_when:
+  - Generating videos via the agent
+  - Configuring video generation providers and models
+  - Understanding the video_generate tool parameters
+title: "Video Generation"
+---
+
+# Video Generation
+
+The `video_generate` tool lets the agent create videos using your configured providers. Generated videos are delivered automatically as media attachments in the agent's reply.
+
+<Note>
+The tool only appears when at least one video-generation provider is available. If you don't see `video_generate` in your agent's tools, configure `agents.defaults.videoGenerationModel` or set up a provider API key.
+</Note>
+
+## Quick start
+
+1. Set an API key for at least one provider (for example `QWEN_API_KEY`).
+2. Optionally set your preferred model:
+
+```json5
+{
+  agents: {
+    defaults: {
+      videoGenerationModel: "qwen/wan2.6-t2v",
+    },
+  },
+}
+```
+
+3. Ask the agent: _"Generate a 5-second cinematic video of a friendly lobster surfing at sunset."_
+
+The agent calls `video_generate` automatically. No tool allow-listing needed — it's enabled by default when a provider is available.
+
+## Supported providers
+
+| Provider | Default model | Reference inputs | API key                                                    |
+| -------- | ------------- | ---------------- | ---------------------------------------------------------- |
+| Qwen     | `wan2.6-t2v`  | Yes, remote URLs | `QWEN_API_KEY`, `MODELSTUDIO_API_KEY`, `DASHSCOPE_API_KEY` |
+
+Use `action: "list"` to inspect available providers and models at runtime:
+
+```
+/tool video_generate action=list
+```
+
+## Tool parameters
+
+| Parameter         | Type     | Description                                                                           |
+| ----------------- | -------- | ------------------------------------------------------------------------------------- |
+| `prompt`          | string   | Video generation prompt (required for `action: "generate"`)                           |
+| `action`          | string   | `"generate"` (default) or `"list"` to inspect providers                               |
+| `model`           | string   | Provider/model override, e.g. `qwen/wan2.6-t2v`                                       |
+| `image`           | string   | Single reference image path or URL                                                    |
+| `images`          | string[] | Multiple reference images (up to 5)                                                   |
+| `video`           | string   | Single reference video path or URL                                                    |
+| `videos`          | string[] | Multiple reference videos (up to 4)                                                   |
+| `size`            | string   | Size hint when the provider supports it                                               |
+| `aspectRatio`     | string   | Aspect ratio: `1:1`, `2:3`, `3:2`, `3:4`, `4:3`, `4:5`, `5:4`, `9:16`, `16:9`, `21:9` |
+| `resolution`      | string   | Resolution hint: `480P`, `720P`, or `1080P`                                           |
+| `durationSeconds` | number   | Target duration in seconds                                                            |
+| `audio`           | boolean  | Enable generated audio when the provider supports it                                  |
+| `watermark`       | boolean  | Toggle provider watermarking when supported                                           |
+| `filename`        | string   | Output filename hint                                                                  |
+
+Not all providers support all parameters. The tool validates provider capability limits before it submits the request.
+
+## Configuration
+
+### Model selection
+
+```json5
+{
+  agents: {
+    defaults: {
+      videoGenerationModel: {
+        primary: "qwen/wan2.6-t2v",
+        fallbacks: ["qwen/wan2.6-r2v-flash"],
+      },
+    },
+  },
+}
+```
+
+### Provider selection order
+
+When generating a video, OpenClaw tries providers in this order:
+
+1. **`model` parameter** from the tool call (if the agent specifies one)
+2. **`videoGenerationModel.primary`** from config
+3. **`videoGenerationModel.fallbacks`** in order
+4. **Auto-detection** — uses auth-backed provider defaults only:
+   - current default provider first
+   - remaining registered video-generation providers in provider-id order
+
+If a provider fails, the next candidate is tried automatically. If all fail, the error includes details from each attempt.
+
+## Qwen reference inputs
+
+The bundled Qwen provider supports text-to-video plus image/video reference modes, but the upstream DashScope video endpoint currently requires **remote http(s) URLs** for reference inputs. Local file paths and uploaded buffers are rejected up front instead of being silently ignored.
+
+## Related
+
+- [Tools Overview](/tools) — all available agent tools
+- [Qwen](/providers/qwen) — Qwen-specific setup and limits
+- [Configuration Reference](/gateway/configuration-reference#agent-defaults) — `videoGenerationModel` config
+- [Models](/concepts/models) — model configuration and failover
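
Review note on the "Provider selection order" section of `docs/tools/video-generation.md` above: the documented order (tool-call `model`, then `videoGenerationModel.primary`, then `fallbacks`, then auto-detection) can be sketched as a candidate-list builder. This is a minimal illustration, not the commit's code. The type names, the `resolveVideoCandidates` function, the shape of `authBackedDefaults`, and the de-duplication of repeated entries are all assumptions made for the example.

```typescript
// Hypothetical sketch of the documented candidate order.
// All names here are illustrative assumptions, not OpenClaw's actual API.
type VideoModelConfig = string | { primary?: string; fallbacks?: string[] };

interface CandidateInputs {
  modelParam?: string; // `model` from the tool call, if any
  configured?: VideoModelConfig; // agents.defaults.videoGenerationModel
  defaultProvider?: string; // current default provider id
  // provider id -> default model, only for providers with usable auth
  authBackedDefaults: Record<string, string>;
}

function resolveVideoCandidates(input: CandidateInputs): string[] {
  const ordered: string[] = [];
  // Assumption: a model reference that appears twice is only tried once.
  const push = (ref?: string): void => {
    if (ref && !ordered.includes(ref)) ordered.push(ref);
  };

  // 1. Explicit `model` parameter from the tool call.
  push(input.modelParam);

  // 2 and 3. Configured primary, then fallbacks in order.
  if (typeof input.configured === "string") {
    push(input.configured);
  } else if (input.configured) {
    push(input.configured.primary);
    for (const fallback of input.configured.fallbacks ?? []) push(fallback);
  }

  // 4. Auto-detection: default provider first, then the rest by provider id.
  const defaults = input.authBackedDefaults;
  const current = input.defaultProvider;
  if (current && current in defaults) push(`${current}/${defaults[current]}`);
  for (const id of Object.keys(defaults).sort()) push(`${id}/${defaults[id]}`);

  return ordered;
}
```

A caller would then try each returned `provider/model` in turn and collect the per-attempt errors, which lines up with the documented behavior that the final error includes details from every attempt.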