feat(agents): add video_generate tool

This commit is contained in:
Peter Steinberger
2026-04-05 18:42:08 +01:00
parent b5e87be7f0
commit 5790435975
26 changed files with 1249 additions and 35 deletions

View File

@@ -30,7 +30,7 @@ Related:
falls back to `agents.defaults.imageModel`, then the resolved session/default
model.
- `agents.defaults.imageGenerationModel` is used by the shared image-generation capability. If omitted, `image_generate` can still infer an auth-backed provider default. It tries the current default provider first, then the remaining registered image-generation providers in provider-id order. If you set a specific provider/model, also configure that provider's auth/API key.
- `agents.defaults.videoGenerationModel` is used by the shared video-generation capability. Unlike image generation, this does not infer a provider default today. Set an explicit `provider/model` such as `qwen/wan2.6-t2v`, and configure that provider's auth/API key too.
- `agents.defaults.videoGenerationModel` is used by the shared video-generation capability. If omitted, `video_generate` can still infer an auth-backed provider default. It tries the current default provider first, then the remaining registered video-generation providers in provider-id order. If you set a specific provider/model, also configure that provider's auth/API key.
- Per-agent defaults can override `agents.defaults.model` via `agents.list[].model` plus bindings (see [/concepts/multi-agent](/concepts/multi-agent)).
## Quick model policy
@@ -252,4 +252,5 @@ This applies whenever OpenClaw regenerates `models.json`, including command-driv
- [Model Providers](/concepts/model-providers) — provider routing and auth
- [Model Failover](/concepts/model-failover) — fallback chains
- [Image Generation](/tools/image-generation) — image model configuration
- [Video Generation](/tools/video-generation) — video model configuration
- [Configuration Reference](/gateway/configuration-reference#agent-defaults) — model config keys

View File

@@ -1026,9 +1026,9 @@ Time format in system prompt. Default: `auto` (OS preference).
- If you select a provider/model directly, configure the matching provider auth/API key too (for example `GEMINI_API_KEY` or `GOOGLE_API_KEY` for `google/*`, `OPENAI_API_KEY` for `openai/*`, `FAL_KEY` for `fal/*`).
- If omitted, `image_generate` can still infer an auth-backed provider default. It tries the current default provider first, then the remaining registered image-generation providers in provider-id order.
- `videoGenerationModel`: accepts either a string (`"provider/model"`) or an object (`{ primary, fallbacks }`).
- Used by the shared video-generation capability.
- Used by the shared video-generation capability and the built-in `video_generate` tool.
- Typical values: `qwen/wan2.6-t2v`, `qwen/wan2.6-i2v`, `qwen/wan2.6-r2v`, `qwen/wan2.6-r2v-flash`, or `qwen/wan2.7-r2v`.
- Set this explicitly before using shared video generation. Unlike `imageGenerationModel`, the video-generation runtime does not infer a provider default yet.
- If omitted, `video_generate` can still infer an auth-backed provider default. It tries the current default provider first, then the remaining registered video-generation providers in provider-id order.
- If you select a provider/model directly, configure the matching provider auth/API key too.
- The bundled Qwen video-generation provider currently supports up to 1 output video, 1 input image, 4 input videos, 10 seconds duration, and provider-level `size`, `aspectRatio`, `resolution`, `audio`, and `watermark` options.
- `pdfModel`: accepts either a string (`"provider/model"`) or an object (`{ primary, fallbacks }`).
@@ -1936,12 +1936,12 @@ Defaults for Talk mode (macOS/iOS/Android).
Local onboarding defaults new local configs to `tools.profile: "coding"` when unset (existing explicit profiles are preserved).
| Profile | Includes |
| ----------- | ------------------------------------------------------------------------------------------------------------- |
| `minimal` | `session_status` only |
| `coding` | `group:fs`, `group:runtime`, `group:web`, `group:sessions`, `group:memory`, `cron`, `image`, `image_generate` |
| `messaging` | `group:messaging`, `sessions_list`, `sessions_history`, `sessions_send`, `session_status` |
| `full` | No restriction (same as unset) |
| Profile | Includes |
| ----------- | ------------------------------------------------------------------------------------------------------------------------------- |
| `minimal` | `session_status` only |
| `coding` | `group:fs`, `group:runtime`, `group:web`, `group:sessions`, `group:memory`, `cron`, `image`, `image_generate`, `video_generate` |
| `messaging` | `group:messaging`, `sessions_list`, `sessions_history`, `sessions_send`, `session_status` |
| `full` | No restriction (same as unset) |
### Tool groups
@@ -1957,7 +1957,7 @@ Local onboarding defaults new local configs to `tools.profile: "coding"` when un
| `group:messaging` | `message` |
| `group:nodes` | `nodes` |
| `group:agents` | `agents_list` |
| `group:media` | `image`, `image_generate`, `tts` |
| `group:media` | `image`, `image_generate`, `video_generate`, `tts` |
| `group:openclaw` | All built-in tools (excludes provider plugins) |
### `tools.allow` / `tools.deny`

View File

@@ -98,7 +98,7 @@ Available groups:
- `group:messaging`: `message`
- `group:nodes`: `nodes`
- `group:agents`: `agents_list`
- `group:media`: `image`, `image_generate`, `tts`
- `group:media`: `image`, `image_generate`, `video_generate`, `tts`
- `group:openclaw`: all built-in OpenClaw tools (excludes provider plugins)
## Elevated: exec-only "run on host"

View File

@@ -123,6 +123,9 @@ Current bundled Qwen video-generation limits:
- Up to **4** input videos
- Up to **10 seconds** duration
- Supports `size`, `aspectRatio`, `resolution`, `audio`, and `watermark`
- Reference image/video mode currently requires **remote http(s) URLs**. Local
file paths are rejected up front because the DashScope video endpoint does not
accept uploaded local buffers for those references.
See [Qwen / Model Studio](/providers/qwen_modelstudio) for endpoint-level detail
and compatibility notes.

View File

@@ -53,25 +53,28 @@ OpenClaw has three layers that work together:
These tools ship with OpenClaw and are available without installing any plugins:
| Tool | What it does | Page |
| ------------------------------------------ | --------------------------------------------------------------------- | --------------------------------------- |
| `exec` / `process` | Run shell commands, manage background processes | [Exec](/tools/exec) |
| `code_execution` | Run sandboxed remote Python analysis | [Code Execution](/tools/code-execution) |
| `browser` | Control a Chromium browser (navigate, click, screenshot) | [Browser](/tools/browser) |
| `web_search` / `x_search` / `web_fetch` | Search the web, search X posts, fetch page content | [Web](/tools/web) |
| `read` / `write` / `edit` | File I/O in the workspace | |
| `apply_patch` | Multi-hunk file patches | [Apply Patch](/tools/apply-patch) |
| `message` | Send messages across all channels | [Agent Send](/tools/agent-send) |
| `canvas` | Drive node Canvas (present, eval, snapshot) | |
| `nodes` | Discover and target paired devices | |
| `cron` / `gateway` | Manage scheduled jobs; inspect, patch, restart, or update the gateway | |
| `image` / `image_generate` | Analyze or generate images | |
| `tts` | One-shot text-to-speech conversion | [TTS](/tools/tts) |
| `sessions_*` / `subagents` / `agents_list` | Session management, status, and sub-agent orchestration | [Sub-agents](/tools/subagents) |
| `session_status` | Lightweight `/status`-style readback and session model override | [Session Tools](/concepts/session-tool) |
| Tool | What it does | Page |
| ------------------------------------------ | --------------------------------------------------------------------- | ------------------------------------------- |
| `exec` / `process` | Run shell commands, manage background processes | [Exec](/tools/exec) |
| `code_execution` | Run sandboxed remote Python analysis | [Code Execution](/tools/code-execution) |
| `browser` | Control a Chromium browser (navigate, click, screenshot) | [Browser](/tools/browser) |
| `web_search` / `x_search` / `web_fetch` | Search the web, search X posts, fetch page content | [Web](/tools/web) |
| `read` / `write` / `edit` | File I/O in the workspace | |
| `apply_patch` | Multi-hunk file patches | [Apply Patch](/tools/apply-patch) |
| `message` | Send messages across all channels | [Agent Send](/tools/agent-send) |
| `canvas` | Drive node Canvas (present, eval, snapshot) | |
| `nodes` | Discover and target paired devices | |
| `cron` / `gateway` | Manage scheduled jobs; inspect, patch, restart, or update the gateway | |
| `image` / `image_generate` | Analyze or generate images | [Image Generation](/tools/image-generation) |
| `video_generate` | Generate videos | [Video Generation](/tools/video-generation) |
| `tts` | One-shot text-to-speech conversion | [TTS](/tools/tts) |
| `sessions_*` / `subagents` / `agents_list` | Session management, status, and sub-agent orchestration | [Sub-agents](/tools/subagents) |
| `session_status` | Lightweight `/status`-style readback and session model override | [Session Tools](/concepts/session-tool) |
For image work, use `image` for analysis and `image_generate` for generation or editing. If you target `openai/*`, `google/*`, `fal/*`, or another non-default image provider, configure that provider's auth/API key first.
For video work, use `video_generate`. If you target `qwen/*` or another non-default video provider, configure that provider's auth/API key first.
`session_status` is the lightweight status/readback tool in the sessions group.
It answers `/status`-style questions about the current session and can
optionally set a per-session model override; `model=default` clears that
@@ -121,12 +124,12 @@ config. Deny always wins over allow.
`tools.profile` sets a base allowlist before `allow`/`deny` is applied.
Per-agent override: `agents.list[].tools.profile`.
| Profile | What it includes |
| ----------- | ------------------------------------------------------------------------------------------------------------- |
| `full` | No restriction (same as unset) |
| `coding` | `group:fs`, `group:runtime`, `group:web`, `group:sessions`, `group:memory`, `cron`, `image`, `image_generate` |
| `messaging` | `group:messaging`, `sessions_list`, `sessions_history`, `sessions_send`, `session_status` |
| `minimal` | `session_status` only |
| Profile | What it includes |
| ----------- | ------------------------------------------------------------------------------------------------------------------------------- |
| `full` | No restriction (same as unset) |
| `coding` | `group:fs`, `group:runtime`, `group:web`, `group:sessions`, `group:memory`, `cron`, `image`, `image_generate`, `video_generate` |
| `messaging` | `group:messaging`, `sessions_list`, `sessions_history`, `sessions_send`, `session_status` |
| `minimal` | `session_status` only |
### Tool groups
@@ -144,7 +147,7 @@ Use `group:*` shorthands in allow/deny lists:
| `group:messaging` | message |
| `group:nodes` | nodes |
| `group:agents` | agents_list |
| `group:media` | image, image_generate, tts |
| `group:media` | image, image_generate, video_generate, tts |
| `group:openclaw` | All built-in OpenClaw tools (excludes plugin tools) |
`sessions_history` returns a bounded, safety-filtered recall view. It strips

View File

@@ -0,0 +1,109 @@
---
summary: "Generate videos using configured providers such as Qwen"
read_when:
- Generating videos via the agent
- Configuring video generation providers and models
- Understanding the video_generate tool parameters
title: "Video Generation"
---
# Video Generation
The `video_generate` tool lets the agent create videos using your configured providers. Generated videos are delivered automatically as media attachments in the agent's reply.
<Note>
The tool only appears when at least one video-generation provider is available. If you don't see `video_generate` in your agent's tools, configure `agents.defaults.videoGenerationModel` or set up a provider API key.
</Note>
## Quick start
1. Set an API key for at least one provider (for example `QWEN_API_KEY`).
2. Optionally set your preferred model:
```json5
{
agents: {
defaults: {
videoGenerationModel: "qwen/wan2.6-t2v",
},
},
}
```
3. Ask the agent: _"Generate a 5-second cinematic video of a friendly lobster surfing at sunset."_
The agent calls `video_generate` automatically. No tool allow-listing needed — it's enabled by default when a provider is available.
## Supported providers
| Provider | Default model | Reference inputs | API key |
| -------- | ------------- | ---------------- | ---------------------------------------------------------- |
| Qwen | `wan2.6-t2v` | Yes, remote URLs | `QWEN_API_KEY`, `MODELSTUDIO_API_KEY`, `DASHSCOPE_API_KEY` |
Use `action: "list"` to inspect available providers and models at runtime:
```
/tool video_generate action=list
```
## Tool parameters
| Parameter | Type | Description |
| ----------------- | -------- | ------------------------------------------------------------------------------------- |
| `prompt` | string | Video generation prompt (required for `action: "generate"`) |
| `action` | string | `"generate"` (default) or `"list"` to inspect providers |
| `model` | string | Provider/model override, e.g. `qwen/wan2.6-t2v` |
| `image` | string | Single reference image path or URL |
| `images` | string[] | Multiple reference images (up to 5) |
| `video` | string | Single reference video path or URL |
| `videos` | string[] | Multiple reference videos (up to 4) |
| `size` | string | Size hint when the provider supports it |
| `aspectRatio` | string | Aspect ratio: `1:1`, `2:3`, `3:2`, `3:4`, `4:3`, `4:5`, `5:4`, `9:16`, `16:9`, `21:9` |
| `resolution` | string | Resolution hint: `480P`, `720P`, or `1080P` |
| `durationSeconds` | number | Target duration in seconds |
| `audio` | boolean | Enable generated audio when the provider supports it |
| `watermark` | boolean | Toggle provider watermarking when supported |
| `filename` | string | Output filename hint |
Not all providers support all parameters. The tool validates provider capability limits before it submits the request.
## Configuration
### Model selection
```json5
{
agents: {
defaults: {
videoGenerationModel: {
primary: "qwen/wan2.6-t2v",
fallbacks: ["qwen/wan2.6-r2v-flash"],
},
},
},
}
```
### Provider selection order
When generating a video, OpenClaw tries providers in this order:
1. **`model` parameter** from the tool call (if the agent specifies one)
2. **`videoGenerationModel.primary`** from config
3. **`videoGenerationModel.fallbacks`** in order
4. **Auto-detection** — uses auth-backed provider defaults only:
- current default provider first
- remaining registered video-generation providers in provider-id order
If a provider fails, the next candidate is tried automatically. If all fail, the error includes details from each attempt.
## Qwen reference inputs
The bundled Qwen provider supports text-to-video plus image/video reference modes, but the upstream DashScope video endpoint currently requires **remote http(s) URLs** for reference inputs. Local file paths and uploaded buffers are rejected up front instead of being silently ignored.
## Related
- [Tools Overview](/tools) — all available agent tools
- [Qwen](/providers/qwen) — Qwen-specific setup and limits
- [Configuration Reference](/gateway/configuration-reference#agent-defaults) — `videoGenerationModel` config
- [Models](/concepts/models) — model configuration and failover