mirror of
https://github.com/openclaw/openclaw.git
synced 2026-05-01 03:20:24 +00:00
feat(agents): add video_generate tool
@@ -30,7 +30,7 @@ Related:
 falls back to `agents.defaults.imageModel`, then the resolved session/default
 model.
 - `agents.defaults.imageGenerationModel` is used by the shared image-generation capability. If omitted, `image_generate` can still infer an auth-backed provider default. It tries the current default provider first, then the remaining registered image-generation providers in provider-id order. If you set a specific provider/model, also configure that provider's auth/API key.
-- `agents.defaults.videoGenerationModel` is used by the shared video-generation capability. Unlike image generation, this does not infer a provider default today. Set an explicit `provider/model` such as `qwen/wan2.6-t2v`, and configure that provider's auth/API key too.
+- `agents.defaults.videoGenerationModel` is used by the shared video-generation capability. If omitted, `video_generate` can still infer an auth-backed provider default. It tries the current default provider first, then the remaining registered video-generation providers in provider-id order. If you set a specific provider/model, also configure that provider's auth/API key.
 - Per-agent defaults can override `agents.defaults.model` via `agents.list[].model` plus bindings (see [/concepts/multi-agent](/concepts/multi-agent)).

 ## Quick model policy
@@ -252,4 +252,5 @@ This applies whenever OpenClaw regenerates `models.json`, including command-driv
 - [Model Providers](/concepts/model-providers) — provider routing and auth
 - [Model Failover](/concepts/model-failover) — fallback chains
 - [Image Generation](/tools/image-generation) — image model configuration
+- [Video Generation](/tools/video-generation) — video model configuration
 - [Configuration Reference](/gateway/configuration-reference#agent-defaults) — model config keys

@@ -1026,9 +1026,9 @@ Time format in system prompt. Default: `auto` (OS preference).
 - If you select a provider/model directly, configure the matching provider auth/API key too (for example `GEMINI_API_KEY` or `GOOGLE_API_KEY` for `google/*`, `OPENAI_API_KEY` for `openai/*`, `FAL_KEY` for `fal/*`).
 - If omitted, `image_generate` can still infer an auth-backed provider default. It tries the current default provider first, then the remaining registered image-generation providers in provider-id order.
 - `videoGenerationModel`: accepts either a string (`"provider/model"`) or an object (`{ primary, fallbacks }`).
-- Used by the shared video-generation capability.
+- Used by the shared video-generation capability and the built-in `video_generate` tool.
 - Typical values: `qwen/wan2.6-t2v`, `qwen/wan2.6-i2v`, `qwen/wan2.6-r2v`, `qwen/wan2.6-r2v-flash`, or `qwen/wan2.7-r2v`.
-- Set this explicitly before using shared video generation. Unlike `imageGenerationModel`, the video-generation runtime does not infer a provider default yet.
+- If omitted, `video_generate` can still infer an auth-backed provider default. It tries the current default provider first, then the remaining registered video-generation providers in provider-id order.
 - If you select a provider/model directly, configure the matching provider auth/API key too.
 - The bundled Qwen video-generation provider currently supports up to 1 output video, 1 input image, 4 input videos, 10 seconds duration, and provider-level `size`, `aspectRatio`, `resolution`, `audio`, and `watermark` options.
 - `pdfModel`: accepts either a string (`"provider/model"`) or an object (`{ primary, fallbacks }`).
@@ -1936,12 +1936,12 @@ Defaults for Talk mode (macOS/iOS/Android).

 Local onboarding defaults new local configs to `tools.profile: "coding"` when unset (existing explicit profiles are preserved).

-| Profile | Includes |
-| ----------- | ------------------------------------------------------------------------------------------------------------- |
-| `minimal` | `session_status` only |
-| `coding` | `group:fs`, `group:runtime`, `group:web`, `group:sessions`, `group:memory`, `cron`, `image`, `image_generate` |
-| `messaging` | `group:messaging`, `sessions_list`, `sessions_history`, `sessions_send`, `session_status` |
-| `full` | No restriction (same as unset) |
+| Profile | Includes |
+| ----------- | ------------------------------------------------------------------------------------------------------------------------------- |
+| `minimal` | `session_status` only |
+| `coding` | `group:fs`, `group:runtime`, `group:web`, `group:sessions`, `group:memory`, `cron`, `image`, `image_generate`, `video_generate` |
+| `messaging` | `group:messaging`, `sessions_list`, `sessions_history`, `sessions_send`, `session_status` |
+| `full` | No restriction (same as unset) |

 ### Tool groups

@@ -1957,7 +1957,7 @@ Local onboarding defaults new local configs to `tools.profile: "coding"` when un
 | `group:messaging` | `message` |
 | `group:nodes` | `nodes` |
 | `group:agents` | `agents_list` |
-| `group:media` | `image`, `image_generate`, `tts` |
+| `group:media` | `image`, `image_generate`, `video_generate`, `tts` |
 | `group:openclaw` | All built-in tools (excludes provider plugins) |

 ### `tools.allow` / `tools.deny`

@@ -98,7 +98,7 @@ Available groups:
 - `group:messaging`: `message`
 - `group:nodes`: `nodes`
 - `group:agents`: `agents_list`
-- `group:media`: `image`, `image_generate`, `tts`
+- `group:media`: `image`, `image_generate`, `video_generate`, `tts`
 - `group:openclaw`: all built-in OpenClaw tools (excludes provider plugins)

 ## Elevated: exec-only "run on host"

@@ -123,6 +123,9 @@ Current bundled Qwen video-generation limits:
 - Up to **4** input videos
 - Up to **10 seconds** duration
 - Supports `size`, `aspectRatio`, `resolution`, `audio`, and `watermark`
+- Reference image/video mode currently requires **remote http(s) URLs**. Local
+  file paths are rejected up front because the DashScope video endpoint does not
+  accept uploaded local buffers for those references.

 See [Qwen / Model Studio](/providers/qwen_modelstudio) for endpoint-level detail
 and compatibility notes.

@@ -53,25 +53,28 @@ OpenClaw has three layers that work together:

 These tools ship with OpenClaw and are available without installing any plugins:

-| Tool | What it does | Page |
-| ------------------------------------------ | --------------------------------------------------------------------- | --------------------------------------- |
-| `exec` / `process` | Run shell commands, manage background processes | [Exec](/tools/exec) |
-| `code_execution` | Run sandboxed remote Python analysis | [Code Execution](/tools/code-execution) |
-| `browser` | Control a Chromium browser (navigate, click, screenshot) | [Browser](/tools/browser) |
-| `web_search` / `x_search` / `web_fetch` | Search the web, search X posts, fetch page content | [Web](/tools/web) |
-| `read` / `write` / `edit` | File I/O in the workspace | |
-| `apply_patch` | Multi-hunk file patches | [Apply Patch](/tools/apply-patch) |
-| `message` | Send messages across all channels | [Agent Send](/tools/agent-send) |
-| `canvas` | Drive node Canvas (present, eval, snapshot) | |
-| `nodes` | Discover and target paired devices | |
-| `cron` / `gateway` | Manage scheduled jobs; inspect, patch, restart, or update the gateway | |
-| `image` / `image_generate` | Analyze or generate images | |
-| `tts` | One-shot text-to-speech conversion | [TTS](/tools/tts) |
-| `sessions_*` / `subagents` / `agents_list` | Session management, status, and sub-agent orchestration | [Sub-agents](/tools/subagents) |
-| `session_status` | Lightweight `/status`-style readback and session model override | [Session Tools](/concepts/session-tool) |
+| Tool | What it does | Page |
+| ------------------------------------------ | --------------------------------------------------------------------- | ------------------------------------------- |
+| `exec` / `process` | Run shell commands, manage background processes | [Exec](/tools/exec) |
+| `code_execution` | Run sandboxed remote Python analysis | [Code Execution](/tools/code-execution) |
+| `browser` | Control a Chromium browser (navigate, click, screenshot) | [Browser](/tools/browser) |
+| `web_search` / `x_search` / `web_fetch` | Search the web, search X posts, fetch page content | [Web](/tools/web) |
+| `read` / `write` / `edit` | File I/O in the workspace | |
+| `apply_patch` | Multi-hunk file patches | [Apply Patch](/tools/apply-patch) |
+| `message` | Send messages across all channels | [Agent Send](/tools/agent-send) |
+| `canvas` | Drive node Canvas (present, eval, snapshot) | |
+| `nodes` | Discover and target paired devices | |
+| `cron` / `gateway` | Manage scheduled jobs; inspect, patch, restart, or update the gateway | |
+| `image` / `image_generate` | Analyze or generate images | [Image Generation](/tools/image-generation) |
+| `video_generate` | Generate videos | [Video Generation](/tools/video-generation) |
+| `tts` | One-shot text-to-speech conversion | [TTS](/tools/tts) |
+| `sessions_*` / `subagents` / `agents_list` | Session management, status, and sub-agent orchestration | [Sub-agents](/tools/subagents) |
+| `session_status` | Lightweight `/status`-style readback and session model override | [Session Tools](/concepts/session-tool) |

 For image work, use `image` for analysis and `image_generate` for generation or editing. If you target `openai/*`, `google/*`, `fal/*`, or another non-default image provider, configure that provider's auth/API key first.

+For video work, use `video_generate`. If you target `qwen/*` or another non-default video provider, configure that provider's auth/API key first.
+
 `session_status` is the lightweight status/readback tool in the sessions group.
 It answers `/status`-style questions about the current session and can
 optionally set a per-session model override; `model=default` clears that
@@ -121,12 +124,12 @@ config. Deny always wins over allow.
 `tools.profile` sets a base allowlist before `allow`/`deny` is applied.
 Per-agent override: `agents.list[].tools.profile`.

-| Profile | What it includes |
-| ----------- | ------------------------------------------------------------------------------------------------------------- |
-| `full` | No restriction (same as unset) |
-| `coding` | `group:fs`, `group:runtime`, `group:web`, `group:sessions`, `group:memory`, `cron`, `image`, `image_generate` |
-| `messaging` | `group:messaging`, `sessions_list`, `sessions_history`, `sessions_send`, `session_status` |
-| `minimal` | `session_status` only |
+| Profile | What it includes |
+| ----------- | ------------------------------------------------------------------------------------------------------------------------------- |
+| `full` | No restriction (same as unset) |
+| `coding` | `group:fs`, `group:runtime`, `group:web`, `group:sessions`, `group:memory`, `cron`, `image`, `image_generate`, `video_generate` |
+| `messaging` | `group:messaging`, `sessions_list`, `sessions_history`, `sessions_send`, `session_status` |
+| `minimal` | `session_status` only |

 ### Tool groups

@@ -144,7 +147,7 @@ Use `group:*` shorthands in allow/deny lists:
 | `group:messaging` | message |
 | `group:nodes` | nodes |
 | `group:agents` | agents_list |
-| `group:media` | image, image_generate, tts |
+| `group:media` | image, image_generate, video_generate, tts |
 | `group:openclaw` | All built-in OpenClaw tools (excludes plugin tools) |

 `sessions_history` returns a bounded, safety-filtered recall view. It strips

109
docs/tools/video-generation.md
Normal file
@@ -0,0 +1,109 @@
---
summary: "Generate videos using configured providers such as Qwen"
read_when:
  - Generating videos via the agent
  - Configuring video generation providers and models
  - Understanding the video_generate tool parameters
title: "Video Generation"
---

# Video Generation

The `video_generate` tool lets the agent create videos using your configured providers. Generated videos are delivered automatically as media attachments in the agent's reply.

<Note>
The tool only appears when at least one video-generation provider is available. If you don't see `video_generate` in your agent's tools, configure `agents.defaults.videoGenerationModel` or set up a provider API key.
</Note>

## Quick start

1. Set an API key for at least one provider (for example `QWEN_API_KEY`).
2. Optionally set your preferred model:

   ```json5
   {
     agents: {
       defaults: {
         videoGenerationModel: "qwen/wan2.6-t2v",
       },
     },
   }
   ```

3. Ask the agent: _"Generate a 5-second cinematic video of a friendly lobster surfing at sunset."_

The agent calls `video_generate` automatically. No tool allow-listing needed — it's enabled by default when a provider is available.

## Supported providers

| Provider | Default model | Reference inputs | API key |
| -------- | ------------- | ---------------- | ---------------------------------------------------------- |
| Qwen | `wan2.6-t2v` | Yes, remote URLs | `QWEN_API_KEY`, `MODELSTUDIO_API_KEY`, `DASHSCOPE_API_KEY` |

Use `action: "list"` to inspect available providers and models at runtime:

```
/tool video_generate action=list
```

## Tool parameters

| Parameter | Type | Description |
| ----------------- | -------- | ------------------------------------------------------------------------------------- |
| `prompt` | string | Video generation prompt (required for `action: "generate"`) |
| `action` | string | `"generate"` (default) or `"list"` to inspect providers |
| `model` | string | Provider/model override, e.g. `qwen/wan2.6-t2v` |
| `image` | string | Single reference image path or URL |
| `images` | string[] | Multiple reference images (up to 5) |
| `video` | string | Single reference video path or URL |
| `videos` | string[] | Multiple reference videos (up to 4) |
| `size` | string | Size hint when the provider supports it |
| `aspectRatio` | string | Aspect ratio: `1:1`, `2:3`, `3:2`, `3:4`, `4:3`, `4:5`, `5:4`, `9:16`, `16:9`, `21:9` |
| `resolution` | string | Resolution hint: `480P`, `720P`, or `1080P` |
| `durationSeconds` | number | Target duration in seconds |
| `audio` | boolean | Enable generated audio when the provider supports it |
| `watermark` | boolean | Toggle provider watermarking when supported |
| `filename` | string | Output filename hint |

Not all providers support all parameters. The tool validates provider capability limits before it submits the request.
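To make the idea of a pre-submit capability check concrete, here is a minimal sketch. It is not OpenClaw's implementation: the `VideoLimits` and `VideoRequest` shapes and the `checkLimits` helper are hypothetical names invented for illustration. Only the numbers (1 input image, 4 input videos, 10 seconds) come from the bundled Qwen limits documented elsewhere in this change.

```typescript
// Hypothetical shapes for illustration only; OpenClaw's real types are not shown here.
interface VideoLimits {
  maxInputImages: number;
  maxInputVideos: number;
  maxDurationSeconds: number;
}

interface VideoRequest {
  prompt: string;
  images?: string[];
  videos?: string[];
  durationSeconds?: number;
}

// Numbers taken from the bundled Qwen limits documented in this change.
const QWEN_LIMITS: VideoLimits = {
  maxInputImages: 1,
  maxInputVideos: 4,
  maxDurationSeconds: 10,
};

// Returns one message per violated limit; an empty array means the request fits.
function checkLimits(req: VideoRequest, limits: VideoLimits): string[] {
  const problems: string[] = [];
  const imageCount = req.images?.length ?? 0;
  const videoCount = req.videos?.length ?? 0;

  if (imageCount > limits.maxInputImages) {
    problems.push(`input images: ${imageCount} exceeds limit of ${limits.maxInputImages}`);
  }
  if (videoCount > limits.maxInputVideos) {
    problems.push(`input videos: ${videoCount} exceeds limit of ${limits.maxInputVideos}`);
  }
  if (req.durationSeconds !== undefined && req.durationSeconds > limits.maxDurationSeconds) {
    problems.push(`durationSeconds: ${req.durationSeconds} exceeds limit of ${limits.maxDurationSeconds}`);
  }
  return problems;
}
```

Collecting every violation before returning, rather than stopping at the first, lets a caller report all problems with a request in one error.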

## Configuration

### Model selection

```json5
{
  agents: {
    defaults: {
      videoGenerationModel: {
        primary: "qwen/wan2.6-t2v",
        fallbacks: ["qwen/wan2.6-r2v-flash"],
      },
    },
  },
}
```
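The configuration reference says `videoGenerationModel` accepts either a `"provider/model"` string or a `{ primary, fallbacks }` object. The sketch below shows one way a consumer could normalize both forms into a single ordered list. The type `ModelSetting` and the helpers `toCandidates` and `splitRef` are hypothetical names, not OpenClaw APIs, and the choice to split at the first slash is an assumption.

```typescript
// Hypothetical type mirroring the two documented config forms.
type ModelSetting = string | { primary: string; fallbacks?: string[] };

// Flattens either form into an ordered list: primary first, then fallbacks.
function toCandidates(setting: ModelSetting | undefined): string[] {
  if (setting === undefined) return [];
  if (typeof setting === "string") return [setting];
  return [setting.primary, ...(setting.fallbacks ?? [])];
}

// Splits "provider/model" at the first slash (an assumption about the format).
function splitRef(ref: string): { provider: string; model: string } {
  const slash = ref.indexOf("/");
  if (slash <= 0 || slash === ref.length - 1) {
    throw new Error(`expected "provider/model", got "${ref}"`);
  }
  return { provider: ref.slice(0, slash), model: ref.slice(slash + 1) };
}
```

With the object form shown above, `toCandidates` would yield `qwen/wan2.6-t2v` followed by `qwen/wan2.6-r2v-flash`.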

### Provider selection order

When generating a video, OpenClaw tries providers in this order:

1. **`model` parameter** from the tool call (if the agent specifies one)
2. **`videoGenerationModel.primary`** from config
3. **`videoGenerationModel.fallbacks`** in order
4. **Auto-detection** — uses auth-backed provider defaults only:
   - current default provider first
   - remaining registered video-generation providers in provider-id order

If a provider fails, the next candidate is tried automatically. If all fail, the error includes details from each attempt.
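The ordering and failover behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the actual runtime: `SelectionInput`, `orderCandidates`, and `generateWithFailover` are hypothetical names, the auto-detected list is assumed to arrive already sorted, and dropping duplicate candidates is an assumption the documentation does not state.

```typescript
// Hypothetical input shape; field comments map to the documented order.
interface SelectionInput {
  toolModel?: string;      // 1. `model` parameter from the tool call
  primary?: string;        // 2. videoGenerationModel.primary
  fallbacks?: string[];    // 3. videoGenerationModel.fallbacks, in order
  autoDetected?: string[]; // 4. auth-backed provider defaults, assumed pre-sorted
}

// Builds the candidate list in the documented order.
// Skipping duplicates is an assumption, not documented behavior.
function orderCandidates(input: SelectionInput): string[] {
  const ordered = [
    input.toolModel,
    input.primary,
    ...(input.fallbacks ?? []),
    ...(input.autoDetected ?? []),
  ];
  const seen = new Set<string>();
  const result: string[] = [];
  for (const ref of ordered) {
    if (ref === undefined || seen.has(ref)) continue;
    seen.add(ref);
    result.push(ref);
  }
  return result;
}

// Tries each candidate in turn; if all fail, the thrown error lists every attempt.
async function generateWithFailover<T>(
  candidates: string[],
  attempt: (ref: string) => Promise<T>,
): Promise<T> {
  const failures: string[] = [];
  for (const ref of candidates) {
    try {
      return await attempt(ref);
    } catch (err) {
      failures.push(`${ref}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`all video-generation candidates failed:\n${failures.join("\n")}`);
}
```

Keeping the ordering step separate from the retry loop makes the precedence rules testable without calling any provider.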

## Qwen reference inputs

The bundled Qwen provider supports text-to-video plus image/video reference modes, but the upstream DashScope video endpoint currently requires **remote http(s) URLs** for reference inputs. Local file paths and uploaded buffers are rejected up front instead of being silently ignored.
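A minimal sketch of such an up-front check, assuming reference inputs arrive as plain strings. The helper names `isRemoteHttpUrl` and `assertRemoteReferences` are hypothetical and the error wording is invented; only the rule itself (remote http(s) URLs are accepted, local paths are rejected) comes from the text above. The sketch relies on the standard `URL` constructor.

```typescript
// True only for absolute URLs whose scheme is http or https.
function isRemoteHttpUrl(ref: string): boolean {
  try {
    const url = new URL(ref);
    return url.protocol === "http:" || url.protocol === "https:";
  } catch {
    // Not an absolute URL at all, for example a relative or absolute local path.
    return false;
  }
}

// Hypothetical guard: throws before any request is sent if a reference is not remote.
function assertRemoteReferences(refs: string[]): void {
  const rejected = refs.filter((ref) => !isRemoteHttpUrl(ref));
  if (rejected.length > 0) {
    throw new Error(`reference inputs must be remote http(s) URLs: ${rejected.join(", ")}`);
  }
}
```

Checking the parsed scheme rather than a string prefix also rejects `file:` URLs, which parse successfully but still point at local content.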

## Related

- [Tools Overview](/tools) — all available agent tools
- [Qwen](/providers/qwen) — Qwen-specific setup and limits
- [Configuration Reference](/gateway/configuration-reference#agent-defaults) — `videoGenerationModel` config
- [Models](/concepts/models) — model configuration and failover