---
summary: "Generate videos from text, images, or existing videos using 14 provider backends"
read_when:
- Generating videos via the agent
- Configuring video generation providers and models
- Understanding the video_generate tool parameters
title: "Video Generation"
---
# Video Generation
OpenClaw agents can generate videos from text prompts, reference images, or existing videos. Fourteen provider backends are supported, each with different model options, input modes, and feature sets. The agent picks the right provider automatically based on your configuration and available API keys.
<Note>
The `video_generate` tool only appears when at least one video-generation provider is available. If you do not see it in your agent tools, set a provider API key or configure `agents.defaults.videoGenerationModel`.
</Note>
OpenClaw treats video generation as three runtime modes:
- `generate` for text-to-video requests with no reference media
- `imageToVideo` when the request includes one or more reference images
- `videoToVideo` when the request includes one or more reference videos
Providers can support any subset of those modes. The tool validates the active
mode before submission and reports supported modes in `action=list`.
## Quick start
1. Set an API key for any supported provider:
```bash
export GEMINI_API_KEY="your-key"
```
2. Optionally pin a default model:
```bash
openclaw config set agents.defaults.videoGenerationModel.primary "google/veo-3.1-fast-generate-preview"
```
3. Ask the agent:
> Generate a 5-second cinematic video of a friendly lobster surfing at sunset.
The agent calls `video_generate` automatically. No tool allowlisting is needed.
## What happens when you generate a video
Video generation is asynchronous. When the agent calls `video_generate` in a session:
1. OpenClaw submits the request to the provider and immediately returns a task ID.
2. The provider processes the job in the background (typically 30 seconds to 5 minutes depending on the provider and resolution).
3. When the video is ready, OpenClaw wakes the same session with an internal completion event.
4. The agent posts the finished video back into the original conversation.
While a job is in flight, duplicate `video_generate` calls in the same session return the current task status instead of starting another generation. Use `openclaw tasks list` or `openclaw tasks show <taskId>` to check progress from the CLI.
Outside of session-backed agent runs (for example, direct tool invocations), the tool falls back to inline generation and returns the final media path in the same turn.
### Task lifecycle
Each `video_generate` request moves through four states:
1. **queued** -- task created, waiting for the provider to accept it.
2. **running** -- provider is processing (typically 30 seconds to 5 minutes depending on provider and resolution).
3. **succeeded** -- video ready; the agent wakes and posts it to the conversation.
4. **failed** -- provider error or timeout; the agent wakes with error details.
Check status from the CLI:
```bash
openclaw tasks list
openclaw tasks show <taskId>
openclaw tasks cancel <taskId>
```
Duplicate prevention: if a video task is already `queued` or `running` for the current session, `video_generate` returns the existing task status instead of starting a new one. Use `action: "status"` to check explicitly without triggering a new generation.
## Supported providers
| Provider | Default model | Text | Image ref | Video ref | API key |
| --------------------- | ------------------------------- | ---- | ---------------------------------------------------- | ---------------- | ---------------------------------------- |
| Alibaba | `wan2.6-t2v` | Yes | Yes (remote URL) | Yes (remote URL) | `MODELSTUDIO_API_KEY` |
| BytePlus (1.0)        | `seedance-1-0-pro-250528`       | Yes  | 1 image (I2V and `*-pro-*` models; first frame)      | No               | `BYTEPLUS_API_KEY`                       |
| BytePlus Seedance 1.5 | `seedance-1-5-pro-251215` | Yes | Up to 2 images (first + last frame via role) | No | `BYTEPLUS_API_KEY` |
| BytePlus Seedance 2.0 | `dreamina-seedance-2-0-260128` | Yes | Up to 9 reference images | Up to 3 videos | `BYTEPLUS_API_KEY` |
| ComfyUI | `workflow` | Yes | 1 image | No | `COMFY_API_KEY` or `COMFY_CLOUD_API_KEY` |
| fal | `fal-ai/minimax/video-01-live` | Yes | 1 image | No | `FAL_KEY` |
| Google | `veo-3.1-fast-generate-preview` | Yes | 1 image | 1 video | `GEMINI_API_KEY` |
| MiniMax | `MiniMax-Hailuo-2.3` | Yes | 1 image | No | `MINIMAX_API_KEY` |
| OpenAI | `sora-2` | Yes | 1 image | 1 video | `OPENAI_API_KEY` |
| Qwen | `wan2.6-t2v` | Yes | Yes (remote URL) | Yes (remote URL) | `QWEN_API_KEY` |
| Runway | `gen4.5` | Yes | 1 image | 1 video | `RUNWAYML_API_SECRET` |
| Together | `Wan-AI/Wan2.2-T2V-A14B` | Yes | 1 image | No | `TOGETHER_API_KEY` |
| Vydra | `veo3` | Yes | 1 image (`kling`) | No | `VYDRA_API_KEY` |
| xAI | `grok-imagine-video` | Yes | 1 image | 1 video | `XAI_API_KEY` |
Some providers accept additional or alternate API key env vars. See individual [provider pages](#related) for details.
Run `video_generate action=list` to inspect the available providers, models,
and runtime modes.
### Declared capability matrix
This is the explicit mode contract used by `video_generate`, contract tests,
and the shared live sweep.
| Provider | `generate` | `imageToVideo` | `videoToVideo` | Shared live lanes today |
| -------- | ---------- | -------------- | -------------- | ---------------------------------------------------------------------------------------------------------------------------------------- |
| Alibaba | Yes | Yes | Yes | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider needs remote `http(s)` video URLs |
| BytePlus | Yes | Yes | No | `generate`, `imageToVideo` |
| ComfyUI | Yes | Yes | No | Not in the shared sweep; workflow-specific coverage lives with Comfy tests |
| fal | Yes | Yes | No | `generate`, `imageToVideo` |
| Google | Yes | Yes | Yes | `generate`, `imageToVideo`; shared `videoToVideo` skipped because the current buffer-backed Gemini/Veo sweep does not accept that input |
| MiniMax | Yes | Yes | No | `generate`, `imageToVideo` |
| OpenAI | Yes | Yes | Yes | `generate`, `imageToVideo`; shared `videoToVideo` skipped because this org/input path currently needs provider-side inpaint/remix access |
| Qwen | Yes | Yes | Yes | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider needs remote `http(s)` video URLs |
| Runway | Yes | Yes | Yes | `generate`, `imageToVideo`; `videoToVideo` runs only when the selected model is `runway/gen4_aleph` |
| Together | Yes | Yes | No | `generate`, `imageToVideo` |
| Vydra | Yes | Yes | No | `generate`; shared `imageToVideo` skipped because bundled `veo3` is text-only and bundled `kling` requires a remote image URL |
| xAI | Yes | Yes | Yes | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider currently needs a remote MP4 URL |
## Tool parameters
### Required
| Parameter | Type | Description |
| --------- | ------ | ----------------------------------------------------------------------------- |
| `prompt` | string | Text description of the video to generate (required for `action: "generate"`) |
### Content inputs
| Parameter | Type | Description |
| ------------ | -------- | -------------------------------------------------------------------------------------------------------------------------------------- |
| `image` | string | Single reference image (path or URL) |
| `images` | string[] | Multiple reference images (up to 9) |
| `imageRoles` | string[] | Optional per-position role hints parallel to the combined image list. Canonical values: `first_frame`, `last_frame`, `reference_image` |
| `video` | string | Single reference video (path or URL) |
| `videos` | string[] | Multiple reference videos (up to 4) |
| `videoRoles` | string[] | Optional per-position role hints parallel to the combined video list. Canonical value: `reference_video` |
| `audioRef` | string | Single reference audio (path or URL). Used for e.g. background music or voice reference when the provider supports audio inputs |
| `audioRefs` | string[] | Multiple reference audios (up to 3) |
| `audioRoles` | string[] | Optional per-position role hints parallel to the combined audio list. Canonical value: `reference_audio` |
Role hints are forwarded to the provider as-is. Canonical values come from
the `VideoGenerationAssetRole` union but providers may accept additional
role strings. `*Roles` arrays must not have more entries than the
corresponding reference list; off-by-one mistakes fail with a clear error.
Use an empty string to leave a slot unset.
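The parity guard described above can be sketched as follows. This is an illustrative reconstruction, not the actual runtime source; `parseRoleArray` and `ToolInputError` follow the names used by the tool, but the exact signatures are assumptions:

```typescript
// Hypothetical sketch of the *Roles parity guard. The real helper lives in
// the video_generate tool boundary; this version only shows the contract:
// more roles than assets fails loudly, empty strings leave a slot unset.
class ToolInputError extends Error {}

function parseRoleArray(
  roles: unknown,
  assetCount: number,
  field: string,
): (string | undefined)[] {
  if (roles === undefined) return [];
  if (!Array.isArray(roles)) {
    throw new ToolInputError(`${field} must be an array of strings`);
  }
  if (roles.length > assetCount) {
    throw new ToolInputError(
      `${field} has ${roles.length} entries but only ${assetCount} assets`,
    );
  }
  // Empty strings leave the slot unset; other strings pass through as-is.
  return roles.map((r) =>
    typeof r === "string" && r !== "" ? r : undefined,
  );
}
```

A caller passing `images: [a, b]` with `imageRoles: ["first_frame", ""]` would attach `first_frame` to the first image and leave the second unlabeled.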
### Style controls
| Parameter | Type | Description |
| ----------------- | ------- | --------------------------------------------------------------------------------------- |
| `aspectRatio` | string | `1:1`, `2:3`, `3:2`, `3:4`, `4:3`, `4:5`, `5:4`, `9:16`, `16:9`, `21:9`, or `adaptive` |
| `resolution` | string | `480P`, `720P`, `768P`, or `1080P` |
| `durationSeconds` | number | Target duration in seconds (rounded to nearest provider-supported value) |
| `size` | string | Size hint when the provider supports it |
| `audio` | boolean | Enable generated audio in the output when supported. Distinct from `audioRef*` (inputs) |
| `watermark` | boolean | Toggle provider watermarking when supported |
`adaptive` is a provider-specific sentinel: it is forwarded as-is to
providers that declare `adaptive` in their capabilities (e.g. BytePlus
Seedance uses it to auto-detect the ratio from the input image
dimensions). Providers that do not declare it surface the value via
`details.ignoredOverrides` in the tool result so the drop is visible.
### Advanced
| Parameter | Type | Description |
| ----------------- | ------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `action` | string | `"generate"` (default), `"status"`, or `"list"` |
| `model` | string | Provider/model override (e.g. `runway/gen4.5`) |
| `filename` | string | Output filename hint |
| `providerOptions` | object | Provider-specific options as a JSON object (e.g. `{"seed": 42, "draft": true}`). Providers that declare a typed schema validate the keys and types; unknown keys or mismatches skip the candidate during fallback. Providers without a declared schema receive the options as-is. Run `video_generate action=list` to see what each provider accepts |
Not all providers support every parameter. OpenClaw normalizes duration to the closest provider-supported value and translates geometry hints (for example, mapping `size` to an aspect ratio) when a fallback provider exposes a different control surface. Overrides that cannot be translated are dropped on a best-effort basis and reported as warnings in the tool result. Hard capability limits (such as too many reference inputs) fail before submission.
Tool results report the applied settings. When OpenClaw remaps duration or geometry during provider fallback, the returned `durationSeconds`, `size`, `aspectRatio`, and `resolution` values reflect what was submitted, and `details.normalization` captures the requested-to-applied translation.
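The "closest provider-supported value" rule for duration can be sketched as a nearest-neighbor pick over the provider's declared list. This is an illustrative helper, not the runtime's actual resolver, and it assumes a non-empty `supported` list:

```typescript
// Illustrative sketch: pick the supported duration nearest the request.
// Ties resolve to the earlier entry in the provider's declared list.
function closestSupportedDuration(
  requested: number,
  supported: number[],
): number {
  return supported.reduce((best, candidate) =>
    Math.abs(candidate - requested) < Math.abs(best - requested)
      ? candidate
      : best,
  );
}
```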
Reference inputs also select the runtime mode:
- No reference media: `generate`
- Any image reference: `imageToVideo`
- Any video reference: `videoToVideo`
- Reference audio inputs do not change the resolved mode; they apply on top of whatever mode the image/video references select, and only work with providers that declare `maxInputAudios`
Mixed image and video references are not a stable shared capability surface.
Prefer one reference type per request.
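The selection rules above can be sketched as follows. The function name and the counts-based signature are illustrative, and the precedence when both image and video references are supplied is an assumption (mixing reference types is discouraged, as noted above):

```typescript
// Minimal sketch of runtime-mode resolution from reference inputs.
type VideoMode = "generate" | "imageToVideo" | "videoToVideo";

function resolveVideoMode(
  imageCount: number,
  videoCount: number,
): VideoMode {
  // Assumed precedence: video references win if both types are present.
  if (videoCount > 0) return "videoToVideo";
  if (imageCount > 0) return "imageToVideo";
  return "generate"; // audio references never change the resolved mode
}
```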
#### Fallback and typed options
Some capability checks are applied at the fallback layer rather than the
tool boundary so that a request that exceeds the primary provider's limits
can still run on a capable fallback:
- If the active candidate declares no `maxInputAudios` (or declares it as
`0`), it is skipped when the request contains audio references, and the
next candidate is tried.
- If the active candidate's `maxDurationSeconds` is below the requested
`durationSeconds` and the candidate does not declare a
`supportedDurationSeconds` list, it is skipped.
- If the request contains `providerOptions` and the active candidate
explicitly declares a typed `providerOptions` schema, the candidate is
skipped when the supplied keys are not in the schema or the value types do
not match. Providers that have not yet declared a schema receive the
options as-is (backward-compatible pass-through). A provider can
explicitly opt out of all provider options by declaring an empty schema
(`capabilities.providerOptions: {}`), which causes the same skip as a
type mismatch.
The first skip reason in a request is logged at `warn` so operators see
when their primary provider was passed over; subsequent skips log at
`debug` to keep long fallback chains quiet. If every candidate is skipped,
the aggregated error includes the skip reason for each.
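The providerOptions skip decision can be sketched as below. The schema shape (`{ seed: "number", draft: "boolean" }`) mirrors what `action=list` reports, but the function name and return convention are illustrative, not the runtime's actual `validateProviderOptionsAgainstDeclaration` signature:

```typescript
// Sketch of the skip decision: returns a skip reason string, or
// undefined when the candidate may proceed.
type OptionType = "number" | "boolean" | "string";

function providerOptionsSkipReason(
  declared: Record<string, OptionType> | undefined,
  supplied: Record<string, unknown>,
): string | undefined {
  const keys = Object.keys(supplied);
  if (keys.length === 0) return undefined; // nothing to validate
  if (declared === undefined) return undefined; // undeclared: pass-through
  for (const key of keys) {
    const expected = declared[key];
    if (expected === undefined) {
      const accepted = Object.keys(declared).join(", ") || "(none)";
      return `unknown providerOptions key "${key}"; accepted: ${accepted}`;
    }
    if (typeof supplied[key] !== expected) {
      return `providerOptions.${key} must be a ${expected}`;
    }
  }
  return undefined;
}
```

Note the asymmetry: an `undefined` declaration passes everything through, while an empty `{}` declaration rejects every key.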
## Actions
- **generate** (default) -- create a video from the given prompt and optional reference inputs.
- **status** -- check the state of the in-flight video task for the current session without starting another generation.
- **list** -- show available providers, models, and their capabilities.
## Model selection
When generating a video, OpenClaw resolves the model in this order:
1. **`model` tool parameter** -- if the agent specifies one in the call.
2. **`videoGenerationModel.primary`** -- from config.
3. **`videoGenerationModel.fallbacks`** -- tried in order.
4. **Auto-detection** -- uses providers that have valid auth, starting with the current default provider, then remaining providers in alphabetical order.
If a provider fails, the next candidate is tried automatically. If all candidates fail, the error includes details from each attempt.
Set `agents.defaults.mediaGenerationAutoProviderFallback: false` if you want
video generation to use only the explicit `model`, `primary`, and `fallbacks`
entries.
```json5
{
agents: {
defaults: {
videoGenerationModel: {
primary: "google/veo-3.1-fast-generate-preview",
fallbacks: ["runway/gen4.5", "qwen/wan2.6-t2v"],
},
},
},
}
```
## Provider notes
| Provider | Notes |
| --------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Alibaba | Uses DashScope/Model Studio async endpoint. Reference images and videos must be remote `http(s)` URLs. |
| BytePlus (1.0) | Provider id `byteplus`. Models: `seedance-1-0-pro-250528` (default), `seedance-1-0-pro-t2v-250528`, `seedance-1-0-pro-fast-251015`, `seedance-1-0-lite-t2v-250428`, `seedance-1-0-lite-i2v-250428`. T2V models (`*-t2v-*`) do not accept image inputs; I2V models and general `*-pro-*` models support a single reference image (first frame). Pass the image positionally or set `role: "first_frame"`. T2V model IDs are automatically switched to the corresponding I2V variant when an image is provided. Supported `providerOptions` keys: `seed` (number), `draft` (boolean, forces 480p), `camera_fixed` (boolean). |
| BytePlus Seedance 1.5 | Requires the [`@openclaw/byteplus-modelark`](https://www.npmjs.com/package/@openclaw/byteplus-modelark) plugin. Provider id `byteplus-seedance15`. Model: `seedance-1-5-pro-251215`. Uses the unified `content[]` API. Supports at most 2 input images (first_frame + last_frame). All inputs must be remote `https://` URLs. Set `role: "first_frame"` / `"last_frame"` on each image, or pass images positionally. `aspectRatio: "adaptive"` auto-detects ratio from the input image. `audio: true` maps to `generate_audio`. `providerOptions.seed` (number) is forwarded. |
| BytePlus Seedance 2.0 | Requires the [`@openclaw/byteplus-modelark`](https://www.npmjs.com/package/@openclaw/byteplus-modelark) plugin. Provider id `byteplus-seedance2`. Models: `dreamina-seedance-2-0-260128`, `dreamina-seedance-2-0-fast-260128`. Uses the unified `content[]` API. Supports up to 9 reference images, 3 reference videos, and 3 reference audios. All inputs must be remote `https://` URLs. Set `role` on each asset — supported values: `"first_frame"`, `"last_frame"`, `"reference_image"`, `"reference_video"`, `"reference_audio"`. `aspectRatio: "adaptive"` auto-detects ratio from the input image. `audio: true` maps to `generate_audio`. `providerOptions.seed` (number) is forwarded. |
| ComfyUI | Workflow-driven local or cloud execution. Supports text-to-video and image-to-video through the configured graph. |
| fal | Uses queue-backed flow for long-running jobs. Single image reference only. |
| Google | Uses Gemini/Veo. Supports one image or one video reference. |
| MiniMax | Single image reference only. |
| OpenAI | Only `size` override is forwarded. Other style overrides (`aspectRatio`, `resolution`, `audio`, `watermark`) are ignored with a warning. |
| Qwen | Same DashScope backend as Alibaba. Reference inputs must be remote `http(s)` URLs; local files are rejected upfront. |
| Runway | Supports local files via data URIs. Video-to-video requires `runway/gen4_aleph`. Text-only runs expose `16:9` and `9:16` aspect ratios. |
| Together | Single image reference only. |
| Vydra | Uses `https://www.vydra.ai/api/v1` directly to avoid auth-dropping redirects. `veo3` is bundled as text-to-video only; `kling` requires a remote image URL. |
| xAI | Supports text-to-video, image-to-video, and remote video edit/extend flows. |
## Provider capability modes
The shared video-generation contract now lets providers declare mode-specific
capabilities instead of only flat aggregate limits. New provider
implementations should prefer explicit mode blocks:
```typescript
capabilities: {
generate: {
maxVideos: 1,
maxDurationSeconds: 10,
supportsResolution: true,
},
imageToVideo: {
enabled: true,
maxVideos: 1,
maxInputImages: 1,
maxDurationSeconds: 5,
},
videoToVideo: {
enabled: true,
maxVideos: 1,
maxInputVideos: 1,
maxDurationSeconds: 5,
},
}
```
Flat aggregate fields such as `maxInputImages` and `maxInputVideos` are not
enough to advertise transform-mode support. Providers should declare
`generate`, `imageToVideo`, and `videoToVideo` explicitly so live tests,
contract tests, and the shared `video_generate` tool can validate mode support
deterministically.
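A gate over those mode blocks might look like the sketch below. The capability shape mirrors the example above; treating an omitted `enabled` as true (as the `generate` block in the example does) is an assumption about the contract:

```typescript
// Minimal sketch of checking whether a capability object advertises a
// runtime mode. Only the fields relevant to the gate are modeled.
interface ModeCaps {
  enabled?: boolean;
}

interface Capabilities {
  generate?: ModeCaps;
  imageToVideo?: ModeCaps;
  videoToVideo?: ModeCaps;
}

function modeSupported(
  caps: Capabilities,
  mode: keyof Capabilities,
): boolean {
  const block = caps[mode];
  if (!block) return false; // no block declared: mode not advertised
  return block.enabled !== false; // omitted `enabled` defaults to true
}
```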
## Live tests
Opt-in live coverage for the shared bundled providers:
```bash
OPENCLAW_LIVE_TEST=1 pnpm test:live -- extensions/video-generation-providers.live.test.ts
```
Repo wrapper:
```bash
pnpm test:live:media video
```
This live file loads missing provider env vars from `~/.profile`, prefers
live/env API keys ahead of stored auth profiles by default, and runs the
declared modes it can exercise safely with local media:
- `generate` for every provider in the sweep
- `imageToVideo` when `capabilities.imageToVideo.enabled`
- `videoToVideo` when `capabilities.videoToVideo.enabled` and the provider/model
accepts buffer-backed local video input in the shared sweep
Today the shared `videoToVideo` live lane covers:
- `runway` only when you select `runway/gen4_aleph`
## Configuration
Set the default video generation model in your OpenClaw config:
```json5
{
agents: {
defaults: {
videoGenerationModel: {
primary: "qwen/wan2.6-t2v",
fallbacks: ["qwen/wan2.6-r2v-flash"],
},
},
},
}
```
Or via the CLI:
```bash
openclaw config set agents.defaults.videoGenerationModel.primary "qwen/wan2.6-t2v"
```
## Related
- [Tools Overview](/tools)
- [Background Tasks](/automation/tasks) -- task tracking for async video generation
- [Alibaba Model Studio](/providers/alibaba)
- [BytePlus](/concepts/model-providers#byteplus-international)
- [ComfyUI](/providers/comfy)
- [fal](/providers/fal)
- [Google (Gemini)](/providers/google)
- [MiniMax](/providers/minimax)
- [OpenAI](/providers/openai)
- [Qwen](/providers/qwen)
- [Runway](/providers/runway)
- [Together AI](/providers/together)
- [Vydra](/providers/vydra)
- [xAI](/providers/xai)
- [Configuration Reference](/gateway/configuration-reference#agent-defaults)
- [Models](/concepts/models)