* video_generate: add providerOptions, inputAudios, and imageRoles
- VideoGenerationSourceAsset gains an optional `role` field (e.g.
"first_frame", "last_frame"); core treats it as opaque and forwards it
to the provider unchanged.
- VideoGenerationRequest gains `inputAudios` (reference audio assets,
e.g. background music) and `providerOptions` (arbitrary
provider-specific key/value pairs forwarded as-is).
- VideoGenerationProviderCapabilities gains `maxInputAudios`.
- video_generate tool schema adds:
- `imageRoles` array (parallel to `images`, sets role per asset)
- `audioRef` / `audioRefs` (single/multi reference audio inputs)
- `providerOptions` (JSON object passed through to the provider)
- `MAX_INPUT_IMAGES` bumped 5 → 9; `MAX_INPUT_AUDIOS` = 3
- Capability validation extended to gate on `maxInputAudios`.
- runtime.ts threads `inputAudios` and `providerOptions` through to
`provider.generateVideo`.
- Docs and runtime tests updated.
Made-with: Cursor
* docs: fix BytePlus Seedance capability table — split 1.5 and 2.0 rows
1.5 Pro supports at most 2 input images (first_frame + last_frame);
2.0 supports up to 9 reference images, 3 videos, and 3 audios.
Provider notes section updated accordingly.
Made-with: Cursor
* docs: list all Seedance 1.0 models in video-generation provider table
- Default model updated to seedance-1-0-pro-250528 (was the T2V lite)
- Provider notes now enumerate all five 1.0 model IDs with T2V/I2V capability notes
Made-with: Cursor
* video_generate: address review feedback (P1/P2)
P1: Add "adaptive" to SUPPORTED_ASPECT_RATIOS so provider-specific ratio
passthrough (used by Seedance 1.5/2.0) is accepted instead of throwing.
Update error message to include "adaptive" in the allowed list.
P1: Fix audio input capability default — when a provider does not declare
maxInputAudios, default to 0 (no audio support) instead of MAX_INPUT_AUDIOS.
Providers must explicitly opt in via maxInputAudios to accept audio inputs.
P2: Remove unnecessary type cast in imageRoles assignment; VideoGenerationSourceAsset
already declares role?: string so a non-null assertion suffices.
P2: Add videoRoles and audioRoles tool parameters, parallel to imageRoles,
so callers can assign semantic role hints to reference video and audio assets
(e.g. "reference_video", "reference_audio" for Seedance 2.0).
Made-with: Cursor
* video_generate: fix check-docs formatting and snake_case param reading
Made-with: Cursor
* video_generate: clarify *Roles are parallel to combined input list (P2)
Made-with: Cursor
* video_generate: add missing duration import; fix corrupted docs section
Made-with: Cursor
* video_generate: pass mode inputs to duration resolver; note plugin requirement (P2)
Made-with: Cursor
* plugin-sdk: sync new video-gen fields — role, inputAudios, providerOptions, maxInputAudios
Add fields introduced by core in the PR1 batch to the public plugin-sdk
mirror so TypeScript provider plugins can declare and consume them
without type assertions:
- VideoGenerationSourceAsset.role?: string
- VideoGenerationRequest.inputAudios and .providerOptions
- VideoGenerationModeCapabilities.maxInputAudios
The AssertAssignable bidirectional checks still pass because all new
fields are optional; this change makes the SDK surface complete.
Made-with: Cursor
* video-gen runtime: skip failover candidates lacking audio capability
Made-with: Cursor
* video-gen: fall back to flat capabilities.maxInputAudios in failover and tool validation
Made-with: Cursor
* video-gen: defer audio-count check to runtime, enabling fallback for audio-capable candidates
Made-with: Cursor
* video-gen: defer maxDurationSeconds check to runtime, enabling fallback for higher-cap candidates
Made-with: Cursor
* video-gen: add VideoGenerationAssetRole union and typed providerOptions capability
Introduces a canonical VideoGenerationAssetRole union (first_frame,
last_frame, reference_image, reference_video, reference_audio) for the
source-asset role hint, and a VideoGenerationProviderOptionType tag
('number' | 'boolean' | 'string') plus a new capabilities.providerOptions
schema that providers use to declare which opaque providerOptions keys
they accept and with what primitive type.
Types are additive and backwards compatible. The role field accepts both
canonical union values and arbitrary provider-specific strings via a
`VideoGenerationAssetRole | (string & {})` union, so autocomplete works
for the common case without blocking provider-specific extensions.
Runtime enforcement of providerOptions (skip-in-fallback, unknown key
and type mismatch) lands in a follow-up commit.
Co-authored-by: yongliang.xie <yongliang.xie@bytedance.com>
* video-gen: enforce typed providerOptions schema via skip-in-fallback
Adds `validateProviderOptionsAgainstDeclaration` in the video-generation
runtime and wires it into the `generateVideo` candidate loop alongside
the existing audio-count and duration-cap skip guards.
Behavior:
- Candidates with no declared `capabilities.providerOptions` skip any
non-empty providerOptions payload with a clear skip reason, so a
provider that would ignore `{seed: 42}` and succeed without the
caller's intent never gets reached.
- Candidates that declare a schema reject unknown keys with the list
of accepted keys in the error.
- Candidates that declare a schema reject type mismatches (expected
number/boolean/string) with the declared type in the error.
- All skip reasons push into `attempts` so the aggregated failure
message at the end of the fallback chain explains exactly why each
candidate was rejected.
Also hardens the tool boundary: `providerOptions` that is not a plain
JSON object (including bogus arrays like `["seed", 42]`) now throws a
`ToolInputError` up front instead of being cast to `Record` and
forwarded with numeric-string keys.
Consistent with the audio/duration skip-in-fallback pattern introduced
by yongliang.xie in earlier commits on this branch.
Co-authored-by: yongliang.xie <yongliang.xie@bytedance.com>
* video-gen: harden *Roles parity + document canonical role values
Replaces the inline `parseRolesArg` lambda with a dedicated
`parseRoleArray` helper that throws a ToolInputError when the caller
supplies more roles than assets. Off-by-one alignment mistakes in
`imageRoles` / `videoRoles` / `audioRoles` now fail loudly at the tool
boundary instead of silently dropping trailing roles.
Also tightens the schema descriptions to document the canonical
VideoGenerationAssetRole values (first_frame, last_frame, reference_*)
and the skip-in-fallback contract on providerOptions, and rejects
non-array inputs to any `*Roles` field early rather than coercing them
to an empty list.
Co-authored-by: yongliang.xie <yongliang.xie@bytedance.com>
* video-gen: surface dropped aspectRatio sentinels in ignoredOverrides
"adaptive" and other provider-specific sentinel aspect ratios are
unparseable as numeric ratios, so when the active provider does not
declare the sentinel in caps.aspectRatios, `resolveClosestAspectRatio`
returns undefined and the previous code silently nulled out
`aspectRatio` without surfacing a warning.
Push the dropped value into `ignoredOverrides` so the tool result
warning path ("Ignored unsupported overrides for …") picks it up, and
the caller gets visible feedback that the request was dropped instead
of a silent no-op. Also corrects the tool-side comment on
SUPPORTED_ASPECT_RATIOS to describe actual behavior.
Co-authored-by: yongliang.xie <yongliang.xie@bytedance.com>
* video-gen: surface declared providerOptions + maxInputAudios in action=list
`video_generate action=list` now includes the declared providerOptions
schema (key:type) per provider, so agents can discover which opaque
keys each provider accepts without trial and error. Both mode-level and
flat-provider providerOptions declarations are merged, matching the
runtime lookup order in `generateVideo`.
Also surfaces `maxInputAudios` alongside the other max-input counts for
completeness — previously the list output did not expose the audio cap
at all, even though the tool validates against it.
Co-authored-by: yongliang.xie <yongliang.xie@bytedance.com>
* video-gen: warn once per request when runtime skips a fallback candidate
The skip-in-fallback guards (audio cap, duration cap, providerOptions)
all logged at debug level, which meant operators had no visible signal
when the primary provider was silently passed over in favor of a
fallback. Add a first-skip log.warn in the runtime loop so the reason
for the first rejection is surfaced once per request, and leave the
rest of the skip events at debug to avoid flooding on long chains.
Co-authored-by: yongliang.xie <yongliang.xie@bytedance.com>
* video-gen: cover new tool-level behavior with regression tests
Adds regression tests for:
- providerOptions shape rejection (arrays, strings)
- providerOptions happy-path forwarding to runtime
- imageRoles length-parity guard
- *Roles non-array rejection
- positional role attachment to loaded reference images
- audio data: URL templated rejection branch
- aspectRatio='adaptive' acceptance and forwarding
- unsupported aspectRatio rejection (mentions 'adaptive' in the error)
All eight new cases run in the existing video-generate-tool suite and
use the same provider-mock pattern already established in the file.
Co-authored-by: yongliang.xie <yongliang.xie@bytedance.com>
* video-gen: cover runtime providerOptions skip-in-fallback branches
Adds runtime regression tests for the new typed-providerOptions guard:
- candidates without a declared providerOptions schema are skipped
when any providerOptions is supplied (prevents silent drop)
- candidates that declare a schema skip on unknown keys with the
accepted-key list surfaced in the error
- candidates that declare a schema skip on type mismatches with the
declared type surfaced in the error
- end-to-end fallback: openai (no providerOptions) is skipped and
byteplus (declared schema) accepts the same request, with an
attempt entry recording the first skip reason
Also updates the existing 'forwards providerOptions to the provider
unchanged' case so the destination provider declares the matching
typed schema, and wires a `warn` stub into the hoisted logger mock
so the new first-skip log.warn call path does not blow up.
Co-authored-by: yongliang.xie <yongliang.xie@bytedance.com>
* changelog: note video_generate providerOptions / inputAudios / role hints
Adds an Unreleased Changes entry describing the user-visible surface
expansion for video_generate: typed providerOptions capability,
inputAudios reference audio, per-asset role hints via the canonical
VideoGenerationAssetRole union, the 'adaptive' aspect-ratio sentinel,
maxInputAudios capability, and the relaxed 9-image cap.
Credits the original PR author.
Co-authored-by: yongliang.xie <yongliang.xie@bytedance.com>
* byteplus: declare providerOptions schema (seed, draft, camerafixed) and forward to API
Made-with: Cursor
* byteplus: fix camera_fixed body field (API uses underscore, not camerafixed)
Made-with: Cursor
* fix(byteplus): normalize resolution to lowercase before API call
The Seedance API rejects resolution values with uppercase letters —
"480P", "720P" etc return InvalidParameter, while "480p", "720p"
are accepted. This was breaking the video generation live test
(resolveLiveVideoResolution returns "480P").
Normalize req.resolution to lowercase at the provider layer before
setting body.resolution, so any caller-supplied casing is corrected
without requiring changes to the VideoGenerationResolution type or
live-test helpers.
Verified via direct API call:
body.resolution = "480P" → HTTP 400 InvalidParameter
body.resolution = "480p" → task created successfully
body.resolution = "720p" → task created successfully (t2v, i2v, 1.5-pro)
body.resolution = "1080p" → task created successfully
Made-with: Cursor
* video-gen/byteplus: auto-select i2v model when input images provided with t2v model
Seedance 1.0 uses separate model IDs for T2V (seedance-1-0-lite-t2v-250428)
and I2V (seedance-1-0-lite-i2v-250428). When the caller requests a T2V model
but also provides inputImages, the API rejects with task_type i2v not supported
on t2v model.
Fix: when inputImages are present and the requested model contains "-t2v-",
auto-substitute "-i2v-" so the API receives the correct model. Seedance 1.5 Pro
uses a single model ID for both modes and is unaffected by this substitution.
Verified via live test: both mode=generate and mode=imageToVideo pass for
byteplus/seedance-1-0-lite-t2v-250428 with no failures.
Co-authored-by: odysseus0 <odysseus0@example.com>
Made-with: Cursor
* video-gen: fix duration rounding + align BytePlus (1.0) docs (P2)
Made-with: Cursor
* video-gen: relax providerOptions gate for undeclared-schema providers (P1)
Distinguish undefined (not declared = backward-compat pass-through) from
{} (explicitly declared empty = no options accepted) in
validateProviderOptionsAgainstDeclaration. Providers without a declared
schema receive providerOptions as-is; providers with an explicit empty
schema still skip. Typed schemas continue to validate key names and types.
Also: restore camera_fixed (underscore) in BytePlus provider schema and
body key (regression from earlier rebase), remove duplicate local
readBooleanToolParam definition now imported from media-tool-shared,
update tests and docs accordingly.
Made-with: Cursor
* video_generate: add landing follow-up coverage
* video_generate: finalize plugin-sdk baseline (#61987) (thanks @xieyongliang)
---------
Co-authored-by: yongliang.xie <yongliang.xie@bytedance.com>
Co-authored-by: George Zhang <georgezhangtj97@gmail.com>
Co-authored-by: odysseus0 <odysseus0@example.com>
Video Generation
OpenClaw agents can generate videos from text prompts, reference images, or existing videos. Fourteen provider backends are supported, each with different model options, input modes, and feature sets. The agent picks the right provider automatically based on your configuration and available API keys.
The `video_generate` tool only appears when at least one video-generation provider is available. If you do not see it in your agent tools, set a provider API key or configure `agents.defaults.videoGenerationModel`.

OpenClaw treats video generation as three runtime modes:
- `generate` for text-to-video requests with no reference media
- `imageToVideo` when the request includes one or more reference images
- `videoToVideo` when the request includes one or more reference videos
Providers can support any subset of those modes. The tool validates the active mode before submission and reports supported modes in `action=list`.
Quick start
- Set an API key for any supported provider:

  ```shell
  export GEMINI_API_KEY="your-key"
  ```

- Optionally pin a default model:

  ```shell
  openclaw config set agents.defaults.videoGenerationModel.primary "google/veo-3.1-fast-generate-preview"
  ```

- Ask the agent:

  Generate a 5-second cinematic video of a friendly lobster surfing at sunset.

The agent calls `video_generate` automatically. No tool allowlisting is needed.
What happens when you generate a video
Video generation is asynchronous. When the agent calls video_generate in a session:
- OpenClaw submits the request to the provider and immediately returns a task ID.
- The provider processes the job in the background (typically 30 seconds to 5 minutes depending on the provider and resolution).
- When the video is ready, OpenClaw wakes the same session with an internal completion event.
- The agent posts the finished video back into the original conversation.
While a job is in flight, duplicate `video_generate` calls in the same session return the current task status instead of starting another generation. Use `openclaw tasks list` or `openclaw tasks show <taskId>` to check progress from the CLI.
Outside of session-backed agent runs (for example, direct tool invocations), the tool falls back to inline generation and returns the final media path in the same turn.
Task lifecycle
Each video_generate request moves through four states:
- queued -- task created, waiting for the provider to accept it.
- running -- provider is processing (typically 30 seconds to 5 minutes depending on provider and resolution).
- succeeded -- video ready; the agent wakes and posts it to the conversation.
- failed -- provider error or timeout; the agent wakes with error details.
Check status from the CLI:

```shell
openclaw tasks list
openclaw tasks show <taskId>
openclaw tasks cancel <taskId>
```
Duplicate prevention: if a video task is already queued or running for the current session, `video_generate` returns the existing task status instead of starting a new one. Use `action: "status"` to check explicitly without triggering a new generation.
Supported providers
| Provider | Default model | Text | Image ref | Video ref | API key |
|---|---|---|---|---|---|
| Alibaba | `wan2.6-t2v` | Yes | Yes (remote URL) | Yes (remote URL) | `MODELSTUDIO_API_KEY` |
| BytePlus (1.0) | `seedance-1-0-pro-250528` | Yes | Up to 2 images (I2V models only; first + last frame) | No | `BYTEPLUS_API_KEY` |
| BytePlus Seedance 1.5 | `seedance-1-5-pro-251215` | Yes | Up to 2 images (first + last frame via role) | No | `BYTEPLUS_API_KEY` |
| BytePlus Seedance 2.0 | `dreamina-seedance-2-0-260128` | Yes | Up to 9 reference images | Up to 3 videos | `BYTEPLUS_API_KEY` |
| ComfyUI | workflow | Yes | 1 image | No | `COMFY_API_KEY` or `COMFY_CLOUD_API_KEY` |
| fal | `fal-ai/minimax/video-01-live` | Yes | 1 image | No | `FAL_KEY` |
| Google | `veo-3.1-fast-generate-preview` | Yes | 1 image | 1 video | `GEMINI_API_KEY` |
| MiniMax | `MiniMax-Hailuo-2.3` | Yes | 1 image | No | `MINIMAX_API_KEY` |
| OpenAI | `sora-2` | Yes | 1 image | 1 video | `OPENAI_API_KEY` |
| Qwen | `wan2.6-t2v` | Yes | Yes (remote URL) | Yes (remote URL) | `QWEN_API_KEY` |
| Runway | `gen4.5` | Yes | 1 image | 1 video | `RUNWAYML_API_SECRET` |
| Together | `Wan-AI/Wan2.2-T2V-A14B` | Yes | 1 image | No | `TOGETHER_API_KEY` |
| Vydra | `veo3` | Yes | 1 image (kling) | No | `VYDRA_API_KEY` |
| xAI | `grok-imagine-video` | Yes | 1 image | 1 video | `XAI_API_KEY` |
Some providers accept additional or alternate API key env vars. See individual provider pages for details.
Run `video_generate action=list` to inspect the available providers, models, and runtime modes.
Declared capability matrix
This is the explicit mode contract used by video_generate, contract tests,
and the shared live sweep.
| Provider | `generate` | `imageToVideo` | `videoToVideo` | Shared live lanes today |
|---|---|---|---|---|
| Alibaba | Yes | Yes | Yes | generate, imageToVideo; videoToVideo skipped because this provider needs remote http(s) video URLs |
| BytePlus | Yes | Yes | No | generate, imageToVideo |
| ComfyUI | Yes | Yes | No | Not in the shared sweep; workflow-specific coverage lives with Comfy tests |
| fal | Yes | Yes | No | generate, imageToVideo |
| Google | Yes | Yes | Yes | generate, imageToVideo; shared videoToVideo skipped because the current buffer-backed Gemini/Veo sweep does not accept that input |
| MiniMax | Yes | Yes | No | generate, imageToVideo |
| OpenAI | Yes | Yes | Yes | generate, imageToVideo; shared videoToVideo skipped because this org/input path currently needs provider-side inpaint/remix access |
| Qwen | Yes | Yes | Yes | generate, imageToVideo; videoToVideo skipped because this provider needs remote http(s) video URLs |
| Runway | Yes | Yes | Yes | generate, imageToVideo; videoToVideo runs only when the selected model is runway/gen4_aleph |
| Together | Yes | Yes | No | generate, imageToVideo |
| Vydra | Yes | Yes | No | generate; shared imageToVideo skipped because bundled veo3 is text-only and bundled kling requires a remote image URL |
| xAI | Yes | Yes | Yes | generate, imageToVideo; videoToVideo skipped because this provider currently needs a remote MP4 URL |
Tool parameters
Required
| Parameter | Type | Description |
|---|---|---|
| `prompt` | string | Text description of the video to generate (required for `action: "generate"`) |
Content inputs
| Parameter | Type | Description |
|---|---|---|
| `image` | string | Single reference image (path or URL) |
| `images` | string[] | Multiple reference images (up to 9) |
| `imageRoles` | string[] | Optional per-position role hints parallel to the combined image list. Canonical values: `first_frame`, `last_frame`, `reference_image` |
| `video` | string | Single reference video (path or URL) |
| `videos` | string[] | Multiple reference videos (up to 4) |
| `videoRoles` | string[] | Optional per-position role hints parallel to the combined video list. Canonical value: `reference_video` |
| `audioRef` | string | Single reference audio (path or URL). Used e.g. for background music or voice reference when the provider supports audio inputs |
| `audioRefs` | string[] | Multiple reference audios (up to 3) |
| `audioRoles` | string[] | Optional per-position role hints parallel to the combined audio list. Canonical value: `reference_audio` |
Role hints are forwarded to the provider as-is. Canonical values come from the `VideoGenerationAssetRole` union, but providers may accept additional role strings. `*Roles` arrays must not have more entries than the corresponding reference list; off-by-one mistakes fail with a clear error. Use an empty string to leave a slot unset.
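For example, an image-to-video call pinning the first and last frames could look like this (illustrative argument shape only; the parameter names come from the table above, and the URLs are placeholders):

```json5
{
  prompt: "a lantern drifting from dusk to full night",
  images: ["https://example.com/dusk.png", "https://example.com/night.png"],
  imageRoles: ["first_frame", "last_frame"],
}
```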
Style controls
| Parameter | Type | Description |
|---|---|---|
| `aspectRatio` | string | `1:1`, `2:3`, `3:2`, `3:4`, `4:3`, `4:5`, `5:4`, `9:16`, `16:9`, `21:9`, or `adaptive` |
| `resolution` | string | `480P`, `720P`, `768P`, or `1080P` |
| `durationSeconds` | number | Target duration in seconds (rounded to the nearest provider-supported value) |
| `size` | string | Size hint when the provider supports it |
| `audio` | boolean | Enable generated audio in the output when supported. Distinct from `audioRef*` (inputs) |
| `watermark` | boolean | Toggle provider watermarking when supported |
`adaptive` is a provider-specific sentinel: it is forwarded as-is to providers that declare `adaptive` in their capabilities (e.g. BytePlus Seedance uses it to auto-detect the ratio from the input image dimensions). Providers that do not declare it surface the value via `details.ignoredOverrides` in the tool result, so the drop is visible.
Advanced
| Parameter | Type | Description |
|---|---|---|
| `action` | string | `"generate"` (default), `"status"`, or `"list"` |
| `model` | string | Provider/model override (e.g. `runway/gen4.5`) |
| `filename` | string | Output filename hint |
| `providerOptions` | object | Provider-specific options as a JSON object (e.g. `{"seed": 42, "draft": true}`). Providers that declare a typed schema validate the keys and types; unknown keys or mismatches skip the candidate during fallback. Providers without a declared schema receive the options as-is. Run `video_generate action=list` to see what each provider accepts |
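As a sketch of the request shape, a call forwarding typed provider options might look like this (the keys shown are the BytePlus examples from the provider notes below; other providers declare their own):

```json5
{
  prompt: "time-lapse of a harbor at dawn",
  model: "byteplus/seedance-1-0-pro-250528",
  providerOptions: { seed: 42, camera_fixed: true },
}
```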
Not all providers support all parameters. OpenClaw already normalizes duration to the closest provider-supported value, and it also remaps translated geometry hints such as size-to-aspect-ratio when a fallback provider exposes a different control surface. Truly unsupported overrides are ignored on a best-effort basis and reported as warnings in the tool result. Hard capability limits (such as too many reference inputs) fail before submission.
Tool results report the applied settings. When OpenClaw remaps duration or geometry during provider fallback, the returned durationSeconds, size, aspectRatio, and resolution values reflect what was submitted, and details.normalization captures the requested-to-applied translation.
Reference inputs also select the runtime mode:
- No reference media: `generate`
- Any image reference: `imageToVideo`
- Any video reference: `videoToVideo`
- Reference audio inputs do not change the resolved mode; they apply on top of whatever mode the image/video references select, and only work with providers that declare `maxInputAudios`
Mixed image and video references are not a stable shared capability surface. Prefer one reference type per request.
Fallback and typed options
Some capability checks are applied at the fallback layer rather than the tool boundary so that a request that exceeds the primary provider's limits can still run on a capable fallback:
- If the active candidate declares no `maxInputAudios` (or declares it as `0`), it is skipped when the request contains audio references, and the next candidate is tried.
- If the active candidate's `maxDurationSeconds` is below the requested `durationSeconds` and the candidate does not declare a `supportedDurationSeconds` list, it is skipped.
- If the request contains `providerOptions` and the active candidate explicitly declares a typed `providerOptions` schema, the candidate is skipped when the supplied keys are not in the schema or the value types do not match. Providers that have not yet declared a schema receive the options as-is (backward-compatible pass-through). A provider can explicitly opt out of all provider options by declaring an empty schema (`capabilities.providerOptions: {}`), which causes the same skip as a type mismatch.
The first skip reason in a request is logged at warn so operators see
when their primary provider was passed over; subsequent skips log at
debug to keep long fallback chains quiet. If every candidate is skipped,
the aggregated error includes the skip reason for each.
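The skip decision above can be sketched roughly as follows. This is a hypothetical simplification, not the runtime's actual helper: the real `validateProviderOptionsAgainstDeclaration` signature and return shape may differ.

```typescript
type ProviderOptionType = "number" | "boolean" | "string";

// Hypothetical simplification of the skip-in-fallback check:
// `declared === undefined` means the provider never declared a schema
// (backward-compatible pass-through); `{}` means "no options accepted".
function checkProviderOptions(
  declared: Record<string, ProviderOptionType> | undefined,
  supplied: Record<string, unknown>,
): { ok: true } | { ok: false; reason: string } {
  if (Object.keys(supplied).length === 0) return { ok: true }; // nothing to validate
  if (declared === undefined) return { ok: true }; // undeclared schema: pass through as-is
  for (const [key, value] of Object.entries(supplied)) {
    const expected = declared[key];
    if (expected === undefined) {
      const accepted = Object.keys(declared).join(", ") || "(none)";
      return { ok: false, reason: `unknown providerOptions key "${key}"; accepted: ${accepted}` };
    }
    if (typeof value !== expected) {
      return { ok: false, reason: `providerOptions.${key} expects ${expected}, got ${typeof value}` };
    }
  }
  return { ok: true };
}
```

A `false` result does not fail the request outright; it records a skip reason for this candidate and lets the fallback chain move on to the next one.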
Actions
- generate (default) -- create a video from the given prompt and optional reference inputs.
- status -- check the state of the in-flight video task for the current session without starting another generation.
- list -- show available providers, models, and their capabilities.
Model selection
When generating a video, OpenClaw resolves the model in this order:
1. `model` tool parameter -- if the agent specifies one in the call.
2. `videoGenerationModel.primary` -- from config.
3. `videoGenerationModel.fallbacks` -- tried in order.
4. Auto-detection -- uses providers that have valid auth, starting with the current default provider, then remaining providers in alphabetical order.
If a provider fails, the next candidate is tried automatically. If all candidates fail, the error includes details from each attempt.
Set `agents.defaults.mediaGenerationAutoProviderFallback: false` if you want video generation to use only the explicit `model`, `primary`, and `fallbacks` entries.
```json5
{
  agents: {
    defaults: {
      videoGenerationModel: {
        primary: "google/veo-3.1-fast-generate-preview",
        fallbacks: ["runway/gen4.5", "qwen/wan2.6-t2v"],
      },
    },
  },
}
```
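Disabling auto-detection looks like this (placement under `agents.defaults` is inferred from the setting's full name; sketch only):

```json5
{
  agents: {
    defaults: {
      mediaGenerationAutoProviderFallback: false,
    },
  },
}
```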
Provider notes
| Provider | Notes |
|---|---|
| Alibaba | Uses DashScope/Model Studio async endpoint. Reference images and videos must be remote http(s) URLs. |
| BytePlus (1.0) | Provider id byteplus. Models: seedance-1-0-pro-250528 (default), seedance-1-0-pro-t2v-250528, seedance-1-0-pro-fast-251015, seedance-1-0-lite-t2v-250428, seedance-1-0-lite-i2v-250428. T2V models (*-t2v-*) do not accept image inputs; I2V models and general *-pro-* models support a single reference image (first frame). Pass the image positionally or set role: "first_frame". T2V model IDs are automatically switched to the corresponding I2V variant when an image is provided. Supported providerOptions keys: seed (number), draft (boolean, forces 480p), camera_fixed (boolean). |
| BytePlus Seedance 1.5 | Requires the @openclaw/byteplus-modelark plugin. Provider id byteplus-seedance15. Model: seedance-1-5-pro-251215. Uses the unified content[] API. Supports at most 2 input images (first_frame + last_frame). All inputs must be remote https:// URLs. Set role: "first_frame" / "last_frame" on each image, or pass images positionally. aspectRatio: "adaptive" auto-detects ratio from the input image. audio: true maps to generate_audio. providerOptions.seed (number) is forwarded. |
| BytePlus Seedance 2.0 | Requires the @openclaw/byteplus-modelark plugin. Provider id byteplus-seedance2. Models: dreamina-seedance-2-0-260128, dreamina-seedance-2-0-fast-260128. Uses the unified content[] API. Supports up to 9 reference images, 3 reference videos, and 3 reference audios. All inputs must be remote https:// URLs. Set role on each asset — supported values: "first_frame", "last_frame", "reference_image", "reference_video", "reference_audio". aspectRatio: "adaptive" auto-detects ratio from the input image. audio: true maps to generate_audio. providerOptions.seed (number) is forwarded. |
| ComfyUI | Workflow-driven local or cloud execution. Supports text-to-video and image-to-video through the configured graph. |
| fal | Uses queue-backed flow for long-running jobs. Single image reference only. |
| Google | Uses Gemini/Veo. Supports one image or one video reference. |
| MiniMax | Single image reference only. |
| OpenAI | Only size override is forwarded. Other style overrides (aspectRatio, resolution, audio, watermark) are ignored with a warning. |
| Qwen | Same DashScope backend as Alibaba. Reference inputs must be remote http(s) URLs; local files are rejected upfront. |
| Runway | Supports local files via data URIs. Video-to-video requires runway/gen4_aleph. Text-only runs expose 16:9 and 9:16 aspect ratios. |
| Together | Single image reference only. |
| Vydra | Uses https://www.vydra.ai/api/v1 directly to avoid auth-dropping redirects. veo3 is bundled as text-to-video only; kling requires a remote image URL. |
| xAI | Supports text-to-video, image-to-video, and remote video edit/extend flows. |
Provider capability modes
The shared video-generation contract now lets providers declare mode-specific capabilities instead of only flat aggregate limits. New provider implementations should prefer explicit mode blocks:
```typescript
capabilities: {
  generate: {
    maxVideos: 1,
    maxDurationSeconds: 10,
    supportsResolution: true,
  },
  imageToVideo: {
    enabled: true,
    maxVideos: 1,
    maxInputImages: 1,
    maxDurationSeconds: 5,
  },
  videoToVideo: {
    enabled: true,
    maxVideos: 1,
    maxInputVideos: 1,
    maxDurationSeconds: 5,
  },
}
```
Flat aggregate fields such as maxInputImages and maxInputVideos are not
enough to advertise transform-mode support. Providers should declare
generate, imageToVideo, and videoToVideo explicitly so live tests,
contract tests, and the shared video_generate tool can validate mode support
deterministically.
Live tests
Opt-in live coverage for the shared bundled providers:
```shell
OPENCLAW_LIVE_TEST=1 pnpm test:live -- extensions/video-generation-providers.live.test.ts
```
Repo wrapper:
```shell
pnpm test:live:media video
```
This live file loads missing provider env vars from ~/.profile, prefers
live/env API keys ahead of stored auth profiles by default, and runs the
declared modes it can exercise safely with local media:
- `generate` for every provider in the sweep
- `imageToVideo` when `capabilities.imageToVideo.enabled`
- `videoToVideo` when `capabilities.videoToVideo.enabled` and the provider/model accepts buffer-backed local video input in the shared sweep
Today the shared videoToVideo live lane covers:
- `runway` only when you select `runway/gen4_aleph`
Configuration
Set the default video generation model in your OpenClaw config:
```json5
{
  agents: {
    defaults: {
      videoGenerationModel: {
        primary: "qwen/wan2.6-t2v",
        fallbacks: ["qwen/wan2.6-r2v-flash"],
      },
    },
  },
}
```
Or via the CLI:
```shell
openclaw config set agents.defaults.videoGenerationModel.primary "qwen/wan2.6-t2v"
```
Related
- Tools Overview
- Background Tasks -- task tracking for async video generation
- Alibaba Model Studio
- BytePlus
- ComfyUI
- fal
- Google (Gemini)
- MiniMax
- OpenAI
- Qwen
- Runway
- Together AI
- Vydra
- xAI
- Configuration Reference
- Models