docs(video-generation): rewrite around Steps, ParamField, A–Z providers

The video-generation page was 454 lines with a 3-step Quick start
written as flat numbered prose, four separate parameter tables (Required,
Content inputs, Style controls, Advanced), the task lifecycle as a
numbered list, and a Related list mixing alphabetic and recency order.

Restructure for scan-first reading without losing technical content:

- Wrap Quick start in a Steps component (auth -> default model ->
  ask the agent).
- Convert all four parameter tables into ParamField definitions grouped
  under their existing sub-section headings (Required / Content inputs /
  Style controls / Advanced), so types, defaults, and required flags
  show as visual chips and long descriptions wrap cleanly.
- Convert the task lifecycle from a numbered list to a 4-row table for
  at-a-glance scanning.
- Convert Yes/No checkmarks in both the Supported providers and
  Capability matrix tables to ✓ and em-dash, matching the rest of the
  media docs.
- Convert the bullet list under Actions into a 3-row table.
- Sentence-case Related entries and alphabetize the Related list.
- Add sidebarTitle so the nav reads 'Video generation' explicitly.

Schema fields, defaults, model refs, env vars, capability declarations,
fallback rules, and provider notes are unchanged. AccordionGroup of 14
provider notes was already alphabetized and is preserved verbatim.
Commit: 6e3eeb526f (parent: 503a3aa125)
Author: Vincent Koc
Date: 2026-04-25 22:27:46 -07:00


@@ -1,74 +1,93 @@
---
summary: "Generate videos from text, images, or existing videos using 14 provider backends"
summary: "Generate videos via video_generate from text, image, or video references across 14 provider backends"
read_when:
- Generating videos via the agent
- Configuring video generation providers and models
- Configuring video-generation providers and models
- Understanding the video_generate tool parameters
title: "Video generation"
sidebarTitle: "Video generation"
---
OpenClaw agents can generate videos from text prompts, reference images, or existing videos. Fourteen provider backends are supported, each with different model options, input modes, and feature sets. The agent picks the right provider automatically based on your configuration and available API keys.
OpenClaw agents can generate videos from text prompts, reference images, or
existing videos. Fourteen provider backends are supported, each with
different model options, input modes, and feature sets. The agent picks the
right provider automatically based on your configuration and available API
keys.
<Note>
The `video_generate` tool only appears when at least one video-generation provider is available. If you do not see it in your agent tools, set a provider API key or configure `agents.defaults.videoGenerationModel`.
The `video_generate` tool only appears when at least one video-generation
provider is available. If you do not see it in your agent tools, set a
provider API key or configure `agents.defaults.videoGenerationModel`.
</Note>
OpenClaw treats video generation as three runtime modes:
- `generate` for text-to-video requests with no reference media
- `imageToVideo` when the request includes one or more reference images
- `videoToVideo` when the request includes one or more reference videos
- `generate`: text-to-video requests with no reference media.
- `imageToVideo`: the request includes one or more reference images.
- `videoToVideo`: the request includes one or more reference videos.
Providers can support any subset of those modes. The tool validates the active
mode before submission and reports supported modes in `action=list`.
Providers can support any subset of those modes. The tool validates the
active mode before submission and reports supported modes in `action=list`.
## Quick start
1. Set an API key for any supported provider:
<Steps>
<Step title="Configure auth">
Set an API key for any supported provider:
```bash
export GEMINI_API_KEY="your-key"
```
```bash
export GEMINI_API_KEY="your-key"
```
2. Optionally pin a default model:
</Step>
<Step title="Pick a default model (optional)">
```bash
openclaw config set agents.defaults.videoGenerationModel.primary "google/veo-3.1-fast-generate-preview"
```
</Step>
<Step title="Ask the agent">
> Generate a 5-second cinematic video of a friendly lobster surfing at sunset.
```bash
openclaw config set agents.defaults.videoGenerationModel.primary "google/veo-3.1-fast-generate-preview"
```
The agent calls `video_generate` automatically. No tool allowlisting
is needed.
3. Ask the agent:
</Step>
</Steps>
> Generate a 5-second cinematic video of a friendly lobster surfing at sunset.
## How async generation works
The agent calls `video_generate` automatically. No tool allowlisting is needed.
Video generation is asynchronous. When the agent calls `video_generate` in a
session:
## What happens when you generate a video
Video generation is asynchronous. When the agent calls `video_generate` in a session:
1. OpenClaw submits the request to the provider and immediately returns a task ID.
1. OpenClaw submits the request to the provider and immediately returns a task ID.
2. The provider processes the job in the background (typically 30 seconds to 5 minutes depending on the provider and resolution).
3. When the video is ready, OpenClaw wakes the same session with an internal completion event.
4. The agent posts the finished video back into the original conversation.
While a job is in flight, duplicate `video_generate` calls in the same session return the current task status instead of starting another generation. Use `openclaw tasks list` or `openclaw tasks show <taskId>` to check progress from the CLI.
While a job is in flight, duplicate `video_generate` calls in the same
session return the current task status instead of starting another
generation. Use `openclaw tasks list` or `openclaw tasks show <taskId>` to
check progress from the CLI.
Outside of session-backed agent runs (for example, direct tool invocations), the tool falls back to inline generation and returns the final media path in the same turn.
Outside of session-backed agent runs (for example, direct tool invocations),
the tool falls back to inline generation and returns the final media path
in the same turn.
Generated video files are saved under OpenClaw-managed media storage when the
provider returns bytes. The default generated-video save cap follows the video
media limit, and `agents.defaults.mediaMaxMb` raises it for larger renders.
When a provider also returns a hosted output URL, OpenClaw can deliver that URL
instead of failing the task if local persistence rejects an oversized file.
Generated video files are saved under OpenClaw-managed media storage when
the provider returns bytes. The default generated-video save cap follows
the video media limit, and `agents.defaults.mediaMaxMb` raises it for
larger renders. When a provider also returns a hosted output URL, OpenClaw
can deliver that URL instead of failing the task if local persistence
rejects an oversized file.
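As a sketch, raising the save cap for larger renders looks like this in config — the `agents.defaults.mediaMaxMb` key path comes from this page; the value is illustrative:

```json5
{
  agents: {
    defaults: {
      // illustrative value: raise the generated-video save cap to 512 MB
      mediaMaxMb: 512
    }
  }
}
```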
### Task lifecycle
Each `video_generate` request moves through four states:
1. **queued** -- task created, waiting for the provider to accept it.
2. **running** -- provider is processing (typically 30 seconds to 5 minutes depending on provider and resolution).
3. **succeeded** -- video ready; the agent wakes and posts it to the conversation.
4. **failed** -- provider error or timeout; the agent wakes with error details.
| State | Meaning |
| ----------- | ------------------------------------------------------------------------------------------------ |
| `queued` | Task created, waiting for the provider to accept it. |
| `running` | Provider is processing (typically 30 seconds to 5 minutes depending on provider and resolution). |
| `succeeded` | Video ready; the agent wakes and posts it to the conversation. |
| `failed` | Provider error or timeout; the agent wakes with error details. |
Check status from the CLI:
@@ -78,92 +97,112 @@ openclaw tasks show <taskId>
openclaw tasks cancel <taskId>
```
Duplicate prevention: if a video task is already `queued` or `running` for the current session, `video_generate` returns the existing task status instead of starting a new one. Use `action: "status"` to check explicitly without triggering a new generation.
If a video task is already `queued` or `running` for the current session,
`video_generate` returns the existing task status instead of starting a new
one. Use `action: "status"` to check explicitly without triggering a new
generation.
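A minimal status-only call, as the agent would issue it (argument shape per the Tool parameters section; no new generation is started):

```json5
// illustrative video_generate arguments for an explicit status check
{
  action: "status"
}
```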
## Supported providers
| Provider | Default model | Text | Image ref | Video ref | API key |
| --------------------- | ------------------------------- | ---- | ---------------------------------------------------- | ----------------------------------------------- | ---------------------------------------- |
| Alibaba | `wan2.6-t2v` | Yes | Yes (remote URL) | Yes (remote URL) | `MODELSTUDIO_API_KEY` |
| BytePlus (1.0) | `seedance-1-0-pro-250528` | Yes | Up to 2 images (I2V models only; first + last frame) | No | `BYTEPLUS_API_KEY` |
| BytePlus Seedance 1.5 | `seedance-1-5-pro-251215` | Yes | Up to 2 images (first + last frame via role) | No | `BYTEPLUS_API_KEY` |
| BytePlus Seedance 2.0 | `dreamina-seedance-2-0-260128` | Yes | Up to 9 reference images | Up to 3 videos | `BYTEPLUS_API_KEY` |
| ComfyUI | `workflow` | Yes | 1 image | No | `COMFY_API_KEY` or `COMFY_CLOUD_API_KEY` |
| fal | `fal-ai/minimax/video-01-live` | Yes | 1 image; up to 9 with Seedance reference-to-video | Up to 3 videos with Seedance reference-to-video | `FAL_KEY` |
| Google | `veo-3.1-fast-generate-preview` | Yes | 1 image | 1 video | `GEMINI_API_KEY` |
| MiniMax | `MiniMax-Hailuo-2.3` | Yes | 1 image | No | `MINIMAX_API_KEY` or MiniMax OAuth |
| OpenAI | `sora-2` | Yes | 1 image | 1 video | `OPENAI_API_KEY` |
| Qwen | `wan2.6-t2v` | Yes | Yes (remote URL) | Yes (remote URL) | `QWEN_API_KEY` |
| Runway | `gen4.5` | Yes | 1 image | 1 video | `RUNWAYML_API_SECRET` |
| Together | `Wan-AI/Wan2.2-T2V-A14B` | Yes | 1 image | No | `TOGETHER_API_KEY` |
| Vydra | `veo3` | Yes | 1 image (`kling`) | No | `VYDRA_API_KEY` |
| xAI | `grok-imagine-video` | Yes | 1 first-frame image or up to 7 `reference_image`s | 1 video | `XAI_API_KEY` |
| Provider | Default model | Text | Image ref | Video ref | Auth |
| --------------------- | ------------------------------- | :--: | ---------------------------------------------------- | ----------------------------------------------- | ---------------------------------------- |
| Alibaba               | `wan2.6-t2v`                    |  ✓   | Yes (remote URL)                                     | Yes (remote URL)                                | `MODELSTUDIO_API_KEY`                    |
| BytePlus (1.0)        | `seedance-1-0-pro-250528`       |  ✓   | Up to 2 images (I2V models only; first + last frame) | —                                               | `BYTEPLUS_API_KEY`                       |
| BytePlus Seedance 1.5 | `seedance-1-5-pro-251215`       |  ✓   | Up to 2 images (first + last frame via role)         | —                                               | `BYTEPLUS_API_KEY`                       |
| BytePlus Seedance 2.0 | `dreamina-seedance-2-0-260128`  |  ✓   | Up to 9 reference images                             | Up to 3 videos                                  | `BYTEPLUS_API_KEY`                       |
| ComfyUI               | `workflow`                      |  ✓   | 1 image                                              | —                                               | `COMFY_API_KEY` or `COMFY_CLOUD_API_KEY` |
| fal                   | `fal-ai/minimax/video-01-live`  |  ✓   | 1 image; up to 9 with Seedance reference-to-video    | Up to 3 videos with Seedance reference-to-video | `FAL_KEY`                                |
| Google                | `veo-3.1-fast-generate-preview` |  ✓   | 1 image                                              | 1 video                                         | `GEMINI_API_KEY`                         |
| MiniMax               | `MiniMax-Hailuo-2.3`            |  ✓   | 1 image                                              | —                                               | `MINIMAX_API_KEY` or MiniMax OAuth       |
| OpenAI                | `sora-2`                        |  ✓   | 1 image                                              | 1 video                                         | `OPENAI_API_KEY`                         |
| Qwen                  | `wan2.6-t2v`                    |  ✓   | Yes (remote URL)                                     | Yes (remote URL)                                | `QWEN_API_KEY`                           |
| Runway                | `gen4.5`                        |  ✓   | 1 image                                              | 1 video                                         | `RUNWAYML_API_SECRET`                    |
| Together              | `Wan-AI/Wan2.2-T2V-A14B`        |  ✓   | 1 image                                              | —                                               | `TOGETHER_API_KEY`                       |
| Vydra                 | `veo3`                          |  ✓   | 1 image (`kling`)                                    | —                                               | `VYDRA_API_KEY`                          |
| xAI                   | `grok-imagine-video`            |  ✓   | 1 first-frame image or up to 7 `reference_image`s    | 1 video                                         | `XAI_API_KEY`                            |
Some providers accept additional or alternate API key env vars. See individual [provider pages](#related) for details.
Some providers accept additional or alternate API key env vars. See
individual [provider pages](#related) for details.
Run `video_generate action=list` to inspect available providers, models, and
runtime modes at runtime.
### Declared capability matrix
### Capability matrix
This is the explicit mode contract used by `video_generate`, contract tests,
and the shared live sweep.
The explicit mode contract used by `video_generate`, contract tests, and
the shared live sweep:
| Provider | `generate` | `imageToVideo` | `videoToVideo` | Shared live lanes today |
| -------- | ---------- | -------------- | -------------- | ---------------------------------------------------------------------------------------------------------------------------------------- |
| Alibaba | Yes | Yes | Yes | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider needs remote `http(s)` video URLs |
| BytePlus | Yes | Yes | No | `generate`, `imageToVideo` |
| ComfyUI | Yes | Yes | No | Not in the shared sweep; workflow-specific coverage lives with Comfy tests |
| fal | Yes | Yes | Yes | `generate`, `imageToVideo`; `videoToVideo` only when using Seedance reference-to-video |
| Google | Yes | Yes | Yes | `generate`, `imageToVideo`; shared `videoToVideo` skipped because the current buffer-backed Gemini/Veo sweep does not accept that input |
| MiniMax | Yes | Yes | No | `generate`, `imageToVideo` |
| OpenAI | Yes | Yes | Yes | `generate`, `imageToVideo`; shared `videoToVideo` skipped because this org/input path currently needs provider-side inpaint/remix access |
| Qwen | Yes | Yes | Yes | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider needs remote `http(s)` video URLs |
| Runway | Yes | Yes | Yes | `generate`, `imageToVideo`; `videoToVideo` runs only when the selected model is `runway/gen4_aleph` |
| Together | Yes | Yes | No | `generate`, `imageToVideo` |
| Vydra | Yes | Yes | No | `generate`; shared `imageToVideo` skipped because bundled `veo3` is text-only and bundled `kling` requires a remote image URL |
| xAI | Yes | Yes | Yes | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider currently needs a remote MP4 URL |
| -------- | :--------: | :------------: | :------------: | ---------------------------------------------------------------------------------------------------------------------------------------- |
| Alibaba  |     ✓      |       ✓        |       ✓        | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider needs remote `http(s)` video URLs                                 |
| BytePlus |     ✓      |       ✓        |       —        | `generate`, `imageToVideo`                                                                                                                 |
| ComfyUI  |     ✓      |       ✓        |       —        | Not in the shared sweep; workflow-specific coverage lives with Comfy tests                                                                 |
| fal      |     ✓      |       ✓        |       ✓        | `generate`, `imageToVideo`; `videoToVideo` only when using Seedance reference-to-video                                                     |
| Google   |     ✓      |       ✓        |       ✓        | `generate`, `imageToVideo`; shared `videoToVideo` skipped because the current buffer-backed Gemini/Veo sweep does not accept that input    |
| MiniMax  |     ✓      |       ✓        |       —        | `generate`, `imageToVideo`                                                                                                                 |
| OpenAI   |     ✓      |       ✓        |       ✓        | `generate`, `imageToVideo`; shared `videoToVideo` skipped because this org/input path currently needs provider-side inpaint/remix access   |
| Qwen     |     ✓      |       ✓        |       ✓        | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider needs remote `http(s)` video URLs                                 |
| Runway   |     ✓      |       ✓        |       ✓        | `generate`, `imageToVideo`; `videoToVideo` runs only when the selected model is `runway/gen4_aleph`                                        |
| Together |     ✓      |       ✓        |       —        | `generate`, `imageToVideo`                                                                                                                 |
| Vydra    |     ✓      |       ✓        |       —        | `generate`; shared `imageToVideo` skipped because bundled `veo3` is text-only and bundled `kling` requires a remote image URL              |
| xAI      |     ✓      |       ✓        |       ✓        | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider currently needs a remote MP4 URL                                  |
## Tool parameters
### Required
| Parameter | Type | Description |
| --------- | ------ | ----------------------------------------------------------------------------- |
| `prompt` | string | Text description of the video to generate (required for `action: "generate"`) |
<ParamField path="prompt" type="string" required>
Text description of the video to generate. Required for `action: "generate"`.
</ParamField>
### Content inputs
| Parameter | Type | Description |
| ------------ | -------- | -------------------------------------------------------------------------------------------------------------------------------------- |
| `image` | string | Single reference image (path or URL) |
| `images` | string[] | Multiple reference images (up to 9) |
| `imageRoles` | string[] | Optional per-position role hints parallel to the combined image list. Canonical values: `first_frame`, `last_frame`, `reference_image` |
| `video` | string | Single reference video (path or URL) |
| `videos` | string[] | Multiple reference videos (up to 4) |
| `videoRoles` | string[] | Optional per-position role hints parallel to the combined video list. Canonical value: `reference_video` |
| `audioRef` | string | Single reference audio (path or URL). Used for e.g. background music or voice reference when the provider supports audio inputs |
| `audioRefs` | string[] | Multiple reference audios (up to 3) |
| `audioRoles` | string[] | Optional per-position role hints parallel to the combined audio list. Canonical value: `reference_audio` |
<ParamField path="image" type="string">Single reference image (path or URL).</ParamField>
<ParamField path="images" type="string[]">Multiple reference images (up to 9).</ParamField>
<ParamField path="imageRoles" type="string[]">
Optional per-position role hints parallel to the combined image list.
Canonical values: `first_frame`, `last_frame`, `reference_image`.
</ParamField>
<ParamField path="video" type="string">Single reference video (path or URL).</ParamField>
<ParamField path="videos" type="string[]">Multiple reference videos (up to 4).</ParamField>
<ParamField path="videoRoles" type="string[]">
Optional per-position role hints parallel to the combined video list.
Canonical value: `reference_video`.
</ParamField>
<ParamField path="audioRef" type="string">
Single reference audio (path or URL). Used for background music or voice
reference when the provider supports audio inputs.
</ParamField>
<ParamField path="audioRefs" type="string[]">Multiple reference audios (up to 3).</ParamField>
<ParamField path="audioRoles" type="string[]">
Optional per-position role hints parallel to the combined audio list.
Canonical value: `reference_audio`.
</ParamField>
<Note>
Role hints are forwarded to the provider as-is. Canonical values come from
the `VideoGenerationAssetRole` union but providers may accept additional
role strings. `*Roles` arrays must not have more entries than the
corresponding reference list; off-by-one mistakes fail with a clear error.
Use an empty string to leave a slot unset. For xAI, set every image role to
`reference_image` to use its `reference_images` generation mode; omit the role
or use `first_frame` for single-image image-to-video.
`reference_image` to use its `reference_images` generation mode; omit the
role or use `first_frame` for single-image image-to-video.
</Note>
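For example, a first/last-frame request might pass parallel role hints like this (file names are illustrative; the role values are the canonical ones listed above):

```json5
// illustrative video_generate arguments: two reference images with role hints
{
  prompt: "A friendly lobster surfing at sunset",
  images: ["frame-start.png", "frame-end.png"],
  // parallel to `images`; canonical values from VideoGenerationAssetRole
  imageRoles: ["first_frame", "last_frame"]
}
```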
### Style controls
| Parameter | Type | Description |
| ----------------- | ------- | --------------------------------------------------------------------------------------- |
| `aspectRatio` | string | `1:1`, `2:3`, `3:2`, `3:4`, `4:3`, `4:5`, `5:4`, `9:16`, `16:9`, `21:9`, or `adaptive` |
| `resolution` | string | `480P`, `720P`, `768P`, or `1080P` |
| `durationSeconds` | number | Target duration in seconds (rounded to nearest provider-supported value) |
| `size` | string | Size hint when the provider supports it |
| `audio` | boolean | Enable generated audio in the output when supported. Distinct from `audioRef*` (inputs) |
| `watermark` | boolean | Toggle provider watermarking when supported |
<ParamField path="aspectRatio" type="string">
`1:1`, `2:3`, `3:2`, `3:4`, `4:3`, `4:5`, `5:4`, `9:16`, `16:9`, `21:9`, or `adaptive`.
</ParamField>
<ParamField path="resolution" type="string">`480P`, `720P`, `768P`, or `1080P`.</ParamField>
<ParamField path="durationSeconds" type="number">
Target duration in seconds (rounded to nearest provider-supported value).
</ParamField>
<ParamField path="size" type="string">Size hint when the provider supports it.</ParamField>
<ParamField path="audio" type="boolean">
Enable generated audio in the output when supported. Distinct from `audioRef*` (inputs).
</ParamField>
<ParamField path="watermark" type="boolean">Toggle provider watermarking when supported.</ParamField>
`adaptive` is a provider-specific sentinel: it is forwarded as-is to
providers that declare `adaptive` in their capabilities (e.g. BytePlus
@@ -173,24 +212,39 @@ dimensions). Providers that do not declare it surface the value via
### Advanced
| Parameter | Type | Description |
| ----------------- | ------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `action` | string | `"generate"` (default), `"status"`, or `"list"` |
| `model` | string | Provider/model override (e.g. `runway/gen4.5`) |
| `filename` | string | Output filename hint |
| `timeoutMs` | number | Optional provider request timeout in milliseconds |
| `providerOptions` | object | Provider-specific options as a JSON object (e.g. `{"seed": 42, "draft": true}`). Providers that declare a typed schema validate the keys and types; unknown keys or mismatches skip the candidate during fallback. Providers without a declared schema receive the options as-is. Run `video_generate action=list` to see what each provider accepts |
<ParamField path="action" type='"generate" | "status" | "list"' default="generate">
`"status"` returns the current session task; `"list"` inspects providers.
</ParamField>
<ParamField path="model" type="string">Provider/model override (e.g. `runway/gen4.5`).</ParamField>
<ParamField path="filename" type="string">Output filename hint.</ParamField>
<ParamField path="timeoutMs" type="number">Optional provider request timeout in milliseconds.</ParamField>
<ParamField path="providerOptions" type="object">
Provider-specific options as a JSON object (e.g. `{"seed": 42, "draft": true}`).
Providers that declare a typed schema validate the keys and types; unknown
keys or mismatches skip the candidate during fallback. Providers without a
declared schema receive the options as-is. Run `video_generate action=list`
to see what each provider accepts.
</ParamField>
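As an illustrative sketch, a call overriding the model and passing typed provider options — the `seed` and `draft` keys mirror the BytePlus provider notes, but pairing them with this exact model string is an assumption:

```json5
// illustrative video_generate arguments with a model override
{
  prompt: "A friendly lobster surfing at sunset",
  // provider/model override, same format as `runway/gen4.5`
  model: "byteplus/seedance-1-0-pro-250528",
  // validated against the provider's typed schema when one is declared
  providerOptions: { seed: 42, draft: true }
}
```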
Not all providers support all parameters. OpenClaw already normalizes duration to the closest provider-supported value, and it also remaps translated geometry hints such as size-to-aspect-ratio when a fallback provider exposes a different control surface. Truly unsupported overrides are ignored on a best-effort basis and reported as warnings in the tool result. Hard capability limits (such as too many reference inputs) fail before submission.
<Note>
Not all providers support all parameters. OpenClaw normalizes duration to
the closest provider-supported value, and remaps translated geometry hints
such as size-to-aspect-ratio when a fallback provider exposes a different
control surface. Truly unsupported overrides are ignored on a best-effort
basis and reported as warnings in the tool result. Hard capability limits
(such as too many reference inputs) fail before submission. Tool results
report applied settings; `details.normalization` captures any
requested-to-applied translation.
</Note>
Tool results report the applied settings. When OpenClaw remaps duration or geometry during provider fallback, the returned `durationSeconds`, `size`, `aspectRatio`, and `resolution` values reflect what was submitted, and `details.normalization` captures the requested-to-applied translation.
Reference inputs select the runtime mode:
Reference inputs also select the runtime mode:
- No reference media: `generate`
- Any image reference: `imageToVideo`
- Any video reference: `videoToVideo`
- Reference audio inputs do not change the resolved mode; they apply on top of whatever mode the image/video references select, and only work with providers that declare `maxInputAudios`
- No reference media → `generate`
- Any image reference → `imageToVideo`
- Any video reference → `videoToVideo`
- Reference audio inputs **do not** change the resolved mode; they apply on
top of whatever mode the image/video references select, and only work
with providers that declare `maxInputAudios`.
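Illustrative request shapes and the mode each resolves to (prompts and file names are placeholders):

```json5
// no reference media -> generate
{ prompt: "Waves rolling in at dawn" }

// any image reference -> imageToVideo
{ prompt: "Animate this photo", image: "beach.png" }

// any video reference -> videoToVideo
{ prompt: "Restyle this clip", video: "clip.mp4" }
```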
Mixed image and video references are not a stable shared capability surface.
Prefer one reference type per request.
@@ -198,49 +252,50 @@ Prefer one reference type per request.
#### Fallback and typed options
Some capability checks are applied at the fallback layer rather than the
tool boundary so that a request that exceeds the primary provider's limits
can still run on a capable fallback:
tool boundary, so a request that exceeds the primary provider's limits can
still run on a capable fallback:
- If the active candidate declares no `maxInputAudios` (or declares it as
`0`), it is skipped when the request contains audio references, and the
next candidate is tried.
- If the active candidate's `maxDurationSeconds` is below the requested
`durationSeconds` and the candidate does not declare a
`supportedDurationSeconds` list, it is skipped.
- If the request contains `providerOptions` and the active candidate
explicitly declares a typed `providerOptions` schema, the candidate is
skipped when the supplied keys are not in the schema or the value types do
not match. Providers that have not yet declared a schema receive the
options as-is (backward-compatible pass-through). A provider can
explicitly opt out of all provider options by declaring an empty schema
(`capabilities.providerOptions: {}`), which causes the same skip as a
type mismatch.
- Active candidate declaring no `maxInputAudios` (or `0`) is skipped when
the request contains audio references; next candidate is tried.
- Active candidate's `maxDurationSeconds` below the requested `durationSeconds`
with no declared `supportedDurationSeconds` list → skipped.
- Request contains `providerOptions` and the active candidate explicitly
declares a typed `providerOptions` schema → skipped if supplied keys are
not in the schema or value types do not match. Providers without a
declared schema receive options as-is (backward-compatible
pass-through). A provider can opt out of all provider options by
declaring an empty schema (`capabilities.providerOptions: {}`), which
causes the same skip as a type mismatch.
The first skip reason in a request is logged at `warn` so operators see
when their primary provider was passed over; subsequent skips log at
`debug` to keep long fallback chains quiet. If every candidate is skipped,
the aggregated error includes the skip reason for each.
The first skip reason in a request logs at `warn` so operators see when
their primary provider was passed over; subsequent skips log at `debug` to
keep long fallback chains quiet. If every candidate is skipped, the
aggregated error includes the skip reason for each.
## Actions
- **generate** (default) -- create a video from the given prompt and optional reference inputs.
- **status** -- check the state of the in-flight video task for the current session without starting another generation.
- **list** -- show available providers, models, and their capabilities.
| Action | What it does |
| ---------- | -------------------------------------------------------------------------------------------------------- |
| `generate` | Default. Create a video from the given prompt and optional reference inputs. |
| `status` | Check the state of the in-flight video task for the current session without starting another generation. |
| `list` | Show available providers, models, and their capabilities. |
## Model selection
When generating a video, OpenClaw resolves the model in this order:
OpenClaw resolves the model in this order:
1. **`model` tool parameter** -- if the agent specifies one in the call.
2. **`videoGenerationModel.primary`** -- from config.
3. **`videoGenerationModel.fallbacks`** -- tried in order.
4. **Auto-detection** -- uses providers that have valid auth, starting with the current default provider, then remaining providers in alphabetical order.
1. **`model` tool parameter** if the agent specifies one in the call.
2. **`videoGenerationModel.primary`** from config.
3. **`videoGenerationModel.fallbacks`** in order.
4. **Auto-detection** — providers that have valid auth, starting with the
current default provider, then remaining providers in alphabetical
order.
If a provider fails, the next candidate is tried automatically. If all candidates fail, the error includes details from each attempt.
If a provider fails, the next candidate is tried automatically. If all
candidates fail, the error includes details from each attempt.
Set `agents.defaults.mediaGenerationAutoProviderFallback: false` if you want
video generation to use only the explicit `model`, `primary`, and `fallbacks`
entries.
Set `agents.defaults.mediaGenerationAutoProviderFallback: false` to use
only the explicit `model`, `primary`, and `fallbacks` entries.
```json5
{
@@ -259,84 +314,108 @@ entries.
<AccordionGroup>
<Accordion title="Alibaba">
Uses DashScope / Model Studio async endpoint. Reference images and videos must be remote `http(s)` URLs.
Uses DashScope / Model Studio async endpoint. Reference images and
videos must be remote `http(s)` URLs.
</Accordion>
<Accordion title="BytePlus (1.0)">
Provider id: `byteplus`.
Models: `seedance-1-0-pro-250528` (default), `seedance-1-0-pro-t2v-250528`, `seedance-1-0-pro-fast-251015`, `seedance-1-0-lite-t2v-250428`, `seedance-1-0-lite-i2v-250428`.
Models: `seedance-1-0-pro-250528` (default),
`seedance-1-0-pro-t2v-250528`, `seedance-1-0-pro-fast-251015`,
`seedance-1-0-lite-t2v-250428`, `seedance-1-0-lite-i2v-250428`.
T2V models (`*-t2v-*`) do not accept image inputs; I2V models and general `*-pro-*` models support a single reference image (first frame). Pass the image positionally or set `role: "first_frame"`. T2V model IDs are automatically switched to the corresponding I2V variant when an image is provided.
T2V models (`*-t2v-*`) do not accept image inputs; I2V models and
general `*-pro-*` models support a single reference image (first
frame). Pass the image positionally or set `role: "first_frame"`.
T2V model IDs are automatically switched to the corresponding I2V
variant when an image is provided.
Supported `providerOptions` keys: `seed` (number), `draft` (boolean — forces 480p), `camera_fixed` (boolean).
Supported `providerOptions` keys: `seed` (number), `draft` (boolean —
forces 480p), `camera_fixed` (boolean).
</Accordion>
<Accordion title="BytePlus Seedance 1.5">
Requires the [`@openclaw/byteplus-modelark`](https://www.npmjs.com/package/@openclaw/byteplus-modelark) plugin. Provider id: `byteplus-seedance15`. Model: `seedance-1-5-pro-251215`.
Requires the [`@openclaw/byteplus-modelark`](https://www.npmjs.com/package/@openclaw/byteplus-modelark)
plugin. Provider id: `byteplus-seedance15`. Model:
`seedance-1-5-pro-251215`.
Uses the unified `content[]` API. Supports at most 2 input images (`first_frame` + `last_frame`). All inputs must be remote `https://` URLs. Set `role: "first_frame"` / `"last_frame"` on each image, or pass images positionally.
Uses the unified `content[]` API. Supports at most 2 input images
(`first_frame` + `last_frame`). All inputs must be remote `https://`
URLs. Set `role: "first_frame"` / `"last_frame"` on each image, or
pass images positionally.
`aspectRatio: "adaptive"` auto-detects ratio from the input image. `audio: true` maps to `generate_audio`. `providerOptions.seed` (number) is forwarded.
</Accordion>
<Accordion title="BytePlus Seedance 2.0">
Requires the [`@openclaw/byteplus-modelark`](https://www.npmjs.com/package/@openclaw/byteplus-modelark) plugin. Provider id: `byteplus-seedance2`. Models: `dreamina-seedance-2-0-260128`, `dreamina-seedance-2-0-fast-260128`.
Uses the unified `content[]` API. Supports up to 9 reference images, 3 reference videos, and 3 reference audios. All inputs must be remote `https://` URLs. Set `role` on each asset — supported values: `"first_frame"`, `"last_frame"`, `"reference_image"`, `"reference_video"`, `"reference_audio"`.
`aspectRatio: "adaptive"` auto-detects ratio from the input image. `audio: true` maps to `generate_audio`. `providerOptions.seed` (number) is forwarded.
</Accordion>
<Accordion title="ComfyUI">
Workflow-driven local or cloud execution. Supports text-to-video and image-to-video through the configured graph.
</Accordion>
<Accordion title="fal">
Uses a queue-backed flow for long-running jobs. Most fal video models accept a single image reference. Seedance 2.0 reference-to-video models accept up to 9 images, 3 videos, and 3 audio references, with at most 12 total reference files.
</Accordion>
<Accordion title="Google (Gemini / Veo)">
Supports one image or one video reference.
</Accordion>
<Accordion title="MiniMax">
Single image reference only.
</Accordion>
<Accordion title="OpenAI">
Only `size` override is forwarded. Other style overrides (`aspectRatio`, `resolution`, `audio`, `watermark`) are ignored with a warning.
</Accordion>
<Accordion title="Qwen">
Same DashScope backend as Alibaba. Reference inputs must be remote `http(s)` URLs; local files are rejected upfront.
</Accordion>
<Accordion title="Runway">
Supports local files via data URIs. Video-to-video requires `runway/gen4_aleph`. Text-only runs expose `16:9` and `9:16` aspect ratios.
</Accordion>
<Accordion title="Together">
Single image reference only.
</Accordion>
<Accordion title="Vydra">
Uses `https://www.vydra.ai/api/v1` directly to avoid auth-dropping redirects. `veo3` is bundled as text-to-video only; `kling` requires a remote image URL.
</Accordion>
<Accordion title="xAI">
Supports text-to-video, single first-frame image-to-video, up to 7 `reference_image` inputs through xAI `reference_images`, and remote video edit/extend flows.
</Accordion>
</AccordionGroup>
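Putting the `content[]` roles together, a Seedance 2.0 first/last-frame request could look like the sketch below. The field names follow the parameter reference on this page, but the exact item shape and the URLs are illustrative, not a verbatim tool call:

```json5
// Illustrative video_generate arguments; URLs are placeholders, and the
// item shape is a sketch -- check the provider accordions for hard limits.
{
  prompt: "Slow dolly-in on a lighthouse at dusk",
  model: "byteplus-seedance2/dreamina-seedance-2-0-260128",
  content: [
    { type: "image", url: "https://example.com/first.png", role: "first_frame" },
    { type: "image", url: "https://example.com/last.png", role: "last_frame" },
  ],
  aspectRatio: "adaptive", // auto-detects ratio from the input image
  audio: true,             // maps to generate_audio on BytePlus
  providerOptions: { seed: 42 },
}
```

Remember that BytePlus Seedance inputs must be remote `https://` URLs; local files are rejected upfront.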
## Provider capability modes
The shared video-generation contract supports mode-specific capabilities
instead of only flat aggregate limits. New provider implementations
should prefer explicit mode blocks:
```typescript
capabilities: {
  // Mode-specific blocks instead of flat aggregate limits.
  generate: { enabled: true },
  imageToVideo: {
    enabled: true,
    maxInputImages: 1,
  },
  videoToVideo: { enabled: false },
}
```
Flat aggregate fields such as `maxInputImages` and `maxInputVideos` are
**not** enough to advertise transform-mode support. Providers should
declare `generate`, `imageToVideo`, and `videoToVideo` explicitly so live
tests, contract tests, and the shared `video_generate` tool can validate
mode support deterministically.
When one model in a provider has wider reference-input support than the
rest, use `maxInputImagesByModel`, `maxInputVideosByModel`, or
`maxInputAudiosByModel` instead of raising the mode-wide limit.
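For instance, a per-model override can be declared as in the sketch below. The capability field names come from this page; the model id is hypothetical:

```typescript
// Sketch of a capability declaration where only the hypothetical
// "example-pro" model accepts up to 9 reference images; every other
// model keeps the mode-wide limit of 1.
const capabilities = {
  generate: { enabled: true },
  imageToVideo: {
    enabled: true,
    maxInputImages: 1,
    maxInputImagesByModel: { "example-pro": 9 },
  },
  videoToVideo: { enabled: false },
};
```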
## Live tests
This live file loads missing provider env vars from `~/.profile`, prefers
live/env API keys ahead of stored auth profiles by default, and runs a
release-safe smoke by default:
- `generate` for every non-FAL provider in the sweep.
- One-second lobster prompt.
- Per-provider operation cap from
`OPENCLAW_LIVE_VIDEO_GENERATION_TIMEOUT_MS` (`180000` by default).
FAL is opt-in because provider-side queue latency can dominate release time:
```bash
pnpm test:live:media video --video-providers fal
```
Set `OPENCLAW_LIVE_VIDEO_GENERATION_FULL_MODES=1` to also run declared transform
modes the shared sweep can exercise safely with local media:
- `imageToVideo` when `capabilities.imageToVideo.enabled`.
- `videoToVideo` when `capabilities.videoToVideo.enabled` and the
provider/model accepts buffer-backed local video input in the shared
sweep.
Today the shared `videoToVideo` live lane covers `runway` only when you
select `runway/gen4_aleph`.
## Configuration
Set the default video-generation model in your OpenClaw config:
```json5
{
  agents: {
    defaults: {
      videoGenerationModel: {
        primary: "qwen/wan2.6-t2v"
      }
    }
  }
}
```

You can also set it from the CLI with `openclaw config set agents.defaults.videoGenerationModel.primary`.
## Related
- [Alibaba Model Studio](/providers/alibaba)
- [Background tasks](/automation/tasks) — task tracking for async video generation
- [BytePlus](/concepts/model-providers#byteplus-international)
- [ComfyUI](/providers/comfy)
- [Configuration reference](/gateway/config-agents#agent-defaults)
- [fal](/providers/fal)
- [Google (Gemini)](/providers/google)
- [MiniMax](/providers/minimax)
- [Models](/concepts/models)
- [OpenAI](/providers/openai)
- [Qwen](/providers/qwen)
- [Runway](/providers/runway)
- [Together AI](/providers/together)
- [Tools overview](/tools)
- [Vydra](/providers/vydra)
- [xAI](/providers/xai)