docs: update video generation timeout notes

2026-05-06 05:20:43 +00:00 · 2026-05-05 06:42:46 +01:00
parent a34d4ef9d9
commit 123f7a697d
3 changed files with 25 additions and 18 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -66,6 +66,7 @@ Docs: https://docs.openclaw.ai

 ### Fixes

+- Video generation: wait up to 20 minutes for slow fal/MiniMax queue-backed jobs, stop forwarding unsupported Google Veo generated-audio options, and normalize MiniMax `720P` requests to its supported `768P` resolution with the usual override warning/details instead of failing fallback.
 - Update/restart: probe managed Gateway restarts with the service environment and add a Docker product lane that exercises candidate-owned `openclaw update --yes --json` restarts, so SecretRef-backed local gateway auth cannot regress behind mocked restart checks. Thanks @vincentkoc.
 - Webhooks/Gmail/Windows: resolve `gcloud`, `gog`, and `tailscale` PATH/PATHEXT shims before setup and watcher spawns, using the Windows-safe `.cmd` wrapper for long-lived `gog serve` processes. (#74881, fixes #54470) Thanks @Angfr95.
 - Video generation: accept provider-specific aspect-ratio and resolution hints at the tool boundary, normalize `720P` to MiniMax's supported `768P`, and stop sending Google `generateAudio` on Gemini video requests so provider fallback can recover from model-specific parameter differences. Thanks @vincentkoc.
--- a/docs/tools/media-overview.md
+++ b/docs/tools/media-overview.md
@@ -80,13 +80,13 @@ reply model.

 ## Async vs synchronous

-| Capability      | Mode         | Why                                                                |
-| --------------- | ------------ | ------------------------------------------------------------------ |
-| Image           | Synchronous  | Provider responses return in seconds; completes inline with reply. |
-| Text-to-speech  | Synchronous  | Provider responses return in seconds; attached to the reply audio. |
-| Video           | Asynchronous | Provider processing takes 30 s to several minutes.                 |
-| Music (shared)  | Asynchronous | Same provider-processing characteristic as video.                  |
-| Music (ComfyUI) | Synchronous  | Local workflow runs inline against the configured ComfyUI server.  |
+| Capability      | Mode         | Why                                                                                                  |
+| --------------- | ------------ | ---------------------------------------------------------------------------------------------------- |
+| Image           | Synchronous  | Provider responses return in seconds; completes inline with reply.                                   |
+| Text-to-speech  | Synchronous  | Provider responses return in seconds; attached to the reply audio.                                   |
+| Video           | Asynchronous | Provider processing takes 30 s to several minutes; slow queues can run up to the configured timeout. |
+| Music (shared)  | Asynchronous | Same provider-processing characteristic as video.                                                    |
+| Music (ComfyUI) | Synchronous  | Local workflow runs inline against the configured ComfyUI server.                                    |

 For async tools, OpenClaw submits the request to the provider, returns a task
 id immediately, and tracks the job in the task ledger. The agent continues
--- a/docs/tools/video-generation.md
+++ b/docs/tools/video-generation.md
@@ -60,7 +60,7 @@ Video generation is asynchronous. When the agent calls `video_generate` in a
 session:

 1. OpenClaw submits the request to the provider and immediately returns a task id.
-2. The provider processes the job in the background (typically 30 seconds to 5 minutes depending on the provider and resolution).
+2. The provider processes the job in the background (typically 30 seconds to several minutes depending on the provider and resolution; slow queue-backed providers can run up to the configured timeout).
 3. When the video is ready, OpenClaw wakes the same session with an internal completion event.
 4. The agent tells the user and attaches the finished video. In group/channel
   chats that use message-tool-only visible delivery, the agent relays the
@@ -84,12 +84,12 @@ rejects an oversized file.

 ### Task lifecycle

-| State       | Meaning                                                                                          |
-| ----------- | ------------------------------------------------------------------------------------------------ |
-| `queued`    | Task created, waiting for the provider to accept it.                                             |
-| `running`   | Provider is processing (typically 30 seconds to 5 minutes depending on provider and resolution). |
-| `succeeded` | Video ready; the agent wakes and posts it to the conversation.                                   |
-| `failed`    | Provider error or timeout; the agent wakes with error details.                                   |
+| State       | Meaning                                                                                                |
+| ----------- | ------------------------------------------------------------------------------------------------------ |
+| `queued`    | Task created, waiting for the provider to accept it.                                                   |
+| `running`   | Provider is processing (typically 30 seconds to several minutes depending on provider and resolution). |
+| `succeeded` | Video ready; the agent wakes and posts it to the conversation.                                         |
+| `failed`    | Provider error or timeout; the agent wakes with error details.                                         |

 Check status from the CLI:

@@ -223,7 +223,7 @@ dimensions). Providers that do not declare it surface the value via
 </ParamField>
 <ParamField path="model" type="string">Provider/model override (e.g. `runway/gen4.5`).</ParamField>
 <ParamField path="filename" type="string">Output filename hint.</ParamField>
-<ParamField path="timeoutMs" type="number">Optional provider request timeout in milliseconds.</ParamField>
+<ParamField path="timeoutMs" type="number">Optional provider operation timeout in milliseconds.</ParamField>
 <ParamField path="providerOptions" type="object">
  Provider-specific options as a JSON object (e.g. `{"seed": 42, "draft": true}`).
  Providers that declare a typed schema validate the keys and types; unknown
@@ -377,16 +377,22 @@ only the explicit `model`, `primary`, and `fallbacks` entries.
    image-to-video through the configured graph.
  </Accordion>
  <Accordion title="fal">
-    Uses a queue-backed flow for long-running jobs. Most fal video models
+    Uses a queue-backed flow for long-running jobs. OpenClaw waits up to 20
+    minutes by default before treating an in-progress fal queue job as timed
+    out. Most fal video models
    accept a single image reference. Seedance 2.0 reference-to-video
    models accept up to 9 images, 3 videos, and 3 audio references, with
    at most 12 total reference files.
  </Accordion>
  <Accordion title="Google (Gemini / Veo)">
-    Supports one image or one video reference.
+    Supports one image or one video reference. Generated-audio requests are
+    ignored with a warning on the Gemini API path because that API rejects
+    the `generateAudio` parameter for current Veo video generation.
  </Accordion>
  <Accordion title="MiniMax">
-    Single image reference only.
+    Single image reference only. MiniMax accepts `768P` and `1080P`
+    resolutions; requests such as `720P` are normalized to the closest
+    supported value before submission.
  </Accordion>
  <Accordion title="OpenAI">
    Only `size` override is forwarded. Other style overrides