diff --git a/CHANGELOG.md b/CHANGELOG.md
index cbc4613df8b..01c3cfebc66 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -66,6 +66,7 @@ Docs: https://docs.openclaw.ai
### Fixes
+- Video generation: wait up to 20 minutes for slow fal/MiniMax queue-backed jobs, stop forwarding unsupported Google Veo generated-audio options, and normalize MiniMax `720P` requests to its supported `768P` resolution, reporting the usual override warning and details instead of failing the fallback attempt.
- Update/restart: probe managed Gateway restarts with the service environment and add a Docker product lane that exercises candidate-owned `openclaw update --yes --json` restarts, so SecretRef-backed local gateway auth cannot regress behind mocked restart checks. Thanks @vincentkoc.
- Webhooks/Gmail/Windows: resolve `gcloud`, `gog`, and `tailscale` PATH/PATHEXT shims before setup and watcher spawns, using the Windows-safe `.cmd` wrapper for long-lived `gog serve` processes. (#74881, fixes #54470) Thanks @Angfr95.
- Video generation: accept provider-specific aspect-ratio and resolution hints at the tool boundary, normalize `720P` to MiniMax's supported `768P`, and stop sending Google `generateAudio` on Gemini video requests so provider fallback can recover from model-specific parameter differences. Thanks @vincentkoc.
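The two request-shaping fixes in these entries (resolution normalization and dropping `generateAudio`) can be pictured as one pass that runs before a request is submitted. This is a minimal sketch, not OpenClaw's implementation. `prepareVideoRequest`, `VideoRequest`, and both capability tables are hypothetical names. The supported MiniMax values (`768P`, `1080P`) and the Gemini-path rejection of `generateAudio` come from the docs in this change. "Closest" is assumed to mean the nearest vertical resolution.

```typescript
// Hypothetical request shape; not OpenClaw's actual types.
interface VideoRequest {
  provider: string;
  resolution?: string; // e.g. "720P"
  generateAudio?: boolean;
}

interface PreparedRequest {
  request: VideoRequest;
  warnings: string[];
}

// Assumed capability table: per the docs, MiniMax accepts 768P and 1080P.
const SUPPORTED_RESOLUTIONS: Record<string, string[]> = {
  minimax: ["768P", "1080P"],
};

// Assumed list of providers whose API path rejects generateAudio (Gemini API / Veo).
const REJECTS_GENERATE_AUDIO: string[] = ["google"];

// Parse "720P" into its vertical pixel count (720).
function parseHeight(resolution: string): number {
  const match = /^(\d+)P$/i.exec(resolution);
  if (!match) throw new Error(`unrecognized resolution: ${resolution}`);
  return Number(match[1]);
}

function prepareVideoRequest(req: VideoRequest): PreparedRequest {
  const warnings: string[] = [];
  const out: VideoRequest = { ...req }; // copy so the caller's object is untouched

  const supported = SUPPORTED_RESOLUTIONS[req.provider];
  if (supported && req.resolution && supported.indexOf(req.resolution.toUpperCase()) === -1) {
    const wanted = parseHeight(req.resolution);
    // Pick the supported value whose height is nearest to the requested one.
    let best = supported[0];
    for (const candidate of supported) {
      if (Math.abs(parseHeight(candidate) - wanted) < Math.abs(parseHeight(best) - wanted)) {
        best = candidate;
      }
    }
    warnings.push(`resolution ${req.resolution} is not supported by ${req.provider}; using ${best}`);
    out.resolution = best;
  }

  // Drop the option instead of letting the provider reject the whole request.
  if (REJECTS_GENERATE_AUDIO.indexOf(req.provider) !== -1 && out.generateAudio !== undefined) {
    warnings.push(`generateAudio is not supported by ${req.provider}; ignoring it`);
    delete out.generateAudio;
  }

  return { request: out, warnings };
}
```

Returning warnings alongside the adjusted request, rather than throwing, is what lets the override be reported to the user while the request still goes through.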
diff --git a/docs/tools/media-overview.md b/docs/tools/media-overview.md
index 34fde3f7926..b1bea44b68f 100644
--- a/docs/tools/media-overview.md
+++ b/docs/tools/media-overview.md
@@ -80,13 +80,13 @@ reply model.
## Async vs synchronous
-| Capability | Mode | Why |
-| --------------- | ------------ | ------------------------------------------------------------------ |
-| Image | Synchronous | Provider responses return in seconds; completes inline with reply. |
-| Text-to-speech | Synchronous | Provider responses return in seconds; attached to the reply audio. |
-| Video | Asynchronous | Provider processing takes 30 s to several minutes. |
-| Music (shared) | Asynchronous | Same provider-processing characteristic as video. |
-| Music (ComfyUI) | Synchronous | Local workflow runs inline against the configured ComfyUI server. |
+| Capability | Mode | Why |
+| --------------- | ------------ | ---------------------------------------------------------------------------------------------------- |
+| Image | Synchronous | Provider responses return in seconds; completes inline with reply. |
+| Text-to-speech | Synchronous | Provider responses return in seconds; attached to the reply audio. |
+| Video | Asynchronous | Provider processing takes 30 s to several minutes; slow queues can run up to the configured timeout. |
+| Music (shared) | Asynchronous | Same provider-processing characteristic as video. |
+| Music (ComfyUI) | Synchronous | Local workflow runs inline against the configured ComfyUI server. |
For async tools, OpenClaw submits the request to the provider, returns a task
id immediately, and tracks the job in the task ledger. The agent continues
diff --git a/docs/tools/video-generation.md b/docs/tools/video-generation.md
index a70299e81e6..a4a73d566f8 100644
--- a/docs/tools/video-generation.md
+++ b/docs/tools/video-generation.md
@@ -60,7 +60,7 @@ Video generation is asynchronous. When the agent calls `video_generate` in a
session:
1. OpenClaw submits the request to the provider and immediately returns a task id.
-2. The provider processes the job in the background (typically 30 seconds to 5 minutes depending on the provider and resolution).
+2. The provider processes the job in the background (typically 30 seconds to several minutes depending on the provider and resolution; slow queue-backed providers can run up to the configured timeout).
3. When the video is ready, OpenClaw wakes the same session with an internal completion event.
4. The agent tells the user and attaches the finished video. In group/channel
chats that use message-tool-only visible delivery, the agent relays the
@@ -84,12 +84,12 @@ rejects an oversized file.
### Task lifecycle
-| State | Meaning |
-| ----------- | ------------------------------------------------------------------------------------------------ |
-| `queued` | Task created, waiting for the provider to accept it. |
-| `running` | Provider is processing (typically 30 seconds to 5 minutes depending on provider and resolution). |
-| `succeeded` | Video ready; the agent wakes and posts it to the conversation. |
-| `failed` | Provider error or timeout; the agent wakes with error details. |
+| State | Meaning |
+| ----------- | ------------------------------------------------------------------------------------------------------ |
+| `queued` | Task created, waiting for the provider to accept it. |
+| `running` | Provider is processing (typically 30 seconds to several minutes depending on provider and resolution). |
+| `succeeded` | Video ready; the agent wakes and posts it to the conversation. |
+| `failed` | Provider error or timeout; the agent wakes with error details. |
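The `queued` and `running` states, together with the 20-minute default wait for queue-backed fal jobs, can be pictured as a poll loop with a deadline. This is a minimal sketch under stated assumptions, not OpenClaw's code. `pollQueueJob`, `JobStatus`, the 5-second poll interval, and the injectable clock are hypothetical. Only the 20-minute default comes from the docs.

```typescript
// Hypothetical status shape returned by a queue-backed provider.
type JobStatus =
  | { state: "queued" }
  | { state: "running" }
  | { state: "succeeded"; url: string }
  | { state: "failed"; error: string };

interface PollOptions {
  timeoutMs?: number; // defaults to 20 minutes, mirroring the documented fal default
  intervalMs?: number; // delay between status checks (assumed value)
  now?: () => number; // injectable clock, useful for tests
  sleep?: (ms: number) => Promise<void>;
}

const DEFAULT_TIMEOUT_MS = 20 * 60 * 1000;

async function pollQueueJob(
  checkStatus: () => Promise<JobStatus>,
  opts: PollOptions = {},
): Promise<JobStatus> {
  const timeoutMs = opts.timeoutMs ?? DEFAULT_TIMEOUT_MS;
  const intervalMs = opts.intervalMs ?? 5000;
  const now = opts.now ?? (() => Date.now());
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));

  const deadline = now() + timeoutMs;
  for (;;) {
    const status = await checkStatus();
    // Terminal states end polling immediately.
    if (status.state === "succeeded" || status.state === "failed") return status;
    // Still queued or running: report a timeout failure once the deadline has passed.
    if (now() >= deadline) {
      return { state: "failed", error: `timed out after ${timeoutMs} ms (last state: ${status.state})` };
    }
    await sleep(intervalMs);
  }
}
```

Converting the timeout into a `failed` status, rather than throwing, matches the table's description of `failed` as covering both provider errors and timeouts.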
Check status from the CLI:
@@ -223,7 +223,7 @@ dimensions). Providers that do not declare it surface the value via
Provider/model override (e.g. `runway/gen4.5`).
Output filename hint.
-Optional provider request timeout in milliseconds.
+Optional provider operation timeout in milliseconds.
Provider-specific options as a JSON object (e.g. `{"seed": 42, "draft": true}`).
Providers that declare a typed schema validate the keys and types; unknown
@@ -377,16 +377,22 @@ only the explicit `model`, `primary`, and `fallbacks` entries.
image-to-video through the configured graph.
- Uses a queue-backed flow for long-running jobs. Most fal video models
+  Uses a queue-backed flow for long-running jobs. By default, OpenClaw waits
+  up to 20 minutes before treating an in-progress fal queue job as timed out.
+  Most fal video models
accept a single image reference. Seedance 2.0 reference-to-video
models accept up to 9 images, 3 videos, and 3 audio references, with
at most 12 total reference files.
- Supports one image or one video reference.
+  Supports one image or one video reference. On the Gemini API path,
+  generated-audio requests are ignored with a warning because that API
+  rejects the `generateAudio` parameter for current Veo models.
- Single image reference only.
+ Single image reference only. MiniMax accepts `768P` and `1080P`
+ resolutions; requests such as `720P` are normalized to the closest
+ supported value before submission.
Only `size` override is forwarded. Other style overrides