feat(openrouter): add video generation provider (#72700)

Adds OpenRouter video generation via video_generate, with hardened async polling/download handling, docs, and regression coverage.

Validation:
- pnpm test src/plugins/plugin-lookup-table.test.ts src/secrets/target-registry.fast-path.test.ts src/gateway/server-startup-post-attach.test.ts extensions/openrouter/video-generation-provider.test.ts src/video-generation/live-test-helpers.test.ts src/media-generation/provider-capabilities.contract.test.ts src/agents/pi-embedded-helpers/failover-matches.test.ts src/plugins/manifest-metadata-scan.test.ts src/agents/openai-transport-stream.test.ts src/media-understanding/openai-compatible-audio.test.ts src/agents/schema-normalization-runtime-contract.test.ts src/agents/provider-request-config.test.ts src/plugin-sdk/provider-stream.test.ts src/agents/pi-embedded-runner/run/attempt.spawn-workspace.websocket.test.ts -- --reporter=verbose
- OPENCLAW_LIVE_TEST=1 OPENCLAW_LIVE_TEST_QUIET=0 OPENCLAW_LIVE_VIDEO_GENERATION_MODELS=openrouter/google/veo-3.1-fast pnpm test:live src/video-generation/video-generation.live.test.ts -- --runInBand

Co-authored-by: notamicrodose <gabrielkripalani@me.com>
Committed by Gabriel Kripalani via GitHub on 2026-04-28 11:57:31 +02:00
parent 5915489631
commit 17ef9ef895
19 changed files with 957 additions and 21 deletions


@@ -4,6 +4,7 @@ read_when:
- You want a single API key for many LLMs
- You want to run models via OpenRouter in OpenClaw
- You want to use OpenRouter for image generation
- You want to use OpenRouter for video generation
title: "OpenRouter"
---
@@ -78,6 +79,33 @@ OpenRouter can also back the `image_generate` tool. Use an OpenRouter image mode
OpenClaw sends image requests to OpenRouter's chat completions image API with `modalities: ["image", "text"]`. Gemini image models receive supported `aspectRatio` and `resolution` hints through OpenRouter's `image_config`. Use `agents.defaults.imageGenerationModel.timeoutMs` for slower OpenRouter image models; the `image_generate` tool's per-call `timeoutMs` parameter still wins.
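For reference, raising the image timeout in config might look like the sketch below (the model id is an assumed example; only the `timeoutMs` placement under `agents.defaults.imageGenerationModel` is the point):
```json5
{
  env: { OPENROUTER_API_KEY: "sk-or-..." },
  agents: {
    defaults: {
      imageGenerationModel: {
        primary: "openrouter/google/gemini-2.5-flash-image", // assumed model id, for illustration
        timeoutMs: 120000, // raise for slower OpenRouter image models
      },
    },
  },
}
```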
## Video generation
OpenRouter can also back the `video_generate` tool through its asynchronous `/videos` API. Use an OpenRouter video model under `agents.defaults.videoGenerationModel`:
```json5
{
env: { OPENROUTER_API_KEY: "sk-or-..." },
agents: {
defaults: {
videoGenerationModel: {
primary: "openrouter/google/veo-3.1-fast",
},
},
},
}
```
OpenClaw submits text-to-video and image-to-video jobs to OpenRouter, polls
the returned `polling_url`, and downloads the completed video from
OpenRouter's `unsigned_urls` or the documented job content endpoint.
Reference images are sent as first/last frame images by default; images
tagged with `reference_image` are sent as OpenRouter input references. The
bundled `google/veo-3.1-fast` default advertises the currently supported 4/6/8
second durations, `720P`/`1080P` resolutions, and `16:9`/`9:16` aspect
ratios. Video-to-video is not registered for OpenRouter because the upstream
video generation API currently accepts only text and image references.
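The submit/poll/download flow described above can be sketched roughly as follows. This is a minimal illustration, not OpenClaw's actual internals: the `pollVideoJob` name, the `VideoJob` shape, and the option defaults are all assumptions.

```typescript
// Hypothetical sketch of polling an asynchronous video job until it
// completes or a deadline passes. `fetchStatus` stands in for a GET on
// the job's `polling_url`; field names mirror the doc, not a real SDK.
type VideoJob = {
  status: "queued" | "processing" | "completed" | "failed";
  unsigned_urls?: string[];
};

async function pollVideoJob(
  fetchStatus: () => Promise<VideoJob>,
  { intervalMs = 1000, timeoutMs = 600_000 } = {},
): Promise<VideoJob> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const job = await fetchStatus();
    if (job.status === "completed") return job; // download unsigned_urls next
    if (job.status === "failed") throw new Error("video job failed");
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("video job timed out");
}
```

In practice the configured `timeoutMs` bounds the whole loop, and a completed job yields either `unsigned_urls` or a job content endpoint to download from.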
## Text-to-speech
OpenRouter can also be used as a TTS provider through its OpenAI-compatible


@@ -61,6 +61,7 @@ provider is configured.
| MiniMax | ✓ | ✓ | ✓ | ✓ | | | |
| Mistral | | | | | ✓ | | |
| OpenAI | ✓ | ✓ | | ✓ | ✓ | ✓ | ✓ |
| OpenRouter | ✓ | ✓ | | ✓ | | | ✓ |
| Qwen | | ✓ | | | | | |
| Runway | | ✓ | | | | | |
| SenseAudio | | | | | ✓ | | |


@@ -1,5 +1,5 @@
---
summary: "Generate videos via video_generate from text, image, or video references across 14 provider backends"
summary: "Generate videos via video_generate from text, image, or video references across 16 provider backends"
read_when:
- Generating videos via the agent
- Configuring video-generation providers and models
@@ -9,7 +9,7 @@ sidebarTitle: "Video generation"
---
OpenClaw agents can generate videos from text prompts, reference images, or
existing videos. Fifteen provider backends are supported, each with
existing videos. Sixteen provider backends are supported, each with
different model options, input modes, and feature sets. The agent picks the
right provider automatically based on your configuration and available API
keys.
@@ -116,6 +116,7 @@ generation.
| Google | `veo-3.1-fast-generate-preview` | ✓ | 1 image | 1 video | `GEMINI_API_KEY` |
| MiniMax | `MiniMax-Hailuo-2.3` | ✓ | 1 image | — | `MINIMAX_API_KEY` or MiniMax OAuth |
| OpenAI | `sora-2` | ✓ | 1 image | 1 video | `OPENAI_API_KEY` |
| OpenRouter | `google/veo-3.1-fast` | ✓ | Up to 4 images (first/last frame or references) | — | `OPENROUTER_API_KEY` |
| Qwen | `wan2.6-t2v` | ✓ | Yes (remote URL) | Yes (remote URL) | `QWEN_API_KEY` |
| Runway | `gen4.5` | ✓ | 1 image | 1 video | `RUNWAYML_API_SECRET` |
| Together | `Wan-AI/Wan2.2-T2V-A14B` | ✓ | 1 image | — | `TOGETHER_API_KEY` |
@@ -133,21 +134,22 @@ runtime modes at runtime.
The explicit mode contract used by `video_generate`, contract tests, and
the shared live sweep:
| Provider | `generate` | `imageToVideo` | `videoToVideo` | Shared live lanes today |
| --------- | :--------: | :------------: | :------------: | ---------------------------------------------------------------------------------------------------------------------------------------- |
| Alibaba | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider needs remote `http(s)` video URLs |
| BytePlus | ✓ | ✓ | — | `generate`, `imageToVideo` |
| ComfyUI | ✓ | ✓ | — | Not in the shared sweep; workflow-specific coverage lives with Comfy tests |
| DeepInfra | ✓ | — | — | `generate`; native DeepInfra video schemas are text-to-video in the bundled contract |
| fal | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` only when using Seedance reference-to-video |
| Google | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; shared `videoToVideo` skipped because the current buffer-backed Gemini/Veo sweep does not accept that input |
| MiniMax | ✓ | ✓ | — | `generate`, `imageToVideo` |
| OpenAI | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; shared `videoToVideo` skipped because this org/input path currently needs provider-side inpaint/remix access |
| Qwen | ✓ | ✓ | | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider needs remote `http(s)` video URLs |
| Runway | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` runs only when the selected model is `runway/gen4_aleph` |
| Together | ✓ | ✓ | | `generate`, `imageToVideo` |
| Vydra | ✓ | ✓ | — | `generate`; shared `imageToVideo` skipped because bundled `veo3` is text-only and bundled `kling` requires a remote image URL |
| xAI | ✓ | ✓ | | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider currently needs a remote MP4 URL |
| Provider | `generate` | `imageToVideo` | `videoToVideo` | Shared live lanes today |
| ---------- | :--------: | :------------: | :------------: | ---------------------------------------------------------------------------------------------------------------------------------------- |
| Alibaba | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider needs remote `http(s)` video URLs |
| BytePlus | ✓ | ✓ | — | `generate`, `imageToVideo` |
| ComfyUI | ✓ | ✓ | — | Not in the shared sweep; workflow-specific coverage lives with Comfy tests |
| DeepInfra | ✓ | — | — | `generate`; native DeepInfra video schemas are text-to-video in the bundled contract |
| fal | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` only when using Seedance reference-to-video |
| Google | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; shared `videoToVideo` skipped because the current buffer-backed Gemini/Veo sweep does not accept that input |
| MiniMax | ✓ | ✓ | — | `generate`, `imageToVideo` |
| OpenAI | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; shared `videoToVideo` skipped because this org/input path currently needs provider-side inpaint/remix access |
| OpenRouter | ✓ | ✓ | — | `generate`, `imageToVideo` |
| Qwen | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider needs remote `http(s)` video URLs |
| Runway | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` runs only when the selected model is `runway/gen4_aleph` |
| Together | ✓ | ✓ | — | `generate`, `imageToVideo` |
| Vydra | ✓ | ✓ | — | `generate`; shared `imageToVideo` skipped because bundled `veo3` is text-only and bundled `kling` requires a remote image URL |
| xAI | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider currently needs a remote MP4 URL |
## Tool parameters
@@ -389,6 +391,13 @@ only the explicit `model`, `primary`, and `fallbacks` entries.
(`aspectRatio`, `resolution`, `audio`, `watermark`) are ignored with
a warning.
</Accordion>
<Accordion title="OpenRouter">
Uses OpenRouter's asynchronous `/videos` API. OpenClaw submits the
job, polls `polling_url`, and downloads either `unsigned_urls` or the
documented job content endpoint. The bundled `google/veo-3.1-fast` default
advertises 4/6/8 second durations, `720P`/`1080P` resolutions, and
`16:9`/`9:16` aspect ratios.
</Accordion>
<Accordion title="Qwen">
Same DashScope backend as Alibaba. Reference inputs must be remote
`http(s)` URLs; local files are rejected upfront.