feat(openrouter): add video generation provider (#72700)

Adds OpenRouter video generation via video_generate, with hardened async polling/download handling, docs, and regression coverage.

Validation:
- pnpm test src/plugins/plugin-lookup-table.test.ts src/secrets/target-registry.fast-path.test.ts src/gateway/server-startup-post-attach.test.ts extensions/openrouter/video-generation-provider.test.ts src/video-generation/live-test-helpers.test.ts src/media-generation/provider-capabilities.contract.test.ts src/agents/pi-embedded-helpers/failover-matches.test.ts src/plugins/manifest-metadata-scan.test.ts src/agents/openai-transport-stream.test.ts src/media-understanding/openai-compatible-audio.test.ts src/agents/schema-normalization-runtime-contract.test.ts src/agents/provider-request-config.test.ts src/plugin-sdk/provider-stream.test.ts src/agents/pi-embedded-runner/run/attempt.spawn-workspace.websocket.test.ts -- --reporter=verbose
- OPENCLAW_LIVE_TEST=1 OPENCLAW_LIVE_TEST_QUIET=0 OPENCLAW_LIVE_VIDEO_GENERATION_MODELS=openrouter/google/veo-3.1-fast pnpm test:live src/video-generation/video-generation.live.test.ts -- --runInBand

Co-authored-by: notamicrodose <gabrielkripalani@me.com>
Committed by Gabriel Kripalani via GitHub on 2026-04-28 11:57:31 +02:00
parent 5915489631
commit 17ef9ef895
19 changed files with 957 additions and 21 deletions


@@ -4,6 +4,7 @@ read_when:
- You want a single API key for many LLMs
- You want to run models via OpenRouter in OpenClaw
- You want to use OpenRouter for image generation
- You want to use OpenRouter for video generation
title: "OpenRouter"
---
@@ -78,6 +79,33 @@ OpenRouter can also back the `image_generate` tool. Use an OpenRouter image mode
OpenClaw sends image requests to OpenRouter's chat completions image API with `modalities: ["image", "text"]`. Gemini image models receive supported `aspectRatio` and `resolution` hints through OpenRouter's `image_config`. Use `agents.defaults.imageGenerationModel.timeoutMs` for slower OpenRouter image models; the `image_generate` tool's per-call `timeoutMs` parameter still wins.
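For reference, raising the image timeout in config might look like the sketch below (the model id is an assumed example; only the `timeoutMs` placement under `agents.defaults.imageGenerationModel` is the point):
```json5
{
  env: { OPENROUTER_API_KEY: "sk-or-..." },
  agents: {
    defaults: {
      imageGenerationModel: {
        primary: "openrouter/google/gemini-2.5-flash-image", // assumed model id, for illustration
        timeoutMs: 120000, // raise for slower OpenRouter image models
      },
    },
  },
}
```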
## Video generation
OpenRouter can also back the `video_generate` tool through its asynchronous `/videos` API. Use an OpenRouter video model under `agents.defaults.videoGenerationModel`:
```json5
{
env: { OPENROUTER_API_KEY: "sk-or-..." },
agents: {
defaults: {
videoGenerationModel: {
primary: "openrouter/google/veo-3.1-fast",
},
},
},
}
```
OpenClaw submits text-to-video and image-to-video jobs to OpenRouter, polls
the returned `polling_url`, and downloads the completed video from
OpenRouter's `unsigned_urls` or the documented job content endpoint.
Reference images are sent as first/last frame images by default; images
tagged with `reference_image` are sent as OpenRouter input references. The
bundled `google/veo-3.1-fast` default advertises the currently supported 4/6/8
second durations, `720P`/`1080P` resolutions, and `16:9`/`9:16` aspect
ratios. Video-to-video is not registered for OpenRouter because the upstream
video generation API currently accepts only text and image references.
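The submit/poll/download flow described above can be sketched roughly as follows. This is a minimal illustration, not OpenClaw's actual internals: the `pollVideoJob` name, the `VideoJob` shape, and the option defaults are all assumptions.

```typescript
// Hypothetical sketch of polling an asynchronous video job until it
// completes or a deadline passes. `fetchStatus` stands in for a GET on
// the job's `polling_url`; field names mirror the doc, not a real SDK.
type VideoJob = {
  status: "queued" | "processing" | "completed" | "failed";
  unsigned_urls?: string[];
};

async function pollVideoJob(
  fetchStatus: () => Promise<VideoJob>,
  { intervalMs = 1000, timeoutMs = 600_000 } = {},
): Promise<VideoJob> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const job = await fetchStatus();
    if (job.status === "completed") return job; // download unsigned_urls next
    if (job.status === "failed") throw new Error("video job failed");
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("video job timed out");
}
```

In practice the configured `timeoutMs` bounds the whole loop, and a completed job yields either `unsigned_urls` or a job content endpoint to download from.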
## Text-to-speech
OpenRouter can also be used as a TTS provider through its OpenAI-compatible


@@ -61,6 +61,7 @@ provider is configured.
| MiniMax | ✓ | ✓ | ✓ | ✓ | | | |
| Mistral | | | | | ✓ | | |
| OpenAI | ✓ | ✓ | | ✓ | ✓ | ✓ | ✓ |
| OpenRouter | ✓ | ✓ | | ✓ | | | ✓ |
| Qwen | | ✓ | | | | | |
| Runway | | ✓ | | | | | |
| SenseAudio | | | | | ✓ | | |


@@ -1,5 +1,5 @@
---
summary: "Generate videos via video_generate from text, image, or video references across 14 provider backends"
summary: "Generate videos via video_generate from text, image, or video references across 16 provider backends"
read_when:
- Generating videos via the agent
- Configuring video-generation providers and models
@@ -9,7 +9,7 @@ sidebarTitle: "Video generation"
---
OpenClaw agents can generate videos from text prompts, reference images, or
existing videos. Fifteen provider backends are supported, each with
existing videos. Sixteen provider backends are supported, each with
different model options, input modes, and feature sets. The agent picks the
right provider automatically based on your configuration and available API
keys.
@@ -116,6 +116,7 @@ generation.
| Google | `veo-3.1-fast-generate-preview` | ✓ | 1 image | 1 video | `GEMINI_API_KEY` |
| MiniMax | `MiniMax-Hailuo-2.3` | ✓ | 1 image | — | `MINIMAX_API_KEY` or MiniMax OAuth |
| OpenAI | `sora-2` | ✓ | 1 image | 1 video | `OPENAI_API_KEY` |
| OpenRouter | `google/veo-3.1-fast` | ✓ | Up to 4 images (first/last frame or references) | — | `OPENROUTER_API_KEY` |
| Qwen | `wan2.6-t2v` | ✓ | Yes (remote URL) | Yes (remote URL) | `QWEN_API_KEY` |
| Runway | `gen4.5` | ✓ | 1 image | 1 video | `RUNWAYML_API_SECRET` |
| Together | `Wan-AI/Wan2.2-T2V-A14B` | ✓ | 1 image | — | `TOGETHER_API_KEY` |
@@ -133,21 +134,22 @@ runtime modes at runtime.
The explicit mode contract used by `video_generate`, contract tests, and
the shared live sweep:
| Provider | `generate` | `imageToVideo` | `videoToVideo` | Shared live lanes today |
| --------- | :--------: | :------------: | :------------: | ---------------------------------------------------------------------------------------------------------------------------------------- |
| Alibaba | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider needs remote `http(s)` video URLs |
| BytePlus | ✓ | ✓ | — | `generate`, `imageToVideo` |
| ComfyUI | ✓ | ✓ | — | Not in the shared sweep; workflow-specific coverage lives with Comfy tests |
| DeepInfra | ✓ | — | — | `generate`; native DeepInfra video schemas are text-to-video in the bundled contract |
| fal | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` only when using Seedance reference-to-video |
| Google | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; shared `videoToVideo` skipped because the current buffer-backed Gemini/Veo sweep does not accept that input |
| MiniMax | ✓ | ✓ | — | `generate`, `imageToVideo` |
| OpenAI | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; shared `videoToVideo` skipped because this org/input path currently needs provider-side inpaint/remix access |
| Qwen | ✓ | ✓ | | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider needs remote `http(s)` video URLs |
| Runway | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` runs only when the selected model is `runway/gen4_aleph` |
| Together | ✓ | ✓ | | `generate`, `imageToVideo` |
| Vydra | ✓ | ✓ | — | `generate`; shared `imageToVideo` skipped because bundled `veo3` is text-only and bundled `kling` requires a remote image URL |
| xAI | ✓ | ✓ | | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider currently needs a remote MP4 URL |
| Provider | `generate` | `imageToVideo` | `videoToVideo` | Shared live lanes today |
| ---------- | :--------: | :------------: | :------------: | ---------------------------------------------------------------------------------------------------------------------------------------- |
| Alibaba | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider needs remote `http(s)` video URLs |
| BytePlus | ✓ | ✓ | — | `generate`, `imageToVideo` |
| ComfyUI | ✓ | ✓ | — | Not in the shared sweep; workflow-specific coverage lives with Comfy tests |
| DeepInfra | ✓ | — | — | `generate`; native DeepInfra video schemas are text-to-video in the bundled contract |
| fal | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` only when using Seedance reference-to-video |
| Google | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; shared `videoToVideo` skipped because the current buffer-backed Gemini/Veo sweep does not accept that input |
| MiniMax | ✓ | ✓ | — | `generate`, `imageToVideo` |
| OpenAI | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; shared `videoToVideo` skipped because this org/input path currently needs provider-side inpaint/remix access |
| OpenRouter | ✓ | ✓ | — | `generate`, `imageToVideo` |
| Qwen | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider needs remote `http(s)` video URLs |
| Runway | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` runs only when the selected model is `runway/gen4_aleph` |
| Together | ✓ | ✓ | — | `generate`, `imageToVideo` |
| Vydra | ✓ | ✓ | — | `generate`; shared `imageToVideo` skipped because bundled `veo3` is text-only and bundled `kling` requires a remote image URL |
| xAI | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider currently needs a remote MP4 URL |
## Tool parameters
@@ -389,6 +391,13 @@ only the explicit `model`, `primary`, and `fallbacks` entries.
(`aspectRatio`, `resolution`, `audio`, `watermark`) are ignored with
a warning.
</Accordion>
<Accordion title="OpenRouter">
Uses OpenRouter's asynchronous `/videos` API. OpenClaw submits the
job, polls `polling_url`, and downloads either `unsigned_urls` or the
documented job content endpoint. The bundled `google/veo-3.1-fast` default
advertises 4/6/8 second durations, `720P`/`1080P` resolutions, and
`16:9`/`9:16` aspect ratios.
</Accordion>
<Accordion title="Qwen">
Same DashScope backend as Alibaba. Reference inputs must be remote
`http(s)` URLs; local files are rejected upfront.