Mirror of https://github.com/openclaw/openclaw.git (synced 2026-05-06 14:40:43 +00:00)
feat(openrouter): add video generation provider (#72700)
Adds OpenRouter video generation via `video_generate`, with hardened async polling/download handling, docs, and regression coverage.

Validation:
- pnpm test src/plugins/plugin-lookup-table.test.ts src/secrets/target-registry.fast-path.test.ts src/gateway/server-startup-post-attach.test.ts extensions/openrouter/video-generation-provider.test.ts src/video-generation/live-test-helpers.test.ts src/media-generation/provider-capabilities.contract.test.ts src/agents/pi-embedded-helpers/failover-matches.test.ts src/plugins/manifest-metadata-scan.test.ts src/agents/openai-transport-stream.test.ts src/media-understanding/openai-compatible-audio.test.ts src/agents/schema-normalization-runtime-contract.test.ts src/agents/provider-request-config.test.ts src/plugin-sdk/provider-stream.test.ts src/agents/pi-embedded-runner/run/attempt.spawn-workspace.websocket.test.ts -- --reporter=verbose
- OPENCLAW_LIVE_TEST=1 OPENCLAW_LIVE_TEST_QUIET=0 OPENCLAW_LIVE_VIDEO_GENERATION_MODELS=openrouter/google/veo-3.1-fast pnpm test:live src/video-generation/video-generation.live.test.ts -- --runInBand

Co-authored-by: notamicrodose <gabrielkripalani@me.com>
This commit is contained in:
committed by GitHub
parent 5915489631
commit 17ef9ef895
@@ -4,6 +4,7 @@ read_when:
   - You want a single API key for many LLMs
   - You want to run models via OpenRouter in OpenClaw
   - You want to use OpenRouter for image generation
+  - You want to use OpenRouter for video generation
 title: "OpenRouter"
 ---
 
@@ -78,6 +79,33 @@ OpenRouter can also back the `image_generate` tool. Use an OpenRouter image mode
 
 OpenClaw sends image requests to OpenRouter's chat completions image API with `modalities: ["image", "text"]`. Gemini image models receive supported `aspectRatio` and `resolution` hints through OpenRouter's `image_config`. Use `agents.defaults.imageGenerationModel.timeoutMs` for slower OpenRouter image models; the `image_generate` tool's per-call `timeoutMs` parameter still wins.
 
+## Video generation
+
+OpenRouter can also back the `video_generate` tool through its asynchronous `/videos` API. Use an OpenRouter video model under `agents.defaults.videoGenerationModel`:
+
+```json5
+{
+  env: { OPENROUTER_API_KEY: "sk-or-..." },
+  agents: {
+    defaults: {
+      videoGenerationModel: {
+        primary: "openrouter/google/veo-3.1-fast",
+      },
+    },
+  },
+}
+```
+
+OpenClaw submits text-to-video and image-to-video jobs to OpenRouter, polls
+the returned `polling_url`, and downloads the completed video from
+OpenRouter's `unsigned_urls` or the documented job content endpoint.
+Reference images are sent as first/last frame images by default; images
+tagged with `reference_image` are sent as OpenRouter input references. The
+bundled `google/veo-3.1-fast` default advertises the currently supported 4/6/8
+second durations, `720P`/`1080P` resolutions, and `16:9`/`9:16` aspect
+ratios. Video-to-video is not registered for OpenRouter because the upstream
+video generation API currently accepts only text and image references.
+
 ## Text-to-speech
 
 OpenRouter can also be used as a TTS provider through its OpenAI-compatible
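The submit/poll/download flow described in the hunk above can be sketched as a generic poll loop. This is an illustrative sketch only, not OpenClaw's actual provider code; the `JobStatus` shape, field names, and helper name are assumptions.

```typescript
// Hypothetical job-status payload; `status` and `unsigned_urls` are
// assumed field names for illustration, not the exact OpenRouter schema.
type JobStatus = {
  status: "queued" | "processing" | "completed" | "failed";
  unsigned_urls?: string[];
};

// Poll an injected status fetcher until the job completes, fails, or the
// attempt budget runs out. Injecting the fetcher keeps the loop testable
// without real network access.
async function pollUntilDone(
  fetchStatus: () => Promise<JobStatus>,
  { intervalMs = 1000, maxAttempts = 60 } = {},
): Promise<JobStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await fetchStatus();
    if (job.status === "completed") return job;
    if (job.status === "failed") throw new Error("video job failed");
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("video job timed out");
}
```

A caller would wrap an HTTP GET of the returned `polling_url` in `fetchStatus` and download from the completed payload afterwards.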
@@ -61,6 +61,7 @@ provider is configured.
 | MiniMax | ✓ | ✓ | ✓ | ✓ | | | |
 | Mistral | | | | | ✓ | | |
 | OpenAI | ✓ | ✓ | | ✓ | ✓ | ✓ | ✓ |
+| OpenRouter | ✓ | ✓ | | ✓ | | | ✓ |
 | Qwen | | ✓ | | | | | |
 | Runway | | ✓ | | | | | |
 | SenseAudio | | | | | ✓ | | |
 
@@ -1,5 +1,5 @@
 ---
-summary: "Generate videos via video_generate from text, image, or video references across 14 provider backends"
+summary: "Generate videos via video_generate from text, image, or video references across 16 provider backends"
 read_when:
   - Generating videos via the agent
   - Configuring video-generation providers and models
@@ -9,7 +9,7 @@ sidebarTitle: "Video generation"
 ---
 
 OpenClaw agents can generate videos from text prompts, reference images, or
-existing videos. Fifteen provider backends are supported, each with
+existing videos. Sixteen provider backends are supported, each with
 different model options, input modes, and feature sets. The agent picks the
 right provider automatically based on your configuration and available API
 keys.
@@ -116,6 +116,7 @@ generation.
 | Google | `veo-3.1-fast-generate-preview` | ✓ | 1 image | 1 video | `GEMINI_API_KEY` |
 | MiniMax | `MiniMax-Hailuo-2.3` | ✓ | 1 image | — | `MINIMAX_API_KEY` or MiniMax OAuth |
 | OpenAI | `sora-2` | ✓ | 1 image | 1 video | `OPENAI_API_KEY` |
+| OpenRouter | `google/veo-3.1-fast` | ✓ | Up to 4 images (first/last frame or references) | — | `OPENROUTER_API_KEY` |
 | Qwen | `wan2.6-t2v` | ✓ | Yes (remote URL) | Yes (remote URL) | `QWEN_API_KEY` |
 | Runway | `gen4.5` | ✓ | 1 image | 1 video | `RUNWAYML_API_SECRET` |
 | Together | `Wan-AI/Wan2.2-T2V-A14B` | ✓ | 1 image | — | `TOGETHER_API_KEY` |
 
@@ -133,21 +134,22 @@ runtime modes at runtime.
 The explicit mode contract used by `video_generate`, contract tests, and
 the shared live sweep:
 
-| Provider | `generate` | `imageToVideo` | `videoToVideo` | Shared live lanes today |
-| --------- | :--------: | :------------: | :------------: | ---------------------------------------------------------------------------------------------------------------------------------------- |
-| Alibaba | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider needs remote `http(s)` video URLs |
-| BytePlus | ✓ | ✓ | — | `generate`, `imageToVideo` |
-| ComfyUI | ✓ | ✓ | — | Not in the shared sweep; workflow-specific coverage lives with Comfy tests |
-| DeepInfra | ✓ | — | — | `generate`; native DeepInfra video schemas are text-to-video in the bundled contract |
-| fal | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` only when using Seedance reference-to-video |
-| Google | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; shared `videoToVideo` skipped because the current buffer-backed Gemini/Veo sweep does not accept that input |
-| MiniMax | ✓ | ✓ | — | `generate`, `imageToVideo` |
-| OpenAI | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; shared `videoToVideo` skipped because this org/input path currently needs provider-side inpaint/remix access |
-| Qwen | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider needs remote `http(s)` video URLs |
-| Runway | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` runs only when the selected model is `runway/gen4_aleph` |
-| Together | ✓ | ✓ | — | `generate`, `imageToVideo` |
-| Vydra | ✓ | ✓ | — | `generate`; shared `imageToVideo` skipped because bundled `veo3` is text-only and bundled `kling` requires a remote image URL |
-| xAI | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider currently needs a remote MP4 URL |
+| Provider | `generate` | `imageToVideo` | `videoToVideo` | Shared live lanes today |
+| ---------- | :--------: | :------------: | :------------: | ---------------------------------------------------------------------------------------------------------------------------------------- |
+| Alibaba | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider needs remote `http(s)` video URLs |
+| BytePlus | ✓ | ✓ | — | `generate`, `imageToVideo` |
+| ComfyUI | ✓ | ✓ | — | Not in the shared sweep; workflow-specific coverage lives with Comfy tests |
+| DeepInfra | ✓ | — | — | `generate`; native DeepInfra video schemas are text-to-video in the bundled contract |
+| fal | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` only when using Seedance reference-to-video |
+| Google | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; shared `videoToVideo` skipped because the current buffer-backed Gemini/Veo sweep does not accept that input |
+| MiniMax | ✓ | ✓ | — | `generate`, `imageToVideo` |
+| OpenAI | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; shared `videoToVideo` skipped because this org/input path currently needs provider-side inpaint/remix access |
+| OpenRouter | ✓ | ✓ | — | `generate`, `imageToVideo` |
+| Qwen | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider needs remote `http(s)` video URLs |
+| Runway | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` runs only when the selected model is `runway/gen4_aleph` |
+| Together | ✓ | ✓ | — | `generate`, `imageToVideo` |
+| Vydra | ✓ | ✓ | — | `generate`; shared `imageToVideo` skipped because bundled `veo3` is text-only and bundled `kling` requires a remote image URL |
+| xAI | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider currently needs a remote MP4 URL |
 
 ## Tool parameters
 
@@ -389,6 +391,13 @@ only the explicit `model`, `primary`, and `fallbacks` entries.
   (`aspectRatio`, `resolution`, `audio`, `watermark`) are ignored with
   a warning.
 </Accordion>
+<Accordion title="OpenRouter">
+  Uses OpenRouter's asynchronous `/videos` API. OpenClaw submits the
+  job, polls `polling_url`, and downloads either `unsigned_urls` or the
+  documented job content endpoint. The bundled `google/veo-3.1-fast` default
+  advertises 4/6/8 second durations, `720P`/`1080P` resolutions, and
+  `16:9`/`9:16` aspect ratios.
+</Accordion>
 <Accordion title="Qwen">
   Same DashScope backend as Alibaba. Reference inputs must be remote
   `http(s)` URLs; local files are rejected upfront.
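The download preference described in the accordion hunk above (use `unsigned_urls` when present, otherwise fall back to the job content endpoint) could look roughly like this sketch. The payload shape and the content-endpoint path here are assumptions for illustration, not the documented OpenRouter API or OpenClaw's actual code.

```typescript
// Hypothetical shape of a completed video-job payload; only `unsigned_urls`
// is named in the docs above, the rest is assumed for illustration.
interface CompletedJob {
  id: string;
  unsigned_urls?: string[];
}

// Prefer a direct unsigned URL when the provider returns one; otherwise
// build a fallback URL from an assumed job content endpoint path.
function pickDownloadUrl(
  job: CompletedJob,
  baseUrl = "https://openrouter.ai/api/v1",
): string {
  if (job.unsigned_urls && job.unsigned_urls.length > 0) {
    return job.unsigned_urls[0];
  }
  return `${baseUrl}/videos/${job.id}/content`;
}
```

The fallback keeps downloads working even when a job payload omits direct URLs, which is the behavior the accordion describes.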