feat: declare explicit media provider capabilities

Peter Steinberger
2026-04-06 15:24:16 +01:00
parent 29df67c491
commit cd5b1653f6
46 changed files with 1623 additions and 393 deletions

@@ -79,6 +79,26 @@ Some providers accept additional or alternate API key env vars. See individual [
Run `video_generate action=list` to inspect the available providers, models,
and runtime modes.
### Declared capability matrix
This is the explicit mode contract used by `video_generate`, contract tests,
and the shared live sweep.
| Provider | `generate` | `imageToVideo` | `videoToVideo` | Shared live lanes today |
| -------- | ---------- | -------------- | -------------- | ---------------------------------------------------------------------------------------------------------- |
| Alibaba | Yes | Yes | Yes | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider needs remote `http(s)` video URLs |
| BytePlus | Yes | Yes | No | `generate`, `imageToVideo` |
| ComfyUI | Yes | Yes | No | Not in the shared sweep; workflow-specific coverage lives with Comfy tests |
| fal | Yes | Yes | No | `generate`, `imageToVideo` |
| Google | Yes | Yes | Yes | `generate`, `imageToVideo`, `videoToVideo` |
| MiniMax | Yes | Yes | No | `generate`, `imageToVideo` |
| OpenAI | Yes | Yes | Yes | `generate`, `imageToVideo`, `videoToVideo` |
| Qwen | Yes | Yes | Yes | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider needs remote `http(s)` video URLs |
| Runway | Yes | Yes | Yes | `generate`, `imageToVideo`; `videoToVideo` runs only when the selected model is `runway/gen4_aleph` |
| Together | Yes | Yes | No | `generate`, `imageToVideo` |
| Vydra | Yes | Yes | No | `generate`, `imageToVideo` |
| xAI | Yes | Yes | Yes | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider currently needs a remote MP4 URL |
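As a rough illustration, the three mode columns of the matrix can be written down as plain data and queried. This is only a sketch: the `VideoMode` and `DeclaredModes` types, the lowercase provider ids, and the `supportsMode` and `providersFor` helpers are hypothetical names chosen for the example, not the project's actual API.

```typescript
// Hypothetical types; the names are illustrative, not the real contract types.
type VideoMode = "generate" | "imageToVideo" | "videoToVideo";
type DeclaredModes = Record<VideoMode, boolean>;

// Transcribed from the mode columns of the matrix above.
// The lowercase provider ids are an assumption made for this sketch.
const declaredModes: Record<string, DeclaredModes> = {
  alibaba: { generate: true, imageToVideo: true, videoToVideo: true },
  byteplus: { generate: true, imageToVideo: true, videoToVideo: false },
  comfyui: { generate: true, imageToVideo: true, videoToVideo: false },
  fal: { generate: true, imageToVideo: true, videoToVideo: false },
  google: { generate: true, imageToVideo: true, videoToVideo: true },
  minimax: { generate: true, imageToVideo: true, videoToVideo: false },
  openai: { generate: true, imageToVideo: true, videoToVideo: true },
  qwen: { generate: true, imageToVideo: true, videoToVideo: true },
  runway: { generate: true, imageToVideo: true, videoToVideo: true },
  together: { generate: true, imageToVideo: true, videoToVideo: false },
  vydra: { generate: true, imageToVideo: true, videoToVideo: false },
  xai: { generate: true, imageToVideo: true, videoToVideo: true },
};

// True only when the provider is known and declares the mode.
function supportsMode(provider: string, mode: VideoMode): boolean {
  const modes = declaredModes[provider];
  return modes !== undefined && modes[mode];
}

// All providers that declare a given mode, sorted by id.
function providersFor(mode: VideoMode): string[] {
  return Object.keys(declaredModes)
    .filter((provider) => supportsMode(provider, mode))
    .sort();
}

console.log(providersFor("videoToVideo").join(", "));
```

Note that a declared `videoToVideo` mode does not imply a shared live lane; the last column of the matrix records those restrictions separately.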
## Tool parameters
### Required
@@ -201,9 +221,34 @@ capabilities: {
}
```
Legacy flat fields such as `maxInputImages` and `maxInputVideos` still work as
backward-compatible aggregate caps, but they cannot express per-mode limits as
precisely.
Flat aggregate fields such as `maxInputImages` and `maxInputVideos` are not
enough to advertise transform-mode support. Providers should declare
`generate`, `imageToVideo`, and `videoToVideo` explicitly so live tests,
contract tests, and the shared `video_generate` tool can validate mode support
deterministically.
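A minimal sketch of what deterministic validation against explicit mode declarations could look like. Only the `capabilities.<mode>.enabled` shape comes from the docs; the `resolveMode` rule (inputs decide the mode) and the `validateRequest` helper are assumptions made for illustration, not the tool's real logic.

```typescript
type VideoMode = "generate" | "imageToVideo" | "videoToVideo";

// Per-mode declarations; only the `enabled` flag is taken from the docs.
interface VideoCapabilities {
  generate: { enabled: boolean };
  imageToVideo: { enabled: boolean };
  videoToVideo: { enabled: boolean };
}

// Assumed rule for this sketch: the supplied inputs determine the mode.
function resolveMode(imageCount: number, videoCount: number): VideoMode {
  if (videoCount > 0) return "videoToVideo";
  if (imageCount > 0) return "imageToVideo";
  return "generate";
}

type Validation =
  | { ok: true; mode: VideoMode }
  | { ok: false; mode: VideoMode; reason: string };

// Hypothetical helper: reject a request whose mode the provider did not declare.
function validateRequest(
  caps: VideoCapabilities,
  imageCount: number,
  videoCount: number,
): Validation {
  const mode = resolveMode(imageCount, videoCount);
  if (!caps[mode].enabled) {
    return { ok: false, mode, reason: `mode "${mode}" is not declared by this provider` };
  }
  return { ok: true, mode };
}

// Example provider that declares image input but no video input.
const imageOnlyProvider: VideoCapabilities = {
  generate: { enabled: true },
  imageToVideo: { enabled: true },
  videoToVideo: { enabled: false },
};

const outcome = validateRequest(imageOnlyProvider, 0, 1);
console.log(outcome.ok ? `run ${outcome.mode}` : `reject: ${outcome.reason}`);
```

With an explicit `enabled` flag per mode, the check is a single lookup. A flat cap such as `maxInputVideos` would leave the validator to guess whether a nonzero value implies `videoToVideo` support.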
## Live tests
Opt-in live coverage for the shared bundled providers:
```bash
OPENCLAW_LIVE_TEST=1 pnpm test:live -- extensions/video-generation-providers.live.test.ts
```
This live file loads missing provider env vars from `~/.profile`, prefers
live/env API keys ahead of stored auth profiles by default, and runs the
declared modes it can exercise safely with local media:
- `generate` for every provider in the sweep
- `imageToVideo` when `capabilities.imageToVideo.enabled` is set
- `videoToVideo` when `capabilities.videoToVideo.enabled` is set and the
  provider/model accepts buffer-backed local video input in the shared sweep
Today the shared `videoToVideo` live lane covers:
- `google`
- `openai`
- `runway` only when you select `runway/gen4_aleph`
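The lane selection described above can be sketched as a small planning function. This is an illustration under stated assumptions: `LiveProvider`, the `acceptsLocalVideoBuffer` flag, and `planLiveLanes` are hypothetical names, and the real live test file may make this decision differently (for example, per model rather than per provider).

```typescript
type VideoMode = "generate" | "imageToVideo" | "videoToVideo";

// Hypothetical provider description used only for this sketch.
interface LiveProvider {
  id: string;
  capabilities: {
    imageToVideo: { enabled: boolean };
    videoToVideo: { enabled: boolean };
  };
  // Assumed flag: whether the provider/model can take a local, buffer-backed video.
  acceptsLocalVideoBuffer: boolean;
}

// Decide which live lanes to run for one provider.
function planLiveLanes(provider: LiveProvider): VideoMode[] {
  // `generate` runs for every provider in the sweep.
  const lanes: VideoMode[] = ["generate"];
  if (provider.capabilities.imageToVideo.enabled) {
    lanes.push("imageToVideo");
  }
  // `videoToVideo` needs both the declared mode and local buffer support.
  if (provider.capabilities.videoToVideo.enabled && provider.acceptsLocalVideoBuffer) {
    lanes.push("videoToVideo");
  }
  return lanes;
}

// A provider that declares videoToVideo but needs a remote video URL.
const remoteUrlOnly: LiveProvider = {
  id: "example-remote-url-provider",
  capabilities: { imageToVideo: { enabled: true }, videoToVideo: { enabled: true } },
  acceptsLocalVideoBuffer: false,
};

console.log(planLiveLanes(remoteUrlOnly).join(", "));
```

Keeping the buffer check separate from the declared capability mirrors the docs: a provider can declare `videoToVideo` and still be skipped in the shared sweep.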
## Configuration