mirror of
https://github.com/openclaw/openclaw.git
synced 2026-04-12 09:41:11 +00:00
feat: declare explicit media provider capabilities
@@ -7,6 +7,7 @@ Docs: https://docs.openclaw.ai

### Changes

- Plugins/webhooks: add a bundled webhook ingress plugin so external automation can create and drive bound TaskFlows through per-route shared-secret endpoints. (#61892) Thanks @mbelinky.
- Tools/media: document per-provider music and video generation capabilities, and add shared live video-to-video sweep coverage for providers that support local reference clips.

### Fixes

@@ -475,10 +475,45 @@ If you want to rely on env keys (e.g. exported in your `~/.profile`), run local

- Exercises the shared bundled music-generation provider path
- Currently covers Google and MiniMax
- Loads provider env vars from your login shell (`~/.profile`) before probing
- Uses live/env API keys ahead of stored auth profiles by default, so stale test keys in `auth-profiles.json` do not mask real shell credentials
- Skips providers with no usable auth/profile/model
- Runs both declared runtime modes when available:
  - `generate` with prompt-only input
  - `edit` when the provider declares `capabilities.edit.enabled`
- Current shared-lane coverage:
  - `google`: `generate`, `edit`
  - `minimax`: `generate`
  - `comfy`: separate Comfy live file, not this shared sweep
- Optional narrowing:
  - `OPENCLAW_LIVE_MUSIC_GENERATION_PROVIDERS="google,minimax"`
  - `OPENCLAW_LIVE_MUSIC_GENERATION_MODELS="google/lyria-3-clip-preview,minimax/music-2.5+"`
- Optional auth behavior:
  - `OPENCLAW_LIVE_REQUIRE_PROFILE_KEYS=1` to force profile-store auth and ignore env-only overrides
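
The model-narrowing variable is a comma-separated list of `provider/model` refs. A minimal sketch of how such a value maps providers to models (the helper name here is hypothetical, not the repo's actual parser):

```typescript
// Hypothetical sketch: split an OPENCLAW_LIVE_MUSIC_GENERATION_MODELS-style
// value ("google/lyria-3-clip-preview,minimax/music-2.5+") into a
// provider -> model map. Entries without both a provider and a model
// around the first "/" are ignored.
function parseProviderModelCsv(value: string | undefined): Map<string, string> {
  const map = new Map<string, string>();
  for (const entry of (value ?? "").split(",")) {
    const ref = entry.trim();
    const slash = ref.indexOf("/");
    if (slash <= 0 || slash === ref.length - 1) continue; // no provider or no model
    map.set(ref.slice(0, slash), ref.slice(slash + 1));
  }
  return map;
}

const models = parseProviderModelCsv(
  "google/lyria-3-clip-preview,minimax/music-2.5+",
);
console.log(models.get("google")); // "lyria-3-clip-preview"
```

Only the first `/` splits the ref, so model names containing further punctuation (like `music-2.5+`) survive intact.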
## Video generation live

- Test: `extensions/video-generation-providers.live.test.ts`
- Enable: `OPENCLAW_LIVE_TEST=1 pnpm test:live -- extensions/video-generation-providers.live.test.ts`
- Scope:
  - Exercises the shared bundled video-generation provider path
  - Loads provider env vars from your login shell (`~/.profile`) before probing
  - Uses live/env API keys ahead of stored auth profiles by default, so stale test keys in `auth-profiles.json` do not mask real shell credentials
  - Skips providers with no usable auth/profile/model
  - Runs the declared runtime modes when available:
    - `generate` with prompt-only input
    - `imageToVideo` when the provider declares `capabilities.imageToVideo.enabled`
    - `videoToVideo` when the provider declares `capabilities.videoToVideo.enabled` and the selected provider/model accepts buffer-backed local video input in the shared sweep
- Current `videoToVideo` live coverage:
  - `google`
  - `openai`
  - `runway`, only when the selected model is `runway/gen4_aleph`
- Current declared-but-skipped `videoToVideo` providers in the shared sweep:
  - `alibaba`, `qwen`, `xai`, because those paths currently require remote `http(s)` / MP4 reference URLs
- Optional narrowing:
  - `OPENCLAW_LIVE_VIDEO_GENERATION_PROVIDERS="google,openai,runway"`
  - `OPENCLAW_LIVE_VIDEO_GENERATION_MODELS="google/veo-3.1-fast-generate-preview,openai/sora-2,runway/gen4_aleph"`
- Optional auth behavior:
  - `OPENCLAW_LIVE_REQUIRE_PROFILE_KEYS=1` to force profile-store auth and ignore env-only overrides
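
The buffer-vs-URL distinction driving those `videoToVideo` skips can be sketched as follows (a simplified model; the input type and function are illustrative, not the repo's real types):

```typescript
// Simplified sketch of the shared sweep's videoToVideo eligibility:
// providers that only accept remote http(s) reference URLs are skipped,
// because the sweep feeds buffer-backed local clips. The provider set
// mirrors the list documented above; the types are assumptions.
type VideoInput =
  | { kind: "buffer"; data: Uint8Array }
  | { kind: "url"; url: string };

const REMOTE_URL_ONLY = new Set(["alibaba", "qwen", "xai"]);

function sweepCanSendVideoInput(providerId: string, input: VideoInput): boolean {
  if (input.kind === "url") return /^https?:\/\//.test(input.url);
  // Buffer-backed local input: skip providers that need remote URLs.
  return !REMOTE_URL_ONLY.has(providerId);
}
```

With buffer input, `alibaba`, `qwen`, and `xai` fail the gate; with a real `https` URL, any declared provider could pass.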
## Docker runners (optional "works in Linux" checks)
@@ -643,10 +643,15 @@ API key auth, and dynamic model resolution.

[Internals: Capability Ownership](/plugins/architecture#capability-ownership-model).

For video generation, prefer the mode-aware capability shape shown above:
`generate`, `imageToVideo`, and `videoToVideo`. Flat aggregate fields such
as `maxInputImages`, `maxInputVideos`, and `maxDurationSeconds` are not
enough to advertise transform-mode support or disabled modes cleanly.

Music-generation providers should follow the same pattern:
`generate` for prompt-only generation and `edit` for reference-image-based
generation. Flat aggregate fields such as `maxInputImages`,
`supportsLyrics`, and `supportsFormat` are not enough to advertise edit
support; explicit `generate` / `edit` blocks are the expected contract.
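
As a rough type-level sketch of that contract (field names follow this commit's examples; optionality and exact types are assumptions, not the repo's real declarations):

```typescript
// Rough sketch of the mode-aware music capability contract described
// above. Field names come from this commit's examples; the exact shapes
// in the repo may differ.
interface MusicGenerateCaps {
  maxTracks?: number;
  supportsLyrics?: boolean;
  supportsInstrumental?: boolean;
  supportsFormat?: boolean;
}

interface MusicEditCaps extends MusicGenerateCaps {
  enabled: boolean;
  maxInputImages?: number;
}

interface MusicCapabilities {
  generate?: MusicGenerateCaps;
  edit?: MusicEditCaps;
}

// A MiniMax-style declaration: generate only, edit explicitly disabled.
const minimaxLike: MusicCapabilities = {
  generate: { maxTracks: 1, supportsLyrics: true, supportsFormat: true },
  edit: { enabled: false },
};
```

The explicit `edit: { enabled: false }` block is what lets consumers distinguish "edit unsupported" from "edit not declared".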
</Step>
@@ -85,6 +85,17 @@ Example:

| Google | `lyria-3-clip-preview` | Up to 10 images | `lyrics`, `instrumental`, `format` | `GEMINI_API_KEY`, `GOOGLE_API_KEY` |
| MiniMax | `music-2.5+` | None | `lyrics`, `instrumental`, `durationSeconds`, `format=mp3` | `MINIMAX_API_KEY` |

### Declared capability matrix

This is the explicit mode contract used by `music_generate`, contract tests,
and the shared live sweep.

| Provider | `generate` | `edit` | Edit limit | Shared live lanes |
| -------- | ---------- | ------ | ---------- | ------------------------------------------------------------------------- |
| ComfyUI | Yes | Yes | 1 image | Not in the shared sweep; covered by `extensions/comfy/comfy.live.test.ts` |
| Google | Yes | Yes | 10 images | `generate`, `edit` |
| MiniMax | Yes | No | None | `generate` |

Use `action: "list"` to inspect available shared providers and models at
runtime:

@@ -174,6 +185,36 @@ error includes details from each attempt.

- ComfyUI support is workflow-driven and depends on the configured graph plus
  node mapping for prompt/output fields.

## Provider capability modes

The shared music-generation contract now supports explicit mode declarations:

- `generate` for prompt-only generation
- `edit` when the request includes one or more reference images

New provider implementations should prefer explicit mode blocks:

```typescript
capabilities: {
  generate: {
    maxTracks: 1,
    supportsLyrics: true,
    supportsFormat: true,
  },
  edit: {
    enabled: true,
    maxTracks: 1,
    maxInputImages: 1,
    supportsFormat: true,
  },
}
```

Legacy flat fields such as `maxInputImages`, `supportsLyrics`, and
`supportsFormat` are not enough to advertise edit support. Providers should
declare `generate` and `edit` explicitly so live tests, contract tests, and
the shared `music_generate` tool can validate mode support deterministically.
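
A deterministic check along those lines might look like this (a sketch; the `Caps` type is an illustrative stand-in for the repo's real capability type):

```typescript
// Sketch: treat edit as supported only when the provider declares an
// explicit, enabled edit block -- never inferred from legacy flat fields
// such as maxInputImages. The Caps type is an illustrative stand-in.
type Caps = {
  maxInputImages?: number; // legacy flat field: deliberately ignored here
  edit?: { enabled: boolean; maxInputImages?: number };
};

function providerSupportsEdit(caps: Caps): boolean {
  return caps.edit?.enabled === true;
}

// Legacy-only declaration: no explicit edit block, so edit is unsupported.
console.log(providerSupportsEdit({ maxInputImages: 1 })); // false
console.log(providerSupportsEdit({ edit: { enabled: true, maxInputImages: 10 } })); // true
```

Because the check keys on the explicit block, a provider cannot accidentally advertise edit support just by carrying a legacy `maxInputImages` cap.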
## Choosing the right path

- Use the shared provider-backed path when you want model selection, provider failover, and the built-in async task/status flow.

@@ -188,6 +229,16 @@ Opt-in live coverage for the shared bundled providers:

```bash
OPENCLAW_LIVE_TEST=1 pnpm test:live -- extensions/music-generation-providers.live.test.ts
```

This live file loads missing provider env vars from `~/.profile`, prefers
live/env API keys ahead of stored auth profiles by default, and runs both
`generate` and declared `edit` coverage when the provider enables edit mode.

Today that means:

- `google`: `generate` plus `edit`
- `minimax`: `generate` only
- `comfy`: separate Comfy live coverage, not the shared provider sweep

Opt-in live coverage for the bundled ComfyUI music path:

```bash

@@ -79,6 +79,26 @@ Some providers accept additional or alternate API key env vars. See individual [

Run `video_generate action=list` to inspect available providers, models, and
runtime modes.

### Declared capability matrix

This is the explicit mode contract used by `video_generate`, contract tests,
and the shared live sweep.

| Provider | `generate` | `imageToVideo` | `videoToVideo` | Shared live lanes today |
| -------- | ---------- | -------------- | -------------- | ---------------------------------------------------------------------------------------------------------- |
| Alibaba | Yes | Yes | Yes | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider needs remote `http(s)` video URLs |
| BytePlus | Yes | Yes | No | `generate`, `imageToVideo` |
| ComfyUI | Yes | Yes | No | Not in the shared sweep; workflow-specific coverage lives with Comfy tests |
| fal | Yes | Yes | No | `generate`, `imageToVideo` |
| Google | Yes | Yes | Yes | `generate`, `imageToVideo`, `videoToVideo` |
| MiniMax | Yes | Yes | No | `generate`, `imageToVideo` |
| OpenAI | Yes | Yes | Yes | `generate`, `imageToVideo`, `videoToVideo` |
| Qwen | Yes | Yes | Yes | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider needs remote `http(s)` video URLs |
| Runway | Yes | Yes | Yes | `generate`, `imageToVideo`; `videoToVideo` runs only when the selected model is `runway/gen4_aleph` |
| Together | Yes | Yes | No | `generate`, `imageToVideo` |
| Vydra | Yes | Yes | No | `generate`, `imageToVideo` |
| xAI | Yes | Yes | Yes | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider currently needs a remote MP4 URL |

## Tool parameters

### Required

@@ -201,9 +221,34 @@ capabilities: {
}
```

Flat aggregate fields such as `maxInputImages` and `maxInputVideos` are not
enough to advertise transform-mode support. Providers should declare
`generate`, `imageToVideo`, and `videoToVideo` explicitly so live tests,
contract tests, and the shared `video_generate` tool can validate mode support
deterministically.
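
One way such a deterministic check can work: derive the runtime mode from the request shape, then require the matching declared block (a sketch with assumed types, not the repo's implementation):

```typescript
// Sketch: pick the runtime mode from the request, then validate it
// against the declared mode blocks. Types here are assumptions.
type VideoCaps = {
  generate?: object;
  imageToVideo?: { enabled: boolean };
  videoToVideo?: { enabled: boolean };
};

type VideoRequest = { inputImages?: unknown[]; inputVideos?: unknown[] };

function resolveVideoMode(
  req: VideoRequest,
): "generate" | "imageToVideo" | "videoToVideo" {
  if ((req.inputVideos?.length ?? 0) > 0) return "videoToVideo";
  if ((req.inputImages?.length ?? 0) > 0) return "imageToVideo";
  return "generate";
}

function modeSupported(
  caps: VideoCaps,
  mode: ReturnType<typeof resolveVideoMode>,
): boolean {
  if (mode === "generate") return caps.generate !== undefined;
  return caps[mode]?.enabled === true;
}
```

A request carrying reference videos resolves to `videoToVideo`, and a provider with `videoToVideo: { enabled: false }` is rejected before any API call.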
## Live tests

Opt-in live coverage for the shared bundled providers:

```bash
OPENCLAW_LIVE_TEST=1 pnpm test:live -- extensions/video-generation-providers.live.test.ts
```

This live file loads missing provider env vars from `~/.profile`, prefers
live/env API keys ahead of stored auth profiles by default, and runs the
declared modes it can exercise safely with local media:

- `generate` for every provider in the sweep
- `imageToVideo` when `capabilities.imageToVideo.enabled`
- `videoToVideo` when `capabilities.videoToVideo.enabled` and the provider/model
  accepts buffer-backed local video input in the shared sweep

Today the shared `videoToVideo` live lane covers:

- `google`
- `openai`
- `runway` only when you select `runway/gen4_aleph`
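
That lane selection reduces to a small predicate (a sketch; the provider and model strings mirror the list above, the function itself is illustrative):

```typescript
// Sketch of the shared videoToVideo lane gate described above:
// google and openai always run; runway runs only for gen4_aleph.
function runsSharedVideoToVideoLane(providerId: string, model: string): boolean {
  if (providerId === "google" || providerId === "openai") return true;
  if (providerId === "runway") return model === "gen4_aleph";
  return false;
}

console.log(runsSharedVideoToVideoLane("runway", "gen4_aleph")); // true
console.log(runsSharedVideoToVideoLane("runway", "gen3"));       // false
```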
## Configuration
@@ -198,15 +198,37 @@ export function buildAlibabaVideoGenerationProvider(): VideoGenerationProvider {
      agentDir,
    }),
    capabilities: {
      maxVideos: 1,
      maxInputImages: 1,
      maxInputVideos: 4,
      maxDurationSeconds: 10,
      supportsSize: true,
      supportsAspectRatio: true,
      supportsResolution: true,
      supportsAudio: true,
      supportsWatermark: true,
      generate: {
        maxVideos: 1,
        maxDurationSeconds: 10,
        supportsSize: true,
        supportsAspectRatio: true,
        supportsResolution: true,
        supportsAudio: true,
        supportsWatermark: true,
      },
      imageToVideo: {
        enabled: true,
        maxVideos: 1,
        maxInputImages: 1,
        maxDurationSeconds: 10,
        supportsSize: true,
        supportsAspectRatio: true,
        supportsResolution: true,
        supportsAudio: true,
        supportsWatermark: true,
      },
      videoToVideo: {
        enabled: true,
        maxVideos: 1,
        maxInputVideos: 4,
        maxDurationSeconds: 10,
        supportsSize: true,
        supportsAspectRatio: true,
        supportsResolution: true,
        supportsAudio: true,
        supportsWatermark: true,
      },
    },
    async generateVideo(req): Promise<VideoGenerationResult> {
      const fetchFn = fetch;

@@ -135,14 +135,27 @@ export function buildBytePlusVideoGenerationProvider(): VideoGenerationProvider
      agentDir,
    }),
    capabilities: {
      maxVideos: 1,
      maxInputImages: 1,
      maxInputVideos: 0,
      maxDurationSeconds: 12,
      supportsAspectRatio: true,
      supportsResolution: true,
      supportsAudio: true,
      supportsWatermark: true,
      generate: {
        maxVideos: 1,
        maxDurationSeconds: 12,
        supportsAspectRatio: true,
        supportsResolution: true,
        supportsAudio: true,
        supportsWatermark: true,
      },
      imageToVideo: {
        enabled: true,
        maxVideos: 1,
        maxInputImages: 1,
        maxDurationSeconds: 12,
        supportsAspectRatio: true,
        supportsResolution: true,
        supportsAudio: true,
        supportsWatermark: true,
      },
      videoToVideo: {
        enabled: false,
      },
    },
    async generateVideo(req) {
      if ((req.inputVideos?.length ?? 0) > 0) {

@@ -12,7 +12,7 @@ describe("comfy music-generation provider", () => {

    expect(provider.defaultModel).toBe("workflow");
    expect(provider.models).toEqual(["workflow"]);
    expect(provider.capabilities.maxInputImages).toBe(1);
    expect(provider.capabilities.edit?.maxInputImages).toBe(1);
  });

  it("runs a music workflow and returns audio outputs", async () => {

@@ -50,7 +50,11 @@ export function buildComfyMusicGenerationProvider(): MusicGenerationProvider {
      capability: "music",
    }),
    capabilities: {
      maxInputImages: COMFY_MAX_INPUT_IMAGES,
      generate: {},
      edit: {
        enabled: true,
        maxInputImages: COMFY_MAX_INPUT_IMAGES,
      },
    },
    async generateMusic(req) {
      if ((req.inputImages?.length ?? 0) > COMFY_MAX_INPUT_IMAGES) {

@@ -39,14 +39,27 @@ export function buildComfyVideoGenerationProvider(): VideoGenerationProvider {
      capability: "video",
    }),
    capabilities: {
      maxVideos: 1,
      maxInputImages: 1,
      maxInputVideos: 0,
      supportsSize: false,
      supportsAspectRatio: false,
      supportsResolution: false,
      supportsAudio: false,
      supportsWatermark: false,
      generate: {
        maxVideos: 1,
        supportsSize: false,
        supportsAspectRatio: false,
        supportsResolution: false,
        supportsAudio: false,
        supportsWatermark: false,
      },
      imageToVideo: {
        enabled: true,
        maxVideos: 1,
        maxInputImages: 1,
        supportsSize: false,
        supportsAspectRatio: false,
        supportsResolution: false,
        supportsAudio: false,
        supportsWatermark: false,
      },
      videoToVideo: {
        enabled: false,
      },
    },
    async generateVideo(req) {
      if ((req.inputImages?.length ?? 0) > 1) {

@@ -251,12 +251,23 @@ export function buildFalVideoGenerationProvider(): VideoGenerationProvider {
      agentDir,
    }),
    capabilities: {
      maxVideos: 1,
      maxInputImages: 1,
      maxInputVideos: 0,
      supportsAspectRatio: true,
      supportsResolution: true,
      supportsSize: true,
      generate: {
        maxVideos: 1,
        supportsAspectRatio: true,
        supportsResolution: true,
        supportsSize: true,
      },
      imageToVideo: {
        enabled: true,
        maxVideos: 1,
        maxInputImages: 1,
        supportsAspectRatio: true,
        supportsResolution: true,
        supportsSize: true,
      },
      videoToVideo: {
        enabled: false,
      },
    },
    async generateVideo(req) {
      if ((req.inputVideos?.length ?? 0) > 0) {

@@ -102,14 +102,27 @@ export function buildGoogleMusicGenerationProvider(): MusicGenerationProvider {
      agentDir,
    }),
    capabilities: {
      maxTracks: 1,
      maxInputImages: GOOGLE_MAX_INPUT_IMAGES,
      supportsLyrics: true,
      supportsInstrumental: true,
      supportsFormat: true,
      supportedFormatsByModel: {
        [DEFAULT_GOOGLE_MUSIC_MODEL]: ["mp3"],
        [GOOGLE_PRO_MUSIC_MODEL]: ["mp3", "wav"],
      generate: {
        maxTracks: 1,
        supportsLyrics: true,
        supportsInstrumental: true,
        supportsFormat: true,
        supportedFormatsByModel: {
          [DEFAULT_GOOGLE_MUSIC_MODEL]: ["mp3"],
          [GOOGLE_PRO_MUSIC_MODEL]: ["mp3", "wav"],
        },
      },
      edit: {
        enabled: true,
        maxTracks: 1,
        maxInputImages: GOOGLE_MAX_INPUT_IMAGES,
        supportsLyrics: true,
        supportsInstrumental: true,
        supportsFormat: true,
        supportedFormatsByModel: {
          [DEFAULT_GOOGLE_MUSIC_MODEL]: ["mp3"],
          [GOOGLE_PRO_MUSIC_MODEL]: ["mp3", "wav"],
        },
      },
    },
    async generateMusic(req) {

@@ -158,15 +158,37 @@ export function buildGoogleVideoGenerationProvider(): VideoGenerationProvider {
      agentDir,
    }),
    capabilities: {
      maxVideos: 1,
      maxInputImages: 1,
      maxInputVideos: 1,
      maxDurationSeconds: GOOGLE_VIDEO_MAX_DURATION_SECONDS,
      supportedDurationSeconds: GOOGLE_VIDEO_ALLOWED_DURATION_SECONDS,
      supportsAspectRatio: true,
      supportsResolution: true,
      supportsSize: true,
      supportsAudio: true,
      generate: {
        maxVideos: 1,
        maxDurationSeconds: GOOGLE_VIDEO_MAX_DURATION_SECONDS,
        supportedDurationSeconds: GOOGLE_VIDEO_ALLOWED_DURATION_SECONDS,
        supportsAspectRatio: true,
        supportsResolution: true,
        supportsSize: true,
        supportsAudio: true,
      },
      imageToVideo: {
        enabled: true,
        maxVideos: 1,
        maxInputImages: 1,
        maxDurationSeconds: GOOGLE_VIDEO_MAX_DURATION_SECONDS,
        supportedDurationSeconds: GOOGLE_VIDEO_ALLOWED_DURATION_SECONDS,
        supportsAspectRatio: true,
        supportsResolution: true,
        supportsSize: true,
        supportsAudio: true,
      },
      videoToVideo: {
        enabled: true,
        maxVideos: 1,
        maxInputVideos: 1,
        maxDurationSeconds: GOOGLE_VIDEO_MAX_DURATION_SECONDS,
        supportedDurationSeconds: GOOGLE_VIDEO_ALLOWED_DURATION_SECONDS,
        supportsAspectRatio: true,
        supportsResolution: true,
        supportsSize: true,
        supportsAudio: true,
      },
    },
    async generateVideo(req) {
      if ((req.inputImages?.length ?? 0) > 1) {

@@ -118,12 +118,17 @@ export function buildMinimaxMusicGenerationProvider(): MusicGenerationProvider {
      agentDir,
    }),
    capabilities: {
      maxTracks: 1,
      supportsLyrics: true,
      supportsInstrumental: true,
      supportsDuration: true,
      supportsFormat: true,
      supportedFormats: ["mp3"],
      generate: {
        maxTracks: 1,
        supportsLyrics: true,
        supportsInstrumental: true,
        supportsDuration: true,
        supportsFormat: true,
        supportedFormats: ["mp3"],
      },
      edit: {
        enabled: false,
      },
    },
    async generateMusic(req) {
      if ((req.inputImages?.length ?? 0) > 0) {

@@ -228,13 +228,25 @@ export function buildMinimaxVideoGenerationProvider(): VideoGenerationProvider {
      agentDir,
    }),
    capabilities: {
      maxVideos: 1,
      maxInputImages: 1,
      maxInputVideos: 0,
      maxDurationSeconds: 10,
      supportedDurationSecondsByModel: MINIMAX_MODEL_ALLOWED_DURATIONS,
      supportsResolution: true,
      supportsWatermark: false,
      generate: {
        maxVideos: 1,
        maxDurationSeconds: 10,
        supportedDurationSecondsByModel: MINIMAX_MODEL_ALLOWED_DURATIONS,
        supportsResolution: true,
        supportsWatermark: false,
      },
      imageToVideo: {
        enabled: true,
        maxVideos: 1,
        maxInputImages: 1,
        maxDurationSeconds: 10,
        supportedDurationSecondsByModel: MINIMAX_MODEL_ALLOWED_DURATIONS,
        supportsResolution: true,
        supportsWatermark: false,
      },
      videoToVideo: {
        enabled: false,
      },
    },
    async generateVideo(req) {
      if ((req.inputVideos?.length ?? 0) > 0) {

@@ -1,14 +1,21 @@
import { describe, expect, it } from "vitest";
import { resolveOpenClawAgentDir } from "../src/agents/agent-paths.js";
import { collectProviderApiKeys } from "../src/agents/live-auth-keys.js";
import { isLiveTestEnabled } from "../src/agents/live-test-helpers.js";
import type { OpenClawConfig } from "../src/config/config.js";
import { DEFAULT_LIVE_MUSIC_MODELS } from "../src/music-generation/live-test-helpers.js";
import { parseMusicGenerationModelRef } from "../src/music-generation/model-ref.js";
import { getProviderEnvVars } from "../src/secrets/provider-env-vars.js";
import { isLiveProfileKeyModeEnabled, isLiveTestEnabled } from "../src/agents/live-test-helpers.js";
import { resolveApiKeyForProvider } from "../src/agents/model-auth.js";
import { loadConfig, type OpenClawConfig } from "../src/config/config.js";
import { isTruthyEnvValue } from "../src/infra/env.js";
import { getShellEnvAppliedKeys, loadShellEnvFallback } from "../src/infra/shell-env.js";
import { encodePngRgba, fillPixel } from "../src/media/png-encode.js";
import {
  DEFAULT_LIVE_MUSIC_MODELS,
  parseCsvFilter,
  parseProviderModelMap,
} from "../src/video-generation/live-test-helpers.js";
  redactLiveApiKey,
  resolveConfiguredLiveMusicModels,
  resolveLiveMusicAuthStore,
} from "../src/music-generation/live-test-helpers.js";
import { getProviderEnvVars } from "../src/secrets/provider-env-vars.js";
import {
  registerProviderPlugin,
  requireRegisteredProvider,

@@ -17,6 +24,9 @@ import googlePlugin from "./google/index.js";
import minimaxPlugin from "./minimax/index.js";

const LIVE = isLiveTestEnabled();
const REQUIRE_PROFILE_KEYS =
  isLiveProfileKeyModeEnabled() || isTruthyEnvValue(process.env.OPENCLAW_LIVE_REQUIRE_PROFILE_KEYS);
const describeLive = LIVE ? describe : describe.skip;
const providerFilter = parseCsvFilter(process.env.OPENCLAW_LIVE_MUSIC_GENERATION_PROVIDERS);
const envModelMap = parseProviderModelMap(process.env.OPENCLAW_LIVE_MUSIC_GENERATION_MODELS);

@@ -44,29 +54,107 @@ const CASES: LiveProviderCase[] = [
  .filter((entry) => (providerFilter ? providerFilter.has(entry.providerId) : true))
  .toSorted((left, right) => left.providerId.localeCompare(right.providerId));

function asConfig(value: unknown): OpenClawConfig {
  return value as OpenClawConfig;
function withPluginsEnabled(cfg: OpenClawConfig): OpenClawConfig {
  return {
    ...cfg,
    plugins: {
      ...cfg.plugins,
      enabled: true,
    },
  };
}

function createEditReferencePng(): Buffer {
  const width = 192;
  const height = 192;
  const buf = Buffer.alloc(width * height * 4, 255);

  for (let y = 0; y < height; y += 1) {
    for (let x = 0; x < width; x += 1) {
      fillPixel(buf, x, y, width, 250, 246, 240, 255);
    }
  }

  for (let y = 24; y < 168; y += 1) {
    for (let x = 24; x < 168; x += 1) {
      fillPixel(buf, x, y, width, 255, 143, 77, 255);
    }
  }

  for (let y = 48; y < 144; y += 1) {
    for (let x = 48; x < 144; x += 1) {
      fillPixel(buf, x, y, width, 34, 40, 49, 255);
    }
  }

  return encodePngRgba(buf, width, height);
}

function resolveProviderModelForLiveTest(providerId: string, modelRef: string): string {
  const parsed = parseMusicGenerationModelRef(modelRef);
  if (parsed && parsed.provider === providerId) {
    return parsed.model;
  const slash = modelRef.indexOf("/");
  if (slash <= 0 || slash === modelRef.length - 1) {
    return modelRef;
  }
  return modelRef;
  return modelRef.slice(0, slash) === providerId ? modelRef.slice(slash + 1) : modelRef;
}

describe.skipIf(!LIVE)("music generation provider live", () => {
  for (const testCase of CASES) {
    const modelRef =
      envModelMap.get(testCase.providerId) ?? DEFAULT_LIVE_MUSIC_MODELS[testCase.providerId];
    const hasAuth = collectProviderApiKeys(testCase.providerId).length > 0;
    const expectedEnvVars = getProviderEnvVars(testCase.providerId).join(", ");
function maybeLoadShellEnvForMusicProviders(providerIds: string[]): void {
  const expectedKeys = [
    ...new Set(providerIds.flatMap((providerId) => getProviderEnvVars(providerId))),
  ];
  if (expectedKeys.length === 0) {
    return;
  }
  loadShellEnvFallback({
    enabled: true,
    env: process.env,
    expectedKeys,
    logger: { warn: (message: string) => console.warn(message) },
  });
}

describeLive("music generation provider live", () => {
  it(
    "covers generate plus declared edit paths with shell/profile auth",
    async () => {
      const cfg = withPluginsEnabled(loadConfig());
      const configuredModels = resolveConfiguredLiveMusicModels(cfg);
      const agentDir = resolveOpenClawAgentDir();
      const attempted: string[] = [];
      const skipped: string[] = [];
      const failures: string[] = [];

      maybeLoadShellEnvForMusicProviders(CASES.map((entry) => entry.providerId));

      for (const testCase of CASES) {
        const modelRef =
          envModelMap.get(testCase.providerId) ??
          configuredModels.get(testCase.providerId) ??
          DEFAULT_LIVE_MUSIC_MODELS[testCase.providerId];
        if (!modelRef) {
          skipped.push(`${testCase.providerId}: no model configured`);
          continue;
        }

        const hasLiveKeys = collectProviderApiKeys(testCase.providerId).length > 0;
        const authStore = resolveLiveMusicAuthStore({
          requireProfileKeys: REQUIRE_PROFILE_KEYS,
          hasLiveKeys,
        });
        let authLabel = "unresolved";
        try {
          const auth = await resolveApiKeyForProvider({
            provider: testCase.providerId,
            cfg,
            agentDir,
            store: authStore,
          });
          authLabel = `${auth.source} ${redactLiveApiKey(auth.apiKey)}`;
        } catch {
          skipped.push(`${testCase.providerId}: no usable auth`);
          continue;
        }

        const liveIt = hasAuth && modelRef ? it : it.skip;
        liveIt(
          `generates a short track via ${testCase.providerId}`,
          async () => {
            const { musicProviders } = await registerProviderPlugin({
              plugin: testCase.plugin,
              id: testCase.pluginId,

@@ -78,27 +166,78 @@ describe.skipIf(!LIVE)("music generation provider live", () => {
          "music provider",
        );
        const providerModel = resolveProviderModelForLiveTest(testCase.providerId, modelRef);
        const generateCaps = provider.capabilities.generate;

        const result = await provider.generateMusic({
          provider: testCase.providerId,
          model: providerModel,
          prompt: "Upbeat instrumental synthwave with warm neon pads and a simple driving beat.",
          cfg: asConfig({ plugins: { enabled: true } }),
          agentDir: "/tmp/openclaw-live-music",
          instrumental: true,
          ...(provider.capabilities.supportsDuration ? { durationSeconds: 12 } : {}),
          ...(provider.capabilities.supportsFormat ? { format: "mp3" as const } : {}),
        });
        try {
          const result = await provider.generateMusic({
            provider: testCase.providerId,
            model: providerModel,
            prompt: "Upbeat instrumental synthwave with warm neon pads and a simple driving beat.",
            cfg,
            agentDir,
            authStore,
            ...(generateCaps?.supportsDuration ? { durationSeconds: 12 } : {}),
            ...(generateCaps?.supportsFormat ? { format: "mp3" as const } : {}),
            ...(generateCaps?.supportsInstrumental ? { instrumental: true } : {}),
          });

        expect(result.tracks.length).toBeGreaterThan(0);
        expect(result.tracks[0]?.mimeType.startsWith("audio/")).toBe(true);
        expect(result.tracks[0]?.buffer.byteLength).toBeGreaterThan(1024);
      },
      6 * 60_000,
    );
          expect(result.tracks.length).toBeGreaterThan(0);
          expect(result.tracks[0]?.mimeType.startsWith("audio/")).toBe(true);
          expect(result.tracks[0]?.buffer.byteLength).toBeGreaterThan(1024);
          attempted.push(`${testCase.providerId}:generate:${providerModel} (${authLabel})`);
        } catch (error) {
          failures.push(
            `${testCase.providerId}:generate (${authLabel}): ${
              error instanceof Error ? error.message : String(error)
            }`,
          );
          continue;
        }

    if (!hasAuth || !modelRef) {
      it.skip(`skips ${testCase.providerId} without live auth/model (${expectedEnvVars || "no env vars"})`, () => {});
    }
  }
        if (!provider.capabilities.edit?.enabled) {
          continue;
        }

        try {
          const result = await provider.generateMusic({
            provider: testCase.providerId,
            model: providerModel,
            prompt: "Turn the reference cover art into a short dramatic trailer sting.",
            cfg,
            agentDir,
            authStore,
            inputImages: [
              {
                buffer: createEditReferencePng(),
                mimeType: "image/png",
                fileName: "reference.png",
              },
            ],
          });

          expect(result.tracks.length).toBeGreaterThan(0);
          expect(result.tracks[0]?.mimeType.startsWith("audio/")).toBe(true);
          expect(result.tracks[0]?.buffer.byteLength).toBeGreaterThan(1024);
          attempted.push(`${testCase.providerId}:edit:${providerModel} (${authLabel})`);
        } catch (error) {
          failures.push(
            `${testCase.providerId}:edit (${authLabel}): ${
              error instanceof Error ? error.message : String(error)
            }`,
          );
        }
      }

      console.log(
        `[live:music-generation] attempted=${attempted.join(", ") || "none"} skipped=${skipped.join(", ") || "none"} failures=${failures.join(" | ") || "none"} shellEnv=${getShellEnvAppliedKeys().join(", ") || "none"}`,
      );

      if (attempted.length === 0) {
        console.warn("[live:music-generation] no provider had usable auth; skipping assertions");
        return;
      }
      expect(failures).toEqual([]);
    },
    10 * 60_000,
  );
});

@@ -190,12 +190,28 @@ export function buildOpenAIVideoGenerationProvider(): VideoGenerationProvider {
      agentDir,
    }),
    capabilities: {
      maxVideos: 1,
      maxInputImages: 1,
      maxInputVideos: 1,
      maxDurationSeconds: 12,
      supportedDurationSeconds: OPENAI_VIDEO_SECONDS,
      supportsSize: true,
      generate: {
        maxVideos: 1,
        maxDurationSeconds: 12,
        supportedDurationSeconds: OPENAI_VIDEO_SECONDS,
        supportsSize: true,
      },
      imageToVideo: {
        enabled: true,
        maxVideos: 1,
        maxInputImages: 1,
        maxDurationSeconds: 12,
        supportedDurationSeconds: OPENAI_VIDEO_SECONDS,
        supportsSize: true,
      },
      videoToVideo: {
        enabled: true,
        maxVideos: 1,
        maxInputVideos: 1,
        maxDurationSeconds: 12,
        supportedDurationSeconds: OPENAI_VIDEO_SECONDS,
        supportsSize: true,
      },
    },
    async generateVideo(req) {
      const auth = await resolveApiKeyForProvider({

@@ -226,15 +226,37 @@ export function buildQwenVideoGenerationProvider(): VideoGenerationProvider {
        agentDir,
      }),
      capabilities: {
        maxVideos: 1,
        maxInputImages: 1,
        maxInputVideos: 4,
        maxDurationSeconds: 10,
        supportsSize: true,
        supportsAspectRatio: true,
        supportsResolution: true,
        supportsAudio: true,
        supportsWatermark: true,
        generate: {
          maxVideos: 1,
          maxDurationSeconds: 10,
          supportsSize: true,
          supportsAspectRatio: true,
          supportsResolution: true,
          supportsAudio: true,
          supportsWatermark: true,
        },
        imageToVideo: {
          enabled: true,
          maxVideos: 1,
          maxInputImages: 1,
          maxDurationSeconds: 10,
          supportsSize: true,
          supportsAspectRatio: true,
          supportsResolution: true,
          supportsAudio: true,
          supportsWatermark: true,
        },
        videoToVideo: {
          enabled: true,
          maxVideos: 1,
          maxInputVideos: 4,
          maxDurationSeconds: 10,
          supportsSize: true,
          supportsAspectRatio: true,
          supportsResolution: true,
          supportsAudio: true,
          supportsWatermark: true,
        },
      },
      async generateVideo(req): Promise<VideoGenerationResult> {
        const fetchFn = fetch;
@@ -261,11 +261,24 @@ export function buildRunwayVideoGenerationProvider(): VideoGenerationProvider {
        agentDir,
      }),
      capabilities: {
        maxVideos: 1,
        maxInputImages: 1,
        maxInputVideos: 1,
        maxDurationSeconds: MAX_DURATION_SECONDS,
        supportsAspectRatio: true,
        generate: {
          maxVideos: 1,
          maxDurationSeconds: MAX_DURATION_SECONDS,
          supportsAspectRatio: true,
        },
        imageToVideo: {
          enabled: true,
          maxVideos: 1,
          maxInputImages: 1,
          maxDurationSeconds: MAX_DURATION_SECONDS,
          supportsAspectRatio: true,
        },
        videoToVideo: {
          enabled: true,
          maxVideos: 1,
          maxInputVideos: 1,
          supportsAspectRatio: true,
        },
      },
      async generateVideo(req): Promise<VideoGenerationResult> {
        const auth = await resolveApiKeyForProvider({
@@ -126,11 +126,21 @@ export function buildTogetherVideoGenerationProvider(): VideoGenerationProvider
        agentDir,
      }),
      capabilities: {
        maxVideos: 1,
        maxInputImages: 1,
        maxInputVideos: 0,
        maxDurationSeconds: 12,
        supportsSize: true,
        generate: {
          maxVideos: 1,
          maxDurationSeconds: 12,
          supportsSize: true,
        },
        imageToVideo: {
          enabled: true,
          maxVideos: 1,
          maxInputImages: 1,
          maxDurationSeconds: 12,
          supportsSize: true,
        },
        videoToVideo: {
          enabled: false,
        },
      },
      async generateVideo(req) {
        if ((req.inputVideos?.length ?? 0) > 0) {
@@ -126,7 +126,9 @@ describe("video-generation runtime", () => {
      defaultModel: "vid-v1",
      models: ["vid-v1"],
      capabilities: {
        supportsAudio: true,
        generate: {
          supportsAudio: true,
        },
      },
      generateVideo: async () => ({
        videos: [{ buffer: Buffer.from("mp4-bytes"), mimeType: "video/mp4" }],
@@ -177,7 +179,9 @@ describe("video-generation runtime", () => {
    mocks.getVideoGenerationProvider.mockReturnValue({
      id: "openai",
      capabilities: {
        supportsSize: true,
        generate: {
          supportsSize: true,
        },
      },
      generateVideo: async (req) => {
        seenRequest = {
@@ -53,30 +53,24 @@ function resolveVideoGenerationModeCapabilities(params: {
  if (mode === "generate") {
    return {
      mode,
      capabilities: capabilities.generate ?? capabilities,
      capabilities: capabilities.generate,
    };
  }
  if (mode === "imageToVideo") {
    return {
      mode,
      capabilities: capabilities.imageToVideo ?? {
        ...capabilities,
        enabled: (capabilities.maxInputImages ?? 0) > 0,
      },
      capabilities: capabilities.imageToVideo,
    };
  }
  if (mode === "videoToVideo") {
    return {
      mode,
      capabilities: capabilities.videoToVideo ?? {
        ...capabilities,
        enabled: (capabilities.maxInputVideos ?? 0) > 0,
      },
      capabilities: capabilities.videoToVideo,
    };
  }
  return {
    mode,
    capabilities,
    capabilities: undefined,
  };
}
@@ -1,12 +1,21 @@
import { describe, expect, it } from "vitest";
import { resolveOpenClawAgentDir } from "../src/agents/agent-paths.js";
import { collectProviderApiKeys } from "../src/agents/live-auth-keys.js";
import { isLiveTestEnabled } from "../src/agents/live-test-helpers.js";
import type { OpenClawConfig } from "../src/config/config.js";
import { isLiveProfileKeyModeEnabled, isLiveTestEnabled } from "../src/agents/live-test-helpers.js";
import { resolveApiKeyForProvider } from "../src/agents/model-auth.js";
import { loadConfig, type OpenClawConfig } from "../src/config/config.js";
import { isTruthyEnvValue } from "../src/infra/env.js";
import { getShellEnvAppliedKeys, loadShellEnvFallback } from "../src/infra/shell-env.js";
import { encodePngRgba, fillPixel } from "../src/media/png-encode.js";
import { getProviderEnvVars } from "../src/secrets/provider-env-vars.js";
import {
  canRunBufferBackedVideoToVideoLiveLane,
  DEFAULT_LIVE_VIDEO_MODELS,
  parseCsvFilter,
  parseProviderModelMap,
  redactLiveApiKey,
  resolveConfiguredLiveVideoModels,
  resolveLiveVideoAuthStore,
} from "../src/video-generation/live-test-helpers.js";
import { parseVideoGenerationModelRef } from "../src/video-generation/model-ref.js";
import {
@@ -26,6 +35,9 @@ import vydraPlugin from "./vydra/index.js";
import xaiPlugin from "./xai/index.js";

const LIVE = isLiveTestEnabled();
const REQUIRE_PROFILE_KEYS =
  isLiveProfileKeyModeEnabled() || isTruthyEnvValue(process.env.OPENCLAW_LIVE_REQUIRE_PROFILE_KEYS);
const describeLive = LIVE ? describe : describe.skip;
const providerFilter = parseCsvFilter(process.env.OPENCLAW_LIVE_VIDEO_GENERATION_PROVIDERS);
const envModelMap = parseProviderModelMap(process.env.OPENCLAW_LIVE_VIDEO_GENERATION_MODELS);

@@ -72,8 +84,40 @@ const CASES: LiveProviderCase[] = [
  .filter((entry) => (providerFilter ? providerFilter.has(entry.providerId) : true))
  .toSorted((left, right) => left.providerId.localeCompare(right.providerId));

function asConfig(value: unknown): OpenClawConfig {
  return value as OpenClawConfig;
function withPluginsEnabled(cfg: OpenClawConfig): OpenClawConfig {
  return {
    ...cfg,
    plugins: {
      ...cfg.plugins,
      enabled: true,
    },
  };
}

function createEditReferencePng(): Buffer {
  const width = 192;
  const height = 192;
  const buf = Buffer.alloc(width * height * 4, 255);

  for (let y = 0; y < height; y += 1) {
    for (let x = 0; x < width; x += 1) {
      fillPixel(buf, x, y, width, 238, 247, 255, 255);
    }
  }

  for (let y = 24; y < 168; y += 1) {
    for (let x = 24; x < 168; x += 1) {
      fillPixel(buf, x, y, width, 76, 154, 255, 255);
    }
  }

  for (let y = 48; y < 144; y += 1) {
    for (let x = 48; x < 144; x += 1) {
      fillPixel(buf, x, y, width, 255, 255, 255, 255);
    }
  }

  return encodePngRgba(buf, width, height);
}

function resolveProviderModelForLiveTest(providerId: string, modelRef: string): string {
@@ -84,17 +128,63 @@ function resolveProviderModelForLiveTest(providerId: string, modelRef: string):
  return modelRef;
}

describe.skipIf(!LIVE)("video generation provider live", () => {
  for (const testCase of CASES) {
    const modelRef =
      envModelMap.get(testCase.providerId) ?? DEFAULT_LIVE_VIDEO_MODELS[testCase.providerId];
    const hasAuth = collectProviderApiKeys(testCase.providerId).length > 0;
    const expectedEnvVars = getProviderEnvVars(testCase.providerId).join(", ");
function maybeLoadShellEnvForVideoProviders(providerIds: string[]): void {
  const expectedKeys = [
    ...new Set(providerIds.flatMap((providerId) => getProviderEnvVars(providerId))),
  ];
  if (expectedKeys.length === 0) {
    return;
  }
  loadShellEnvFallback({
    enabled: true,
    env: process.env,
    expectedKeys,
    logger: { warn: (message: string) => console.warn(message) },
  });
}

describeLive("video generation provider live", () => {
  it(
    "covers declared video-generation modes with shell/profile auth",
    async () => {
      const cfg = withPluginsEnabled(loadConfig());
      const configuredModels = resolveConfiguredLiveVideoModels(cfg);
      const agentDir = resolveOpenClawAgentDir();
      const attempted: string[] = [];
      const skipped: string[] = [];
      const failures: string[] = [];

      maybeLoadShellEnvForVideoProviders(CASES.map((entry) => entry.providerId));

      for (const testCase of CASES) {
        const modelRef =
          envModelMap.get(testCase.providerId) ??
          configuredModels.get(testCase.providerId) ??
          DEFAULT_LIVE_VIDEO_MODELS[testCase.providerId];
        if (!modelRef) {
          skipped.push(`${testCase.providerId}: no model configured`);
          continue;
        }

        const hasLiveKeys = collectProviderApiKeys(testCase.providerId).length > 0;
        const authStore = resolveLiveVideoAuthStore({
          requireProfileKeys: REQUIRE_PROFILE_KEYS,
          hasLiveKeys,
        });
        let authLabel = "unresolved";
        try {
          const auth = await resolveApiKeyForProvider({
            provider: testCase.providerId,
            cfg,
            agentDir,
            store: authStore,
          });
          authLabel = `${auth.source} ${redactLiveApiKey(auth.apiKey)}`;
        } catch {
          skipped.push(`${testCase.providerId}: no usable auth`);
          continue;
        }

    const liveIt = hasAuth && modelRef ? it : it.skip;
    liveIt(
      `generates a short video via ${testCase.providerId}`,
      async () => {
        const { videoProviders } = await registerProviderPlugin({
          plugin: testCase.plugin,
          id: testCase.pluginId,
@@ -105,32 +195,144 @@ describe.skipIf(!LIVE)("video generation provider live", () => {
          testCase.providerId,
          "video provider",
        );
        const durationSeconds = Math.min(provider.capabilities.maxDurationSeconds ?? 3, 3);
        const providerModel = resolveProviderModelForLiveTest(testCase.providerId, modelRef);
        const generateCaps = provider.capabilities.generate;
        const imageToVideoCaps = provider.capabilities.imageToVideo;
        const videoToVideoCaps = provider.capabilities.videoToVideo;
        const durationSeconds = Math.min(generateCaps?.maxDurationSeconds ?? 3, 3);
        let generatedVideo = null as {
          buffer: Buffer;
          mimeType: string;
          fileName?: string;
        } | null;

        const result = await provider.generateVideo({
          provider: testCase.providerId,
          model: providerModel,
          prompt:
            "A tiny paper diorama city at sunrise with slow cinematic camera motion and no text.",
          cfg: asConfig({ plugins: { enabled: true } }),
          agentDir: "/tmp/openclaw-live-video",
          durationSeconds,
          ...(provider.capabilities.supportsAspectRatio ? { aspectRatio: "16:9" } : {}),
          ...(provider.capabilities.supportsResolution ? { resolution: "480P" as const } : {}),
          ...(provider.capabilities.supportsAudio ? { audio: false } : {}),
          ...(provider.capabilities.supportsWatermark ? { watermark: false } : {}),
        });
        try {
          const result = await provider.generateVideo({
            provider: testCase.providerId,
            model: providerModel,
            prompt:
              "A tiny paper diorama city at sunrise with slow cinematic camera motion and no text.",
            cfg,
            agentDir,
            authStore,
            durationSeconds,
            ...(generateCaps?.supportsAspectRatio ? { aspectRatio: "16:9" } : {}),
            ...(generateCaps?.supportsResolution ? { resolution: "480P" as const } : {}),
            ...(generateCaps?.supportsAudio ? { audio: false } : {}),
            ...(generateCaps?.supportsWatermark ? { watermark: false } : {}),
          });

        expect(result.videos.length).toBeGreaterThan(0);
        expect(result.videos[0]?.mimeType.startsWith("video/")).toBe(true);
        expect(result.videos[0]?.buffer.byteLength).toBeGreaterThan(1024);
      },
      8 * 60_000,
    );
          expect(result.videos.length).toBeGreaterThan(0);
          expect(result.videos[0]?.mimeType.startsWith("video/")).toBe(true);
          expect(result.videos[0]?.buffer.byteLength).toBeGreaterThan(1024);
          generatedVideo = result.videos[0] ?? null;
          attempted.push(`${testCase.providerId}:generate:${providerModel} (${authLabel})`);
        } catch (error) {
          failures.push(
            `${testCase.providerId}:generate (${authLabel}): ${
              error instanceof Error ? error.message : String(error)
            }`,
          );
          continue;
        }

    if (!hasAuth || !modelRef) {
      it.skip(`skips ${testCase.providerId} without live auth/model (${expectedEnvVars || "no env vars"})`, () => {});
    }
  }
        if (!imageToVideoCaps?.enabled) {
          continue;
        }

        try {
          const result = await provider.generateVideo({
            provider: testCase.providerId,
            model: providerModel,
            prompt:
              "Animate the reference art with subtle parallax motion and drifting camera movement.",
            cfg,
            agentDir,
            authStore,
            durationSeconds,
            inputImages: [
              {
                buffer: createEditReferencePng(),
                mimeType: "image/png",
                fileName: "reference.png",
              },
            ],
            ...(imageToVideoCaps.supportsAspectRatio ? { aspectRatio: "16:9" } : {}),
            ...(imageToVideoCaps.supportsResolution ? { resolution: "480P" as const } : {}),
            ...(imageToVideoCaps.supportsAudio ? { audio: false } : {}),
            ...(imageToVideoCaps.supportsWatermark ? { watermark: false } : {}),
          });

          expect(result.videos.length).toBeGreaterThan(0);
          expect(result.videos[0]?.mimeType.startsWith("video/")).toBe(true);
          expect(result.videos[0]?.buffer.byteLength).toBeGreaterThan(1024);
          attempted.push(`${testCase.providerId}:imageToVideo:${providerModel} (${authLabel})`);
        } catch (error) {
          failures.push(
            `${testCase.providerId}:imageToVideo (${authLabel}): ${
              error instanceof Error ? error.message : String(error)
            }`,
          );
        }

        if (!videoToVideoCaps?.enabled) {
          continue;
        }
        if (
          !canRunBufferBackedVideoToVideoLiveLane({
            providerId: testCase.providerId,
            modelRef,
          })
        ) {
          skipped.push(
            `${testCase.providerId}:videoToVideo requires remote URL or model-specific input`,
          );
          continue;
        }
        if (!generatedVideo?.buffer) {
          skipped.push(`${testCase.providerId}:videoToVideo missing generated seed video`);
          continue;
        }

        try {
          const result = await provider.generateVideo({
            provider: testCase.providerId,
            model: providerModel,
            prompt: "Rework the reference clip into a brighter, steadier cinematic continuation.",
            cfg,
            agentDir,
            authStore,
            durationSeconds: Math.min(videoToVideoCaps.maxDurationSeconds ?? durationSeconds, 3),
            inputVideos: [generatedVideo],
            ...(videoToVideoCaps.supportsAspectRatio ? { aspectRatio: "16:9" } : {}),
            ...(videoToVideoCaps.supportsResolution ? { resolution: "480P" as const } : {}),
            ...(videoToVideoCaps.supportsAudio ? { audio: false } : {}),
            ...(videoToVideoCaps.supportsWatermark ? { watermark: false } : {}),
          });

          expect(result.videos.length).toBeGreaterThan(0);
          expect(result.videos[0]?.mimeType.startsWith("video/")).toBe(true);
          expect(result.videos[0]?.buffer.byteLength).toBeGreaterThan(1024);
          attempted.push(`${testCase.providerId}:videoToVideo:${providerModel} (${authLabel})`);
        } catch (error) {
          failures.push(
            `${testCase.providerId}:videoToVideo (${authLabel}): ${
              error instanceof Error ? error.message : String(error)
            }`,
          );
        }
      }

      console.log(
        `[live:video-generation] attempted=${attempted.join(", ") || "none"} skipped=${skipped.join(", ") || "none"} failures=${failures.join(" | ") || "none"} shellEnv=${getShellEnvAppliedKeys().join(", ") || "none"}`,
      );

      if (attempted.length === 0) {
        console.warn("[live:video-generation] no provider had usable auth; skipping assertions");
        return;
      }
      expect(failures).toEqual([]);
    },
    15 * 60_000,
  );
});
@@ -63,9 +63,17 @@ export function buildVydraVideoGenerationProvider(): VideoGenerationProvider {
        agentDir,
      }),
      capabilities: {
        maxVideos: 1,
        maxInputImages: 1,
        maxInputVideos: 0,
        generate: {
          maxVideos: 1,
        },
        imageToVideo: {
          enabled: true,
          maxVideos: 1,
          maxInputImages: 1,
        },
        videoToVideo: {
          enabled: false,
        },
      },
      async generateVideo(req) {
        if ((req.inputVideos?.length ?? 0) > 0) {
@@ -254,12 +254,28 @@ export function buildXaiVideoGenerationProvider(): VideoGenerationProvider {
        agentDir,
      }),
      capabilities: {
        maxVideos: 1,
        maxInputImages: 1,
        maxInputVideos: 1,
        maxDurationSeconds: 15,
        supportsAspectRatio: true,
        supportsResolution: true,
        generate: {
          maxVideos: 1,
          maxDurationSeconds: 15,
          supportsAspectRatio: true,
          supportsResolution: true,
        },
        imageToVideo: {
          enabled: true,
          maxVideos: 1,
          maxInputImages: 1,
          maxDurationSeconds: 15,
          supportsAspectRatio: true,
          supportsResolution: true,
        },
        videoToVideo: {
          enabled: true,
          maxVideos: 1,
          maxInputVideos: 1,
          maxDurationSeconds: 15,
          supportsAspectRatio: true,
          supportsResolution: true,
        },
      },
      async generateVideo(req) {
        const auth = await resolveApiKeyForProvider({
@@ -1,4 +1,5 @@
import type { OpenClawConfig } from "../../config/config.js";
import { listSupportedMusicGenerationModes } from "../../music-generation/capabilities.js";
import { listRuntimeMusicGenerationProviders } from "../../music-generation/runtime.js";
import { getProviderEnvVars } from "../../secrets/provider-env-vars.js";
import {
@@ -16,6 +17,35 @@ function getMusicGenerationProviderAuthEnvVars(providerId: string): string[] {
  return getProviderEnvVars(providerId);
}

function summarizeMusicGenerationCapabilities(
  provider: ReturnType<typeof listRuntimeMusicGenerationProviders>[number],
): string {
  const supportedModes = listSupportedMusicGenerationModes(provider);
  const generate = provider.capabilities.generate;
  const edit = provider.capabilities.edit;
  const capabilities = [
    supportedModes.length > 0 ? `modes=${supportedModes.join("/")}` : null,
    generate?.maxTracks ? `maxTracks=${generate.maxTracks}` : null,
    edit?.maxInputImages ? `maxInputImages=${edit.maxInputImages}` : null,
    generate?.maxDurationSeconds ? `maxDurationSeconds=${generate.maxDurationSeconds}` : null,
    generate?.supportsLyrics ? "lyrics" : null,
    generate?.supportsInstrumental ? "instrumental" : null,
    generate?.supportsDuration ? "duration" : null,
    generate?.supportsFormat ? "format" : null,
    generate?.supportedFormats?.length
      ? `supportedFormats=${generate.supportedFormats.join("/")}`
      : null,
    generate?.supportedFormatsByModel && Object.keys(generate.supportedFormatsByModel).length > 0
      ? `supportedFormatsByModel=${Object.entries(generate.supportedFormatsByModel)
          .map(([modelId, formats]) => `${modelId}:${formats.join("/")}`)
          .join("; ")}`
      : null,
  ]
    .filter((entry): entry is string => Boolean(entry))
    .join(", ");
  return capabilities;
}

export function createMusicGenerateListActionResult(
  config?: OpenClawConfig,
): MusicGenerateActionResult {
@@ -28,30 +58,7 @@ export function createMusicGenerateListActionResult(
  }
  const lines = providers.map((provider) => {
    const authHints = getMusicGenerationProviderAuthEnvVars(provider.id);
    const capabilities = [
      provider.capabilities.maxTracks ? `maxTracks=${provider.capabilities.maxTracks}` : null,
      provider.capabilities.maxInputImages
        ? `maxInputImages=${provider.capabilities.maxInputImages}`
        : null,
      provider.capabilities.maxDurationSeconds
        ? `maxDurationSeconds=${provider.capabilities.maxDurationSeconds}`
        : null,
      provider.capabilities.supportsLyrics ? "lyrics" : null,
      provider.capabilities.supportsInstrumental ? "instrumental" : null,
      provider.capabilities.supportsDuration ? "duration" : null,
      provider.capabilities.supportsFormat ? "format" : null,
      provider.capabilities.supportedFormats?.length
        ? `supportedFormats=${provider.capabilities.supportedFormats.join("/")}`
        : null,
      provider.capabilities.supportedFormatsByModel &&
      Object.keys(provider.capabilities.supportedFormatsByModel).length > 0
        ? `supportedFormatsByModel=${Object.entries(provider.capabilities.supportedFormatsByModel)
            .map(([modelId, formats]) => `${modelId}:${formats.join("/")}`)
            .join("; ")}`
        : null,
    ]
      .filter((entry): entry is string => Boolean(entry))
      .join(", ");
    const capabilities = summarizeMusicGenerationCapabilities(provider);
    return [
      `${provider.id}: default=${provider.defaultModel ?? "none"}`,
      provider.models?.length ? `models=${provider.models.join(", ")}` : null,
@@ -68,6 +75,7 @@ export function createMusicGenerateListActionResult(
      id: provider.id,
      defaultModel: provider.defaultModel,
      models: provider.models ?? [],
      modes: listSupportedMusicGenerationModes(provider),
      authEnvVars: getMusicGenerationProviderAuthEnvVars(provider.id),
      capabilities: provider.capabilities,
    })),
@@ -241,12 +241,14 @@ describe("createMusicGenerateTool", () => {
      defaultModel: "music-2.5+",
      models: ["music-2.5+"],
      capabilities: {
        maxTracks: 1,
        supportsLyrics: true,
        supportsInstrumental: true,
        supportsDuration: true,
        supportsFormat: true,
        supportedFormats: ["mp3"],
        generate: {
          maxTracks: 1,
          supportsLyrics: true,
          supportsInstrumental: true,
          supportsDuration: true,
          supportsFormat: true,
          supportedFormats: ["mp3"],
        },
      },
      generateMusic: vi.fn(async () => {
        throw new Error("not used");
@@ -280,11 +282,13 @@ describe("createMusicGenerateTool", () => {
      defaultModel: "lyria-3-clip-preview",
      models: ["lyria-3-clip-preview"],
      capabilities: {
        supportsLyrics: true,
        supportsInstrumental: true,
        supportsFormat: true,
        supportedFormatsByModel: {
          "lyria-3-clip-preview": ["mp3"],
        generate: {
          supportsLyrics: true,
          supportsInstrumental: true,
          supportsFormat: true,
          supportedFormatsByModel: {
            "lyria-3-clip-preview": ["mp3"],
          },
        },
      },
      generateMusic: vi.fn(async () => {
@@ -4,6 +4,7 @@ import { loadConfig } from "../../config/config.js";
import { createSubsystemLogger } from "../../logging/subsystem.js";
import { saveMediaBuffer } from "../../media/store.js";
import { loadWebMedia } from "../../media/web-media.js";
import { resolveMusicGenerationModeCapabilities } from "../../music-generation/capabilities.js";
import { parseMusicGenerationModelRef } from "../../music-generation/model-ref.js";
import {
  generateMusic,
@@ -213,15 +214,28 @@ function validateMusicGenerationCapabilities(params: {
  if (!provider) {
    return;
  }
  const caps = provider.capabilities;
  const { capabilities: caps } = resolveMusicGenerationModeCapabilities({
    provider,
    inputImageCount: params.inputImageCount,
  });
  if (params.inputImageCount > 0) {
    const maxInputImages = caps.maxInputImages ?? MAX_INPUT_IMAGES;
    if (!caps) {
      throw new ToolInputError(`${provider.id} does not support reference-image edit inputs.`);
    }
    if ("enabled" in caps && !caps.enabled) {
      throw new ToolInputError(`${provider.id} does not support reference-image edit inputs.`);
    }
    const maxInputImages =
      ("maxInputImages" in caps ? caps.maxInputImages : undefined) ?? MAX_INPUT_IMAGES;
    if (params.inputImageCount > maxInputImages) {
      throw new ToolInputError(
        `${provider.id} supports at most ${maxInputImages} reference image${maxInputImages === 1 ? "" : "s"}.`,
      );
    }
  }
  if (!caps) {
    return;
  }
  if (
    typeof params.durationSeconds === "number" &&
    caps.supportsDuration &&
@@ -17,6 +17,39 @@ function getVideoGenerationProviderAuthEnvVars(providerId: string): string[] {
  return getProviderEnvVars(providerId);
}

function summarizeVideoGenerationCapabilities(
  provider: ReturnType<typeof listRuntimeVideoGenerationProviders>[number],
): string {
  const supportedModes = listSupportedVideoGenerationModes(provider);
  const generate = provider.capabilities.generate;
  const imageToVideo = provider.capabilities.imageToVideo;
  const videoToVideo = provider.capabilities.videoToVideo;
  const capabilities = [
    supportedModes.length > 0 ? `modes=${supportedModes.join("/")}` : null,
    generate?.maxVideos ? `maxVideos=${generate.maxVideos}` : null,
    imageToVideo?.maxInputImages ? `maxInputImages=${imageToVideo.maxInputImages}` : null,
    videoToVideo?.maxInputVideos ? `maxInputVideos=${videoToVideo.maxInputVideos}` : null,
    generate?.maxDurationSeconds ? `maxDurationSeconds=${generate.maxDurationSeconds}` : null,
    generate?.supportedDurationSeconds?.length
      ? `supportedDurationSeconds=${generate.supportedDurationSeconds.join("/")}`
      : null,
    generate?.supportedDurationSecondsByModel &&
    Object.keys(generate.supportedDurationSecondsByModel).length > 0
      ? `supportedDurationSecondsByModel=${Object.entries(generate.supportedDurationSecondsByModel)
          .map(([modelId, durations]) => `${modelId}:${durations.join("/")}`)
          .join("; ")}`
      : null,
    generate?.supportsResolution ? "resolution" : null,
    generate?.supportsAspectRatio ? "aspectRatio" : null,
    generate?.supportsSize ? "size" : null,
    generate?.supportsAudio ? "audio" : null,
    generate?.supportsWatermark ? "watermark" : null,
  ]
    .filter((entry): entry is string => Boolean(entry))
    .join(", ");
  return capabilities;
}

export function createVideoGenerateListActionResult(
  config?: OpenClawConfig,
): VideoGenerateActionResult {
@@ -29,38 +62,7 @@ export function createVideoGenerateListActionResult(
  }
  const lines = providers.map((provider) => {
    const authHints = getVideoGenerationProviderAuthEnvVars(provider.id);
    const supportedModes = listSupportedVideoGenerationModes(provider);
    const capabilities = [
      supportedModes.length > 0 ? `modes=${supportedModes.join("/")}` : null,
      provider.capabilities.maxVideos ? `maxVideos=${provider.capabilities.maxVideos}` : null,
      provider.capabilities.maxInputImages
        ? `maxInputImages=${provider.capabilities.maxInputImages}`
        : null,
      provider.capabilities.maxInputVideos
        ? `maxInputVideos=${provider.capabilities.maxInputVideos}`
        : null,
      provider.capabilities.maxDurationSeconds
        ? `maxDurationSeconds=${provider.capabilities.maxDurationSeconds}`
        : null,
      provider.capabilities.supportedDurationSeconds?.length
        ? `supportedDurationSeconds=${provider.capabilities.supportedDurationSeconds.join("/")}`
        : null,
      provider.capabilities.supportedDurationSecondsByModel &&
      Object.keys(provider.capabilities.supportedDurationSecondsByModel).length > 0
        ? `supportedDurationSecondsByModel=${Object.entries(
            provider.capabilities.supportedDurationSecondsByModel,
          )
            .map(([modelId, durations]) => `${modelId}:${durations.join("/")}`)
            .join("; ")}`
        : null,
      provider.capabilities.supportsResolution ? "resolution" : null,
      provider.capabilities.supportsAspectRatio ? "aspectRatio" : null,
      provider.capabilities.supportsSize ? "size" : null,
      provider.capabilities.supportsAudio ? "audio" : null,
      provider.capabilities.supportsWatermark ? "watermark" : null,
    ]
      .filter((entry): entry is string => Boolean(entry))
      .join(", ");
    const capabilities = summarizeVideoGenerationCapabilities(provider);
    return [
      `${provider.id}: default=${provider.defaultModel ?? "none"}`,
      provider.models?.length ? `models=${provider.models.join(", ")}` : null,
@@ -305,9 +305,16 @@ describe("createVideoGenerateTool", () => {
      defaultModel: "veo-3.1-fast-generate-preview",
      models: ["veo-3.1-fast-generate-preview"],
      capabilities: {
        maxInputImages: 1,
        maxDurationSeconds: 8,
        supportedDurationSeconds: [4, 6, 8],
        generate: {
          maxDurationSeconds: 8,
          supportedDurationSeconds: [4, 6, 8],
        },
        imageToVideo: {
          enabled: true,
          maxInputImages: 1,
          maxDurationSeconds: 8,
          supportedDurationSeconds: [4, 6, 8],
        },
      },
      generateVideo: vi.fn(async () => {
        throw new Error("not used");
@@ -389,7 +396,9 @@ describe("createVideoGenerateTool", () => {
      defaultModel: "sora-2",
      models: ["sora-2"],
      capabilities: {
        supportsSize: true,
        generate: {
          supportsSize: true,
        },
      },
      generateVideo: vi.fn(async () => {
        throw new Error("not used");
@@ -281,6 +281,12 @@ function validateVideoGenerationCapabilities(params: {
    inputImageCount: params.inputImageCount,
    inputVideoCount: params.inputVideoCount,
  });
  if (!caps && mode === "imageToVideo" && params.inputVideoCount === 0) {
    throw new ToolInputError(`${provider.id} does not support image-to-video reference inputs.`);
  }
  if (!caps && mode === "videoToVideo" && params.inputImageCount === 0) {
    throw new ToolInputError(`${provider.id} does not support video-to-video reference inputs.`);
  }
  if (!caps) {
    return;
  }
77  src/music-generation/capabilities.test.ts  Normal file
@@ -0,0 +1,77 @@
import { describe, expect, it } from "vitest";
import {
  listSupportedMusicGenerationModes,
  resolveMusicGenerationMode,
  resolveMusicGenerationModeCapabilities,
} from "./capabilities.js";
import type { MusicGenerationProvider } from "./types.js";

function createProvider(
  capabilities: MusicGenerationProvider["capabilities"],
): MusicGenerationProvider {
  return {
    id: "music-plugin",
    capabilities,
    async generateMusic() {
      throw new Error("not used");
    },
  };
}

describe("music-generation capabilities", () => {
  it("requires explicit edit capabilities before advertising edit mode", () => {
    const provider = createProvider({
      maxInputImages: 2,
    });

    expect(listSupportedMusicGenerationModes(provider)).toEqual(["generate"]);
  });

  it("prefers explicit edit capabilities for reference-image requests", () => {
    const provider = createProvider({
      supportsDuration: true,
      edit: {
        enabled: true,
        maxInputImages: 1,
        supportsDuration: false,
        supportsLyrics: true,
      },
    });

    expect(
      resolveMusicGenerationModeCapabilities({
        provider,
        inputImageCount: 1,
      }),
    ).toEqual({
      mode: "edit",
      capabilities: {
        enabled: true,
        maxInputImages: 1,
        supportsDuration: false,
        supportsLyrics: true,
      },
    });
  });

  it("detects generate vs edit mode from reference images", () => {
    expect(resolveMusicGenerationMode({ inputImageCount: 0 })).toBe("generate");
    expect(resolveMusicGenerationMode({ inputImageCount: 1 })).toBe("edit");
  });

  it("does not infer edit capabilities from aggregate fields", () => {
    const provider = createProvider({
      maxInputImages: 1,
    });

    expect(
      resolveMusicGenerationModeCapabilities({
        provider,
        inputImageCount: 1,
      }),
    ).toEqual({
      mode: "edit",
      capabilities: undefined,
    });
  });
});
47  src/music-generation/capabilities.ts  Normal file
@@ -0,0 +1,47 @@
import type {
  MusicGenerationEditCapabilities,
  MusicGenerationMode,
  MusicGenerationModeCapabilities,
  MusicGenerationProvider,
} from "./types.js";

export function resolveMusicGenerationMode(params: {
  inputImageCount?: number;
}): MusicGenerationMode {
  return (params.inputImageCount ?? 0) > 0 ? "edit" : "generate";
}

export function listSupportedMusicGenerationModes(
  provider: Pick<MusicGenerationProvider, "capabilities">,
): MusicGenerationMode[] {
  const modes: MusicGenerationMode[] = ["generate"];
  const edit = provider.capabilities.edit;
  if (edit?.enabled) {
    modes.push("edit");
  }
  return modes;
}

export function resolveMusicGenerationModeCapabilities(params: {
  provider?: Pick<MusicGenerationProvider, "capabilities">;
  inputImageCount?: number;
}): {
  mode: MusicGenerationMode;
  capabilities: MusicGenerationModeCapabilities | MusicGenerationEditCapabilities | undefined;
} {
  const mode = resolveMusicGenerationMode(params);
  const capabilities = params.provider?.capabilities;
  if (!capabilities) {
    return { mode, capabilities: undefined };
  }
  if (mode === "generate") {
    return {
      mode,
      capabilities: capabilities.generate,
    };
  }
  return {
    mode,
    capabilities: capabilities.edit,
  };
}
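The mode-resolution contract introduced in `capabilities.ts` above can be exercised standalone. The sketch below inlines a trimmed copy of the same logic (types narrowed for brevity; the provider object is a hypothetical example, not a bundled provider): a provider that only declares an explicit `generate` block resolves to `undefined` capabilities for edit requests instead of inheriting aggregate fields.

```typescript
// Trimmed re-statement of src/music-generation/capabilities.ts for illustration.
type MusicGenerationMode = "generate" | "edit";

type ModeCaps = { supportsLyrics?: boolean; maxInputImages?: number };
type EditCaps = ModeCaps & { enabled: boolean };
type Caps = ModeCaps & { generate?: ModeCaps; edit?: EditCaps };

function resolveMode(inputImageCount: number): MusicGenerationMode {
  return inputImageCount > 0 ? "edit" : "generate";
}

function resolveModeCapabilities(caps: Caps | undefined, inputImageCount: number) {
  const mode = resolveMode(inputImageCount);
  if (!caps) return { mode, capabilities: undefined };
  // Only the explicit per-mode block is returned; aggregate fields are never inferred.
  return { mode, capabilities: mode === "generate" ? caps.generate : caps.edit };
}

// Hypothetical provider declaring only explicit generate capabilities:
const caps: Caps = { generate: { supportsLyrics: true } };
console.log(resolveModeCapabilities(caps, 0).mode); // "generate"
console.log(resolveModeCapabilities(caps, 1).capabilities); // undefined: no explicit edit block
```

This mirrors the "does not infer edit capabilities from aggregate fields" test in the new file.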
@@ -1,4 +1,84 @@
import type { AuthProfileStore } from "../agents/auth-profiles.js";
import type { OpenClawConfig } from "../config/config.js";

export const DEFAULT_LIVE_MUSIC_MODELS: Record<string, string> = {
  google: "google/lyria-3-clip-preview",
  minimax: "minimax/music-2.5+",
};

export function redactLiveApiKey(value: string | undefined): string {
  const trimmed = value?.trim();
  if (!trimmed) {
    return "none";
  }
  if (trimmed.length <= 12) {
    return trimmed;
  }
  return `${trimmed.slice(0, 8)}...${trimmed.slice(-4)}`;
}

export function parseCsvFilter(raw?: string): Set<string> | null {
  const trimmed = raw?.trim();
  if (!trimmed || trimmed === "all") {
    return null;
  }
  const values = trimmed
    .split(",")
    .map((entry) => entry.trim().toLowerCase())
    .filter(Boolean);
  return values.length > 0 ? new Set(values) : null;
}

export function parseProviderModelMap(raw?: string): Map<string, string> {
  const entries = new Map<string, string>();
  for (const token of raw?.split(",") ?? []) {
    const trimmed = token.trim();
    if (!trimmed) {
      continue;
    }
    const slash = trimmed.indexOf("/");
    if (slash <= 0 || slash === trimmed.length - 1) {
      continue;
    }
    entries.set(trimmed.slice(0, slash).trim().toLowerCase(), trimmed);
  }
  return entries;
}

export function resolveConfiguredLiveMusicModels(cfg: OpenClawConfig): Map<string, string> {
  const resolved = new Map<string, string>();
  const configured = cfg.agents?.defaults?.musicGenerationModel;
  const add = (value: string | undefined) => {
    const trimmed = value?.trim();
    if (!trimmed) {
      return;
    }
    const slash = trimmed.indexOf("/");
    if (slash <= 0 || slash === trimmed.length - 1) {
      return;
    }
    resolved.set(trimmed.slice(0, slash).trim().toLowerCase(), trimmed);
  };
  if (typeof configured === "string") {
    add(configured);
    return resolved;
  }
  add(configured?.primary);
  for (const fallback of configured?.fallbacks ?? []) {
    add(fallback);
  }
  return resolved;
}

export function resolveLiveMusicAuthStore(params: {
  requireProfileKeys: boolean;
  hasLiveKeys: boolean;
}): AuthProfileStore | undefined {
  if (params.requireProfileKeys || !params.hasLiveKeys) {
    return undefined;
  }
  return {
    version: 1,
    profiles: {},
  };
}
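The `OPENCLAW_LIVE_MUSIC_GENERATION_PROVIDERS` and `..._MODELS` narrowing variables described in the docs are parsed by the two helpers above. The sketch below inlines copies of `parseCsvFilter` and `parseProviderModelMap` from this diff to show the filter shapes they produce (the example env values are illustrative):

```typescript
// Inlined copies of the live-test filter parsers from this diff.
function parseCsvFilter(raw?: string): Set<string> | null {
  const trimmed = raw?.trim();
  if (!trimmed || trimmed === "all") return null; // empty or "all" means unfiltered
  const values = trimmed
    .split(",")
    .map((entry) => entry.trim().toLowerCase())
    .filter(Boolean);
  return values.length > 0 ? new Set(values) : null;
}

function parseProviderModelMap(raw?: string): Map<string, string> {
  const entries = new Map<string, string>();
  for (const token of raw?.split(",") ?? []) {
    const trimmed = token.trim();
    if (!trimmed) continue;
    const slash = trimmed.indexOf("/");
    if (slash <= 0 || slash === trimmed.length - 1) continue; // requires "provider/model"
    entries.set(trimmed.slice(0, slash).trim().toLowerCase(), trimmed);
  }
  return entries;
}

// Illustrative env values, matching the docs' examples:
const providers = parseCsvFilter("google,minimax");
const models = parseProviderModelMap("google/lyria-3-clip-preview,minimax/music-2.5+");
console.log(providers); // Set { "google", "minimax" }
console.log(models.get("minimax")); // "minimax/music-2.5+"
```

Tokens without a `provider/model` shape are silently dropped, so a typo in the env var narrows the sweep rather than failing it.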
33  src/music-generation/provider-capabilities.contract.test.ts  Normal file
@@ -0,0 +1,33 @@
import { describe, expect, it } from "vitest";
import { musicGenerationProviderContractRegistry } from "../plugins/contracts/registry.js";
import { listSupportedMusicGenerationModes } from "./capabilities.js";

describe("bundled music-generation provider capabilities", () => {
  it("declares explicit generate/edit support for every bundled provider", () => {
    expect(musicGenerationProviderContractRegistry.length).toBeGreaterThan(0);

    for (const entry of musicGenerationProviderContractRegistry) {
      const { provider } = entry;
      expect(
        provider.capabilities.generate,
        `${provider.id} missing generate capabilities`,
      ).toBeDefined();
      expect(provider.capabilities.edit, `${provider.id} missing edit capabilities`).toBeDefined();

      const edit = provider.capabilities.edit;
      if (!edit) {
        continue;
      }

      if (edit.enabled) {
        expect(
          edit.maxInputImages ?? 0,
          `${provider.id} edit.enabled requires maxInputImages`,
        ).toBeGreaterThan(0);
        expect(listSupportedMusicGenerationModes(provider)).toContain("edit");
      } else {
        expect(listSupportedMusicGenerationModes(provider)).toEqual(["generate"]);
      }
    }
  });
});
@@ -136,7 +136,9 @@ describe("music-generation runtime", () => {
      defaultModel: "track-v1",
      models: ["track-v1"],
      capabilities: {
        supportsDuration: true,
        generate: {
          supportsDuration: true,
        },
      },
      generateMusic: async () => ({
        tracks: [{ buffer: Buffer.from("mp3-bytes"), mimeType: "audio/mpeg" }],

@@ -164,11 +166,13 @@ describe("music-generation runtime", () => {
    mocks.getMusicGenerationProvider.mockReturnValue({
      id: "google",
      capabilities: {
        supportsLyrics: true,
        supportsInstrumental: true,
        supportsFormat: true,
        supportedFormatsByModel: {
          "lyria-3-clip-preview": ["mp3"],
        generate: {
          supportsLyrics: true,
          supportsInstrumental: true,
          supportsFormat: true,
          supportedFormatsByModel: {
            "lyria-3-clip-preview": ["mp3"],
          },
        },
      },
      generateMusic: async (req) => {

@@ -211,4 +215,74 @@ describe("music-generation runtime", () => {
      { key: "format", value: "wav" },
    ]);
  });

  it("uses mode-specific capabilities for edit requests", async () => {
    let seenRequest:
      | {
          lyrics?: string;
          instrumental?: boolean;
          durationSeconds?: number;
          format?: string;
        }
      | undefined;
    mocks.resolveAgentModelPrimaryValue.mockReturnValue("google/lyria-3-pro-preview");
    mocks.getMusicGenerationProvider.mockReturnValue({
      id: "google",
      capabilities: {
        generate: {
          supportsLyrics: false,
          supportsInstrumental: false,
          supportsFormat: true,
          supportedFormats: ["mp3"],
        },
        edit: {
          enabled: true,
          maxInputImages: 1,
          supportsLyrics: true,
          supportsInstrumental: true,
          supportsDuration: false,
          supportsFormat: false,
        },
      },
      generateMusic: async (req) => {
        seenRequest = {
          lyrics: req.lyrics,
          instrumental: req.instrumental,
          durationSeconds: req.durationSeconds,
          format: req.format,
        };
        return {
          tracks: [{ buffer: Buffer.from("mp3-bytes"), mimeType: "audio/mpeg" }],
          model: "lyria-3-pro-preview",
        };
      },
    });

    const result = await generateMusic({
      cfg: {
        agents: {
          defaults: {
            musicGenerationModel: { primary: "google/lyria-3-pro-preview" },
          },
        },
      } as OpenClawConfig,
      prompt: "turn this cover image into a trailer cue",
      lyrics: "rise up",
      instrumental: true,
      durationSeconds: 30,
      format: "mp3",
      inputImages: [{ buffer: Buffer.from("png"), mimeType: "image/png" }],
    });

    expect(seenRequest).toEqual({
      lyrics: "rise up",
      instrumental: true,
      durationSeconds: undefined,
      format: undefined,
    });
    expect(result.ignoredOverrides).toEqual([
      { key: "durationSeconds", value: 30 },
      { key: "format", value: "mp3" },
    ]);
  });
});
@@ -8,6 +8,7 @@ import {
  resolveCapabilityModelCandidates,
  throwCapabilityGenerationFailure,
} from "../media-generation/runtime-shared.js";
import { resolveMusicGenerationModeCapabilities } from "./capabilities.js";
import { parseMusicGenerationModelRef } from "./model-ref.js";
import { getMusicGenerationProvider, listMusicGenerationProviders } from "./provider-registry.js";
import type {

@@ -54,14 +55,28 @@ function resolveProviderMusicGenerationOverrides(params: {
  instrumental?: boolean;
  durationSeconds?: number;
  format?: MusicGenerationOutputFormat;
  inputImages?: MusicGenerationSourceImage[];
}) {
  const caps = params.provider.capabilities;
  const { capabilities: caps } = resolveMusicGenerationModeCapabilities({
    provider: params.provider,
    inputImageCount: params.inputImages?.length ?? 0,
  });
  const ignoredOverrides: MusicGenerationIgnoredOverride[] = [];
  let lyrics = params.lyrics;
  let instrumental = params.instrumental;
  let durationSeconds = params.durationSeconds;
  let format = params.format;

  if (!caps) {
    return {
      lyrics,
      instrumental,
      durationSeconds,
      format,
      ignoredOverrides,
    };
  }

  if (lyrics?.trim() && !caps.supportsLyrics) {
    ignoredOverrides.push({ key: "lyrics", value: lyrics });
    lyrics = undefined;

@@ -142,6 +157,7 @@ export async function generateMusic(
    instrumental: params.instrumental,
    durationSeconds: params.durationSeconds,
    format: params.format,
    inputImages: params.inputImages,
  });
  const result: MusicGenerationResult = await provider.generateMusic({
    provider: candidate.provider,
@@ -50,9 +50,10 @@ export type MusicGenerationIgnoredOverride = {
  value: string | boolean | number;
};

export type MusicGenerationProviderCapabilities = {
export type MusicGenerationMode = "generate" | "edit";

export type MusicGenerationModeCapabilities = {
  maxTracks?: number;
  maxInputImages?: number;
  maxDurationSeconds?: number;
  supportsLyrics?: boolean;
  supportsInstrumental?: boolean;

@@ -62,6 +63,17 @@ export type MusicGenerationProviderCapabilities = {
  supportedFormatsByModel?: Readonly<Record<string, readonly MusicGenerationOutputFormat[]>>;
};

export type MusicGenerationEditCapabilities = MusicGenerationModeCapabilities & {
  enabled: boolean;
  maxInputImages?: number;
};

export type MusicGenerationProviderCapabilities = MusicGenerationModeCapabilities & {
  maxInputImages?: number;
  generate?: MusicGenerationModeCapabilities;
  edit?: MusicGenerationEditCapabilities;
};

export type MusicGenerationProvider = {
  id: string;
  aliases?: string[];
@@ -2,6 +2,9 @@

export type {
  GeneratedMusicAsset,
  MusicGenerationEditCapabilities,
  MusicGenerationMode,
  MusicGenerationModeCapabilities,
  MusicGenerationProvider,
  MusicGenerationProviderCapabilities,
  MusicGenerationRequest,
@@ -1,8 +1,4 @@
import { createJiti } from "jiti";
import { loadBundledCapabilityRuntimeRegistry } from "../bundled-capability-runtime.js";
import { resolveBundledPluginRepoEntryPath } from "../bundled-plugin-metadata.js";
import { createCapturedPluginRegistration } from "../captured-registration.js";
import type { OpenClawPluginDefinition } from "../types.js";
import type {
  ImageGenerationProviderPlugin,
  MediaUnderstandingProviderPlugin,
@@ -85,58 +81,62 @@ const VITEST_CONTRACT_PLUGIN_IDS = {
function loadVitestVideoGenerationFallbackEntries(
  pluginIds: readonly string[],
): VideoGenerationProviderContractEntry[] {
  const jiti = createJiti(import.meta.url, {
    interopDefault: true,
    moduleCache: false,
    fsCache: false,
  return loadVitestCapabilityContractEntries({
    contract: "videoGenerationProviders",
    pluginSdkResolution: "src",
    pluginIds,
    pickEntries: (registry) =>
      registry.videoGenerationProviders.map((entry) => ({
        pluginId: entry.pluginId,
        provider: entry.provider,
      })),
  });
  const repoRoot = process.cwd();
  return pluginIds.flatMap((pluginId) => {
    const modulePath = resolveBundledPluginRepoEntryPath({
      rootDir: repoRoot,
      pluginId,
      preferBuilt: true,
    });
    if (!modulePath) {
      return [];
    }
    try {
      const mod = jiti(modulePath) as
        | OpenClawPluginDefinition
        | { default?: OpenClawPluginDefinition };
      const plugin =
        (mod as { default?: OpenClawPluginDefinition }).default ??
        (mod as OpenClawPluginDefinition);
      if (typeof plugin?.register !== "function") {
        return [];
      }
      const captured = createCapturedPluginRegistration();
      void plugin.register(captured.api);
      return captured.videoGenerationProviders.map((provider) => ({
        pluginId,
        provider,
      }));
    } catch {
      return [];
    }
  }
}

function loadVitestMusicGenerationFallbackEntries(
  pluginIds: readonly string[],
): MusicGenerationProviderContractEntry[] {
  return loadVitestCapabilityContractEntries({
    contract: "musicGenerationProviders",
    pluginSdkResolution: "src",
    pluginIds,
    pickEntries: (registry) =>
      registry.musicGenerationProviders.map((entry) => ({
        pluginId: entry.pluginId,
        provider: entry.provider,
      })),
  });
}

function hasExplicitVideoGenerationModes(provider: VideoGenerationProviderPlugin): boolean {
  return Boolean(
    provider.capabilities.generate &&
      provider.capabilities.imageToVideo &&
      provider.capabilities.videoToVideo,
  );
}

function hasExplicitMusicGenerationModes(provider: MusicGenerationProviderPlugin): boolean {
  return Boolean(provider.capabilities.generate && provider.capabilities.edit);
}
function loadVitestCapabilityContractEntries<T>(params: {
  contract: ManifestContractKey;
  pluginIds?: readonly string[];
  pluginSdkResolution?: "dist" | "src";
  pickEntries: (registry: ReturnType<typeof loadBundledCapabilityRuntimeRegistry>) => Array<{
    pluginId: string;
    provider: T;
  }>;
}): Array<{ pluginId: string; provider: T }> {
  const pluginIds = VITEST_CONTRACT_PLUGIN_IDS[params.contract];
  const pluginIds = [...(params.pluginIds ?? VITEST_CONTRACT_PLUGIN_IDS[params.contract])];
  if (pluginIds.length === 0) {
    return [];
  }
  const bulkEntries = params.pickEntries(
    loadBundledCapabilityRuntimeRegistry({
      pluginIds,
      pluginSdkResolution: "dist",
      pluginSdkResolution: params.pluginSdkResolution ?? "dist",
    }),
  );
  const coveredPluginIds = new Set(bulkEntries.map((entry) => entry.pluginId));

@@ -148,7 +148,7 @@ function loadVitestCapabilityContractEntries<T>(params: {
        .pickEntries(
          loadBundledCapabilityRuntimeRegistry({
            pluginIds: [pluginId],
            pluginSdkResolution: "dist",
            pluginSdkResolution: params.pluginSdkResolution ?? "dist",
          }),
        )
        .filter((entry) => entry.pluginId === pluginId),
@@ -220,17 +220,27 @@ export function loadVitestVideoGenerationProviderContractRegistry(): VideoGenera
      })),
  });
  const coveredPluginIds = new Set(entries.map((entry) => entry.pluginId));
  const stalePluginIds = new Set(
    entries
      .filter((entry) => !hasExplicitVideoGenerationModes(entry.provider))
      .map((entry) => entry.pluginId),
  );
  const missingPluginIds = VITEST_CONTRACT_PLUGIN_IDS.videoGenerationProviders.filter(
    (pluginId) => !coveredPluginIds.has(pluginId),
    (pluginId) => !coveredPluginIds.has(pluginId) || stalePluginIds.has(pluginId),
  );
  if (missingPluginIds.length === 0) {
    return entries;
  }
  return [...entries, ...loadVitestVideoGenerationFallbackEntries(missingPluginIds)];
  const replacementEntries = loadVitestVideoGenerationFallbackEntries(missingPluginIds);
  const replacedPluginIds = new Set(replacementEntries.map((entry) => entry.pluginId));
  return [
    ...entries.filter((entry) => !replacedPluginIds.has(entry.pluginId)),
    ...replacementEntries,
  ];
}

export function loadVitestMusicGenerationProviderContractRegistry(): MusicGenerationProviderContractEntry[] {
  return loadVitestCapabilityContractEntries({
  const entries = loadVitestCapabilityContractEntries({
    contract: "musicGenerationProviders",
    pickEntries: (registry) =>
      registry.musicGenerationProviders.map((entry) => ({

@@ -238,4 +248,22 @@ export function loadVitestMusicGenerationProviderContractRegistry(): MusicGenera
        provider: entry.provider,
      })),
  });
  const coveredPluginIds = new Set(entries.map((entry) => entry.pluginId));
  const stalePluginIds = new Set(
    entries
      .filter((entry) => !hasExplicitMusicGenerationModes(entry.provider))
      .map((entry) => entry.pluginId),
  );
  const missingPluginIds = VITEST_CONTRACT_PLUGIN_IDS.musicGenerationProviders.filter(
    (pluginId) => !coveredPluginIds.has(pluginId) || stalePluginIds.has(pluginId),
  );
  if (missingPluginIds.length === 0) {
    return entries;
  }
  const replacementEntries = loadVitestMusicGenerationFallbackEntries(missingPluginIds);
  const replacedPluginIds = new Set(replacementEntries.map((entry) => entry.pluginId));
  return [
    ...entries.filter((entry) => !replacedPluginIds.has(entry.pluginId)),
    ...replacementEntries,
  ];
}
@@ -19,17 +19,13 @@ function createProvider(
}

describe("video-generation capabilities", () => {
  it("derives legacy modes from aggregate input limits", () => {
  it("requires explicit transform capabilities before advertising transform modes", () => {
    const provider = createProvider({
      maxInputImages: 1,
      maxInputVideos: 2,
    });

    expect(listSupportedVideoGenerationModes(provider)).toEqual([
      "generate",
      "imageToVideo",
      "videoToVideo",
    ]);
    expect(listSupportedVideoGenerationModes(provider)).toEqual(["generate"]);
  });

  it("prefers explicit mode capabilities for image-to-video requests", () => {

@@ -60,7 +56,7 @@ describe("video-generation capabilities", () => {
    });
  });

  it("falls back to aggregate capabilities for mixed reference requests", () => {
  it("does not infer transform capabilities for mixed reference requests", () => {
    const provider = createProvider({
      maxInputImages: 1,
      maxInputVideos: 4,

@@ -76,19 +72,7 @@ describe("video-generation capabilities", () => {
      }),
    ).toEqual({
      mode: null,
      capabilities: {
        maxVideos: undefined,
        maxInputImages: 1,
        maxInputVideos: 4,
        maxDurationSeconds: undefined,
        supportedDurationSeconds: undefined,
        supportedDurationSecondsByModel: undefined,
        supportsSize: undefined,
        supportsAspectRatio: undefined,
        supportsResolution: undefined,
        supportsAudio: true,
        supportsWatermark: undefined,
      },
      capabilities: undefined,
    });
  });
});
@@ -2,46 +2,9 @@ import type {
  VideoGenerationMode,
  VideoGenerationModeCapabilities,
  VideoGenerationProvider,
  VideoGenerationProviderCapabilities,
  VideoGenerationTransformCapabilities,
} from "./types.js";

function pickModeCapabilities(
  capabilities: VideoGenerationProviderCapabilities,
): VideoGenerationModeCapabilities {
  return {
    maxVideos: capabilities.maxVideos,
    maxInputImages: capabilities.maxInputImages,
    maxInputVideos: capabilities.maxInputVideos,
    maxDurationSeconds: capabilities.maxDurationSeconds,
    supportedDurationSeconds: capabilities.supportedDurationSeconds,
    supportedDurationSecondsByModel: capabilities.supportedDurationSecondsByModel,
    supportsSize: capabilities.supportsSize,
    supportsAspectRatio: capabilities.supportsAspectRatio,
    supportsResolution: capabilities.supportsResolution,
    supportsAudio: capabilities.supportsAudio,
    supportsWatermark: capabilities.supportsWatermark,
  };
}

function deriveLegacyImageToVideoCapabilities(
  capabilities: VideoGenerationProviderCapabilities,
): VideoGenerationTransformCapabilities {
  return {
    ...pickModeCapabilities(capabilities),
    enabled: (capabilities.maxInputImages ?? 0) > 0,
  };
}

function deriveLegacyVideoToVideoCapabilities(
  capabilities: VideoGenerationProviderCapabilities,
): VideoGenerationTransformCapabilities {
  return {
    ...pickModeCapabilities(capabilities),
    enabled: (capabilities.maxInputVideos ?? 0) > 0,
  };
}

export function resolveVideoGenerationMode(params: {
  inputImageCount?: number;
  inputVideoCount?: number;

@@ -64,16 +27,12 @@ export function listSupportedVideoGenerationModes(
  provider: Pick<VideoGenerationProvider, "capabilities">,
): VideoGenerationMode[] {
  const modes: VideoGenerationMode[] = ["generate"];
  const imageToVideo =
    provider.capabilities.imageToVideo ??
    deriveLegacyImageToVideoCapabilities(provider.capabilities);
  if (imageToVideo.enabled) {
  const imageToVideo = provider.capabilities.imageToVideo;
  if (imageToVideo?.enabled) {
    modes.push("imageToVideo");
  }
  const videoToVideo =
    provider.capabilities.videoToVideo ??
    deriveLegacyVideoToVideoCapabilities(provider.capabilities);
  if (videoToVideo.enabled) {
  const videoToVideo = provider.capabilities.videoToVideo;
  if (videoToVideo?.enabled) {
    modes.push("videoToVideo");
  }
  return modes;

@@ -95,23 +54,23 @@ export function resolveVideoGenerationModeCapabilities(params: {
  if (mode === "generate") {
    return {
      mode,
      capabilities: capabilities.generate ?? pickModeCapabilities(capabilities),
      capabilities: capabilities.generate,
    };
  }
  if (mode === "imageToVideo") {
    return {
      mode,
      capabilities: capabilities.imageToVideo ?? deriveLegacyImageToVideoCapabilities(capabilities),
      capabilities: capabilities.imageToVideo,
    };
  }
  if (mode === "videoToVideo") {
    return {
      mode,
      capabilities: capabilities.videoToVideo ?? deriveLegacyVideoToVideoCapabilities(capabilities),
      capabilities: capabilities.videoToVideo,
    };
  }
  return {
    mode,
    capabilities: pickModeCapabilities(capabilities),
    capabilities: undefined,
  };
}
127  src/video-generation/live-test-helpers.test.ts  Normal file
@@ -0,0 +1,127 @@
import { describe, expect, it } from "vitest";
import type { OpenClawConfig } from "../config/config.js";
import {
  canRunBufferBackedVideoToVideoLiveLane,
  parseCsvFilter,
  parseProviderModelMap,
  redactLiveApiKey,
  resolveConfiguredLiveVideoModels,
  resolveLiveVideoAuthStore,
} from "./live-test-helpers.js";

describe("video-generation live-test helpers", () => {
  it("parses provider filters and treats empty/all as unfiltered", () => {
    expect(parseCsvFilter()).toBeNull();
    expect(parseCsvFilter("all")).toBeNull();
    expect(parseCsvFilter(" google , openai ")).toEqual(new Set(["google", "openai"]));
  });

  it("parses provider model overrides by provider id", () => {
    expect(
      parseProviderModelMap("google/veo-3.1-fast-generate-preview, openai/sora-2, invalid"),
    ).toEqual(
      new Map([
        ["google", "google/veo-3.1-fast-generate-preview"],
        ["openai", "openai/sora-2"],
      ]),
    );
  });

  it("collects configured models from primary and fallbacks", () => {
    const cfg = {
      agents: {
        defaults: {
          videoGenerationModel: {
            primary: "google/veo-3.1-fast-generate-preview",
            fallbacks: ["openai/sora-2", "invalid"],
          },
        },
      },
    } as OpenClawConfig;

    expect(resolveConfiguredLiveVideoModels(cfg)).toEqual(
      new Map([
        ["google", "google/veo-3.1-fast-generate-preview"],
        ["openai", "openai/sora-2"],
      ]),
    );
  });

  it("uses an empty auth store when live env keys should override stale profiles", () => {
    expect(
      resolveLiveVideoAuthStore({
        requireProfileKeys: false,
        hasLiveKeys: true,
      }),
    ).toEqual({
      version: 1,
      profiles: {},
    });
  });

  it("keeps profile-store mode when requested or when no live keys exist", () => {
    expect(
      resolveLiveVideoAuthStore({
        requireProfileKeys: true,
        hasLiveKeys: true,
      }),
    ).toBeUndefined();
    expect(
      resolveLiveVideoAuthStore({
        requireProfileKeys: false,
        hasLiveKeys: false,
      }),
    ).toBeUndefined();
  });

  it("redacts live API keys for diagnostics", () => {
    expect(redactLiveApiKey(undefined)).toBe("none");
    expect(redactLiveApiKey("short-key")).toBe("short-key");
    expect(redactLiveApiKey("sk-proj-1234567890")).toBe("sk-proj-...7890");
  });

  it("runs buffer-backed video-to-video only for supported providers/models", () => {
    expect(
      canRunBufferBackedVideoToVideoLiveLane({
        providerId: "google",
        modelRef: "google/veo-3.1-fast-generate-preview",
      }),
    ).toBe(true);
    expect(
      canRunBufferBackedVideoToVideoLiveLane({
        providerId: "openai",
        modelRef: "openai/sora-2",
      }),
    ).toBe(true);
    expect(
      canRunBufferBackedVideoToVideoLiveLane({
        providerId: "runway",
        modelRef: "runway/gen4_aleph",
      }),
    ).toBe(true);
    expect(
      canRunBufferBackedVideoToVideoLiveLane({
        providerId: "runway",
        modelRef: "runway/gen4.5",
      }),
    ).toBe(false);
    expect(
      canRunBufferBackedVideoToVideoLiveLane({
        providerId: "alibaba",
        modelRef: "alibaba/wan2.6-r2v",
      }),
    ).toBe(false);
    expect(
      canRunBufferBackedVideoToVideoLiveLane({
        providerId: "qwen",
        modelRef: "qwen/wan2.6-r2v",
      }),
    ).toBe(false);
    expect(
      canRunBufferBackedVideoToVideoLiveLane({
        providerId: "xai",
        modelRef: "xai/grok-imagine-video",
      }),
    ).toBe(false);
  });
});
@@ -14,6 +14,8 @@ export const DEFAULT_LIVE_VIDEO_MODELS: Record<string, string> = {
  xai: "xai/grok-imagine-video",
};

const REMOTE_URL_VIDEO_TO_VIDEO_PROVIDERS = new Set(["alibaba", "qwen", "xai"]);

export function redactLiveApiKey(value: string | undefined): string {
  const trimmed = value?.trim();
  if (!trimmed) {
@@ -78,6 +80,25 @@ export function resolveConfiguredLiveVideoModels(cfg: OpenClawConfig): Map<strin
  return resolved;
}

export function canRunBufferBackedVideoToVideoLiveLane(params: {
  providerId: string;
  modelRef: string;
}): boolean {
  const providerId = params.providerId.trim().toLowerCase();
  if (REMOTE_URL_VIDEO_TO_VIDEO_PROVIDERS.has(providerId)) {
    return false;
  }
  if (providerId !== "runway") {
    return true;
  }
  const slash = params.modelRef.indexOf("/");
  const model =
    slash <= 0 || slash === params.modelRef.length - 1
      ? params.modelRef.trim()
      : params.modelRef.slice(slash + 1).trim();
  return model === "gen4_aleph";
}

export function resolveLiveVideoAuthStore(params: {
  requireProfileKeys: boolean;
  hasLiveKeys: boolean;
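For reviewers who want to poke at the gating logic without wiring up the live-test harness, here is a self-contained sketch that inlines the helper as it appears in the patch (the set name and function body are copied from the diff; nothing outside this snippet is assumed):

```typescript
// Providers that only accept remote reference-clip URLs cannot take
// buffer-backed (local file) input, per the diff above.
const REMOTE_URL_VIDEO_TO_VIDEO_PROVIDERS = new Set(["alibaba", "qwen", "xai"]);

function canRunBufferBackedVideoToVideoLiveLane(params: {
  providerId: string;
  modelRef: string;
}): boolean {
  const providerId = params.providerId.trim().toLowerCase();
  // URL-only providers never qualify for the buffer-backed lane.
  if (REMOTE_URL_VIDEO_TO_VIDEO_PROVIDERS.has(providerId)) {
    return false;
  }
  // Every other non-Runway provider qualifies unconditionally.
  if (providerId !== "runway") {
    return true;
  }
  // Runway is gated per-model: only gen4_aleph supports local clips.
  // Strip an optional "provider/" prefix before comparing the model name.
  const slash = params.modelRef.indexOf("/");
  const model =
    slash <= 0 || slash === params.modelRef.length - 1
      ? params.modelRef.trim()
      : params.modelRef.slice(slash + 1).trim();
  return model === "gen4_aleph";
}

console.log(
  canRunBufferBackedVideoToVideoLiveLane({
    providerId: "runway",
    modelRef: "runway/gen4_aleph",
  }),
); // true
console.log(
  canRunBufferBackedVideoToVideoLiveLane({
    providerId: "xai",
    modelRef: "xai/grok-imagine-video",
  }),
); // false
```

Note the deliberate ordering: the URL-only denylist is checked before the Runway special case, so adding a provider to the set is enough to exclude it from the buffer-backed sweep.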
44 src/video-generation/provider-capabilities.contract.test.ts Normal file
@@ -0,0 +1,44 @@
import { describe, expect, it } from "vitest";
import { videoGenerationProviderContractRegistry } from "../plugins/contracts/registry.js";
import { listSupportedVideoGenerationModes } from "./capabilities.js";

describe("bundled video-generation provider capabilities", () => {
  it("declares explicit mode support for every bundled provider", () => {
    expect(videoGenerationProviderContractRegistry.length).toBeGreaterThan(0);

    for (const entry of videoGenerationProviderContractRegistry) {
      const { provider } = entry;
      expect(
        provider.capabilities.generate,
        `${provider.id} missing generate capabilities`,
      ).toBeDefined();
      expect(
        provider.capabilities.imageToVideo,
        `${provider.id} missing imageToVideo capabilities`,
      ).toBeDefined();
      expect(
        provider.capabilities.videoToVideo,
        `${provider.id} missing videoToVideo capabilities`,
      ).toBeDefined();

      const supportedModes = listSupportedVideoGenerationModes(provider);
      const imageToVideo = provider.capabilities.imageToVideo;
      const videoToVideo = provider.capabilities.videoToVideo;

      if (imageToVideo?.enabled) {
        expect(
          imageToVideo.maxInputImages ?? 0,
          `${provider.id} imageToVideo.enabled requires maxInputImages`,
        ).toBeGreaterThan(0);
        expect(supportedModes).toContain("imageToVideo");
      }
      if (videoToVideo?.enabled) {
        expect(
          videoToVideo.maxInputVideos ?? 0,
          `${provider.id} videoToVideo.enabled requires maxInputVideos`,
        ).toBeGreaterThan(0);
        expect(supportedModes).toContain("videoToVideo");
      }
    }
  });
});
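As a rough illustration of what the contract test enforces, the sketch below builds a minimal capability declaration that would satisfy its assertions. The type names and the local `listModes` helper are assumptions inferred from the assertions in the diff, not the project's actual declarations (the real helper is `listSupportedVideoGenerationModes`):

```typescript
// Hypothetical shapes inferred from the contract test: every mode block must
// be present, and an enabled input mode must declare a positive input limit.
type ModeCapability = { enabled: boolean };
type ImageToVideoCapability = ModeCapability & { maxInputImages?: number };
type VideoToVideoCapability = ModeCapability & { maxInputVideos?: number };

interface VideoProviderCapabilities {
  generate: ModeCapability;
  imageToVideo: ImageToVideoCapability;
  videoToVideo: VideoToVideoCapability;
}

// Stand-in for listSupportedVideoGenerationModes: a mode is supported
// exactly when its capability block is enabled.
function listModes(caps: VideoProviderCapabilities): string[] {
  const modes: string[] = [];
  if (caps.generate.enabled) modes.push("generate");
  if (caps.imageToVideo.enabled) modes.push("imageToVideo");
  if (caps.videoToVideo.enabled) modes.push("videoToVideo");
  return modes;
}

// Example declaration: prompt-only generation plus single-image input,
// with video-to-video explicitly disabled rather than omitted.
const exampleCaps: VideoProviderCapabilities = {
  generate: { enabled: true },
  imageToVideo: { enabled: true, maxInputImages: 1 },
  videoToVideo: { enabled: false },
};

console.log(listModes(exampleCaps)); // ["generate", "imageToVideo"]
```

The point of the "explicitly disabled" style is visible here: a provider cannot silently skip a mode, so a missing block fails the contract test instead of being treated as an implicit "unsupported".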
@@ -136,7 +136,9 @@ describe("video-generation runtime", () => {
      defaultModel: "vid-v1",
      models: ["vid-v1"],
      capabilities: {
        supportsAudio: true,
        generate: {
          supportsAudio: true,
        },
      },
      generateVideo: async () => ({
        videos: [{ buffer: Buffer.from("mp4-bytes"), mimeType: "video/mp4" }],
@@ -157,7 +159,9 @@ describe("video-generation runtime", () => {
    mocks.getVideoGenerationProvider.mockReturnValue({
      id: "video-plugin",
      capabilities: {
        supportedDurationSeconds: [4, 6, 8],
        generate: {
          supportedDurationSeconds: [4, 6, 8],
        },
      },
      generateVideo: async (req) => {
        seenDurationSeconds = req.durationSeconds;
@@ -203,7 +207,9 @@ describe("video-generation runtime", () => {
    mocks.getVideoGenerationProvider.mockReturnValue({
      id: "openai",
      capabilities: {
        supportsSize: true,
        generate: {
          supportsSize: true,
        },
      },
      generateVideo: async (req) => {
        seenRequest = {