diff --git a/docs/tools/music-generation.md b/docs/tools/music-generation.md
index 812d2b60d8d..10ac1ae7c85 100644
--- a/docs/tools/music-generation.md
+++ b/docs/tools/music-generation.md
@@ -1,53 +1,82 @@
 ---
-summary: "Generate music with shared providers, including workflow-backed plugins"
+summary: "Generate music via music_generate across Google Lyria, MiniMax, and ComfyUI workflows"
 read_when:
   - Generating music or audio via the agent
-  - Configuring music generation providers and models
+  - Configuring music-generation providers and models
   - Understanding the music_generate tool parameters
 title: "Music generation"
+sidebarTitle: "Music generation"
 ---

 The `music_generate` tool lets the agent create music or audio through the
-shared music-generation capability with configured providers such as Google,
-MiniMax, and workflow-configured ComfyUI.
+shared music-generation capability with configured providers — Google,
+MiniMax, and workflow-configured ComfyUI today.

-For shared provider-backed agent sessions, OpenClaw starts music generation as a
-background task, tracks it in the task ledger, then wakes the agent again when
-the track is ready so the agent can post the finished audio back into the
-original channel.
+For session-backed agent runs, OpenClaw starts music generation as a
+background task, tracks it in the task ledger, then wakes the agent again
+when the track is ready so the agent can post the finished audio back into
+the original channel.

-The built-in shared tool only appears when at least one music-generation provider is available. If you don't see `music_generate` in your agent's tools, configure `agents.defaults.musicGenerationModel` or set up a provider API key.
+The built-in shared tool only appears when at least one music-generation
+provider is available. If you do not see `music_generate` in your agent's
+tools, configure `agents.defaults.musicGenerationModel` or set up a
+provider API key.

 ## Quick start

-### Shared provider-backed generation
+
+
+
+
+Set an API key for at least one provider — for example
+`GEMINI_API_KEY` or `MINIMAX_API_KEY`.
+
+
+```json5
+{
+  agents: {
+    defaults: {
+      musicGenerationModel: {
+        primary: "google/lyria-3-clip-preview",
+      },
+    },
+  },
+}
+```
+
+
+_"Generate an upbeat synthpop track about a night drive through a
+neon city."_

-1. Set an API key for at least one provider, for example `GEMINI_API_KEY` or
-   `MINIMAX_API_KEY`.
-2. Optionally set your preferred model:
+The agent calls `music_generate` automatically. No tool
+allow-listing needed.
+
+

-```json5
-{
-  agents: {
-    defaults: {
-      musicGenerationModel: {
-        primary: "google/lyria-3-clip-preview",
-      },
-    },
-  },
-}
-```
+For direct synchronous contexts without a session-backed agent run,
+the built-in tool still falls back to inline generation and returns
+the final media path in the tool result.

-3. Ask the agent: _"Generate an upbeat synthpop track about a night drive
-   through a neon city."_
-
-The agent calls `music_generate` automatically. No tool allow-listing needed.
-
-For direct synchronous contexts without a session-backed agent run, the built-in
-tool still falls back to inline generation and returns the final media path in
-the tool result.
+
+
+
+
+Configure `plugins.entries.comfy.config.music` with a workflow
+JSON and prompt/output nodes.
+
+
+For Comfy Cloud, set `COMFY_API_KEY` or `COMFY_CLOUD_API_KEY`.
+
+
+```text
+/tool music_generate prompt="Warm ambient synth loop with soft tape texture"
+```
+
+
+
+

 Example prompts:
@@ -59,40 +88,24 @@ Generate a cinematic piano track with soft strings and no vocals.
 Generate an energetic chiptune loop about launching a rocket at sunrise.
 ```

-### Workflow-driven Comfy generation
+## Supported providers

-The bundled `comfy` plugin plugs into the shared `music_generate` tool through
-the music-generation provider registry.
-
-1. Configure `plugins.entries.comfy.config.music` with a workflow JSON and
-   prompt/output nodes.
-2. If you use Comfy Cloud, set `COMFY_API_KEY` or `COMFY_CLOUD_API_KEY`.
-3. Ask the agent for music or call the tool directly.
-
-Example:
-
-```text
-/tool music_generate prompt="Warm ambient synth loop with soft tape texture"
-```
-
-## Shared bundled provider support
-
-| Provider | Default model | Reference inputs | Supported controls | API key |
+| Provider | Default model | Reference inputs | Supported controls | Auth |
 | -------- | ---------------------- | ---------------- | --------------------------------------------------------- | -------------------------------------- |
 | ComfyUI | `workflow` | Up to 1 image | Workflow-defined music or audio | `COMFY_API_KEY`, `COMFY_CLOUD_API_KEY` |
 | Google | `lyria-3-clip-preview` | Up to 10 images | `lyrics`, `instrumental`, `format` | `GEMINI_API_KEY`, `GOOGLE_API_KEY` |
 | MiniMax | `music-2.6` | None | `lyrics`, `instrumental`, `durationSeconds`, `format=mp3` | `MINIMAX_API_KEY` or MiniMax OAuth |

-### Declared capability matrix
+### Capability matrix

-This is the explicit mode contract used by `music_generate`, contract tests,
-and the shared live sweep.
+The explicit mode contract used by `music_generate`, contract tests, and the
+shared live sweep:

 | Provider | `generate` | `edit` | Edit limit | Shared live lanes |
-| -------- | ---------- | ------ | ---------- | ------------------------------------------------------------------------- |
-| ComfyUI | Yes | Yes | 1 image | Not in the shared sweep; covered by `extensions/comfy/comfy.live.test.ts` |
-| Google | Yes | Yes | 10 images | `generate`, `edit` |
-| MiniMax | Yes | No | None | `generate` |
+| -------- | :--------: | :----: | ---------- | ------------------------------------------------------------------------- |
+| ComfyUI | ✓ | ✓ | 1 image | Not in the shared sweep; covered by `extensions/comfy/comfy.live.test.ts` |
+| Google | ✓ | ✓ | 10 images | `generate`, `edit` |
+| MiniMax | ✓ | — | None | `generate` |

 Use `action: "list"` to inspect available shared providers and models at runtime:
@@ -113,48 +126,78 @@ Direct generation example:
 /tool music_generate prompt="Dreamy lo-fi hip hop with vinyl texture and gentle rain" instrumental=true
 ```

-## Built-in tool parameters
+## Tool parameters

-| Parameter | Type | Description |
-| ----------------- | -------- | ------------------------------------------------------------------------------------------------- |
-| `prompt` | string | Music generation prompt (required for `action: "generate"`) |
-| `action` | string | `"generate"` (default), `"status"` for the current session task, or `"list"` to inspect providers |
-| `model` | string | Provider/model override, e.g. `google/lyria-3-pro-preview` or `comfy/workflow` |
-| `lyrics` | string | Optional lyrics when the provider supports explicit lyric input |
-| `instrumental` | boolean | Request instrumental-only output when the provider supports it |
-| `image` | string | Single reference image path or URL |
-| `images` | string[] | Multiple reference images (up to 10) |
-| `durationSeconds` | number | Target duration in seconds when the provider supports duration hints |
-| `timeoutMs` | number | Optional provider request timeout in milliseconds |
-| `format` | string | Output format hint (`mp3` or `wav`) when the provider supports it |
-| `filename` | string | Output filename hint |
+
+Music generation prompt. Required for `action: "generate"`.
+
+
+`"status"` returns the current session task; `"list"` inspects providers.
+
+
+Provider/model override (e.g. `google/lyria-3-pro-preview`,
+`comfy/workflow`).
+
+
+Optional lyrics when the provider supports explicit lyric input.
+
+
+Request instrumental-only output when the provider supports it.
+
+
+Single reference image path or URL.
+
+
+Multiple reference images (up to 10 on supporting providers).
+
+
+Target duration in seconds when the provider supports duration hints.
+
+
+Output format hint when the provider supports it.
+Output filename hint.
+Optional provider request timeout in milliseconds.

-Not all providers support all parameters. OpenClaw still validates hard limits
-such as input counts before submission. When a provider supports duration but
-uses a shorter maximum than the requested value, OpenClaw automatically clamps
-to the closest supported duration. Truly unsupported optional hints are ignored
-with a warning when the selected provider or model cannot honor them.
+
+Not all providers support all parameters. OpenClaw still validates hard
+limits such as input counts before submission. When a provider supports
+duration but uses a shorter maximum than the requested value, OpenClaw
+clamps to the closest supported duration. Truly unsupported optional hints
+are ignored with a warning when the selected provider or model cannot honor
+them. Tool results report applied settings; `details.normalization`
+captures any requested-to-applied mapping.
+

-Tool results report the applied settings. When OpenClaw clamps duration during provider fallback, the returned `durationSeconds` reflects the submitted value and `details.normalization.durationSeconds` shows the requested-to-applied mapping.
+## Async behavior

-## Async behavior for the shared provider-backed path
+Session-backed music generation runs as a background task:

-- Session-backed agent runs: `music_generate` creates a background task, returns a started/task response immediately, and posts the finished track later in a follow-up agent message.
-- Duplicate prevention: while that background task is still `queued` or `running`, later `music_generate` calls in the same session return task status instead of starting another generation.
-- Status lookup: use `action: "status"` to inspect the active session-backed music task without starting a new one.
-- Task tracking: use `openclaw tasks list` or `openclaw tasks show ` to inspect queued, running, and terminal status for the generation.
-- Completion wake: OpenClaw injects an internal completion event back into the same session so the model can write the user-facing follow-up itself.
-- Prompt hint: later user/manual turns in the same session get a small runtime hint when a music task is already in flight so the model does not blindly call `music_generate` again.
-- No-session fallback: direct/local contexts without a real agent session still run inline and return the final audio result in the same turn.
+- **Background task:** `music_generate` creates a background task, returns a
+  started/task response immediately, and posts the finished track later in
+  a follow-up agent message.
+- **Duplicate prevention:** while a task is `queued` or `running`, later
+  `music_generate` calls in the same session return task status instead of
+  starting another generation. Use `action: "status"` to check explicitly.
+- **Status lookup:** `openclaw tasks list` or `openclaw tasks show `
+  inspects queued, running, and terminal status.
+- **Completion wake:** OpenClaw injects an internal completion event back
+  into the same session so the model can write the user-facing follow-up
+  itself.
+- **Prompt hint:** later user/manual turns in the same session get a small
+  runtime hint when a music task is already in flight, so the model does
+  not blindly call `music_generate` again.
+- **No-session fallback:** direct/local contexts without a real agent
+  session run inline and return the final audio result in the same turn.

 ### Task lifecycle

-Each `music_generate` request moves through four states:
-
-1. **queued** -- task created, waiting for the provider to accept it.
-2. **running** -- provider is processing (typically 30 seconds to 3 minutes depending on provider and duration).
-3. **succeeded** -- track ready; the agent wakes and posts it to the conversation.
-4. **failed** -- provider error or timeout; the agent wakes with error details.
+| State | Meaning |
+| ----------- | ---------------------------------------------------------------------------------------------- |
+| `queued` | Task created, waiting for the provider to accept it. |
+| `running` | Provider is processing (typically 30 seconds to 3 minutes depending on provider and duration). |
+| `succeeded` | Track ready; the agent wakes and posts it to the conversation. |
+| `failed` | Provider error or timeout; the agent wakes with error details. |

 Check status from the CLI:
@@ -164,8 +207,6 @@ openclaw tasks show
 openclaw tasks cancel
 ```

-Duplicate prevention: if a music task is already `queued` or `running` for the current session, `music_generate` returns the existing task status instead of starting a new one. Use `action: "status"` to check explicitly without triggering a new generation.
-
 ## Configuration

 ### Model selection
@@ -185,38 +226,59 @@ Duplicate prevention: if a music task is already `queued` or `running` for the c

 ### Provider selection order

-When generating music, OpenClaw tries providers in this order:
+OpenClaw tries providers in this order:

-1. `model` parameter from the tool call, if the agent specifies one
-2. `musicGenerationModel.primary` from config
-3. `musicGenerationModel.fallbacks` in order
+1. `model` parameter from the tool call (if the agent specifies one).
+2. `musicGenerationModel.primary` from config.
+3. `musicGenerationModel.fallbacks` in order.
 4. Auto-detection using auth-backed provider defaults only:
-   - current default provider first
-   - remaining registered music-generation providers in provider-id order
+   - current default provider first;
+   - remaining registered music-generation providers in provider-id order.

-If a provider fails, the next candidate is tried automatically. If all fail, the
-error includes details from each attempt.
+If a provider fails, the next candidate is tried automatically. If all
+fail, the error includes details from each attempt.

-Set `agents.defaults.mediaGenerationAutoProviderFallback: false` if you want
-music generation to use only the explicit `model`, `primary`, and `fallbacks`
-entries.
+Set `agents.defaults.mediaGenerationAutoProviderFallback: false` to use only
+explicit `model`, `primary`, and `fallbacks` entries.

 ## Provider notes

-- Google uses Lyria 3 batch generation. The current bundled flow supports
-  prompt, optional lyrics text, and optional reference images.
-- MiniMax uses the batch `music_generation` endpoint. The current bundled flow
-  supports prompt, optional lyrics, instrumental mode, duration steering, and
-  mp3 output through either `minimax` API-key auth or `minimax-portal` OAuth.
-- ComfyUI support is workflow-driven and depends on the configured graph plus
-  node mapping for prompt/output fields.
+
+
+Workflow-driven and depends on the configured graph plus node mapping
+for prompt/output fields. The bundled `comfy` plugin plugs into the
+shared `music_generate` tool through the music-generation provider
+registry.
+
+
+Uses Lyria 3 batch generation. The current bundled flow supports
+prompt, optional lyrics text, and optional reference images.
+
+
+Uses the batch `music_generation` endpoint. Supports prompt, optional
+lyrics, instrumental mode, duration steering, and mp3 output through
+either `minimax` API-key auth or `minimax-portal` OAuth.
+
+
+
+## Choosing the right path
+
+- **Shared provider-backed** when you want model selection, provider
+  failover, and the built-in async task/status flow.
+- **Plugin path (ComfyUI)** when you need a custom workflow graph or a
+  provider that is not part of the shared bundled music capability.
+
+If you are debugging ComfyUI-specific behavior, see
+[ComfyUI](/providers/comfy). If you are debugging shared provider
+behavior, start with [Google (Gemini)](/providers/google) or
+[MiniMax](/providers/minimax).

 ## Provider capability modes

-The shared music-generation contract now supports explicit mode declarations:
+The shared music-generation contract supports explicit mode declarations:

-- `generate` for prompt-only generation
-- `edit` when the request includes one or more reference images
+- `generate` for prompt-only generation.
+- `edit` when the request includes one or more reference images.

 New provider implementations should prefer explicit mode blocks:
@@ -237,15 +299,10 @@ capabilities: {
 ```

 Legacy flat fields such as `maxInputImages`, `supportsLyrics`, and
-`supportsFormat` are not enough to advertise edit support. Providers should
-declare `generate` and `edit` explicitly so live tests, contract tests, and
-the shared `music_generate` tool can validate mode support deterministically.
-
-## Choosing the right path
-
-- Use the shared provider-backed path when you want model selection, provider failover, and the built-in async task/status flow.
-- Use a plugin path such as ComfyUI when you need a custom workflow graph or a provider that is not part of the shared bundled music capability.
-- If you are debugging ComfyUI-specific behavior, see [ComfyUI](/providers/comfy). If you are debugging shared provider behavior, start with [Google (Gemini)](/providers/google) or [MiniMax](/providers/minimax).
+`supportsFormat` are **not** enough to advertise edit support. Providers
+should declare `generate` and `edit` explicitly so live tests, contract
+tests, and the shared `music_generate` tool can validate mode support
+deterministically.

 ## Live tests

@@ -263,9 +320,8 @@ pnpm test:live:media music

 This live file loads missing provider env vars from `~/.profile`, prefers
 live/env API keys ahead of stored auth profiles by default, and runs both
-`generate` and declared `edit` coverage when the provider enables edit mode.
-
-Today that means:
+`generate` and declared `edit` coverage when the provider enables edit
+mode. Coverage today:

 - `google`: `generate` plus `edit`
 - `minimax`: `generate` only
@@ -282,10 +338,10 @@ sections are configured.

 ## Related

-- [Background Tasks](/automation/tasks) - task tracking for detached `music_generate` runs
-- [Configuration Reference](/gateway/config-agents#agent-defaults) - `musicGenerationModel` config
+- [Background tasks](/automation/tasks) — task tracking for detached `music_generate` runs
 - [ComfyUI](/providers/comfy)
+- [Configuration reference](/gateway/config-agents#agent-defaults) — `musicGenerationModel` config
 - [Google (Gemini)](/providers/google)
 - [MiniMax](/providers/minimax)
-- [Models](/concepts/models) - model configuration and failover
-- [Tools Overview](/tools)
+- [Models](/concepts/models) — model configuration and failover
+- [Tools overview](/tools)
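
The provider selection order documented above works as an ordered failover loop. The candidates come from the tool-call `model`, then `musicGenerationModel.primary`, then `fallbacks`, then auth-backed auto-detection unless `mediaGenerationAutoProviderFallback` is `false`. The TypeScript below is a minimal sketch under stated assumptions, not OpenClaw's implementation. `SelectionInput`, `buildCandidates`, `generateWithFailover`, and the `attempt` callback are hypothetical names, and the sketch assumes that auto-detection contributes one default model per auth-backed provider.

```typescript
// Hypothetical sketch of the documented provider selection order.
// None of these names are OpenClaw APIs; they only model the rules.

interface MusicGenerationModelConfig {
  primary?: string; // e.g. "google/lyria-3-clip-preview"
  fallbacks?: string[]; // tried in order after `primary`
}

interface SelectionInput {
  toolModel?: string; // `model` parameter from the tool call, if any
  config: MusicGenerationModelConfig; // agents.defaults.musicGenerationModel
  autoProviderFallback: boolean; // agents.defaults.mediaGenerationAutoProviderFallback
  defaultProvider?: string; // id of the current default provider
  // Assumption: auth-backed providers only, mapped to their default model.
  authedProviders: Map<string, string>;
}

// Builds the ordered "provider/model" candidate list, skipping duplicates.
function buildCandidates(input: SelectionInput): string[] {
  const ordered: string[] = [];
  const push = (ref: string | undefined): void => {
    if (ref && !ordered.includes(ref)) ordered.push(ref);
  };

  push(input.toolModel); // 1. tool-call `model`
  push(input.config.primary); // 2. configured primary
  for (const ref of input.config.fallbacks ?? []) push(ref); // 3. fallbacks

  if (input.autoProviderFallback) {
    // 4. auto-detection: current default provider first ...
    const def = input.defaultProvider;
    if (def !== undefined && input.authedProviders.has(def)) {
      push(`${def}/${input.authedProviders.get(def)}`);
    }
    // ... then the remaining providers in provider-id order.
    const ids = Array.from(input.authedProviders.keys()).sort();
    for (const id of ids) push(`${id}/${input.authedProviders.get(id)}`);
  }
  return ordered;
}

// Tries each candidate in turn; if every attempt fails, the thrown error
// lists the failure message from each candidate.
async function generateWithFailover(
  candidates: string[],
  attempt: (ref: string) => Promise<string>,
): Promise<string> {
  const failures: string[] = [];
  for (const ref of candidates) {
    try {
      return await attempt(ref);
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      failures.push(`${ref}: ${message}`);
    }
  }
  throw new Error(`all candidates failed:\n${failures.join("\n")}`);
}

// Usage: explicit entries only, so the auth-backed comfy provider is skipped.
const candidates = buildCandidates({
  config: {
    primary: "google/lyria-3-clip-preview",
    fallbacks: ["minimax/music-2.6"],
  },
  autoProviderFallback: false,
  authedProviders: new Map([["comfy", "workflow"]]),
});
console.log(candidates.join(" -> ")); // prints "google/lyria-3-clip-preview -> minimax/music-2.6"
```

This sketch deduplicates candidates so that a model named both in the tool call and in `primary` is attempted only once. The page does not say whether OpenClaw does the same.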