mirror of
https://github.com/openclaw/openclaw.git
synced 2026-05-06 13:40:44 +00:00
docs(music-generation): rewrite around Steps, Tabs, and provider Accordion
The music-generation page was 291 lines with two side-by-side 'Quick start' subsections (shared provider-backed vs. ComfyUI workflow), a flat parameter table, two prose paragraphs explaining async behaviour and task lifecycle, and a 'Provider notes' bullet list mixed with a separate 'Choosing the right path' section.

Restructure for scan-first reading without losing technical content:

- Wrap Quick start in a top-level Tabs with two child Steps blocks (Shared provider-backed | ComfyUI workflow), so readers pick a path first and only see the matching steps.
- Convert the tool parameter list to ParamField definitions with type signatures and required flags surfaced visually.
- Convert the four async-behaviour bullets to a labelled bullet list and the four-state task lifecycle to a table for at-a-glance scanning.
- Change Capability matrix Yes/No values to checkmarks/em-dashes for alignment with the rest of the media docs.
- Convert the 'Provider notes' free-form paragraphs into an AccordionGroup keyed by provider (ComfyUI / Google Lyria 3 / MiniMax), keeping wording faithful.
- Sentence-case Related entries and add sidebarTitle so the nav reads 'Music generation' explicitly.

Provider rows were already alphabetized in the supported providers table (ComfyUI / Google / MiniMax); kept that order.

Wording, model refs, defaults, env vars, and capability declarations are unchanged.
@@ -1,53 +1,82 @@
---
summary: "Generate music via music_generate across Google Lyria, MiniMax, and ComfyUI workflows"
read_when:
  - Generating music or audio via the agent
  - Configuring music-generation providers and models
  - Understanding the music_generate tool parameters
title: "Music generation"
sidebarTitle: "Music generation"
---

The `music_generate` tool lets the agent create music or audio through the
shared music-generation capability with configured providers: currently
Google, MiniMax, and workflow-configured ComfyUI.

For session-backed agent runs, OpenClaw starts music generation as a
background task, tracks it in the task ledger, then wakes the agent again
when the track is ready so the agent can post the finished audio back into
the original channel.

<Note>
The built-in shared tool only appears when at least one music-generation
provider is available. If you do not see `music_generate` in your agent's
tools, configure `agents.defaults.musicGenerationModel` or set up a
provider API key.
</Note>

## Quick start

<Tabs>
<Tab title="Shared provider-backed">
<Steps>
<Step title="Configure auth">
Set an API key for at least one provider, for example
`GEMINI_API_KEY` or `MINIMAX_API_KEY`.
</Step>
<Step title="Pick a default model (optional)">

```json5
{
  agents: {
    defaults: {
      musicGenerationModel: {
        primary: "google/lyria-3-clip-preview",
      },
    },
  },
}
```

</Step>
<Step title="Ask the agent">
_"Generate an upbeat synthpop track about a night drive through a
neon city."_

The agent calls `music_generate` automatically. No tool
allow-listing is needed.
</Step>
</Steps>

For direct synchronous contexts without a session-backed agent run,
the built-in tool still falls back to inline generation and returns
the final media path in the tool result.
</Tab>
<Tab title="ComfyUI workflow">
<Steps>
<Step title="Configure the workflow">
Configure `plugins.entries.comfy.config.music` with a workflow
JSON and prompt/output nodes.
</Step>
<Step title="Cloud auth (optional)">
For Comfy Cloud, set `COMFY_API_KEY` or `COMFY_CLOUD_API_KEY`.
</Step>
<Step title="Call the tool">
Ask the agent for music, or call the tool directly:

```text
/tool music_generate prompt="Warm ambient synth loop with soft tape texture"
```

</Step>
</Steps>
</Tab>
</Tabs>

Example prompts:

@@ -59,40 +88,24 @@ Generate a cinematic piano track with soft strings and no vocals.
Generate an energetic chiptune loop about launching a rocket at sunrise.
```

## Supported providers

| Provider | Default model          | Reference inputs | Supported controls                                        | Auth                                   |
| -------- | ---------------------- | ---------------- | --------------------------------------------------------- | -------------------------------------- |
| ComfyUI  | `workflow`             | Up to 1 image    | Workflow-defined music or audio                           | `COMFY_API_KEY`, `COMFY_CLOUD_API_KEY` |
| Google   | `lyria-3-clip-preview` | Up to 10 images  | `lyrics`, `instrumental`, `format`                        | `GEMINI_API_KEY`, `GOOGLE_API_KEY`     |
| MiniMax  | `music-2.6`            | None             | `lyrics`, `instrumental`, `durationSeconds`, `format=mp3` | `MINIMAX_API_KEY` or MiniMax OAuth     |

### Capability matrix

This table is the explicit mode contract used by `music_generate`, contract
tests, and the shared live sweep:

| Provider | `generate` | `edit` | Edit limit | Shared live lanes                                                         |
| -------- | :--------: | :----: | ---------- | ------------------------------------------------------------------------- |
| ComfyUI  |     ✓      |   ✓    | 1 image    | Not in the shared sweep; covered by `extensions/comfy/comfy.live.test.ts` |
| Google   |     ✓      |   ✓    | 10 images  | `generate`, `edit`                                                        |
| MiniMax  |     ✓      |   —    | None       | `generate`                                                                |

Use `action: "list"` to inspect available shared providers and models at
runtime:

@@ -113,48 +126,78 @@ Direct generation example:
/tool music_generate prompt="Dreamy lo-fi hip hop with vinyl texture and gentle rain" instrumental=true
```

## Tool parameters

<ParamField path="prompt" type="string" required>
Music generation prompt. Required for `action: "generate"`.
</ParamField>

<ParamField path="action" type='"generate" | "status" | "list"' default="generate">
`"status"` returns the current session task; `"list"` inspects providers.
</ParamField>

<ParamField path="model" type="string">
Provider/model override (e.g. `google/lyria-3-pro-preview`,
`comfy/workflow`).
</ParamField>

<ParamField path="lyrics" type="string">
Optional lyrics when the provider supports explicit lyric input.
</ParamField>

<ParamField path="instrumental" type="boolean">
Request instrumental-only output when the provider supports it.
</ParamField>

<ParamField path="image" type="string">
Single reference image path or URL.
</ParamField>

<ParamField path="images" type="string[]">
Multiple reference images (up to 10 on supporting providers).
</ParamField>

<ParamField path="durationSeconds" type="number">
Target duration in seconds when the provider supports duration hints.
</ParamField>

<ParamField path="format" type='"mp3" | "wav"'>
Output format hint when the provider supports it.
</ParamField>

<ParamField path="filename" type="string">Output filename hint.</ParamField>

<ParamField path="timeoutMs" type="number">Optional provider request timeout in milliseconds.</ParamField>

<Note>
Not all providers support all parameters. OpenClaw still validates hard
limits such as input counts before submission. When a provider supports
duration but uses a shorter maximum than the requested value, OpenClaw
clamps to the closest supported duration. Truly unsupported optional hints
are ignored with a warning when the selected provider or model cannot honor
them. Tool results report the applied settings: when duration is clamped,
the returned `durationSeconds` reflects the submitted value and
`details.normalization.durationSeconds` shows the requested-to-applied
mapping.
</Note>

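The duration rule in the note above can be sketched as follows. This is a minimal illustration, not OpenClaw's implementation. It assumes a provider exposes a single maximum duration; the real "closest supported duration" logic may also handle discrete values. `normalizeDuration`, `DurationResult`, and `DurationMapping` are hypothetical names. The returned `normalization` field mirrors the documented `details.normalization.durationSeconds` requested-to-applied mapping.

```typescript
// Hypothetical types: a sketch of the clamping rule, not OpenClaw's implementation.
interface DurationMapping {
  requested: number;
  applied: number;
}

interface DurationResult {
  durationSeconds?: number; // value actually submitted to the provider
  normalization?: { durationSeconds: DurationMapping }; // requested -> applied
  warnings: string[];
}

function normalizeDuration(
  requested: number | undefined,
  supportsDuration: boolean,
  maxDurationSeconds?: number, // assumed single provider maximum
): DurationResult {
  if (requested === undefined) {
    return { warnings: [] };
  }
  if (!supportsDuration) {
    // Unsupported optional hint: ignore it with a warning instead of failing.
    return {
      warnings: [`durationSeconds=${requested} ignored: provider has no duration control`],
    };
  }
  if (maxDurationSeconds !== undefined && requested > maxDurationSeconds) {
    // Provider maximum is shorter than requested: clamp and record the mapping.
    return {
      durationSeconds: maxDurationSeconds,
      normalization: { durationSeconds: { requested, applied: maxDurationSeconds } },
      warnings: [],
    };
  }
  return { durationSeconds: requested, warnings: [] };
}
```

For example, `normalizeDuration(240, true, 180)` submits 180 seconds and records `{ requested: 240, applied: 180 }`. A provider without duration control drops the hint and returns a single warning.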
## Async behavior

Session-backed music generation runs as a background task:

- **Background task:** `music_generate` creates a background task, returns a
  started/task response immediately, and posts the finished track later in
  a follow-up agent message.
- **Duplicate prevention:** while a task is `queued` or `running`, later
  `music_generate` calls in the same session return task status instead of
  starting another generation. Use `action: "status"` to check explicitly.
- **Status lookup:** use `openclaw tasks list` or `openclaw tasks show <taskId>`
  to inspect queued, running, and terminal status.
- **Completion wake:** OpenClaw injects an internal completion event back
  into the same session so the model can write the user-facing follow-up
  itself.
- **Prompt hint:** later user/manual turns in the same session get a small
  runtime hint when a music task is already in flight, so the model does
  not blindly call `music_generate` again.
- **No-session fallback:** direct/local contexts without a real agent
  session run inline and return the final audio result in the same turn.

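The duplicate-prevention rule can be sketched with a minimal in-memory example. This assumes tasks are keyed by session id. `SessionMusicTasks` and its methods are hypothetical stand-ins for OpenClaw's task ledger, whose real interface is not shown here.

```typescript
// Hypothetical in-memory stand-in for the task ledger; names are illustrative only.
type TaskState = "queued" | "running" | "succeeded" | "failed";

interface MusicTask {
  id: string;
  sessionId: string;
  prompt: string;
  state: TaskState;
}

class SessionMusicTasks {
  private latestBySession = new Map<string, MusicTask>();
  private nextId = 1;

  // Start a new task unless this session already has one queued or running.
  generate(sessionId: string, prompt: string): { started: boolean; task: MusicTask } {
    const current = this.latestBySession.get(sessionId);
    if (current && (current.state === "queued" || current.state === "running")) {
      return { started: false, task: current }; // report status instead of starting
    }
    const task: MusicTask = { id: `task-${this.nextId++}`, sessionId, prompt, state: "queued" };
    this.latestBySession.set(sessionId, task);
    return { started: true, task };
  }

  // Like `action: "status"`: inspect the session's task without starting one.
  status(sessionId: string): MusicTask | undefined {
    return this.latestBySession.get(sessionId);
  }
}
```

Returning the existing task, rather than throwing, matches the documented behavior: a repeated call reports status instead of failing. A new task can start only after the previous one reaches a terminal state.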
### Task lifecycle

Each `music_generate` request moves through four states:

| State       | Meaning                                                                                        |
| ----------- | ---------------------------------------------------------------------------------------------- |
| `queued`    | Task created, waiting for the provider to accept it.                                           |
| `running`   | Provider is processing (typically 30 seconds to 3 minutes depending on provider and duration). |
| `succeeded` | Track ready; the agent wakes and posts it to the conversation.                                 |
| `failed`    | Provider error or timeout; the agent wakes with error details.                                 |

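The lifecycle table can be read as a small state machine. The sketch below encodes it as a transition table. The `queued -> failed` edge (a provider that rejects or never accepts the job) is an assumption. `advance` and `isTerminal` are hypothetical helpers, not OpenClaw APIs. Cancellation via `openclaw tasks cancel` is not modeled.

```typescript
// Sketch of the four documented states; the allowed transitions are assumptions.
type TaskState = "queued" | "running" | "succeeded" | "failed";

const transitions: Record<TaskState, readonly TaskState[]> = {
  queued: ["running", "failed"], // assumed: a provider may reject before accepting
  running: ["succeeded", "failed"],
  succeeded: [], // terminal: agent wakes and posts the track
  failed: [], // terminal: agent wakes with error details
};

// Move to the next state, rejecting transitions the table does not allow.
function advance(from: TaskState, to: TaskState): TaskState {
  if (!transitions[from].includes(to)) {
    throw new Error(`invalid transition ${from} -> ${to}`);
  }
  return to;
}

// Terminal states are the ones with no outgoing transitions.
function isTerminal(state: TaskState): boolean {
  return transitions[state].length === 0;
}
```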
Check status from the CLI:

@@ -164,8 +207,6 @@ openclaw tasks show <taskId>
openclaw tasks cancel <taskId>
```

## Configuration

### Model selection

@@ -185,38 +226,59 @@ Duplicate prevention: if a music task is already `queued` or `running` for the c
### Provider selection order

OpenClaw tries providers in this order:

1. `model` parameter from the tool call (if the agent specifies one).
2. `musicGenerationModel.primary` from config.
3. `musicGenerationModel.fallbacks` in order.
4. Auto-detection using auth-backed provider defaults only:
   - current default provider first;
   - remaining registered music-generation providers in provider-id order.

If a provider fails, the next candidate is tried automatically. If all
fail, the error includes details from each attempt.

Set `agents.defaults.mediaGenerationAutoProviderFallback: false` to use only
explicit `model`, `primary`, and `fallbacks` entries.

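The selection order above can be sketched in two steps. First, build an ordered candidate list. Then try each candidate until one succeeds. This is an illustration under stated assumptions, not OpenClaw's code:

- `candidateModels`, `generateWithFailover`, and `SelectionInput` are hypothetical names.
- Duplicate model refs are assumed to be skipped.
- "Provider-id order" is assumed to mean a lexicographic sort of provider ids.
- `authedDefaults` is assumed to hold only providers with working auth.

```typescript
// Hypothetical input shape; field comments map each one to the documented setting.
interface SelectionInput {
  toolModel?: string; // `model` from the tool call
  primary?: string; // musicGenerationModel.primary
  fallbacks?: string[]; // musicGenerationModel.fallbacks
  autoProviderFallback: boolean; // agents.defaults.mediaGenerationAutoProviderFallback
  defaultProvider?: string; // current default provider id
  authedDefaults: Map<string, string>; // providerId -> default model ref (auth-backed only)
}

function candidateModels(input: SelectionInput): string[] {
  const ordered: string[] = [];
  const add = (ref: string | undefined): void => {
    if (ref !== undefined && !ordered.includes(ref)) ordered.push(ref); // assumed de-dup
  };
  add(input.toolModel);
  add(input.primary);
  for (const ref of input.fallbacks ?? []) add(ref);
  if (input.autoProviderFallback) {
    if (input.defaultProvider !== undefined) add(input.authedDefaults.get(input.defaultProvider));
    for (const id of Array.from(input.authedDefaults.keys()).sort()) {
      add(input.authedDefaults.get(id)); // remaining providers in provider-id order
    }
  }
  return ordered;
}

// Try candidates in order; if all fail, report every attempt's error.
async function generateWithFailover<T>(
  candidates: string[],
  attempt: (modelRef: string) => Promise<T>,
): Promise<T> {
  const errors: string[] = [];
  for (const ref of candidates) {
    try {
      return await attempt(ref);
    } catch (err) {
      errors.push(`${ref}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`all music providers failed:\n${errors.join("\n")}`);
}
```

Setting `autoProviderFallback: false` in this sketch leaves only the explicit `model`, `primary`, and `fallbacks` entries, matching the documented effect of the config flag.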
## Provider notes

<AccordionGroup>
<Accordion title="ComfyUI">
Support is workflow-driven and depends on the configured graph plus node
mapping for prompt/output fields. The bundled `comfy` plugin plugs into the
shared `music_generate` tool through the music-generation provider
registry.
</Accordion>
<Accordion title="Google (Lyria 3)">
Uses Lyria 3 batch generation. The current bundled flow supports
prompt, optional lyrics text, and optional reference images.
</Accordion>
<Accordion title="MiniMax">
Uses the batch `music_generation` endpoint. Supports prompt, optional
lyrics, instrumental mode, duration steering, and mp3 output through
either `minimax` API-key auth or `minimax-portal` OAuth.
</Accordion>
</AccordionGroup>

## Choosing the right path

- **Shared provider-backed:** use it when you want model selection, provider
  failover, and the built-in async task/status flow.
- **Plugin path (ComfyUI):** use it when you need a custom workflow graph or a
  provider that is not part of the shared bundled music capability.

If you are debugging ComfyUI-specific behavior, see
[ComfyUI](/providers/comfy). If you are debugging shared provider
behavior, start with [Google (Gemini)](/providers/google) or
[MiniMax](/providers/minimax).

## Provider capability modes

The shared music-generation contract supports explicit mode declarations:

- `generate` for prompt-only generation.
- `edit` when the request includes one or more reference images.

New provider implementations should prefer explicit mode blocks:

@@ -237,15 +299,10 @@ capabilities: {
```

Legacy flat fields such as `maxInputImages`, `supportsLyrics`, and
`supportsFormat` are **not** enough to advertise edit support. Providers
should declare `generate` and `edit` explicitly so live tests, contract
tests, and the shared `music_generate` tool can validate mode support
deterministically.

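Explicit mode blocks make validation deterministic. In the minimal sketch below, the request's reference images decide between `generate` and `edit`, and only the declared blocks are consulted. Several parts are assumptions for illustration: the `ModeCapabilities` shape, its `maxImages` field, and treating `image` plus `images` as one combined list. The real provider contract may use different fields. `validateMode` is a hypothetical helper, not an OpenClaw API.

```typescript
// Hypothetical capability shape; the real provider contract may differ.
type MusicMode = "generate" | "edit";

interface ModeCapabilities {
  generate?: Record<string, unknown>; // present = prompt-only generation declared
  edit?: { maxImages: number }; // present = reference-image edit declared
}

interface MusicRequest {
  prompt: string;
  image?: string;
  images?: string[];
}

// Assumed: `image` and `images` are merged into one reference list.
function referenceImages(req: MusicRequest): string[] {
  return [...(req.image !== undefined ? [req.image] : []), ...(req.images ?? [])];
}

// Pick the mode from the request, then check it against declared blocks only.
function validateMode(req: MusicRequest, caps: ModeCapabilities): MusicMode {
  const images = referenceImages(req);
  if (images.length === 0) {
    if (caps.generate === undefined) throw new Error("provider does not declare generate mode");
    return "generate";
  }
  if (caps.edit === undefined) throw new Error("provider does not declare edit mode");
  if (images.length > caps.edit.maxImages) {
    throw new Error(`too many reference images: ${images.length} > ${caps.edit.maxImages}`);
  }
  return "edit";
}
```

The example limits below come from the capability matrix:

- Google-like caps `{ generate: {}, edit: { maxImages: 10 } }`: eleven reference images are rejected.
- ComfyUI-like caps `{ generate: {}, edit: { maxImages: 1 } }`: a single image resolves to `edit`.
- MiniMax-like caps `{ generate: {} }`: any reference image is rejected, because no edit block is declared.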
## Live tests

@@ -263,9 +320,8 @@ pnpm test:live:media music
This live file loads missing provider env vars from `~/.profile`, prefers
live/env API keys ahead of stored auth profiles by default, and runs both
`generate` and declared `edit` coverage when the provider enables edit
mode. Coverage today:

- `google`: `generate` plus `edit`
- `minimax`: `generate` only

@@ -282,10 +338,10 @@ sections are configured.
## Related

- [Background tasks](/automation/tasks) — task tracking for detached `music_generate` runs
- [ComfyUI](/providers/comfy)
- [Configuration reference](/gateway/config-agents#agent-defaults) — `musicGenerationModel` config
- [Google (Gemini)](/providers/google)
- [MiniMax](/providers/minimax)
- [Models](/concepts/models) — model configuration and failover
- [Tools overview](/tools)
