feat: add comfy workflow media support

2026-05-01 13:00:22 +00:00 · 2026-04-06 01:42:46 +01:00
parent d37b97c2ff
commit aeb9ad52fa
27 changed files with 2384 additions and 32 deletions
--- a/docs/docs.json
+++ b/docs/docs.json
@@ -1200,12 +1200,14 @@
                  "tools/exec",
                  "tools/exec-approvals",
                  "tools/image-generation",
+                  "tools/music-generation",
                  "tools/llm-task",
                  "tools/lobster",
                  "tools/loop-detection",
                  "tools/pdf",
                  "tools/reactions",
-                  "tools/thinking"
+                  "tools/thinking",
+                  "tools/video-generation"
                ]
              },
              {
@@ -1238,6 +1240,7 @@
                  "providers/bedrock",
                  "providers/bedrock-mantle",
                  "providers/chutes",
+                  "providers/comfy",
                  "providers/claude-max-api-proxy",
                  "providers/cloudflare-ai-gateway",
                  "providers/deepgram",
--- a/docs/help/testing.md
+++ b/docs/help/testing.md
@@ -389,6 +389,15 @@ If you want to rely on env keys (e.g. exported in your `~/.profile`), run local
 - Enable: `BYTEPLUS_API_KEY=... BYTEPLUS_LIVE_TEST=1 pnpm test:live src/agents/byteplus.live.test.ts`
 - Optional model override: `BYTEPLUS_CODING_MODEL=ark-code-latest`

+## ComfyUI workflow media live
+
+- Test: `extensions/comfy/comfy.live.test.ts`
+- Enable: `OPENCLAW_LIVE_TEST=1 COMFY_LIVE_TEST=1 pnpm test:live -- extensions/comfy/comfy.live.test.ts`
+- Scope:
+  - Exercises the bundled comfy image, video, and `music_generate` paths
+  - Skips each capability unless `models.providers.comfy.<capability>` is configured
+  - Useful after changing comfy workflow submission, polling, downloads, or plugin registration
+
 ## Image generation live

 - Test: `src/image-generation/runtime.live.test.ts`
--- a/docs/providers/comfy.md
+++ b/docs/providers/comfy.md
@@ -0,0 +1,202 @@
+---
+title: "ComfyUI"
+summary: "ComfyUI workflow image, video, and music generation setup in OpenClaw"
+read_when:
+  - You want to use local ComfyUI workflows with OpenClaw
+  - You want to use Comfy Cloud with image, video, or music workflows
+  - You need the bundled comfy plugin config keys
+---
+
+# ComfyUI
+
+OpenClaw ships a bundled `comfy` plugin for workflow-driven ComfyUI runs.
+
+- Provider: `comfy`
+- Models: `comfy/workflow`
+- Shared surfaces: `image_generate`, `video_generate`
+- Plugin tool: `music_generate`
+- Auth: none for local ComfyUI; `COMFY_API_KEY` or `COMFY_CLOUD_API_KEY` for Comfy Cloud
+- API: ComfyUI `/prompt`, `/history`, and `/view`; Comfy Cloud `/api/*`
+
+## What it supports
+
+- Image generation from a workflow JSON
+- Image editing with 1 uploaded reference image
+- Video generation from a workflow JSON
+- Video generation with 1 uploaded reference image
+- Music or audio generation through the bundled `music_generate` tool
+- Output download from the configured `outputNodeId`, or from every matching output node when `outputNodeId` is unset
+
+The bundled plugin is workflow-driven, so OpenClaw does not try to map generic
+`size`, `aspectRatio`, `resolution`, `durationSeconds`, or TTS-style controls
+onto your graph.
+
+## Config layout
+
+Comfy supports shared top-level connection settings plus per-capability workflow
+sections:
+
+```json5
+{
+  models: {
+    providers: {
+      comfy: {
+        mode: "local",
+        baseUrl: "http://127.0.0.1:8188",
+        image: {
+          workflowPath: "./workflows/flux-api.json",
+          promptNodeId: "6",
+          outputNodeId: "9",
+        },
+        video: {
+          workflowPath: "./workflows/video-api.json",
+          promptNodeId: "12",
+          outputNodeId: "21",
+        },
+        music: {
+          workflowPath: "./workflows/music-api.json",
+          promptNodeId: "3",
+          outputNodeId: "18",
+        },
+      },
+    },
+  },
+}
+```
+
+Shared keys:
+
+- `mode`: `local` or `cloud`
+- `baseUrl`: defaults to `http://127.0.0.1:8188` for local or `https://cloud.comfy.org` for cloud
+- `apiKey`: optional inline key alternative to env vars
+- `allowPrivateNetwork`: allow a private/LAN `baseUrl` in cloud mode
+
+Per-capability keys under `image`, `video`, or `music`:
+
+- `workflow` or `workflowPath`: one of the two is required
+- `promptNodeId`: required
+- `promptInputName`: defaults to `text`
+- `outputNodeId`: optional
+- `pollIntervalMs`: optional
+- `timeoutMs`: optional
+
+Image and video sections also support:
+
+- `inputImageNodeId`: required when you pass a reference image
+- `inputImageInputName`: defaults to `image`
+
+## Backward compatibility
+
+Existing top-level image config still works:
+
+```json5
+{
+  models: {
+    providers: {
+      comfy: {
+        workflowPath: "./workflows/flux-api.json",
+        promptNodeId: "6",
+        outputNodeId: "9",
+      },
+    },
+  },
+}
+```
+
+OpenClaw treats that legacy shape as the image workflow config.
+
+## Image workflows
+
+Set the default image model:
+
+```json5
+{
+  agents: {
+    defaults: {
+      imageGenerationModel: {
+        primary: "comfy/workflow",
+      },
+    },
+  },
+}
+```
+
+Reference-image editing example:
+
+```json5
+{
+  models: {
+    providers: {
+      comfy: {
+        image: {
+          workflowPath: "./workflows/edit-api.json",
+          promptNodeId: "6",
+          inputImageNodeId: "7",
+          inputImageInputName: "image",
+          outputNodeId: "9",
+        },
+      },
+    },
+  },
+}
+```
+
+## Video workflows
+
+Set the default video model:
+
+```json5
+{
+  agents: {
+    defaults: {
+      videoGenerationModel: {
+        primary: "comfy/workflow",
+      },
+    },
+  },
+}
+```
+
+Comfy video workflows currently support text-to-video and image-to-video through
+the configured graph. OpenClaw does not pass input videos into Comfy workflows.
+
+## Music workflows
+
+The bundled plugin registers a `music_generate` tool for workflow-defined audio
+or music outputs:
+
+```text
+/tool music_generate prompt="Warm ambient synth loop with soft tape texture"
+```
+
+Use the `music` config section to point at your audio workflow JSON and its prompt
+and output nodes.
+
+## Comfy Cloud
+
+Use `mode: "cloud"` plus one of:
+
+- `COMFY_API_KEY`
+- `COMFY_CLOUD_API_KEY`
+- `models.providers.comfy.apiKey`
+
+Cloud mode still uses the same `image`, `video`, and `music` workflow sections.
+
+## Live tests
+
+Opt-in live coverage exists for the bundled plugin:
+
+```bash
+OPENCLAW_LIVE_TEST=1 COMFY_LIVE_TEST=1 pnpm test:live -- extensions/comfy/comfy.live.test.ts
+```
+
+The live test skips individual image, video, or music cases unless the matching
+Comfy workflow section is configured.
+
+## Related
+
+- [Image Generation](/tools/image-generation)
+- [Video Generation](/tools/video-generation)
+- [Music Generation](/tools/music-generation)
+- [Provider Directory](/providers/index)
+- [Configuration Reference](/gateway/configuration-reference#agent-defaults)
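The per-capability keys above refer to nodes in a ComfyUI API-format workflow: a JSON object keyed by node id, where each node has a `class_type` and an `inputs` map. As a rough illustration only, here is a hypothetical TypeScript sketch (not the bundled plugin's code) of how `promptNodeId`/`promptInputName` could be written into such a graph, and how output files could be picked out of a `/history` entry and turned into `/view` URLs. The helper names are invented; the history shape (per-node lists such as `images` whose entries carry `filename`, `subfolder`, and `type`) is an assumption based on ComfyUI's local API; Comfy Cloud's `/api/*` routes are not modeled.

```typescript
// Hypothetical sketch: illustrates the documented config keys, not the bundled comfy plugin's code.

// API-format workflow: node id -> { class_type, inputs }.
type ComfyNode = { class_type: string; inputs: Record<string, unknown> };
type ComfyWorkflow = Record<string, ComfyNode>;

// Assumed shape of one output file reported by /history.
type ComfyFile = { filename: string; subfolder: string; type: string };
// Assumed shape of a history entry's `outputs`: node id -> { images: [...], ... }.
type ComfyHistoryOutputs = Record<string, Record<string, unknown>>;

// Return a copy of the workflow with the prompt written into the prompt node's input.
function injectPrompt(
  workflow: ComfyWorkflow,
  promptNodeId: string,
  prompt: string,
  promptInputName = "text", // mirrors the documented default
): ComfyWorkflow {
  const node = workflow[promptNodeId];
  if (!node) {
    throw new Error(`promptNodeId "${promptNodeId}" is not in the workflow`);
  }
  return {
    ...workflow,
    [promptNodeId]: { ...node, inputs: { ...node.inputs, [promptInputName]: prompt } },
  };
}

// True for objects that carry the filename/subfolder/type triple.
function isComfyFile(value: unknown): value is ComfyFile {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.filename === "string" &&
    typeof v.subfolder === "string" &&
    typeof v.type === "string"
  );
}

// Collect files from the configured output node, or from every node when none is set.
function collectOutputFiles(
  outputs: ComfyHistoryOutputs,
  outputNodeId?: string,
): ComfyFile[] {
  const nodeIds = outputNodeId ? [outputNodeId] : Object.keys(outputs);
  const files: ComfyFile[] = [];
  for (const id of nodeIds) {
    for (const value of Object.values(outputs[id] ?? {})) {
      // Output keys vary by node type (e.g. `images`); keep list entries that look like files.
      if (Array.isArray(value)) files.push(...value.filter(isComfyFile));
    }
  }
  return files;
}

// Build a local ComfyUI /view URL for one output file.
function viewUrl(baseUrl: string, file: ComfyFile): string {
  const query = new URLSearchParams({
    filename: file.filename,
    subfolder: file.subfolder,
    type: file.type,
  });
  return `${baseUrl.replace(/\/+$/, "")}/view?${query.toString()}`;
}

// Usage with the node ids from the image config example above.
const workflow: ComfyWorkflow = {
  "6": { class_type: "CLIPTextEncode", inputs: { text: "", clip: ["4", 1] } },
  "9": { class_type: "SaveImage", inputs: { images: ["8", 0] } },
};
const patched = injectPrompt(workflow, "6", "a lighthouse at dusk");
console.log(patched["6"].inputs.text); // a lighthouse at dusk

const outputs: ComfyHistoryOutputs = {
  "9": { images: [{ filename: "ComfyUI_00001_.png", subfolder: "", type: "output" }] },
};
const files = collectOutputFiles(outputs, "9");
console.log(viewUrl("http://127.0.0.1:8188", files[0]));
// http://127.0.0.1:8188/view?filename=ComfyUI_00001_.png&subfolder=&type=output
```

Returning a patched copy instead of mutating the loaded graph keeps one parsed workflow reusable across many prompts.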
--- a/docs/providers/index.md
+++ b/docs/providers/index.md
@@ -31,6 +31,7 @@ Looking for chat channel docs (WhatsApp/Telegram/Discord/Slack/Mattermost (plugi
 - [Anthropic (API + Claude CLI)](/providers/anthropic)
 - [BytePlus (International)](/concepts/model-providers#byteplus-international)
 - [Chutes](/providers/chutes)
 - [Cloudflare AI Gateway](/providers/cloudflare-ai-gateway)
+- [ComfyUI](/providers/comfy)
 - [DeepSeek](/providers/deepseek)
 - [fal](/providers/fal)
@@ -71,6 +72,7 @@ Looking for chat channel docs (WhatsApp/Telegram/Discord/Slack/Mattermost (plugi

 - [Additional bundled variants](/providers/models#additional-bundled-provider-variants) - Anthropic Vertex, Copilot Proxy, and Gemini CLI OAuth
 - [Image Generation](/tools/image-generation) - Shared `image_generate` tool, provider selection, and failover
+- [Music Generation](/tools/music-generation) - Plugin-provided `music_generate` tool surfaces
 - [Video Generation](/tools/video-generation) - Shared `video_generate` tool, provider selection, and failover

 ## Transcription providers
--- a/docs/providers/models.md
+++ b/docs/providers/models.md
@@ -29,6 +29,7 @@ model as `provider/model`.
 - [Amazon Bedrock](/providers/bedrock)
 - [BytePlus (International)](/concepts/model-providers#byteplus-international)
 - [Chutes](/providers/chutes)
 - [Cloudflare AI Gateway](/providers/cloudflare-ai-gateway)
+- [ComfyUI](/providers/comfy)
 - [fal](/providers/fal)
 - [Fireworks](/providers/fireworks)
--- a/docs/tools/image-generation.md
+++ b/docs/tools/image-generation.md
@@ -1,5 +1,5 @@
 ---
-summary: "Generate and edit images using configured providers (OpenAI, Google Gemini, fal, MiniMax)"
+summary: "Generate and edit images using configured providers (OpenAI, Google Gemini, fal, MiniMax, ComfyUI)"
 read_when:
  - Generating images via the agent
  - Configuring image generation providers and models
@@ -38,12 +38,13 @@ The agent calls `image_generate` automatically. No tool allow-listing needed —

 ## Supported providers

-| Provider | Default model                    | Edit support            | API key                                               |
-| -------- | -------------------------------- | ----------------------- | ----------------------------------------------------- |
-| OpenAI   | `gpt-image-1`                    | Yes (up to 5 images)    | `OPENAI_API_KEY`                                      |
-| Google   | `gemini-3.1-flash-image-preview` | Yes                     | `GEMINI_API_KEY` or `GOOGLE_API_KEY`                  |
-| fal      | `fal-ai/flux/dev`                | Yes                     | `FAL_KEY`                                             |
-| MiniMax  | `image-01`                       | Yes (subject reference) | `MINIMAX_API_KEY` or MiniMax OAuth (`minimax-portal`) |
+| Provider | Default model                    | Edit support                       | API key                                               |
+| -------- | -------------------------------- | ---------------------------------- | ----------------------------------------------------- |
+| OpenAI   | `gpt-image-1`                    | Yes (up to 5 images)               | `OPENAI_API_KEY`                                      |
+| Google   | `gemini-3.1-flash-image-preview` | Yes                                | `GEMINI_API_KEY` or `GOOGLE_API_KEY`                  |
+| fal      | `fal-ai/flux/dev`                | Yes                                | `FAL_KEY`                                             |
+| MiniMax  | `image-01`                       | Yes (subject reference)            | `MINIMAX_API_KEY` or MiniMax OAuth (`minimax-portal`) |
+| ComfyUI  | `workflow`                       | Yes (1 image, workflow-configured) | `COMFY_API_KEY` or `COMFY_CLOUD_API_KEY` for cloud    |

 Use `action: "list"` to inspect available providers and models at runtime:

@@ -107,13 +108,13 @@ Notes:

 ### Image editing

-OpenAI, Google, fal, and MiniMax support editing reference images. Pass a reference image path or URL:
+OpenAI, Google, fal, MiniMax, and ComfyUI support editing reference images. Pass a reference image path or URL:

 ```
 "Generate a watercolor version of this photo" + image: "/path/to/photo.jpg"
 ```

-OpenAI and Google support up to 5 reference images via the `images` parameter. fal and MiniMax support 1.
+OpenAI and Google support up to 5 reference images via the `images` parameter. fal, MiniMax, and ComfyUI support 1.

 MiniMax image generation is available through both bundled MiniMax auth paths:

@@ -122,18 +123,19 @@ MiniMax image generation is available through both bundled MiniMax auth paths:

 ## Provider capabilities

-| Capability            | OpenAI               | Google               | fal                 | MiniMax                    |
-| --------------------- | -------------------- | -------------------- | ------------------- | -------------------------- |
-| Generate              | Yes (up to 4)        | Yes (up to 4)        | Yes (up to 4)       | Yes (up to 9)              |
-| Edit/reference        | Yes (up to 5 images) | Yes (up to 5 images) | Yes (1 image)       | Yes (1 image, subject ref) |
-| Size control          | Yes                  | Yes                  | Yes                 | No                         |
-| Aspect ratio          | No                   | Yes                  | Yes (generate only) | Yes                        |
-| Resolution (1K/2K/4K) | No                   | Yes                  | Yes                 | No                         |
+| Capability            | OpenAI               | Google               | fal                 | MiniMax                    | ComfyUI                            |
+| --------------------- | -------------------- | -------------------- | ------------------- | -------------------------- | ---------------------------------- |
+| Generate              | Yes (up to 4)        | Yes (up to 4)        | Yes (up to 4)       | Yes (up to 9)              | Yes (workflow-defined outputs)     |
+| Edit/reference        | Yes (up to 5 images) | Yes (up to 5 images) | Yes (1 image)       | Yes (1 image, subject ref) | Yes (1 image, workflow-configured) |
+| Size control          | Yes                  | Yes                  | Yes                 | No                         | No                                 |
+| Aspect ratio          | No                   | Yes                  | Yes (generate only) | Yes                        | No                                 |
+| Resolution (1K/2K/4K) | No                   | Yes                  | Yes                 | No                         | No                                 |

 ## Related

 - [Tools Overview](/tools) — all available agent tools
 - [fal](/providers/fal) — fal image and video provider setup
+- [ComfyUI](/providers/comfy) — local ComfyUI and Comfy Cloud workflow setup
 - [Google (Gemini)](/providers/google) — Gemini image provider setup
 - [MiniMax](/providers/minimax) — MiniMax image provider setup
 - [OpenAI](/providers/openai) — OpenAI Images provider setup
--- a/docs/tools/index.md
+++ b/docs/tools/index.md
@@ -75,6 +75,9 @@ For image work, use `image` for analysis and `image_generate` for generation or

 For video work, use `video_generate`. If you target `qwen/*` or another non-default video provider, configure that provider's auth/API key first.

+For workflow-driven audio generation, use `music_generate` when a plugin such as
+ComfyUI registers it. This is separate from `tts`, which is text-to-speech.
+
 `session_status` is the lightweight status/readback tool in the sessions group.
 It answers `/status`-style questions about the current session and can
 optionally set a per-session model override; `model=default` clears that
@@ -100,6 +103,7 @@ Plugins can register additional tools. Some examples:

 - [Lobster](/tools/lobster) — typed workflow runtime with resumable approvals
 - [LLM Task](/tools/llm-task) — JSON-only LLM step for structured output
+- [Music Generation](/tools/music-generation) — plugin-provided `music_generate` tool surfaces
 - [Diffs](/tools/diffs) — diff viewer and renderer
 - [OpenProse](/prose) — markdown-first workflow orchestration

--- a/docs/tools/music-generation.md
+++ b/docs/tools/music-generation.md
@@ -0,0 +1,59 @@
+---
+summary: "Generate music or audio with plugin-provided tools such as ComfyUI workflows"
+read_when:
+  - Generating music or audio via the agent
+  - Configuring plugin-provided music generation tools
+  - Understanding the music_generate tool parameters
+title: "Music Generation"
+---
+
+# Music Generation
+
+The `music_generate` tool lets the agent create audio files when a plugin
+registers music generation support.
+
+The bundled `comfy` plugin currently provides `music_generate` using a
+workflow-configured ComfyUI graph.
+
+## Quick start
+
+1. Configure `models.providers.comfy.music` with a workflow JSON and prompt/output nodes.
+2. If you use Comfy Cloud, set `COMFY_API_KEY` or `COMFY_CLOUD_API_KEY`.
+3. Ask the agent for music or call the tool directly.
+
+Example:
+
+```text
+/tool music_generate prompt="Warm ambient synth loop with soft tape texture"
+```
+
+## Tool parameters
+
+| Parameter  | Type   | Description                                         |
+| ---------- | ------ | --------------------------------------------------- |
+| `prompt`   | string | Music or audio generation prompt                    |
+| `action`   | string | `"generate"` (default) or `"list"`                  |
+| `model`    | string | Provider/model override. Currently `comfy/workflow` |
+| `filename` | string | Output filename hint for the saved audio file       |
+
+## Current provider support
+
+| Provider | Model      | Notes                           |
+| -------- | ---------- | ------------------------------- |
+| ComfyUI  | `workflow` | Workflow-defined music or audio |
+
+## Live test
+
+Opt-in live coverage for the bundled ComfyUI music path:
+
+```bash
+OPENCLAW_LIVE_TEST=1 COMFY_LIVE_TEST=1 pnpm test:live -- extensions/comfy/comfy.live.test.ts
+```
+
+The live file also covers comfy image and video workflows when those sections
+are configured.
+
+## Related
+
+- [ComfyUI](/providers/comfy)
+- [Tools Overview](/tools)
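The parameter table above can be read as a small request-normalization step. The following is a hypothetical TypeScript sketch, not OpenClaw's implementation: the names (`normalizeMusicParams`, `KNOWN_MUSIC_MODELS`) are invented, and the rules that `prompt` is required for `generate` and that `comfy/workflow` is the only accepted model are assumptions drawn from the tables rather than confirmed behavior.

```typescript
// Hypothetical sketch of music_generate parameter handling; not OpenClaw's actual code.
type MusicAction = "generate" | "list";

interface MusicGenerateParams {
  prompt?: string;
  action?: MusicAction;
  model?: string; // "provider/model", e.g. "comfy/workflow"
  filename?: string; // output filename hint
}

interface NormalizedMusicRequest {
  action: MusicAction;
  provider: string;
  model: string;
  prompt?: string;
  filename?: string;
}

// Assumption: comfy/workflow is the only registered music model, per the support table.
const DEFAULT_MUSIC_MODEL = "comfy/workflow";
const KNOWN_MUSIC_MODELS = new Set([DEFAULT_MUSIC_MODEL]);

function normalizeMusicParams(params: MusicGenerateParams): NormalizedMusicRequest {
  const action = params.action ?? "generate"; // documented default
  const ref = params.model?.trim() || DEFAULT_MUSIC_MODEL;

  // Split at the first "/" so model ids that themselves contain slashes stay intact.
  const slash = ref.indexOf("/");
  if (slash <= 0 || slash === ref.length - 1) {
    throw new Error(`model must look like "provider/model", got "${ref}"`);
  }
  if (!KNOWN_MUSIC_MODELS.has(ref)) {
    throw new Error(`no music provider registered for "${ref}"`);
  }

  // Assumption: a prompt is only required when actually generating.
  const prompt = params.prompt?.trim();
  if (action === "generate" && !prompt) {
    throw new Error('prompt is required when action is "generate"');
  }

  return {
    action,
    provider: ref.slice(0, slash),
    model: ref.slice(slash + 1),
    prompt,
    filename: params.filename?.trim() || undefined,
  };
}

// Usage: the quick-start prompt with every other parameter left at its default.
const request = normalizeMusicParams({
  prompt: "Warm ambient synth loop with soft tape texture",
});
console.log(request.provider, request.model); // comfy workflow
```

Validating the `provider/model` shape before the registry lookup gives a clearer error for a bare id such as `workflow` than a generic "not registered" message would.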
--- a/docs/tools/video-generation.md
+++ b/docs/tools/video-generation.md
@@ -1,5 +1,5 @@
 ---
-summary: "Generate videos from text, images, or existing videos using 10 provider backends"
+summary: "Generate videos from text, images, or existing videos using 11 provider backends"
 read_when:
  - Generating videos via the agent
  - Configuring video generation providers and models
@@ -9,7 +9,7 @@ title: "Video Generation"

 # Video Generation

-OpenClaw agents can generate videos from text prompts, reference images, or existing videos. Ten provider backends are supported, each with different model options, input modes, and feature sets. The agent picks the right provider automatically based on your configuration and available API keys.
+OpenClaw agents can generate videos from text prompts, reference images, or existing videos. Eleven provider backends are supported, each with different model options, input modes, and feature sets. The agent picks the right provider automatically based on your configuration and available API keys.

 <Note>
 The `video_generate` tool only appears when at least one video-generation provider is available. If you do not see it in your agent tools, set a provider API key or configure `agents.defaults.videoGenerationModel`.
@@ -50,18 +50,19 @@ Outside of session-backed agent runs (for example, direct tool invocations), the

 ## Supported providers

-| Provider | Default model                   | Text | Image ref        | Video ref        | API key               |
-| -------- | ------------------------------- | ---- | ---------------- | ---------------- | --------------------- |
-| Alibaba  | `wan2.6-t2v`                    | Yes  | Yes (remote URL) | Yes (remote URL) | `MODELSTUDIO_API_KEY` |
-| BytePlus | `seedance-1-0-lite-t2v-250428`  | Yes  | 1 image          | No               | `BYTEPLUS_API_KEY`    |
-| fal      | `fal-ai/minimax/video-01-live`  | Yes  | 1 image          | No               | `FAL_KEY`             |
-| Google   | `veo-3.1-fast-generate-preview` | Yes  | 1 image          | 1 video          | `GEMINI_API_KEY`      |
-| MiniMax  | `MiniMax-Hailuo-2.3`            | Yes  | 1 image          | No               | `MINIMAX_API_KEY`     |
-| OpenAI   | `sora-2`                        | Yes  | 1 image          | 1 video          | `OPENAI_API_KEY`      |
-| Qwen     | `wan2.6-t2v`                    | Yes  | Yes (remote URL) | Yes (remote URL) | `QWEN_API_KEY`        |
-| Runway   | `gen4.5`                        | Yes  | 1 image          | 1 video          | `RUNWAYML_API_SECRET` |
-| Together | `Wan-AI/Wan2.2-T2V-A14B`        | Yes  | 1 image          | No               | `TOGETHER_API_KEY`    |
-| xAI      | `grok-imagine-video`            | Yes  | 1 image          | 1 video          | `XAI_API_KEY`         |
+| Provider | Default model                   | Text | Image ref        | Video ref        | API key                                  |
+| -------- | ------------------------------- | ---- | ---------------- | ---------------- | ---------------------------------------- |
+| Alibaba  | `wan2.6-t2v`                    | Yes  | Yes (remote URL) | Yes (remote URL) | `MODELSTUDIO_API_KEY`                    |
+| BytePlus | `seedance-1-0-lite-t2v-250428`  | Yes  | 1 image          | No               | `BYTEPLUS_API_KEY`                       |
+| ComfyUI  | `workflow`                      | Yes  | 1 image          | No               | `COMFY_API_KEY` (cloud only)             |
+| fal      | `fal-ai/minimax/video-01-live`  | Yes  | 1 image          | No               | `FAL_KEY`                                |
+| Google   | `veo-3.1-fast-generate-preview` | Yes  | 1 image          | 1 video          | `GEMINI_API_KEY`                         |
+| MiniMax  | `MiniMax-Hailuo-2.3`            | Yes  | 1 image          | No               | `MINIMAX_API_KEY`                        |
+| OpenAI   | `sora-2`                        | Yes  | 1 image          | 1 video          | `OPENAI_API_KEY`                         |
+| Qwen     | `wan2.6-t2v`                    | Yes  | Yes (remote URL) | Yes (remote URL) | `QWEN_API_KEY`                           |
+| Runway   | `gen4.5`                        | Yes  | 1 image          | 1 video          | `RUNWAYML_API_SECRET`                    |
+| Together | `Wan-AI/Wan2.2-T2V-A14B`        | Yes  | 1 image          | No               | `TOGETHER_API_KEY`                       |
+| xAI      | `grok-imagine-video`            | Yes  | 1 image          | 1 video          | `XAI_API_KEY`                            |

 Some providers accept additional or alternate API key env vars. See individual [provider pages](#related) for details.

@@ -141,6 +142,7 @@ If a provider fails, the next candidate is tried automatically. If all candidate
 | -------- | ---------------------------------------------------------------------------------------------------------------------------------------- |
 | Alibaba  | Uses DashScope/Model Studio async endpoint. Reference images and videos must be remote `http(s)` URLs.                                   |
 | BytePlus | Single image reference only.                                                                                                             |
+| ComfyUI  | Workflow-driven local or cloud execution. Supports text-to-video and image-to-video through the configured graph.                        |
 | fal      | Uses queue-backed flow for long-running jobs. Single image reference only.                                                               |
 | Google   | Uses Gemini/Veo. Supports one image or one video reference.                                                                              |
 | MiniMax  | Single image reference only.                                                                                                             |
@@ -179,6 +181,7 @@ openclaw config set agents.defaults.videoGenerationModel.primary "qwen/wan2.6-t2
 - [Background Tasks](/automation/tasks) -- task tracking for async video generation
 - [Alibaba Model Studio](/providers/alibaba)
 - [BytePlus](/providers/byteplus)
+- [ComfyUI](/providers/comfy)
 - [fal](/providers/fal)
 - [Google (Gemini)](/providers/google)
 - [MiniMax](/providers/minimax)