From f0ea901a0d204098756cff4f9f581482410519c0 Mon Sep 17 00:00:00 2001
From: Vincent Koc
Date: Sat, 25 Apr 2026 22:23:00 -0700
Subject: [PATCH] docs(image-generation): rewrite around Steps, Tabs, and AZ providers

The image-generation page was 395 lines with a 3-step quick-start written
as plain numbered prose, a sprawling 'OpenAI gpt-image-2' section that
mixed routing/legacy/OpenAI options with five inline slash-command
examples, and provider tables that mixed alphabetic and recency order.

Restructure for scan-first reading without losing technical content:

- Wrap Quick start in a Steps component (auth -> default model -> ask the
  agent), pulling the Codex OAuth note inline with the model step where it
  belongs and surfacing the LAN/SSRF caveat as a Warning callout.
- Alphabetize the Supported providers table (ComfyUI, fal, Google, LiteLLM,
  MiniMax, OpenAI, OpenRouter, Vydra, xAI) and the Provider capabilities
  table (same order across both). Convert the Yes/No capability table to
  checkmarks plus exact counts for readability.
- Replace the long inline OpenAI / OpenRouter / MiniMax / xAI prose with a
  'Provider deep dives' AccordionGroup so each backend's routing, legacy
  URL handling, and provider-specific knobs collapse by default.
- Move the four provider-selection-order notes into a small AccordionGroup
  ('Per-call overrides are exact', 'Auto-detection is auth-aware',
  'Timeouts', 'Inspect at runtime').
- Collapse the five flat slash-command examples into a single Tabs
  component (4K landscape / transparent PNG / two-square / edit-one-ref /
  edit-multi-ref) with the matching CLI variant inline on the
  transparent-PNG tab.
- Sentence-case the Related list (Tools overview, Configuration reference)
  and drop the redundant generic introductory wording.
- Add sidebarTitle so the nav reads 'Image generation' explicitly.

Wording, schema fields, defaults, model refs, env vars, and the detailed
OpenAI/OpenRouter/Codex routing rules are unchanged.
--- docs/tools/image-generation.md | 529 +++++++++++++++++---------------- 1 file changed, 271 insertions(+), 258 deletions(-) diff --git a/docs/tools/image-generation.md b/docs/tools/image-generation.md index 90a23810a4d..bb6591f2f9b 100644 --- a/docs/tools/image-generation.md +++ b/docs/tools/image-generation.md @@ -1,50 +1,68 @@ --- -summary: "Generate and edit images using configured providers (OpenAI, OpenAI Codex OAuth, Google Gemini, OpenRouter, LiteLLM, fal, MiniMax, ComfyUI, Vydra, xAI)" +summary: "Generate and edit images via image_generate across OpenAI, Google, fal, MiniMax, ComfyUI, OpenRouter, LiteLLM, xAI, Vydra" read_when: - - Generating images via the agent - - Configuring image generation providers and models + - Generating or editing images via the agent + - Configuring image-generation providers and models - Understanding the image_generate tool parameters title: "Image generation" +sidebarTitle: "Image generation" --- -The `image_generate` tool lets the agent create and edit images using your configured providers. Generated images are delivered automatically as media attachments in the agent's reply. +The `image_generate` tool lets the agent create and edit images using your +configured providers. Generated images are delivered automatically as media +attachments in the agent's reply. -The tool only appears when at least one image generation provider is available. If you don't see `image_generate` in your agent's tools, configure `agents.defaults.imageGenerationModel`, set up a provider API key, or sign in with OpenAI Codex OAuth. +The tool only appears when at least one image-generation provider is +available. If you do not see `image_generate` in your agent's tools, +configure `agents.defaults.imageGenerationModel`, set up a provider API key, +or sign in with OpenAI Codex OAuth. ## Quick start -1. 
Set an API key for at least one provider (for example `OPENAI_API_KEY`, `GEMINI_API_KEY`, or `OPENROUTER_API_KEY`) or sign in with OpenAI Codex OAuth. -2. Optionally set your preferred model: - -```json5 -{ - agents: { - defaults: { - imageGenerationModel: { - primary: "openai/gpt-image-2", - // Optional default provider request timeout for image_generate. - timeoutMs: 180_000, + + + Set an API key for at least one provider (for example `OPENAI_API_KEY`, + `GEMINI_API_KEY`, `OPENROUTER_API_KEY`) or sign in with OpenAI Codex OAuth. + + + ```json5 + { + agents: { + defaults: { + imageGenerationModel: { + primary: "openai/gpt-image-2", + timeoutMs: 180_000, + }, + }, }, - }, - }, -} -``` + } + ``` -Codex OAuth uses the same `openai/gpt-image-2` model ref. When an -`openai-codex` OAuth profile is configured, OpenClaw routes image requests -through that same OAuth profile instead of first trying `OPENAI_API_KEY`. -Explicit custom `models.providers.openai` image config, such as an API key or -custom/Azure base URL, opts back into the direct OpenAI Images API route. + Codex OAuth uses the same `openai/gpt-image-2` model ref. When an + `openai-codex` OAuth profile is configured, OpenClaw routes image + requests through that OAuth profile instead of first trying + `OPENAI_API_KEY`. Explicit `models.providers.openai` config (API key, + custom/Azure base URL) opts back into the direct OpenAI Images API + route. + + + + _"Generate an image of a friendly robot mascot."_ + + The agent calls `image_generate` automatically. No tool allow-listing + needed — it is enabled by default when a provider is available. + + + + + For OpenAI-compatible LAN endpoints such as LocalAI, keep the custom `models.providers.openai.baseUrl` and explicitly opt in with -`browser.ssrfPolicy.dangerouslyAllowPrivateNetwork: true`; private/internal -image endpoints remain blocked by default. - -3. 
Ask the agent: _"Generate an image of a friendly robot mascot."_ - -The agent calls `image_generate` automatically. No tool allow-listing needed — it's enabled by default when a provider is available. +`browser.ssrfPolicy.dangerouslyAllowPrivateNetwork: true`. Private and +internal image endpoints remain blocked by default. + ## Common routes @@ -61,97 +79,91 @@ The same `image_generate` tool handles text-to-image and reference-image editing. Use `image` for one reference or `images` for multiple references. Provider-supported output hints such as `quality`, `outputFormat`, and `background` are forwarded when available and reported as ignored when a -provider does not support them. Current bundled transparent-background support -is OpenAI-specific; other providers may still preserve PNG alpha if their +provider does not support them. Bundled transparent-background support is +OpenAI-specific; other providers may still preserve PNG alpha if their backend emits it. ## Supported providers | Provider | Default model | Edit support | Auth | | ---------- | --------------------------------------- | ---------------------------------- | ----------------------------------------------------- | +| ComfyUI | `workflow` | Yes (1 image, workflow-configured) | `COMFY_API_KEY` or `COMFY_CLOUD_API_KEY` for cloud | +| fal | `fal-ai/flux/dev` | Yes | `FAL_KEY` | +| Google | `gemini-3.1-flash-image-preview` | Yes | `GEMINI_API_KEY` or `GOOGLE_API_KEY` | +| LiteLLM | `gpt-image-2` | Yes (up to 5 input images) | `LITELLM_API_KEY` | +| MiniMax | `image-01` | Yes (subject reference) | `MINIMAX_API_KEY` or MiniMax OAuth (`minimax-portal`) | | OpenAI | `gpt-image-2` | Yes (up to 4 images) | `OPENAI_API_KEY` or OpenAI Codex OAuth | | OpenRouter | `google/gemini-3.1-flash-image-preview` | Yes (up to 5 input images) | `OPENROUTER_API_KEY` | -| LiteLLM | `gpt-image-2` | Yes (up to 5 input images) | `LITELLM_API_KEY` | -| Google | `gemini-3.1-flash-image-preview` | Yes | `GEMINI_API_KEY` or 
`GOOGLE_API_KEY` | -| fal | `fal-ai/flux/dev` | Yes | `FAL_KEY` | -| MiniMax | `image-01` | Yes (subject reference) | `MINIMAX_API_KEY` or MiniMax OAuth (`minimax-portal`) | -| ComfyUI | `workflow` | Yes (1 image, workflow-configured) | `COMFY_API_KEY` or `COMFY_CLOUD_API_KEY` for cloud | | Vydra | `grok-imagine` | No | `VYDRA_API_KEY` | | xAI | `grok-imagine-image` | Yes (up to 5 images) | `XAI_API_KEY` | Use `action: "list"` to inspect available providers and models at runtime: -``` +```text /tool image_generate action=list ``` +## Provider capabilities + +| Capability | ComfyUI | fal | Google | MiniMax | OpenAI | Vydra | xAI | +| --------------------- | ------------------ | ----------------- | -------------- | --------------------- | -------------- | ----- | -------------- | +| Generate (max count) | Workflow-defined | 4 | 4 | 9 | 4 | 1 | 4 | +| Edit / reference | 1 image (workflow) | 1 image | Up to 5 images | 1 image (subject ref) | Up to 5 images | — | Up to 5 images | +| Size control | — | ✓ | ✓ | — | Up to 4K | — | — | +| Aspect ratio | — | ✓ (generate only) | ✓ | ✓ | — | — | ✓ | +| Resolution (1K/2K/4K) | — | ✓ | ✓ | — | — | — | 1K, 2K | + ## Tool parameters -Image generation prompt. Required for `action: "generate"`. + Image generation prompt. Required for `action: "generate"`. - - -Use `"list"` to inspect available providers and models at runtime. + + Use `"list"` to inspect available providers and models at runtime. - -Provider/model override, e.g. `openai/gpt-image-2`; use -`openai/gpt-image-1.5` for transparent OpenAI backgrounds. + Provider/model override (e.g. `openai/gpt-image-2`). Use + `openai/gpt-image-1.5` for transparent OpenAI backgrounds. - -Single reference image path or URL for edit mode. + Single reference image path or URL for edit mode. - -Multiple reference images for edit mode (up to 5). + Multiple reference images for edit mode (up to 5 on supporting providers). 
- -Size hint: `1024x1024`, `1536x1024`, `1024x1536`, `2048x2048`, `3840x2160`. + Size hint: `1024x1024`, `1536x1024`, `1024x1536`, `2048x2048`, `3840x2160`. - -Aspect ratio: `1:1`, `2:3`, `3:2`, `3:4`, `4:3`, `4:5`, `5:4`, `9:16`, `16:9`, `21:9`. + Aspect ratio: `1:1`, `2:3`, `3:2`, `3:4`, `4:3`, `4:5`, `5:4`, `9:16`, `16:9`, `21:9`. - - -Resolution hint. +Resolution hint. + + Quality hint when the provider supports it. - - -Quality hint when the provider supports it. + + Output format hint when the provider supports it. - - -Output format hint when the provider supports it. + + Background hint when the provider supports it. Use `transparent` with + `outputFormat: "png"` or `"webp"` for transparency-capable providers. - - -Background hint when the provider supports it. Use `transparent` with -`outputFormat: "png"` or `"webp"` for transparency-capable providers. - - - -Number of images to generate (1–4). - - - -Optional provider request timeout in milliseconds. - - - -Output filename hint. - - +Number of images to generate (1–4). +Optional provider request timeout in milliseconds. +Output filename hint. -OpenAI-only hints: `background`, `moderation`, `outputCompression`, and `user`. + OpenAI-only hints: `background`, `moderation`, `outputCompression`, and `user`. -Not all providers support all parameters. When a fallback provider supports a nearby geometry option instead of the exact requested one, OpenClaw remaps to the closest supported size, aspect ratio, or resolution before submission. Unsupported output hints such as `quality` or `outputFormat` are dropped for providers that do not declare support and are reported in the tool result. - -Tool results report the applied settings. When OpenClaw remaps geometry during provider fallback, the returned `size`, `aspectRatio`, and `resolution` values reflect what was actually sent, and `details.normalization` captures the requested-to-applied translation. + +Not all providers support all parameters. 
When a fallback provider supports a +nearby geometry option instead of the exact requested one, OpenClaw remaps to +the closest supported size, aspect ratio, or resolution before submission. +Unsupported output hints are dropped for providers that do not declare +support and reported in the tool result. Tool results report the applied +settings; `details.normalization` captures any requested-to-applied +translation. + ## Configuration @@ -177,129 +189,177 @@ Tool results report the applied settings. When OpenClaw remaps geometry during p ### Provider selection order -When generating an image, OpenClaw tries providers in this order: +OpenClaw tries providers in this order: -1. **`model` parameter** from the tool call (if the agent specifies one) -2. **`imageGenerationModel.primary`** from config -3. **`imageGenerationModel.fallbacks`** in order -4. **Auto-detection** — uses auth-backed provider defaults only: - - current default provider first - - remaining registered image-generation providers in provider-id order +1. **`model` parameter** from the tool call (if the agent specifies one). +2. **`imageGenerationModel.primary`** from config. +3. **`imageGenerationModel.fallbacks`** in order. +4. **Auto-detection** — auth-backed provider defaults only: + - current default provider first; + - remaining registered image-generation providers in provider-id order. -If a provider fails (auth error, rate limit, etc.), the next configured candidate is tried automatically. If all fail, the error includes details from each attempt. +If a provider fails (auth error, rate limit, etc.), the next configured +candidate is tried automatically. If all fail, the error includes details +from each attempt. -Notes: - -- A per-call `model` override is exact: OpenClaw tries only that provider/model - and does not continue to configured primary/fallback or auto-detected - providers. -- Auto-detection is auth-aware. 
A provider default only enters the candidate list - when OpenClaw can actually authenticate that provider. -- Auto-detection is enabled by default. Set - `agents.defaults.mediaGenerationAutoProviderFallback: false` if you want image - generation to use only the explicit `model`, `primary`, and `fallbacks` - entries. -- Set `agents.defaults.imageGenerationModel.timeoutMs` for slow image backends. - A per-call `timeoutMs` tool parameter overrides the configured default. -- Use `action: "list"` to inspect the currently registered providers, their - default models, and auth env-var hints. + + + A per-call `model` override tries only that provider/model and does + not continue to configured primary/fallback or auto-detected providers. + + + A provider default only enters the candidate list when OpenClaw can + actually authenticate that provider. Set + `agents.defaults.mediaGenerationAutoProviderFallback: false` to use only + explicit `model`, `primary`, and `fallbacks` entries. + + + Set `agents.defaults.imageGenerationModel.timeoutMs` for slow image + backends. A per-call `timeoutMs` tool parameter overrides the configured + default. + + + Use `action: "list"` to inspect the currently registered providers, + their default models, and auth env-var hints. + + ### Image editing -OpenAI, OpenRouter, Google, fal, MiniMax, ComfyUI, and xAI support editing reference images. Pass a reference image path or URL: +OpenAI, OpenRouter, Google, fal, MiniMax, ComfyUI, and xAI support editing +reference images. Pass a reference image path or URL: -``` +```text "Generate a watercolor version of this photo" + image: "/path/to/photo.jpg" ``` -OpenAI, OpenRouter, Google, and xAI support up to 5 reference images via the `images` parameter. fal, MiniMax, and ComfyUI support 1. +OpenAI, OpenRouter, Google, and xAI support up to 5 reference images via the +`images` parameter. fal, MiniMax, and ComfyUI support 1. 
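The provider selection rules described above can be sketched as a small candidate-list builder. This is an illustrative Python sketch only, not OpenClaw's implementation: `build_candidates`, `ImageModelConfig`, and every argument name are hypothetical. It also assumes that "provider-id order" means sorted by provider id and that a model ref appearing twice is tried once.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class ImageModelConfig:
    """Hypothetical stand-in for agents.defaults.imageGenerationModel."""
    primary: Optional[str] = None
    fallbacks: List[str] = field(default_factory=list)


def build_candidates(
    call_model: Optional[str],
    config: ImageModelConfig,
    provider_defaults: Dict[str, str],  # provider id -> default model ref
    authenticated: Set[str],            # provider ids that can authenticate
    default_provider: Optional[str] = None,
    auto_fallback: bool = True,         # mediaGenerationAutoProviderFallback
) -> List[str]:
    # A per-call model override is exact: nothing else is tried.
    if call_model:
        return [call_model]

    candidates: List[str] = []
    if config.primary:
        candidates.append(config.primary)
    candidates.extend(config.fallbacks)

    if auto_fallback:
        # Assumption: "provider-id order" is a plain sort on the id.
        ordered = sorted(provider_defaults)
        if default_provider in provider_defaults:
            ordered.remove(default_provider)
            ordered.insert(0, default_provider)
        # Auto-detection is auth-aware: skip providers without auth.
        for provider_id in ordered:
            if provider_id in authenticated:
                candidates.append(provider_defaults[provider_id])

    # Assumption: drop duplicates while keeping first-seen order.
    seen: Set[str] = set()
    unique: List[str] = []
    for ref in candidates:
        if ref not in seen:
            seen.add(ref)
            unique.append(ref)
    return unique


config = ImageModelConfig(
    primary="openai/gpt-image-2",
    fallbacks=["openrouter/google/gemini-3.1-flash-image-preview"],
)
defaults = {
    "xai": "xai/grok-imagine-image",
    "minimax": "minimax/image-01",
    "openai": "openai/gpt-image-2",
}
print(build_candidates(None, config, defaults, {"openai", "xai"}, "openai"))
print(build_candidates("xai/grok-imagine-image", config, defaults, {"xai"}))
```

The second call shows the exact-override case: only the requested model ref is returned, so a failure there would not continue to `primary`, `fallbacks`, or auto-detected providers.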
-### OpenRouter image models +## Provider deep dives -OpenRouter image generation uses the same `OPENROUTER_API_KEY` and routes through OpenRouter's chat completions image API. Select OpenRouter image models with the `openrouter/` prefix: + + + OpenAI image generation defaults to `openai/gpt-image-2`. If an + `openai-codex` OAuth profile is configured, OpenClaw reuses the same + OAuth profile used by Codex subscription chat models and sends the + image request through the Codex Responses backend. Legacy Codex base + URLs such as `https://chatgpt.com/backend-api` are canonicalized to + `https://chatgpt.com/backend-api/codex` for image requests. OpenClaw + does **not** silently fall back to `OPENAI_API_KEY` for that request — + to force direct OpenAI Images API routing, configure + `models.providers.openai` explicitly with an API key, custom base URL, + or Azure endpoint. -```json5 -{ - agents: { - defaults: { - imageGenerationModel: { - primary: "openrouter/google/gemini-3.1-flash-image-preview", + The `openai/gpt-image-1.5`, `openai/gpt-image-1`, and + `openai/gpt-image-1-mini` models can still be selected explicitly. Use + `gpt-image-1.5` for transparent-background PNG/WebP output; the current + `gpt-image-2` API rejects `background: "transparent"`. + + `gpt-image-2` supports both text-to-image generation and + reference-image editing through the same `image_generate` tool. + OpenClaw forwards `prompt`, `count`, `size`, `quality`, `outputFormat`, + and reference images to OpenAI. OpenAI does **not** receive + `aspectRatio` or `resolution` directly; when possible OpenClaw maps + those into a supported `size`, otherwise the tool reports them as + ignored overrides. 
+ + OpenAI-specific options live under the `openai` object: + + ```json + { + "quality": "low", + "outputFormat": "jpeg", + "openai": { + "background": "opaque", + "moderation": "low", + "outputCompression": 60, + "user": "end-user-42" + } + } + ``` + + `openai.background` accepts `transparent`, `opaque`, or `auto`; + transparent outputs require `outputFormat` `png` or `webp` and a + transparency-capable OpenAI image model. OpenClaw routes default + `gpt-image-2` transparent-background requests to `gpt-image-1.5`. + `openai.outputCompression` applies to JPEG/WebP outputs. + + The top-level `background` hint is provider-neutral and currently maps + to the same OpenAI `background` request field when the OpenAI provider + is selected. Providers that do not declare background support return + it in `ignoredOverrides` instead of receiving the unsupported parameter. + + To route OpenAI image generation through an Azure OpenAI deployment + instead of `api.openai.com`, see + [Azure OpenAI endpoints](/providers/openai#azure-openai-endpoints). + + + + OpenRouter image generation uses the same `OPENROUTER_API_KEY` and + routes through OpenRouter's chat completions image API. Select + OpenRouter image models with the `openrouter/` prefix: + + ```json5 + { + agents: { + defaults: { + imageGenerationModel: { + primary: "openrouter/google/gemini-3.1-flash-image-preview", + }, + }, }, - }, - }, -} + } + ``` + + OpenClaw forwards `prompt`, `count`, reference images, and + Gemini-compatible `aspectRatio` / `resolution` hints to OpenRouter. + Current built-in OpenRouter image model shortcuts include + `google/gemini-3.1-flash-image-preview`, + `google/gemini-3-pro-image-preview`, and `openai/gpt-5.4-image-2`. Use + `action: "list"` to see what your configured plugin exposes. 
+ + + + MiniMax image generation is available through both bundled MiniMax + auth paths: + + - `minimax/image-01` for API-key setups + - `minimax-portal/image-01` for OAuth setups + + + + The bundled xAI provider uses `/v1/images/generations` for prompt-only + requests and `/v1/images/edits` when `image` or `images` is present. + + - Models: `xai/grok-imagine-image`, `xai/grok-imagine-image-pro` + - Count: up to 4 + - References: one `image` or up to five `images` + - Aspect ratios: `1:1`, `16:9`, `9:16`, `4:3`, `3:4`, `2:3`, `3:2` + - Resolutions: `1K`, `2K` + - Outputs: returned as OpenClaw-managed image attachments + + OpenClaw intentionally does not expose xAI-native `quality`, `mask`, + `user`, or extra native-only aspect ratios until those controls exist + in the shared cross-provider `image_generate` contract. + + + + +## Examples + + + +```text +/tool image_generate action=generate model=openai/gpt-image-2 prompt="A clean editorial poster for OpenClaw image generation" size=3840x2160 count=1 +``` + + +```text +/tool image_generate action=generate model=openai/gpt-image-1.5 prompt="A simple red circle sticker on a transparent background" outputFormat=png background=transparent ``` -OpenClaw forwards `prompt`, `count`, reference images, and Gemini-compatible `aspectRatio` / `resolution` hints to OpenRouter. Current built-in OpenRouter image model shortcuts include `google/gemini-3.1-flash-image-preview`, `google/gemini-3-pro-image-preview`, and `openai/gpt-5.4-image-2`; use `action: "list"` to see what your configured plugin exposes. - -### OpenAI `gpt-image-2` - -OpenAI image generation defaults to `openai/gpt-image-2`. If an -`openai-codex` OAuth profile is configured, OpenClaw reuses the same OAuth -profile used by Codex subscription chat models and sends the image request -through the Codex Responses backend. Legacy Codex base URLs such as -`https://chatgpt.com/backend-api` are canonicalized to -`https://chatgpt.com/backend-api/codex` for image requests. 
It does not -silently fall back to `OPENAI_API_KEY` for that request. To force direct OpenAI -Images API routing, configure `models.providers.openai` explicitly with an API -key, custom base URL, or Azure endpoint. The `openai/gpt-image-1.5`, -`openai/gpt-image-1`, and `openai/gpt-image-1-mini` models can still be -selected explicitly. Use `gpt-image-1.5` for transparent-background PNG/WebP -output; the current `gpt-image-2` API rejects `background: "transparent"`. - -`gpt-image-2` supports both text-to-image generation and reference-image -editing through the same `image_generate` tool. OpenClaw forwards `prompt`, -`count`, `size`, `quality`, `outputFormat`, and reference images to OpenAI. -OpenAI does not receive `aspectRatio` or `resolution` directly; when possible -OpenClaw maps those into a supported `size`, otherwise the tool reports them as -ignored overrides. - -OpenAI-specific options live under the `openai` object: - -```json -{ - "quality": "low", - "outputFormat": "jpeg", - "openai": { - "background": "opaque", - "moderation": "low", - "outputCompression": 60, - "user": "end-user-42" - } -} -``` - -`openai.background` accepts `transparent`, `opaque`, or `auto`; transparent -outputs require `outputFormat` `png` or `webp` and a transparency-capable OpenAI -image model. OpenClaw routes default `gpt-image-2` transparent-background -requests to `gpt-image-1.5`. `openai.outputCompression` applies to JPEG/WebP -outputs. - -The top-level `background` hint is provider-neutral and currently maps to the -same OpenAI `background` request field when the OpenAI provider is selected. -Providers that do not declare background support return it in `ignoredOverrides` -instead of receiving the unsupported parameter. 
- -When asking an agent for a transparent-background OpenAI image, the expected -tool call is: - -```json -{ - "model": "openai/gpt-image-1.5", - "prompt": "A simple red circle sticker on a transparent background", - "outputFormat": "png", - "background": "transparent" -} -``` - -The explicit `openai/gpt-image-1.5` model keeps the request portable across -tool summaries and harnesses. If the agent instead uses the default -`openai/gpt-image-2` with `openai.background: "transparent"` on the public -OpenAI or OpenAI Codex OAuth route, OpenClaw rewrites the provider request to -`gpt-image-1.5`. Azure and custom OpenAI-compatible endpoints keep their -configured deployment/model names. - -For headless CLI generation, use the equivalent `openclaw infer` flags: +Equivalent CLI: ```bash openclaw infer image generate \ @@ -310,86 +370,39 @@ openclaw infer image generate \ --json ``` -The same `--output-format` and `--background` flags are available on -`openclaw infer image edit`; `--openai-background` remains available as an -OpenAI-specific alias. Current bundled providers other than OpenAI do not -declare explicit background control, so `background: "transparent"` is reported -as ignored for them. 
- -Generate one 4K landscape image: - -``` -/tool image_generate action=generate model=openai/gpt-image-2 prompt="A clean editorial poster for OpenClaw image generation" size=3840x2160 count=1 -``` - -Generate a transparent PNG: - -``` -/tool image_generate action=generate model=openai/gpt-image-1.5 prompt="A simple red circle sticker on a transparent background" outputFormat=png background=transparent -``` - -Generate two square images: - -``` + + +```text /tool image_generate action=generate model=openai/gpt-image-2 prompt="Two visual directions for a calm productivity app icon" size=1024x1024 count=2 ``` - -Edit one local reference image: - -``` + + +```text /tool image_generate action=generate model=openai/gpt-image-2 prompt="Keep the subject, replace the background with a bright studio setup" image=/path/to/reference.png size=1024x1536 ``` - -Edit with multiple references: - -``` + + +```text /tool image_generate action=generate model=openai/gpt-image-2 prompt="Combine the character identity from the first image with the color palette from the second" images='["/path/to/character.png","/path/to/palette.jpg"]' size=1536x1024 ``` + + -To route OpenAI image generation through an Azure OpenAI deployment instead -of `api.openai.com`, see [Azure OpenAI endpoints](/providers/openai#azure-openai-endpoints) -in the OpenAI provider docs. 
- -MiniMax image generation is available through both bundled MiniMax auth paths: - -- `minimax/image-01` for API-key setups -- `minimax-portal/image-01` for OAuth setups - -## Provider capabilities - -| Capability | OpenAI | Google | fal | MiniMax | ComfyUI | Vydra | xAI | -| --------------------- | -------------------- | -------------------- | ------------------- | -------------------------- | ---------------------------------- | ------- | -------------------- | -| Generate | Yes (up to 4) | Yes (up to 4) | Yes (up to 4) | Yes (up to 9) | Yes (workflow-defined outputs) | Yes (1) | Yes (up to 4) | -| Edit/reference | Yes (up to 5 images) | Yes (up to 5 images) | Yes (1 image) | Yes (1 image, subject ref) | Yes (1 image, workflow-configured) | No | Yes (up to 5 images) | -| Size control | Yes (up to 4K) | Yes | Yes | No | No | No | No | -| Aspect ratio | No | Yes | Yes (generate only) | Yes | No | No | Yes | -| Resolution (1K/2K/4K) | No | Yes | Yes | No | No | No | Yes (1K/2K) | - -### xAI `grok-imagine-image` - -The bundled xAI provider uses `/v1/images/generations` for prompt-only requests -and `/v1/images/edits` when `image` or `images` is present. - -- Models: `xai/grok-imagine-image`, `xai/grok-imagine-image-pro` -- Count: up to 4 -- References: one `image` or up to five `images` -- Aspect ratios: `1:1`, `16:9`, `9:16`, `4:3`, `3:4`, `2:3`, `3:2` -- Resolutions: `1K`, `2K` -- Outputs: returned as OpenClaw-managed image attachments - -OpenClaw intentionally does not expose xAI-native `quality`, `mask`, `user`, or -extra native-only aspect ratios until those controls exist in the shared -cross-provider `image_generate` contract. +The same `--output-format` and `--background` flags are available on +`openclaw infer image edit`; `--openai-background` remains as an +OpenAI-specific alias. Bundled providers other than OpenAI do not declare +explicit background control today, so `background: "transparent"` is reported +as ignored for them. 
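The drop-and-report behavior for unsupported output hints can be pictured with a short Python sketch. It is an assumption-laden illustration rather than OpenClaw's code: `split_hints`, the `DECLARED_HINTS` table, and the `some-provider` entry are hypothetical, and the real shape of `ignoredOverrides` in the tool result may differ.

```python
from typing import Any, Dict, List, Set, Tuple

# Hypothetical capability declarations, for illustration only.
DECLARED_HINTS: Dict[str, Set[str]] = {
    "openai": {"quality", "outputFormat", "background"},
    "some-provider": {"outputFormat"},  # made-up provider without background control
}


def split_hints(
    provider: str, requested: Dict[str, Any]
) -> Tuple[Dict[str, Any], List[str]]:
    """Return (hints to forward, hint names to report as ignored)."""
    supported = DECLARED_HINTS.get(provider, set())
    applied = {name: value for name, value in requested.items() if name in supported}
    ignored = [name for name in requested if name not in supported]
    return applied, ignored


hints = {"outputFormat": "png", "background": "transparent"}
print(split_hints("openai", hints))
print(split_hints("some-provider", hints))
```

For the made-up provider, `background` never reaches the backend and is returned in the ignored list instead, which matches how `background: "transparent"` is reported for providers that do not declare background support.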
## Related -- [Tools Overview](/tools) — all available agent tools -- [fal](/providers/fal) — fal image and video provider setup +- [Tools overview](/tools) — all available agent tools - [ComfyUI](/providers/comfy) — local ComfyUI and Comfy Cloud workflow setup +- [fal](/providers/fal) — fal image and video provider setup - [Google (Gemini)](/providers/google) — Gemini image provider setup - [MiniMax](/providers/minimax) — MiniMax image provider setup - [OpenAI](/providers/openai) — OpenAI Images provider setup - [Vydra](/providers/vydra) — Vydra image, video, and speech setup - [xAI](/providers/xai) — Grok image, video, search, code execution, and TTS setup -- [Configuration Reference](/gateway/config-agents#agent-defaults) — `imageGenerationModel` config +- [Configuration reference](/gateway/config-agents#agent-defaults) — `imageGenerationModel` config - [Models](/concepts/models) — model configuration and failover