docs(image-generation): rewrite around Steps, Tabs, and AZ providers

The image-generation page was 395 lines with a 3-step quick-start
written as plain numbered prose, a sprawling 'OpenAI gpt-image-2'
section that mixed routing/legacy/OpenAI options with five inline
slash-command examples, and provider tables that mixed alphabetic
and recency order.

Restructure for scan-first reading without losing technical content:

- Wrap Quick start in a Steps component (auth -> default model ->
  ask the agent), pulling the Codex OAuth note inline with the model
  step where it belongs and surfacing the LAN/SSRF caveat as a
  Warning callout.
- Alphabetize the Supported providers table (ComfyUI, fal, Google,
  LiteLLM, MiniMax, OpenAI, OpenRouter, Vydra, xAI) and the Provider
  capabilities table (same order across both). Convert the Yes/No
  capability table to checkmarks plus exact counts for readability.
- Replace the long inline OpenAI / OpenRouter / MiniMax / xAI prose
  with a 'Provider deep dives' AccordionGroup so each backend's
  routing, legacy URL handling, and provider-specific knobs collapse
  by default.
- Move the four provider-selection-order notes into a small
  AccordionGroup ('Per-call overrides are exact', 'Auto-detection is
  auth-aware', 'Timeouts', 'Inspect at runtime').
- Collapse the five flat slash-command examples into a single Tabs
  component (4K landscape / transparent PNG / two-square /
  edit-one-ref / edit-multi-ref) with the matching CLI variant inline
  on the transparent-PNG tab.
- Sentence-case the Related list (Tools overview, Configuration
  reference) and drop the redundant generic introductory wording.
- Add sidebarTitle so the nav reads 'Image generation' explicitly.

Wording, schema fields, defaults, model refs, env vars, and the
detailed OpenAI/OpenRouter/Codex routing rules are unchanged.
This commit is contained in:
Vincent Koc
2026-04-25 22:23:00 -07:00
parent 5d3168c343
commit f0ea901a0d

View File

@@ -1,50 +1,68 @@
---
summary: "Generate and edit images using configured providers (OpenAI, OpenAI Codex OAuth, Google Gemini, OpenRouter, LiteLLM, fal, MiniMax, ComfyUI, Vydra, xAI)"
summary: "Generate and edit images via image_generate across OpenAI, Google, fal, MiniMax, ComfyUI, OpenRouter, LiteLLM, xAI, Vydra"
read_when:
- Generating images via the agent
- Configuring image generation providers and models
- Generating or editing images via the agent
- Configuring image-generation providers and models
- Understanding the image_generate tool parameters
title: "Image generation"
sidebarTitle: "Image generation"
---
The `image_generate` tool lets the agent create and edit images using your configured providers. Generated images are delivered automatically as media attachments in the agent's reply.
The `image_generate` tool lets the agent create and edit images using your
configured providers. Generated images are delivered automatically as media
attachments in the agent's reply.
<Note>
The tool only appears when at least one image generation provider is available. If you don't see `image_generate` in your agent's tools, configure `agents.defaults.imageGenerationModel`, set up a provider API key, or sign in with OpenAI Codex OAuth.
The tool only appears when at least one image-generation provider is
available. If you do not see `image_generate` in your agent's tools,
configure `agents.defaults.imageGenerationModel`, set up a provider API key,
or sign in with OpenAI Codex OAuth.
</Note>
## Quick start
1. Set an API key for at least one provider (for example `OPENAI_API_KEY`, `GEMINI_API_KEY`, or `OPENROUTER_API_KEY`) or sign in with OpenAI Codex OAuth.
2. Optionally set your preferred model:
```json5
{
agents: {
defaults: {
imageGenerationModel: {
primary: "openai/gpt-image-2",
// Optional default provider request timeout for image_generate.
timeoutMs: 180_000,
<Steps>
<Step title="Configure auth">
Set an API key for at least one provider (for example `OPENAI_API_KEY`,
`GEMINI_API_KEY`, `OPENROUTER_API_KEY`) or sign in with OpenAI Codex OAuth.
</Step>
<Step title="Pick a default model (optional)">
```json5
{
agents: {
defaults: {
imageGenerationModel: {
primary: "openai/gpt-image-2",
timeoutMs: 180_000,
},
},
},
},
},
}
```
}
```
Codex OAuth uses the same `openai/gpt-image-2` model ref. When an
`openai-codex` OAuth profile is configured, OpenClaw routes image requests
through that same OAuth profile instead of first trying `OPENAI_API_KEY`.
Explicit custom `models.providers.openai` image config, such as an API key or
custom/Azure base URL, opts back into the direct OpenAI Images API route.
Codex OAuth uses the same `openai/gpt-image-2` model ref. When an
`openai-codex` OAuth profile is configured, OpenClaw routes image
requests through that OAuth profile instead of first trying
`OPENAI_API_KEY`. Explicit `models.providers.openai` config (API key,
custom/Azure base URL) opts back into the direct OpenAI Images API
route.
</Step>
<Step title="Ask the agent">
_"Generate an image of a friendly robot mascot."_
The agent calls `image_generate` automatically. No tool allow-listing
needed — it is enabled by default when a provider is available.
</Step>
</Steps>
<Warning>
For OpenAI-compatible LAN endpoints such as LocalAI, keep the custom
`models.providers.openai.baseUrl` and explicitly opt in with
`browser.ssrfPolicy.dangerouslyAllowPrivateNetwork: true`; private/internal
image endpoints remain blocked by default.
3. Ask the agent: _"Generate an image of a friendly robot mascot."_
The agent calls `image_generate` automatically. No tool allow-listing needed — it's enabled by default when a provider is available.
`browser.ssrfPolicy.dangerouslyAllowPrivateNetwork: true`. Private and
internal image endpoints remain blocked by default.
</Warning>
## Common routes
@@ -61,97 +79,91 @@ The same `image_generate` tool handles text-to-image and reference-image
editing. Use `image` for one reference or `images` for multiple references.
Provider-supported output hints such as `quality`, `outputFormat`, and
`background` are forwarded when available and reported as ignored when a
provider does not support them. Current bundled transparent-background support
is OpenAI-specific; other providers may still preserve PNG alpha if their
provider does not support them. Bundled transparent-background support is
OpenAI-specific; other providers may still preserve PNG alpha if their
backend emits it.
## Supported providers
| Provider | Default model | Edit support | Auth |
| ---------- | --------------------------------------- | ---------------------------------- | ----------------------------------------------------- |
| ComfyUI | `workflow` | Yes (1 image, workflow-configured) | `COMFY_API_KEY` or `COMFY_CLOUD_API_KEY` for cloud |
| fal | `fal-ai/flux/dev` | Yes | `FAL_KEY` |
| Google | `gemini-3.1-flash-image-preview` | Yes | `GEMINI_API_KEY` or `GOOGLE_API_KEY` |
| LiteLLM | `gpt-image-2` | Yes (up to 5 input images) | `LITELLM_API_KEY` |
| MiniMax | `image-01` | Yes (subject reference) | `MINIMAX_API_KEY` or MiniMax OAuth (`minimax-portal`) |
| OpenAI | `gpt-image-2` | Yes (up to 4 images) | `OPENAI_API_KEY` or OpenAI Codex OAuth |
| OpenRouter | `google/gemini-3.1-flash-image-preview` | Yes (up to 5 input images) | `OPENROUTER_API_KEY` |
| LiteLLM | `gpt-image-2` | Yes (up to 5 input images) | `LITELLM_API_KEY` |
| Google | `gemini-3.1-flash-image-preview` | Yes | `GEMINI_API_KEY` or `GOOGLE_API_KEY` |
| fal | `fal-ai/flux/dev` | Yes | `FAL_KEY` |
| MiniMax | `image-01` | Yes (subject reference) | `MINIMAX_API_KEY` or MiniMax OAuth (`minimax-portal`) |
| ComfyUI | `workflow` | Yes (1 image, workflow-configured) | `COMFY_API_KEY` or `COMFY_CLOUD_API_KEY` for cloud |
| Vydra | `grok-imagine` | No | `VYDRA_API_KEY` |
| xAI | `grok-imagine-image` | Yes (up to 5 images) | `XAI_API_KEY` |
Use `action: "list"` to inspect available providers and models at runtime:
```
```text
/tool image_generate action=list
```
## Provider capabilities
| Capability | ComfyUI | fal | Google | MiniMax | OpenAI | Vydra | xAI |
| --------------------- | ------------------ | ----------------- | -------------- | --------------------- | -------------- | ----- | -------------- |
| Generate (max count) | Workflow-defined | 4 | 4 | 9 | 4 | 1 | 4 |
| Edit / reference | 1 image (workflow) | 1 image | Up to 5 images | 1 image (subject ref) | Up to 5 images | — | Up to 5 images |
| Size control | — | ✓ | ✓ | — | Up to 4K | — | — |
| Aspect ratio | — | ✓ (generate only) | ✓ | ✓ | — | — | ✓ |
| Resolution (1K/2K/4K) | — | ✓ | ✓ | — | — | — | 1K, 2K |
## Tool parameters
<ParamField path="prompt" type="string" required>
Image generation prompt. Required for `action: "generate"`.
Image generation prompt. Required for `action: "generate"`.
</ParamField>
<ParamField path="action" type="'generate' | 'list'" default="generate">
Use `"list"` to inspect available providers and models at runtime.
<ParamField path="action" type='"generate" | "list"' default="generate">
Use `"list"` to inspect available providers and models at runtime.
</ParamField>
<ParamField path="model" type="string">
Provider/model override, e.g. `openai/gpt-image-2`; use
`openai/gpt-image-1.5` for transparent OpenAI backgrounds.
Provider/model override (e.g. `openai/gpt-image-2`). Use
`openai/gpt-image-1.5` for transparent OpenAI backgrounds.
</ParamField>
<ParamField path="image" type="string">
Single reference image path or URL for edit mode.
Single reference image path or URL for edit mode.
</ParamField>
<ParamField path="images" type="string[]">
Multiple reference images for edit mode (up to 5).
Multiple reference images for edit mode (up to 5 on supporting providers).
</ParamField>
<ParamField path="size" type="string">
Size hint: `1024x1024`, `1536x1024`, `1024x1536`, `2048x2048`, `3840x2160`.
Size hint: `1024x1024`, `1536x1024`, `1024x1536`, `2048x2048`, `3840x2160`.
</ParamField>
<ParamField path="aspectRatio" type="string">
Aspect ratio: `1:1`, `2:3`, `3:2`, `3:4`, `4:3`, `4:5`, `5:4`, `9:16`, `16:9`, `21:9`.
Aspect ratio: `1:1`, `2:3`, `3:2`, `3:4`, `4:3`, `4:5`, `5:4`, `9:16`, `16:9`, `21:9`.
</ParamField>
<ParamField path="resolution" type="'1K' | '2K' | '4K'">
Resolution hint.
<ParamField path="resolution" type='"1K" | "2K" | "4K"'>Resolution hint.</ParamField>
<ParamField path="quality" type='"low" | "medium" | "high" | "auto"'>
Quality hint when the provider supports it.
</ParamField>
<ParamField path="quality" type="'low' | 'medium' | 'high' | 'auto'">
Quality hint when the provider supports it.
<ParamField path="outputFormat" type='"png" | "jpeg" | "webp"'>
Output format hint when the provider supports it.
</ParamField>
<ParamField path="outputFormat" type="'png' | 'jpeg' | 'webp'">
Output format hint when the provider supports it.
<ParamField path="background" type='"transparent" | "opaque" | "auto"'>
Background hint when the provider supports it. Use `transparent` with
`outputFormat: "png"` or `"webp"` for transparency-capable providers.
</ParamField>
<ParamField path="background" type="'transparent' | 'opaque' | 'auto'">
Background hint when the provider supports it. Use `transparent` with
`outputFormat: "png"` or `"webp"` for transparency-capable providers.
</ParamField>
<ParamField path="count" type="number">
Number of images to generate (14).
</ParamField>
<ParamField path="timeoutMs" type="number">
Optional provider request timeout in milliseconds.
</ParamField>
<ParamField path="filename" type="string">
Output filename hint.
</ParamField>
<ParamField path="count" type="number">Number of images to generate (14).</ParamField>
<ParamField path="timeoutMs" type="number">Optional provider request timeout in milliseconds.</ParamField>
<ParamField path="filename" type="string">Output filename hint.</ParamField>
<ParamField path="openai" type="object">
OpenAI-only hints: `background`, `moderation`, `outputCompression`, and `user`.
OpenAI-only hints: `background`, `moderation`, `outputCompression`, and `user`.
</ParamField>
Not all providers support all parameters. When a fallback provider supports a nearby geometry option instead of the exact requested one, OpenClaw remaps to the closest supported size, aspect ratio, or resolution before submission. Unsupported output hints such as `quality` or `outputFormat` are dropped for providers that do not declare support and are reported in the tool result.
Tool results report the applied settings. When OpenClaw remaps geometry during provider fallback, the returned `size`, `aspectRatio`, and `resolution` values reflect what was actually sent, and `details.normalization` captures the requested-to-applied translation.
<Note>
Not all providers support all parameters. When a fallback provider supports a
nearby geometry option instead of the exact requested one, OpenClaw remaps to
the closest supported size, aspect ratio, or resolution before submission.
Unsupported output hints are dropped for providers that do not declare
support and reported in the tool result. Tool results report the applied
settings; `details.normalization` captures any requested-to-applied
translation.
</Note>
## Configuration
@@ -177,129 +189,177 @@ Tool results report the applied settings. When OpenClaw remaps geometry during p
### Provider selection order
When generating an image, OpenClaw tries providers in this order:
OpenClaw tries providers in this order:
1. **`model` parameter** from the tool call (if the agent specifies one)
2. **`imageGenerationModel.primary`** from config
3. **`imageGenerationModel.fallbacks`** in order
4. **Auto-detection** uses auth-backed provider defaults only:
- current default provider first
- remaining registered image-generation providers in provider-id order
1. **`model` parameter** from the tool call (if the agent specifies one).
2. **`imageGenerationModel.primary`** from config.
3. **`imageGenerationModel.fallbacks`** in order.
4. **Auto-detection** — auth-backed provider defaults only:
- current default provider first;
- remaining registered image-generation providers in provider-id order.
If a provider fails (auth error, rate limit, etc.), the next configured candidate is tried automatically. If all fail, the error includes details from each attempt.
If a provider fails (auth error, rate limit, etc.), the next configured
candidate is tried automatically. If all fail, the error includes details
from each attempt.
Notes:
- A per-call `model` override is exact: OpenClaw tries only that provider/model
and does not continue to configured primary/fallback or auto-detected
providers.
- Auto-detection is auth-aware. A provider default only enters the candidate list
when OpenClaw can actually authenticate that provider.
- Auto-detection is enabled by default. Set
`agents.defaults.mediaGenerationAutoProviderFallback: false` if you want image
generation to use only the explicit `model`, `primary`, and `fallbacks`
entries.
- Set `agents.defaults.imageGenerationModel.timeoutMs` for slow image backends.
A per-call `timeoutMs` tool parameter overrides the configured default.
- Use `action: "list"` to inspect the currently registered providers, their
default models, and auth env-var hints.
<AccordionGroup>
<Accordion title="Per-call model overrides are exact">
A per-call `model` override tries only that provider/model and does
not continue to configured primary/fallback or auto-detected providers.
</Accordion>
<Accordion title="Auto-detection is auth-aware">
A provider default only enters the candidate list when OpenClaw can
actually authenticate that provider. Set
`agents.defaults.mediaGenerationAutoProviderFallback: false` to use only
explicit `model`, `primary`, and `fallbacks` entries.
</Accordion>
<Accordion title="Timeouts">
Set `agents.defaults.imageGenerationModel.timeoutMs` for slow image
backends. A per-call `timeoutMs` tool parameter overrides the configured
default.
</Accordion>
<Accordion title="Inspect at runtime">
Use `action: "list"` to inspect the currently registered providers,
their default models, and auth env-var hints.
</Accordion>
</AccordionGroup>
### Image editing
OpenAI, OpenRouter, Google, fal, MiniMax, ComfyUI, and xAI support editing reference images. Pass a reference image path or URL:
OpenAI, OpenRouter, Google, fal, MiniMax, ComfyUI, and xAI support editing
reference images. Pass a reference image path or URL:
```
```text
"Generate a watercolor version of this photo" + image: "/path/to/photo.jpg"
```
OpenAI, OpenRouter, Google, and xAI support up to 5 reference images via the `images` parameter. fal, MiniMax, and ComfyUI support 1.
OpenAI, OpenRouter, Google, and xAI support up to 5 reference images via the
`images` parameter. fal, MiniMax, and ComfyUI support 1.
### OpenRouter image models
## Provider deep dives
OpenRouter image generation uses the same `OPENROUTER_API_KEY` and routes through OpenRouter's chat completions image API. Select OpenRouter image models with the `openrouter/` prefix:
<AccordionGroup>
<Accordion title="OpenAI gpt-image-2 (and gpt-image-1.5)">
OpenAI image generation defaults to `openai/gpt-image-2`. If an
`openai-codex` OAuth profile is configured, OpenClaw reuses the same
OAuth profile used by Codex subscription chat models and sends the
image request through the Codex Responses backend. Legacy Codex base
URLs such as `https://chatgpt.com/backend-api` are canonicalized to
`https://chatgpt.com/backend-api/codex` for image requests. OpenClaw
does **not** silently fall back to `OPENAI_API_KEY` for that request —
to force direct OpenAI Images API routing, configure
`models.providers.openai` explicitly with an API key, custom base URL,
or Azure endpoint.
```json5
{
agents: {
defaults: {
imageGenerationModel: {
primary: "openrouter/google/gemini-3.1-flash-image-preview",
The `openai/gpt-image-1.5`, `openai/gpt-image-1`, and
`openai/gpt-image-1-mini` models can still be selected explicitly. Use
`gpt-image-1.5` for transparent-background PNG/WebP output; the current
`gpt-image-2` API rejects `background: "transparent"`.
`gpt-image-2` supports both text-to-image generation and
reference-image editing through the same `image_generate` tool.
OpenClaw forwards `prompt`, `count`, `size`, `quality`, `outputFormat`,
and reference images to OpenAI. OpenAI does **not** receive
`aspectRatio` or `resolution` directly; when possible OpenClaw maps
those into a supported `size`, otherwise the tool reports them as
ignored overrides.
OpenAI-specific options live under the `openai` object:
```json
{
"quality": "low",
"outputFormat": "jpeg",
"openai": {
"background": "opaque",
"moderation": "low",
"outputCompression": 60,
"user": "end-user-42"
}
}
```
`openai.background` accepts `transparent`, `opaque`, or `auto`;
transparent outputs require `outputFormat` `png` or `webp` and a
transparency-capable OpenAI image model. OpenClaw routes default
`gpt-image-2` transparent-background requests to `gpt-image-1.5`.
`openai.outputCompression` applies to JPEG/WebP outputs.
The top-level `background` hint is provider-neutral and currently maps
to the same OpenAI `background` request field when the OpenAI provider
is selected. Providers that do not declare background support return
it in `ignoredOverrides` instead of receiving the unsupported parameter.
To route OpenAI image generation through an Azure OpenAI deployment
instead of `api.openai.com`, see
[Azure OpenAI endpoints](/providers/openai#azure-openai-endpoints).
</Accordion>
<Accordion title="OpenRouter image models">
OpenRouter image generation uses the same `OPENROUTER_API_KEY` and
routes through OpenRouter's chat completions image API. Select
OpenRouter image models with the `openrouter/` prefix:
```json5
{
agents: {
defaults: {
imageGenerationModel: {
primary: "openrouter/google/gemini-3.1-flash-image-preview",
},
},
},
},
},
}
}
```
OpenClaw forwards `prompt`, `count`, reference images, and
Gemini-compatible `aspectRatio` / `resolution` hints to OpenRouter.
Current built-in OpenRouter image model shortcuts include
`google/gemini-3.1-flash-image-preview`,
`google/gemini-3-pro-image-preview`, and `openai/gpt-5.4-image-2`. Use
`action: "list"` to see what your configured plugin exposes.
</Accordion>
<Accordion title="MiniMax dual-auth">
MiniMax image generation is available through both bundled MiniMax
auth paths:
- `minimax/image-01` for API-key setups
- `minimax-portal/image-01` for OAuth setups
</Accordion>
<Accordion title="xAI grok-imagine-image">
The bundled xAI provider uses `/v1/images/generations` for prompt-only
requests and `/v1/images/edits` when `image` or `images` is present.
- Models: `xai/grok-imagine-image`, `xai/grok-imagine-image-pro`
- Count: up to 4
- References: one `image` or up to five `images`
- Aspect ratios: `1:1`, `16:9`, `9:16`, `4:3`, `3:4`, `2:3`, `3:2`
- Resolutions: `1K`, `2K`
- Outputs: returned as OpenClaw-managed image attachments
OpenClaw intentionally does not expose xAI-native `quality`, `mask`,
`user`, or extra native-only aspect ratios until those controls exist
in the shared cross-provider `image_generate` contract.
</Accordion>
</AccordionGroup>
## Examples
<Tabs>
<Tab title="Generate (4K landscape)">
```text
/tool image_generate action=generate model=openai/gpt-image-2 prompt="A clean editorial poster for OpenClaw image generation" size=3840x2160 count=1
```
</Tab>
<Tab title="Generate (transparent PNG)">
```text
/tool image_generate action=generate model=openai/gpt-image-1.5 prompt="A simple red circle sticker on a transparent background" outputFormat=png background=transparent
```
OpenClaw forwards `prompt`, `count`, reference images, and Gemini-compatible `aspectRatio` / `resolution` hints to OpenRouter. Current built-in OpenRouter image model shortcuts include `google/gemini-3.1-flash-image-preview`, `google/gemini-3-pro-image-preview`, and `openai/gpt-5.4-image-2`; use `action: "list"` to see what your configured plugin exposes.
### OpenAI `gpt-image-2`
OpenAI image generation defaults to `openai/gpt-image-2`. If an
`openai-codex` OAuth profile is configured, OpenClaw reuses the same OAuth
profile used by Codex subscription chat models and sends the image request
through the Codex Responses backend. Legacy Codex base URLs such as
`https://chatgpt.com/backend-api` are canonicalized to
`https://chatgpt.com/backend-api/codex` for image requests. It does not
silently fall back to `OPENAI_API_KEY` for that request. To force direct OpenAI
Images API routing, configure `models.providers.openai` explicitly with an API
key, custom base URL, or Azure endpoint. The `openai/gpt-image-1.5`,
`openai/gpt-image-1`, and `openai/gpt-image-1-mini` models can still be
selected explicitly. Use `gpt-image-1.5` for transparent-background PNG/WebP
output; the current `gpt-image-2` API rejects `background: "transparent"`.
`gpt-image-2` supports both text-to-image generation and reference-image
editing through the same `image_generate` tool. OpenClaw forwards `prompt`,
`count`, `size`, `quality`, `outputFormat`, and reference images to OpenAI.
OpenAI does not receive `aspectRatio` or `resolution` directly; when possible
OpenClaw maps those into a supported `size`, otherwise the tool reports them as
ignored overrides.
OpenAI-specific options live under the `openai` object:
```json
{
"quality": "low",
"outputFormat": "jpeg",
"openai": {
"background": "opaque",
"moderation": "low",
"outputCompression": 60,
"user": "end-user-42"
}
}
```
`openai.background` accepts `transparent`, `opaque`, or `auto`; transparent
outputs require `outputFormat` `png` or `webp` and a transparency-capable OpenAI
image model. OpenClaw routes default `gpt-image-2` transparent-background
requests to `gpt-image-1.5`. `openai.outputCompression` applies to JPEG/WebP
outputs.
The top-level `background` hint is provider-neutral and currently maps to the
same OpenAI `background` request field when the OpenAI provider is selected.
Providers that do not declare background support return it in `ignoredOverrides`
instead of receiving the unsupported parameter.
When asking an agent for a transparent-background OpenAI image, the expected
tool call is:
```json
{
"model": "openai/gpt-image-1.5",
"prompt": "A simple red circle sticker on a transparent background",
"outputFormat": "png",
"background": "transparent"
}
```
The explicit `openai/gpt-image-1.5` model keeps the request portable across
tool summaries and harnesses. If the agent instead uses the default
`openai/gpt-image-2` with `openai.background: "transparent"` on the public
OpenAI or OpenAI Codex OAuth route, OpenClaw rewrites the provider request to
`gpt-image-1.5`. Azure and custom OpenAI-compatible endpoints keep their
configured deployment/model names.
For headless CLI generation, use the equivalent `openclaw infer` flags:
Equivalent CLI:
```bash
openclaw infer image generate \
@@ -310,86 +370,39 @@ openclaw infer image generate \
--json
```
The same `--output-format` and `--background` flags are available on
`openclaw infer image edit`; `--openai-background` remains available as an
OpenAI-specific alias. Current bundled providers other than OpenAI do not
declare explicit background control, so `background: "transparent"` is reported
as ignored for them.
Generate one 4K landscape image:
```
/tool image_generate action=generate model=openai/gpt-image-2 prompt="A clean editorial poster for OpenClaw image generation" size=3840x2160 count=1
```
Generate a transparent PNG:
```
/tool image_generate action=generate model=openai/gpt-image-1.5 prompt="A simple red circle sticker on a transparent background" outputFormat=png background=transparent
```
Generate two square images:
```
</Tab>
<Tab title="Generate (two square)">
```text
/tool image_generate action=generate model=openai/gpt-image-2 prompt="Two visual directions for a calm productivity app icon" size=1024x1024 count=2
```
Edit one local reference image:
```
</Tab>
<Tab title="Edit (one reference)">
```text
/tool image_generate action=generate model=openai/gpt-image-2 prompt="Keep the subject, replace the background with a bright studio setup" image=/path/to/reference.png size=1024x1536
```
Edit with multiple references:
```
</Tab>
<Tab title="Edit (multiple references)">
```text
/tool image_generate action=generate model=openai/gpt-image-2 prompt="Combine the character identity from the first image with the color palette from the second" images='["/path/to/character.png","/path/to/palette.jpg"]' size=1536x1024
```
</Tab>
</Tabs>
To route OpenAI image generation through an Azure OpenAI deployment instead
of `api.openai.com`, see [Azure OpenAI endpoints](/providers/openai#azure-openai-endpoints)
in the OpenAI provider docs.
MiniMax image generation is available through both bundled MiniMax auth paths:
- `minimax/image-01` for API-key setups
- `minimax-portal/image-01` for OAuth setups
## Provider capabilities
| Capability | OpenAI | Google | fal | MiniMax | ComfyUI | Vydra | xAI |
| --------------------- | -------------------- | -------------------- | ------------------- | -------------------------- | ---------------------------------- | ------- | -------------------- |
| Generate | Yes (up to 4) | Yes (up to 4) | Yes (up to 4) | Yes (up to 9) | Yes (workflow-defined outputs) | Yes (1) | Yes (up to 4) |
| Edit/reference | Yes (up to 5 images) | Yes (up to 5 images) | Yes (1 image) | Yes (1 image, subject ref) | Yes (1 image, workflow-configured) | No | Yes (up to 5 images) |
| Size control | Yes (up to 4K) | Yes | Yes | No | No | No | No |
| Aspect ratio | No | Yes | Yes (generate only) | Yes | No | No | Yes |
| Resolution (1K/2K/4K) | No | Yes | Yes | No | No | No | Yes (1K/2K) |
### xAI `grok-imagine-image`
The bundled xAI provider uses `/v1/images/generations` for prompt-only requests
and `/v1/images/edits` when `image` or `images` is present.
- Models: `xai/grok-imagine-image`, `xai/grok-imagine-image-pro`
- Count: up to 4
- References: one `image` or up to five `images`
- Aspect ratios: `1:1`, `16:9`, `9:16`, `4:3`, `3:4`, `2:3`, `3:2`
- Resolutions: `1K`, `2K`
- Outputs: returned as OpenClaw-managed image attachments
OpenClaw intentionally does not expose xAI-native `quality`, `mask`, `user`, or
extra native-only aspect ratios until those controls exist in the shared
cross-provider `image_generate` contract.
The same `--output-format` and `--background` flags are available on
`openclaw infer image edit`; `--openai-background` remains as an
OpenAI-specific alias. Bundled providers other than OpenAI do not declare
explicit background control today, so `background: "transparent"` is reported
as ignored for them.
## Related
- [Tools Overview](/tools) — all available agent tools
- [fal](/providers/fal) — fal image and video provider setup
- [Tools overview](/tools) — all available agent tools
- [ComfyUI](/providers/comfy) — local ComfyUI and Comfy Cloud workflow setup
- [fal](/providers/fal) — fal image and video provider setup
- [Google (Gemini)](/providers/google) — Gemini image provider setup
- [MiniMax](/providers/minimax) — MiniMax image provider setup
- [OpenAI](/providers/openai) — OpenAI Images provider setup
- [Vydra](/providers/vydra) — Vydra image, video, and speech setup
- [xAI](/providers/xai) — Grok image, video, search, code execution, and TTS setup
- [Configuration Reference](/gateway/config-agents#agent-defaults) — `imageGenerationModel` config
- [Configuration reference](/gateway/config-agents#agent-defaults) — `imageGenerationModel` config
- [Models](/concepts/models) — model configuration and failover