fix: support transparent OpenAI image generation

Peter Steinberger
2026-04-25 19:28:25 +01:00
parent 0bf4876add
commit de0097a23c
9 changed files with 362 additions and 26 deletions


@@ -342,8 +342,8 @@ Time format in system prompt. Default: `auto` (OS preference).
- Also used as fallback routing when the selected/default model cannot accept image input.
- `imageGenerationModel`: accepts either a string (`"provider/model"`) or an object (`{ primary, fallbacks }`).
- Used by the shared image-generation capability and any future tool/plugin surface that generates images.
- Typical values: `google/gemini-3.1-flash-image-preview` for native Gemini image generation, `fal/fal-ai/flux/dev` for fal, or `openai/gpt-image-2` for OpenAI Images.
- If you select a provider/model directly, configure matching provider auth too (for example `GEMINI_API_KEY` or `GOOGLE_API_KEY` for `google/*`, `OPENAI_API_KEY` or OpenAI Codex OAuth for `openai/gpt-image-2`, `FAL_KEY` for `fal/*`).
- Typical values: `google/gemini-3.1-flash-image-preview` for native Gemini image generation, `fal/fal-ai/flux/dev` for fal, `openai/gpt-image-2` for OpenAI Images, or `openai/gpt-image-1.5` for transparent-background OpenAI PNG/WebP output.
- If you select a provider/model directly, configure matching provider auth too (for example `GEMINI_API_KEY` or `GOOGLE_API_KEY` for `google/*`, `OPENAI_API_KEY` or OpenAI Codex OAuth for `openai/gpt-image-2` / `openai/gpt-image-1.5`, `FAL_KEY` for `fal/*`).
- If omitted, `image_generate` can still infer an auth-backed provider default. It tries the current default provider first, then the remaining registered image-generation providers in provider-id order.
- `musicGenerationModel`: accepts either a string (`"provider/model"`) or an object (`{ primary, fallbacks }`).
- Used by the shared music-generation capability and the built-in `music_generate` tool.
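
The two accepted config shapes and the provider inference order described above for an omitted `imageGenerationModel` can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not OpenClaw's actual code: `normalizeModelConfig`, `inferProviderOrder`, and the `hasAuth` callback are hypothetical names, and "provider-id order" is assumed here to mean ascending string order.

```typescript
// Hypothetical sketch: names and types are illustrative, not OpenClaw's API.
type ImageModelConfig = string | { primary: string; fallbacks?: string[] };

// Normalize either config form into an ordered list of "provider/model" refs.
function normalizeModelConfig(config: ImageModelConfig): string[] {
  if (typeof config === "string") return [config];
  return [config.primary, ...(config.fallbacks ?? [])];
}

// When no model is configured: try the current default provider first, then
// the remaining registered providers in provider-id order (assumed to be
// ascending string order), keeping only providers that have auth configured.
function inferProviderOrder(
  defaultProvider: string,
  registered: string[],
  hasAuth: (providerId: string) => boolean,
): string[] {
  const rest = registered.filter((id) => id !== defaultProvider).sort();
  const ordered = registered.includes(defaultProvider)
    ? [defaultProvider, ...rest]
    : rest;
  return ordered.filter(hasAuth);
}
```

For example, with `openai` as the default provider and `google`, `openai`, and `fal` registered, a setup where only `fal` lacks auth would try `openai` and then `google`.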


@@ -27,6 +27,7 @@ changing config.
| GPT-5.5 with ChatGPT/Codex subscription auth | `openai-codex/gpt-5.5` | Default PI route for Codex OAuth. Best first choice for subscription setups. |
| GPT-5.5 with native Codex app-server behavior | `openai/gpt-5.5` plus `embeddedHarness.runtime: "codex"` | Forces the Codex app-server harness for that model ref. |
| Image generation or editing | `openai/gpt-image-2` | Works with either `OPENAI_API_KEY` or OpenAI Codex OAuth. |
| Transparent-background images | `openai/gpt-image-1.5` | Use `outputFormat=png` or `webp` and `openai.background=transparent`. |
<Note>
GPT-5.5 is available through both direct OpenAI Platform API-key access and
@@ -254,8 +255,17 @@ See [Image Generation](/tools/image-generation) for shared tool parameters, prov
</Note>
`gpt-image-2` is the default for both OpenAI text-to-image generation and image
editing. `gpt-image-1` remains usable as an explicit model override, but new
OpenAI image workflows should use `openai/gpt-image-2`.
editing. `gpt-image-1.5`, `gpt-image-1`, and `gpt-image-1-mini` remain usable as
explicit model overrides. Use `openai/gpt-image-1.5` for transparent-background
PNG/WebP output; the current `gpt-image-2` API rejects
`background: "transparent"`.
For a transparent-background request, agents should call `image_generate` with
`model: "openai/gpt-image-1.5"`, `outputFormat: "png"` or `"webp"`, and
`openai.background: "transparent"`. OpenClaw also protects the public OpenAI and
OpenAI Codex OAuth routes by rewriting default `openai/gpt-image-2` transparent
requests to `gpt-image-1.5`; Azure and custom OpenAI-compatible endpoints keep
their configured deployment/model names.
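
The rewrite rule in the paragraph above can be sketched as a small pure function. This is an illustration under stated assumptions, not the code from this commit: the `Route` union, the `ImageRequest` shape, and `resolveTransparentModel` are hypothetical names, and rejecting a missing `outputFormat` is an assumption about how the PNG/WebP requirement might be enforced.

```typescript
// Hypothetical sketch: types and names are illustrative, not OpenClaw's API.
type Route = "openai-public" | "openai-codex-oauth" | "azure" | "custom";

interface ImageRequest {
  model: string; // provider-side model name, e.g. "gpt-image-2"
  outputFormat?: "png" | "webp" | "jpeg";
  background?: "transparent" | "opaque" | "auto";
}

function resolveTransparentModel(route: Route, req: ImageRequest): ImageRequest {
  // Only transparent-background requests are affected.
  if (req.background !== "transparent") return req;

  // Transparent output needs a format with an alpha channel.
  // Assumption: a missing outputFormat is rejected rather than defaulted.
  if (req.outputFormat !== "png" && req.outputFormat !== "webp") {
    throw new Error('background "transparent" requires outputFormat "png" or "webp"');
  }

  // Rewrite only on the public OpenAI and Codex OAuth routes; Azure and custom
  // endpoints keep their configured deployment/model names.
  const rewritable = route === "openai-public" || route === "openai-codex-oauth";
  if (rewritable && req.model === "gpt-image-2") {
    return { ...req, model: "gpt-image-1.5" };
  }
  return req;
}
```

Returning a copy instead of mutating the request keeps the caller's original model ref intact, which would matter if the same request object is reused for logging or fallbacks.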
For Codex OAuth installs, keep the same `openai/gpt-image-2` ref. When an
`openai-codex` OAuth profile is configured, OpenClaw resolves that stored OAuth
@@ -275,6 +285,12 @@ Generate:
/tool image_generate model=openai/gpt-image-2 prompt="A polished launch poster for OpenClaw on macOS" size=3840x2160 count=1
```
Generate a transparent PNG:
```
/tool image_generate model=openai/gpt-image-1.5 prompt="A simple red circle sticker on a transparent background" outputFormat=png openai='{"background":"transparent"}'
```
Edit:
```


@@ -48,13 +48,14 @@ The agent calls `image_generate` automatically. No tool allow-listing needed —
## Common routes
| Goal | Model ref | Auth |
| ---------------------------------------------------- | -------------------------------------------------- | ------------------------------------ |
| OpenAI image generation with API billing | `openai/gpt-image-2` | `OPENAI_API_KEY` |
| OpenAI image generation with Codex subscription auth | `openai/gpt-image-2` | OpenAI Codex OAuth |
| OpenRouter image generation | `openrouter/google/gemini-3.1-flash-image-preview` | `OPENROUTER_API_KEY` |
| LiteLLM image generation | `litellm/gpt-image-2` | `LITELLM_API_KEY` |
| Google Gemini image generation | `google/gemini-3.1-flash-image-preview` | `GEMINI_API_KEY` or `GOOGLE_API_KEY` |
| Goal | Model ref | Auth |
| ---------------------------------------------------- | -------------------------------------------------- | -------------------------------------- |
| OpenAI image generation with API billing | `openai/gpt-image-2` | `OPENAI_API_KEY` |
| OpenAI image generation with Codex subscription auth | `openai/gpt-image-2` | OpenAI Codex OAuth |
| OpenAI transparent-background PNG/WebP | `openai/gpt-image-1.5` | `OPENAI_API_KEY` or OpenAI Codex OAuth |
| OpenRouter image generation | `openrouter/google/gemini-3.1-flash-image-preview` | `OPENROUTER_API_KEY` |
| LiteLLM image generation | `litellm/gpt-image-2` | `LITELLM_API_KEY` |
| Google Gemini image generation | `google/gemini-3.1-flash-image-preview` | `GEMINI_API_KEY` or `GOOGLE_API_KEY` |
The same `image_generate` tool handles text-to-image and reference-image
editing. Use `image` for one reference or `images` for multiple references.
@@ -93,7 +94,8 @@ Use `"list"` to inspect available providers and models at runtime.
</ParamField>
<ParamField path="model" type="string">
Provider/model override, e.g. `openai/gpt-image-2`.
Provider/model override, e.g. `openai/gpt-image-2`; use
`openai/gpt-image-1.5` for transparent OpenAI backgrounds.
</ParamField>
<ParamField path="image" type="string">
@@ -233,9 +235,10 @@ through the Codex Responses backend. Legacy Codex base URLs such as
`https://chatgpt.com/backend-api/codex` for image requests. It does not
silently fall back to `OPENAI_API_KEY` for that request. To force direct OpenAI
Images API routing, configure `models.providers.openai` explicitly with an API
key, custom base URL, or Azure endpoint. The older
`openai/gpt-image-1` model can still be selected explicitly, but new OpenAI
image-generation and image-editing requests should use `gpt-image-2`.
key, custom base URL, or Azure endpoint. The `openai/gpt-image-1.5`,
`openai/gpt-image-1`, and `openai/gpt-image-1-mini` models can still be
selected explicitly. Use `gpt-image-1.5` for transparent-background PNG/WebP
output; the current `gpt-image-2` API rejects `background: "transparent"`.
`gpt-image-2` supports both text-to-image generation and reference-image
editing through the same `image_generate` tool. OpenClaw forwards `prompt`,
@@ -260,8 +263,31 @@ OpenAI-specific options live under the `openai` object:
```
`openai.background` accepts `transparent`, `opaque`, or `auto`; transparent
outputs require `outputFormat` `png` or `webp`. `openai.outputCompression`
applies to JPEG/WebP outputs.
outputs require `outputFormat` `png` or `webp` and a transparency-capable OpenAI
image model. OpenClaw routes default `gpt-image-2` transparent-background
requests to `gpt-image-1.5`. `openai.outputCompression` applies to JPEG/WebP
outputs.
When asking an agent for a transparent-background OpenAI image, the expected
tool call is:
```json
{
"model": "openai/gpt-image-1.5",
"prompt": "A simple red circle sticker on a transparent background",
"outputFormat": "png",
"openai": {
"background": "transparent"
}
}
```
The explicit `openai/gpt-image-1.5` model keeps the request portable across
tool summaries and harnesses. If the agent instead uses the default
`openai/gpt-image-2` with `openai.background: "transparent"` on the public
OpenAI or OpenAI Codex OAuth route, OpenClaw rewrites the provider request to
`gpt-image-1.5`. Azure and custom OpenAI-compatible endpoints keep their
configured deployment/model names.
Generate one 4K landscape image:
@@ -269,6 +295,12 @@ Generate one 4K landscape image:
/tool image_generate action=generate model=openai/gpt-image-2 prompt="A clean editorial poster for OpenClaw image generation" size=3840x2160 count=1
```
Generate a transparent PNG:
```
/tool image_generate action=generate model=openai/gpt-image-1.5 prompt="A simple red circle sticker on a transparent background" outputFormat=png openai='{"background":"transparent"}'
```
Generate two square images:
```