fix: support transparent OpenAI image generation

2026-05-06 17:31:06 +00:00 · 2026-04-25 19:28:25 +01:00
parent 0bf4876add
commit de0097a23c
9 changed files with 362 additions and 26 deletions
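
The documentation hunks in this commit describe a model rewrite for transparent-background requests. The sketch below restates that rule in TypeScript as an aid to reading the diff. It is not the implementation from this commit: `Route`, `ImageRequest`, and `resolveImageModel` are hypothetical names, and throwing when `outputFormat` is missing or not `png`/`webp` is an assumption (the docs only state that transparent output requires one of those formats).

```typescript
// Hypothetical sketch of the rewrite rule described in the docs; not OpenClaw's code.
type Route = "openai-public" | "openai-codex-oauth" | "azure" | "custom";

interface ImageRequest {
  model: string; // provider-side model name, e.g. "gpt-image-2"
  outputFormat?: "png" | "webp" | "jpeg";
  background?: "transparent" | "opaque" | "auto";
}

function resolveImageModel(req: ImageRequest, route: Route): string {
  // Non-transparent requests are passed through unchanged.
  if (req.background !== "transparent") {
    return req.model;
  }
  // Assumption: reject transparent requests without a png/webp output format.
  if (req.outputFormat !== "png" && req.outputFormat !== "webp") {
    throw new Error("transparent background requires outputFormat png or webp");
  }
  // Only the public OpenAI and OpenAI Codex OAuth routes rewrite the default model.
  const rewritable = route === "openai-public" || route === "openai-codex-oauth";
  if (rewritable && req.model === "gpt-image-2") {
    return "gpt-image-1.5";
  }
  // Azure and custom endpoints keep their configured deployment/model names.
  return req.model;
}
```

Under these assumptions, a default `gpt-image-2` request with a transparent PNG resolves to `gpt-image-1.5` on the public route, while the same request against an Azure deployment keeps its configured name.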
--- a/docs/gateway/config-agents.md
+++ b/docs/gateway/config-agents.md
@@ -342,8 +342,8 @@ Time format in system prompt. Default: `auto` (OS preference).
  - Also used as fallback routing when the selected/default model cannot accept image input.
 - `imageGenerationModel`: accepts either a string (`"provider/model"`) or an object (`{ primary, fallbacks }`).
  - Used by the shared image-generation capability and any future tool/plugin surface that generates images.
-  - Typical values: `google/gemini-3.1-flash-image-preview` for native Gemini image generation, `fal/fal-ai/flux/dev` for fal, or `openai/gpt-image-2` for OpenAI Images.
-  - If you select a provider/model directly, configure matching provider auth too (for example `GEMINI_API_KEY` or `GOOGLE_API_KEY` for `google/*`, `OPENAI_API_KEY` or OpenAI Codex OAuth for `openai/gpt-image-2`, `FAL_KEY` for `fal/*`).
+  - Typical values: `google/gemini-3.1-flash-image-preview` for native Gemini image generation, `fal/fal-ai/flux/dev` for fal, `openai/gpt-image-2` for OpenAI Images, or `openai/gpt-image-1.5` for transparent-background OpenAI PNG/WebP output.
+  - If you select a provider/model directly, configure matching provider auth too (for example `GEMINI_API_KEY` or `GOOGLE_API_KEY` for `google/*`, `OPENAI_API_KEY` or OpenAI Codex OAuth for `openai/gpt-image-2` / `openai/gpt-image-1.5`, `FAL_KEY` for `fal/*`).
  - If omitted, `image_generate` can still infer an auth-backed provider default. It tries the current default provider first, then the remaining registered image-generation providers in provider-id order.
 - `musicGenerationModel`: accepts either a string (`"provider/model"`) or an object (`{ primary, fallbacks }`).
  - Used by the shared music-generation capability and the built-in `music_generate` tool.
--- a/docs/providers/openai.md
+++ b/docs/providers/openai.md
@@ -27,6 +27,7 @@ changing config.
 | GPT-5.5 with ChatGPT/Codex subscription auth  | `openai-codex/gpt-5.5`                                   | Default PI route for Codex OAuth. Best first choice for subscription setups. |
 | GPT-5.5 with native Codex app-server behavior | `openai/gpt-5.5` plus `embeddedHarness.runtime: "codex"` | Forces the Codex app-server harness for that model ref.                      |
 | Image generation or editing                   | `openai/gpt-image-2`                                     | Works with either `OPENAI_API_KEY` or OpenAI Codex OAuth.                    |
+| Transparent-background images                 | `openai/gpt-image-1.5`                                   | Use `outputFormat=png` or `webp` and `openai.background=transparent`.        |

 <Note>
 GPT-5.5 is available through both direct OpenAI Platform API-key access and
@@ -254,8 +255,17 @@ See [Image Generation](/tools/image-generation) for shared tool parameters, prov
 </Note>

 `gpt-image-2` is the default for both OpenAI text-to-image generation and image
-editing. `gpt-image-1` remains usable as an explicit model override, but new
-OpenAI image workflows should use `openai/gpt-image-2`.
+editing. `gpt-image-1.5`, `gpt-image-1`, and `gpt-image-1-mini` remain usable as
+explicit model overrides. Use `openai/gpt-image-1.5` for transparent-background
+PNG/WebP output; the current `gpt-image-2` API rejects
+`background: "transparent"`.
+
+For a transparent-background request, agents should call `image_generate` with
+`model: "openai/gpt-image-1.5"`, `outputFormat: "png"` or `"webp"`, and
+`openai.background: "transparent"`. OpenClaw also protects the public OpenAI and
+OpenAI Codex OAuth routes by rewriting default `openai/gpt-image-2` transparent
+requests to `gpt-image-1.5`; Azure and custom OpenAI-compatible endpoints keep
+their configured deployment/model names.

 For Codex OAuth installs, keep the same `openai/gpt-image-2` ref. When an
 `openai-codex` OAuth profile is configured, OpenClaw resolves that stored OAuth
@@ -275,6 +285,12 @@ Generate:
 /tool image_generate model=openai/gpt-image-2 prompt="A polished launch poster for OpenClaw on macOS" size=3840x2160 count=1
 ```

+Generate a transparent PNG:
+
+```
+/tool image_generate model=openai/gpt-image-1.5 prompt="A simple red circle sticker on a transparent background" outputFormat=png openai='{"background":"transparent"}'
+```
+
 Edit:

 ```
--- a/docs/tools/image-generation.md
+++ b/docs/tools/image-generation.md
@@ -48,13 +48,14 @@ The agent calls `image_generate` automatically. No tool allow-listing needed —

 ## Common routes

-| Goal                                                 | Model ref                                          | Auth                                 |
-| ---------------------------------------------------- | -------------------------------------------------- | ------------------------------------ |
-| OpenAI image generation with API billing             | `openai/gpt-image-2`                               | `OPENAI_API_KEY`                     |
-| OpenAI image generation with Codex subscription auth | `openai/gpt-image-2`                               | OpenAI Codex OAuth                   |
-| OpenRouter image generation                          | `openrouter/google/gemini-3.1-flash-image-preview` | `OPENROUTER_API_KEY`                 |
-| LiteLLM image generation                             | `litellm/gpt-image-2`                              | `LITELLM_API_KEY`                    |
-| Google Gemini image generation                       | `google/gemini-3.1-flash-image-preview`            | `GEMINI_API_KEY` or `GOOGLE_API_KEY` |
+| Goal                                                 | Model ref                                          | Auth                                   |
+| ---------------------------------------------------- | -------------------------------------------------- | -------------------------------------- |
+| OpenAI image generation with API billing             | `openai/gpt-image-2`                               | `OPENAI_API_KEY`                       |
+| OpenAI image generation with Codex subscription auth | `openai/gpt-image-2`                               | OpenAI Codex OAuth                     |
+| OpenAI transparent-background PNG/WebP               | `openai/gpt-image-1.5`                             | `OPENAI_API_KEY` or OpenAI Codex OAuth |
+| OpenRouter image generation                          | `openrouter/google/gemini-3.1-flash-image-preview` | `OPENROUTER_API_KEY`                   |
+| LiteLLM image generation                             | `litellm/gpt-image-2`                              | `LITELLM_API_KEY`                      |
+| Google Gemini image generation                       | `google/gemini-3.1-flash-image-preview`            | `GEMINI_API_KEY` or `GOOGLE_API_KEY`   |

 The same `image_generate` tool handles text-to-image and reference-image
 editing. Use `image` for one reference or `images` for multiple references.
@@ -93,7 +94,8 @@ Use `"list"` to inspect available providers and models at runtime.
 </ParamField>

 <ParamField path="model" type="string">
-Provider/model override, e.g. `openai/gpt-image-2`.
+Provider/model override, e.g. `openai/gpt-image-2`; use
+`openai/gpt-image-1.5` for transparent OpenAI backgrounds.
 </ParamField>

 <ParamField path="image" type="string">
@@ -233,9 +235,10 @@ through the Codex Responses backend. Legacy Codex base URLs such as
 `https://chatgpt.com/backend-api/codex` for image requests. It does not
 silently fall back to `OPENAI_API_KEY` for that request. To force direct OpenAI
 Images API routing, configure `models.providers.openai` explicitly with an API
-key, custom base URL, or Azure endpoint. The older
-`openai/gpt-image-1` model can still be selected explicitly, but new OpenAI
-image-generation and image-editing requests should use `gpt-image-2`.
+key, custom base URL, or Azure endpoint. The `openai/gpt-image-1.5`,
+`openai/gpt-image-1`, and `openai/gpt-image-1-mini` models can still be
+selected explicitly. Use `gpt-image-1.5` for transparent-background PNG/WebP
+output; the current `gpt-image-2` API rejects `background: "transparent"`.

 `gpt-image-2` supports both text-to-image generation and reference-image
 editing through the same `image_generate` tool. OpenClaw forwards `prompt`,
@@ -260,8 +263,31 @@ OpenAI-specific options live under the `openai` object:
 ```

 `openai.background` accepts `transparent`, `opaque`, or `auto`; transparent
-outputs require `outputFormat` `png` or `webp`. `openai.outputCompression`
-applies to JPEG/WebP outputs.
+outputs require `outputFormat` `png` or `webp` and a transparency-capable OpenAI
+image model. OpenClaw routes default `gpt-image-2` transparent-background
+requests to `gpt-image-1.5`. `openai.outputCompression` applies to JPEG/WebP
+outputs.
+
+When asking an agent for a transparent-background OpenAI image, the expected
+tool call is:
+
+```json
+{
+  "model": "openai/gpt-image-1.5",
+  "prompt": "A simple red circle sticker on a transparent background",
+  "outputFormat": "png",
+  "openai": {
+    "background": "transparent"
+  }
+}
+```
+
+The explicit `openai/gpt-image-1.5` model keeps the request portable across
+tool summaries and harnesses. If the agent instead uses the default
+`openai/gpt-image-2` with `openai.background: "transparent"` on the public
+OpenAI or OpenAI Codex OAuth route, OpenClaw rewrites the provider request to
+`gpt-image-1.5`. Azure and custom OpenAI-compatible endpoints keep their
+configured deployment/model names.

 Generate one 4K landscape image:

@@ -269,6 +295,12 @@ Generate one 4K landscape image:
 /tool image_generate action=generate model=openai/gpt-image-2 prompt="A clean editorial poster for OpenClaw image generation" size=3840x2160 count=1
 ```

+Generate a transparent PNG:
+
+```
+/tool image_generate action=generate model=openai/gpt-image-1.5 prompt="A simple red circle sticker on a transparent background" outputFormat=png openai='{"background":"transparent"}'
+```
+
 Generate two square images:

 ```