feat(providers): add DeepInfra provider plugin (#73038)

* feat(providers): add DeepInfra provider plugin

* feat(deepinfra): add media provider surfaces

* fix(deepinfra): satisfy provider boundary checks

* docs: add gitcrawl maintainer skill

* test: include deepinfra in live media sweeps

* fix: remove stale tts contract import
Author: Peter Steinberger
Date: 2026-04-28 01:12:54 +01:00
Committed by: GitHub
Parent: 1fde7dbc0e
Commit: 0294aebe6f
54 changed files with 2830 additions and 179 deletions


@@ -1,5 +1,5 @@
---
-summary: "Generate and edit images via image_generate across OpenAI, Google, fal, MiniMax, ComfyUI, OpenRouter, LiteLLM, xAI, Vydra"
+summary: "Generate and edit images via image_generate across OpenAI, Google, fal, MiniMax, ComfyUI, DeepInfra, OpenRouter, LiteLLM, xAI, Vydra"
read_when:
- Generating or editing images via the agent
- Configuring image-generation providers and models
@@ -71,6 +71,7 @@ internal image endpoints remain blocked by default.
| OpenAI image generation with API billing | `openai/gpt-image-2` | `OPENAI_API_KEY` |
| OpenAI image generation with Codex subscription auth | `openai/gpt-image-2` | OpenAI Codex OAuth |
| OpenAI transparent-background PNG/WebP | `openai/gpt-image-1.5` | `OPENAI_API_KEY` or OpenAI Codex OAuth |
+| DeepInfra image generation | `deepinfra/black-forest-labs/FLUX-1-schnell` | `DEEPINFRA_API_KEY` |
| OpenRouter image generation | `openrouter/google/gemini-3.1-flash-image-preview` | `OPENROUTER_API_KEY` |
| LiteLLM image generation | `litellm/gpt-image-2` | `LITELLM_API_KEY` |
| Google Gemini image generation | `google/gemini-3.1-flash-image-preview` | `GEMINI_API_KEY` or `GOOGLE_API_KEY` |
@@ -88,6 +89,7 @@ backend emits it.
| Provider | Default model | Edit support | Auth |
| ---------- | --------------------------------------- | ---------------------------------- | ----------------------------------------------------- |
| ComfyUI | `workflow` | Yes (1 image, workflow-configured) | `COMFY_API_KEY` or `COMFY_CLOUD_API_KEY` for cloud |
+| DeepInfra | `black-forest-labs/FLUX-1-schnell` | Yes (1 image) | `DEEPINFRA_API_KEY` |
| fal | `fal-ai/flux/dev` | Yes | `FAL_KEY` |
| Google | `gemini-3.1-flash-image-preview` | Yes | `GEMINI_API_KEY` or `GOOGLE_API_KEY` |
| LiteLLM | `gpt-image-2` | Yes (up to 5 input images) | `LITELLM_API_KEY` |
@@ -105,13 +107,13 @@ Use `action: "list"` to inspect available providers and models at runtime:
## Provider capabilities
-| Capability | ComfyUI | fal | Google | MiniMax | OpenAI | Vydra | xAI |
-| --------------------- | ------------------ | ----------------- | -------------- | --------------------- | -------------- | ----- | -------------- |
-| Generate (max count) | Workflow-defined | 4 | 4 | 9 | 4 | 1 | 4 |
-| Edit / reference | 1 image (workflow) | 1 image | Up to 5 images | 1 image (subject ref) | Up to 5 images | — | Up to 5 images |
-| Size control | — | ✓ | ✓ | — | Up to 4K | — | — |
-| Aspect ratio | — | ✓ (generate only) | ✓ | ✓ | — | — | ✓ |
-| Resolution (1K/2K/4K) | — | ✓ | ✓ | — | — | — | 1K, 2K |
+| Capability | ComfyUI | DeepInfra | fal | Google | MiniMax | OpenAI | Vydra | xAI |
+| --------------------- | ------------------ | --------- | ----------------- | -------------- | --------------------- | -------------- | ----- | -------------- |
+| Generate (max count) | Workflow-defined | 4 | 4 | 4 | 9 | 4 | 1 | 4 |
+| Edit / reference | 1 image (workflow) | 1 image | 1 image | Up to 5 images | 1 image (subject ref) | Up to 5 images | — | Up to 5 images |
+| Size control | — | ✓ | ✓ | ✓ | — | Up to 4K | — | — |
+| Aspect ratio | — | — | ✓ (generate only) | ✓ | ✓ | — | — | ✓ |
+| Resolution (1K/2K/4K) | — | — | ✓ | ✓ | — | — | — | 1K, 2K |
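For orientation, a minimal, hypothetical sketch of what a direct DeepInfra text-to-image call looks like outside the agent. The endpoint path, payload fields, and default model here are assumptions about DeepInfra's generic model-inference API, not OpenClaw's provider plugin:

```python
import json
import os
import urllib.request

# Assumed DeepInfra model-inference endpoint; verify the exact path and
# payload shape against DeepInfra's current API docs before relying on it.
DEEPINFRA_INFERENCE_BASE = "https://api.deepinfra.com/v1/inference"

def build_image_request(
    prompt: str,
    model: str = "black-forest-labs/FLUX-1-schnell",
    num_images: int = 1,
) -> urllib.request.Request:
    """Build (but do not send) a text-to-image request for DeepInfra."""
    body = json.dumps({"prompt": prompt, "num_images": num_images}).encode()
    return urllib.request.Request(
        f"{DEEPINFRA_INFERENCE_BASE}/{model}",
        data=body,
        headers={
            "Authorization": f"Bearer {os.environ.get('DEEPINFRA_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_image_request("a lighthouse at dusk")
# urllib.request.urlopen(req) would perform the call once DEEPINFRA_API_KEY is set.
```

Inside OpenClaw the same provider is reached through `image_generate` with auth from `DEEPINFRA_API_KEY`; the builder above only illustrates the underlying HTTP shape.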
## Tool parameters
@@ -226,7 +228,7 @@ from each attempt.
### Image editing
-OpenAI, OpenRouter, Google, fal, MiniMax, ComfyUI, and xAI support editing
+OpenAI, OpenRouter, Google, DeepInfra, fal, MiniMax, ComfyUI, and xAI support editing
reference images. Pass a reference image path or URL:
```text


@@ -50,6 +50,7 @@ provider is configured.
| Alibaba | | ✓ | | | | | |
| BytePlus | | ✓ | | | | | |
| ComfyUI | ✓ | ✓ | ✓ | | | | |
+| DeepInfra | ✓ | ✓ | | ✓ | ✓ | | ✓ |
| Deepgram | | | | | ✓ | ✓ | |
| ElevenLabs | | | | ✓ | ✓ | | |
| fal | ✓ | ✓ | | | | | |
@@ -94,7 +95,7 @@ original channel.
## Speech-to-text and Voice Call
-Deepgram, ElevenLabs, Mistral, OpenAI, SenseAudio, and xAI can all transcribe
+Deepgram, DeepInfra, ElevenLabs, Mistral, OpenAI, SenseAudio, and xAI can all transcribe
inbound audio through the batch `tools.media.audio` path when configured.
Channel plugins that preflight a voice note for mention gating or command
parsing mark the transcribed attachment on the inbound context, so the shared
@@ -116,6 +117,13 @@ vendor without waiting for a completed recording.
Image, video, batch TTS, batch STT, Voice Call streaming STT, backend
realtime voice, and memory-embedding surfaces.
</Accordion>
+<Accordion title="DeepInfra">
+Chat/model routing, image generation/editing, text-to-video, batch TTS,
+batch STT, image media understanding, and memory-embedding surfaces.
+DeepInfra-native rerank/classification/object-detection models are not
+registered until OpenClaw has dedicated provider contracts for those
+categories.
+</Accordion>
<Accordion title="xAI">
Image, video, search, code-execution, batch TTS, batch STT, and Voice
Call streaming STT. xAI Realtime voice is an upstream capability but is


@@ -8,7 +8,7 @@ title: "Text-to-speech"
sidebarTitle: "Text to speech (TTS)"
---
-OpenClaw can convert outbound replies into audio across **13 speech providers**
+OpenClaw can convert outbound replies into audio across **14 speech providers**
and deliver native voice messages on Feishu, Matrix, Telegram, and WhatsApp,
audio attachments everywhere else, and PCM/Ulaw streams for telephony and Talk.
@@ -55,6 +55,7 @@ OpenClaw picks the first configured provider in registry auto-select order.
| Provider | Auth | Notes |
| ----------------- | ---------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------- |
| **Azure Speech** | `AZURE_SPEECH_KEY` + `AZURE_SPEECH_REGION` (also `AZURE_SPEECH_API_KEY`, `SPEECH_KEY`, `SPEECH_REGION`) | Native Ogg/Opus voice-note output and telephony. |
+| **DeepInfra** | `DEEPINFRA_API_KEY` | OpenAI-compatible TTS. Defaults to `hexgrad/Kokoro-82M`. |
| **ElevenLabs** | `ELEVENLABS_API_KEY` or `XI_API_KEY` | Voice cloning, multilingual, deterministic via `seed`. |
| **Google Gemini** | `GEMINI_API_KEY` or `GOOGLE_API_KEY` | Gemini API TTS; persona-aware via `promptTemplate: "audio-profile-v1"`. |
| **Gradium** | `GRADIUM_API_KEY` | Voice-note and telephony output. |
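The auto-select rule above ("first configured provider in registry auto-select order") can be sketched as a first-match scan. The ordering and env-var tuples below are illustrative, drawn from this table's Auth column, not OpenClaw's actual registry:

```python
import os

# Illustrative subset of the TTS table: (provider, accepted auth env vars).
# The order here is hypothetical; OpenClaw's real auto-select order may differ.
TTS_PROVIDERS = [
    ("azure", ("AZURE_SPEECH_KEY", "AZURE_SPEECH_API_KEY", "SPEECH_KEY")),
    ("deepinfra", ("DEEPINFRA_API_KEY",)),
    ("elevenlabs", ("ELEVENLABS_API_KEY", "XI_API_KEY")),
    ("google", ("GEMINI_API_KEY", "GOOGLE_API_KEY")),
]

def pick_tts_provider(env=os.environ):
    """Return the first provider with a configured auth variable, or None."""
    for name, keys in TTS_PROVIDERS:
        if any(env.get(k) for k in keys):
            return name
    return None
```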


@@ -9,7 +9,7 @@ sidebarTitle: "Video generation"
---
OpenClaw agents can generate videos from text prompts, reference images, or
-existing videos. Fourteen provider backends are supported, each with
+existing videos. Fifteen provider backends are supported, each with
different model options, input modes, and feature sets. The agent picks the
right provider automatically based on your configuration and available API
keys.
@@ -111,6 +111,7 @@ generation.
| BytePlus Seedance 1.5 | `seedance-1-5-pro-251215` | ✓ | Up to 2 images (first + last frame via role) | — | `BYTEPLUS_API_KEY` |
| BytePlus Seedance 2.0 | `dreamina-seedance-2-0-260128` | ✓ | Up to 9 reference images | Up to 3 videos | `BYTEPLUS_API_KEY` |
| ComfyUI | `workflow` | ✓ | 1 image | — | `COMFY_API_KEY` or `COMFY_CLOUD_API_KEY` |
+| DeepInfra | `Pixverse/Pixverse-T2V` | ✓ | — | — | `DEEPINFRA_API_KEY` |
| fal | `fal-ai/minimax/video-01-live` | ✓ | 1 image; up to 9 with Seedance reference-to-video | Up to 3 videos with Seedance reference-to-video | `FAL_KEY` |
| Google | `veo-3.1-fast-generate-preview` | ✓ | 1 image | 1 video | `GEMINI_API_KEY` |
| MiniMax | `MiniMax-Hailuo-2.3` | ✓ | 1 image | — | `MINIMAX_API_KEY` or MiniMax OAuth |
@@ -132,20 +133,21 @@ runtime modes at runtime.
The explicit mode contract used by `video_generate`, contract tests, and
the shared live sweep:
-| Provider | `generate` | `imageToVideo` | `videoToVideo` | Shared live lanes today |
-| -------- | :--------: | :------------: | :------------: | ---------------------------------------------------------------------------------------------------------------------------------------- |
-| Alibaba | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider needs remote `http(s)` video URLs |
-| BytePlus | ✓ | ✓ | — | `generate`, `imageToVideo` |
-| ComfyUI | ✓ | ✓ | — | Not in the shared sweep; workflow-specific coverage lives with Comfy tests |
-| fal | ✓ | | | `generate`, `imageToVideo`; `videoToVideo` only when using Seedance reference-to-video |
-| Google | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; shared `videoToVideo` skipped because the current buffer-backed Gemini/Veo sweep does not accept that input |
-| MiniMax | ✓ | ✓ | | `generate`, `imageToVideo` |
-| OpenAI | ✓ | ✓ | | `generate`, `imageToVideo`; shared `videoToVideo` skipped because this org/input path currently needs provider-side inpaint/remix access |
-| Qwen | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider needs remote `http(s)` video URLs |
-| Runway | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` runs only when the selected model is `runway/gen4_aleph` |
-| Together | ✓ | ✓ | | `generate`, `imageToVideo` |
-| Vydra | ✓ | ✓ | — | `generate`; shared `imageToVideo` skipped because bundled `veo3` is text-only and bundled `kling` requires a remote image URL |
-| xAI | ✓ | ✓ | | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider currently needs a remote MP4 URL |
+| Provider | `generate` | `imageToVideo` | `videoToVideo` | Shared live lanes today |
+| --------- | :--------: | :------------: | :------------: | ---------------------------------------------------------------------------------------------------------------------------------------- |
+| Alibaba | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider needs remote `http(s)` video URLs |
+| BytePlus | ✓ | ✓ | — | `generate`, `imageToVideo` |
+| ComfyUI | ✓ | ✓ | — | Not in the shared sweep; workflow-specific coverage lives with Comfy tests |
+| DeepInfra | ✓ | | | `generate`; native DeepInfra video schemas are text-to-video in the bundled contract |
+| fal | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` only when using Seedance reference-to-video |
+| Google | ✓ | ✓ | | `generate`, `imageToVideo`; shared `videoToVideo` skipped because the current buffer-backed Gemini/Veo sweep does not accept that input |
+| MiniMax | ✓ | ✓ | | `generate`, `imageToVideo` |
+| OpenAI | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; shared `videoToVideo` skipped because this org/input path currently needs provider-side inpaint/remix access |
+| Qwen | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider needs remote `http(s)` video URLs |
+| Runway | ✓ | ✓ | | `generate`, `imageToVideo`; `videoToVideo` runs only when the selected model is `runway/gen4_aleph` |
+| Together | ✓ | ✓ | — | `generate`, `imageToVideo` |
+| Vydra | ✓ | ✓ | | `generate`; shared `imageToVideo` skipped because bundled `veo3` is text-only and bundled `kling` requires a remote image URL |
+| xAI | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider currently needs a remote MP4 URL |
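The mode contract can be read as a simple capability map per provider. The encoding below is a hypothetical sketch covering a subset of the table's rows (names and structure are illustrative, not OpenClaw's contract types):

```python
# Partial encoding of the mode-contract table; values mirror its checkmarks.
VIDEO_MODES = {
    "alibaba": {"generate", "imageToVideo", "videoToVideo"},
    "byteplus": {"generate", "imageToVideo"},
    "deepinfra": {"generate"},  # native DeepInfra schemas are text-to-video only
    "fal": {"generate", "imageToVideo", "videoToVideo"},
    "minimax": {"generate", "imageToVideo"},
}

def providers_supporting(mode: str) -> list[str]:
    """Providers from the subset above that declare the given mode."""
    return sorted(name for name, modes in VIDEO_MODES.items() if mode in modes)
```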
## Tool parameters