Mirror of https://github.com/openclaw/openclaw.git, synced 2026-05-06 14:30:45 +00:00
feat(providers): add DeepInfra provider plugin (#73038)
* feat(providers): add DeepInfra provider plugin
* feat(deepinfra): add media provider surfaces
* fix(deepinfra): satisfy provider boundary checks
* docs: add gitcrawl maintainer skill
* test: include deepinfra in live media sweeps
* fix: remove stale tts contract import
Committed by GitHub
Parent: 1fde7dbc0e
Commit: 0294aebe6f
@@ -1,5 +1,5 @@
 ---
-summary: "Generate and edit images via image_generate across OpenAI, Google, fal, MiniMax, ComfyUI, OpenRouter, LiteLLM, xAI, Vydra"
+summary: "Generate and edit images via image_generate across OpenAI, Google, fal, MiniMax, ComfyUI, DeepInfra, OpenRouter, LiteLLM, xAI, Vydra"
 read_when:
 - Generating or editing images via the agent
 - Configuring image-generation providers and models
@@ -71,6 +71,7 @@ internal image endpoints remain blocked by default.
 | OpenAI image generation with API billing | `openai/gpt-image-2` | `OPENAI_API_KEY` |
 | OpenAI image generation with Codex subscription auth | `openai/gpt-image-2` | OpenAI Codex OAuth |
 | OpenAI transparent-background PNG/WebP | `openai/gpt-image-1.5` | `OPENAI_API_KEY` or OpenAI Codex OAuth |
+| DeepInfra image generation | `deepinfra/black-forest-labs/FLUX-1-schnell` | `DEEPINFRA_API_KEY` |
 | OpenRouter image generation | `openrouter/google/gemini-3.1-flash-image-preview` | `OPENROUTER_API_KEY` |
 | LiteLLM image generation | `litellm/gpt-image-2` | `LITELLM_API_KEY` |
 | Google Gemini image generation | `google/gemini-3.1-flash-image-preview` | `GEMINI_API_KEY` or `GOOGLE_API_KEY` |
@@ -88,6 +89,7 @@ backend emits it.
 | Provider | Default model | Edit support | Auth |
 | ---------- | --------------------------------------- | ---------------------------------- | ----------------------------------------------------- |
 | ComfyUI | `workflow` | Yes (1 image, workflow-configured) | `COMFY_API_KEY` or `COMFY_CLOUD_API_KEY` for cloud |
+| DeepInfra | `black-forest-labs/FLUX-1-schnell` | Yes (1 image) | `DEEPINFRA_API_KEY` |
 | fal | `fal-ai/flux/dev` | Yes | `FAL_KEY` |
 | Google | `gemini-3.1-flash-image-preview` | Yes | `GEMINI_API_KEY` or `GOOGLE_API_KEY` |
 | LiteLLM | `gpt-image-2` | Yes (up to 5 input images) | `LITELLM_API_KEY` |
@@ -105,13 +107,13 @@ Use `action: "list"` to inspect available providers and models at runtime:

 ## Provider capabilities

-| Capability | ComfyUI | fal | Google | MiniMax | OpenAI | Vydra | xAI |
-| --------------------- | ------------------ | ----------------- | -------------- | --------------------- | -------------- | ----- | -------------- |
-| Generate (max count) | Workflow-defined | 4 | 4 | 9 | 4 | 1 | 4 |
-| Edit / reference | 1 image (workflow) | 1 image | Up to 5 images | 1 image (subject ref) | Up to 5 images | — | Up to 5 images |
-| Size control | — | ✓ | ✓ | — | Up to 4K | — | — |
-| Aspect ratio | — | ✓ (generate only) | ✓ | ✓ | — | — | ✓ |
-| Resolution (1K/2K/4K) | — | ✓ | ✓ | — | — | — | 1K, 2K |
+| Capability | ComfyUI | DeepInfra | fal | Google | MiniMax | OpenAI | Vydra | xAI |
+| --------------------- | ------------------ | --------- | ----------------- | -------------- | --------------------- | -------------- | ----- | -------------- |
+| Generate (max count) | Workflow-defined | 4 | 4 | 4 | 9 | 4 | 1 | 4 |
+| Edit / reference | 1 image (workflow) | 1 image | 1 image | Up to 5 images | 1 image (subject ref) | Up to 5 images | — | Up to 5 images |
+| Size control | — | ✓ | ✓ | ✓ | — | Up to 4K | — | — |
+| Aspect ratio | — | — | ✓ (generate only) | ✓ | ✓ | — | — | ✓ |
+| Resolution (1K/2K/4K) | — | — | ✓ | ✓ | — | — | — | 1K, 2K |

 ## Tool parameters

@@ -226,7 +228,7 @@ from each attempt.

 ### Image editing

-OpenAI, OpenRouter, Google, fal, MiniMax, ComfyUI, and xAI support editing
+OpenAI, OpenRouter, Google, DeepInfra, fal, MiniMax, ComfyUI, and xAI support editing
 reference images. Pass a reference image path or URL:

 ```text
@@ -50,6 +50,7 @@ provider is configured.
 | Alibaba | | ✓ | | | | | |
 | BytePlus | | ✓ | | | | | |
 | ComfyUI | ✓ | ✓ | ✓ | | | | |
+| DeepInfra | ✓ | ✓ | | ✓ | ✓ | | ✓ |
 | Deepgram | | | | | ✓ | ✓ | |
 | ElevenLabs | | | | ✓ | ✓ | | |
 | fal | ✓ | ✓ | | | | | |
@@ -94,7 +95,7 @@ original channel.

 ## Speech-to-text and Voice Call

-Deepgram, ElevenLabs, Mistral, OpenAI, SenseAudio, and xAI can all transcribe
+Deepgram, DeepInfra, ElevenLabs, Mistral, OpenAI, SenseAudio, and xAI can all transcribe
 inbound audio through the batch `tools.media.audio` path when configured.
 Channel plugins that preflight a voice note for mention gating or command
 parsing mark the transcribed attachment on the inbound context, so the shared
@@ -116,6 +117,13 @@ vendor without waiting for a completed recording.
 Image, video, batch TTS, batch STT, Voice Call streaming STT, backend
 realtime voice, and memory-embedding surfaces.
 </Accordion>
+<Accordion title="DeepInfra">
+Chat/model routing, image generation/editing, text-to-video, batch TTS,
+batch STT, image media understanding, and memory-embedding surfaces.
+DeepInfra-native rerank/classification/object-detection models are not
+registered until OpenClaw has dedicated provider contracts for those
+categories.
+</Accordion>
 <Accordion title="xAI">
 Image, video, search, code-execution, batch TTS, batch STT, and Voice
 Call streaming STT. xAI Realtime voice is an upstream capability but is
@@ -8,7 +8,7 @@ title: "Text-to-speech"
 sidebarTitle: "Text to speech (TTS)"
 ---

-OpenClaw can convert outbound replies into audio across **13 speech providers**
+OpenClaw can convert outbound replies into audio across **14 speech providers**
 and deliver native voice messages on Feishu, Matrix, Telegram, and WhatsApp,
 audio attachments everywhere else, and PCM/Ulaw streams for telephony and Talk.

@@ -55,6 +55,7 @@ OpenClaw picks the first configured provider in registry auto-select order.
 | Provider | Auth | Notes |
 | ----------------- | ---------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------- |
 | **Azure Speech** | `AZURE_SPEECH_KEY` + `AZURE_SPEECH_REGION` (also `AZURE_SPEECH_API_KEY`, `SPEECH_KEY`, `SPEECH_REGION`) | Native Ogg/Opus voice-note output and telephony. |
+| **DeepInfra** | `DEEPINFRA_API_KEY` | OpenAI-compatible TTS. Defaults to `hexgrad/Kokoro-82M`. |
 | **ElevenLabs** | `ELEVENLABS_API_KEY` or `XI_API_KEY` | Voice cloning, multilingual, deterministic via `seed`. |
 | **Google Gemini** | `GEMINI_API_KEY` or `GOOGLE_API_KEY` | Gemini API TTS; persona-aware via `promptTemplate: "audio-profile-v1"`. |
 | **Gradium** | `GRADIUM_API_KEY` | Voice-note and telephony output. |
@@ -9,7 +9,7 @@ sidebarTitle: "Video generation"
 ---

 OpenClaw agents can generate videos from text prompts, reference images, or
-existing videos. Fourteen provider backends are supported, each with
+existing videos. Fifteen provider backends are supported, each with
 different model options, input modes, and feature sets. The agent picks the
 right provider automatically based on your configuration and available API
 keys.
@@ -111,6 +111,7 @@ generation.
 | BytePlus Seedance 1.5 | `seedance-1-5-pro-251215` | ✓ | Up to 2 images (first + last frame via role) | — | `BYTEPLUS_API_KEY` |
 | BytePlus Seedance 2.0 | `dreamina-seedance-2-0-260128` | ✓ | Up to 9 reference images | Up to 3 videos | `BYTEPLUS_API_KEY` |
 | ComfyUI | `workflow` | ✓ | 1 image | — | `COMFY_API_KEY` or `COMFY_CLOUD_API_KEY` |
+| DeepInfra | `Pixverse/Pixverse-T2V` | ✓ | — | — | `DEEPINFRA_API_KEY` |
 | fal | `fal-ai/minimax/video-01-live` | ✓ | 1 image; up to 9 with Seedance reference-to-video | Up to 3 videos with Seedance reference-to-video | `FAL_KEY` |
 | Google | `veo-3.1-fast-generate-preview` | ✓ | 1 image | 1 video | `GEMINI_API_KEY` |
 | MiniMax | `MiniMax-Hailuo-2.3` | ✓ | 1 image | — | `MINIMAX_API_KEY` or MiniMax OAuth |
@@ -132,20 +133,21 @@ runtime modes at runtime.
 The explicit mode contract used by `video_generate`, contract tests, and
 the shared live sweep:

-| Provider | `generate` | `imageToVideo` | `videoToVideo` | Shared live lanes today |
-| -------- | :--------: | :------------: | :------------: | ---------------------------------------------------------------------------------------------------------------------------------------- |
-| Alibaba | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider needs remote `http(s)` video URLs |
-| BytePlus | ✓ | ✓ | — | `generate`, `imageToVideo` |
-| ComfyUI | ✓ | ✓ | — | Not in the shared sweep; workflow-specific coverage lives with Comfy tests |
-| fal | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` only when using Seedance reference-to-video |
-| Google | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; shared `videoToVideo` skipped because the current buffer-backed Gemini/Veo sweep does not accept that input |
-| MiniMax | ✓ | ✓ | — | `generate`, `imageToVideo` |
-| OpenAI | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; shared `videoToVideo` skipped because this org/input path currently needs provider-side inpaint/remix access |
-| Qwen | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider needs remote `http(s)` video URLs |
-| Runway | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` runs only when the selected model is `runway/gen4_aleph` |
-| Together | ✓ | ✓ | — | `generate`, `imageToVideo` |
-| Vydra | ✓ | ✓ | — | `generate`; shared `imageToVideo` skipped because bundled `veo3` is text-only and bundled `kling` requires a remote image URL |
-| xAI | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider currently needs a remote MP4 URL |
+| Provider | `generate` | `imageToVideo` | `videoToVideo` | Shared live lanes today |
+| --------- | :--------: | :------------: | :------------: | ---------------------------------------------------------------------------------------------------------------------------------------- |
+| Alibaba | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider needs remote `http(s)` video URLs |
+| BytePlus | ✓ | ✓ | — | `generate`, `imageToVideo` |
+| ComfyUI | ✓ | ✓ | — | Not in the shared sweep; workflow-specific coverage lives with Comfy tests |
+| DeepInfra | ✓ | — | — | `generate`; native DeepInfra video schemas are text-to-video in the bundled contract |
+| fal | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` only when using Seedance reference-to-video |
+| Google | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; shared `videoToVideo` skipped because the current buffer-backed Gemini/Veo sweep does not accept that input |
+| MiniMax | ✓ | ✓ | — | `generate`, `imageToVideo` |
+| OpenAI | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; shared `videoToVideo` skipped because this org/input path currently needs provider-side inpaint/remix access |
+| Qwen | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider needs remote `http(s)` video URLs |
+| Runway | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` runs only when the selected model is `runway/gen4_aleph` |
+| Together | ✓ | ✓ | — | `generate`, `imageToVideo` |
+| Vydra | ✓ | ✓ | — | `generate`; shared `imageToVideo` skipped because bundled `veo3` is text-only and bundled `kling` requires a remote image URL |
+| xAI | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider currently needs a remote MP4 URL |

 ## Tool parameters

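Reading aid (not part of the commit): the DeepInfra column added to the image "Provider capabilities" table can be modeled as plain data and checked against a request. This is a minimal sketch, not OpenClaw code. `ImageCapabilities`, `DEEPINFRA_IMAGE`, and `check_request` are hypothetical names introduced here for illustration; only the limits themselves (up to 4 generated images, 1 reference image, size control supported, no aspect-ratio or resolution control) are taken from the table in this diff.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ImageCapabilities:
    """Hypothetical container mirroring one column of the capability table."""
    max_count: int
    max_reference_images: int
    size_control: bool
    aspect_ratio: bool
    resolution: bool


# Values copied from the DeepInfra column added in this commit.
DEEPINFRA_IMAGE = ImageCapabilities(
    max_count=4,
    max_reference_images=1,
    size_control=True,
    aspect_ratio=False,
    resolution=False,
)


def check_request(
    caps: ImageCapabilities,
    count: int = 1,
    reference_images: int = 0,
    size: Optional[str] = None,
    aspect_ratio: Optional[str] = None,
    resolution: Optional[str] = None,
) -> List[str]:
    """Return a list of problems; an empty list means the request fits."""
    problems: List[str] = []
    if count > caps.max_count:
        problems.append(f"count {count} exceeds max {caps.max_count}")
    if reference_images > caps.max_reference_images:
        problems.append(
            f"{reference_images} reference images exceeds max "
            f"{caps.max_reference_images}"
        )
    if size is not None and not caps.size_control:
        problems.append("size control is not supported")
    if aspect_ratio is not None and not caps.aspect_ratio:
        problems.append("aspect ratio is not supported")
    if resolution is not None and not caps.resolution:
        problems.append("resolution presets are not supported")
    return problems


if __name__ == "__main__":
    # A request inside the documented limits, then one outside them.
    print(check_request(DEEPINFRA_IMAGE, count=4, reference_images=1, size="1024x1024"))
    print(check_request(DEEPINFRA_IMAGE, count=5, aspect_ratio="16:9"))
```

The same structure would extend to the other columns by adding one `ImageCapabilities` value per provider. How OpenClaw actually enforces these limits is not shown in this diff.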