Mirror of https://github.com/openclaw/openclaw.git, synced 2026-05-06 13:00:44 +00:00
feat(providers): add DeepInfra provider plugin (#73038)
* feat(providers): add DeepInfra provider plugin
* feat(deepinfra): add media provider surfaces
* fix(deepinfra): satisfy provider boundary checks
* docs: add gitcrawl maintainer skill
* test: include deepinfra in live media sweeps
* fix: remove stale tts contract import
Committed by GitHub. Parent: 1fde7dbc0e. Commit: 0294aebe6f.
@@ -95,6 +95,10 @@
       "source": "Chutes",
       "target": "Chutes"
     },
+    {
+      "source": "DeepInfra",
+      "target": "DeepInfra"
+    },
     {
       "source": "Qwen",
       "target": "Qwen"
@@ -19,7 +19,7 @@ a per-agent SQLite database and needs no extra dependencies to get started.

 ## Getting started

-If you have an API key for OpenAI, Gemini, Voyage, or Mistral, the builtin
+If you have an API key for OpenAI, Gemini, Voyage, Mistral, or DeepInfra, the builtin
 engine auto-detects it and enables vector search. No config needed.

 To set a provider explicitly:
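To make the explicit selection concrete, a minimal config sketch (the `agents.defaults.memorySearch` key shape follows the memory-search docs elsewhere in this commit; the exact nesting is an assumption):

```json5
{
  agents: {
    defaults: {
      memorySearch: {
        provider: "deepinfra", // any supported embedding adapter ID works here
        model: "BAAI/bge-m3",  // optional; the provider default applies when omitted
      },
    },
  },
}
```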
@@ -60,14 +60,15 @@ at a GGUF file:

 ## Supported embedding providers

-| Provider | ID        | Auto-detected | Notes                               |
-| -------- | --------- | ------------- | ----------------------------------- |
-| OpenAI   | `openai`  | Yes           | Default: `text-embedding-3-small`   |
-| Gemini   | `gemini`  | Yes           | Supports multimodal (image + audio) |
-| Voyage   | `voyage`  | Yes           |                                     |
-| Mistral  | `mistral` | Yes           |                                     |
-| Ollama   | `ollama`  | No            | Local, set explicitly               |
-| Local    | `local`   | Yes (first)   | Optional `node-llama-cpp` runtime   |
+| Provider  | ID          | Auto-detected | Notes                               |
+| --------- | ----------- | ------------- | ----------------------------------- |
+| OpenAI    | `openai`    | Yes           | Default: `text-embedding-3-small`   |
+| Gemini    | `gemini`    | Yes           | Supports multimodal (image + audio) |
+| Voyage    | `voyage`    | Yes           |                                     |
+| Mistral   | `mistral`   | Yes           |                                     |
+| DeepInfra | `deepinfra` | Yes           | Default: `BAAI/bge-m3`              |
+| Ollama    | `ollama`    | No            | Local, set explicitly               |
+| Local     | `local`     | Yes (first)   | Optional `node-llama-cpp` runtime   |

 Auto-detection picks the first provider whose API key can be resolved, in the
 order shown. Set `memorySearch.provider` to override.
@@ -280,6 +280,7 @@ See [/providers/kilocode](/providers/kilocode) for setup details.
 | BytePlus              | `byteplus` / `byteplus-plan` | `BYTEPLUS_API_KEY`                                   | `byteplus-plan/ark-code-latest`       |
 | Cerebras              | `cerebras`                   | `CEREBRAS_API_KEY`                                   | `cerebras/zai-glm-4.7`                |
 | Cloudflare AI Gateway | `cloudflare-ai-gateway`      | `CLOUDFLARE_AI_GATEWAY_API_KEY`                      | —                                     |
+| DeepInfra             | `deepinfra`                  | `DEEPINFRA_API_KEY`                                  | `deepinfra/deepseek-ai/DeepSeek-V3.2` |
 | DeepSeek              | `deepseek`                   | `DEEPSEEK_API_KEY`                                   | `deepseek/deepseek-v4-flash`          |
 | GitHub Copilot        | `github-copilot`             | `COPILOT_GITHUB_TOKEN` / `GH_TOKEN` / `GITHUB_TOKEN` | —                                     |
 | Groq                  | `groq`                       | `GROQ_API_KEY`                                       | —                                     |
@@ -1331,6 +1331,7 @@
             "providers/cloudflare-ai-gateway",
             "providers/comfy",
             "providers/deepgram",
+            "providers/deepinfra",
             "providers/deepseek",
             "providers/elevenlabs",
             "providers/fal",
@@ -468,6 +468,7 @@ If you want to rely on env keys (e.g. exported in your `~/.profile`), run local
   - `<provider>:generate`
   - `<provider>:edit` when the provider declares edit support
 - Current bundled providers covered:
+  - `deepinfra`
   - `fal`
   - `google`
   - `minimax`
@@ -477,6 +478,7 @@ If you want to rely on env keys (e.g. exported in your `~/.profile`), run local
   - `xai`
 - Optional narrowing:
   - `OPENCLAW_LIVE_IMAGE_GENERATION_PROVIDERS="openai,google,openrouter,xai"`
+  - `OPENCLAW_LIVE_IMAGE_GENERATION_PROVIDERS="deepinfra"`
   - `OPENCLAW_LIVE_IMAGE_GENERATION_MODELS="openai/gpt-image-2,google/gemini-3.1-flash-image-preview,openrouter/google/gemini-3.1-flash-image-preview,xai/grok-imagine-image"`
   - `OPENCLAW_LIVE_IMAGE_GENERATION_CASES="google:flash-generate,google:pro-edit,openrouter:generate,xai:default-generate,xai:default-edit"`
 - Optional auth behavior:
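The narrowing variables above are plain comma-separated lists. A wrapper script might narrow the sweep and split the list like this (the variable name is from the docs; the parsing itself is an illustrative sketch, not OpenClaw's test harness):

```shell
#!/usr/bin/env bash
# Narrow the live image-generation sweep to a single provider.
export OPENCLAW_LIVE_IMAGE_GENERATION_PROVIDERS="deepinfra"

# Split the comma-separated provider list, as a sweep wrapper might.
IFS=',' read -r -a providers <<< "$OPENCLAW_LIVE_IMAGE_GENERATION_PROVIDERS"
echo "${#providers[@]} provider(s): ${providers[*]}"
# → 1 provider(s): deepinfra
```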
@@ -551,7 +553,7 @@ image-generation runtime, and the live provider request.
   - `google` because the current shared Gemini/Veo lane uses local buffer-backed input and that path is not accepted in the shared sweep
   - `openai` because the current shared lane lacks org-specific video inpaint/remix access guarantees
 - Optional narrowing:
-  - `OPENCLAW_LIVE_VIDEO_GENERATION_PROVIDERS="google,openai,runway"`
+  - `OPENCLAW_LIVE_VIDEO_GENERATION_PROVIDERS="deepinfra,google,openai,runway"`
   - `OPENCLAW_LIVE_VIDEO_GENERATION_MODELS="google/veo-3.1-fast-generate-preview,openai/sora-2,runway/gen4_aleph"`
   - `OPENCLAW_LIVE_VIDEO_GENERATION_SKIP_PROVIDERS=""` to include every provider in the default sweep, including FAL
   - `OPENCLAW_LIVE_VIDEO_GENERATION_TIMEOUT_MS=60000` to reduce each provider operation cap for an aggressive smoke run
docs/providers/deepinfra.md (new file, 83 lines)
@@ -0,0 +1,83 @@
---
summary: "Use DeepInfra's unified API to access the most popular open source and frontier models in OpenClaw"
read_when:
  - You want a single API key for the top open source LLMs
  - You want to run models via DeepInfra's API in OpenClaw
---

# DeepInfra

DeepInfra provides a **unified API** that routes requests to the most popular open source and frontier models behind a single
endpoint and API key. It is OpenAI-compatible, so most OpenAI SDKs work by switching the base URL.

## Getting an API key

1. Go to [https://deepinfra.com/](https://deepinfra.com/)
2. Sign in or create an account
3. Navigate to Dashboard / Keys and generate a new API key, or use the auto-created one

## CLI setup

```bash
openclaw onboard --deepinfra-api-key <key>
```

Or set the environment variable:

```bash
export DEEPINFRA_API_KEY="<your-deepinfra-api-key>" # pragma: allowlist secret
```

## Config snippet

```json5
{
  env: { DEEPINFRA_API_KEY: "<your-deepinfra-api-key>" }, // pragma: allowlist secret
  agents: {
    defaults: {
      model: { primary: "deepinfra/deepseek-ai/DeepSeek-V3.2" },
    },
  },
}
```

## Supported OpenClaw surfaces

The bundled plugin registers all DeepInfra surfaces that match current
OpenClaw provider contracts:

| Surface                  | Default model                      | OpenClaw config/tool                                     |
| ------------------------ | ---------------------------------- | -------------------------------------------------------- |
| Chat / model provider    | `deepseek-ai/DeepSeek-V3.2`        | `agents.defaults.model`                                  |
| Image generation/editing | `black-forest-labs/FLUX-1-schnell` | `image_generate`, `agents.defaults.imageGenerationModel` |
| Media understanding      | `moonshotai/Kimi-K2.5` for images  | inbound image understanding                              |
| Speech-to-text           | `openai/whisper-large-v3-turbo`    | inbound audio transcription                              |
| Text-to-speech           | `hexgrad/Kokoro-82M`               | `messages.tts.provider: "deepinfra"`                     |
| Video generation         | `Pixverse/Pixverse-T2V`            | `video_generate`, `agents.defaults.videoGenerationModel` |
| Memory embeddings        | `BAAI/bge-m3`                      | `agents.defaults.memorySearch.provider: "deepinfra"`     |

DeepInfra also exposes reranking, classification, object-detection, and other
native model types. OpenClaw does not currently have first-class provider
contracts for those categories, so this plugin does not register them yet.

## Available models

OpenClaw dynamically discovers available DeepInfra models at startup. Use
`/models deepinfra` to see the full list of available models.

Any model available on [DeepInfra.com](https://deepinfra.com/) can be used with the `deepinfra/` prefix:

```
deepinfra/MiniMaxAI/MiniMax-M2.5
deepinfra/deepseek-ai/DeepSeek-V3.2
deepinfra/moonshotai/Kimi-K2.5
deepinfra/zai-org/GLM-5.1
...and many more
```

## Notes

- Model refs are `deepinfra/<provider>/<model>` (e.g., `deepinfra/Qwen/Qwen3-Max`).
- Default model: `deepinfra/deepseek-ai/DeepSeek-V3.2`
- Base URL: `https://api.deepinfra.com/v1/openai`
- Native video generation uses `https://api.deepinfra.com/v1/inference/<model>`.
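The `deepinfra/<provider>/<model>` ref shape above means only the first slash separates the OpenClaw provider prefix from the DeepInfra-side model ID, which itself contains a slash. A minimal illustrative helper (not the plugin's actual code):

```python
def split_model_ref(ref: str) -> tuple[str, str]:
    """Split an OpenClaw model ref into (openclaw_provider, deepinfra_model).

    Only the first "/" is the boundary; everything after it is the
    DeepInfra model ID, which may itself contain a "/".
    """
    provider, _, model = ref.partition("/")
    return provider, model

print(split_model_ref("deepinfra/deepseek-ai/DeepSeek-V3.2"))
# → ('deepinfra', 'deepseek-ai/DeepSeek-V3.2')
```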
@@ -31,6 +31,7 @@ model as `provider/model`.
 - [Chutes](/providers/chutes)
 - [ComfyUI](/providers/comfy)
 - [Cloudflare AI Gateway](/providers/cloudflare-ai-gateway)
+- [DeepInfra](/providers/deepinfra)
 - [fal](/providers/fal)
 - [Fireworks](/providers/fireworks)
 - [GLM models](/providers/glm)
@@ -84,8 +84,8 @@ See [Models](/providers/models) for pricing config and [Token use & costs](/refe

 Inbound media can be summarized/transcribed before the reply runs. This uses model/provider APIs.

-- Audio: OpenAI / Groq / Deepgram / Google / Mistral.
-- Image: OpenAI / OpenRouter / Anthropic / Google / MiniMax / Moonshot / Qwen / Z.AI.
+- Audio: OpenAI / Groq / Deepgram / DeepInfra / Google / Mistral.
+- Image: OpenAI / OpenRouter / Anthropic / DeepInfra / Google / MiniMax / Moonshot / Qwen / Z.AI.
 - Video: Google / Qwen / Moonshot.

 See [Media understanding](/nodes/media-understanding).
@@ -94,8 +94,8 @@ See [Media understanding](/nodes/media-understanding).

 Shared generation capabilities can also spend provider keys:

-- Image generation: OpenAI / Google / fal / MiniMax
-- Video generation: Qwen
+- Image generation: OpenAI / Google / DeepInfra / fal / MiniMax
+- Video generation: DeepInfra / Qwen

 Image generation can infer an auth-backed provider default when
 `agents.defaults.imageGenerationModel` is unset. Video generation currently
@@ -113,6 +113,7 @@ Semantic memory search uses **embedding APIs** when configured for remote provid
 - `memorySearch.provider = "gemini"` → Gemini embeddings
 - `memorySearch.provider = "voyage"` → Voyage embeddings
 - `memorySearch.provider = "mistral"` → Mistral embeddings
+- `memorySearch.provider = "deepinfra"` → DeepInfra embeddings
 - `memorySearch.provider = "lmstudio"` → LM Studio embeddings (local/self-hosted)
 - `memorySearch.provider = "ollama"` → Ollama embeddings (local/self-hosted; typically no hosted API billing)
 - Optional fallback to a remote provider if local embeddings fail
@@ -46,12 +46,12 @@ See [Active Memory](/concepts/active-memory) for the activation model, plugin-ow

 ## Provider selection

-| Key        | Type      | Default          | Description                                                                                                   |
-| ---------- | --------- | ---------------- | ------------------------------------------------------------------------------------------------------------- |
-| `provider` | `string`  | auto-detected    | Embedding adapter ID: `bedrock`, `gemini`, `github-copilot`, `local`, `mistral`, `ollama`, `openai`, `voyage`  |
-| `model`    | `string`  | provider default | Embedding model name                                                                                           |
-| `fallback` | `string`  | `"none"`         | Fallback adapter ID when the primary fails                                                                     |
-| `enabled`  | `boolean` | `true`           | Enable or disable memory search                                                                                |
+| Key        | Type      | Default          | Description                                                                                                                 |
+| ---------- | --------- | ---------------- | ---------------------------------------------------------------------------------------------------------------------------- |
+| `provider` | `string`  | auto-detected    | Embedding adapter ID: `bedrock`, `deepinfra`, `gemini`, `github-copilot`, `local`, `mistral`, `ollama`, `openai`, `voyage`    |
+| `model`    | `string`  | provider default | Embedding model name                                                                                                          |
+| `fallback` | `string`  | `"none"`         | Fallback adapter ID when the primary fails                                                                                    |
+| `enabled`  | `boolean` | `true`           | Enable or disable memory search                                                                                               |

 ### Auto-detection order
@@ -76,6 +76,9 @@ When `provider` is not set, OpenClaw selects the first available:
   <Step title="mistral">
     Selected if a Mistral key can be resolved.
   </Step>
+  <Step title="deepinfra">
+    Selected if a DeepInfra key can be resolved.
+  </Step>
   <Step title="bedrock">
     Selected if the AWS SDK credential chain resolves (instance role, access keys, profile, SSO, web identity, or shared config).
   </Step>
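The first-available selection these steps describe can be sketched as a loop over an ordered list of (provider, env var) pairs. The order and env var names below are from the embedding docs in this commit (Bedrock's credential-chain case is omitted for simplicity); the helper itself is illustrative, not OpenClaw's implementation:

```python
import os

# Auto-detection order for remote embedding providers, per the docs above.
DETECTION_ORDER = [
    ("openai", "OPENAI_API_KEY"),
    ("gemini", "GEMINI_API_KEY"),
    ("voyage", "VOYAGE_API_KEY"),
    ("mistral", "MISTRAL_API_KEY"),
    ("deepinfra", "DEEPINFRA_API_KEY"),
]

def autodetect_provider(env=None):
    """Return the first provider whose API key resolves, else None."""
    env = os.environ if env is None else env
    for provider, key in DETECTION_ORDER:
        if env.get(key):
            return provider
    return None
```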
@@ -87,15 +90,16 @@ When `provider` is not set, OpenClaw selects the first available:

 Remote embeddings require an API key. Bedrock uses the AWS SDK default credential chain instead (instance roles, SSO, access keys).

-| Provider       | Env var                                            | Config key                        |
-| -------------- | -------------------------------------------------- | --------------------------------- |
-| Bedrock        | AWS credential chain                               | No API key needed                 |
-| Gemini         | `GEMINI_API_KEY`                                   | `models.providers.google.apiKey`  |
-| GitHub Copilot | `COPILOT_GITHUB_TOKEN`, `GH_TOKEN`, `GITHUB_TOKEN` | Auth profile via device login     |
-| Mistral        | `MISTRAL_API_KEY`                                  | `models.providers.mistral.apiKey` |
-| Ollama         | `OLLAMA_API_KEY` (placeholder)                     | --                                |
-| OpenAI         | `OPENAI_API_KEY`                                   | `models.providers.openai.apiKey`  |
-| Voyage         | `VOYAGE_API_KEY`                                   | `models.providers.voyage.apiKey`  |
+| Provider       | Env var                                            | Config key                          |
+| -------------- | -------------------------------------------------- | ----------------------------------- |
+| Bedrock        | AWS credential chain                               | No API key needed                   |
+| DeepInfra      | `DEEPINFRA_API_KEY`                                | `models.providers.deepinfra.apiKey` |
+| Gemini         | `GEMINI_API_KEY`                                   | `models.providers.google.apiKey`    |
+| GitHub Copilot | `COPILOT_GITHUB_TOKEN`, `GH_TOKEN`, `GITHUB_TOKEN` | Auth profile via device login       |
+| Mistral        | `MISTRAL_API_KEY`                                  | `models.providers.mistral.apiKey`   |
+| Ollama         | `OLLAMA_API_KEY` (placeholder)                     | --                                  |
+| OpenAI         | `OPENAI_API_KEY`                                   | `models.providers.openai.apiKey`    |
+| Voyage         | `VOYAGE_API_KEY`                                   | `models.providers.voyage.apiKey`    |

 <Note>
 Codex OAuth covers chat/completions only and does not satisfy embedding requests.
@@ -1,5 +1,5 @@
 ---
-summary: "Generate and edit images via image_generate across OpenAI, Google, fal, MiniMax, ComfyUI, OpenRouter, LiteLLM, xAI, Vydra"
+summary: "Generate and edit images via image_generate across OpenAI, Google, fal, MiniMax, ComfyUI, DeepInfra, OpenRouter, LiteLLM, xAI, Vydra"
 read_when:
   - Generating or editing images via the agent
   - Configuring image-generation providers and models
@@ -71,6 +71,7 @@ internal image endpoints remain blocked by default.
 | OpenAI image generation with API billing             | `openai/gpt-image-2`                               | `OPENAI_API_KEY`                       |
 | OpenAI image generation with Codex subscription auth | `openai/gpt-image-2`                               | OpenAI Codex OAuth                     |
 | OpenAI transparent-background PNG/WebP               | `openai/gpt-image-1.5`                             | `OPENAI_API_KEY` or OpenAI Codex OAuth |
+| DeepInfra image generation                           | `deepinfra/black-forest-labs/FLUX-1-schnell`       | `DEEPINFRA_API_KEY`                    |
 | OpenRouter image generation                          | `openrouter/google/gemini-3.1-flash-image-preview` | `OPENROUTER_API_KEY`                   |
 | LiteLLM image generation                             | `litellm/gpt-image-2`                              | `LITELLM_API_KEY`                      |
 | Google Gemini image generation                       | `google/gemini-3.1-flash-image-preview`            | `GEMINI_API_KEY` or `GOOGLE_API_KEY`   |
@@ -88,6 +89,7 @@ backend emits it.
 | Provider   | Default model                           | Edit support                       | Auth                                                  |
 | ---------- | --------------------------------------- | ---------------------------------- | ----------------------------------------------------- |
 | ComfyUI    | `workflow`                              | Yes (1 image, workflow-configured) | `COMFY_API_KEY` or `COMFY_CLOUD_API_KEY` for cloud    |
+| DeepInfra  | `black-forest-labs/FLUX-1-schnell`      | Yes (1 image)                      | `DEEPINFRA_API_KEY`                                   |
 | fal        | `fal-ai/flux/dev`                       | Yes                                | `FAL_KEY`                                             |
 | Google     | `gemini-3.1-flash-image-preview`        | Yes                                | `GEMINI_API_KEY` or `GOOGLE_API_KEY`                  |
 | LiteLLM    | `gpt-image-2`                           | Yes (up to 5 input images)         | `LITELLM_API_KEY`                                     |
@@ -105,13 +107,13 @@ Use `action: "list"` to inspect available providers and models at runtime:

 ## Provider capabilities

-| Capability            | ComfyUI            | fal               | Google         | MiniMax               | OpenAI         | Vydra | xAI            |
-| --------------------- | ------------------ | ----------------- | -------------- | --------------------- | -------------- | ----- | -------------- |
-| Generate (max count)  | Workflow-defined   | 4                 | 4              | 9                     | 4              | 1     | 4              |
-| Edit / reference      | 1 image (workflow) | 1 image           | Up to 5 images | 1 image (subject ref) | Up to 5 images | —     | Up to 5 images |
-| Size control          | —                  | ✓                 | ✓              | —                     | Up to 4K       | —     | —              |
-| Aspect ratio          | —                  | ✓ (generate only) | ✓              | ✓                     | —              | —     | ✓              |
-| Resolution (1K/2K/4K) | —                  | ✓                 | ✓              | —                     | —              | —     | 1K, 2K         |
+| Capability            | ComfyUI            | DeepInfra | fal               | Google         | MiniMax               | OpenAI         | Vydra | xAI            |
+| --------------------- | ------------------ | --------- | ----------------- | -------------- | --------------------- | -------------- | ----- | -------------- |
+| Generate (max count)  | Workflow-defined   | 4         | 4                 | 4              | 9                     | 4              | 1     | 4              |
+| Edit / reference      | 1 image (workflow) | 1 image   | 1 image           | Up to 5 images | 1 image (subject ref) | Up to 5 images | —     | Up to 5 images |
+| Size control          | —                  | ✓         | ✓                 | ✓              | —                     | Up to 4K       | —     | —              |
+| Aspect ratio          | —                  | —         | ✓ (generate only) | ✓              | ✓                     | —              | —     | ✓              |
+| Resolution (1K/2K/4K) | —                  | —         | ✓                 | ✓              | —                     | —              | —     | 1K, 2K         |

 ## Tool parameters
@@ -226,7 +228,7 @@ from each attempt.

 ### Image editing

-OpenAI, OpenRouter, Google, fal, MiniMax, ComfyUI, and xAI support editing
+OpenAI, OpenRouter, Google, DeepInfra, fal, MiniMax, ComfyUI, and xAI support editing
 reference images. Pass a reference image path or URL:

 ```text
@@ -50,6 +50,7 @@ provider is configured.
 | Alibaba    |   | ✓ |   |   |   |   |   |
 | BytePlus   |   | ✓ |   |   |   |   |   |
 | ComfyUI    | ✓ | ✓ | ✓ |   |   |   |   |
+| DeepInfra  | ✓ | ✓ |   | ✓ | ✓ |   | ✓ |
 | Deepgram   |   |   |   |   | ✓ | ✓ |   |
 | ElevenLabs |   |   |   | ✓ | ✓ |   |   |
 | fal        | ✓ | ✓ |   |   |   |   |   |
@@ -94,7 +95,7 @@ original channel.

 ## Speech-to-text and Voice Call

-Deepgram, ElevenLabs, Mistral, OpenAI, SenseAudio, and xAI can all transcribe
+Deepgram, DeepInfra, ElevenLabs, Mistral, OpenAI, SenseAudio, and xAI can all transcribe
 inbound audio through the batch `tools.media.audio` path when configured.
 Channel plugins that preflight a voice note for mention gating or command
 parsing mark the transcribed attachment on the inbound context, so the shared
@@ -116,6 +117,13 @@ vendor without waiting for a completed recording.
     Image, video, batch TTS, batch STT, Voice Call streaming STT, backend
     realtime voice, and memory-embedding surfaces.
   </Accordion>
+  <Accordion title="DeepInfra">
+    Chat/model routing, image generation/editing, text-to-video, batch TTS,
+    batch STT, image media understanding, and memory-embedding surfaces.
+    DeepInfra-native rerank/classification/object-detection models are not
+    registered until OpenClaw has dedicated provider contracts for those
+    categories.
+  </Accordion>
   <Accordion title="xAI">
     Image, video, search, code-execution, batch TTS, batch STT, and Voice
     Call streaming STT. xAI Realtime voice is an upstream capability but is
@@ -8,7 +8,7 @@ title: "Text-to-speech"
 sidebarTitle: "Text to speech (TTS)"
 ---

-OpenClaw can convert outbound replies into audio across **13 speech providers**
+OpenClaw can convert outbound replies into audio across **14 speech providers**
 and deliver native voice messages on Feishu, Matrix, Telegram, and WhatsApp,
 audio attachments everywhere else, and PCM/Ulaw streams for telephony and Talk.
@@ -55,6 +55,7 @@ OpenClaw picks the first configured provider in registry auto-select order.
 | Provider          | Auth                                                                                                               | Notes                                                                    |
 | ----------------- | ------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------ |
 | **Azure Speech**  | `AZURE_SPEECH_KEY` + `AZURE_SPEECH_REGION` (also `AZURE_SPEECH_API_KEY`, `SPEECH_KEY`, `SPEECH_REGION`)             | Native Ogg/Opus voice-note output and telephony.                         |
+| **DeepInfra**     | `DEEPINFRA_API_KEY`                                                                                                  | OpenAI-compatible TTS. Defaults to `hexgrad/Kokoro-82M`.                 |
 | **ElevenLabs**    | `ELEVENLABS_API_KEY` or `XI_API_KEY`                                                                                 | Voice cloning, multilingual, deterministic via `seed`.                   |
 | **Google Gemini** | `GEMINI_API_KEY` or `GOOGLE_API_KEY`                                                                                 | Gemini API TTS; persona-aware via `promptTemplate: "audio-profile-v1"`.  |
 | **Gradium**       | `GRADIUM_API_KEY`                                                                                                    | Voice-note and telephony output.                                         |
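Selecting DeepInfra TTS explicitly might look like this sketch (the `messages.tts.provider` key comes from the DeepInfra surfaces table in this commit; the exact nesting beyond that is an assumption):

```json5
{
  env: { DEEPINFRA_API_KEY: "<your-deepinfra-api-key>" }, // pragma: allowlist secret
  messages: {
    tts: {
      provider: "deepinfra", // defaults to hexgrad/Kokoro-82M
    },
  },
}
```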
@@ -9,7 +9,7 @@ sidebarTitle: "Video generation"
 ---

 OpenClaw agents can generate videos from text prompts, reference images, or
-existing videos. Fourteen provider backends are supported, each with
+existing videos. Fifteen provider backends are supported, each with
 different model options, input modes, and feature sets. The agent picks the
 right provider automatically based on your configuration and available API
 keys.
@@ -111,6 +111,7 @@ generation.
 | BytePlus Seedance 1.5 | `seedance-1-5-pro-251215`       | ✓ | Up to 2 images (first + last frame via role)      | —                                               | `BYTEPLUS_API_KEY`                       |
 | BytePlus Seedance 2.0 | `dreamina-seedance-2-0-260128`  | ✓ | Up to 9 reference images                          | Up to 3 videos                                  | `BYTEPLUS_API_KEY`                       |
 | ComfyUI               | `workflow`                      | ✓ | 1 image                                           | —                                               | `COMFY_API_KEY` or `COMFY_CLOUD_API_KEY` |
+| DeepInfra             | `Pixverse/Pixverse-T2V`         | ✓ | —                                                 | —                                               | `DEEPINFRA_API_KEY`                      |
 | fal                   | `fal-ai/minimax/video-01-live`  | ✓ | 1 image; up to 9 with Seedance reference-to-video | Up to 3 videos with Seedance reference-to-video | `FAL_KEY`                                |
 | Google                | `veo-3.1-fast-generate-preview` | ✓ | 1 image                                           | 1 video                                         | `GEMINI_API_KEY`                         |
 | MiniMax               | `MiniMax-Hailuo-2.3`            | ✓ | 1 image                                           | —                                               | `MINIMAX_API_KEY` or MiniMax OAuth       |
@@ -132,20 +133,21 @@ runtime modes at runtime.
 The explicit mode contract used by `video_generate`, contract tests, and
 the shared live sweep:

-| Provider | `generate` | `imageToVideo` | `videoToVideo` | Shared live lanes today |
-| -------- | :--------: | :------------: | :------------: | ----------------------- |
-| Alibaba  | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider needs remote `http(s)` video URLs |
-| BytePlus | ✓ | ✓ | — | `generate`, `imageToVideo` |
-| ComfyUI  | ✓ | ✓ | — | Not in the shared sweep; workflow-specific coverage lives with Comfy tests |
-| fal      | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` only when using Seedance reference-to-video |
-| Google   | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; shared `videoToVideo` skipped because the current buffer-backed Gemini/Veo sweep does not accept that input |
-| MiniMax  | ✓ | ✓ | — | `generate`, `imageToVideo` |
-| OpenAI   | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; shared `videoToVideo` skipped because this org/input path currently needs provider-side inpaint/remix access |
-| Qwen     | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider needs remote `http(s)` video URLs |
-| Runway   | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` runs only when the selected model is `runway/gen4_aleph` |
-| Together | ✓ | ✓ | — | `generate`, `imageToVideo` |
-| Vydra    | ✓ | ✓ | — | `generate`; shared `imageToVideo` skipped because bundled `veo3` is text-only and bundled `kling` requires a remote image URL |
-| xAI      | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider currently needs a remote MP4 URL |
+| Provider  | `generate` | `imageToVideo` | `videoToVideo` | Shared live lanes today |
+| --------- | :--------: | :------------: | :------------: | ----------------------- |
+| Alibaba   | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider needs remote `http(s)` video URLs |
+| BytePlus  | ✓ | ✓ | — | `generate`, `imageToVideo` |
+| ComfyUI   | ✓ | ✓ | — | Not in the shared sweep; workflow-specific coverage lives with Comfy tests |
+| DeepInfra | ✓ | — | — | `generate`; native DeepInfra video schemas are text-to-video in the bundled contract |
+| fal       | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` only when using Seedance reference-to-video |
+| Google    | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; shared `videoToVideo` skipped because the current buffer-backed Gemini/Veo sweep does not accept that input |
+| MiniMax   | ✓ | ✓ | — | `generate`, `imageToVideo` |
+| OpenAI    | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; shared `videoToVideo` skipped because this org/input path currently needs provider-side inpaint/remix access |
+| Qwen      | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider needs remote `http(s)` video URLs |
+| Runway    | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` runs only when the selected model is `runway/gen4_aleph` |
+| Together  | ✓ | ✓ | — | `generate`, `imageToVideo` |
+| Vydra     | ✓ | ✓ | — | `generate`; shared `imageToVideo` skipped because bundled `veo3` is text-only and bundled `kling` requires a remote image URL |
+| xAI       | ✓ | ✓ | ✓ | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider currently needs a remote MP4 URL |

 ## Tool parameters