---
summary: "Use ElevenLabs speech, Scribe STT, and realtime transcription with OpenClaw"
read_when:
- You want ElevenLabs text-to-speech in OpenClaw
- You want ElevenLabs Scribe speech-to-text for audio attachments
- You want ElevenLabs realtime transcription for Voice Call or Google Meet
title: "ElevenLabs"
---
OpenClaw uses ElevenLabs for text-to-speech (TTS), batch speech-to-text (STT)
with Scribe v2, and streaming STT with Scribe v2 Realtime.

| Capability               | OpenClaw surface                                                     | Default model            |
| ------------------------ | -------------------------------------------------------------------- | ------------------------ |
| Text-to-speech           | `messages.tts` / `talk`                                              | `eleven_multilingual_v2` |
| Batch speech-to-text     | `tools.media.audio`                                                  | `scribe_v2`              |
| Streaming speech-to-text | Voice Call streaming or Google Meet `realtime.transcriptionProvider` | `scribe_v2_realtime`     |

## Authentication

Set `ELEVENLABS_API_KEY` in the environment. `XI_API_KEY` is also accepted for
compatibility with existing ElevenLabs tooling.

```bash
export ELEVENLABS_API_KEY="..."
```

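
Scripts that run alongside OpenClaw can apply the same lookup. The sketch below assumes that `ELEVENLABS_API_KEY` takes precedence when both variables are set and that empty values count as unset; `resolveElevenLabsApiKey` is a hypothetical helper, not an OpenClaw export.

```typescript
// Hypothetical helper that mirrors the documented variables; not OpenClaw's code.
// Assumptions: ELEVENLABS_API_KEY wins over XI_API_KEY, and blank values count as unset.
type Env = Record<string, string | undefined>;

function resolveElevenLabsApiKey(env: Env): string {
  const candidates = [env.ELEVENLABS_API_KEY, env.XI_API_KEY];
  for (const value of candidates) {
    const trimmed = value?.trim();
    if (trimmed) {
      return trimmed;
    }
  }
  throw new Error("Set ELEVENLABS_API_KEY (or XI_API_KEY) to use ElevenLabs.");
}
```

In Node.js, call it as `resolveElevenLabsApiKey(process.env)`.
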
## Text-to-speech

Configure the voice and model under `messages.tts.providers.elevenlabs`:

```json5
{
  messages: {
    tts: {
      providers: {
        elevenlabs: {
          apiKey: "${ELEVENLABS_API_KEY}",
          voiceId: "pMsXgVXv3BLzUgSXRplE",
          modelId: "eleven_multilingual_v2",
        },
      },
    },
  },
}
```

Set `modelId` to `eleven_v3` to use ElevenLabs v3 TTS. OpenClaw keeps
`eleven_multilingual_v2` as the default for existing installs.

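
To check a `voiceId` and `modelId` pair before putting it in OpenClaw config, you can call the public ElevenLabs text-to-speech endpoint (`/v1/text-to-speech/` followed by the voice ID) directly with an `xi-api-key` header. This is a sketch of a standalone request, not how OpenClaw issues requests internally; `buildTtsRequest` is a hypothetical name.

```typescript
// Standalone request builder for the public ElevenLabs TTS REST endpoint.
// Not OpenClaw's internal client; useful only for checking a voice/model pair.
interface TtsRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildTtsRequest(
  apiKey: string,
  voiceId: string,
  text: string,
  modelId = "eleven_multilingual_v2", // same default model OpenClaw uses
): TtsRequest {
  return {
    url: `https://api.elevenlabs.io/v1/text-to-speech/${encodeURIComponent(voiceId)}`,
    init: {
      method: "POST",
      headers: { "xi-api-key": apiKey, "Content-Type": "application/json" },
      // The REST API expects snake_case field names.
      body: JSON.stringify({ text, model_id: modelId }),
    },
  };
}
```

Send it with `fetch(url, init)`; the response body is the synthesized audio. Passing `"eleven_v3"` as `modelId` tries the v3 model with the same voice.
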
## Speech-to-text

Use Scribe v2 for inbound audio attachments and short recorded voice segments:

```json5
{
  tools: {
    media: {
      audio: {
        enabled: true,
        models: [{ provider: "elevenlabs", model: "scribe_v2" }],
      },
    },
  },
}
```

OpenClaw uploads the audio as a multipart request to the ElevenLabs
`/v1/speech-to-text` endpoint with `model_id` set to `"scribe_v2"`. When a
language hint is available, it is sent as `language_code`.

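
The request shape described above can be sketched with the `FormData` and `Blob` globals available in Node.js 18 and later. The field names `model_id`, `file`, and `language_code` follow the public ElevenLabs speech-to-text API; the helper name `buildScribeForm` and the `audio` filename are illustrative assumptions, not OpenClaw internals.

```typescript
// Sketch of the Scribe v2 batch request body (requires Node.js 18+ globals).
// Helper name and filename are hypothetical; field names follow the ElevenLabs API.
function buildScribeForm(audio: Blob, languageHint?: string): FormData {
  const form = new FormData();
  form.append("model_id", "scribe_v2");
  form.append("file", audio, "audio");
  // Only send language_code when a hint exists; without it, the API detects the language.
  if (languageHint) {
    form.append("language_code", languageHint);
  }
  return form;
}
```

Post the form to `https://api.elevenlabs.io/v1/speech-to-text` with an `xi-api-key` header and leave `Content-Type` unset, so `fetch` can add the multipart boundary itself.
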
## Streaming STT

The bundled `elevenlabs` plugin registers Scribe v2 Realtime as a streaming
transcription provider for Voice Call and for Google Meet agent mode. In the
table below, paths starting with `...` are relative to
`plugins.entries.voice-call.config.streaming.providers`.

| Setting         | Config path                                                               | Default                                           |
| --------------- | ------------------------------------------------------------------------- | ------------------------------------------------- |
| API key         | `plugins.entries.voice-call.config.streaming.providers.elevenlabs.apiKey` | Falls back to `ELEVENLABS_API_KEY` / `XI_API_KEY` |
| Model           | `...elevenlabs.modelId`                                                   | `scribe_v2_realtime`                              |
| Audio format    | `...elevenlabs.audioFormat`                                               | `ulaw_8000`                                       |
| Sample rate     | `...elevenlabs.sampleRate`                                                | `8000`                                            |
| Commit strategy | `...elevenlabs.commitStrategy`                                            | `vad`                                             |
| Language        | `...elevenlabs.languageCode`                                              | (unset)                                           |

```json5
{
  plugins: {
    entries: {
      "voice-call": {
        config: {
          streaming: {
            enabled: true,
            provider: "elevenlabs",
            providers: {
              elevenlabs: {
                apiKey: "${ELEVENLABS_API_KEY}",
                audioFormat: "ulaw_8000",
                commitStrategy: "vad",
                languageCode: "en",
              },
            },
          },
        },
      },
    },
  },
}
```

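
How the table's defaults combine with a provider block like this one can be sketched as a per-field fallback. `resolveStreamingDefaults` and its types are hypothetical; the API key's environment fallback (covered under Authentication) is left out, and OpenClaw's real config loader may validate more than this.

```typescript
// Hypothetical per-field fallback that restates the defaults table; not OpenClaw's loader.
interface ElevenLabsStreamingInput {
  modelId?: string;
  audioFormat?: string;
  sampleRate?: number;
  commitStrategy?: string;
  languageCode?: string;
}

interface ElevenLabsStreamingResolved {
  modelId: string;
  audioFormat: string;
  sampleRate: number;
  commitStrategy: string;
  languageCode?: string; // no default: stays undefined unless configured
}

function resolveStreamingDefaults(input: ElevenLabsStreamingInput): ElevenLabsStreamingResolved {
  return {
    modelId: input.modelId ?? "scribe_v2_realtime",
    audioFormat: input.audioFormat ?? "ulaw_8000",
    sampleRate: input.sampleRate ?? 8000,
    commitStrategy: input.commitStrategy ?? "vad",
    languageCode: input.languageCode,
  };
}
```

For the example block, the result keeps `languageCode: "en"` and fills in `modelId: "scribe_v2_realtime"` and `sampleRate: 8000`.
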
<Note>
Voice Call receives Twilio media as 8 kHz G.711 u-law. The ElevenLabs realtime
provider defaults to `ulaw_8000`, so telephony frames can be forwarded without
transcoding.
</Note>
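
To see what the `ulaw_8000` default saves, consider a provider that expected 16-bit linear PCM instead: every u-law byte would first have to be expanded. The sketch below is standard G.711 u-law decoding, shown for illustration only; it is not code from OpenClaw or the ElevenLabs plugin.

```typescript
// Standard G.711 u-law expansion: one 8-bit code word -> one 16-bit linear sample.
// Illustrates the transcoding step that a ulaw_8000 input format makes unnecessary.
const MULAW_BIAS = 0x84; // 132, the bias added during u-law encoding

function mulawToLinear(codeWord: number): number {
  const u = ~codeWord & 0xff; // u-law code words are stored bit-inverted
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const magnitude = (((mantissa << 3) + MULAW_BIAS) << exponent) - MULAW_BIAS;
  return sign !== 0 ? 0 - magnitude : magnitude;
}

// Decode a whole frame; the output needs 2 bytes per sample instead of 1.
function decodeMulawFrame(frame: Uint8Array): Int16Array {
  const pcm = new Int16Array(frame.length);
  for (let i = 0; i < frame.length; i++) {
    pcm[i] = mulawToLinear(frame[i]);
  }
  return pcm;
}
```

At 8 kHz and one byte per sample, a 20 ms frame (an illustrative size) carries 160 bytes of u-law audio; decoded to PCM16 it grows to 320 bytes, on top of the per-sample decoding work.
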

For Google Meet agent mode, set
`plugins.entries.google-meet.config.realtime.transcriptionProvider` to
`"elevenlabs"` and put the same ElevenLabs provider settings (`apiKey`,
`modelId`, `commitStrategy`, and so on) under
`plugins.entries.google-meet.config.realtime.providers.elevenlabs`.

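
The two Google Meet settings have to agree: the value of `transcriptionProvider` names a key under `providers`. A minimal sketch of that lookup follows, with hypothetical types and function names and a config object that mirrors `plugins.entries.google-meet.config.realtime`.

```typescript
// Hypothetical lookup showing why transcriptionProvider must match a providers key.
// Types and names are illustrative assumptions, not OpenClaw's API.
interface RealtimeConfig {
  transcriptionProvider?: string;
  providers?: Record<string, Record<string, unknown>>;
}

function selectTranscriptionProvider(
  realtime: RealtimeConfig,
): { name: string; settings: Record<string, unknown> } {
  const name = realtime.transcriptionProvider;
  if (!name) {
    throw new Error("realtime.transcriptionProvider is not set");
  }
  const settings = realtime.providers?.[name];
  if (!settings) {
    throw new Error(`realtime.providers.${name} is missing`);
  }
  return { name, settings };
}

// Mirrors plugins.entries.google-meet.config.realtime (the key placeholder is a literal string here).
const googleMeetRealtime: RealtimeConfig = {
  transcriptionProvider: "elevenlabs",
  providers: {
    elevenlabs: { apiKey: "${ELEVENLABS_API_KEY}", commitStrategy: "vad", languageCode: "en" },
  },
};
```

This sketch fails fast when the names disagree; OpenClaw's own error handling may differ.
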

## Related

- [Text-to-speech](/tools/tts)
- [Google Meet](/plugins/google-meet)
- [Model selection](/concepts/model-providers)