mirror of
https://github.com/openclaw/openclaw.git
synced 2026-05-06 16:01:01 +00:00
124 lines
4.3 KiB
Markdown
124 lines
4.3 KiB
Markdown
---
|
|
summary: "Use ElevenLabs speech, Scribe STT, and realtime transcription with OpenClaw"
|
|
read_when:
|
|
- You want ElevenLabs text-to-speech in OpenClaw
|
|
- You want ElevenLabs Scribe speech-to-text for audio attachments
|
|
- You want ElevenLabs realtime transcription for Voice Call or Google Meet
|
|
title: "ElevenLabs"
|
|
---
|
|
|
|
OpenClaw uses ElevenLabs for text-to-speech, batch speech-to-text with Scribe
|
|
v2, and streaming STT with Scribe v2 Realtime.
|
|
|
|
| Capability | OpenClaw surface | Default |
|
|
| ------------------------ | -------------------------------------------------------------------- | ------------------------ |
|
|
| Text-to-speech | `messages.tts` / `talk` | `eleven_multilingual_v2` |
|
|
| Batch speech-to-text | `tools.media.audio` | `scribe_v2` |
|
|
| Streaming speech-to-text | Voice Call streaming or Google Meet `realtime.transcriptionProvider` | `scribe_v2_realtime` |
|
|
|
|
## Authentication
|
|
|
|
Set `ELEVENLABS_API_KEY` in the environment. `XI_API_KEY` is also accepted for
|
|
compatibility with existing ElevenLabs tooling.
|
|
|
|
```bash
|
|
export ELEVENLABS_API_KEY="..."
|
|
```
|
|
|
|
## Text-to-speech
|
|
|
|
```json5
|
|
{
|
|
messages: {
|
|
tts: {
|
|
providers: {
|
|
elevenlabs: {
|
|
apiKey: "${ELEVENLABS_API_KEY}",
|
|
voiceId: "pMsXgVXv3BLzUgSXRplE",
|
|
modelId: "eleven_multilingual_v2",
|
|
},
|
|
},
|
|
},
|
|
},
|
|
}
|
|
```
|
|
|
|
Set `modelId` to `eleven_v3` to use ElevenLabs v3 TTS. OpenClaw keeps
|
|
`eleven_multilingual_v2` as the default for existing installs.
|
|
|
|
## Speech-to-text
|
|
|
|
Use Scribe v2 for inbound audio attachments and short recorded voice segments:
|
|
|
|
```json5
|
|
{
|
|
tools: {
|
|
media: {
|
|
audio: {
|
|
enabled: true,
|
|
models: [{ provider: "elevenlabs", model: "scribe_v2" }],
|
|
},
|
|
},
|
|
},
|
|
}
|
|
```
|
|
|
|
OpenClaw sends multipart audio to ElevenLabs `/v1/speech-to-text` with
|
|
`model_id: "scribe_v2"`. Language hints map to `language_code` when present.
|
|
|
|
## Streaming STT
|
|
|
|
The bundled `elevenlabs` plugin registers Scribe v2 Realtime for Voice Call and
|
|
Google Meet agent-mode streaming transcription.
|
|
|
|
| Setting | Config path | Default |
|
|
| --------------- | ------------------------------------------------------------------------- | ------------------------------------------------- |
|
|
| API key | `plugins.entries.voice-call.config.streaming.providers.elevenlabs.apiKey` | Falls back to `ELEVENLABS_API_KEY` / `XI_API_KEY` |
|
|
| Model | `...elevenlabs.modelId` | `scribe_v2_realtime` |
|
|
| Audio format | `...elevenlabs.audioFormat` | `ulaw_8000` |
|
|
| Sample rate | `...elevenlabs.sampleRate` | `8000` |
|
|
| Commit strategy | `...elevenlabs.commitStrategy` | `vad` |
|
|
| Language | `...elevenlabs.languageCode` | (unset) |
|
|
|
|
```json5
|
|
{
|
|
plugins: {
|
|
entries: {
|
|
"voice-call": {
|
|
config: {
|
|
streaming: {
|
|
enabled: true,
|
|
provider: "elevenlabs",
|
|
providers: {
|
|
elevenlabs: {
|
|
apiKey: "${ELEVENLABS_API_KEY}",
|
|
audioFormat: "ulaw_8000",
|
|
commitStrategy: "vad",
|
|
languageCode: "en",
|
|
},
|
|
},
|
|
},
|
|
},
|
|
},
|
|
},
|
|
},
|
|
}
|
|
```
|
|
|
|
<Note>
|
|
Voice Call receives Twilio media as 8 kHz G.711 u-law. The ElevenLabs realtime
|
|
provider defaults to `ulaw_8000`, so telephony frames can be forwarded without
|
|
transcoding.
|
|
</Note>
|
|
|
|
For Google Meet agent mode, set
|
|
`plugins.entries.google-meet.config.realtime.transcriptionProvider` to
|
|
`"elevenlabs"` and configure the same provider block under
|
|
`plugins.entries.google-meet.config.realtime.providers.elevenlabs`.
|
|
|
|
## Related
|
|
|
|
- [Text-to-speech](/tools/tts)
|
|
- [Google Meet](/plugins/google-meet)
|
|
- [Model selection](/concepts/model-providers)
|