diff --git a/docs/plugins/google-meet.md b/docs/plugins/google-meet.md
index e5364f83ab5..9e5cd6d1351 100644
--- a/docs/plugins/google-meet.md
+++ b/docs/plugins/google-meet.md
@@ -1131,6 +1131,50 @@ Optional overrides:
 }
 ```
 
+ElevenLabs for both agent-mode listening and speaking:
+
+```json5
+{
+  messages: {
+    tts: {
+      provider: "elevenlabs",
+      providers: {
+        elevenlabs: {
+          modelId: "eleven_v3",
+          voiceId: "pMsXgVXv3BLzUgSXRplE",
+        },
+      },
+    },
+  },
+  plugins: {
+    entries: {
+      "google-meet": {
+        config: {
+          realtime: {
+            transcriptionProvider: "elevenlabs",
+            providers: {
+              elevenlabs: {
+                modelId: "scribe_v2_realtime",
+                audioFormat: "ulaw_8000",
+                sampleRate: 8000,
+                commitStrategy: "vad",
+              },
+            },
+          },
+        },
+      },
+    },
+  },
+}
+```
+
+The persistent Meet voice comes from
+`messages.tts.providers.elevenlabs.voiceId`. Agent replies can also use
+per-reply `[[tts:voiceId=... model=eleven_v3]]` directives when TTS model
+overrides are enabled, but config is the deterministic default for meetings.
+On join, the logs should show `transcriptionProvider=elevenlabs` and each
+spoken reply should log `provider=elevenlabs model=eleven_v3 voice=`.
+
 Twilio-only config:
 
 ```json5
diff --git a/docs/providers/elevenlabs.md b/docs/providers/elevenlabs.md
index 14a2d29b60f..1157bf72405 100644
--- a/docs/providers/elevenlabs.md
+++ b/docs/providers/elevenlabs.md
@@ -3,18 +3,18 @@ summary: "Use ElevenLabs speech, Scribe STT, and realtime transcription with Ope
 read_when:
   - You want ElevenLabs text-to-speech in OpenClaw
   - You want ElevenLabs Scribe speech-to-text for audio attachments
-  - You want ElevenLabs realtime transcription for Voice Call
+  - You want ElevenLabs realtime transcription for Voice Call or Google Meet
 title: "ElevenLabs"
 ---
 
 OpenClaw uses ElevenLabs for text-to-speech, batch speech-to-text with Scribe
-v2, and Voice Call streaming STT with Scribe v2 Realtime.
+v2, and streaming STT with Scribe v2 Realtime.
 
-| Capability               | OpenClaw surface                              | Default                  |
-| ------------------------ | --------------------------------------------- | ------------------------ |
-| Text-to-speech           | `messages.tts` / `talk`                       | `eleven_multilingual_v2` |
-| Batch speech-to-text     | `tools.media.audio`                           | `scribe_v2`              |
-| Streaming speech-to-text | Voice Call `streaming.provider: "elevenlabs"` | `scribe_v2_realtime`     |
+| Capability               | OpenClaw surface                                                     | Default                  |
+| ------------------------ | -------------------------------------------------------------------- | ------------------------ |
+| Text-to-speech           | `messages.tts` / `talk`                                              | `eleven_multilingual_v2` |
+| Batch speech-to-text     | `tools.media.audio`                                                  | `scribe_v2`              |
+| Streaming speech-to-text | Voice Call streaming or Google Meet `realtime.transcriptionProvider` | `scribe_v2_realtime`     |
 
 ## Authentication
 
@@ -66,10 +66,10 @@ Use Scribe v2 for inbound audio attachments and short recorded voice segments:
 OpenClaw sends multipart audio to ElevenLabs `/v1/speech-to-text` with
 `model_id: "scribe_v2"`. Language hints map to `language_code` when present.
 
-## Voice Call streaming STT
+## Streaming STT
 
-The bundled `elevenlabs` plugin registers Scribe v2 Realtime for Voice Call
-streaming transcription.
+The bundled `elevenlabs` plugin registers Scribe v2 Realtime for Voice Call and
+Google Meet agent-mode streaming transcription.
 
 | Setting         | Config path                                                               | Default                                           |
 | --------------- | ------------------------------------------------------------------------- | ------------------------------------------------- |
@@ -111,7 +111,13 @@
 provider defaults to `ulaw_8000`, so telephony frames can be forwarded without
 transcoding.
 
+For Google Meet agent mode, set
+`plugins.entries.google-meet.config.realtime.transcriptionProvider` to
+`"elevenlabs"` and configure the same provider block under
+`plugins.entries.google-meet.config.realtime.providers.elevenlabs`.
+
 ## Related
 
 - [Text-to-speech](/tools/tts)
+- [Google Meet](/plugins/google-meet)
 - [Model selection](/concepts/model-providers)
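
Not part of the patch: the Google Meet example in this change sets ElevenLabs in two separate places (`messages.tts.provider` for speaking, `realtime.transcriptionProvider` for listening) and pairs `audioFormat: "ulaw_8000"` with `sampleRate: 8000`. A minimal Python sketch of a pre-flight check for those three points might look like the following. The helper names `get_path` and `check_meet_elevenlabs` are hypothetical and do not exist in OpenClaw, the config is assumed to be already parsed into a plain dict, and the rule that the numeric suffix of `audioFormat` is the sample rate in Hz is an assumption drawn only from the example values.

```python
from typing import Any


def get_path(config: dict[str, Any], path: list[str]) -> Any:
    """Walk nested dicts along `path`; return None if any key is missing."""
    node: Any = config
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def check_meet_elevenlabs(config: dict[str, Any]) -> list[str]:
    """Hypothetical helper: list problems found; an empty list means all checks passed."""
    problems: list[str] = []
    realtime = ["plugins", "entries", "google-meet", "config", "realtime"]

    # Speaking side: TTS provider selection from the example config.
    if get_path(config, ["messages", "tts", "provider"]) != "elevenlabs":
        problems.append("messages.tts.provider is not 'elevenlabs'")

    # Listening side: realtime transcription provider for Google Meet.
    if get_path(config, realtime + ["transcriptionProvider"]) != "elevenlabs":
        problems.append("realtime.transcriptionProvider is not 'elevenlabs'")

    stt = get_path(config, realtime + ["providers", "elevenlabs"])
    if not isinstance(stt, dict):
        stt = {}
    audio_format = stt.get("audioFormat")
    sample_rate = stt.get("sampleRate")
    # Assumption: the numeric suffix of audioFormat (e.g. "ulaw_8000") is the rate in Hz.
    if audio_format is not None and sample_rate is not None:
        suffix = str(audio_format).rsplit("_", 1)[-1]
        if suffix.isdigit() and int(suffix) != sample_rate:
            problems.append(
                f"sampleRate {sample_rate} does not match audioFormat {audio_format}"
            )
    return problems


if __name__ == "__main__":
    example = {
        "messages": {"tts": {"provider": "elevenlabs"}},
        "plugins": {"entries": {"google-meet": {"config": {"realtime": {
            "transcriptionProvider": "elevenlabs",
            "providers": {"elevenlabs": {
                "audioFormat": "ulaw_8000",
                "sampleRate": 8000,
            }},
        }}}}},
    }
    for problem in check_meet_elevenlabs(example) or ["no problems found"]:
        print(problem)
```

The check returns a list of messages instead of raising, so a caller can report every mismatch at once before joining a meeting.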