docs: document Google realtime voice support

Peter Steinberger
2026-04-24 10:14:19 +01:00
parent 6831579267
commit e5f55dd024
3 changed files with 138 additions and 28 deletions

@@ -26,12 +26,15 @@ The plugin is explicit by design:
## Quick start
-Install the local audio dependencies and make sure the realtime provider can use
-OpenAI:
+Install the local audio dependencies and configure a backend realtime voice
+provider. OpenAI is the default; Google Gemini Live also works with
+`realtime.provider: "google"`:
```bash
brew install blackhole-2ch sox
export OPENAI_API_KEY=sk-...
# or
export GEMINI_API_KEY=...
```
`blackhole-2ch` installs the `BlackHole 2ch` virtual audio device. Homebrew's
@@ -319,11 +322,14 @@ Workspace Developer Preview Program for Meet media APIs.
## Config
The common Chrome realtime path only needs the plugin enabled, BlackHole, SoX,
-and an OpenAI key:
+and a backend realtime voice provider key. OpenAI is the default; set
+`realtime.provider: "google"` to use Google Gemini Live:
```bash
brew install blackhole-2ch sox
export OPENAI_API_KEY=sk-...
# or
export GEMINI_API_KEY=...
```
Set the plugin config under `plugins.entries.google-meet.config`:
@@ -372,8 +378,15 @@ Optional overrides:
node: "parallels-macos",
},
realtime: {
provider: "google",
toolPolicy: "owner",
introMessage: "Say exactly: I'm here.",
providers: {
google: {
model: "gemini-2.5-flash-native-audio-preview-12-2025",
voice: "Kore",
},
},
},
}
```

@@ -122,6 +122,17 @@ Set config under `plugins.entries.voice-call.config`:
maxPendingConnectionsPerIp: 4,
maxConnections: 128,
},
realtime: {
enabled: false,
provider: "google", // optional; first registered realtime voice provider when unset
providers: {
google: {
model: "gemini-2.5-flash-native-audio-preview-12-2025",
voice: "Kore",
},
},
},
},
},
},
@@ -140,6 +151,7 @@ Notes:
- If you use ngrok free tier, set `publicUrl` to the exact ngrok URL; signature verification is always enforced.
- `tunnel.allowNgrokFreeTierLoopbackBypass: true` allows Twilio webhooks with invalid signatures **only** when `tunnel.provider="ngrok"` and `serve.bind` is loopback (ngrok local agent). Use for local dev only.
- Ngrok free tier URLs can change or add interstitial behavior; if `publicUrl` drifts, Twilio signatures will fail. For production, prefer a stable domain or Tailscale funnel.
- `realtime.enabled` starts full voice-to-voice conversations; do not enable it together with `streaming.enabled`.
- Streaming security defaults:
- `streaming.preStartTimeoutMs` closes sockets that never send a valid `start` frame.
- `streaming.maxPendingConnections` caps total unauthenticated pre-start sockets.
@@ -147,6 +159,89 @@ Notes:
- `streaming.maxConnections` caps total open media stream sockets (pending + active).
- Runtime fallback still accepts those old voice-call keys for now, but the rewrite path is `openclaw doctor --fix` and the compat shim is temporary.
## Realtime voice conversations
`realtime` selects a full-duplex realtime voice provider for live call audio.
It is separate from `streaming`, which only forwards audio to realtime
transcription providers.
Current runtime behavior:
- `realtime.enabled` is supported for Twilio Media Streams.
- `realtime.enabled` cannot be combined with `streaming.enabled`.
- `realtime.provider` is optional. If unset, Voice Call uses the first
registered realtime voice provider.
- Bundled realtime voice providers include Google Gemini Live (`google`) and
OpenAI (`openai`), registered by their provider plugins.
- Provider-owned raw config lives under `realtime.providers.<providerId>`.
- If `realtime.provider` points at an unregistered provider, or no realtime
voice provider is registered at all, Voice Call logs a warning and skips
realtime media instead of failing the whole plugin.
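The selection rules in the list above can be sketched roughly as follows. This is an illustrative sketch only, not the plugin's implementation: `RealtimeConfig`, `Resolution`, and `resolveRealtimeProvider` are hypothetical names, and the sketch assumes the registry can be represented as an array of provider ids in registration order.

```typescript
// Hypothetical shape of the `realtime` config block (assumption, not the plugin's real type).
interface RealtimeConfig {
  enabled?: boolean;
  provider?: string;
  providers?: Record<string, Record<string, unknown>>;
}

// Either a usable provider plus its raw config, or a reason to skip realtime media.
type Resolution =
  | { kind: "ok"; providerId: string; rawConfig: Record<string, unknown> }
  | { kind: "skip"; warning: string };

// `registered` is assumed to list provider ids in registration order.
function resolveRealtimeProvider(
  realtime: RealtimeConfig,
  registered: string[],
): Resolution {
  const first = registered[0];
  if (first === undefined) {
    // No realtime voice provider registered at all: warn and skip.
    return { kind: "skip", warning: "no realtime voice provider is registered" };
  }
  // Unset provider falls back to the first registered one.
  const providerId = realtime.provider ?? first;
  if (registered.indexOf(providerId) === -1) {
    // Configured provider is unknown: warn and skip instead of failing the plugin.
    return {
      kind: "skip",
      warning: `realtime provider "${providerId}" is not registered`,
    };
  }
  // Provider-owned raw config lives under realtime.providers.<providerId>.
  const rawConfig = realtime.providers?.[providerId] ?? {};
  return { kind: "ok", providerId, rawConfig };
}
```

A caller would log `warning` and continue without realtime media when `kind` is `"skip"`, which mirrors the "warn and skip" behavior described above.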
Google Gemini Live realtime defaults:
- API key: `realtime.providers.google.apiKey`, `GEMINI_API_KEY`, or
`GOOGLE_GENERATIVE_AI_API_KEY`
- model: `gemini-2.5-flash-native-audio-preview-12-2025`
- voice: `Kore`
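One way to read the defaults above is as a fallback chain. The sketch below assumes the API key sources are checked in the order they are listed, which the docs imply but do not state outright; `resolveGoogleApiKey` and `GOOGLE_REALTIME_DEFAULTS` are hypothetical names used only for illustration.

```typescript
// Default model and voice as documented above.
const GOOGLE_REALTIME_DEFAULTS = {
  model: "gemini-2.5-flash-native-audio-preview-12-2025",
  voice: "Kore",
};

// Hypothetical helper: returns the first non-empty key, checking the config
// value first and then the two environment variables (assumed precedence).
function resolveGoogleApiKey(
  providerConfig: { apiKey?: string },
  env: Record<string, string | undefined>,
): string | undefined {
  const candidates = [
    providerConfig.apiKey,
    env["GEMINI_API_KEY"],
    env["GOOGLE_GENERATIVE_AI_API_KEY"],
  ];
  for (const candidate of candidates) {
    if (candidate !== undefined && candidate.trim() !== "") {
      return candidate;
    }
  }
  // No key found in any of the documented sources.
  return undefined;
}
```

In a Node.js process the second argument would typically be `process.env`.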
Example:
```json5
{
  plugins: {
    entries: {
      "voice-call": {
        config: {
          provider: "twilio",
          inboundPolicy: "allowlist",
          allowFrom: ["+15550005678"],
          realtime: {
            enabled: true,
            provider: "google",
            instructions: "Speak briefly and ask before using tools.",
            providers: {
              google: {
                apiKey: "${GEMINI_API_KEY}",
                model: "gemini-2.5-flash-native-audio-preview-12-2025",
                voice: "Kore",
              },
            },
          },
        },
      },
    },
  },
}
```
Use OpenAI instead:
```json5
{
  plugins: {
    entries: {
      "voice-call": {
        config: {
          realtime: {
            enabled: true,
            provider: "openai",
            providers: {
              openai: {
                apiKey: "${OPENAI_API_KEY}",
              },
            },
          },
        },
      },
    },
  },
}
```
See [Google provider](/providers/google) and [OpenAI provider](/providers/openai)
for provider-specific realtime voice options.
## Streaming transcription
`streaming` selects a realtime transcription provider for live call audio.