mirror of
https://github.com/openclaw/openclaw.git
synced 2026-05-06 09:40:43 +00:00
feat(google): add realtime voice provider
@@ -132,6 +132,7 @@ Choose your preferred auth method and follow the setup steps.
 | Image generation     | Yes                   |
 | Music generation     | Yes                   |
 | Text-to-speech       | Yes                   |
+| Realtime voice       | Yes (Google Live API) |
 | Image understanding  | Yes                   |
 | Audio transcription  | Yes                   |
 | Video understanding  | Yes                   |
@@ -281,6 +282,63 @@ A Google Cloud Console API key restricted to the Gemini API is valid for this
 provider. This is not the separate Cloud Text-to-Speech API path.
 </Note>
 
+## Realtime voice
+
+The bundled `google` plugin registers a realtime voice provider backed by the
+Gemini Live API for backend audio bridges such as Voice Call and Google Meet.
+
+| Setting               | Config path                                                         | Default                                                                               |
+| --------------------- | ------------------------------------------------------------------- | ------------------------------------------------------------------------------------- |
+| Model                 | `plugins.entries.voice-call.config.realtime.providers.google.model` | `gemini-2.5-flash-native-audio-preview-12-2025`                                       |
+| Voice                 | `...google.voice`                                                   | `Kore`                                                                                |
+| Temperature           | `...google.temperature`                                             | (unset)                                                                               |
+| VAD start sensitivity | `...google.startSensitivity`                                        | (unset)                                                                               |
+| VAD end sensitivity   | `...google.endSensitivity`                                          | (unset)                                                                               |
+| Silence duration      | `...google.silenceDurationMs`                                       | (unset)                                                                               |
+| API key               | `...google.apiKey`                                                  | Falls back to `models.providers.google.apiKey`, `GEMINI_API_KEY`, or `GOOGLE_API_KEY` |
+
+Example Voice Call realtime config:
+
+```json5
+{
+  plugins: {
+    entries: {
+      "voice-call": {
+        enabled: true,
+        config: {
+          realtime: {
+            enabled: true,
+            provider: "google",
+            providers: {
+              google: {
+                model: "gemini-2.5-flash-native-audio-preview-12-2025",
+                voice: "Kore",
+              },
+            },
+          },
+        },
+      },
+    },
+  },
+}
+```
+
+<Note>
+Google Live API uses bidirectional audio and function calling over a WebSocket.
+OpenClaw adapts telephony/Meet bridge audio to Gemini's PCM Live API stream and
+keeps tool calls on the shared realtime voice contract. Leave `temperature`
+unset unless you need sampling changes; OpenClaw omits non-positive values
+because Google Live can return transcripts without audio for `temperature: 0`.
+Gemini API transcription is enabled without `languageCodes`; the current Google
+SDK rejects language-code hints on this API path.
+</Note>
+
+<Note>
+Control UI Talk browser sessions still require a realtime voice provider with a
+browser WebRTC session implementation. Today that path is OpenAI Realtime; the
+Google provider is for backend realtime bridges.
+</Note>
+
 ## Advanced configuration
 
 <AccordionGroup>
||||
@@ -156,12 +156,14 @@ Cron jobs panel notes:
 - `chat.history` also strips display-only inline directive tags from visible assistant text (for example `[[reply_to_*]]` and `[[audio_as_voice]]`), plain-text tool-call XML payloads (including `<tool_call>...</tool_call>`, `<function_call>...</function_call>`, `<tool_calls>...</tool_calls>`, `<function_calls>...</function_calls>`, and truncated tool-call blocks), and leaked ASCII/full-width model control tokens, and omits assistant entries whose whole visible text is only the exact silent token `NO_REPLY` / `no_reply`.
 - `chat.inject` appends an assistant note to the session transcript and broadcasts a `chat` event for UI-only updates (no agent run, no channel delivery).
 - The chat header model and thinking pickers patch the active session immediately through `sessions.patch`; they are persistent session overrides, not one-turn-only send options.
-- Talk mode uses the registered realtime voice provider. Configure OpenAI with
-  `talk.provider: "openai"` plus `talk.providers.openai.apiKey`, or reuse the
-  Voice Call realtime provider config. The browser never receives the standard
-  OpenAI API key; it receives only the ephemeral Realtime client secret. The
-  Realtime session prompt is assembled by the Gateway; `talk.realtime.session`
-  does not accept caller-provided instruction overrides.
+- Talk mode uses a registered realtime voice provider that supports browser
+  WebRTC sessions. Configure OpenAI with `talk.provider: "openai"` plus
+  `talk.providers.openai.apiKey`, or reuse the Voice Call realtime provider
+  config. The browser never receives the standard OpenAI API key; it receives
+  only the ephemeral Realtime client secret. Google Live realtime voice is
+  supported for backend Voice Call and Google Meet bridges, but not this browser
+  WebRTC path yet. The Realtime session prompt is assembled by the Gateway;
+  `talk.realtime.session` does not accept caller-provided instruction overrides.
 - In the Chat composer, the Talk control is the waves button next to the
   microphone dictation button. When Talk starts, the composer status row shows
   `Connecting Talk...`, then `Talk live` while audio is connected, or
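
The new "Realtime voice" section above leaves the VAD settings unset by default. As a sketch of how tuning them might look, note that the sensitivity strings and the 800 ms figure below are illustrative assumptions, not values confirmed by this commit; check the plugin's config schema before using them:

```json5
{
  plugins: {
    entries: {
      "voice-call": {
        enabled: true,
        config: {
          realtime: {
            enabled: true,
            provider: "google",
            providers: {
              google: {
                model: "gemini-2.5-flash-native-audio-preview-12-2025",
                voice: "Kore",
                // VAD tuning knobs from the settings table; the accepted
                // value formats here are assumptions for illustration.
                startSensitivity: "high",
                endSensitivity: "low",
                silenceDurationMs: 800,
              },
            },
          },
        },
      },
    },
  },
}
```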
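
The updated Talk mode bullet names `talk.provider` and `talk.providers.openai.apiKey` but shows no config block. A minimal sketch, assuming `talk` is a top-level config key (this commit does not show its placement):

```json5
{
  talk: {
    // Browser Talk needs a WebRTC-capable provider; per the note above,
    // Google Live is backend-bridge only for now.
    provider: "openai",
    providers: {
      openai: {
        // Stays server-side; the browser only ever receives the
        // ephemeral Realtime client secret.
        apiKey: "sk-...",
      },
    },
  },
}
```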