mirror of
https://github.com/openclaw/openclaw.git
synced 2026-05-06 16:50:43 +00:00
feat(google): add realtime voice provider
@@ -132,6 +132,7 @@ Choose your preferred auth method and follow the setup steps.
| Image generation | Yes |
| Music generation | Yes |
| Text-to-speech | Yes |
| Realtime voice | Yes (Google Live API) |
| Image understanding | Yes |
| Audio transcription | Yes |
| Video understanding | Yes |
@@ -281,6 +282,63 @@ A Google Cloud Console API key restricted to the Gemini API is valid for this
provider. This is not the separate Cloud Text-to-Speech API path.
</Note>

## Realtime voice

The bundled `google` plugin registers a realtime voice provider backed by the
Gemini Live API for backend audio bridges such as Voice Call and Google Meet.

| Setting               | Config path                                                         | Default                                                                               |
| --------------------- | ------------------------------------------------------------------- | ------------------------------------------------------------------------------------- |
| Model                 | `plugins.entries.voice-call.config.realtime.providers.google.model` | `gemini-2.5-flash-native-audio-preview-12-2025`                                       |
| Voice                 | `...google.voice`                                                   | `Kore`                                                                                |
| Temperature           | `...google.temperature`                                             | (unset)                                                                               |
| VAD start sensitivity | `...google.startSensitivity`                                        | (unset)                                                                               |
| VAD end sensitivity   | `...google.endSensitivity`                                          | (unset)                                                                               |
| Silence duration      | `...google.silenceDurationMs`                                       | (unset)                                                                               |
| API key               | `...google.apiKey`                                                  | Falls back to `models.providers.google.apiKey`, `GEMINI_API_KEY`, or `GOOGLE_API_KEY` |
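
The API key fallback in the last row can be pictured as a first-match lookup. The sketch below is illustrative only: `resolveGoogleApiKey` and `GoogleKeySources` are hypothetical names, not OpenClaw APIs, and it assumes the fallbacks are tried in the order the table lists them and that blank strings count as unset.

```typescript
// Hypothetical helper, not part of OpenClaw. It only illustrates the
// fallback chain described in the settings table.
interface GoogleKeySources {
  realtimeApiKey?: string; // `...google.apiKey` on the realtime provider
  modelProviderApiKey?: string; // `models.providers.google.apiKey`
  env: Record<string, string | undefined>; // for example, process.env
}

function resolveGoogleApiKey(sources: GoogleKeySources): string | undefined {
  // Assumption: candidates are tried in the order the table lists them.
  const candidates: Array<string | undefined> = [
    sources.realtimeApiKey,
    sources.modelProviderApiKey,
    sources.env["GEMINI_API_KEY"],
    sources.env["GOOGLE_API_KEY"],
  ];
  // Assumption: empty or whitespace-only strings are treated as unset.
  return candidates.find(
    (value) => value !== undefined && value.trim() !== "",
  );
}
```

With only the two environment variables set, this sketch would pick `GEMINI_API_KEY`; an explicit realtime `apiKey` would take priority over everything else.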

Example Voice Call realtime config:

```json5
{
  plugins: {
    entries: {
      "voice-call": {
        enabled: true,
        config: {
          realtime: {
            enabled: true,
            provider: "google",
            providers: {
              google: {
                model: "gemini-2.5-flash-native-audio-preview-12-2025",
                voice: "Kore",
              },
            },
          },
        },
      },
    },
  },
}
```
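
To see how the dotted config paths in the settings table line up with the nesting in this example, here is a minimal lookup sketch. `getByPath` and `ConfigNode` are hypothetical helpers written for illustration, not OpenClaw functions, and the sketch assumes the abbreviated `...google.*` paths share the prefix shown in the Model row.

```typescript
// Hypothetical helper for illustration only; not an OpenClaw API.
type ConfigNode = { [key: string]: unknown };

// Walk a nested object one dotted segment at a time.
function getByPath(root: ConfigNode, path: string): unknown {
  let current: unknown = root;
  for (const segment of path.split(".")) {
    if (typeof current !== "object" || current === null) {
      return undefined; // path runs past a leaf or a missing branch
    }
    current = (current as ConfigNode)[segment];
  }
  return current;
}

// Same values as the json5 example, written as a plain object literal.
const exampleConfig: ConfigNode = {
  plugins: {
    entries: {
      "voice-call": {
        enabled: true,
        config: {
          realtime: {
            enabled: true,
            provider: "google",
            providers: {
              google: {
                model: "gemini-2.5-flash-native-audio-preview-12-2025",
                voice: "Kore",
              },
            },
          },
        },
      },
    },
  },
};

// Prefix taken from the Model row of the settings table.
const googlePrefix =
  "plugins.entries.voice-call.config.realtime.providers.google";

const voice = getByPath(exampleConfig, `${googlePrefix}.voice`);
const temperature = getByPath(exampleConfig, `${googlePrefix}.temperature`);
```

Here `voice` resolves to the configured value, while `temperature` comes back `undefined` because the example leaves it unset.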

<Note>
Google Live API uses bidirectional audio and function calling over a WebSocket.
OpenClaw adapts telephony/Meet bridge audio to Gemini's PCM Live API stream and
keeps tool calls on the shared realtime voice contract. Leave `temperature`
unset unless you need sampling changes; OpenClaw omits non-positive values
because Google Live can return transcripts without audio for `temperature: 0`.
Gemini API transcription is enabled without `languageCodes`; the current Google
SDK rejects language-code hints on this API path.
</Note>
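
The `temperature` handling described in the note can be sketched as follows. This is not OpenClaw's implementation: `buildSamplingOptions` and `RealtimeSettings` are hypothetical names, and the sketch only shows the stated rule that unset or non-positive values are left out of the request.

```typescript
// Hypothetical types and helper for illustration; not OpenClaw's code.
interface RealtimeSettings {
  temperature?: number;
}

interface SamplingOptions {
  temperature?: number;
}

// Include `temperature` only when it is a finite, strictly positive number.
// Unset, zero, negative, and NaN values are omitted so the provider default
// applies instead of, for example, `temperature: 0`.
function buildSamplingOptions(settings: RealtimeSettings): SamplingOptions {
  const value = settings.temperature;
  if (typeof value === "number" && Number.isFinite(value) && value > 0) {
    return { temperature: value };
  }
  return {};
}
```

Returning an empty object, rather than `{ temperature: undefined }`, keeps the key out of any serialized payload entirely.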

<Note>
Control UI Talk browser sessions still require a realtime voice provider with a
browser WebRTC session implementation. Today that path is OpenAI Realtime; the
Google provider is for backend realtime bridges.
</Note>

## Advanced configuration

<AccordionGroup>