feat(google): add realtime voice provider

2026-05-06 16:50:43 +00:00 · 2026-04-24 09:08:09 +01:00
parent c138368040
commit b5e5f2cede
13 changed files with 1127 additions and 141 deletions
--- a/docs/providers/google.md
+++ b/docs/providers/google.md
@@ -132,6 +132,7 @@ Choose your preferred auth method and follow the setup steps.
 | Image generation       | Yes                           |
 | Music generation       | Yes                           |
 | Text-to-speech         | Yes                           |
+| Realtime voice         | Yes (Google Live API)         |
 | Image understanding    | Yes                           |
 | Audio transcription    | Yes                           |
 | Video understanding    | Yes                           |
@@ -281,6 +282,63 @@ A Google Cloud Console API key restricted to the Gemini API is valid for this
 provider. This is not the separate Cloud Text-to-Speech API path.
 </Note>

+## Realtime voice
+
+The bundled `google` plugin registers a realtime voice provider backed by the
+Gemini Live API for backend audio bridges such as Voice Call and Google Meet.
+
+| Setting               | Config path                                                         | Default                                                                               |
+| --------------------- | ------------------------------------------------------------------- | ------------------------------------------------------------------------------------- |
+| Model                 | `plugins.entries.voice-call.config.realtime.providers.google.model` | `gemini-2.5-flash-native-audio-preview-12-2025`                                       |
+| Voice                 | `...google.voice`                                                   | `Kore`                                                                                |
+| Temperature           | `...google.temperature`                                             | (unset)                                                                               |
+| VAD start sensitivity | `...google.startSensitivity`                                        | (unset)                                                                               |
+| VAD end sensitivity   | `...google.endSensitivity`                                          | (unset)                                                                               |
+| Silence duration      | `...google.silenceDurationMs`                                       | (unset)                                                                               |
+| API key               | `...google.apiKey`                                                  | Falls back to `models.providers.google.apiKey`, `GEMINI_API_KEY`, or `GOOGLE_API_KEY` |
+
+Example Voice Call realtime config:
+
+```json5
+{
+  plugins: {
+    entries: {
+      "voice-call": {
+        enabled: true,
+        config: {
+          realtime: {
+            enabled: true,
+            provider: "google",
+            providers: {
+              google: {
+                model: "gemini-2.5-flash-native-audio-preview-12-2025",
+                voice: "Kore",
+              },
+            },
+          },
+        },
+      },
+    },
+  },
+}
+```
+
+<Note>
+Google Live API uses bidirectional audio and function calling over a WebSocket.
+OpenClaw adapts telephony/Meet bridge audio to Gemini's PCM Live API stream and
+keeps tool calls on the shared realtime voice contract. Leave `temperature`
+unset unless you need sampling changes; OpenClaw omits non-positive values
+because Google Live can return transcripts without audio for `temperature: 0`.
+Gemini API transcription is enabled without `languageCodes`; the current Google
+SDK rejects language-code hints on this API path.
+</Note>
+
+<Note>
+Control UI Talk browser sessions still require a realtime voice provider with a
+browser WebRTC session implementation. Today that path is OpenAI Realtime; the
+Google provider is for backend realtime bridges.
+</Note>
+
 ## Advanced configuration

 <AccordionGroup>
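
Reviewer sketch (not part of the commit): the "Realtime voice" table and note added in the hunk above describe defaults, an API key fallback chain, and the omission of non-positive `temperature` values. The TypeScript below is a minimal illustration of that documented behavior under stated assumptions. The names `GoogleRealtimeConfig`, `ResolvedGoogleRealtime`, and `resolveGoogleRealtime` are hypothetical and do not come from the OpenClaw source, and the precedence among the fallback keys is assumed to follow the order in which the table lists them.

```typescript
// Hypothetical shapes for illustration only; not OpenClaw's actual types.
interface GoogleRealtimeConfig {
  model?: string;
  voice?: string;
  temperature?: number;
  apiKey?: string;
}

interface ResolvedGoogleRealtime {
  model: string;
  voice: string;
  temperature?: number;
  apiKey?: string;
}

// Defaults as listed in the docs table.
const DEFAULT_MODEL = "gemini-2.5-flash-native-audio-preview-12-2025";
const DEFAULT_VOICE = "Kore";

function resolveGoogleRealtime(
  cfg: GoogleRealtimeConfig,
  sharedProviderApiKey: string | undefined, // models.providers.google.apiKey
  env: Record<string, string | undefined>,
): ResolvedGoogleRealtime {
  // Assumed precedence: provider-specific key, shared provider key,
  // then GEMINI_API_KEY, then GOOGLE_API_KEY. Empty strings are skipped.
  const apiKey =
    cfg.apiKey ||
    sharedProviderApiKey ||
    env["GEMINI_API_KEY"] ||
    env["GOOGLE_API_KEY"] ||
    undefined;

  const resolved: ResolvedGoogleRealtime = {
    model: cfg.model ?? DEFAULT_MODEL,
    voice: cfg.voice ?? DEFAULT_VOICE,
    apiKey,
  };

  // Per the docs note, non-positive temperatures are omitted because
  // `temperature: 0` can yield transcripts without audio.
  if (typeof cfg.temperature === "number" && cfg.temperature > 0) {
    resolved.temperature = cfg.temperature;
  }
  return resolved;
}
```

For example, calling `resolveGoogleRealtime({ temperature: 0 }, undefined, { GOOGLE_API_KEY: "k" })` in this sketch yields the default model and voice, the key `"k"`, and no `temperature` field.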