fix: support Google Meet realtime barge-in (#73834)

Replay #73834 onto current main and preserve provider-side interruption when Google Meet detects a local human barge-in.

Thanks @shhtheonlyperson.
This commit is contained in:
ShihChi Huang
2026-05-01 01:00:50 -07:00
committed by GitHub
parent 250376f885
commit 0c3d1892cd
16 changed files with 577 additions and 28 deletions

View File

@@ -924,6 +924,16 @@ Defaults:
and writing audio in `chrome.audioFormat`
- `chrome.audioOutputCommand`: SoX command reading audio in `chrome.audioFormat`
and writing to CoreAudio `BlackHole 2ch`
- `chrome.bargeInInputCommand`: optional local microphone command that writes
signed 16-bit little-endian mono PCM for human barge-in detection while
assistant playback is active. This currently applies to the Gateway-hosted
`chrome` command-pair bridge.
- `chrome.bargeInRmsThreshold: 650`: RMS level that counts as a human
interruption on `chrome.bargeInInputCommand`
- `chrome.bargeInPeakThreshold: 2500`: peak level that counts as a human
interruption on `chrome.bargeInInputCommand`
- `chrome.bargeInCooldownMs: 900`: minimum delay between repeated human
interruption clears
- `realtime.provider: "openai"`
- `realtime.toolPolicy: "safe-read-only"`
- `realtime.instructions`: brief spoken replies, with
@@ -946,6 +956,24 @@ Optional overrides:
chrome: {
guestName: "OpenClaw Agent",
waitForInCallMs: 30000,
bargeInInputCommand: [
"sox",
"-q",
"-t",
"coreaudio",
"External Microphone",
"-r",
"24000",
"-c",
"1",
"-b",
"16",
"-e",
"signed-integer",
"-t",
"raw",
"-",
],
},
chromeNode: {
node: "parallels-macos",
@@ -1028,6 +1056,8 @@ a session ended.
not send the intro/test phrase into the audio bridge.
- `providerConnected` / `realtimeReady`: realtime voice bridge state
- `lastInputAt` / `lastOutputAt`: last audio seen from or sent to the bridge
- `lastSuppressedInputAt` / `suppressedInputBytes`: loopback input ignored while
assistant playback is active
```json
{
@@ -1448,6 +1478,14 @@ For clean duplex audio, route Meet output and Meet microphone through separate
virtual devices or a Loopback-style virtual device graph. A single shared
BlackHole device can echo other participants back into the call.
With the command-pair Chrome bridge, `chrome.bargeInInputCommand` can listen to a
separate local microphone and clear assistant playback when the human starts
talking. This keeps human speech ahead of assistant output even when the shared
BlackHole loopback input is temporarily suppressed during assistant playback.
Like `chrome.audioInputCommand` and `chrome.audioOutputCommand`, it is an
operator-configured local command. Use an explicit trusted command path or
argument list, and do not point it at scripts from untrusted locations.
`googlemeet speak` triggers the active realtime audio bridge for a Chrome
session. `googlemeet leave` stops that bridge. For Twilio sessions delegated
through the Voice Call plugin, `leave` also hangs up the underlying voice call.

View File

@@ -593,6 +593,7 @@ API key auth, and dynamic model resolution.
connect: async () => {},
sendAudio: () => {},
setMediaTimestamp: () => {},
handleBargeIn: () => {},
submitToolResult: () => {},
acknowledgeMark: () => {},
close: () => {},
@@ -600,6 +601,10 @@ API key auth, and dynamic model resolution.
}),
});
```
Implement `handleBargeIn` when a transport can detect that a human is
interrupting assistant playback and the provider supports truncating or
clearing the active audio response.
</Tab>
<Tab title="Media understanding">
```typescript