fix: support Google Meet realtime barge-in (#73834)

Replay #73834 onto current main and preserve provider-side interruption when Google Meet detects a local human barge-in.

Thanks @shhtheonlyperson.
ShihChi Huang
2026-05-01 01:00:50 -07:00
committed by GitHub
parent 250376f885
commit 0c3d1892cd
16 changed files with 577 additions and 28 deletions

@@ -1,4 +1,4 @@
a69e6b650513c2a697ee51087928bf78f63ba998c7c60f8cca61dd65a0184fd0 config-baseline.json
13b715c3aac380161ec167bccfcfb902c3231a802a08ab7ca9ef760e0c11913a config-baseline.json
0a259216178a582c567d1fa48c5236bff4bbd27c3e6af838ffcd042459ffce3c config-baseline.core.json
92712871defa92eeda8161b516db85574681f2b70678b940508a808b987aeae2 config-baseline.channel.json
6005cf9f6e8c9f25ef97207b5eee29ae0e506cf910cdeca77fc9894ad1755b1f config-baseline.plugin.json
da8e055ebba0730498703d209f9e2cfaa1484a83f3240e611dcdd7280e22a525 config-baseline.channel.json
8d41287cd9cb696cf8a5e8810bd731b9eda4af9b0829c6dadae2da56e19dc644 config-baseline.plugin.json

@@ -924,6 +924,16 @@ Defaults:
and writing audio in `chrome.audioFormat`
- `chrome.audioOutputCommand`: SoX command reading audio in `chrome.audioFormat`
and writing to CoreAudio `BlackHole 2ch`
- `chrome.bargeInInputCommand`: optional local microphone command that writes
signed 16-bit little-endian mono PCM for human barge-in detection while
assistant playback is active. This currently applies to the Gateway-hosted
`chrome` command-pair bridge.
- `chrome.bargeInRmsThreshold: 650`: RMS level that counts as a human
interruption on `chrome.bargeInInputCommand`
- `chrome.bargeInPeakThreshold: 2500`: peak level that counts as a human
interruption on `chrome.bargeInInputCommand`
- `chrome.bargeInCooldownMs: 900`: minimum delay, in milliseconds, between
  successive playback clears triggered by human interruption
- `realtime.provider: "openai"`
- `realtime.toolPolicy: "safe-read-only"`
- `realtime.instructions`: brief spoken replies, with
@@ -946,6 +956,24 @@ Optional overrides:
chrome: {
guestName: "OpenClaw Agent",
waitForInCallMs: 30000,
bargeInInputCommand: [
"sox",
"-q",
"-t",
"coreaudio",
"External Microphone",
"-r",
"24000",
"-c",
"1",
"-b",
"16",
"-e",
"signed-integer",
"-t",
"raw",
"-",
],
},
chromeNode: {
node: "parallels-macos",
@@ -1028,6 +1056,8 @@ a session ended.
not send the intro/test phrase into the audio bridge.
- `providerConnected` / `realtimeReady`: realtime voice bridge state
- `lastInputAt` / `lastOutputAt`: last audio seen from or sent to the bridge
- `lastSuppressedInputAt` / `suppressedInputBytes`: when loopback input was last
  ignored, and how many bytes were ignored, because assistant playback was active
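The suppression fields can be read as the output of a simple gate on the loopback input. The sketch below is a minimal illustration under that assumption: `LoopbackInputGate`, `LoopbackStats`, `setPlaybackActive`, and `handleInput` are hypothetical names, not the plugin's internals. It only shows how `lastInputAt`, `lastSuppressedInputAt`, and `suppressedInputBytes` could be maintained.

```typescript
// Hypothetical sketch: these names are illustrative assumptions,
// not the Google Meet plugin's actual implementation.
interface LoopbackStats {
  lastInputAt?: number;
  lastSuppressedInputAt?: number;
  suppressedInputBytes: number;
}

class LoopbackInputGate {
  private playbackActive = false;
  private readonly forward: (chunk: Uint8Array) => void;
  readonly stats: LoopbackStats = { suppressedInputBytes: 0 };

  constructor(forward: (chunk: Uint8Array) => void) {
    this.forward = forward;
  }

  // Called by the (hypothetical) output side when assistant playback starts or stops.
  setPlaybackActive(active: boolean): void {
    this.playbackActive = active;
  }

  // Returns true when the chunk was forwarded to the realtime provider.
  handleInput(chunk: Uint8Array, now: number = Date.now()): boolean {
    if (this.playbackActive) {
      // Drop loopback audio so the assistant does not hear its own output,
      // but record that it happened for status reporting.
      this.stats.lastSuppressedInputAt = now;
      this.stats.suppressedInputBytes += chunk.byteLength;
      return false;
    }
    this.stats.lastInputAt = now;
    this.forward(chunk);
    return true;
  }
}

// Usage: one chunk before playback is forwarded, one during playback is suppressed.
const sent: Uint8Array[] = [];
const gate = new LoopbackInputGate((chunk) => sent.push(chunk));
gate.handleInput(new Uint8Array(480), 1000);
gate.setPlaybackActive(true);
gate.handleInput(new Uint8Array(960), 2000);
```

This gate shows why a separate barge-in microphone matters. While playback is active, nothing from the shared loopback reaches the provider, so human speech on that path is not heard.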
```json
{
@@ -1448,6 +1478,14 @@ For clean duplex audio, route Meet output and Meet microphone through separate
virtual devices or a Loopback-style virtual device graph. A single shared
BlackHole device can echo other participants back into the call.
With the command-pair Chrome bridge, `chrome.bargeInInputCommand` can listen to a
separate local microphone and clear assistant playback when a human starts
talking. This lets human speech take priority over assistant output even while
the shared BlackHole loopback input is suppressed during assistant playback.
Like `chrome.audioInputCommand` and `chrome.audioOutputCommand`, it is an
operator-configured local command: use an explicit, trusted command path or
argument list, and do not point it at scripts from untrusted locations.
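Detection on the barge-in microphone can be pictured as a level check over each PCM chunk. The sketch below is a minimal illustration, assuming signed 16-bit little-endian mono samples as documented for `chrome.bargeInInputCommand`. It reuses the documented defaults (`chrome.bargeInRmsThreshold: 650`, `chrome.bargeInPeakThreshold: 2500`, `chrome.bargeInCooldownMs: 900`). `pcm16Levels` and `BargeInDetector` are hypothetical names. Triggering when *either* threshold is reached is an assumption, not confirmed plugin behavior.

```typescript
// Hypothetical sketch of RMS/peak barge-in detection; not the plugin's code.
interface BargeInOptions {
  rmsThreshold: number; // mirrors chrome.bargeInRmsThreshold
  peakThreshold: number; // mirrors chrome.bargeInPeakThreshold
  cooldownMs: number; // mirrors chrome.bargeInCooldownMs
}

// RMS and absolute peak of signed 16-bit little-endian mono PCM.
// A trailing odd byte is ignored here; a streaming reader would carry it
// over into the next chunk instead.
function pcm16Levels(chunk: Uint8Array): { rms: number; peak: number } {
  const view = new DataView(chunk.buffer, chunk.byteOffset, chunk.byteLength);
  const samples = Math.floor(chunk.byteLength / 2);
  if (samples === 0) return { rms: 0, peak: 0 };
  let sumSquares = 0;
  let peak = 0;
  for (let i = 0; i < samples; i++) {
    const s = view.getInt16(i * 2, true); // true = little-endian
    sumSquares += s * s;
    peak = Math.max(peak, Math.abs(s));
  }
  return { rms: Math.sqrt(sumSquares / samples), peak };
}

class BargeInDetector {
  private lastTriggerAt = -Infinity;
  private readonly options: BargeInOptions;

  constructor(options: BargeInOptions) {
    this.options = options;
  }

  // Returns true when assistant playback should be cleared for this chunk.
  process(chunk: Uint8Array, playbackActive: boolean, now: number): boolean {
    if (!playbackActive) return false; // only interrupts active playback
    if (now - this.lastTriggerAt < this.options.cooldownMs) return false;
    const { rms, peak } = pcm16Levels(chunk);
    if (rms < this.options.rmsThreshold && peak < this.options.peakThreshold) {
      return false;
    }
    this.lastTriggerAt = now;
    return true;
  }
}

// Test helper: a chunk whose samples all have the same amplitude.
function constantChunk(amplitude: number, samples: number): Uint8Array {
  const bytes = new Uint8Array(samples * 2);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < samples; i++) view.setInt16(i * 2, amplitude, true);
  return bytes;
}

const detector = new BargeInDetector({ rmsThreshold: 650, peakThreshold: 2500, cooldownMs: 900 });
const loud = constantChunk(1000, 480);
const quiet = constantChunk(100, 480);
const results = [
  detector.process(loud, true, 0), // true
  detector.process(loud, true, 500), // false: still inside the cooldown
  detector.process(loud, true, 1000), // true
  detector.process(quiet, true, 5000), // false: below both thresholds
  detector.process(loud, false, 6000), // false: no assistant playback
];
```

The cooldown prevents a sustained utterance from clearing playback on every chunk. The peak check catches short, sharp onsets that an RMS average over the whole chunk would smooth out.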
`googlemeet speak` triggers the active realtime audio bridge for a Chrome
session. `googlemeet leave` stops that bridge. For Twilio sessions delegated
through the Voice Call plugin, `leave` also hangs up the underlying voice call.

@@ -593,6 +593,7 @@ API key auth, and dynamic model resolution.
connect: async () => {},
sendAudio: () => {},
setMediaTimestamp: () => {},
handleBargeIn: () => {},
submitToolResult: () => {},
acknowledgeMark: () => {},
close: () => {},
@@ -600,6 +601,10 @@ API key auth, and dynamic model resolution.
}),
});
```
Implement `handleBargeIn` when a transport can detect that a human is
interrupting assistant playback and the provider supports truncating or
clearing the active audio response.
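A `handleBargeIn` implementation typically drops audio that has not been played yet and tells the provider how much of the current response was actually output. The sketch below is a minimal illustration, assuming mono PCM16 output. `SketchRealtimeBridge`, `TruncateRequest`, `enqueueAssistantAudio`, `nextPlaybackChunk`, and the `onTruncate` callback are hypothetical. The real SDK's `handleBargeIn` signature may differ from the zero-argument form used here.

```typescript
// Hypothetical sketch: these types and methods are illustrative assumptions,
// not the plugin SDK's bridge interface.
type TruncateRequest = { itemId: string; audioEndMs: number };

class SketchRealtimeBridge {
  private queue: { itemId: string; chunk: Uint8Array }[] = [];
  private playingItemId: string | null = null;
  private playedBytes = 0;
  private readonly bytesPerMs: number;
  private readonly onTruncate: (req: TruncateRequest) => void;

  constructor(sampleRate: number, onTruncate: (req: TruncateRequest) => void) {
    this.bytesPerMs = (sampleRate * 2) / 1000; // mono PCM16: 2 bytes per sample
    this.onTruncate = onTruncate;
  }

  // Provider delivered assistant audio for a response item.
  enqueueAssistantAudio(itemId: string, chunk: Uint8Array): void {
    this.queue.push({ itemId, chunk });
  }

  // Transport pulls the next chunk; "played" here means handed to the transport.
  nextPlaybackChunk(): Uint8Array | undefined {
    const entry = this.queue.shift();
    if (entry === undefined) return undefined;
    if (entry.itemId !== this.playingItemId) {
      this.playingItemId = entry.itemId;
      this.playedBytes = 0;
    }
    this.playedBytes += entry.chunk.byteLength;
    return entry.chunk;
  }

  // Human started talking: discard unplayed audio and report the cut point.
  handleBargeIn(): void {
    this.queue = [];
    if (this.playingItemId === null) return;
    this.onTruncate({
      itemId: this.playingItemId,
      audioEndMs: Math.floor(this.playedBytes / this.bytesPerMs),
    });
    this.playingItemId = null;
    this.playedBytes = 0;
  }
}

// Usage: two 100 ms chunks at 24 kHz, only the first is played before barge-in.
const truncations: TruncateRequest[] = [];
const bridge = new SketchRealtimeBridge(24000, (req) => truncations.push(req));
bridge.enqueueAssistantAudio("item_1", new Uint8Array(4800));
bridge.enqueueAssistantAudio("item_1", new Uint8Array(4800));
bridge.nextPlaybackChunk();
bridge.handleBargeIn(); // truncations: [{ itemId: "item_1", audioEndMs: 100 }]
```

A provider would map `TruncateRequest` to its own operation. For example, the OpenAI Realtime API has a `conversation.item.truncate` client event that takes `item_id`, `content_index`, and `audio_end_ms`. This keeps the model's transcript consistent with what the human actually heard.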
</Tab>
<Tab title="Media understanding">
```typescript