fix(google-meet): harden observe mode speech health (#73256)

* fix(google-meet): harden observe mode speech health

* fix(google-meet): address observe speech review

* docs(google-meet): clarify observe mode guarantees
Author: Peter Steinberger
Date: 2026-04-28 06:21:10 +01:00
Committed by: GitHub
Parent: 2633b14914
Commit: 25851e3cae
10 changed files with 398 additions and 154 deletions

View File

@@ -63,6 +63,7 @@ Docs: https://docs.openclaw.ai
- Agents/ACPX: stage the patched Claude ACP adapter as an ACPX runtime dependency and route known Codex/Claude ACP commands through local wrappers, so Gateway runtime no longer depends on live `npx` adapter resolution. Fixes #73202. Thanks @joerod26.
- Memory/compaction: let pre-compaction memory flush use an exact `agents.defaults.compaction.memoryFlush.model` override such as `ollama/qwen3:8b` without inheriting the active session fallback chain, so local housekeeping can avoid paid conversation models. Fixes #53772. Thanks @limen96.
- macOS/update: stop managed Gateway services before package replacement and keep LaunchAgent service secrets out of world-readable plist metadata by loading them from owner-only env files. Fixes #72996. Thanks @Mathewb7.
- Google Meet: keep observe-only Chrome joins and setup checks from requiring BlackHole or audio bridge commands, avoid granting or selecting the microphone in observe-only mode, and make `test_speech` report fresh realtime output-byte verification instead of only confirming a queued utterance. Refs #72478. Thanks @DougButdorf.
- Gateway/hooks: route non-delivered hook completion and error summaries to the target agent's main session instead of the default agent session, preserving multi-agent hook isolation. Fixes #24693; carries forward #68667. Thanks @abersonFAC and @bluesky6868.
- Control UI/models: request the configured Gateway model-list view so dashboards with only `models.providers.*.models` show those configured models first instead of flooding the picker with the full built-in catalog. Fixes #65405. Thanks @wbyanclaw.
- CLI/models: keep default-model and allowlist pickers on explicit `models.providers.*.models` entries when `models.mode` is `replace` instead of loading the full built-in catalog. Fixes #64950. Thanks @mrozentsvayg.

View File

@@ -74,12 +74,21 @@ Check setup:
openclaw googlemeet setup
```
-The setup output is meant to be agent-readable. It reports Chrome profile,
-audio bridge, node pinning, delayed realtime intro, and, when Twilio delegation
-is configured, whether the `voice-call` plugin and Twilio credentials are ready.
-Treat any `ok: false` check as a blocker before asking an agent to join.
-Use `openclaw googlemeet setup --json` for scripts or machine-readable output.
-Use `--transport chrome`, `--transport chrome-node`, or `--transport twilio`
+The setup output is meant to be agent-readable and mode-aware. It reports Chrome
+profile, node pinning, and, for realtime Chrome joins, the BlackHole/SoX audio
+bridge and delayed realtime intro checks. For observe-only joins, check the same
+transport with `--mode transcribe`; that mode skips realtime audio prerequisites
+because it does not listen through or speak through the bridge:
+```bash
+openclaw googlemeet setup --transport chrome-node --mode transcribe
+```
+When Twilio delegation is configured, setup also reports whether the
+`voice-call` plugin and Twilio credentials are ready. Treat any `ok: false`
+check as a blocker for the checked transport and mode before asking an agent to
+join. Use `openclaw googlemeet setup --json` for scripts or machine-readable
+output. Use `--transport chrome`, `--transport chrome-node`, or `--transport twilio`
to preflight a specific transport before an agent tries it.
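The preflight above can be scripted. A minimal sketch of consuming `setup --json`, assuming the payload exposes a `checks` array with `id`, `ok`, and `message` fields (matching the `SetupCheck` shape in this change; the sample payload itself is illustrative):

```typescript
// Hypothetical shape of `openclaw googlemeet setup --json` output; the field
// names follow the SetupCheck type in this change, the sample data does not.
type SetupCheck = { id: string; ok: boolean; message: string };
type SetupStatus = { ok: boolean; checks: SetupCheck[] };

// Return the first failing check, i.e. the blocker to report before a join.
function firstBlocker(status: SetupStatus): SetupCheck | undefined {
  return status.checks.find((check) => !check.ok);
}

// Example payload for an observe-only chrome-node preflight (illustrative).
const status: SetupStatus = {
  ok: false,
  checks: [
    { id: "chrome-profile", ok: true, message: "Chrome profile configured" },
    { id: "chrome-node", ok: false, message: "Chrome node not pinned" },
  ],
};

const blocker = firstBlocker(status);
console.log(blocker?.id); // prints "chrome-node"
```

Treating the first `ok: false` check as the thing to fix keeps agent retries from looping on a broken transport.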
Join a meeting:
@@ -144,8 +153,12 @@ then share the returned `meetingUri`.
```
For an observe-only/browser-control join, set `"mode": "transcribe"`. That does
-not start the duplex realtime model bridge, so it will not talk back into the
-meeting.
+not start the duplex realtime model bridge, does not require BlackHole or SoX,
+and will not talk back into the meeting. Chrome joins in this mode also avoid
+OpenClaw's microphone/camera permission grant and avoid the Meet **Use
+microphone** path. If Meet shows an audio-choice interstitial, automation tries
+the no-microphone path and otherwise reports a manual action instead of opening
+the local microphone.
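The two modes described above differ only in the `mode` field of the join parameters; a hedged sketch (the payload shapes are illustrative, only the `mode` values and the no-speech guarantee come from this change):

```typescript
// Illustrative join payloads; exact parameter shape is an assumption,
// the "transcribe" / "realtime" mode values mirror this change.
const observeJoin = {
  url: "https://meet.google.com/abc-defg-hij",
  mode: "transcribe", // observe-only: no BlackHole/SoX, no mic grant, no talk-back
} as const;

const realtimeJoin = {
  url: "https://meet.google.com/abc-defg-hij",
  mode: "realtime", // duplex bridge: needs BlackHole 2ch plus audio commands
} as const;

// Observe-only sessions intentionally cannot emit speech.
function canSpeak(mode: "realtime" | "transcribe"): boolean {
  return mode === "realtime";
}

console.log(canSpeak(observeJoin.mode), canSpeak(realtimeJoin.mode)); // prints false true
```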
During realtime sessions, `google_meet` status includes browser and audio bridge
health such as `inCall`, `manualActionRequired`, `providerConnected`,
@@ -155,10 +168,10 @@ appears, browser automation handles it when it can. Login, host admission, and
browser/OS permission prompts are reported as manual action with a reason and
message for the agent to relay.
-Local Chrome joins through the signed-in OpenClaw browser profile. In Meet, pick
-`BlackHole 2ch` for the microphone/speaker path used by OpenClaw. For clean
-duplex audio, use separate virtual devices or a Loopback-style graph; a single
-BlackHole device is enough for a first smoke test but can echo.
+Local Chrome joins through the signed-in OpenClaw browser profile. Realtime mode
+requires `BlackHole 2ch` for the microphone/speaker path used by OpenClaw. For
+clean duplex audio, use separate virtual devices or a Loopback-style graph; a
+single BlackHole device is enough for a first smoke test but can echo.
### Local gateway + Parallels Chrome
@@ -286,13 +299,13 @@ phrase, and prints session health:
openclaw googlemeet test-speech https://meet.google.com/abc-defg-hij
```
-During join, OpenClaw browser automation fills the guest name, clicks Join/Ask
-to join, and accepts Meet's first-run "Use microphone" choice when that prompt
-appears. During browser-only meeting creation, it can also continue past the
-same prompt without microphone if Meet does not expose the use-microphone button.
-If the browser profile is not signed in, Meet is waiting for host
-admission, Chrome needs microphone/camera permission, or Meet is stuck on a
-prompt automation could not resolve, the join/test-speech result reports
+During realtime join, OpenClaw browser automation fills the guest name, clicks
+Join/Ask to join, and accepts Meet's first-run "Use microphone" choice when that
+prompt appears. During observe-only join or browser-only meeting creation, it
+continues past the same prompt without microphone when that choice is available.
+If the browser profile is not signed in, Meet is waiting for host admission,
+Chrome needs microphone/camera permission for a realtime join, or Meet is stuck
+on a prompt automation could not resolve, the join/test-speech result reports
`manualActionRequired: true` with `manualActionReason` and
`manualActionMessage`. Agents should stop retrying the join, report that exact
message plus the current `browserUrl`/`browserTitle`, and retry only after the
@@ -979,7 +992,12 @@ Use `action: "status"` to list active sessions or inspect a session ID. Use
`action: "speak"` with `sessionId` and `message` to make the realtime agent
speak immediately. Use `action: "test_speech"` to create or reuse the session,
trigger a known phrase, and return `inCall` health when the Chrome host can
-report it. Use `action: "leave"` to mark a session ended.
+report it. `test_speech` always forces `mode: "realtime"` and fails if asked to
+run in `mode: "transcribe"` because observe-only sessions intentionally cannot
+emit speech. Its `speechOutputVerified` result is based on realtime audio output
+bytes increasing during this test call, so a reused session with older audio
+does not count as a fresh successful speech check. Use `action: "leave"` to mark
+a session ended.
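An agent consuming the result fields described above might classify them like this; the field names come from this change, while the classification labels are illustrative:

```typescript
// Result fields from this change; the returned labels are illustrative.
type TestSpeechResult = {
  spoken: boolean;
  speechOutputVerified: boolean;
  speechOutputTimedOut: boolean;
  manualActionRequired?: boolean;
  manualActionMessage?: string;
};

function classifySpeechTest(result: TestSpeechResult): string {
  if (result.manualActionRequired) {
    // Relay the exact message instead of retrying blindly.
    return `manual action: ${result.manualActionMessage ?? "see manualActionReason"}`;
  }
  if (result.speechOutputVerified) {
    // Realtime output bytes advanced during this call: a fresh, verified pass.
    return "verified";
  }
  if (result.spoken && result.speechOutputTimedOut) {
    // Provider accepted the utterance but no new bytes reached the bridge.
    return "queued-but-unverified";
  }
  return "failed";
}

console.log(
  classifySpeechTest({ spoken: true, speechOutputVerified: true, speechOutputTimedOut: false }),
); // prints "verified"
```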
`status` includes Chrome health when available:
@@ -1224,7 +1242,12 @@ openclaw googlemeet doctor
```
Use `mode: "realtime"` for listen/talk-back. `mode: "transcribe"` intentionally
-does not start the duplex realtime voice bridge.
+does not start the duplex realtime voice bridge. `googlemeet test-speech`
+always checks the realtime path and reports whether bridge output bytes were
+observed for that invocation. If `speechOutputVerified` is false and
+`speechOutputTimedOut` is true, the realtime provider may have accepted the
+utterance but OpenClaw did not see new output bytes reach the Chrome audio
+bridge.
Also verify:
@@ -1317,7 +1340,7 @@ call still needs a participant path. This plugin keeps that boundary visible:
Chrome handles browser participation and local audio routing; Twilio handles
phone dial-in participation.
-Chrome realtime mode needs either:
+Chrome realtime mode needs `BlackHole 2ch` plus either:
- `chrome.audioInputCommand` plus `chrome.audioOutputCommand`: OpenClaw owns the
realtime model bridge and pipes audio in `chrome.audioFormat` between those

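Per the requirement above, a realtime Chrome join needs `BlackHole 2ch` plus audio bridge commands, while observe-only mode needs neither. A hedged config sketch — only the key names (`audioInputCommand`, `audioOutputCommand`, `audioFormat`) come from this change; the SoX command lines and format value are illustrative placeholders:

```typescript
// Key names follow chrome.* config in this change; values are illustrative.
const chromeRealtimeAudio = {
  audioFormat: "s16le-24000", // placeholder format label
  // Capture what Meet plays into the virtual device.
  audioInputCommand: ["sox", "-q", "-t", "coreaudio", "BlackHole 2ch", "-t", "raw", "-"],
  // Play model speech back into the device Meet uses as its microphone.
  audioOutputCommand: ["sox", "-q", "-t", "raw", "-", "-t", "coreaudio", "BlackHole 2ch"],
};

// Observe-only mode skips all of this: no BlackHole check, no bridge processes.
function requiredAudioCommands(mode: "realtime" | "transcribe"): string[][] {
  return mode === "realtime"
    ? [chromeRealtimeAudio.audioInputCommand, chromeRealtimeAudio.audioOutputCommand]
    : [];
}

console.log(requiredAudioCommands("transcribe").length); // prints 0
```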
View File

@@ -110,7 +110,10 @@ function mockLocalMeetBrowserRequest(
params?: unknown,
_extra?: unknown,
): Promise<Record<string, unknown>> => {
-const request = params as { path?: string; body?: { targetId?: string; url?: string } };
+const request = params as {
+path?: string;
+body?: { fn?: string; targetId?: string; url?: string };
+};
if (request.path === "/tabs") {
return { tabs: [] };
}
@@ -1298,6 +1301,52 @@ describe("google-meet plugin", () => {
}
});
it("skips local Chrome audio prerequisites for observe-only setup status", async () => {
const originalPlatform = process.platform;
Object.defineProperty(process, "platform", { value: "darwin" });
try {
const { tools, runCommandWithTimeout } = setup(
{ defaultMode: "transcribe", defaultTransport: "chrome" },
{
runCommandWithTimeoutHandler: async () => ({
code: 1,
stdout: "Built-in Output",
stderr: "",
}),
},
);
const tool = tools[0] as {
execute: (
id: string,
params: unknown,
) => Promise<{ details: { ok?: boolean; checks?: Array<{ id?: string; ok?: boolean }> } }>;
};
const result = await tool.execute("id", {
action: "setup_status",
transport: "chrome",
mode: "transcribe",
});
expect(result.details.ok).toBe(true);
expect(result.details.checks).toEqual(
expect.arrayContaining([
expect.objectContaining({
id: "audio-bridge",
ok: true,
message: "Chrome observe-only mode does not require a realtime audio bridge",
}),
]),
);
expect(result.details.checks?.some((check) => check.id === "chrome-local-audio-device")).toBe(
false,
);
expect(runCommandWithTimeout).not.toHaveBeenCalled();
} finally {
Object.defineProperty(process, "platform", { value: originalPlatform });
}
});
it("reports Twilio delegation readiness when voice-call is enabled", async () => {
vi.stubEnv("TWILIO_ACCOUNT_SID", "AC123");
vi.stubEnv("TWILIO_AUTH_TOKEN", "secret");
@@ -1386,7 +1435,7 @@ describe("google-meet plugin", () => {
);
});
-it("opens local Chrome Meet through browser control after the BlackHole check", async () => {
+it("opens local Chrome Meet in observe-only mode without BlackHole checks", async () => {
const originalPlatform = process.platform;
Object.defineProperty(process, "platform", { value: "darwin" });
try {
@@ -1408,12 +1457,7 @@ describe("google-meet plugin", () => {
});
expect(respond.mock.calls[0]?.[0]).toBe(true);
-expect(runCommandWithTimeout).toHaveBeenNthCalledWith(
-1,
-["/usr/sbin/system_profiler", "SPAudioDataType"],
-{ timeoutMs: 10000 },
-);
-expect(runCommandWithTimeout).toHaveBeenCalledTimes(1);
+expect(runCommandWithTimeout).not.toHaveBeenCalled();
expect(callGatewayFromCli).toHaveBeenCalledWith(
"browser.request",
expect.any(Object),
@@ -1424,19 +1468,16 @@ describe("google-meet plugin", () => {
}),
{ progress: false },
);
expect(callGatewayFromCli).toHaveBeenCalledWith(
"browser.request",
expect.any(Object),
expect.objectContaining({
method: "POST",
path: "/permissions/grant",
body: expect.objectContaining({
origin: "https://meet.google.com",
permissions: ["audioCapture", "videoCapture"],
optionalPermissions: ["speakerSelection"],
}),
}),
{ progress: false },
expect(
callGatewayFromCli.mock.calls.some(
([, , request]) => (request as { path?: string }).path === "/permissions/grant",
),
).toBe(false);
const actCall = callGatewayFromCli.mock.calls.find(
([, , request]) => (request as { path?: string }).path === "/act",
);
expect(String((actCall?.[2] as { body?: { fn?: string } } | undefined)?.body?.fn)).toContain(
"const allowMicrophone = false",
);
} finally {
Object.defineProperty(process, "platform", { value: originalPlatform });
@@ -1883,9 +1924,14 @@ describe("google-meet plugin", () => {
updatedAt: "2026-04-27T00:00:00.000Z",
participantIdentity: "signed-in Google Chrome profile",
realtime: { enabled: true, provider: "openai", toolPolicy: "safe-read-only" },
-chrome: { audioBackend: "blackhole-2ch", launched: true },
+chrome: {
+audioBackend: "blackhole-2ch",
+launched: true,
+health: { audioOutputActive: true, lastOutputBytes: 10 },
+},
notes: [],
};
vi.spyOn(runtime, "list").mockReturnValue([session]);
const join = vi.spyOn(runtime, "join").mockResolvedValue({ session, spoken: true });
const speak = vi.spyOn(runtime, "speak");
@@ -1894,9 +1940,32 @@ describe("google-meet plugin", () => {
message: "Say exactly: hello.",
});
-expect(join).toHaveBeenCalledWith(expect.objectContaining({ message: "Say exactly: hello." }));
+expect(join).toHaveBeenCalledWith(
+expect.objectContaining({
+message: "Say exactly: hello.",
+mode: "realtime",
+}),
+);
expect(speak).not.toHaveBeenCalled();
expect(result.spoken).toBe(true);
+expect(result.speechOutputVerified).toBe(false);
+expect(result.speechOutputTimedOut).toBe(false);
});
it("rejects observe-only mode for test speech", async () => {
const runtime = new GoogleMeetRuntime({
config: resolveGoogleMeetConfig({}),
fullConfig: {} as never,
runtime: {} as never,
logger: noopLogger,
});
await expect(
runtime.testSpeech({
url: "https://meet.google.com/abc-defg-hij",
mode: "transcribe",
}),
).rejects.toThrow("test_speech requires mode: realtime");
});
it("reports manual action when the browser profile needs Google login", async () => {

View File

@@ -677,7 +677,13 @@ export default definePluginEntry({
async ({ params, respond }: GatewayRequestHandlerOptions) => {
try {
const rt = await ensureRuntime();
-respond(true, await rt.setupStatus({ transport: normalizeTransport(params?.transport) }));
+respond(
+true,
+await rt.setupStatus({
+transport: normalizeTransport(params?.transport),
+mode: normalizeMode(params?.mode),
+}),
+);
} catch (err) {
sendError(respond, err);
}

View File

@@ -1,3 +1,4 @@
import { spawnSync } from "node:child_process";
import { EventEmitter } from "node:events";
import { describe, expect, it, vi } from "vitest";
@@ -40,6 +41,35 @@ vi.mock("node:child_process", async (importOriginal) => {
});
describe("google-meet node host bridge sessions", () => {
it("starts observe-only Chrome without BlackHole or bridge processes", async () => {
const { handleGoogleMeetNodeHostCommand } = await import("./src/node-host.js");
const originalPlatform = process.platform;
children.length = 0;
vi.mocked(spawnSync).mockClear();
Object.defineProperty(process, "platform", { configurable: true, value: "darwin" });
try {
const start = JSON.parse(
await handleGoogleMeetNodeHostCommand(
JSON.stringify({
action: "start",
url: "https://meet.google.com/xyz-abcd-uvw",
mode: "transcribe",
launch: false,
audioInputCommand: ["mock-rec"],
audioOutputCommand: ["mock-play"],
}),
),
);
expect(start).toEqual({ launched: false });
expect(spawnSync).not.toHaveBeenCalled();
expect(children).toHaveLength(0);
} finally {
Object.defineProperty(process, "platform", { configurable: true, value: originalPlatform });
}
});
it("clears output playback without closing the active bridge when the old output exits", async () => {
const { handleGoogleMeetNodeHostCommand } = await import("./src/node-host.js");
const originalPlatform = process.platform;

View File

@@ -129,6 +129,7 @@ export type GoogleMeetExportManifest = {
type SetupOptions = {
json?: boolean;
mode?: GoogleMeetMode;
transport?: GoogleMeetTransport;
};
@@ -1986,10 +1987,11 @@ export function registerGoogleMeetCli(params: {
.command("setup")
.description("Show Google Meet transport setup status")
.option("--transport <transport>", "Transport to check: chrome, chrome-node, or twilio")
.option("--mode <mode>", "Mode to check: realtime or transcribe")
.option("--json", "Print JSON output", false)
.action(async (options: SetupOptions) => {
const rt = await params.ensureRuntime();
-const status = await rt.setupStatus({ transport: options.transport });
+const status = await rt.setupStatus({ transport: options.transport, mode: options.mode });
if (options.json) {
writeStdoutJson(status);
return;

View File

@@ -270,42 +270,46 @@ function startChrome(params: Record<string, unknown>) {
throw new Error("url required");
}
const timeoutMs = readNumber(params.joinTimeoutMs, 30_000);
assertBlackHoleAvailable(Math.min(timeoutMs, 10_000));
const healthCommand = readStringArray(params.audioBridgeHealthCommand);
if (healthCommand) {
const health = runCommandWithTimeout(healthCommand, timeoutMs);
if (health.code !== 0) {
throw new Error(
`Chrome audio bridge health check failed: ${health.stderr || health.stdout || health.code}`,
);
}
}
const mode = readString(params.mode);
let bridgeId: string | undefined;
let audioBridge: { type: "external-command" | "node-command-pair" } | undefined;
const bridgeCommand = readStringArray(params.audioBridgeCommand);
if (bridgeCommand) {
const bridge = runCommandWithTimeout(bridgeCommand, timeoutMs);
if (bridge.code !== 0) {
throw new Error(
`failed to start Chrome audio bridge: ${bridge.stderr || bridge.stdout || bridge.code}`,
);
if (mode === "realtime") {
assertBlackHoleAvailable(Math.min(timeoutMs, 10_000));
const healthCommand = readStringArray(params.audioBridgeHealthCommand);
if (healthCommand) {
const health = runCommandWithTimeout(healthCommand, timeoutMs);
if (health.code !== 0) {
throw new Error(
`Chrome audio bridge health check failed: ${health.stderr || health.stdout || health.code}`,
);
}
}
const bridgeCommand = readStringArray(params.audioBridgeCommand);
if (bridgeCommand) {
const bridge = runCommandWithTimeout(bridgeCommand, timeoutMs);
if (bridge.code !== 0) {
throw new Error(
`failed to start Chrome audio bridge: ${bridge.stderr || bridge.stdout || bridge.code}`,
);
}
audioBridge = { type: "external-command" };
} else {
const session = startCommandPair({
inputCommand: readStringArray(params.audioInputCommand) ?? [
...DEFAULT_GOOGLE_MEET_AUDIO_INPUT_COMMAND,
],
outputCommand: readStringArray(params.audioOutputCommand) ?? [
...DEFAULT_GOOGLE_MEET_AUDIO_OUTPUT_COMMAND,
],
url,
mode,
});
bridgeId = session.id;
audioBridge = { type: "node-command-pair" };
}
audioBridge = { type: "external-command" };
} else if (params.mode === "realtime") {
const session = startCommandPair({
inputCommand: readStringArray(params.audioInputCommand) ?? [
...DEFAULT_GOOGLE_MEET_AUDIO_INPUT_COMMAND,
],
outputCommand: readStringArray(params.audioOutputCommand) ?? [
...DEFAULT_GOOGLE_MEET_AUDIO_OUTPUT_COMMAND,
],
url,
mode: readString(params.mode),
});
bridgeId = session.id;
audioBridge = { type: "node-command-pair" };
}
if (params.launch !== false) {

View File

@@ -55,6 +55,17 @@ function resolveMode(input: GoogleMeetMode | undefined, config: GoogleMeetConfig
return input ?? config.defaultMode;
}
function hasRealtimeAudioOutputAdvanced(
health: GoogleMeetChromeHealth | undefined,
startOutputBytes: number,
): boolean {
return (health?.lastOutputBytes ?? 0) > startOutputBytes;
}
function sleep(ms: number): Promise<void> {
return new Promise((resolve) => setTimeout(resolve, ms));
}
function collectChromeAudioCommands(config: GoogleMeetConfig): string[] {
const commands = config.chrome.audioBridgeCommand
? [config.chrome.audioBridgeCommand[0]]
@@ -103,13 +114,16 @@ export class GoogleMeetRuntime {
return session ? { found: true, session } : { found: false };
}
-async setupStatus(options: { transport?: GoogleMeetTransport } = {}) {
+async setupStatus(options: { transport?: GoogleMeetTransport; mode?: GoogleMeetMode } = {}) {
const transport = resolveTransport(options.transport, this.params.config);
const mode = resolveMode(options.mode, this.params.config);
const shouldCheckChromeNode =
transport === "chrome-node" ||
(!options.transport && Boolean(this.params.config.chromeNode.node));
let status = getGoogleMeetSetupStatus(this.params.config, {
fullConfig: this.params.fullConfig,
mode,
transport,
});
if (shouldCheckChromeNode) {
try {
@@ -131,7 +145,7 @@ export class GoogleMeetRuntime {
});
}
}
-if (transport === "chrome") {
+if (transport === "chrome" && mode === "realtime") {
try {
await assertBlackHole2chAvailable({
runtime: this.params.runtime,
@@ -302,7 +316,9 @@ export class GoogleMeetRuntime {
? transport === "chrome-node"
? "Chrome node transport joins as the signed-in Google profile on the selected node and routes realtime audio through the node bridge."
: "Chrome transport joins as the signed-in Google profile and routes realtime audio through the configured bridge."
-: "Chrome transport joins as the signed-in Google profile and expects BlackHole 2ch audio routing.",
+: mode === "realtime"
+? "Chrome transport joins as the signed-in Google profile and expects BlackHole 2ch audio routing."
+: "Chrome transport joins as the signed-in Google profile without starting the realtime audio bridge.",
);
} else {
const dialInNumber = normalizeDialInNumber(
@@ -398,14 +414,53 @@ export class GoogleMeetRuntime {
manualActionReason?: GoogleMeetChromeHealth["manualActionReason"];
manualActionMessage?: string;
spoken: boolean;
speechOutputVerified: boolean;
speechOutputTimedOut: boolean;
audioOutputActive?: boolean;
lastOutputBytes?: number;
session: GoogleMeetSession;
}> {
-const before = new Set(this.list().map((session) => session.id));
+if (request.mode === "transcribe") {
+throw new Error(
+"test_speech requires mode: realtime; use join mode: transcribe for observe-only sessions.",
+);
+}
const url = normalizeMeetUrl(request.url);
const transport = resolveTransport(request.transport, this.params.config);
const beforeSessions = this.list();
const before = new Set(beforeSessions.map((session) => session.id));
const existingSession = beforeSessions.find(
(session) =>
session.state === "active" &&
isSameMeetUrlForReuse(session.url, url) &&
session.transport === transport &&
session.mode === "realtime",
);
const startOutputBytes = existingSession?.chrome?.health?.lastOutputBytes ?? 0;
const result = await this.join({
...request,
transport,
url,
mode: "realtime",
message: request.message ?? "Say exactly: Google Meet speech test complete.",
});
-const health = result.session.chrome?.health;
+let health = result.session.chrome?.health;
const shouldWaitForOutput =
result.spoken === true &&
health?.manualActionRequired !== true &&
this.#sessionHealth.has(result.session.id);
if (shouldWaitForOutput && !hasRealtimeAudioOutputAdvanced(health, startOutputBytes)) {
const deadline = Date.now() + Math.min(this.params.config.chrome.joinTimeoutMs, 5_000);
while (Date.now() < deadline) {
await sleep(100);
this.#refreshHealth(result.session.id);
health = result.session.chrome?.health;
if (hasRealtimeAudioOutputAdvanced(health, startOutputBytes)) {
break;
}
}
}
const speechOutputVerified = hasRealtimeAudioOutputAdvanced(health, startOutputBytes);
return {
createdSession: !before.has(result.session.id),
inCall: health?.inCall,
@@ -413,6 +468,10 @@ export class GoogleMeetRuntime {
manualActionReason: health?.manualActionReason,
manualActionMessage: health?.manualActionMessage,
spoken: result.spoken ?? false,
speechOutputVerified,
speechOutputTimedOut: shouldWaitForOutput && !speechOutputVerified,
audioOutputActive: health?.audioOutputActive,
lastOutputBytes: health?.lastOutputBytes,
session: result.session,
};
}
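The output-byte wait in `testSpeech` above is a bounded poll: a deadline capped at 5 seconds, 100 ms ticks, and an early exit once bytes advance. The same pattern in isolation, as a sketch (the helper names here are illustrative; the deadline/tick values mirror this change):

```typescript
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Poll `check` every tickMs until it passes or the capped deadline expires.
async function waitUntil(
  check: () => boolean,
  timeoutMs: number,
  tickMs = 100,
): Promise<boolean> {
  const deadline = Date.now() + Math.min(timeoutMs, 5_000);
  while (Date.now() < deadline) {
    if (check()) return true;
    await sleep(tickMs);
  }
  return check();
}

// Simulated byte counter that advances shortly after the "utterance".
let lastOutputBytes = 0;
setTimeout(() => { lastOutputBytes = 10; }, 50);
waitUntil(() => lastOutputBytes > 0, 1_000, 10).then((verified) => {
  console.log(verified); // prints true
});
```

Re-checking the predicate once after the deadline avoids reporting a timeout when the final tick landed just as the bytes advanced.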

View File

@@ -1,7 +1,7 @@
import fs from "node:fs";
import os from "node:os";
import path from "node:path";
import type { GoogleMeetConfig } from "./config.js";
import type { GoogleMeetConfig, GoogleMeetMode, GoogleMeetTransport } from "./config.js";
export type SetupCheck = {
id: string;
@@ -33,6 +33,8 @@ export function getGoogleMeetSetupStatus(
options?: {
env?: NodeJS.ProcessEnv;
fullConfig?: unknown;
mode?: GoogleMeetMode;
transport?: GoogleMeetTransport;
},
): {
ok: boolean;
@@ -43,11 +45,17 @@ export function getGoogleMeetSetupStatus(
options?: {
env?: NodeJS.ProcessEnv;
fullConfig?: unknown;
mode?: GoogleMeetMode;
transport?: GoogleMeetTransport;
},
) {
const checks: SetupCheck[] = [];
const env = options?.env ?? process.env;
const fullConfig = asRecord(options?.fullConfig);
const mode = options?.mode ?? config.defaultMode;
const transport = options?.transport ?? config.defaultTransport;
const needsChromeRealtimeAudio =
mode === "realtime" && (transport === "chrome" || transport === "chrome-node");
const pluginEntries = asRecord(asRecord(fullConfig.plugins).entries);
const pluginAllow = asRecord(fullConfig.plugins).allow;
const voiceCallEntry = asRecord(pluginEntries["voice-call"]);
@@ -79,18 +87,26 @@ export function getGoogleMeetSetupStatus(
: "Local Chrome uses the OpenClaw browser profile; configure browser.defaultProfile to choose another profile",
});
-checks.push({
-id: "audio-bridge",
-ok: Boolean(
-config.chrome.audioBridgeCommand ||
-(config.chrome.audioInputCommand && config.chrome.audioOutputCommand),
-),
-message: config.chrome.audioBridgeCommand
-? "Chrome audio bridge command configured"
-: config.chrome.audioInputCommand && config.chrome.audioOutputCommand
-? `Chrome command-pair realtime audio bridge configured (${config.chrome.audioFormat})`
-: "Chrome realtime audio bridge not configured",
-});
+if (needsChromeRealtimeAudio) {
+checks.push({
+id: "audio-bridge",
+ok: Boolean(
+config.chrome.audioBridgeCommand ||
+(config.chrome.audioInputCommand && config.chrome.audioOutputCommand),
+),
+message: config.chrome.audioBridgeCommand
+? "Chrome audio bridge command configured"
+: config.chrome.audioInputCommand && config.chrome.audioOutputCommand
+? `Chrome command-pair realtime audio bridge configured (${config.chrome.audioFormat})`
+: "Chrome realtime audio bridge not configured",
+});
+} else if (transport === "chrome" || transport === "chrome-node") {
+checks.push({
+id: "audio-bridge",
+ok: true,
+message: "Chrome observe-only mode does not require a realtime audio bridge",
+});
+}
checks.push({
id: "guest-join-defaults",
@@ -114,14 +130,16 @@ export function getGoogleMeetSetupStatus(
: "Chrome node not pinned; automatic selection works when exactly one capable node is connected",
});
-checks.push({
-id: "intro-after-in-call",
-ok: config.chrome.waitForInCallMs > 0,
-message:
-config.chrome.waitForInCallMs > 0
-? `Realtime intro waits up to ${config.chrome.waitForInCallMs}ms for the Meet tab to be in-call`
-: "Set chrome.waitForInCallMs to delay realtime intro until the Meet tab is in-call",
-});
+if (needsChromeRealtimeAudio) {
+checks.push({
+id: "intro-after-in-call",
+ok: config.chrome.waitForInCallMs > 0,
+message:
+config.chrome.waitForInCallMs > 0
+? `Realtime intro waits up to ${config.chrome.waitForInCallMs}ms for the Meet tab to be in-call`
+: "Set chrome.waitForInCallMs to delay realtime intro until the Meet tab is in-call",
+});
+}
const shouldCheckTwilioDelegation =
config.voiceCall.enabled &&

View File

@@ -95,57 +95,59 @@ export async function launchChromeMeet(params: {
| ({ type: "command-pair" } & ChromeRealtimeAudioBridgeHandle);
browser?: GoogleMeetChromeHealth;
}> {
await assertBlackHole2chAvailable({
runtime: params.runtime,
timeoutMs: Math.min(params.config.chrome.joinTimeoutMs, 10_000),
});
if (params.config.chrome.audioBridgeHealthCommand) {
const health = await params.runtime.system.runCommandWithTimeout(
params.config.chrome.audioBridgeHealthCommand,
{ timeoutMs: params.config.chrome.joinTimeoutMs },
);
if (health.code !== 0) {
throw new Error(
`Chrome audio bridge health check failed: ${health.stderr || health.stdout || health.code}`,
);
}
}
let audioBridge:
| { type: "external-command" }
| ({ type: "command-pair" } & ChromeRealtimeAudioBridgeHandle)
| undefined;
if (params.config.chrome.audioBridgeCommand) {
const bridge = await params.runtime.system.runCommandWithTimeout(
params.config.chrome.audioBridgeCommand,
{ timeoutMs: params.config.chrome.joinTimeoutMs },
);
if (bridge.code !== 0) {
throw new Error(
`failed to start Chrome audio bridge: ${bridge.stderr || bridge.stdout || bridge.code}`,
if (params.mode === "realtime") {
await assertBlackHole2chAvailable({
runtime: params.runtime,
timeoutMs: Math.min(params.config.chrome.joinTimeoutMs, 10_000),
});
if (params.config.chrome.audioBridgeHealthCommand) {
const health = await params.runtime.system.runCommandWithTimeout(
params.config.chrome.audioBridgeHealthCommand,
{ timeoutMs: params.config.chrome.joinTimeoutMs },
);
if (health.code !== 0) {
throw new Error(
`Chrome audio bridge health check failed: ${health.stderr || health.stdout || health.code}`,
);
}
}
audioBridge = { type: "external-command" };
} else if (params.mode === "realtime") {
if (!params.config.chrome.audioInputCommand || !params.config.chrome.audioOutputCommand) {
throw new Error(
"Chrome realtime mode requires chrome.audioInputCommand and chrome.audioOutputCommand, or chrome.audioBridgeCommand for an external bridge.",
if (params.config.chrome.audioBridgeCommand) {
const bridge = await params.runtime.system.runCommandWithTimeout(
params.config.chrome.audioBridgeCommand,
{ timeoutMs: params.config.chrome.joinTimeoutMs },
);
if (bridge.code !== 0) {
throw new Error(
`failed to start Chrome audio bridge: ${bridge.stderr || bridge.stdout || bridge.code}`,
);
}
audioBridge = { type: "external-command" };
} else {
if (!params.config.chrome.audioInputCommand || !params.config.chrome.audioOutputCommand) {
throw new Error(
"Chrome realtime mode requires chrome.audioInputCommand and chrome.audioOutputCommand, or chrome.audioBridgeCommand for an external bridge.",
);
}
audioBridge = {
type: "command-pair",
...(await startCommandRealtimeAudioBridge({
config: params.config,
fullConfig: params.fullConfig,
runtime: params.runtime,
meetingSessionId: params.meetingSessionId,
inputCommand: params.config.chrome.audioInputCommand,
outputCommand: params.config.chrome.audioOutputCommand,
logger: params.logger,
})),
};
}
audioBridge = {
type: "command-pair",
...(await startCommandRealtimeAudioBridge({
config: params.config,
fullConfig: params.fullConfig,
runtime: params.runtime,
meetingSessionId: params.meetingSessionId,
inputCommand: params.config.chrome.audioInputCommand,
outputCommand: params.config.chrome.audioOutputCommand,
logger: params.logger,
})),
};
}
if (!params.config.chrome.launch) {
@@ -167,6 +169,7 @@ export async function launchChromeMeet(params: {
const result = await openMeetWithBrowserRequest({
callBrowser: callLocalBrowserRequest,
config: params.config,
mode: params.mode,
url: params.url,
});
return { ...result, audioBridge };
@@ -273,7 +276,11 @@ function parsePermissionGrantNotes(result: unknown): string[] {
async function grantMeetMediaPermissions(params: {
callBrowser: BrowserRequestCaller;
timeoutMs: number;
allowMicrophone: boolean;
}): Promise<string[]> {
if (!params.allowMicrophone) {
return ["Observe-only mode skips Meet microphone/camera permission grants."];
}
try {
const result = await params.callBrowser({
method: "POST",
@@ -296,9 +303,14 @@ async function grantMeetMediaPermissions(params: {
}
}
-function meetStatusScript(params: { guestName: string; autoJoin: boolean }) {
+function meetStatusScript(params: {
+allowMicrophone: boolean;
+autoJoin: boolean;
+guestName: string;
+}) {
return `() => {
const text = (node) => (node?.innerText || node?.textContent || "").trim();
const allowMicrophone = ${JSON.stringify(params.allowMicrophone)};
const buttons = [...document.querySelectorAll('button')];
const notes = [];
const findButton = (pattern) =>
@@ -325,16 +337,24 @@ function meetStatusScript(params: { guestName: string; autoJoin: boolean }) {
const host = location.hostname.toLowerCase();
const pageUrl = location.href;
const permissionNeeded = /permission needed|allow.*(microphone|camera)|blocked.*(microphone|camera)|permission.*(microphone|camera|speaker)/i.test(pageText);
const mic = buttons.find((button) => /turn off microphone|turn on microphone|microphone/i.test(button.getAttribute('aria-label') || text(button)));
if (!allowMicrophone && mic && /turn off microphone/i.test(mic.getAttribute('aria-label') || text(mic))) {
mic.click();
notes.push("Muted Meet microphone for observe-only mode.");
}
const join = ${JSON.stringify(params.autoJoin)}
? findButton(/join now|ask to join/i)
: null;
if (join) join.click();
const microphoneChoice = findButton(/\\buse microphone\\b/i);
if (microphoneChoice) {
const noMicrophoneChoice = findButton(/\\b(continue|join|use) without (microphone|mic)\\b|\\bnot now\\b/i);
if (allowMicrophone && microphoneChoice) {
microphoneChoice.click();
notes.push("Accepted Meet microphone prompt with browser automation.");
} else if (!allowMicrophone && noMicrophoneChoice) {
noMicrophoneChoice.click();
notes.push("Skipped Meet microphone prompt for observe-only mode.");
}
const mic = buttons.find((button) => /turn off microphone|turn on microphone|microphone/i.test(button.getAttribute('aria-label') || text(button)));
const inCall = buttons.some((button) => /leave call/i.test(button.getAttribute('aria-label') || text(button)));
let manualActionReason;
let manualActionMessage;
@@ -346,14 +366,18 @@ function meetStatusScript(params: { guestName: string; autoJoin: boolean }) {
manualActionMessage = "Admit the OpenClaw browser participant in Google Meet, then retry speech.";
} else if (permissionNeeded) {
manualActionReason = "meet-permission-required";
manualActionMessage = "Allow microphone/camera/speaker permissions for Meet in the OpenClaw browser profile, then retry.";
} else if (!inCall && !microphoneChoice && /do you want people to hear you in the meeting/i.test(pageText)) {
manualActionMessage = allowMicrophone
? "Allow microphone/camera/speaker permissions for Meet in the OpenClaw browser profile, then retry."
: "Join without microphone/camera permissions in the OpenClaw browser profile, then retry.";
} else if (!inCall && (allowMicrophone ? !microphoneChoice : !noMicrophoneChoice) && /do you want people to hear you in the meeting/i.test(pageText)) {
manualActionReason = "meet-audio-choice-required";
manualActionMessage = "Meet is showing the microphone choice. Click Use microphone in the OpenClaw browser profile, then retry.";
manualActionMessage = allowMicrophone
? "Meet is showing the microphone choice. Click Use microphone in the OpenClaw browser profile, then retry."
: "Meet is showing the microphone choice. Choose the no-microphone option in the OpenClaw browser profile, then retry.";
}
return JSON.stringify({
clickedJoin: Boolean(join),
clickedMicrophoneChoice: Boolean(microphoneChoice),
clickedMicrophoneChoice: Boolean(allowMicrophone && microphoneChoice),
inCall,
micMuted: mic ? /turn on microphone/i.test(mic.getAttribute('aria-label') || text(mic)) : undefined,
manualActionRequired: Boolean(manualActionReason),
@@ -370,6 +394,7 @@ async function openMeetWithBrowserProxy(params: {
runtime: PluginRuntime;
nodeId: string;
config: GoogleMeetConfig;
mode: "realtime" | "transcribe";
url: string;
}): Promise<{ launched: boolean; browser?: GoogleMeetChromeHealth }> {
return await openMeetWithBrowserRequest({
@@ -383,6 +408,7 @@ async function openMeetWithBrowserProxy(params: {
timeoutMs: request.timeoutMs,
}),
config: params.config,
mode: params.mode,
url: params.url,
});
}
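The `mode: "realtime" | "transcribe"` union threaded through these signatures is what drives every `allowMicrophone` flag. The diff passes `params.mode === "realtime"` inline at each call site; a tiny helper (hypothetical name, just one way to centralize the rule) makes it explicit:

```typescript
type MeetMode = "realtime" | "transcribe";

// Only realtime joins may request or select a microphone; transcribe
// (observe-only) joins stay receive-only, matching the diff's call sites.
function allowsMicrophone(mode: MeetMode): boolean {
  return mode === "realtime";
}

const realtimeAllows = allowsMicrophone("realtime");
const transcribeAllows = allowsMicrophone("transcribe");
```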
@@ -390,6 +416,7 @@ async function openMeetWithBrowserProxy(params: {
async function openMeetWithBrowserRequest(params: {
callBrowser: BrowserRequestCaller;
config: GoogleMeetConfig;
mode: "realtime" | "transcribe";
url: string;
}): Promise<{ launched: boolean; browser?: GoogleMeetChromeHealth }> {
if (!params.config.chrome.launch) {
@@ -442,6 +469,7 @@ async function openMeetWithBrowserRequest(params: {
}
const permissionNotes = await grantMeetMediaPermissions({
allowMicrophone: params.mode === "realtime",
callBrowser: params.callBrowser,
timeoutMs,
});
@@ -461,6 +489,7 @@ async function openMeetWithBrowserRequest(params: {
kind: "evaluate",
targetId,
fn: meetStatusScript({
allowMicrophone: params.mode === "realtime",
guestName: params.config.chrome.guestName,
autoJoin: params.config.chrome.autoJoin,
}),
@@ -526,6 +555,7 @@ async function inspectRecoverableMeetTab(params: {
timeoutMs: Math.min(params.timeoutMs, 5_000),
});
const permissionNotes = await grantMeetMediaPermissions({
allowMicrophone: true,
callBrowser: params.callBrowser,
timeoutMs: params.timeoutMs,
});
@@ -536,6 +566,7 @@ async function inspectRecoverableMeetTab(params: {
kind: "evaluate",
targetId: params.targetId,
fn: meetStatusScript({
allowMicrophone: true,
guestName: params.config.chrome.guestName,
autoJoin: false,
}),
@@ -714,6 +745,7 @@ export async function launchChromeMeetOnNode(params: {
runtime: params.runtime,
nodeId,
config: params.config,
mode: params.mode,
url: params.url,
});
const raw = await params.runtime.nodes.invoke({
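For reference, the JSON string produced by `meetStatusScript`'s in-page function can be decoded on the Node side along these lines. The field names come from the `JSON.stringify` payload in the diff; the `MeetStatus` interface and `parseMeetStatus` helper are illustrative sketches and may omit fields not shown in this hunk:

```typescript
// Shape of the status payload as seen in the diff's return statement.
interface MeetStatus {
  clickedJoin: boolean;
  clickedMicrophoneChoice: boolean;
  inCall: boolean;
  micMuted?: boolean; // undefined when no microphone button was found at all
  manualActionRequired: boolean;
  manualActionReason?: string;
  manualActionMessage?: string;
}

function parseMeetStatus(raw: string): MeetStatus {
  return JSON.parse(raw) as MeetStatus;
}

// Example: an observe-only join that is in the call with the mic muted.
const status = parseMeetStatus(
  JSON.stringify({
    clickedJoin: true,
    clickedMicrophoneChoice: false,
    inCall: true,
    micMuted: true,
    manualActionRequired: false,
  }),
);
```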