diff --git a/CHANGELOG.md b/CHANGELOG.md index 8d6ba910e08..a7ee399cb35 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -63,6 +63,7 @@ Docs: https://docs.openclaw.ai - Agents/ACPX: stage the patched Claude ACP adapter as an ACPX runtime dependency and route known Codex/Claude ACP commands through local wrappers, so Gateway runtime no longer depends on live `npx` adapter resolution. Fixes #73202. Thanks @joerod26. - Memory/compaction: let pre-compaction memory flush use an exact `agents.defaults.compaction.memoryFlush.model` override such as `ollama/qwen3:8b` without inheriting the active session fallback chain, so local housekeeping can avoid paid conversation models. Fixes #53772. Thanks @limen96. - macOS/update: stop managed Gateway services before package replacement and keep LaunchAgent service secrets out of world-readable plist metadata by loading them from owner-only env files. Fixes #72996. Thanks @Mathewb7. +- Google Meet: keep observe-only Chrome joins and setup checks from requiring BlackHole or audio bridge commands, avoid granting or selecting the microphone in observe-only mode, and make `test_speech` report fresh realtime output-byte verification instead of only confirming a queued utterance. Refs #72478. Thanks @DougButdorf. - Gateway/hooks: route non-delivered hook completion and error summaries to the target agent's main session instead of the default agent session, preserving multi-agent hook isolation. Fixes #24693; carries forward #68667. Thanks @abersonFAC and @bluesky6868. - Control UI/models: request the configured Gateway model-list view so dashboards with only `models.providers.*.models` show those configured models first instead of flooding the picker with the full built-in catalog. Fixes #65405. Thanks @wbyanclaw. - CLI/models: keep default-model and allowlist pickers on explicit `models.providers.*.models` entries when `models.mode` is `replace` instead of loading the full built-in catalog. Fixes #64950. Thanks @mrozentsvayg. 
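The "fresh realtime output-byte verification" mentioned in the Google Meet changelog entry above boils down to one comparison, shown below as a self-contained sketch. It mirrors the `hasRealtimeAudioOutputAdvanced` helper this patch adds in `runtime.ts`; the `BridgeHealth` type name here is a stand-in for the plugin's `GoogleMeetChromeHealth` shape, not the real interface.

```typescript
// Sketch of the "fresh output bytes" speech check: a reused realtime session
// may already report nonzero output bytes from earlier speech, so only growth
// past the pre-test counter value counts as a verified utterance.
type BridgeHealth = { lastOutputBytes?: number };

function speechOutputAdvanced(
  health: BridgeHealth | undefined,
  startOutputBytes: number,
): boolean {
  // Missing health data counts as zero bytes, so it can never verify speech.
  return (health?.lastOutputBytes ?? 0) > startOutputBytes;
}

// Reused session whose counter did not move during the test: not verified.
console.log(speechOutputAdvanced({ lastOutputBytes: 10 }, 10)); // → false
// Counter advanced while the test utterance played: verified.
console.log(speechOutputAdvanced({ lastOutputBytes: 2048 }, 10)); // → true
```

This is why `test_speech` records the session's byte counter before joining: older audio on a reused session starts the baseline high instead of letting it pass the check.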
diff --git a/docs/plugins/google-meet.md b/docs/plugins/google-meet.md index 264b5e9b979..8c798783f3b 100644 --- a/docs/plugins/google-meet.md +++ b/docs/plugins/google-meet.md @@ -74,12 +74,21 @@ Check setup: openclaw googlemeet setup ``` -The setup output is meant to be agent-readable. It reports Chrome profile, -audio bridge, node pinning, delayed realtime intro, and, when Twilio delegation -is configured, whether the `voice-call` plugin and Twilio credentials are ready. -Treat any `ok: false` check as a blocker before asking an agent to join. -Use `openclaw googlemeet setup --json` for scripts or machine-readable output. -Use `--transport chrome`, `--transport chrome-node`, or `--transport twilio` +The setup output is meant to be agent-readable and mode-aware. It reports Chrome +profile, node pinning, and, for realtime Chrome joins, the BlackHole/SoX audio +bridge and delayed realtime intro checks. For observe-only joins, check the same +transport with `--mode transcribe`; that mode skips realtime audio prerequisites +because it does not listen through or speak through the bridge: + +```bash +openclaw googlemeet setup --transport chrome-node --mode transcribe +``` + +When Twilio delegation is configured, setup also reports whether the +`voice-call` plugin and Twilio credentials are ready. Treat any `ok: false` +check as a blocker for the checked transport and mode before asking an agent to +join. Use `openclaw googlemeet setup --json` for scripts or machine-readable +output. Use `--transport chrome`, `--transport chrome-node`, or `--transport twilio` to preflight a specific transport before an agent tries it. Join a meeting: @@ -144,8 +153,12 @@ then share the returned `meetingUri`. ``` For an observe-only/browser-control join, set `"mode": "transcribe"`. That does -not start the duplex realtime model bridge, so it will not talk back into the -meeting. 
+not start the duplex realtime model bridge, does not require BlackHole or SoX, +and will not talk back into the meeting. Chrome joins in this mode also avoid +OpenClaw's microphone/camera permission grant and avoid the Meet **Use +microphone** path. If Meet shows an audio-choice interstitial, automation tries +the no-microphone path and otherwise reports a manual action instead of opening +the local microphone. During realtime sessions, `google_meet` status includes browser and audio bridge health such as `inCall`, `manualActionRequired`, `providerConnected`, @@ -155,10 +168,10 @@ appears, browser automation handles it when it can. Login, host admission, and browser/OS permission prompts are reported as manual action with a reason and message for the agent to relay. -Local Chrome joins through the signed-in OpenClaw browser profile. In Meet, pick -`BlackHole 2ch` for the microphone/speaker path used by OpenClaw. For clean -duplex audio, use separate virtual devices or a Loopback-style graph; a single -BlackHole device is enough for a first smoke test but can echo. +Local Chrome joins through the signed-in OpenClaw browser profile. Realtime mode +requires `BlackHole 2ch` for the microphone/speaker path used by OpenClaw. For +clean duplex audio, use separate virtual devices or a Loopback-style graph; a +single BlackHole device is enough for a first smoke test but can echo. ### Local gateway + Parallels Chrome @@ -286,13 +299,13 @@ phrase, and prints session health: openclaw googlemeet test-speech https://meet.google.com/abc-defg-hij ``` -During join, OpenClaw browser automation fills the guest name, clicks Join/Ask -to join, and accepts Meet's first-run "Use microphone" choice when that prompt -appears. During browser-only meeting creation, it can also continue past the -same prompt without microphone if Meet does not expose the use-microphone button. 
-If the browser profile is not signed in, Meet is waiting for host -admission, Chrome needs microphone/camera permission, or Meet is stuck on a -prompt automation could not resolve, the join/test-speech result reports +During realtime join, OpenClaw browser automation fills the guest name, clicks +Join/Ask to join, and accepts Meet's first-run "Use microphone" choice when that +prompt appears. During observe-only join or browser-only meeting creation, it +continues past the same prompt without microphone when that choice is available. +If the browser profile is not signed in, Meet is waiting for host admission, +Chrome needs microphone/camera permission for a realtime join, or Meet is stuck +on a prompt automation could not resolve, the join/test-speech result reports `manualActionRequired: true` with `manualActionReason` and `manualActionMessage`. Agents should stop retrying the join, report that exact message plus the current `browserUrl`/`browserTitle`, and retry only after the @@ -979,7 +992,12 @@ Use `action: "status"` to list active sessions or inspect a session ID. Use `action: "speak"` with `sessionId` and `message` to make the realtime agent speak immediately. Use `action: "test_speech"` to create or reuse the session, trigger a known phrase, and return `inCall` health when the Chrome host can -report it. Use `action: "leave"` to mark a session ended. +report it. `test_speech` always forces `mode: "realtime"` and fails if asked to +run in `mode: "transcribe"` because observe-only sessions intentionally cannot +emit speech. Its `speechOutputVerified` result is based on realtime audio output +bytes increasing during this test call, so a reused session with older audio +does not count as a fresh successful speech check. Use `action: "leave"` to mark +a session ended. `status` includes Chrome health when available: @@ -1224,7 +1242,12 @@ openclaw googlemeet doctor ``` Use `mode: "realtime"` for listen/talk-back. 
`mode: "transcribe"` intentionally -does not start the duplex realtime voice bridge. +does not start the duplex realtime voice bridge. `googlemeet test-speech` +always checks the realtime path and reports whether bridge output bytes were +observed for that invocation. If `speechOutputVerified` is false and +`speechOutputTimedOut` is true, the realtime provider may have accepted the +utterance but OpenClaw did not see new output bytes reach the Chrome audio +bridge. Also verify: @@ -1317,7 +1340,7 @@ call still needs a participant path. This plugin keeps that boundary visible: Chrome handles browser participation and local audio routing; Twilio handles phone dial-in participation. -Chrome realtime mode needs either: +Chrome realtime mode needs `BlackHole 2ch` plus either: - `chrome.audioInputCommand` plus `chrome.audioOutputCommand`: OpenClaw owns the realtime model bridge and pipes audio in `chrome.audioFormat` between those diff --git a/extensions/google-meet/index.test.ts b/extensions/google-meet/index.test.ts index eb604497156..d9b6692aaf9 100644 --- a/extensions/google-meet/index.test.ts +++ b/extensions/google-meet/index.test.ts @@ -110,7 +110,10 @@ function mockLocalMeetBrowserRequest( params?: unknown, _extra?: unknown, ): Promise<Record<string, unknown>> => { - const request = params as { path?: string; body?: { targetId?: string; url?: string } }; + const request = params as { + path?: string; + body?: { fn?: string; targetId?: string; url?: string }; + }; if (request.path === "/tabs") { return { tabs: [] }; } @@ -1298,6 +1301,52 @@ describe("google-meet plugin", () => { } }); + it("skips local Chrome audio prerequisites for observe-only setup status", async () => { + const originalPlatform = process.platform; + Object.defineProperty(process, "platform", { value: "darwin" }); + try { + const { tools, runCommandWithTimeout } = setup( + { defaultMode: "transcribe", defaultTransport: "chrome" }, + { + runCommandWithTimeoutHandler: async () => ({ + code: 1, + stdout: "Built-in 
Output", + stderr: "", + }), + }, + ); + const tool = tools[0] as { + execute: ( + id: string, + params: unknown, + ) => Promise<{ details: { ok?: boolean; checks?: Array<{ id?: string; ok?: boolean }> } }>; + }; + + const result = await tool.execute("id", { + action: "setup_status", + transport: "chrome", + mode: "transcribe", + }); + + expect(result.details.ok).toBe(true); + expect(result.details.checks).toEqual( + expect.arrayContaining([ + expect.objectContaining({ + id: "audio-bridge", + ok: true, + message: "Chrome observe-only mode does not require a realtime audio bridge", + }), + ]), + ); + expect(result.details.checks?.some((check) => check.id === "chrome-local-audio-device")).toBe( + false, + ); + expect(runCommandWithTimeout).not.toHaveBeenCalled(); + } finally { + Object.defineProperty(process, "platform", { value: originalPlatform }); + } + }); + it("reports Twilio delegation readiness when voice-call is enabled", async () => { vi.stubEnv("TWILIO_ACCOUNT_SID", "AC123"); vi.stubEnv("TWILIO_AUTH_TOKEN", "secret"); @@ -1386,7 +1435,7 @@ describe("google-meet plugin", () => { ); }); - it("opens local Chrome Meet through browser control after the BlackHole check", async () => { + it("opens local Chrome Meet in observe-only mode without BlackHole checks", async () => { const originalPlatform = process.platform; Object.defineProperty(process, "platform", { value: "darwin" }); try { @@ -1408,12 +1457,7 @@ describe("google-meet plugin", () => { }); expect(respond.mock.calls[0]?.[0]).toBe(true); - expect(runCommandWithTimeout).toHaveBeenNthCalledWith( - 1, - ["/usr/sbin/system_profiler", "SPAudioDataType"], - { timeoutMs: 10000 }, - ); - expect(runCommandWithTimeout).toHaveBeenCalledTimes(1); + expect(runCommandWithTimeout).not.toHaveBeenCalled(); expect(callGatewayFromCli).toHaveBeenCalledWith( "browser.request", expect.any(Object), @@ -1424,19 +1468,16 @@ describe("google-meet plugin", () => { }), { progress: false }, ); - 
expect(callGatewayFromCli).toHaveBeenCalledWith( - "browser.request", - expect.any(Object), - expect.objectContaining({ - method: "POST", - path: "/permissions/grant", - body: expect.objectContaining({ - origin: "https://meet.google.com", - permissions: ["audioCapture", "videoCapture"], - optionalPermissions: ["speakerSelection"], - }), - }), - { progress: false }, + expect( + callGatewayFromCli.mock.calls.some( + ([, , request]) => (request as { path?: string }).path === "/permissions/grant", + ), + ).toBe(false); + const actCall = callGatewayFromCli.mock.calls.find( + ([, , request]) => (request as { path?: string }).path === "/act", + ); + expect(String((actCall?.[2] as { body?: { fn?: string } } | undefined)?.body?.fn)).toContain( + "const allowMicrophone = false", ); } finally { Object.defineProperty(process, "platform", { value: originalPlatform }); @@ -1883,9 +1924,14 @@ describe("google-meet plugin", () => { updatedAt: "2026-04-27T00:00:00.000Z", participantIdentity: "signed-in Google Chrome profile", realtime: { enabled: true, provider: "openai", toolPolicy: "safe-read-only" }, - chrome: { audioBackend: "blackhole-2ch", launched: true }, + chrome: { + audioBackend: "blackhole-2ch", + launched: true, + health: { audioOutputActive: true, lastOutputBytes: 10 }, + }, notes: [], }; + vi.spyOn(runtime, "list").mockReturnValue([session]); const join = vi.spyOn(runtime, "join").mockResolvedValue({ session, spoken: true }); const speak = vi.spyOn(runtime, "speak"); @@ -1894,9 +1940,32 @@ describe("google-meet plugin", () => { message: "Say exactly: hello.", }); - expect(join).toHaveBeenCalledWith(expect.objectContaining({ message: "Say exactly: hello." 
})); + expect(join).toHaveBeenCalledWith( + expect.objectContaining({ + message: "Say exactly: hello.", + mode: "realtime", + }), + ); expect(speak).not.toHaveBeenCalled(); expect(result.spoken).toBe(true); + expect(result.speechOutputVerified).toBe(false); + expect(result.speechOutputTimedOut).toBe(false); + }); + + it("rejects observe-only mode for test speech", async () => { + const runtime = new GoogleMeetRuntime({ + config: resolveGoogleMeetConfig({}), + fullConfig: {} as never, + runtime: {} as never, + logger: noopLogger, + }); + + await expect( + runtime.testSpeech({ + url: "https://meet.google.com/abc-defg-hij", + mode: "transcribe", + }), + ).rejects.toThrow("test_speech requires mode: realtime"); }); it("reports manual action when the browser profile needs Google login", async () => { diff --git a/extensions/google-meet/index.ts b/extensions/google-meet/index.ts index d47d20dc615..4d2a12cd418 100644 --- a/extensions/google-meet/index.ts +++ b/extensions/google-meet/index.ts @@ -677,7 +677,13 @@ export default definePluginEntry({ async ({ params, respond }: GatewayRequestHandlerOptions) => { try { const rt = await ensureRuntime(); - respond(true, await rt.setupStatus({ transport: normalizeTransport(params?.transport) })); + respond( + true, + await rt.setupStatus({ + transport: normalizeTransport(params?.transport), + mode: normalizeMode(params?.mode), + }), + ); } catch (err) { sendError(respond, err); } diff --git a/extensions/google-meet/node-host.test.ts b/extensions/google-meet/node-host.test.ts index 01cd731c6ca..6bf66e40a6a 100644 --- a/extensions/google-meet/node-host.test.ts +++ b/extensions/google-meet/node-host.test.ts @@ -1,3 +1,4 @@ +import { spawnSync } from "node:child_process"; import { EventEmitter } from "node:events"; import { describe, expect, it, vi } from "vitest"; @@ -40,6 +41,35 @@ vi.mock("node:child_process", async (importOriginal) => { }); describe("google-meet node host bridge sessions", () => { + it("starts observe-only Chrome 
without BlackHole or bridge processes", async () => { + const { handleGoogleMeetNodeHostCommand } = await import("./src/node-host.js"); + const originalPlatform = process.platform; + children.length = 0; + vi.mocked(spawnSync).mockClear(); + + Object.defineProperty(process, "platform", { configurable: true, value: "darwin" }); + try { + const start = JSON.parse( + await handleGoogleMeetNodeHostCommand( + JSON.stringify({ + action: "start", + url: "https://meet.google.com/xyz-abcd-uvw", + mode: "transcribe", + launch: false, + audioInputCommand: ["mock-rec"], + audioOutputCommand: ["mock-play"], + }), + ), + ); + + expect(start).toEqual({ launched: false }); + expect(spawnSync).not.toHaveBeenCalled(); + expect(children).toHaveLength(0); + } finally { + Object.defineProperty(process, "platform", { configurable: true, value: originalPlatform }); + } + }); + it("clears output playback without closing the active bridge when the old output exits", async () => { const { handleGoogleMeetNodeHostCommand } = await import("./src/node-host.js"); const originalPlatform = process.platform; diff --git a/extensions/google-meet/src/cli.ts b/extensions/google-meet/src/cli.ts index 4bde834a7a3..73e45467db6 100644 --- a/extensions/google-meet/src/cli.ts +++ b/extensions/google-meet/src/cli.ts @@ -129,6 +129,7 @@ export type GoogleMeetExportManifest = { type SetupOptions = { json?: boolean; + mode?: GoogleMeetMode; transport?: GoogleMeetTransport; }; @@ -1986,10 +1987,11 @@ export function registerGoogleMeetCli(params: { .command("setup") .description("Show Google Meet transport setup status") .option("--transport <transport>", "Transport to check: chrome, chrome-node, or twilio") + .option("--mode <mode>", "Mode to check: realtime or transcribe") .option("--json", "Print JSON output", false) .action(async (options: SetupOptions) => { const rt = await params.ensureRuntime(); - const status = await rt.setupStatus({ transport: options.transport }); + const status = await rt.setupStatus({ transport: 
options.transport, mode: options.mode }); if (options.json) { writeStdoutJson(status); return; diff --git a/extensions/google-meet/src/node-host.ts b/extensions/google-meet/src/node-host.ts index adc3b901064..7b114a69c97 100644 --- a/extensions/google-meet/src/node-host.ts +++ b/extensions/google-meet/src/node-host.ts @@ -270,42 +270,46 @@ function startChrome(params: Record<string, unknown>) { throw new Error("url required"); } const timeoutMs = readNumber(params.joinTimeoutMs, 30_000); - assertBlackHoleAvailable(Math.min(timeoutMs, 10_000)); - - const healthCommand = readStringArray(params.audioBridgeHealthCommand); - if (healthCommand) { - const health = runCommandWithTimeout(healthCommand, timeoutMs); - if (health.code !== 0) { - throw new Error( - `Chrome audio bridge health check failed: ${health.stderr || health.stdout || health.code}`, - ); - } - } + const mode = readString(params.mode); let bridgeId: string | undefined; let audioBridge: { type: "external-command" | "node-command-pair" } | undefined; - const bridgeCommand = readStringArray(params.audioBridgeCommand); - if (bridgeCommand) { - const bridge = runCommandWithTimeout(bridgeCommand, timeoutMs); - if (bridge.code !== 0) { - throw new Error( - `failed to start Chrome audio bridge: ${bridge.stderr || bridge.stdout || bridge.code}`, - ); + if (mode === "realtime") { + assertBlackHoleAvailable(Math.min(timeoutMs, 10_000)); + + const healthCommand = readStringArray(params.audioBridgeHealthCommand); + if (healthCommand) { + const health = runCommandWithTimeout(healthCommand, timeoutMs); + if (health.code !== 0) { + throw new Error( + `Chrome audio bridge health check failed: ${health.stderr || health.stdout || health.code}`, + ); + } + } + + const bridgeCommand = readStringArray(params.audioBridgeCommand); + if (bridgeCommand) { + const bridge = runCommandWithTimeout(bridgeCommand, timeoutMs); + if (bridge.code !== 0) { + throw new Error( + `failed to start Chrome audio bridge: ${bridge.stderr || bridge.stdout || 
bridge.code}`, + ); + } + audioBridge = { type: "external-command" }; + } else { + const session = startCommandPair({ + inputCommand: readStringArray(params.audioInputCommand) ?? [ + ...DEFAULT_GOOGLE_MEET_AUDIO_INPUT_COMMAND, + ], + outputCommand: readStringArray(params.audioOutputCommand) ?? [ + ...DEFAULT_GOOGLE_MEET_AUDIO_OUTPUT_COMMAND, + ], + url, + mode, + }); + bridgeId = session.id; + audioBridge = { type: "node-command-pair" }; } - audioBridge = { type: "external-command" }; - } else if (params.mode === "realtime") { - const session = startCommandPair({ - inputCommand: readStringArray(params.audioInputCommand) ?? [ - ...DEFAULT_GOOGLE_MEET_AUDIO_INPUT_COMMAND, - ], - outputCommand: readStringArray(params.audioOutputCommand) ?? [ - ...DEFAULT_GOOGLE_MEET_AUDIO_OUTPUT_COMMAND, - ], - url, - mode: readString(params.mode), - }); - bridgeId = session.id; - audioBridge = { type: "node-command-pair" }; } if (params.launch !== false) { diff --git a/extensions/google-meet/src/runtime.ts b/extensions/google-meet/src/runtime.ts index 9df38f0ab11..a64f22e6465 100644 --- a/extensions/google-meet/src/runtime.ts +++ b/extensions/google-meet/src/runtime.ts @@ -55,6 +55,17 @@ function resolveMode(input: GoogleMeetMode | undefined, config: GoogleMeetConfig return input ?? config.defaultMode; } +function hasRealtimeAudioOutputAdvanced( + health: GoogleMeetChromeHealth | undefined, + startOutputBytes: number, +): boolean { + return (health?.lastOutputBytes ?? 0) > startOutputBytes; +} + +function sleep(ms: number): Promise<void> { + return new Promise((resolve) => setTimeout(resolve, ms)); +} + function collectChromeAudioCommands(config: GoogleMeetConfig): string[] { const commands = config.chrome.audioBridgeCommand ? [config.chrome.audioBridgeCommand[0]] : @@ -103,13 +114,16 @@ export class GoogleMeetRuntime { return session ? 
{ found: true, session } : { found: false }; } - async setupStatus(options: { transport?: GoogleMeetTransport } = {}) { + async setupStatus(options: { transport?: GoogleMeetTransport; mode?: GoogleMeetMode } = {}) { const transport = resolveTransport(options.transport, this.params.config); + const mode = resolveMode(options.mode, this.params.config); const shouldCheckChromeNode = transport === "chrome-node" || (!options.transport && Boolean(this.params.config.chromeNode.node)); let status = getGoogleMeetSetupStatus(this.params.config, { fullConfig: this.params.fullConfig, + mode, + transport, }); if (shouldCheckChromeNode) { try { @@ -131,7 +145,7 @@ export class GoogleMeetRuntime { }); } } - if (transport === "chrome") { + if (transport === "chrome" && mode === "realtime") { try { await assertBlackHole2chAvailable({ runtime: this.params.runtime, @@ -302,7 +316,9 @@ export class GoogleMeetRuntime { ? transport === "chrome-node" ? "Chrome node transport joins as the signed-in Google profile on the selected node and routes realtime audio through the node bridge." : "Chrome transport joins as the signed-in Google profile and routes realtime audio through the configured bridge." - : "Chrome transport joins as the signed-in Google profile and expects BlackHole 2ch audio routing.", + : mode === "realtime" + ? "Chrome transport joins as the signed-in Google profile and expects BlackHole 2ch audio routing." 
+ : "Chrome transport joins as the signed-in Google profile without starting the realtime audio bridge.", ); } else { const dialInNumber = normalizeDialInNumber( @@ -398,14 +414,53 @@ export class GoogleMeetRuntime { manualActionReason?: GoogleMeetChromeHealth["manualActionReason"]; manualActionMessage?: string; spoken: boolean; + speechOutputVerified: boolean; + speechOutputTimedOut: boolean; + audioOutputActive?: boolean; + lastOutputBytes?: number; session: GoogleMeetSession; }> { - const before = new Set(this.list().map((session) => session.id)); + if (request.mode === "transcribe") { + throw new Error( + "test_speech requires mode: realtime; use join mode: transcribe for observe-only sessions.", + ); + } + const url = normalizeMeetUrl(request.url); + const transport = resolveTransport(request.transport, this.params.config); + const beforeSessions = this.list(); + const before = new Set(beforeSessions.map((session) => session.id)); + const existingSession = beforeSessions.find( + (session) => + session.state === "active" && + isSameMeetUrlForReuse(session.url, url) && + session.transport === transport && + session.mode === "realtime", + ); + const startOutputBytes = existingSession?.chrome?.health?.lastOutputBytes ?? 0; const result = await this.join({ ...request, + transport, + url, + mode: "realtime", message: request.message ?? 
"Say exactly: Google Meet speech test complete.", }); - const health = result.session.chrome?.health; + let health = result.session.chrome?.health; + const shouldWaitForOutput = + result.spoken === true && + health?.manualActionRequired !== true && + this.#sessionHealth.has(result.session.id); + if (shouldWaitForOutput && !hasRealtimeAudioOutputAdvanced(health, startOutputBytes)) { + const deadline = Date.now() + Math.min(this.params.config.chrome.joinTimeoutMs, 5_000); + while (Date.now() < deadline) { + await sleep(100); + this.#refreshHealth(result.session.id); + health = result.session.chrome?.health; + if (hasRealtimeAudioOutputAdvanced(health, startOutputBytes)) { + break; + } + } + } + const speechOutputVerified = hasRealtimeAudioOutputAdvanced(health, startOutputBytes); return { createdSession: !before.has(result.session.id), inCall: health?.inCall, @@ -413,6 +468,10 @@ export class GoogleMeetRuntime { manualActionReason: health?.manualActionReason, manualActionMessage: health?.manualActionMessage, spoken: result.spoken ?? 
false, + speechOutputVerified, + speechOutputTimedOut: shouldWaitForOutput && !speechOutputVerified, + audioOutputActive: health?.audioOutputActive, + lastOutputBytes: health?.lastOutputBytes, session: result.session, }; } diff --git a/extensions/google-meet/src/setup.ts b/extensions/google-meet/src/setup.ts index cada2a3f355..e58a7116654 100644 --- a/extensions/google-meet/src/setup.ts +++ b/extensions/google-meet/src/setup.ts @@ -1,7 +1,7 @@ import fs from "node:fs"; import os from "node:os"; import path from "node:path"; -import type { GoogleMeetConfig } from "./config.js"; +import type { GoogleMeetConfig, GoogleMeetMode, GoogleMeetTransport } from "./config.js"; export type SetupCheck = { id: string; @@ -33,6 +33,8 @@ export function getGoogleMeetSetupStatus( options?: { env?: NodeJS.ProcessEnv; fullConfig?: unknown; + mode?: GoogleMeetMode; + transport?: GoogleMeetTransport; }, ): { ok: boolean; @@ -43,11 +45,17 @@ export function getGoogleMeetSetupStatus( options?: { env?: NodeJS.ProcessEnv; fullConfig?: unknown; + mode?: GoogleMeetMode; + transport?: GoogleMeetTransport; }, ) { const checks: SetupCheck[] = []; const env = options?.env ?? process.env; const fullConfig = asRecord(options?.fullConfig); + const mode = options?.mode ?? config.defaultMode; + const transport = options?.transport ?? 
config.defaultTransport; + const needsChromeRealtimeAudio = + mode === "realtime" && (transport === "chrome" || transport === "chrome-node"); const pluginEntries = asRecord(asRecord(fullConfig.plugins).entries); const pluginAllow = asRecord(fullConfig.plugins).allow; const voiceCallEntry = asRecord(pluginEntries["voice-call"]); @@ -79,18 +87,26 @@ export function getGoogleMeetSetupStatus( : "Local Chrome uses the OpenClaw browser profile; configure browser.defaultProfile to choose another profile", }); - checks.push({ - id: "audio-bridge", - ok: Boolean( - config.chrome.audioBridgeCommand || - (config.chrome.audioInputCommand && config.chrome.audioOutputCommand), - ), - message: config.chrome.audioBridgeCommand - ? "Chrome audio bridge command configured" - : config.chrome.audioInputCommand && config.chrome.audioOutputCommand - ? `Chrome command-pair realtime audio bridge configured (${config.chrome.audioFormat})` - : "Chrome realtime audio bridge not configured", - }); + if (needsChromeRealtimeAudio) { + checks.push({ + id: "audio-bridge", + ok: Boolean( + config.chrome.audioBridgeCommand || + (config.chrome.audioInputCommand && config.chrome.audioOutputCommand), + ), + message: config.chrome.audioBridgeCommand + ? "Chrome audio bridge command configured" + : config.chrome.audioInputCommand && config.chrome.audioOutputCommand + ? 
`Chrome command-pair realtime audio bridge configured (${config.chrome.audioFormat})` + : "Chrome realtime audio bridge not configured", + }); + } else if (transport === "chrome" || transport === "chrome-node") { + checks.push({ + id: "audio-bridge", + ok: true, + message: "Chrome observe-only mode does not require a realtime audio bridge", + }); + } checks.push({ id: "guest-join-defaults", @@ -114,14 +130,16 @@ export function getGoogleMeetSetupStatus( : "Chrome node not pinned; automatic selection works when exactly one capable node is connected", }); - checks.push({ - id: "intro-after-in-call", - ok: config.chrome.waitForInCallMs > 0, - message: - config.chrome.waitForInCallMs > 0 - ? `Realtime intro waits up to ${config.chrome.waitForInCallMs}ms for the Meet tab to be in-call` - : "Set chrome.waitForInCallMs to delay realtime intro until the Meet tab is in-call", - }); + if (needsChromeRealtimeAudio) { + checks.push({ + id: "intro-after-in-call", + ok: config.chrome.waitForInCallMs > 0, + message: + config.chrome.waitForInCallMs > 0 + ? 
`Realtime intro waits up to ${config.chrome.waitForInCallMs}ms for the Meet tab to be in-call` + : "Set chrome.waitForInCallMs to delay realtime intro until the Meet tab is in-call", + }); + } const shouldCheckTwilioDelegation = config.voiceCall.enabled && diff --git a/extensions/google-meet/src/transports/chrome.ts b/extensions/google-meet/src/transports/chrome.ts index 35fb7b24be3..f89c13dd34f 100644 --- a/extensions/google-meet/src/transports/chrome.ts +++ b/extensions/google-meet/src/transports/chrome.ts @@ -95,57 +95,59 @@ export async function launchChromeMeet(params: { | ({ type: "command-pair" } & ChromeRealtimeAudioBridgeHandle); browser?: GoogleMeetChromeHealth; }> { - await assertBlackHole2chAvailable({ - runtime: params.runtime, - timeoutMs: Math.min(params.config.chrome.joinTimeoutMs, 10_000), - }); - - if (params.config.chrome.audioBridgeHealthCommand) { - const health = await params.runtime.system.runCommandWithTimeout( - params.config.chrome.audioBridgeHealthCommand, - { timeoutMs: params.config.chrome.joinTimeoutMs }, - ); - if (health.code !== 0) { - throw new Error( - `Chrome audio bridge health check failed: ${health.stderr || health.stdout || health.code}`, - ); - } - } - let audioBridge: | { type: "external-command" } | ({ type: "command-pair" } & ChromeRealtimeAudioBridgeHandle) | undefined; - if (params.config.chrome.audioBridgeCommand) { - const bridge = await params.runtime.system.runCommandWithTimeout( - params.config.chrome.audioBridgeCommand, - { timeoutMs: params.config.chrome.joinTimeoutMs }, - ); - if (bridge.code !== 0) { - throw new Error( - `failed to start Chrome audio bridge: ${bridge.stderr || bridge.stdout || bridge.code}`, + if (params.mode === "realtime") { + await assertBlackHole2chAvailable({ + runtime: params.runtime, + timeoutMs: Math.min(params.config.chrome.joinTimeoutMs, 10_000), + }); + + if (params.config.chrome.audioBridgeHealthCommand) { + const health = await params.runtime.system.runCommandWithTimeout( + 
         params.config.chrome.audioBridgeHealthCommand,
+        { timeoutMs: params.config.chrome.joinTimeoutMs },
       );
+      if (health.code !== 0) {
+        throw new Error(
+          `Chrome audio bridge health check failed: ${health.stderr || health.stdout || health.code}`,
+        );
+      }
     }
-    audioBridge = { type: "external-command" };
-  } else if (params.mode === "realtime") {
-    if (!params.config.chrome.audioInputCommand || !params.config.chrome.audioOutputCommand) {
-      throw new Error(
-        "Chrome realtime mode requires chrome.audioInputCommand and chrome.audioOutputCommand, or chrome.audioBridgeCommand for an external bridge.",
+
+    if (params.config.chrome.audioBridgeCommand) {
+      const bridge = await params.runtime.system.runCommandWithTimeout(
+        params.config.chrome.audioBridgeCommand,
+        { timeoutMs: params.config.chrome.joinTimeoutMs },
       );
+      if (bridge.code !== 0) {
+        throw new Error(
+          `failed to start Chrome audio bridge: ${bridge.stderr || bridge.stdout || bridge.code}`,
+        );
+      }
+      audioBridge = { type: "external-command" };
+    } else {
+      if (!params.config.chrome.audioInputCommand || !params.config.chrome.audioOutputCommand) {
+        throw new Error(
+          "Chrome realtime mode requires chrome.audioInputCommand and chrome.audioOutputCommand, or chrome.audioBridgeCommand for an external bridge.",
+        );
+      }
+      audioBridge = {
+        type: "command-pair",
+        ...(await startCommandRealtimeAudioBridge({
+          config: params.config,
+          fullConfig: params.fullConfig,
+          runtime: params.runtime,
+          meetingSessionId: params.meetingSessionId,
+          inputCommand: params.config.chrome.audioInputCommand,
+          outputCommand: params.config.chrome.audioOutputCommand,
+          logger: params.logger,
+        })),
+      };
     }
-    audioBridge = {
-      type: "command-pair",
-      ...(await startCommandRealtimeAudioBridge({
-        config: params.config,
-        fullConfig: params.fullConfig,
-        runtime: params.runtime,
-        meetingSessionId: params.meetingSessionId,
-        inputCommand: params.config.chrome.audioInputCommand,
-        outputCommand: params.config.chrome.audioOutputCommand,
-        logger: params.logger,
-      })),
-    };
   }

   if (!params.config.chrome.launch) {
@@ -167,6 +169,7 @@ export async function launchChromeMeet(params: {
   const result = await openMeetWithBrowserRequest({
     callBrowser: callLocalBrowserRequest,
     config: params.config,
+    mode: params.mode,
     url: params.url,
   });
   return { ...result, audioBridge };
@@ -273,7 +276,11 @@ function parsePermissionGrantNotes(result: unknown): string[] {
 async function grantMeetMediaPermissions(params: {
   callBrowser: BrowserRequestCaller;
   timeoutMs: number;
+  allowMicrophone: boolean;
 }): Promise<string[]> {
+  if (!params.allowMicrophone) {
+    return ["Observe-only mode skips Meet microphone/camera permission grants."];
+  }
   try {
     const result = await params.callBrowser({
       method: "POST",
@@ -296,9 +303,14 @@
   }
 }

-function meetStatusScript(params: { guestName: string; autoJoin: boolean }) {
+function meetStatusScript(params: {
+  allowMicrophone: boolean;
+  autoJoin: boolean;
+  guestName: string;
+}) {
   return `() => {
     const text = (node) => (node?.innerText || node?.textContent || "").trim();
+    const allowMicrophone = ${JSON.stringify(params.allowMicrophone)};
     const buttons = [...document.querySelectorAll('button')];
     const notes = [];
     const findButton = (pattern) =>
@@ -325,16 +337,24 @@ function meetStatusScript(params: { guestName: string; autoJoin: boolean }) {
     const host = location.hostname.toLowerCase();
     const pageUrl = location.href;
     const permissionNeeded = /permission needed|allow.*(microphone|camera)|blocked.*(microphone|camera)|permission.*(microphone|camera|speaker)/i.test(pageText);
+    const mic = buttons.find((button) => /turn off microphone|turn on microphone|microphone/i.test(button.getAttribute('aria-label') || text(button)));
+    if (!allowMicrophone && mic && /turn off microphone/i.test(mic.getAttribute('aria-label') || text(mic))) {
+      mic.click();
+      notes.push("Muted Meet microphone for observe-only mode.");
+    }
     const join = ${JSON.stringify(params.autoJoin)} ? findButton(/join now|ask to join/i) : null;
     if (join) join.click();
     const microphoneChoice = findButton(/\\buse microphone\\b/i);
-    if (microphoneChoice) {
+    const noMicrophoneChoice = findButton(/\\b(continue|join|use) without (microphone|mic)\\b|\\bnot now\\b/i);
+    if (allowMicrophone && microphoneChoice) {
       microphoneChoice.click();
       notes.push("Accepted Meet microphone prompt with browser automation.");
+    } else if (!allowMicrophone && noMicrophoneChoice) {
+      noMicrophoneChoice.click();
+      notes.push("Skipped Meet microphone prompt for observe-only mode.");
     }
-    const mic = buttons.find((button) => /turn off microphone|turn on microphone|microphone/i.test(button.getAttribute('aria-label') || text(button)));
     const inCall = buttons.some((button) => /leave call/i.test(button.getAttribute('aria-label') || text(button)));
     let manualActionReason;
     let manualActionMessage;
@@ -346,14 +366,18 @@ function meetStatusScript(params: { guestName: string; autoJoin: boolean }) {
       manualActionMessage = "Admit the OpenClaw browser participant in Google Meet, then retry speech.";
     } else if (permissionNeeded) {
       manualActionReason = "meet-permission-required";
-      manualActionMessage = "Allow microphone/camera/speaker permissions for Meet in the OpenClaw browser profile, then retry.";
-    } else if (!inCall && !microphoneChoice && /do you want people to hear you in the meeting/i.test(pageText)) {
+      manualActionMessage = allowMicrophone
+        ? "Allow microphone/camera/speaker permissions for Meet in the OpenClaw browser profile, then retry."
+        : "Join without microphone/camera permissions in the OpenClaw browser profile, then retry.";
+    } else if (!inCall && (allowMicrophone ? !microphoneChoice : !noMicrophoneChoice) && /do you want people to hear you in the meeting/i.test(pageText)) {
       manualActionReason = "meet-audio-choice-required";
-      manualActionMessage = "Meet is showing the microphone choice. Click Use microphone in the OpenClaw browser profile, then retry.";
+      manualActionMessage = allowMicrophone
+        ? "Meet is showing the microphone choice. Click Use microphone in the OpenClaw browser profile, then retry."
+        : "Meet is showing the microphone choice. Choose the no-microphone option in the OpenClaw browser profile, then retry.";
     }
     return JSON.stringify({
       clickedJoin: Boolean(join),
-      clickedMicrophoneChoice: Boolean(microphoneChoice),
+      clickedMicrophoneChoice: Boolean(allowMicrophone && microphoneChoice),
       inCall,
       micMuted: mic ? /turn on microphone/i.test(mic.getAttribute('aria-label') || text(mic)) : undefined,
       manualActionRequired: Boolean(manualActionReason),
@@ -370,6 +394,7 @@ async function openMeetWithBrowserProxy(params: {
   runtime: PluginRuntime;
   nodeId: string;
   config: GoogleMeetConfig;
+  mode: "realtime" | "transcribe";
   url: string;
 }): Promise<{ launched: boolean; browser?: GoogleMeetChromeHealth }> {
   return await openMeetWithBrowserRequest({
@@ -383,6 +408,7 @@ async function openMeetWithBrowserProxy(params: {
         timeoutMs: request.timeoutMs,
       }),
     config: params.config,
+    mode: params.mode,
     url: params.url,
   });
 }
@@ -390,6 +416,7 @@
 async function openMeetWithBrowserRequest(params: {
   callBrowser: BrowserRequestCaller;
   config: GoogleMeetConfig;
+  mode: "realtime" | "transcribe";
   url: string;
 }): Promise<{ launched: boolean; browser?: GoogleMeetChromeHealth }> {
   if (!params.config.chrome.launch) {
@@ -442,6 +469,7 @@
   }

   const permissionNotes = await grantMeetMediaPermissions({
+    allowMicrophone: params.mode === "realtime",
     callBrowser: params.callBrowser,
     timeoutMs,
   });
@@ -461,6 +489,7 @@
       kind: "evaluate",
       targetId,
       fn: meetStatusScript({
+        allowMicrophone: params.mode === "realtime",
        guestName: params.config.chrome.guestName,
        autoJoin: params.config.chrome.autoJoin,
       }),
@@ -526,6 +555,7 @@ async function inspectRecoverableMeetTab(params: {
     timeoutMs: Math.min(params.timeoutMs, 5_000),
   });
   const permissionNotes = await grantMeetMediaPermissions({
+    allowMicrophone: true,
     callBrowser: params.callBrowser,
     timeoutMs: params.timeoutMs,
   });
@@ -536,6 +566,7 @@ async function inspectRecoverableMeetTab(params: {
       kind: "evaluate",
       targetId: params.targetId,
       fn: meetStatusScript({
+        allowMicrophone: true,
         guestName: params.config.chrome.guestName,
         autoJoin: false,
       }),
@@ -714,6 +745,7 @@ export async function launchChromeMeetOnNode(params: {
     runtime: params.runtime,
     nodeId,
     config: params.config,
+    mode: params.mode,
     url: params.url,
   });
   const raw = await params.runtime.nodes.invoke({