mirror of https://github.com/openclaw/openclaw.git (synced 2026-05-06 05:10:44 +00:00)

fix(google-meet): harden observe mode speech health (#73256)

* fix(google-meet): harden observe mode speech health
* fix(google-meet): address observe speech review
* docs(google-meet): clarify observe mode guarantees

committed by GitHub; parent 2633b14914, commit 25851e3cae
@@ -63,6 +63,7 @@ Docs: https://docs.openclaw.ai
- Agents/ACPX: stage the patched Claude ACP adapter as an ACPX runtime dependency and route known Codex/Claude ACP commands through local wrappers, so Gateway runtime no longer depends on live `npx` adapter resolution. Fixes #73202. Thanks @joerod26.
- Memory/compaction: let pre-compaction memory flush use an exact `agents.defaults.compaction.memoryFlush.model` override such as `ollama/qwen3:8b` without inheriting the active session fallback chain, so local housekeeping can avoid paid conversation models. Fixes #53772. Thanks @limen96.
- macOS/update: stop managed Gateway services before package replacement and keep LaunchAgent service secrets out of world-readable plist metadata by loading them from owner-only env files. Fixes #72996. Thanks @Mathewb7.
- Google Meet: keep observe-only Chrome joins and setup checks from requiring BlackHole or audio bridge commands, avoid granting or selecting the microphone in observe-only mode, and make `test_speech` report fresh realtime output-byte verification instead of only confirming a queued utterance. Refs #72478. Thanks @DougButdorf.
- Gateway/hooks: route non-delivered hook completion and error summaries to the target agent's main session instead of the default agent session, preserving multi-agent hook isolation. Fixes #24693; carries forward #68667. Thanks @abersonFAC and @bluesky6868.
- Control UI/models: request the configured Gateway model-list view so dashboards with only `models.providers.*.models` show those configured models first instead of flooding the picker with the full built-in catalog. Fixes #65405. Thanks @wbyanclaw.
- CLI/models: keep default-model and allowlist pickers on explicit `models.providers.*.models` entries when `models.mode` is `replace` instead of loading the full built-in catalog. Fixes #64950. Thanks @mrozentsvayg.

@@ -74,12 +74,21 @@ Check setup:
openclaw googlemeet setup
```

The setup output is meant to be agent-readable. It reports Chrome profile,
audio bridge, node pinning, delayed realtime intro, and, when Twilio delegation
is configured, whether the `voice-call` plugin and Twilio credentials are ready.
Treat any `ok: false` check as a blocker before asking an agent to join.
Use `openclaw googlemeet setup --json` for scripts or machine-readable output.
Use `--transport chrome`, `--transport chrome-node`, or `--transport twilio`
The setup output is meant to be agent-readable and mode-aware. It reports Chrome
profile, node pinning, and, for realtime Chrome joins, the BlackHole/SoX audio
bridge and delayed realtime intro checks. For observe-only joins, check the same
transport with `--mode transcribe`; that mode skips realtime audio prerequisites
because it does not listen through or speak through the bridge:

```bash
openclaw googlemeet setup --transport chrome-node --mode transcribe
```

When Twilio delegation is configured, setup also reports whether the
`voice-call` plugin and Twilio credentials are ready. Treat any `ok: false`
check as a blocker for the checked transport and mode before asking an agent to
join. Use `openclaw googlemeet setup --json` for scripts or machine-readable
output. Use `--transport chrome`, `--transport chrome-node`, or `--transport twilio`
to preflight a specific transport before an agent tries it.

Join a meeting:
@@ -144,8 +153,12 @@ then share the returned `meetingUri`.
```

For an observe-only/browser-control join, set `"mode": "transcribe"`. That does
not start the duplex realtime model bridge, so it will not talk back into the
meeting.
not start the duplex realtime model bridge, does not require BlackHole or SoX,
and will not talk back into the meeting. Chrome joins in this mode also avoid
OpenClaw's microphone/camera permission grant and avoid the Meet **Use
microphone** path. If Meet shows an audio-choice interstitial, automation tries
the no-microphone path and otherwise reports a manual action instead of opening
the local microphone.

During realtime sessions, `google_meet` status includes browser and audio bridge
health such as `inCall`, `manualActionRequired`, `providerConnected`,
@@ -155,10 +168,10 @@ appears, browser automation handles it when it can. Login, host admission, and
browser/OS permission prompts are reported as manual action with a reason and
message for the agent to relay.

Local Chrome joins through the signed-in OpenClaw browser profile. In Meet, pick
`BlackHole 2ch` for the microphone/speaker path used by OpenClaw. For clean
duplex audio, use separate virtual devices or a Loopback-style graph; a single
BlackHole device is enough for a first smoke test but can echo.
Local Chrome joins through the signed-in OpenClaw browser profile. Realtime mode
requires `BlackHole 2ch` for the microphone/speaker path used by OpenClaw. For
clean duplex audio, use separate virtual devices or a Loopback-style graph; a
single BlackHole device is enough for a first smoke test but can echo.

### Local gateway + Parallels Chrome

@@ -286,13 +299,13 @@ phrase, and prints session health:
openclaw googlemeet test-speech https://meet.google.com/abc-defg-hij
```

During join, OpenClaw browser automation fills the guest name, clicks Join/Ask
to join, and accepts Meet's first-run "Use microphone" choice when that prompt
appears. During browser-only meeting creation, it can also continue past the
same prompt without microphone if Meet does not expose the use-microphone button.
If the browser profile is not signed in, Meet is waiting for host
admission, Chrome needs microphone/camera permission, or Meet is stuck on a
prompt automation could not resolve, the join/test-speech result reports
During realtime join, OpenClaw browser automation fills the guest name, clicks
Join/Ask to join, and accepts Meet's first-run "Use microphone" choice when that
prompt appears. During observe-only join or browser-only meeting creation, it
continues past the same prompt without microphone when that choice is available.
If the browser profile is not signed in, Meet is waiting for host admission,
Chrome needs microphone/camera permission for a realtime join, or Meet is stuck
on a prompt automation could not resolve, the join/test-speech result reports
`manualActionRequired: true` with `manualActionReason` and
`manualActionMessage`. Agents should stop retrying the join, report that exact
message plus the current `browserUrl`/`browserTitle`, and retry only after the
@@ -979,7 +992,12 @@ Use `action: "status"` to list active sessions or inspect a session ID. Use
`action: "speak"` with `sessionId` and `message` to make the realtime agent
speak immediately. Use `action: "test_speech"` to create or reuse the session,
trigger a known phrase, and return `inCall` health when the Chrome host can
report it. Use `action: "leave"` to mark a session ended.
report it. `test_speech` always forces `mode: "realtime"` and fails if asked to
run in `mode: "transcribe"` because observe-only sessions intentionally cannot
emit speech. Its `speechOutputVerified` result is based on realtime audio output
bytes increasing during this test call, so a reused session with older audio
does not count as a fresh successful speech check. Use `action: "leave"` to mark
a session ended.

`status` includes Chrome health when available:

@@ -1224,7 +1242,12 @@ openclaw googlemeet doctor
```

Use `mode: "realtime"` for listen/talk-back. `mode: "transcribe"` intentionally
does not start the duplex realtime voice bridge.
does not start the duplex realtime voice bridge. `googlemeet test-speech`
always checks the realtime path and reports whether bridge output bytes were
observed for that invocation. If `speechOutputVerified` is false and
`speechOutputTimedOut` is true, the realtime provider may have accepted the
utterance but OpenClaw did not see new output bytes reach the Chrome audio
bridge.

Also verify:

@@ -1317,7 +1340,7 @@ call still needs a participant path. This plugin keeps that boundary visible:
Chrome handles browser participation and local audio routing; Twilio handles
phone dial-in participation.

Chrome realtime mode needs either:
Chrome realtime mode needs `BlackHole 2ch` plus either:

- `chrome.audioInputCommand` plus `chrome.audioOutputCommand`: OpenClaw owns the
realtime model bridge and pipes audio in `chrome.audioFormat` between those

@@ -110,7 +110,10 @@ function mockLocalMeetBrowserRequest(
params?: unknown,
_extra?: unknown,
): Promise<Record<string, unknown>> => {
const request = params as { path?: string; body?: { targetId?: string; url?: string } };
const request = params as {
path?: string;
body?: { fn?: string; targetId?: string; url?: string };
};
if (request.path === "/tabs") {
return { tabs: [] };
}
@@ -1298,6 +1301,52 @@ describe("google-meet plugin", () => {
}
});

it("skips local Chrome audio prerequisites for observe-only setup status", async () => {
const originalPlatform = process.platform;
Object.defineProperty(process, "platform", { value: "darwin" });
try {
const { tools, runCommandWithTimeout } = setup(
{ defaultMode: "transcribe", defaultTransport: "chrome" },
{
runCommandWithTimeoutHandler: async () => ({
code: 1,
stdout: "Built-in Output",
stderr: "",
}),
},
);
const tool = tools[0] as {
execute: (
id: string,
params: unknown,
) => Promise<{ details: { ok?: boolean; checks?: Array<{ id?: string; ok?: boolean }> } }>;
};

const result = await tool.execute("id", {
action: "setup_status",
transport: "chrome",
mode: "transcribe",
});

expect(result.details.ok).toBe(true);
expect(result.details.checks).toEqual(
expect.arrayContaining([
expect.objectContaining({
id: "audio-bridge",
ok: true,
message: "Chrome observe-only mode does not require a realtime audio bridge",
}),
]),
);
expect(result.details.checks?.some((check) => check.id === "chrome-local-audio-device")).toBe(
false,
);
expect(runCommandWithTimeout).not.toHaveBeenCalled();
} finally {
Object.defineProperty(process, "platform", { value: originalPlatform });
}
});

it("reports Twilio delegation readiness when voice-call is enabled", async () => {
vi.stubEnv("TWILIO_ACCOUNT_SID", "AC123");
vi.stubEnv("TWILIO_AUTH_TOKEN", "secret");
@@ -1386,7 +1435,7 @@ describe("google-meet plugin", () => {
);
});

it("opens local Chrome Meet through browser control after the BlackHole check", async () => {
it("opens local Chrome Meet in observe-only mode without BlackHole checks", async () => {
const originalPlatform = process.platform;
Object.defineProperty(process, "platform", { value: "darwin" });
try {
@@ -1408,12 +1457,7 @@ describe("google-meet plugin", () => {
});

expect(respond.mock.calls[0]?.[0]).toBe(true);
expect(runCommandWithTimeout).toHaveBeenNthCalledWith(
1,
["/usr/sbin/system_profiler", "SPAudioDataType"],
{ timeoutMs: 10000 },
);
expect(runCommandWithTimeout).toHaveBeenCalledTimes(1);
expect(runCommandWithTimeout).not.toHaveBeenCalled();
expect(callGatewayFromCli).toHaveBeenCalledWith(
"browser.request",
expect.any(Object),
@@ -1424,19 +1468,16 @@ describe("google-meet plugin", () => {
}),
{ progress: false },
);
expect(callGatewayFromCli).toHaveBeenCalledWith(
"browser.request",
expect.any(Object),
expect.objectContaining({
method: "POST",
path: "/permissions/grant",
body: expect.objectContaining({
origin: "https://meet.google.com",
permissions: ["audioCapture", "videoCapture"],
optionalPermissions: ["speakerSelection"],
}),
}),
{ progress: false },
expect(
callGatewayFromCli.mock.calls.some(
([, , request]) => (request as { path?: string }).path === "/permissions/grant",
),
).toBe(false);
const actCall = callGatewayFromCli.mock.calls.find(
([, , request]) => (request as { path?: string }).path === "/act",
);
expect(String((actCall?.[2] as { body?: { fn?: string } } | undefined)?.body?.fn)).toContain(
"const allowMicrophone = false",
);
} finally {
Object.defineProperty(process, "platform", { value: originalPlatform });
@@ -1883,9 +1924,14 @@ describe("google-meet plugin", () => {
updatedAt: "2026-04-27T00:00:00.000Z",
participantIdentity: "signed-in Google Chrome profile",
realtime: { enabled: true, provider: "openai", toolPolicy: "safe-read-only" },
chrome: { audioBackend: "blackhole-2ch", launched: true },
chrome: {
audioBackend: "blackhole-2ch",
launched: true,
health: { audioOutputActive: true, lastOutputBytes: 10 },
},
notes: [],
};
vi.spyOn(runtime, "list").mockReturnValue([session]);
const join = vi.spyOn(runtime, "join").mockResolvedValue({ session, spoken: true });
const speak = vi.spyOn(runtime, "speak");

@@ -1894,9 +1940,32 @@ describe("google-meet plugin", () => {
message: "Say exactly: hello.",
});

expect(join).toHaveBeenCalledWith(expect.objectContaining({ message: "Say exactly: hello." }));
expect(join).toHaveBeenCalledWith(
expect.objectContaining({
message: "Say exactly: hello.",
mode: "realtime",
}),
);
expect(speak).not.toHaveBeenCalled();
expect(result.spoken).toBe(true);
expect(result.speechOutputVerified).toBe(false);
expect(result.speechOutputTimedOut).toBe(false);
});

it("rejects observe-only mode for test speech", async () => {
const runtime = new GoogleMeetRuntime({
config: resolveGoogleMeetConfig({}),
fullConfig: {} as never,
runtime: {} as never,
logger: noopLogger,
});

await expect(
runtime.testSpeech({
url: "https://meet.google.com/abc-defg-hij",
mode: "transcribe",
}),
).rejects.toThrow("test_speech requires mode: realtime");
});

it("reports manual action when the browser profile needs Google login", async () => {

@@ -677,7 +677,13 @@ export default definePluginEntry({
async ({ params, respond }: GatewayRequestHandlerOptions) => {
try {
const rt = await ensureRuntime();
respond(true, await rt.setupStatus({ transport: normalizeTransport(params?.transport) }));
respond(
true,
await rt.setupStatus({
transport: normalizeTransport(params?.transport),
mode: normalizeMode(params?.mode),
}),
);
} catch (err) {
sendError(respond, err);
}

@@ -1,3 +1,4 @@
import { spawnSync } from "node:child_process";
import { EventEmitter } from "node:events";
import { describe, expect, it, vi } from "vitest";

@@ -40,6 +41,35 @@ vi.mock("node:child_process", async (importOriginal) => {
});

describe("google-meet node host bridge sessions", () => {
it("starts observe-only Chrome without BlackHole or bridge processes", async () => {
const { handleGoogleMeetNodeHostCommand } = await import("./src/node-host.js");
const originalPlatform = process.platform;
children.length = 0;
vi.mocked(spawnSync).mockClear();

Object.defineProperty(process, "platform", { configurable: true, value: "darwin" });
try {
const start = JSON.parse(
await handleGoogleMeetNodeHostCommand(
JSON.stringify({
action: "start",
url: "https://meet.google.com/xyz-abcd-uvw",
mode: "transcribe",
launch: false,
audioInputCommand: ["mock-rec"],
audioOutputCommand: ["mock-play"],
}),
),
);

expect(start).toEqual({ launched: false });
expect(spawnSync).not.toHaveBeenCalled();
expect(children).toHaveLength(0);
} finally {
Object.defineProperty(process, "platform", { configurable: true, value: originalPlatform });
}
});

it("clears output playback without closing the active bridge when the old output exits", async () => {
const { handleGoogleMeetNodeHostCommand } = await import("./src/node-host.js");
const originalPlatform = process.platform;

@@ -129,6 +129,7 @@ export type GoogleMeetExportManifest = {

type SetupOptions = {
json?: boolean;
mode?: GoogleMeetMode;
transport?: GoogleMeetTransport;
};

@@ -1986,10 +1987,11 @@ export function registerGoogleMeetCli(params: {
.command("setup")
.description("Show Google Meet transport setup status")
.option("--transport <transport>", "Transport to check: chrome, chrome-node, or twilio")
.option("--mode <mode>", "Mode to check: realtime or transcribe")
.option("--json", "Print JSON output", false)
.action(async (options: SetupOptions) => {
const rt = await params.ensureRuntime();
const status = await rt.setupStatus({ transport: options.transport });
const status = await rt.setupStatus({ transport: options.transport, mode: options.mode });
if (options.json) {
writeStdoutJson(status);
return;
@@ -270,42 +270,46 @@ function startChrome(params: Record<string, unknown>) {
throw new Error("url required");
}
const timeoutMs = readNumber(params.joinTimeoutMs, 30_000);
assertBlackHoleAvailable(Math.min(timeoutMs, 10_000));

const healthCommand = readStringArray(params.audioBridgeHealthCommand);
if (healthCommand) {
const health = runCommandWithTimeout(healthCommand, timeoutMs);
if (health.code !== 0) {
throw new Error(
`Chrome audio bridge health check failed: ${health.stderr || health.stdout || health.code}`,
);
}
}
const mode = readString(params.mode);

let bridgeId: string | undefined;
let audioBridge: { type: "external-command" | "node-command-pair" } | undefined;
const bridgeCommand = readStringArray(params.audioBridgeCommand);
if (bridgeCommand) {
const bridge = runCommandWithTimeout(bridgeCommand, timeoutMs);
if (bridge.code !== 0) {
throw new Error(
`failed to start Chrome audio bridge: ${bridge.stderr || bridge.stdout || bridge.code}`,
);
if (mode === "realtime") {
assertBlackHoleAvailable(Math.min(timeoutMs, 10_000));

const healthCommand = readStringArray(params.audioBridgeHealthCommand);
if (healthCommand) {
const health = runCommandWithTimeout(healthCommand, timeoutMs);
if (health.code !== 0) {
throw new Error(
`Chrome audio bridge health check failed: ${health.stderr || health.stdout || health.code}`,
);
}
}

const bridgeCommand = readStringArray(params.audioBridgeCommand);
if (bridgeCommand) {
const bridge = runCommandWithTimeout(bridgeCommand, timeoutMs);
if (bridge.code !== 0) {
throw new Error(
`failed to start Chrome audio bridge: ${bridge.stderr || bridge.stdout || bridge.code}`,
);
}
audioBridge = { type: "external-command" };
} else {
const session = startCommandPair({
inputCommand: readStringArray(params.audioInputCommand) ?? [
...DEFAULT_GOOGLE_MEET_AUDIO_INPUT_COMMAND,
],
outputCommand: readStringArray(params.audioOutputCommand) ?? [
...DEFAULT_GOOGLE_MEET_AUDIO_OUTPUT_COMMAND,
],
url,
mode,
});
bridgeId = session.id;
audioBridge = { type: "node-command-pair" };
}
audioBridge = { type: "external-command" };
} else if (params.mode === "realtime") {
const session = startCommandPair({
inputCommand: readStringArray(params.audioInputCommand) ?? [
...DEFAULT_GOOGLE_MEET_AUDIO_INPUT_COMMAND,
],
outputCommand: readStringArray(params.audioOutputCommand) ?? [
...DEFAULT_GOOGLE_MEET_AUDIO_OUTPUT_COMMAND,
],
url,
mode: readString(params.mode),
});
bridgeId = session.id;
audioBridge = { type: "node-command-pair" };
}

if (params.launch !== false) {

@@ -55,6 +55,17 @@ function resolveMode(input: GoogleMeetMode | undefined, config: GoogleMeetConfig
return input ?? config.defaultMode;
}

function hasRealtimeAudioOutputAdvanced(
health: GoogleMeetChromeHealth | undefined,
startOutputBytes: number,
): boolean {
return (health?.lastOutputBytes ?? 0) > startOutputBytes;
}

function sleep(ms: number): Promise<void> {
return new Promise((resolve) => setTimeout(resolve, ms));
}

function collectChromeAudioCommands(config: GoogleMeetConfig): string[] {
const commands = config.chrome.audioBridgeCommand
? [config.chrome.audioBridgeCommand[0]]
@@ -103,13 +114,16 @@ export class GoogleMeetRuntime {
return session ? { found: true, session } : { found: false };
}

async setupStatus(options: { transport?: GoogleMeetTransport } = {}) {
async setupStatus(options: { transport?: GoogleMeetTransport; mode?: GoogleMeetMode } = {}) {
const transport = resolveTransport(options.transport, this.params.config);
const mode = resolveMode(options.mode, this.params.config);
const shouldCheckChromeNode =
transport === "chrome-node" ||
(!options.transport && Boolean(this.params.config.chromeNode.node));
let status = getGoogleMeetSetupStatus(this.params.config, {
fullConfig: this.params.fullConfig,
mode,
transport,
});
if (shouldCheckChromeNode) {
try {
@@ -131,7 +145,7 @@ export class GoogleMeetRuntime {
});
}
}
if (transport === "chrome") {
if (transport === "chrome" && mode === "realtime") {
try {
await assertBlackHole2chAvailable({
runtime: this.params.runtime,
@@ -302,7 +316,9 @@ export class GoogleMeetRuntime {
? transport === "chrome-node"
? "Chrome node transport joins as the signed-in Google profile on the selected node and routes realtime audio through the node bridge."
: "Chrome transport joins as the signed-in Google profile and routes realtime audio through the configured bridge."
: "Chrome transport joins as the signed-in Google profile and expects BlackHole 2ch audio routing.",
: mode === "realtime"
? "Chrome transport joins as the signed-in Google profile and expects BlackHole 2ch audio routing."
: "Chrome transport joins as the signed-in Google profile without starting the realtime audio bridge.",
);
} else {
const dialInNumber = normalizeDialInNumber(
@@ -398,14 +414,53 @@ export class GoogleMeetRuntime {
manualActionReason?: GoogleMeetChromeHealth["manualActionReason"];
manualActionMessage?: string;
spoken: boolean;
speechOutputVerified: boolean;
speechOutputTimedOut: boolean;
audioOutputActive?: boolean;
lastOutputBytes?: number;
session: GoogleMeetSession;
}> {
const before = new Set(this.list().map((session) => session.id));
if (request.mode === "transcribe") {
throw new Error(
"test_speech requires mode: realtime; use join mode: transcribe for observe-only sessions.",
);
}
const url = normalizeMeetUrl(request.url);
const transport = resolveTransport(request.transport, this.params.config);
const beforeSessions = this.list();
const before = new Set(beforeSessions.map((session) => session.id));
const existingSession = beforeSessions.find(
(session) =>
session.state === "active" &&
isSameMeetUrlForReuse(session.url, url) &&
session.transport === transport &&
session.mode === "realtime",
);
const startOutputBytes = existingSession?.chrome?.health?.lastOutputBytes ?? 0;
const result = await this.join({
...request,
transport,
url,
mode: "realtime",
message: request.message ?? "Say exactly: Google Meet speech test complete.",
});
const health = result.session.chrome?.health;
let health = result.session.chrome?.health;
const shouldWaitForOutput =
result.spoken === true &&
health?.manualActionRequired !== true &&
this.#sessionHealth.has(result.session.id);
if (shouldWaitForOutput && !hasRealtimeAudioOutputAdvanced(health, startOutputBytes)) {
const deadline = Date.now() + Math.min(this.params.config.chrome.joinTimeoutMs, 5_000);
while (Date.now() < deadline) {
await sleep(100);
this.#refreshHealth(result.session.id);
health = result.session.chrome?.health;
if (hasRealtimeAudioOutputAdvanced(health, startOutputBytes)) {
break;
}
}
}
const speechOutputVerified = hasRealtimeAudioOutputAdvanced(health, startOutputBytes);
return {
createdSession: !before.has(result.session.id),
inCall: health?.inCall,
@@ -413,6 +468,10 @@ export class GoogleMeetRuntime {
manualActionReason: health?.manualActionReason,
manualActionMessage: health?.manualActionMessage,
spoken: result.spoken ?? false,
speechOutputVerified,
speechOutputTimedOut: shouldWaitForOutput && !speechOutputVerified,
audioOutputActive: health?.audioOutputActive,
lastOutputBytes: health?.lastOutputBytes,
session: result.session,
};
}

@@ -1,7 +1,7 @@
|
||||
import fs from "node:fs";
|
||||
import os from "node:os";
|
||||
import path from "node:path";
|
||||
import type { GoogleMeetConfig } from "./config.js";
|
||||
import type { GoogleMeetConfig, GoogleMeetMode, GoogleMeetTransport } from "./config.js";
|
||||
|
||||
export type SetupCheck = {
|
||||
id: string;
|
||||
@@ -33,6 +33,8 @@ export function getGoogleMeetSetupStatus(
|
||||
options?: {
|
||||
env?: NodeJS.ProcessEnv;
|
||||
fullConfig?: unknown;
|
||||
mode?: GoogleMeetMode;
|
||||
transport?: GoogleMeetTransport;
|
||||
},
|
||||
): {
|
||||
ok: boolean;
|
||||
@@ -43,11 +45,17 @@ export function getGoogleMeetSetupStatus(
|
||||
options?: {
|
||||
env?: NodeJS.ProcessEnv;
|
||||
fullConfig?: unknown;
|
||||
mode?: GoogleMeetMode;
|
||||
transport?: GoogleMeetTransport;
|
||||
},
|
||||
) {
|
||||
const checks: SetupCheck[] = [];
|
||||
const env = options?.env ?? process.env;
|
||||
const fullConfig = asRecord(options?.fullConfig);
|
||||
const mode = options?.mode ?? config.defaultMode;
|
||||
const transport = options?.transport ?? config.defaultTransport;
|
||||
const needsChromeRealtimeAudio =
|
||||
mode === "realtime" && (transport === "chrome" || transport === "chrome-node");
|
||||
const pluginEntries = asRecord(asRecord(fullConfig.plugins).entries);
|
||||
const pluginAllow = asRecord(fullConfig.plugins).allow;
|
||||
const voiceCallEntry = asRecord(pluginEntries["voice-call"]);
|
||||
@@ -79,18 +87,26 @@ export function getGoogleMeetSetupStatus(
|
||||
: "Local Chrome uses the OpenClaw browser profile; configure browser.defaultProfile to choose another profile",
|
||||
});
|
||||
|
||||
checks.push({
|
||||
id: "audio-bridge",
|
||||
ok: Boolean(
|
||||
config.chrome.audioBridgeCommand ||
|
||||
(config.chrome.audioInputCommand && config.chrome.audioOutputCommand),
|
||||
),
|
||||
message: config.chrome.audioBridgeCommand
|
||||
? "Chrome audio bridge command configured"
|
||||
: config.chrome.audioInputCommand && config.chrome.audioOutputCommand
|
||||
? `Chrome command-pair realtime audio bridge configured (${config.chrome.audioFormat})`
|
||||
: "Chrome realtime audio bridge not configured",
|
||||
});
|
||||
if (needsChromeRealtimeAudio) {
|
||||
checks.push({
|
||||
id: "audio-bridge",
|
||||
ok: Boolean(
|
||||
config.chrome.audioBridgeCommand ||
|
||||
(config.chrome.audioInputCommand && config.chrome.audioOutputCommand),
|
||||
),
|
||||
message: config.chrome.audioBridgeCommand
|
||||
? "Chrome audio bridge command configured"
|
||||
: config.chrome.audioInputCommand && config.chrome.audioOutputCommand
|
||||
? `Chrome command-pair realtime audio bridge configured (${config.chrome.audioFormat})`
|
||||
: "Chrome realtime audio bridge not configured",
|
||||
});
|
||||
} else if (transport === "chrome" || transport === "chrome-node") {
|
||||
checks.push({
|
||||
id: "audio-bridge",
|
||||
ok: true,
|
||||
message: "Chrome observe-only mode does not require a realtime audio bridge",
|
||||
});
|
||||
}
|
||||
|
||||
checks.push({
|
||||
id: "guest-join-defaults",
|
||||
@@ -114,14 +130,16 @@ export function getGoogleMeetSetupStatus(
|
||||
: "Chrome node not pinned; automatic selection works when exactly one capable node is connected",
|
||||
});
|
||||
|
||||
checks.push({
|
||||
id: "intro-after-in-call",
|
||||
ok: config.chrome.waitForInCallMs > 0,
|
||||
message:
|
||||
config.chrome.waitForInCallMs > 0
|
||||
? `Realtime intro waits up to ${config.chrome.waitForInCallMs}ms for the Meet tab to be in-call`
|
||||
: "Set chrome.waitForInCallMs to delay realtime intro until the Meet tab is in-call",
|
||||
});
|
||||
if (needsChromeRealtimeAudio) {
|
||||
checks.push({
|
||||
id: "intro-after-in-call",
|
||||
ok: config.chrome.waitForInCallMs > 0,
|
||||
message:
|
||||
config.chrome.waitForInCallMs > 0
|
||||
? `Realtime intro waits up to ${config.chrome.waitForInCallMs}ms for the Meet tab to be in-call`
|
||||
: "Set chrome.waitForInCallMs to delay realtime intro until the Meet tab is in-call",
|
||||
});
|
||||
}
|
||||
|
||||
const shouldCheckTwilioDelegation =
|
||||
config.voiceCall.enabled &&
|
||||
|
||||
@@ -95,57 +95,59 @@ export async function launchChromeMeet(params: {
| ({ type: "command-pair" } & ChromeRealtimeAudioBridgeHandle);
browser?: GoogleMeetChromeHealth;
}> {
await assertBlackHole2chAvailable({
runtime: params.runtime,
timeoutMs: Math.min(params.config.chrome.joinTimeoutMs, 10_000),
});

if (params.config.chrome.audioBridgeHealthCommand) {
const health = await params.runtime.system.runCommandWithTimeout(
params.config.chrome.audioBridgeHealthCommand,
{ timeoutMs: params.config.chrome.joinTimeoutMs },
);
if (health.code !== 0) {
throw new Error(
`Chrome audio bridge health check failed: ${health.stderr || health.stdout || health.code}`,
);
}
}

let audioBridge:
| { type: "external-command" }
| ({ type: "command-pair" } & ChromeRealtimeAudioBridgeHandle)
| undefined;

if (params.config.chrome.audioBridgeCommand) {
const bridge = await params.runtime.system.runCommandWithTimeout(
params.config.chrome.audioBridgeCommand,
{ timeoutMs: params.config.chrome.joinTimeoutMs },
);
if (bridge.code !== 0) {
throw new Error(
`failed to start Chrome audio bridge: ${bridge.stderr || bridge.stdout || bridge.code}`,
if (params.mode === "realtime") {
await assertBlackHole2chAvailable({
runtime: params.runtime,
timeoutMs: Math.min(params.config.chrome.joinTimeoutMs, 10_000),
});

if (params.config.chrome.audioBridgeHealthCommand) {
const health = await params.runtime.system.runCommandWithTimeout(
params.config.chrome.audioBridgeHealthCommand,
{ timeoutMs: params.config.chrome.joinTimeoutMs },
);
if (health.code !== 0) {
throw new Error(
`Chrome audio bridge health check failed: ${health.stderr || health.stdout || health.code}`,
);
}
}
audioBridge = { type: "external-command" };
} else if (params.mode === "realtime") {
if (!params.config.chrome.audioInputCommand || !params.config.chrome.audioOutputCommand) {
throw new Error(
"Chrome realtime mode requires chrome.audioInputCommand and chrome.audioOutputCommand, or chrome.audioBridgeCommand for an external bridge.",

if (params.config.chrome.audioBridgeCommand) {
const bridge = await params.runtime.system.runCommandWithTimeout(
params.config.chrome.audioBridgeCommand,
{ timeoutMs: params.config.chrome.joinTimeoutMs },
);
if (bridge.code !== 0) {
throw new Error(
`failed to start Chrome audio bridge: ${bridge.stderr || bridge.stdout || bridge.code}`,
);
}
audioBridge = { type: "external-command" };
} else {
if (!params.config.chrome.audioInputCommand || !params.config.chrome.audioOutputCommand) {
throw new Error(
"Chrome realtime mode requires chrome.audioInputCommand and chrome.audioOutputCommand, or chrome.audioBridgeCommand for an external bridge.",
);
}
audioBridge = {
type: "command-pair",
...(await startCommandRealtimeAudioBridge({
config: params.config,
fullConfig: params.fullConfig,
runtime: params.runtime,
meetingSessionId: params.meetingSessionId,
inputCommand: params.config.chrome.audioInputCommand,
outputCommand: params.config.chrome.audioOutputCommand,
logger: params.logger,
})),
};
}
audioBridge = {
type: "command-pair",
...(await startCommandRealtimeAudioBridge({
config: params.config,
fullConfig: params.fullConfig,
runtime: params.runtime,
meetingSessionId: params.meetingSessionId,
inputCommand: params.config.chrome.audioInputCommand,
outputCommand: params.config.chrome.audioOutputCommand,
logger: params.logger,
})),
};
}

if (!params.config.chrome.launch) {
@@ -167,6 +169,7 @@ export async function launchChromeMeet(params: {
const result = await openMeetWithBrowserRequest({
callBrowser: callLocalBrowserRequest,
config: params.config,
mode: params.mode,
url: params.url,
});
return { ...result, audioBridge };
@@ -273,7 +276,11 @@ function parsePermissionGrantNotes(result: unknown): string[] {
async function grantMeetMediaPermissions(params: {
callBrowser: BrowserRequestCaller;
timeoutMs: number;
allowMicrophone: boolean;
}): Promise<string[]> {
if (!params.allowMicrophone) {
return ["Observe-only mode skips Meet microphone/camera permission grants."];
}
try {
const result = await params.callBrowser({
method: "POST",
@@ -296,9 +303,14 @@ async function grantMeetMediaPermissions(params: {
}
}

function meetStatusScript(params: { guestName: string; autoJoin: boolean }) {
function meetStatusScript(params: {
allowMicrophone: boolean;
autoJoin: boolean;
guestName: string;
}) {
return `() => {
const text = (node) => (node?.innerText || node?.textContent || "").trim();
const allowMicrophone = ${JSON.stringify(params.allowMicrophone)};
const buttons = [...document.querySelectorAll('button')];
const notes = [];
const findButton = (pattern) =>
@@ -325,16 +337,24 @@ function meetStatusScript(params: { guestName: string; autoJoin: boolean }) {
const host = location.hostname.toLowerCase();
const pageUrl = location.href;
const permissionNeeded = /permission needed|allow.*(microphone|camera)|blocked.*(microphone|camera)|permission.*(microphone|camera|speaker)/i.test(pageText);
const mic = buttons.find((button) => /turn off microphone|turn on microphone|microphone/i.test(button.getAttribute('aria-label') || text(button)));
if (!allowMicrophone && mic && /turn off microphone/i.test(mic.getAttribute('aria-label') || text(mic))) {
mic.click();
notes.push("Muted Meet microphone for observe-only mode.");
}
const join = ${JSON.stringify(params.autoJoin)}
? findButton(/join now|ask to join/i)
: null;
if (join) join.click();
const microphoneChoice = findButton(/\\buse microphone\\b/i);
if (microphoneChoice) {
const noMicrophoneChoice = findButton(/\\b(continue|join|use) without (microphone|mic)\\b|\\bnot now\\b/i);
if (allowMicrophone && microphoneChoice) {
microphoneChoice.click();
notes.push("Accepted Meet microphone prompt with browser automation.");
} else if (!allowMicrophone && noMicrophoneChoice) {
noMicrophoneChoice.click();
notes.push("Skipped Meet microphone prompt for observe-only mode.");
}
const mic = buttons.find((button) => /turn off microphone|turn on microphone|microphone/i.test(button.getAttribute('aria-label') || text(button)));
const inCall = buttons.some((button) => /leave call/i.test(button.getAttribute('aria-label') || text(button)));
let manualActionReason;
let manualActionMessage;
@@ -346,14 +366,18 @@ function meetStatusScript(params: { guestName: string; autoJoin: boolean }) {
manualActionMessage = "Admit the OpenClaw browser participant in Google Meet, then retry speech.";
} else if (permissionNeeded) {
manualActionReason = "meet-permission-required";
manualActionMessage = "Allow microphone/camera/speaker permissions for Meet in the OpenClaw browser profile, then retry.";
} else if (!inCall && !microphoneChoice && /do you want people to hear you in the meeting/i.test(pageText)) {
manualActionMessage = allowMicrophone
? "Allow microphone/camera/speaker permissions for Meet in the OpenClaw browser profile, then retry."
: "Join without microphone/camera permissions in the OpenClaw browser profile, then retry.";
} else if (!inCall && (allowMicrophone ? !microphoneChoice : !noMicrophoneChoice) && /do you want people to hear you in the meeting/i.test(pageText)) {
manualActionReason = "meet-audio-choice-required";
manualActionMessage = "Meet is showing the microphone choice. Click Use microphone in the OpenClaw browser profile, then retry.";
manualActionMessage = allowMicrophone
? "Meet is showing the microphone choice. Click Use microphone in the OpenClaw browser profile, then retry."
: "Meet is showing the microphone choice. Choose the no-microphone option in the OpenClaw browser profile, then retry.";
}
return JSON.stringify({
clickedJoin: Boolean(join),
clickedMicrophoneChoice: Boolean(microphoneChoice),
clickedMicrophoneChoice: Boolean(allowMicrophone && microphoneChoice),
inCall,
micMuted: mic ? /turn on microphone/i.test(mic.getAttribute('aria-label') || text(mic)) : undefined,
manualActionRequired: Boolean(manualActionReason),
@@ -370,6 +394,7 @@ async function openMeetWithBrowserProxy(params: {
runtime: PluginRuntime;
nodeId: string;
config: GoogleMeetConfig;
mode: "realtime" | "transcribe";
url: string;
}): Promise<{ launched: boolean; browser?: GoogleMeetChromeHealth }> {
return await openMeetWithBrowserRequest({
@@ -383,6 +408,7 @@ async function openMeetWithBrowserProxy(params: {
timeoutMs: request.timeoutMs,
}),
config: params.config,
mode: params.mode,
url: params.url,
});
}
@@ -390,6 +416,7 @@ async function openMeetWithBrowserProxy(params: {
async function openMeetWithBrowserRequest(params: {
callBrowser: BrowserRequestCaller;
config: GoogleMeetConfig;
mode: "realtime" | "transcribe";
url: string;
}): Promise<{ launched: boolean; browser?: GoogleMeetChromeHealth }> {
if (!params.config.chrome.launch) {
@@ -442,6 +469,7 @@ async function openMeetWithBrowserRequest(params: {
}

const permissionNotes = await grantMeetMediaPermissions({
allowMicrophone: params.mode === "realtime",
callBrowser: params.callBrowser,
timeoutMs,
});
@@ -461,6 +489,7 @@ async function openMeetWithBrowserRequest(params: {
kind: "evaluate",
targetId,
fn: meetStatusScript({
allowMicrophone: params.mode === "realtime",
guestName: params.config.chrome.guestName,
autoJoin: params.config.chrome.autoJoin,
}),
@@ -526,6 +555,7 @@ async function inspectRecoverableMeetTab(params: {
timeoutMs: Math.min(params.timeoutMs, 5_000),
});
const permissionNotes = await grantMeetMediaPermissions({
allowMicrophone: true,
callBrowser: params.callBrowser,
timeoutMs: params.timeoutMs,
});
@@ -536,6 +566,7 @@ async function inspectRecoverableMeetTab(params: {
kind: "evaluate",
targetId: params.targetId,
fn: meetStatusScript({
allowMicrophone: true,
guestName: params.config.chrome.guestName,
autoJoin: false,
}),
@@ -714,6 +745,7 @@ export async function launchChromeMeetOnNode(params: {
runtime: params.runtime,
nodeId,
config: params.config,
mode: params.mode,
url: params.url,
});
const raw = await params.runtime.nodes.invoke({