fix(gateway): persist hidden lifecycle session keys (#74442)

* Prevent hidden channel lifecycle runs from staying stuck as running

Hidden channel-routed runs were dropping session keys on lifecycle events at
our shared agent-event bus. Gateway lifecycle persistence then had to rely on
run-context lookup surviving until the terminal event, which is unnecessarily
fragile for the exact sessions that are intentionally hidden from Control UI.

This keeps session keys on hidden lifecycle events only, preserving the existing
privacy boundary for assistant/tool traffic while making terminal session-state
persistence explicit and test-covered.
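
For reviewers, the rule can be sketched in isolation as below. This is a
minimal illustration, not the code in src/infra/agent-events.ts: the
`resolveSessionKey` helper and the trimmed-down `RunContext` and
`IncomingEvent` types are hypothetical, and the "tool" stream name in the
comments is an assumed example of non-lifecycle traffic.

```typescript
// Hypothetical, simplified shapes; the real payload types carry more fields.
interface RunContext {
  sessionKey?: string;
  isControlUiVisible?: boolean;
}

interface IncomingEvent {
  runId: string;
  stream: string; // e.g. "lifecycle", "error", or (assumed) "tool"
  sessionKey?: string;
}

// Decide which session key, if any, an event carries onto the shared bus.
function resolveSessionKey(event: IncomingEvent, context?: RunContext): string | undefined {
  // Runs without a registered context are treated as visible.
  const isControlUiVisible = context?.isControlUiVisible ?? true;
  // Blank or whitespace-only keys count as missing.
  const eventSessionKey =
    typeof event.sessionKey === "string" && event.sessionKey.trim()
      ? event.sessionKey
      : undefined;
  // Hidden runs keep their key only on the lifecycle stream.
  const preserveSessionKey = isControlUiVisible || event.stream === "lifecycle";
  // Prefer the key on the event, then fall back to the registered context.
  return preserveSessionKey ? (eventSessionKey ?? context?.sessionKey) : undefined;
}
```

Under this sketch a hidden run's lifecycle event resolves to its session key
even when the caller omits it, while the same run's tool and error events
resolve to undefined.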

Constraint: Hidden channel runs must stay out of Control UI chat/tool streams
Rejected: Broaden sessionKey preservation to every hidden event | would expose more hidden traffic than needed
Confidence: medium
Scope-risk: narrow
Reversibility: clean
Directive: If hidden-run event redaction changes again, keep lifecycle persistence independent from ephemeral run-context lookup
Tested: pnpm exec oxfmt --check --threads=1 CHANGELOG.md src/infra/agent-events.ts src/infra/agent-events.test.ts; pnpm tsgo:core; pnpm tsgo:extensions; pnpm tsgo:core:test; pnpm tsgo:extensions:test; pnpm test src/infra/agent-events.test.ts; pnpm test src/gateway/server-chat.agent-events.test.ts; pnpm test src/gateway/session-lifecycle-state.test.ts; pnpm lint:extensions:bundled; codex exec review returned ship it
Not-tested: Live gateway reproduction against Knox's local stuck-session install

* Clarify hidden lifecycle redaction and cover context fallback

The follow-up review asked for two things: document why the separate error
stream stays redacted for hidden runs, and cover the registered-context fallback
branch for hidden lifecycle events when callers omit sessionKey.

Constraint: Hidden assistant/tool/error diagnostics must remain redacted from Control UI
Rejected: Preserve sessionKey on the generic error stream | terminal persistence already flows through lifecycle phase:error, so widening the visible identity surface is unnecessary
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep hidden-run identity exceptions tightly scoped to terminal lifecycle persistence unless a concrete downstream consumer requires more
Tested: pnpm exec oxfmt --write --threads=1 src/infra/agent-events.ts src/infra/agent-events.test.ts; pnpm test src/infra/agent-events.test.ts; pnpm test src/gateway/server-chat.agent-events.test.ts; pnpm test src/gateway/session-lifecycle-state.test.ts
Not-tested: Full repo gate rerun; previous branch-wide gates remain from the parent PR commit

* fix(gateway): keep hidden agent broadcasts redacted
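
On the gateway side, the intended split is roughly the following. This is a
sketch under assumptions, not the real createAgentEventHandler: the
`GatewayDeps` interface, `handleAgentEvent`, and `persistLifecycle` are
hypothetical stand-ins, and whether the real handler persists every lifecycle
phase or only terminal ones is not modelled here.

```typescript
// Hypothetical, simplified event and dependency shapes for illustration only.
interface AgentEvent {
  runId: string;
  stream: string;
  sessionKey?: string;
  data: { phase?: string };
}

interface GatewayDeps {
  isControlUiVisible: (runId: string) => boolean;
  broadcast: (event: string, payload: unknown) => void;
  persistLifecycle: (args: { sessionKey: string; event: AgentEvent }) => void;
}

function handleAgentEvent(evt: AgentEvent, deps: GatewayDeps): void {
  // Persistence does not depend on visibility: hidden runs still record
  // their lifecycle state as long as the event carries a session key.
  if (evt.stream === "lifecycle" && evt.sessionKey) {
    deps.persistLifecycle({ sessionKey: evt.sessionKey, event: evt });
  }
  // Live client traffic is gated on visibility, so hidden runs never reach
  // Control UI subscribers.
  if (deps.isControlUiVisible(evt.runId)) {
    deps.broadcast("agent", evt);
  }
}
```

The point of keeping the two branches separate is that tightening or loosening
redaction later should not change whether session state gets persisted.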

---------

Co-authored-by: Peter Steinberger <steipete@gmail.com>
Cathryn Lavery
2026-04-29 11:03:10 -05:00
committed by GitHub
parent 58db3d2d22
commit 763a88083e
5 changed files with 68 additions and 7 deletions

@@ -125,6 +125,7 @@ Docs: https://docs.openclaw.ai
 - Gateway/auth status: scope external CLI credential overlays to configured providers, runtimes, or profiles and keep status reads off new Keychain prompts, so single-provider Gateway configs no longer probe unrelated Claude/Codex/MiniMax auth on startup. Fixes #73908. Thanks @Ailuras.
 - Agents/runtime status: expose effective agent runtime metadata in `agents.list`, Control UI agent panels, and `/agents`, and avoid rendering stale or cumulative CLI token totals as live context usage. Fixes #73660, #73578, and #45268. Thanks @spartman, @DashLabsDev, and @xyooz.
 - Agents/transcripts: strip empty assistant text blocks while preserving valid text, images, and signatures, so Anthropic-style providers no longer reject sanitized transcript turns. Fixes #73640. Thanks @jowhee327.
+- Gateway/sessions: preserve session keys on hidden lifecycle events so channel-routed runs still persist terminal session state and do not strand session status as running after Codex turn completion. Thanks @cathrynlavery.
 - Providers/Bedrock: omit deprecated `temperature` for Claude Opus 4.7 Bedrock model ids, named and application inference profiles, including dotted `opus-4.7` refs, and classify the nested validation response for failover. Fixes #73663. Thanks @bstanbury.
 - Gateway: raise the preauth/connect-challenge timeout to 15s so cold CLI starts on slower hosts have more time to process the WebSocket challenge before the Gateway closes the connection. Fixes #51469; refs #73592 and #62060. Thanks @GothicFox and @jackychen-png.
 - CLI/status: fall back to a bounded local `status` RPC when loopback detail probes time out or report unknown capability, so reachable local gateways are no longer marked unreachable by slow read diagnostics. Fixes #73535; refs #48360, #62762, #51357, and #42019. Thanks @RacecarGuy, @justinschille, @DJBlackhawk, @tianyaqpzm, and @0xrsydn.

@@ -1397,7 +1397,7 @@ describe("agent event handler", () => {
 expect(agentRunSeq.has("run-chat-send")).toBe(false);
 });
-it("suppresses chat and node session events for non-control-UI-visible runs", () => {
+it("suppresses live client events but persists lifecycle for non-control-UI-visible runs", () => {
 const { broadcast, nodeSendToSession, handler } = createHarness({
 resolveSessionKeyForRun: () => "session-hidden",
 });
@@ -1417,7 +1417,15 @@ describe("agent event handler", () => {
 emitLifecycleEnd(handler, "run-hidden", 2);
 expect(chatBroadcastCalls(broadcast)).toHaveLength(0);
 expect(broadcast.mock.calls.filter(([event]) => event === "agent")).toHaveLength(0);
 expect(nodeSendToSession).not.toHaveBeenCalled();
+expect(persistGatewaySessionLifecycleEventMock).toHaveBeenCalledWith({
+sessionKey: "session-hidden",
+event: expect.objectContaining({
+runId: "run-hidden",
+data: expect.objectContaining({ phase: "end" }),
+}),
+});
 });
 it("uses agent event sessionKey when run-context lookup cannot resolve", () => {

@@ -622,7 +622,7 @@ export function createAgentEventHandler({
 : { ...eventForClients, data };
 })()
 : agentPayload;
-if (last > 0 && evt.seq !== last + 1) {
+if (last > 0 && evt.seq !== last + 1 && isControlUiVisible) {
 broadcast("agent", {
 runId: eventRunId,
 stream: "error",
@@ -649,7 +649,7 @@ export function createAgentEventHandler({
 // setting only controls whether tool details are sent as channel
 // messages to messaging surfaces (Telegram, Discord, etc.).
 const recipients = toolEventRecipients.get(evt.runId);
-if (recipients && recipients.size > 0) {
+if (isControlUiVisible && recipients && recipients.size > 0) {
 broadcastToConnIds(
 "agent",
 sessionKey ? { ...toolPayload, ...buildSessionEventSnapshot(sessionKey) } : toolPayload,
@@ -661,7 +661,7 @@ export function createAgentEventHandler({
 // not know the runId in advance, so they cannot register as run-scoped
 // tool recipients. Mirror tool lifecycle onto a session-scoped event so
 // they can render live pending tool cards without polling history.
-if (sessionKey) {
+if (isControlUiVisible && sessionKey) {
 const sessionSubscribers = sessionEventSubscribers.getAll();
 if (sessionSubscribers.size > 0) {
 broadcastToConnIds(
@@ -677,7 +677,9 @@ export function createAgentEventHandler({
 if (itemPhase === "start" && isControlUiVisible && sessionKey && !isAborted) {
 flushBufferedChatDeltaIfNeeded(sessionKey, clientRunId, evt.runId, evt.seq);
 }
-broadcast("agent", agentPayload);
+if (isControlUiVisible) {
+broadcast("agent", agentPayload);
+}
 }
 if (isControlUiVisible && sessionKey) {

@@ -75,7 +75,7 @@ describe("agent-events sequencing", () => {
 expect(phases).toEqual(["start", "end"]);
 });
-test("omits sessionKey for runs hidden from Control UI", async () => {
+test("omits sessionKey for non-lifecycle runs hidden from Control UI", async () => {
 resetAgentRunContextForTest();
 registerAgentRunContext("run-hidden", {
 sessionKey: "session-quietchat",
@@ -97,6 +97,49 @@ describe("agent-events sequencing", () => {
 expect(receivedSessionKey).toBeUndefined();
 });
+test("preserves sessionKey for lifecycle events hidden from Control UI", async () => {
+resetAgentRunContextForTest();
+registerAgentRunContext("run-hidden-lifecycle", {
+sessionKey: "session-quietchat",
+isControlUiVisible: false,
+});
+let receivedSessionKey: string | undefined;
+const stop = onAgentEvent((evt) => {
+receivedSessionKey = evt.sessionKey;
+});
+emitAgentEvent({
+runId: "run-hidden-lifecycle",
+stream: "lifecycle",
+data: { phase: "end" },
+sessionKey: "session-quietchat",
+});
+stop();
+expect(receivedSessionKey).toBe("session-quietchat");
+});
+test("falls back to registered sessionKey for hidden lifecycle events", async () => {
+resetAgentRunContextForTest();
+registerAgentRunContext("run-hidden-lifecycle-context", {
+sessionKey: "session-quietchat-context",
+isControlUiVisible: false,
+});
+let receivedSessionKey: string | undefined;
+const stop = onAgentEvent((evt) => {
+receivedSessionKey = evt.sessionKey;
+});
+emitAgentEvent({
+runId: "run-hidden-lifecycle-context",
+stream: "lifecycle",
+data: { phase: "error", error: "boom" },
+});
+stop();
+expect(receivedSessionKey).toBe("session-quietchat-context");
+});
 test("merges later run context updates into existing runs", async () => {
 resetAgentRunContextForTest();
 registerAgentRunContext("run-ctx", {

@@ -215,7 +215,14 @@ export function emitAgentEvent(event: Omit<AgentEventPayload, "seq" | "ts">) {
 const isControlUiVisible = context?.isControlUiVisible ?? true;
 const eventSessionKey =
 typeof event.sessionKey === "string" && event.sessionKey.trim() ? event.sessionKey : undefined;
-const sessionKey = isControlUiVisible ? (eventSessionKey ?? context?.sessionKey) : undefined;
+// Hidden channel-routed runs should not leak live assistant/tool traffic into
+// Control UI, but lifecycle events still need the session key so gateway
+// listeners can persist terminal session state even if run-context lookup is
+// unavailable by the time the terminal event arrives. Terminal failures are
+// emitted on the lifecycle stream with `phase: "error"`; the separate error
+// stream remains redacted for hidden runs because it is observational only.
+const preserveSessionKey = isControlUiVisible || event.stream === "lifecycle";
+const sessionKey = preserveSessionKey ? (eventSessionKey ?? context?.sessionKey) : undefined;
 const enriched: AgentEventPayload = {
 ...event,
 sessionKey,