fix(codex): time out silent app-server turns

Peter Steinberger
2026-05-15 13:44:45 +01:00
parent 111a17b6e8
commit 1e31bd2ac2
6 changed files with 70 additions and 27 deletions

@@ -13,6 +13,7 @@ Docs: https://docs.openclaw.ai
- Cron: load runtime plugins before isolated cron model and delivery resolution so external channels can be selected for scheduled runs. (#82111) Thanks @medns.
- Twitch: keep gateway accounts running until shutdown instead of treating successful monitor startup as a clean channel exit, preventing immediate auto-restart loops. Fixes #60071. (#81853) Thanks @edenfunf.
- Agents/auto-reply: honor `agents.defaults.silentReply` and per-surface group silent-reply policy when generic agent-run failure fallbacks decide whether to send visible fallback text. Fixes #82060. (#82086) Thanks @taozengabc.
+- Codex app-server: arm the short idle watchdog as soon as Codex accepts a turn, so accepted turns with no current-turn progress release the OpenClaw session lane before the outer model timeout. Fixes #82129. Thanks @Francois3d.
- Control UI/WebChat: focus the composer when users click the visible input chrome and restore larger, labeled desktop composer controls while preserving compact mobile taps. Fixes #45656. Thanks @BunsDev.
- System events: keep owner downgrades in structured metadata while rendering queued prompt text as plain `System:` lines, preserving least-privilege wakeups without prompt-visible trust labels. (#82067)
- Slack: default outbound bot link unfurls off so agent-sent URLs no longer expand into inline previews unless `channels.slack.unfurlLinks` is enabled. (#82123) Thanks @kibi-bsp.

@@ -95,7 +95,7 @@ Supported `appServer` fields:
| `headers` | `{}` | Extra WebSocket headers. |
| `clearEnv` | `[]` | Extra environment variable names removed from the spawned stdio app-server process after OpenClaw builds its inherited environment. |
| `requestTimeoutMs` | `60000` | Timeout for app-server control-plane calls. |
-| `turnCompletionIdleTimeoutMs` | `60000` | Quiet window after a turn-scoped app-server request while OpenClaw waits for `turn/completed`. |
+| `turnCompletionIdleTimeoutMs` | `60000` | Quiet window after Codex accepts a turn or after a turn-scoped app-server request while OpenClaw waits for `turn/completed`. |
| `mode` | `"yolo"` unless local Codex requirements disallow YOLO | Preset for YOLO or guardian-reviewed execution. |
| `approvalPolicy` | `"never"` or an allowed guardian approval policy | Native Codex approval policy sent to thread start, resume, and turn. |
| `sandbox` | `"danger-full-access"` or an allowed guardian sandbox | Native Codex sandbox mode sent to thread start and resume. |
@@ -253,12 +253,13 @@ Dynamic tool budgets are capped at 600000 ms. On timeout, OpenClaw aborts the
tool signal where supported and returns a failed dynamic-tool response to Codex
so the turn can continue instead of leaving the session in `processing`.
-After OpenClaw responds to a Codex turn-scoped app-server request, the harness
-also expects Codex to finish the native turn with `turn/completed`. If the
-app-server goes quiet for `appServer.turnCompletionIdleTimeoutMs` after that
-response, OpenClaw best-effort interrupts the Codex turn, records a diagnostic
-timeout, and releases the OpenClaw session lane so follow-up chat messages are
-not queued behind a stale native turn.
+After Codex accepts a turn, and after OpenClaw responds to a turn-scoped
+app-server request, the harness expects Codex to make current-turn progress and
+eventually finish the native turn with `turn/completed`. If the app-server goes
+quiet for `appServer.turnCompletionIdleTimeoutMs`, OpenClaw best-effort
+interrupts the Codex turn, records a diagnostic timeout, and releases the
+OpenClaw session lane so follow-up chat messages are not queued behind a stale
+native turn.
Any non-terminal notification for the same turn, including
`rawResponseItem/completed`, disarms that short watchdog because Codex has
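
As a reading aid for the hunk above, the arm/touch/disarm behavior it describes can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not OpenClaw's implementation: `TurnIdleWatchdog` and its method names are hypothetical, and timestamps are passed in explicitly so that the logic can be followed without real timers.

```typescript
// Illustrative sketch only; the class and method names are hypothetical.
class TurnIdleWatchdog {
  private readonly idleTimeoutMs: number;
  private armed = false;
  private lastActivityAtMs = 0;

  constructor(idleTimeoutMs: number) {
    this.idleTimeoutMs = idleTimeoutMs;
  }

  // Record activity. Pass { arm: true } when Codex accepts a turn or when a
  // turn-scoped request result is handed back to Codex.
  touch(nowMs: number, options: { arm?: boolean } = {}): void {
    this.lastActivityAtMs = nowMs;
    if (options.arm) {
      this.armed = true;
    }
  }

  // Same-turn progress shows the turn is alive, so the short watchdog stands down.
  disarm(): void {
    this.armed = false;
  }

  // True once an armed watchdog has seen no activity for the full quiet window.
  isExpired(nowMs: number): boolean {
    return this.armed && nowMs - this.lastActivityAtMs >= this.idleTimeoutMs;
  }
}

const watchdog = new TurnIdleWatchdog(60_000);
watchdog.touch(0, { arm: true }); // turn accepted at t = 0
console.log(watchdog.isExpired(59_999)); // false
console.log(watchdog.isExpired(60_000)); // true
watchdog.disarm(); // a same-turn notification arrived
console.log(watchdog.isExpired(120_000)); // false
```

In this sketch, arming at turn acceptance is what closes the gap the commit targets: a turn that is accepted and then goes silent expires after the quiet window instead of waiting for a longer outer timeout.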

@@ -493,7 +493,7 @@ Supported `appServer` fields:
| `headers` | `{}` | Extra WebSocket headers. |
| `clearEnv` | `[]` | Extra environment variable names removed from the spawned stdio app-server process after OpenClaw builds its inherited environment. OpenClaw keeps per-agent `CODEX_HOME` and inherited `HOME` for local launches. |
| `requestTimeoutMs` | `60000` | Timeout for app-server control-plane calls. |
-| `turnCompletionIdleTimeoutMs` | `60000` | Quiet window after a turn-scoped Codex app-server request while OpenClaw waits for `turn/completed`. Raise this for slow post-tool or status-only synthesis phases. |
+| `turnCompletionIdleTimeoutMs` | `60000` | Quiet window after Codex accepts a turn or after a turn-scoped app-server request while OpenClaw waits for `turn/completed`. Raise this for slow post-tool or status-only synthesis phases. |
| `mode` | `"yolo"` unless local Codex requirements disallow YOLO | Preset for YOLO or guardian-reviewed execution. Local stdio requirements that omit `danger-full-access`, `never` approval, or the `user` reviewer make the implicit default guardian. |
| `approvalPolicy` | `"never"` or an allowed guardian approval policy | Native Codex approval policy sent to thread start/resume/turn. Guardian defaults prefer `"on-request"` when allowed. |
| `sandbox` | `"danger-full-access"` or an allowed guardian sandbox | Native Codex sandbox mode sent to thread start/resume. Guardian defaults prefer `"workspace-write"` when allowed, otherwise `"read-only"`. When an OpenClaw sandbox is active, `danger-full-access` is narrowed to `"workspace-write"`. |
@@ -511,16 +511,17 @@ budgets are capped at 600000 ms. On timeout, OpenClaw aborts the tool signal
where supported and returns a failed dynamic-tool response to Codex so the turn
can continue instead of leaving the session in `processing`.
-After OpenClaw responds to a Codex turn-scoped app-server request, the harness
-also expects Codex to finish the native turn with `turn/completed`. If the
-app-server goes quiet for `appServer.turnCompletionIdleTimeoutMs` after that
-response, OpenClaw best-effort interrupts the Codex turn, records a diagnostic
-timeout, and releases the OpenClaw session lane so follow-up chat messages are
-not queued behind a stale native turn. Any non-terminal notification for the
-same turn, including `rawResponseItem/completed`, disarms that short watchdog
-because Codex has proven the turn is still alive; the longer terminal watchdog
-continues to protect genuinely stuck turns. Global app-server notifications,
-such as rate-limit updates, do not reset turn-idle progress. When Codex emits a
+After Codex accepts a turn, and after OpenClaw responds to a turn-scoped
+app-server request, the harness expects Codex to make current-turn progress and
+eventually finish the native turn with `turn/completed`. If the app-server goes
+quiet for `appServer.turnCompletionIdleTimeoutMs`, OpenClaw best-effort
+interrupts the Codex turn, records a diagnostic timeout, and releases the
+OpenClaw session lane so follow-up chat messages are not queued behind a stale
+native turn. Any non-terminal notification for the same turn, including
+`rawResponseItem/completed`, disarms that short watchdog because Codex has
+proven the turn is still alive; the longer terminal watchdog continues to
+protect genuinely stuck turns. Global app-server notifications, such as
+rate-limit updates, do not reset turn-idle progress. When Codex emits a
completed `agentMessage` item and then goes quiet without `turn/completed`,
OpenClaw treats the assistant output as effectively complete, best-effort
interrupts the native Codex turn, and releases the session lane. Timeout
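
The distinction this hunk draws between same-turn progress, terminal notifications, and global notifications can be illustrated with a small classifier. This is a hedged sketch rather than the harness's actual logic: the `AppServerNotification` shape is loosely modeled on the test fixtures in this commit, `classifyForIdleWatch` is a hypothetical helper, and `"rateLimits/updated"` is a placeholder method name. The sketch also assumes that same-turn notifications carry a `turnId` and that global ones do not, and it leaves out the completed-`agentMessage` case and the dynamic-tool bookkeeping exception.

```typescript
// Hypothetical notification shape, loosely modeled on this commit's test fixtures.
interface AppServerNotification {
  method: string;
  params?: { threadId?: string; turnId?: string };
}

type IdleWatchEffect = "terminal" | "disarm-short-watchdog" | "ignore";

// Hypothetical helper: decide what a notification means for the idle watchdogs.
function classifyForIdleWatch(
  notification: AppServerNotification,
  activeTurnId: string,
): IdleWatchEffect {
  // Global notifications (no turnId) and other turns do not count as progress.
  if (notification.params?.turnId !== activeTurnId) {
    return "ignore";
  }
  // turn/completed ends the turn outright.
  if (notification.method === "turn/completed") {
    return "terminal";
  }
  // Any other same-turn notification shows the turn is still alive.
  return "disarm-short-watchdog";
}

const progress: AppServerNotification = {
  method: "rawResponseItem/completed",
  params: { threadId: "thread-1", turnId: "turn-1" },
};
// "rateLimits/updated" is a placeholder name for a global notification.
const globalUpdate: AppServerNotification = { method: "rateLimits/updated" };
const done: AppServerNotification = {
  method: "turn/completed",
  params: { threadId: "thread-1", turnId: "turn-1" },
};

console.log(classifyForIdleWatch(progress, "turn-1")); // "disarm-short-watchdog"
console.log(classifyForIdleWatch(globalUpdate, "turn-1")); // "ignore"
console.log(classifyForIdleWatch(done, "turn-1")); // "terminal"
```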

@@ -333,7 +333,7 @@
},
"appServer.turnCompletionIdleTimeoutMs": {
"label": "Turn Completion Idle Timeout",
-"help": "Maximum quiet time after a turn-scoped Codex app-server request before OpenClaw interrupts the turn while waiting for turn/completed.",
+"help": "Maximum quiet time after Codex accepts a turn or after a turn-scoped app-server request before OpenClaw interrupts the turn while waiting for turn/completed.",
"advanced": true
},
"appServer.approvalPolicy": {

@@ -1921,7 +1921,7 @@ describe("runCodexAppServerAttempt", () => {
);
params.timeoutMs = 60_000;
-const run = runCodexAppServerAttempt(params, { turnTerminalIdleTimeoutMs: 5 });
+const run = runCodexAppServerAttempt(params, { turnCompletionIdleTimeoutMs: 5 });
await harness.waitForMethod("turn/start");
const result = await run;
@@ -1953,7 +1953,7 @@ describe("runCodexAppServerAttempt", () => {
);
params.timeoutMs = 200;
-const run = runCodexAppServerAttempt(params, { turnTerminalIdleTimeoutMs: 15 });
+const run = runCodexAppServerAttempt(params, { turnCompletionIdleTimeoutMs: 15 });
await harness.waitForMethod("turn/start");
await harness.notify(rateLimitsUpdated(Date.now() + 60_000));
await new Promise((resolve) => setTimeout(resolve, 20));
@@ -3842,6 +3842,45 @@ describe("runCodexAppServerAttempt", () => {
expect(result.timedOut).toBe(false);
});
+it("does not time out when turn progress arrives before turn/start returns", async () => {
+  let harness: ReturnType<typeof createAppServerHarness>;
+  harness = createAppServerHarness(async (method) => {
+    if (method === "thread/start") {
+      return threadStartResult();
+    }
+    if (method === "turn/start") {
+      await harness.notify({
+        method: "turn/started",
+        params: {
+          threadId: "thread-1",
+          turnId: "turn-1",
+          turn: { id: "turn-1", status: "inProgress" },
+        },
+      });
+      return turnStartResult("turn-1", "inProgress");
+    }
+    return {};
+  });
+  const params = createParams(
+    path.join(tempDir, "session.jsonl"),
+    path.join(tempDir, "workspace"),
+  );
+  params.timeoutMs = 60_000;
+  const run = runCodexAppServerAttempt(params, {
+    turnCompletionIdleTimeoutMs: 5,
+    turnTerminalIdleTimeoutMs: 60_000,
+  });
+  await harness.waitForMethod("turn/start");
+  await new Promise((resolve) => setTimeout(resolve, 20));
+  expect(harness.request.mock.calls.some(([method]) => method === "turn/interrupt")).toBe(false);
+  await harness.completeTurn({ threadId: "thread-1", turnId: "turn-1" });
+  const result = await run;
+  expect(result.aborted).toBe(false);
+  expect(result.timedOut).toBe(false);
+});
it("completes when turn/start returns a terminal turn without a follow-up notification", async () => {
const harness = createAppServerHarness(async (method) => {
if (method === "thread/start") {

@@ -1283,10 +1283,11 @@ export async function runCodexAppServerAttempt(
activeOpenClawDynamicToolCallIds,
)
) {
-// The short completion-idle watchdog only guards the blind gap after
-// OpenClaw hands a turn-scoped request result back to Codex. Bookkeeping
-// that closes the just-served OpenClaw dynamic tool item is still part of
-// that handoff, so keep the short watchdog armed for that notification.
+// The short completion-idle watchdog guards blind gaps after Codex
+// accepts a turn or after OpenClaw hands a turn-scoped request result
+// back to Codex. Bookkeeping that closes the just-served OpenClaw
+// dynamic tool item is still part of that handoff, so keep the short
+// watchdog armed for that notification.
disarmTurnCompletionIdleWatch();
}
// Determine terminal-turn status before invoking the projector so a throw
@@ -1637,6 +1638,8 @@ export async function runCodexAppServerAttempt(
});
emitLifecycleStart();
const activeProjector = projector;
+turnTerminalIdleWatchArmed = true;
+touchTurnCompletionActivity("turn:start", { arm: true });
for (const notification of pendingNotifications.splice(0)) {
await enqueueNotification(notification);
}
@@ -1669,8 +1672,6 @@ export async function runCodexAppServerAttempt(
abort: () => runAbortController.abort("aborted"),
};
setActiveEmbeddedRun(params.sessionId, handle, params.sessionKey);
-turnTerminalIdleWatchArmed = true;
-touchTurnCompletionActivity("turn:start");
const timeout = setTimeout(
() => {