fix(agents): forward explicit per-run timeout to LLM idle watchdog (#79426)

Merged via squash.

Prepared head SHA: 0e6cf9b4d5
Co-authored-by: legolaz8451 <18042830+legolaz8451@users.noreply.github.com>
Co-authored-by: joshavant <830519+joshavant@users.noreply.github.com>
Reviewed-by: @joshavant
Eduardo Buitrago
2026-05-14 10:24:01 +02:00
committed by GitHub
parent 8717525fbc
commit 336ba2a2b3
7 changed files with 217 additions and 60 deletions


@@ -189,6 +189,7 @@ Docs: https://docs.openclaw.ai
- Gateway: throttle assistant/thinking agent event fanout during streaming bursts without dropping buffered deltas. (#80335) Thanks @samzong.
- Codex app-server: keep the short post-tool completion watchdog armed across dynamic tool completion bookkeeping so embedded Codex runs fail fast and release their session lane when Codex goes quiet after a tool result. (#81697) Thanks @mbelinky.
- Models: restore authenticated CLI runtime providers in the `/models` picker while keeping legacy runtime aliases hidden from setup/default model choices. Closes #81212. (#81239) Thanks @anagnorisis2peripeteia.
- Agents/cron: honor a cron payload's explicit `timeoutSeconds` for the LLM idle watchdog even when it numerically equals `agents.defaults.timeoutSeconds`, preserving explicit per-run timeout intent and preventing stalled streaming replies from being cut to the implicit 120s cap. (#79426) Thanks @legolaz8451.
### Changes
@@ -713,62 +714,6 @@ Docs: https://docs.openclaw.ai
### Fixes
- Models/auth: keep `agents.defaults.model` when `openclaw models auth login` runs without `--set-default`, so provider onboarding patches add models without silently switching the primary. Fixes #78162. (#78241) Thanks @neeravmakwana.
- Control UI/chat: localize the remaining chat welcome, composer, run-control, session/model/thinking selector, and zh-CN Skills labels through the Control UI i18n pipeline so non-English browser locales no longer see those chat controls in English. Fixes #79937. Thanks @BunsDev.
- Control UI: surface browser-blocked WebSocket security failures with wss:// and loopback dashboard guidance instead of leaving the connection on a dead security error. Thanks @BunsDev.
- Gateway/diagnostics: keep active-only transient event-loop max-delay samples as info-level stability telemetry instead of warning-level liveness diagnostics. Thanks @BunsDev.
- Google/Gemini: default new API-key onboarding to stable `google/gemini-2.5-flash` instead of the preview Pro route, reducing surprise daily quota exhaustion. Fixes #79670. Thanks @HugeBunny.
- Amazon Bedrock: expose Claude thinking profiles through the lightweight provider policy surface so `/think:adaptive` validates before the Bedrock runtime plugin is loaded. Fixes #79754. Thanks @phoenixyy and @hclsys.
- Codex/transcripts: mirror dynamic tool calls and outputs into Codex app-server transcripts so tool activity is visible alongside assistant text instead of being elided, with per-item output capped at 12,000 characters. (#79952) Thanks @scoootscooob.
- Memory: close temp SQLite handles before failed atomic reindex cleanup and retry Windows EBUSY/EPERM/EACCES temp file removals, so `memory index --force` does not abort or leave temp sidecars on locked filesystems. Fixes #79708. Thanks @LobsterFarmerAmp and @hclsys.
- Agents/CLI: add an explicit `reseedFromRawTranscriptWhenUncompacted` backend opt-in so safe invalidated CLI sessions can reseed from a bounded raw OpenClaw transcript tail before compaction while auth-boundary resets remain no-raw. Fixes #79713. (#79764) Thanks @hclsys.
- Agents/CLI: handle resumed CLI JSONL output and bound supervisor output buffering so resumed runs stay readable without letting noisy child output grow unbounded.
- Codex app-server: honor per-call `timeoutMs`, configured `image_generate` timeouts, and media image-understanding timeouts for dynamic tool calls, capped at 600000 ms, so slow image generation and image analysis no longer fail at the 30s bridge default. Fixes #79810. Thanks @omarshahine.
- Agents/sandbox: include the container workspace path hint in sandbox-root escape errors while preserving shortened host workspace roots. Fixes #79712. Thanks @haumanto and @hclsys.
- macOS/device pairing: let the native app read CLI PEM device identities and let the TypeScript loader migrate legacy Swift raw-key identities without generating a new device id, preventing repeated pairing prompts when `OPENCLAW_STATE_DIR` is shared. Fixes #76815. Thanks @BunsDev.
- Image generation: honor configured web-fetch SSRF policy across OpenAI, Google, MiniMax, OpenRouter, and Vydra provider requests so RFC2544 fake-IP proxy opt-ins reach generation calls. Fixes #79716. (#79765) Thanks @hclsys.
- Telegram: persist reply-chain message cache records as a compact append log instead of rewriting the full cache on every inbound message, reducing large-group turn latency.
- Telegram/CLI-backend: mirror outbound replies to the session transcript so CLI-backend agent responses create `.jsonl` session files, preventing `sessionId=unknown` on subsequent runs. Fixes #75991.
- Gateway/nodes: allow approved chat-channel macOS node exec replays to cross transient agent WebSocket reconnects only when node, agent session, and channel target metadata still match, restoring Telegram/WeCom host=node approvals without opening a general backend replay bypass. Fixes #77656. Thanks @BunsDev.
- QQBot: route gateway WebSocket connections through the ambient proxy agent so deployments with `https_proxy`, `HTTPS_PROXY`, or `HTTP_PROXY` can reach the QQ gateway. (#72961) Thanks @xialonglee.
- Agents/subagents: treat `sessions_spawn` `model: "default"` as the default-model fallback and ignore ACP-only stream targets for native sub-agent spawns. Fixes #72078. (#72101) Thanks @xialonglee.
- Agents/failover: stop retrying assistant-prefill format rejections across auth profiles or model fallbacks, surfacing the deterministic provider error instead of requeueing the lane. Fixes #79688. (#79728) Thanks @hclsys.
- Google/Gemini: resolve missing Gemini 3 Flash catalog rows through the Google provider template path so image-capable media-understanding models keep `input: ["text", "image"]` instead of falling back to text-only metadata. Fixes #79750. (#79759) Thanks @fenglanhua and @hclsys.
- Memory/QMD: warn with a manual stale collection removal hint when QMD reports a path/pattern conflict but `collection list` lacks verifiable metadata, avoiding unsafe stderr-only rebinds. Refs #71783. (#72297) Thanks @MonkeyLeeT.
- Models/auth: make `openclaw models status --check` and dashboard auth health honor effective auth profile order while keeping stale profiles visible. (#79685) Thanks @nimbleenigma.
- Agents/failover: classify bare `stream_read_error` streaming failures as transient timeouts so configured model fallback runs instead of surfacing the raw transport error. Fixes #79689. (#79692) Thanks @hekunwang.
- Agents/failover: persist overloaded auth-profile cooldown marks before exhausted fallback summaries surface, so immediate fallback retries honor the recorded cooldown state.
- Docs/Subagents: correct the listed sub-agent bootstrap context files to include `SOUL.md`, `IDENTITY.md`, and `USER.md`. (#79470) Thanks @lastguru-net.
- Backup: keep live backup archives from copying current agent session transcripts, cron run logs, and delivery queues while preserving workspace lock/temp files and keeping `--json` output parseable when volatile files are skipped. Fixes #72249. (#72251) Thanks @abnershang.
- Backup: place the temp manifest outside every backed-up asset so `backup create --verify` still passes when `TMPDIR` resolves inside a source path (for example `~/.openclaw/tmp`), avoiding the duplicate root manifest that otherwise tripped `Expected exactly one backup manifest entry, found 2`. Fixes #75007. Thanks @YaanFPV.
- OpenAI/Codex: install the Codex runtime plugin from npm during OpenAI onboarding and load it automatically for implicit OpenAI model routes, while preserving manual PI runtime overrides. Fixes #79358.
- OpenAI/realtime voice: defer `response.create` while a realtime response is still active, retry after `response.done`/`response.cancelled`, and align GA input transcription/noise-reduction defaults with the Codex realtime reference so Discord/Voice Call consult results can resume speaking instead of tripping the active-response race.
- OpenAI/realtime voice: avoid duplicate barge-in cancellation requests, log realtime model interruption/cutoff events in Discord voice logs, and treat OpenAI's no-active-response cancellation reply as a completed cancel so Discord voice sessions do not wedge pending speech after fast interruptions.
- Agents/runtime: strip trailing assistant prefill for Claude-family OpenAI Responses routes, persist prompt/assistant profile cooldown marks before fallback, and show the configured container root in sandbox escape diagnostics. Fixes #79688 and #79712. Thanks @stainlu and @mushuiyu886.
- Gateway: avoid false degraded event-loop health during rapid health/readiness/status probes unless sustained load has delay co-evidence, while keeping hard delay detection immediate. (#77028) Thanks @rubencu.
- Markdown: keep blockquote spans off trailing paragraph separators. Fixes #79646.
- Plugin SDK/LM Studio: recover Harmony plain-text tool calls from LM Studio streams. Fixes #78326.
- Control UI: refresh the model cache after `session_status(model=...)` changes a session model. Fixes #79613.
- Agents/context-engine: share loop-hook checkpoints with the after-turn finalizer so messages are not replayed. Fixes #79630.
- Codex app-server: keep native hook relays alive for long-running turns so shell and file approvals stay reachable until the configured run window finishes. (#77533) Thanks @rubencu.
- Gateway/macOS: clear ignored SIGUSR1 restart state, skip redundant package-update restarts when the refreshed LaunchAgent already serves the expected version, and give launchd a 10s throttle plus 20s shutdown window so update restarts do not leave old gateways alive or fight supervisor recovery. Fixes #79577; refs #78699 and #60885. Thanks @BunsDev.
- Status/Codex: route Codex-harness `openai/*` usage through the OpenAI Codex quota provider and scope CLI status usage to the default agent auth store so `/status` and `openclaw status --usage` show Codex quota windows again. Fixes #79312. Thanks @keshavbotagent.
- Matrix: keep joined strict DM rooms discoverable when stale `m.direct` mappings already point at an older strict room, and let `dm.sessionScope: "per-room"` promote safe unmapped strict rooms through the existing unnamed/unaliased room gate. Fixes #79514. Thanks @stainlu.
- Gateway/agent: pass the session-key agent id into inline image attachment validation so the first image in a fresh per-agent session uses the agent's vision-capable model override instead of the text-only system default. Fixes #79407. Thanks @pandadev66.
- Gateway/maintenance: prune dedupe overflow against a stable excess count and keep active agent retries from starting duplicate runs after cache eviction. (#73841) Thanks @thesomewhatyou.
- Control UI/subagents: suppress internal `subagent_announce` handoff prompts from requester transcripts and hide legacy inter-session wrapper rows so completed subagent results no longer surface runtime context in WebChat history. (#79618) Thanks @joshavant.
- Discord: preserve username target resolution for Discord outbound sends. (#79076) Thanks @vincentkoc.
- Gateway/sessions: rotate generated transcript paths when gateway sessions reset, complementing the daily-rollover transcript persistence. (#79076) Thanks @vincentkoc.
- Dependencies: pin the transitive `fast-uri` production dependency to `3.1.2` so the production dependency audit no longer resolves the vulnerable `<=3.1.1` range. Thanks @shakkernerd.
- Plugins/install: fail managed npm plugin installs when OpenClaw cannot repair a required plugin-local `node_modules/openclaw` peer link, preventing that peer-link failure mode from producing unusable `@openclaw/codex` installs. Refs #79462. Thanks @ai-hpc.
- xAI/tools: register and execute `x_search` and `code_execution` when the xAI API key comes from an auth profile, keeping the plugin tool gate aligned with `openclaw onboard --auth-choice xai-api-key`. Fixes #79353. Thanks @dbernaltbn.
- Cron/agents: recognize same-target `edit`-to-`write` recovery in `isSameToolMutationAction`, so a successful `write` to a path clears an earlier failed `edit` on the same path. Stops cron from reporting fatal failures when an agent self-heals across `edit` and `write`, while preserving same-tool fingerprint matching, blocking different-target writes, and excluding tools (including `apply_patch`) whose real call args do not produce a stable `path` fingerprint segment. Fixes #79024. Thanks @RenzoMXD.
- Gateway/Tailscale: add opt-in `gateway.tailscale.preserveFunnel` so when `tailscale.mode = "serve"` and an externally configured Tailscale Funnel route already covers the gateway port, OpenClaw skips re-applying `tailscale serve` on startup and skips the `resetOnExit` teardown for that run, keeping operator-managed Funnel exposure alive across gateway restarts. Fixes #57241. Thanks @RenzoMXD.
- CLI/router: when `openclaw <name>` does not match a CLI subcommand, check plugin tool manifests first so names like `lcm_recent` get an agent-tool diagnostic instead of the misleading suggestion to add the tool name to `plugins.allow`. Fixes #77214. Thanks @100yenadmin.
- QA-lab/parity: bump the live mock-openai parity baseline from `claude-opus-4-6`/`claude-sonnet-4-6` to `claude-opus-4-7`/`claude-sonnet-4-7` and the candidate alt from `gpt-5.4-alt` to `gpt-5.5-alt` in `openclaw-release-checks.yml` and `qa-live-transports-convex.yml`, matching the active Opus 4.7 / GPT-5.5 defaults already used elsewhere on main. Carries forward the surface-bump portion of #74290. Thanks @100yenadmin.
- QA-lab/scenarios: raise the `approval-turn-tool-followthrough` per-turn fallback timeouts from 20s/30s to 60s so cold mock-gateway parity runs do not flake on the approval-turn chain. Carries forward the timeout-bump portion of #74290. Thanks @100yenadmin.
- Gateway/restart continuation: treat routed post-reboot agent turns as trusted internal continuations while preserving the original Telegram topic route, and retry briefly when the previous run is still shutting down, so owner-only tools remain available for chained restart workflows after reboot.
- MS Teams: normalize pre-thread-qualified route session keys before deriving channel-thread lanes so cached route reuse cannot create malformed mixed `:thread:OLD:thread:NEW` sessions. Fixes #66771. (#78850) Thanks @harrisali0101.
- Agents/compaction: keep the recent tail after manual `/compact` when Pi returns an empty or no-op compaction summary, preventing blank checkpoints from replacing the live context.
- Native commands: handle slash commands before workspace and agent-reply bootstrap so Telegram `/status` and other command-only native replies do not wait behind full agent turn setup.
- Telegram/groups: include the recent local chat window and nearby reply-target window as generic inbound context so stale reply ancestry does not overshadow the live group conversation.
@@ -786,6 +731,7 @@ Docs: https://docs.openclaw.ai
- Gateway/macOS: `repairLaunchAgentBootstrap` no longer kickstarts an already-running LaunchAgent, preventing unnecessary service restarts and session disconnects when repair runs against a healthy gateway. Fixes #77428. Thanks @ramitrkar-hash.
- Gateway/macOS: `openclaw gateway stop --disable` now persists the LaunchAgent disable bit even after a previous bootout left the service not loaded, keeping the explicit stay-down path reliable. (#78412) Thanks @wdeveloper16.
- CLI/status: keep lean `openclaw status --json` off manifest-backed channel discovery so configured-channel checks do not repeatedly rescan plugin metadata. Fixes #79129.
- Gateway/Tailscale: add opt-in `gateway.tailscale.preserveFunnel` so when `tailscale.mode = "serve"` and an externally configured Tailscale Funnel route already covers the gateway port, OpenClaw skips re-applying `tailscale serve` on startup and skips the `resetOnExit` teardown for that run, keeping operator-managed Funnel exposure alive across gateway restarts. Fixes #57241. Thanks @RenzoMXD.
- Control UI/chat: hide retired and non-public Google Gemini model IDs from chat model catalogs and route the bare `gemini-3-pro` alias to Gemini 3.1 Pro Preview instead of the shut-down Gemini 3 Pro Preview. Thanks @BunsDev.
- CLI/infer: canonicalize case-only catalog model refs in `infer model run --model` so mixed-case provider/model strings resolve to the canonical catalog entry instead of failing with `Unknown model`. (#78940) Thanks @ai-hpc.
- CLI/infer: allow explicit local `infer model run --model <provider/model>` probes to use exact bundled static catalog rows before the provider is written to config, surfacing missing credentials as auth errors instead of `Unknown model`.


@@ -1340,6 +1340,7 @@ export async function runEmbeddedPiAgent(
execOverrides: params.execOverrides,
bashElevated: params.bashElevated,
timeoutMs: params.timeoutMs,
runTimeoutOverrideMs: params.runTimeoutOverrideMs,
runId: params.runId,
abortSignal: attemptAbortController.signal,
replyOperation: params.replyOperation,


@@ -2388,14 +2388,23 @@ export async function runEmbeddedAttempt(
let idleTimeoutTrigger: ((error: Error) => void) | undefined;
// Wrap stream with idle timeout detection
// Wrap stream with idle timeout detection.
//
// Prefer the caller's explicit `runTimeoutOverrideMs` when provided —
// it carries the "this run was launched with a deliberate per-run
// timeout" signal without losing it when the value numerically equals
// `agents.defaults.timeoutSeconds`. Fall back to the value-equality
// heuristic for callers that haven't been migrated to plumb the flag.
const configuredRunTimeoutMs = resolveAgentTimeoutMs({
cfg: params.config,
});
const resolvedRunTimeoutMs =
params.runTimeoutOverrideMs ??
(params.timeoutMs !== configuredRunTimeoutMs ? params.timeoutMs : undefined);
const idleTimeoutMs = resolveLlmIdleTimeoutMs({
cfg: params.config,
trigger: params.trigger,
runTimeoutMs: params.timeoutMs !== configuredRunTimeoutMs ? params.timeoutMs : undefined,
runTimeoutMs: resolvedRunTimeoutMs,
modelRequestTimeoutMs: (params.model as { requestTimeoutMs?: number }).requestTimeoutMs,
model: params.model as { baseUrl?: string },
});
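
The precedence introduced in this hunk can be sketched in isolation. The helper below is a minimal illustration, not the project's code: the function name `pickRunTimeoutForWatchdog` and its input shape are hypothetical, and only the `??` fallback to the value-equality heuristic is taken from the diff.

```typescript
// Hypothetical standalone helper; the name and input shape are illustrative.
type WatchdogTimeoutInput = {
  /** Merged run timeout: the per-run override if given, else the agent default. */
  timeoutMs: number;
  /** Timeout derived from `agents.defaults.timeoutSeconds` alone. */
  configuredRunTimeoutMs: number;
  /** Explicit per-run value, present only when the caller plumbed it through. */
  runTimeoutOverrideMs?: number;
};

function pickRunTimeoutForWatchdog(input: WatchdogTimeoutInput): number | undefined {
  // Legacy heuristic: treat the run timeout as explicit only when it differs
  // from the configured default. This loses the signal when they are equal.
  const inferred =
    input.timeoutMs !== input.configuredRunTimeoutMs ? input.timeoutMs : undefined;
  // An explicit override wins, even when it equals the configured default.
  return input.runTimeoutOverrideMs ?? inferred;
}

// Explicit 300 s override that equals the 300 s default: the signal survives.
const explicitEqual = pickRunTimeoutForWatchdog({
  timeoutMs: 300_000,
  configuredRunTimeoutMs: 300_000,
  runTimeoutOverrideMs: 300_000,
});

// Same numbers without the override: the heuristic alone yields undefined.
const heuristicOnly = pickRunTimeoutForWatchdog({
  timeoutMs: 300_000,
  configuredRunTimeoutMs: 300_000,
});

// Unmigrated caller with a differing timeout: the heuristic still applies.
const heuristicDiffers = pickRunTimeoutForWatchdog({
  timeoutMs: 600_000,
  configuredRunTimeoutMs: 300_000,
});

console.log(explicitEqual, heuristicOnly, heuristicDiffers);
```

The second call reproduces the situation the commit message describes: with no companion field, an explicit value equal to the default is indistinguishable from the default itself.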


@@ -149,6 +149,16 @@ export type RunEmbeddedPiAgentParams = {
>;
bashElevated?: ExecElevatedDefaults;
timeoutMs: number;
/**
* Explicit per-run timeout override, in milliseconds, when the caller knows
* the run was launched with a deliberate per-run value (e.g. a cron payload's
* `timeoutSeconds`) rather than inheriting `agents.defaults.timeoutSeconds`.
* When set, the LLM idle watchdog honors this value directly instead of
* inferring "explicitness" from `timeoutMs !== agents.defaults.timeoutSeconds`,
* which fails when the explicit value happens to numerically equal the agent
* default.
*/
runTimeoutOverrideMs?: number;
runId: string;
abortSignal?: AbortSignal;
onExecutionStarted?: () => void;


@@ -0,0 +1,163 @@
import "./isolated-agent.mocks.js";
import { afterEach, beforeEach, describe, expect, it, vi } from "vitest";
import { clearAllBootstrapSnapshots } from "../agents/bootstrap-cache.js";
import { runEmbeddedPiAgent } from "../agents/pi-embedded.js";
import { clearSessionStoreCacheForTest } from "../config/sessions/store.js";
import { resetAgentRunContextForTest } from "../infra/agent-events.js";
import { createCliDeps, mockAgentPayloads } from "./isolated-agent.delivery.test-helpers.js";
import { runCronIsolatedAgentTurn } from "./isolated-agent.js";
import {
makeCfg,
makeJob,
withTempCronHome,
writeSessionStoreEntries,
} from "./isolated-agent.test-harness.js";
function lastEmbeddedCall(): { runTimeoutOverrideMs?: number; timeoutMs?: number } {
const calls = vi.mocked(runEmbeddedPiAgent).mock.calls;
expect(calls.length).toBeGreaterThan(0);
return calls.at(-1)?.[0] as { runTimeoutOverrideMs?: number; timeoutMs?: number };
}
const envSnapshot = {
HOME: process.env.HOME,
USERPROFILE: process.env.USERPROFILE,
HOMEDRIVE: process.env.HOMEDRIVE,
HOMEPATH: process.env.HOMEPATH,
OPENCLAW_HOME: process.env.OPENCLAW_HOME,
OPENCLAW_STATE_DIR: process.env.OPENCLAW_STATE_DIR,
} as const;
function restoreSnapshotEnv() {
for (const [key, value] of Object.entries(envSnapshot)) {
if (value === undefined) {
delete process.env[key];
} else {
process.env[key] = value;
}
}
}
describe("runCronIsolatedAgentTurn — explicit per-run timeout signal", () => {
beforeEach(() => {
vi.mocked(runEmbeddedPiAgent).mockClear();
});
afterEach(() => {
restoreSnapshotEnv();
vi.doUnmock("../agents/pi-embedded.js");
vi.doUnmock("../agents/model-catalog.js");
vi.doUnmock("../agents/model-selection.js");
vi.doUnmock("../agents/subagent-announce.js");
vi.doUnmock("../gateway/call.js");
clearSessionStoreCacheForTest();
resetAgentRunContextForTest();
clearAllBootstrapSnapshots();
vi.restoreAllMocks();
vi.resetModules();
});
// Regression: when a cron job's payload `timeoutSeconds` numerically equals
// `agents.defaults.timeoutSeconds`, the run is still an *explicit* per-run
// override. The embedded runner used to detect "explicit" by comparing
// `params.timeoutMs !== resolveAgentTimeoutMs({cfg})` — which collapses to
// `false` in this case, stripping the runTimeoutMs signal and letting the
// LLM idle watchdog fall back to the implicit 120s cap.
// Fix: forward `runTimeoutOverrideMs` from the cron entry point so the
// explicit-vs-default distinction survives the merge into `timeoutMs`.
it("forwards runTimeoutOverrideMs when payload.timeoutSeconds equals the agent default", async () => {
await withTempCronHome(async (home) => {
const storePath = await writeSessionStoreEntries(home, {
"agent:main:main": {
sessionId: "main-session",
updatedAt: Date.now(),
lastProvider: "webchat",
lastTo: "",
},
});
mockAgentPayloads([{ text: "ok" }]);
const cfg = makeCfg(home, storePath, {
agents: { defaults: { timeoutSeconds: 300 } },
});
await runCronIsolatedAgentTurn({
cfg,
deps: createCliDeps(),
job: {
...makeJob({ kind: "agentTurn", message: "do it", timeoutSeconds: 300 }),
delivery: { mode: "none" },
},
message: "do it",
sessionKey: "cron:job-1",
});
const call = lastEmbeddedCall();
expect(call.runTimeoutOverrideMs).toBe(300_000);
});
});
it("forwards runTimeoutOverrideMs when payload.timeoutSeconds differs from the agent default", async () => {
await withTempCronHome(async (home) => {
const storePath = await writeSessionStoreEntries(home, {
"agent:main:main": {
sessionId: "main-session",
updatedAt: Date.now(),
lastProvider: "webchat",
lastTo: "",
},
});
mockAgentPayloads([{ text: "ok" }]);
const cfg = makeCfg(home, storePath, {
agents: { defaults: { timeoutSeconds: 300 } },
});
await runCronIsolatedAgentTurn({
cfg,
deps: createCliDeps(),
job: {
...makeJob({ kind: "agentTurn", message: "do it", timeoutSeconds: 600 }),
delivery: { mode: "none" },
},
message: "do it",
sessionKey: "cron:job-1",
});
const call = lastEmbeddedCall();
expect(call.runTimeoutOverrideMs).toBe(600_000);
});
});
it("leaves runTimeoutOverrideMs undefined when payload omits timeoutSeconds", async () => {
await withTempCronHome(async (home) => {
const storePath = await writeSessionStoreEntries(home, {
"agent:main:main": {
sessionId: "main-session",
updatedAt: Date.now(),
lastProvider: "webchat",
lastTo: "",
},
});
mockAgentPayloads([{ text: "ok" }]);
const cfg = makeCfg(home, storePath, {
agents: { defaults: { timeoutSeconds: 300 } },
});
await runCronIsolatedAgentTurn({
cfg,
deps: createCliDeps(),
job: {
...makeJob({ kind: "agentTurn", message: "do it" }),
delivery: { mode: "none" },
},
message: "do it",
sessionKey: "cron:job-1",
});
const call = lastEmbeddedCall();
expect(call.runTimeoutOverrideMs).toBeUndefined();
});
});
});


@@ -82,6 +82,8 @@ export function createCronPromptExecutor(params: {
resolvedVerboseLevel: VerboseLevel;
thinkLevel: ThinkLevel | undefined;
timeoutMs: number;
/** Set when the cron payload's `timeoutSeconds` was explicitly configured. */
runTimeoutOverrideMs?: number;
senderIsOwner: boolean;
messageChannel: string | undefined;
suppressExecNotifyOnExit: boolean;
@@ -231,6 +233,7 @@ export function createCronPromptExecutor(params: {
}).enabled,
verboseLevel: params.resolvedVerboseLevel,
timeoutMs: params.timeoutMs,
runTimeoutOverrideMs: params.runTimeoutOverrideMs,
bootstrapContextMode: params.agentPayload?.lightContext ? "lightweight" : undefined,
bootstrapContextRunKind: "cron",
toolsAllow: params.agentPayload?.toolsAllow,
@@ -315,6 +318,8 @@ export async function executeCronRun(params: {
) => void;
thinkLevel: ThinkLevel | undefined;
timeoutMs: number;
/** Set when the cron payload's `timeoutSeconds` was explicitly configured. */
runTimeoutOverrideMs?: number;
senderIsOwner: boolean;
suppressExecNotifyOnExit: boolean;
runStartedAt?: number;
@@ -340,6 +345,7 @@ export async function executeCronRun(params: {
resolvedVerboseLevel,
thinkLevel: params.thinkLevel,
timeoutMs: params.timeoutMs,
runTimeoutOverrideMs: params.runTimeoutOverrideMs,
messageChannel: params.resolvedDelivery.channel,
suppressExecNotifyOnExit: params.suppressExecNotifyOnExit,
resolvedDelivery: params.resolvedDelivery,


@@ -468,6 +468,13 @@ type PreparedCronRunContext = {
liveSelection: CronLiveSelection;
thinkLevel: ThinkLevel | undefined;
timeoutMs: number;
/**
* Set when the cron payload's `timeoutSeconds` was explicitly configured
* for this run (independent of whether its numeric value happens to equal
* `agents.defaults.timeoutSeconds`). Forwarded to the embedded runner so
* the LLM idle watchdog can honor the cron's per-run choice.
*/
runTimeoutOverrideMs?: number;
};
type CronPreparationResult =
@@ -650,11 +657,24 @@ async function prepareCronRunContext(params: {
}
}
const explicitTimeoutSeconds =
input.job.payload.kind === "agentTurn" ? input.job.payload.timeoutSeconds : undefined;
const timeoutMs = resolveAgentTimeoutMs({
cfg: cfgWithAgentDefaults,
overrideSeconds:
input.job.payload.kind === "agentTurn" ? input.job.payload.timeoutSeconds : undefined,
overrideSeconds: explicitTimeoutSeconds,
});
// Carry the "this run had an explicit per-run timeout" signal forward.
// `resolveAgentTimeoutMs` collapses overrideSeconds + the agent default into
// one number, so without this companion field the LLM idle watchdog in the
// embedded-runner attempt cannot tell explicit from default and falls back to
// the implicit 120 s cap whenever the cron payload's `timeoutSeconds` happens
// to numerically equal `agents.defaults.timeoutSeconds`.
const runTimeoutOverrideMs =
typeof explicitTimeoutSeconds === "number" &&
Number.isFinite(explicitTimeoutSeconds) &&
explicitTimeoutSeconds > 0
? explicitTimeoutSeconds * 1000
: undefined;
const agentPayload = input.job.payload.kind === "agentTurn" ? input.job.payload : null;
const { deliveryPlan, deliveryRequested, resolvedDelivery, toolPolicy } =
await resolveCronDeliveryContext({
@@ -799,6 +819,7 @@ async function prepareCronRunContext(params: {
liveSelection,
thinkLevel,
timeoutMs,
runTimeoutOverrideMs,
},
};
}
@@ -1146,6 +1167,7 @@ export async function runCronIsolatedAgentTurn(params: {
isAborted,
thinkLevel: prepared.context.thinkLevel,
timeoutMs: prepared.context.timeoutMs,
runTimeoutOverrideMs: prepared.context.runTimeoutOverrideMs,
suppressExecNotifyOnExit: prepared.context.suppressExecNotifyOnExit,
senderIsOwner: prepared.context.senderIsOwner,
});
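
The derivation of `runTimeoutOverrideMs` in `prepareCronRunContext` amounts to a validated seconds-to-milliseconds conversion. The sketch below restates it as a pure function so the edge cases are easy to check. The function name `toRunTimeoutOverrideMs` is hypothetical; the three guards (`typeof`, `Number.isFinite`, `> 0`) are copied from the diff.

```typescript
// Hypothetical pure-function restatement of the guard shown in the diff.
function toRunTimeoutOverrideMs(explicitTimeoutSeconds: unknown): number | undefined {
  // Only a finite, positive number counts as an explicit per-run timeout;
  // anything else leaves the override unset so the default path applies.
  return typeof explicitTimeoutSeconds === "number" &&
    Number.isFinite(explicitTimeoutSeconds) &&
    explicitTimeoutSeconds > 0
    ? explicitTimeoutSeconds * 1000
    : undefined;
}

// Mirrors the three test cases in the new spec file: equal to the default,
// different from the default, and omitted from the payload.
const equalToDefault = toRunTimeoutOverrideMs(300);
const differsFromDefault = toRunTimeoutOverrideMs(600);
const omitted = toRunTimeoutOverrideMs(undefined);

// Inputs the guard rejects.
const rejected = [0, -5, Number.NaN, Number.POSITIVE_INFINITY, "300"].map(
  toRunTimeoutOverrideMs,
);

console.log(equalToDefault, differsFromDefault, omitted, rejected);
```

Returning `undefined` rather than `0` for rejected inputs matters because the embedded runner chains the field with `??`, which only falls through on `null` or `undefined`.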