fix(agents): forward explicit per-run timeout to LLM idle watchdog (#79426)

Merged via squash.

Prepared head SHA: 0e6cf9b4d5
Co-authored-by: legolaz8451 <18042830+legolaz8451@users.noreply.github.com>
Co-authored-by: joshavant <830519+joshavant@users.noreply.github.com>
Reviewed-by: @joshavant
2026-05-19 05:24:46 +00:00 · 2026-05-14 10:24:01 +02:00
parent 8717525fbc
commit 336ba2a2b3
7 changed files with 217 additions and 60 deletions
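
A minimal sketch of the bug this commit addresses, assuming the resolution logic shown in the `attempt.ts` hunk. The type `TimeoutInputs` and the helper names `legacyRunTimeoutMs` and `resolveRunTimeoutMs` are hypothetical and exist only for illustration; the real code inlines these expressions and passes the result to `resolveLlmIdleTimeoutMs`.

```typescript
// Hypothetical input shape; field names mirror the diff, the type itself does not exist upstream.
type TimeoutInputs = {
  timeoutMs: number; // merged run timeout (per-run override or agent default)
  runTimeoutOverrideMs?: number; // set only when the caller passed an explicit per-run value
  configuredRunTimeoutMs: number; // agents.defaults.timeoutSeconds, in milliseconds
};

// Old behaviour: infer "explicit" purely from value inequality.
function legacyRunTimeoutMs(p: TimeoutInputs): number | undefined {
  return p.timeoutMs !== p.configuredRunTimeoutMs ? p.timeoutMs : undefined;
}

// New behaviour: prefer the explicit override, fall back to the heuristic
// for callers that do not plumb the flag yet.
function resolveRunTimeoutMs(p: TimeoutInputs): number | undefined {
  return p.runTimeoutOverrideMs ?? legacyRunTimeoutMs(p);
}

// A cron payload with timeoutSeconds: 300 while the agent default is also 300.
const explicitEqualToDefault: TimeoutInputs = {
  timeoutMs: 300_000,
  runTimeoutOverrideMs: 300_000,
  configuredRunTimeoutMs: 300_000,
};

// An unmigrated caller that only passes a differing timeoutMs.
const unmigratedCaller: TimeoutInputs = {
  timeoutMs: 600_000,
  configuredRunTimeoutMs: 300_000,
};

console.log(legacyRunTimeoutMs(explicitEqualToDefault)); // heuristic drops the signal
console.log(resolveRunTimeoutMs(explicitEqualToDefault)); // override keeps it
console.log(resolveRunTimeoutMs(unmigratedCaller)); // fallback path still works
```

With the heuristic alone, the first case yields `undefined`, so the idle watchdog never sees the per-run value. The override keeps the signal even when the two numbers are equal.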
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -189,6 +189,7 @@ Docs: https://docs.openclaw.ai
 - Gateway: throttle assistant/thinking agent event fanout during streaming bursts without dropping buffered deltas. (#80335) Thanks @samzong.
 - Codex app-server: keep the short post-tool completion watchdog armed across dynamic tool completion bookkeeping so embedded Codex runs fail fast and release their session lane when Codex goes quiet after a tool result. (#81697) Thanks @mbelinky.
 - Models: restore authenticated CLI runtime providers in the `/models` picker while keeping legacy runtime aliases hidden from setup/default model choices. Closes #81212. (#81239) Thanks @anagnorisis2peripeteia.
+- Agents/cron: honor a cron payload's explicit `timeoutSeconds` for the LLM idle watchdog even when it numerically equals `agents.defaults.timeoutSeconds`, preserving explicit per-run timeout intent and preventing stalled streaming replies from being cut to the implicit 120s cap. (#79426) Thanks @legolaz8451.

 ### Changes

@@ -713,62 +714,6 @@ Docs: https://docs.openclaw.ai

 ### Fixes

- Models/auth: keep `agents.defaults.model` when `openclaw models auth login` runs without `--set-default`, so provider onboarding patches add models without silently switching the primary. Fixes #78162. (#78241) Thanks @neeravmakwana.
- Control UI/chat: localize the remaining chat welcome, composer, run-control, session/model/thinking selector, and zh-CN Skills labels through the Control UI i18n pipeline so non-English browser locales no longer see those chat controls in English. Fixes #79937. Thanks @BunsDev.
- Control UI: surface browser-blocked WebSocket security failures with wss:// and loopback dashboard guidance instead of leaving the connection on a dead security error. Thanks @BunsDev.
- Gateway/diagnostics: keep active-only transient event-loop max-delay samples as info-level stability telemetry instead of warning-level liveness diagnostics. Thanks @BunsDev.
- Google/Gemini: default new API-key onboarding to stable `google/gemini-2.5-flash` instead of the preview Pro route, reducing surprise daily quota exhaustion. Fixes #79670. Thanks @HugeBunny.
- Amazon Bedrock: expose Claude thinking profiles through the lightweight provider policy surface so `/think:adaptive` validates before the Bedrock runtime plugin is loaded. Fixes #79754. Thanks @phoenixyy and @hclsys.
- Codex/transcripts: mirror dynamic tool calls and outputs into Codex app-server transcripts so tool activity is visible alongside assistant text instead of being elided, with per-item output capped at 12,000 characters. (#79952) Thanks @scoootscooob.
- Memory: close temp SQLite handles before failed atomic reindex cleanup and retry Windows EBUSY/EPERM/EACCES temp file removals, so `memory index --force` does not abort or leave temp sidecars on locked filesystems. Fixes #79708. Thanks @LobsterFarmerAmp and @hclsys.
- Agents/CLI: add an explicit `reseedFromRawTranscriptWhenUncompacted` backend opt-in so safe invalidated CLI sessions can reseed from a bounded raw OpenClaw transcript tail before compaction while auth-boundary resets remain no-raw. Fixes #79713. (#79764) Thanks @hclsys.
- Agents/CLI: handle resumed CLI JSONL output and bound supervisor output buffering so resumed runs stay readable without letting noisy child output grow unbounded.
- Codex app-server: honor per-call `timeoutMs`, configured `image_generate` timeouts, and media image-understanding timeouts for dynamic tool calls, capped at 600000 ms, so slow image generation and image analysis no longer fail at the 30s bridge default. Fixes #79810. Thanks @omarshahine.
- Agents/sandbox: include the container workspace path hint in sandbox-root escape errors while preserving shortened host workspace roots. Fixes #79712. Thanks @haumanto and @hclsys.
- macOS/device pairing: let the native app read CLI PEM device identities and let the TypeScript loader migrate legacy Swift raw-key identities without generating a new device id, preventing repeated pairing prompts when `OPENCLAW_STATE_DIR` is shared. Fixes #76815. Thanks @BunsDev.
- Image generation: honor configured web-fetch SSRF policy across OpenAI, Google, MiniMax, OpenRouter, and Vydra provider requests so RFC2544 fake-IP proxy opt-ins reach generation calls. Fixes #79716. (#79765) Thanks @hclsys.
- Telegram: persist reply-chain message cache records as a compact append log instead of rewriting the full cache on every inbound message, reducing large-group turn latency.
- Telegram/CLI-backend: mirror outbound replies to the session transcript so CLI-backend agent responses create `.jsonl` session files, preventing `sessionId=unknown` on subsequent runs. Fixes #75991.
- Gateway/nodes: allow approved chat-channel macOS node exec replays to cross transient agent WebSocket reconnects only when node, agent session, and channel target metadata still match, restoring Telegram/WeCom host=node approvals without opening a general backend replay bypass. Fixes #77656. Thanks @BunsDev.
- QQBot: route gateway WebSocket connections through the ambient proxy agent so deployments with `https_proxy`, `HTTPS_PROXY`, or `HTTP_PROXY` can reach the QQ gateway. (#72961) Thanks @xialonglee.
- Agents/subagents: treat `sessions_spawn` `model: "default"` as the default-model fallback and ignore ACP-only stream targets for native sub-agent spawns. Fixes #72078. (#72101) Thanks @xialonglee.
- Agents/failover: stop retrying assistant-prefill format rejections across auth profiles or model fallbacks, surfacing the deterministic provider error instead of requeueing the lane. Fixes #79688. (#79728) Thanks @hclsys.
- Google/Gemini: resolve missing Gemini 3 Flash catalog rows through the Google provider template path so image-capable media-understanding models keep `input: ["text", "image"]` instead of falling back to text-only metadata. Fixes #79750. (#79759) Thanks @fenglanhua and @hclsys.
- Memory/QMD: warn with a manual stale collection removal hint when QMD reports a path/pattern conflict but `collection list` lacks verifiable metadata, avoiding unsafe stderr-only rebinds. Refs #71783. (#72297) Thanks @MonkeyLeeT.
- Models/auth: make `openclaw models status --check` and dashboard auth health honor effective auth profile order while keeping stale profiles visible. (#79685) Thanks @nimbleenigma.
- Agents/failover: classify bare `stream_read_error` streaming failures as transient timeouts so configured model fallback runs instead of surfacing the raw transport error. Fixes #79689. (#79692) Thanks @hekunwang.
- Agents/failover: persist overloaded auth-profile cooldown marks before exhausted fallback summaries surface, so immediate fallback retries honor the recorded cooldown state.
- Docs/Subagents: correct the listed sub-agent bootstrap context files to include `SOUL.md`, `IDENTITY.md`, and `USER.md`. (#79470) Thanks @lastguru-net.
- Backup: keep live backup archives from copying current agent session transcripts, cron run logs, and delivery queues while preserving workspace lock/temp files and keeping `--json` output parseable when volatile files are skipped. Fixes #72249. (#72251) Thanks @abnershang.
- Backup: place the temp manifest outside every backed-up asset so `backup create --verify` still passes when `TMPDIR` resolves inside a source path (for example `~/.openclaw/tmp`), avoiding the duplicate root manifest that otherwise tripped `Expected exactly one backup manifest entry, found 2`. Fixes #75007. Thanks @YaanFPV.
- OpenAI/Codex: install the Codex runtime plugin from npm during OpenAI onboarding and load it automatically for implicit OpenAI model routes, while preserving manual PI runtime overrides. Fixes #79358.
- OpenAI/realtime voice: defer `response.create` while a realtime response is still active, retry after `response.done`/`response.cancelled`, and align GA input transcription/noise-reduction defaults with the Codex realtime reference so Discord/Voice Call consult results can resume speaking instead of tripping the active-response race.
- OpenAI/realtime voice: avoid duplicate barge-in cancellation requests, log realtime model interruption/cutoff events in Discord voice logs, and treat OpenAI's no-active-response cancellation reply as a completed cancel so Discord voice sessions do not wedge pending speech after fast interruptions.
- Agents/runtime: strip trailing assistant prefill for Claude-family OpenAI Responses routes, persist prompt/assistant profile cooldown marks before fallback, and show the configured container root in sandbox escape diagnostics. Fixes #79688 and #79712. Thanks @stainlu and @mushuiyu886.
- Gateway: avoid false degraded event-loop health during rapid health/readiness/status probes unless sustained load has delay co-evidence, while keeping hard delay detection immediate. (#77028) Thanks @rubencu.
- Markdown: keep blockquote spans off trailing paragraph separators. Fixes #79646.
- Plugin SDK/LM Studio: recover Harmony plain-text tool calls from LM Studio streams. Fixes #78326.
- Control UI: refresh the model cache after `session_status(model=...)` changes a session model. Fixes #79613.
- Agents/context-engine: share loop-hook checkpoints with the after-turn finalizer so messages are not replayed. Fixes #79630.
- Codex app-server: keep native hook relays alive for long-running turns so shell and file approvals stay reachable until the configured run window finishes. (#77533) Thanks @rubencu.
- Gateway/macOS: clear ignored SIGUSR1 restart state, skip redundant package-update restarts when the refreshed LaunchAgent already serves the expected version, and give launchd a 10s throttle plus 20s shutdown window so update restarts do not leave old gateways alive or fight supervisor recovery. Fixes #79577; refs #78699 and #60885. Thanks @BunsDev.
- Status/Codex: route Codex-harness `openai/*` usage through the OpenAI Codex quota provider and scope CLI status usage to the default agent auth store so `/status` and `openclaw status --usage` show Codex quota windows again. Fixes #79312. Thanks @keshavbotagent.
- Matrix: keep joined strict DM rooms discoverable when stale `m.direct` mappings already point at an older strict room, and let `dm.sessionScope: "per-room"` promote safe unmapped strict rooms through the existing unnamed/unaliased room gate. Fixes #79514. Thanks @stainlu.
- Gateway/agent: pass the session-key agent id into inline image attachment validation so the first image in a fresh per-agent session uses the agent's vision-capable model override instead of the text-only system default. Fixes #79407. Thanks @pandadev66.
- Gateway/maintenance: prune dedupe overflow against a stable excess count and keep active agent retries from starting duplicate runs after cache eviction. (#73841) Thanks @thesomewhatyou.
- Control UI/subagents: suppress internal `subagent_announce` handoff prompts from requester transcripts and hide legacy inter-session wrapper rows so completed subagent results no longer surface runtime context in WebChat history. (#79618) Thanks @joshavant.
- Discord: preserve username target resolution for Discord outbound sends. (#79076) Thanks @vincentkoc.
- Gateway/sessions: rotate generated transcript paths when gateway sessions reset, complementing the daily-rollover transcript persistence. (#79076) Thanks @vincentkoc.
- Dependencies: pin the transitive `fast-uri` production dependency to `3.1.2` so the production dependency audit no longer resolves the vulnerable `<=3.1.1` range. Thanks @shakkernerd.
- Plugins/install: fail managed npm plugin installs when OpenClaw cannot repair a required plugin-local `node_modules/openclaw` peer link, preventing that peer-link failure mode from producing unusable `@openclaw/codex` installs. Refs #79462. Thanks @ai-hpc.
- xAI/tools: register and execute `x_search` and `code_execution` when the xAI API key comes from an auth profile, keeping the plugin tool gate aligned with `openclaw onboard --auth-choice xai-api-key`. Fixes #79353. Thanks @dbernaltbn.
- Cron/agents: recognize same-target `edit`↔`write` recovery in `isSameToolMutationAction`, so a successful `write` to a path clears an earlier failed `edit` on the same path. Stops cron from reporting fatal failures when an agent self-heals across `edit` and `write`, while preserving same-tool fingerprint matching, blocking different-target writes, and excluding tools (including `apply_patch`) whose real call args do not produce a stable `path` fingerprint segment. Fixes #79024. Thanks @RenzoMXD.
- Gateway/Tailscale: add opt-in `gateway.tailscale.preserveFunnel` so when `tailscale.mode = "serve"` and an externally configured Tailscale Funnel route already covers the gateway port, OpenClaw skips re-applying `tailscale serve` on startup and skips the `resetOnExit` teardown for that run, keeping operator-managed Funnel exposure alive across gateway restarts. Fixes #57241. Thanks @RenzoMXD.
- CLI/router: when `openclaw <name>` does not match a CLI subcommand, check plugin tool manifests first so names like `lcm_recent` get an agent-tool diagnostic instead of the misleading suggestion to add the tool name to `plugins.allow`. Fixes #77214. Thanks @100yenadmin.
- QA-lab/parity: bump the live mock-openai parity baseline from `claude-opus-4-6`/`claude-sonnet-4-6` to `claude-opus-4-7`/`claude-sonnet-4-7` and the candidate alt from `gpt-5.4-alt` to `gpt-5.5-alt` in `openclaw-release-checks.yml` and `qa-live-transports-convex.yml`, matching the active Opus 4.7 / GPT-5.5 defaults already used elsewhere on main. Carries forward the surface-bump portion of #74290. Thanks @100yenadmin.
- QA-lab/scenarios: raise the `approval-turn-tool-followthrough` per-turn fallback timeouts from 20s/30s to 60s so cold mock-gateway parity runs do not flake on the approval-turn chain. Carries forward the timeout-bump portion of #74290. Thanks @100yenadmin.
- Gateway/restart continuation: treat routed post-reboot agent turns as trusted internal continuations while preserving the original Telegram topic route, and retry briefly when the previous run is still shutting down, so owner-only tools remain available for chained restart workflows after reboot.
- MS Teams: normalize pre-thread-qualified route session keys before deriving channel-thread lanes so cached route reuse cannot create malformed mixed `:thread:OLD:thread:NEW` sessions. Fixes #66771. (#78850) Thanks @harrisali0101.
 - Agents/compaction: keep the recent tail after manual `/compact` when Pi returns an empty or no-op compaction summary, preventing blank checkpoints from replacing the live context.
 - Native commands: handle slash commands before workspace and agent-reply bootstrap so Telegram `/status` and other command-only native replies do not wait behind full agent turn setup.
 - Telegram/groups: include the recent local chat window and nearby reply-target window as generic inbound context so stale reply ancestry does not overshadow the live group conversation.
@@ -786,6 +731,7 @@ Docs: https://docs.openclaw.ai
 - Gateway/macOS: `repairLaunchAgentBootstrap` no longer kickstarts an already-running LaunchAgent, preventing unnecessary service restarts and session disconnects when repair runs against a healthy gateway. Fixes #77428. Thanks @ramitrkar-hash.
 - Gateway/macOS: `openclaw gateway stop --disable` now persists the LaunchAgent disable bit even after a previous bootout left the service not loaded, keeping the explicit stay-down path reliable. (#78412) Thanks @wdeveloper16.
 - CLI/status: keep lean `openclaw status --json` off manifest-backed channel discovery so configured-channel checks do not repeatedly rescan plugin metadata. Fixes #79129.
+- Gateway/Tailscale: add opt-in `gateway.tailscale.preserveFunnel` so when `tailscale.mode = "serve"` and an externally configured Tailscale Funnel route already covers the gateway port, OpenClaw skips re-applying `tailscale serve` on startup and skips the `resetOnExit` teardown for that run, keeping operator-managed Funnel exposure alive across gateway restarts. Fixes #57241. Thanks @RenzoMXD.
 - Control UI/chat: hide retired and non-public Google Gemini model IDs from chat model catalogs and route the bare `gemini-3-pro` alias to Gemini 3.1 Pro Preview instead of the shut-down Gemini 3 Pro Preview. Thanks @BunsDev.
 - CLI/infer: canonicalize case-only catalog model refs in `infer model run --model` so mixed-case provider/model strings resolve to the canonical catalog entry instead of failing with `Unknown model`. (#78940) Thanks @ai-hpc.
 - CLI/infer: allow explicit local `infer model run --model <provider/model>` probes to use exact bundled static catalog rows before the provider is written to config, surfacing missing credentials as auth errors instead of `Unknown model`.
--- a/src/agents/pi-embedded-runner/run.ts
+++ b/src/agents/pi-embedded-runner/run.ts
@@ -1340,6 +1340,7 @@ export async function runEmbeddedPiAgent(
            execOverrides: params.execOverrides,
            bashElevated: params.bashElevated,
            timeoutMs: params.timeoutMs,
+            runTimeoutOverrideMs: params.runTimeoutOverrideMs,
            runId: params.runId,
            abortSignal: attemptAbortController.signal,
            replyOperation: params.replyOperation,
--- a/src/agents/pi-embedded-runner/run/attempt.ts
+++ b/src/agents/pi-embedded-runner/run/attempt.ts
@@ -2388,14 +2388,23 @@ export async function runEmbeddedAttempt(

      let idleTimeoutTrigger: ((error: Error) => void) | undefined;

-      // Wrap stream with idle timeout detection
+      // Wrap stream with idle timeout detection.
+      //
+      // Prefer the caller's explicit `runTimeoutOverrideMs` when provided —
+      // it carries the "this run was launched with a deliberate per-run
+      // timeout" signal without losing it when the value numerically equals
+      // `agents.defaults.timeoutSeconds`. Fall back to the value-equality
+      // heuristic for callers that haven't been migrated to plumb the flag.
      const configuredRunTimeoutMs = resolveAgentTimeoutMs({
        cfg: params.config,
      });
+      const resolvedRunTimeoutMs =
+        params.runTimeoutOverrideMs ??
+        (params.timeoutMs !== configuredRunTimeoutMs ? params.timeoutMs : undefined);
      const idleTimeoutMs = resolveLlmIdleTimeoutMs({
        cfg: params.config,
        trigger: params.trigger,
-        runTimeoutMs: params.timeoutMs !== configuredRunTimeoutMs ? params.timeoutMs : undefined,
+        runTimeoutMs: resolvedRunTimeoutMs,
        modelRequestTimeoutMs: (params.model as { requestTimeoutMs?: number }).requestTimeoutMs,
        model: params.model as { baseUrl?: string },
      });
--- a/src/agents/pi-embedded-runner/run/params.ts
+++ b/src/agents/pi-embedded-runner/run/params.ts
@@ -149,6 +149,16 @@ export type RunEmbeddedPiAgentParams = {
  >;
  bashElevated?: ExecElevatedDefaults;
  timeoutMs: number;
+  /**
+   * Explicit per-run timeout override, in milliseconds, when the caller knows
+   * the run was launched with a deliberate per-run value (e.g. a cron payload's
+   * `timeoutSeconds`) rather than inheriting `agents.defaults.timeoutSeconds`.
+   * When set, the LLM idle watchdog honors this value directly instead of
+   * inferring "explicitness" from `timeoutMs !== agents.defaults.timeoutSeconds`,
+   * which fails when the explicit value happens to numerically equal the agent
+   * default.
+   */
+  runTimeoutOverrideMs?: number;
  runId: string;
  abortSignal?: AbortSignal;
  onExecutionStarted?: () => void;
--- a/src/cron/isolated-agent.run-timeout-override.test.ts
+++ b/src/cron/isolated-agent.run-timeout-override.test.ts
@@ -0,0 +1,163 @@
+import "./isolated-agent.mocks.js";
+import { afterEach, beforeEach, describe, expect, it, vi } from "vitest";
+import { clearAllBootstrapSnapshots } from "../agents/bootstrap-cache.js";
+import { runEmbeddedPiAgent } from "../agents/pi-embedded.js";
+import { clearSessionStoreCacheForTest } from "../config/sessions/store.js";
+import { resetAgentRunContextForTest } from "../infra/agent-events.js";
+import { createCliDeps, mockAgentPayloads } from "./isolated-agent.delivery.test-helpers.js";
+import { runCronIsolatedAgentTurn } from "./isolated-agent.js";
+import {
+  makeCfg,
+  makeJob,
+  withTempCronHome,
+  writeSessionStoreEntries,
+} from "./isolated-agent.test-harness.js";
+
+function lastEmbeddedCall(): { runTimeoutOverrideMs?: number; timeoutMs?: number } {
+  const calls = vi.mocked(runEmbeddedPiAgent).mock.calls;
+  expect(calls.length).toBeGreaterThan(0);
+  return calls.at(-1)?.[0] as { runTimeoutOverrideMs?: number; timeoutMs?: number };
+}
+
+const envSnapshot = {
+  HOME: process.env.HOME,
+  USERPROFILE: process.env.USERPROFILE,
+  HOMEDRIVE: process.env.HOMEDRIVE,
+  HOMEPATH: process.env.HOMEPATH,
+  OPENCLAW_HOME: process.env.OPENCLAW_HOME,
+  OPENCLAW_STATE_DIR: process.env.OPENCLAW_STATE_DIR,
+} as const;
+
+function restoreSnapshotEnv() {
+  for (const [key, value] of Object.entries(envSnapshot)) {
+    if (value === undefined) {
+      delete process.env[key];
+    } else {
+      process.env[key] = value;
+    }
+  }
+}
+
+describe("runCronIsolatedAgentTurn — explicit per-run timeout signal", () => {
+  beforeEach(() => {
+    vi.mocked(runEmbeddedPiAgent).mockClear();
+  });
+
+  afterEach(() => {
+    restoreSnapshotEnv();
+    vi.doUnmock("../agents/pi-embedded.js");
+    vi.doUnmock("../agents/model-catalog.js");
+    vi.doUnmock("../agents/model-selection.js");
+    vi.doUnmock("../agents/subagent-announce.js");
+    vi.doUnmock("../gateway/call.js");
+    clearSessionStoreCacheForTest();
+    resetAgentRunContextForTest();
+    clearAllBootstrapSnapshots();
+    vi.restoreAllMocks();
+    vi.resetModules();
+  });
+
+  // Regression: when a cron job's payload `timeoutSeconds` numerically equals
+  // `agents.defaults.timeoutSeconds`, the run is still an *explicit* per-run
+  // override. The embedded runner used to detect "explicit" by comparing
+  // `params.timeoutMs !== resolveAgentTimeoutMs({cfg})` — which collapses to
+  // `false` in this case, stripping the runTimeoutMs signal and letting the
+  // LLM idle watchdog fall back to the implicit 120s cap.
+  // Fix: forward `runTimeoutOverrideMs` from the cron entry point so the
+  // explicit-vs-default distinction survives the merge into `timeoutMs`.
+  it("forwards runTimeoutOverrideMs when payload.timeoutSeconds equals the agent default", async () => {
+    await withTempCronHome(async (home) => {
+      const storePath = await writeSessionStoreEntries(home, {
+        "agent:main:main": {
+          sessionId: "main-session",
+          updatedAt: Date.now(),
+          lastProvider: "webchat",
+          lastTo: "",
+        },
+      });
+      mockAgentPayloads([{ text: "ok" }]);
+
+      const cfg = makeCfg(home, storePath, {
+        agents: { defaults: { timeoutSeconds: 300 } },
+      });
+
+      await runCronIsolatedAgentTurn({
+        cfg,
+        deps: createCliDeps(),
+        job: {
+          ...makeJob({ kind: "agentTurn", message: "do it", timeoutSeconds: 300 }),
+          delivery: { mode: "none" },
+        },
+        message: "do it",
+        sessionKey: "cron:job-1",
+      });
+
+      const call = lastEmbeddedCall();
+      expect(call.runTimeoutOverrideMs).toBe(300_000);
+    });
+  });
+
+  it("forwards runTimeoutOverrideMs when payload.timeoutSeconds differs from the agent default", async () => {
+    await withTempCronHome(async (home) => {
+      const storePath = await writeSessionStoreEntries(home, {
+        "agent:main:main": {
+          sessionId: "main-session",
+          updatedAt: Date.now(),
+          lastProvider: "webchat",
+          lastTo: "",
+        },
+      });
+      mockAgentPayloads([{ text: "ok" }]);
+
+      const cfg = makeCfg(home, storePath, {
+        agents: { defaults: { timeoutSeconds: 300 } },
+      });
+
+      await runCronIsolatedAgentTurn({
+        cfg,
+        deps: createCliDeps(),
+        job: {
+          ...makeJob({ kind: "agentTurn", message: "do it", timeoutSeconds: 600 }),
+          delivery: { mode: "none" },
+        },
+        message: "do it",
+        sessionKey: "cron:job-1",
+      });
+
+      const call = lastEmbeddedCall();
+      expect(call.runTimeoutOverrideMs).toBe(600_000);
+    });
+  });
+
+  it("leaves runTimeoutOverrideMs undefined when payload omits timeoutSeconds", async () => {
+    await withTempCronHome(async (home) => {
+      const storePath = await writeSessionStoreEntries(home, {
+        "agent:main:main": {
+          sessionId: "main-session",
+          updatedAt: Date.now(),
+          lastProvider: "webchat",
+          lastTo: "",
+        },
+      });
+      mockAgentPayloads([{ text: "ok" }]);
+
+      const cfg = makeCfg(home, storePath, {
+        agents: { defaults: { timeoutSeconds: 300 } },
+      });
+
+      await runCronIsolatedAgentTurn({
+        cfg,
+        deps: createCliDeps(),
+        job: {
+          ...makeJob({ kind: "agentTurn", message: "do it" }),
+          delivery: { mode: "none" },
+        },
+        message: "do it",
+        sessionKey: "cron:job-1",
+      });
+
+      const call = lastEmbeddedCall();
+      expect(call.runTimeoutOverrideMs).toBeUndefined();
+    });
+  });
+});
--- a/src/cron/isolated-agent/run-executor.ts
+++ b/src/cron/isolated-agent/run-executor.ts
@@ -82,6 +82,8 @@ export function createCronPromptExecutor(params: {
  resolvedVerboseLevel: VerboseLevel;
  thinkLevel: ThinkLevel | undefined;
  timeoutMs: number;
+  /** Set when the cron payload's `timeoutSeconds` was explicitly configured. */
+  runTimeoutOverrideMs?: number;
  senderIsOwner: boolean;
  messageChannel: string | undefined;
  suppressExecNotifyOnExit: boolean;
@@ -231,6 +233,7 @@ export function createCronPromptExecutor(params: {
          }).enabled,
          verboseLevel: params.resolvedVerboseLevel,
          timeoutMs: params.timeoutMs,
+          runTimeoutOverrideMs: params.runTimeoutOverrideMs,
          bootstrapContextMode: params.agentPayload?.lightContext ? "lightweight" : undefined,
          bootstrapContextRunKind: "cron",
          toolsAllow: params.agentPayload?.toolsAllow,
@@ -315,6 +318,8 @@ export async function executeCronRun(params: {
  ) => void;
  thinkLevel: ThinkLevel | undefined;
  timeoutMs: number;
+  /** Set when the cron payload's `timeoutSeconds` was explicitly configured. */
+  runTimeoutOverrideMs?: number;
  senderIsOwner: boolean;
  suppressExecNotifyOnExit: boolean;
  runStartedAt?: number;
@@ -340,6 +345,7 @@ export async function executeCronRun(params: {
    resolvedVerboseLevel,
    thinkLevel: params.thinkLevel,
    timeoutMs: params.timeoutMs,
+    runTimeoutOverrideMs: params.runTimeoutOverrideMs,
    messageChannel: params.resolvedDelivery.channel,
    suppressExecNotifyOnExit: params.suppressExecNotifyOnExit,
    resolvedDelivery: params.resolvedDelivery,
--- a/src/cron/isolated-agent/run.ts
+++ b/src/cron/isolated-agent/run.ts
@@ -468,6 +468,13 @@ type PreparedCronRunContext = {
  liveSelection: CronLiveSelection;
  thinkLevel: ThinkLevel | undefined;
  timeoutMs: number;
+  /**
+   * Set when the cron payload's `timeoutSeconds` was explicitly configured
+   * for this run (independent of whether its numeric value happens to equal
+   * `agents.defaults.timeoutSeconds`). Forwarded to the embedded runner so
+   * the LLM idle watchdog can honor the cron's per-run choice.
+   */
+  runTimeoutOverrideMs?: number;
 };

 type CronPreparationResult =
@@ -650,11 +657,24 @@ async function prepareCronRunContext(params: {
    }
  }

+  const explicitTimeoutSeconds =
+    input.job.payload.kind === "agentTurn" ? input.job.payload.timeoutSeconds : undefined;
  const timeoutMs = resolveAgentTimeoutMs({
    cfg: cfgWithAgentDefaults,
-    overrideSeconds:
-      input.job.payload.kind === "agentTurn" ? input.job.payload.timeoutSeconds : undefined,
+    overrideSeconds: explicitTimeoutSeconds,
  });
+  // Carry the "this run had an explicit per-run timeout" signal forward.
+  // `resolveAgentTimeoutMs` collapses overrideSeconds + the agent default into
+  // one number; the LLM idle watchdog at the embedded-runner attempt loses the
+  // explicit-vs-default distinction without this companion field, which would
+  // otherwise force the implicit 120 s cap whenever the cron payload's
+  // `timeoutSeconds` happens to numerically equal `agents.defaults.timeoutSeconds`.
+  const runTimeoutOverrideMs =
+    typeof explicitTimeoutSeconds === "number" &&
+    Number.isFinite(explicitTimeoutSeconds) &&
+    explicitTimeoutSeconds > 0
+      ? explicitTimeoutSeconds * 1000
+      : undefined;
  const agentPayload = input.job.payload.kind === "agentTurn" ? input.job.payload : null;
  const { deliveryPlan, deliveryRequested, resolvedDelivery, toolPolicy } =
    await resolveCronDeliveryContext({
@@ -799,6 +819,7 @@ async function prepareCronRunContext(params: {
      liveSelection,
      thinkLevel,
      timeoutMs,
+      runTimeoutOverrideMs,
    },
  };
 }
@@ -1146,6 +1167,7 @@ export async function runCronIsolatedAgentTurn(params: {
      isAborted,
      thinkLevel: prepared.context.thinkLevel,
      timeoutMs: prepared.context.timeoutMs,
+      runTimeoutOverrideMs: prepared.context.runTimeoutOverrideMs,
      suppressExecNotifyOnExit: prepared.context.suppressExecNotifyOnExit,
      senderIsOwner: prepared.context.senderIsOwner,
    });
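
The cron-side derivation in `prepareCronRunContext` can be sketched in isolation as follows. The helper name `toRunTimeoutOverrideMs` is hypothetical (the commit computes the value inline), and the `unknown` parameter type is an assumption made so the guard can be exercised with malformed input.

```typescript
// Hypothetical standalone version of the inline guard from src/cron/isolated-agent/run.ts.
// Returns milliseconds only for a finite, positive number of seconds; otherwise undefined,
// so a missing or malformed payload value never masquerades as an explicit override.
function toRunTimeoutOverrideMs(explicitTimeoutSeconds: unknown): number | undefined {
  return typeof explicitTimeoutSeconds === "number" &&
    Number.isFinite(explicitTimeoutSeconds) &&
    explicitTimeoutSeconds > 0
    ? explicitTimeoutSeconds * 1000
    : undefined;
}

// Values a cron payload's `timeoutSeconds` might carry.
const samples: unknown[] = [300, 600, undefined, 0, -5, Number.NaN, Number.POSITIVE_INFINITY];
for (const sample of samples) {
  console.log(String(sample), "->", toRunTimeoutOverrideMs(sample));
}
```

Only the first two samples produce an override (`300000` and `600000`). Every other input leaves `runTimeoutOverrideMs` undefined, which matches the third test case in `isolated-agent.run-timeout-override.test.ts` where the payload omits `timeoutSeconds`.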