* feat(gateway): implement claim check pattern to prevent OOM on large attachments
* fix: sanitize mediaId, refine trimEnd, remove warn log, add threshold and absolute path
* fix: enforce maxBytes before decoding and use dynamic path from saveMediaBuffer
* fix: enforce absolute maxBytes limit before Buffer allocation and preserve file extensions
* fix: align saveMediaBuffer arguments and satisfy oxfmt linter
* chore: strictly enforce linting rules (curly braces, unused vars, and error typing)
* fix: restrict offload to mainstream mimes to avoid extension-loss bug in store.ts for BMP/TIFF
* fix: restrict offload to mainstream mimes to bypass store.ts extension-loss bug
* chore: document bmp/tiff exclusion from offload whitelist in MIME_TO_EXT
* feat: implement agent-side resolver for opaque media URIs and finalize contract
* fix: support unicode media URIs and allow consecutive dots in safe IDs based on Codex review
* fix(gateway): enforce strict fail-fast for oversized media to prevent OOM bypass
* refactor(gateway): harden media offload with performance and security optimizations
This update refines the Claim Check pattern with industrial-grade guards:
- Performance: Implemented sampled Base64 validation for large payloads (>4KB) to prevent event loop blocking.
- Security: Added null-byte (\u0000) detection and reinforced path traversal guards.
- I18n: Updated media-uri regex to a blacklist-based character class for Unicode/Chinese filename support, with oxlint bypass for intentional control regex.
- Robustness: Enhanced error diagnostics with JSON-serialized IDs.
* fix: add HEIC/HEIF to offload allowlist and pass maxBytes to saveMediaBuffer
* fix(gateway): clean up offloaded media files on attachment parse failure
Address Codex review feedback: track saved media IDs and implement best-effort cleanup via deleteMediaBuffer if subsequent attachments fail validation, preventing orphaned files on disk.
* fix(gateway): enforce full base64 validation to prevent whitespace padding bypass
Address Codex review feedback: remove early return in isValidBase64 so padded payloads cannot bypass offload thresholds and reintroduce memory pressure. Updated related comments.
* fix(gateway): preserve offloaded media metadata and fix validation error mapping
Address Codex review feedback:
- Add \offloadedRefs\ to \ParsedMessageWithImages\ to expose structured metadata for offloaded attachments, preventing transcript media loss.
- Move \erifyDecodedSize\ outside the storage try-catch block to correctly surface client base64 validation failures as 4xx errors instead of 5xx \MediaOffloadError\.
- Add JSDoc TODOs indicating that upstream callers (chat.ts, agent.ts, server-node-events.ts) must explicitly pass the \supportsImages\ flag.
* fix(agents): explicitly allow media store dir when loading offloaded images
Address Codex review feedback: Pass getMediaDir() to loadWebMedia's localRoots for media-uri refs to prevent legacy path resolution mismatches from silently dropping large attachments.
* fix(gateway): resolve attachment offload regressions and error mapping
Address Codex review feedback:
- Pass \supportsImages\ dynamically in \chat.ts\ and \gent.ts\ based on model catalog, and explicitly in \server-node-events.ts\.
- Persist \offloadedRefs\ into the transcript pipeline in \chat.ts\ to preserve media metadata for >2MB attachments.
- Correctly map \MediaOffloadError\ to 5xx (UNAVAILABLE) to differentiate server storage faults from 4xx client validation errors.
* fix(gateway): dynamically compute supportsImages for overrides and node events
Address follow-up Codex review feedback:
- Use effective model (including overrides) to compute \supportsImages\ in \gent.ts\.
- Move session load earlier in \server-node-events.ts\ to dynamically compute \supportsImages\ rather than hardcoding true.
* fix(gateway): resolve capability edge cases reported by codex
Address final Codex edge cases:
- Refactor \gent.ts\ to compute \supportsImages\ even when no session key is present, ensuring text-only override requests without sessions safely drop attachments.
- Update catalog lookups in \chat.ts\, \gent.ts\, and \server-node-events.ts\ to strictly match both \id\ and \provider\ to prevent cross-provider model collisions.
* fix(agents): restore before_install hook for skill installs
Restore the plugin scanner security hook that was accidentally dropped during merge conflict resolution.
* fix: resolve attachment pathing, defer parsing after auth gates, and clean up node-event mocks
* fix: resolve syntax errors in test-env, fix missing helper imports, and optimize parsing sequence in node events
* fix(gateway): re-enforce message length limit after attachment parsing
Adds a secondary check to ensure the 20,000-char cap remains effective even after media markers are appended during the offload flow.
* fix(gateway): prevent dropping valid small images and clean up orphaned media on size rejection
* fix(gateway): share attachment image capability checks
* fix(gateway): preserve mixed attachment order
* fix: fail closed on unknown image capability (#55513) (thanks @Syysean)
* fix: classify offloaded attachment refs explicitly (#55513) (thanks @Syysean)
---------
Co-authored-by: Ayaan Zaidi <hi@obviy.us>
* gateway: prefer transcript model in sessions list
* gateway: keep live subagent model in session rows
* gateway: prefer selected model until runtime refresh
* gateway: simplify session model identity selection
* gateway: avoid transcript model fallback on cost-only reads
* fix: make cleanup "keep" persist subagent sessions indefinitely
* feat: expose subagent session metadata in sessions list
* fix: include status and timing in sessions_list tool
* fix: hide injected timestamp prefixes in chat ui
* feat: push session list updates over websocket
* feat: expose child subagent sessions in subagents list
* feat: add admin http endpoint to kill sessions
* Emit session.message websocket events for transcript updates
* Estimate session costs in sessions list
* Add direct session history HTTP and SSE endpoints
* Harden dashboard session events and history APIs
* Add session lifecycle gateway methods
* Add dashboard session API improvements
* Add dashboard session model and parent linkage support
* fix: tighten dashboard session API metadata
* Fix dashboard session cost metadata
* Persist accumulated session cost
* fix: stop followup queue drain cfg crash
* Fix dashboard session create and model metadata
* fix: stop guessing session model costs
* Gateway: cache OpenRouter pricing for configured models
* Gateway: add timeout session status
* Fix subagent spawn test config loading
* Gateway: preserve operator scopes without device identity
* Emit user message transcript events and deduplicate plugin warnings
* feat: emit sessions.changed lifecycle event on subagent spawn
Adds a session-lifecycle-events module (similar to transcript-events)
that emits create events when subagents are spawned. The gateway
server.impl.ts listens for these events and broadcasts sessions.changed
with reason=create to SSE subscribers, so dashboards can pick up new
subagent sessions without polling.
* Gateway: allow persistent dashboard orchestrator sessions
* fix: preserve operator scopes for token-authenticated backend clients
Backend clients (like agent-dashboard) that authenticate with a valid gateway
token but don't present a device identity were getting their scopes stripped.
The scope-clearing logic ran before checking the device identity decision,
so even when evaluateMissingDeviceIdentity returned 'allow' (because
roleCanSkipDeviceIdentity passed for token-authed operators), scopes were
already cleared.
Fix: also check decision.kind before clearing scopes, so token-authenticated
operators keep their requested scopes.
* Gateway: allow operator-token session kills
* Fix stale active subagent status after follow-up runs
* Fix dashboard image attachments in sessions send
* Fix completed session follow-up status updates
* feat: stream session tool events to operator UIs
* Add sessions.steer gateway coverage
* Persist subagent timing in session store
* Fix subagent session transcript event keys
* Fix active subagent session status in gateway
* bump session label max to 512
* Fix gateway send session reactivation
* fix: publish terminal session lifecycle state
* feat: change default session reset to effectively never
- Change DEFAULT_RESET_MODE from "daily" to "idle"
- Change DEFAULT_IDLE_MINUTES from 60 to 0 (0 = disabled/never)
- Allow idleMinutes=0 through normalization (don't clamp to 1)
- Treat idleMinutes=0 as "no idle expiry" in evaluateSessionFreshness
- Default behavior: mode "idle" + idleMinutes 0 = sessions never auto-reset
- Update test assertion for new default mode
* fix: prep session management followups (#50101) (thanks @clay-datacurve)
---------
Co-authored-by: Tyler Yust <TYTYYUST@YAHOO.COM>
* fix: preserve stored provider in resolveSessionModelRef for vendor-prefixed models
When an OpenRouter model with a vendor prefix (e.g. "anthropic/claude-haiku-4.5")
was successfully used and persisted to the session entry, the next call to
resolveSessionModelRef would re-parse the model string through parseModelRef,
which splits on the first slash and incorrectly extracts "anthropic" as the
provider — discarding the stored "openrouter" provider entirely. This caused
subsequent requests to attempt direct Anthropic API calls with an OpenRouter
API key, producing "credit balance too low" billing errors.
The fix trusts the explicitly stored modelProvider on the session entry and
skips parseModelRef re-parsing when a provider is already recorded. parseModelRef
is still used as a fallback when no provider is stored on the entry.
Co-authored-by: Cursor <cursoragent@cursor.com>
* Changelog: add OpenRouter note for #22753
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
* fix(gateway): normalize session key casing to prevent ghost sessions on Linux
On case-sensitive filesystems (Linux), mixed-case session keys like
agent:ops:MySession and agent:ops:mysession resolve to different store
entries, creating ghost duplicates that never converge.
Core changes in session-utils.ts:
- resolveSessionStoreKey: lowercase all session key components
- canonicalizeSpawnedByForAgent: accept cfg, resolve main-alias references
via canonicalizeMainSessionAlias after lowercasing
- loadSessionEntry: return legacyKey only when it differs from canonicalKey
- resolveGatewaySessionStoreTarget: scan store for case-insensitive matches;
add optional scanLegacyKeys param to skip disk reads for read-only callers
- Export findStoreKeysIgnoreCase for use by write-path consumers
- Compare global/unknown sentinels case-insensitively in all canonicalization
functions
sessions-resolve.ts:
- Make resolveSessionKeyFromResolveParams async for inline migration
- Check canonical key first (fast path), then fall back to legacy scan
- Delete ALL legacy case-variant keys in a single updateSessionStore pass
Fixes#12603
* fix(gateway): propagate canonical keys and clean up all case variants on write paths
- agent.ts: use canonicalizeSpawnedByForAgent (with cfg) instead of raw
toLowerCase; use findStoreKeysIgnoreCase to delete all legacy variants
on store write; pass canonicalKey to addChatRun, registerAgentRunContext,
resolveSendPolicy, and agentCommand
- sessions.ts: replace single-key migration with full case-variant cleanup
via findStoreKeysIgnoreCase in patch/reset/delete/compact handlers; add
case-insensitive fallback in preview (store already loaded); make
sessions.resolve handler async; pass scanLegacyKeys: false in preview
- server-node-events.ts: use findStoreKeysIgnoreCase to clean all legacy
variants on voice.transcript and agent.request write paths; pass
canonicalKey to addChatRun and agentCommand
* test(gateway): add session key case-normalization tests
Cover the case-insensitive session key canonicalization logic:
- resolveSessionStoreKey normalizes mixed-case bare and prefixed keys
- resolveSessionStoreKey resolves mixed-case main aliases (MAIN, Main)
- resolveGatewaySessionStoreTarget includes legacy mixed-case store keys
- resolveGatewaySessionStoreTarget collects all case-variant duplicates
- resolveGatewaySessionStoreTarget finds legacy main alias keys with
customized mainKey configuration
All 5 tests fail before the production changes, pass after.
* fix: clean legacy session alias cleanup gaps (openclaw#12846) thanks @mcaxtr
---------
Co-authored-by: Peter Steinberger <steipete@gmail.com>
* fix: prune stale session entries, cap entry count, and rotate sessions.json
The sessions.json file grows unbounded over time. Every heartbeat tick (default: 30m)
triggers multiple full rewrites, and session keys from groups, threads, and DMs
accumulate indefinitely with large embedded objects (skillsSnapshot,
systemPromptReport). At >50MB the synchronous JSON parse blocks the event loop,
causing Telegram webhook timeouts and effectively taking the bot down.
Three mitigations, all running inside saveSessionStoreUnlocked() on every write:
1. Prune stale entries: remove entries with updatedAt older than 30 days
(configurable via session.maintenance.pruneDays in openclaw.json)
2. Cap entry count: keep only the 500 most recently updated entries
(configurable via session.maintenance.maxEntries). Entries without updatedAt
are evicted first.
3. File rotation: if the existing sessions.json exceeds 10MB before a write,
rename it to sessions.json.bak.{timestamp} and keep only the 3 most recent
backups (configurable via session.maintenance.rotateBytes).
All three thresholds are configurable under session.maintenance in openclaw.json
with Zod validation. No env vars.
Existing tests updated to use Date.now() instead of epoch-relative timestamps
(1, 2, 3) that would be incorrectly pruned as stale.
27 new tests covering pruning, capping, rotation, and integration scenarios.
* feat: auto-prune expired cron run sessions (#12289)
Add TTL-based reaper for isolated cron run sessions that accumulate
indefinitely in sessions.json.
New config option:
cron.sessionRetention: string | false (default: '24h')
The reaper runs piggy-backed on the cron timer tick, self-throttled
to sweep at most every 5 minutes. It removes session entries matching
the pattern cron:<jobId>:run:<uuid> whose updatedAt + retention < now.
Design follows the Kubernetes ttlSecondsAfterFinished pattern:
- Sessions are persisted normally (observability/debugging)
- A periodic reaper prunes expired entries
- Configurable retention with sensible default
- Set to false to disable pruning entirely
Files changed:
- src/config/types.cron.ts: Add sessionRetention to CronConfig
- src/config/zod-schema.ts: Add Zod validation for sessionRetention
- src/cron/session-reaper.ts: New reaper module (sweepCronRunSessions)
- src/cron/session-reaper.test.ts: 12 tests covering all paths
- src/cron/service/state.ts: Add cronConfig/sessionStorePath to deps
- src/cron/service/timer.ts: Wire reaper into onTimer tick
- src/gateway/server-cron.ts: Pass config and session store path to deps
Closes#12289
* fix: sweep cron session stores per agent
* docs: add changelog for session maintenance (#13083) (thanks @skyfallsin, @Glucksberg)
* fix: add warn-only session maintenance mode
* fix: warn-only maintenance defaults to active session
* fix: deliver maintenance warnings to active session
* docs: add session maintenance examples
* fix: accept duration and size maintenance thresholds
* refactor: share cron run session key check
* fix: format issues and replace defaultRuntime.warn with console.warn
---------
Co-authored-by: Pradeep Elankumaran <pradeepe@gmail.com>
Co-authored-by: Glucksberg <markuscontasul@gmail.com>
Co-authored-by: max <40643627+quotentiroler@users.noreply.github.com>
Co-authored-by: quotentiroler <max.nussbaumer@maxhealth.tech>
* refactor: update cron job wake mode and run mode handling
- Changed default wake mode from 'next-heartbeat' to 'now' in CronJobEditor and related CLI commands.
- Updated cron-tool tests to reflect changes in run mode, introducing 'due' and 'force' options.
- Enhanced cron-tool logic to handle new run modes and ensure compatibility with existing job structures.
- Added new tests for delivery plan consistency and job execution behavior under various conditions.
- Improved normalization functions to handle wake mode and session target casing.
This refactor aims to streamline cron job configurations and enhance the overall user experience with clearer defaults and improved functionality.
* test: enhance cron job functionality and UI
- Added tests to ensure the isolated agent correctly announces the final payload text when delivering messages via Telegram.
- Implemented a new function to pick the last deliverable payload from a list of delivery payloads.
- Enhanced the cron service to maintain legacy "every" jobs while minute cron jobs recompute schedules.
- Updated the cron store migration tests to verify the addition of anchorMs to legacy every schedules.
- Improved the UI for displaying cron job details, including job state and delivery information, with new styles and layout adjustments.
These changes aim to improve the reliability and user experience of the cron job system.
* test: enhance sessions thinking level handling
- Added tests to verify that the correct thinking levels are applied during session spawning.
- Updated the sessions-spawn-tool to include a new parameter for overriding thinking levels.
- Enhanced the UI to support additional thinking levels, including "xhigh" and "full", and improved the handling of current options in dropdowns.
These changes aim to improve the flexibility and accuracy of thinking level configurations in session management.
* feat: enhance session management and cron job functionality
- Introduced passthrough arguments in the test-parallel script to allow for flexible command-line options.
- Updated session handling to hide cron run alias session keys from the sessions list, improving clarity.
- Enhanced the cron service to accurately record job start times and durations, ensuring better tracking of job execution.
- Added tests to verify the correct behavior of the cron service under various conditions, including zero-delay timers.
These changes aim to improve the usability and reliability of session and cron job management.
* feat: implement job running state checks in cron service
- Added functionality to prevent manual job runs if a job is already in progress, enhancing job management.
- Updated the `isJobDue` function to include checks for running jobs, ensuring accurate scheduling.
- Enhanced the `run` function to return a specific reason when a job is already running.
- Introduced a new test case to verify the behavior of forced manual runs during active job execution.
These changes aim to improve the reliability and clarity of cron job execution and management.
* feat: add session ID and key to CronRunLogEntry model
- Introduced `sessionid` and `sessionkey` properties to the `CronRunLogEntry` struct for enhanced tracking of session-related information.
- Updated the initializer and Codable conformance to accommodate the new properties, ensuring proper serialization and deserialization.
These changes aim to improve the granularity of logging and session management within the cron job system.
* fix: improve session display name resolution
- Updated the `resolveSessionDisplayName` function to ensure that both label and displayName are trimmed and default to an empty string if not present.
- Enhanced the logic to prevent returning the key if it matches the label or displayName, improving clarity in session naming.
These changes aim to enhance the accuracy and usability of session display names in the UI.
* perf: skip cron store persist when idle timer tick produces no changes
recomputeNextRuns now returns a boolean indicating whether any job
state was mutated. The idle path in onTimer only persists when the
return value is true, eliminating unnecessary file writes every 60s
for far-future or idle schedules.
* fix: prep for merge - explicit delivery mode migration, docs + changelog (#10776) (thanks @tyler6204)