* fix: prune stale session entries, cap entry count, and rotate sessions.json
The sessions.json file grows unbounded over time. Every heartbeat tick (default: 30m)
triggers multiple full rewrites, and session keys from groups, threads, and DMs
accumulate indefinitely with large embedded objects (skillsSnapshot,
systemPromptReport). At >50MB the synchronous JSON parse blocks the event loop,
causing Telegram webhook timeouts and effectively taking the bot down.
Three mitigations, all running inside saveSessionStoreUnlocked() on every write:
1. Prune stale entries: remove entries with updatedAt older than 30 days
(configurable via session.maintenance.pruneDays in openclaw.json)
2. Cap entry count: keep only the 500 most recently updated entries
(configurable via session.maintenance.maxEntries). Entries without updatedAt
are evicted first.
3. File rotation: if the existing sessions.json exceeds 10MB before a write,
rename it to sessions.json.bak.{timestamp} and keep only the 3 most recent
backups (configurable via session.maintenance.rotateBytes).
All three thresholds are configurable under session.maintenance in openclaw.json
with Zod validation. No env vars.
Existing tests updated to use Date.now() instead of epoch-relative timestamps
(1, 2, 3) that would be incorrectly pruned as stale.
27 new tests covering pruning, capping, rotation, and integration scenarios.
* feat: auto-prune expired cron run sessions (#12289)
Add TTL-based reaper for isolated cron run sessions that accumulate
indefinitely in sessions.json.
New config option:
cron.sessionRetention: string | false (default: '24h')
The reaper runs piggy-backed on the cron timer tick, self-throttled
to sweep at most every 5 minutes. It removes session entries matching
the pattern cron:<jobId>:run:<uuid> whose updatedAt + retention < now.
Design follows the Kubernetes ttlSecondsAfterFinished pattern:
- Sessions are persisted normally (observability/debugging)
- A periodic reaper prunes expired entries
- Configurable retention with sensible default
- Set to false to disable pruning entirely
Files changed:
- src/config/types.cron.ts: Add sessionRetention to CronConfig
- src/config/zod-schema.ts: Add Zod validation for sessionRetention
- src/cron/session-reaper.ts: New reaper module (sweepCronRunSessions)
- src/cron/session-reaper.test.ts: 12 tests covering all paths
- src/cron/service/state.ts: Add cronConfig/sessionStorePath to deps
- src/cron/service/timer.ts: Wire reaper into onTimer tick
- src/gateway/server-cron.ts: Pass config and session store path to deps
Closes#12289
* fix: sweep cron session stores per agent
* docs: add changelog for session maintenance (#13083) (thanks @skyfallsin, @Glucksberg)
* fix: add warn-only session maintenance mode
* fix: warn-only maintenance defaults to active session
* fix: deliver maintenance warnings to active session
* docs: add session maintenance examples
* fix: accept duration and size maintenance thresholds
* refactor: share cron run session key check
* fix: format issues and replace defaultRuntime.warn with console.warn
---------
Co-authored-by: Pradeep Elankumaran <pradeepe@gmail.com>
Co-authored-by: Glucksberg <markuscontasul@gmail.com>
Co-authored-by: max <40643627+quotentiroler@users.noreply.github.com>
Co-authored-by: quotentiroler <max.nussbaumer@maxhealth.tech>
Problem:
When users execute `/think off`, they still receive `reasoning_content`
from models configured with `reasoning: true` (e.g., GLM-4.7, GLM-4.6,
Kimi K2.5, MiniMax-M2.1).
Expected: `/think off` should completely disable reasoning content.
Actual: Reasoning content is still returned.
Root Cause:
The directive handlers delete `sessionEntry.thinkingLevel` when user
executes `/think off`. This causes the thinking level to become undefined,
and the system falls back to `resolveThinkingDefault()`, which checks the
model catalog and returns "low" for reasoning-capable models, ignoring the
user's explicit intent.
Why We Must Persist "off" (Design Rationale):
1. **Model-dependent defaults**: Unlike other directives where "off" means
use a global default, `thinkingLevel` has model-dependent defaults:
- Reasoning-capable models (GLM-4.7, etc.) → default "low"
- Other models → default "off"
2. **Existing pattern**: The codebase already follows this pattern for
`elevatedLevel`, which persists "off" explicitly to override defaults
that may be "on". The comment explains:
"Persist 'off' explicitly so `/elevated off` actually overrides defaults."
3. **User intent**: When a user explicitly executes `/think off`, they want
to disable thinking regardless of the model's capabilities. Deleting the
field breaks this intent by falling back to the model's default.
Solution:
Persist "off" value instead of deleting the field in all internal directive handlers:
- `src/auto-reply/reply/directive-handling.impl.ts`: Directive-only messages
- `src/auto-reply/reply/directive-handling.persist.ts`: Inline directives
- `src/commands/agent.ts`: CLI command-line flags
Gateway API Backward Compatibility:
The original implementation incorrectly mapped `null` to "off" in
`sessions-patch.ts` for consistency with internal handlers. This was a
breaking change because:
- Previously, `null` cleared the override (deleted the field)
- API clients lost the ability to "clear to default" via `null`
- This contradicts standard JSON semantics where `null` means "no value"
Restored original null semantics in `src/gateway/sessions-patch.ts`:
- `null` → delete field, fall back to model default (clear override)
- `"off"` → persist explicit override
- Other values → normalize and persist
This ensures backward compatibility for API clients while fixing the `/think off`
issue in internal handlers.
Signed-off-by: Liu Yuan <namei.unix@gmail.com>
* refactor: consolidate duplicate utility functions
- Add escapeRegExp to src/utils.ts and remove 10 local duplicates
- Rename bash-tools clampNumber to clampWithDefault (different signature)
- Centralize formatError calls to use formatErrorMessage from infra/errors.ts
- Re-export formatErrorMessage from cli/cli-utils.ts to preserve API
* refactor: consolidate remaining escapeRegExp duplicates
* refactor: consolidate sleep, stripAnsi, and clamp duplicates
* fix: use .js extension for ESM imports of RoutePeerKind
The imports incorrectly used .ts extension which doesn't resolve
with moduleResolution: NodeNext. Changed to .js and added 'type'
import modifier.
* fix tsconfig
* refactor: unify peer kind to ChatType, rename dm to direct
- Replace RoutePeerKind with ChatType throughout codebase
- Change 'dm' literal values to 'direct' in routing/session keys
- Keep backward compat: normalizeChatType accepts 'dm' -> 'direct'
- Add ChatType export to plugin-sdk, deprecate RoutePeerKind
- Update session key parsing to accept both 'dm' and 'direct' markers
- Update all channel monitors and extensions to use ChatType
BREAKING CHANGE: Session keys now use 'direct' instead of 'dm'.
Existing 'dm' keys still work via backward compat layer.
* fix tests
* test: update session key expectations for dmdirect migration
- Fix test expectations to expect :direct: in generated output
- Add explicit backward compat test for normalizeChatType('dm')
- Keep input test data with :dm: keys to verify backward compat
* fix: accept legacy 'dm' in session key parsing for backward compat
getDmHistoryLimitFromSessionKey now accepts both :dm: and :direct:
to ensure old session keys continue to work correctly.
* test: add explicit backward compat tests for dmdirect migration
- session-key.test.ts: verify both :dm: and :direct: keys are valid
- getDmHistoryLimitFromSessionKey: verify both formats work
* feat: backward compat for resetByType.dm config key
* test: skip unix-path Nix tests on Windows
* initial commit
* feat: implement deriveSessionTotalTokens function and update usage tests
* Added deriveSessionTotalTokens function to calculate total tokens based on usage and context tokens.
* Updated usage tests to include cases for derived session total tokens.
* Refactored session usage calculations in multiple files to utilize the new function for improved accuracy.
* fix: restore overflow truncation fallback + changelog/test hardening (#11551) (thanks @tyler6204)
* feat(bluebubbles): auto-strip markdown from outbound messages (#7402)
* fix(security): add timeout to webhook body reading (#6762)
Adds 30-second timeout to readBody() in voice-call, bluebubbles, and nostr
webhook handlers. Prevents Slow-Loris DoS (CWE-400, CVSS 7.5).
Merged with existing maxBytes protection in voice-call.
* fix(security): unify Error objects and lint fixes in webhook timeouts (#6762)
* fix: prevent plugins from auto-enabling without user consent (#3961)
Changes default plugin enabled state from true to false in enablePluginEntry().
Preserves existing enabled:true values. Fixes#3932.
* fix: apply hierarchical mediaMaxMb config to all channels (#8749)
Generalizes resolveAttachmentMaxBytes() to use account → channel → global
config resolution for all channels, not just BlueBubbles. Fixes#7847.
* fix(bluebubbles): sanitize attachment filenames against header injection (#10333)
Strip ", \r, \n, and \\ from filenames after path.basename() to prevent
multipart Content-Disposition header injection (CWE-93, CVSS 5.4).
Also adds sanitization to setGroupIconBlueBubbles which had zero filename
sanitization.
* fix(lint): exclude extensions/ from Oxlint preflight check (#9313)
Extensions use PluginRuntime|null patterns that trigger
no-redundant-type-constituents because PluginRuntime resolves to any.
Excluding extensions/ from Oxlint unblocks user upgrades.
Re-applies the approach from closed PR #10087.
* fix(bluebubbles): add tempGuid to createNewChatWithMessage payload (#7745)
Non-Private-API mode (AppleScript) requires tempGuid in send payloads.
The main sendMessageBlueBubbles already had it, but createNewChatWithMessage
was missing it, causing 400 errors for new chat creation without Private API.
* fix: send stop-typing signal when run ends with NO_REPLY (#8785)
Adds onCleanup callback to the typing controller that fires when the
controller is cleaned up while typing was active (e.g., after NO_REPLY).
Channels using createTypingCallbacks automatically get stop-typing on
cleanup. This prevents the typing indicator from lingering in group chats
when the agent decides not to reply.
* fix(telegram): deduplicate skill commands in multi-agent setup (#5717)
Two fixes:
1. Skip duplicate workspace dirs when listing skill commands across agents.
Multiple agents sharing the same workspace would produce duplicate commands
with _2, _3 suffixes.
2. Clear stale commands via deleteMyCommands before registering new ones.
Commands from deleted skills now get cleaned up on restart.
* fix: add size limits to unbounded in-memory caches (#4948)
Adds max-size caps with oldest-entry eviction to prevent OOM in
long-running deployments:
- BlueBubbles serverInfoCache: 64 entries (already has TTL)
- Google Chat authCache: 32 entries
- Matrix directRoomCache: 1024 entries
- Discord presenceCache: 5000 entries per account
* fix: address review concerns (#11093)
- Chain deleteMyCommands → setMyCommands to prevent race condition (#5717)
- Rename enablePluginEntry to registerPluginEntry (now sets enabled: false)
- Add Slow-Loris timeout test for readJsonBody (#6023)
HOTFIX: Tool summaries were not being sent to chat channels when verbose mode
was enabled. The onToolResult callback was defined in the types but never
wired up in dispatch-from-config.ts.
This adds the missing callback alongside onBlockReply, using the same
dispatcher.sendBlockReply() path to deliver tool summaries to WhatsApp,
Telegram, and other chat channels.
Fixes verbose tool summaries not appearing in WhatsApp despite /verbose on.
* fix(slack): add mention stripPatterns for /new and /reset commands
Fixes#9937
The Slack dock was missing mentions.stripPatterns that Discord has.
This caused /new and /reset to fail when sent with a mention
(e.g. @bot /reset) because <@USERID> wasn't stripped before matching.
* fix(slack): strip mentions for /new and /reset (#9971) (thanks @ironbyte-rgb)
---------
Co-authored-by: ironbyte-rgb <amontaboi76@gmail.com>
Co-authored-by: George Pickett <gpickett00@gmail.com>
When starting a new session via /new or /reset, the token usage fields
(totalTokens, inputTokens, outputTokens, contextTokens) survived from the
previous session via the spread pattern in session init. This caused /status
to display misleading context usage from the old session.
Clear all four token metrics explicitly in the isNewSession block, alongside
the existing compactionCount reset. Also add diagnostic logging for session
forking via ParentSessionKey to help trace context inheritance.
* Fix subagent announce race and timeout handling
Bug 1: Subagent announce fires before model failover retries finish
- Problem: CLI provider emitted lifecycle error on each attempt, causing
subagent registry to prematurely call beginSubagentCleanup() and announce
with incorrect status before failover retries completed
- Fix: Removed lifecycle error emission from CLI provider's attempt-level
.catch() in agent-runner-execution.ts. Errors still propagate to
runWithModelFallback for retry, but no intermediate lifecycle events
are emitted. Only the final outcome (after all retries) emits lifecycle
events.
Bug 2: Hard 600s per-prompt timeout ignores runTimeoutSeconds=0
- Problem: When runTimeoutSeconds=0 (meaning 'no timeout'), the code
returned the default 600s timeout instead of respecting the 0 setting
- Fix: Modified resolveAgentTimeoutMs() to treat 0 as 'no timeout' and
return a very large timeout value (30 days) instead of the default.
This avoids setTimeout issues with Infinity while effectively providing
unlimited time for long-running tasks.
* fix: emit lifecycle:error for CLI failures (#6621) (thanks @tyler6204)
* chore: satisfy format/lint gates (#6621) (thanks @tyler6204)
* fix: restore build after upstream type changes (#6621) (thanks @tyler6204)
* test: fix createSystemPromptOverride tests to match new return type (#6621) (thanks @tyler6204)
* feat: Implement paragraph boundary flushing in block streaming
- Added `flushOnParagraph` option to `BlockReplyChunking` for immediate flushing on paragraph breaks.
- Updated `EmbeddedBlockChunker` to handle paragraph boundaries during chunking.
- Enhanced `createBlockReplyCoalescer` to support flushing on enqueue.
- Added tests to verify behavior of flushing with and without `flushOnEnqueue` set.
- Updated relevant types and interfaces to include `flushOnParagraph` and `flushOnEnqueue` options.
* fix: Improve streaming behavior and enhance block chunking logic
- Resolved issue with stuck typing indicator after streamed BlueBubbles replies.
- Refactored `EmbeddedBlockChunker` to streamline fence-split handling and ensure maxChars fallback for newline chunking.
- Added tests to validate new chunking behavior, including handling of paragraph breaks and fence scenarios.
- Updated changelog to reflect these changes.
* test: Add test for clamping long paragraphs in EmbeddedBlockChunker
- Introduced a new test case to verify that long paragraphs are correctly clamped to maxChars when flushOnParagraph is enabled.
- Updated logic in EmbeddedBlockChunker to handle cases where the next paragraph break exceeds maxChars, ensuring proper chunking behavior.
* refactor: streamline logging and improve error handling in message processing
- Removed verbose logging statements from the `processMessage` function to reduce clutter.
- Enhanced error handling by using `runtime.error` for typing restart failures.
- Updated the `applySystemPromptOverrideToSession` function to accept a string directly instead of a function, simplifying the prompt application process.
- Adjusted the `runEmbeddedAttempt` function to directly use the system prompt override without invoking it as a function.
* Slash new: use agent personality in session greeting
Previously /new and /reset used a generic greeting prompt. Agents with
personality files (IDENTITY.md, SOUL.md, etc) would respond out of
character until the conversation got going.
Now the prompt instructs the agent to greet users as their character,
using their defined voice, mannerisms, and mood from the start.
* Auto-reply: avoid workspace references in reset prompt
* fix: avoid workspace references in reset greeting (#5706) (thanks @bravostation)
---------
Co-authored-by: MoltBot <bot@moltbot.com>
Co-authored-by: Shadow <shadow@clawd.bot>
* fix(security): restrict inbound media staging to media directory
* docs: update MEDIA path guidance for security restrictions
- Update agent hint to warn against absolute/~ paths
- Update docs example to use https:// instead of /tmp/
---------
Co-authored-by: Evan Otero <evanotero@google.com>