--- title: "Session Pruning" summary: "How session pruning trims old tool results to reduce context bloat and improve cache efficiency" read_when: - You want to reduce LLM context growth from tool outputs - You are tuning agents.defaults.contextPruning --- # Session Pruning Session pruning trims **old tool results** from the in-memory context before each LLM call. It does **not** rewrite the on-disk session history (JSONL) -- it only affects what gets sent to the model for that request. ## Why prune Long-running sessions accumulate tool outputs (exec results, file reads, search results). These inflate the context window, increasing cost and eventually forcing [compaction](/concepts/compaction). Pruning removes stale tool output so the model sees a leaner context on each turn. Pruning is also important for **Anthropic prompt caching**. When a session goes idle past the cache TTL, the next request re-caches the full prompt. Pruning reduces the cache-write size for that first post-TTL request, which directly reduces cost. ## How it works Pruning runs in `cache-ttl` mode, which is the only supported mode: 1. **Check the clock** -- pruning only runs if the last Anthropic API call for the session is older than `ttl` (default `5m`). 2. **Find prunable messages** -- only `toolResult` messages are eligible. User and assistant messages are never modified. 3. **Protect recent context** -- the last `keepLastAssistants` assistant messages (default `3`) and all tool results after that cutoff are preserved. 4. **Soft-trim** oversized tool results -- keep the head and tail, insert `...`, and append a note with the original size. 5. **Hard-clear** remaining eligible results -- replace the entire content with a placeholder. 6. **Reset the TTL** -- subsequent requests keep cache until `ttl` expires again. ### What gets skipped - Tool results containing **image blocks** are never trimmed. - If there are not enough assistant messages to establish the cutoff, pruning is skipped entirely. - Pruning currently only activates for Anthropic API calls (and OpenRouter Anthropic models). ## Smart defaults OpenClaw auto-configures pruning for Anthropic profiles: | Profile type | Pruning | Heartbeat | Cache retention | | -------------------- | ------------------- | --------- | ------------------ | | OAuth or setup-token | `cache-ttl` enabled | `1h` | (provider default) | | API key | `cache-ttl` enabled | `30m` | `short` (5 min) | If you set any of these values explicitly, OpenClaw does not override them. Match `ttl` to your model `cacheRetention` policy for best results (`short` = 5 min, `long` = 1 hour). ## Pruning vs compaction | | Pruning | Compaction | | -------------- | --------------------------------- | ------------------------------- | | **What** | Trims tool result messages | Summarizes conversation history | | **Persisted?** | No (in-memory, per request) | Yes (in JSONL transcript) | | **Scope** | Tool results only | Entire conversation | | **Trigger** | Every LLM call (when TTL expired) | Context window threshold | Built-in tools already truncate their own output. Pruning is an additional layer that prevents long-running chats from accumulating too much tool output over time. See [Compaction](/concepts/compaction) for the summarization approach. ## Configuration ### Defaults (when enabled) | Setting | Default | Description | | ----------------------- | ----------------------------------- | ------------------------------------------------ | | `ttl` | `5m` | Prune only after this idle period | | `keepLastAssistants` | `3` | Protect tool results near recent assistant turns | | `softTrimRatio` | `0.3` | Context ratio for soft-trim eligibility | | `hardClearRatio` | `0.5` | Context ratio for hard-clear eligibility | | `minPrunableToolChars` | `50000` | Minimum tool result size to consider | | `softTrim.maxChars` | `4000` | Max chars after soft-trim | | `softTrim.headChars` | `1500` | Head portion to keep | | `softTrim.tailChars` | `1500` | Tail portion to keep | | `hardClear.enabled` | `true` | Enable hard-clear stage | | `hardClear.placeholder` | `[Old tool result content cleared]` | Replacement text | ### Examples Disable pruning (default state): ```json5 { agents: { defaults: { contextPruning: { mode: "off" }, }, }, } ``` Enable TTL-aware pruning: ```json5 { agents: { defaults: { contextPruning: { mode: "cache-ttl", ttl: "5m" }, }, }, } ``` Restrict pruning to specific tools: ```json5 { agents: { defaults: { contextPruning: { mode: "cache-ttl", tools: { allow: ["exec", "read"], deny: ["*image*"], }, }, }, }, } ``` Tool selection supports `*` wildcards, deny wins over allow, matching is case-insensitive, and an empty allow list means all tools are allowed. ## Context window estimation Pruning estimates the context window (chars = tokens x 4). The base window is resolved in this order: 1. `models.providers.*.models[].contextWindow` override. 2. Model definition `contextWindow` from the model registry. 3. Default `200000` tokens. If `agents.defaults.contextTokens` is set, it caps the resolved window. ## Related - [Compaction](/concepts/compaction) -- summarization-based context reduction - [Session Management](/concepts/session) -- session lifecycle and routing - [Gateway Configuration](/gateway/configuration) -- full config reference