---
title: "Session Pruning"
summary: "How session pruning trims old tool results to reduce context bloat and improve cache efficiency"
read_when:
- You want to reduce LLM context growth from tool outputs
- You are tuning agents.defaults.contextPruning
---
# Session Pruning
Session pruning trims **old tool results** from the in-memory context before
each LLM call. It does **not** rewrite the on-disk session history (JSONL) --
it only affects what gets sent to the model for that request.
## Why prune
Long-running sessions accumulate tool outputs (exec results, file reads, search
results). These inflate the context window, increasing cost and eventually
forcing [compaction](/concepts/compaction). Pruning removes stale tool output so
the model sees a leaner context on each turn.
Pruning is also important for **Anthropic prompt caching**. When a session goes
idle past the cache TTL, the next request re-caches the full prompt. Pruning
reduces the cache-write size for that first post-TTL request, which directly
reduces cost.
## How it works
Pruning runs in `cache-ttl` mode, the only mode other than `off`:
1. **Check the clock** -- pruning only runs if the last Anthropic API call for
the session is older than `ttl` (default `5m`).
2. **Find prunable messages** -- only `toolResult` messages are eligible. User
and assistant messages are never modified.
3. **Protect recent context** -- the last `keepLastAssistants` assistant
messages (default `3`) and all tool results after that cutoff are preserved.
4. **Soft-trim** oversized tool results -- keep the head and tail, insert
`...`, and append a note with the original size.
5. **Hard-clear** remaining eligible results -- replace the entire content with
a placeholder.
6. **Reset the TTL** -- subsequent requests keep cache until `ttl` expires
again.
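The soft-trim stage (step 4) can be sketched as follows. This is a hypothetical illustration, not OpenClaw's actual implementation; the option names mirror the `softTrim.*` config keys documented below.

```typescript
// Sketch of soft-trimming an oversized tool result: keep the head and
// tail, mark the gap with `...`, and note the original size.
interface SoftTrimOptions {
  headChars: number; // leading portion to keep
  tailChars: number; // trailing portion to keep
}

function softTrim(content: string, opts: SoftTrimOptions): string {
  // Nothing to do if the result already fits in head + tail.
  if (content.length <= opts.headChars + opts.tailChars) return content;
  const head = content.slice(0, opts.headChars);
  const tail = content.slice(content.length - opts.tailChars);
  return `${head}\n...\n${tail}\n[truncated: original was ${content.length} chars]`;
}

const trimmed = softTrim("x".repeat(60_000), { headChars: 1500, tailChars: 1500 });
// trimmed keeps the first and last 1500 chars plus a size note
```

Hard-clear (step 5) is simpler still: the eligible result's content is replaced wholesale with the configured placeholder string.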
### What gets skipped
- Tool results containing **image blocks** are never trimmed.
- If there are not enough assistant messages to establish the cutoff, pruning
is skipped entirely.
- Pruning currently only activates for Anthropic API calls (and OpenRouter
Anthropic models).
## Smart defaults
OpenClaw auto-configures pruning for Anthropic profiles:

| Profile type | Pruning | Heartbeat | Cache retention |
| -------------------- | ------------------- | --------- | ------------------ |
| OAuth or setup-token | `cache-ttl` enabled | `1h` | (provider default) |
| API key              | `cache-ttl` enabled | `30m`     | `short` (5 min)    |

If you set any of these values explicitly, OpenClaw does not override them.
Match `ttl` to your model `cacheRetention` policy for best results (`short` =
5 min, `long` = 1 hour).
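For example, if your model uses `long` cache retention, you might raise `ttl` to match (illustrative values; once set explicitly, OpenClaw leaves them untouched):

```json5
{
  agents: {
    defaults: {
      // Match the 1h cache window of `long` retention so pruning
      // does not fire while the cache is still warm.
      contextPruning: { mode: "cache-ttl", ttl: "1h" },
    },
  },
}
```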
## Pruning vs compaction
| | Pruning | Compaction |
| -------------- | --------------------------------- | ------------------------------- |
| **What** | Trims tool result messages | Summarizes conversation history |
| **Persisted?** | No (in-memory, per request) | Yes (in JSONL transcript) |
| **Scope** | Tool results only | Entire conversation |
| **Trigger**    | Every LLM call (when TTL expired) | Context window threshold        |

Built-in tools already truncate their own output. Pruning is an additional layer
that prevents long-running chats from accumulating too much tool output over
time. See [Compaction](/concepts/compaction) for the summarization approach.
## Configuration
### Defaults (when enabled)
| Setting | Default | Description |
| ----------------------- | ----------------------------------- | ------------------------------------------------ |
| `ttl` | `5m` | Prune only after this idle period |
| `keepLastAssistants` | `3` | Protect tool results near recent assistant turns |
| `softTrimRatio` | `0.3` | Context ratio for soft-trim eligibility |
| `hardClearRatio` | `0.5` | Context ratio for hard-clear eligibility |
| `minPrunableToolChars` | `50000` | Minimum tool result size to consider |
| `softTrim.maxChars` | `4000` | Max chars after soft-trim |
| `softTrim.headChars` | `1500` | Head portion to keep |
| `softTrim.tailChars` | `1500` | Tail portion to keep |
| `hardClear.enabled` | `true` | Enable hard-clear stage |
| `hardClear.placeholder` | `[Old tool result content cleared]` | Replacement text |
### Examples
Disable pruning (default state):
```json5
{
  agents: {
    defaults: {
      contextPruning: { mode: "off" },
    },
  },
}
```
Enable TTL-aware pruning:
```json5
{
  agents: {
    defaults: {
      contextPruning: { mode: "cache-ttl", ttl: "5m" },
    },
  },
}
```
Restrict pruning to specific tools:
```json5
{
  agents: {
    defaults: {
      contextPruning: {
        mode: "cache-ttl",
        tools: {
          allow: ["exec", "read"],
          deny: ["*image*"],
        },
      },
    },
  },
}
```
Tool selection supports `*` wildcards, deny wins over allow, matching is
case-insensitive, and an empty allow list means all tools are allowed.
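These selection rules can be sketched as a small matcher (a hypothetical illustration; `isToolAllowed` is not OpenClaw's actual API):

```typescript
// `*` wildcards, case-insensitive matching, deny wins over allow,
// and an empty allow list means all tools are allowed.
function globToRegExp(pattern: string): RegExp {
  // Escape regex metacharacters, then turn `*` into `.*`.
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`^${escaped.replace(/\*/g, ".*")}$`, "i");
}

function isToolAllowed(tool: string, allow: string[], deny: string[]): boolean {
  if (deny.some((p) => globToRegExp(p).test(tool))) return false; // deny wins
  if (allow.length === 0) return true; // empty allow list allows everything
  return allow.some((p) => globToRegExp(p).test(tool));
}

console.log(isToolAllowed("exec", ["exec", "read"], ["*image*"])); // true
console.log(isToolAllowed("ReadImage", [], ["*image*"]));          // false
```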
## Context window estimation
Pruning estimates the context window in characters (chars ≈ tokens × 4). The
base window is resolved in this order:
1. `models.providers.*.models[].contextWindow` override.
2. Model definition `contextWindow` from the model registry.
3. Default `200000` tokens.
If `agents.defaults.contextTokens` is set, it caps the resolved window.
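In config terms, the resolution order above might look like this (the `anthropic` provider key and the model `id` are illustrative, not defaults):

```json5
{
  agents: {
    defaults: {
      // Caps whatever window is resolved below.
      contextTokens: 120000,
    },
  },
  models: {
    providers: {
      anthropic: {
        models: [
          // Step 1: a per-model override beats the registry value.
          { id: "claude-example", contextWindow: 200000 },
        ],
      },
    },
  },
}
```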
## Related
- [Compaction](/concepts/compaction) -- summarization-based context reduction
- [Session Management](/concepts/session) -- session lifecycle and routing
- [Gateway Configuration](/gateway/configuration) -- full config reference