| title | summary | read_when |
|---|---|---|
| Session Pruning | How session pruning trims old tool results to reduce context bloat and improve cache efficiency | |
# Session Pruning
Session pruning trims old tool results from the in-memory context before each LLM call. It does not rewrite the on-disk session history (JSONL) -- it only affects what gets sent to the model for that request.
## Why prune
Long-running sessions accumulate tool outputs (exec results, file reads, search results). These inflate the context window, increasing cost and eventually forcing compaction. Pruning removes stale tool output so the model sees a leaner context on each turn.
Pruning is also important for Anthropic prompt caching. When a session goes idle past the cache TTL, the next request re-caches the full prompt. Pruning reduces the cache-write size for that first post-TTL request, which directly reduces cost.
## How it works

Pruning runs in `cache-ttl` mode, which is the only supported mode:

- Check the clock -- pruning only runs if the last Anthropic API call for the session is older than `ttl` (default `5m`).
- Find prunable messages -- only `toolResult` messages are eligible. User and assistant messages are never modified.
- Protect recent context -- the last `keepLastAssistants` assistant messages (default `3`) and all tool results after that cutoff are preserved.
- Soft-trim oversized tool results -- keep the head and tail, insert `...`, and append a note with the original size.
- Hard-clear remaining eligible results -- replace the entire content with a placeholder.
- Reset the TTL -- subsequent requests keep the cache until `ttl` expires again.
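The steps above can be sketched in TypeScript. This is a minimal illustration, not OpenClaw's internals: the `Msg` shape and `pruneToolResults` name are assumptions, and the sketch jumps straight to the hard-clear stage, omitting soft-trim.

```typescript
// Hypothetical sketch of the cache-ttl pruning pass. All names are
// illustrative; OpenClaw's actual types and functions may differ.
interface Msg {
  role: "user" | "assistant" | "toolResult";
  content: string;
}

function pruneToolResults(
  messages: Msg[],
  lastApiCallMs: number,
  nowMs: number,
  ttlMs = 5 * 60_000,          // default ttl: 5m
  keepLastAssistants = 3,      // protect recent turns
  placeholder = "[Old tool result content cleared]",
): Msg[] {
  // 1. Check the clock: only prune once the TTL has expired.
  if (nowMs - lastApiCallMs <= ttlMs) return messages;

  // 2. Locate assistant messages to establish the protection cutoff.
  const assistantIdxs = messages
    .map((m, i) => (m.role === "assistant" ? i : -1))
    .filter((i) => i >= 0);
  // Not enough assistant messages: skip pruning entirely.
  if (assistantIdxs.length < keepLastAssistants) return messages;

  // 3. Everything at or after the Nth-from-last assistant is preserved.
  const cutoff = assistantIdxs[assistantIdxs.length - keepLastAssistants];

  // 4-5. Clear eligible tool results before the cutoff (hard-clear only
  // here; the real pass soft-trims large results first).
  return messages.map((m, i) =>
    i < cutoff && m.role === "toolResult"
      ? { ...m, content: placeholder }
      : m,
  );
}
```

Note that the function returns a new array: the on-disk transcript is untouched, matching the in-memory, per-request behavior described above.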
## What gets skipped
- Tool results containing image blocks are never trimmed.
- If there are not enough assistant messages to establish the cutoff, pruning is skipped entirely.
- Pruning currently only activates for Anthropic API calls (and OpenRouter Anthropic models).
## Smart defaults

OpenClaw auto-configures pruning for Anthropic profiles:

| Profile type | Pruning | Heartbeat | Cache retention |
|---|---|---|---|
| OAuth or setup-token | `cache-ttl` enabled | `1h` | (provider default) |
| API key | `cache-ttl` enabled | `30m` | short (5 min) |

If you set any of these values explicitly, OpenClaw does not override them.
Match `ttl` to your model's `cacheRetention` policy for best results (short = 5 min, long = 1 hour).
## Pruning vs compaction

| | Pruning | Compaction |
|---|---|---|
| What | Trims tool result messages | Summarizes conversation history |
| Persisted? | No (in-memory, per request) | Yes (in JSONL transcript) |
| Scope | Tool results only | Entire conversation |
| Trigger | Every LLM call (when TTL expired) | Context window threshold |
Built-in tools already truncate their own output. Pruning is an additional layer that prevents long-running chats from accumulating too much tool output over time. See Compaction for the summarization approach.
## Configuration

### Defaults (when enabled)

| Setting | Default | Description |
|---|---|---|
| `ttl` | `5m` | Prune only after this idle period |
| `keepLastAssistants` | `3` | Protect tool results near recent assistant turns |
| `softTrimRatio` | `0.3` | Context ratio for soft-trim eligibility |
| `hardClearRatio` | `0.5` | Context ratio for hard-clear eligibility |
| `minPrunableToolChars` | `50000` | Minimum tool result size to consider |
| `softTrim.maxChars` | `4000` | Max chars after soft-trim |
| `softTrim.headChars` | `1500` | Head portion to keep |
| `softTrim.tailChars` | `1500` | Tail portion to keep |
| `hardClear.enabled` | `true` | Enable hard-clear stage |
| `hardClear.placeholder` | `[Old tool result content cleared]` | Replacement text |
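Under the `softTrim.*` defaults above, the soft-trim stage can be sketched as follows. The `softTrim` helper and the exact wording of the appended size note are assumptions, not OpenClaw's actual output.

```typescript
// Illustrative soft-trim: keep the head and tail of an oversized tool
// result, insert "...", and append a note with the original size.
// Defaults mirror softTrim.maxChars / headChars / tailChars above.
function softTrim(
  text: string,
  maxChars = 4000,
  headChars = 1500,
  tailChars = 1500,
): string {
  if (text.length <= maxChars) return text; // small enough: leave intact
  const head = text.slice(0, headChars);
  const tail = text.slice(-tailChars);
  // Note format is an assumption for illustration.
  return `${head}\n...\n${tail}\n[trimmed: original was ${text.length} chars]`;
}
```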
### Examples

Disable pruning (default state):

```json5
{
  agents: {
    defaults: {
      contextPruning: { mode: "off" },
    },
  },
}
```
Enable TTL-aware pruning:

```json5
{
  agents: {
    defaults: {
      contextPruning: { mode: "cache-ttl", ttl: "5m" },
    },
  },
}
```
Restrict pruning to specific tools:

```json5
{
  agents: {
    defaults: {
      contextPruning: {
        mode: "cache-ttl",
        tools: {
          allow: ["exec", "read"],
          deny: ["*image*"],
        },
      },
    },
  },
}
```
Tool selection supports `*` wildcards, deny wins over allow, matching is case-insensitive, and an empty `allow` list means all tools are allowed.
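Those matching rules can be sketched as below. Helper names (`globToRegExp`, `isToolPrunable`) are assumptions for illustration, not OpenClaw's API.

```typescript
// Convert a `*` glob to a case-insensitive, fully-anchored regex.
function globToRegExp(pattern: string): RegExp {
  const escaped = pattern
    .toLowerCase()
    .replace(/[.+^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters
    .replace(/\*/g, ".*");                // `*` matches any substring
  return new RegExp(`^${escaped}$`);
}

// Apply the documented rules: deny wins over allow; empty allow = all tools.
function isToolPrunable(tool: string, allow: string[], deny: string[]): boolean {
  const name = tool.toLowerCase();
  if (deny.some((p) => globToRegExp(p).test(name))) return false; // deny wins
  if (allow.length === 0) return true; // empty allow list allows everything
  return allow.some((p) => globToRegExp(p).test(name));
}
```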
## Context window estimation

Pruning estimates the context window in characters (`chars = tokens x 4`). The base window is resolved in this order:

- `models.providers.*.models[].contextWindow` override.
- Model definition `contextWindow` from the model registry.
- Default `200000` tokens.
If `agents.defaults.contextTokens` is set, it caps the resolved window.
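The resolution order can be sketched as a small helper. The function and parameter names are assumptions; only the config field names and the `chars = tokens x 4` estimate come from the doc.

```typescript
// Assumed shape of a model registry entry.
interface ModelDef {
  contextWindow?: number;
}

// Resolve the estimated context window in characters:
// provider override -> registry contextWindow -> 200k-token default,
// optionally capped by agents.defaults.contextTokens.
function resolveContextChars(
  providerOverride: number | undefined, // models.providers.*.models[].contextWindow
  registryModel: ModelDef | undefined,  // model registry entry
  contextTokensCap?: number,            // agents.defaults.contextTokens
): number {
  const base =
    providerOverride ??
    registryModel?.contextWindow ??
    200_000; // default: 200000 tokens
  const tokens =
    contextTokensCap !== undefined ? Math.min(base, contextTokensCap) : base;
  return tokens * 4; // chars = tokens x 4
}
```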
## Related
- Compaction -- summarization-based context reduction
- Session Management -- session lifecycle and routing
- Gateway Configuration -- full config reference