---
title: Session Pruning
summary: How session pruning trims old tool results to reduce context bloat and improve cache efficiency
read_when:
  - You want to reduce LLM context growth from tool outputs
  - You are tuning agents.defaults.contextPruning
---

# Session Pruning

Session pruning trims old tool results from the in-memory context before each LLM call. It does not rewrite the on-disk session history (JSONL) -- it only affects what gets sent to the model for that request.

## Why prune

Long-running sessions accumulate tool outputs (exec results, file reads, search results). These inflate the context window, increasing cost and eventually forcing compaction. Pruning removes stale tool output so the model sees a leaner context on each turn.

Pruning is also important for Anthropic prompt caching. When a session goes idle past the cache TTL, the next request re-caches the full prompt. Pruning reduces the cache-write size for that first post-TTL request, which directly reduces cost.

## How it works

Pruning runs in `cache-ttl` mode, which is the only supported mode:

1. **Check the clock** -- pruning only runs if the last Anthropic API call for the session is older than `ttl` (default `5m`).
2. **Find prunable messages** -- only `toolResult` messages are eligible. User and assistant messages are never modified.
3. **Protect recent context** -- the last `keepLastAssistants` assistant messages (default 3) and all tool results after that cutoff are preserved.
4. **Soft-trim oversized tool results** -- keep the head and tail, insert `...`, and append a note with the original size.
5. **Hard-clear remaining eligible results** -- replace the entire content with a placeholder.
6. **Reset the TTL** -- subsequent requests keep cache until `ttl` expires again.
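The steps above can be sketched in miniature. This is an illustrative model only: the message shapes, constants, and the `hard_clear` trigger are assumptions, not OpenClaw's actual internals.

```python
import time

# Defaults from the configuration reference below.
TTL = 5 * 60                    # ttl: 5m, in seconds
KEEP_LAST_ASSISTANTS = 3        # keepLastAssistants
MIN_PRUNABLE = 50_000           # minPrunableToolChars
MAX_CHARS, HEAD, TAIL = 4000, 1500, 1500  # softTrim defaults
PLACEHOLDER = "[Old tool result content cleared]"

def soft_trim(text):
    """Step 4: keep head and tail, note the original size."""
    if len(text) <= MAX_CHARS:
        return text
    return f"{text[:HEAD]}\n...\n{text[-TAIL:]}\n[trimmed; original {len(text)} chars]"

def prune(messages, last_call_ts, now=None, hard_clear=False):
    now = time.time() if now is None else now
    # Step 1: only run when the session has been idle past the TTL.
    if now - last_call_ts < TTL:
        return messages
    # Step 3: protect everything after the Nth-from-last assistant message.
    assistant_idxs = [i for i, m in enumerate(messages) if m["role"] == "assistant"]
    if len(assistant_idxs) < KEEP_LAST_ASSISTANTS:
        return messages  # no cutoff can be established -> skip entirely
    cutoff = assistant_idxs[-KEEP_LAST_ASSISTANTS]
    out = []
    for i, m in enumerate(messages):
        # Step 2: only large toolResult messages before the cutoff are eligible.
        if m["role"] != "toolResult" or i >= cutoff or len(m["content"]) < MIN_PRUNABLE:
            out.append(m)
        elif hard_clear:
            # Step 5: replace the entire content with a placeholder.
            out.append({**m, "content": PLACEHOLDER})
        else:
            # Step 4: soft-trim, keeping head and tail.
            out.append({**m, "content": soft_trim(m["content"])})
    return out
```

In the real implementation the soft-trim/hard-clear choice is driven by the `softTrimRatio`/`hardClearRatio` context thresholds; the boolean flag here just keeps the sketch small.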

## What gets skipped

- Tool results containing image blocks are never trimmed.
- If there are not enough assistant messages to establish the cutoff, pruning is skipped entirely.
- Pruning currently only activates for Anthropic API calls (and OpenRouter Anthropic models).

## Smart defaults

OpenClaw auto-configures pruning for Anthropic profiles:

| Profile type | Pruning | Heartbeat | Cache retention |
|---|---|---|---|
| OAuth or setup-token | `cache-ttl` enabled | 1h | (provider default) |
| API key | `cache-ttl` enabled | 30m | short (5 min) |

If you set any of these values explicitly, OpenClaw does not override them.

Match `ttl` to your model's `cacheRetention` policy for best results (short = 5 min, long = 1 hour).
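For instance, a model with short cache retention pairs with the default `ttl` (a config sketch; the short/long mapping in the comments is the guidance above, not a separate setting):

```json5
{
  agents: {
    defaults: {
      // short cache retention (5 min)  -> ttl: "5m"
      // long cache retention (1 hour)  -> ttl: "1h"
      contextPruning: { mode: "cache-ttl", ttl: "5m" },
    },
  },
}
```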

## Pruning vs compaction

|  | Pruning | Compaction |
|---|---|---|
| What | Trims tool result messages | Summarizes conversation history |
| Persisted? | No (in-memory, per request) | Yes (in JSONL transcript) |
| Scope | Tool results only | Entire conversation |
| Trigger | Every LLM call (when TTL expired) | Context window threshold |

Built-in tools already truncate their own output. Pruning is an additional layer that prevents long-running chats from accumulating too much tool output over time. See Compaction for the summarization approach.

## Configuration

### Defaults (when enabled)

| Setting | Default | Description |
|---|---|---|
| `ttl` | `5m` | Prune only after this idle period |
| `keepLastAssistants` | `3` | Protect tool results near recent assistant turns |
| `softTrimRatio` | `0.3` | Context ratio for soft-trim eligibility |
| `hardClearRatio` | `0.5` | Context ratio for hard-clear eligibility |
| `minPrunableToolChars` | `50000` | Minimum tool result size to consider |
| `softTrim.maxChars` | `4000` | Max chars after soft-trim |
| `softTrim.headChars` | `1500` | Head portion to keep |
| `softTrim.tailChars` | `1500` | Tail portion to keep |
| `hardClear.enabled` | `true` | Enable hard-clear stage |
| `hardClear.placeholder` | `[Old tool result content cleared]` | Replacement text |

### Examples

Disable pruning (default state):

```json5
{
  agents: {
    defaults: {
      contextPruning: { mode: "off" },
    },
  },
}
```

Enable TTL-aware pruning:

```json5
{
  agents: {
    defaults: {
      contextPruning: { mode: "cache-ttl", ttl: "5m" },
    },
  },
}
```

Restrict pruning to specific tools:

```json5
{
  agents: {
    defaults: {
      contextPruning: {
        mode: "cache-ttl",
        tools: {
          allow: ["exec", "read"],
          deny: ["*image*"],
        },
      },
    },
  },
}
```

Tool selection supports `*` wildcards, `deny` wins over `allow`, matching is case-insensitive, and an empty `allow` list means all tools are allowed.
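These selection rules can be sketched with glob matching (a hypothetical helper using Python's `fnmatch`, not OpenClaw's implementation):

```python
from fnmatch import fnmatch

def tool_allowed(name, allow, deny):
    """Apply the documented rules: deny wins over allow, '*' wildcards,
    case-insensitive matching, empty allow list admits every tool."""
    n = name.lower()
    if any(fnmatch(n, p.lower()) for p in deny):
        return False  # deny wins over allow
    if not allow:
        return True   # empty allow list = all tools allowed
    return any(fnmatch(n, p.lower()) for p in allow)

# With the example config above:
# tool_allowed("exec", ["exec", "read"], ["*image*"])      -> True
# tool_allowed("screenshot_image", ["exec"], ["*image*"])  -> False
```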

## Context window estimation

Pruning estimates the context window in characters (chars = tokens × 4). The base window is resolved in this order:

1. `models.providers.*.models[].contextWindow` override.
2. Model definition `contextWindow` from the model registry.
3. Default: 200000 tokens.

If `agents.defaults.contextTokens` is set, it caps the resolved window.
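The resolution order and cap can be sketched as follows (parameter names are illustrative stand-ins for the config keys above):

```python
DEFAULT_TOKENS = 200_000
CHARS_PER_TOKEN = 4  # chars = tokens x 4

def resolve_context_chars(provider_override=None, registry_window=None,
                          context_tokens_cap=None):
    # Resolution order: provider override -> model registry -> default.
    tokens = provider_override or registry_window or DEFAULT_TOKENS
    # agents.defaults.contextTokens, if set, caps the resolved window.
    if context_tokens_cap is not None:
        tokens = min(tokens, context_tokens_cap)
    return tokens * CHARS_PER_TOKEN
```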