# Compaction

How OpenClaw compacts long sessions to stay within model context limits.
Every model has a context window -- the maximum number of tokens it can see at once. As a conversation grows, it eventually approaches that limit. OpenClaw compacts older history into a summary so the session can continue without losing important context.
## How compaction works

Compaction is a three-step process:

- Summarize older conversation turns into a compact summary.
- Persist the summary as a `compaction` entry in the session transcript (JSONL).
- Keep recent messages after the compaction point intact.
After compaction, future turns see the summary plus all messages after the compaction point. The on-disk transcript retains the full history -- compaction only changes what gets loaded into the model context.
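The load step described above can be sketched as follows. This is a minimal illustration, not the real transcript schema: the `Entry` type and `buildModelContext` are hypothetical names invented for this sketch.

```typescript
type Entry =
  | { kind: "message"; role: "user" | "assistant"; text: string }
  | { kind: "compaction"; summary: string };

// Rebuild the model-visible context: the latest compaction summary (if any)
// plus every message recorded after that compaction point. Entries before
// the compaction point stay on disk but are not loaded into the model context.
function buildModelContext(transcript: Entry[]): string[] {
  let cut = -1;
  for (let i = transcript.length - 1; i >= 0; i--) {
    if (transcript[i].kind === "compaction") {
      cut = i;
      break;
    }
  }
  const context: string[] = [];
  const last = cut >= 0 ? transcript[cut] : undefined;
  if (last && last.kind === "compaction") {
    context.push(`[summary] ${last.summary}`);
  }
  for (const e of transcript.slice(cut + 1)) {
    if (e.kind === "message") context.push(`${e.role}: ${e.text}`);
  }
  return context;
}
```

With no `compaction` entry, `cut` stays `-1` and the whole transcript is loaded, which matches the pre-compaction behavior.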
## Auto-compaction

Auto-compaction is on by default. It triggers in two situations:

- Threshold maintenance -- after a successful turn, when estimated context usage exceeds `contextWindow - reserveTokens`.
- Overflow recovery -- the model returns a context-overflow error. OpenClaw compacts and retries the request.

When auto-compaction runs you will see:

- `Auto-compaction complete` in verbose mode
- `/status` showing `Compactions: <count>`
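The threshold-maintenance rule reduces to a single comparison. `shouldAutoCompact` is an illustrative name for this sketch, not an OpenClaw API:

```typescript
// Decide whether threshold-maintenance compaction should run after a turn.
// Fires when estimated usage exceeds contextWindow - reserveTokens.
function shouldAutoCompact(
  estimatedTokens: number,
  contextWindow: number,
  reserveTokens: number,
): boolean {
  return estimatedTokens > contextWindow - reserveTokens;
}
```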
## Pre-compaction memory flush

Before compacting, OpenClaw can run a silent turn that reminds the model to
write durable notes to disk. This prevents important context from being lost in
the summary. The flush is controlled by `agents.defaults.compaction.memoryFlush`
and runs once per compaction cycle. See Memory for details.
## Manual compaction

Use `/compact` in any chat to force a compaction pass. You can optionally add
instructions to guide the summary:

```
/compact Focus on decisions and open questions
```
## Configuration

### Compaction model

By default, compaction uses the agent's primary model. You can override this with a different model for summarization -- useful when your primary model is small or local and you want a more capable summarizer:

```json5
{
  agents: {
    defaults: {
      compaction: {
        model: "openrouter/anthropic/claude-sonnet-4-6",
      },
    },
  },
}
```
### Reserve tokens and floor

- `reserveTokens` -- headroom reserved for prompts and the next model output (Pi runtime default: `16384`).
- `reserveTokensFloor` -- minimum reserve enforced by OpenClaw (default: `20000`). Set to `0` to disable.
- `keepRecentTokens` -- how many tokens of recent conversation to preserve during compaction (default: `20000`).
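One plausible reading of how these settings interact, sketched below. This is illustrative only -- the actual runtime logic may differ, and both function names are invented for this example:

```typescript
// The floor clamps the effective reserve from below, so a too-small
// reserveTokens cannot starve the next model response. Setting
// reserveTokensFloor to 0 disables the clamp.
function effectiveReserve(reserveTokens: number, reserveTokensFloor: number): number {
  return Math.max(reserveTokens, reserveTokensFloor);
}

// Estimated usage above this value triggers threshold-maintenance compaction.
function compactionThreshold(
  contextWindow: number,
  reserveTokens: number,
  reserveTokensFloor: number,
): number {
  return contextWindow - effectiveReserve(reserveTokens, reserveTokensFloor);
}
```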
### Identifier preservation

Compaction summaries preserve opaque identifiers by default
(`identifierPolicy: "strict"`). Override with:

- `"off"` -- no special identifier handling.
- `"custom"` -- provide your own instructions via `identifierInstructions`.
### Memory flush

```json5
{
  agents: {
    defaults: {
      compaction: {
        memoryFlush: {
          enabled: true, // default
          softThresholdTokens: 4000,
          systemPrompt: "Session nearing compaction. Store durable memories now.",
          prompt: "Write any lasting notes to memory/YYYY-MM-DD.md; reply with NO_REPLY if nothing to store.",
        },
      },
    },
  },
}
```
The flush triggers when context usage crosses
`contextWindow - reserveTokensFloor - softThresholdTokens`. It runs silently
(the user sees nothing) and is skipped when the workspace is read-only.
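The trigger condition can be sketched as below. The option names mirror the config above; the function itself and the once-per-cycle/read-only flags are illustrative stand-ins for runtime state:

```typescript
// Sketch: should the pre-compaction memory flush run now?
// Skipped when it already ran this compaction cycle or the
// workspace is read-only, per the rules above.
function shouldFlushMemory(opts: {
  usedTokens: number;
  contextWindow: number;
  reserveTokensFloor: number;
  softThresholdTokens: number;
  alreadyFlushedThisCycle: boolean;
  workspaceReadOnly: boolean;
}): boolean {
  const threshold =
    opts.contextWindow - opts.reserveTokensFloor - opts.softThresholdTokens;
  return (
    !opts.alreadyFlushedThisCycle &&
    !opts.workspaceReadOnly &&
    opts.usedTokens >= threshold
  );
}
```

Because the flush threshold subtracts `softThresholdTokens` on top of the floor, it fires before the compaction threshold itself, leaving the model room to write notes first.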
## Compaction vs pruning

| | Compaction | Session pruning |
|---|---|---|
| What it does | Summarizes older conversation | Trims old tool results |
| Persisted? | Yes (in JSONL transcript) | No (in-memory only, per request) |
| Scope | Entire conversation history | Tool result messages only |
| Frequency | Once when threshold is reached | Every LLM call (when enabled) |
See Session Pruning for pruning details.
## OpenAI server-side compaction

OpenClaw also supports OpenAI Responses server-side compaction for compatible direct OpenAI models. This is separate from local compaction and can run alongside it:

- Local compaction -- OpenClaw summarizes and persists into the session JSONL.
- Server-side compaction -- OpenAI compacts context on the provider side when `store` + `context_management` are enabled.

See OpenAI provider for model params and overrides.
## Custom context engines

Compaction behavior is owned by the active context engine. The built-in engine
uses the summarization described above. Plugin engines (selected via
`plugins.slots.contextEngine`) can implement any strategy -- DAG summaries,
vector retrieval, incremental condensation, etc.

When a plugin engine sets `ownsCompaction: true`, OpenClaw delegates all
compaction decisions to the engine and does not run built-in auto-compaction.
When `ownsCompaction` is false or unset, the built-in auto-compaction still
runs, but the engine's `compact()` method handles `/compact` and overflow
recovery. If you are building a non-owning engine, implement `compact()` by
calling `delegateCompactionToRuntime(...)` from `openclaw/plugin-sdk/core`.
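A non-owning engine can be sketched as below. The SDK types and the delegate are stubbed locally so the example is self-contained -- the real signatures live in `openclaw/plugin-sdk/core`, and all shapes here are assumptions:

```typescript
// Stand-ins for the real plugin-sdk types; shapes are assumptions.
type CompactResult = { summary: string };
type CompactFn = (instructions?: string) => Promise<CompactResult>;

interface ContextEngine {
  ownsCompaction: boolean; // false: built-in auto-compaction still runs
  compact: CompactFn;      // handles /compact and overflow recovery
}

// In a real plugin this would be the SDK's delegateCompactionToRuntime(...);
// stubbed here so the sketch runs on its own.
const delegateCompactionToRuntime: CompactFn = async (instructions) => ({
  summary: `runtime summary${instructions ? ` (${instructions})` : ""}`,
});

// A non-owning engine simply forwards compaction back to the runtime.
const passthroughEngine: ContextEngine = {
  ownsCompaction: false,
  compact: (instructions) => delegateCompactionToRuntime(instructions),
};
```

An owning engine would instead set `ownsCompaction: true` and implement its own strategy inside `compact()`.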
## Troubleshooting

**Compaction triggers too often?**

- Check the model's context window -- small models compact more frequently.
- A high `reserveTokens` relative to the context window can trigger early compaction.
- Large tool outputs accumulate fast. Enable session pruning to reduce tool-result buildup.
**Context feels stale after compaction?**

- Use `/compact Focus on <topic>` to guide the summary.
- Increase `keepRecentTokens` to preserve more recent conversation.
- Enable the memory flush so durable notes survive compaction.
**Need a fresh start?**

`/new` or `/reset` starts a new session ID without compacting.
For the full internal lifecycle (store schema, transcript structure, Pi runtime semantics), see Session Management Deep Dive.