docs: rewrite sessions/memory section -- compaction, memory, and new memory-search page

2026-05-06 03:20:20 +00:00 · 2026-03-30 07:09:40 +09:00
parent 6d9a7224aa
commit 3584a893e8
4 changed files with 561 additions and 136 deletions
--- a/docs/concepts/compaction.md
+++ b/docs/concepts/compaction.md
@@ -1,98 +1,141 @@
 ---
-summary: "Context window + compaction: how OpenClaw keeps sessions under model limits"
+summary: "How OpenClaw compacts long sessions to stay within model context limits"
 read_when:
  - You want to understand auto-compaction and /compact
  - You are debugging long sessions hitting context limits
+  - You want to tune compaction behavior or use a custom context engine
 title: "Compaction"
 ---

-# Context Window & Compaction
+# Compaction

-Every model has a **context window** (max tokens it can see). Long-running chats accumulate messages and tool results; once the window is tight, OpenClaw **compacts** older history to stay within limits.
+Every model has a **context window** -- the maximum number of tokens it can see
+at once. As a conversation grows, it eventually approaches that limit. OpenClaw
+**compacts** older history into a summary so the session can continue without
+losing important context.

-## What compaction is
+## How compaction works

-Compaction **summarizes older conversation** into a compact summary entry and keeps recent messages intact. The summary is stored in the session history, so future requests use:
+Compaction is a three-step process:

- The compaction summary
- Recent messages after the compaction point
+1. **Summarize** older conversation turns into a compact summary.
+2. **Persist** the summary as a `compaction` entry in the session transcript
+   (JSONL).
+3. **Keep** recent messages after the compaction point intact.

-Compaction **persists** in the session’s JSONL history.
+After compaction, future turns see the summary plus all messages after the
+compaction point. The on-disk transcript retains the full history -- compaction
+only changes what gets loaded into the model context.

-## Configuration
+## Auto-compaction

-Use the `agents.defaults.compaction` setting in your `openclaw.json` to configure compaction behavior (mode, target tokens, etc.).
-Compaction summarization preserves opaque identifiers by default (`identifierPolicy: "strict"`). You can override this with `identifierPolicy: "off"` or provide custom text with `identifierPolicy: "custom"` and `identifierInstructions`.
+Auto-compaction is **on by default**. It triggers in two situations:

-You can optionally specify a different model for compaction summarization via `agents.defaults.compaction.model`. This is useful when your primary model is a local or small model and you want compaction summaries produced by a more capable model. The override accepts any `provider/model-id` string:
+1. **Threshold maintenance** -- after a successful turn, when estimated context
+   usage exceeds `contextWindow - reserveTokens`.
+2. **Overflow recovery** -- when the model returns a context-overflow error,
+   OpenClaw compacts and retries the request.

-```json
-{
-  "agents": {
-    "defaults": {
-      "compaction": {
-        "model": "openrouter/anthropic/claude-sonnet-4-6"
-      }
-    }
-  }
-}
-```
+When auto-compaction runs, you will see:

-This also works with local models, for example a second Ollama model dedicated to summarization or a fine-tuned compaction specialist:
+- `Auto-compaction complete` in verbose mode
+- `/status` showing `Compactions: <count>`

-```json
-{
-  "agents": {
-    "defaults": {
-      "compaction": {
-        "model": "ollama/llama3.1:8b"
-      }
-    }
-  }
-}
-```
+### Pre-compaction memory flush

-When unset, compaction uses the agent's primary model.
-
-## Auto-compaction (default on)
-
-When a session nears or exceeds the model’s context window, OpenClaw triggers auto-compaction and may retry the original request using the compacted context.
-
-You’ll see:
-
- `🧹 Auto-compaction complete` in verbose mode
- `/status` showing `🧹 Compactions: <count>`
-
-Before compaction, OpenClaw can run a **silent memory flush** turn to store
-durable notes to disk. See [Memory](/concepts/memory) for details and config.
+Before compacting, OpenClaw can run a **silent turn** that reminds the model to
+write durable notes to disk, so important context is not lost when older turns
+are summarized. The flush is controlled by
+`agents.defaults.compaction.memoryFlush` and runs once per compaction cycle. See
+[Memory](/concepts/memory) for details.

 ## Manual compaction

-Use `/compact` (optionally with instructions) to force a compaction pass:
+Use `/compact` in any chat to force a compaction pass. You can optionally add
+instructions to guide the summary:

 ```
 /compact Focus on decisions and open questions
 ```

-## Context window source
+## Configuration

-Context window is model-specific. OpenClaw uses the model definition from the configured provider catalog to determine limits.
+### Compaction model
+
+By default, compaction uses the agent's primary model. Set
+`agents.defaults.compaction.model` to any `provider/model-id` string to use a
+different model for summarization -- useful when your primary model is small or
+local and you want a more capable summarizer:
+
+```json5
+{
+  agents: {
+    defaults: {
+      compaction: {
+        model: "openrouter/anthropic/claude-sonnet-4-6",
+      },
+    },
+  },
+}
+```
+
+### Reserve tokens and floor
+
+- `reserveTokens` -- headroom reserved for prompts and the next model output
+  (Pi runtime default: `16384`).
+- `reserveTokensFloor` -- minimum reserve enforced by OpenClaw (default:
+  `20000`). Set to `0` to disable.
+- `keepRecentTokens` -- how many tokens of recent conversation to preserve
+  during compaction (default: `20000`).
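+
+As a rough sketch, the three settings could be written out together like this.
+The values are the defaults listed above. Placing all three keys under
+`agents.defaults.compaction` is an assumption, since this page does not spell
+out their exact config path.
+
+```json5
+{
+  agents: {
+    defaults: {
+      compaction: {
+        // Assumed location for these keys; values are the documented defaults.
+        reserveTokens: 16384, // headroom for prompts and the next model output
+        reserveTokensFloor: 20000, // minimum reserve; 0 disables the floor
+        keepRecentTokens: 20000, // recent conversation preserved during compaction
+      },
+    },
+  },
+}
+```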
+
+### Identifier preservation
+
+Compaction summaries preserve opaque identifiers by default
+(`identifierPolicy: "strict"`). Override with:
+
+- `"off"` -- no special identifier handling.
+- `"custom"` -- provide your own instructions via `identifierInstructions`.
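+
+For illustration, a custom policy might look like the sketch below. It assumes
+both keys sit under `agents.defaults.compaction` like the other settings on
+this page. The instruction text is a made-up example, not a recommended
+wording.
+
+```json5
+{
+  agents: {
+    defaults: {
+      compaction: {
+        identifierPolicy: "custom",
+        // Hypothetical instruction text; replace with your own.
+        identifierInstructions: "Keep ticket IDs, commit hashes, and file paths verbatim.",
+      },
+    },
+  },
+}
+```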
+
+### Memory flush
+
+```json5
+{
+  agents: {
+    defaults: {
+      compaction: {
+        memoryFlush: {
+          enabled: true, // default
+          softThresholdTokens: 4000,
+          systemPrompt: "Session nearing compaction. Store durable memories now.",
+          prompt: "Write any lasting notes to memory/YYYY-MM-DD.md; reply with NO_REPLY if nothing to store.",
+        },
+      },
+    },
+  },
+}
+```
+
+The flush triggers when context usage crosses
+`contextWindow - reserveTokensFloor - softThresholdTokens`. It runs silently
+(the user sees nothing) and is skipped when the workspace is read-only.
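+
+As a worked example under assumed numbers: with a hypothetical 200,000-token
+context window, a floor of `20000`, and `softThresholdTokens: 4000`, the
+formula puts the flush threshold at 176,000 tokens. The window size is
+illustrative only, and the placement of `reserveTokensFloor` under
+`agents.defaults.compaction` is an assumption.
+
+```json5
+{
+  // contextWindow (hypothetical model): 200000
+  // flush threshold: 200000 - 20000 - 4000 = 176000 tokens
+  agents: {
+    defaults: {
+      compaction: {
+        reserveTokensFloor: 20000, // assumed path for this key
+        memoryFlush: {
+          softThresholdTokens: 4000,
+        },
+      },
+    },
+  },
+}
+```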

 ## Compaction vs pruning

- **Compaction**: summarises and **persists** in JSONL.
- **Session pruning**: trims old **tool results** only, **in-memory**, per request.
+|                  | Compaction                     | Session pruning                  |
+| ---------------- | ------------------------------ | -------------------------------- |
+| **What it does** | Summarizes older conversation  | Trims old tool results           |
+| **Persisted?**   | Yes (in JSONL transcript)      | No (in-memory only, per request) |
+| **Scope**        | Entire conversation history    | Tool result messages only        |
+| **Frequency**    | Each time threshold is reached | Every LLM call (when enabled)    |

-See [/concepts/session-pruning](/concepts/session-pruning) for pruning details.
+See [Session Pruning](/concepts/session-pruning) for pruning details.

 ## OpenAI server-side compaction

-OpenClaw also supports OpenAI Responses server-side compaction hints for
-compatible direct OpenAI models. This is separate from local OpenClaw
-compaction and can run alongside it.
+OpenClaw also supports OpenAI Responses server-side compaction for compatible
+direct OpenAI models. This is separate from local compaction and can run
+alongside it:

- Local compaction: OpenClaw summarizes and persists into session JSONL.
- Server-side compaction: OpenAI compacts context on the provider side when
+- **Local compaction** -- OpenClaw summarizes and persists into session JSONL.
+- **Server-side compaction** -- OpenAI compacts context on the provider side when
  `store` + `context_management` are enabled.

 See [OpenAI provider](/providers/openai) for model params and overrides.
@@ -100,24 +143,40 @@ See [OpenAI provider](/providers/openai) for model params and overrides.
 ## Custom context engines

 Compaction behavior is owned by the active
-[context engine](/concepts/context-engine). The legacy engine uses the built-in
+[context engine](/concepts/context-engine). The built-in engine uses the
 summarization described above. Plugin engines (selected via
-`plugins.slots.contextEngine`) can implement any compaction strategy — DAG
-summaries, vector retrieval, incremental condensation, etc.
+`plugins.slots.contextEngine`) can implement any strategy -- DAG summaries,
+vector retrieval, incremental condensation, etc.
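+
+Selecting a plugin engine might look like the following sketch. The slot path
+comes from the text above. The plugin id `my-context-engine` is hypothetical,
+and the sketch assumes the slot takes the plugin's id as a string.
+
+```json5
+{
+  plugins: {
+    slots: {
+      contextEngine: "my-context-engine", // hypothetical plugin id
+    },
+  },
+}
+```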

 When a plugin engine sets `ownsCompaction: true`, OpenClaw delegates all
 compaction decisions to the engine and does not run built-in auto-compaction.

-When `ownsCompaction` is `false` or unset, OpenClaw may still use Pi's
-built-in in-attempt auto-compaction, but the active engine's `compact()` method
-still handles `/compact` and overflow recovery. There is no automatic fallback
-to the legacy engine's compaction path.
-
-If you are building a non-owning context engine, implement `compact()` by
+When `ownsCompaction` is `false` or unset, the built-in auto-compaction still
+runs, but the engine's `compact()` method handles `/compact` and overflow
+recovery. If you are building a non-owning engine, implement `compact()` by
 calling `delegateCompactionToRuntime(...)` from `openclaw/plugin-sdk/core`.

-## Tips
+## Troubleshooting

- Use `/compact` when sessions feel stale or context is bloated.
- Large tool outputs are already truncated; pruning can further reduce tool-result buildup.
- If you need a fresh slate, `/new` or `/reset` starts a new session id.
+**Compaction triggers too often?**
+
+- Check the model's context window -- models with small windows compact more
+  frequently.
+- High `reserveTokens` relative to the context window can trigger early
+  compaction.
+- Large tool outputs accumulate fast. Enable
+  [session pruning](/concepts/session-pruning) to reduce tool-result buildup.
+
+**Context feels stale after compaction?**
+
+- Use `/compact Focus on <topic>` to guide the summary.
+- Increase `keepRecentTokens` to preserve more recent conversation.
+- Make sure the [memory flush](/concepts/memory) is enabled (it is by default)
+  so durable notes survive compaction.
+
+**Need a fresh start?**
+
+- `/new` or `/reset` starts a new session ID without compacting.
+
+For the full internal lifecycle (store schema, transcript structure, Pi runtime
+semantics), see
+[Session Management Deep Dive](/reference/session-management-compaction).