docs: restructure cli index and active memory pages

This commit is contained in:
Vincent Koc
2026-04-23 00:41:46 -07:00
parent 14e69146e6
commit d1f91b52fa
2 changed files with 181 additions and 1893 deletions

File diff suppressed because it is too large

@@ -20,10 +20,11 @@ have made the reply feel natural has already passed.
Active memory gives the system one bounded chance to surface relevant memory
before the main reply is generated.
## Paste This Into Your Agent
## Quick start
Paste this into your agent if you want it to enable Active Memory with a
self-contained, safe-default setup:
Paste this into `openclaw.json` for a safe-default setup — plugin on, scoped to
the `main` agent, direct-message sessions only, inherits the session model
when available:
```json5
{
@@ -49,12 +50,7 @@ self-contained, safe-default setup:
}
```
This turns the plugin on for the `main` agent, keeps it limited to direct-message
style sessions by default, lets it inherit the current session model first, and
uses the configured fallback model only if no explicit or inherited model is
available.
After that, restart the gateway:
Then restart the gateway:
```bash
openclaw gateway
@@ -67,54 +63,15 @@ To inspect it live in a conversation:
/trace on
```
## Turn active memory on
The safest setup is:
1. enable the plugin
2. target one conversational agent
3. keep logging on only while tuning
Start with this in `openclaw.json`:
```json5
{
  plugins: {
    entries: {
      "active-memory": {
        enabled: true,
        config: {
          agents: ["main"],
          allowedChatTypes: ["direct"],
          modelFallback: "google/gemini-3-flash",
          queryMode: "recent",
          promptStyle: "balanced",
          timeoutMs: 15000,
          maxSummaryChars: 220,
          persistTranscripts: false,
          logging: true,
        },
      },
    },
  },
}
```
Then restart the gateway:
```bash
openclaw gateway
```
What this means:
What the key fields do:
- `plugins.entries.active-memory.enabled: true` turns the plugin on
- `config.agents: ["main"]` opts only the `main` agent into active memory
- `config.allowedChatTypes: ["direct"]` keeps active memory on for direct-message style sessions only by default
- if `config.model` is unset, active memory inherits the current session model first
- `config.modelFallback` optionally provides your own fallback provider/model for recall
- `config.promptStyle: "balanced"` uses the default general-purpose prompt style for `recent` mode
- active memory still runs only on eligible interactive persistent chat sessions
- `config.allowedChatTypes: ["direct"]` scopes it to direct-message sessions (opt in groups/channels explicitly)
- `config.model` (optional) pins a dedicated recall model; unset inherits the current session model
- `config.modelFallback` is used only when no explicit or inherited model resolves (see the sketch after this list)
- `config.promptStyle: "balanced"` is the default for `recent` mode
- Active memory still runs only for eligible interactive persistent chat sessions
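If you later pin a dedicated recall model, the key fields combine like this
(a minimal sketch reusing model IDs that already appear on this page; adjust
to your own providers):
```json5
{
  plugins: {
    entries: {
      "active-memory": {
        enabled: true,
        config: {
          agents: ["main"],
          allowedChatTypes: ["direct"],
          // Optional dedicated recall model; omit to inherit the session model.
          model: "cerebras/gpt-oss-120b",
          // Used only when no explicit or inherited model resolves.
          modelFallback: "google/gemini-3-flash",
        },
      },
    },
  },
}
```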
## Speed recommendations
@@ -123,83 +80,45 @@ the same model you already use for normal replies. That is the safest default
because it follows your existing provider, auth, and model preferences.
If you want Active Memory to feel faster, use a dedicated inference model
instead of borrowing the main chat model.
instead of borrowing the main chat model. Recall quality matters, but latency
matters more than for the main answer path, and Active Memory's tool surface
is narrow (it only calls `memory_search` and `memory_get`).
Example fast-provider setup:
Good fast-model options:
```json5
models: {
  providers: {
    cerebras: {
      baseUrl: "https://api.cerebras.ai/v1",
      apiKey: "${CEREBRAS_API_KEY}",
      api: "openai-completions",
      models: [{ id: "gpt-oss-120b", name: "GPT OSS 120B (Cerebras)" }],
    },
  },
},
plugins: {
  entries: {
    "active-memory": {
      enabled: true,
      config: {
        model: "cerebras/gpt-oss-120b",
      },
    },
  },
}
```
Fast-model options worth considering:
- `cerebras/gpt-oss-120b` for a fast dedicated recall model with a narrow tool surface
- `cerebras/gpt-oss-120b` for a dedicated low-latency recall model
- `google/gemini-3-flash` as a low-latency fallback without changing your primary chat model
- your normal session model, by leaving `config.model` unset
- a low-latency fallback model such as `google/gemini-3-flash` when you want a separate recall model without changing your primary chat model
Why Cerebras is a strong speed-oriented option for Active Memory:
- the Active Memory tool surface is narrow: it only calls `memory_search` and `memory_get`
- recall quality matters, but latency matters more than for the main answer path
- a dedicated fast provider avoids tying memory recall latency to your primary chat provider
If you do not want a separate speed-optimized model, leave `config.model` unset
and let Active Memory inherit the current session model.
### Cerebras setup
Add a provider entry like this:
Add a Cerebras provider and point Active Memory at it:
```json5
models: {
  providers: {
    cerebras: {
      baseUrl: "https://api.cerebras.ai/v1",
      apiKey: "${CEREBRAS_API_KEY}",
      api: "openai-completions",
      models: [{ id: "gpt-oss-120b", name: "GPT OSS 120B (Cerebras)" }],
    },
  },
}
```
Then point Active Memory at it:
```json5
plugins: {
  entries: {
    "active-memory": {
      enabled: true,
      config: {
        model: "cerebras/gpt-oss-120b",
      },
    },
  },
}
```
Caveat:
- make sure the Cerebras API key actually has model access for the model you choose, because `/v1/models` visibility alone does not guarantee `chat/completions` access
Make sure the Cerebras API key actually has `chat/completions` access for the
chosen model — `/v1/models` visibility alone does not guarantee it.
## How to see it
@@ -396,7 +315,70 @@ If the connection is weak, it should return `NONE`.
## Query modes
`config.queryMode` controls how much conversation the blocking memory sub-agent sees.
`config.queryMode` controls how much conversation the blocking memory sub-agent
sees. Pick the smallest mode that still answers follow-up questions well;
timeout budgets should grow with context size (`message` < `recent` < `full`).
<Tabs>
<Tab title="message">
Only the latest user message is sent.
```text
Latest user message only
```
Use this when:
- you want the fastest behavior
- you want the strongest bias toward stable preference recall
- follow-up turns do not need conversational context
Start around `3000` to `5000` ms for `config.timeoutMs`.
</Tab>
<Tab title="recent">
The latest user message plus a small recent conversational tail is sent.
```text
Recent conversation tail:
user: ...
assistant: ...
user: ...
Latest user message:
...
```
Use this when:
- you want a better balance of speed and conversational grounding
- follow-up questions often depend on the last few turns
Start around `15000` ms for `config.timeoutMs`.
</Tab>
<Tab title="full">
The full conversation is sent to the blocking memory sub-agent.
```text
Full conversation context:
user: ...
assistant: ...
user: ...
...
```
Use this when:
- the strongest recall quality matters more than latency
- the conversation contains important setup far back in the thread
Start around `15000` ms or higher depending on thread size.
</Tab>
</Tabs>
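Query mode and timeout live on the same plugin config block. A minimal sketch
of the `recent` defaults described above (the timeout is a starting point, not
a hard limit):
```json5
{
  plugins: {
    entries: {
      "active-memory": {
        enabled: true,
        config: {
          // How much conversation the blocking memory sub-agent sees.
          queryMode: "recent",
          // Grow the budget with context size: message < recent < full.
          timeoutMs: 15000,
        },
      },
    },
  },
}
```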
## Prompt styles
@@ -490,75 +472,6 @@ Prompt customization is not recommended unless you are deliberately testing a
different recall contract. The default prompt is tuned to return either `NONE`
or compact user-fact context for the main model.
### `message`
Only the latest user message is sent.
```text
Latest user message only
```
Use this when:
- you want the fastest behavior
- you want the strongest bias toward stable preference recall
- follow-up turns do not need conversational context
Recommended timeout:
- start around `3000` to `5000` ms
### `recent`
The latest user message plus a small recent conversational tail is sent.
```text
Recent conversation tail:
user: ...
assistant: ...
user: ...
Latest user message:
...
```
Use this when:
- you want a better balance of speed and conversational grounding
- follow-up questions often depend on the last few turns
Recommended timeout:
- start around `15000` ms
### `full`
The full conversation is sent to the blocking memory sub-agent.
```text
Full conversation context:
user: ...
assistant: ...
user: ...
...
```
Use this when:
- the strongest recall quality matters more than latency
- the conversation contains important setup far back in the thread
Recommended timeout:
- increase it substantially compared with `message` or `recent`
- start around `15000` ms or higher depending on thread size
In general, timeout should increase with context size:
```text
message < recent < full
```
## Transcript persistence
Active Memory's blocking memory sub-agent runs create a real `session.jsonl`
@@ -702,181 +615,38 @@ If active memory is too slow:
## Common issues
### Embedding provider changed unexpectedly
Active Memory rides on the normal `memory_search` pipeline under
`agents.defaults.memorySearch`, so most recall surprises are embedding-provider
problems, not Active Memory bugs.
Active Memory uses the normal `memory_search` pipeline under
`agents.defaults.memorySearch`. Embedding-provider setup is therefore only
required when the `memorySearch` behavior you rely on actually needs
embeddings.
<AccordionGroup>
<Accordion title="Embedding provider switched or stopped working">
If `memorySearch.provider` is unset, OpenClaw auto-detects the first
available embedding provider. A new API key, quota exhaustion, or a
rate-limited hosted provider can change which provider resolves between
runs. If no provider resolves, `memory_search` may degrade to lexical-only
retrieval; runtime failures after a provider is already selected do not
fall back automatically.
In practice:
Pin the provider (and an optional fallback) explicitly to make selection
deterministic. See [Memory Search](/concepts/memory-search) for the full
list of providers and pinning examples.
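A minimal sketch of an explicit pin plus a runtime fallback (the provider and
model names mirror the examples on that page; swap in whatever you actually
use):
```json5
{
  agents: {
    defaults: {
      memorySearch: {
        // Deterministic selection instead of "first available wins".
        provider: "openai",
        model: "text-embedding-3-small",
        // Consulted on runtime failures such as quota exhaustion.
        fallback: "gemini",
      },
    },
  },
}
```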
- explicit provider setup is **required** if you want a provider that is not
auto-detected, such as `ollama`
- explicit provider setup is **required** if auto-detection does not resolve
any usable embedding provider for your environment
- explicit provider setup is **highly recommended** if you want deterministic
provider selection instead of "first available wins"
- explicit provider setup is usually **not required** if auto-detection already
resolves the provider you want and that provider is stable in your deployment
</Accordion>
If `memorySearch.provider` is unset, OpenClaw auto-detects the first available
embedding provider.
That can be confusing in real deployments:
- a newly available API key can change which provider memory search uses
- a command or diagnostics surface may report a different provider than the
one you are actually hitting during live memory sync or search bootstrap
- hosted providers can fail with quota or rate-limit errors that only show up
once Active Memory starts issuing recall searches before each reply
Active Memory can still run without embeddings when `memory_search` can operate
in degraded lexical-only mode, which typically happens when no embedding
provider can be resolved.
Do not assume the same fallback on provider runtime failures such as quota
exhaustion, rate limits, network/provider errors, or missing local/remote
models after a provider has already been selected.
In practice:
- if no embedding provider can be resolved, `memory_search` may degrade to
lexical-only retrieval
- if an embedding provider is resolved and then fails at runtime, OpenClaw does
not currently guarantee a lexical fallback for that request
- if you need deterministic provider selection, pin
`agents.defaults.memorySearch.provider`
- if you need provider failover on runtime errors, configure
`agents.defaults.memorySearch.fallback` explicitly
If you depend on embedding-backed recall, multimodal indexing, or a specific
local/remote provider, pin the provider explicitly instead of relying on
auto-detection.
Common pinning examples:
OpenAI:
```json5
{
  agents: {
    defaults: {
      memorySearch: {
        provider: "openai",
        model: "text-embedding-3-small",
      },
    },
  },
}
```
Gemini:
```json5
{
  agents: {
    defaults: {
      memorySearch: {
        provider: "gemini",
        model: "gemini-embedding-001",
      },
    },
  },
}
```
Ollama:
```json5
{
  agents: {
    defaults: {
      memorySearch: {
        provider: "ollama",
        model: "nomic-embed-text",
      },
    },
  },
}
```
If you expect provider failover on runtime errors such as quota exhaustion,
pinning a provider alone is not enough. Configure an explicit fallback too:
```json5
{
  agents: {
    defaults: {
      memorySearch: {
        provider: "openai",
        fallback: "gemini",
      },
    },
  },
}
```
### Debugging provider issues
If Active Memory is slow, empty, or appears to switch providers unexpectedly:
- watch the gateway logs while reproducing the problem; look for lines such as
`active-memory: ... start|done`, `memory sync failed (search-bootstrap)`, or
provider-specific embedding errors
- turn on `/trace on` to surface the plugin-owned Active Memory debug summary in
the session
- turn on `/verbose on` if you also want the normal `🧩 Active Memory: ...`
status line after each reply
- run `openclaw memory status --deep` to inspect the current memory-search
backend and index health
- check `agents.defaults.memorySearch.provider` and related auth/config to make
sure the provider you expect is actually the one that can resolve at runtime
- if you use `ollama`, verify the configured embedding model is installed, for
example `ollama list`
Example debugging loop:
```text
1. Start the gateway and watch its logs
2. In the chat session, run /trace on
3. Send one message that should trigger Active Memory
4. Compare the chat-visible debug line with the gateway log lines
5. If provider choice is ambiguous, pin agents.defaults.memorySearch.provider explicitly
```
Example:
```json5
{
  agents: {
    defaults: {
      memorySearch: {
        provider: "ollama",
        model: "nomic-embed-text",
      },
    },
  },
}
```
Or, if you want Gemini embeddings:
```json5
{
  agents: {
    defaults: {
      memorySearch: {
        provider: "gemini",
      },
    },
  },
}
```
After changing the provider, restart the gateway and run a fresh test with
`/trace on` so the Active Memory debug line reflects the new embedding path.
<Accordion title="Recall feels slow, empty, or inconsistent">
- Turn on `/trace on` to surface the plugin-owned Active Memory debug
summary in the session.
- Turn on `/verbose on` to also see the `🧩 Active Memory: ...` status line
after each reply.
- Watch gateway logs for `active-memory: ... start|done`,
`memory sync failed (search-bootstrap)`, or provider embedding errors.
- Run `openclaw memory status --deep` to inspect the memory-search backend
and index health.
- If you use `ollama`, confirm the embedding model is installed
(`ollama list`).
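If provider selection still looks ambiguous after these checks, pinning it
explicitly is the quickest fix. A minimal sketch for a local Ollama setup
(assumes `nomic-embed-text` is already pulled):
```json5
{
  agents: {
    defaults: {
      memorySearch: {
        provider: "ollama",
        // Must match a model that `ollama list` actually shows.
        model: "nomic-embed-text",
      },
    },
  },
}
```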
</Accordion>
</AccordionGroup>
## Related pages