docs: split memory.md into concept intro + reference page
@@ -34,8 +34,8 @@ These files live under the workspace (`agents.defaults.workspace`, default

OpenClaw exposes two agent-facing tools for these Markdown files:

- `memory_search` -- semantic recall over indexed snippets.
- `memory_get` -- targeted read of a specific Markdown file/line range.

`memory_get` now **degrades gracefully when a file doesn't exist** (for example,
today's daily log before the first write). Both the builtin manager and the QMD
@@ -94,709 +94,15 @@ For the full compaction lifecycle, see

## Vector memory search

OpenClaw can build a small vector index over `MEMORY.md` and `memory/*.md` so
semantic queries can find related notes even when wording differs. Hybrid search
(BM25 + vector) is available for combining semantic matching with exact keyword
lookups.

Memory search supports multiple embedding providers (OpenAI, Gemini, Voyage,
Mistral, Ollama, and local GGUF models), an optional QMD sidecar backend for
advanced retrieval, and post-processing features like MMR diversity re-ranking
and temporal decay.

For the full configuration reference -- including embedding provider setup, QMD
backend, hybrid search tuning, multimodal memory, and all config knobs -- see
[Memory configuration reference](/reference/memory-config).
docs/reference/memory-config.md (new file, 711 lines)

@@ -0,0 +1,711 @@

---
title: "Memory configuration reference"
summary: "Full configuration reference for OpenClaw memory search, embedding providers, QMD backend, hybrid search, and multimodal memory"
read_when:
  - You want to configure memory search providers or embedding models
  - You want to set up the QMD backend
  - You want to tune hybrid search, MMR, or temporal decay
  - You want to enable multimodal memory indexing
---

# Memory configuration reference

This page covers the full configuration surface for OpenClaw memory search. For
the conceptual overview (file layout, memory tools, when to write memory, and the
automatic flush), see [Memory](/concepts/memory).

## Memory search defaults

- Enabled by default.
- Watches memory files for changes (debounced).
- Configure memory search under `agents.defaults.memorySearch` (not top-level
  `memorySearch`).
- Uses remote embeddings by default. If `memorySearch.provider` is not set, OpenClaw auto-selects:
  1. `local` if a `memorySearch.local.modelPath` is configured and the file exists.
  2. `openai` if an OpenAI key can be resolved.
  3. `gemini` if a Gemini key can be resolved.
  4. `voyage` if a Voyage key can be resolved.
  5. `mistral` if a Mistral key can be resolved.
  6. Otherwise memory search stays disabled until configured.
- Local mode uses node-llama-cpp and may require `pnpm approve-builds`.
- Uses sqlite-vec (when available) to accelerate vector search inside SQLite.
- `memorySearch.provider = "ollama"` is also supported for local/self-hosted
  Ollama embeddings (`/api/embeddings`), but it is not auto-selected.
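
To skip the auto-selection order above, pin a provider explicitly. A minimal
sketch (the keys come from this page; the chosen values are illustrative):

```json5
agents: {
  defaults: {
    memorySearch: {
      provider: "openai",          // skip auto-detection
      model: "text-embedding-3-small"
    }
  }
}
```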

Remote embeddings **require** an API key for the embedding provider. OpenClaw
resolves keys from auth profiles, `models.providers.*.apiKey`, or environment
variables. Codex OAuth only covers chat/completions and does **not** satisfy
embeddings for memory search. For Gemini, use `GEMINI_API_KEY` or
`models.providers.google.apiKey`. For Voyage, use `VOYAGE_API_KEY` or
`models.providers.voyage.apiKey`. For Mistral, use `MISTRAL_API_KEY` or
`models.providers.mistral.apiKey`. Ollama typically does not require a real API
key (a placeholder like `OLLAMA_API_KEY=ollama-local` is enough when needed by
local policy).

When using a custom OpenAI-compatible endpoint, set `memorySearch.remote.apiKey`
(and optional `memorySearch.remote.headers`).
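
For example, keys can be made resolvable via environment variables. The Gemini,
Voyage, Mistral, and Ollama names are documented above; `OPENAI_API_KEY` is the
standard OpenAI variable, and all values here are placeholders:

```bash
export OPENAI_API_KEY="sk-..."        # OpenAI embeddings
export GEMINI_API_KEY="your-key"      # Gemini embeddings
export VOYAGE_API_KEY="your-key"      # Voyage embeddings
export MISTRAL_API_KEY="your-key"     # Mistral embeddings
export OLLAMA_API_KEY="ollama-local"  # placeholder; Ollama usually needs no real key
```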

## QMD backend (experimental)

Set `memory.backend = "qmd"` to swap the built-in SQLite indexer for
[QMD](https://github.com/tobi/qmd): a local-first search sidecar that combines
BM25 + vectors + reranking. Markdown stays the source of truth; OpenClaw shells
out to QMD for retrieval. Key points:

### Prerequisites

- Disabled by default. Opt in per-config (`memory.backend = "qmd"`).
- Install the QMD CLI separately (`bun install -g https://github.com/tobi/qmd` or grab
  a release) and make sure the `qmd` binary is on the gateway's `PATH`.
- QMD needs an SQLite build that allows extensions (`brew install sqlite` on
  macOS).
- QMD runs fully locally via Bun + `node-llama-cpp` and auto-downloads GGUF
  models from HuggingFace on first use (no separate Ollama daemon required).
- The gateway runs QMD in a self-contained XDG home under
  `~/.openclaw/agents/<agentId>/qmd/` by setting `XDG_CONFIG_HOME` and
  `XDG_CACHE_HOME`.
- OS support: macOS and Linux work out of the box once Bun + SQLite are
  installed. Windows is best supported via WSL2.

### How the sidecar runs

- The gateway writes a self-contained QMD home under
  `~/.openclaw/agents/<agentId>/qmd/` (config + cache + sqlite DB).
- Collections are created via `qmd collection add` from `memory.qmd.paths`
  (plus default workspace memory files), then `qmd update` + `qmd embed` run
  on boot and on a configurable interval (`memory.qmd.update.interval`,
  default 5m).
- The gateway now initializes the QMD manager on startup, so periodic update
  timers are armed even before the first `memory_search` call.
- Boot refresh now runs in the background by default so chat startup is not
  blocked; set `memory.qmd.update.waitForBootSync = true` to keep the previous
  blocking behavior.
- Searches run via `memory.qmd.searchMode` (default `qmd search --json`; also
  supports `vsearch` and `query`). If the selected mode rejects flags on your
  QMD build, OpenClaw retries with `qmd query`. If QMD fails or the binary is
  missing, OpenClaw automatically falls back to the builtin SQLite manager so
  memory tools keep working.
- OpenClaw does not expose QMD embed batch-size tuning today; batch behavior is
  controlled by QMD itself.
- **First search may be slow**: QMD may download local GGUF models (reranker/query
  expansion) on the first `qmd query` run.
- OpenClaw sets `XDG_CONFIG_HOME`/`XDG_CACHE_HOME` automatically when it runs QMD.
- If you want to pre-download models manually (and warm the same index OpenClaw
  uses), run a one-off query with the agent's XDG dirs.

OpenClaw's QMD state lives under your **state dir** (defaults to `~/.openclaw`).
You can point `qmd` at the exact same index by exporting the same XDG vars
OpenClaw uses:

```bash
# Pick the same state dir OpenClaw uses
STATE_DIR="${OPENCLAW_STATE_DIR:-$HOME/.openclaw}"

export XDG_CONFIG_HOME="$STATE_DIR/agents/main/qmd/xdg-config"
export XDG_CACHE_HOME="$STATE_DIR/agents/main/qmd/xdg-cache"

# (Optional) force an index refresh + embeddings
qmd update
qmd embed

# Warm up / trigger first-time model downloads
qmd query "test" -c memory-root --json >/dev/null 2>&1
```

### Config surface (`memory.qmd.*`)

- `command` (default `qmd`): override the executable path.
- `searchMode` (default `search`): pick which QMD command backs
  `memory_search` (`search`, `vsearch`, `query`).
- `includeDefaultMemory` (default `true`): auto-index `MEMORY.md` + `memory/**/*.md`.
- `paths[]`: add extra directories/files (`path`, optional `pattern`, optional
  stable `name`).
- `sessions`: opt into session JSONL indexing (`enabled`, `retentionDays`,
  `exportDir`).
- `update`: controls refresh cadence and maintenance execution
  (`interval`, `debounceMs`, `onBoot`, `waitForBootSync`, `embedInterval`,
  `commandTimeoutMs`, `updateTimeoutMs`, `embedTimeoutMs`).
- `limits`: clamp recall payload (`maxResults`, `maxSnippetChars`,
  `maxInjectedChars`, `timeoutMs`).
- `scope`: same schema as [`session.sendPolicy`](/gateway/configuration-reference#session).
  Default is DM-only (`deny` all, `allow` direct chats); loosen it to surface QMD
  hits in groups/channels.
  - `match.keyPrefix` matches the **normalized** session key (lowercased, with any
    leading `agent:<id>:` stripped). Example: `discord:channel:`.
  - `match.rawKeyPrefix` matches the **raw** session key (lowercased), including
    `agent:<id>:`. Example: `agent:main:discord:`.
  - Legacy: `match.keyPrefix: "agent:..."` is still treated as a raw-key prefix,
    but prefer `rawKeyPrefix` for clarity.
- When `scope` denies a search, OpenClaw logs a warning with the derived
  `channel`/`chatType` so empty results are easier to debug.
- Snippets sourced outside the workspace show up as
  `qmd/<collection>/<relative-path>` in `memory_search` results; `memory_get`
  understands that prefix and reads from the configured QMD collection root.
- When `memory.qmd.sessions.enabled = true`, OpenClaw exports sanitized session
  transcripts (User/Assistant turns) into a dedicated QMD collection under
  `~/.openclaw/agents/<id>/qmd/sessions/`, so `memory_search` can recall recent
  conversations without touching the builtin SQLite index.
- `memory_search` snippets now include a `Source: <path#line>` footer when
  `memory.citations` is `auto`/`on`; set `memory.citations = "off"` to keep
  the path metadata internal (the agent still receives the path for
  `memory_get`, but the snippet text omits the footer and the system prompt
  warns the agent not to cite it).
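
For illustration, with citations on, a returned snippet might end with a footer
like this (hypothetical content and path, reusing the example data from later on
this page):

```
...moved the IoT devices onto VLAN 10 behind the Omada router...
Source: memory/2026-02-10.md#12
```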

### QMD example

```json5
memory: {
  backend: "qmd",
  citations: "auto",
  qmd: {
    includeDefaultMemory: true,
    update: { interval: "5m", debounceMs: 15000 },
    limits: { maxResults: 6, timeoutMs: 4000 },
    scope: {
      default: "deny",
      rules: [
        { action: "allow", match: { chatType: "direct" } },
        // Normalized session-key prefix (strips `agent:<id>:`).
        { action: "deny", match: { keyPrefix: "discord:channel:" } },
        // Raw session-key prefix (includes `agent:<id>:`).
        { action: "deny", match: { rawKeyPrefix: "agent:main:discord:" } },
      ]
    },
    paths: [
      { name: "docs", path: "~/notes", pattern: "**/*.md" }
    ]
  }
}
```

### Citations and fallback

- `memory.citations` applies regardless of backend (`auto`/`on`/`off`).
- When `qmd` runs, we tag `status().backend = "qmd"` so diagnostics show which
  engine served the results. If the QMD subprocess exits or JSON output can't be
  parsed, the search manager logs a warning and falls back to the builtin provider
  (existing Markdown embeddings) until QMD recovers.

## Additional memory paths

If you want to index Markdown files outside the default workspace layout, add
explicit paths:

```json5
agents: {
  defaults: {
    memorySearch: {
      extraPaths: ["../team-docs", "/srv/shared-notes/overview.md"]
    }
  }
}
```

Notes:

- Paths can be absolute or workspace-relative.
- Directories are scanned recursively for `.md` files.
- By default, only Markdown files are indexed.
- If `memorySearch.multimodal.enabled = true`, OpenClaw also indexes supported image/audio files under `extraPaths` only. Default memory roots (`MEMORY.md`, `memory.md`, `memory/**/*.md`) stay Markdown-only.
- Symlinks are ignored (files or directories).

## Multimodal memory files (Gemini image + audio)

OpenClaw can index image and audio files from `memorySearch.extraPaths` when using Gemini Embedding 2:

```json5
agents: {
  defaults: {
    memorySearch: {
      provider: "gemini",
      model: "gemini-embedding-2-preview",
      extraPaths: ["assets/reference", "voice-notes"],
      multimodal: {
        enabled: true,
        modalities: ["image", "audio"], // or ["all"]
        maxFileBytes: 10000000
      },
      remote: {
        apiKey: "YOUR_GEMINI_API_KEY"
      }
    }
  }
}
```

Notes:

- Multimodal memory is currently supported only for `gemini-embedding-2-preview`.
- Multimodal indexing applies only to files discovered through `memorySearch.extraPaths`.
- Supported modalities in this phase: image and audio.
- `memorySearch.fallback` must stay `"none"` while multimodal memory is enabled.
- Matching image/audio file bytes are uploaded to the configured Gemini embedding endpoint during indexing.
- Supported image extensions: `.jpg`, `.jpeg`, `.png`, `.webp`, `.gif`, `.heic`, `.heif`.
- Supported audio extensions: `.mp3`, `.wav`, `.ogg`, `.opus`, `.m4a`, `.aac`, `.flac`.
- Search queries remain text, but Gemini can compare those text queries against indexed image/audio embeddings.
- `memory_get` still reads Markdown only; binary files are searchable but not returned as raw file contents.

## Gemini embeddings (native)

Set the provider to `gemini` to use the Gemini embeddings API directly:

```json5
agents: {
  defaults: {
    memorySearch: {
      provider: "gemini",
      model: "gemini-embedding-001",
      remote: {
        apiKey: "YOUR_GEMINI_API_KEY"
      }
    }
  }
}
```

Notes:

- `remote.baseUrl` is optional (defaults to the Gemini API base URL).
- `remote.headers` lets you add extra headers if needed.
- Default model: `gemini-embedding-001`.
- `gemini-embedding-2-preview` is also supported: 8192-token limit and configurable dimensions (768 / 1536 / 3072, default 3072).

### Gemini Embedding 2 (preview)

```json5
agents: {
  defaults: {
    memorySearch: {
      provider: "gemini",
      model: "gemini-embedding-2-preview",
      outputDimensionality: 3072, // optional: 768, 1536, or 3072 (default)
      remote: {
        apiKey: "YOUR_GEMINI_API_KEY"
      }
    }
  }
}
```

> **Re-index required:** Switching from `gemini-embedding-001` (768 dimensions)
> to `gemini-embedding-2-preview` (3072 dimensions) changes the vector size. The
> same is true if you change `outputDimensionality` between 768, 1536, and 3072.
> OpenClaw will automatically reindex when it detects a model or dimension change.

## Custom OpenAI-compatible endpoint

If you want to use a custom OpenAI-compatible endpoint (OpenRouter, vLLM, or a proxy),
you can use the `remote` configuration with the OpenAI provider:

```json5
agents: {
  defaults: {
    memorySearch: {
      provider: "openai",
      model: "text-embedding-3-small",
      remote: {
        baseUrl: "https://api.example.com/v1/",
        apiKey: "YOUR_OPENAI_COMPAT_API_KEY",
        headers: { "X-Custom-Header": "value" }
      }
    }
  }
}
```

If you don't want to set an API key, use `memorySearch.provider = "local"` or set
`memorySearch.fallback = "none"`.

### Fallbacks

- `memorySearch.fallback` can be `openai`, `gemini`, `voyage`, `mistral`, `ollama`, `local`, or `none`.
- The fallback provider is only used when the primary embedding provider fails.

### Batch indexing (OpenAI + Gemini + Voyage)

- Disabled by default. Set `agents.defaults.memorySearch.remote.batch.enabled = true` to enable for large-corpus indexing (OpenAI, Gemini, and Voyage).
- Default behavior waits for batch completion; tune `remote.batch.wait`, `remote.batch.pollIntervalMs`, and `remote.batch.timeoutMinutes` if needed.
- Set `remote.batch.concurrency` to control how many batch jobs we submit in parallel (default: 2).
- Batch mode applies when `memorySearch.provider = "openai"` or `"gemini"` and uses the corresponding API key.
- Gemini batch jobs use the async embeddings batch endpoint and require Gemini Batch API availability.

Why OpenAI batch is fast and cheap:

- For large backfills, OpenAI is typically the fastest option we support because we can submit many embedding requests in a single batch job and let OpenAI process them asynchronously.
- OpenAI offers discounted pricing for Batch API workloads, so large indexing runs are usually cheaper than sending the same requests synchronously.
- See the OpenAI Batch API docs and pricing for details:
  - [https://platform.openai.com/docs/api-reference/batch](https://platform.openai.com/docs/api-reference/batch)
  - [https://platform.openai.com/pricing](https://platform.openai.com/pricing)

Config example:

```json5
agents: {
  defaults: {
    memorySearch: {
      provider: "openai",
      model: "text-embedding-3-small",
      fallback: "openai",
      remote: {
        batch: { enabled: true, concurrency: 2 }
      },
      sync: { watch: true }
    }
  }
}
```

## How the memory tools work

- `memory_search` semantically searches Markdown chunks (~400-token target, 80-token overlap) from `MEMORY.md` + `memory/**/*.md`. It returns snippet text (capped ~700 chars), file path, line range, score, provider/model, and whether we fell back from local to remote embeddings. No full file payload is returned.
- `memory_get` reads a specific memory Markdown file (workspace-relative), optionally from a starting line and for N lines. Paths outside `MEMORY.md` / `memory/` are rejected.
- Both tools are enabled only when `memorySearch.enabled` resolves true for the agent.

## What gets indexed (and when)

- File type: Markdown only (`MEMORY.md`, `memory/**/*.md`).
- Index storage: per-agent SQLite at `~/.openclaw/memory/<agentId>.sqlite` (configurable via `agents.defaults.memorySearch.store.path`, supports `{agentId}` token).
- Freshness: watcher on `MEMORY.md` + `memory/` marks the index dirty (debounce 1.5s). Sync is scheduled on session start, on search, or on an interval and runs asynchronously. Session transcripts use delta thresholds to trigger background sync.
- Reindex triggers: the index stores the embedding **provider/model + endpoint fingerprint + chunking params**. If any of those change, OpenClaw automatically resets and reindexes the entire store.
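
For example, to relocate the index while keeping one database per agent (the
path here is illustrative; `{agentId}` is expanded by OpenClaw):

```json5
agents: {
  defaults: {
    memorySearch: {
      store: { path: "/data/openclaw/memory/{agentId}.sqlite" }
    }
  }
}
```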

## Hybrid search (BM25 + vector)

When enabled, OpenClaw combines:

- **Vector similarity** (semantic match, wording can differ)
- **BM25 keyword relevance** (exact tokens like IDs, env vars, code symbols)

If full-text search is unavailable on your platform, OpenClaw falls back to vector-only search.

### Why hybrid

Vector search is great at "this means the same thing":

- "Mac Studio gateway host" vs "the machine running the gateway"
- "debounce file updates" vs "avoid indexing on every write"

But it can be weak at exact, high-signal tokens:

- IDs (`a828e60`, `b3b9895a...`)
- code symbols (`memorySearch.query.hybrid`)
- error strings ("sqlite-vec unavailable")

BM25 (full-text) is the opposite: strong at exact tokens, weaker at paraphrases.
Hybrid search is the pragmatic middle ground: **use both retrieval signals** so you get
good results for both "natural language" queries and "needle in a haystack" queries.

### How we merge results (the current design)

Implementation sketch:

1. Retrieve a candidate pool from both sides:
   - **Vector**: top `maxResults * candidateMultiplier` by cosine similarity.
   - **BM25**: top `maxResults * candidateMultiplier` by FTS5 BM25 rank (lower is better).
2. Convert BM25 rank into a 0..1-ish score:
   - `textScore = 1 / (1 + max(0, bm25Rank))`
3. Union candidates by chunk id and compute a weighted score:
   - `finalScore = vectorWeight * vectorScore + textWeight * textScore`

Notes:

- `vectorWeight` + `textWeight` is normalized to 1.0 in config resolution, so weights behave as percentages.
- If embeddings are unavailable (or the provider returns a zero-vector), we still run BM25 and return keyword matches.
- If FTS5 can't be created, we keep vector-only search (no hard failure).

This isn't "IR-theory perfect", but it's simple, fast, and tends to improve recall/precision on real notes.
If we want to get fancier later, common next steps are Reciprocal Rank Fusion (RRF) or score normalization
(min/max or z-score) before mixing.
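
A minimal TypeScript sketch of that merge, assuming candidate lists keyed by
chunk id (the types and names are illustrative, not OpenClaw's internals):

```typescript
interface Candidate { id: string; vectorScore?: number; bm25Rank?: number; }

function mergeHybrid(
  vector: Candidate[],
  keyword: Candidate[],
  vectorWeight = 0.7,
  textWeight = 0.3,
): Array<{ id: string; score: number }> {
  const byId = new Map<string, { vectorScore: number; textScore: number }>();
  for (const c of vector) {
    byId.set(c.id, { vectorScore: c.vectorScore ?? 0, textScore: 0 });
  }
  for (const c of keyword) {
    // BM25 rank -> 0..1-ish score: textScore = 1 / (1 + max(0, bm25Rank))
    const textScore = 1 / (1 + Math.max(0, c.bm25Rank ?? 0));
    const entry = byId.get(c.id) ?? { vectorScore: 0, textScore: 0 };
    entry.textScore = textScore;
    byId.set(c.id, entry);
  }
  // Union by chunk id, weight, and sort best-first.
  return [...byId.entries()]
    .map(([id, s]) => ({ id, score: vectorWeight * s.vectorScore + textWeight * s.textScore }))
    .sort((a, b) => b.score - a.score);
}
```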

### Post-processing pipeline

After merging vector and keyword scores, two optional post-processing stages
refine the result list before it reaches the agent:

```
Vector + Keyword -> Weighted Merge -> Temporal Decay -> Sort -> MMR -> Top-K Results
```

Both stages are **off by default** and can be enabled independently.

### MMR re-ranking (diversity)

When hybrid search returns results, multiple chunks may contain similar or overlapping content.
For example, searching for "home network setup" might return five nearly identical snippets
from different daily notes that all mention the same router configuration.

**MMR (Maximal Marginal Relevance)** re-ranks the results to balance relevance with diversity,
ensuring the top results cover different aspects of the query instead of repeating the same information.

How it works (see the sketch after this list):

1. Results are scored by their original relevance (vector + BM25 weighted score).
2. MMR iteratively selects results that maximize: `lambda * relevance - (1 - lambda) * max_similarity_to_selected`.
3. Similarity between results is measured using Jaccard text similarity on tokenized content.

The `lambda` parameter controls the trade-off:

- `lambda = 1.0` -- pure relevance (no diversity penalty)
- `lambda = 0.0` -- maximum diversity (ignores relevance)
- Default: `0.7` (balanced, slight relevance bias)
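
A minimal sketch of that selection loop (whitespace tokenization and the names
here are simplifying assumptions, not the exact implementation):

```typescript
interface Scored { id: string; text: string; score: number; }

// Jaccard similarity over token sets: |intersection| / |union|.
function jaccard(a: string, b: string): number {
  const ta = new Set(a.toLowerCase().split(/\s+/));
  const tb = new Set(b.toLowerCase().split(/\s+/));
  const inter = [...ta].filter((t) => tb.has(t)).length;
  const union = new Set([...ta, ...tb]).size;
  return union === 0 ? 0 : inter / union;
}

function mmr(results: Scored[], k: number, lambda = 0.7): Scored[] {
  const pool = [...results];
  const selected: Scored[] = [];
  while (selected.length < k && pool.length > 0) {
    let bestIdx = 0;
    let bestScore = -Infinity;
    for (let i = 0; i < pool.length; i++) {
      // Penalize candidates similar to anything already selected.
      const maxSim = selected.reduce((m, s) => Math.max(m, jaccard(pool[i].text, s.text)), 0);
      const score = lambda * pool[i].score - (1 - lambda) * maxSim;
      if (score > bestScore) { bestScore = score; bestIdx = i; }
    }
    selected.push(pool.splice(bestIdx, 1)[0]);
  }
  return selected;
}
```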

**Example -- query: "home network setup"**

Given these memory files:

```
memory/2026-02-10.md -> "Configured Omada router, set VLAN 10 for IoT devices"
memory/2026-02-08.md -> "Configured Omada router, moved IoT to VLAN 10"
memory/2026-02-05.md -> "Set up AdGuard DNS on 192.168.10.2"
memory/network.md    -> "Router: Omada ER605, AdGuard: 192.168.10.2, VLAN 10: IoT"
```

Without MMR -- top 3 results:

```
1. memory/2026-02-10.md (score: 0.92) <- router + VLAN
2. memory/2026-02-08.md (score: 0.89) <- router + VLAN (near-duplicate!)
3. memory/network.md    (score: 0.85) <- reference doc
```

With MMR (lambda=0.7) -- top 3 results:

```
1. memory/2026-02-10.md (score: 0.92) <- router + VLAN
2. memory/network.md    (score: 0.85) <- reference doc (diverse!)
3. memory/2026-02-05.md (score: 0.78) <- AdGuard DNS (diverse!)
```

The near-duplicate from Feb 8 drops out, and the agent gets three distinct pieces of information.

**When to enable:** turn on MMR if you notice `memory_search` returning redundant or near-duplicate snippets,
especially with daily notes that often repeat similar information across days.

### Temporal decay (recency boost)

Agents with daily notes accumulate hundreds of dated files over time. Without decay,
a well-worded note from six months ago can outrank yesterday's update on the same topic.

**Temporal decay** applies an exponential multiplier to scores based on the age of each result,
so recent memories naturally rank higher while old ones fade:

```
decayedScore = score * e^(-lambda * ageInDays)
```

where `lambda = ln(2) / halfLifeDays`.
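
A direct transcription of the formula (names mirror the equation above; this is
not the actual OpenClaw source):

```typescript
// decayedScore = score * e^(-lambda * ageInDays), with lambda = ln(2) / halfLifeDays.
function decayMultiplier(ageInDays: number, halfLifeDays = 30): number {
  const lambda = Math.log(2) / halfLifeDays;
  return Math.exp(-lambda * ageInDays);
}

decayMultiplier(7);   // ~0.85
decayMultiplier(30);  // 0.5 (one half-life)
decayMultiplier(90);  // 0.125
```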

With the default half-life of 30 days:

- Today's notes: **100%** of original score
- 7 days ago: **~85%**
- 30 days ago: **50%**
- 90 days ago: **12.5%**
- 180 days ago: **~1.6%**

**Evergreen files are never decayed:**

- `MEMORY.md` (root memory file)
- Non-dated files in `memory/` (e.g., `memory/projects.md`, `memory/network.md`)

These contain durable reference information that should always rank normally.

**Dated daily files** (`memory/YYYY-MM-DD.md`) use the date extracted from the filename.
Other sources (e.g., session transcripts) fall back to file modification time (`mtime`).

**Example -- query: "what's Rod's work schedule?"**

Given these memory files (today is Feb 10):

```
memory/2025-09-15.md -> "Rod works Mon-Fri, standup at 10am, pairing at 2pm" (148 days old)
memory/2026-02-10.md -> "Rod has standup at 14:15, 1:1 with Zeb at 14:45" (today)
memory/2026-02-03.md -> "Rod started new team, standup moved to 14:15" (7 days old)
```

Without decay:

```
1. memory/2025-09-15.md (score: 0.91) <- best semantic match, but stale!
2. memory/2026-02-10.md (score: 0.82)
3. memory/2026-02-03.md (score: 0.80)
```

With decay (halfLife=30):

```
1. memory/2026-02-10.md (score: 0.82 * 1.00 = 0.82) <- today, no decay
2. memory/2026-02-03.md (score: 0.80 * 0.85 = 0.68) <- 7 days, mild decay
3. memory/2025-09-15.md (score: 0.91 * 0.03 = 0.03) <- 148 days, nearly gone
```

The stale September note drops to the bottom despite having the best raw semantic match.

**When to enable:** turn on decay if your agent has months of daily notes and you find that old,
stale information outranks recent context. A half-life of 30 days works well for
daily-note-heavy workflows; increase it (e.g., 90 days) if you reference older notes frequently.

### Hybrid search configuration

Both features are configured under `memorySearch.query.hybrid`:

```json5
agents: {
  defaults: {
    memorySearch: {
      query: {
        hybrid: {
          enabled: true,
          vectorWeight: 0.7,
          textWeight: 0.3,
          candidateMultiplier: 4,
          // Diversity: reduce redundant results
          mmr: {
            enabled: true, // default: false
            lambda: 0.7    // 0 = max diversity, 1 = max relevance
          },
          // Recency: boost newer memories
          temporalDecay: {
            enabled: true,   // default: false
            halfLifeDays: 30 // score halves every 30 days
          }
        }
      }
    }
  }
}
```

You can enable either feature independently:

- **MMR only** -- useful when you have many similar notes but age doesn't matter.
- **Temporal decay only** -- useful when recency matters but your results are already diverse.
- **Both** -- recommended for agents with large, long-running daily note histories.

## Embedding cache

OpenClaw can cache **chunk embeddings** in SQLite so reindexing and frequent updates (especially session transcripts) don't re-embed unchanged text.

Config:

```json5
agents: {
  defaults: {
    memorySearch: {
      cache: {
        enabled: true,
        maxEntries: 50000
      }
    }
  }
}
```

## Session memory search (experimental)

You can optionally index **session transcripts** and surface them via `memory_search`.
This is gated behind an experimental flag.

```json5
agents: {
  defaults: {
    memorySearch: {
      experimental: { sessionMemory: true },
      sources: ["memory", "sessions"]
    }
  }
}
```

Notes:

- Session indexing is **opt-in** (off by default).
- Session updates are debounced and **indexed asynchronously** once they cross delta thresholds (best-effort).
- `memory_search` never blocks on indexing; results can be slightly stale until background sync finishes.
- Results still include snippets only; `memory_get` remains limited to memory files.
- Session indexing is isolated per agent (only that agent's session logs are indexed).
- Session logs live on disk (`~/.openclaw/agents/<agentId>/sessions/*.jsonl`). Any process/user with filesystem access can read them, so treat disk access as the trust boundary. For stricter isolation, run agents under separate OS users or hosts.

Delta thresholds (defaults shown):

```json5
agents: {
  defaults: {
    memorySearch: {
      sync: {
        sessions: {
          deltaBytes: 100000, // ~100 KB
          deltaMessages: 50   // JSONL lines
        }
      }
    }
  }
}
```

## SQLite vector acceleration (sqlite-vec)

When the sqlite-vec extension is available, OpenClaw stores embeddings in a
SQLite virtual table (`vec0`) and performs vector distance queries in the
database. This keeps search fast without loading every embedding into JS.

Configuration (optional):

```json5
agents: {
  defaults: {
    memorySearch: {
      store: {
        vector: {
          enabled: true,
          extensionPath: "/path/to/sqlite-vec"
        }
      }
    }
  }
}
```

Notes:

- `enabled` defaults to true; when disabled, search falls back to in-process
  cosine similarity over stored embeddings.
- If the sqlite-vec extension is missing or fails to load, OpenClaw logs the
  error and continues with the JS fallback (no vector table).
- `extensionPath` overrides the bundled sqlite-vec path (useful for custom builds
  or non-standard install locations).

## Local embedding auto-download

- Default local embedding model: `hf:ggml-org/embeddinggemma-300m-qat-q8_0-GGUF/embeddinggemma-300m-qat-Q8_0.gguf` (~0.6 GB).
- When `memorySearch.provider = "local"`, `node-llama-cpp` resolves `modelPath`; if the GGUF is missing it **auto-downloads** to the cache (or `local.modelCacheDir` if set), then loads it. Downloads resume on retry.
- Native build requirement: run `pnpm approve-builds`, pick `node-llama-cpp`, then `pnpm rebuild node-llama-cpp`.
- Fallback: if local setup fails and `memorySearch.fallback = "openai"`, we automatically switch to remote embeddings (`openai/text-embedding-3-small` unless overridden) and record the reason.
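
A minimal local-mode config (keys as documented above; the `hf:` URI shown is
the default model, spelled out explicitly for clarity):

```json5
agents: {
  defaults: {
    memorySearch: {
      provider: "local",
      local: {
        modelPath: "hf:ggml-org/embeddinggemma-300m-qat-q8_0-GGUF/embeddinggemma-300m-qat-Q8_0.gguf"
      },
      fallback: "none" // optional: avoid remote fallback
    }
  }
}
```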

## Custom OpenAI-compatible endpoint example

```json5
agents: {
  defaults: {
    memorySearch: {
      provider: "openai",
      model: "text-embedding-3-small",
      remote: {
        baseUrl: "https://api.example.com/v1/",
        apiKey: "YOUR_REMOTE_API_KEY",
        headers: {
          "X-Organization": "org-id",
          "X-Project": "project-id"
        }
      }
    }
  }
}
```

Notes:

- `remote.*` takes precedence over `models.providers.openai.*`.
- `remote.headers` merge with OpenAI headers; remote wins on key conflicts. Omit `remote.headers` to use the OpenAI defaults.