From b28cf6a8a4d5096b31e2ae3d966788353ea33742 Mon Sep 17 00:00:00 2001 From: Vincent Koc Date: Thu, 19 Mar 2026 10:57:47 -0700 Subject: [PATCH] docs: split memory.md into concept intro + reference page --- docs/concepts/memory.md | 722 +------------------------------- docs/reference/memory-config.md | 711 +++++++++++++++++++++++++++++++ 2 files changed, 725 insertions(+), 708 deletions(-) create mode 100644 docs/reference/memory-config.md diff --git a/docs/concepts/memory.md b/docs/concepts/memory.md index e020d4a9a49..f1f60071bc2 100644 --- a/docs/concepts/memory.md +++ b/docs/concepts/memory.md @@ -34,8 +34,8 @@ These files live under the workspace (`agents.defaults.workspace`, default OpenClaw exposes two agent-facing tools for these Markdown files: -- `memory_search` — semantic recall over indexed snippets. -- `memory_get` — targeted read of a specific Markdown file/line range. +- `memory_search` -- semantic recall over indexed snippets. +- `memory_get` -- targeted read of a specific Markdown file/line range. `memory_get` now **degrades gracefully when a file doesn't exist** (for example, today's daily log before the first write). Both the builtin manager and the QMD @@ -94,709 +94,15 @@ For the full compaction lifecycle, see ## Vector memory search OpenClaw can build a small vector index over `MEMORY.md` and `memory/*.md` so -semantic queries can find related notes even when wording differs. - -Defaults: - -- Enabled by default. -- Watches memory files for changes (debounced). -- Configure memory search under `agents.defaults.memorySearch` (not top-level - `memorySearch`). -- Uses remote embeddings by default. If `memorySearch.provider` is not set, OpenClaw auto-selects: - 1. `local` if a `memorySearch.local.modelPath` is configured and the file exists. - 2. `openai` if an OpenAI key can be resolved. - 3. `gemini` if a Gemini key can be resolved. - 4. `voyage` if a Voyage key can be resolved. - 5. `mistral` if a Mistral key can be resolved. - 6. Otherwise memory search stays disabled until configured. -- Local mode uses node-llama-cpp and may require `pnpm approve-builds`. -- Uses sqlite-vec (when available) to accelerate vector search inside SQLite. -- `memorySearch.provider = "ollama"` is also supported for local/self-hosted - Ollama embeddings (`/api/embeddings`), but it is not auto-selected. - -Remote embeddings **require** an API key for the embedding provider. OpenClaw -resolves keys from auth profiles, `models.providers.*.apiKey`, or environment -variables. Codex OAuth only covers chat/completions and does **not** satisfy -embeddings for memory search. For Gemini, use `GEMINI_API_KEY` or -`models.providers.google.apiKey`. For Voyage, use `VOYAGE_API_KEY` or -`models.providers.voyage.apiKey`. For Mistral, use `MISTRAL_API_KEY` or -`models.providers.mistral.apiKey`. Ollama typically does not require a real API -key (a placeholder like `OLLAMA_API_KEY=ollama-local` is enough when needed by -local policy). -When using a custom OpenAI-compatible endpoint, -set `memorySearch.remote.apiKey` (and optional `memorySearch.remote.headers`). - -### QMD backend (experimental) - -Set `memory.backend = "qmd"` to swap the built-in SQLite indexer for -[QMD](https://github.com/tobi/qmd): a local-first search sidecar that combines -BM25 + vectors + reranking. Markdown stays the source of truth; OpenClaw shells -out to QMD for retrieval. Key points: - -**Prereqs** - -- Disabled by default. Opt in per-config (`memory.backend = "qmd"`). -- Install the QMD CLI separately (`bun install -g https://github.com/tobi/qmd` or grab - a release) and make sure the `qmd` binary is on the gateway’s `PATH`. -- QMD needs an SQLite build that allows extensions (`brew install sqlite` on - macOS). -- QMD runs fully locally via Bun + `node-llama-cpp` and auto-downloads GGUF - models from HuggingFace on first use (no separate Ollama daemon required). -- The gateway runs QMD in a self-contained XDG home under - `~/.openclaw/agents//qmd/` by setting `XDG_CONFIG_HOME` and - `XDG_CACHE_HOME`. -- OS support: macOS and Linux work out of the box once Bun + SQLite are - installed. Windows is best supported via WSL2. - -**How the sidecar runs** - -- The gateway writes a self-contained QMD home under - `~/.openclaw/agents//qmd/` (config + cache + sqlite DB). -- Collections are created via `qmd collection add` from `memory.qmd.paths` - (plus default workspace memory files), then `qmd update` + `qmd embed` run - on boot and on a configurable interval (`memory.qmd.update.interval`, - default 5 m). -- The gateway now initializes the QMD manager on startup, so periodic update - timers are armed even before the first `memory_search` call. -- Boot refresh now runs in the background by default so chat startup is not - blocked; set `memory.qmd.update.waitForBootSync = true` to keep the previous - blocking behavior. -- Searches run via `memory.qmd.searchMode` (default `qmd search --json`; also - supports `vsearch` and `query`). If the selected mode rejects flags on your - QMD build, OpenClaw retries with `qmd query`. If QMD fails or the binary is - missing, OpenClaw automatically falls back to the builtin SQLite manager so - memory tools keep working. -- OpenClaw does not expose QMD embed batch-size tuning today; batch behavior is - controlled by QMD itself. -- **First search may be slow**: QMD may download local GGUF models (reranker/query - expansion) on the first `qmd query` run. - - OpenClaw sets `XDG_CONFIG_HOME`/`XDG_CACHE_HOME` automatically when it runs QMD. - - If you want to pre-download models manually (and warm the same index OpenClaw - uses), run a one-off query with the agent’s XDG dirs. - - OpenClaw’s QMD state lives under your **state dir** (defaults to `~/.openclaw`). - You can point `qmd` at the exact same index by exporting the same XDG vars - OpenClaw uses: - - ```bash - # Pick the same state dir OpenClaw uses - STATE_DIR="${OPENCLAW_STATE_DIR:-$HOME/.openclaw}" - - export XDG_CONFIG_HOME="$STATE_DIR/agents/main/qmd/xdg-config" - export XDG_CACHE_HOME="$STATE_DIR/agents/main/qmd/xdg-cache" - - # (Optional) force an index refresh + embeddings - qmd update - qmd embed - - # Warm up / trigger first-time model downloads - qmd query "test" -c memory-root --json >/dev/null 2>&1 - ``` - -**Config surface (`memory.qmd.*`)** - -- `command` (default `qmd`): override the executable path. -- `searchMode` (default `search`): pick which QMD command backs - `memory_search` (`search`, `vsearch`, `query`). -- `includeDefaultMemory` (default `true`): auto-index `MEMORY.md` + `memory/**/*.md`. -- `paths[]`: add extra directories/files (`path`, optional `pattern`, optional - stable `name`). -- `sessions`: opt into session JSONL indexing (`enabled`, `retentionDays`, - `exportDir`). -- `update`: controls refresh cadence and maintenance execution: - (`interval`, `debounceMs`, `onBoot`, `waitForBootSync`, `embedInterval`, - `commandTimeoutMs`, `updateTimeoutMs`, `embedTimeoutMs`). -- `limits`: clamp recall payload (`maxResults`, `maxSnippetChars`, - `maxInjectedChars`, `timeoutMs`). -- `scope`: same schema as [`session.sendPolicy`](/gateway/configuration-reference#session). - Default is DM-only (`deny` all, `allow` direct chats); loosen it to surface QMD - hits in groups/channels. - - `match.keyPrefix` matches the **normalized** session key (lowercased, with any - leading `agent::` stripped). Example: `discord:channel:`. - - `match.rawKeyPrefix` matches the **raw** session key (lowercased), including - `agent::`. Example: `agent:main:discord:`. - - Legacy: `match.keyPrefix: "agent:..."` is still treated as a raw-key prefix, - but prefer `rawKeyPrefix` for clarity. -- When `scope` denies a search, OpenClaw logs a warning with the derived - `channel`/`chatType` so empty results are easier to debug. -- Snippets sourced outside the workspace show up as - `qmd//` in `memory_search` results; `memory_get` - understands that prefix and reads from the configured QMD collection root. -- When `memory.qmd.sessions.enabled = true`, OpenClaw exports sanitized session - transcripts (User/Assistant turns) into a dedicated QMD collection under - `~/.openclaw/agents//qmd/sessions/`, so `memory_search` can recall recent - conversations without touching the builtin SQLite index. -- `memory_search` snippets now include a `Source: ` footer when - `memory.citations` is `auto`/`on`; set `memory.citations = "off"` to keep - the path metadata internal (the agent still receives the path for - `memory_get`, but the snippet text omits the footer and the system prompt - warns the agent not to cite it). - -**Example** - -```json5 -memory: { - backend: "qmd", - citations: "auto", - qmd: { - includeDefaultMemory: true, - update: { interval: "5m", debounceMs: 15000 }, - limits: { maxResults: 6, timeoutMs: 4000 }, - scope: { - default: "deny", - rules: [ - { action: "allow", match: { chatType: "direct" } }, - // Normalized session-key prefix (strips `agent::`). - { action: "deny", match: { keyPrefix: "discord:channel:" } }, - // Raw session-key prefix (includes `agent::`). - { action: "deny", match: { rawKeyPrefix: "agent:main:discord:" } }, - ] - }, - paths: [ - { name: "docs", path: "~/notes", pattern: "**/*.md" } - ] - } -} -``` - -**Citations & fallback** - -- `memory.citations` applies regardless of backend (`auto`/`on`/`off`). -- When `qmd` runs, we tag `status().backend = "qmd"` so diagnostics show which - engine served the results. If the QMD subprocess exits or JSON output can’t be - parsed, the search manager logs a warning and returns the builtin provider - (existing Markdown embeddings) until QMD recovers. - -### Additional memory paths - -If you want to index Markdown files outside the default workspace layout, add -explicit paths: - -```json5 -agents: { - defaults: { - memorySearch: { - extraPaths: ["../team-docs", "/srv/shared-notes/overview.md"] - } - } -} -``` - -Notes: - -- Paths can be absolute or workspace-relative. -- Directories are scanned recursively for `.md` files. -- By default, only Markdown files are indexed. -- If `memorySearch.multimodal.enabled = true`, OpenClaw also indexes supported image/audio files under `extraPaths` only. Default memory roots (`MEMORY.md`, `memory.md`, `memory/**/*.md`) stay Markdown-only. -- Symlinks are ignored (files or directories). - -### Multimodal memory files (Gemini image + audio) - -OpenClaw can index image and audio files from `memorySearch.extraPaths` when using Gemini embedding 2: - -```json5 -agents: { - defaults: { - memorySearch: { - provider: "gemini", - model: "gemini-embedding-2-preview", - extraPaths: ["assets/reference", "voice-notes"], - multimodal: { - enabled: true, - modalities: ["image", "audio"], // or ["all"] - maxFileBytes: 10000000 - }, - remote: { - apiKey: "YOUR_GEMINI_API_KEY" - } - } - } -} -``` - -Notes: - -- Multimodal memory is currently supported only for `gemini-embedding-2-preview`. -- Multimodal indexing applies only to files discovered through `memorySearch.extraPaths`. -- Supported modalities in this phase: image and audio. -- `memorySearch.fallback` must stay `"none"` while multimodal memory is enabled. -- Matching image/audio file bytes are uploaded to the configured Gemini embedding endpoint during indexing. -- Supported image extensions: `.jpg`, `.jpeg`, `.png`, `.webp`, `.gif`, `.heic`, `.heif`. -- Supported audio extensions: `.mp3`, `.wav`, `.ogg`, `.opus`, `.m4a`, `.aac`, `.flac`. -- Search queries remain text, but Gemini can compare those text queries against indexed image/audio embeddings. -- `memory_get` still reads Markdown only; binary files are searchable but not returned as raw file contents. - -### Gemini embeddings (native) - -Set the provider to `gemini` to use the Gemini embeddings API directly: - -```json5 -agents: { - defaults: { - memorySearch: { - provider: "gemini", - model: "gemini-embedding-001", - remote: { - apiKey: "YOUR_GEMINI_API_KEY" - } - } - } -} -``` - -Notes: - -- `remote.baseUrl` is optional (defaults to the Gemini API base URL). -- `remote.headers` lets you add extra headers if needed. -- Default model: `gemini-embedding-001`. -- `gemini-embedding-2-preview` is also supported: 8192 token limit and configurable dimensions (768 / 1536 / 3072, default 3072). - -#### Gemini Embedding 2 (preview) - -```json5 -agents: { - defaults: { - memorySearch: { - provider: "gemini", - model: "gemini-embedding-2-preview", - outputDimensionality: 3072, // optional: 768, 1536, or 3072 (default) - remote: { - apiKey: "YOUR_GEMINI_API_KEY" - } - } - } -} -``` - -> **⚠️ Re-index required:** Switching from `gemini-embedding-001` (768 dimensions) -> to `gemini-embedding-2-preview` (3072 dimensions) changes the vector size. The same is true if you -> change `outputDimensionality` between 768, 1536, and 3072. -> OpenClaw will automatically reindex when it detects a model or dimension change. - -If you want to use a **custom OpenAI-compatible endpoint** (OpenRouter, vLLM, or a proxy), -you can use the `remote` configuration with the OpenAI provider: - -```json5 -agents: { - defaults: { - memorySearch: { - provider: "openai", - model: "text-embedding-3-small", - remote: { - baseUrl: "https://api.example.com/v1/", - apiKey: "YOUR_OPENAI_COMPAT_API_KEY", - headers: { "X-Custom-Header": "value" } - } - } - } -} -``` - -If you don't want to set an API key, use `memorySearch.provider = "local"` or set -`memorySearch.fallback = "none"`. - -Fallbacks: - -- `memorySearch.fallback` can be `openai`, `gemini`, `voyage`, `mistral`, `ollama`, `local`, or `none`. -- The fallback provider is only used when the primary embedding provider fails. - -Batch indexing (OpenAI + Gemini + Voyage): - -- Disabled by default. Set `agents.defaults.memorySearch.remote.batch.enabled = true` to enable for large-corpus indexing (OpenAI, Gemini, and Voyage). -- Default behavior waits for batch completion; tune `remote.batch.wait`, `remote.batch.pollIntervalMs`, and `remote.batch.timeoutMinutes` if needed. -- Set `remote.batch.concurrency` to control how many batch jobs we submit in parallel (default: 2). -- Batch mode applies when `memorySearch.provider = "openai"` or `"gemini"` and uses the corresponding API key. -- Gemini batch jobs use the async embeddings batch endpoint and require Gemini Batch API availability. - -Why OpenAI batch is fast + cheap: - -- For large backfills, OpenAI is typically the fastest option we support because we can submit many embedding requests in a single batch job and let OpenAI process them asynchronously. -- OpenAI offers discounted pricing for Batch API workloads, so large indexing runs are usually cheaper than sending the same requests synchronously. -- See the OpenAI Batch API docs and pricing for details: - - [https://platform.openai.com/docs/api-reference/batch](https://platform.openai.com/docs/api-reference/batch) - - [https://platform.openai.com/pricing](https://platform.openai.com/pricing) - -Config example: - -```json5 -agents: { - defaults: { - memorySearch: { - provider: "openai", - model: "text-embedding-3-small", - fallback: "openai", - remote: { - batch: { enabled: true, concurrency: 2 } - }, - sync: { watch: true } - } - } -} -``` - -Tools: - -- `memory_search` — returns snippets with file + line ranges. -- `memory_get` — read memory file content by path. - -Local mode: - -- Set `agents.defaults.memorySearch.provider = "local"`. -- Provide `agents.defaults.memorySearch.local.modelPath` (GGUF or `hf:` URI). -- Optional: set `agents.defaults.memorySearch.fallback = "none"` to avoid remote fallback. - -### How the memory tools work - -- `memory_search` semantically searches Markdown chunks (~400 token target, 80-token overlap) from `MEMORY.md` + `memory/**/*.md`. It returns snippet text (capped ~700 chars), file path, line range, score, provider/model, and whether we fell back from local → remote embeddings. No full file payload is returned. -- `memory_get` reads a specific memory Markdown file (workspace-relative), optionally from a starting line and for N lines. Paths outside `MEMORY.md` / `memory/` are rejected. -- Both tools are enabled only when `memorySearch.enabled` resolves true for the agent. - -### What gets indexed (and when) - -- File type: Markdown only (`MEMORY.md`, `memory/**/*.md`). -- Index storage: per-agent SQLite at `~/.openclaw/memory/.sqlite` (configurable via `agents.defaults.memorySearch.store.path`, supports `{agentId}` token). -- Freshness: watcher on `MEMORY.md` + `memory/` marks the index dirty (debounce 1.5s). Sync is scheduled on session start, on search, or on an interval and runs asynchronously. Session transcripts use delta thresholds to trigger background sync. -- Reindex triggers: the index stores the embedding **provider/model + endpoint fingerprint + chunking params**. If any of those change, OpenClaw automatically resets and reindexes the entire store. - -### Hybrid search (BM25 + vector) - -When enabled, OpenClaw combines: - -- **Vector similarity** (semantic match, wording can differ) -- **BM25 keyword relevance** (exact tokens like IDs, env vars, code symbols) - -If full-text search is unavailable on your platform, OpenClaw falls back to vector-only search. - -#### Why hybrid? - -Vector search is great at “this means the same thing”: - -- “Mac Studio gateway host” vs “the machine running the gateway” -- “debounce file updates” vs “avoid indexing on every write” - -But it can be weak at exact, high-signal tokens: - -- IDs (`a828e60`, `b3b9895a…`) -- code symbols (`memorySearch.query.hybrid`) -- error strings ("sqlite-vec unavailable") - -BM25 (full-text) is the opposite: strong at exact tokens, weaker at paraphrases. -Hybrid search is the pragmatic middle ground: **use both retrieval signals** so you get -good results for both "natural language" queries and "needle in a haystack" queries. - -#### How we merge results (the current design) - -Implementation sketch: - -1. Retrieve a candidate pool from both sides: - -- **Vector**: top `maxResults * candidateMultiplier` by cosine similarity. -- **BM25**: top `maxResults * candidateMultiplier` by FTS5 BM25 rank (lower is better). - -2. Convert BM25 rank into a 0..1-ish score: - -- `textScore = 1 / (1 + max(0, bm25Rank))` - -3. Union candidates by chunk id and compute a weighted score: - -- `finalScore = vectorWeight * vectorScore + textWeight * textScore` - -Notes: - -- `vectorWeight` + `textWeight` is normalized to 1.0 in config resolution, so weights behave as percentages. -- If embeddings are unavailable (or the provider returns a zero-vector), we still run BM25 and return keyword matches. -- If FTS5 can't be created, we keep vector-only search (no hard failure). - -This isn't "IR-theory perfect", but it's simple, fast, and tends to improve recall/precision on real notes. -If we want to get fancier later, common next steps are Reciprocal Rank Fusion (RRF) or score normalization -(min/max or z-score) before mixing. - -#### Post-processing pipeline - -After merging vector and keyword scores, two optional post-processing stages -refine the result list before it reaches the agent: - -``` -Vector + Keyword → Weighted Merge → Temporal Decay → Sort → MMR → Top-K Results -``` - -Both stages are **off by default** and can be enabled independently. - -#### MMR re-ranking (diversity) - -When hybrid search returns results, multiple chunks may contain similar or overlapping content. -For example, searching for "home network setup" might return five nearly identical snippets -from different daily notes that all mention the same router configuration. - -**MMR (Maximal Marginal Relevance)** re-ranks the results to balance relevance with diversity, -ensuring the top results cover different aspects of the query instead of repeating the same information. - -How it works: - -1. Results are scored by their original relevance (vector + BM25 weighted score). -2. MMR iteratively selects results that maximize: `λ × relevance − (1−λ) × max_similarity_to_selected`. -3. Similarity between results is measured using Jaccard text similarity on tokenized content. - -The `lambda` parameter controls the trade-off: - -- `lambda = 1.0` → pure relevance (no diversity penalty) -- `lambda = 0.0` → maximum diversity (ignores relevance) -- Default: `0.7` (balanced, slight relevance bias) - -**Example — query: "home network setup"** - -Given these memory files: - -``` -memory/2026-02-10.md → "Configured Omada router, set VLAN 10 for IoT devices" -memory/2026-02-08.md → "Configured Omada router, moved IoT to VLAN 10" -memory/2026-02-05.md → "Set up AdGuard DNS on 192.168.10.2" -memory/network.md → "Router: Omada ER605, AdGuard: 192.168.10.2, VLAN 10: IoT" -``` - -Without MMR — top 3 results: - -``` -1. memory/2026-02-10.md (score: 0.92) ← router + VLAN -2. memory/2026-02-08.md (score: 0.89) ← router + VLAN (near-duplicate!) -3. memory/network.md (score: 0.85) ← reference doc -``` - -With MMR (λ=0.7) — top 3 results: - -``` -1. memory/2026-02-10.md (score: 0.92) ← router + VLAN -2. memory/network.md (score: 0.85) ← reference doc (diverse!) -3. memory/2026-02-05.md (score: 0.78) ← AdGuard DNS (diverse!) -``` - -The near-duplicate from Feb 8 drops out, and the agent gets three distinct pieces of information. - -**When to enable:** If you notice `memory_search` returning redundant or near-duplicate snippets, -especially with daily notes that often repeat similar information across days. - -#### Temporal decay (recency boost) - -Agents with daily notes accumulate hundreds of dated files over time. Without decay, -a well-worded note from six months ago can outrank yesterday's update on the same topic. - -**Temporal decay** applies an exponential multiplier to scores based on the age of each result, -so recent memories naturally rank higher while old ones fade: - -``` -decayedScore = score × e^(-λ × ageInDays) -``` - -where `λ = ln(2) / halfLifeDays`. - -With the default half-life of 30 days: - -- Today's notes: **100%** of original score -- 7 days ago: **~84%** -- 30 days ago: **50%** -- 90 days ago: **12.5%** -- 180 days ago: **~1.6%** - -**Evergreen files are never decayed:** - -- `MEMORY.md` (root memory file) -- Non-dated files in `memory/` (e.g., `memory/projects.md`, `memory/network.md`) -- These contain durable reference information that should always rank normally. - -**Dated daily files** (`memory/YYYY-MM-DD.md`) use the date extracted from the filename. -Other sources (e.g., session transcripts) fall back to file modification time (`mtime`). - -**Example — query: "what's Rod's work schedule?"** - -Given these memory files (today is Feb 10): - -``` -memory/2025-09-15.md → "Rod works Mon-Fri, standup at 10am, pairing at 2pm" (148 days old) -memory/2026-02-10.md → "Rod has standup at 14:15, 1:1 with Zeb at 14:45" (today) -memory/2026-02-03.md → "Rod started new team, standup moved to 14:15" (7 days old) -``` - -Without decay: - -``` -1. memory/2025-09-15.md (score: 0.91) ← best semantic match, but stale! -2. memory/2026-02-10.md (score: 0.82) -3. memory/2026-02-03.md (score: 0.80) -``` - -With decay (halfLife=30): - -``` -1. memory/2026-02-10.md (score: 0.82 × 1.00 = 0.82) ← today, no decay -2. memory/2026-02-03.md (score: 0.80 × 0.85 = 0.68) ← 7 days, mild decay -3. memory/2025-09-15.md (score: 0.91 × 0.03 = 0.03) ← 148 days, nearly gone -``` - -The stale September note drops to the bottom despite having the best raw semantic match. - -**When to enable:** If your agent has months of daily notes and you find that old, -stale information outranks recent context. A half-life of 30 days works well for -daily-note-heavy workflows; increase it (e.g., 90 days) if you reference older notes frequently. - -#### Configuration - -Both features are configured under `memorySearch.query.hybrid`: - -```json5 -agents: { - defaults: { - memorySearch: { - query: { - hybrid: { - enabled: true, - vectorWeight: 0.7, - textWeight: 0.3, - candidateMultiplier: 4, - // Diversity: reduce redundant results - mmr: { - enabled: true, // default: false - lambda: 0.7 // 0 = max diversity, 1 = max relevance - }, - // Recency: boost newer memories - temporalDecay: { - enabled: true, // default: false - halfLifeDays: 30 // score halves every 30 days - } - } - } - } - } -} -``` - -You can enable either feature independently: - -- **MMR only** — useful when you have many similar notes but age doesn't matter. -- **Temporal decay only** — useful when recency matters but your results are already diverse. -- **Both** — recommended for agents with large, long-running daily note histories. - -### Embedding cache - -OpenClaw can cache **chunk embeddings** in SQLite so reindexing and frequent updates (especially session transcripts) don't re-embed unchanged text. - -Config: - -```json5 -agents: { - defaults: { - memorySearch: { - cache: { - enabled: true, - maxEntries: 50000 - } - } - } -} -``` - -### Session memory search (experimental) - -You can optionally index **session transcripts** and surface them via `memory_search`. -This is gated behind an experimental flag. - -```json5 -agents: { - defaults: { - memorySearch: { - experimental: { sessionMemory: true }, - sources: ["memory", "sessions"] - } - } -} -``` - -Notes: - -- Session indexing is **opt-in** (off by default). -- Session updates are debounced and **indexed asynchronously** once they cross delta thresholds (best-effort). -- `memory_search` never blocks on indexing; results can be slightly stale until background sync finishes. -- Results still include snippets only; `memory_get` remains limited to memory files. -- Session indexing is isolated per agent (only that agent’s session logs are indexed). -- Session logs live on disk (`~/.openclaw/agents//sessions/*.jsonl`). Any process/user with filesystem access can read them, so treat disk access as the trust boundary. For stricter isolation, run agents under separate OS users or hosts. - -Delta thresholds (defaults shown): - -```json5 -agents: { - defaults: { - memorySearch: { - sync: { - sessions: { - deltaBytes: 100000, // ~100 KB - deltaMessages: 50 // JSONL lines - } - } - } - } -} -``` - -### SQLite vector acceleration (sqlite-vec) - -When the sqlite-vec extension is available, OpenClaw stores embeddings in a -SQLite virtual table (`vec0`) and performs vector distance queries in the -database. This keeps search fast without loading every embedding into JS. - -Configuration (optional): - -```json5 -agents: { - defaults: { - memorySearch: { - store: { - vector: { - enabled: true, - extensionPath: "/path/to/sqlite-vec" - } - } - } - } -} -``` - -Notes: - -- `enabled` defaults to true; when disabled, search falls back to in-process - cosine similarity over stored embeddings. -- If the sqlite-vec extension is missing or fails to load, OpenClaw logs the - error and continues with the JS fallback (no vector table). -- `extensionPath` overrides the bundled sqlite-vec path (useful for custom builds - or non-standard install locations). - -### Local embedding auto-download - -- Default local embedding model: `hf:ggml-org/embeddinggemma-300m-qat-q8_0-GGUF/embeddinggemma-300m-qat-Q8_0.gguf` (~0.6 GB). -- When `memorySearch.provider = "local"`, `node-llama-cpp` resolves `modelPath`; if the GGUF is missing it **auto-downloads** to the cache (or `local.modelCacheDir` if set), then loads it. Downloads resume on retry. -- Native build requirement: run `pnpm approve-builds`, pick `node-llama-cpp`, then `pnpm rebuild node-llama-cpp`. -- Fallback: if local setup fails and `memorySearch.fallback = "openai"`, we automatically switch to remote embeddings (`openai/text-embedding-3-small` unless overridden) and record the reason. - -### Custom OpenAI-compatible endpoint example - -```json5 -agents: { - defaults: { - memorySearch: { - provider: "openai", - model: "text-embedding-3-small", - remote: { - baseUrl: "https://api.example.com/v1/", - apiKey: "YOUR_REMOTE_API_KEY", - headers: { - "X-Organization": "org-id", - "X-Project": "project-id" - } - } - } - } -} -``` - -Notes: - -- `remote.*` takes precedence over `models.providers.openai.*`. -- `remote.headers` merge with OpenAI headers; remote wins on key conflicts. Omit `remote.headers` to use the OpenAI defaults. +semantic queries can find related notes even when wording differs. Hybrid search +(BM25 + vector) is available for combining semantic matching with exact keyword +lookups. + +Memory search supports multiple embedding providers (OpenAI, Gemini, Voyage, +Mistral, Ollama, and local GGUF models), an optional QMD sidecar backend for +advanced retrieval, and post-processing features like MMR diversity re-ranking +and temporal decay. + +For the full configuration reference -- including embedding provider setup, QMD +backend, hybrid search tuning, multimodal memory, and all config knobs -- see +[Memory configuration reference](/reference/memory-config). diff --git a/docs/reference/memory-config.md b/docs/reference/memory-config.md new file mode 100644 index 00000000000..a25f57dc868 --- /dev/null +++ b/docs/reference/memory-config.md @@ -0,0 +1,711 @@ +--- +title: "Memory configuration reference" +summary: "Full configuration reference for OpenClaw memory search, embedding providers, QMD backend, hybrid search, and multimodal memory" +read_when: + - You want to configure memory search providers or embedding models + - You want to set up the QMD backend + - You want to tune hybrid search, MMR, or temporal decay + - You want to enable multimodal memory indexing +--- + +# Memory configuration reference + +This page covers the full configuration surface for OpenClaw memory search. For +the conceptual overview (file layout, memory tools, when to write memory, and the +automatic flush), see [Memory](/concepts/memory). + +## Memory search defaults + +- Enabled by default. +- Watches memory files for changes (debounced). +- Configure memory search under `agents.defaults.memorySearch` (not top-level + `memorySearch`). +- Uses remote embeddings by default. If `memorySearch.provider` is not set, OpenClaw auto-selects: + 1. `local` if a `memorySearch.local.modelPath` is configured and the file exists. + 2. `openai` if an OpenAI key can be resolved. + 3. `gemini` if a Gemini key can be resolved. + 4. `voyage` if a Voyage key can be resolved. + 5. `mistral` if a Mistral key can be resolved. + 6. Otherwise memory search stays disabled until configured. +- Local mode uses node-llama-cpp and may require `pnpm approve-builds`. +- Uses sqlite-vec (when available) to accelerate vector search inside SQLite. +- `memorySearch.provider = "ollama"` is also supported for local/self-hosted + Ollama embeddings (`/api/embeddings`), but it is not auto-selected. + +Remote embeddings **require** an API key for the embedding provider. OpenClaw +resolves keys from auth profiles, `models.providers.*.apiKey`, or environment +variables. Codex OAuth only covers chat/completions and does **not** satisfy +embeddings for memory search. For Gemini, use `GEMINI_API_KEY` or +`models.providers.google.apiKey`. For Voyage, use `VOYAGE_API_KEY` or +`models.providers.voyage.apiKey`. For Mistral, use `MISTRAL_API_KEY` or +`models.providers.mistral.apiKey`. Ollama typically does not require a real API +key (a placeholder like `OLLAMA_API_KEY=ollama-local` is enough when needed by +local policy). +When using a custom OpenAI-compatible endpoint, +set `memorySearch.remote.apiKey` (and optional `memorySearch.remote.headers`). + +## QMD backend (experimental) + +Set `memory.backend = "qmd"` to swap the built-in SQLite indexer for +[QMD](https://github.com/tobi/qmd): a local-first search sidecar that combines +BM25 + vectors + reranking. Markdown stays the source of truth; OpenClaw shells +out to QMD for retrieval. Key points: + +### Prerequisites + +- Disabled by default. Opt in per-config (`memory.backend = "qmd"`). +- Install the QMD CLI separately (`bun install -g https://github.com/tobi/qmd` or grab + a release) and make sure the `qmd` binary is on the gateway's `PATH`. +- QMD needs an SQLite build that allows extensions (`brew install sqlite` on + macOS). +- QMD runs fully locally via Bun + `node-llama-cpp` and auto-downloads GGUF + models from HuggingFace on first use (no separate Ollama daemon required). +- The gateway runs QMD in a self-contained XDG home under + `~/.openclaw/agents//qmd/` by setting `XDG_CONFIG_HOME` and + `XDG_CACHE_HOME`. +- OS support: macOS and Linux work out of the box once Bun + SQLite are + installed. Windows is best supported via WSL2. + +### How the sidecar runs + +- The gateway writes a self-contained QMD home under + `~/.openclaw/agents//qmd/` (config + cache + sqlite DB). +- Collections are created via `qmd collection add` from `memory.qmd.paths` + (plus default workspace memory files), then `qmd update` + `qmd embed` run + on boot and on a configurable interval (`memory.qmd.update.interval`, + default 5 m). +- The gateway now initializes the QMD manager on startup, so periodic update + timers are armed even before the first `memory_search` call. +- Boot refresh now runs in the background by default so chat startup is not + blocked; set `memory.qmd.update.waitForBootSync = true` to keep the previous + blocking behavior. +- Searches run via `memory.qmd.searchMode` (default `qmd search --json`; also + supports `vsearch` and `query`). If the selected mode rejects flags on your + QMD build, OpenClaw retries with `qmd query`. If QMD fails or the binary is + missing, OpenClaw automatically falls back to the builtin SQLite manager so + memory tools keep working. +- OpenClaw does not expose QMD embed batch-size tuning today; batch behavior is + controlled by QMD itself. +- **First search may be slow**: QMD may download local GGUF models (reranker/query + expansion) on the first `qmd query` run. + - OpenClaw sets `XDG_CONFIG_HOME`/`XDG_CACHE_HOME` automatically when it runs QMD. + - If you want to pre-download models manually (and warm the same index OpenClaw + uses), run a one-off query with the agent's XDG dirs. + + OpenClaw's QMD state lives under your **state dir** (defaults to `~/.openclaw`). + You can point `qmd` at the exact same index by exporting the same XDG vars + OpenClaw uses: + + ```bash + # Pick the same state dir OpenClaw uses + STATE_DIR="${OPENCLAW_STATE_DIR:-$HOME/.openclaw}" + + export XDG_CONFIG_HOME="$STATE_DIR/agents/main/qmd/xdg-config" + export XDG_CACHE_HOME="$STATE_DIR/agents/main/qmd/xdg-cache" + + # (Optional) force an index refresh + embeddings + qmd update + qmd embed + + # Warm up / trigger first-time model downloads + qmd query "test" -c memory-root --json >/dev/null 2>&1 + ``` + +### Config surface (`memory.qmd.*`) + +- `command` (default `qmd`): override the executable path. +- `searchMode` (default `search`): pick which QMD command backs + `memory_search` (`search`, `vsearch`, `query`). +- `includeDefaultMemory` (default `true`): auto-index `MEMORY.md` + `memory/**/*.md`. +- `paths[]`: add extra directories/files (`path`, optional `pattern`, optional + stable `name`). +- `sessions`: opt into session JSONL indexing (`enabled`, `retentionDays`, + `exportDir`). +- `update`: controls refresh cadence and maintenance execution: + (`interval`, `debounceMs`, `onBoot`, `waitForBootSync`, `embedInterval`, + `commandTimeoutMs`, `updateTimeoutMs`, `embedTimeoutMs`). +- `limits`: clamp recall payload (`maxResults`, `maxSnippetChars`, + `maxInjectedChars`, `timeoutMs`). +- `scope`: same schema as [`session.sendPolicy`](/gateway/configuration-reference#session). + Default is DM-only (`deny` all, `allow` direct chats); loosen it to surface QMD + hits in groups/channels. + - `match.keyPrefix` matches the **normalized** session key (lowercased, with any + leading `agent::` stripped). Example: `discord:channel:`. + - `match.rawKeyPrefix` matches the **raw** session key (lowercased), including + `agent::`. Example: `agent:main:discord:`. + - Legacy: `match.keyPrefix: "agent:..."` is still treated as a raw-key prefix, + but prefer `rawKeyPrefix` for clarity. +- When `scope` denies a search, OpenClaw logs a warning with the derived + `channel`/`chatType` so empty results are easier to debug. +- Snippets sourced outside the workspace show up as + `qmd//` in `memory_search` results; `memory_get` + understands that prefix and reads from the configured QMD collection root. +- When `memory.qmd.sessions.enabled = true`, OpenClaw exports sanitized session + transcripts (User/Assistant turns) into a dedicated QMD collection under + `~/.openclaw/agents//qmd/sessions/`, so `memory_search` can recall recent + conversations without touching the builtin SQLite index. +- `memory_search` snippets now include a `Source: ` footer when + `memory.citations` is `auto`/`on`; set `memory.citations = "off"` to keep + the path metadata internal (the agent still receives the path for + `memory_get`, but the snippet text omits the footer and the system prompt + warns the agent not to cite it). + +### QMD example + +```json5 +memory: { + backend: "qmd", + citations: "auto", + qmd: { + includeDefaultMemory: true, + update: { interval: "5m", debounceMs: 15000 }, + limits: { maxResults: 6, timeoutMs: 4000 }, + scope: { + default: "deny", + rules: [ + { action: "allow", match: { chatType: "direct" } }, + // Normalized session-key prefix (strips `agent::`). + { action: "deny", match: { keyPrefix: "discord:channel:" } }, + // Raw session-key prefix (includes `agent::`). + { action: "deny", match: { rawKeyPrefix: "agent:main:discord:" } }, + ] + }, + paths: [ + { name: "docs", path: "~/notes", pattern: "**/*.md" } + ] + } +} +``` + +### Citations and fallback + +- `memory.citations` applies regardless of backend (`auto`/`on`/`off`). +- When `qmd` runs, we tag `status().backend = "qmd"` so diagnostics show which + engine served the results. If the QMD subprocess exits or JSON output can't be + parsed, the search manager logs a warning and returns the builtin provider + (existing Markdown embeddings) until QMD recovers. + +## Additional memory paths + +If you want to index Markdown files outside the default workspace layout, add +explicit paths: + +```json5 +agents: { + defaults: { + memorySearch: { + extraPaths: ["../team-docs", "/srv/shared-notes/overview.md"] + } + } +} +``` + +Notes: + +- Paths can be absolute or workspace-relative. +- Directories are scanned recursively for `.md` files. +- By default, only Markdown files are indexed. +- If `memorySearch.multimodal.enabled = true`, OpenClaw also indexes supported image/audio files under `extraPaths` only. Default memory roots (`MEMORY.md`, `memory.md`, `memory/**/*.md`) stay Markdown-only. +- Symlinks are ignored (files or directories). + +## Multimodal memory files (Gemini image + audio) + +OpenClaw can index image and audio files from `memorySearch.extraPaths` when using Gemini embedding 2: + +```json5 +agents: { + defaults: { + memorySearch: { + provider: "gemini", + model: "gemini-embedding-2-preview", + extraPaths: ["assets/reference", "voice-notes"], + multimodal: { + enabled: true, + modalities: ["image", "audio"], // or ["all"] + maxFileBytes: 10000000 + }, + remote: { + apiKey: "YOUR_GEMINI_API_KEY" + } + } + } +} +``` + +Notes: + +- Multimodal memory is currently supported only for `gemini-embedding-2-preview`. +- Multimodal indexing applies only to files discovered through `memorySearch.extraPaths`. +- Supported modalities in this phase: image and audio. +- `memorySearch.fallback` must stay `"none"` while multimodal memory is enabled. +- Matching image/audio file bytes are uploaded to the configured Gemini embedding endpoint during indexing. +- Supported image extensions: `.jpg`, `.jpeg`, `.png`, `.webp`, `.gif`, `.heic`, `.heif`. +- Supported audio extensions: `.mp3`, `.wav`, `.ogg`, `.opus`, `.m4a`, `.aac`, `.flac`. +- Search queries remain text, but Gemini can compare those text queries against indexed image/audio embeddings. +- `memory_get` still reads Markdown only; binary files are searchable but not returned as raw file contents. + +## Gemini embeddings (native) + +Set the provider to `gemini` to use the Gemini embeddings API directly: + +```json5 +agents: { + defaults: { + memorySearch: { + provider: "gemini", + model: "gemini-embedding-001", + remote: { + apiKey: "YOUR_GEMINI_API_KEY" + } + } + } +} +``` + +Notes: + +- `remote.baseUrl` is optional (defaults to the Gemini API base URL). +- `remote.headers` lets you add extra headers if needed. +- Default model: `gemini-embedding-001`. +- `gemini-embedding-2-preview` is also supported: 8192 token limit and configurable dimensions (768 / 1536 / 3072, default 3072). + +### Gemini Embedding 2 (preview) + +```json5 +agents: { + defaults: { + memorySearch: { + provider: "gemini", + model: "gemini-embedding-2-preview", + outputDimensionality: 3072, // optional: 768, 1536, or 3072 (default) + remote: { + apiKey: "YOUR_GEMINI_API_KEY" + } + } + } +} +``` + +> **Re-index required:** Switching from `gemini-embedding-001` (768 dimensions) +> to `gemini-embedding-2-preview` (3072 dimensions) changes the vector size. The same is true if you +> change `outputDimensionality` between 768, 1536, and 3072. +> OpenClaw will automatically reindex when it detects a model or dimension change. + +## Custom OpenAI-compatible endpoint + +If you want to use a custom OpenAI-compatible endpoint (OpenRouter, vLLM, or a proxy), +you can use the `remote` configuration with the OpenAI provider: + +```json5 +agents: { + defaults: { + memorySearch: { + provider: "openai", + model: "text-embedding-3-small", + remote: { + baseUrl: "https://api.example.com/v1/", + apiKey: "YOUR_OPENAI_COMPAT_API_KEY", + headers: { "X-Custom-Header": "value" } + } + } + } +} +``` + +If you don't want to set an API key, use `memorySearch.provider = "local"` or set +`memorySearch.fallback = "none"`. + +### Fallbacks + +- `memorySearch.fallback` can be `openai`, `gemini`, `voyage`, `mistral`, `ollama`, `local`, or `none`. +- The fallback provider is only used when the primary embedding provider fails. + +### Batch indexing (OpenAI + Gemini + Voyage) + +- Disabled by default. Set `agents.defaults.memorySearch.remote.batch.enabled = true` to enable for large-corpus indexing (OpenAI, Gemini, and Voyage). +- Default behavior waits for batch completion; tune `remote.batch.wait`, `remote.batch.pollIntervalMs`, and `remote.batch.timeoutMinutes` if needed. +- Set `remote.batch.concurrency` to control how many batch jobs we submit in parallel (default: 2). +- Batch mode applies when `memorySearch.provider = "openai"` or `"gemini"` and uses the corresponding API key. +- Gemini batch jobs use the async embeddings batch endpoint and require Gemini Batch API availability. + +Why OpenAI batch is fast and cheap: + +- For large backfills, OpenAI is typically the fastest option we support because we can submit many embedding requests in a single batch job and let OpenAI process them asynchronously. +- OpenAI offers discounted pricing for Batch API workloads, so large indexing runs are usually cheaper than sending the same requests synchronously. +- See the OpenAI Batch API docs and pricing for details: + - [https://platform.openai.com/docs/api-reference/batch](https://platform.openai.com/docs/api-reference/batch) + - [https://platform.openai.com/pricing](https://platform.openai.com/pricing) + +Config example: + +```json5 +agents: { + defaults: { + memorySearch: { + provider: "openai", + model: "text-embedding-3-small", + fallback: "openai", + remote: { + batch: { enabled: true, concurrency: 2 } + }, + sync: { watch: true } + } + } +} +``` + +## How the memory tools work + +- `memory_search` semantically searches Markdown chunks (~400 token target, 80-token overlap) from `MEMORY.md` + `memory/**/*.md`. It returns snippet text (capped ~700 chars), file path, line range, score, provider/model, and whether we fell back from local to remote embeddings. No full file payload is returned. +- `memory_get` reads a specific memory Markdown file (workspace-relative), optionally from a starting line and for N lines. Paths outside `MEMORY.md` / `memory/` are rejected. +- Both tools are enabled only when `memorySearch.enabled` resolves true for the agent. + +## What gets indexed (and when) + +- File type: Markdown only (`MEMORY.md`, `memory/**/*.md`). +- Index storage: per-agent SQLite at `~/.openclaw/memory/.sqlite` (configurable via `agents.defaults.memorySearch.store.path`, supports `{agentId}` token). +- Freshness: watcher on `MEMORY.md` + `memory/` marks the index dirty (debounce 1.5s). Sync is scheduled on session start, on search, or on an interval and runs asynchronously. Session transcripts use delta thresholds to trigger background sync. +- Reindex triggers: the index stores the embedding **provider/model + endpoint fingerprint + chunking params**. If any of those change, OpenClaw automatically resets and reindexes the entire store. + +## Hybrid search (BM25 + vector) + +When enabled, OpenClaw combines: + +- **Vector similarity** (semantic match, wording can differ) +- **BM25 keyword relevance** (exact tokens like IDs, env vars, code symbols) + +If full-text search is unavailable on your platform, OpenClaw falls back to vector-only search. + +### Why hybrid + +Vector search is great at "this means the same thing": + +- "Mac Studio gateway host" vs "the machine running the gateway" +- "debounce file updates" vs "avoid indexing on every write" + +But it can be weak at exact, high-signal tokens: + +- IDs (`a828e60`, `b3b9895a...`) +- code symbols (`memorySearch.query.hybrid`) +- error strings ("sqlite-vec unavailable") + +BM25 (full-text) is the opposite: strong at exact tokens, weaker at paraphrases. +Hybrid search is the pragmatic middle ground: **use both retrieval signals** so you get +good results for both "natural language" queries and "needle in a haystack" queries. + +### How we merge results (the current design) + +Implementation sketch: + +1. Retrieve a candidate pool from both sides: + +- **Vector**: top `maxResults * candidateMultiplier` by cosine similarity. +- **BM25**: top `maxResults * candidateMultiplier` by FTS5 BM25 rank (lower is better). + +2. Convert BM25 rank into a 0..1-ish score: + +- `textScore = 1 / (1 + max(0, bm25Rank))` + +3. Union candidates by chunk id and compute a weighted score: + +- `finalScore = vectorWeight * vectorScore + textWeight * textScore` + +Notes: + +- `vectorWeight` + `textWeight` is normalized to 1.0 in config resolution, so weights behave as percentages. +- If embeddings are unavailable (or the provider returns a zero-vector), we still run BM25 and return keyword matches. +- If FTS5 can't be created, we keep vector-only search (no hard failure). + +This isn't "IR-theory perfect", but it's simple, fast, and tends to improve recall/precision on real notes. +If we want to get fancier later, common next steps are Reciprocal Rank Fusion (RRF) or score normalization +(min/max or z-score) before mixing. + +### Post-processing pipeline + +After merging vector and keyword scores, two optional post-processing stages +refine the result list before it reaches the agent: + +``` +Vector + Keyword -> Weighted Merge -> Temporal Decay -> Sort -> MMR -> Top-K Results +``` + +Both stages are **off by default** and can be enabled independently. + +### MMR re-ranking (diversity) + +When hybrid search returns results, multiple chunks may contain similar or overlapping content. +For example, searching for "home network setup" might return five nearly identical snippets +from different daily notes that all mention the same router configuration. + +**MMR (Maximal Marginal Relevance)** re-ranks the results to balance relevance with diversity, +ensuring the top results cover different aspects of the query instead of repeating the same information. + +How it works: + +1. Results are scored by their original relevance (vector + BM25 weighted score). +2. MMR iteratively selects results that maximize: `lambda x relevance - (1-lambda) x max_similarity_to_selected`. +3. Similarity between results is measured using Jaccard text similarity on tokenized content. + +The `lambda` parameter controls the trade-off: + +- `lambda = 1.0` -- pure relevance (no diversity penalty) +- `lambda = 0.0` -- maximum diversity (ignores relevance) +- Default: `0.7` (balanced, slight relevance bias) + +**Example -- query: "home network setup"** + +Given these memory files: + +``` +memory/2026-02-10.md -> "Configured Omada router, set VLAN 10 for IoT devices" +memory/2026-02-08.md -> "Configured Omada router, moved IoT to VLAN 10" +memory/2026-02-05.md -> "Set up AdGuard DNS on 192.168.10.2" +memory/network.md -> "Router: Omada ER605, AdGuard: 192.168.10.2, VLAN 10: IoT" +``` + +Without MMR -- top 3 results: + +``` +1. memory/2026-02-10.md (score: 0.92) <- router + VLAN +2. memory/2026-02-08.md (score: 0.89) <- router + VLAN (near-duplicate!) +3. memory/network.md (score: 0.85) <- reference doc +``` + +With MMR (lambda=0.7) -- top 3 results: + +``` +1. memory/2026-02-10.md (score: 0.92) <- router + VLAN +2. memory/network.md (score: 0.85) <- reference doc (diverse!) +3. memory/2026-02-05.md (score: 0.78) <- AdGuard DNS (diverse!) +``` + +The near-duplicate from Feb 8 drops out, and the agent gets three distinct pieces of information. + +**When to enable:** If you notice `memory_search` returning redundant or near-duplicate snippets, +especially with daily notes that often repeat similar information across days. + +### Temporal decay (recency boost) + +Agents with daily notes accumulate hundreds of dated files over time. Without decay, +a well-worded note from six months ago can outrank yesterday's update on the same topic. + +**Temporal decay** applies an exponential multiplier to scores based on the age of each result, +so recent memories naturally rank higher while old ones fade: + +``` +decayedScore = score x e^(-lambda x ageInDays) +``` + +where `lambda = ln(2) / halfLifeDays`. + +With the default half-life of 30 days: + +- Today's notes: **100%** of original score +- 7 days ago: **~84%** +- 30 days ago: **50%** +- 90 days ago: **12.5%** +- 180 days ago: **~1.6%** + +**Evergreen files are never decayed:** + +- `MEMORY.md` (root memory file) +- Non-dated files in `memory/` (e.g., `memory/projects.md`, `memory/network.md`) +- These contain durable reference information that should always rank normally. + +**Dated daily files** (`memory/YYYY-MM-DD.md`) use the date extracted from the filename. +Other sources (e.g., session transcripts) fall back to file modification time (`mtime`). + +**Example -- query: "what's Rod's work schedule?"** + +Given these memory files (today is Feb 10): + +``` +memory/2025-09-15.md -> "Rod works Mon-Fri, standup at 10am, pairing at 2pm" (148 days old) +memory/2026-02-10.md -> "Rod has standup at 14:15, 1:1 with Zeb at 14:45" (today) +memory/2026-02-03.md -> "Rod started new team, standup moved to 14:15" (7 days old) +``` + +Without decay: + +``` +1. memory/2025-09-15.md (score: 0.91) <- best semantic match, but stale! +2. memory/2026-02-10.md (score: 0.82) +3. memory/2026-02-03.md (score: 0.80) +``` + +With decay (halfLife=30): + +``` +1. memory/2026-02-10.md (score: 0.82 x 1.00 = 0.82) <- today, no decay +2. memory/2026-02-03.md (score: 0.80 x 0.85 = 0.68) <- 7 days, mild decay +3. memory/2025-09-15.md (score: 0.91 x 0.03 = 0.03) <- 148 days, nearly gone +``` + +The stale September note drops to the bottom despite having the best raw semantic match. + +**When to enable:** If your agent has months of daily notes and you find that old, +stale information outranks recent context. A half-life of 30 days works well for +daily-note-heavy workflows; increase it (e.g., 90 days) if you reference older notes frequently. + +### Hybrid search configuration + +Both features are configured under `memorySearch.query.hybrid`: + +```json5 +agents: { + defaults: { + memorySearch: { + query: { + hybrid: { + enabled: true, + vectorWeight: 0.7, + textWeight: 0.3, + candidateMultiplier: 4, + // Diversity: reduce redundant results + mmr: { + enabled: true, // default: false + lambda: 0.7 // 0 = max diversity, 1 = max relevance + }, + // Recency: boost newer memories + temporalDecay: { + enabled: true, // default: false + halfLifeDays: 30 // score halves every 30 days + } + } + } + } + } +} +``` + +You can enable either feature independently: + +- **MMR only** -- useful when you have many similar notes but age doesn't matter. +- **Temporal decay only** -- useful when recency matters but your results are already diverse. +- **Both** -- recommended for agents with large, long-running daily note histories. + +## Embedding cache + +OpenClaw can cache **chunk embeddings** in SQLite so reindexing and frequent updates (especially session transcripts) don't re-embed unchanged text. + +Config: + +```json5 +agents: { + defaults: { + memorySearch: { + cache: { + enabled: true, + maxEntries: 50000 + } + } + } +} +``` + +## Session memory search (experimental) + +You can optionally index **session transcripts** and surface them via `memory_search`. +This is gated behind an experimental flag. + +```json5 +agents: { + defaults: { + memorySearch: { + experimental: { sessionMemory: true }, + sources: ["memory", "sessions"] + } + } +} +``` + +Notes: + +- Session indexing is **opt-in** (off by default). +- Session updates are debounced and **indexed asynchronously** once they cross delta thresholds (best-effort). +- `memory_search` never blocks on indexing; results can be slightly stale until background sync finishes. +- Results still include snippets only; `memory_get` remains limited to memory files. +- Session indexing is isolated per agent (only that agent's session logs are indexed). +- Session logs live on disk (`~/.openclaw/agents//sessions/*.jsonl`). Any process/user with filesystem access can read them, so treat disk access as the trust boundary. For stricter isolation, run agents under separate OS users or hosts. + +Delta thresholds (defaults shown): + +```json5 +agents: { + defaults: { + memorySearch: { + sync: { + sessions: { + deltaBytes: 100000, // ~100 KB + deltaMessages: 50 // JSONL lines + } + } + } + } +} +``` + +## SQLite vector acceleration (sqlite-vec) + +When the sqlite-vec extension is available, OpenClaw stores embeddings in a +SQLite virtual table (`vec0`) and performs vector distance queries in the +database. This keeps search fast without loading every embedding into JS. + +Configuration (optional): + +```json5 +agents: { + defaults: { + memorySearch: { + store: { + vector: { + enabled: true, + extensionPath: "/path/to/sqlite-vec" + } + } + } + } +} +``` + +Notes: + +- `enabled` defaults to true; when disabled, search falls back to in-process + cosine similarity over stored embeddings. +- If the sqlite-vec extension is missing or fails to load, OpenClaw logs the + error and continues with the JS fallback (no vector table). +- `extensionPath` overrides the bundled sqlite-vec path (useful for custom builds + or non-standard install locations). + +## Local embedding auto-download + +- Default local embedding model: `hf:ggml-org/embeddinggemma-300m-qat-q8_0-GGUF/embeddinggemma-300m-qat-Q8_0.gguf` (~0.6 GB). +- When `memorySearch.provider = "local"`, `node-llama-cpp` resolves `modelPath`; if the GGUF is missing it **auto-downloads** to the cache (or `local.modelCacheDir` if set), then loads it. Downloads resume on retry. +- Native build requirement: run `pnpm approve-builds`, pick `node-llama-cpp`, then `pnpm rebuild node-llama-cpp`. +- Fallback: if local setup fails and `memorySearch.fallback = "openai"`, we automatically switch to remote embeddings (`openai/text-embedding-3-small` unless overridden) and record the reason. + +## Custom OpenAI-compatible endpoint example + +```json5 +agents: { + defaults: { + memorySearch: { + provider: "openai", + model: "text-embedding-3-small", + remote: { + baseUrl: "https://api.example.com/v1/", + apiKey: "YOUR_REMOTE_API_KEY", + headers: { + "X-Organization": "org-id", + "X-Project": "project-id" + } + } + } + } +} +``` + +Notes: + +- `remote.*` takes precedence over `models.providers.openai.*`. +- `remote.headers` merge with OpenAI headers; remote wins on key conflicts. Omit `remote.headers` to use the OpenAI defaults.