fix(docs): add active memory speed recommendations

This commit is contained in:
Tak Hoffman
2026-04-16 00:55:04 -05:00
parent 6f5459364a
commit b10ae0bf13


@@ -116,6 +116,91 @@ What this means:
- `config.promptStyle: "balanced"` uses the default general-purpose prompt style for `recent` mode
- active memory still runs only on eligible interactive persistent chat sessions

## Speed recommendations

The simplest setup is to leave `config.model` unset and let Active Memory use
the same model you already use for normal replies. That is the safest default
because it follows your existing provider, auth, and model preferences.

If you want Active Memory to feel faster, use a dedicated inference model
instead of borrowing the main chat model.

Example fast-provider setup:
```json5
models: {
  providers: {
    cerebras: {
      baseUrl: "https://api.cerebras.ai/v1",
      apiKey: "${CEREBRAS_API_KEY}",
      api: "openai-completions",
      models: [{ id: "gpt-oss-120b", name: "GPT OSS 120B (Cerebras)" }],
    },
  },
},
plugins: {
  entries: {
    "active-memory": {
      enabled: true,
      config: {
        model: "cerebras/gpt-oss-120b",
      },
    },
  },
}
```

Fast-model options worth considering:

- `cerebras/gpt-oss-120b` for a fast dedicated recall model with a narrow tool surface
- your normal session model, by leaving `config.model` unset
- a low-latency fallback model such as `google/gemini-3-flash` when you want a separate recall model without changing your primary chat model

Why Cerebras is a strong speed-oriented option for Active Memory:
- the Active Memory tool surface is narrow: it only calls `memory_search` and `memory_get`
- recall quality matters, but latency matters more here than it does on the main answer path
- a dedicated fast provider avoids tying memory recall latency to your primary chat provider

If you do not want a separate speed-optimized model, leave `config.model` unset
and let Active Memory inherit the current session model.
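
A minimal sketch of that default, mirroring the plugin keys used above: only `enabled` is set, and `config.model` is omitted so recall follows the session model.

```json5
plugins: {
  entries: {
    "active-memory": {
      enabled: true,
      // no config.model: recall inherits the current session model
    },
  },
}
```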

### Cerebras setup

Add a provider entry like this:
```json5
models: {
  providers: {
    cerebras: {
      baseUrl: "https://api.cerebras.ai/v1",
      apiKey: "${CEREBRAS_API_KEY}",
      api: "openai-completions",
      models: [{ id: "gpt-oss-120b", name: "GPT OSS 120B (Cerebras)" }],
    },
  },
}
```

Then point Active Memory at it:

```json5
plugins: {
  entries: {
    "active-memory": {
      enabled: true,
      config: {
        model: "cerebras/gpt-oss-120b",
      },
    },
  },
}
```

Caveat:

- make sure the Cerebras API key actually has model access for the model you choose, because `/v1/models` visibility alone does not guarantee `chat/completions` access
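
One way to sanity-check this from the command line, assuming the standard OpenAI-compatible endpoints shown above (this is a sketch; swap in whatever model id you configured):

```shell
# Lists models the key can see. Visibility here does not prove access.
curl -s https://api.cerebras.ai/v1/models \
  -H "Authorization: Bearer $CEREBRAS_API_KEY"

# Confirms the key can actually run completions against that model.
curl -s https://api.cerebras.ai/v1/chat/completions \
  -H "Authorization: Bearer $CEREBRAS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-oss-120b", "messages": [{"role": "user", "content": "ping"}]}'
```

If the second call returns an authorization or model-access error while the first lists the model, the key needs model access granted before Active Memory can use it.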

## How to see it

Active memory injects a hidden untrusted prompt prefix for the model. It does