mirror of
https://github.com/openclaw/openclaw.git
synced 2026-05-06 18:10:45 +00:00
feat(memory): configurable local embedding contextSize (default 4096) (#70544)
node-llama-cpp defaults contextSize to "auto", which on large embedding models such as Qwen3-Embedding-8B (trained context 40,960) inflates gateway VRAM from ~8.8 GB to ~32 GB and causes OOM on single-GPU hosts that share the gateway with an LLM runtime.

Expose `memorySearch.local.contextSize` in `openclaw.json` (`number | "auto"`), defaulting to 4096, which comfortably covers typical memory-search chunks (128–512 tokens) while keeping non-weight VRAM bounded. Closes #69667.
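To illustrate the scaling claim above, a back-of-envelope sketch. The ~8.8 GB and ~32 GB figures at 4096 and 40,960 tokens come from the commit message; the linear interpolation between them is an assumption for illustration, not a measured VRAM curve:

```python
# Rough linear model of gateway VRAM vs. embedding contextSize,
# anchored on the two data points given for Qwen3-Embedding-8B.
# The linearity assumption is illustrative only.

def estimated_vram_gb(context_size: int) -> float:
    # Anchor points from the commit message:
    #   4096 tokens  -> ~8.8 GB total VRAM
    #   40960 tokens -> ~32 GB total VRAM
    c0, v0 = 4096, 8.8
    c1, v1 = 40960, 32.0
    slope = (v1 - v0) / (c1 - c0)  # GB of non-weight VRAM per extra token
    return v0 + slope * (context_size - c0)

print(round(estimated_vram_gb(4096), 1))   # 8.8
print(round(estimated_vram_gb(40960), 1))  # 32.0
# Lowering contextSize below the default shrinks the estimate further,
# which is why 1024-2048 is suggested for constrained hosts.
print(round(estimated_vram_gb(2048), 1))
```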
@@ -198,10 +198,11 @@ arn:aws:bedrock:*::foundation-model/amazon.titan-embed-text-v2:0
## Local embedding config

| Key | Type | Default | Description |
| --------------------- | ------------------ | ---------------------- | ----------- |
| `local.modelPath` | `string` | auto-downloaded | Path to GGUF model file |
| `local.modelCacheDir` | `string` | node-llama-cpp default | Cache dir for downloaded models |
| `local.contextSize` | `number \| "auto"` | `4096` | Context window size for the embedding context. 4096 covers typical memory-search chunks (128–512 tokens) while bounding non-weight VRAM. Lower to 1024–2048 on constrained hosts. `"auto"` uses the model's trained maximum and is not recommended for 8B+ models (Qwen3-Embedding-8B: 40,960 tokens → ~32 GB VRAM vs. ~8.8 GB at 4096). |
Default model: `embeddinggemma-300m-qat-Q8_0.gguf` (~0.6 GB, auto-downloaded).
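For reference, these keys could be set in `openclaw.json` along the lines of the sketch below. The `memorySearch.local.*` key names come from this section; the surrounding structure and the example paths are assumptions:

```json
{
  "memorySearch": {
    "local": {
      "modelPath": "~/.cache/openclaw/models/embeddinggemma-300m-qat-Q8_0.gguf",
      "modelCacheDir": "~/.cache/openclaw/models",
      "contextSize": 4096
    }
  }
}
```

Set `"contextSize": "auto"` only on hosts with VRAM headroom for the model's trained maximum.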
Requires native build: `pnpm approve-builds` then `pnpm rebuild node-llama-cpp`.