fix: retain local memory runtime deps

Peter Steinberger
2026-04-30 15:22:26 +01:00
parent 9d037d2f5a
commit ac599c9e53
8 changed files with 155 additions and 16 deletions


@@ -33,8 +33,9 @@ For multi-endpoint setups, `provider` can also be a custom
 `models.providers.<id>` entry, such as `ollama-5080`, when that provider sets
 `api: "ollama"` or another embedding adapter owner.
-For local embeddings with no API key, install the optional `node-llama-cpp`
-runtime package next to OpenClaw and use `provider: "local"`.
+For local embeddings with no API key, set `provider: "local"`. Packaged
+installs retain the native `node-llama-cpp` runtime in OpenClaw's managed plugin
+runtime-deps tree; run `openclaw doctor --fix` if that tree needs repair.
 Some OpenAI-compatible embedding endpoints require asymmetric labels such as
 `input_type: "query"` for searches and `input_type: "document"` or `"passage"`
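The local-embedding setup described in this hunk can be sketched as a config fragment. This is a hypothetical sketch: the file location and the surrounding `memory.embedding` nesting are assumptions, while the `provider`, `local.contextSize`, and `local.modelCacheDir` keys come from the settings this commit documents.

```json5
// Hypothetical OpenClaw config fragment — outer nesting is assumed;
// only the embedding keys themselves are documented in this commit.
{
  memory: {
    embedding: {
      provider: "local",            // local embeddings, no API key required
      local: {
        contextSize: 4096,          // default; lower on constrained hosts
        modelCacheDir: "~/models",  // optional override of the node-llama-cpp default
      },
    },
  },
}
```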


@@ -284,7 +284,7 @@ For custom OpenAI-compatible endpoints or overriding provider defaults:
 | `local.modelCacheDir` | `string` | node-llama-cpp default | Cache dir for downloaded models |
 | `local.contextSize` | `number \| "auto"` | `4096` | Context window size for the embedding context. 4096 covers typical chunks (128–512 tokens) while bounding non-weight VRAM. Lower to 1024–2048 on constrained hosts. `"auto"` uses the model's trained maximum — not recommended for 8B+ models (Qwen3-Embedding-8B: 40,960 tokens → ~32 GB VRAM vs ~8.8 GB at 4096). |
-Default model: `embeddinggemma-300m-qat-Q8_0.gguf` (~0.6 GB, auto-downloaded). Requires native build: `pnpm approve-builds` then `pnpm rebuild node-llama-cpp`.
+Default model: `embeddinggemma-300m-qat-Q8_0.gguf` (~0.6 GB, auto-downloaded). Packaged installs repair the native `node-llama-cpp` runtime through managed plugin runtime deps when `provider: "local"` is configured. Source checkouts still require native build approval: `pnpm approve-builds` then `pnpm rebuild node-llama-cpp`.
 Use the standalone CLI to verify the same provider path the Gateway uses: