fix(memory): cap ollama non-batch embedding concurrency

Peter Steinberger
2026-04-28 00:33:53 +01:00
parent 5de3196a60
commit 802f13ac15
15 changed files with 103 additions and 14 deletions


@@ -386,6 +386,7 @@ Prevents re-embedding unchanged text during reindex or transcript updates.
| Key | Type | Default | Description |
| ----------------------------- | --------- | ------- | -------------------------- |
| `remote.nonBatchConcurrency` | `number` | `4` | Parallel inline embeddings |
| `remote.batch.enabled` | `boolean` | `false` | Enable batch embedding API |
| `remote.batch.concurrency` | `number` | `2` | Parallel batch jobs |
| `remote.batch.wait` | `boolean` | `true` | Wait for batch completion |
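Taken together, these keys form a nested `remote` block. A minimal sketch with the documented defaults, assuming a JSON-style config file; the enclosing section name and file location are whatever the surrounding docs use and are not shown here:

```json
{
  "remote": {
    "nonBatchConcurrency": 4,
    "batch": {
      "enabled": false,
      "concurrency": 2,
      "wait": true
    }
  }
}
```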
@@ -394,7 +395,9 @@ Prevents re-embedding unchanged text during reindex or transcript updates.
Available for `openai`, `gemini`, and `voyage`. OpenAI batch is typically fastest and cheapest for large backfills.
`remote.nonBatchConcurrency` controls how many inline embedding calls run in parallel for local/self-hosted providers, and for hosted providers when provider batch APIs are not active. Ollama defaults to `1` for non-batch indexing to avoid overwhelming smaller local hosts; set a higher value on larger machines.
This is separate from `sync.embeddingBatchTimeoutSeconds`, which controls the timeout for inline embedding calls.
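For example (values illustrative, not defaults, and assuming the same JSON-style shape as the sketch above), a larger machine running Ollama could raise the non-batch cap while leaving the provider batch API off:

```json
{
  "remote": {
    "nonBatchConcurrency": 8,
    "batch": { "enabled": false }
  }
}
```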
---