* fix(memory): await search-sync before returning results to prevent stale index
When the gateway process has been running for a while, memory_search
returns stale results because startAsyncSearchSync fires off the index
sync as a background task (void ... .catch()) without waiting for it
to complete. Search results are then read from the old index state.
Change startAsyncSearchSync from sync/fire-and-forget to async/await
so that the index is synced before search results are returned. This
ensures memory_search reflects the current filesystem state, matching
the behavior of the CLI command which creates
a fresh manager each time.
Fixes#52115
* test(memory): prove search waits for dirty sync
* test(memory): align search with synchronous sync
---------
Co-authored-by: Vincent Koc <25068+vincentkoc@users.noreply.github.com>
The atomic reindex file ops hardcoded the WAL sidecar pair (-wal/-shm)
when moving, removing, and backing up index files. NFS-backed memory
stores run SQLite under journal_mode=DELETE, which produces a
rollback-journal (-journal) sidecar instead. As a result an index swap
left the previous targets stale -journal next to the freshly published
Guard memory index identity resolution against empty or whitespace provider models by falling back to fts-only, and use fts-only as the fallback source model when an adapter fallback cannot resolve a model.
This prevents empty expectedModel mismatch reasons that can leave memory search dirty while preserving registered adapter default-model resolution.
Refs #90787
cleanupAgedMemoryReindexTempFiles only removed WAL sidecars (-wal/-shm) of orphaned reindex temp DBs. On NFS-backed stores configureMemorySqliteWalMaintenance -> requireRollbackJournalMode forces journal_mode=DELETE, so the reindex temp DB uses a rollback journal; a hard crash leaves an orphaned .tmp-<uuid>-journal that leaked forever (cleanup neither deleted nor even discovered it). Add -journal to both the delete set (memoryIndexFileSuffixes) and the discovery set (reindexTempEntrySuffixes), with regression tests for the temp-plus-journal and stranded-journal cases.
Recover assistant turns that complete tool work without producing a visible final answer, while preserving intentional silent replies.
Use concrete tool-instance replay safety across embedded, Codex, and Copilot runtimes so unknown, mutating, async-started, and durable recall operations fail closed. Preserve genuine empty Codex final items without promoting commentary or tool-progress echoes.
Supersedes #90872. Thanks @fuller-stack-dev.
Co-authored-by: fuller-stack-dev <263060202+fuller-stack-dev@users.noreply.github.com>
* fix(memory): accept local default model path migration
Treat the official local default embedding model's hf URI and downloaded GGUF path identities as equivalent so upgraded local memory indexes do not pause solely on path-format changes.
* fix(memory): satisfy local identity lint
Avoid filtered array tail access in the local model filename helper while preserving the same compatibility behavior.
* fix(memory): preserve local embedding identity aliases
---------
Co-authored-by: Vincent Koc <25068+vincentkoc@users.noreply.github.com>
Resolve explicit relative SQLite DB paths before caching handles and centralize durable SQLite connection pragmas so busy_timeout is applied before WAL/NFS negotiation.
Follow-up to #92745 after maintainer autoreview found that the skipped recall event widened the shipped MemoryHostEvent union and changed limited legacy reads.
Keep readMemoryHostEvents() source-compatible by filtering diagnostic records before applying limits, and expose skipped recall diagnostics through the opt-in MemoryHostEventRecord/readMemoryHostEventRecords path.
Original skipped-recall behavior landed in #92745 by @mushuiyu886.
Record diagnostic events when memory_search returns durable memory hits that are intentionally excluded from short-term promotion, so users can distinguish eligibility decisions from recall tracking failures.
Route OpenAI/OpenAI-compatible request_headers_too_large embedding failures into the existing memory-core batch splitter instead of aborting bulk memory indexing.
Tighten the classifier to require header-too-large wording rather than a bare 431 status token, so unrelated provider errors do not fan out into recursive requests.
Fixes#92465.
Thanks @mushuiyu886 for the fix and @BrettHamlin for the report and proof.
Detects NFS-backed SQLite database paths in the shared WAL helper and uses rollback journaling for those paths while preserving WAL/checkpoint maintenance on local filesystems. The NFS path now verifies SQLite's effective journal mode before disabling WAL maintenance, and core/memory/proxy-capture callers pass database path context into the centralized helper.
Fixes#90491.
Proof: local focused Vitest/format/lint; autoreview clean after fixing the journal-mode verification finding; Crabbox AWS focused test run `run_2ea7014350da`; Crabbox AWS changed gate `run_c828bbfe7d23`; exact-head GitHub CI green on `59674305ecd863d4815eec6098ccd3daab79ca4f`.
* fix(memory): abort orphaned embedding work when memory_search times out
memory_search raced its 15s deadline with Promise.race and returned a clean
timeout to the agent, but the underlying embedQueryWithRetry loop kept
retrying (3 attempts x 60s) against the embedding backend with no consumer.
Thread the tool-owned AbortSignal through manager.search ->
embedQueryWithRetry -> runEmbeddingOperationWithTimeout so the deadline
cancels in-flight embedding work, stops the retry loop, and skips
fallback-provider activation for an absent caller.
Fixes#91718
* fix(memory): let the deadline result win before aborting the search
Abort listeners dispatch synchronously, so an abort-aware search could
reject the raced task before the timeout promise resolved and replace the
stable 'memory_search timed out after 15s' result with a provider-wrapped
abort error. Resolve the timeout first, then abort.
* fix(memory): scope deadline abort to builtin embeddings
* fix(memory): preserve deadline signal across fallback
---------
Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
Fixes#91216.
Preserve the live memory SQLite index during atomic reindex swaps by publishing the POSIX main DB with an overwrite rename, keeping target WAL/SHM sidecars rollbackable until publish succeeds, and refusing no-create post-swap reopens that would otherwise auto-create an empty DB.
Verification:
- node scripts/run-vitest.mjs extensions/memory-core/src/memory/manager.atomic-reindex.test.ts extensions/memory-core/src/memory/manager-db-probe.test.ts extensions/memory-core/src/memory/manager.self-heal-missing-identity.test.ts extensions/memory-core/src/memory/manager-reindex-state.test.ts extensions/memory-core/src/memory/manager.fts-only-reindex.test.ts extensions/memory-core/src/memory/manager.readonly-recovery.test.ts
- git diff --check origin/main...HEAD
- node scripts/run-oxlint.mjs extensions/memory-core/src/memory/manager-atomic-reindex.ts extensions/memory-core/src/memory/manager.atomic-reindex.test.ts extensions/memory-core/src/memory/manager-db.ts extensions/memory-core/src/memory/manager-db-probe.test.ts extensions/memory-core/src/memory/manager-sync-ops.ts
- .agents/skills/autoreview/scripts/autoreview --mode branch --base origin/main
- GitHub CI on 60df2b4178
* [AI] fix(memory): self-heal missing index identity by initializing provider during sync
When a gateway's memory index identity becomes "missing" with chunks
already indexed, canRebuildMissingIdentity stays false because
this.provider is null (async provider init hasn completed yet), so
needsMissingIdentityReindex is false, and the sync loop bails out
with dirty=true forever — the gateway never self-heals.
Now, when indexIdentity.status is "missing" and a provider is
configured but this.provider is null, runSync() calls
ensureProviderInitialized() first, then re-evaluates the identity
state. If the provider becomes available,
canRebuildMissingIdentity flips to true, unlocking the self-heal
reindex path.
Refs #91167
* [AI] fix(memory): allow FTS-only self-heal when chunks are all FTS-only and provider unavailable
When a gateway's memory index identity is 'missing' with chunks already
indexed, canRebuildMissingIdentity stays false if the embedding provider
is unavailable, causing the sync loop to bail out with dirty=true forever.
The previous approach (calling ensureProviderInitialized inside runSync)
was redundant because the public sync() method already initializes the
provider before runSyncWithReadonlyRecovery.
The real fix: when every existing chunk has model='fts-only', rebuilding
the index as FTS-only is safe — no semantic data is lost. So
canRebuildMissingIdentity should also be true when hasOnlyFtsChunks,
even if the provider is unavailable.
Also adds hasSemanticChunks() helper to detect whether any chunks have
a non-fts-only model.
Non-forced test: seeds FTS-only chunks with no meta, syncs without
force, verifies identity transitions from 'missing' to 'valid'.
Refs #91167
* [AI] fix(memory): gate hasSemanticChunks scan to missing-identity path only
Only compute hasOnlyFtsChunks when identity is missing, chunks exist,
and the provider is unavailable. This avoids scanning the chunks table
for model classification on every ordinary sync.
* test(memory): protect semantic index self-heal
---------
Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
Fix QMD watcher ignore handling for explicitly configured roots whose directory names are normally ignored, and prefer the most-specific configured watch root for overlapping collections.
Validated with focused QMD/tooling tests, full core support boundary tests, green CI, and ClawSweeper re-review.
ensureCollections() never rebound a managed collection after its root
path changed (e.g. an agent workspace repoint): listCollectionsBestEffort()
read `qmd collection list`, whose output carries no filesystem path, so
listed.path was always undefined and shouldRebindCollection took its
defensive branch and skipped the rebind. The collection stayed pinned to
the stale root and recall kept resolving the old location.
Enrich the listed collections with their path via `qmd collection show`
so path-change detection works and the rebind fires.
Closes#91251
(cherry picked from commit 4a225011e9)
Gateway/background sync now repairs missing memory index metadata with the existing full reindex path when the configured embedding provider is available, while preserving dirty/paused state instead of downgrading semantic chunks when embeddings are unavailable.
Fixes#90338
Make lexical FTS/LIKE search ignore embedding model identity so exact keyword recall survives provider/model changes. Vector search remains model-scoped, and refreshed or stale FTS rows are cleaned by path/source with live-chunk filtering to prevent old orphan rows from surfacing.
Fixes#48300
Move Zalo hosted outbound media metadata and expiry into plugin state, add SDK chunked hosted media storage, and keep CI/type/lint gates green after rebase.