fix(memory): retry reindex on socket errors (#76311)

Summary:
- The PR broadens memory-core's embedding retry classifier for socket/network errors, adds focused classifier tests, and adds an Unreleased changelog fix.
- Reproducibility: yes. Current main's retry classifier rejects the socket/fetch samples, and the reindex embedding path delegates to that classifier; a read-only regex probe confirmed the PR accepts the target samples.
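
A probe along the lines of the one mentioned above can be sketched as follows. This is an illustration rather than the script the review actually ran: the two regexes are copied from the before/after lines of the classifier in this commit, the sample messages come from the added test, and the names `oldClassifier`, `newClassifier`, `samples`, and `probe` are hypothetical.

```typescript
// Classifier regex on main before this PR (copied from the removed line).
const oldClassifier =
  /(rate[_ ]limit|too many requests|429|resource has been exhausted|5\d\d|cloudflare|tokens per day)/i;

// Classifier regex after this PR (copied from the added line).
const newClassifier =
  /(rate[_ ]limit|too many requests|429|resource has been exhausted|5\d\d|cloudflare|tokens per day|fetch failed|other side closed|ECONNRESET|ECONNREFUSED|ETIMEDOUT|EPIPE|UND_ERR_|socket hang up|network error|read ECONN|timed out)/i;

// Sample messages taken from the new test case.
const samples: string[] = [
  "TypeError: fetch failed | other side closed",
  "undici error: UND_ERR_SOCKET",
  "read ECONNRESET",
  "socket hang up",
  "ETIMEDOUT",
];

// Neither regex uses the `g` flag, so `test` keeps no state between calls.
const probe = samples.map((message) => ({
  message,
  before: oldClassifier.test(message),
  after: newClassifier.test(message),
}));

console.table(probe);
```

Every sample should report `before: false` and `after: true`, which matches the reproducibility note: main rejects these messages and the PR accepts them.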

Automerge notes:
- PR branch already contained follow-up commit before automerge: fix(memory): retry reindex on socket errors

Validation:
- ClawSweeper review passed for head b4618c4532.
- Required merge gates passed before the squash merge.

Prepared head SHA: b4618c4532
Review: https://github.com/openclaw/openclaw/pull/76311#issuecomment-4364956064

Co-authored-by: buyitsydney <buyitsydney@users.noreply.github.com>
Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
buyitsydney
2026-05-03 11:21:06 +08:00
committed by GitHub
parent 31ed93ff58
commit 5f5e0a3633
3 changed files with 16 additions and 1 deletions

@@ -25,6 +25,7 @@ Docs: https://docs.openclaw.ai
- Auto-reply/queue: treat reset-triggered `/new` and `/reset` turns as interrupt runs across active-run queue handling, so steer/followup modes cannot delay a fresh session behind existing work. Fixes #74093. (#74144) Thanks @ruji9527 and @yelog.
- Cron: preserve manual `cron.run` IDs in `cron.runs` history so manual run acknowledgements can be correlated with finished run records. Fixes #76276.
- CLI/devices: request `operator.admin` for `openclaw devices approve <requestId>` only when the exact pending device request would mint or inherit admin-scoped operator access, while keeping lower-scope approvals on the pairing scope.
+- Memory/embedding: broaden the embedding reindex retry classifier to include transient socket-layer errors (`fetch failed`, `ECONNRESET`, `socket hang up`, `UND_ERR_*`, `closed`) so memory reindex survives provider network hiccups instead of aborting mid-run. Related #56815, #44166. (#76311) Thanks @buyitsydney.
- Gateway: keep directly requested plugin tools invokable under restrictive tool profiles while preserving explicit deny lists and the HTTP safety deny list, preventing catalog/invoke mismatches that surface as "Tool not available". Thanks @BunsDev.
- Gateway/update: allow beta binaries to refresh gateway services when the config was last written by the matching stable release version, avoiding false newer-config downgrade blocks during beta channel updates.
- Channels: keep Matrix and Mattermost bundled in the core package instead of advertising external npm installs before those channels are cut over. Thanks @vincentkoc.

@@ -71,6 +71,20 @@ describe("memory embedding policy", () => {
expect(waits).toEqual([500, 1000]);
});
+it("retries transient socket/network embedding errors", async () => {
+const messages = [
+"TypeError: fetch failed | other side closed",
+"undici error: UND_ERR_SOCKET",
+"read ECONNRESET",
+"socket hang up",
+"ETIMEDOUT",
+];
+for (const message of messages) {
+expect(isRetryableMemoryEmbeddingError(message)).toBe(true);
+}
+});
it("retries too-many-tokens-per-day errors", async () => {
let calls = 0;
const waits: number[] = [];

@@ -81,7 +81,7 @@ export function buildMemoryEmbeddingBatches<T extends MemoryEmbeddingChunk>(
}
export function isRetryableMemoryEmbeddingError(message: string): boolean {
-return /(rate[_ ]limit|too many requests|429|resource has been exhausted|5\d\d|cloudflare|tokens per day)/i.test(
+return /(rate[_ ]limit|too many requests|429|resource has been exhausted|5\d\d|cloudflare|tokens per day|fetch failed|other side closed|ECONNRESET|ECONNREFUSED|ETIMEDOUT|EPIPE|UND_ERR_|socket hang up|network error|read ECONN|timed out)/i.test(
message,
);
}
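
The diff changes only the classifier; the retry loop that consults it is not shown. A minimal sketch of how such a loop might use the classifier, assuming exponential backoff starting at 500 ms (chosen to mirror the `[500, 1000]` waits asserted in the neighboring test, not confirmed as the real policy). `embedWithRetry`, its parameters, and the local `isRetryable` copy are hypothetical names for illustration.

```typescript
// Local copy of the patched classifier regex from this commit.
function isRetryable(message: string): boolean {
  return /(rate[_ ]limit|too many requests|429|resource has been exhausted|5\d\d|cloudflare|tokens per day|fetch failed|other side closed|ECONNRESET|ECONNREFUSED|ETIMEDOUT|EPIPE|UND_ERR_|socket hang up|network error|read ECONN|timed out)/i.test(
    message,
  );
}

// Hypothetical retry wrapper: retries only errors the classifier accepts,
// doubling the delay after each failed attempt (assumed policy).
async function embedWithRetry<T>(
  embed: () => Promise<T>,
  sleep: (ms: number) => Promise<void>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let delayMs = baseDelayMs;
  for (let attempt = 1; ; attempt++) {
    try {
      return await embed();
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      // Give up on the last attempt or on errors classified as permanent.
      if (attempt >= maxAttempts || !isRetryable(message)) throw err;
      await sleep(delayMs);
      delayMs *= 2;
    }
  }
}
```

Injecting `sleep` keeps the wrapper testable without real timers: a test can pass a function that records the requested delays instead of waiting.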