openclaw

mirror of https://github.com/openclaw/openclaw.git synced 2026-06-22 22:08:09 +00:00

Files

Omar Shahine fc6400ede3 fix(imessage): always-on inbound recovery and dedupe (#91335 )

* feat(imessage): always-on inbound recovery, deprecate catchup

Replaces the opt-in catchup subsystem with always-on inbound replay
protection that brings iMessage in line with the other channels, and
fixes #89237 (stale backlog dispatched as fresh after bridge recovery).

- New inbound-dedupe.ts: persistent claimable GUID dedupe (claim/commit/
  release) plus a stale-backlog age fence that suppresses live rows whose
  send date is materially older than arrival (logged, never silent).
- monitor-provider: claim at ingestion, carry the exact claimed key on the
  debouncer entry, commit on successful flush / release on dispatch failure
  (per-unit so a coalesced bucket cannot strand a sibling claim). Keeps the
  local startup since_rowid watermark so startup-window rows are not skipped.
- Deprecate catchup: delete catchup.ts + catchup-bridge.ts, remove the
  channels.imessage.catchup schema, cursor migration, and config-guard nag.
  Back-compat: strip the retired key before validation; new imessage doctor
  contract reports + removes it on doctor --fix.
- Docs updated for the new recovery model.

Net -947 prod LOC.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(imessage): recover downtime messages via since_rowid replay

Builds downtime recovery on the new inbound dedupe instead of restoring the
old catchup subsystem. On startup the monitor passes the last dispatched rowid
(a persisted per-account cursor) to imsg watch.subscribe as since_rowid, so imsg
replays the messages that landed while the gateway was down, then tails live.
The GUID dedupe drops anything already handled, so no cursor/retry bookkeeping
is needed.

- recovery-cursor.ts: minimal persisted per-account lastDispatchedRowid.
- monitor-provider: since_rowid = cursor (capped to the most recent
  IMESSAGE_RECOVERY_MAX_ROWS); split the age fence on the startup rowid boundary
  so replayed rows (<= boundary) use the wider recovery window and live rows
  (> boundary) keep the tight #89237 fence; advance the cursor on commit.
- Local only: remote SSH cliPath cannot read chat.db, so it tails from the
  current rowid (suppress-and-move-on) as before.

Restores missed-message recovery that the catchup removal dropped, with no
config and a fraction of the old LOC.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(imessage): make recovery cursor advance failure- and suppression-safe

Addresses two cursor-state regressions in the downtime-recovery path:

- Failed replay rows could be skipped forever: a released (failed) row keeps
  its dedupe claim for retry, but a later successful row in the same flush
  advanced the cursor past it, so the next startup's since_rowid skipped it.
  Hold a per-session floor at the lowest released rowid and never advance the
  cursor past it.
- Suppressed live backlog could be re-delivered after a restart: a live row
  suppressed under the tight live fence was not recorded, so after a restart it
  fell under the wider recovery window (its rowid now below the new boundary)
  and was delivered. Commit its dedupe key on suppression so the recovery
  replay treats it as already handled.

Both caught by Codex autoreview. Adds regression tests for the floor and the
suppression record.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(imessage): bound the GUID-less replay key length

Hash the composite fallback key's variable parts (conversation, sender,
created_at, text) so the key is length-bounded regardless of message text.
The persistent dedupe store already hashes keys internally, so this was not a
live overflow, but the bounded key removes the dependency on that and keeps the
fallback fail-open. Flagged by autoreview.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(imessage): recover downtime messages on remote cliPath setups too

The since_rowid replay runs over the imsg RPC client, so driving it from the
persisted recovery cursor (not the local chat.db boundary) makes downtime
recovery work for remote SSH cliPath gateways — the topology the old RPC-based
catchup served and that the rowid-boundary-only version regressed. Local setups
keep the wider, capped recovery window via the chat.db boundary; remote uses the
live age-fence window. Flagged by autoreview.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(imessage): seed recovery cursor from retired catchup cursor on upgrade

A one-time, self-cleaning migration: when the recovery cursor is empty on the
first startup after upgrade, seed it from the retired imessage.catchup-cursors
lastSeenRowid and consume the legacy entry. Without this a user who had catchup
enabled would not replay messages missed across the upgrade restart. Flagged by
autoreview.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(imessage): preserve catchup recovery on upgrade

---------

Co-authored-by: Omar Shahine <10343873+omarshahine@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Vincent Koc <vincentkoc@ieee.org>