Reduce repeated gateway warning noise in startup/auth retry paths while preserving credential mismatch and rate-limit audit visibility.
Also hardens empty embedded-assistant retry handling by carrying lifecycle state through the missing-assistant guard, and keeps the relevant regression coverage in gateway and agent tests.
* fix(telegram): enable TCP keepalive on getUpdates connections to prevent NAT timeout stalls
Long-polling connections to api.telegram.org stay idle for up to the
getUpdates timeout (~900 s). Most home/office NAT tables expire idle TCP
entries after 60–1800 s (commonly ~1000 s). When the NAT entry is
silently dropped the connection hangs rather than returning an error,
leaving the grammY runner stuck until the 90 s stall watchdog fires and
forces a restart cycle.
Fix: unconditionally set `keepAlive: true` and
`keepAliveInitialDelay: 30_000` (30 s) on the undici Agent `connect`
options built in `buildTelegramConnectOptions`. OS-level TCP keepalive
probes sent every ~75 s (OS default) will:
1. Refresh the NAT table entry before it expires.
2. Surface dead connections immediately with ETIMEDOUT instead of
hanging forever.
The `return Object.keys(connect).length > 0 ? connect : null` guard is
also removed; `connect` is now always non-empty so it always returns the
object.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
(cherry picked from commit 92e454c0614256201cdf6f0f73c7897d006616d4)
* fix(telegram): stop self-flagging disconnected on poll-cycle start; widen channel connect grace to 300s
(cherry picked from commit 1ca963a05dac0d9d605e9a15dc97fced9cf7725e)
* fix(telegram): catch hung polling startups that preserve inherited connected:true
The widened 300s channel connect grace and the removal of connected:false from
notePollingStart left a path where a polling restart could hang forever
looking healthy. notePollingStart clears lastConnectedAt, lastEventAt, and
lastTransportActivityAt but deliberately omits connected, so server-channels'
patch-merge inherits a connected:true from the previous lifecycle. After grace,
evaluateChannelHealth's stale-socket branch requires lastTransportActivityAt
to be non-null and the connected:false branch is masked, so the channel sits
healthy with no first getUpdates.
Add a post-grace branch to evaluateChannelHealth that flags polling channels
as stale-socket when connected:true is paired with null lastConnectedAt and
null lastTransportActivityAt and a non-null lastStartAt. Scoped to mode:polling
so webhook channels and channels without continuous transport tracking are
not falsely flagged. Align TELEGRAM_POLLING_CONNECT_GRACE_MS in the Telegram
status diagnostic with DEFAULT_CHANNEL_CONNECT_GRACE_MS so openclaw channels
status agrees with the shared health monitor on the grace window. Refresh
the notePollingStart comment to point at the new evaluateChannelHealth branch.
Addresses clawsweeper review on #83304 (P1 connect-grace startup-hang, P2
diagnostic grace drift). Tests cover the new flagged path, the in-grace happy
path, and the prior-successful-connect happy path.
* fix(telegram): clear polling connected state on startup
* fix(gateway): add defense-in-depth health-policy branch for hung polling startups
Defense in depth on top of 87db46c576's notePollingStart connected:false fix.
The primary path (notePollingStart writes connected:false explicitly so
evaluateChannelHealth's existing connected===false branch catches a hung
restart) is unchanged. This adds a defensive post-grace branch that catches
the same hang via a different signature -- inherited connected:true paired
with null lastConnectedAt and null lastTransportActivityAt -- in case a
future code path forgets to clear the inherited connected flag on lifecycle
start. Scoped to mode:polling so webhook channels and channels without
continuous transport tracking are not falsely flagged.
Also bump lastStartAt: Date.now() - 121_000 to 301_000 in the spool-handler
timeout test added by upstream #83505 so it falls past the widened 300s
TELEGRAM_POLLING_CONNECT_GRACE_MS suppression window (mirroring the same
fixup already applied to the two adjacent polling-startup tests).
* revert(telegram,gateway): keep connect grace at 120s
Drop the 120s -> 300s widening from this PR after maintainer feedback that
the extra grace masks real startup bugs. The defense-in-depth checks added
in earlier commits (notePollingStart clearing inherited connected state,
the stale-socket policy branch, the per-snapshot startup grace test) all
work fine at 120s and remain valuable on their own.
Reverts in:
- src/gateway/channel-health-policy.ts: DEFAULT_CHANNEL_CONNECT_GRACE_MS 300 -> 120
- extensions/telegram/src/status-issues.ts: TELEGRAM_POLLING_CONNECT_GRACE_MS 300 -> 120
- extensions/telegram/src/status.test.ts: lastStartAt 301_000 -> 121_000 (3 cases)
The new channel-health-policy.test.ts cases use explicit channelConnectGraceMs:
10_000 in the policy, so they are unaffected by the default constant change.
* fix(telegram): narrow polling keepalive fix
---------
Co-authored-by: Yibei Ou <yibeiou@Yibeis-Mac-mini.local>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Ayaan Zaidi <hi@obviy.us>
Make plugin-state enforce the plugin-wide live-row fuse by evicting only from the namespace currently being written, preserving sibling namespace rows and still failing atomically when the current namespace cannot free enough rows.
Raise the plugin-wide cap to 6,000 rows, keep Telegram's persistent message-cache namespace at 3,000 entries, and document the updated SDK runtime contract. Harden legacy plugin-state import so capacity pressure cannot archive a source after losing imported keys, with focused regression coverage for Telegram-shaped namespaces and migration rollback.
Also restore the Docker runtime-assets preflight step in full release validation so release workflow contract tests stay aligned.
Verification: focused plugin-state, migration, Telegram, workflow-contract, lint, deprecated-API, diff-check, Blacksmith Testbox, CI, CodeQL, Workflow Sanity, OpenGrep, and autoreview all passed on PR head fee021cfa6.
Co-authored-by: Keshav's Bot <keshavbotagent@gmail.com>
Use read-only Telegram account inspection for prompt-time channel actions, inline buttons, and reaction guidance so unresolved SecretRef tokens retain configured non-secret behavior before runtime snapshot hydration.
Match runtime Telegram account lookup for normalized config keys and multi-account fallback guards, while keeping sends/actions on the existing strict credential resolution path.
Fixes#75433.
Co-authored-by: Shubhankar Tripathy <reach2shubhankar@gmail.com>
Route Telegram sendMessage action replies through durable outbound delivery so completed agent responses remain retryable when the gateway send path times out.
Verified with focused Telegram/outbound tests, extension test typecheck, prepare build/check/full test gates, and green CI rerun for head 20b45687e1.
* fix(telegram): preserve command slots for aliases
* fix: report Telegram alias command overflow
* fix: preserve Telegram alias menu order
* docs: drop release-owned changelog entry
---------
Co-authored-by: wuyangfan <yangfan.wu@succaiss.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
Remove the Telegram DM thread reply policy config and use Telegram bot capability as the single source of truth for DM topic session splitting.
DM messages with message_thread_id now split into thread-scoped sessions only when Telegram getMe reports has_topics_enabled for the bot. Doctor removes retired dm.threadReplies and direct.*.threadReplies keys, docs explain the upgrade behavior, and startup keeps cached bot info as a non-auth fallback when a fresh probe fails.
Refs #86513.
Thanks @alexph-dev.
Verification:
- pnpm docs:list
- pnpm exec oxfmt --check --threads=1 extensions/telegram/src/channel.ts extensions/telegram/src/channel.gateway.test.ts extensions/telegram/src/doctor-contract.ts extensions/telegram/src/doctor.test.ts
- git diff --check
- node scripts/run-vitest.mjs extensions/telegram/src/channel.gateway.test.ts extensions/telegram/src/doctor.test.ts extensions/telegram/src/bot/helpers.test.ts extensions/telegram/src/bot-message-context.dm-threads.test.ts extensions/telegram/src/config-schema.test.ts
- pnpm config:channels:check
- pnpm config:docs:check
- .agents/skills/autoreview/scripts/autoreview --mode local
- GitHub Actions: CI 26468039803, Workflow Sanity 26468040057, OpenGrep 26468039472, Real behavior proof 26468036483, CodeQL 26468039466, CodeQL Critical Quality 26468039473
Known CI caveat: checks-windows-node-test failed before tests because Windows runner setup left Node 22.19.0 active while the job requested Node 24.x; the same setup failure is present on current main CI run 26468063947.
Behavior addressed: Telegram direct-message turns no longer drop an earlier overlapping normal reply, while authorized aborts and explicit/native/plugin/skill command turns still supersede active reply work.
Real environment tested: local OpenClaw focused Telegram test shard plus existing contributor Telegram screenshot/log proof in the PR body.
Exact steps or command run after this patch: pnpm test extensions/telegram/src/telegram-reply-fence.test.ts extensions/telegram/src/bot-message-dispatch.test.ts; .agents/skills/autoreview/scripts/autoreview --mode branch --base origin/main
Evidence after fix: 2 test files passed, 93 tests passed; final autoreview clean with no accepted/actionable findings.
Observed result after fix: overlapping normal Telegram DMs use non-interrupting reply fences and both final replies remain deliverable; direct /stop, authorized built-in commands, and explicit text/native command turns still supersede.
What was not tested: fresh live Telegram Desktop rerun by this agent; PR retains contributor screenshot/log proof and the Real behavior proof bot remains red despite proof labels.
Thanks @neeravmakwana.
Co-authored-by: Neerav Makwana <261249544+neeravmakwana@users.noreply.github.com>