Commit Graph

32 Commits

Author SHA1 Message Date
Peter Steinberger
7d4bf8f285 fix(telegram): bound transport cooldown expiry 2026-05-30 14:16:57 -04:00
amittell
f7c32fc8be fix(telegram): lower polling keepalive delay (#83304)
* fix(telegram): enable TCP keepalive on getUpdates connections to prevent NAT timeout stalls

Long-polling connections to api.telegram.org stay idle for up to the
getUpdates timeout (~900 s). Most home/office NAT tables expire idle TCP
entries after 60–1800 s (commonly ~1000 s). When the NAT entry is
silently dropped the connection hangs rather than returning an error,
leaving the grammY runner stuck until the 90 s stall watchdog fires and
forces a restart cycle.

Fix: unconditionally set `keepAlive: true` and
`keepAliveInitialDelay: 30_000` (30 s) on the undici Agent `connect`
options built in `buildTelegramConnectOptions`. OS-level TCP keepalive
probes sent every ~75 s (OS default) will:
1. Refresh the NAT table entry before it expires.
2. Surface dead connections immediately with ETIMEDOUT instead of
   hanging forever.

The `return Object.keys(connect).length > 0 ? connect : null` guard is
also removed; `connect` is now always non-empty so it always returns the
object.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
(cherry picked from commit 92e454c0614256201cdf6f0f73c7897d006616d4)

* fix(telegram): stop self-flagging disconnected on poll-cycle start; widen channel connect grace to 300s

(cherry picked from commit 1ca963a05dac0d9d605e9a15dc97fced9cf7725e)

* fix(telegram): catch hung polling startups that preserve inherited connected:true

The widened 300s channel connect grace and the removal of connected:false from
notePollingStart left a path where a polling restart could hang forever
looking healthy. notePollingStart clears lastConnectedAt, lastEventAt, and
lastTransportActivityAt but deliberately omits connected, so server-channels'
patch-merge inherits a connected:true from the previous lifecycle. After grace,
evaluateChannelHealth's stale-socket branch requires lastTransportActivityAt
to be non-null and the connected:false branch is masked, so the channel sits
healthy with no first getUpdates.

Add a post-grace branch to evaluateChannelHealth that flags polling channels
as stale-socket when connected:true is paired with null lastConnectedAt and
null lastTransportActivityAt and a non-null lastStartAt. Scoped to mode:polling
so webhook channels and channels without continuous transport tracking are
not falsely flagged. Align TELEGRAM_POLLING_CONNECT_GRACE_MS in the Telegram
status diagnostic with DEFAULT_CHANNEL_CONNECT_GRACE_MS so openclaw channels
status agrees with the shared health monitor on the grace window. Refresh
the notePollingStart comment to point at the new evaluateChannelHealth branch.

Addresses clawsweeper review on #83304 (P1 connect-grace startup-hang, P2
diagnostic grace drift). Tests cover the new flagged path, the in-grace happy
path, and the prior-successful-connect happy path.

* fix(telegram): clear polling connected state on startup

* fix(gateway): add defense-in-depth health-policy branch for hung polling startups

Defense in depth on top of 87db46c576's notePollingStart connected:false fix.
The primary path (notePollingStart writes connected:false explicitly so
evaluateChannelHealth's existing connected===false branch catches a hung
restart) is unchanged. This adds a defensive post-grace branch that catches
the same hang via a different signature -- inherited connected:true paired
with null lastConnectedAt and null lastTransportActivityAt -- in case a
future code path forgets to clear the inherited connected flag on lifecycle
start. Scoped to mode:polling so webhook channels and channels without
continuous transport tracking are not falsely flagged.

Also bump lastStartAt: Date.now() - 121_000 to 301_000 in the spool-handler
timeout test added by upstream #83505 so it falls past the widened 300s
TELEGRAM_POLLING_CONNECT_GRACE_MS suppression window (mirroring the same
fixup already applied to the two adjacent polling-startup tests).

* revert(telegram,gateway): keep connect grace at 120s

Drop the 120s -> 300s widening from this PR after maintainer feedback that
the extra grace masks real startup bugs. The defense-in-depth checks added
in earlier commits (notePollingStart clearing inherited connected state,
the stale-socket policy branch, the per-snapshot startup grace test) all
work fine at 120s and remain valuable on their own.

Reverts in:
- src/gateway/channel-health-policy.ts: DEFAULT_CHANNEL_CONNECT_GRACE_MS 300 -> 120
- extensions/telegram/src/status-issues.ts: TELEGRAM_POLLING_CONNECT_GRACE_MS 300 -> 120
- extensions/telegram/src/status.test.ts: lastStartAt 301_000 -> 121_000 (3 cases)

The new channel-health-policy.test.ts cases use explicit channelConnectGraceMs:
10_000 in the policy, so they are unaffected by the default constant change.

* fix(telegram): narrow polling keepalive fix

---------

Co-authored-by: Yibei Ou <yibeiou@Yibeis-Mac-mini.local>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Ayaan Zaidi <hi@obviy.us>
2026-05-28 10:13:13 +05:30
Andy Ye
1fd8de8495 fix(telegram): treat ENETDOWN as transient network failure (#86762) 2026-05-26 18:40:31 +01:00
Peter Steinberger
c1a026a976 fix: stabilize tests and reduce plugin memory churn 2026-05-26 00:01:30 +01:00
Jesse Merhi
7c2425a518 Support HTTPS managed proxy CA trust (#79171)
* fix: support HTTPS managed proxy CA trust

* fix: strip IP SNI for HTTPS proxy dispatchers

* fix: harden managed proxy undici dispatchers

* docs: refresh proxy baselines

* fix: drop stale whatsapp undici dependency

* fix: satisfy proxy dispatcher lint and tests

---------

Co-authored-by: Peter Steinberger <steipete@gmail.com>
2026-05-17 06:23:30 +01:00
Peter Steinberger
2bfbe25239 test: dedupe telegram fetch mock calls 2026-05-12 15:30:17 +01:00
Peter Steinberger
9bf3c9ede1 test: tighten telegram fetch fallback assertions 2026-05-11 03:17:56 +01:00
Peter Steinberger
35ceba0e4e test: clear telegram fetch broad matchers 2026-05-10 08:55:28 +01:00
Ayaan Zaidi
1b38f80088 fix(telegram): cool down unhealthy transports 2026-05-10 10:16:59 +05:30
Peter Steinberger
0f31b6424e test: tighten proxy fetch assertions 2026-05-08 06:28:56 +01:00
Peter Steinberger
eabae023eb perf: lazy load memory embedding runtime 2026-05-08 05:39:13 +01:00
Peter Steinberger
9ef37d1907 test: tighten assertions and harness coverage 2026-05-08 05:28:12 +01:00
Ayaan Zaidi
252456e2f6 fix(telegram): recover sticky fallback transport 2026-05-08 09:15:31 +05:30
Peter Steinberger
e873c1e1f8 fix: quiet telegram ipv4 fallback noise 2026-05-02 05:39:28 +01:00
Peter Steinberger
dc9f1b8525 fix(telegram): honor managed proxy env 2026-04-29 12:18:49 +01:00
Peter Steinberger
59a4d7fb06 fix(telegram): normalize bot endpoint api roots 2026-04-28 06:36:38 +01:00
Peter Steinberger
74a667f119 fix(telegram): retry startup control calls on fallback transport 2026-04-28 06:02:05 +01:00
Peter Steinberger
a8c548f4f3 test: route extension tests through sdk seams 2026-04-27 22:34:21 +01:00
Peter Steinberger
60fea81cf1 fix(telegram): harden polling transport liveness (#69476)
* fix(telegram): release undici dispatchers via TelegramTransport.close()

TelegramTransport now exposes an explicit close() that destroys every
owned undici dispatcher (default Agent plus lazily-created IPv4 and
IP-pinned fallback Agents) and the TCP sockets they hold. Dispatcher
constructors are also given bounded keep-alive defaults
(keepAliveTimeout, keepAliveMaxTimeout, connections, pipelining) as a
defence-in-depth layer so the pool cannot grow unbounded even if a
caller forgets to call close().

Without this, every transport that went through a fallback retry left
its fallback Agents anchored forever in a closure; long-running polling
sessions accumulated hundreds of ESTABLISHED keep-alive sockets to
api.telegram.org, saturating the per-IP quota on upstream forward
proxies and making the currently-active outbound node time out while
every other node still tested healthy.

Mock dispatchers in fetch.test.ts gain destroy() spies so the close()
chain is assertable. Call sites that built caller-owned transports from
globalThis.fetch (delivery.resolve-media, test helpers) return an async
no-op close(), matching the new required surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(telegram): dispose polling transport on shutdown and dirty rebuild

Every recoverable network error and stall-watchdog trip sets
TelegramPollingTransportState.#transportDirty so the next polling
cycle rebuilds the transport inside acquireForNextCycle(). Previously
the rebuild simply overwrote the field, leaving the old transport's
keep-alive sockets anchored in the now-unreferenced dispatcher — the
polling loop has no natural GC point for these resources, and Node's
object GC never touches OS-level sockets.

acquireForNextCycle() now closes the previous transport (fire-and-
forget so the polling cycle is not blocked by a slow destroy) before
swapping in the rebuilt one. dispose() is a new method that the owning
TelegramPollingSession calls from the finally block of runUntilAbort(),
so a single transport is always tied to a single polling session
lifetime. After dispose(), acquireForNextCycle() returns undefined to
prevent zombie rebuilds.

Under high sustained polling traffic over long-lived sessions, this is
what stops the per-gateway connection count to api.telegram.org from
growing indefinitely and saturating upstream proxy quotas.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(changelog): note Telegram undici dispatcher lifecycle fix

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(telegram): disable HTTP/2 for all Telegram polling dispatchers

Undici 8 enables HTTP/2 ALPN by default, but Telegram's long-polling
connections stall on Windows due to IPv6 + H2 multiplexing issues. The
core fetch-guard already sets allowH2:false for guarded paths, but the
Telegram extension creates its own Agent/ProxyAgent/EnvHttpProxyAgent
instances directly from undici without this flag.

Apply allowH2:false to all dispatcher constructors in the Telegram
transport layer, matching the approach used in src/infra/net/undici-runtime.ts.

Fixes #66885

* fix: avoid false telegram polling stall restarts

* fix(telegram): publish polling health liveness

---------

Co-authored-by: Ethan Chen <ethanbit@qq.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Magicray1217 <magicray1217@users.noreply.github.com>
Co-authored-by: aoao <aoao@openclaw>
2026-04-20 23:03:57 +01:00
Tak Hoffman
958c34e82c feat(qa-lab): Add proxy capture stack and QA Lab inspector (#64895)
* Add proxy capture core and CLI

* Expand transport capture coverage

* Add QA Lab capture backend

* Refine QA Lab capture UI

* Fix proxy capture review feedback

* Fix proxy run cleanup and TTS capture

* Fix proxy capture transport follow-ups

* Fix debug proxy CONNECT target parsing

* Harden QA Lab asset path containment
2026-04-11 12:34:57 -05:00
Peter Steinberger
849e0d0a7f test: narrow telegram sticker cache imports 2026-04-10 23:12:59 +01:00
Peter Steinberger
225431665a test: trim telegram media retry import cost 2026-04-03 12:01:10 +01:00
Peter Steinberger
bb3ea2137b test: move telegram fetch coverage into extensions 2026-04-03 11:37:41 +01:00
Peter Steinberger
847faa3d04 test: trim extension test import churn 2026-04-03 04:41:08 +01:00
Peter Steinberger
c9d68fb9c2 fix: repair ci test and loader regressions 2026-03-27 18:41:47 +00:00
Peter Steinberger
b96f5d94db chore(telegram): downgrade default network logs 2026-03-27 02:29:32 +00:00
Peter Steinberger
83bb647238 test: speed up telegram extension suites 2026-03-24 15:16:18 +00:00
Peter Steinberger
fc9739313c test: harden channel suite isolation 2026-03-23 11:09:12 +00:00
Peter Steinberger
55ad5d7bd7 fix(security): harden explicit-proxy SSRF pinning 2026-03-22 23:05:42 -07:00
Josh Avant
68bc6effc0 Telegram: stabilize pairing/session/forum routing and reply formatting tests (#50155)
* Telegram: stabilize Area 2 DM and model callbacks

* Telegram: fix dispatch test deps wiring

* Telegram: stabilize area2 test harness and gate flaky sticker e2e

* Telegram: address review feedback on config reload and tests

* Telegram tests: use plugin-sdk reply dispatcher import

* Telegram tests: add routing reload regression and track sticker skips

* Telegram: add polling-session backoff regression test

* Telegram tests: mock loadWebMedia through plugin-sdk path

* Telegram: refresh native and callback routing config

* Telegram tests: fix compact callback config typing
2026-03-19 00:01:14 -05:00
Ayaan Zaidi
e4825a0f93 fix(telegram): unify transport fallback chain (#49148)
* fix(telegram): unify transport fallback chain

* fix: address telegram fallback review comments

* fix: validate pinned SSRF overrides

* fix: unify telegram fallback retries (#49148)
2026-03-17 22:44:15 +05:30
scoootscooob
e5bca0832f refactor: move Telegram channel implementation to extensions/ (#45635)
* refactor: move Telegram channel implementation to extensions/telegram/src/

Move all Telegram channel code (123 files + 10 bot/ files + 8 channel plugin
files) from src/telegram/ and src/channels/plugins/*/telegram.ts to
extensions/telegram/src/. Leave thin re-export shims at original locations so
cross-cutting src/ imports continue to resolve.

- Fix all relative import paths in moved files (../X/ -> ../../../src/X/)
- Fix vi.mock paths in 60 test files
- Fix inline typeof import() expressions
- Update tsconfig.plugin-sdk.dts.json rootDir to "." for cross-directory DTS
- Update write-plugin-sdk-entry-dts.ts for new rootDir structure
- Move channel plugin files with correct path remapping

* fix: support keyed telegram send deps

* fix: sync telegram extension copies with latest main

* fix: correct import paths and remove misplaced files in telegram extension

* fix: sync outbound-adapter with main (add sendTelegramPayloadMessages) and fix delivery.test import path
2026-03-14 02:50:17 -07:00