fix(telegram): harden polling transport liveness (#69476)

* fix(telegram): release undici dispatchers via TelegramTransport.close()

TelegramTransport now exposes an explicit close() that destroys every
owned undici dispatcher (default Agent plus lazily-created IPv4 and
IP-pinned fallback Agents) and the TCP sockets they hold. Dispatcher
constructors are also given bounded keep-alive defaults
(keepAliveTimeout, keepAliveMaxTimeout, connections, pipelining) as a
defence-in-depth layer so the pool cannot grow unbounded even if a
caller forgets to call close().

Without this, every transport that went through a fallback retry left
its fallback Agents anchored forever in a closure; long-running polling
sessions accumulated hundreds of ESTABLISHED keep-alive sockets to
api.telegram.org, saturating the per-IP quota on upstream forward
proxies and making the currently-active outbound node time out while
every other node still tested healthy.

Mock dispatchers in fetch.test.ts gain destroy() spies so the close()
chain is assertable. Call sites that built caller-owned transports from
globalThis.fetch (delivery.resolve-media, test helpers) return an async
no-op close(), matching the new required surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(telegram): dispose polling transport on shutdown and dirty rebuild

Every recoverable network error and stall-watchdog trip sets
TelegramPollingTransportState.#transportDirty so the next polling
cycle rebuilds the transport inside acquireForNextCycle(). Previously
the rebuild simply overwrote the field, leaving the old transport's
keep-alive sockets anchored in the now-unreferenced dispatcher — the
polling loop has no natural GC point for these resources, and Node's
object GC never touches OS-level sockets.

acquireForNextCycle() now closes the previous transport (fire-and-
forget so the polling cycle is not blocked by a slow destroy) before
swapping in the rebuilt one. dispose() is a new method that the owning
TelegramPollingSession calls from the finally block of runUntilAbort(),
so a single transport is always tied to a single polling session
lifetime. After dispose(), acquireForNextCycle() returns undefined to
prevent zombie rebuilds.

Under high sustained polling traffic over long-lived sessions, this is
what stops the per-gateway connection count to api.telegram.org from
growing indefinitely and saturating upstream proxy quotas.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(changelog): note Telegram undici dispatcher lifecycle fix

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(telegram): disable HTTP/2 for all Telegram polling dispatchers

Undici 8 enables HTTP/2 ALPN by default, but Telegram's long-polling
connections stall on Windows due to IPv6 + H2 multiplexing issues. The
core fetch-guard already sets allowH2:false for guarded paths, but the
Telegram extension creates its own Agent/ProxyAgent/EnvHttpProxyAgent
instances directly from undici without this flag.

Apply allowH2:false to all dispatcher constructors in the Telegram
transport layer, matching the approach used in src/infra/net/undici-runtime.ts.

Fixes #66885

* fix: avoid false telegram polling stall restarts

* fix(telegram): publish polling health liveness

---------

Co-authored-by: Ethan Chen <ethanbit@qq.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Magicray1217 <magicray1217@users.noreply.github.com>
Co-authored-by: aoao <aoao@openclaw>
This commit is contained in:
Peter Steinberger
2026-04-20 23:03:57 +01:00
committed by GitHub
parent 250c756fb4
commit 60fea81cf1
18 changed files with 823 additions and 75 deletions

View File

@@ -917,6 +917,7 @@ Per-account, per-group, and per-topic overrides are supported (same inheritance
- Node 22+ + custom fetch/proxy can trigger immediate abort behavior if AbortSignal types mismatch.
- Some hosts resolve `api.telegram.org` to IPv6 first; broken IPv6 egress can cause intermittent Telegram API failures.
- If logs include `TypeError: fetch failed` or `Network request for 'getUpdates' failed!`, OpenClaw now retries these as recoverable network errors.
- If logs include `Polling stall detected`, OpenClaw restarts polling and rebuilds the Telegram transport. Persistent stalls usually point to proxy, DNS, IPv6, or TLS egress issues between the host and `api.telegram.org`.
- On VPS hosts with unstable direct egress/TLS, route Telegram API calls through `channels.telegram.proxy`:
```yaml

View File

@@ -45,13 +45,14 @@ Full troubleshooting: [/channels/whatsapp#troubleshooting](/channels/whatsapp#tr
### Telegram failure signatures
| Symptom | Fastest check | Fix |
| ----------------------------------- | ----------------------------------------------- | --------------------------------------------------------------------------- |
| `/start` but no usable reply flow | `openclaw pairing list telegram` | Approve pairing or change DM policy. |
| Bot online but group stays silent | Verify mention requirement and bot privacy mode | Disable privacy mode for group visibility or mention bot. |
| Send failures with network errors | Inspect logs for Telegram API call failures | Fix DNS/IPv6/proxy routing to `api.telegram.org`. |
| `setMyCommands` rejected at startup | Inspect logs for `BOT_COMMANDS_TOO_MUCH` | Reduce plugin/skill/custom Telegram commands or disable native menus. |
| Upgraded and allowlist blocks you | `openclaw security audit` and config allowlists | Run `openclaw doctor --fix` or replace `@username` with numeric sender IDs. |
| Symptom | Fastest check | Fix |
| ----------------------------------- | ------------------------------------------------ | --------------------------------------------------------------------------- |
| `/start` but no usable reply flow | `openclaw pairing list telegram` | Approve pairing or change DM policy. |
| Bot online but group stays silent | Verify mention requirement and bot privacy mode | Disable privacy mode for group visibility or mention bot. |
| Send failures with network errors | Inspect logs for Telegram API call failures | Fix DNS/IPv6/proxy routing to `api.telegram.org`. |
| Polling stalls or reconnects slowly | `openclaw logs --follow` for polling diagnostics | Upgrade, then check proxy/DNS/IPv6 if `getUpdates` keeps timing out. |
| `setMyCommands` rejected at startup | Inspect logs for `BOT_COMMANDS_TOO_MUCH` | Reduce plugin/skill/custom Telegram commands or disable native menus. |
| Upgraded and allowlist blocks you | `openclaw security audit` and config allowlists | Run `openclaw doctor --fix` or replace `@username` with numeric sender IDs. |
Full troubleshooting: [/channels/telegram#troubleshooting](/channels/telegram#troubleshooting)