Commit Graph

53313 Commits

Author SHA1 Message Date
amittell
f7c32fc8be fix(telegram): lower polling keepalive delay (#83304)
* fix(telegram): enable TCP keepalive on getUpdates connections to prevent NAT timeout stalls

Long-polling connections to api.telegram.org stay idle for up to the
getUpdates timeout (~900 s). Most home/office NAT tables expire idle TCP
entries after 60–1800 s (commonly ~1000 s). When the NAT entry is
silently dropped the connection hangs rather than returning an error,
leaving the grammY runner stuck until the 90 s stall watchdog fires and
forces a restart cycle.

Fix: unconditionally set `keepAlive: true` and
`keepAliveInitialDelay: 30_000` (30 s) on the undici Agent `connect`
options built in `buildTelegramConnectOptions`. OS-level TCP keepalive
probes sent every ~75 s (OS default) will:
1. Refresh the NAT table entry before it expires.
2. Surface dead connections immediately with ETIMEDOUT instead of
   hanging forever.

The `return Object.keys(connect).length > 0 ? connect : null` guard is
also removed; `connect` is now always non-empty so it always returns the
object.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
(cherry picked from commit 92e454c0614256201cdf6f0f73c7897d006616d4)

* fix(telegram): stop self-flagging disconnected on poll-cycle start; widen channel connect grace to 300s

(cherry picked from commit 1ca963a05dac0d9d605e9a15dc97fced9cf7725e)

* fix(telegram): catch hung polling startups that preserve inherited connected:true

The widened 300s channel connect grace and the removal of connected:false from
notePollingStart left a path where a polling restart could hang forever
looking healthy. notePollingStart clears lastConnectedAt, lastEventAt, and
lastTransportActivityAt but deliberately omits connected, so server-channels'
patch-merge inherits a connected:true from the previous lifecycle. After grace,
evaluateChannelHealth's stale-socket branch requires lastTransportActivityAt
to be non-null and the connected:false branch is masked, so the channel sits
healthy with no first getUpdates.

Add a post-grace branch to evaluateChannelHealth that flags polling channels
as stale-socket when connected:true is paired with null lastConnectedAt and
null lastTransportActivityAt and a non-null lastStartAt. Scoped to mode:polling
so webhook channels and channels without continuous transport tracking are
not falsely flagged. Align TELEGRAM_POLLING_CONNECT_GRACE_MS in the Telegram
status diagnostic with DEFAULT_CHANNEL_CONNECT_GRACE_MS so openclaw channels
status agrees with the shared health monitor on the grace window. Refresh
the notePollingStart comment to point at the new evaluateChannelHealth branch.

Addresses clawsweeper review on #83304 (P1 connect-grace startup-hang, P2
diagnostic grace drift). Tests cover the new flagged path, the in-grace happy
path, and the prior-successful-connect happy path.

* fix(telegram): clear polling connected state on startup

* fix(gateway): add defense-in-depth health-policy branch for hung polling startups

Defense in depth on top of 87db46c576's notePollingStart connected:false fix.
The primary path (notePollingStart writes connected:false explicitly so
evaluateChannelHealth's existing connected===false branch catches a hung
restart) is unchanged. This adds a defensive post-grace branch that catches
the same hang via a different signature -- inherited connected:true paired
with null lastConnectedAt and null lastTransportActivityAt -- in case a
future code path forgets to clear the inherited connected flag on lifecycle
start. Scoped to mode:polling so webhook channels and channels without
continuous transport tracking are not falsely flagged.

Also bump lastStartAt: Date.now() - 121_000 to 301_000 in the spool-handler
timeout test added by upstream #83505 so it falls past the widened 300s
TELEGRAM_POLLING_CONNECT_GRACE_MS suppression window (mirroring the same
fixup already applied to the two adjacent polling-startup tests).

* revert(telegram,gateway): keep connect grace at 120s

Drop the 120s -> 300s widening from this PR after maintainer feedback that
the extra grace masks real startup bugs. The defense-in-depth checks added
in earlier commits (notePollingStart clearing inherited connected state,
the stale-socket policy branch, the per-snapshot startup grace test) all
work fine at 120s and remain valuable on their own.

Reverts in:
- src/gateway/channel-health-policy.ts: DEFAULT_CHANNEL_CONNECT_GRACE_MS 300 -> 120
- extensions/telegram/src/status-issues.ts: TELEGRAM_POLLING_CONNECT_GRACE_MS 300 -> 120
- extensions/telegram/src/status.test.ts: lastStartAt 301_000 -> 121_000 (3 cases)

The new channel-health-policy.test.ts cases use explicit channelConnectGraceMs:
10_000 in the policy, so they are unaffected by the default constant change.

* fix(telegram): narrow polling keepalive fix

---------

Co-authored-by: Yibei Ou <yibeiou@Yibeis-Mac-mini.local>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Ayaan Zaidi <hi@obviy.us>
2026-05-28 10:13:13 +05:30
Ayaan Zaidi
51d7f3c143 ci(mantis): route telegram proof runs to us-east-1 2026-05-28 10:10:32 +05:30
Vincent Koc
c841218ace fix(agents): bound native pdf error bodies 2026-05-28 06:39:55 +02:00
Dallin Romney
647e18aa04 test: deflake agent image root checks (#87499) 2026-05-27 21:32:04 -07:00
Ayaan Zaidi
771ddcf184 fix(android): trust private LAN credentials 2026-05-28 10:00:32 +05:30
Ayaan Zaidi
5f3d6cde19 fix(android): keep LAN cleartext untrusted 2026-05-28 10:00:32 +05:30
Ayaan Zaidi
633c40aa65 fix(android): preserve private LAN TLS pins 2026-05-28 10:00:32 +05:30
Ayaan Zaidi
ec3ac182c5 fix(android): allow private LAN pairing 2026-05-28 10:00:32 +05:30
Vincent Koc
6ae4a00a66 fix(qa): reject loose openwebui probe timeouts 2026-05-28 06:27:04 +02:00
Vincent Koc
a0ba9f2b72 fix(media): cancel oversized fetch responses 2026-05-28 06:20:23 +02:00
Masato Hoshino
313d6ae1b3 fix(whatsapp): strip control characters from outbound document fileName (#77114)
Merged via squash.

Prepared head SHA: 5417a8ee2c
Co-authored-by: masatohoshino <246810661+masatohoshino@users.noreply.github.com>
Co-authored-by: mcaxtr <7562095+mcaxtr@users.noreply.github.com>
Reviewed-by: @mcaxtr
2026-05-28 01:17:52 -03:00
Dallin Romney
8d21ac3f6e refactor: share QA runtime helpers (#87412)
* refactor: share QA runtime helpers

* refactor: keep QA helpers private

* refactor: keep QA helpers on private runtime seam

* chore: prune stale QA duplicate ignores

* fix: align qa runtime boundary alias

* fix: avoid startup memory lint conversion
2026-05-27 21:16:24 -07:00
Vincent Koc
96b8df75d5 fix(media): cancel ignored input fetch bodies 2026-05-28 06:13:24 +02:00
Vincent Koc
6adf2340fb fix(qa): parse kitchen sink rpc guardrails strictly 2026-05-28 06:05:24 +02:00
Vincent Koc
736e04cb90 fix(media): drain ignored download responses 2026-05-28 05:53:09 +02:00
Vincent Koc
6a324f6400 fix(perf): keep abort leak thresholds active 2026-05-28 05:29:40 +02:00
Agustin Rivera
b860a0d4d0 fix: harden qqbot direct media uploads
Harden QQBot direct media URL uploads by downloading through the local SSRF guard before QQ upload, disabling redirects, bounding fetch/setup and body reads, and routing downloaded buffers through the existing one-shot/chunked size gate.

Co-authored-by: Agustin Rivera <agustin@rivera-web.com>
2026-05-28 04:21:46 +01:00
Vincent Koc
751cd0c9b8 fix(doctor): validate normalized tool schemas 2026-05-28 05:09:58 +02:00
Vincent Koc
f5e48f767f fix(perf): keep startup memory budgets active 2026-05-28 05:07:34 +02:00
Dallin Romney
d165100c93 perf(tests): refactor embedded attempt runner helpers (#87410)
* refactor: extract embedded attempt runner helpers

* fix: remove unused attempt queue type import

* fix: restore attempt helper coverage

* fix: clear attempt cleanup ci

* fix: restore model prompt transform extraction
2026-05-27 20:04:36 -07:00
Dallin Romney
5887119e8d chore: stop tracking generated diffs viewer runtime (#87405)
* chore: stop tracking generated diffs viewer runtime

* test(diffs): generate viewer runtime fixture when missing
2026-05-27 19:59:35 -07:00
Vincent Koc
bf22893cb6 fix(perf): reject loose extension memory numeric flags 2026-05-28 04:57:51 +02:00
Peter Steinberger
edd4c62da1 perf: dedupe persisted skill prompts (#87458)
* perf: dedupe persisted skills prompts

* fix: account for blobbed skill prompts

* fix: prune unreferenced skill prompt blobs

* fix: refresh skill prompt blob lifecycle

* fix: prune skill prompt blob temp files

* chore: rerun ci

* fix: keep blobbed store serialized cache

* fix: preserve blobbed store cache fast paths

* fix: protect in-flight session prompt blobs

* fix: revalidate session prompt blob cleanup

* test: avoid bundled channel load in image tool tests

* fix: revalidate session prompt blobs before commit

* fix: keep CI guard and media root tests lean
2026-05-28 03:52:03 +01:00
Vincent Koc
6fe7dddcf2 fix(qa): reject loose Docker scheduler numeric env 2026-05-28 04:48:56 +02:00
Vincent Koc
3ef34702c8 fix(qa): reject loose gateway CPU numeric flags 2026-05-28 04:38:41 +02:00
bladin
e0d003b372 fix(whatsapp): support pluginHooks.messageReceived in channel/account config schema (#86426)
Merged via squash.

Prepared head SHA: 27003a8d5a
Co-authored-by: bladin <1740879+bladin@users.noreply.github.com>
Co-authored-by: mcaxtr <7562095+mcaxtr@users.noreply.github.com>
Reviewed-by: @mcaxtr
2026-05-27 23:31:47 -03:00
Peter Steinberger
2229122077 fix: keep private SDK declarations local 2026-05-28 03:28:27 +01:00
Vincent Koc
8b78ded074 test(agents): cover tool schema quarantine in turns 2026-05-28 04:26:00 +02:00
Vincent Koc
ac28c0611d fix(qa): reject loose gauntlet numeric flags 2026-05-28 04:24:13 +02:00
Dallin Romney
3005b62242 perf(plugins) refactor plugin SDK declarations for flat package types (#87165)
* refactor: flatten plugin sdk declarations

* fix: align package inventory with flat sdk declarations

* refactor: move packed sdk smoke to fixture

* test: simplify packed sdk type smoke

* fix(canvas): use focused number runtime helpers

* fix(ci): stabilize sdk boundary checks

* test: guard private sdk declaration leaks

Co-authored-by: Peter Steinberger <steipete@gmail.com>

---------

Co-authored-by: Peter Steinberger <steipete@gmail.com>
2026-05-27 19:22:32 -07:00
Vincent Koc
b6e354f6ca fix(file-transfer): handle late tar pipe errors 2026-05-28 04:14:57 +02:00
Vincent Koc
d1577a2ff2 fix(perf): reject invalid startup bench counts 2026-05-28 03:48:55 +02:00
Andy
d2319d718c fix(status): keep default JSON scan lean
Default `openclaw status --json` stays on the lean health-probe path while preserving the JSON task summary, local update/install metadata, explicit probe timeouts, and configured gateway handshake timeouts. Deeper memory, registry, remote git, and local status-RPC diagnostics remain behind `status --json --all`.

Also keeps generated diffs viewer output in its built form and ignores it in oxfmt so `pnpm build` leaves a clean tree.

Proof:
- `node scripts/run-vitest.mjs src/commands/status.scan.fast-json.test.ts src/commands/status-json-payload.test.ts src/commands/status.scan.shared.test.ts`
- `OPENCLAW_LOCAL_CHECK=0 node scripts/run-oxlint-shards.mjs --threads=8`
- `node scripts/run-tsgo.mjs -p test/tsconfig/tsconfig.core.test.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/core-test.tsbuildinfo`
- `node scripts/run-tsgo.mjs -p test/tsconfig/tsconfig.extensions.test.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/extensions-test.tsbuildinfo`
- `.agents/skills/autoreview/scripts/autoreview --mode branch --base origin/main`
- GitHub checks green for head `47a63f87ea7c2351994fdb71e8cc18041aa0b64e`

Thanks @andyylin.

Co-authored-by: Andy <andyylin@users.noreply.github.com>
2026-05-28 02:28:49 +01:00
Vincent Koc
5846878924 fix(auth): honor OAuth login cancellation 2026-05-28 03:12:40 +02:00
Vincent Koc
a20c091411 test(reply): avoid redundant settled hook return unions 2026-05-28 02:55:01 +02:00
Vincent Koc
069f33b410 test(openai): type malformed context window fixture 2026-05-28 02:55:01 +02:00
Vincent Koc
28a719f3da fix(agents): allow steering yielded subagents 2026-05-28 02:55:01 +02:00
Peter Steinberger
7c7fb7df67 chore(release): refresh plugin sdk baseline 2026-05-28 01:51:27 +01:00
Peter Steinberger
cee2a50fe6 chore(release): prepare 2026.5.28 2026-05-28 01:48:07 +01:00
Peter Steinberger
0e262d20e7 fix(discord): fence tool warning fallback delivery (#87465)
* fix(discord): fence recovered tool warning fallback

* fix(discord): keep warning fallback after failed final

* fix(reply): keep settled cleanup unconditional
2026-05-28 01:39:14 +01:00
Vincent Koc
748510b7a3 fix(doctor): validate tool schemas for configured agents 2026-05-28 02:17:43 +02:00
Peter Steinberger
45e6af5e57 fix: reject partial numeric runtime values 2026-05-27 20:10:01 -04:00
Peter Steinberger
d1aa3cb925 fix: reject partial numeric command values 2026-05-27 20:10:01 -04:00
WarrenJones
65e2120f8c fix(hooks): pass media metadata to received hook
Forward canonical inbound media metadata to plugin message_received hooks so plugins can inspect the same mediaPath, mediaUrl, mediaType, mediaPaths, mediaUrls, and mediaTypes fields already available to inbound_claim.

Verification:
- node scripts/run-vitest.mjs src/hooks/message-hook-mappers.test.ts
- /Users/steipete/Projects/agent-scripts/skills/autoreview/scripts/autoreview --mode branch --base origin/main

Refs: https://github.com/openclaw/openclaw/pull/87297
Co-authored-by: WarrenJones <8704779+WarrenJones@users.noreply.github.com>
2026-05-28 01:06:00 +01:00
Martin Kessler
d00e764e66 fix(heartbeat): stop pending final replay
Stop heartbeat runs from directly returning non-ack durable pending final text. Heartbeats now only clear ack-only pending state and otherwise continue the heartbeat turn, so stale prior final answers cannot be replayed through a later heartbeat/default route.

Keep the isolated heartbeat active-run guard so an immediate/manual heartbeat cannot overwrite an isolated heartbeat session that is still running.

Proof:
- node scripts/run-vitest.mjs src/auto-reply/reply/get-reply.fast-path.test.ts src/infra/heartbeat-runner.skips-busy-session-lane.test.ts
- git diff --check
- autoreview --mode local
- autoreview --mode branch --base origin/main
- GitHub CI 26543804437, CodeQL 26543804438, Critical Quality 26543804441, OpenGrep PR Diff 26543804440 rerun job 78197443511, Real behavior proof 26544027357

Refs #74257.

Co-authored-by: kesslerio <martin@kessler.io>
2026-05-28 00:58:57 +01:00
Peter Steinberger
c86667c5cf test(discord): use reply payload SDK test helper (#87454)
* test(discord): use reply payload SDK test helper

* build(plugin-sdk): exclude reply payload test helper
2026-05-28 00:57:22 +01:00
Peter Steinberger
ff0990d800 fix: accept uncommitted autoreview mode 2026-05-28 00:55:08 +01:00
Edward Abrams
05db911775 fix(outbound): thread session keys into outbound hooks (#73706)
Thread the canonical outbound session key into plugin message_sending and message_sent hook contexts, and align native command redirect routed delivery with the agent runtime session key. This lets plugins correlate agent_end with outbound delivery hooks without seeing missing or divergent session keys.

Verification:
- gh pr checks 73706 --repo openclaw/openclaw --watch=false
- Real behavior proof: https://github.com/openclaw/openclaw/actions/runs/26526635074/job/78131933497

Thanks @zeroaltitude.

Co-authored-by: Edward Abrams <zeroaltitude@gmail.com>
2026-05-28 00:43:27 +01:00
Vincent Koc
c9151ba902 fix(provider): bound local service startup 2026-05-28 01:38:35 +02:00
Peter Steinberger
1f1cdd84ea chore: forward gateway profiling env 2026-05-28 00:35:35 +01:00