Commit Graph

20528 Commits

Author SHA1 Message Date
Peter Steinberger
44ca47b2eb refactor: share allow-from store file reads 2026-04-21 00:54:08 +01:00
Peter Steinberger
3caf9faef5 test: use synthetic metadata channel labels 2026-04-21 00:52:52 +01:00
Peter Steinberger
71154bf3bf test: use synthetic approval pairing fixtures 2026-04-21 00:51:37 +01:00
Peter Steinberger
05835dd2d4 test: use synthetic heartbeat wake fixtures 2026-04-21 00:50:32 +01:00
Peter Steinberger
874306a2ac test: use synthetic task delivery fixtures 2026-04-21 00:49:06 +01:00
Peter Steinberger
aecd709dfe test: use synthetic task channel fixtures 2026-04-21 00:47:12 +01:00
Watchtower
dcb525de50 fix(pi-embedded-runner): gate silent-error retry on replay safety
Per @steipete review on #68310: the silent-error retry must not fire when the
failed attempt already recorded potential side effects (messaging tool sent,
cron add, or a mutating tool call that wasn't round-tripped as replay-safe).
Otherwise resubmission can duplicate those actions.

Adds `!attempt.replayMetadata.hadPotentialSideEffects` to the retry condition,
mirroring the gate used by resolveEmptyResponseRetryInstruction and the
planning-only / reasoning-only retry resolvers in run/incomplete-turn.ts.

Adds a new negative regression test:
  "does not retry when the failed attempt recorded side effects"
which reproduces the reviewer's repro — stopReason=error + output=0 + empty
content, but replayMetadata={hadPotentialSideEffects: true, replaySafe: false}.
Expected: no retry, surfaces incomplete-turn error. Confirmed locally.
2026-04-21 00:43:50 +01:00
Watchtower
5fb302ebf1 fix(pi-embedded-runner): retry silent stopReason=error turns (non-frontier models)
ollama/glm-5.1:cloud (and occasionally other models) can end a turn with
stopReason="error", usage.output=0, and empty content[] after a successful
tool-call sequence. The existing empty-response retry path in
src/agents/pi-embedded-runner/run/incomplete-turn.ts is gated on
isStrictAgenticSupportedProviderModel (gpt-5 family only), so non-frontier
models fall through to "incomplete turn detected" with payloads=0 and no
recovery. The user sees no reply and has to nudge.

Add a narrow, model-agnostic resubmission inside the attempt loop, placed
before the incompleteTurnText surface-to-user return:

  - stopReason === "error"
  - usage.output === 0
  - content.length === 0   (excludes reasoning-only error turns)
  - bounded by MAX_EMPTY_ERROR_RETRIES = 3

No instruction injection, no model gating; same prompt, same session
transcript (tool results already captured), just let the loop try again.

New test file run.empty-error-retry.test.ts covers:
  1. Retries for ollama/glm-5.1:cloud → succeeds on 2nd attempt.
  2. Caps at 3 retries → 4 total attempts → surfaces incomplete-turn error.
  3. Does NOT retry when output > 0 (preserve produced text).
  4. Does NOT retry when stopReason=stop + output=0 (NO_REPLY path).
  5. Retries for anthropic/claude-opus-4-7 too — model-agnostic.

Relates to #68281.
2026-04-21 00:43:50 +01:00
Peter Steinberger
982b1c9464 test(ci): reduce channel contract import cost 2026-04-21 00:40:07 +01:00
Peter Steinberger
503af7afa6 refactor: dedupe install scan skill spec 2026-04-21 00:32:42 +01:00
Peter Steinberger
1a834a0ff6 test: reuse runtime sidecar uniqueness helper 2026-04-21 00:32:42 +01:00
Dale Yarborough
7b5527a74e fix(gateway): prevent 1006 errors from race condition in WebSocket upgrade (#43392)
Merged via squash.

Prepared head SHA: 0bca6d3512
Co-authored-by: dalefrieswthat <176454532+dalefrieswthat@users.noreply.github.com>
Co-authored-by: grp06 <1573959+grp06@users.noreply.github.com>
Reviewed-by: @grp06
2026-04-20 16:29:14 -07:00
Peter Steinberger
da3f47ddd0 test: share generation live env helper 2026-04-21 00:24:18 +01:00
Peter Steinberger
a95b61560a test: dedupe reconnect drain fixtures 2026-04-21 00:24:17 +01:00
scoootscooob
f700ad32a8 providers: default Moonshot to Kimi 2.6 (#69477)
Merged via squash.

Prepared head SHA: 4d778d09d1
Co-authored-by: scoootscooob <167050519+scoootscooob@users.noreply.github.com>
Co-authored-by: scoootscooob <167050519+scoootscooob@users.noreply.github.com>
Reviewed-by: @scoootscooob
2026-04-20 16:15:29 -07:00
Peter Steinberger
a732b916f4 test: use synthetic media channel fixtures 2026-04-20 23:59:39 +01:00
Garry Tan
c8086b731a tasks: add detached task recovery hook before markLost (#69313)
Merged via squash.

Prepared head SHA: 24322af4f7
Co-authored-by: garrytan <19957+garrytan@users.noreply.github.com>
Co-authored-by: mbelinky <132747814+mbelinky@users.noreply.github.com>
Reviewed-by: @mbelinky
2026-04-21 00:58:20 +02:00
Peter Steinberger
73f36b0c80 test: use synthetic outbound dispatch fixtures 2026-04-20 23:49:39 +01:00
Peter Steinberger
7ca649413a refactor: share env secret ref allowlist check 2026-04-20 23:42:11 +01:00
Peter Steinberger
3fd64772d6 test: use synthetic message media fixtures 2026-04-20 23:41:56 +01:00
Peter Steinberger
a06f4d0808 test: use synthetic outbound message fixtures 2026-04-20 23:38:56 +01:00
Peter Steinberger
d0b69a2064 test: use synthetic message channel fixtures 2026-04-20 23:37:28 +01:00
Peter Steinberger
a216b4ebc3 test: merge system run path binding cases 2026-04-20 23:34:59 +01:00
Peter Steinberger
0094f76314 refactor: share plugin config issue formatting 2026-04-20 23:34:19 +01:00
Peter Steinberger
6464cf4756 refactor: share plugin package version lookup 2026-04-20 23:34:19 +01:00
Peter Steinberger
4fb2e2309e refactor: share timeout abort helper with matrix 2026-04-20 23:34:19 +01:00
Peter Steinberger
8b7418b127 refactor: share channel doctor alias normalization 2026-04-20 23:34:19 +01:00
Peter Steinberger
1151d69bb8 test: use synthetic outbound routing fixtures 2026-04-20 23:33:25 +01:00
Peter Steinberger
31d545260e test: merge acp manager retry cases 2026-04-20 23:33:21 +01:00
Peter Steinberger
f6c9912e37 test: use synthetic outbound binding fixtures 2026-04-20 23:29:43 +01:00
Peter Steinberger
7e8b58cb25 test: use synthetic outbound utility fixtures 2026-04-20 23:27:18 +01:00
Peter Steinberger
b07c40a5a8 test: merge system run denial matrices 2026-04-20 23:26:37 +01:00
Peter Steinberger
11eae6b2d8 test: use synthetic message action fixtures 2026-04-20 23:25:08 +01:00
Peter Steinberger
c561e4c11b test: use synthetic outbound core fixtures 2026-04-20 23:20:51 +01:00
Peter Steinberger
f1a544ef6d perf: avoid sort-for-single selection 2026-04-20 23:20:31 +01:00
Peter Steinberger
da5a6b68bd refactor: share ssrf base url policy 2026-04-20 23:15:58 +01:00
Peter Steinberger
72571f0d38 test: decouple outbound target tests from bundled plugins 2026-04-20 23:14:50 +01:00
Peter Steinberger
e0c01bf956 perf: trim hot path allocations 2026-04-20 23:13:20 +01:00
Peter Steinberger
9f9235b692 test(channels): shard registry-backed contracts 2026-04-20 23:10:46 +01:00
ly85206559
35cb59e3b5 fix(agents): reapply compaction settings after resource loader reload (#65602) (#67146)
Merged via squash.

Prepared head SHA: 4978f7b8b5
Co-authored-by: ly85206559 <12526624+ly85206559@users.noreply.github.com>
Co-authored-by: jalehman <550978+jalehman@users.noreply.github.com>
Reviewed-by: @jalehman
2026-04-20 15:10:24 -07:00
Peter Steinberger
d7c7905a52 refactor: share provider polling helper 2026-04-20 23:04:10 +01:00
Peter Steinberger
8f4920e2eb refactor: share line sdk types 2026-04-20 23:04:10 +01:00
Peter Steinberger
60fea81cf1 fix(telegram): harden polling transport liveness (#69476)
* fix(telegram): release undici dispatchers via TelegramTransport.close()

TelegramTransport now exposes an explicit close() that destroys every
owned undici dispatcher (default Agent plus lazily-created IPv4 and
IP-pinned fallback Agents) and the TCP sockets they hold. Dispatcher
constructors are also given bounded keep-alive defaults
(keepAliveTimeout, keepAliveMaxTimeout, connections, pipelining) as a
defence-in-depth layer so the pool cannot grow unbounded even if a
caller forgets to call close().

Without this, every transport that went through a fallback retry left
its fallback Agents anchored forever in a closure; long-running polling
sessions accumulated hundreds of ESTABLISHED keep-alive sockets to
api.telegram.org, saturating the per-IP quota on upstream forward
proxies and making the currently-active outbound node time out while
every other node still tested healthy.

Mock dispatchers in fetch.test.ts gain destroy() spies so the close()
chain is assertable. Call sites that built caller-owned transports from
globalThis.fetch (delivery.resolve-media, test helpers) return an async
no-op close(), matching the new required surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(telegram): dispose polling transport on shutdown and dirty rebuild

Every recoverable network error and stall-watchdog trip sets
TelegramPollingTransportState.#transportDirty so the next polling
cycle rebuilds the transport inside acquireForNextCycle(). Previously
the rebuild simply overwrote the field, leaving the old transport's
keep-alive sockets anchored in the now-unreferenced dispatcher — the
polling loop has no natural GC point for these resources, and Node's
object GC never touches OS-level sockets.

acquireForNextCycle() now closes the previous transport (fire-and-
forget so the polling cycle is not blocked by a slow destroy) before
swapping in the rebuilt one. dispose() is a new method that the owning
TelegramPollingSession calls from the finally block of runUntilAbort(),
so a single transport is always tied to a single polling session
lifetime. After dispose(), acquireForNextCycle() returns undefined to
prevent zombie rebuilds.

Under high sustained polling traffic over long-lived sessions, this is
what stops the per-gateway connection count to api.telegram.org from
growing indefinitely and saturating upstream proxy quotas.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(changelog): note Telegram undici dispatcher lifecycle fix

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(telegram): disable HTTP/2 for all Telegram polling dispatchers

Undici 8 enables HTTP/2 ALPN by default, but Telegram's long-polling
connections stall on Windows due to IPv6 + H2 multiplexing issues. The
core fetch-guard already sets allowH2:false for guarded paths, but the
Telegram extension creates its own Agent/ProxyAgent/EnvHttpProxyAgent
instances directly from undici without this flag.

Apply allowH2:false to all dispatcher constructors in the Telegram
transport layer, matching the approach used in src/infra/net/undici-runtime.ts.

Fixes #66885

* fix: avoid false telegram polling stall restarts

* fix(telegram): publish polling health liveness

---------

Co-authored-by: Ethan Chen <ethanbit@qq.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Magicray1217 <magicray1217@users.noreply.github.com>
Co-authored-by: aoao <aoao@openclaw>
2026-04-20 23:03:57 +01:00
Peter Steinberger
250c756fb4 test: share directive reply mock payloads 2026-04-20 22:51:16 +01:00
Peter Steinberger
ca2c9fef8c test: share gateway live client helpers 2026-04-20 22:51:16 +01:00
Peter Steinberger
c55e1f7566 test: share gateway broadcaster fixtures 2026-04-20 22:51:16 +01:00
Peter Steinberger
b7e5d9a96e test: decouple outbound tests from bundled plugins 2026-04-20 22:44:38 +01:00
Peter Steinberger
1f951f36fd test: remove unused agent runtime support 2026-04-20 22:36:22 +01:00
Peter Steinberger
3e6758f55a test: share provider usage runtime mocks 2026-04-20 22:36:22 +01:00
Peter Steinberger
c63fb08f81 test: share approval native runtime stubs 2026-04-20 22:36:22 +01:00