Commit Graph

19257 Commits

Author SHA1 Message Date
limenglin
92776b8d77 fix(gateway): defer cron AND heartbeat activation until sidecars are ready (#65322)
startGatewayRuntimeServices() previously started both the cron
scheduler AND heartbeat runner BEFORE gateway sidecars finished
initialising.  Because chat.history is marked unavailable until
sidecars complete, any cron job or heartbeat tick that called
chat.history during this window received a hard UNAVAILABLE error.

Fix: create a noop heartbeat placeholder in the early
startGatewayRuntimeServices() call, then activate the real
heartbeat runner, cron scheduler, and pending delivery recovery
in a new activateGatewayScheduledServices() function that runs
AFTER startGatewayPostAttachRuntime() completes.

channelHealthMonitor and model pricing refresh remain in the
early call since they do not depend on chat.history.

Root cause analysis by luban, cross-validated by tongluo.
Reviewer feedback addressed: heartbeat runner is now also
deferred (previously only cron was deferred).
2026-04-13 01:41:53 +01:00
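
The startup ordering described in 92776b8d77 can be sketched roughly as below. Only the three function names come from the commit message; every body, the `GatewayRuntime` shape, the `events` log, and the `bootGateway` helper are hypothetical stand-ins, not the repository's implementation.

```typescript
// Hypothetical sketch of the deferred-activation ordering; not the real gateway code.
type HeartbeatRunner = { readonly kind: "noop" | "real"; start(): void; stop(): void };

// Assumed in-memory log so the ordering can be inspected.
const events: string[] = [];

function createNoopHeartbeat(): HeartbeatRunner {
  return { kind: "noop", start() {}, stop() {} };
}

function createRealHeartbeat(): HeartbeatRunner {
  return {
    kind: "real",
    start() { events.push("heartbeat:start"); },
    stop() { events.push("heartbeat:stop"); },
  };
}

interface GatewayRuntime {
  heartbeat: HeartbeatRunner;
  sidecarsReady: boolean;
}

// Early call: only services that do not depend on chat.history start here.
function startGatewayRuntimeServices(): GatewayRuntime {
  events.push("early:channelHealthMonitor");
  events.push("early:modelPricingRefresh");
  return { heartbeat: createNoopHeartbeat(), sidecarsReady: false };
}

// Stand-in for sidecar initialisation; the real work is assumed to be async.
async function startGatewayPostAttachRuntime(rt: GatewayRuntime): Promise<void> {
  await Promise.resolve();
  rt.sidecarsReady = true;
  events.push("sidecars:ready");
}

// Late call: swap the placeholder for the real runner, then start scheduled work.
function activateGatewayScheduledServices(rt: GatewayRuntime): void {
  if (!rt.sidecarsReady) throw new Error("sidecars not ready");
  rt.heartbeat = createRealHeartbeat();
  rt.heartbeat.start();
  events.push("cron:start");
  events.push("deliveryRecovery:start");
}

async function bootGateway(): Promise<GatewayRuntime> {
  const rt = startGatewayRuntimeServices();
  await startGatewayPostAttachRuntime(rt);
  activateGatewayScheduledServices(rt);
  return rt;
}
```

The noop placeholder lets early callers hold a heartbeat handle without any tick firing before chat.history is available.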
Peter Steinberger
03d042d2b9 perf: mock hot agents import tests 2026-04-13 01:35:52 +01:00
Peter Steinberger
5b2ae49107 perf: reduce agents test import overhead 2026-04-13 01:26:44 +01:00
Vincent Koc
4c8337f27b test(agents): stabilize steer restart ordering 2026-04-13 01:25:45 +01:00
EVA
26945ddb49 agents: GPT-5.4 runtime completion rollup (#65219)
* agents: auto-activate strict-agentic for GPT-5 and emit blocked-exit liveness

Closes two hard blockers on the GPT-5.4 parity completion gate:

1) Criterion 1 (no stalls after planning) is universal, but the pre-existing
   strict-agentic execution contract was opt-in only. Out-of-the-box GPT-5
   openai / openai-codex users who never set
   `agents.defaults.embeddedPi.executionContract` still got only 1
   planning-only retry and then fell through to the normal completion path
   with the plan-only text, i.e. they still stalled.

   Introduce `resolveEffectiveExecutionContract(...)` in
   src/agents/execution-contract.ts. Behavior:

   - supported provider/model (openai or openai-codex + gpt-5-family) AND
     explicit "strict-agentic" or unspecified → "strict-agentic"
   - supported provider/model AND explicit "default" → "default" (opt-out)
   - unsupported provider/model → "default" regardless of explicit value

   `isStrictAgenticExecutionContractActive` now delegates to the effective
   resolver so the 2-retry + blocked-state treatment applies by default to
   every GPT-5 openai/codex run. Explicit opt-out still works for users who
   intentionally want the pre-parity-program behavior.

2) Criterion 4 (replay/liveness failures are explicit, not silent
   disappearance) is violated by the strict-agentic blocked exit itself.
   Every other terminal return path in src/agents/pi-embedded-runner/run.ts
   sets `replayInvalid` + `livenessState` via `setTerminalLifecycleMeta`,
   but the strict-agentic exit at run.ts:1615 falls through without them.

   Add explicit `livenessState: "abandoned"` + `replayInvalid` (via the
   shared `resolveReplayInvalidForAttempt` helper) to that exit, plus a
   `setTerminalLifecycleMeta` call so downstream observers (lifecycle log,
   ACP bridge, telemetry) see the same explicit terminal state they see on
   every other exit branch.
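
The behavior matrix listed under item 1 can be sketched as a small pure function. This is an illustration only: the name `resolveContractSketch`, the `ContractInput` shape, and the `startsWith("gpt-5")` family check are assumptions, and the real `resolveEffectiveExecutionContract(...)` signature and model matching are not reproduced here.

```typescript
type ExecutionContract = "default" | "strict-agentic";

// Hypothetical input shape; the real resolver's parameters are not shown in the commit.
interface ContractInput {
  provider: string;
  modelId: string;
  explicit?: ExecutionContract; // undefined means the user never configured it
}

// Assumption: "gpt-5-family" is approximated by a simple prefix check.
function isSupportedLane(provider: string, modelId: string): boolean {
  const supportedProvider = provider === "openai" || provider === "openai-codex";
  return supportedProvider && modelId.toLowerCase().startsWith("gpt-5");
}

function resolveContractSketch(input: ContractInput): ExecutionContract {
  // Unsupported provider/model collapses to "default" regardless of explicit value.
  if (!isSupportedLane(input.provider, input.modelId)) return "default";
  // Supported lane: explicit "default" is the opt-out; anything else is strict.
  return input.explicit === "default" ? "default" : "strict-agentic";
}
```

Checking the unsupported case first keeps the opt-out logic from ever enabling the strict contract outside the supported lane.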

Regressions added:

- `auto-enables update_plan for unconfigured GPT-5 openai runs`
- `respects explicit default contract opt-out on GPT-5 runs`
- `does not auto-enable update_plan for non-openai providers even when unconfigured`
- `emits explicit replayInvalid + abandoned liveness state at the strict-agentic blocked exit`
- `auto-activates strict-agentic for unconfigured GPT-5 openai runs and surfaces the blocked state`
- `respects explicit default contract opt-out on GPT-5 openai runs`

Local validation:

- pnpm test src/agents/openclaw-tools.update-plan.test.ts src/agents/pi-embedded-runner/run.incomplete-turn.test.ts src/agents/pi-embedded-runner.buildembeddedsandboxinfo.test.ts src/agents/system-prompt.test.ts src/agents/openclaw-tools.sessions.test.ts src/agents/pi-embedded-runner/run.overflow-compaction.test.ts

122/122 passing.

Refs #64227

* agents: address loop-6 review comments on strict-agentic contract

Triages all three loop-6 review comments on PR #64679:

1. Copilot: 'The strict-agentic blocked exit returns an error payload
   (isError: true) but sets livenessState to "abandoned". Elsewhere in
   the runner/lifecycle flow, error terminal states are treated as
   "blocked".' Verified: every other hardcoded error terminal branch in
   run.ts (role ordering at 1152, image size at 1206, schema error at
   1244, compaction timeout at 1128, aborted-with-no-payloads at 606)
   uses livenessState: "blocked". Match that convention at the
   strict-agentic blocked exit at 1634. Updated the 'emits explicit
   replayInvalid + abandoned liveness state' regression test to assert
   the new "blocked" value and renamed the assertion commentary.

2. Copilot: 'The JSDoc for resolveEffectiveExecutionContract says
   explicit "strict-agentic" in config always resolves to
   "strict-agentic", but the implementation collapses to "default"
   whenever the provider/model is unsupported.' Rewrite the JSDoc to
   explicitly document the unsupported-provider collapse as the lead
   case (strict-agentic is a GPT-5-family openai/openai-codex-only
   runtime contract) before listing the supported-lane behavior matrix.
   No code change; this is a docstring-only clarification.

3. Greptile P2: 'Non-preferred Anthropic model constant. CLAUDE.md says
   to prefer sonnet-4.6 for Anthropic test constants.' Swap
   claude-opus-4-6 → claude-sonnet-4-6 in the two update_plan gating
   fixtures that assert non-openai providers don't auto-enable the
   planning tool. Behavior unchanged; model constant now matches repo
   testing guidance.

Local validation:

- pnpm test src/agents/openclaw-tools.update-plan.test.ts src/agents/pi-embedded-runner/run.incomplete-turn.test.ts

29/29 passing.

Refs #64227

* test: rename strict-agentic blocked-exit liveness regression to match blocked state

Addresses loop-7 Copilot finding on PR #64679: loop 6 changed the
assertion to livenessState === 'blocked' to match the rest of the
hard-error terminal branches in run.ts, but the test title still said
'abandoned liveness state', which made failures and test output
misleading. Rename the test title to match the asserted value. No
code change beyond the it(...) title.

Validation: pnpm test src/agents/pi-embedded-runner/run.incomplete-turn.test.ts
(19/19 pass).

Refs #64227

* agents: widen strict-agentic auto-activation to handle prefixed and variant GPT-5 model ids

* Align strict-agentic retry matching

* runtime: harden strict-agentic model matching

---------

Co-authored-by: Eva <eva@100yen.org>
2026-04-12 16:36:11 -07:00
Peter Steinberger
b42937908d chore(release): prepare 2026.4.12-beta.1 2026-04-13 00:20:52 +01:00
Peter Steinberger
feb8e1e81f fix(test): remove duplicate trace directive fixtures 2026-04-13 00:20:52 +01:00
Peter Steinberger
9dbbee8a02 fix(test): align trace directive type stubs 2026-04-13 00:20:52 +01:00
Onur Solmaz
4503a43b90 Config: stabilize bundled channel metadata loading 2026-04-13 00:26:44 +02:00
Onur Solmaz
b2f94d9bb8 Config: refresh generated release baselines 2026-04-13 00:13:42 +02:00
pashpashpash
383c854313 CI: fix mainline regression blockers (#65269)
* MSTeams: align logger test expectations

* Gateway: fix CI follow-up regressions

* Config: refresh generated schema baseline

* VoiceCall: type webhook test doubles

* CI: retrigger blocker workflow

* CI: retrigger retry workflow

* Agents: fix current mainline agentic regressions

* Agents: type auth controller test mock

* CI: retrigger blocker validation

* Agents: repair OpenAI replay pairing order
2026-04-13 06:18:37 +09:00
scoootscooob
94ef2f1b0d CLI: detect env-backed audio providers (#65491)
* CLI: detect env-backed audio providers

* fix(cli): trust audio provider env detection

* Secrets: keep default provider env lookups stable

* Plugins: harden env-backed auth defaults

* Plugins: tighten trusted env var lookups

---------

Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
2026-04-12 14:04:44 -07:00
Peter Steinberger
f619368769 test: lazy-load auth and gateway fixtures 2026-04-12 20:17:42 +01:00
Peter Steinberger
5d9a04d4c1 perf: lazy-load session store helpers 2026-04-12 20:17:42 +01:00
Peter Steinberger
fbaa7a34fa test: stabilize doctor streaming migration expectations 2026-04-12 12:17:20 -07:00
Peter Steinberger
e4841d767d test: stabilize loaded full-suite checks 2026-04-12 11:52:56 -07:00
Peter Steinberger
cb5a25d8d8 fix(discord): normalize legacy streaming aliases 2026-04-12 11:52:56 -07:00
Peter Steinberger
2c590bdbc4 test(gateway): align sessions send auth token 2026-04-12 11:52:33 -07:00
Peter Steinberger
35b0586cb1 build: update A2UI bundle hash 2026-04-12 11:41:24 -07:00
Peter Steinberger
903f771c93 fix: align trace protocol artifacts 2026-04-12 11:41:24 -07:00
Vincent Koc
0fd9aa8e00 refactor(plugins): centralize manifest owner trust policy (#65459)
* refactor(plugins): share manifest owner policy helpers

* test(plugins): cover activated manifest owner policy

* fix(plugins): honor explicit disable in setup discovery
2026-04-12 19:36:03 +01:00
Peter Steinberger
c8347e70da fix: align trace directive types 2026-04-12 11:30:44 -07:00
Peter Steinberger
e76c2812b7 style: apply oxfmt 2026-04-12 11:28:43 -07:00
Peter Steinberger
67af6f0baf fix: restore main CI checks 2026-04-12 11:28:43 -07:00
Marcus Castro
aa023e4283 refactor(whatsapp): centralize account connection lifecycle (#65427)
* refactor(whatsapp): centralize account connection lifecycle

* fix(whatsapp): harden controller open failure cleanup

* refactor(whatsapp): remove active listener fallback path

* fix(whatsapp): isolate controller registry state

* debug(whatsapp): trace typing presence updates

* docs(changelog): add whatsapp lifecycle fix note

* debug(whatsapp): log global presence mode

* chore(whatsapp): remove debug presence logs

---------

Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
2026-04-12 15:24:49 -03:00
Tak Hoffman
c37e49f275 Add /trace toggle and fix Active Memory diagnostics 2026-04-12 13:20:22 -05:00
Vincent Koc
dda70915a0 fix(test): align gateway early runtime stubs 2026-04-12 19:15:08 +01:00
Vincent Koc
d4fb7d893d fix(ci): repair main tsgo regressions 2026-04-12 19:14:00 +01:00
Peter Steinberger
067f27f6a2 fix: normalize stale qmd binary paths 2026-04-12 19:08:59 +01:00
Peter Steinberger
19d8069aea fix: lazy-start gateway mcp loopback 2026-04-12 19:08:58 +01:00
Marcus Castro
000fc7f233 refactor(qa): add shared QA channel contract and harden worker startup (#64562)
* refactor(qa): add shared transport contract and suite migration

* refactor(qa): harden worker gateway startup

* fix(qa): scope waits and sanitize shutdown artifacts

* fix(qa): confine artifacts and redact preserved logs

* fix(qa): block symlink escapes in artifact paths

* fix(gateway): clear shutdown race timers

* fix(qa): harden shutdown cleanup paths

* fix(qa): sanitize gateway logs in thrown errors

* fix(qa): harden suite startup and artifact paths

* fix(qa): stage bundled plugins from mutated config

* fix(qa): broaden gateway log bearer redaction

* fix(qa-channel): restore runtime export

* fix(qa): stop failed gateway startups as a process tree

* fix(qa-channel): load runtime hook from api surface
2026-04-12 15:02:57 -03:00
Vincent Koc
fcae3bf943 fix(agents): preserve active-turn queued user prompts (#65478)
* fix(agents): preserve active-turn queued user prompts

* Update src/agents/pi-embedded-runner/run/attempt.prompt-helpers.ts

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Update CHANGELOG.md

* Update CHANGELOG.md

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
2026-04-12 19:02:55 +01:00
Peter Steinberger
4df9772b6e fix: trim timezone suffix from pretty logs 2026-04-12 18:58:27 +01:00
Peter Steinberger
87fa88ac3d fix: use literal runtime import for compaction 2026-04-12 18:56:27 +01:00
Peter Steinberger
e24b80b15e fix: clarify escaped skill path warnings 2026-04-12 10:53:31 -07:00
Vincent Koc
6437aa8532 fix(inbound-meta): unblock Claude CLI and scrub NULs (#65467)
* fix(inbound-meta): rename schema and scrub NULs

* fix(inbound-meta): harden untrusted context blocks

* fix(inbound-meta): preserve fenced metadata blocks

* fix(inbound-meta): cap untrusted context payloads
2026-04-12 18:52:48 +01:00
Peter Steinberger
15b86ac6d0 fix: narrow qmd defaults and clawblocker memory 2026-04-12 18:52:06 +01:00
saram ali
7995e408ce fix(discord): clear stale heartbeat timers in SafeGatewayPlugin.connect() (#65087)
* fix(discord): clear stale heartbeat timers in SafeGatewayPlugin.connect()

The @buape/carbon@0.15.0 heartbeat setup has a race where stopHeartbeat()
runs before heartbeatInterval is assigned, leaving a stale setInterval with
a closed reconnectCallback. When the stale interval fires ~41s later it
throws an uncaught exception that bypasses the EventEmitter error path and
crashes the gateway process via process.on('uncaughtException').

Add a connect() override in SafeGatewayPlugin that unconditionally clears
both heartbeatInterval and firstHeartbeatTimeout before calling super. The
parent's connect() only calls stopHeartbeat() when isConnecting=false; when
isConnecting=true it returns early without clearing — this override fills
that gap.
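
A minimal sketch of the override described above, under stated assumptions: `BaseGatewayPlugin` is a hypothetical stand-in for the parent class (it is not the @buape/carbon API), and `SafeGatewayPluginSketch` and `connectCalls` exist only for illustration. The field names `heartbeatInterval`, `firstHeartbeatTimeout`, and `isConnecting` follow the commit message.

```typescript
// Hypothetical parent class that mimics the early-return behavior described above.
class BaseGatewayPlugin {
  heartbeatInterval: ReturnType<typeof setInterval> | undefined = undefined;
  firstHeartbeatTimeout: ReturnType<typeof setTimeout> | undefined = undefined;
  isConnecting = false;
  connectCalls = 0; // illustration only, to observe delegation

  connect(): void {
    this.connectCalls += 1;
    // When already connecting, return early WITHOUT clearing timers.
    if (this.isConnecting) return;
    this.stopHeartbeat();
    this.isConnecting = true;
  }

  stopHeartbeat(): void {
    clearInterval(this.heartbeatInterval);
    clearTimeout(this.firstHeartbeatTimeout);
    this.heartbeatInterval = undefined;
    this.firstHeartbeatTimeout = undefined;
  }
}

class SafeGatewayPluginSketch extends BaseGatewayPlugin {
  override connect(): void {
    // Unconditionally clear both timers so no stale interval survives a reconnect.
    clearInterval(this.heartbeatInterval);
    clearTimeout(this.firstHeartbeatTimeout);
    this.heartbeatInterval = undefined;
    this.firstHeartbeatTimeout = undefined;
    super.connect();
  }
}
```

Clearing before delegating means the result is the same whichever branch the parent's connect() takes.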

Fixes #65009. Related: #64011, #63387, #62038.

* test(discord): assert super.connect() delegation in SafeGatewayPlugin tests

* fix(ci): update raw-fetch allowlist line numbers for gateway-plugin.ts

The connect() override added in the heartbeat fix shifted the two
pre-existing fetch() callsites from lines 370/436 to 387/453.

* docs(changelog): add discord heartbeat crash note

* test(cli): align plugin registry load-context mock

---------

Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
2026-04-12 18:40:04 +01:00
Peter Steinberger
a8e140e395 chore: bump version to 2026.4.12 2026-04-12 10:37:18 -07:00
Vincent Koc
9259e593e6 test(gateway): share transcript event waiters 2026-04-12 18:33:47 +01:00
Vincent Koc
9c2b094f3f test(gateway): share search session transcript fixtures 2026-04-12 18:32:04 +01:00
Vincent Koc
a24af49100 fix(update-cli): respawn plugin refresh after self-update (#65471)
* fix(update-cli): respawn plugin refresh after self-update

* Update src/cli/update-cli/update-command.ts

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Update CHANGELOG.md

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
2026-04-12 18:26:43 +01:00
Vincent Koc
f00f0a9596 fix(agents): stop leaking session lock exit listeners (#65469)
* fix(agents): stop leaking session lock exit listeners

* Update src/agents/session-write-lock.ts

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
2026-04-12 18:22:12 +01:00
Vincent Koc
a5aceebc01 test(gateway): share bearer agents list invoke 2026-04-12 18:20:39 +01:00
Vincent Koc
27afd01577 test(gateway): share session history sse helpers 2026-04-12 18:17:50 +01:00
Vincent Koc
686e5976df test(gateway): share preauth hardening setup helpers 2026-04-12 18:04:22 +01:00
Vincent Koc
eddd9a1a1c test(gateway): share silent reconnect rejection assertions 2026-04-12 18:00:49 +01:00
Vincent Koc
b35becfb1d test(gateway): share plugin approval no-route context 2026-04-12 17:59:17 +01:00
Vincent Koc
2c5290a7b1 test(gateway): share paired ios operator fixture 2026-04-12 17:57:55 +01:00
Vincent Koc
ed1744bcaa test(heartbeat): cover isolated cron event consumption 2026-04-12 17:55:36 +01:00