Commit Graph

55 Commits

Author SHA1 Message Date
Peter Steinberger
22cb7fb6b7 chore(lint): enable no-promise-executor-return 2026-05-31 23:06:13 +01:00
Ashd.LW.
bc77f7a00a fix(gateway): explain ignored restart signal
Add actionable operator guidance when an unauthorized SIGUSR1 gateway restart is ignored because unmanaged restart is disabled.

The change is log-only: restart authorization and scheduling semantics are unchanged, and the existing run-loop test now asserts both the reason warning and the recovery hint.

Refs #79577
Refs #78110
Refs #82433

Co-authored-by: wAngByg <281221101+wAngByg@users.noreply.github.com>
2026-05-30 18:38:35 +01:00
Peter Steinberger
bb46b79d3c refactor: internalize OpenClaw agent runtime (#85341)
* refactor: extract agent core package

Introduce packages/agent-core as the OpenClaw-owned home for reusable agent loop, harness, session, prompt, and runtime dependency contracts.

* refactor: extract shared llm runtime

Move provider model registries, stream wrappers, OAuth helpers, and LLM utilities into src/llm with plugin-sdk barrels instead of depending on the old embedded runtime layout.

* refactor: remove pi runtime internals

Rename remaining Pi-shaped agent surfaces to OpenClaw agent runtime names, delete obsolete Pi docs and package graph checks, and add the third-party notice for incorporated code.

* refactor: tighten agent session runtime

Make agent-core/runtime dependencies explicit, consolidate compaction and session transcript helpers, and move model/session helpers behind OpenClaw-owned contracts.

* refactor: remove static model and pi auth paths

Drop static model catalogs and Pi auth bridges, move model/provider facts to manifest-owned runtime contracts, and harden internal embedded-agent utilities.

* refactor: remove legacy provider compat paths

* docs: remove agent parity notes

* fix: skip provider wildcard metadata parsing

* refactor: share session extension sdk loading

* refactor: inline acpx proxy error formatter

* refactor: fold edit recovery into edit tool

* fix: accept extension batch separator

* test: align startup provider plugin expectations

* fix: restore provider-scoped release discovery

* test: align static asset packaging expectations

* fix: run static provider catalogs during scoped discovery

* fix: add provider entry catalogs for scoped live discovery

* fix: load lightweight provider catalog entries

* fix: refresh provider-scoped plugin metadata

* fix: keep provider catalog entries on release live path

* fix: keep static manifest models in release live checks

* fix: harden release model discovery

* fix: reduce OpenAI live cache probe reasoning

* fix: disable OpenAI cache probe reasoning

* ci: extend OpenAI gateway live timeout

* fix: extend live gateway model budget

* fix: stabilize release validation regressions

* fix: honor provider aliases in model rows

* fix: stabilize release validation lanes

* fix: stabilize release memory qa

* ci: stabilize release validation lanes

* ci: prefer ipv4 for live docker node calls

* fix: restore shared tool-call stream wrapper

* ci: remove legacy pi test shard alias

* fix: clean up embedded agent test drift

* fix: stabilize runtime alias status

* fix: clean up embedded agent ci drift

* fix: restore release ci invariants

* fix: clean up post-rebase runtime drift

* fix: restore release ci checks

* fix: restore release ci after rebase

* fix: remove stale pi runtime path

* test: align compaction runtime expectations

* test: update plugin prerelease expectations

* fix: handle claude live tool approvals

* fix: stabilize release validation gates

* fix: finish agent runtime import

* test: finish post-rebase agent runtime mocks

* fix: keep codex compaction native

* fix: stabilize codex app-server hook tests

* test: isolate codex diagnostic active run

* test: remove codex diagnostic completion race

# Conflicts:
#	extensions/codex/src/app-server/run-attempt.test.ts

* ci: fix full release manifest performance run id

* refactor: narrow llm plugin sdk boundary

* chore: drop generated google boundary stamps

* fix: repair rebase fallout

* fix: clean up rebased runtime references

* fix: decode codex jwt payloads as base64url

* fix: preserve shipped pi runtime alias

* fix: add scoped sdk virtual modules

* fix: decode llm codex oauth jwt as base64url

* fix: avoid stale vertex adc negative cache

* fix: harden tool arg decoding and codeql path

* fix: keep vertex adc negative checks live

* refactor: consolidate codex jwt and edit helpers

* fix: await codex oauth node runtime imports

* fix: preserve sdk tool and notice contracts

* fix: preserve shipped compat config boundaries

* fix: align codex oauth callback host

* fix: terminate agent-core loop streams on failure

* fix: keep codex oauth callback alive during fallback

* ci: include session tools in critical codeql scans

* fix: keep Cloudflare Anthropic provider auth header

* docs: redirect legacy pi runtime pages

* fix: honor bundled web provider compat discovery

* fix: protect session output spill files

* fix: keep legacy agent dir env blocked

* fix: contain auto-discovered skill symlinks

* fix: harden agent core sdk proxy surfaces

* fix: restore approval reaction sdk compat

* fix: keep live docker runs bounded

* fix: keep codex oauth redirect host aligned

* fix: resolve post-rebase agent runtime drift

* fix: redact anthropic oauth parse failures

* fix: preserve responses strict tool shaping

* fix: repair agent runtime rebase cleanup

* docs: redirect retired parity pages

* fix: bound auto-discovered resources to roots

* fix: repair post-rebase agent test drift

* fix: preserve bundled provider allowlist migration

* fix: preserve manifest-owned provider aliases

* fix: declare photon image dependency

* fix: keep provider headers out of proxy body

* fix: preserve shipped env aliases

* fix: refresh control ui i18n generated state

* fix: quote read fallback paths

* fix: preview edits through configured backend

* test: satisfy core test typecheck

* fix: preserve ZAI usage auth fallback

* test: repair codex diagnostic test

* fix: repair agent runtime rebase drift

* test: finish embedded runner import rename

* fix: repair agent runtime rebase integrations

* test: align compaction oauth fallback expectations

* fix: allow sdk-auth session models

* fix: update doctor tool schema import

* fix: preserve bedrock plugin region

* fix: stream harmony-like prose immediately

* ci: include session runtime in codeql shards

* fix: repair latest rebase integrations

* fix: honor explicit codex websocket transport

* fix: keep openai-compatible credentials provider-scoped

* fix: refresh sdk api baseline after rebase

* fix: route cli runtime aliases through openclaw harness

* test: rename stale harness mock expectation

* test: rename embedded agent overflow calls

* test: clean embedded auth test wording

* test: use openclaw stream types in deepinfra cache test

* fix: refresh sdk api baseline on latest main

* fix: honor bundled discovery compat allowlists

* fix: refresh sdk api baseline after latest rebase

* fix: remove stale rebase imports

* test: rename stale model catalog mock

* test: mock renamed doctor runtime modules

* fix: map canonical kimi env auth

* fix: use internal model registry in bench script

* fix: migrate deepinfra provider catalog entry

* fix: enforce builtin tool suppression

* fix: route compaction auth and proxy payloads safely

* refactor: prune unused llm registry leftovers

* test: update codex hooks session import

* test: fix model picker ci coverage

* test: align model picker auth mock types
2026-05-27 19:24:04 +01:00
YUHAO-corn
177ebdc24c fix(gateway): ignore inherited launchd env for respawn 2026-05-25 19:03:37 +01:00
Dirk
90caa3b610 fix(gateway): clear runtime config snapshot before in-process restart (#86388)
After config.patch writes new values to openclaw.json, a subsequent
SIGUSR1 in-process restart could overwrite them with a stale snapshot.

Root cause: run-loop's onIteration hook resets lanes and task registry,
but leaves the runtimeConfigSnapshot intact. loadConfig() then returns
the old snapshot via loadPinnedRuntimeConfig() instead of re-reading disk.

Fix: clearRuntimeConfigSnapshot() in the restart iteration hook so the
next startup reads fresh config from disk.

Refs #86350
2026-05-25 12:20:47 +01:00
Kaspre
6008375655 fix(gateway): honor restart drain budget for embedded runs
Honor configured restart drain budgets for embedded runs and avoid a second active-work drain after forced deferral timeout restarts.

Includes maintainer changelog entry.
2026-05-24 04:22:27 +01:00
Sebastien Tardif
99e44f623e fix(gateway): add .catch() to SIGTERM/SIGUSR1 signal handlers (#83131)
The SIGTERM handler's fire-and-forget IIFE can reject if the graceful
drain or tunnel-teardown throws. Without a catch, this becomes an
unhandled promise rejection. Add .catch() that logs the error and
falls back to a hard stop request. Same treatment for SIGUSR1.

Signed-off-by: Sebastien Tardif <sebtardif@ncf.ca>
2026-05-22 19:47:09 +01:00
Vincent Koc
5aac7939db fix(gateway): drain replies during restart close 2026-05-17 18:12:52 +08:00
Josh Avant
99a269f8b4 fix(agents): mark interrupted sessions before restart (#82772)
* fix(agents): mark interrupted sessions before restart

* docs: add restart recovery changelog

* fix(agents): satisfy restart recovery type checks
2026-05-16 18:11:35 -05:00
Peter Steinberger
6369bf64cd fix(gateway): trace restart intent reasons 2026-05-16 21:23:06 +01:00
Peter Steinberger
d77c4bbb2d fix(gateway): harden startup restart queue (#82660) (thanks @samzong) 2026-05-16 18:57:58 +01:00
samzong
9b53a95d8e fix(gateway): queue startup restart signals
Signed-off-by: samzong <samzong.lu@gmail.com>
2026-05-16 18:57:58 +01:00
Peter Steinberger
605a2c87ae fix: carry gateway restart trace across respawn (#82396) (thanks @samzong) 2026-05-16 13:42:50 +01:00
Josh Avant
a1e208ee26 Recover stale embedded tool calls during gateway diagnostics (#82369)
* fix(gateway): recover stale embedded tool calls

* chore(changelog): note stale embedded tool recovery
2026-05-15 19:36:59 -05:00
Peter Steinberger
060768ef75 test: dedupe gateway run loop mock read 2026-05-13 10:33:35 +01:00
Peter Steinberger
dad240eecd test: guard gateway run loop mock calls 2026-05-12 07:20:43 +01:00
Shakker
5c65a22c8c test: count gateway lock release 2026-05-12 04:58:02 +01:00
Shakker
1d5785ba85 test: assert gateway restart handoffs 2026-05-11 13:02:47 +01:00
Val Alexander
fa79e9754e fix(gateway): harden macOS update restart lifecycle
Summary:
- Clear stale SIGUSR1 restart state before rejected or externally allowed restart handling can leave an in-flight token stuck.
- Verify the live gateway version after macOS package-update service refreshes and skip redundant restarts when the refreshed LaunchAgent already serves the expected version.
- Set generated LaunchAgents to a 10s throttle plus 20s shutdown window and widen gateway bind retries around supervisor-owned restarts.

Fixes #79577. Refs #78699 and #60885.

Verification:
- pnpm test src/cli/gateway-cli/run-loop.test.ts src/infra/infra-runtime.test.ts
- pnpm test src/cli/update-cli.test.ts src/daemon/launchd.test.ts src/gateway/server/http-listen.test.ts
- pnpm exec oxfmt --check --threads=1 src/cli/gateway-cli/run-loop.ts src/cli/gateway-cli/run-loop.test.ts
- pnpm check:changed
- Crabbox/Blacksmith wrapper smoke passed focused tests plus pnpm check:changed: https://github.com/openclaw/openclaw/actions/runs/25595985603
- PR CI was green before upstream main advanced; the latest rebased heads hit unrelated broad lint failures also reproduced on current main CI (for example https://github.com/openclaw/openclaw/actions/runs/25598671666). No failing lint diagnostics referenced this gateway/update diff.
2026-05-09 05:21:17 -05:00
Peter Steinberger
627b0073f2 test: remove gateway restart delay wait 2026-05-06 07:02:27 +01:00
Shakker
acb0acd8dd fix: add gateway supervisor restart handoff 2026-05-05 08:38:00 +01:00
Satoshi F.
103cdd9d96 fix(gateway): add safe restart coordinator (#76923)
Add a safe restart coordinator that preflights active Gateway work before restart.

- expose gateway.restart.preflight and gateway.restart.request RPC methods
- add explicit openclaw gateway restart --safe / openclaw daemon restart --safe path
- narrow restart blockers to running non-ended tasks so queued records no longer block indefinitely
- keep existing restart behavior unchanged; --force remains the immediate override

Co-authored-by: NikolaFC <54186359+NikolaFC@users.noreply.github.com>
Co-authored-by: galiniliev <5711535+galiniliev@users.noreply.github.com>
2026-05-04 10:58:36 -07:00
Vincent Koc
f6f8d74419 fix(gateway): expose restart drain controls 2026-05-02 14:43:59 -07:00
Peter Steinberger
ed8f50f240 refactor: simplify plugin dependency handling
Simplify plugin installation and runtime loading around package-manager-owned dependencies, with Jiti reserved for local/TS fallback paths.

Also scans npm plugin install roots so hoisted transitive dependencies are covered by dependency denylist and node_modules symlink checks.
2026-05-01 21:32:22 +01:00
Vincent Koc
1f41b8b44b fix(gateway): bound default restart deferral 2026-04-28 18:42:49 -07:00
Peter Steinberger
a0c850d188 test: stabilize gateway restart loop signals 2026-04-28 02:37:24 +01:00
Peter Steinberger
e651809084 perf: slim gateway startup imports 2026-04-28 02:26:27 +01:00
Peter Steinberger
59faa023fe fix(gateway): unblock sidecar startup 2026-04-27 21:34:44 +01:00
Peter Steinberger
7f3f108521 refactor(config): migrate plugin config access 2026-04-27 12:35:58 +01:00
Samuel Rodda
6c252cc54c fix(update): require applied gateway restarts
Require Control UI updates to observe a real gateway process replacement, surface skipped/error update outcomes, and verify the running gateway version after restart.\n\nAdds update.status restart-sentinel plumbing, docs, generated protocol model updates, and changelog attribution.\n\nLocal verification:\n- pnpm test src/gateway/server-methods/update.test.ts src/cli/gateway-cli/run-loop.test.ts src/infra/restart-sentinel.test.ts src/infra/process-respawn.test.ts src/infra/update-runner.test.ts ui/src/ui/app-gateway.node.test.ts ui/src/ui/controllers/config.test.ts\n- git diff --check\n- pnpm exec oxfmt --check --threads=1 CHANGELOG.md docs/gateway/protocol.md docs/gateway/configuration.md docs/web/control-ui.md\n- pnpm docs:check-mdx
2026-04-27 04:07:43 -05:00
Peter Steinberger
1ea12fe3e2 fix: stage bundled plugin runtime deps safely 2026-04-27 06:16:26 +01:00
Vincent Koc
ec1f72b6c5 fix(gateway): preserve restart drain for active runs
Fixes https://github.com/openclaw/openclaw/issues/65485
2026-04-25 01:35:47 -07:00
Peter Steinberger
a00b01f5ed fix: harden complex qa suite scenarios 2026-04-07 20:35:39 +01:00
Peter Steinberger
a65ab607c7 fix(gateway): use launchd KeepAlive restarts 2026-04-05 07:43:37 +01:00
Peter Steinberger
fe5819887b refactor(gateway): centralize discovery target handling 2026-03-23 00:38:31 -07:00
Peter Steinberger
deecf68b59 fix(gateway): fail closed on unresolved discovery endpoints 2026-03-23 00:27:37 -07:00
Peter Steinberger
680eff63fb fix: land SIGUSR1 orphan recovery regressions (#47719) (thanks @joeykrug) 2026-03-15 22:32:36 -07:00
Charles Dusek
54be30ef89 fix(agents): bound compaction retry wait and drain embedded runs on restart (#40324)
Merged via squash.

Prepared head SHA: cfd99562d6
Co-authored-by: cgdusek <38732970+cgdusek@users.noreply.github.com>
Co-authored-by: jalehman <550978+jalehman@users.noreply.github.com>
Reviewed-by: @jalehman
2026-03-09 08:27:29 -07:00
Peter Steinberger
3caab9260c test: narrow gateway loop signal harness 2026-03-09 07:42:15 +00:00
Peter Steinberger
cc0f30f5fb test: fix windows runtime and restart loop harnesses 2026-03-09 07:22:23 +00:00
Peter Steinberger
c397a02c9a fix(queue): harden drain/abort/timeout race handling
- reject new lane enqueues once gateway drain begins
- always reset lane draining state and isolate onWait callback failures
- persist per-session abort cutoff and skip stale queued messages
- avoid false 600s agentTurn timeout in isolated cron jobs

Fixes #27407
Fixes #27332
Fixes #27427

Co-authored-by: Kevin Shenghui <shenghuikevin@github.com>
Co-authored-by: zjmy <zhangjunmengyang@gmail.com>
Co-authored-by: suko <miha.sukic@gmail.com>
2026-02-26 13:43:39 +01:00
Peter Steinberger
2f8c68ae4d refactor(test): dedupe run-loop signal harness setup 2026-02-22 21:19:09 +00:00
Peter Steinberger
e6383a2c13 fix(gateway): probe port liveness for stale lock recovery
Co-authored-by: Operative-001 <261882263+Operative-001@users.noreply.github.com>
2026-02-22 22:11:51 +01:00
Peter Steinberger
296b19e413 test: dedupe gateway browser discord and channel coverage 2026-02-22 17:11:54 +00:00
Peter Steinberger
ee7a43b895 test: replace slow gateway SIGTERM integration coverage 2026-02-22 17:06:35 +00:00
Peter Steinberger
edaa5ef7a5 refactor(gateway): simplify restart flow and expand lock tests 2026-02-22 10:44:47 +01:00
Peter Steinberger
dd07c06d00 fix: tighten gateway restart loop handling (#23416) (thanks @jeffwnli) 2026-02-22 10:38:32 +01:00
jeffr
01bd83d644 fix: release gateway lock before process.exit in run-loop
process.exit() called from inside an async IIFE bypasses the outer
try/finally block that releases the gateway lock. This leaves a stale
lock file pointing to a zombie PID, preventing the spawned child or
systemctl restart from acquiring the lock. Release the lock explicitly
before calling exit in both the restart-spawned and stop code paths.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 10:38:32 +01:00
cpojer
048e29ea35 chore: Fix types in tests 45/N. 2026-02-17 15:50:07 +09:00
cpojer
f2f17bafbc chore: Fix types in tests 30/N. 2026-02-17 14:32:57 +09:00