Commit Graph

246 Commits

Author SHA1 Message Date
Vincent Koc
7c6ad2327c fix(infra): narrow inherited gateway pid protection 2026-06-16 11:11:25 +08:00
amittell
d88f1bf217 fix(infra): preserve inherited gateway PID across reparent during cleanStaleGatewayProcessesSync
When a child openclaw process is spawned via a backgrounded subshell that
exits before the new process reaches the stale-pid sweep, the new process
is reparented to the supervisor (PID 1 / launchd) and the ancestor walk
in getSelfAndAncestorPidsSync can no longer see the running gateway. The
running gateway then shows up on lsof as an unrelated sibling on the
port and gets SIGKILL'd by cleanStaleGatewayProcessesSync, recreating
the issue #68451 supervisor restart loop across a reparent boundary.

Real-world trigger: a user ~/.zshrc auto-start block
  if ! pgrep -x openclaw-gateway >/dev/null; then
    (openclaw gateway >/dev/null 2>&1 &)
  fi
combined with codex per-turn `zsh -c "set -e; . shell_snapshot"` invocations
caused every chat turn on rh-bot to SIGKILL its launchd-managed gateway,
producing HTTP 000 errors and ~33 kill events captured by a forensic
launchd unified-log tracker before the zshrc was patched.

Fix: gateway-cli captures OPENCLAW_GATEWAY_SERVICE_PID from inherited env
BEFORE overwriting it with process.pid, then threads the captured PID
through cleanStaleGatewayProcessesSync into getSelfAndAncestorPidsSync's
exclusion set. The protection is opt-in per call site so existing
maintainer paths (openclaw update / openclaw doctor restart helpers) keep
their ability to terminate a running gateway intentionally.

The inherited-PID parser is strict positive-integer only: a malformed
inherited env value (`"123abc"`, `"123.4"`, `"0x7b"`, etc.) is rejected
rather than silently protecting PID 123 from cleanup and leaving the
stale listener alive. New focused unit tests cover the parser
contract.

Existing regression tests cover the reparent suicide-kill scenario and
the defensive ignore-non-positive-PID contract on the cleanup side.
2026-06-16 11:11:25 +08:00
Vincent Koc
b41c0b6746 fix(cli): preserve gateway request errors in json mode 2026-06-16 01:52:23 +02:00
Shakker
5c2487dc9a test: type cli test mocks 2026-06-15 19:34:16 +01:00
Chunyue Wang
df521a6459 fix(gateway): guard fast-path startup migrations (#93118)
* fix(cron): run legacy cron store migration in gateway fast path

* fix(cli): run gateway startup migrations

* fix(gateway): guard startup migrations and config selection

* fix(gateway): reconcile final startup environment

* fix(gateway): preserve guarded startup env semantics

* fix(gateway): guard service-mode recovery candidates

* fix(config): reconcile normalized env precedence

* fix(cli): clear replaced proxy signal handlers

* fix(gateway): reject invalid final config

* test(gateway): cover invalid future config reset guard

* test(cli): remove unused recovery state
2026-06-16 01:54:29 +08:00
ghitafilali
b2c5e790b4 fix: restart gateway after isolated cron setup timeout
Restart the gateway after isolated cron setup timeouts and harden stale cron-task finalization around restart/reload boundaries.

Verification: focused cron/tasks/CLI regression suites, gateway filesystem/session regression suite, test typecheck, core lint shards, git diff --check, autoreview, Blacksmith Testbox changed gate tbx_01kv2srgbex71w9ce5rwv2wtr4, and clean GitHub PR checks on 13d06b5d6f.
2026-06-14 18:26:27 +08:00
ooiuuii
d20fdf3b38 fix(gateway): mark active main sessions before restart shutdown aborts (#91357)
* Mark active main sessions during restart shutdown

* Type restart marker mock in close tests

* fix(gateway): preserve active run ownership across restart

* fix(gateway): preserve active runs across restart

* fix(gateway): close restart recovery edge cases

* fix(cron): preserve lifecycle ownership across restart

* fix(gateway): release rejected run contexts

* fix(gateway): preserve restart lifecycle ownership

* fix(cron): retain overlapping run ownership

* fix(agents): preserve restart terminal precedence

---------

Co-authored-by: Peter Steinberger <steipete@gmail.com>
2026-06-13 10:49:17 -07:00
Josh Avant
f385491c23 fix: clarify gateway SecretRef auth diagnostics (#92290)
* fix gateway secretref health diagnostics

* fix gateway health result type narrowing
2026-06-12 11:18:22 -05:00
openclaw-clownfish[bot]
99d0bdc23a fix(cli): validate gateway RPC timeout inputs
Reject malformed or explicit empty Gateway RPC timeout values before opening Gateway calls, align the shared Gateway RPC omitted-timeout fallback with the 30000 ms CLI default, and validate explicit `cron add --timeout-seconds` values at the CLI boundary.

Carries forward the useful source work from #54646 and the earlier timeout-validation context from #40953. #60661 remains separate accepted-run timeout semantics work and is intentionally not folded into this change.

Validation:
- `npm run review-results -- /tmp/clownfish-check-27341769444`
- `git diff --check`
- OpenClaw PR checks on `ce7bd8b9388a5689b14ddc2b3a984f7b4647e5ca`: 132 pass, 0 pending, 0 failing
- ClawSweeper re-review: https://github.com/openclaw/clawsweeper/actions/runs/27344244608

Co-authored-by: RayRuan <43744645+ruanrrn@users.noreply.github.com>
Co-authored-by: Homeran <11574611+comeran@users.noreply.github.com>
2026-06-11 20:52:07 +09:00
Alex Knight
3faf669801 Redact tool output secrets (#85196)
* redact tool output secrets

* Expand tool-output secret redaction

* fix(security): keep redaction prefilter in sync with expanded defaults

- build DEFAULT_REDACT_PREFILTER_RE from sources covering every default
  pattern family: new vendor prefixes, webhook hosts, bare query/form keys,
  userinfo/connection-string passwords, and percent/plus/invisible
  obfuscated keys (including trailing separator splices)
- run default-pattern redaction tests through the default options path and
  redact the vendor corpus per token so prefilter gaps fail tests
- fix quoted standalone assignment values containing the other quote char
  or an unterminated quote; never re-mask *** placeholders
- align net-policy URL query-name separator stripping with logging key
  normalization (Hangul fillers)

* fix(security): keep base64-prefix redaction out of media payloads

- pure-base64-alphabet token prefixes (gAAAA, AKIA, ASIA, dapi,
  ATCTT3xFfG, ATATT, ATBB) now require a non-alphanumeric left boundary,
  skip explicit ;base64, payload spans, and run unchunked so chunk starts
  cannot fake the boundary or hide the container from the lookbehind
- tokens after URL/path delimiters or assignments still mask; data-URL
  media survives redaction byte-identical (fixes chat media mirror CI)
- regression tests: tiny-PNG data URL, in-blob plus boundary,
  chunk-aligned large data URL, reset-path Fernet token, path AWS key

---------

Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com>
2026-06-11 07:34:50 +10:00
Vincent Koc
7f1d82ab25 revert(sessions): defer session metadata sqlite
Reverts 538d36eaaa while preserving subsequent main changes. The beta-only SQLite downgrade rescue and reverse migration remain excluded.
2026-06-10 16:34:06 +09:00
Peter Steinberger
538d36eaaa refactor: move session metadata to SQLite (#91322)
* refactor: move session metadata to sqlite

* test: seed session stores with sqlite fixtures

* test: seed remaining session stores with sqlite fixtures

* fix: stabilize sqlite session cache freshness

* test: seed cli transcript metadata in sqlite
2026-06-07 23:17:35 -07:00
Peter Steinberger
b59b34f9d5 docs: document cli service tests 2026-06-04 19:39:51 -04:00
Peter Steinberger
69c27677f6 docs: document gateway cli helpers 2026-06-04 10:56:59 -04:00
Peter Steinberger
cb5d43ba95 docs: document cli utility helpers 2026-06-04 10:55:01 -04:00
Peter Steinberger
c8d21fe7f0 fix: recover suspicious gateway startup configs (#89480) 2026-06-02 10:12:35 -04:00
Peter Steinberger
27dde7a4d6 chore(lint): enable stricter error rules 2026-06-01 01:12:21 +01:00
Peter Steinberger
22cb7fb6b7 chore(lint): enable no-promise-executor-return 2026-05-31 23:06:13 +01:00
Peter Steinberger
b653d94918 chore(lint): enable no-useless-assignment 2026-05-31 22:40:48 +01:00
Peter Steinberger
f5eca3f84c chore(lint): enable object and reassignment rules 2026-05-31 09:32:52 +01:00
Peter Steinberger
deb7bc6539 chore(lint): enable readability lint rules 2026-05-31 07:17:57 +01:00
Peter Steinberger
00d8d7ead0 refactor: extract normalization core package
Extract shared normalization/coercion helpers into private @openclaw/normalization-core workspace package while preserving existing plugin SDK helper subpaths.\n\nAlso keeps direct normalization-core imports internal, wires UI/build/loader resolution, and replaces the slow PR network CodeQL lane with a fast added-line boundary scan while retaining full CodeQL for scheduled/manual runs.\n\nVerification: local moved tests, plugin SDK boundary tests, extension loader tests, agents-support shard, UI build/test, build artifacts, lint, workflow guards, autoreview, and GitHub CI passed on PR head 963d893715.
2026-05-31 01:33:00 +01:00
Ashd.LW.
bc77f7a00a fix(gateway): explain ignored restart signal
Add actionable operator guidance when an unauthorized SIGUSR1 gateway restart is ignored because unmanaged restart is disabled.

The change is log-only: restart authorization and scheduling semantics are unchanged, and the existing run-loop test now asserts both the reason warning and the recovery hint.

Refs #79577
Refs #78110
Refs #82433

Co-authored-by: wAngByg <281221101+wAngByg@users.noreply.github.com>
2026-05-30 18:38:35 +01:00
Peter Steinberger
de1dfab03e refactor: move terminal core into package (#88279)
* refactor: move terminal core into package

* refactor: move terminal module files

* fix: clean terminal package CI followups

* test: update lint suppression allowlist

* fix: ship terminal core runtime aliases
2026-05-30 11:07:45 +02:00
Peter Steinberger
b1117d9862 refactor: extract gateway client package (#87797)
* refactor: extract gateway client package

* chore: drop generated gateway package artifacts

* refactor: move gateway protocol package

* refactor: remove old gateway protocol tree

* test: keep auth compat split in run mode

* test: expose gateway wrapper options for internals

* fix: watch moved gateway package sources

* test: normalize slash command import guard

* chore: teach knip gateway package entries

* ci: route gateway client package checks

* fix: reuse ipaddr for gateway client hosts

* fix: sync gateway protocol usage schema
2026-05-29 02:23:42 +01:00
Peter Steinberger
b2fdbc53e8 fix: parse qa parent pid strictly 2026-05-28 14:41:02 -04:00
Peter Steinberger
2122dccb91 fix: parse gateway usage days strictly 2026-05-28 13:31:45 -04:00
Peter Steinberger
bb46b79d3c refactor: internalize OpenClaw agent runtime (#85341)
* refactor: extract agent core package

Introduce packages/agent-core as the OpenClaw-owned home for reusable agent loop, harness, session, prompt, and runtime dependency contracts.

* refactor: extract shared llm runtime

Move provider model registries, stream wrappers, OAuth helpers, and LLM utilities into src/llm with plugin-sdk barrels instead of depending on the old embedded runtime layout.

* refactor: remove pi runtime internals

Rename remaining Pi-shaped agent surfaces to OpenClaw agent runtime names, delete obsolete Pi docs and package graph checks, and add the third-party notice for incorporated code.

* refactor: tighten agent session runtime

Make agent-core/runtime dependencies explicit, consolidate compaction and session transcript helpers, and move model/session helpers behind OpenClaw-owned contracts.

* refactor: remove static model and pi auth paths

Drop static model catalogs and Pi auth bridges, move model/provider facts to manifest-owned runtime contracts, and harden internal embedded-agent utilities.

* refactor: remove legacy provider compat paths

* docs: remove agent parity notes

* fix: skip provider wildcard metadata parsing

* refactor: share session extension sdk loading

* refactor: inline acpx proxy error formatter

* refactor: fold edit recovery into edit tool

* fix: accept extension batch separator

* test: align startup provider plugin expectations

* fix: restore provider-scoped release discovery

* test: align static asset packaging expectations

* fix: run static provider catalogs during scoped discovery

* fix: add provider entry catalogs for scoped live discovery

* fix: load lightweight provider catalog entries

* fix: refresh provider-scoped plugin metadata

* fix: keep provider catalog entries on release live path

* fix: keep static manifest models in release live checks

* fix: harden release model discovery

* fix: reduce OpenAI live cache probe reasoning

* fix: disable OpenAI cache probe reasoning

* ci: extend OpenAI gateway live timeout

* fix: extend live gateway model budget

* fix: stabilize release validation regressions

* fix: honor provider aliases in model rows

* fix: stabilize release validation lanes

* fix: stabilize release memory qa

* ci: stabilize release validation lanes

* ci: prefer ipv4 for live docker node calls

* fix: restore shared tool-call stream wrapper

* ci: remove legacy pi test shard alias

* fix: clean up embedded agent test drift

* fix: stabilize runtime alias status

* fix: clean up embedded agent ci drift

* fix: restore release ci invariants

* fix: clean up post-rebase runtime drift

* fix: restore release ci checks

* fix: restore release ci after rebase

* fix: remove stale pi runtime path

* test: align compaction runtime expectations

* test: update plugin prerelease expectations

* fix: handle claude live tool approvals

* fix: stabilize release validation gates

* fix: finish agent runtime import

* test: finish post-rebase agent runtime mocks

* fix: keep codex compaction native

* fix: stabilize codex app-server hook tests

* test: isolate codex diagnostic active run

* test: remove codex diagnostic completion race

# Conflicts:
#	extensions/codex/src/app-server/run-attempt.test.ts

* ci: fix full release manifest performance run id

* refactor: narrow llm plugin sdk boundary

* chore: drop generated google boundary stamps

* fix: repair rebase fallout

* fix: clean up rebased runtime references

* fix: decode codex jwt payloads as base64url

* fix: preserve shipped pi runtime alias

* fix: add scoped sdk virtual modules

* fix: decode llm codex oauth jwt as base64url

* fix: avoid stale vertex adc negative cache

* fix: harden tool arg decoding and codeql path

* fix: keep vertex adc negative checks live

* refactor: consolidate codex jwt and edit helpers

* fix: await codex oauth node runtime imports

* fix: preserve sdk tool and notice contracts

* fix: preserve shipped compat config boundaries

* fix: align codex oauth callback host

* fix: terminate agent-core loop streams on failure

* fix: keep codex oauth callback alive during fallback

* ci: include session tools in critical codeql scans

* fix: keep Cloudflare Anthropic provider auth header

* docs: redirect legacy pi runtime pages

* fix: honor bundled web provider compat discovery

* fix: protect session output spill files

* fix: keep legacy agent dir env blocked

* fix: contain auto-discovered skill symlinks

* fix: harden agent core sdk proxy surfaces

* fix: restore approval reaction sdk compat

* fix: keep live docker runs bounded

* fix: keep codex oauth redirect host aligned

* fix: resolve post-rebase agent runtime drift

* fix: redact anthropic oauth parse failures

* fix: preserve responses strict tool shaping

* fix: repair agent runtime rebase cleanup

* docs: redirect retired parity pages

* fix: bound auto-discovered resources to roots

* fix: repair post-rebase agent test drift

* fix: preserve bundled provider allowlist migration

* fix: preserve manifest-owned provider aliases

* fix: declare photon image dependency

* fix: keep provider headers out of proxy body

* fix: preserve shipped env aliases

* fix: refresh control ui i18n generated state

* fix: quote read fallback paths

* fix: preview edits through configured backend

* test: satisfy core test typecheck

* fix: preserve ZAI usage auth fallback

* test: repair codex diagnostic test

* fix: repair agent runtime rebase drift

* test: finish embedded runner import rename

* fix: repair agent runtime rebase integrations

* test: align compaction oauth fallback expectations

* fix: allow sdk-auth session models

* fix: update doctor tool schema import

* fix: preserve bedrock plugin region

* fix: stream harmony-like prose immediately

* ci: include session runtime in codeql shards

* fix: repair latest rebase integrations

* fix: honor explicit codex websocket transport

* fix: keep openai-compatible credentials provider-scoped

* fix: refresh sdk api baseline after rebase

* fix: route cli runtime aliases through openclaw harness

* test: rename stale harness mock expectation

* test: rename embedded agent overflow calls

* test: clean embedded auth test wording

* test: use openclaw stream types in deepinfra cache test

* fix: refresh sdk api baseline on latest main

* fix: honor bundled discovery compat allowlists

* fix: refresh sdk api baseline after latest rebase

* fix: remove stale rebase imports

* test: rename stale model catalog mock

* test: mock renamed doctor runtime modules

* fix: map canonical kimi env auth

* fix: use internal model registry in bench script

* fix: migrate deepinfra provider catalog entry

* fix: enforce builtin tool suppression

* fix: route compaction auth and proxy payloads safely

* refactor: prune unused llm registry leftovers

* test: update codex hooks session import

* test: fix model picker ci coverage

* test: align model picker auth mock types
2026-05-27 19:24:04 +01:00
Mason Huang
75221e0550 fix(agents): separate heartbeat runtime template (#85416)
Summary:
- The PR moves the runtime `HEARTBEAT.md` bootstrap template into `src/agents/templates`, keeps docs templates ... or other workspace files, adds a legacy heartbeat-template doctor repair, and updates package guards/tests.
- PR surface: Source +281, Tests +283, Docs +11, Config +1, Other 0. Total +576 across 15 files.
- Reproducibility: yes. from source inspection: current main loads `HEARTBEAT.md` from the docs template, and  ... pty heartbeat file non-empty to the runtime. I did not run a live heartbeat repro in this read-only review.

Automerge notes:
- PR branch already contained follow-up commit before automerge: fix(doctor): recognize heartbeat docs boilerplate
- PR branch already contained follow-up commit before automerge: fix(agents): update heartbeat workspace test
- PR branch already contained follow-up commit before automerge: fix(doctor): tighten heartbeat template repair

Validation:
- ClawSweeper review passed for head e34e85864c.
- Required merge gates passed before the squash merge.

Prepared head SHA: e34e85864c
Review: https://github.com/openclaw/openclaw/pull/85416#issuecomment-4519851630

Co-authored-by: Mason Huang <masonxhuang@tencent.com>
Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com>
Approved-by: hxy91819
Co-authored-by: hxy91819 <8814856+hxy91819@users.noreply.github.com>
2026-05-27 12:30:22 +00:00
Peter Steinberger
6b391efa4e fix(cli): reject loose model and gateway numeric options 2026-05-27 04:27:02 -04:00
Peter Steinberger
f407e4e498 fix: validate gateway call timeouts 2026-05-27 08:22:01 +01:00
Peter Steinberger
145b57c734 perf(gateway): defer skipped-channel sidecars 2026-05-27 04:20:26 +01:00
Peter Steinberger
77d9ac30bb refactor: reuse shared coercion helpers (#86419)
* refactor: share talk event metric extraction

* refactor: reuse shared coercion helpers

* refactor: reuse shared primitive guards

* refactor: reuse shared record guard

* refactor: reuse shared primitive helpers

* refactor: reuse shared string guards

* refactor: reuse shared non-empty string guard

* refactor: share plugin primitive coercion helpers

* refactor: reuse plugin coercion helpers

* refactor: reuse plugin coercion helpers in more plugins

* refactor: reuse channel coercion helpers

* refactor: reuse monitor coercion helpers

* refactor: reuse provider coercion helpers

* refactor: reuse core coercion helpers

* refactor: reuse runtime coercion helpers

* refactor: reuse helper coercion in codex paths

* refactor: reuse helper coercion in runtime paths

* refactor: reuse codex app-server coercion helpers

* refactor: reuse codex record helpers

* refactor: reuse migration and qa record helpers

* refactor: reuse feishu and core helper guards

* refactor: reuse browser and policy coercion helpers

* refactor: reuse memory wiki record helper

* refactor: share boolean coercion helpers

* refactor: reuse finite number coercion

* refactor: reuse trimmed string list helpers

* refactor: reuse string list normalization

* refactor: reuse remaining string list helpers

* refactor: reuse string entry normalizer

* refactor: share sorted string helpers

* refactor: share string list normalization

* test: preserve command registry browser imports

* refactor: reuse trimmed list helpers

* refactor: reuse string dedupe helpers

* refactor: reuse local dedupe helpers

* refactor: reuse more string dedupe helpers

* refactor: reuse command string dedupe helpers

* refactor: dedupe memory path lists with helper

* refactor: expose string dedupe helpers to plugins

* refactor: reuse core string dedupe helpers

* refactor: reuse shared unique value helpers

* refactor: reuse unique helpers in agent utilities

* refactor: reuse unique helpers in config plumbing

* refactor: reuse unique helpers in extensions

* refactor: reuse unique helpers in core utilities

* refactor: reuse unique helpers in qa plugins

* refactor: reuse unique helpers in memory plugins

* refactor: reuse unique helpers in channel plugins

* refactor: reuse unique helpers in core tails

* refactor: reuse unique helper in comfy workflow

* refactor: reuse unique helpers in test utilities

* refactor: expose unique value helper to plugins

* refactor: reuse unique helpers for numeric lists

* refactor: replace index dedupe filters

* refactor: reuse string entry normalization

* refactor: reuse string normalization in plugin helpers

* refactor: reuse string normalization in extension helpers

* refactor: reuse string normalization in channel parsers

* refactor: reuse string normalization in memory search

* refactor: reuse string normalization in provider parsers

* refactor: reuse string normalization in qa helpers

* refactor: reuse string normalization in infra parsers

* refactor: reuse string normalization in messaging parsers

* refactor: reuse string normalization in core parsers

* refactor: reuse string normalization in extension parsers

* refactor: reuse string normalization in remaining parsers

* refactor: reuse string normalization in final parser spots

* refactor: reuse string normalization in qa media helpers

* refactor: reuse normalization in provider and media lists

* refactor: reuse normalization for remaining set filters

* refactor: reuse normalization in policy allowlists

* refactor: reuse normalization in session and owner lists

* refactor: centralize primitive string lists

* refactor: reuse lowercase entry helpers

* refactor: reuse sorted string helpers

* refactor: reuse unique trimmed helpers

* refactor: reuse string normalization helpers

* refactor: reuse catalog string helpers

* refactor: reuse remaining string helpers

* refactor: simplify remaining list normalization

* refactor: reuse codex auth order normalization

* chore: refresh plugin sdk api baseline

* fix: make shared string sorting deterministic

* chore: refresh plugin sdk api baseline

* fix: align host env security ordering
2026-05-25 21:20:41 +01:00
YUHAO-corn
177ebdc24c fix(gateway): ignore inherited launchd env for respawn 2026-05-25 19:03:37 +01:00
Dirk
90caa3b610 fix(gateway): clear runtime config snapshot before in-process restart (#86388)
After config.patch writes new values to openclaw.json, a subsequent
SIGUSR1 in-process restart could overwrite them with a stale snapshot.

Root cause: run-loop's onIteration hook resets lanes and task registry,
but leaves the runtimeConfigSnapshot intact. loadConfig() then returns
the old snapshot via loadPinnedRuntimeConfig() instead of re-reading disk.

Fix: clearRuntimeConfigSnapshot() in the restart iteration hook so the
next startup reads fresh config from disk.

Refs #86350
2026-05-25 12:20:47 +01:00
Kaspre
6008375655 fix(gateway): honor restart drain budget for embedded runs
Honor configured restart drain budgets for embedded runs and avoid a second active-work drain after forced deferral timeout restarts.

Includes maintainer changelog entry.
2026-05-24 04:22:27 +01:00
Nyx
2e5be0c7ff fix(gateway): pin relative state dir at startup
* fix(gateway): normalize explicit state dir overrides at startup

* test(gateway): simplify state-dir startup coverage

* test: fix state dir startup coverage

---------

Co-authored-by: Peter Steinberger <steipete@gmail.com>
2026-05-23 14:30:32 +01:00
Sebastien Tardif
99e44f623e fix(gateway): add .catch() to SIGTERM/SIGUSR1 signal handlers (#83131)
The SIGTERM handler's fire-and-forget IIFE can reject if the graceful
drain or tunnel-teardown throws. Without a catch, this becomes an
unhandled promise rejection. Add .catch() that logs the error and
falls back to a hard stop request. Same treatment for SIGUSR1.

Signed-off-by: Sebastien Tardif <sebtardif@ncf.ca>
2026-05-22 19:47:09 +01:00
Vincent Koc
adc6adccd8 fix(update): detect nested macOS gateway ancestry (#85391)
* fix(update): detect nested macOS gateway ancestry

* fix(release): refresh shrinkwrap for CI npm

* fix(update): inherit gateway runtime pid for update guard
2026-05-23 00:00:38 +08:00
Vincent Koc
bdcaac06c6 fix(diagnostics): bound diagnostic buffers 2026-05-22 22:01:41 +08:00
Tung, Hsiao-Yu
4a9138556e fix(gateway): eager-load lifecycle runtime to survive in-place upgrades (#84890)
* fix(gateway): eager-load lifecycle runtime to survive in-place upgrades

After a package-swap update (e.g. via update.run), dist/ chunk hashes
rotate while the gateway is still running. The SIGUSR1 listener's first
dynamic import of the lifecycle runtime module then throws
ERR_MODULE_NOT_FOUND inside its async IIFE, silently rejects, and leaves
restart.ts's emittedRestartToken permanently unconsumed. From that point
every scheduleGatewaySigusr1Restart() — including the one update.run
schedules for itself — returns { coalesced: true } without scheduling
anything, and the gateway never restarts until manually kickstarted.

Fix:

1. Eagerly resolve the lifecycle runtime module as the first statement
   of runGatewayLoop, before any signal listener is installed. lifecycle.runtime
   is a 36-line re-export hub, so loading it once pulls the entire restart
   / respawn / queue / sentinel / handoff graph into memory, immune to
   later disk rotation. If the module is missing at startup, fail fast
   with a loud error so the supervisor can recover instead of running
   half-broken.

2. Defense in depth: catch SIGUSR1 IIFE rejections and call
   markGatewaySigusr1RestartHandled() via the eagerly captured reference,
   so a transient listener failure doesn't permanently stick the restart
   token.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(changelog): mention lifecycle restart eager load

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
2026-05-22 20:44:05 +08:00
Peter Steinberger
4f4d108639 chore(lint): remove underscore-dangle allow list (#83542)
* chore(lint): reduce underscore-dangle exceptions

* chore(lint): reduce more underscore exceptions

* chore(lint): remove underscore-dangle allow list

* fix(lint): repair underscore cleanup regressions

* test(lint): track version define suppression
2026-05-18 14:56:06 +01:00
Peter Steinberger
9bdc183b7d fix(cli): keep subcommand help lightweight 2026-05-18 01:35:04 +01:00
Vincent Koc
7d6e45ef7c fix(qa-lab): clean orphaned gateway runtimes 2026-05-17 19:10:46 +08:00
Vincent Koc
5aac7939db fix(gateway): drain replies during restart close 2026-05-17 18:12:52 +08:00
Peter Steinberger
abb06c6e40 fix(gateway): require auth for exposed startup
Co-authored-by: Coy Geek <65363919+coygeek@users.noreply.github.com>
2026-05-17 05:32:26 +01:00
Josh Avant
99a269f8b4 fix(agents): mark interrupted sessions before restart (#82772)
* fix(agents): mark interrupted sessions before restart

* docs: add restart recovery changelog

* fix(agents): satisfy restart recovery type checks
2026-05-16 18:11:35 -05:00
Peter Steinberger
66c64a29ee fix(gateway): capture opt-in memory pressure snapshots (#82674)
* fix(gateway): persist critical memory pressure bundles

* docs(gateway): add memory pressure troubleshooting

* feat(gateway): gate memory pressure bundles

* feat(gateway): flatten memory pressure bundle config

* feat(gateway): rename memory pressure snapshot config

* fix(gateway): make memory pressure snapshots opt in

* docs(config): refresh config baseline

* fix(config): simplify memory pressure migration default
2026-05-16 21:52:09 +01:00
Peter Steinberger
6369bf64cd fix(gateway): trace restart intent reasons 2026-05-16 21:23:06 +01:00
Vincent Koc
e06782d5e7 fix(gateway): land linked diagnostics fixes
Fix logs.tail credential-header redaction and JSON-mode gateway transport errors.\n\nFixes #66832.\nFixes #79108.\nSupersedes #67041.\nSupersedes #79233.\n\nCo-authored-by: Mil Wang <mingjwan@microsoft.com>\nCo-authored-by: Andy Ye <35905412+TurboTheTurtle@users.noreply.github.com>
2026-05-17 02:05:02 +08:00