`session_end` was only fired when a session was replaced, reset, deleted, or
compacted -- the gateway shutdown/restart paths closed the process without
enumerating active sessions, so downstream `session_end` plugins
(e.g. claude-mem) accumulated ghost rows in `active` state across restarts.
Issue reporter saw 11 orphaned sessions cause 63 timeouts/day from agent
pool exhaustion.
Add an in-memory active-session tracker
(`src/gateway/active-sessions-shutdown-tracker.ts`) populated by
`emitGatewaySessionStartPluginHook` and forgotten unconditionally by
`emitGatewaySessionEndPluginHook` (even when no plugin listens), so any
session that has already been finalized through the normal lifecycle is
never re-fired by the shutdown drain. The close handler then calls a new
`drainActiveSessionsForShutdown({ reason })` in `session-reset-service.ts`
between the `gateway:shutdown`/`gateway:pre-restart` lifecycle hooks and
the subsystem teardown steps; the drain races a bounded 2 s total timeout
so a slow plugin cannot block SIGTERM/SIGINT, surfacing the timeout as a
`session-end-drain` warning on the shutdown result.
Extend `PluginHookSessionEndReason` with `"shutdown"` and `"restart"` so
plugins can distinguish a graceful close from a planned restart; the close
handler picks `restart` when `restartExpectedMs` is set and `shutdown`
otherwise. Update `emitGatewaySessionStartPluginHook` to also accept
`storePath`, `sessionFile`, and `agentId` so the shutdown drain can build
the same `session_end` payload shape the normal lifecycle path emits, and
update the existing call sites in `session-reset-service.ts` and
`server-methods/sessions.ts` to pass those fields through.
Tests:
- `src/gateway/active-sessions-shutdown-tracker.test.ts` (new) -- tracker
insert/forget/clear semantics, idempotent re-noting, empty-id guard,
snapshot isolation.
- `src/gateway/drain-active-sessions-for-shutdown.test.ts` (new) -- drain
fires `session_end` with the right reason for every tracked session,
skips sessions already finalized via reset/delete/compaction, and still
forgets sessions even when no `session_end` plugin is registered.
- `src/gateway/server-close.test.ts` -- four new cases covering the
shutdown/restart drain wiring, the bounded timeout warning, and the
drain-skipped-when-no-helper case.
Docs:
- `docs/plugins/hooks.md` documents the new `shutdown`/`restart` values
on `PluginHookSessionEndReason`.
- `docs/automation/hooks.md` documents the post-`gateway:shutdown`
`session_end` drain step and its bounded execution guarantee.
Fixes#57790.
Summary:
- The PR adds a `before_agent_run` plugin hook with pass/block decisions, redacted blocked-turn persistence, diagnostics/docs/changelog updates, and focused runner, gateway, session, and plugin tests.
- Reproducibility: not applicable. as a feature PR rather than a current-main bug report. Current main lacks ` ... un`, while the PR head adds source coverage and copied live Gateway/WebChat log proof for the new behavior.
Automerge notes:
- PR branch already contained follow-up commit before automerge: fix: trim before agent hook PR scope
- PR branch already contained follow-up commit before automerge: fix: keep before-agent blocks redacted
- PR branch already contained follow-up commit before automerge: fix: keep runtime context out of model prompt
- PR branch already contained follow-up commit before automerge: docs: refresh config baseline after rebase
- PR branch already contained follow-up commit before automerge: fix: align blocked turn clients with redacted content
- PR branch already contained follow-up commit before automerge: fix: remove out-of-scope client block UI changes
Validation:
- ClawSweeper review passed for head 767e46fde8.
- Required merge gates passed before the squash merge.
Prepared head SHA: 767e46fde8
Review: https://github.com/openclaw/openclaw/pull/75035#issuecomment-4351843275
Co-authored-by: Jesse Merhi <jessejmerhi@gmail.com>
Co-authored-by: jesse-merhi <79823012+jesse-merhi@users.noreply.github.com>
Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
For 891c7d9f1c: docs/plugins/hooks.md "Quick start" now lists the `priority`
and new `timeoutMs` opts that `api.on(...)` accepts, explaining that the
per-hook budget aborts a slow handler instead of letting plugin setup or
recall work consume the caller's configured model timeout. The change is
traceable to the new `OpenClawPluginApi.on` `{ priority?; timeoutMs? }`
signature and `PluginHookRegistration.timeoutMs` field added in the same
SHA.
- docs/plugins/hooks.md: add `cron_changed` to the Lifecycle hook catalog and
a Gateway lifecycle paragraph describing its typed event payload, run
status, delivery status, and removed-event job snapshot, so plugin authors
picking up f155a5f955 (#72773) have a canonical reference beyond the
sdk-overview bullet that already shipped in the same SHA.
- docs/help/environment.md: add a "Legacy environment variables" section for
aa1834a3ff so users see that `CLAWDBOT_*` and `MOLTBOT_*` prefixes are now
ignored and trigger an `OPENCLAW_LEGACY_ENV_VARS` deprecation warning,
with a rename example to `OPENCLAW_*`.
Scott Glover's commit 371b69b3e2 ('Expose cron jobId in plugin hook
context') added an optional jobId field on PluginHookAgentContext,
populated for cron-driven runs. The commit shipped without a docs
update or CHANGELOG entry, so plugin authors had no visible signal
that the new ctx.jobId field exists.
Surface ctx.jobId in two existing hook context references in
docs/plugins/hooks.md: the before_tool_call ctx-fields list, and the
runId/agent-lifecycle paragraph that already names ctx.runId — extend
it to note ctx.jobId on cron-driven runs and what plugins can do with
it (scope metrics, side effects, or state to a scheduled job).
Expose first-class hook correlation fields for plugin message and run lifecycle hooks, including frozen diagnostic trace copies for plugin-facing events.