mirror of
https://github.com/openclaw/openclaw.git
synced 2026-05-06 14:40:43 +00:00
fix: recover stuck diagnostic sessions safely
@@ -920,6 +920,7 @@ Notes:
 enabled: true,
 flags: ["telegram.*"],
 stuckSessionWarnMs: 30000,
+stuckSessionAbortMs: 600000,

 otel: {
   enabled: false,
@@ -959,6 +960,7 @@ Notes:
 - `enabled`: master toggle for instrumentation output (default: `true`).
 - `flags`: array of flag strings enabling targeted log output (supports wildcards like `"telegram.*"` or `"*"`).
 - `stuckSessionWarnMs`: no-progress age threshold in ms for classifying long-running processing sessions as `session.long_running`, `session.stalled`, or `session.stuck`. Reply, tool, status, block, and ACP progress reset the timer; repeated `session.stuck` diagnostics back off while unchanged.
+- `stuckSessionAbortMs`: no-progress age threshold in ms before eligible stalled active work may be abort-drained for recovery. When unset, OpenClaw uses the safer extended embedded-run window of at least 10 minutes and 5x `stuckSessionWarnMs`.
 - `otel.enabled`: enables the OpenTelemetry export pipeline (default: `false`). For the full configuration, signal catalog, and privacy model, see [OpenTelemetry export](/gateway/opentelemetry).
 - `otel.endpoint`: collector URL for OTel export.
 - `otel.tracesEndpoint` / `otel.metricsEndpoint` / `otel.logsEndpoint`: optional signal-specific OTLP endpoints. When set, they override `otel.endpoint` for that signal only.
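Reviewer note: the fallback described for `stuckSessionAbortMs` can be read as taking the larger of two bounds. The following is a minimal sketch under that reading, assuming "at least 10 minutes and 5x `stuckSessionWarnMs`" means the maximum of the two values. `resolveAbortThresholdMs` is a hypothetical helper written for illustration, not a function from OpenClaw's source.

```typescript
// Hypothetical helper illustrating the documented default; not OpenClaw's real API.
const TEN_MINUTES_MS = 10 * 60 * 1000;

function resolveAbortThresholdMs(warnMs: number, abortMs?: number): number {
  // An explicit stuckSessionAbortMs wins when it is set.
  if (abortMs !== undefined) {
    return abortMs;
  }
  // Assumption: "at least 10 minutes and 5x warn" means the larger of the two bounds.
  return Math.max(TEN_MINUTES_MS, 5 * warnMs);
}

console.log(resolveAbortThresholdMs(30000));         // 600000 (10-minute floor applies)
console.log(resolveAbortThresholdMs(180000));        // 900000 (5x warn exceeds the floor)
console.log(resolveAbortThresholdMs(30000, 120000)); // 120000 (explicit value is used)
```

With the sample config in this commit (`stuckSessionWarnMs: 30000`), the explicit `stuckSessionAbortMs: 600000` matches what this reading of the fallback would produce.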
@@ -216,11 +216,18 @@ OpenClaw classifies sessions by the work it can still observe:
 still making progress.
 - `session.stalled`: active work exists, but the active run has not reported
   recent progress. Stalled embedded runs stay observe-only at first, then
-  abort-drain after at least 10 minutes and 5x `diagnostics.stuckSessionWarnMs`
-  with no progress so queued turns behind the lane can resume.
+  abort-drain after `diagnostics.stuckSessionAbortMs` with no progress so queued
+  turns behind the lane can resume. When unset, the abort threshold defaults to
+  the safer extended window of at least 10 minutes and 5x
+  `diagnostics.stuckSessionWarnMs`.
 - `session.stuck`: stale session bookkeeping with no active work. This releases
   the affected session lane immediately.

+Recovery emits structured `session.recovery.requested` and
+`session.recovery.completed` events. Diagnostic session state is marked idle
+only after a mutating recovery outcome (`aborted` or `released`) and only if the
+same processing generation is still current.
+
 Only `session.stuck` emits the `openclaw.session.stuck` counter, the
 `openclaw.session.stuck_age_ms` histogram, and the `openclaw.session.stuck`
 span. Repeated `session.stuck` diagnostics back off while the session remains
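Reviewer note: the rule for marking diagnostic state idle combines two checks, a mutating outcome and an unchanged processing generation. The sketch below shows one way such a guard could look. All type and function names are hypothetical, the `"skipped"` outcome is an assumed stand-in for any non-mutating result, and the numeric `generation` counter is an assumption about how a processing generation might be tracked.

```typescript
// Hypothetical shapes for illustration; names are not taken from OpenClaw's source.
// "skipped" is an assumed example of a non-mutating recovery outcome.
type RecoveryOutcome = "aborted" | "released" | "skipped";

interface DiagnosticSessionState {
  status: "processing" | "idle";
  generation: number; // assumed counter, bumped when a new processing run starts
}

function applyRecoveryOutcome(
  state: DiagnosticSessionState,
  requestedGeneration: number,
  outcome: RecoveryOutcome,
): DiagnosticSessionState {
  // Only outcomes that actually changed the session count as mutating.
  const mutated = outcome === "aborted" || outcome === "released";
  // Skip the update if a newer processing run began after recovery was requested.
  const stillCurrent = state.generation === requestedGeneration;
  return mutated && stillCurrent ? { ...state, status: "idle" } : state;
}

const running: DiagnosticSessionState = { status: "processing", generation: 3 };
console.log(applyRecoveryOutcome(running, 3, "aborted").status); // idle
console.log(applyRecoveryOutcome(running, 2, "aborted").status); // processing (stale request)
console.log(applyRecoveryOutcome(running, 3, "skipped").status); // processing (nothing mutated)
```

The generation check keeps a late recovery result from marking a session idle after it has already started fresh work.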