diff --git a/docs/gateway/opentelemetry.md b/docs/gateway/opentelemetry.md
index 1bab14a7d83..03286ac618d 100644
--- a/docs/gateway/opentelemetry.md
+++ b/docs/gateway/opentelemetry.md
@@ -196,6 +196,30 @@ When any subkey is enabled, model and tool spans get bounded, redacted
 - `openclaw.session.stuck_age_ms` (histogram, attrs: `openclaw.state`; emitted only for stale session bookkeeping with no active work)
 - `openclaw.run.attempt` (counter, attrs: `openclaw.attempt`)
 
+### Session liveness telemetry
+
+`diagnostics.stuckSessionWarnMs` is the no-progress age threshold for session
+liveness diagnostics. A `processing` session does not age toward this threshold
+while OpenClaw observes reply, tool, status, block, or ACP runtime progress.
+Typing keepalives are not counted as progress, so a silent model or harness can
+still be detected.
+
+OpenClaw classifies sessions by the work it can still observe:
+
+- `session.long_running`: active embedded work, model calls, or tool calls are
+  still making progress.
+- `session.stalled`: active work exists, but the active run has not reported
+  recent progress.
+- `session.stuck`: stale session bookkeeping with no active work. This is the
+  only liveness classification that releases the affected session lane.
+
+Only `session.stuck` emits the `openclaw.session.stuck` counter, the
+`openclaw.session.stuck_age_ms` histogram, and the `openclaw.session.stuck`
+span. Repeated `session.stuck` diagnostics back off while the session remains
+unchanged, so dashboards should alert on sustained increases rather than every
+heartbeat tick. For the config knob and defaults, see
+[Configuration reference](/gateway/configuration-reference#diagnostics).
+
 ### Harness lifecycle
 
 - `openclaw.harness.duration_ms` (histogram, attrs: `openclaw.harness.id`, `openclaw.harness.plugin`, `openclaw.outcome`, `openclaw.harness.phase` on errors)
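
Review note: the three liveness classifications in the added "Session liveness telemetry" section may be easier to follow as a small decision function. The sketch below is only an illustration under stated assumptions, not OpenClaw's implementation: `SessionObservation`, `classifyLiveness`, and `releasesLane` are hypothetical names, and treating "recent progress" as "progress within `stuckSessionWarnMs`" is an assumption the diff does not confirm.

```typescript
// Hypothetical sketch of the classification rules described in the diff.
// None of these names come from OpenClaw; they exist only for illustration.
type Liveness = "session.long_running" | "session.stalled" | "session.stuck";

interface SessionObservation {
  // True while embedded work, model calls, or tool calls are still in flight.
  hasActiveWork: boolean;
  // Milliseconds since the last reply, tool, status, block, or ACP runtime
  // progress. Typing keepalives are assumed NOT to reset this value.
  msSinceProgress: number;
}

// Returns null when there is nothing to report (no active work and the
// no-progress age is still below the threshold).
function classifyLiveness(
  obs: SessionObservation,
  stuckSessionWarnMs: number,
): Liveness | null {
  // Assumption: the same threshold decides whether progress counts as recent.
  const pastThreshold = obs.msSinceProgress >= stuckSessionWarnMs;
  if (obs.hasActiveWork) {
    return pastThreshold ? "session.stalled" : "session.long_running";
  }
  return pastThreshold ? "session.stuck" : null;
}

// Per the docs text, only `session.stuck` releases the affected session lane.
function releasesLane(classification: Liveness | null): boolean {
  return classification === "session.stuck";
}
```

With an arbitrary threshold of 60000 ms (chosen for the example, not a documented default), a session with active work and progress 1 second ago classifies as `session.long_running`, the same session with no progress for 90 seconds as `session.stalled`, and a session with no active work and no progress for 90 seconds as `session.stuck`.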