Recover idle queued sessions whose diagnostic activity retained stale ownerless model or tool calls by classifying them as recoverable session.stuck after the usual recovery gates. Yield the event loop before stale session-lock process inspection so sync process lookup cannot monopolize lock contention paths.
Docs now describe the widened session.stuck telemetry contract for recoverable stale bookkeeping, including ownerless activity. Thanks @samuelsoaress.
Refs #84903.
Co-authored-by: samuelsoaress <samuelsoares177778@gmail.com>
Summary:
- This replacement PR adds inbound delivery diagnostic events, gateway status counters and warnings, transport ... ut, Prometheus/OpenTelemetry metrics, docs, changelog, and regression coverage for gateway delivery health.
- Reproducibility: no. high-confidence live reproduction of the original Feishu failure was run here. Source i ... ch/turn telemetry, and the source PR supplies after-fix live output for the connected WebChat gateway path.
Automerge notes:
- PR branch already contained follow-up commit before automerge: fix(types): restore PR conflict resolution type checks
Validation:
- ClawSweeper review passed for head 6ffe08a9c7.
- Required merge gates passed before the squash merge.
Prepared head SHA: 6ffe08a9c7
Review: https://github.com/openclaw/openclaw/pull/85016#issuecomment-4510224436
Co-authored-by: Andi Liao <liaoandi95@gmail.com>
Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com>
Approved-by: takhoffman
Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>
Adds bounded Talk lifecycle/audio diagnostics and session recovery metrics for OTEL, Prometheus, and stability snapshots after the Talk SDK/session refactor. Includes changelog/docs updates and Testbox/live proof.
Two recent commits added user-facing surface that left signature-style
references in docs stale:
- 4428661779 Alvin Tang (#20721, thanks @alvinttang) extends the
configured model 'input' modality set to also accept 'audio' and
'video', matching what providers like LM Studio already report.
docs/plugins/manifest.md model-fields table listed only
'text | image | document', so add 'audio' and 'video'.
- 44da034516 Vincent (thanks @oc-factus) adds a bounded openclaw.agent
attribute on the openclaw.tokens counter so per-agent dashboards can
group usage. docs/gateway/opentelemetry.md metric reference omitted
it; add it to the attrs list.
Logging.md had grown to 487 lines with ~300 lines dedicated to
OpenTelemetry export — wire protocol, full metric/span catalog, env
vars, captureContent shape, sampling, the diagnostic event catalog,
and protocol notes — leaving the genuine logging overview buried
behind exporter reference material.
Move the OTEL surface to a dedicated page and slim logging.md to a
focused logs overview:
- Add docs/gateway/opentelemetry.md (OpenTelemetry export). Same
content reorganized: how it fits together, quick start, signals,
configuration reference + env vars table, privacy/captureContent,
sampling/flushing, full metric and span catalog, diagnostic event
catalog, no-exporter mode, diagnostics flags pointer, disable.
- docs/logging.md: drop the OTEL section in favor of a short
'Diagnostics and OpenTelemetry' summary that cross-links the new
page and the diagnostics-flags page. Drops 273 lines net. Also
drops the redundant body H1, retitles to 'Logging' (was 'Logging
overview' which mismatched sidebar usage), and refreshes the
Related list.
- docs/docs.json: insert gateway/opentelemetry into the
'Health and diagnostics' sidebar group, reorder pages so the user-
facing health/run pages come before exporter/internals pages, and
put logging next to opentelemetry where readers naturally
associate them.
- docs/gateway/diagnostics.md, docs/gateway/logging.md,
docs/gateway/configuration-reference.md: cross-link the new page
and sentence-case stale Title-Cased Related entries on
diagnostics.md.