
26 Commits

Author SHA1 Message Date
Samuel Soares da Silva
286964cd6a fix(diagnostics): recover orphaned session activity
Recover idle queued sessions whose diagnostic activity retained stale ownerless model or tool calls by classifying them as recoverable session.stuck after the usual recovery gates. Yield the event loop before stale session-lock process inspection so sync process lookup cannot monopolize lock contention paths.

Docs now describe the widened session.stuck telemetry contract for recoverable stale bookkeeping, including ownerless activity. Thanks @samuelsoaress.

Refs #84903.

Co-authored-by: samuelsoaress <samuelsoares177778@gmail.com>
2026-05-27 02:47:42 +01:00
Alex Knight
f824e1596a Add OpenTelemetry LLM content spans (#86191)
* feat: add otel llm content spans

* fix: gate otel tool definitions separately

* fix(diagnostics): sanitize tool_call parts and truncate oversized OTEL content attributes

* fix: keep otel content truncation parseable

* fix: simplify codex model diagnostics

* fix(diagnostics): align opt-in GenAI span shape

* test(codex): align resume params after rebase

* fix(diagnostics): keep model content off shared event bus

* test(diagnostics): keep extension tests on sdk boundary

---------

Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com>
Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
2026-05-26 02:24:02 +01:00
Vincent Koc
ef8619d5f5 fix(diagnostics): expose missing telemetry signals (#86682) 2026-05-26 01:10:59 +01:00
Gaurav Prasad
558a05b6d0 feat(diagnostics): classify skill and tool usage (#80370)
Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
2026-05-23 16:08:55 +08:00
Vincent Koc
7f05be041e fix(diagnostics): harden observability exports and smokes (#85371)
* test(diagnostics): widen observability smokes

* fix(diagnostics): sanitize observability exports

* docs(diagnostics): format otel export docs
2026-05-23 15:27:43 +08:00
clawsweeper[bot]
5955f354f7 fix(status): add gateway delivery health telemetry (#85016)
Summary:
- This replacement PR adds inbound delivery diagnostic events, gateway status counters and warnings, transport ... ut, Prometheus/OpenTelemetry metrics, docs, changelog, and regression coverage for gateway delivery health.
- Reproducibility: no. No high-confidence live reproduction of the original Feishu failure was run here. Source i ... ch/turn telemetry, and the source PR supplies after-fix live output for the connected WebChat gateway path.

Automerge notes:
- PR branch already contained follow-up commit before automerge: fix(types): restore PR conflict resolution type checks

Validation:
- ClawSweeper review passed for head 6ffe08a9c7.
- Required merge gates passed before the squash merge.

Prepared head SHA: 6ffe08a9c7
Review: https://github.com/openclaw/openclaw/pull/85016#issuecomment-4510224436

Co-authored-by: Andi Liao <liaoandi95@gmail.com>
Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com>
Approved-by: takhoffman
Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>
2026-05-21 16:55:29 +00:00
Peter Steinberger
e30be460e1 fix: shorten stalled Codex recovery window 2026-05-15 10:19:37 +01:00
Vincent Koc
e2501b2d6d fix(diagnostics): export Talk metrics after SDK refactor
Adds bounded Talk lifecycle/audio diagnostics and session recovery metrics for OTEL, Prometheus, and stability snapshots after the Talk SDK/session refactor. Includes changelog/docs updates and Testbox/live proof.
2026-05-06 02:01:52 -07:00
Vincent Koc
f4a63940cc docs: typography hygiene across 6 pages
Replaced 74 typography characters (curly quotes, apostrophes, em/en
dashes, non-breaking hyphens) with ASCII equivalents per
docs/CLAUDE.md heading and content hygiene rules.

- docs/gateway/opentelemetry.md: 13 chars
- docs/channels/msteams.md: 13 chars
- docs/tools/skills.md: 12 chars
- docs/start/setup.md: 12 chars
- docs/nodes/location-command.md: 12 chars
- docs/concepts/context-engine.md: 12 chars
2026-05-05 20:34:37 -07:00
Peter Steinberger
761e668acf fix: recover stuck diagnostic sessions safely 2026-05-05 04:01:37 +01:00
Vincent Koc
50da306c0a fix(telemetry): bound message diagnostics labels 2026-05-03 19:02:58 -07:00
Peter Steinberger
9a22473916 fix: recover stalled embedded diagnostic runs 2026-05-03 18:13:15 +01:00
Vincent Koc
c7b5302acf fix(plugins): repair missing clawhub installs 2026-05-02 08:01:37 -07:00
Peter Steinberger
010f7a58a1 build(plugins): externalize acpx release packages 2026-05-02 08:48:28 +01:00
Peter Steinberger
2be441062d docs: clarify session liveness telemetry 2026-05-02 00:55:24 +01:00
Peter Steinberger
32db81ca5c fix: classify session liveness diagnostics 2026-05-02 00:13:58 +01:00
Peter Steinberger
a95da5b52d fix(models): enrich local transport failure diagnostics 2026-04-27 09:25:38 +01:00
Vincent Koc
2194a8c64c docs(logging): document request trace scopes 2026-04-26 14:13:15 -07:00
Vincent Koc
f0566e410a docs(diagnostics): document model call size timing 2026-04-26 13:43:22 -07:00
Vincent Koc
df542f75a9 fix(logging): expose trace fields in file logs 2026-04-26 12:52:04 -07:00
Vincent Koc
19e41a1e69 docs(logging): clarify redaction surfaces 2026-04-26 11:09:56 -07:00
Vincent Koc
a77996dc56 fix(diagnostics): propagate trusted traceparent headers 2026-04-26 00:24:47 -07:00
Vincent Koc
f48dc96d43 docs(opentelemetry): document harness lifecycle metric, span, and diagnostic events from 82ddcf24f5 2026-04-25 23:54:30 -07:00
Vincent Koc
46b9044c3f docs: update model input modalities and OTEL token-metric attrs
Two recent commits added user-facing surface that left signature-style
references in docs stale:

- 4428661779 Alvin Tang (#20721, thanks @alvinttang) extends the
  configured model 'input' modality set to also accept 'audio' and
  'video', matching what providers like LM Studio already report.
  docs/plugins/manifest.md model-fields table listed only
  'text | image | document', so add 'audio' and 'video'.
- 44da034516 Vincent (thanks @oc-factus) adds a bounded openclaw.agent
  attribute on the openclaw.tokens counter so per-agent dashboards can
  group usage. docs/gateway/opentelemetry.md metric reference omitted
  it; add it to the attrs list.
2026-04-25 21:39:44 -07:00
Vincent Koc
2495585a32 feat(diagnostics-otel): add exporter health diagnostics
Adds diagnostics-otel exporter health events and signal-specific endpoint wiring, with docs and config schema coverage.
2026-04-25 18:34:44 -07:00
Vincent Koc
7741dbb759 docs: split OpenTelemetry export into its own page under gateway
Logging.md had grown to 487 lines with ~300 lines dedicated to
OpenTelemetry export — wire protocol, full metric/span catalog, env
vars, captureContent shape, sampling, the diagnostic event catalog,
and protocol notes — leaving the genuine logging overview buried
behind exporter reference material.

Move the OTEL surface to a dedicated page and slim logging.md to a
focused logs overview:

- Add docs/gateway/opentelemetry.md (OpenTelemetry export). Same
  content reorganized: how it fits together, quick start, signals,
  configuration reference + env vars table, privacy/captureContent,
  sampling/flushing, full metric and span catalog, diagnostic event
  catalog, no-exporter mode, diagnostics flags pointer, disable.
- docs/logging.md: drop the OTEL section in favor of a short
  'Diagnostics and OpenTelemetry' summary that cross-links the new
  page and the diagnostics-flags page. Drops 273 lines net. Also
  drops the redundant body H1, retitles to 'Logging' (was 'Logging
  overview' which mismatched sidebar usage), and refreshes the
  Related list.
- docs/docs.json: insert gateway/opentelemetry into the
  'Health and diagnostics' sidebar group, reorder pages so the user-
  facing health/run pages come before exporter/internals pages, and
  put logging next to opentelemetry where readers naturally
  associate them.
- docs/gateway/diagnostics.md, docs/gateway/logging.md,
  docs/gateway/configuration-reference.md: cross-link the new page
  and sentence-case stale Title-Cased Related entries on
  diagnostics.md.
2026-04-25 16:46:53 -07:00