docs: split OpenTelemetry export into its own page under gateway

docs/logging.md had grown to 487 lines, ~300 of them dedicated to
OpenTelemetry export: wire protocol, full metric/span catalog, env vars,
captureContent shape, sampling, the diagnostic event catalog, and
protocol notes. That left the genuine logging overview buried behind
exporter reference material.

Move the OTEL surface to a dedicated page and slim logging.md to a
focused logs overview:

- Add docs/gateway/opentelemetry.md (OpenTelemetry export). Same
  content reorganized: how it fits together, quick start, signals,
  configuration reference + env vars table, privacy/captureContent,
  sampling/flushing, full metric and span catalog, diagnostic event
  catalog, no-exporter mode, diagnostics flags pointer, disable.
- docs/logging.md: drop the OTEL section in favor of a short
  'Diagnostics and OpenTelemetry' summary that cross-links the new page
  and the diagnostics-flags page. Drops 273 lines net. Also drops the
  redundant body H1, retitles to 'Logging' (was 'Logging overview',
  which mismatched sidebar usage), and refreshes the Related list.
- docs/docs.json: insert gateway/opentelemetry into the 'Health and
  diagnostics' sidebar group, reorder pages so the user-facing
  health/run pages come before exporter/internals pages, and put
  logging next to opentelemetry where readers naturally associate them.
- docs/gateway/diagnostics.md, docs/gateway/logging.md,
  docs/gateway/configuration-reference.md: cross-link the new page and
  sentence-case stale Title-Cased Related entries on diagnostics.md.
2026-05-06 06:50:43 +00:00 · 2026-04-25 16:46:43 -07:00
parent a1090b6043
commit 7741dbb759
6 changed files with 341 additions and 307 deletions
--- a/docs/docs.json
+++ b/docs/docs.json
@@ -1436,11 +1436,12 @@
                    "group": "Health and diagnostics",
                    "pages": [
                      "gateway/health",
-                      "gateway/diagnostics",
                      "gateway/heartbeat",
                      "gateway/doctor",
-                      "gateway/logging",
                      "logging",
+                      "gateway/opentelemetry",
+                      "gateway/logging",
+                      "gateway/diagnostics",
                      "gateway/troubleshooting"
                    ]
                  },
--- a/docs/gateway/configuration-reference.md
+++ b/docs/gateway/configuration-reference.md
@@ -909,7 +909,7 @@ Notes:
 - `enabled`: master toggle for instrumentation output (default: `true`).
 - `flags`: array of flag strings enabling targeted log output (supports wildcards like `"telegram.*"` or `"*"`).
 - `stuckSessionWarnMs`: age threshold in ms for emitting stuck-session warnings while a session remains in processing state.
-- `otel.enabled`: enables the OpenTelemetry export pipeline (default: `false`).
+- `otel.enabled`: enables the OpenTelemetry export pipeline (default: `false`). For the full configuration, signal catalog, and privacy model, see [OpenTelemetry export](/gateway/opentelemetry).
 - `otel.endpoint`: collector URL for OTel export.
 - `otel.protocol`: `"http/protobuf"` (default) or `"grpc"`.
 - `otel.headers`: extra HTTP/gRPC metadata headers sent with OTel export requests.
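The `otel.endpoint` field above is the collector URL. The new OpenTelemetry page adds two rules: `OTEL_EXPORTER_OTLP_ENDPOINT` overrides it, and a value that already contains `/v1/traces`, `/v1/metrics`, or `/v1/logs` is used as-is. The following is a minimal TypeScript sketch of those rules. `resolveSignalUrl` and `effectiveEndpoint` are hypothetical helpers, not `diagnostics-otel` code. Appending `/v1/<signal>` to a bare base URL is an assumption taken from the usual OTLP/HTTP exporter convention, not from this commit.

```typescript
// Hypothetical sketch of OTLP/HTTP endpoint resolution; not the plugin's code.
type OtlpSignal = "traces" | "metrics" | "logs";

const SIGNAL_PATHS = ["/v1/traces", "/v1/metrics", "/v1/logs"];

// Documented rule: a value that already contains a signal path is used as-is.
// Assumption: otherwise the value is a base URL and the standard OTLP/HTTP
// path for the signal is appended.
function resolveSignalUrl(endpoint: string, signal: OtlpSignal): string {
  if (SIGNAL_PATHS.some((path) => endpoint.indexOf(path) !== -1)) {
    return endpoint;
  }
  const base = endpoint.replace(/\/+$/, ""); // drop trailing slashes
  return `${base}/v1/${signal}`;
}

// The env var overrides diagnostics.otel.endpoint. Treating an empty or
// whitespace-only value as unset is an assumption of this sketch.
function effectiveEndpoint(
  configEndpoint: string | undefined,
  env: Record<string, string | undefined>,
): string | undefined {
  const fromEnv = env.OTEL_EXPORTER_OTLP_ENDPOINT?.trim();
  return fromEnv ? fromEnv : configEndpoint;
}
```

Under these assumptions, `http://otel-collector:4318` resolves to `http://otel-collector:4318/v1/traces` for traces, while `https://collector.example/v1/metrics` passes through unchanged.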
--- a/docs/gateway/diagnostics.md
+++ b/docs/gateway/diagnostics.md
@@ -129,9 +129,10 @@ diagnostic event collection:
 Disabling diagnostics reduces bug-report detail. It does not affect normal
 Gateway logging.

-## Related docs
+## Related

-- [Health Checks](/gateway/health)
+- [Health checks](/gateway/health)
 - [Gateway CLI](/cli/gateway#gateway-diagnostics-export)
-- [Gateway Protocol](/gateway/protocol#system-and-identity)
+- [Gateway protocol](/gateway/protocol#system-and-identity)
 - [Logging](/logging)
+- [OpenTelemetry export](/gateway/opentelemetry) — separate flow for streaming diagnostics to a collector
--- a/docs/gateway/logging.md
+++ b/docs/gateway/logging.md
@@ -114,5 +114,6 @@ This keeps existing file logs stable while making interactive output scannable.

 ## Related

-- [Logging overview](/logging)
+- [Logging](/logging)
+- [OpenTelemetry export](/gateway/opentelemetry)
 - [Diagnostics export](/gateway/diagnostics)
--- a/docs/gateway/opentelemetry.md
+++ b/docs/gateway/opentelemetry.md
@@ -0,0 +1,304 @@
+---
+summary: "Export OpenClaw diagnostics to any OpenTelemetry collector via the diagnostics-otel plugin (OTLP/HTTP)"
+title: "OpenTelemetry export"
+read_when:
+  - You want to send OpenClaw model usage, message flow, or session metrics to an OpenTelemetry collector
+  - You are wiring traces, metrics, or logs into Grafana, Datadog, Honeycomb, New Relic, Tempo, or another OTLP backend
+  - You need the exact metric names, span names, or attribute shapes to build dashboards or alerts
+---
+
+OpenClaw exports diagnostics through the bundled `diagnostics-otel` plugin
+using **OTLP/HTTP (protobuf)**. Any collector or backend that accepts OTLP/HTTP
+works without code changes. For local file logs and how to read them, see
+[Logging](/logging).
+
+## How it fits together
+
+- **Diagnostics events** are structured, in-process records emitted by the
+  Gateway and bundled plugins for model runs, message flow, sessions, queues,
+  and exec.
+- **`diagnostics-otel` plugin** subscribes to those events and exports them as
+  OpenTelemetry **metrics**, **traces**, and **logs** over OTLP/HTTP.
+- Exporters only attach when both diagnostics and the `diagnostics-otel` plugin
+  are enabled, so exporter overhead stays near zero by default.
+
+## Quick start
+
+```json5
+{
+  plugins: {
+    allow: ["diagnostics-otel"],
+    entries: {
+      "diagnostics-otel": { enabled: true },
+    },
+  },
+  diagnostics: {
+    enabled: true,
+    otel: {
+      enabled: true,
+      endpoint: "http://otel-collector:4318",
+      protocol: "http/protobuf",
+      serviceName: "openclaw-gateway",
+      traces: true,
+      metrics: true,
+      logs: true,
+      sampleRate: 0.2,
+      flushIntervalMs: 60000,
+    },
+  },
+}
+```
+
+You can also enable the plugin from the CLI:
+
+```bash
+openclaw plugins enable diagnostics-otel
+```
+
+<Note>
+`protocol` currently supports `http/protobuf` only. `grpc` is ignored.
+</Note>
+
+## Signals exported
+
+| Signal      | What goes in it                                                                                                                   |
+| ----------- | --------------------------------------------------------------------------------------------------------------------------------- |
+| **Metrics** | Counters and histograms for token usage, cost, run duration, message flow, queue lanes, session state, exec, and memory pressure. |
+| **Traces**  | Spans for model usage, model calls, tool execution, exec, webhook/message processing, context assembly, and tool loops.           |
+| **Logs**    | Structured `logging.file` records exported over OTLP when `diagnostics.otel.logs` is enabled.                                     |
+
+Toggle `traces`, `metrics`, and `logs` independently. All three default to on
+when `diagnostics.otel.enabled` is true.
+
+## Configuration reference
+
+```json5
+{
+  diagnostics: {
+    enabled: true,
+    otel: {
+      enabled: true,
+      endpoint: "http://otel-collector:4318",
+      protocol: "http/protobuf", // grpc is ignored
+      serviceName: "openclaw-gateway",
+      headers: { "x-collector-token": "..." },
+      traces: true,
+      metrics: true,
+      logs: true,
+      sampleRate: 0.2, // root-span sampler, 0.0..1.0
+      flushIntervalMs: 60000, // metric export interval (min 1000ms)
+      captureContent: {
+        enabled: false,
+        inputMessages: false,
+        outputMessages: false,
+        toolInputs: false,
+        toolOutputs: false,
+        systemPrompt: false,
+      },
+    },
+  },
+}
+```
+
+### Environment variables
+
+| Variable                        | Purpose                                                                                                                                                                                                                                    |
+| ------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
+| `OTEL_EXPORTER_OTLP_ENDPOINT`   | Override `diagnostics.otel.endpoint`. If the value already contains `/v1/traces`, `/v1/metrics`, or `/v1/logs`, it is used as-is.                                                                                                          |
+| `OTEL_SERVICE_NAME`             | Override `diagnostics.otel.serviceName`.                                                                                                                                                                                                   |
+| `OTEL_EXPORTER_OTLP_PROTOCOL`   | Override the wire protocol (only `http/protobuf` is honored today).                                                                                                                                                                        |
+| `OTEL_SEMCONV_STABILITY_OPT_IN` | Set to `gen_ai_latest_experimental` to emit the latest experimental GenAI span attribute (`gen_ai.provider.name`) instead of the legacy `gen_ai.system`. GenAI metrics always use bounded, low-cardinality semantic attributes regardless. |
+| `OPENCLAW_OTEL_PRELOADED`       | Set to `1` when another preload or host process already registered the global OpenTelemetry SDK. The plugin then skips its own NodeSDK lifecycle but still wires diagnostic listeners and honors `traces`/`metrics`/`logs`.                |
+
+## Privacy and content capture
+
+Raw model/tool content is **not** exported by default. Out of the box, spans
+carry bounded identifiers (channel, provider, model, error category, hash-only
+request ids) and omit prompt text, response text, tool inputs, tool outputs,
+and session keys.
+
+Set `diagnostics.otel.captureContent.*` to `true` only when your collector and
+retention policy are approved for prompt, response, tool, or system-prompt
+text. Each subkey is opt-in independently:
+
+- `inputMessages` — user prompt content.
+- `outputMessages` — model response content.
+- `toolInputs` — tool argument payloads.
+- `toolOutputs` — tool result payloads.
+- `systemPrompt` — assembled system/developer prompt.
+
+For each enabled subkey, model and tool spans gain bounded, redacted
+`openclaw.content.*` attributes for that content class only.
+
+## Sampling and flushing
+
+- **Traces:** `diagnostics.otel.sampleRate` (root-span only, `0.0` drops all,
+  `1.0` keeps all).
+- **Metrics:** `diagnostics.otel.flushIntervalMs` (minimum `1000`).
+- **Logs:** OTLP logs respect `logging.level` (file log level). Console
+  redaction does **not** apply to OTLP logs. High-volume installs should
+  prefer OTLP collector sampling/filtering over local sampling.
+
+## Exported metrics
+
+### Model usage
+
+- `openclaw.tokens` (counter, attrs: `openclaw.token`, `openclaw.channel`, `openclaw.provider`, `openclaw.model`)
+- `openclaw.cost.usd` (counter, attrs: `openclaw.channel`, `openclaw.provider`, `openclaw.model`)
+- `openclaw.run.duration_ms` (histogram, attrs: `openclaw.channel`, `openclaw.provider`, `openclaw.model`)
+- `openclaw.context.tokens` (histogram, attrs: `openclaw.context`, `openclaw.channel`, `openclaw.provider`, `openclaw.model`)
+- `gen_ai.client.token.usage` (histogram, GenAI semantic-conventions metric, attrs: `gen_ai.token.type` = `input`/`output`, `gen_ai.provider.name`, `gen_ai.operation.name`, `gen_ai.request.model`)
+- `gen_ai.client.operation.duration` (histogram, seconds, GenAI semantic-conventions metric, attrs: `gen_ai.provider.name`, `gen_ai.operation.name`, `gen_ai.request.model`, optional `error.type`)
+
+### Message flow
+
+- `openclaw.webhook.received` (counter, attrs: `openclaw.channel`, `openclaw.webhook`)
+- `openclaw.webhook.error` (counter, attrs: `openclaw.channel`, `openclaw.webhook`)
+- `openclaw.webhook.duration_ms` (histogram, attrs: `openclaw.channel`, `openclaw.webhook`)
+- `openclaw.message.queued` (counter, attrs: `openclaw.channel`, `openclaw.source`)
+- `openclaw.message.processed` (counter, attrs: `openclaw.channel`, `openclaw.outcome`)
+- `openclaw.message.duration_ms` (histogram, attrs: `openclaw.channel`, `openclaw.outcome`)
+- `openclaw.message.delivery.started` (counter, attrs: `openclaw.channel`, `openclaw.delivery.kind`)
+- `openclaw.message.delivery.duration_ms` (histogram, attrs: `openclaw.channel`, `openclaw.delivery.kind`, `openclaw.outcome`, `openclaw.errorCategory`)
+
+### Queues and sessions
+
+- `openclaw.queue.lane.enqueue` (counter, attrs: `openclaw.lane`)
+- `openclaw.queue.lane.dequeue` (counter, attrs: `openclaw.lane`)
+- `openclaw.queue.depth` (histogram, attrs: `openclaw.lane` or `openclaw.channel=heartbeat`)
+- `openclaw.queue.wait_ms` (histogram, attrs: `openclaw.lane`)
+- `openclaw.session.state` (counter, attrs: `openclaw.state`, `openclaw.reason`)
+- `openclaw.session.stuck` (counter, attrs: `openclaw.state`)
+- `openclaw.session.stuck_age_ms` (histogram, attrs: `openclaw.state`)
+- `openclaw.run.attempt` (counter, attrs: `openclaw.attempt`)
+
+### Exec
+
+- `openclaw.exec.duration_ms` (histogram, attrs: `openclaw.exec.target`, `openclaw.exec.mode`, `openclaw.outcome`, `openclaw.failureKind`)
+
+### Diagnostics internals (memory and tool loop)
+
+- `openclaw.memory.heap_used_bytes` (histogram, attrs: `openclaw.memory.kind`)
+- `openclaw.memory.rss_bytes` (histogram)
+- `openclaw.memory.pressure` (counter, attrs: `openclaw.memory.level`)
+- `openclaw.tool.loop.iterations` (counter, attrs: `openclaw.toolName`, `openclaw.outcome`)
+- `openclaw.tool.loop.duration_ms` (histogram, attrs: `openclaw.toolName`, `openclaw.outcome`)
+
+## Exported spans
+
+- `openclaw.model.usage`
+  - `openclaw.channel`, `openclaw.provider`, `openclaw.model`
+  - `openclaw.tokens.*` (input/output/cache_read/cache_write/total)
+  - `gen_ai.system` by default, or `gen_ai.provider.name` when the latest GenAI semantic conventions are opted in
+  - `gen_ai.request.model`, `gen_ai.operation.name`, `gen_ai.usage.*`
+- `openclaw.run`
+  - `openclaw.outcome`, `openclaw.channel`, `openclaw.provider`, `openclaw.model`, `openclaw.errorCategory`
+- `openclaw.model.call`
+  - `gen_ai.system` by default, or `gen_ai.provider.name` when the latest GenAI semantic conventions are opted in
+  - `gen_ai.request.model`, `gen_ai.operation.name`, `openclaw.provider`, `openclaw.model`, `openclaw.api`, `openclaw.transport`
+  - `openclaw.provider.request_id_hash` (bounded SHA-based hash of the upstream provider request id; raw ids are not exported)
+- `openclaw.tool.execution`
+  - `gen_ai.tool.name`, `openclaw.toolName`, `openclaw.errorCategory`, `openclaw.tool.params.*`
+- `openclaw.exec`
+  - `openclaw.exec.target`, `openclaw.exec.mode`, `openclaw.outcome`, `openclaw.failureKind`, `openclaw.exec.command_length`, `openclaw.exec.exit_code`, `openclaw.exec.timed_out`
+- `openclaw.webhook.processed`
+  - `openclaw.channel`, `openclaw.webhook`, `openclaw.chatId`
+- `openclaw.webhook.error`
+  - `openclaw.channel`, `openclaw.webhook`, `openclaw.chatId`, `openclaw.error`
+- `openclaw.message.processed`
+  - `openclaw.channel`, `openclaw.outcome`, `openclaw.chatId`, `openclaw.messageId`, `openclaw.reason`
+- `openclaw.message.delivery`
+  - `openclaw.channel`, `openclaw.delivery.kind`, `openclaw.outcome`, `openclaw.errorCategory`, `openclaw.delivery.result_count`
+- `openclaw.session.stuck`
+  - `openclaw.state`, `openclaw.ageMs`, `openclaw.queueDepth`
+- `openclaw.context.assembled`
+  - `openclaw.prompt.size`, `openclaw.history.size`, `openclaw.context.tokens`, `openclaw.errorCategory` (no prompt, history, response, or session-key content)
+- `openclaw.tool.loop`
+  - `openclaw.toolName`, `openclaw.outcome`, `openclaw.iterations`, `openclaw.errorCategory` (no loop messages, params, or tool output)
+- `openclaw.memory.pressure`
+  - `openclaw.memory.level`, `openclaw.memory.heap_used_bytes`, `openclaw.memory.rss_bytes`
+
+When content capture is explicitly enabled, model and tool spans can also
+include bounded, redacted `openclaw.content.*` attributes for the specific
+content classes you opted into.
+
+## Diagnostic event catalog
+
+The events below back the metrics and spans above. Plugins can also subscribe
+to them directly without OTLP export.
+
+**Model usage**
+
+- `model.usage` — tokens, cost, duration, context, provider/model/channel,
+  session ids. `usage` is provider/turn accounting for cost and telemetry;
+  `context.used` is the current prompt/context snapshot and can be lower than
+  provider `usage.total` when cached input or tool-loop calls are involved.
+
+**Message flow**
+
+- `webhook.received` / `webhook.processed` / `webhook.error`
+- `message.queued` / `message.processed`
+- `message.delivery.started` / `message.delivery.completed` / `message.delivery.error`
+
+**Queue and session**
+
+- `queue.lane.enqueue` / `queue.lane.dequeue`
+- `session.state` / `session.stuck`
+- `run.attempt`
+- `diagnostic.heartbeat` (aggregate counters: webhooks/queue/session)
+
+**Exec**
+
+- `exec.process.completed` — terminal outcome, duration, target, mode, exit
+  code, and failure kind. Command text and working directories are not
+  included.
+
+## Without an exporter
+
+You can keep diagnostics events available to plugins or custom sinks without
+running `diagnostics-otel`:
+
+```json5
+{
+  diagnostics: { enabled: true },
+}
+```
+
+For targeted debug output without raising `logging.level`, use diagnostics
+flags. Flags are case-insensitive and support wildcards (e.g. `telegram.*` or
+`*`):
+
+```json5
+{
+  diagnostics: { flags: ["telegram.http"] },
+}
+```
+
+Or as a one-off env override:
+
+```bash
+OPENCLAW_DIAGNOSTICS=telegram.http,telegram.payload openclaw gateway
+```
+
+Flag output goes to the standard log file (`logging.file`) and is still
+redacted by `logging.redactSensitive`. Full guide:
+[Diagnostics flags](/diagnostics/flags).
+
+## Disable
+
+```json5
+{
+  diagnostics: { otel: { enabled: false } },
+}
+```
+
+You can also leave `diagnostics-otel` out of `plugins.allow`, or run
+`openclaw plugins disable diagnostics-otel`.
+
+## Related
+
+- [Logging](/logging) — file logs, console output, CLI tailing, and the Control UI Logs tab
+- [Gateway logging internals](/gateway/logging) — WS log styles, subsystem prefixes, and console capture
+- [Diagnostics flags](/diagnostics/flags) — targeted debug-log flags
+- [Diagnostics export](/gateway/diagnostics) — operator support-bundle tool (separate from OTEL export)
+- [Configuration reference](/gateway/configuration-reference#diagnostics) — full `diagnostics.*` field reference
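The `OTEL_SEMCONV_STABILITY_OPT_IN` row in the new page only changes which provider attribute GenAI spans carry. The following is a minimal TypeScript sketch of that switch. It assumes the variable is read as a comma-separated list, as in the OpenTelemetry semantic-conventions stability opt-in. `providerAttribute` and `providerSpanAttributes` are hypothetical helpers, not plugin code.

```typescript
// Hypothetical sketch: choose the GenAI provider span attribute key.
// Assumption: OTEL_SEMCONV_STABILITY_OPT_IN is a comma-separated list.
type ProviderAttributeKey = "gen_ai.system" | "gen_ai.provider.name";

function providerAttribute(optIn: string | undefined): ProviderAttributeKey {
  const values = (optIn ?? "").split(",").map((value) => value.trim());
  // Opting in switches to the latest experimental attribute; otherwise the
  // legacy gen_ai.system key is kept.
  return values.some((value) => value === "gen_ai_latest_experimental")
    ? "gen_ai.provider.name"
    : "gen_ai.system";
}

// Build the provider span attribute under the chosen convention.
function providerSpanAttributes(
  provider: string,
  optIn: string | undefined,
): Record<string, string> {
  return { [providerAttribute(optIn)]: provider };
}
```

Leaving the variable unset keeps `gen_ai.system`, which is why existing dashboards can stay on the default until they migrate.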
--- a/docs/logging.md
+++ b/docs/logging.md
@@ -1,14 +1,12 @@
 ---
-summary: "Logging overview: file logs, console output, CLI tailing, and the Control UI"
+summary: "File logs, console output, CLI tailing, and the Control UI Logs tab"
 read_when:
-  - You need a beginner-friendly overview of logging
-  - You want to configure log levels or formats
+  - You need a beginner-friendly overview of OpenClaw logging
+  - You want to configure log levels, formats, or redaction
   - You are troubleshooting and need to find logs quickly
-title: "Logging overview"
+title: "Logging"
 ---

-# Logging
-
 OpenClaw has two main log surfaces:

 - **File logs** (JSON lines) written by the Gateway.
@@ -171,308 +169,35 @@ Tool summaries can redact sensitive tokens before they hit the console:

 Redaction affects **console output only** and does not alter file logs.

-## Diagnostics + OpenTelemetry
+## Diagnostics and OpenTelemetry

-Diagnostics are structured, machine-readable events for model runs **and**
+Diagnostics are structured, machine-readable events for model runs and
 message-flow telemetry (webhooks, queueing, session state). They do **not**
-replace logs; they exist to feed metrics, traces, and other exporters.
+replace logs — they feed metrics, traces, and exporters. Events are emitted
+in-process whether or not you export them.

-Diagnostics events are emitted in-process, but exporters only attach when
-diagnostics + the exporter plugin are enabled.
+Two adjacent surfaces:

-### OpenTelemetry vs OTLP
+- **OpenTelemetry export** — send metrics, traces, and logs over OTLP/HTTP to
+  any OpenTelemetry-compatible collector or backend (Grafana, Datadog,
+  Honeycomb, New Relic, Tempo, etc.). Full configuration, signal catalog,
+  metric/span names, env vars, and privacy model live on a dedicated page:
+  [OpenTelemetry export](/gateway/opentelemetry).
+- **Diagnostics flags** — targeted debug-log flags that route extra logs to
+  `logging.file` without raising `logging.level`. Flags are case-insensitive
+  and support wildcards (`telegram.*`, `*`). Configure under `diagnostics.flags`
+  or via the `OPENCLAW_DIAGNOSTICS=...` env override. Full guide:
+  [Diagnostics flags](/diagnostics/flags).

-- **OpenTelemetry (OTel)**: the data model + SDKs for traces, metrics, and logs.
-- **OTLP**: the wire protocol used to export OTel data to a collector/backend.
-- OpenClaw exports via **OTLP/HTTP (protobuf)** today.
+To enable diagnostics events for plugins or custom sinks without OTLP export:

-### Signals exported
-
-- **Metrics**: counters + histograms (token usage, message flow, queueing).
-- **Traces**: spans for model usage + webhook/message processing.
-- **Logs**: exported over OTLP when `diagnostics.otel.logs` is enabled. Log
-  volume can be high; keep `logging.level` and exporter filters in mind.
-
-### Diagnostic event catalog
-
-Model usage:
-
-- `model.usage`: tokens, cost, duration, context, provider/model/channel, session ids.
-  `usage` is provider/turn accounting for cost and telemetry; `context.used`
-  is the current prompt/context snapshot and can be lower than provider
-  `usage.total` when cached input or tool-loop calls are involved.
-
-Message flow:
-
-- `webhook.received`: webhook ingress per channel.
-- `webhook.processed`: webhook handled + duration.
-- `webhook.error`: webhook handler errors.
-- `message.queued`: message enqueued for processing.
-- `message.processed`: outcome + duration + optional error.
-- `message.delivery.started`: outbound delivery attempt started.
-- `message.delivery.completed`: outbound delivery attempt finished + duration/result count.
-- `message.delivery.error`: outbound delivery attempt failed + duration/bounded error category.
-
-Queue + session:
-
-- `queue.lane.enqueue`: command queue lane enqueue + depth.
-- `queue.lane.dequeue`: command queue lane dequeue + wait time.
-- `session.state`: session state transition + reason.
-- `session.stuck`: session stuck warning + age.
-- `run.attempt`: run retry/attempt metadata.
-- `diagnostic.heartbeat`: aggregate counters (webhooks/queue/session).
-
-Exec:
-
-- `exec.process.completed`: terminal exec process outcome, duration, target, mode,
-  exit code, and failure kind. Command text and working directories are not
-  included.
-
-### Enable diagnostics (no exporter)
-
-Use this if you want diagnostics events available to plugins or custom sinks:
-
-```json
+```json5
 {
-  "diagnostics": {
-    "enabled": true
-  }
+  diagnostics: { enabled: true },
 }
 ```

-### Diagnostics flags (targeted logs)
-
-Use flags to turn on extra, targeted debug logs without raising `logging.level`.
-Flags are case-insensitive and support wildcards (e.g. `telegram.*` or `*`).
-
-```json
-{
-  "diagnostics": {
-    "flags": ["telegram.http"]
-  }
-}
-```
-
-Env override (one-off):
-
-```
-OPENCLAW_DIAGNOSTICS=telegram.http,telegram.payload
-```
-
-Notes:
-
-- Flag logs go to the standard log file (same as `logging.file`).
-- Output is still redacted according to `logging.redactSensitive`.
-- Full guide: [/diagnostics/flags](/diagnostics/flags).
-
-### Export to OpenTelemetry
-
-Diagnostics can be exported via the `diagnostics-otel` plugin (OTLP/HTTP). This
-works with any OpenTelemetry collector/backend that accepts OTLP/HTTP.
-
-```json
-{
-  "plugins": {
-    "allow": ["diagnostics-otel"],
-    "entries": {
-      "diagnostics-otel": {
-        "enabled": true
-      }
-    }
-  },
-  "diagnostics": {
-    "enabled": true,
-    "otel": {
-      "enabled": true,
-      "endpoint": "http://otel-collector:4318",
-      "protocol": "http/protobuf",
-      "serviceName": "openclaw-gateway",
-      "traces": true,
-      "metrics": true,
-      "logs": true,
-      "sampleRate": 0.2,
-      "flushIntervalMs": 60000,
-      "captureContent": {
-        "enabled": false,
-        "inputMessages": false,
-        "outputMessages": false,
-        "toolInputs": false,
-        "toolOutputs": false,
-        "systemPrompt": false
-      }
-    }
-  }
-}
-```
-
-Notes:
-
-- You can also enable the plugin with `openclaw plugins enable diagnostics-otel`.
-- `protocol` currently supports `http/protobuf` only. `grpc` is ignored.
-- Metrics include token usage, cost, context size, run duration, and message-flow
-  counters/histograms (webhooks, queueing, session state, queue depth/wait),
-  plus GenAI token usage and model-call duration histograms.
-- Traces/metrics can be toggled with `traces` / `metrics` (default: on). Traces
-  include model usage spans plus webhook/message processing spans when enabled.
-- Raw model/tool content is not exported by default. Use
-  `diagnostics.otel.captureContent` only when your collector and retention policy
-  are approved for prompt, response, tool, or system prompt text.
-- Set `headers` when your collector requires auth.
-- Environment variables supported: `OTEL_EXPORTER_OTLP_ENDPOINT`,
-  `OTEL_SERVICE_NAME`, `OTEL_EXPORTER_OTLP_PROTOCOL`.
-- Set `OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental` to emit the
-  latest experimental GenAI provider span attribute (`gen_ai.provider.name`)
-  instead of the legacy span attribute (`gen_ai.system`). GenAI metrics always
-  use bounded, low-cardinality semantic attributes.
-- Set `OPENCLAW_OTEL_PRELOADED=1` when another preload or host process already
-  registered the global OpenTelemetry SDK. In that mode the plugin does not start
-  or shut down its own SDK, but it still wires OpenClaw diagnostic listeners and
-  honors `diagnostics.otel.traces`, `metrics`, and `logs`.
-
-### Exported metrics (names + types)
-
-Model usage:
-
-- `openclaw.tokens` (counter, attrs: `openclaw.token`, `openclaw.channel`,
-  `openclaw.provider`, `openclaw.model`)
-- `openclaw.cost.usd` (counter, attrs: `openclaw.channel`, `openclaw.provider`,
-  `openclaw.model`)
-- `openclaw.run.duration_ms` (histogram, attrs: `openclaw.channel`,
-  `openclaw.provider`, `openclaw.model`)
-- `openclaw.context.tokens` (histogram, attrs: `openclaw.context`,
-  `openclaw.channel`, `openclaw.provider`, `openclaw.model`)
-- `gen_ai.client.token.usage` (histogram, GenAI semantic-conventions metric,
-  attrs: `gen_ai.token.type` = `input`/`output`, `gen_ai.provider.name`,
-  `gen_ai.operation.name`, `gen_ai.request.model`)
-- `gen_ai.client.operation.duration` (histogram, seconds, GenAI
-  semantic-conventions metric, attrs: `gen_ai.provider.name`,
-  `gen_ai.operation.name`, `gen_ai.request.model`, optional `error.type`)
-
-Message flow:
-
-- `openclaw.webhook.received` (counter, attrs: `openclaw.channel`,
-  `openclaw.webhook`)
-- `openclaw.webhook.error` (counter, attrs: `openclaw.channel`,
-  `openclaw.webhook`)
-- `openclaw.webhook.duration_ms` (histogram, attrs: `openclaw.channel`,
-  `openclaw.webhook`)
-- `openclaw.message.queued` (counter, attrs: `openclaw.channel`,
-  `openclaw.source`)
-- `openclaw.message.processed` (counter, attrs: `openclaw.channel`,
-  `openclaw.outcome`)
-- `openclaw.message.duration_ms` (histogram, attrs: `openclaw.channel`,
-  `openclaw.outcome`)
-- `openclaw.message.delivery.started` (counter, attrs: `openclaw.channel`,
-  `openclaw.delivery.kind`)
-- `openclaw.message.delivery.duration_ms` (histogram, attrs:
-  `openclaw.channel`, `openclaw.delivery.kind`, `openclaw.outcome`,
-  `openclaw.errorCategory`)
-
-Queues + sessions:
-
-- `openclaw.queue.lane.enqueue` (counter, attrs: `openclaw.lane`)
-- `openclaw.queue.lane.dequeue` (counter, attrs: `openclaw.lane`)
-- `openclaw.queue.depth` (histogram, attrs: `openclaw.lane` or
-  `openclaw.channel=heartbeat`)
-- `openclaw.queue.wait_ms` (histogram, attrs: `openclaw.lane`)
-- `openclaw.session.state` (counter, attrs: `openclaw.state`, `openclaw.reason`)
-- `openclaw.session.stuck` (counter, attrs: `openclaw.state`)
-- `openclaw.session.stuck_age_ms` (histogram, attrs: `openclaw.state`)
-- `openclaw.run.attempt` (counter, attrs: `openclaw.attempt`)
-
-Exec:
-
-- `openclaw.exec.duration_ms` (histogram, attrs: `openclaw.exec.target`,
-  `openclaw.exec.mode`, `openclaw.outcome`, `openclaw.failureKind`)
-
-Diagnostics internals (memory + tool loop):
-
-- `openclaw.memory.heap_used_bytes` (histogram, attrs: `openclaw.memory.kind`)
-- `openclaw.memory.rss_bytes` (histogram)
-- `openclaw.memory.pressure` (counter, attrs: `openclaw.memory.level`)
-- `openclaw.tool.loop.iterations` (counter, attrs: `openclaw.toolName`,
-  `openclaw.outcome`)
-- `openclaw.tool.loop.duration_ms` (histogram, attrs: `openclaw.toolName`,
-  `openclaw.outcome`)
-
-### Exported spans (names + key attributes)
-
-- `openclaw.model.usage`
-  - `openclaw.channel`, `openclaw.provider`, `openclaw.model`
-  - `openclaw.tokens.*` (input/output/cache_read/cache_write/total)
-  - `gen_ai.system` by default, or `gen_ai.provider.name` when latest GenAI
-    semantic conventions are opted in
-  - `gen_ai.request.model`, `gen_ai.operation.name`, `gen_ai.usage.*`
-- `openclaw.run`
-  - `openclaw.outcome`, `openclaw.channel`, `openclaw.provider`,
-    `openclaw.model`, `openclaw.errorCategory`
-- `openclaw.model.call`
-  - `gen_ai.system` by default, or `gen_ai.provider.name` when latest GenAI
-    semantic conventions are opted in
-  - `gen_ai.request.model`, `gen_ai.operation.name`,
-    `openclaw.provider`, `openclaw.model`, `openclaw.api`,
-    `openclaw.transport`, `openclaw.provider.request_id_hash` (bounded
-    SHA-based hash of the upstream provider request id; raw ids are not
-    exported)
-- `openclaw.tool.execution`
-  - `gen_ai.tool.name`, `openclaw.toolName`, `openclaw.errorCategory`,
-    `openclaw.tool.params.*`
-- `openclaw.exec`
-  - `openclaw.exec.target`, `openclaw.exec.mode`, `openclaw.outcome`,
-    `openclaw.failureKind`, `openclaw.exec.command_length`,
-    `openclaw.exec.exit_code`, `openclaw.exec.timed_out`
-- `openclaw.webhook.processed`
-  - `openclaw.channel`, `openclaw.webhook`, `openclaw.chatId`
-- `openclaw.webhook.error`
-  - `openclaw.channel`, `openclaw.webhook`, `openclaw.chatId`,
-    `openclaw.error`
-- `openclaw.message.processed`
-  - `openclaw.channel`, `openclaw.outcome`, `openclaw.chatId`,
-    `openclaw.messageId`, `openclaw.reason`
-- `openclaw.message.delivery`
-  - `openclaw.channel`, `openclaw.delivery.kind`, `openclaw.outcome`,
-    `openclaw.errorCategory`, `openclaw.delivery.result_count`
-- `openclaw.session.stuck`
-  - `openclaw.state`, `openclaw.ageMs`, `openclaw.queueDepth`
-- `openclaw.context.assembled`
-  - `openclaw.prompt.size`, `openclaw.history.size`,
-    `openclaw.context.tokens`, `openclaw.errorCategory` (no prompt,
-    history, response, or session-key content)
-- `openclaw.tool.loop`
-  - `openclaw.toolName`, `openclaw.outcome`, `openclaw.iterations`,
-    `openclaw.errorCategory` (no loop messages, params, or tool output)
-- `openclaw.memory.pressure`
-  - `openclaw.memory.level`, `openclaw.memory.heap_used_bytes`,
-    `openclaw.memory.rss_bytes`
-
-When content capture is explicitly enabled, model/tool spans can also include
-bounded, redacted `openclaw.content.*` attributes for the specific content
-classes you opted into.
-
-### Sampling + flushing
-
-- Trace sampling: `diagnostics.otel.sampleRate` (0.0–1.0, root spans only).
-- Metric export interval: `diagnostics.otel.flushIntervalMs` (min 1000ms).
-
-### Protocol notes
-
-- OTLP/HTTP endpoints can be set via `diagnostics.otel.endpoint` or
-  `OTEL_EXPORTER_OTLP_ENDPOINT`.
-- If the endpoint already contains `/v1/traces` or `/v1/metrics`, it is used as-is.
-- If the endpoint already contains `/v1/logs`, it is used as-is for logs.
-- `OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental` controls only the
-  GenAI span provider attribute shape. Existing dashboards that read
-  `gen_ai.system` can keep the default until they migrate.
-- `OPENCLAW_OTEL_PRELOADED=1` reuses an externally registered OpenTelemetry SDK
-  for traces/metrics instead of starting a plugin-owned NodeSDK.
-- `diagnostics.otel.logs` enables OTLP log export for the main logger output.
-
-### Log export behavior
-
-- OTLP logs use the same structured records written to `logging.file`.
-- Respect `logging.level` (file log level). Console redaction does **not** apply
-  to OTLP logs.
-- High-volume installs should prefer OTLP collector sampling/filtering.
+For OTLP export to a collector, see [OpenTelemetry export](/gateway/opentelemetry).

 ## Troubleshooting tips

@@ -483,5 +208,7 @@ classes you opted into.

 ## Related

-- [Gateway Logging Internals](/gateway/logging) — WS log styles, subsystem prefixes, and console capture
-- [Diagnostics](/gateway/configuration-reference#diagnostics) — OpenTelemetry export and cache trace config
+- [OpenTelemetry export](/gateway/opentelemetry) — OTLP/HTTP export, metric/span catalog, privacy model
+- [Diagnostics flags](/diagnostics/flags) — targeted debug-log flags
+- [Gateway logging internals](/gateway/logging) — WS log styles, subsystem prefixes, and console capture
+- [Configuration reference](/gateway/configuration-reference#diagnostics) — full `diagnostics.*` field reference
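The new logging.md summary describes diagnostics flags as case-insensitive. It also says they support wildcards such as `telegram.*` and `*`, and that the comma-separated `OPENCLAW_DIAGNOSTICS` override sets them. The following is a minimal TypeScript sketch of that matching. `parseFlags`, `flagToRegExp`, and `isFlagEnabled` are hypothetical names. Treating `*` as "any run of characters, dots included" is an assumption. The Diagnostics flags guide remains the authority on the real semantics.

```typescript
// Hypothetical sketch of diagnostics-flag matching; not OpenClaw's code.
// Assumptions: matching is case-insensitive, "*" matches any run of
// characters (dots included), and OPENCLAW_DIAGNOSTICS is comma-separated.

// Split a comma-separated flag list, trimming and lowercasing each entry.
function parseFlags(raw: string | undefined): string[] {
  if (!raw) {
    return [];
  }
  return raw
    .split(",")
    .map((flag) => flag.trim().toLowerCase())
    .filter((flag) => flag.length > 0);
}

// Turn a flag such as "telegram.*" into an anchored regular expression.
function flagToRegExp(flag: string): RegExp {
  const escaped = flag
    .toLowerCase()
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters except "*"
    .replace(/\*/g, ".*"); // "*" becomes "any characters"
  return new RegExp(`^${escaped}$`);
}

// True when any configured flag matches the given log category name.
function isFlagEnabled(name: string, flags: string[]): boolean {
  const target = name.toLowerCase();
  return flags.some((flag) => flagToRegExp(flag).test(target));
}
```

Under these assumptions, `telegram.*` enables `telegram.http` but not `discord.http`. Escaping the dot keeps `telegram.http` from also matching `telegramxhttp`.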