docs: split OpenTelemetry export into its own page under gateway

Logging.md had grown to 487 lines with ~300 lines dedicated to
OpenTelemetry export — wire protocol, full metric/span catalog, env
vars, captureContent shape, sampling, the diagnostic event catalog,
and protocol notes — leaving the genuine logging overview buried
behind exporter reference material.

Move the OTEL surface to a dedicated page and slim logging.md to a
focused logs overview:

- Add docs/gateway/opentelemetry.md (OpenTelemetry export). Same
  content reorganized: how it fits together, quick start, signals,
  configuration reference + env vars table, privacy/captureContent,
  sampling/flushing, full metric and span catalog, diagnostic event
  catalog, no-exporter mode, diagnostics flags pointer, disable.
- docs/logging.md: drop the OTEL section in favor of a short
  'Diagnostics and OpenTelemetry' summary that cross-links the new
  page and the diagnostics-flags page. Drops 273 lines net. Also
  drops the redundant body H1, retitles to 'Logging' (was 'Logging
  overview' which mismatched sidebar usage), and refreshes the
  Related list.
- docs/docs.json: insert gateway/opentelemetry into the
  'Health and diagnostics' sidebar group, reorder pages so the user-
  facing health/run pages come before exporter/internals pages, and
  put logging next to opentelemetry where readers naturally
  associate them.
- docs/gateway/diagnostics.md, docs/gateway/logging.md,
  docs/gateway/configuration-reference.md: cross-link the new page
  and sentence-case stale Title-Cased Related entries on
  diagnostics.md.
This commit is contained in:
Vincent Koc
2026-04-25 16:46:43 -07:00
parent a1090b6043
commit 7741dbb759
6 changed files with 341 additions and 307 deletions

View File

@@ -1,14 +1,12 @@
---
summary: "Logging overview: file logs, console output, CLI tailing, and the Control UI"
summary: "File logs, console output, CLI tailing, and the Control UI Logs tab"
read_when:
- You need a beginner-friendly overview of logging
- You want to configure log levels or formats
- You need a beginner-friendly overview of OpenClaw logging
- You want to configure log levels, formats, or redaction
- You are troubleshooting and need to find logs quickly
title: "Logging overview"
title: "Logging"
---
# Logging
OpenClaw has two main log surfaces:
- **File logs** (JSON lines) written by the Gateway.
@@ -171,308 +169,35 @@ Tool summaries can redact sensitive tokens before they hit the console:
Redaction affects **console output only** and does not alter file logs.
## Diagnostics + OpenTelemetry
## Diagnostics and OpenTelemetry
Diagnostics are structured, machine-readable events for model runs **and**
Diagnostics are structured, machine-readable events for model runs and
message-flow telemetry (webhooks, queueing, session state). They do **not**
replace logs; they exist to feed metrics, traces, and other exporters.
replace logs they feed metrics, traces, and exporters. Events are emitted
in-process whether or not you export them.
Diagnostics events are emitted in-process, but exporters only attach when
diagnostics + the exporter plugin are enabled.
Two adjacent surfaces:
### OpenTelemetry vs OTLP
- **OpenTelemetry export** — send metrics, traces, and logs over OTLP/HTTP to
any OpenTelemetry-compatible collector or backend (Grafana, Datadog,
Honeycomb, New Relic, Tempo, etc.). Full configuration, signal catalog,
metric/span names, env vars, and privacy model live on a dedicated page:
[OpenTelemetry export](/gateway/opentelemetry).
- **Diagnostics flags** — targeted debug-log flags that route extra logs to
`logging.file` without raising `logging.level`. Flags are case-insensitive
and support wildcards (`telegram.*`, `*`). Configure under `diagnostics.flags`
or via the `OPENCLAW_DIAGNOSTICS=...` env override. Full guide:
[Diagnostics flags](/diagnostics/flags).
- **OpenTelemetry (OTel)**: the data model + SDKs for traces, metrics, and logs.
- **OTLP**: the wire protocol used to export OTel data to a collector/backend.
- OpenClaw exports via **OTLP/HTTP (protobuf)** today.
To enable diagnostics events for plugins or custom sinks without OTLP export:
### Signals exported
- **Metrics**: counters + histograms (token usage, message flow, queueing).
- **Traces**: spans for model usage + webhook/message processing.
- **Logs**: exported over OTLP when `diagnostics.otel.logs` is enabled. Log
volume can be high; keep `logging.level` and exporter filters in mind.
### Diagnostic event catalog
Model usage:
- `model.usage`: tokens, cost, duration, context, provider/model/channel, session ids.
`usage` is provider/turn accounting for cost and telemetry; `context.used`
is the current prompt/context snapshot and can be lower than provider
`usage.total` when cached input or tool-loop calls are involved.
Message flow:
- `webhook.received`: webhook ingress per channel.
- `webhook.processed`: webhook handled + duration.
- `webhook.error`: webhook handler errors.
- `message.queued`: message enqueued for processing.
- `message.processed`: outcome + duration + optional error.
- `message.delivery.started`: outbound delivery attempt started.
- `message.delivery.completed`: outbound delivery attempt finished + duration/result count.
- `message.delivery.error`: outbound delivery attempt failed + duration/bounded error category.
Queue + session:
- `queue.lane.enqueue`: command queue lane enqueue + depth.
- `queue.lane.dequeue`: command queue lane dequeue + wait time.
- `session.state`: session state transition + reason.
- `session.stuck`: session stuck warning + age.
- `run.attempt`: run retry/attempt metadata.
- `diagnostic.heartbeat`: aggregate counters (webhooks/queue/session).
Exec:
- `exec.process.completed`: terminal exec process outcome, duration, target, mode,
exit code, and failure kind. Command text and working directories are not
included.
### Enable diagnostics (no exporter)
Use this if you want diagnostics events available to plugins or custom sinks:
```json
```json5
{
"diagnostics": {
"enabled": true
}
diagnostics: { enabled: true },
}
```
### Diagnostics flags (targeted logs)
Use flags to turn on extra, targeted debug logs without raising `logging.level`.
Flags are case-insensitive and support wildcards (e.g. `telegram.*` or `*`).
```json
{
"diagnostics": {
"flags": ["telegram.http"]
}
}
```
Env override (one-off):
```
OPENCLAW_DIAGNOSTICS=telegram.http,telegram.payload
```
Notes:
- Flag logs go to the standard log file (same as `logging.file`).
- Output is still redacted according to `logging.redactSensitive`.
- Full guide: [/diagnostics/flags](/diagnostics/flags).
### Export to OpenTelemetry
Diagnostics can be exported via the `diagnostics-otel` plugin (OTLP/HTTP). This
works with any OpenTelemetry collector/backend that accepts OTLP/HTTP.
```json
{
"plugins": {
"allow": ["diagnostics-otel"],
"entries": {
"diagnostics-otel": {
"enabled": true
}
}
},
"diagnostics": {
"enabled": true,
"otel": {
"enabled": true,
"endpoint": "http://otel-collector:4318",
"protocol": "http/protobuf",
"serviceName": "openclaw-gateway",
"traces": true,
"metrics": true,
"logs": true,
"sampleRate": 0.2,
"flushIntervalMs": 60000,
"captureContent": {
"enabled": false,
"inputMessages": false,
"outputMessages": false,
"toolInputs": false,
"toolOutputs": false,
"systemPrompt": false
}
}
}
}
```
Notes:
- You can also enable the plugin with `openclaw plugins enable diagnostics-otel`.
- `protocol` currently supports `http/protobuf` only. `grpc` is ignored.
- Metrics include token usage, cost, context size, run duration, and message-flow
counters/histograms (webhooks, queueing, session state, queue depth/wait),
plus GenAI token usage and model-call duration histograms.
- Traces/metrics can be toggled with `traces` / `metrics` (default: on). Traces
include model usage spans plus webhook/message processing spans when enabled.
- Raw model/tool content is not exported by default. Use
`diagnostics.otel.captureContent` only when your collector and retention policy
are approved for prompt, response, tool, or system prompt text.
- Set `headers` when your collector requires auth.
- Environment variables supported: `OTEL_EXPORTER_OTLP_ENDPOINT`,
`OTEL_SERVICE_NAME`, `OTEL_EXPORTER_OTLP_PROTOCOL`.
- Set `OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental` to emit the
latest experimental GenAI provider span attribute (`gen_ai.provider.name`)
instead of the legacy span attribute (`gen_ai.system`). GenAI metrics always
use bounded, low-cardinality semantic attributes.
- Set `OPENCLAW_OTEL_PRELOADED=1` when another preload or host process already
registered the global OpenTelemetry SDK. In that mode the plugin does not start
or shut down its own SDK, but it still wires OpenClaw diagnostic listeners and
honors `diagnostics.otel.traces`, `metrics`, and `logs`.
### Exported metrics (names + types)
Model usage:
- `openclaw.tokens` (counter, attrs: `openclaw.token`, `openclaw.channel`,
`openclaw.provider`, `openclaw.model`)
- `openclaw.cost.usd` (counter, attrs: `openclaw.channel`, `openclaw.provider`,
`openclaw.model`)
- `openclaw.run.duration_ms` (histogram, attrs: `openclaw.channel`,
`openclaw.provider`, `openclaw.model`)
- `openclaw.context.tokens` (histogram, attrs: `openclaw.context`,
`openclaw.channel`, `openclaw.provider`, `openclaw.model`)
- `gen_ai.client.token.usage` (histogram, GenAI semantic-conventions metric,
attrs: `gen_ai.token.type` = `input`/`output`, `gen_ai.provider.name`,
`gen_ai.operation.name`, `gen_ai.request.model`)
- `gen_ai.client.operation.duration` (histogram, seconds, GenAI
semantic-conventions metric, attrs: `gen_ai.provider.name`,
`gen_ai.operation.name`, `gen_ai.request.model`, optional `error.type`)
Message flow:
- `openclaw.webhook.received` (counter, attrs: `openclaw.channel`,
`openclaw.webhook`)
- `openclaw.webhook.error` (counter, attrs: `openclaw.channel`,
`openclaw.webhook`)
- `openclaw.webhook.duration_ms` (histogram, attrs: `openclaw.channel`,
`openclaw.webhook`)
- `openclaw.message.queued` (counter, attrs: `openclaw.channel`,
`openclaw.source`)
- `openclaw.message.processed` (counter, attrs: `openclaw.channel`,
`openclaw.outcome`)
- `openclaw.message.duration_ms` (histogram, attrs: `openclaw.channel`,
`openclaw.outcome`)
- `openclaw.message.delivery.started` (counter, attrs: `openclaw.channel`,
`openclaw.delivery.kind`)
- `openclaw.message.delivery.duration_ms` (histogram, attrs:
`openclaw.channel`, `openclaw.delivery.kind`, `openclaw.outcome`,
`openclaw.errorCategory`)
Queues + sessions:
- `openclaw.queue.lane.enqueue` (counter, attrs: `openclaw.lane`)
- `openclaw.queue.lane.dequeue` (counter, attrs: `openclaw.lane`)
- `openclaw.queue.depth` (histogram, attrs: `openclaw.lane` or
`openclaw.channel=heartbeat`)
- `openclaw.queue.wait_ms` (histogram, attrs: `openclaw.lane`)
- `openclaw.session.state` (counter, attrs: `openclaw.state`, `openclaw.reason`)
- `openclaw.session.stuck` (counter, attrs: `openclaw.state`)
- `openclaw.session.stuck_age_ms` (histogram, attrs: `openclaw.state`)
- `openclaw.run.attempt` (counter, attrs: `openclaw.attempt`)
Exec:
- `openclaw.exec.duration_ms` (histogram, attrs: `openclaw.exec.target`,
`openclaw.exec.mode`, `openclaw.outcome`, `openclaw.failureKind`)
Diagnostics internals (memory + tool loop):
- `openclaw.memory.heap_used_bytes` (histogram, attrs: `openclaw.memory.kind`)
- `openclaw.memory.rss_bytes` (histogram)
- `openclaw.memory.pressure` (counter, attrs: `openclaw.memory.level`)
- `openclaw.tool.loop.iterations` (counter, attrs: `openclaw.toolName`,
`openclaw.outcome`)
- `openclaw.tool.loop.duration_ms` (histogram, attrs: `openclaw.toolName`,
`openclaw.outcome`)
### Exported spans (names + key attributes)
- `openclaw.model.usage`
- `openclaw.channel`, `openclaw.provider`, `openclaw.model`
- `openclaw.tokens.*` (input/output/cache_read/cache_write/total)
- `gen_ai.system` by default, or `gen_ai.provider.name` when latest GenAI
semantic conventions are opted in
- `gen_ai.request.model`, `gen_ai.operation.name`, `gen_ai.usage.*`
- `openclaw.run`
- `openclaw.outcome`, `openclaw.channel`, `openclaw.provider`,
`openclaw.model`, `openclaw.errorCategory`
- `openclaw.model.call`
- `gen_ai.system` by default, or `gen_ai.provider.name` when latest GenAI
semantic conventions are opted in
- `gen_ai.request.model`, `gen_ai.operation.name`,
`openclaw.provider`, `openclaw.model`, `openclaw.api`,
`openclaw.transport`, `openclaw.provider.request_id_hash` (bounded
SHA-based hash of the upstream provider request id; raw ids are not
exported)
- `openclaw.tool.execution`
- `gen_ai.tool.name`, `openclaw.toolName`, `openclaw.errorCategory`,
`openclaw.tool.params.*`
- `openclaw.exec`
- `openclaw.exec.target`, `openclaw.exec.mode`, `openclaw.outcome`,
`openclaw.failureKind`, `openclaw.exec.command_length`,
`openclaw.exec.exit_code`, `openclaw.exec.timed_out`
- `openclaw.webhook.processed`
- `openclaw.channel`, `openclaw.webhook`, `openclaw.chatId`
- `openclaw.webhook.error`
- `openclaw.channel`, `openclaw.webhook`, `openclaw.chatId`,
`openclaw.error`
- `openclaw.message.processed`
- `openclaw.channel`, `openclaw.outcome`, `openclaw.chatId`,
`openclaw.messageId`, `openclaw.reason`
- `openclaw.message.delivery`
- `openclaw.channel`, `openclaw.delivery.kind`, `openclaw.outcome`,
`openclaw.errorCategory`, `openclaw.delivery.result_count`
- `openclaw.session.stuck`
- `openclaw.state`, `openclaw.ageMs`, `openclaw.queueDepth`
- `openclaw.context.assembled`
- `openclaw.prompt.size`, `openclaw.history.size`,
`openclaw.context.tokens`, `openclaw.errorCategory` (no prompt,
history, response, or session-key content)
- `openclaw.tool.loop`
- `openclaw.toolName`, `openclaw.outcome`, `openclaw.iterations`,
`openclaw.errorCategory` (no loop messages, params, or tool output)
- `openclaw.memory.pressure`
- `openclaw.memory.level`, `openclaw.memory.heap_used_bytes`,
`openclaw.memory.rss_bytes`
When content capture is explicitly enabled, model/tool spans can also include
bounded, redacted `openclaw.content.*` attributes for the specific content
classes you opted into.
### Sampling + flushing
- Trace sampling: `diagnostics.otel.sampleRate` (0.01.0, root spans only).
- Metric export interval: `diagnostics.otel.flushIntervalMs` (min 1000ms).
### Protocol notes
- OTLP/HTTP endpoints can be set via `diagnostics.otel.endpoint` or
`OTEL_EXPORTER_OTLP_ENDPOINT`.
- If the endpoint already contains `/v1/traces` or `/v1/metrics`, it is used as-is.
- If the endpoint already contains `/v1/logs`, it is used as-is for logs.
- `OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental` controls only the
GenAI span provider attribute shape. Existing dashboards that read
`gen_ai.system` can keep the default until they migrate.
- `OPENCLAW_OTEL_PRELOADED=1` reuses an externally registered OpenTelemetry SDK
for traces/metrics instead of starting a plugin-owned NodeSDK.
- `diagnostics.otel.logs` enables OTLP log export for the main logger output.
### Log export behavior
- OTLP logs use the same structured records written to `logging.file`.
- Respect `logging.level` (file log level). Console redaction does **not** apply
to OTLP logs.
- High-volume installs should prefer OTLP collector sampling/filtering.
For OTLP export to a collector, see [OpenTelemetry export](/gateway/opentelemetry).
## Troubleshooting tips
@@ -483,5 +208,7 @@ classes you opted into.
## Related
- [Gateway Logging Internals](/gateway/logging) — WS log styles, subsystem prefixes, and console capture
- [Diagnostics](/gateway/configuration-reference#diagnostics) — OpenTelemetry export and cache trace config
- [OpenTelemetry export](/gateway/opentelemetry) — OTLP/HTTP export, metric/span catalog, privacy model
- [Diagnostics flags](/diagnostics/flags) — targeted debug-log flags
- [Gateway logging internals](/gateway/logging) — WS log styles, subsystem prefixes, and console capture
- [Configuration reference](/gateway/configuration-reference#diagnostics) — full `diagnostics.*` field reference