openclaw/docs/logging.md at vincentkoc-code/otel-diagnostics-surface-docs

mirror of https://github.com/openclaw/openclaw.git synced 2026-03-12 23:40:45 +00:00

Vincent Koc 16fda2d1ce docs(logging): define diagnostics as telemetry surface

2026-03-09 13:28:09 -07:00

Summary: Logging overview: file logs, console output, CLI tailing, and the Control UI

Read when:

You need a beginner-friendly overview of logging
You want to configure log levels or formats
You are troubleshooting and need to find logs quickly

Logging

OpenClaw logs in two places:

File logs (JSON lines) written by the Gateway.
Console output shown in terminals and the Control UI.

This page explains where logs live, how to read them, and how to configure log levels and formats.

Where logs live

By default, the Gateway writes a rolling log file under:

/tmp/openclaw/openclaw-YYYY-MM-DD.log

The date uses the gateway host's local timezone.

You can override this in ~/.openclaw/openclaw.json:

{
  "logging": {
    "file": "/path/to/openclaw.log"
  }
}
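
When scripting around the default location, it can help to compute today's file name instead of hard-coding it. The following is a minimal sketch that only assumes the default directory and file-name pattern shown above; the helper name `default_log_path` is hypothetical and not part of OpenClaw.

```python
from datetime import date
from pathlib import PurePosixPath
from typing import Optional

# Default directory and file-name pattern as documented on this page.
DEFAULT_LOG_DIR = PurePosixPath("/tmp/openclaw")


def default_log_path(day: Optional[date] = None) -> PurePosixPath:
    """Return the default rolling log path for a given local date.

    Hypothetical helper for illustration only.
    """
    # date.today() uses the local timezone of the machine running this
    # script, so run it on the gateway host or pass the host's date in.
    day = day or date.today()
    return DEFAULT_LOG_DIR / f"openclaw-{day.isoformat()}.log"


print(default_log_path(date(2026, 3, 9)))
# prints /tmp/openclaw/openclaw-2026-03-09.log
```

If logging.file is set in openclaw.json, that path applies instead and this helper is not needed.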

How to read logs

CLI: live tail (recommended)

Use the CLI to tail the gateway log file via RPC:

openclaw logs --follow

Output modes:

TTY sessions: pretty, colorized, structured log lines.
Non-TTY sessions: plain text.
--json: line-delimited JSON (one log event per line).
--plain: force plain text in TTY sessions.
--no-color: disable ANSI colors.

In JSON mode, the CLI emits type-tagged objects:

meta: stream metadata (file, cursor, size)
log: parsed log entry
notice: truncation / rotation hints
raw: unparsed log line
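
A consumer of the JSON mode can dispatch on the tag of each object. The sketch below counts events per tag. It assumes the tag is stored in a top-level "type" key, and the other fields in the sample lines are invented for illustration; check real output from openclaw logs --json before relying on either.

```python
import json
from collections import Counter
from typing import Iterable


def summarize_stream(lines: Iterable[str]) -> Counter:
    """Count events per type tag in line-delimited JSON output."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between events
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            counts["invalid"] += 1
            continue
        # ASSUMPTION: the tag lives in a top-level "type" key.
        if isinstance(event, dict):
            counts[event.get("type", "unknown")] += 1
        else:
            counts["unknown"] += 1
    return counts


# Hypothetical sample lines; only the four tag names come from this page.
sample = [
    '{"type": "meta", "file": "/tmp/openclaw/openclaw-2026-03-09.log"}',
    '{"type": "log", "level": "info", "message": "gateway started"}',
    '{"type": "notice", "message": "log rotated"}',
    '{"type": "raw", "raw": "not a JSON line"}',
]
print(summarize_stream(sample))
```

To use it on a live stream, pass sys.stdin as the lines argument and pipe the output of openclaw logs --json into the script.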

If the Gateway is unreachable, the CLI prints a short hint to run:

openclaw doctor

Control UI (web)

The Control UI’s Logs tab tails the same file using logs.tail. See /web/control-ui for how to open it.

Channel-only logs

To filter channel activity (WhatsApp, Telegram, etc.), use:

openclaw channels logs --channel whatsapp

Log formats

File logs (JSONL)

Each line in the log file is a JSON object. The CLI and Control UI parse these entries to render structured output (time, level, subsystem, message).

Console output

Console logs are TTY-aware and formatted for readability:

Subsystem prefixes (e.g. gateway/channels/whatsapp)
Level coloring (info/warn/error)
Optional compact or JSON mode

Console formatting is controlled by logging.consoleStyle.

Configuring logging

All logging configuration lives under logging in ~/.openclaw/openclaw.json.

{
  "logging": {
    "level": "info",
    "file": "/tmp/openclaw/openclaw-YYYY-MM-DD.log",
    "consoleLevel": "info",
    "consoleStyle": "pretty",
    "redactSensitive": "tools",
    "redactPatterns": ["sk-.*"]
  }
}

Log levels

logging.level: file logs (JSONL) level.
logging.consoleLevel: console verbosity level.

Both levels can be overridden with the OPENCLAW_LOG_LEVEL environment variable (e.g. OPENCLAW_LOG_LEVEL=debug). The environment variable takes precedence over the config file, so you can raise verbosity for a single run without editing openclaw.json. The global CLI option --log-level <level> (for example, openclaw --log-level debug gateway run) overrides the environment variable for that command. Precedence, highest first: --log-level, OPENCLAW_LOG_LEVEL, openclaw.json.

--verbose only affects console output; it does not change file log levels.
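
The override order described above (CLI option, then environment variable, then config file) can be sketched as a small resolver. This is an illustration of the precedence rule only, not OpenClaw's implementation; the function name and the "info" fallback are assumptions.

```python
from typing import Mapping, Optional


def resolve_log_level(
    cli_level: Optional[str],
    env: Mapping[str, str],
    config_level: Optional[str],
    default: str = "info",  # ASSUMPTION: fallback when nothing is set
) -> str:
    """Return the effective level, checking sources from highest precedence."""
    candidates = (
        cli_level,                      # --log-level <level>
        env.get("OPENCLAW_LOG_LEVEL"),  # environment variable
        config_level,                   # logging.level / logging.consoleLevel
    )
    for candidate in candidates:
        if candidate:
            return candidate
    return default


print(resolve_log_level(None, {}, "warn"))
# prints warn
print(resolve_log_level(None, {"OPENCLAW_LOG_LEVEL": "debug"}, "warn"))
# prints debug
print(resolve_log_level("error", {"OPENCLAW_LOG_LEVEL": "debug"}, "warn"))
# prints error
```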

Console styles

logging.consoleStyle:

pretty: human-friendly, colored, with timestamps.
compact: tighter output (best for long sessions).
json: JSON per line (for log processors).

Redaction

Sensitive tokens in tool summaries can be redacted before they reach the console:

logging.redactSensitive: off | tools (default: tools)
logging.redactPatterns: list of regex strings to override the default set

Redaction affects console output only and does not alter file logs.
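
Before adding entries to logging.redactPatterns, it is worth testing how a pattern behaves on a sample line. The sketch below uses Python's re module as a stand-in, so it assumes the pattern means the same thing in OpenClaw's regex engine; the "***" mask is also an assumption, not the gateway's actual replacement text.

```python
import re
from typing import Iterable


def redact(text: str, patterns: Iterable[str], mask: str = "***") -> str:
    """Replace every match of every pattern with the mask."""
    for pattern in patterns:
        text = re.sub(pattern, mask, text)
    return text


line = "Authorization: sk-abc123 user=alice"

# The pattern from the config example is greedy: it masks the token
# and everything after it on the same line.
print(redact(line, ["sk-.*"]))
# prints Authorization: ***

# A tighter character class masks only the token itself.
print(redact(line, [r"sk-[A-Za-z0-9_-]+"]))
# prints Authorization: *** user=alice
```

A greedy pattern is the safer choice when over-redaction is acceptable; a narrower one keeps more context in the console output.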

Diagnostics + OpenTelemetry

Diagnostics are structured, machine-readable events for model runs and message-flow telemetry (webhooks, queueing, session state). They do not replace logs; they exist to feed metrics, traces, and other exporters.

Diagnostic events are emitted in-process, but exporters attach only when both diagnostics and the exporter plugin are enabled.

Telemetry surface ownership

OpenClaw has separate surfaces for automation, runtime control, and telemetry:

If you want to...	Use...	Why
Export metrics, traces, or machine-readable health signals	Diagnostic events	Observability should be append-only telemetry, not a behavior hook
Rewrite prompts, block tools, cancel outbound messages, or add policy/middleware	Typed plugin hooks via `api.on(...)`	Runtime hooks can mutate or block behavior
Trigger coarse operator automation such as file writes, notifications, or side effects	HOOK.md hooks / `api.registerHook(...)`	Internal hooks are for operator automation, not telemetry schemas

Future OTel work should extend src/infra/diagnostic-events.ts, then map those events in the diagnostics-otel plugin. Do not address telemetry-only needs by growing the hook APIs.

What diagnostic events are for

Diagnostic events are the observability contract between the gateway runtime and telemetry consumers such as the diagnostics-otel plugin.

Diagnostic events should be:

append-only signals for exporters, dashboards, alerts, and troubleshooting
safe to ignore without affecting runtime behavior
stable enough that exporters can map them into metrics, traces, or logs

Diagnostic events should not be used for:

blocking, vetoing, or rewriting runtime behavior
policy enforcement or middleware ordering
side-effect automation that must run for the system to behave correctly

OpenTelemetry vs OTLP

OpenTelemetry (OTel): the data model + SDKs for traces, metrics, and logs.
OTLP: the wire protocol used to export OTel data to a collector/backend.
OpenClaw exports via OTLP/HTTP (protobuf) today.

Signals exported

Metrics: counters + histograms (token usage, message flow, queueing).
Traces: spans for model usage + webhook/message processing.
Logs: exported over OTLP when diagnostics.otel.logs is enabled. Log volume can be high; keep logging.level and exporter filters in mind.

Diagnostic event catalog

Model usage:

model.usage: tokens, cost, duration, context, provider/model/channel, session ids.

Message flow:

webhook.received: webhook ingress per channel.
webhook.processed: webhook handled + duration.
webhook.error: webhook handler errors.
message.queued: message enqueued for processing.
message.processed: outcome + duration + optional error.

Queue + session:

queue.lane.enqueue: command queue lane enqueue + depth.
queue.lane.dequeue: command queue lane dequeue + wait time.
session.state: session state transition + reason.
session.stuck: session stuck warning + age.
run.attempt: run retry/attempt metadata.
diagnostic.heartbeat: aggregate counters (webhooks/queue/session).

Tool safety:

tool.loop: repeated-tool-loop warning/block telemetry emitted by the runtime.

What is still missing

The current event catalog is useful, but still coarse in a few places. New observability work should generally extend src/infra/diagnostic-events.ts instead of asking hooks to carry telemetry-only meaning.

Priority gaps for future telemetry work:

Run lifecycle: explicit run start, run end, and run error boundaries.
Model lifecycle: request/response/error boundaries in addition to aggregate model.usage.
Tool lifecycle: tool call start/end/error boundaries, plus first-class exporter mapping for existing tool.loop events.
Outbound delivery lifecycle: delivery attempted/sent/failed boundaries across channels, separate from message processing.
Attribute hygiene: clearer redaction and cardinality guidance for exporter-safe fields.

Enable diagnostics (no exporter)

Use this if you want diagnostic events available to plugins or custom sinks:

{
  "diagnostics": {
    "enabled": true
  }
}

Diagnostics flags (targeted logs)

Use flags to turn on extra, targeted debug logs without raising logging.level. Flags are case-insensitive and support wildcards (e.g. telegram.* or *).

{
  "diagnostics": {
    "flags": ["telegram.http"]
  }
}

Env override (one-off):

OPENCLAW_DIAGNOSTICS=telegram.http,telegram.payload

Notes:

Flag logs go to the standard log file (same as logging.file).
Output is still redacted according to logging.redactSensitive.
Full guide: /diagnostics/flags.
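
To reason about which flags a given setting turns on, the case-insensitive wildcard matching described above can be approximated with glob matching. This is a sketch under that assumption; the real matcher may differ (for example in how it treats characters such as ? or [), and both function names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def parse_flags(value: str) -> List[str]:
    """Split a comma-separated OPENCLAW_DIAGNOSTICS value into flags."""
    return [flag.strip() for flag in value.split(",") if flag.strip()]


def flag_enabled(name: str, enabled: Iterable[str]) -> bool:
    """Check a flag name against enabled flags, ignoring case.

    ASSUMPTION: wildcards behave like shell-style globs.
    """
    name = name.lower()
    return any(fnmatchcase(name, pattern.lower()) for pattern in enabled)


flags = parse_flags("telegram.http,telegram.payload")
print(flags)
# prints ['telegram.http', 'telegram.payload']
print(flag_enabled("Telegram.HTTP", flags))
# prints True
print(flag_enabled("whatsapp.http", ["telegram.*"]))
# prints False
print(flag_enabled("whatsapp.http", ["*"]))
# prints True
```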

Export to OpenTelemetry

Diagnostics can be exported via the diagnostics-otel plugin (OTLP/HTTP). This works with any OpenTelemetry collector/backend that accepts OTLP/HTTP.

{
  "plugins": {
    "allow": ["diagnostics-otel"],
    "entries": {
      "diagnostics-otel": {
        "enabled": true
      }
    }
  },
  "diagnostics": {
    "enabled": true,
    "otel": {
      "enabled": true,
      "endpoint": "http://otel-collector:4318",
      "protocol": "http/protobuf",
      "serviceName": "openclaw-gateway",
      "traces": true,
      "metrics": true,
      "logs": true,
      "sampleRate": 0.2,
      "flushIntervalMs": 60000
    }
  }
}
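
A quick sanity check of the diagnostics.otel block can catch typos before the gateway starts. The sketch below is not part of OpenClaw. It assumes sampleRate is a fraction between 0 and 1, and it lists the standard OTLP/HTTP per-signal paths (/v1/traces, /v1/metrics, /v1/logs) that a collector normally serves; whether the plugin appends those paths to endpoint itself is an assumption to verify.

```python
from typing import Any, Dict, List
from urllib.parse import urlparse

# Standard OTLP/HTTP request paths per signal.
OTLP_HTTP_PATHS = {
    "traces": "/v1/traces",
    "metrics": "/v1/metrics",
    "logs": "/v1/logs",
}


def check_otel_config(otel: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems (empty if none found)."""
    problems: List[str] = []
    parsed = urlparse(str(otel.get("endpoint", "")))
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("endpoint must be an http(s) URL")
    if otel.get("protocol") != "http/protobuf":
        problems.append("protocol should be http/protobuf")
    # ASSUMPTION: sampleRate is a fraction in [0, 1]; 1.0 if omitted.
    rate = otel.get("sampleRate", 1.0)
    if not isinstance(rate, (int, float)) or not 0.0 <= rate <= 1.0:
        problems.append("sampleRate must be between 0 and 1")
    return problems


def signal_urls(otel: Dict[str, Any]) -> Dict[str, str]:
    """Build the collector URL for each signal enabled in the config."""
    base = str(otel["endpoint"]).rstrip("/")
    return {
        signal: base + path
        for signal, path in OTLP_HTTP_PATHS.items()
        if otel.get(signal)
    }


otel = {
    "enabled": True,
    "endpoint": "http://otel-collector:4318",
    "protocol": "http/protobuf",
    "traces": True,
    "metrics": True,
    "logs": False,
    "sampleRate": 0.2,
}
print(check_otel_config(otel))
# prints []
print(signal_urls(otel))
```

The URLs are useful for testing collector reachability separately from the gateway, for example with an HTTP client from the gateway host.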