mirror of
https://github.com/openclaw/openclaw.git
synced 2026-03-12 07:20:45 +00:00
docs(logging): define diagnostics as telemetry surface
@@ -148,6 +148,37 @@ replace logs; they exist to feed metrics, traces, and other exporters.
Diagnostics events are emitted in-process, but exporters only attach when
diagnostics + the exporter plugin are enabled.

### Telemetry surface ownership

OpenClaw has separate surfaces for automation, runtime control, and telemetry:

| If you want to... | Use... | Why |
| -------------------------------------------------------------------------------------- | --------------------------------------- | ------------------------------------------------------------------ |
| Export metrics, traces, or machine-readable health signals | Diagnostic events | Observability should be append-only telemetry, not a behavior hook |
| Rewrite prompts, block tools, cancel outbound messages, or add policy/middleware | Typed plugin hooks via `api.on(...)` | Runtime hooks can mutate or block behavior |
| Trigger coarse operator automation such as file writes, notifications, or side effects | HOOK.md hooks / `api.registerHook(...)` | Internal hooks are for operator automation, not telemetry schemas |

Future OTEL work should extend `src/infra/diagnostic-events.ts`, then map those
events in the `diagnostics-otel` plugin. Do not add telemetry-only proposals by
growing the hook APIs.
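
The "extend the event catalog, then map it in the exporter" flow might look roughly like the sketch below. Everything in it is an assumption made for illustration: the event field names, the `toMetrics` helper, and the `example.*` metric names are hypothetical and are not taken from `src/infra/diagnostic-events.ts` or the `diagnostics-otel` plugin. Only the event type strings `model.usage` and `tool.loop` come from these docs.

```typescript
// Hypothetical event shapes; the real catalog lives in src/infra/diagnostic-events.ts.
type ModelUsageEvent = { type: "model.usage"; inputTokens: number; outputTokens: number };
type ToolLoopEvent = { type: "tool.loop"; toolName: string; action: "warn" | "block" };
type DiagnosticEvent = ModelUsageEvent | ToolLoopEvent;

// Hypothetical exporter-side representation of one metric data point.
type MetricPoint = { name: string; value: number; attributes: Record<string, string> };

// Exporter-side mapping: a pure function from one event to zero or more metric points.
// It only reads the event, so it cannot block or rewrite runtime behavior.
function toMetrics(event: DiagnosticEvent): MetricPoint[] {
  switch (event.type) {
    case "model.usage":
      return [
        { name: "example.model.tokens", value: event.inputTokens, attributes: { direction: "input" } },
        { name: "example.model.tokens", value: event.outputTokens, attributes: { direction: "output" } },
      ];
    case "tool.loop":
      // toolName is deliberately left out of the attributes to keep cardinality low.
      return [{ name: "example.tool.loop", value: 1, attributes: { action: event.action } }];
    default: {
      // Exhaustiveness check: a new event type without a mapping fails to compile here.
      const unhandled: never = event;
      return unhandled;
    }
  }
}
```

With this shape, a new signal means adding a member to the event union and a `case` in the mapping, and no hook API has to change.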

### What diagnostic events are for

Diagnostic events are the observability contract between the gateway runtime and
telemetry consumers such as the `diagnostics-otel` plugin.

Diagnostic events should be:

- append-only signals for exporters, dashboards, alerts, and troubleshooting
- safe to ignore without affecting runtime behavior
- stable enough that exporters can map them into metrics, traces, or logs

Diagnostic events should not be used for:

- blocking, vetoing, or rewriting runtime behavior
- policy enforcement or middleware ordering
- side-effect automation that must run for the system to behave correctly
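
The "append-only" and "safe to ignore" properties above can be sketched as an emitter whose listeners have no way to influence the caller. This is a minimal illustration, not the OpenClaw implementation: `SketchEvent`, `subscribe`, and `emit` are hypothetical names, and the event fields are assumed.

```typescript
// Hypothetical minimal event type; not the real OpenClaw definition.
type SketchEvent = { type: string; ts: number; [key: string]: unknown };
type Listener = (event: Readonly<SketchEvent>) => void;

const listeners = new Set<Listener>();

// Register a telemetry consumer. Returns an unsubscribe function.
function subscribe(listener: Listener): () => void {
  listeners.add(listener);
  return () => {
    listeners.delete(listener);
  };
}

// Fire-and-forget emit: listeners receive a frozen shallow copy, their return
// values are ignored, and their exceptions are swallowed, so a consumer can
// neither veto, rewrite, nor break the code path that emitted the event.
function emit(event: SketchEvent): void {
  const frozen = Object.freeze({ ...event });
  listeners.forEach((listener) => {
    try {
      listener(frozen);
    } catch (_err) {
      // A failing exporter must never change runtime behavior.
    }
  });
}
```

Anything that needs to block or rewrite behavior does not fit this shape, which is why it belongs on the hook surfaces instead.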

### OpenTelemetry vs OTLP

- **OpenTelemetry (OTel)**: the data model + SDKs for traces, metrics, and logs.
@@ -184,6 +215,28 @@ Queue + session:
- `run.attempt`: run retry/attempt metadata.
- `diagnostic.heartbeat`: aggregate counters (webhooks/queue/session).

Tool safety:

- `tool.loop`: repeated-tool-loop warning/block telemetry emitted by the runtime.

### What is still missing

The current event catalog is useful, but still coarse in a few places. New
observability work should generally extend `src/infra/diagnostic-events.ts`
instead of asking hooks to carry telemetry-only meaning.

Priority gaps for future telemetry work:

- Run lifecycle: explicit run start, run end, and run error boundaries.
- Model lifecycle: request/response/error boundaries in addition to aggregate
  `model.usage`.
- Tool lifecycle: tool call start/end/error boundaries, plus first-class exporter
  mapping for existing `tool.loop` events.
- Outbound delivery lifecycle: delivery attempted/sent/failed boundaries across
  channels, separate from message processing.
- Attribute hygiene: clearer redaction and cardinality guidance for exporter-safe
  fields.
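
For the attribute hygiene gap, one possible shape for exporter-safe fields is an allowlist plus length and cardinality limits. The sketch below is only an illustration of that idea: the allowed keys, the limits, the `"other"` bucket, and the `sanitizeAttributes` helper are all hypothetical, and no such guidance exists in OpenClaw yet.

```typescript
// Hypothetical allowlist and limits; real guidance for OpenClaw is still an open gap.
const ALLOWED_KEYS = new Set(["channel", "provider", "model", "outcome"]);
const MAX_VALUE_LENGTH = 64;
const MAX_DISTINCT_VALUES_PER_KEY = 50;

// Distinct values observed so far, per attribute key.
const seenValues = new Map<string, Set<string>>();

function sanitizeAttributes(raw: Record<string, unknown>): Record<string, string> {
  const safe: Record<string, string> = {};
  for (const [key, value] of Object.entries(raw)) {
    // Redaction by allowlist: unknown keys (prompts, message bodies, ids) are dropped.
    if (!ALLOWED_KEYS.has(key)) continue;
    // Only primitive values are exported; nested objects are dropped.
    if (typeof value !== "string" && typeof value !== "number" && typeof value !== "boolean") continue;
    const text = String(value).slice(0, MAX_VALUE_LENGTH);

    // Cardinality cap: once a key has too many distinct values, bucket new ones.
    let seen = seenValues.get(key);
    if (!seen) {
      seen = new Set<string>();
      seenValues.set(key, seen);
    }
    if (!seen.has(text) && seen.size >= MAX_DISTINCT_VALUES_PER_KEY) {
      safe[key] = "other";
      continue;
    }
    seen.add(text);
    safe[key] = text;
  }
  return safe;
}
```

An allowlist fails closed: a field added to an event later stays out of exported telemetry until someone decides it is safe.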

### Enable diagnostics (no exporter)

Use this if you want diagnostics events available to plugins or custom sinks: