Improve gateway diagnostics export for support reports (#70324)

Merged via squash.

Prepared head SHA: 3d6ee85993
Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com>
Reviewed-by: @gumadeiras
Gustavo Madeira Santana
2026-04-22 20:47:14 -04:00
committed by GitHub
parent 6b41ef311f
commit 28818f9140
54 changed files with 5385 additions and 56 deletions


@@ -25,6 +25,7 @@ Docs: https://docs.openclaw.ai
- TUI: add local embedded mode for running terminal chats without a Gateway while keeping plugin approval gates enforced. (#66767) Thanks @fuller-stack-dev.
- CLI/Claude: default `claude-cli` runs to warm stdio sessions, including custom configs that omit transport fields, and resume from the stored Claude session after Gateway restarts or idle exits. (#69679) Thanks @obviyus.
- Control UI/settings+chat: add a browser-local personal identity for the operator (name plus local-safe avatar), route user identity rendering through the shared chat/avatar path used by assistant and agent surfaces, and tighten Quick Settings, agent fallback chips, and narrow-screen chat layouts so personalization no longer wastes space or clips controls. (#70362) Thanks @BunsDev.
- Gateway/diagnostics: enable payload-free stability recording by default and add a support-ready diagnostics export with sanitized logs, status, health, config, and stability snapshots for bug reports. (#70324) Thanks @gumadeiras.
### Fixes


@@ -1,4 +1,4 @@
- 81a8a7de5d4bf02cf3e697a641fe89844f98ed58d47890f12800181fde5a97b1 config-baseline.json
- dab963eda8866b8bffd5c9032f92f0f6b08ed54dda837f1f5c513fca5d2c78e9 config-baseline.core.json
+ b05357fa162ba1f1d4ed192671b758d3905602678ff61148568840c6544d6222 config-baseline.json
+ a4e167f169db58d71c385a31fa2b980772f9fee963e70dd9553f63536cae5aed config-baseline.core.json
  35d132fe176bd2bf9f0e46b29de91baba63ec4db3317cc5b294a982b46d16ba9 config-baseline.channel.json
- 71b5ff17041bc48a62300ad9f44fa8bb14d9dcd7f4c3549c0576d3059ce6ff36 config-baseline.plugin.json
+ 3703c5345288adb9eee8cda3b592147cf4fed25a7782bed21ca83c88c3ca1cc0 config-baseline.plugin.json


@@ -111,6 +111,59 @@ Options:
- `--days <days>`: number of days to include (default `30`).
### `gateway stability`
Fetch the recent diagnostic stability recorder from a running Gateway.
```bash
openclaw gateway stability
openclaw gateway stability --type payload.large
openclaw gateway stability --bundle latest
openclaw gateway stability --bundle latest --export
openclaw gateway stability --json
```
Options:
- `--limit <limit>`: maximum number of recent events to include (default `25`, max `1000`).
- `--type <type>`: filter by diagnostic event type, such as `payload.large` or `diagnostic.memory.pressure`.
- `--since-seq <seq>`: include only events after a diagnostic sequence number.
- `--bundle [path]`: read a persisted stability bundle instead of calling the running Gateway. Use `--bundle latest` (or just `--bundle`) for the newest bundle under the state directory, or pass a bundle JSON path directly.
- `--export`: write a shareable support diagnostics zip instead of printing stability details.
- `--output <path>`: output path for `--export`.
Notes:
- The recorder is active by default. Set `diagnostics.enabled: false` only when you need to disable Gateway diagnostic heartbeat collection.
- Records keep operational metadata: event names, counts, byte sizes, memory readings, queue/session state, channel/plugin names, and redacted session summaries. They do not keep chat text, webhook bodies, tool outputs, raw request or response bodies, tokens, cookies, secret values, hostnames, or raw session ids.
- On fatal Gateway exits, shutdown timeouts, and restart startup failures, OpenClaw writes the same diagnostic snapshot to `~/.openclaw/logs/stability/openclaw-stability-*.json` when the recorder has events. Inspect the newest bundle with `openclaw gateway stability --bundle latest`; `--limit`, `--type`, and `--since-seq` also apply to bundle output.
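The way `--limit`, `--type`, and `--since-seq` narrow the output can be pictured with a small sketch. Everything here is an assumption for illustration: the `selectEvents` helper, the `StabilityEvent` shape (only `seq` and `type` are taken from the text above), and the choice to filter first and then keep the most recent `limit` events. The real selection logic may differ.

```typescript
// Hypothetical event shape: only `seq` and `type` come from the docs above.
interface StabilityEvent {
  seq: number;
  type: string;
}

// Hypothetical query shape mirroring the CLI flags.
interface StabilityQuery {
  limit: number; // --limit (documented default 25, max 1000)
  type?: string; // --type
  sinceSeq?: number; // --since-seq
}

// Assumed semantics: filter by type and sequence, then keep the newest `limit` events.
function selectEvents(events: StabilityEvent[], query: StabilityQuery): StabilityEvent[] {
  const limit = Math.min(Math.max(1, Math.floor(query.limit)), 1000);
  const matching = events.filter(
    (event) =>
      (query.type === undefined || event.type === query.type) &&
      (query.sinceSeq === undefined || event.seq > query.sinceSeq),
  );
  return matching.slice(-limit);
}

const sample: StabilityEvent[] = [
  { seq: 1, type: "payload.large" },
  { seq: 2, type: "diagnostic.memory.pressure" },
  { seq: 3, type: "payload.large" },
];

// Roughly what `--type payload.large --since-seq 1` would select from `sample`.
const filtered = selectEvents(sample, { limit: 25, type: "payload.large", sinceSeq: 1 });
console.log(filtered);
```

Under these assumptions only the event with `seq: 3` survives, because it matches the type filter and comes after sequence number 1.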
### `gateway diagnostics export`
Write a local diagnostics zip that is designed to be attached to bug reports.
```bash
openclaw gateway diagnostics export
openclaw gateway diagnostics export --output openclaw-diagnostics.zip
openclaw gateway diagnostics export --json
```
Options:
- `--output <path>`: output zip path. Defaults to a support export under the state directory.
- `--log-lines <count>`: maximum sanitized log lines to include (default `5000`).
- `--log-bytes <bytes>`: maximum log bytes to inspect (default `1000000`).
- `--url <url>`: Gateway WebSocket URL for the health snapshot.
- `--token <token>`: Gateway token for the health snapshot.
- `--password <password>`: Gateway password for the health snapshot.
- `--timeout <ms>`: status/health snapshot timeout (default `3000`).
- `--no-stability-bundle`: skip persisted stability bundle lookup.
- `--json`: print the written path, size, and manifest as JSON.
The export contains a manifest, a Markdown summary, config shape, sanitized config details, sanitized log summaries, sanitized Gateway status/health snapshots, and the newest stability bundle when one exists.
It is meant to be shared. It keeps operational details that help debugging, such as safe OpenClaw log fields, subsystem names, status codes, durations, configured modes, ports, plugin ids, provider ids, non-secret feature settings, and redacted operational log messages. It omits or redacts chat text, webhook bodies, tool outputs, credentials, cookies, account/message identifiers, prompt/instruction text, hostnames, and secret values. When a LogTape-style message looks like user/chat/tool payload text, the export keeps only that a message was omitted plus its byte count.
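The "omitted plus byte count" rule for payload-like log messages could look roughly like the sketch below. The `looksLikePayload` heuristic, the `SanitizedLogMessage` shape, and the `sanitizeLogMessage` name are all hypothetical; the source does not describe how the export decides that a message looks like user, chat, or tool text.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical output shape for one sanitized log message.
interface SanitizedLogMessage {
  omitted: boolean;
  bytes: number;
  text?: string;
}

// Placeholder heuristic (an assumption, not the exporter's real rule):
// treat long or multi-line messages as likely payload text.
function looksLikePayload(message: string): boolean {
  return message.length > 200 || message.includes("\n");
}

// Keep short operational messages; replace payload-like ones with a byte count only.
function sanitizeLogMessage(message: string): SanitizedLogMessage {
  const bytes = Buffer.byteLength(message, "utf8");
  if (looksLikePayload(message)) {
    return { omitted: true, bytes };
  }
  return { omitted: false, bytes, text: message };
}

const kept = sanitizeLogMessage("gateway started");
const dropped = sanitizeLogMessage("héllo\nworld");
console.log(kept, dropped);
```

The byte count is measured in UTF-8 bytes rather than characters, so the second sample reports 12 bytes for 11 characters.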
### `gateway status`
`gateway status` shows the Gateway service (launchd/systemd/schtasks) plus an optional probe of connectivity/auth capability.


@@ -26,6 +26,8 @@ Short guide to verify channel connectivity without guessing.
- Creds on disk: `ls -l ~/.openclaw/credentials/whatsapp/<accountId>/creds.json` (mtime should be recent).
- Session store: `ls -l ~/.openclaw/agents/<agentId>/sessions/sessions.json` (path can be overridden in config). Count and recent recipients are surfaced via `status`.
- Relink flow: `openclaw channels logout && openclaw channels login --verbose` when status codes 409–515 or `loggedOut` appear in logs. (Note: the QR login flow auto-restarts once for status 515 after pairing.)
- Diagnostics are enabled by default. The gateway records operational facts unless `diagnostics.enabled: false` is set. Memory events record RSS/heap byte counts, threshold pressure, and growth pressure. Oversized-payload events record what was rejected, truncated, or chunked, plus sizes and limits when available. They do not record the message text, attachment contents, webhook body, raw request or response body, tokens, cookies, or secret values. The same heartbeat starts the bounded stability recorder, which is available through `openclaw gateway stability` or the `diagnostics.stability` Gateway RPC. Fatal Gateway exits, shutdown timeouts, and restart startup failures persist the latest recorder snapshot under `~/.openclaw/logs/stability/` when events exist; inspect the newest saved bundle with `openclaw gateway stability --bundle latest`.
- For bug reports, run `openclaw gateway diagnostics export` and attach the generated zip. The export combines a Markdown summary, the newest stability bundle, sanitized log metadata, sanitized Gateway status/health snapshots, and config shape. It is meant to be shared: chat text, webhook bodies, tool outputs, credentials, cookies, account/message identifiers, and secret values are omitted or redacted.
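To make "the newest saved bundle" concrete, here is a minimal sketch of picking the latest file from a stability directory. It assumes bundles are named `openclaw-stability-*.json` (as stated above) and that "newest" means the most recent modification time; the `findLatestBundle` helper is hypothetical and the actual lookup may use a different ordering.

```typescript
import fs from "node:fs";
import os from "node:os";
import path from "node:path";

// Hypothetical helper: return the most recently modified bundle path, or null if none exist.
function findLatestBundle(dir: string): string | null {
  let names: string[];
  try {
    names = fs.readdirSync(dir);
  } catch {
    return null; // directory missing or unreadable
  }
  const candidates = names
    .filter((name) => name.startsWith("openclaw-stability-") && name.endsWith(".json"))
    .map((name) => {
      const full = path.join(dir, name);
      return { full, mtimeMs: fs.statSync(full).mtimeMs };
    })
    .sort((a, b) => b.mtimeMs - a.mtimeMs || b.full.localeCompare(a.full));
  return candidates[0]?.full ?? null;
}

// Demo in a throwaway directory with two fake bundles and explicit mtimes.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "stability-demo-"));
const older = path.join(demoDir, "openclaw-stability-a.json");
const newer = path.join(demoDir, "openclaw-stability-b.json");
fs.writeFileSync(older, "{}\n", "utf8");
fs.writeFileSync(newer, "{}\n", "utf8");
fs.utimesSync(older, new Date("2026-04-22T12:00:00Z"), new Date("2026-04-22T12:00:00Z"));
fs.utimesSync(newer, new Date("2026-04-22T13:00:00Z"), new Date("2026-04-22T13:00:00Z"));

const latest = findLatestBundle(demoDir);
const missing = findLatestBundle(path.join(demoDir, "does-not-exist"));
console.log(latest, missing);
fs.rmSync(demoDir, { recursive: true, force: true });
```

Returning `null` for a missing directory keeps the caller free to print a "no bundles found" message instead of failing.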
## Health monitor config


@@ -18,6 +18,13 @@ handshake time.
- WebSocket, text frames with JSON payloads.
- First frame **must** be a `connect` request.
- Pre-connect frames are capped at 64 KiB. After a successful handshake, clients
should follow the `hello-ok.policy.maxPayload` and
`hello-ok.policy.maxBufferedBytes` limits. With diagnostics enabled,
oversized inbound frames and slow outbound buffers emit `payload.large` events
before the gateway closes or drops the affected frame. These events keep
sizes, limits, surfaces, and safe reason codes. They do not keep the message
body, attachment contents, raw frame body, tokens, cookies, or secret values.
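As a rough illustration of the rule above, the sketch below checks a frame size against a limit and builds a payload-free `payload.large` record. The field names (`type`, `surface`, `action`, `bytes`, `limitBytes`, `reason`) follow the event fields that appear in the test fixture and CLI formatter in this commit; the `checkInboundFrame` helper, the `gateway.ws.preconnect` surface name, and the `frame_too_large` reason code are assumptions made for the example.

```typescript
// 64 KiB pre-connect cap, as stated in the protocol notes above.
const PRE_CONNECT_MAX_BYTES = 64 * 1024;

// Payload-free event: sizes and codes only, never the frame body.
interface PayloadLargeEvent {
  type: "payload.large";
  surface: string;
  action: "rejected";
  bytes: number;
  limitBytes: number;
  reason: string;
}

// Hypothetical check: returns an event when the frame exceeds the limit, else null.
// Assumes a frame of exactly `limitBytes` is still accepted.
function checkInboundFrame(
  frameBytes: number,
  limitBytes: number,
  surface: string,
): PayloadLargeEvent | null {
  if (frameBytes <= limitBytes) {
    return null;
  }
  return {
    type: "payload.large",
    surface,
    action: "rejected",
    bytes: frameBytes,
    limitBytes,
    reason: "frame_too_large", // hypothetical safe reason code
  };
}

const accepted = checkInboundFrame(PRE_CONNECT_MAX_BYTES, PRE_CONNECT_MAX_BYTES, "gateway.ws.preconnect");
const rejected = checkInboundFrame(PRE_CONNECT_MAX_BYTES + 1, PRE_CONNECT_MAX_BYTES, "gateway.ws.preconnect");
console.log(accepted, rejected);
```

The function takes only a byte count, so the frame contents never reach the event record.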
## Handshake (connect)
@@ -265,6 +272,12 @@ implemented in `src/gateway/server-methods/*.ts`.
### System and identity
- `health` returns the cached or freshly probed gateway health snapshot.
- `diagnostics.stability` returns the recent bounded diagnostic stability
recorder. It keeps operational metadata such as event names, counts, byte
sizes, memory readings, queue/session state, channel/plugin names, and session
ids. It does not keep chat text, webhook bodies, tool outputs, raw request or
response bodies, tokens, cookies, or secret values. Operator read scope is
required.
- `status` returns the `/status`-style gateway summary; sensitive fields are
included only for admin-scoped operator clients.
- `gateway.identity.get` returns the gateway device identity used by relay and


@@ -329,6 +329,20 @@ Think of the suites as “increasing realism” (and increasing flakiness/cost):
- `pnpm test:perf:profile:main` writes a main-thread CPU profile for Vitest/Vite startup and transform overhead.
- `pnpm test:perf:profile:runner` writes runner CPU+heap profiles for the unit suite with file parallelism disabled.
### Stability (gateway)
- Command: `pnpm test:stability:gateway`
- Config: `vitest.gateway.config.ts`, forced to one worker
- Scope:
- Starts a real loopback Gateway with diagnostics enabled by default
- Drives synthetic gateway message, memory, and large-payload churn through the diagnostic event path
- Queries `diagnostics.stability` over the Gateway WS RPC
- Covers diagnostic stability bundle persistence helpers
- Asserts the recorder remains bounded, synthetic RSS samples stay under the pressure budget, and per-session queue depths drain back to zero
- Expectations:
- CI-safe and keyless
- Narrow lane for stability-regression follow-up, not a substitute for the full Gateway suite
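The "recorder remains bounded" assertion can be pictured with a small fixed-capacity recorder. This is a sketch under stated assumptions, not the recorder in this commit: the `BoundedRecorder` class is hypothetical, and only the snapshot field names (`capacity`, `count`, `dropped`, `firstSeq`, `lastSeq`) are borrowed from the bundle fixture used in the CLI tests.

```typescript
interface RecordedEvent {
  seq: number;
  type: string;
}

// Hypothetical fixed-capacity recorder: oldest events are dropped once capacity is reached.
class BoundedRecorder {
  private readonly capacity: number;
  private events: RecordedEvent[] = [];
  private nextSeq = 1;
  private dropped = 0;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  record(type: string): void {
    this.events.push({ seq: this.nextSeq, type });
    this.nextSeq += 1;
    if (this.events.length > this.capacity) {
      this.events.shift(); // evict the oldest event
      this.dropped += 1;
    }
  }

  snapshot() {
    return {
      capacity: this.capacity,
      count: this.events.length,
      dropped: this.dropped,
      firstSeq: this.events[0]?.seq,
      lastSeq: this.events[this.events.length - 1]?.seq,
    };
  }
}

// Synthetic churn: five events into a recorder that holds three.
const recorder = new BoundedRecorder(3);
for (let i = 0; i < 5; i += 1) {
  recorder.record("payload.large");
}
const snap = recorder.snapshot();
console.log(snap);
```

A test in this style would assert that `count` never exceeds `capacity` regardless of how many synthetic events are driven through the recorder.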
### E2E (gateway smoke)
- Command: `pnpm test:e2e`


@@ -1487,6 +1487,7 @@
"test:perf:profile:runner": "node scripts/run-vitest-profile.mjs runner",
"test:sectriage": "node scripts/run-vitest.mjs run --config test/vitest/vitest.gateway.config.ts && node scripts/run-vitest.mjs run --config test/vitest/vitest.unit.config.ts --exclude src/daemon/launchd.integration.test.ts --exclude src/process/exec.test.ts",
"test:serial": "OPENCLAW_TEST_PROJECTS_SERIAL=1 OPENCLAW_VITEST_MAX_WORKERS=1 node scripts/test-projects.mjs",
"test:stability:gateway": "OPENCLAW_VITEST_MAX_WORKERS=1 node scripts/run-vitest.mjs run --config test/vitest/vitest.gateway.config.ts src/gateway/gateway-stability.test.ts && OPENCLAW_VITEST_MAX_WORKERS=1 node scripts/run-vitest.mjs run --config test/vitest/vitest.logging.config.ts src/logging/diagnostic-stability-bundle.test.ts && OPENCLAW_VITEST_MAX_WORKERS=1 node scripts/run-vitest.mjs run --config test/vitest/vitest.infra.config.ts src/infra/fatal-error-hooks.test.ts",
"test:startup:bench": "node --import tsx scripts/bench-cli-startup.ts",
"test:startup:bench:check": "node scripts/test-cli-startup-bench-budget.mjs",
"test:startup:bench:save": "node --import tsx scripts/bench-cli-startup.ts --preset all --runs 5 --warmup 1 --output .artifacts/cli-startup-bench-all.json",


@@ -1,3 +1,6 @@
import fs from "node:fs";
import os from "node:os";
import path from "node:path";
import { Command } from "commander";
import { beforeEach, describe, expect, it, vi } from "vitest";
import { withEnvOverride } from "../config/test-helpers.js";
@@ -169,6 +172,135 @@ describe("gateway-cli coverage", () => {
expect(gatewayStatusCommand).toHaveBeenCalledTimes(1);
});
it("registers gateway stability and routes to diagnostics RPC", async () => {
callGateway.mockClear();
await runGatewayCommand([
"gateway",
"stability",
"--limit",
"5",
"--type",
"payload.large",
"--json",
]);
expect(callGateway).toHaveBeenCalledWith(
expect.objectContaining({
method: "diagnostics.stability",
params: {
limit: 5,
type: "payload.large",
},
}),
);
});
it("prints the latest stability bundle without calling Gateway", async () => {
callGateway.mockClear();
const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), "openclaw-gateway-cli-bundle-"));
try {
const bundleDir = path.join(tempDir, "logs", "stability");
const bundlePath = path.join(
bundleDir,
"openclaw-stability-2026-04-22T12-00-00-000Z-123-test.json",
);
const bundle = {
version: 1,
generatedAt: "2026-04-22T12:00:00.000Z",
reason: "gateway.restart_startup_failed",
process: {
pid: 123,
platform: process.platform,
arch: process.arch,
node: process.versions.node,
uptimeMs: 2000,
},
host: { hostname: "test-host" },
snapshot: {
generatedAt: "2026-04-22T12:00:00.000Z",
capacity: 1000,
count: 1,
dropped: 0,
firstSeq: 1,
lastSeq: 1,
events: [
{
seq: 1,
ts: Date.parse("2026-04-22T12:00:00.000Z"),
type: "payload.large",
surface: "gateway.http.json",
action: "rejected",
bytes: 2048,
limitBytes: 1024,
},
],
summary: {
byType: { "payload.large": 1 },
payloadLarge: {
count: 1,
rejected: 1,
truncated: 0,
chunked: 0,
bySurface: { "gateway.http.json": 1 },
},
},
},
};
fs.mkdirSync(bundleDir, { recursive: true });
fs.writeFileSync(bundlePath, `${JSON.stringify(bundle, null, 2)}\n`, "utf8");
await withEnvOverride({ OPENCLAW_STATE_DIR: tempDir }, async () => {
await runGatewayCommand(["gateway", "stability", "--bundle", "latest"]);
});
const output = runtimeLogs.join("\n");
expect(callGateway).not.toHaveBeenCalled();
expect(output).toContain("Stability bundle");
expect(output).toContain("gateway.restart_startup_failed");
expect(output).toContain("payload.large");
expect(output).toContain("gateway.http.json");
} finally {
fs.rmSync(tempDir, { recursive: true, force: true });
}
});
it("writes gateway diagnostics export with a best-effort health snapshot", async () => {
callGateway.mockClear();
const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), "openclaw-gateway-cli-support-"));
try {
const outputPath = path.join(tempDir, "diagnostics.zip");
await withEnvOverride(
{ OPENCLAW_STATE_DIR: tempDir, OPENCLAW_TEST_FILE_LOG: undefined },
async () => {
await runGatewayCommand([
"gateway",
"diagnostics",
"export",
"--output",
outputPath,
"--json",
]);
},
);
expect(callGateway).toHaveBeenCalledTimes(1);
expect(callGateway).toHaveBeenCalledWith(
expect.objectContaining({
method: "health",
timeoutMs: 3000,
}),
);
expect(fs.existsSync(outputPath)).toBe(true);
const output = runtimeLogs.join("\n");
expect(output).toContain('"path"');
expect(output).toContain("diagnostics.zip");
expect(output).toContain('"payloadFree": true');
} finally {
fs.rmSync(tempDir, { recursive: true, force: true });
}
});
it("registers gateway discover and prints json output", async () => {
discoverGatewayBeacons.mockClear();
discoverGatewayBeacons.mockResolvedValueOnce([


@@ -1,6 +1,17 @@
import type { Command } from "commander";
import type { HealthSummary } from "../../commands/health.js";
import type { CostUsageSummary } from "../../infra/session-cost-usage.js";
import type {
DiagnosticStabilityBundle,
ReadDiagnosticStabilityBundleResult,
} from "../../logging/diagnostic-stability-bundle.js";
import {
normalizeDiagnosticStabilityQuery,
selectDiagnosticStabilitySnapshot,
type DiagnosticStabilityEventRecord,
type DiagnosticStabilitySnapshot,
} from "../../logging/diagnostic-stability.js";
import type { WriteDiagnosticSupportExportResult } from "../../logging/diagnostic-support-export.js";
import { defaultRuntime } from "../../runtime.js";
import { formatDocsLink } from "../../terminal/links.js";
import { colorize, isRich, theme } from "../../terminal/theme.js";
@@ -9,7 +20,7 @@ import { inheritOptionFromParent } from "../command-options.js";
import { addGatewayServiceCommands } from "../daemon-cli/register-service-commands.js";
import { formatHelpExamples } from "../help-format.js";
import { withProgress } from "../progress.js";
- import { callGatewayCli, gatewayCallOpts } from "./call.js";
+ import { callGatewayCli, gatewayCallOpts, type GatewayRpcOpts } from "./call.js";
import type { GatewayDiscoverOpts } from "./discover.js";
import {
dedupeBeacons,
@@ -33,6 +44,15 @@ let bonjourDiscoveryModulePromise:
let wideAreaDnsModulePromise: Promise<typeof import("../../infra/widearea-dns.js")> | undefined;
let healthStyleModulePromise: Promise<typeof import("../../terminal/health-style.js")> | undefined;
let usageFormatModulePromise: Promise<typeof import("../../utils/usage-format.js")> | undefined;
let stabilityBundleModulePromise:
| Promise<typeof import("../../logging/diagnostic-stability-bundle.js")>
| undefined;
let supportExportModulePromise:
| Promise<typeof import("../../logging/diagnostic-support-export.js")>
| undefined;
let daemonStatusGatherModulePromise:
| Promise<typeof import("../daemon-cli/status.gather.js")>
| undefined;
function loadConfigModule() {
configModulePromise ??= import("../../config/read-best-effort-config.runtime.js");
@@ -69,6 +89,21 @@ function loadUsageFormatModule() {
return usageFormatModulePromise;
}
function loadStabilityBundleModule() {
stabilityBundleModulePromise ??= import("../../logging/diagnostic-stability-bundle.js");
return stabilityBundleModulePromise;
}
function loadSupportExportModule() {
supportExportModulePromise ??= import("../../logging/diagnostic-support-export.js");
return supportExportModulePromise;
}
function loadDaemonStatusGatherModule() {
daemonStatusGatherModulePromise ??= import("../daemon-cli/status.gather.js");
return daemonStatusGatherModulePromise;
}
function runGatewayCommand(action: () => Promise<void>, label?: string) {
return runCommandWithRuntime(defaultRuntime, action, (err) => {
const message = String(err);
@@ -134,6 +169,225 @@ async function renderCostUsageSummaryAsync(
return lines;
}
function formatBytes(value: number | undefined): string {
if (value === undefined) {
return "n/a";
}
const units = ["B", "KiB", "MiB", "GiB"];
let amount = value;
let unitIndex = 0;
while (amount >= 1024 && unitIndex < units.length - 1) {
amount /= 1024;
unitIndex += 1;
}
const digits = unitIndex === 0 || amount >= 100 ? 0 : 1;
return `${amount.toFixed(digits)} ${units[unitIndex]}`;
}
function formatStabilityEvent(record: DiagnosticStabilityEventRecord): string {
const parts = [
new Date(record.ts).toISOString(),
`#${record.seq}`,
record.type,
record.level ? `level=${record.level}` : "",
record.action ? `action=${record.action}` : "",
record.outcome ? `outcome=${record.outcome}` : "",
record.surface ? `surface=${record.surface}` : "",
record.channel ? `channel=${record.channel}` : "",
record.pluginId ? `plugin=${record.pluginId}` : "",
record.reason ? `reason=${record.reason}` : "",
record.bytes !== undefined ? `bytes=${formatBytes(record.bytes)}` : "",
record.limitBytes !== undefined ? `limit=${formatBytes(record.limitBytes)}` : "",
record.queueDepth !== undefined ? `queueDepth=${record.queueDepth}` : "",
record.queued !== undefined ? `queued=${record.queued}` : "",
record.memory ? `rss=${formatBytes(record.memory.rssBytes)}` : "",
record.memory ? `heap=${formatBytes(record.memory.heapUsedBytes)}` : "",
].filter(Boolean);
return parts.join(" ");
}
function renderStabilitySummary(snapshot: DiagnosticStabilitySnapshot, rich: boolean): string[] {
const lines = [
colorize(rich, theme.heading, "Gateway Stability"),
`${colorize(rich, theme.muted, "Events:")} ${snapshot.count}/${snapshot.capacity}${
snapshot.dropped > 0 ? ` · dropped=${snapshot.dropped}` : ""
}`,
];
const topTypes = Object.entries(snapshot.summary.byType)
.toSorted((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
.slice(0, 8)
.map(([type, count]) => `${type}=${count}`)
.join(", ");
if (topTypes) {
lines.push(`${colorize(rich, theme.muted, "Types:")} ${topTypes}`);
}
const memory = snapshot.summary.memory;
if (memory) {
lines.push(
`${colorize(rich, theme.muted, "Memory:")} rss=${formatBytes(
memory.latest?.rssBytes,
)} heap=${formatBytes(memory.latest?.heapUsedBytes)} maxRss=${formatBytes(
memory.maxRssBytes,
)} pressure=${memory.pressureCount}`,
);
}
const payloadLarge = snapshot.summary.payloadLarge;
if (payloadLarge) {
const surfaces = Object.entries(payloadLarge.bySurface)
.toSorted((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
.map(([surface, count]) => `${surface}=${count}`)
.join(", ");
lines.push(
`${colorize(rich, theme.muted, "Large payloads:")} total=${payloadLarge.count} rejected=${
payloadLarge.rejected
} truncated=${payloadLarge.truncated} chunked=${payloadLarge.chunked}${
surfaces ? ` · ${surfaces}` : ""
}`,
);
}
if (snapshot.events.length > 0) {
lines.push(colorize(rich, theme.muted, "Recent:"));
for (const event of snapshot.events) {
lines.push(` ${formatStabilityEvent(event)}`);
}
}
return lines;
}
function normalizeStabilityBundleTarget(raw: unknown): string | null {
if (raw === undefined || raw === false) {
return null;
}
if (raw === true) {
return "latest";
}
if (typeof raw !== "string") {
return "latest";
}
const value = raw.trim();
return value === "" ? "latest" : value;
}
function formatBundleError(result: ReadDiagnosticStabilityBundleResult): string {
if (result.status === "missing") {
return `No stability bundles found in ${result.dir}`;
}
if (result.status === "failed") {
return result.error instanceof Error ? result.error.message : String(result.error);
}
return "Unexpected stability bundle read result";
}
async function readStabilityBundleTarget(
bundleTarget: string,
): Promise<ReadDiagnosticStabilityBundleResult> {
const { readDiagnosticStabilityBundleFileSync, readLatestDiagnosticStabilityBundleSync } =
await loadStabilityBundleModule();
return bundleTarget === "latest"
? readLatestDiagnosticStabilityBundleSync()
: readDiagnosticStabilityBundleFileSync(bundleTarget);
}
function renderStabilityBundleSummary(params: {
bundle: DiagnosticStabilityBundle;
path: string;
snapshot: DiagnosticStabilitySnapshot;
rich: boolean;
}): string[] {
const { bundle, path, rich, snapshot } = params;
const processDetails = [
`pid=${bundle.process.pid}`,
`node=${bundle.process.node}`,
`${bundle.process.platform}/${bundle.process.arch}`,
`uptime=${Math.round(bundle.process.uptimeMs / 1000)}s`,
].join(" ");
const lines = [
colorize(rich, theme.heading, "Stability bundle"),
`${colorize(rich, theme.muted, "Path:")} ${path}`,
`${colorize(rich, theme.muted, "Generated:")} ${bundle.generatedAt}`,
`${colorize(rich, theme.muted, "Reason:")} ${bundle.reason}`,
`${colorize(rich, theme.muted, "Process:")} ${processDetails}`,
`${colorize(rich, theme.muted, "Host:")} ${bundle.host.hostname}`,
];
if (bundle.error) {
const errorParts = [
bundle.error.name ? `name=${bundle.error.name}` : "",
bundle.error.code ? `code=${bundle.error.code}` : "",
].filter(Boolean);
if (errorParts.length > 0) {
lines.push(`${colorize(rich, theme.muted, "Error:")} ${errorParts.join(" ")}`);
}
}
lines.push("", ...renderStabilitySummary(snapshot, rich));
return lines;
}
function renderSupportExportResult(
result: WriteDiagnosticSupportExportResult,
rich: boolean,
): string[] {
return [
colorize(rich, theme.heading, "Diagnostics export"),
`${colorize(rich, theme.muted, "Path:")} ${result.path}`,
`${colorize(rich, theme.muted, "Size:")} ${formatBytes(result.bytes)}`,
`${colorize(rich, theme.muted, "Files:")} ${result.manifest.contents.length}`,
`${colorize(rich, theme.muted, "Privacy:")} payload-free stability, sanitized logs/status/health/config`,
];
}
function resolveSupportExportRpcOptions(
rpc?: Pick<GatewayRpcOpts, "url" | "token" | "password" | "timeout">,
): GatewayRpcOpts {
return {
url: rpc?.url,
token: rpc?.token,
password: rpc?.password,
timeout: rpc?.timeout ?? "3000",
json: true,
};
}
async function writeSupportExportFromCli(opts: {
json?: boolean;
output?: string;
logLines?: string;
logBytes?: string;
stabilityBundle?: string | false;
rpc?: Pick<GatewayRpcOpts, "url" | "token" | "password" | "timeout">;
}): Promise<void> {
const { writeDiagnosticSupportExport } = await loadSupportExportModule();
const rpc = resolveSupportExportRpcOptions(opts.rpc);
const result = await writeDiagnosticSupportExport({
outputPath: opts.output,
logLimit: opts.logLines ? Number(opts.logLines) : undefined,
logMaxBytes: opts.logBytes ? Number(opts.logBytes) : undefined,
stabilityBundle: opts.stabilityBundle,
readStatusSnapshot: async () => {
const { gatherDaemonStatus } = await loadDaemonStatusGatherModule();
return await gatherDaemonStatus({
rpc,
probe: true,
requireRpc: false,
deep: false,
});
},
readHealthSnapshot: async () => await callGatewayCli("health", rpc),
});
if (opts.json) {
defaultRuntime.writeJson(result);
return;
}
const rich = isRich();
for (const line of renderSupportExportResult(result, rich)) {
defaultRuntime.log(line);
}
}
export function registerGatewayCli(program: Command) {
const gateway = addGatewayRunCommand(
program
@@ -146,6 +400,7 @@ export function registerGatewayCli(program: Command) {
["openclaw gateway run", "Run the gateway in the foreground."],
["openclaw gateway status", "Show service status plus connectivity/capability."],
["openclaw gateway discover", "Find local and wide-area gateway beacons."],
["openclaw gateway stability", "Show recent stability diagnostics."],
["openclaw gateway call health", "Call a gateway RPC method directly."],
])}\n\n${theme.muted("Docs:")} ${formatDocsLink("/cli/gateway", "docs.openclaw.ai/cli/gateway")}\n`,
),
@@ -238,6 +493,115 @@ export function registerGatewayCli(program: Command) {
}),
);
gatewayCallOpts(
gateway
.command("stability")
.description("Fetch payload-free Gateway stability diagnostics")
.option("--limit <limit>", "Maximum number of recent events", "25")
.option("--type <type>", "Filter by diagnostic event type")
.option("--since-seq <seq>", "Only include events after this sequence")
.option(
"--bundle [path]",
'Read a persisted stability bundle instead of calling Gateway; pass "latest" for newest',
)
.option("--export", "Write a shareable support diagnostics export", false)
.option("--output <path>", "Diagnostics export output .zip path")
.action(async (opts, command) => {
await runGatewayCommand(async () => {
const rpcOpts = resolveGatewayRpcOptions(opts, command);
const query = normalizeDiagnosticStabilityQuery(
{
limit: opts.limit,
sinceSeq: opts.sinceSeq,
type: opts.type,
},
{ defaultLimit: 25 },
);
const bundleTarget = normalizeStabilityBundleTarget(opts.bundle);
if (opts.export) {
await writeSupportExportFromCli({
json: rpcOpts.json,
output: opts.output,
stabilityBundle: bundleTarget ?? "latest",
rpc: rpcOpts,
});
return;
}
if (bundleTarget) {
const result = await readStabilityBundleTarget(bundleTarget);
if (result.status !== "found") {
throw new Error(formatBundleError(result));
}
const snapshot = selectDiagnosticStabilitySnapshot(result.bundle.snapshot, query);
if (rpcOpts.json) {
defaultRuntime.writeJson({
path: result.path,
mtimeMs: result.mtimeMs,
bundle: {
...result.bundle,
snapshot,
},
});
return;
}
const rich = isRich();
for (const line of renderStabilityBundleSummary({
bundle: result.bundle,
path: result.path,
rich,
snapshot,
})) {
defaultRuntime.log(line);
}
return;
}
const result = await callGatewayCli("diagnostics.stability", rpcOpts, {
limit: query.limit,
...(query.type ? { type: query.type } : {}),
...(query.sinceSeq !== undefined ? { sinceSeq: query.sinceSeq } : {}),
});
if (rpcOpts.json) {
defaultRuntime.writeJson(result);
return;
}
const rich = isRich();
for (const line of renderStabilitySummary(result as DiagnosticStabilitySnapshot, rich)) {
defaultRuntime.log(line);
}
}, "Gateway stability failed");
}),
);
const diagnostics = gateway
.command("diagnostics")
.description("Export local support diagnostics");
diagnostics
.command("export")
.description("Write a shareable, payload-free diagnostics .zip")
.option("--output <path>", "Output .zip path")
.option("--log-lines <count>", "Maximum sanitized log lines to include", "5000")
.option("--log-bytes <bytes>", "Maximum log bytes to inspect", "1000000")
.option("--url <url>", "Gateway WebSocket URL for health snapshot")
.option("--token <token>", "Gateway token for health snapshot")
.option("--password <password>", "Gateway password for health snapshot")
.option("--timeout <ms>", "Status/health snapshot timeout in ms", "3000")
.option("--no-stability-bundle", "Skip persisted stability bundle lookup")
.option("--json", "Output JSON", false)
.action(async (opts, command) => {
await runGatewayCommand(async () => {
const rpcOpts = resolveGatewayRpcOptions(opts, command);
await writeSupportExportFromCli({
json: opts.json,
output: opts.output,
logLines: opts.logLines,
logBytes: opts.logBytes,
stabilityBundle: opts.stabilityBundle === false ? false : "latest",
rpc: rpcOpts,
});
}, "Gateway diagnostics export failed");
});
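For reviewers, the two small helpers added earlier in this file are easiest to check with concrete inputs. The snippet below restates `formatBytes` and `normalizeStabilityBundleTarget` from the diff so it runs on its own, then calls them with sample values. The sample inputs are illustrative; the note that a bare `--bundle` arrives as `true` reflects how Commander handles an option with an optional argument.

```typescript
// Restated from the diff above so this snippet is self-contained.
function formatBytes(value: number | undefined): string {
  if (value === undefined) {
    return "n/a";
  }
  const units = ["B", "KiB", "MiB", "GiB"];
  let amount = value;
  let unitIndex = 0;
  while (amount >= 1024 && unitIndex < units.length - 1) {
    amount /= 1024;
    unitIndex += 1;
  }
  // Whole numbers for plain bytes and for values of 100 or more; one decimal otherwise.
  const digits = unitIndex === 0 || amount >= 100 ? 0 : 1;
  return `${amount.toFixed(digits)} ${units[unitIndex]}`;
}

// Restated from the diff above: maps the raw `--bundle [path]` value to a target.
function normalizeStabilityBundleTarget(raw: unknown): string | null {
  if (raw === undefined || raw === false) {
    return null; // flag not passed: call the running Gateway instead
  }
  if (raw === true) {
    return "latest"; // bare `--bundle`
  }
  if (typeof raw !== "string") {
    return "latest";
  }
  const value = raw.trim();
  return value === "" ? "latest" : value;
}

console.log(formatBytes(512)); // 512 B
console.log(formatBytes(2048)); // 2.0 KiB
console.log(formatBytes(1_000_000)); // 977 KiB
console.log(formatBytes(undefined)); // n/a

console.log(normalizeStabilityBundleTarget(undefined)); // null
console.log(normalizeStabilityBundleTarget(true)); // latest
console.log(normalizeStabilityBundleTarget(" /tmp/bundle.json ")); // /tmp/bundle.json
```

Note that `--export` on `gateway stability` falls back to `"latest"` when no bundle target was given, whereas `gateway diagnostics export` only skips the bundle when `--no-stability-bundle` is passed.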
gateway
.command("probe")
.description(


@@ -15,6 +15,7 @@ import {
scheduleGatewaySigusr1Restart,
} from "../../infra/restart.js";
import { detectRespawnSupervisor } from "../../infra/supervisor-markers.js";
import { writeDiagnosticStabilityBundleForFailureSync } from "../../logging/diagnostic-stability-bundle.js";
import { createSubsystemLogger } from "../../logging/subsystem.js";
import {
getActiveTaskCount,
@@ -53,6 +54,12 @@ export async function runGatewayLoop(params: {
cleanupSignals();
params.runtime.exit(code);
};
const writeStabilityBundle = (reason: string, error?: unknown) => {
const result = writeDiagnosticStabilityBundleForFailureSync(reason, error);
if ("message" in result) {
gatewayLog.warn(result.message);
}
};
const releaseLockIfHeld = async (): Promise<boolean> => {
if (!lock) {
return false;
@@ -96,6 +103,7 @@ export async function runGatewayLoop(params: {
return;
}
if (respawn.mode === "failed") {
writeStabilityBundle("gateway.restart_respawn_failed");
gatewayLog.warn(
`full process restart failed (${respawn.detail ?? "unknown error"}); falling back to in-process restart`,
);
@@ -144,6 +152,9 @@ export async function runGatewayLoop(params: {
: SHUTDOWN_TIMEOUT_MS;
const forceExitTimer = setTimeout(() => {
gatewayLog.error("shutdown timed out; exiting without full cleanup");
writeStabilityBundle(
isRestart ? "gateway.restart_shutdown_timeout" : "gateway.stop_shutdown_timeout",
);
// Keep the in-process watchdog below the supervisor stop budget so this
// path wins before launchd/systemd escalates to a hard kill. Exit
// non-zero on any timeout so supervised installs restart cleanly.
@@ -278,6 +289,7 @@ export async function runGatewayLoop(params: {
await releaseLockIfHeld();
const errMsg = formatErrorMessage(err);
const errStack = err instanceof Error && err.stack ? `\n${err.stack}` : "";
writeStabilityBundle("gateway.restart_startup_failed", err);
gatewayLog.error(
`gateway startup failed: ${errMsg}. ` +
`Process will stay alive; fix the issue and restart.${errStack}`,


@@ -34,6 +34,11 @@ const recoverConfigFromJsonRootSuffix = vi.fn<(snapshot?: unknown) => Promise<bo
const writeRestartSentinel = vi.fn<(payload?: unknown) => Promise<string>>(
async (_payload?: unknown) => "/tmp/restart-sentinel.json",
);
const writeDiagnosticStabilityBundleForFailureSync = vi.fn((_reason: string, _error: unknown) => ({
status: "written" as const,
message: "wrote stability bundle: /tmp/openclaw-stability.json",
path: "/tmp/openclaw-stability.json",
}));
const controlUiState = vi.hoisted(() => ({
root: "/tmp/openclaw-control-ui" as string | null,
}));
@@ -110,6 +115,11 @@ vi.mock("../../logging/console.js", () => ({
setConsoleTimestampPrefix: () => undefined,
}));
vi.mock("../../logging/diagnostic-stability-bundle.js", () => ({
writeDiagnosticStabilityBundleForFailureSync: (reason: string, error: unknown) =>
writeDiagnosticStabilityBundleForFailureSync(reason, error),
}));
vi.mock("../../logging/subsystem.js", () => ({
createSubsystemLogger: () => ({
info: (message: string) => {
@@ -167,6 +177,7 @@ describe("gateway run option collisions", () => {
recoverConfigFromJsonRootSuffix.mockResolvedValue(false);
writeRestartSentinel.mockReset();
writeRestartSentinel.mockResolvedValue("/tmp/restart-sentinel.json");
writeDiagnosticStabilityBundleForFailureSync.mockClear();
startGatewayServer.mockClear();
setGatewayWsLogStyle.mockClear();
setVerbose.mockClear();
@@ -253,6 +264,19 @@ describe("gateway run option collisions", () => {
);
});
it("does not write startup failure bundles for expected gateway lock conflicts", async () => {
const err = Object.assign(new Error("gateway already running on port 18789"), {
name: "GatewayLockError",
});
startGatewayServer.mockRejectedValueOnce(err);
await expect(runGatewayCli(["gateway", "run", "--allow-unconfigured"])).rejects.toThrow(
"__exit__:0",
);
expect(writeDiagnosticStabilityBundleForFailureSync).not.toHaveBeenCalled();
});
it("blocks startup when the observed snapshot loses gateway.mode even if loadConfig still says local", async () => {
configState.cfg = {
gateway: {


@@ -33,6 +33,7 @@ import { writeRestartSentinel } from "../../infra/restart-sentinel.js";
import { cleanStaleGatewayProcessesSync } from "../../infra/restart-stale-pids.js";
import { detectRespawnSupervisor } from "../../infra/supervisor-markers.js";
import { setConsoleSubsystemFilter, setConsoleTimestampPrefix } from "../../logging/console.js";
import { writeDiagnosticStabilityBundleForFailureSync } from "../../logging/diagnostic-stability-bundle.js";
import { createSubsystemLogger } from "../../logging/subsystem.js";
import { defaultRuntime } from "../../runtime.js";
import {
@@ -347,6 +348,13 @@ function isHealthyGatewayLockError(err: unknown): boolean {
);
}
function maybeWriteGatewayStartupFailureBundle(err: unknown): void {
const result = writeDiagnosticStabilityBundleForFailureSync("gateway.startup_failed", err);
if ("message" in result) {
gatewayLog.warn(result.message);
}
}
async function runGatewayCommand(opts: GatewayRunOpts) {
const isDevProfile = normalizeOptionalLowercaseString(process.env.OPENCLAW_PROFILE) === "dev";
const devMode = Boolean(opts.dev) || isDevProfile;
@@ -697,6 +705,7 @@ async function runGatewayCommand(opts: GatewayRunOpts) {
defaultRuntime.exit(isHealthyGatewayLockError(err) ? 0 : 1);
return;
}
maybeWriteGatewayStartupFailureBundle(err);
defaultRuntime.error(`Gateway failed to start: ${String(err)}`);
defaultRuntime.exit(1);
}


@@ -210,12 +210,17 @@ export async function runCli(argv: string[] = process.argv) {
// Capture all console output into structured logs while keeping stdout/stderr behavior.
enableConsoleCapture();
const [{ buildProgram }, { installUnhandledRejectionHandler }, { restoreTerminalState }] =
await Promise.all([
import("./program.js"),
import("../infra/unhandled-rejections.js"),
import("../terminal/restore.js"),
]);
const [
{ buildProgram },
{ runFatalErrorHooks },
{ installUnhandledRejectionHandler },
{ restoreTerminalState },
] = await Promise.all([
import("./program.js"),
import("../infra/fatal-error-hooks.js"),
import("../infra/unhandled-rejections.js"),
import("../terminal/restore.js"),
]);
const program = buildProgram();
// Global error handlers to prevent silent crashes from unhandled rejections/exceptions.
@@ -224,6 +229,9 @@ export async function runCli(argv: string[] = process.argv) {
process.on("uncaughtException", (error) => {
console.error("[openclaw] Uncaught exception:", formatUncaughtError(error));
for (const message of runFatalErrorHooks({ reason: "uncaught_exception", error })) {
console.error("[openclaw]", message);
}
restoreTerminalState("uncaught exception", { resumeStdinIfPaused: false });
process.exit(1);
});


@@ -132,7 +132,7 @@ export const GENERATED_BASE_CONFIG_SCHEMA: BaseConfigSchemaResponse = {
type: "boolean",
title: "Diagnostics Enabled",
description:
"Master toggle for diagnostics instrumentation output in logs and telemetry wiring paths. Keep enabled for normal observability, and disable only in tightly constrained environments.",
"Master toggle for diagnostics instrumentation output in logs and telemetry wiring paths. Defaults to enabled; set false only in tightly constrained environments.",
},
flags: {
type: "array",
@@ -23285,7 +23285,7 @@ export const GENERATED_BASE_CONFIG_SCHEMA: BaseConfigSchemaResponse = {
},
"diagnostics.enabled": {
label: "Diagnostics Enabled",
help: "Master toggle for diagnostics instrumentation output in logs and telemetry wiring paths. Keep enabled for normal observability, and disable only in tightly constrained environments.",
help: "Master toggle for diagnostics instrumentation output in logs and telemetry wiring paths. Defaults to enabled; set false only in tightly constrained environments.",
tags: ["observability"],
},
"diagnostics.flags": {


@@ -525,7 +525,7 @@ export const FIELD_HELP: Record<string, string> = {
"diagnostics.flags":
'Enable targeted diagnostics logs by flag (e.g. ["telegram.http"]). Supports wildcards like "telegram.*" or "*".',
"diagnostics.enabled":
"Master toggle for diagnostics instrumentation output in logs and telemetry wiring paths. Keep enabled for normal observability, and disable only in tightly constrained environments.",
"Master toggle for diagnostics instrumentation output in logs and telemetry wiring paths. Defaults to enabled; set false only in tightly constrained environments.",
"diagnostics.stuckSessionWarnMs":
"Age threshold in milliseconds for emitting stuck-session warnings while a session remains in processing state. Increase for long multi-tool turns to reduce false positives; decrease for faster hang detection.",
"diagnostics.otel.enabled":


@@ -3,6 +3,11 @@ import type { IncomingMessage, ServerResponse } from "node:http";
import * as os from "node:os";
import * as path from "node:path";
import { beforeAll, beforeEach, describe, expect, it, test, vi } from "vitest";
import {
onDiagnosticEvent,
resetDiagnosticEventsForTest,
type DiagnosticEventPayload,
} from "../infra/diagnostic-events.js";
import { defaultVoiceWakeTriggers } from "../infra/voicewake.js";
import { handleControlUiHttpRequest } from "./control-ui.js";
import {
@@ -12,6 +17,7 @@ import {
import type { RequestFrame } from "./protocol/index.js";
import { createGatewayBroadcaster } from "./server-broadcast.js";
import { createChatRunRegistry } from "./server-chat.js";
import { MAX_BUFFERED_BYTES } from "./server-constants.js";
import { handleNodeInvokeResult } from "./server-methods/nodes.handlers.invoke-result.js";
import type { GatewayClient as GatewayMethodClient } from "./server-methods/types.js";
import type { GatewayRequestContext, RespondFn } from "./server-methods/types.js";
@@ -471,6 +477,42 @@ describe("gateway broadcaster", () => {
["heartbeat", 2],
]);
});
it("records a payload diagnostic when the outbound websocket buffer exceeds the limit", () => {
resetDiagnosticEventsForTest();
const events: DiagnosticEventPayload[] = [];
const stop = onDiagnosticEvent((event) => events.push(event));
try {
const slowReadSocket = makeRecordingSocket();
slowReadSocket.bufferedAmount = MAX_BUFFERED_BYTES + 1;
const clients = new Set<GatewayWsClient>([
makeGatewayWsClient("c-slow-read", slowReadSocket, {
role: "operator",
scopes: ["operator.read"],
} as GatewayWsClient["connect"]),
]);
const { broadcast } = createGatewayBroadcaster({ clients });
broadcast("chat", { sessionKey: "agent:main:main", message: "secret" }, { dropIfSlow: true });
broadcast("heartbeat", { ts: 1 });
expect(events).toContainEqual(
expect.objectContaining({
type: "payload.large",
surface: "gateway.ws.outbound_buffer",
action: "rejected",
bytes: MAX_BUFFERED_BYTES + 1,
limitBytes: MAX_BUFFERED_BYTES,
reason: "ws_send_buffer_drop",
}),
);
expect(events.filter((event) => event.type === "payload.large")).toHaveLength(1);
} finally {
stop();
resetDiagnosticEventsForTest();
}
});
});
describe("chat run registry", () => {


@@ -0,0 +1,154 @@
import { afterEach, beforeEach, describe, expect, it } from "vitest";
import {
emitDiagnosticEvent,
resetDiagnosticEventsForTest,
type DiagnosticMemoryUsage,
} from "../infra/diagnostic-events.js";
import {
getDiagnosticStabilitySnapshot,
resetDiagnosticStabilityRecorderForTest,
startDiagnosticStabilityRecorder,
stopDiagnosticStabilityRecorder,
} from "../logging/diagnostic-stability.js";
const MB = 1024 * 1024;
const SYNTHETIC_BATCH_COUNT = 200;
const SYNTHETIC_SESSION_COUNT = 8;
const STABILITY_REASON = "stability_probe";
function memoryUsageForBatch(index: number): DiagnosticMemoryUsage {
const rssBytes = 180 * MB + index * 64 * 1024;
const heapUsedBytes = 70 * MB + (index % 12) * 256 * 1024;
return {
rssBytes,
heapTotalBytes: 96 * MB,
heapUsedBytes,
externalBytes: 8 * MB,
arrayBuffersBytes: 2 * MB,
};
}
function emitSyntheticGatewayStabilityLoad(): number {
const startedAt = 1_800_000_000_000;
let maxRssBytes = 0;
for (let index = 0; index < SYNTHETIC_BATCH_COUNT; index += 1) {
const sessionIndex = index % SYNTHETIC_SESSION_COUNT;
const sessionKey = `agent:main:stability-${sessionIndex}`;
const sessionId = `session-${sessionIndex}`;
emitDiagnosticEvent({
type: "message.queued",
sessionKey,
sessionId,
channel: "gateway",
source: "stability-probe",
queueDepth: 1,
});
emitDiagnosticEvent({
type: "session.state",
sessionKey,
sessionId,
state: "processing",
reason: STABILITY_REASON,
queueDepth: 1,
});
const memoryUsage = memoryUsageForBatch(index);
maxRssBytes = Math.max(maxRssBytes, memoryUsage.rssBytes);
emitDiagnosticEvent({
type: "diagnostic.memory.sample",
memory: memoryUsage,
uptimeMs: startedAt + index * 1_000,
});
if (index % 5 === 0) {
emitDiagnosticEvent({
type: "payload.large",
surface: "gateway.stability.probe",
action: "chunked",
bytes: 3 * MB + index,
limitBytes: 2 * MB,
count: 2,
reason: STABILITY_REASON,
channel: "gateway",
});
}
emitDiagnosticEvent({
type: "session.state",
sessionKey,
sessionId,
state: "idle",
reason: STABILITY_REASON,
queueDepth: 0,
});
emitDiagnosticEvent({
type: "message.processed",
channel: "gateway",
sessionKey,
sessionId,
outcome: "completed",
durationMs: 5,
reason: STABILITY_REASON,
});
}
return maxRssBytes;
}
describe("gateway stability lane", () => {
beforeEach(() => {
resetDiagnosticEventsForTest();
resetDiagnosticStabilityRecorderForTest();
startDiagnosticStabilityRecorder();
});
afterEach(() => {
stopDiagnosticStabilityRecorder();
resetDiagnosticStabilityRecorderForTest();
resetDiagnosticEventsForTest();
});
it("keeps diagnostics bounded and queues drained under synthetic gateway churn", () => {
const initial = getDiagnosticStabilitySnapshot({ limit: 1 });
expect(initial.capacity).toBe(1000);
const maxSyntheticRssBytes = emitSyntheticGatewayStabilityLoad();
const snapshot = getDiagnosticStabilitySnapshot({ limit: 1000 });
expect(snapshot.capacity).toBe(1000);
expect(snapshot.count).toBe(1000);
expect(snapshot.events).toHaveLength(1000);
expect(snapshot.dropped).toBeGreaterThan(0);
const firstSeq = snapshot.firstSeq ?? 0;
const lastSeq = snapshot.lastSeq ?? 0;
expect(firstSeq).toBeGreaterThan(1);
expect(lastSeq).toBeGreaterThan(firstSeq);
expect(snapshot.summary.byType["diagnostic.memory.sample"]).toBeGreaterThan(0);
expect(snapshot.summary.byType["message.queued"]).toBeGreaterThan(0);
expect(snapshot.summary.memory).toMatchObject({
maxRssBytes: maxSyntheticRssBytes,
pressureCount: 0,
});
expect(snapshot.summary.memory?.maxHeapUsedBytes).toBeLessThan(96 * MB);
expect(snapshot.summary.payloadLarge?.chunked).toBeGreaterThan(0);
expect(snapshot.summary.payloadLarge?.bySurface["gateway.stability.probe"]).toBeGreaterThan(0);
const sessionEvents = snapshot.events.filter((event) => event.type === "session.state");
expect(sessionEvents.length).toBeGreaterThan(0);
for (const event of sessionEvents) {
expect(event).not.toHaveProperty("sessionId");
expect(event).not.toHaveProperty("sessionKey");
}
expect(sessionEvents.some((event) => event.outcome === "idle" && event.queueDepth === 0)).toBe(
true,
);
expect(sessionEvents.every((event) => event.reason === STABILITY_REASON)).toBe(true);
stopDiagnosticStabilityRecorder();
emitDiagnosticEvent({
type: "payload.large",
surface: "gateway.stability.after-close",
action: "rejected",
});
expect(getDiagnosticStabilitySnapshot({ limit: 1 }).lastSeq).toBe(lastSeq);
});
});
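
The assertions above (fixed `capacity`, non-zero `dropped`, `firstSeq` greater than 1) are what a bounded, oldest-first-evicting buffer would produce. The recorder itself is not part of this hunk, so the following is only a minimal sketch of that idea; `createBoundedRecorderSketch` and its snapshot shape are hypothetical names chosen for illustration, not the project's implementation.

```typescript
// Hypothetical sketch of a bounded event recorder; not the real diagnostic-stability module.
type RecordedEventSketch = { seq: number; type: string };

function createBoundedRecorderSketch(capacity: number) {
  const events: RecordedEventSketch[] = [];
  let seq = 0;
  let dropped = 0;
  return {
    record(type: string): void {
      seq += 1;
      events.push({ seq, type });
      if (events.length > capacity) {
        // Evict the oldest entry so memory stays bounded under churn.
        events.shift();
        dropped += 1;
      }
    },
    snapshot(limit: number) {
      // Return at most `limit` of the newest events (assumes limit >= 1).
      const selected = events.slice(-Math.max(1, limit));
      return {
        capacity,
        count: selected.length,
        dropped,
        firstSeq: selected[0]?.seq,
        lastSeq: selected[selected.length - 1]?.seq,
        events: selected,
      };
    },
  };
}
```

Under the sketch's assumptions, the synthetic load in the test (200 batches of five events each, plus one `payload.large` event on every fifth batch) amounts to 1,040 recorded events, so a 1000-slot buffer would drop the oldest 40 and the snapshot would start at sequence 41.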


@@ -1,6 +1,11 @@
import { EventEmitter } from "node:events";
import type { IncomingMessage, ServerResponse } from "node:http";
import { beforeEach, describe, expect, it, vi } from "vitest";
import {
onDiagnosticEvent,
resetDiagnosticEventsForTest,
type DiagnosticEventPayload,
} from "../infra/diagnostic-events.js";
import type { GatewayAuthResult } from "./auth.js";
import {
readJsonBodyOrError,
@@ -26,6 +31,7 @@ vi.mock("./hooks.js", () => ({
beforeEach(() => {
readJsonBodyMock.mockReset();
resetDiagnosticEventsForTest();
});
describe("setDefaultSecurityHeaders", () => {
@@ -210,8 +216,12 @@ describe("readJsonBodyOrError", () => {
it("responds with 413 when the body is too large", async () => {
readJsonBodyMock.mockResolvedValueOnce({ ok: false, error: "payload too large" });
const events: DiagnosticEventPayload[] = [];
const stop = onDiagnosticEvent((event) => events.push(event));
const { res, end } = makeMockHttpResponse();
const result = await readJsonBodyOrError(makeRequest(), res, 1024);
const req = { headers: { "content-length": "2048" } } as IncomingMessage;
const result = await readJsonBodyOrError(req, res, 1024);
stop();
expect(result).toBeUndefined();
expect(res.statusCode).toBe(413);
expect(end).toHaveBeenCalledWith(
@@ -219,6 +229,16 @@ describe("readJsonBodyOrError", () => {
error: { message: "Payload too large", type: "invalid_request_error" },
}),
);
expect(events).toContainEqual(
expect.objectContaining({
type: "payload.large",
surface: "gateway.http.json",
action: "rejected",
bytes: 2048,
limitBytes: 1024,
reason: "json_body_limit",
}),
);
});
it("responds with 408 when the request body times out", async () => {


@@ -1,4 +1,8 @@
import type { IncomingMessage, ServerResponse } from "node:http";
import {
logRejectedLargePayload,
parseContentLengthHeader,
} from "../logging/diagnostic-payload.js";
import type { GatewayAuthResult } from "./auth.js";
import { readJsonBody } from "./hooks.js";
@@ -78,6 +82,13 @@ export async function readJsonBodyOrError(
const body = await readJsonBody(req, maxBytes);
if (!body.ok) {
if (body.error === "payload too large") {
const contentLength = parseContentLengthHeader(req.headers?.["content-length"]);
logRejectedLargePayload({
surface: "gateway.http.json",
limitBytes: maxBytes,
reason: "json_body_limit",
...(contentLength !== undefined ? { bytes: contentLength } : {}),
});
sendJson(res, 413, {
error: { message: "Payload too large", type: "invalid_request_error" },
});
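
The hunk above calls `parseContentLengthHeader`, whose body is not shown in this diff. As a rough illustration of what such a helper might do, here is a sketch that assumes it accepts the raw header value and returns a non-negative integer byte count or `undefined`. The names `parseContentLengthSketch` and `buildRejectedPayloadFields` are hypothetical, and the real helper may behave differently.

```typescript
// Hypothetical stand-in for parseContentLengthHeader; the real helper is not shown in this diff.
function parseContentLengthSketch(value: string | string[] | undefined): number | undefined {
  // Node may surface repeated headers as an array; use the first entry.
  const raw = Array.isArray(value) ? value[0] : value;
  if (typeof raw !== "string") {
    return undefined;
  }
  const trimmed = raw.trim();
  // Accept only plain decimal digits so "12abc", "-1", and "1.5" are rejected.
  if (!/^\d+$/.test(trimmed)) {
    return undefined;
  }
  const parsed = Number(trimmed);
  return Number.isSafeInteger(parsed) ? parsed : undefined;
}

// Mirrors the conditional spread in the hunk: `bytes` is attached only when it is known.
function buildRejectedPayloadFields(
  limitBytes: number,
  header: string | string[] | undefined,
): { limitBytes: number; bytes?: number } {
  const contentLength = parseContentLengthSketch(header);
  return {
    limitBytes,
    ...(contentLength !== undefined ? { bytes: contentLength } : {}),
  };
}
```

The conditional spread keeps the `bytes` key out of the event entirely when the header is missing or malformed, rather than recording `bytes: undefined`.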


@@ -32,6 +32,7 @@ describe("method scope resolution", () => {
["sessions.abort", ["operator.write"]],
["sessions.messages.subscribe", ["operator.read"]],
["sessions.messages.unsubscribe", ["operator.read"]],
["diagnostics.stability", ["operator.read"]],
["node.pair.approve", ["operator.pairing"]],
["poll", ["operator.write"]],
["config.patch", ["operator.admin"]],


@@ -68,6 +68,7 @@ const METHOD_SCOPE_GROUPS: Record<OperatorScope, readonly string[]> = {
[READ_SCOPE]: [
"assistant.media.get",
"health",
"diagnostics.stability",
"doctor.memory.status",
"doctor.memory.dreamDiary",
"logs.tail",


@@ -1,3 +1,4 @@
import { logRejectedLargePayload } from "../logging/diagnostic-payload.js";
import {
ADMIN_SCOPE,
APPROVALS_SCOPE,
@@ -91,6 +92,7 @@ function hasEventScope(client: GatewayWsClient, event: string): boolean {
export function createGatewayBroadcaster(params: { clients: Set<GatewayWsClient> }) {
const clientSeq = new WeakMap<GatewayWsClient, number>();
const reportedSlowPayloadClients = new WeakSet<GatewayWsClient>();
const broadcastInternal = (
event: string,
@@ -126,6 +128,17 @@ export function createGatewayBroadcaster(params: { clients: Set<GatewayWsClient>
}
const nextSeq = (clientSeq.get(c) ?? 0) + 1;
const slow = c.socket.bufferedAmount > MAX_BUFFERED_BYTES;
if (!slow) {
reportedSlowPayloadClients.delete(c);
} else if (!reportedSlowPayloadClients.has(c)) {
reportedSlowPayloadClients.add(c);
logRejectedLargePayload({
surface: "gateway.ws.outbound_buffer",
bytes: c.socket.bufferedAmount,
limitBytes: MAX_BUFFERED_BYTES,
reason: opts?.dropIfSlow ? "ws_send_buffer_drop" : "ws_send_buffer_close",
});
}
if (slow && opts?.dropIfSlow) {
if (!isTargeted) {
clientSeq.set(c, nextSeq);
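
The `reportedSlowPayloadClients` set in the hunk above reports a slow client once per slow episode and re-arms the report when the buffer drains. The same pattern can be sketched in isolation as follows; `SketchClient` and `createSlowClientReporter` are hypothetical names used only for illustration.

```typescript
// Hypothetical client shape; only bufferedAmount matters for this sketch.
type SketchClient = { bufferedAmount: number };

function createSlowClientReporter(limitBytes: number, report: (bytes: number) => void) {
  // WeakSet lets entries disappear with the client object, so no cleanup on disconnect is needed.
  const reported = new WeakSet<SketchClient>();
  return (client: SketchClient): boolean => {
    const slow = client.bufferedAmount > limitBytes;
    if (!slow) {
      // Recovery re-arms the report for the next slow episode.
      reported.delete(client);
    } else if (!reported.has(client)) {
      reported.add(client);
      report(client.bufferedAmount);
    }
    return slow;
  };
}
```

This is why the broadcaster test expects exactly one `payload.large` event even though two broadcasts hit the same slow socket.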


@@ -3,6 +3,7 @@ import { GATEWAY_EVENT_UPDATE_AVAILABLE } from "./events.js";
const BASE_METHODS = [
"health",
"diagnostics.stability",
"doctor.memory.status",
"doctor.memory.dreamDiary",
"doctor.memory.backfillDreamDiary",


@@ -13,6 +13,7 @@ import { configHandlers } from "./server-methods/config.js";
import { connectHandlers } from "./server-methods/connect.js";
import { cronHandlers } from "./server-methods/cron.js";
import { deviceHandlers } from "./server-methods/devices.js";
import { diagnosticsHandlers } from "./server-methods/diagnostics.js";
import { doctorHandlers } from "./server-methods/doctor.js";
import { execApprovalsHandlers } from "./server-methods/exec-approvals.js";
import { healthHandlers } from "./server-methods/health.js";
@@ -77,6 +78,7 @@ export const coreGatewayHandlers: GatewayRequestHandlers = {
...commandsHandlers,
...cronHandlers,
...deviceHandlers,
...diagnosticsHandlers,
...doctorHandlers,
...execApprovalsHandlers,
...webHandlers,


@@ -13,6 +13,7 @@ import type { MsgContext } from "../../auto-reply/templating.js";
import { extractCanvasFromText } from "../../chat/canvas-render.js";
import { resolveSessionFilePath } from "../../config/sessions.js";
import { jsonUtf8Bytes } from "../../infra/json-utf8-bytes.js";
import { logLargePayload } from "../../logging/diagnostic-payload.js";
import { getAgentScopedMediaLocalRoots } from "../../media/local-roots.js";
import { isAudioFileName } from "../../media/mime.js";
import type { PromptImageOrderEntry } from "../../media/prompt-image-order.js";
@@ -1669,6 +1670,14 @@ export const chatHandlers: GatewayRequestHandlers = {
const placeholderCount = replaced.replacedCount + bounded.placeholderCount;
if (placeholderCount > 0) {
chatHistoryPlaceholderEmitCount += placeholderCount;
logLargePayload({
surface: "gateway.chat.history",
action: "truncated",
bytes: jsonUtf8Bytes(normalized),
limitBytes: maxHistoryBytes,
count: placeholderCount,
reason: "chat_history_budget",
});
context.logGateway.debug(
`chat.history omitted oversized payloads placeholders=${placeholderCount} total=${chatHistoryPlaceholderEmitCount}`,
);


@@ -0,0 +1,82 @@
import { afterEach, beforeEach, describe, expect, it, vi } from "vitest";
import {
emitDiagnosticEvent,
resetDiagnosticEventsForTest,
} from "../../infra/diagnostic-events.js";
import {
resetDiagnosticStabilityRecorderForTest,
startDiagnosticStabilityRecorder,
stopDiagnosticStabilityRecorder,
} from "../../logging/diagnostic-stability.js";
import { diagnosticsHandlers } from "./diagnostics.js";
describe("diagnostics gateway methods", () => {
beforeEach(() => {
resetDiagnosticStabilityRecorderForTest();
resetDiagnosticEventsForTest();
startDiagnosticStabilityRecorder();
});
afterEach(() => {
stopDiagnosticStabilityRecorder();
resetDiagnosticStabilityRecorderForTest();
resetDiagnosticEventsForTest();
});
it("returns a filtered stability snapshot", async () => {
emitDiagnosticEvent({ type: "webhook.received", channel: "telegram" });
emitDiagnosticEvent({
type: "payload.large",
surface: "gateway.http.json",
action: "rejected",
bytes: 1024,
limitBytes: 512,
});
const respond = vi.fn();
await diagnosticsHandlers["diagnostics.stability"]({
req: { type: "req", id: "1", method: "diagnostics.stability", params: {} },
params: { type: "payload.large", limit: 10 },
client: null,
isWebchatConnect: () => false,
context: {} as never,
respond,
});
expect(respond).toHaveBeenCalledWith(
true,
expect.objectContaining({
count: 1,
events: [
expect.objectContaining({
type: "payload.large",
surface: "gateway.http.json",
action: "rejected",
}),
],
}),
undefined,
);
});
it("rejects invalid stability params", async () => {
const respond = vi.fn();
await diagnosticsHandlers["diagnostics.stability"]({
req: { type: "req", id: "1", method: "diagnostics.stability", params: {} },
params: { limit: 0 },
client: null,
isWebchatConnect: () => false,
context: {} as never,
respond,
});
expect(respond).toHaveBeenCalledWith(
false,
undefined,
expect.objectContaining({
code: "INVALID_REQUEST",
message: "limit must be between 1 and 1000",
}),
);
});
});


@@ -0,0 +1,24 @@
import {
getDiagnosticStabilitySnapshot,
normalizeDiagnosticStabilityQuery,
} from "../../logging/diagnostic-stability.js";
import { ErrorCodes, errorShape } from "../protocol/index.js";
import type { GatewayRequestHandlers } from "./types.js";
export const diagnosticsHandlers: GatewayRequestHandlers = {
"diagnostics.stability": async ({ params, respond }) => {
try {
const query = normalizeDiagnosticStabilityQuery(params);
respond(true, getDiagnosticStabilitySnapshot(query), undefined);
} catch (err) {
respond(
false,
undefined,
errorShape(
ErrorCodes.INVALID_REQUEST,
err instanceof Error ? err.message : "invalid diagnostics.stability params",
),
);
}
},
};
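
The handler above delegates validation to `normalizeDiagnosticStabilityQuery`, which is defined outside this hunk. A minimal sketch of the kind of validation it might perform, assuming only the `limit` and `type` parameters exercised by the tests and the error message they assert, could look like this. `normalizeStabilityQuerySketch` is a hypothetical name, and the real function may accept more fields or apply defaults.

```typescript
// Hypothetical query shape; only the fields used in the tests are modeled.
type StabilityQuerySketch = { limit?: number; type?: string };

// Upper bound taken from the error message asserted in the handler test.
const MAX_STABILITY_LIMIT_SKETCH = 1000;

function normalizeStabilityQuerySketch(params: unknown): StabilityQuerySketch {
  const input =
    params !== null && typeof params === "object" ? (params as Record<string, unknown>) : {};
  const query: StabilityQuerySketch = {};
  const limit = input.limit;
  if (limit !== undefined) {
    if (
      typeof limit !== "number" ||
      !Number.isInteger(limit) ||
      limit < 1 ||
      limit > MAX_STABILITY_LIMIT_SKETCH
    ) {
      // The handler's catch block turns this into an INVALID_REQUEST response.
      throw new Error("limit must be between 1 and 1000");
    }
    query.limit = limit;
  }
  if (typeof input.type === "string" && input.type.length > 0) {
    query.type = input.type;
  }
  return query;
}
```

Throwing a plain `Error` keeps the normalizer independent of the gateway protocol; the handler alone decides how to shape the error response.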


@@ -19,7 +19,10 @@ import {
import { applyPluginAutoEnable } from "../config/plugin-auto-enable.js";
import { resolveMainSessionKey } from "../config/sessions.js";
import { clearAgentRunContext } from "../infra/agent-events.js";
import { isDiagnosticsEnabled } from "../infra/diagnostic-events.js";
import {
isDiagnosticsEnabled,
setDiagnosticsEnabledForProcess,
} from "../infra/diagnostic-events.js";
import { isTruthyEnvValue, isVitestRuntimeEnv, logAcceptedEnvOption } from "../infra/env.js";
import { ensureOpenClawCliOnPath } from "../infra/path-env.js";
import { setGatewaySigusr1RestartPolicy, setPreRestartDeferralCheck } from "../infra/restart.js";
@@ -303,6 +306,7 @@ export async function startGatewayServer(
}
}
const diagnosticsEnabled = isDiagnosticsEnabled(cfgAtStart);
setDiagnosticsEnabledForProcess(diagnosticsEnabled);
if (diagnosticsEnabled) {
startDiagnosticHeartbeat(undefined, { getConfig: getRuntimeConfig });
}


@@ -1,6 +1,11 @@
import http from "node:http";
import { afterEach, describe, expect, it } from "vitest";
import { WebSocketServer } from "ws";
import {
onDiagnosticEvent,
resetDiagnosticEventsForTest,
type DiagnosticEventPayload,
} from "../infra/diagnostic-events.js";
import type { ResolvedGatewayAuth } from "./auth.js";
import { MAX_PREAUTH_PAYLOAD_BYTES } from "./server-constants.js";
import { attachGatewayUpgradeHandler, createGatewayHttpServer } from "./server-http.js";
@@ -147,6 +152,9 @@ describe("gateway pre-auth hardening", () => {
});
it("rejects oversized pre-auth connect frames before application-level auth responses", async () => {
resetDiagnosticEventsForTest();
const events: DiagnosticEventPayload[] = [];
const stopDiagnostics = onDiagnosticEvent((event) => events.push(event));
const harness = await createGatewaySuiteHarness();
try {
const ws = await harness.openWs();
@@ -176,7 +184,18 @@ describe("gateway pre-auth hardening", () => {
const result = await closed;
expect(result.code).toBe(1009);
expect(events).toContainEqual(
expect.objectContaining({
type: "payload.large",
surface: "gateway.ws.preauth",
action: "rejected",
limitBytes: MAX_PREAUTH_PAYLOAD_BYTES,
reason: "preauth_frame_limit",
}),
);
} finally {
stopDiagnostics();
resetDiagnosticEventsForTest();
await harness.close();
}
});


@@ -4,6 +4,7 @@ import type { WebSocket, WebSocketServer } from "ws";
import { resolveCanvasHostUrl } from "../../infra/canvas-host-url.js";
import { removeRemoteNodeInfo } from "../../infra/skills-remote.js";
import { upsertPresence } from "../../infra/system-presence.js";
import { logRejectedLargePayload } from "../../logging/diagnostic-payload.js";
import type { createSubsystemLogger } from "../../logging/subsystem.js";
import { normalizeLowercaseStringOrEmpty } from "../../shared/string-coerce.js";
import { truncateUtf16Safe } from "../../utils.js";
@@ -12,6 +13,7 @@ import type { AuthRateLimiter } from "../auth-rate-limit.js";
import type { ResolvedGatewayAuth } from "../auth.js";
import { getPreauthHandshakeTimeoutMsFromEnv } from "../handshake-timeouts.js";
import { isLoopbackAddress } from "../net.js";
import { MAX_PAYLOAD_BYTES, MAX_PREAUTH_PAYLOAD_BYTES } from "../server-constants.js";
import { clearNodeWakeState } from "../server-methods/nodes.js";
import type { GatewayRequestContext, GatewayRequestHandlers } from "../server-methods/types.js";
import { formatError } from "../server-utils.js";
@@ -102,6 +104,18 @@ function resolveSocketAddress(socket: WebSocket): {
};
}
function isWsPayloadLimitError(err: unknown): boolean {
if (!err || typeof err !== "object") {
return false;
}
const code = (err as { code?: unknown }).code;
if (code === "WS_ERR_UNSUPPORTED_MESSAGE_LENGTH") {
return true;
}
const message = (err as { message?: unknown }).message;
return typeof message === "string" && /max payload size exceeded/i.test(message);
}
export type GatewayWsSharedHandlerParams = {
wss: WebSocketServer;
clients: Set<GatewayWsClient>;
@@ -266,6 +280,13 @@ export function attachGatewayWsConnectionHandler(params: AttachGatewayWsConnecti
};
socket.once("error", (err) => {
if (isWsPayloadLimitError(err)) {
logRejectedLargePayload({
surface: client ? "gateway.ws.frame" : "gateway.ws.preauth",
limitBytes: client ? MAX_PAYLOAD_BYTES : MAX_PREAUTH_PAYLOAD_BYTES,
reason: client ? "ws_frame_limit" : "preauth_frame_limit",
});
}
logWsControl.warn(`error conn=${connId} remote=${remoteAddr ?? "?"}: ${formatError(err)}`);
close();
});
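
To illustrate how the predicate above classifies errors, the sketch below repeats it verbatim and runs it against synthetic error objects. The error shapes are assumptions that mimic an oversized-frame error carrying the `WS_ERR_UNSUPPORTED_MESSAGE_LENGTH` code. `describeRejectedFrame` is a hypothetical helper that mirrors the surface and reason selection in the error handler.

```typescript
// Predicate copied from the hunk above.
function isWsPayloadLimitError(err: unknown): boolean {
  if (!err || typeof err !== "object") {
    return false;
  }
  const code = (err as { code?: unknown }).code;
  if (code === "WS_ERR_UNSUPPORTED_MESSAGE_LENGTH") {
    return true;
  }
  const message = (err as { message?: unknown }).message;
  return typeof message === "string" && /max payload size exceeded/i.test(message);
}

// Synthetic errors for illustration only; real errors come from the WebSocket library.
const errorWithCode = Object.assign(new RangeError("Max payload size exceeded"), {
  code: "WS_ERR_UNSUPPORTED_MESSAGE_LENGTH",
});
const errorWithMessageOnly = new Error("max payload size exceeded");
const unrelatedError = Object.assign(new Error("socket hang up"), { code: "ECONNRESET" });

// Hypothetical helper: pre-auth sockets and authenticated clients get different labels.
function describeRejectedFrame(authenticated: boolean): { surface: string; reason: string } {
  return authenticated
    ? { surface: "gateway.ws.frame", reason: "ws_frame_limit" }
    : { surface: "gateway.ws.preauth", reason: "preauth_frame_limit" };
}
```

Matching on the message as a fallback makes the check tolerant of errors that lack a `code` property, at the cost of depending on the wording of the message.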


@@ -35,6 +35,7 @@ import { recordRemoteNodeInfo, refreshRemoteNodeBins } from "../../../infra/skil
import { upsertPresence } from "../../../infra/system-presence.js";
import { loadVoiceWakeConfig } from "../../../infra/voicewake.js";
import { rawDataToString } from "../../../infra/ws.js";
import { logRejectedLargePayload } from "../../../logging/diagnostic-payload.js";
import type { createSubsystemLogger } from "../../../logging/subsystem.js";
import {
resolveBootstrapProfileScopesForRole,
@@ -316,6 +317,12 @@ export function attachGatewayWsMessageHandler(params: {
const preauthPayloadBytes = !getClient() ? getRawDataByteLength(data) : undefined;
if (preauthPayloadBytes !== undefined && preauthPayloadBytes > MAX_PREAUTH_PAYLOAD_BYTES) {
logRejectedLargePayload({
surface: "gateway.ws.preauth",
bytes: preauthPayloadBytes,
limitBytes: MAX_PREAUTH_PAYLOAD_BYTES,
reason: "preauth_frame_limit",
});
setHandshakeState("failed");
setCloseCause("preauth-payload-too-large", {
payloadBytes: preauthPayloadBytes,


@@ -2,6 +2,7 @@
import process from "node:process";
import { fileURLToPath } from "node:url";
import { formatUncaughtError } from "./infra/errors.js";
import { runFatalErrorHooks } from "./infra/fatal-error-hooks.js";
import { isMainModule } from "./infra/is-main.js";
import { installUnhandledRejectionHandler } from "./infra/unhandled-rejections.js";
@@ -86,12 +87,18 @@ if (isMain) {
process.on("uncaughtException", (error) => {
console.error("[openclaw] Uncaught exception:", formatUncaughtError(error));
for (const message of runFatalErrorHooks({ reason: "uncaught_exception", error })) {
console.error("[openclaw]", message);
}
restoreTerminalState("uncaught exception", { resumeStdinIfPaused: false });
process.exit(1);
});
void runLegacyCliEntry(process.argv).catch((err) => {
console.error("[openclaw] CLI failed:", formatUncaughtError(err));
for (const message of runFatalErrorHooks({ reason: "legacy_cli_failure", error: err })) {
console.error("[openclaw]", message);
}
restoreTerminalState("legacy cli failure", { resumeStdinIfPaused: false });
process.exit(1);
});


@@ -4,6 +4,7 @@ import {
isDiagnosticsEnabled,
onDiagnosticEvent,
resetDiagnosticEventsForTest,
setDiagnosticsEnabledForProcess,
} from "./diagnostic-events.js";
describe("diagnostic-events", () => {
@@ -87,6 +88,23 @@ describe("diagnostic-events", () => {
expect(seen).toEqual(["webhook.received"]);
});
it("skips event enrichment and subscribers when diagnostics are disabled", () => {
const nowSpy = vi.spyOn(Date, "now");
const seen: string[] = [];
onDiagnosticEvent((event) => {
seen.push(event.type);
});
setDiagnosticsEnabledForProcess(false);
emitDiagnosticEvent({
type: "webhook.received",
channel: "telegram",
});
expect(seen).toEqual([]);
expect(nowSpy).not.toHaveBeenCalled();
});
it("drops recursive emissions after the guard threshold", () => {
const errorSpy = vi.spyOn(console, "error").mockImplementation(() => {});
let calls = 0;
@@ -113,8 +131,10 @@ describe("diagnostic-events", () => {
);
});
it("requires an explicit true diagnostics flag", () => {
expect(isDiagnosticsEnabled()).toBe(false);
it("enables diagnostics unless explicitly disabled", () => {
expect(isDiagnosticsEnabled()).toBe(true);
expect(isDiagnosticsEnabled({} as never)).toBe(true);
expect(isDiagnosticsEnabled({ diagnostics: {} } as never)).toBe(true);
expect(isDiagnosticsEnabled({ diagnostics: { enabled: false } } as never)).toBe(false);
expect(isDiagnosticsEnabled({ diagnostics: { enabled: true } } as never)).toBe(true);
});


@@ -152,6 +152,42 @@ export type DiagnosticToolLoopEvent = DiagnosticBaseEvent & {
pairedToolName?: string;
};
export type DiagnosticMemoryUsage = {
rssBytes: number;
heapTotalBytes: number;
heapUsedBytes: number;
externalBytes: number;
arrayBuffersBytes: number;
};
export type DiagnosticMemorySampleEvent = DiagnosticBaseEvent & {
type: "diagnostic.memory.sample";
memory: DiagnosticMemoryUsage;
uptimeMs?: number;
};
export type DiagnosticMemoryPressureEvent = DiagnosticBaseEvent & {
type: "diagnostic.memory.pressure";
level: "warning" | "critical";
reason: "rss_threshold" | "heap_threshold" | "rss_growth";
memory: DiagnosticMemoryUsage;
thresholdBytes?: number;
rssGrowthBytes?: number;
windowMs?: number;
};
export type DiagnosticPayloadLargeEvent = DiagnosticBaseEvent & {
type: "payload.large";
surface: string;
action: "rejected" | "truncated" | "chunked";
bytes?: number;
limitBytes?: number;
count?: number;
channel?: string;
pluginId?: string;
reason?: string;
};
export type DiagnosticEventPayload =
| DiagnosticUsageEvent
| DiagnosticWebhookReceivedEvent
@@ -165,7 +201,10 @@ export type DiagnosticEventPayload =
| DiagnosticLaneDequeueEvent
| DiagnosticRunAttemptEvent
| DiagnosticHeartbeatEvent
| DiagnosticToolLoopEvent;
| DiagnosticToolLoopEvent
| DiagnosticMemorySampleEvent
| DiagnosticMemoryPressureEvent
| DiagnosticPayloadLargeEvent;
export type DiagnosticEventInput = DiagnosticEventPayload extends infer Event
? Event extends DiagnosticEventPayload
@@ -174,6 +213,7 @@ export type DiagnosticEventInput = DiagnosticEventPayload extends infer Event
: never;
type DiagnosticEventsGlobalState = {
enabled: boolean;
seq: number;
listeners: Set<(evt: DiagnosticEventPayload) => void>;
dispatchDepth: number;
@@ -185,6 +225,7 @@ function getDiagnosticEventsState(): DiagnosticEventsGlobalState {
};
if (!globalStore.__openclawDiagnosticEventsState) {
globalStore.__openclawDiagnosticEventsState = {
enabled: true,
seq: 0,
listeners: new Set<(evt: DiagnosticEventPayload) => void>(),
dispatchDepth: 0,
@@ -194,11 +235,22 @@ function getDiagnosticEventsState(): DiagnosticEventsGlobalState {
}
export function isDiagnosticsEnabled(config?: OpenClawConfig): boolean {
return config?.diagnostics?.enabled === true;
return config?.diagnostics?.enabled !== false;
}
export function setDiagnosticsEnabledForProcess(enabled: boolean): void {
getDiagnosticEventsState().enabled = enabled;
}
export function areDiagnosticsEnabledForProcess(): boolean {
return getDiagnosticEventsState().enabled;
}
export function emitDiagnosticEvent(event: DiagnosticEventInput) {
const state = getDiagnosticEventsState();
if (!state.enabled) {
return;
}
if (state.dispatchDepth > 100) {
console.error(
`[diagnostic-events] recursion guard tripped at depth=${state.dispatchDepth}, dropping type=${event.type}`,
@@ -241,6 +293,7 @@ export function onDiagnosticEvent(listener: (evt: DiagnosticEventPayload) => voi
export function resetDiagnosticEventsForTest(): void {
const state = getDiagnosticEventsState();
state.enabled = true;
state.seq = 0;
state.listeners.clear();
state.dispatchDepth = 0;
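
The `isDiagnosticsEnabled` change above flips diagnostics from opt-in to opt-out: only an explicit `enabled: false` turns them off. A minimal sketch of the difference follows. `ConfigSketch` is a hypothetical stand-in for `OpenClawConfig`, and the two helper names are illustrative only.

```typescript
// Hypothetical stand-in for OpenClawConfig; only the field used here is modeled.
type ConfigSketch = { diagnostics?: { enabled?: boolean } };

// Previous behavior: diagnostics run only when explicitly enabled.
function isEnabledOptIn(config?: ConfigSketch): boolean {
  return config?.diagnostics?.enabled === true;
}

// New behavior: diagnostics run unless explicitly disabled.
function isEnabledOptOut(config?: ConfigSketch): boolean {
  return config?.diagnostics?.enabled !== false;
}

const configs: Array<ConfigSketch | undefined> = [
  undefined, // no config at all
  {}, // no diagnostics section
  { diagnostics: {} }, // section present, flag omitted
  { diagnostics: { enabled: true } },
  { diagnostics: { enabled: false } },
];

// The two predicates disagree only for the first three entries,
// where the flag is missing.
const comparison = configs.map((config) => ({
  optIn: isEnabledOptIn(config),
  optOut: isEnabledOptOut(config),
}));

console.log(comparison);
```

Existing configs that omit the flag therefore start recording after this change, which matches the changelog entry about recording being enabled by default.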

@@ -0,0 +1,33 @@
import { beforeEach, describe, expect, it } from "vitest";
import {
registerFatalErrorHook,
resetFatalErrorHooksForTest,
runFatalErrorHooks,
} from "./fatal-error-hooks.js";
describe("fatal error hooks", () => {
beforeEach(() => {
resetFatalErrorHooksForTest();
});
it("collects non-empty hook messages", () => {
registerFatalErrorHook(() => "first");
registerFatalErrorHook(() => " ");
registerFatalErrorHook(() => "second");
expect(runFatalErrorHooks({ reason: "uncaught_exception" })).toEqual(["first", "second"]);
});
it("does not expose hook failure message or stack text", () => {
registerFatalErrorHook(() => {
throw new Error("raw secret from hook");
});
const messages = runFatalErrorHooks({ reason: "uncaught_exception" });
const output = messages.join("\n");
expect(messages).toEqual(["fatal-error hook failed: Error"]);
expect(output).not.toContain("raw secret");
expect(output).not.toContain("at ");
});
});

@@ -0,0 +1,39 @@
export type FatalErrorHookContext = {
reason: string;
error?: unknown;
};
export type FatalErrorHook = (context: FatalErrorHookContext) => string | undefined | void;
const hooks = new Set<FatalErrorHook>();
function formatHookFailure(error: unknown): string {
const name = error instanceof Error && error.name ? error.name : "unknown";
return `fatal-error hook failed: ${name}`;
}
export function registerFatalErrorHook(hook: FatalErrorHook): () => void {
hooks.add(hook);
return () => {
hooks.delete(hook);
};
}
export function runFatalErrorHooks(context: FatalErrorHookContext): string[] {
const messages: string[] = [];
for (const hook of hooks) {
try {
const message = hook(context);
if (typeof message === "string" && message.trim()) {
messages.push(message);
}
} catch (err) {
messages.push(formatHookFailure(err));
}
}
return messages;
}
export function resetFatalErrorHooksForTest(): void {
hooks.clear();
}
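
A usage sketch for the hook registry above. The registry is re-declared here in condensed form so the snippet runs on its own; the two hooks and their messages are hypothetical.

```typescript
type FatalErrorHookContext = { reason: string; error?: unknown };
type FatalErrorHook = (context: FatalErrorHookContext) => string | undefined | void;

const hooks = new Set<FatalErrorHook>();

function registerFatalErrorHook(hook: FatalErrorHook): () => void {
  hooks.add(hook);
  return () => {
    hooks.delete(hook);
  };
}

function runFatalErrorHooks(context: FatalErrorHookContext): string[] {
  const messages: string[] = [];
  for (const hook of hooks) {
    try {
      const message = hook(context);
      if (typeof message === "string" && message.trim()) {
        messages.push(message);
      }
    } catch (err) {
      // Only the error class name is reported; message and stack are dropped.
      const name = err instanceof Error && err.name ? err.name : "unknown";
      messages.push(`fatal-error hook failed: ${name}`);
    }
  }
  return messages;
}

// Hypothetical hook that reports something useful before exit.
const unregisterReporter = registerFatalErrorHook(
  ({ reason }) => `saw fatal reason: ${reason}`,
);

// Hypothetical hook that fails; its message must not reach the output.
registerFatalErrorHook(() => {
  throw new TypeError("details that must not leak");
});

const firstRun = runFatalErrorHooks({ reason: "uncaught_exception" });
unregisterReporter();
const secondRun = runFatalErrorHooks({ reason: "uncaught_exception" });

console.log(firstRun, secondRun);
```

Because a failing hook contributes only its error name, one broken hook neither stops the remaining hooks nor leaks raw error text into the crash output.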

@@ -7,6 +7,7 @@ vi.mock("../terminal/restore.js", () => ({
restoreTerminalState: restoreTerminalStateMock,
}));
import { resetFatalErrorHooksForTest } from "./fatal-error-hooks.js";
import { installUnhandledRejectionHandler } from "./unhandled-rejections.js";
describe("installUnhandledRejectionHandler - fatal detection", () => {
@@ -22,6 +23,7 @@ describe("installUnhandledRejectionHandler - fatal detection", () => {
beforeEach(() => {
exitCalls = [];
resetFatalErrorHooksForTest();
vi.spyOn(process, "exit").mockImplementation((code?: string | number | null): never => {
if (code !== undefined && code !== null) {

@@ -7,6 +7,7 @@ import {
formatUncaughtError,
readErrorName,
} from "./errors.js";
import { runFatalErrorHooks } from "./fatal-error-hooks.js";
type UnhandledRejectionHandler = (reason: unknown) => boolean;
@@ -337,7 +338,10 @@ export function isUnhandledRejectionHandled(reason: unknown): boolean {
}
export function installUnhandledRejectionHandler(): void {
- const exitWithTerminalRestore = (reason: string) => {
+ const exitWithTerminalRestore = (reason: string, error?: unknown, hookReason = reason) => {
+ for (const message of runFatalErrorHooks({ reason: hookReason, error })) {
+ console.error("[openclaw]", message);
+ }
restoreTerminalState(reason, { resumeStdinIfPaused: false });
process.exit(1);
};
@@ -356,13 +360,13 @@ export function installUnhandledRejectionHandler(): void {
if (isFatalError(reason)) {
console.error("[openclaw] FATAL unhandled rejection:", formatUncaughtError(reason));
- exitWithTerminalRestore("fatal unhandled rejection");
+ exitWithTerminalRestore("fatal unhandled rejection", reason, "fatal_unhandled_rejection");
return;
}
if (isConfigError(reason)) {
console.error("[openclaw] CONFIGURATION ERROR - requires fix:", formatUncaughtError(reason));
- exitWithTerminalRestore("configuration error");
+ exitWithTerminalRestore("configuration error", reason, "configuration_error");
return;
}
@@ -375,6 +379,6 @@ export function installUnhandledRejectionHandler(): void {
}
console.error("[openclaw] Unhandled promise rejection:", formatUncaughtError(reason));
- exitWithTerminalRestore("unhandled rejection");
+ exitWithTerminalRestore("unhandled rejection", reason, "unhandled_rejection");
});
}

@@ -0,0 +1,154 @@
import { afterEach, beforeEach, describe, expect, it } from "vitest";
import {
onDiagnosticEvent,
resetDiagnosticEventsForTest,
type DiagnosticEventPayload,
} from "../infra/diagnostic-events.js";
import { emitDiagnosticMemorySample, resetDiagnosticMemoryForTest } from "./diagnostic-memory.js";
function memoryUsage(overrides: Partial<NodeJS.MemoryUsage>): NodeJS.MemoryUsage {
return {
rss: 100,
heapTotal: 80,
heapUsed: 40,
external: 10,
arrayBuffers: 5,
...overrides,
};
}
describe("diagnostic memory", () => {
beforeEach(() => {
resetDiagnosticEventsForTest();
resetDiagnosticMemoryForTest();
});
afterEach(() => {
resetDiagnosticEventsForTest();
resetDiagnosticMemoryForTest();
});
it("emits memory samples with byte counts", () => {
const events: DiagnosticEventPayload[] = [];
const stop = onDiagnosticEvent((event) => events.push(event));
emitDiagnosticMemorySample({
now: 1000,
uptimeMs: 123,
memoryUsage: memoryUsage({ rss: 4096, heapUsed: 1024 }),
});
stop();
expect(events).toMatchObject([
{
type: "diagnostic.memory.sample",
uptimeMs: 123,
memory: {
rssBytes: 4096,
heapUsedBytes: 1024,
},
},
]);
});
it("emits pressure when RSS crosses a threshold", () => {
const events: DiagnosticEventPayload[] = [];
const stop = onDiagnosticEvent((event) => events.push(event));
emitDiagnosticMemorySample({
now: 1000,
memoryUsage: memoryUsage({ rss: 2000 }),
thresholds: {
rssWarningBytes: 1000,
rssCriticalBytes: 3000,
pressureRepeatMs: 60_000,
},
});
stop();
expect(events).toContainEqual(
expect.objectContaining({
type: "diagnostic.memory.pressure",
level: "warning",
reason: "rss_threshold",
thresholdBytes: 1000,
}),
);
});
it("can check pressure without recording an idle memory sample", () => {
const events: DiagnosticEventPayload[] = [];
const stop = onDiagnosticEvent((event) => events.push(event));
emitDiagnosticMemorySample({
now: 1000,
emitSample: false,
memoryUsage: memoryUsage({ rss: 2000 }),
thresholds: {
rssWarningBytes: 1000,
rssCriticalBytes: 3000,
pressureRepeatMs: 60_000,
},
});
stop();
expect(events.map((event) => event.type)).toEqual(["diagnostic.memory.pressure"]);
});
it("emits pressure when RSS grows quickly", () => {
const events: DiagnosticEventPayload[] = [];
const stop = onDiagnosticEvent((event) => events.push(event));
emitDiagnosticMemorySample({
now: 1000,
memoryUsage: memoryUsage({ rss: 1000 }),
thresholds: {
rssWarningBytes: 10_000,
heapUsedWarningBytes: 10_000,
rssGrowthWarningBytes: 500,
growthWindowMs: 10_000,
},
});
emitDiagnosticMemorySample({
now: 2000,
memoryUsage: memoryUsage({ rss: 1700 }),
thresholds: {
rssWarningBytes: 10_000,
heapUsedWarningBytes: 10_000,
rssGrowthWarningBytes: 500,
growthWindowMs: 10_000,
},
});
stop();
expect(events).toContainEqual(
expect.objectContaining({
type: "diagnostic.memory.pressure",
level: "warning",
reason: "rss_growth",
rssGrowthBytes: 700,
windowMs: 1000,
}),
);
});
it("throttles repeated pressure events by reason and level", () => {
const events: DiagnosticEventPayload[] = [];
const stop = onDiagnosticEvent((event) => events.push(event));
for (const now of [1000, 2000]) {
emitDiagnosticMemorySample({
now,
memoryUsage: memoryUsage({ rss: 2000 }),
thresholds: {
rssWarningBytes: 1000,
rssCriticalBytes: 3000,
pressureRepeatMs: 60_000,
},
});
}
stop();
expect(events.filter((event) => event.type === "diagnostic.memory.pressure")).toHaveLength(1);
});
});
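
The "RSS grows quickly" test expects `rssGrowthBytes: 700` and `windowMs: 1000`. The arithmetic behind those values can be sketched as follows. `SampleSketch` and `rssGrowth` are hypothetical names, and the sketch assumes, as in the implementation in this commit, that growth is measured between two consecutive samples.

```typescript
// Hypothetical reduced sample shape: timestamp plus resident set size.
type SampleSketch = { ts: number; rssBytes: number };

// Returns the growth between two consecutive samples, or null when the pair
// cannot be compared (no previous sample, or the gap falls outside the window).
function rssGrowth(
  previous: SampleSketch | null,
  current: SampleSketch,
  growthWindowMs: number,
): { rssGrowthBytes: number; windowMs: number } | null {
  if (!previous) {
    return null;
  }
  const windowMs = current.ts - previous.ts;
  if (windowMs <= 0 || windowMs > growthWindowMs) {
    return null;
  }
  return { rssGrowthBytes: current.rssBytes - previous.rssBytes, windowMs };
}

const first: SampleSketch = { ts: 1000, rssBytes: 1000 };
const second: SampleSketch = { ts: 2000, rssBytes: 1700 };

// 1700 - 1000 = 700 bytes over 2000 - 1000 = 1000 ms, inside the 10 s window.
const withinWindow = rssGrowth(first, second, 10_000);

// The same pair with a 500 ms window is too far apart to compare.
const outsideWindow = rssGrowth(first, second, 500);

// The very first sample has nothing to compare against.
const noPrevious = rssGrowth(null, first, 10_000);

console.log(withinWindow, outsideWindow, noPrevious);
```

With `rssGrowthWarningBytes: 500`, the 700-byte growth crosses the warning threshold, which is what the test asserts.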

@@ -0,0 +1,196 @@
import {
emitDiagnosticEvent,
type DiagnosticMemoryPressureEvent,
type DiagnosticMemoryUsage,
} from "../infra/diagnostic-events.js";
const MB = 1024 * 1024;
const DEFAULT_RSS_WARNING_BYTES = 1536 * MB;
const DEFAULT_RSS_CRITICAL_BYTES = 3072 * MB;
const DEFAULT_HEAP_WARNING_BYTES = 1024 * MB;
const DEFAULT_HEAP_CRITICAL_BYTES = 2048 * MB;
const DEFAULT_RSS_GROWTH_WARNING_BYTES = 512 * MB;
const DEFAULT_RSS_GROWTH_CRITICAL_BYTES = 1024 * MB;
const DEFAULT_GROWTH_WINDOW_MS = 10 * 60 * 1000;
const DEFAULT_PRESSURE_REPEAT_MS = 5 * 60 * 1000;
export type DiagnosticMemoryThresholds = {
rssWarningBytes?: number;
rssCriticalBytes?: number;
heapUsedWarningBytes?: number;
heapUsedCriticalBytes?: number;
rssGrowthWarningBytes?: number;
rssGrowthCriticalBytes?: number;
growthWindowMs?: number;
pressureRepeatMs?: number;
};
type DiagnosticMemorySample = {
ts: number;
memory: DiagnosticMemoryUsage;
};
type DiagnosticMemoryState = {
lastSample: DiagnosticMemorySample | null;
lastPressureAtByKey: Map<string, number>;
};
const state: DiagnosticMemoryState = {
lastSample: null,
lastPressureAtByKey: new Map(),
};
function normalizeMemoryUsage(memory: NodeJS.MemoryUsage): DiagnosticMemoryUsage {
return {
rssBytes: memory.rss,
heapTotalBytes: memory.heapTotal,
heapUsedBytes: memory.heapUsed,
externalBytes: memory.external,
arrayBuffersBytes: memory.arrayBuffers,
};
}
function resolveThresholds(
thresholds?: DiagnosticMemoryThresholds,
): Required<DiagnosticMemoryThresholds> {
return {
rssWarningBytes: thresholds?.rssWarningBytes ?? DEFAULT_RSS_WARNING_BYTES,
rssCriticalBytes: thresholds?.rssCriticalBytes ?? DEFAULT_RSS_CRITICAL_BYTES,
heapUsedWarningBytes: thresholds?.heapUsedWarningBytes ?? DEFAULT_HEAP_WARNING_BYTES,
heapUsedCriticalBytes: thresholds?.heapUsedCriticalBytes ?? DEFAULT_HEAP_CRITICAL_BYTES,
rssGrowthWarningBytes: thresholds?.rssGrowthWarningBytes ?? DEFAULT_RSS_GROWTH_WARNING_BYTES,
rssGrowthCriticalBytes: thresholds?.rssGrowthCriticalBytes ?? DEFAULT_RSS_GROWTH_CRITICAL_BYTES,
growthWindowMs: thresholds?.growthWindowMs ?? DEFAULT_GROWTH_WINDOW_MS,
pressureRepeatMs: thresholds?.pressureRepeatMs ?? DEFAULT_PRESSURE_REPEAT_MS,
};
}
function pickThresholdPressure(params: {
memory: DiagnosticMemoryUsage;
thresholds: Required<DiagnosticMemoryThresholds>;
}): Omit<DiagnosticMemoryPressureEvent, "seq" | "ts" | "type"> | null {
const { memory, thresholds } = params;
if (memory.rssBytes >= thresholds.rssCriticalBytes) {
return {
level: "critical",
reason: "rss_threshold",
memory,
thresholdBytes: thresholds.rssCriticalBytes,
};
}
if (memory.heapUsedBytes >= thresholds.heapUsedCriticalBytes) {
return {
level: "critical",
reason: "heap_threshold",
memory,
thresholdBytes: thresholds.heapUsedCriticalBytes,
};
}
if (memory.rssBytes >= thresholds.rssWarningBytes) {
return {
level: "warning",
reason: "rss_threshold",
memory,
thresholdBytes: thresholds.rssWarningBytes,
};
}
if (memory.heapUsedBytes >= thresholds.heapUsedWarningBytes) {
return {
level: "warning",
reason: "heap_threshold",
memory,
thresholdBytes: thresholds.heapUsedWarningBytes,
};
}
return null;
}
function pickGrowthPressure(params: {
previous: DiagnosticMemorySample | null;
current: DiagnosticMemorySample;
thresholds: Required<DiagnosticMemoryThresholds>;
}): Omit<DiagnosticMemoryPressureEvent, "seq" | "ts" | "type"> | null {
const { previous, current, thresholds } = params;
if (!previous) {
return null;
}
const windowMs = current.ts - previous.ts;
if (windowMs <= 0 || windowMs > thresholds.growthWindowMs) {
return null;
}
const rssGrowthBytes = current.memory.rssBytes - previous.memory.rssBytes;
if (rssGrowthBytes >= thresholds.rssGrowthCriticalBytes) {
return {
level: "critical",
reason: "rss_growth",
memory: current.memory,
thresholdBytes: thresholds.rssGrowthCriticalBytes,
rssGrowthBytes,
windowMs,
};
}
if (rssGrowthBytes >= thresholds.rssGrowthWarningBytes) {
return {
level: "warning",
reason: "rss_growth",
memory: current.memory,
thresholdBytes: thresholds.rssGrowthWarningBytes,
rssGrowthBytes,
windowMs,
};
}
return null;
}
function shouldEmitPressure(
pressure: Omit<DiagnosticMemoryPressureEvent, "seq" | "ts" | "type">,
now: number,
repeatMs: number,
): boolean {
const key = `${pressure.level}:${pressure.reason}`;
const lastAt = state.lastPressureAtByKey.get(key);
if (lastAt !== undefined && now - lastAt < repeatMs) {
return false;
}
state.lastPressureAtByKey.set(key, now);
return true;
}
export function emitDiagnosticMemorySample(options?: {
now?: number;
memoryUsage?: NodeJS.MemoryUsage;
uptimeMs?: number;
thresholds?: DiagnosticMemoryThresholds;
emitSample?: boolean;
}): DiagnosticMemoryUsage {
const now = options?.now ?? Date.now();
const memory = normalizeMemoryUsage(options?.memoryUsage ?? process.memoryUsage());
const current = { ts: now, memory };
const thresholds = resolveThresholds(options?.thresholds);
const shouldEmitSample = options?.emitSample !== false;
if (shouldEmitSample) {
emitDiagnosticEvent({
type: "diagnostic.memory.sample",
memory,
uptimeMs: options?.uptimeMs ?? Math.round(process.uptime() * 1000),
});
}
const pressure =
pickThresholdPressure({ memory, thresholds }) ??
pickGrowthPressure({ previous: state.lastSample, current, thresholds });
state.lastSample = current;
if (pressure && shouldEmitPressure(pressure, now, thresholds.pressureRepeatMs)) {
emitDiagnosticEvent({
type: "diagnostic.memory.pressure",
...pressure,
});
}
return memory;
}
export function resetDiagnosticMemoryForTest(): void {
state.lastSample = null;
state.lastPressureAtByKey.clear();
}
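
The throttling in `shouldEmitPressure` is keyed by level and reason together. The sketch below isolates that behavior; `PressureSketch`, `lastEmittedAt`, and `shouldEmit` are hypothetical names, while the logic mirrors the function above.

```typescript
// Hypothetical reduced pressure shape: only the fields that form the key.
type PressureSketch = { level: "warning" | "critical"; reason: string };

const lastEmittedAt = new Map<string, number>();

// Allows one event per (level, reason) key every repeatMs milliseconds.
function shouldEmit(pressure: PressureSketch, now: number, repeatMs: number): boolean {
  const key = `${pressure.level}:${pressure.reason}`;
  const lastAt = lastEmittedAt.get(key);
  if (lastAt !== undefined && now - lastAt < repeatMs) {
    return false;
  }
  lastEmittedAt.set(key, now);
  return true;
}

const repeatMs = 60_000;
const decisions = [
  // First warning for this key is emitted.
  shouldEmit({ level: "warning", reason: "rss_threshold" }, 1_000, repeatMs),
  // Same key one second later is suppressed.
  shouldEmit({ level: "warning", reason: "rss_threshold" }, 2_000, repeatMs),
  // Escalation to critical uses a different key, so it is emitted.
  shouldEmit({ level: "critical", reason: "rss_threshold" }, 3_000, repeatMs),
  // 61_000 - 1_000 = 60_000, which is not below repeatMs, so it is emitted again.
  shouldEmit({ level: "warning", reason: "rss_threshold" }, 61_000, repeatMs),
];

console.log(decisions);
```

Keying on both fields means an escalation from warning to critical is never swallowed by an earlier warning for the same reason.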

@@ -0,0 +1,42 @@
import { emitDiagnosticEvent } from "../infra/diagnostic-events.js";
type LargePayloadBase = {
surface: string;
bytes?: number;
limitBytes?: number;
count?: number;
channel?: string;
pluginId?: string;
reason?: string;
};
export function logLargePayload(
params: LargePayloadBase & {
action: "rejected" | "truncated" | "chunked";
},
): void {
emitDiagnosticEvent({
type: "payload.large",
...params,
});
}
export function logRejectedLargePayload(params: LargePayloadBase): void {
logLargePayload({
action: "rejected",
...params,
});
}
export function parseContentLengthHeader(raw: string | string[] | undefined): number | undefined {
const value = Array.isArray(raw) ? raw[0] : raw;
if (typeof value !== "string") {
return undefined;
}
const trimmed = value.trim();
if (trimmed.length === 0 || !/^\d+$/.test(trimmed)) {
return undefined;
}
const parsed = Number.parseInt(trimmed, 10);
return Number.isSafeInteger(parsed) && parsed >= 0 ? parsed : undefined;
}
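
`parseContentLengthHeader` accepts only plain decimal digits. The sketch below repeats the function so it runs standalone and feeds it a handful of inputs; the sample header values are illustrative and not taken from the commit.

```typescript
function parseContentLengthHeader(raw: string | string[] | undefined): number | undefined {
  // When a header is repeated, only the first value is considered.
  const value = Array.isArray(raw) ? raw[0] : raw;
  if (typeof value !== "string") {
    return undefined;
  }
  const trimmed = value.trim();
  // Digits only: rejects signs, exponents, hex prefixes, and empty strings.
  if (trimmed.length === 0 || !/^\d+$/.test(trimmed)) {
    return undefined;
  }
  const parsed = Number.parseInt(trimmed, 10);
  return Number.isSafeInteger(parsed) && parsed >= 0 ? parsed : undefined;
}

const accepted = [
  parseContentLengthHeader("2048"), // plain digits
  parseContentLengthHeader(" 2048 "), // surrounding whitespace is trimmed
  parseContentLengthHeader(["10", "20"]), // first array entry wins
];

const rejected = [
  parseContentLengthHeader(undefined), // header absent
  parseContentLengthHeader(""), // empty value
  parseContentLengthHeader("-1"), // sign is not a digit
  parseContentLengthHeader("1e3"), // exponent notation
  parseContentLengthHeader("0x10"), // hex notation
  parseContentLengthHeader("9007199254740993"), // beyond the safe integer range
];

console.log(accepted, rejected);
```

Returning `undefined` for anything ambiguous keeps malformed header values out of the `bytes` field of `payload.large` events.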

@@ -1,4 +1,7 @@
- import { emitDiagnosticEvent } from "../infra/diagnostic-events.js";
+ import {
+ areDiagnosticsEnabledForProcess,
+ emitDiagnosticEvent,
+ } from "../infra/diagnostic-events.js";
import { createSubsystemLogger } from "./subsystem.js";
const diag = createSubsystemLogger("diagnostic");
@@ -19,6 +22,9 @@ export function resetDiagnosticActivityForTest(): void {
}
export function logLaneEnqueue(lane: string, queueSize: number): void {
if (!areDiagnosticsEnabledForProcess()) {
return;
}
diag.debug(`lane enqueue: lane=${lane} queueSize=${queueSize}`);
emitDiagnosticEvent({
type: "queue.lane.enqueue",
@@ -29,6 +35,9 @@ export function logLaneEnqueue(lane: string, queueSize: number): void {
}
export function logLaneDequeue(lane: string, waitMs: number, queueSize: number): void {
if (!areDiagnosticsEnabledForProcess()) {
return;
}
diag.debug(`lane dequeue: lane=${lane} waitMs=${waitMs} queueSize=${queueSize}`);
emitDiagnosticEvent({
type: "queue.lane.dequeue",

@@ -0,0 +1,305 @@
import fs from "node:fs";
import os from "node:os";
import path from "node:path";
import { afterEach, beforeEach, describe, expect, it } from "vitest";
import { emitDiagnosticEvent, resetDiagnosticEventsForTest } from "../infra/diagnostic-events.js";
import { resetFatalErrorHooksForTest, runFatalErrorHooks } from "../infra/fatal-error-hooks.js";
import {
installDiagnosticStabilityFatalHook,
MAX_DIAGNOSTIC_STABILITY_BUNDLE_BYTES,
readDiagnosticStabilityBundleFileSync,
readLatestDiagnosticStabilityBundleSync,
resetDiagnosticStabilityBundleForTest,
writeDiagnosticStabilityBundleForFailureSync,
writeDiagnosticStabilityBundleSync,
type DiagnosticStabilityBundle,
} from "./diagnostic-stability-bundle.js";
import {
resetDiagnosticStabilityRecorderForTest,
startDiagnosticStabilityRecorder,
stopDiagnosticStabilityRecorder,
} from "./diagnostic-stability.js";
describe("diagnostic stability bundles", () => {
let tempDir: string;
function resetStabilityBundleTestState(): void {
resetDiagnosticEventsForTest();
resetDiagnosticStabilityRecorderForTest();
resetDiagnosticStabilityBundleForTest();
resetFatalErrorHooksForTest();
}
beforeEach(() => {
tempDir = fs.mkdtempSync(path.join(os.tmpdir(), "openclaw-stability-bundle-"));
resetStabilityBundleTestState();
});
afterEach(() => {
stopDiagnosticStabilityRecorder();
resetStabilityBundleTestState();
fs.rmSync(tempDir, { recursive: true, force: true });
});
function readBundle(file: string): DiagnosticStabilityBundle {
return JSON.parse(fs.readFileSync(file, "utf8")) as DiagnosticStabilityBundle;
}
it("writes a payload-free bundle with safe failure metadata", () => {
startDiagnosticStabilityRecorder();
emitDiagnosticEvent({
type: "webhook.error",
channel: "telegram",
chatId: "chat-secret",
error: "raw diagnostic error with message body",
});
emitDiagnosticEvent({
type: "payload.large",
surface: "gateway.http.json",
action: "rejected",
bytes: 2048,
limitBytes: 1024,
reason: "json_body_limit",
});
const error = Object.assign(new Error("contains secret message"), { code: "ERR_TEST" });
const result = writeDiagnosticStabilityBundleSync({
reason: "gateway.restart_startup_failed",
error,
stateDir: tempDir,
now: new Date("2026-04-22T12:00:00.000Z"),
});
expect(result.status).toBe("written");
const file = result.status === "written" ? result.path : "";
const bundle = readBundle(file);
const raw = fs.readFileSync(file, "utf8");
expect(bundle).toMatchObject({
version: 1,
generatedAt: "2026-04-22T12:00:00.000Z",
reason: "gateway.restart_startup_failed",
error: {
name: "Error",
code: "ERR_TEST",
},
host: {
hostname: "<redacted-hostname>",
},
snapshot: {
count: 2,
},
});
expect(bundle.snapshot.events[0]).toMatchObject({
type: "webhook.error",
channel: "telegram",
});
expect(bundle.snapshot.events[0]).not.toHaveProperty("chatId");
expect(bundle.snapshot.events[0]).not.toHaveProperty("error");
expect(raw).not.toContain("chat-secret");
expect(raw).not.toContain("message body");
expect(raw).not.toContain("contains secret message");
expect(raw).not.toContain(os.hostname());
});
it("skips empty recorder snapshots by default", () => {
const result = writeDiagnosticStabilityBundleSync({
reason: "uncaught_exception",
stateDir: tempDir,
});
expect(result).toEqual({ status: "skipped", reason: "empty" });
expect(fs.existsSync(path.join(tempDir, "logs", "stability"))).toBe(false);
});
it("writes failure bundles even when the recorder snapshot is empty", () => {
const result = writeDiagnosticStabilityBundleForFailureSync(
"gateway.restart_startup_failed",
Object.assign(new Error("raw startup config payload"), { code: "ERR_CONFIG_PARSE" }),
{
stateDir: tempDir,
now: new Date("2026-04-22T12:00:00.000Z"),
},
);
if (result.status !== "written") {
throw new Error(`expected written bundle, got ${result.status}`);
}
const bundle = readBundle(result.path);
const raw = fs.readFileSync(result.path, "utf8");
expect(bundle).toMatchObject({
reason: "gateway.restart_startup_failed",
error: {
name: "Error",
code: "ERR_CONFIG_PARSE",
},
snapshot: {
count: 0,
events: [],
},
});
expect(raw).not.toContain("raw startup config payload");
});
it("registers a fatal hook only while installed", () => {
startDiagnosticStabilityRecorder();
emitDiagnosticEvent({ type: "webhook.received", channel: "telegram" });
installDiagnosticStabilityFatalHook({ stateDir: tempDir });
const messages = runFatalErrorHooks({
reason: "fatal_unhandled_rejection",
error: Object.assign(new Error("raw text"), { code: "ERR_OUT_OF_MEMORY" }),
});
expect(messages).toHaveLength(1);
expect(messages[0]).toContain("wrote stability bundle:");
expect(messages[0]).toContain(tempDir);
resetDiagnosticStabilityBundleForTest();
expect(runFatalErrorHooks({ reason: "uncaught_exception" })).toEqual([]);
});
it("retains only the newest bundle files", () => {
startDiagnosticStabilityRecorder();
emitDiagnosticEvent({ type: "webhook.received", channel: "telegram" });
for (let index = 0; index < 4; index += 1) {
const result = writeDiagnosticStabilityBundleSync({
reason: "gateway.restart_respawn_failed",
stateDir: tempDir,
now: new Date(`2026-04-22T12:00:0${index}.000Z`),
retention: 2,
});
expect(result.status).toBe("written");
}
const bundleDir = path.join(tempDir, "logs", "stability");
const files = fs.readdirSync(bundleDir).toSorted();
expect(files).toHaveLength(2);
expect(files[0]).toContain("12-00-02");
expect(files[1]).toContain("12-00-03");
});
it("reads the newest retained bundle", () => {
startDiagnosticStabilityRecorder();
emitDiagnosticEvent({ type: "webhook.received", channel: "telegram" });
const older = writeDiagnosticStabilityBundleSync({
reason: "gateway.restart_startup_failed",
stateDir: tempDir,
now: new Date("2026-04-22T12:00:00.000Z"),
});
const newer = writeDiagnosticStabilityBundleSync({
reason: "gateway.restart_respawn_failed",
stateDir: tempDir,
now: new Date("2026-04-22T12:00:01.000Z"),
});
expect(older.status).toBe("written");
expect(newer.status).toBe("written");
const latest = readLatestDiagnosticStabilityBundleSync({ stateDir: tempDir });
expect(latest.status).toBe("found");
expect(latest.status === "found" ? latest.path : "").toContain("12-00-01");
expect(latest.status === "found" ? latest.bundle.reason : "").toBe(
"gateway.restart_respawn_failed",
);
});
it("rejects malformed bundle files", () => {
const file = path.join(tempDir, "invalid.json");
fs.writeFileSync(file, "{}\n", "utf8");
const result = readDiagnosticStabilityBundleFileSync(file);
expect(result.status).toBe("failed");
expect(result.status === "failed" ? String(result.error) : "").toContain(
"Unsupported stability bundle version",
);
});
it("rejects oversized bundle files before reading them", () => {
const file = path.join(tempDir, "oversized.json");
fs.closeSync(fs.openSync(file, "w"));
fs.truncateSync(file, MAX_DIAGNOSTIC_STABILITY_BUNDLE_BYTES + 1);
const result = readDiagnosticStabilityBundleFileSync(file);
expect(result.status).toBe("failed");
expect(result.status === "failed" ? String(result.error) : "").toContain(
"Stability bundle is too large",
);
});
it("rejects malformed bundle snapshots before returning them", () => {
const baseBundle = {
version: 1,
generatedAt: "2026-04-22T12:00:00.000Z",
reason: "gateway.restart_startup_failed",
process: {
pid: 123,
platform: "darwin",
arch: "arm64",
node: "24.14.1",
uptimeMs: 1000,
},
host: {
hostname: "<redacted-hostname>",
},
snapshot: {
generatedAt: "2026-04-22T12:00:00.000Z",
capacity: 1000,
count: 1,
dropped: 0,
events: [{ seq: 1, ts: 1, type: "webhook.received" }],
summary: { byType: { "webhook.received": 1 } },
},
};
const cases = [
{
name: "malformed-event",
bundle: {
...baseBundle,
snapshot: {
...baseBundle.snapshot,
events: [{ type: "webhook.received", ts: 1 }],
},
},
error: "snapshot.events[0].seq",
},
{
name: "out-of-range-event-timestamp",
bundle: {
...baseBundle,
snapshot: {
...baseBundle.snapshot,
events: [{ seq: 1, ts: 9e15, type: "webhook.received" }],
},
},
error: "snapshot.events[0].ts",
},
{
name: "null-summary",
bundle: {
...baseBundle,
snapshot: {
...baseBundle.snapshot,
summary: null,
},
},
error: "snapshot.summary",
},
];
for (const testCase of cases) {
const file = path.join(tempDir, `${testCase.name}.json`);
fs.writeFileSync(file, `${JSON.stringify(testCase.bundle, null, 2)}\n`, "utf8");
const result = readDiagnosticStabilityBundleFileSync(file);
expect(result.status).toBe("failed");
expect(result.status === "failed" ? String(result.error) : "").toContain(testCase.error);
}
});
});
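
The retention test's filename assertions (`12-00-02`, `12-00-03`) depend on how bundle file names are built from the timestamp and the reason code. A minimal sketch of that naming follows. The regular expression and the two helpers mirror the bundle module in this commit; `bundleFileName` and its explicit `pid` parameter are hypothetical, introduced only to make the output deterministic.

```typescript
const SAFE_REASON_CODE = /^[A-Za-z0-9_.:-]{1,120}$/u;

// Reasons outside the safe character set collapse to "unknown".
function normalizeReason(reason: string): string {
  return SAFE_REASON_CODE.test(reason) ? reason : "unknown";
}

// Colons and dots in the ISO timestamp are replaced with hyphens.
function formatBundleTimestamp(now: Date): string {
  return now.toISOString().replace(/[:.]/g, "-");
}

// Hypothetical helper: the real code reads process.pid and joins a directory.
function bundleFileName(now: Date, pid: number, reason: string): string {
  return `openclaw-stability-${formatBundleTimestamp(now)}-${pid}-${normalizeReason(reason)}.json`;
}

const when = new Date("2026-04-22T12:00:02.000Z");

// A dotted reason code passes the safe pattern unchanged.
const safeName = bundleFileName(when, 4242, "gateway.restart_respawn_failed");

// A reason containing spaces does not match the pattern.
const unsafeName = bundleFileName(when, 4242, "fatal unhandled rejection");

console.log(safeName);
console.log(unsafeName);
```

This may be why the unhandled-rejection handler passes underscore-separated hook reasons such as `fatal_unhandled_rejection` rather than the human-readable strings: a reason with spaces would be recorded as `unknown`.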

@@ -0,0 +1,421 @@
import fs from "node:fs";
import path from "node:path";
import process from "node:process";
import { resolveStateDir } from "../config/paths.js";
import { registerFatalErrorHook } from "../infra/fatal-error-hooks.js";
import {
getDiagnosticStabilitySnapshot,
MAX_DIAGNOSTIC_STABILITY_LIMIT,
type DiagnosticStabilitySnapshot,
} from "./diagnostic-stability.js";
export const DIAGNOSTIC_STABILITY_BUNDLE_VERSION = 1;
export const DEFAULT_DIAGNOSTIC_STABILITY_BUNDLE_LIMIT = MAX_DIAGNOSTIC_STABILITY_LIMIT;
export const DEFAULT_DIAGNOSTIC_STABILITY_BUNDLE_RETENTION = 20;
export const MAX_DIAGNOSTIC_STABILITY_BUNDLE_BYTES = 5 * 1024 * 1024;
const SAFE_REASON_CODE = /^[A-Za-z0-9_.:-]{1,120}$/u;
const BUNDLE_PREFIX = "openclaw-stability-";
const BUNDLE_SUFFIX = ".json";
const REDACTED_HOSTNAME = "<redacted-hostname>";
export type DiagnosticStabilityBundle = {
version: typeof DIAGNOSTIC_STABILITY_BUNDLE_VERSION;
generatedAt: string;
reason: string;
process: {
pid: number;
platform: NodeJS.Platform;
arch: string;
node: string;
uptimeMs: number;
};
host: {
hostname: string;
};
error?: {
name?: string;
code?: string;
};
snapshot: DiagnosticStabilitySnapshot;
};
export type WriteDiagnosticStabilityBundleResult =
| { status: "written"; path: string; bundle: DiagnosticStabilityBundle }
| { status: "skipped"; reason: "empty" }
| { status: "failed"; error: unknown };
export type WriteDiagnosticStabilityBundleOptions = {
reason: string;
error?: unknown;
includeEmpty?: boolean;
limit?: number;
now?: Date;
env?: NodeJS.ProcessEnv;
stateDir?: string;
retention?: number;
};
export type DiagnosticStabilityBundleLocationOptions = {
env?: NodeJS.ProcessEnv;
stateDir?: string;
};
export type DiagnosticStabilityBundleFile = {
path: string;
mtimeMs: number;
};
export type ReadDiagnosticStabilityBundleResult =
| { status: "found"; path: string; mtimeMs: number; bundle: DiagnosticStabilityBundle }
| { status: "missing"; dir: string }
| { status: "failed"; path?: string; error: unknown };
export type DiagnosticStabilityBundleFailureWriteOutcome =
| { status: "written"; message: string; path: string }
| { status: "failed"; message: string; error: unknown }
| { status: "skipped"; reason: "empty" };
export type WriteDiagnosticStabilityBundleForFailureOptions = Omit<
WriteDiagnosticStabilityBundleOptions,
"error" | "includeEmpty" | "reason"
>;
let fatalHookUnsubscribe: (() => void) | null = null;
function normalizeReason(reason: string): string {
return SAFE_REASON_CODE.test(reason) ? reason : "unknown";
}
function formatBundleTimestamp(now: Date): string {
return now.toISOString().replace(/[:.]/g, "-");
}
function readErrorCode(error: unknown): string | undefined {
if (!error || typeof error !== "object" || !("code" in error)) {
return undefined;
}
const code = (error as { code?: unknown }).code;
if (typeof code === "string" && SAFE_REASON_CODE.test(code)) {
return code;
}
if (typeof code === "number" && Number.isFinite(code)) {
return String(code);
}
return undefined;
}
function readErrorName(error: unknown): string | undefined {
if (!error || typeof error !== "object" || !("name" in error)) {
return undefined;
}
const name = (error as { name?: unknown }).name;
return typeof name === "string" && SAFE_REASON_CODE.test(name) ? name : undefined;
}
function readSafeErrorMetadata(error: unknown): DiagnosticStabilityBundle["error"] | undefined {
const name = readErrorName(error);
const code = readErrorCode(error);
if (!name && !code) {
return undefined;
}
return {
...(name ? { name } : {}),
...(code ? { code } : {}),
};
}
export function resolveDiagnosticStabilityBundleDir(
options: DiagnosticStabilityBundleLocationOptions = {},
): string {
return path.join(
options.stateDir ?? resolveStateDir(options.env ?? process.env),
"logs",
"stability",
);
}
function buildBundlePath(dir: string, now: Date, reason: string): string {
return path.join(
dir,
`${BUNDLE_PREFIX}${formatBundleTimestamp(now)}-${process.pid}-${normalizeReason(reason)}${BUNDLE_SUFFIX}`,
);
}
function isBundleFile(name: string): boolean {
return name.startsWith(BUNDLE_PREFIX) && name.endsWith(BUNDLE_SUFFIX);
}
function isMissingFileError(error: unknown): boolean {
return (
typeof error === "object" &&
error !== null &&
"code" in error &&
(error as { code?: unknown }).code === "ENOENT"
);
}
function readObject(value: unknown, label: string): Record<string, unknown> {
if (!value || typeof value !== "object" || Array.isArray(value)) {
throw new Error(`Invalid stability bundle: ${label} must be an object`);
}
return value as Record<string, unknown>;
}
function readNumber(value: unknown, label: string): number {
if (typeof value !== "number" || !Number.isFinite(value)) {
throw new Error(`Invalid stability bundle: ${label} must be a finite number`);
}
return value;
}
function readTimestampMs(value: unknown, label: string): number {
const timestamp = readNumber(value, label);
if (Number.isNaN(new Date(timestamp).getTime())) {
throw new Error(`Invalid stability bundle: ${label} must be a valid timestamp`);
}
return timestamp;
}
function readOptionalNumber(value: unknown, label: string): number | undefined {
if (value === undefined) {
return undefined;
}
return readNumber(value, label);
}
function readString(value: unknown, label: string): string {
if (typeof value !== "string") {
throw new Error(`Invalid stability bundle: ${label} must be a string`);
}
return value;
}
function readStabilitySnapshot(value: unknown): DiagnosticStabilitySnapshot {
const snapshot = readObject(value, "snapshot");
readString(snapshot.generatedAt, "snapshot.generatedAt");
readNumber(snapshot.capacity, "snapshot.capacity");
readNumber(snapshot.count, "snapshot.count");
readNumber(snapshot.dropped, "snapshot.dropped");
readOptionalNumber(snapshot.firstSeq, "snapshot.firstSeq");
readOptionalNumber(snapshot.lastSeq, "snapshot.lastSeq");
if (!Array.isArray(snapshot.events)) {
throw new Error("Invalid stability bundle: snapshot.events must be an array");
}
for (const [index, event] of snapshot.events.entries()) {
const record = readObject(event, `snapshot.events[${index}]`);
readNumber(record.seq, `snapshot.events[${index}].seq`);
readTimestampMs(record.ts, `snapshot.events[${index}].ts`);
readString(record.type, `snapshot.events[${index}].type`);
}
const summary = readObject(snapshot.summary, "snapshot.summary");
readObject(summary.byType, "snapshot.summary.byType");
return snapshot as DiagnosticStabilitySnapshot;
}
function parseDiagnosticStabilityBundle(value: unknown): DiagnosticStabilityBundle {
const bundle = readObject(value, "bundle");
if (bundle.version !== DIAGNOSTIC_STABILITY_BUNDLE_VERSION) {
throw new Error(`Unsupported stability bundle version: ${String(bundle.version)}`);
}
if (typeof bundle.generatedAt !== "string" || typeof bundle.reason !== "string") {
throw new Error("Invalid stability bundle: missing generatedAt or reason");
}
readObject(bundle.process, "process");
readObject(bundle.host, "host");
readStabilitySnapshot(bundle.snapshot);
return bundle as DiagnosticStabilityBundle;
}
export function listDiagnosticStabilityBundleFilesSync(
options: DiagnosticStabilityBundleLocationOptions = {},
): DiagnosticStabilityBundleFile[] {
const dir = resolveDiagnosticStabilityBundleDir(options);
try {
return fs
.readdirSync(dir, { withFileTypes: true })
.filter((entry) => entry.isFile() && isBundleFile(entry.name))
.map((entry) => {
const file = path.join(dir, entry.name);
return {
path: file,
mtimeMs: fs.statSync(file).mtimeMs,
};
})
.toSorted((a, b) => b.mtimeMs - a.mtimeMs || b.path.localeCompare(a.path));
} catch (error) {
if (isMissingFileError(error)) {
return [];
}
throw error;
}
}
export function readDiagnosticStabilityBundleFileSync(
file: string,
): ReadDiagnosticStabilityBundleResult {
try {
const stat = fs.statSync(file);
if (stat.size > MAX_DIAGNOSTIC_STABILITY_BUNDLE_BYTES) {
throw new Error(
`Stability bundle is too large: ${stat.size} bytes exceeds ${MAX_DIAGNOSTIC_STABILITY_BUNDLE_BYTES}`,
);
}
const raw = fs.readFileSync(file, "utf8");
const bundle = parseDiagnosticStabilityBundle(JSON.parse(raw));
return {
status: "found",
path: file,
mtimeMs: stat.mtimeMs,
bundle,
};
} catch (error) {
return { status: "failed", path: file, error };
}
}
export function readLatestDiagnosticStabilityBundleSync(
options: DiagnosticStabilityBundleLocationOptions = {},
): ReadDiagnosticStabilityBundleResult {
try {
const latest = listDiagnosticStabilityBundleFilesSync(options)[0];
if (!latest) {
return {
status: "missing",
dir: resolveDiagnosticStabilityBundleDir(options),
};
}
return readDiagnosticStabilityBundleFileSync(latest.path);
} catch (error) {
return { status: "failed", error };
}
}
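// Keeps the newest `retention` bundle files and deletes the rest. Files whose stat fails
// sort as mtime 0, so they are the first candidates for removal. All errors are swallowed.
function pruneOldBundles(dir: string, retention: number): void {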
if (!Number.isFinite(retention) || retention < 1) {
return;
}
try {
const entries = fs
.readdirSync(dir, { withFileTypes: true })
.filter((entry) => entry.isFile() && isBundleFile(entry.name))
.map((entry) => {
const file = path.join(dir, entry.name);
let mtimeMs = 0;
try {
mtimeMs = fs.statSync(file).mtimeMs;
} catch {
// Missing files are ignored below.
}
return { file, mtimeMs };
})
.toSorted((a, b) => b.mtimeMs - a.mtimeMs || b.file.localeCompare(a.file));
for (const entry of entries.slice(retention)) {
try {
fs.unlinkSync(entry.file);
} catch {
// Retention cleanup must not block failure handling.
}
}
} catch {
// Retention cleanup must not block failure handling.
}
}
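// Captures the current stability snapshot and writes it through a temp file followed by
// a rename, then prunes old bundles. Returns a status object instead of throwing.
export function writeDiagnosticStabilityBundleSync(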
options: WriteDiagnosticStabilityBundleOptions,
): WriteDiagnosticStabilityBundleResult {
try {
const now = options.now ?? new Date();
const snapshot = getDiagnosticStabilitySnapshot({
limit: options.limit ?? DEFAULT_DIAGNOSTIC_STABILITY_BUNDLE_LIMIT,
});
if (!options.includeEmpty && snapshot.count === 0) {
return { status: "skipped", reason: "empty" };
}
const reason = normalizeReason(options.reason);
const error = options.error ? readSafeErrorMetadata(options.error) : undefined;
const bundle: DiagnosticStabilityBundle = {
version: DIAGNOSTIC_STABILITY_BUNDLE_VERSION,
generatedAt: now.toISOString(),
reason,
process: {
pid: process.pid,
platform: process.platform,
arch: process.arch,
node: process.versions.node,
uptimeMs: Math.round(process.uptime() * 1000),
},
host: {
hostname: REDACTED_HOSTNAME,
},
...(error ? { error } : {}),
snapshot,
};
const dir = resolveDiagnosticStabilityBundleDir(options);
fs.mkdirSync(dir, { recursive: true, mode: 0o700 });
const file = buildBundlePath(dir, now, reason);
const tmpFile = `${file}.${process.pid}.tmp`;
fs.writeFileSync(tmpFile, `${JSON.stringify(bundle, null, 2)}\n`, {
encoding: "utf8",
mode: 0o600,
});
fs.renameSync(tmpFile, file);
pruneOldBundles(dir, options.retention ?? DEFAULT_DIAGNOSTIC_STABILITY_BUNDLE_RETENTION);
return { status: "written", path: file, bundle };
} catch (error) {
return { status: "failed", error };
}
}
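// Failure-path wrapper: always writes a bundle (includeEmpty: true) and attaches a
// log-ready message to the "written" and "failed" outcomes.
export function writeDiagnosticStabilityBundleForFailureSync(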
reason: string,
error?: unknown,
options: WriteDiagnosticStabilityBundleForFailureOptions = {},
): DiagnosticStabilityBundleFailureWriteOutcome {
const result = writeDiagnosticStabilityBundleSync({
...options,
reason,
error,
includeEmpty: true,
});
if (result.status === "written") {
return {
status: "written",
path: result.path,
message: `wrote stability bundle: ${result.path}`,
};
}
if (result.status === "failed") {
return {
status: "failed",
error: result.error,
message: `failed to write stability bundle: ${String(result.error)}`,
};
}
return result;
}
// Idempotent: while a hook is registered, further install calls are no-ops.
export function installDiagnosticStabilityFatalHook(
options: WriteDiagnosticStabilityBundleForFailureOptions = {},
): void {
if (fatalHookUnsubscribe) {
return;
}
fatalHookUnsubscribe = registerFatalErrorHook(({ reason, error }) => {
const result = writeDiagnosticStabilityBundleForFailureSync(reason, error, options);
return "message" in result ? result.message : undefined;
});
}
export function uninstallDiagnosticStabilityFatalHook(): void {
fatalHookUnsubscribe?.();
fatalHookUnsubscribe = null;
}
export function resetDiagnosticStabilityBundleForTest(): void {
uninstallDiagnosticStabilityFatalHook();
}
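The write path above (temp file, rename, then retention pruning) can be sketched in isolation. This is a minimal illustration and not the module's code: `writeJsonWithRetention` is a hypothetical helper, and it orders files by name rather than by mtime so the result stays deterministic when several files are written within the same millisecond.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper (not part of the diff): write JSON through a temp file and
// rename, then keep only the newest `retention` .json files in the directory.
function writeJsonWithRetention(
  dir: string,
  name: string,
  value: unknown,
  retention: number,
): string {
  fs.mkdirSync(dir, { recursive: true, mode: 0o700 });
  const file = path.join(dir, name);
  const tmpFile = `${file}.${process.pid}.tmp`;
  fs.writeFileSync(tmpFile, `${JSON.stringify(value, null, 2)}\n`, {
    encoding: "utf8",
    mode: 0o600,
  });
  // The final name only appears once the content is fully written.
  fs.renameSync(tmpFile, file);
  // Assumption: names start with a sortable timestamp, so name order is age order.
  const stale = fs
    .readdirSync(dir)
    .filter((entry) => entry.endsWith(".json"))
    .sort()
    .reverse()
    .slice(retention);
  for (const entry of stale) {
    fs.rmSync(path.join(dir, entry), { force: true });
  }
  return file;
}

const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "bundle-sketch-"));
for (const stamp of ["2026-04-22T12-00-00", "2026-04-22T12-00-01", "2026-04-22T12-00-02"]) {
  writeJsonWithRetention(demoDir, `${stamp}.json`, { stamp }, 2);
}
const remaining = fs.readdirSync(demoDir).sort();
```

With a retention of 2, the oldest of the three files is removed and no `.tmp` file is left behind.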

View File

@@ -0,0 +1,273 @@
import { afterEach, beforeEach, describe, expect, it } from "vitest";
import { emitDiagnosticEvent, resetDiagnosticEventsForTest } from "../infra/diagnostic-events.js";
import {
getDiagnosticStabilitySnapshot,
normalizeDiagnosticStabilityQuery,
resetDiagnosticStabilityRecorderForTest,
selectDiagnosticStabilitySnapshot,
startDiagnosticStabilityRecorder,
stopDiagnosticStabilityRecorder,
type DiagnosticStabilitySnapshot,
} from "./diagnostic-stability.js";
describe("diagnostic stability recorder", () => {
beforeEach(() => {
resetDiagnosticStabilityRecorderForTest();
resetDiagnosticEventsForTest();
});
afterEach(() => {
stopDiagnosticStabilityRecorder();
resetDiagnosticStabilityRecorderForTest();
resetDiagnosticEventsForTest();
});
it("records a bounded payload-free projection of diagnostic events", () => {
startDiagnosticStabilityRecorder();
emitDiagnosticEvent({
type: "webhook.error",
channel: "telegram",
chatId: "chat-secret",
error: "raw upstream error with content",
});
emitDiagnosticEvent({
type: "tool.loop",
sessionId: "session-1",
toolName: "poll",
level: "warning",
action: "warn",
detector: "known_poll_no_progress",
count: 3,
message: "message that should not be stored",
});
const snapshot = getDiagnosticStabilitySnapshot({ limit: 10 });
expect(snapshot.count).toBe(2);
expect(snapshot.summary.byType).toMatchObject({
"webhook.error": 1,
"tool.loop": 1,
});
expect(snapshot.events[0]).toMatchObject({
type: "webhook.error",
channel: "telegram",
});
expect(snapshot.events[0]).not.toHaveProperty("error");
expect(snapshot.events[0]).not.toHaveProperty("chatId");
expect(snapshot.events[1]).toMatchObject({
type: "tool.loop",
toolName: "poll",
level: "warning",
action: "warn",
detector: "known_poll_no_progress",
count: 3,
});
expect(snapshot.events[1]).not.toHaveProperty("message");
expect(snapshot.events[1]).not.toHaveProperty("sessionId");
expect(snapshot.events[1]).not.toHaveProperty("sessionKey");
});
it("keeps stable reason codes but drops free-form reason text", () => {
startDiagnosticStabilityRecorder();
emitDiagnosticEvent({
type: "payload.large",
surface: "gateway.http.json",
action: "rejected",
reason: "json_body_limit",
});
emitDiagnosticEvent({
type: "message.processed",
channel: "telegram",
outcome: "error",
reason: "raw error with user content",
});
const snapshot = getDiagnosticStabilitySnapshot({ limit: 10 });
expect(snapshot.events[0]).toMatchObject({
type: "payload.large",
reason: "json_body_limit",
});
expect(snapshot.events[1]).toMatchObject({
type: "message.processed",
outcome: "error",
});
expect(snapshot.events[1]).not.toHaveProperty("reason");
});
it("summarizes memory and large payload events", () => {
startDiagnosticStabilityRecorder();
emitDiagnosticEvent({
type: "diagnostic.memory.sample",
memory: {
rssBytes: 100,
heapTotalBytes: 80,
heapUsedBytes: 40,
externalBytes: 10,
arrayBuffersBytes: 5,
},
});
emitDiagnosticEvent({
type: "diagnostic.memory.pressure",
level: "warning",
reason: "rss_threshold",
thresholdBytes: 90,
memory: {
rssBytes: 120,
heapTotalBytes: 90,
heapUsedBytes: 50,
externalBytes: 10,
arrayBuffersBytes: 5,
},
});
emitDiagnosticEvent({
type: "payload.large",
surface: "gateway.http.json",
action: "rejected",
bytes: 1024,
limitBytes: 512,
reason: "content-length",
});
const snapshot = getDiagnosticStabilitySnapshot();
expect(snapshot.summary.memory).toMatchObject({
latest: {
rssBytes: 120,
heapUsedBytes: 50,
},
maxRssBytes: 120,
maxHeapUsedBytes: 50,
pressureCount: 1,
});
expect(snapshot.summary.payloadLarge).toEqual({
count: 1,
rejected: 1,
truncated: 0,
chunked: 0,
bySurface: {
"gateway.http.json": 1,
},
});
});
it("keeps the newest events when capacity is exceeded", () => {
startDiagnosticStabilityRecorder();
for (let index = 0; index < 1005; index += 1) {
emitDiagnosticEvent({
type: "message.queued",
source: "test",
queueDepth: index,
});
}
const snapshot = getDiagnosticStabilitySnapshot({ limit: 1000 });
expect(snapshot.capacity).toBe(1000);
expect(snapshot.count).toBe(1000);
expect(snapshot.dropped).toBe(5);
expect(snapshot.firstSeq).toBe(6);
expect(snapshot.lastSeq).toBe(1005);
expect(snapshot.events[0]).toMatchObject({ seq: 6, queueDepth: 5 });
});
it("filters snapshots by type, sequence, and limit", () => {
startDiagnosticStabilityRecorder();
emitDiagnosticEvent({ type: "webhook.received", channel: "telegram" });
emitDiagnosticEvent({ type: "payload.large", surface: "chat.history", action: "truncated" });
emitDiagnosticEvent({ type: "payload.large", surface: "chat.history", action: "chunked" });
const snapshot = getDiagnosticStabilitySnapshot({
type: "payload.large",
sinceSeq: 2,
limit: 1,
});
expect(snapshot.count).toBe(1);
expect(snapshot.events).toMatchObject([
{
seq: 3,
type: "payload.large",
action: "chunked",
},
]);
});
it("applies query filters to persisted snapshots without mutating the source", () => {
const snapshot: DiagnosticStabilitySnapshot = {
generatedAt: "2026-04-22T12:00:00.000Z",
capacity: 1000,
count: 3,
dropped: 0,
firstSeq: 1,
lastSeq: 3,
events: [
{ seq: 1, ts: 1, type: "webhook.received" },
{ seq: 2, ts: 2, type: "payload.large", surface: "chat.history", action: "rejected" },
{ seq: 3, ts: 3, type: "payload.large", surface: "chat.history", action: "chunked" },
],
summary: {
byType: {
"webhook.received": 1,
"payload.large": 2,
},
},
};
const selected = selectDiagnosticStabilitySnapshot(snapshot, {
type: "payload.large",
limit: 1,
});
expect(selected).toMatchObject({
count: 2,
firstSeq: 2,
lastSeq: 3,
events: [{ seq: 3, type: "payload.large", action: "chunked" }],
summary: {
byType: {
"payload.large": 2,
},
payloadLarge: {
count: 2,
rejected: 1,
chunked: 1,
},
},
});
expect(snapshot.events).toHaveLength(3);
});
it("normalizes external stability query params consistently", () => {
expect(
normalizeDiagnosticStabilityQuery(
{
limit: "25",
type: " payload.large ",
sinceSeq: "2",
},
{ defaultLimit: 10 },
),
).toEqual({
limit: 25,
type: "payload.large",
sinceSeq: 2,
});
expect(normalizeDiagnosticStabilityQuery({}, { defaultLimit: 10 })).toEqual({
limit: 10,
type: undefined,
sinceSeq: undefined,
});
expect(() => normalizeDiagnosticStabilityQuery({ limit: 0 })).toThrow(
"limit must be between 1 and 1000",
);
expect(() => normalizeDiagnosticStabilityQuery({ sinceSeq: -1 })).toThrow(
"sinceSeq must be a non-negative integer",
);
});
});
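The "keeps stable reason codes but drops free-form reason text" test relies on an allowlist pattern for reason values. A minimal standalone sketch of that check follows; the pattern is copied from `SAFE_REASON_CODE` in the recorder module, while `keepReasonCode` is a hypothetical name used only for illustration.

```typescript
// Pattern copied from SAFE_REASON_CODE: 1 to 120 characters drawn from
// letters, digits, underscore, dot, colon, and hyphen.
const REASON_CODE_PATTERN = /^[A-Za-z0-9_.:-]{1,120}$/u;

// Hypothetical helper: return the reason only when it looks like a stable code.
function keepReasonCode(reason: string | undefined): string | undefined {
  return reason !== undefined && REASON_CODE_PATTERN.test(reason) ? reason : undefined;
}

const keptCode = keepReasonCode("json_body_limit");
const droppedFreeText = keepReasonCode("raw error with user content");
const droppedEmpty = keepReasonCode("");
```

Free-form text fails the check because it contains spaces, so it cannot carry user content into a stored record.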

View File

@@ -0,0 +1,484 @@
import {
onDiagnosticEvent,
type DiagnosticEventPayload,
type DiagnosticMemoryUsage,
} from "../infra/diagnostic-events.js";
export const DEFAULT_DIAGNOSTIC_STABILITY_CAPACITY = 1000;
export const DEFAULT_DIAGNOSTIC_STABILITY_LIMIT = 50;
export const MAX_DIAGNOSTIC_STABILITY_LIMIT = DEFAULT_DIAGNOSTIC_STABILITY_CAPACITY;
// Reason values are stored only when they look like stable codes; free-form text is dropped.
const SAFE_REASON_CODE = /^[A-Za-z0-9_.:-]{1,120}$/u;
export type DiagnosticStabilityEventRecord = {
seq: number;
ts: number;
type: DiagnosticEventPayload["type"];
channel?: string;
pluginId?: string;
source?: string;
surface?: string;
action?: string;
reason?: string;
outcome?: string;
level?: string;
detector?: string;
toolName?: string;
pairedToolName?: string;
provider?: string;
model?: string;
durationMs?: number;
costUsd?: number;
count?: number;
bytes?: number;
limitBytes?: number;
thresholdBytes?: number;
rssGrowthBytes?: number;
windowMs?: number;
ageMs?: number;
queueDepth?: number;
queueSize?: number;
waitMs?: number;
active?: number;
waiting?: number;
queued?: number;
webhooks?: {
received: number;
processed: number;
errors: number;
};
memory?: DiagnosticMemoryUsage;
usage?: {
input?: number;
output?: number;
cacheRead?: number;
cacheWrite?: number;
promptTokens?: number;
total?: number;
};
context?: {
limit?: number;
used?: number;
};
};
export type DiagnosticStabilitySnapshot = {
generatedAt: string;
capacity: number;
count: number;
dropped: number;
firstSeq?: number;
lastSeq?: number;
events: DiagnosticStabilityEventRecord[];
summary: {
byType: Record<string, number>;
memory?: {
latest?: DiagnosticMemoryUsage;
maxRssBytes?: number;
maxHeapUsedBytes?: number;
pressureCount: number;
};
payloadLarge?: {
count: number;
rejected: number;
truncated: number;
chunked: number;
bySurface: Record<string, number>;
};
};
};
export type DiagnosticStabilityQuery = {
limit?: number;
type?: string;
sinceSeq?: number;
};
export type DiagnosticStabilityQueryInput = {
limit?: unknown;
type?: unknown;
sinceSeq?: unknown;
};
export type NormalizedDiagnosticStabilityQuery = {
limit: number;
type: string | undefined;
sinceSeq: number | undefined;
};
type DiagnosticStabilityState = {
records: Array<DiagnosticStabilityEventRecord | undefined>;
capacity: number;
nextIndex: number;
count: number;
dropped: number;
unsubscribe: (() => void) | null;
};
function createState(capacity = DEFAULT_DIAGNOSTIC_STABILITY_CAPACITY): DiagnosticStabilityState {
return {
records: Array.from<DiagnosticStabilityEventRecord | undefined>({ length: capacity }),
capacity,
nextIndex: 0,
count: 0,
dropped: 0,
unsubscribe: null,
};
}
function getDiagnosticStabilityState(): DiagnosticStabilityState {
const globalStore = globalThis as typeof globalThis & {
__openclawDiagnosticStabilityState?: DiagnosticStabilityState;
};
globalStore.__openclawDiagnosticStabilityState ??= createState();
return globalStore.__openclawDiagnosticStabilityState;
}
function copyMemory(memory: DiagnosticMemoryUsage): DiagnosticMemoryUsage {
return { ...memory };
}
function copyReasonCode(reason: string | undefined): string | undefined {
if (!reason || !SAFE_REASON_CODE.test(reason)) {
return undefined;
}
return reason;
}
function assignReasonCode(
record: DiagnosticStabilityEventRecord,
reason: string | undefined,
): void {
const reasonCode = copyReasonCode(reason);
if (reasonCode) {
record.reason = reasonCode;
}
}
function isRecord(
record: DiagnosticStabilityEventRecord | undefined,
): record is DiagnosticStabilityEventRecord {
return record !== undefined;
}
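// Projects an event onto a per-type allowlist of fields. Anything not copied here
// (chat/session ids, messages, raw error text) never reaches the recorder.
function sanitizeDiagnosticEvent(event: DiagnosticEventPayload): DiagnosticStabilityEventRecord {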
const record: DiagnosticStabilityEventRecord = {
seq: event.seq,
ts: event.ts,
type: event.type,
};
switch (event.type) {
case "model.usage":
record.channel = event.channel;
record.provider = event.provider;
record.model = event.model;
record.usage = { ...event.usage };
record.context = event.context ? { ...event.context } : undefined;
record.costUsd = event.costUsd;
record.durationMs = event.durationMs;
break;
case "webhook.received":
record.channel = event.channel;
break;
case "webhook.processed":
record.channel = event.channel;
record.durationMs = event.durationMs;
break;
case "webhook.error":
record.channel = event.channel;
break;
case "message.queued":
record.channel = event.channel;
record.source = event.source;
record.queueDepth = event.queueDepth;
break;
case "message.processed":
record.channel = event.channel;
record.durationMs = event.durationMs;
record.outcome = event.outcome;
assignReasonCode(record, event.reason);
break;
case "session.state":
record.outcome = event.state;
assignReasonCode(record, event.reason);
record.queueDepth = event.queueDepth;
break;
case "session.stuck":
record.outcome = event.state;
record.ageMs = event.ageMs;
record.queueDepth = event.queueDepth;
break;
case "queue.lane.enqueue":
record.source = event.lane;
record.queueSize = event.queueSize;
break;
case "queue.lane.dequeue":
record.source = event.lane;
record.queueSize = event.queueSize;
record.waitMs = event.waitMs;
break;
case "run.attempt":
record.count = event.attempt;
break;
case "diagnostic.heartbeat":
record.webhooks = { ...event.webhooks };
record.active = event.active;
record.waiting = event.waiting;
record.queued = event.queued;
break;
case "tool.loop":
record.toolName = event.toolName;
record.level = event.level;
record.action = event.action;
record.detector = event.detector;
record.count = event.count;
record.pairedToolName = event.pairedToolName;
break;
case "diagnostic.memory.sample":
record.memory = copyMemory(event.memory);
break;
case "diagnostic.memory.pressure":
record.level = event.level;
assignReasonCode(record, event.reason);
record.memory = copyMemory(event.memory);
record.thresholdBytes = event.thresholdBytes;
record.rssGrowthBytes = event.rssGrowthBytes;
record.windowMs = event.windowMs;
break;
case "payload.large":
record.surface = event.surface;
record.action = event.action;
record.bytes = event.bytes;
record.limitBytes = event.limitBytes;
record.count = event.count;
record.channel = event.channel;
record.pluginId = event.pluginId;
assignReasonCode(record, event.reason);
break;
}
return record;
}
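// Ring-buffer write: once the buffer is full, each append overwrites the oldest slot
// and increments `dropped`.
function appendRecord(record: DiagnosticStabilityEventRecord): void {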
const state = getDiagnosticStabilityState();
state.records[state.nextIndex] = record;
state.nextIndex = (state.nextIndex + 1) % state.capacity;
if (state.count < state.capacity) {
state.count += 1;
return;
}
state.dropped += 1;
}
// Returns records oldest first; when the buffer is full the oldest record sits at nextIndex.
function listRecords(): DiagnosticStabilityEventRecord[] {
const state = getDiagnosticStabilityState();
if (state.count === 0) {
return [];
}
if (state.count < state.capacity) {
return state.records.slice(0, state.count).filter(isRecord);
}
return [
...state.records.slice(state.nextIndex),
...state.records.slice(0, state.nextIndex),
].filter(isRecord);
}
function summarizeRecords(
records: DiagnosticStabilityEventRecord[],
): DiagnosticStabilitySnapshot["summary"] {
const byType: Record<string, number> = {};
let latestMemory: DiagnosticMemoryUsage | undefined;
let maxRssBytes: number | undefined;
let maxHeapUsedBytes: number | undefined;
let pressureCount = 0;
const payloadLarge = {
count: 0,
rejected: 0,
truncated: 0,
chunked: 0,
bySurface: {} as Record<string, number>,
};
for (const record of records) {
byType[record.type] = (byType[record.type] ?? 0) + 1;
if (record.memory) {
latestMemory = record.memory;
maxRssBytes =
maxRssBytes === undefined
? record.memory.rssBytes
: Math.max(maxRssBytes, record.memory.rssBytes);
maxHeapUsedBytes =
maxHeapUsedBytes === undefined
? record.memory.heapUsedBytes
: Math.max(maxHeapUsedBytes, record.memory.heapUsedBytes);
}
if (record.type === "diagnostic.memory.pressure") {
pressureCount += 1;
}
if (record.type === "payload.large") {
payloadLarge.count += 1;
if (record.action === "rejected") {
payloadLarge.rejected += 1;
} else if (record.action === "truncated") {
payloadLarge.truncated += 1;
} else if (record.action === "chunked") {
payloadLarge.chunked += 1;
}
const surface = record.surface ?? "unknown";
payloadLarge.bySurface[surface] = (payloadLarge.bySurface[surface] ?? 0) + 1;
}
}
return {
byType,
...(latestMemory || pressureCount > 0
? {
memory: {
latest: latestMemory,
maxRssBytes,
maxHeapUsedBytes,
pressureCount,
},
}
: {}),
...(payloadLarge.count > 0 ? { payloadLarge } : {}),
};
}
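// `filtered` holds every record matching type and sinceSeq (exclusive);
// `events` is the newest `limit` entries of `filtered`.
function selectRecords(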
records: DiagnosticStabilityEventRecord[],
options?: {
limit?: number;
type?: string;
sinceSeq?: number;
},
): {
filtered: DiagnosticStabilityEventRecord[];
events: DiagnosticStabilityEventRecord[];
} {
const { limit, type, sinceSeq } = normalizeDiagnosticStabilityQuery(options);
const filtered = records.filter((record) => {
if (type && record.type !== type) {
return false;
}
if (sinceSeq !== undefined && record.seq <= sinceSeq) {
return false;
}
return true;
});
return {
filtered,
events: filtered.slice(Math.max(0, filtered.length - limit)),
};
}
function parseOptionalNonNegativeInteger(value: unknown, field: string): number | undefined {
if (value === undefined || value === null || value === "") {
return undefined;
}
const parsed =
typeof value === "number" ? value : typeof value === "string" ? Number(value) : NaN;
if (!Number.isInteger(parsed) || parsed < 0) {
throw new Error(`${field} must be a non-negative integer`);
}
return parsed;
}
function parseOptionalType(value: unknown): string | undefined {
if (value === undefined || value === null || value === "") {
return undefined;
}
if (typeof value !== "string" || value.trim() === "") {
throw new Error("type must be a non-empty string");
}
return value.trim();
}
function normalizeLimit(limit: unknown, defaultLimit = DEFAULT_DIAGNOSTIC_STABILITY_LIMIT): number {
const parsed = parseOptionalNonNegativeInteger(limit, "limit");
if (parsed === undefined) {
return defaultLimit;
}
if (parsed < 1 || parsed > MAX_DIAGNOSTIC_STABILITY_LIMIT) {
throw new Error(`limit must be between 1 and ${MAX_DIAGNOSTIC_STABILITY_LIMIT}`);
}
return parsed;
}
export function normalizeDiagnosticStabilityQuery(
input: DiagnosticStabilityQueryInput = {},
options?: { defaultLimit?: number },
): NormalizedDiagnosticStabilityQuery {
return {
limit: normalizeLimit(input.limit, options?.defaultLimit),
type: parseOptionalType(input.type),
sinceSeq: parseOptionalNonNegativeInteger(input.sinceSeq, "sinceSeq"),
};
}
export function startDiagnosticStabilityRecorder(): void {
const state = getDiagnosticStabilityState();
if (state.unsubscribe) {
return;
}
state.unsubscribe = onDiagnosticEvent((event) => {
appendRecord(sanitizeDiagnosticEvent(event));
});
}
export function stopDiagnosticStabilityRecorder(): void {
const state = getDiagnosticStabilityState();
state.unsubscribe?.();
state.unsubscribe = null;
}
export function getDiagnosticStabilitySnapshot(options?: {
limit?: number;
type?: string;
sinceSeq?: number;
}): DiagnosticStabilitySnapshot {
const state = getDiagnosticStabilityState();
const { filtered, events } = selectRecords(listRecords(), options);
return {
generatedAt: new Date().toISOString(),
capacity: state.capacity,
count: filtered.length,
dropped: state.dropped,
firstSeq: filtered[0]?.seq,
lastSeq: filtered.at(-1)?.seq,
events,
summary: summarizeRecords(filtered),
};
}
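// Re-filters an already captured snapshot. count, firstSeq, lastSeq, and summary are
// recomputed from the matching events; the source snapshot is not mutated.
export function selectDiagnosticStabilitySnapshot(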
snapshot: DiagnosticStabilitySnapshot,
options?: {
limit?: number;
type?: string;
sinceSeq?: number;
},
): DiagnosticStabilitySnapshot {
const { filtered, events } = selectRecords(snapshot.events, options);
return {
...snapshot,
count: filtered.length,
firstSeq: filtered[0]?.seq,
lastSeq: filtered.at(-1)?.seq,
events,
summary: summarizeRecords(filtered),
};
}
export function resetDiagnosticStabilityRecorderForTest(): void {
const state = getDiagnosticStabilityState();
state.unsubscribe?.();
const next = createState(state.capacity);
const globalStore = globalThis as typeof globalThis & {
__openclawDiagnosticStabilityState?: DiagnosticStabilityState;
};
globalStore.__openclawDiagnosticStabilityState = next;
}
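The recorder's `appendRecord` and `listRecords` together form a fixed-capacity ring buffer. The sketch below restates that technique as a standalone class, assuming nothing beyond what the functions above do; `RingBuffer` and its member names are illustrative and not part of the module.

```typescript
// Hypothetical standalone ring buffer mirroring the append/list logic of the recorder.
class RingBuffer<T> {
  private readonly slots: Array<T | undefined>;
  private readonly capacity: number;
  private nextIndex = 0;
  count = 0;
  dropped = 0;

  constructor(capacity: number) {
    this.capacity = capacity;
    this.slots = Array.from<T | undefined>({ length: capacity });
  }

  // Overwrites the oldest slot once full and counts the overwrite as dropped.
  push(item: T): void {
    this.slots[this.nextIndex] = item;
    this.nextIndex = (this.nextIndex + 1) % this.capacity;
    if (this.count < this.capacity) {
      this.count += 1;
      return;
    }
    this.dropped += 1;
  }

  // Returns items oldest first; when full, the oldest item sits at nextIndex.
  toArray(): T[] {
    const ordered =
      this.count < this.capacity
        ? this.slots.slice(0, this.count)
        : [...this.slots.slice(this.nextIndex), ...this.slots.slice(0, this.nextIndex)];
    return ordered.filter((item): item is T => item !== undefined);
  }
}

const ring = new RingBuffer<number>(3);
for (let seq = 1; seq <= 5; seq += 1) {
  ring.push(seq);
}
const orderedItems = ring.toArray();
```

Pushing five items into a capacity of three leaves the three newest in order and records two drops, which is the same shape the "keeps the newest events when capacity is exceeded" test checks at a capacity of 1000.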

View File

@@ -0,0 +1,658 @@
import fs from "node:fs";
import os from "node:os";
import path from "node:path";
import JSZip from "jszip";
import { afterEach, beforeEach, describe, expect, it, vi } from "vitest";
import { emitDiagnosticEvent, resetDiagnosticEventsForTest } from "../infra/diagnostic-events.js";
import {
resetDiagnosticStabilityBundleForTest,
writeDiagnosticStabilityBundleSync,
} from "./diagnostic-stability-bundle.js";
import {
resetDiagnosticStabilityRecorderForTest,
startDiagnosticStabilityRecorder,
stopDiagnosticStabilityRecorder,
} from "./diagnostic-stability.js";
import { writeDiagnosticSupportExport } from "./diagnostic-support-export.js";
import {
redactSupportString,
redactTextForSupport,
sanitizeSupportConfigValue,
sanitizeSupportSnapshotValue,
} from "./diagnostic-support-redaction.js";
import type { LogTailPayload } from "./log-tail.js";
async function readZipTextEntries(file: string): Promise<Record<string, string>> {
const zip = await JSZip.loadAsync(fs.readFileSync(file));
const entries: Record<string, string> = {};
for (const [name, entry] of Object.entries(zip.files)) {
if (!entry.dir) {
entries[name] = await entry.async("string");
}
}
return entries;
}
describe("diagnostic support export", () => {
let tempDir: string;
beforeEach(() => {
tempDir = fs.mkdtempSync(path.join(os.tmpdir(), "openclaw-support-export-"));
resetDiagnosticEventsForTest();
resetDiagnosticStabilityRecorderForTest();
resetDiagnosticStabilityBundleForTest();
});
afterEach(() => {
stopDiagnosticStabilityRecorder();
resetDiagnosticEventsForTest();
resetDiagnosticStabilityRecorderForTest();
resetDiagnosticStabilityBundleForTest();
fs.rmSync(tempDir, { recursive: true, force: true });
});
it("writes a shareable zip without raw chats, webhook bodies, or secrets", async () => {
const fakeToken = "sk-test-support-export-secret-token-1234567890";
const fakeAwsKey = ["AKIA", "IOSFODNN7EXAMPLE"].join("");
const fakeJwt = [
"eyJhbGciOiJIUzI1NiIs",
"eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4i",
"SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c",
].join(".");
const privateChat = "private user said diagnose my bank transfer";
const webhookBody = "raw webhook body with message contents";
const credentialUrl =
"wss://support-user:support-password@gateway.example/ws?token=short-token&ok=1";
const configPath = path.join(tempDir, "openclaw.json");
fs.writeFileSync(
configPath,
JSON.stringify(
{
gateway: {
mode: "local",
bind: "loopback",
port: 18789,
auth: {
mode: "token",
token: fakeToken,
},
},
logging: {
redactSensitive: "off",
},
channels: {
telegram: {
accounts: {
"15555551212": {
botToken: fakeToken,
allowFrom: [privateChat],
ownerId: 8675309001,
},
},
},
},
agents: [{ name: "personal-agent", instructions: privateChat }],
},
null,
2,
),
"utf8",
);
startDiagnosticStabilityRecorder();
emitDiagnosticEvent({
type: "webhook.error",
channel: "telegram",
chatId: "15555551212",
error: webhookBody,
});
emitDiagnosticEvent({
type: "payload.large",
surface: "gateway.http.json",
action: "rejected",
bytes: 2048,
limitBytes: 1024,
reason: "json_body_limit",
});
const bundle = writeDiagnosticStabilityBundleSync({
reason: "gateway.restart_startup_failed",
stateDir: tempDir,
now: new Date("2026-04-22T12:00:00.000Z"),
});
expect(bundle.status).toBe("written");
const logTail: LogTailPayload = {
file: path.join(tempDir, "logs", "openclaw.log"),
cursor: 200,
size: 200,
truncated: false,
reset: false,
lines: [
JSON.stringify({
time: "2026-04-22T12:00:00.000Z",
level: "info",
subsystem: "gateway",
component: "gateway/server",
channel: "telegram",
sessionId: "gateway-session-15555551212",
sessionKey: "matrix:!supportRoom:matrix.example.com:$supportEventSecret",
msg: `gateway websocket listening at ${credentialUrl} Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ== ${fakeAwsKey} ${fakeJwt} Cookie: sid=secret`,
hostname: "support-host",
message: privateChat,
body: webhookBody,
authorization: `Bearer ${fakeToken}`,
statusCode: 200,
}),
JSON.stringify({
"0": JSON.stringify({ module: "matrix-auto-reply" }),
"1": "matrix logged in as @support-user:matrix.example.com",
_meta: {
logLevelName: "info",
name: JSON.stringify({
module: "matrix-auto-reply",
storePath: path.join(tempDir, "cron", "jobs.json"),
}),
hostname: "support-host",
},
time: "2026-04-22T12:00:00.100Z",
}),
JSON.stringify({
time: "2026-04-22T12:00:00.200Z",
level: "info",
component: "gateway/server",
msg: "user said structured secret payload",
}),
JSON.stringify({
"0": JSON.stringify({ subsystem: "gateway/channels/matrix" }),
"1": privateChat,
_meta: {
logLevelName: "warn",
name: "gateway-runtime",
hostname: "support-host",
},
time: "2026-04-22T12:00:00.300Z",
}),
`plain fallback ${privateChat} ${fakeToken}`,
],
};
let requestedLogTail: { limit?: number; maxBytes?: number } | undefined;
const outputPath = path.join(tempDir, "support.zip");
const result = await writeDiagnosticSupportExport({
env: {
...process.env,
HOME: tempDir,
OPENCLAW_STATE_DIR: tempDir,
},
stateDir: tempDir,
outputPath,
now: new Date("2026-04-22T12:00:01.000Z"),
readLogTail: async (params) => {
requestedLogTail = params;
return logTail;
},
readStatusSnapshot: async () => ({
service: {
loaded: true,
command: {
programArguments: ["openclaw", "gateway", "run", "--token", fakeToken],
environment: {
HOME: tempDir,
OPENCLAW_GATEWAY_TOKEN: fakeToken,
},
},
},
gateway: {
probeUrl: credentialUrl,
},
warning: {
chatId: 4444555566,
message: privateChat,
},
}),
readHealthSnapshot: async () => ({
ok: true,
channels: {
telegram: {
accounts: {
"15555551212": {
accountId: 15555551212,
configured: true,
phone: 4444555566,
probe: {
ok: false,
error: webhookBody,
},
},
},
},
},
}),
});
expect(result.path).toBe(outputPath);
expect(result.bytes).toBeGreaterThan(0);
expect(requestedLogTail).toMatchObject({
limit: 5000,
maxBytes: 1_000_000,
});
const entries = await readZipTextEntries(outputPath);
expect(Object.keys(entries).toSorted()).toEqual([
"config/sanitized.json",
"config/shape.json",
"diagnostics.json",
"health/gateway-health.json",
"logs/openclaw-sanitized.jsonl",
"manifest.json",
"stability/latest.json",
"status/gateway-status.json",
"summary.md",
]);
const combined = Object.values(entries).join("\n");
expect(combined).not.toContain(fakeToken);
expect(combined).not.toContain(privateChat);
expect(combined).not.toContain(webhookBody);
expect(combined).not.toContain("15555551212");
expect(combined).not.toContain("4444555566");
expect(combined).not.toContain("8675309001");
expect(combined).not.toContain("support-password");
expect(combined).not.toContain("short-token");
expect(combined).not.toContain(tempDir);
expect(combined).not.toContain("cron/jobs.json");
expect(combined).not.toContain(os.hostname());
expect(combined).not.toContain("QWxhZGRpbjpvcGVuIHNlc2FtZQ==");
expect(combined).not.toContain("sid=secret");
expect(combined).not.toContain("structured secret payload");
expect(combined).not.toContain("gateway-session-15555551212");
expect(combined).not.toContain("supportEventSecret");
expect(combined).not.toContain(fakeAwsKey);
expect(combined).not.toContain(fakeJwt);
expect(combined).toContain("payload.large");
expect(combined).toContain("gateway.http.json");
expect(combined).toContain("$OPENCLAW_STATE_DIR");
expect(combined).toContain("<redacted-hostname>");
expect(combined).toContain("gateway-status.json");
expect(combined).toContain("gateway-health.json");
expect(combined).toContain("Attach this zip to the bug report");
const sanitizedLogs = entries["logs/openclaw-sanitized.jsonl"];
expect(sanitizedLogs).toContain('"subsystem":"gateway"');
expect(sanitizedLogs).toContain('"component":"gateway/server"');
expect(sanitizedLogs).toContain('"channel":"telegram"');
expect(sanitizedLogs).not.toContain("sessionId");
expect(sanitizedLogs).not.toContain("sessionKey");
expect(sanitizedLogs).toContain("gateway websocket listening");
expect(sanitizedLogs).toContain(
"wss://<redacted>:<redacted>@gateway.example/ws?token=<redacted>",
);
expect(sanitizedLogs).toContain("Basic <redacted>");
expect(sanitizedLogs).toContain("Cookie: <redacted>");
expect(sanitizedLogs).toContain("<redacted-aws-key>");
expect(sanitizedLogs).toContain("<redacted-jwt>");
expect(sanitizedLogs).toContain('"module":"matrix-auto-reply"');
expect(sanitizedLogs).toContain('"subsystem":"gateway/channels/matrix"');
expect(sanitizedLogs).toContain('"logger":"gateway-runtime"');
expect(sanitizedLogs).toContain('"level":"warn"');
expect(sanitizedLogs).toContain("matrix logged in as <redacted-matrix-user>");
expect(sanitizedLogs).toContain('"omitted":"log-message"');
expect(sanitizedLogs).toContain('"omittedLogMessageBytes"');
expect(sanitizedLogs).toContain('"omittedLogMessageCount"');
expect(sanitizedLogs).not.toContain("private user said");
expect(sanitizedLogs).not.toContain("@support-user:matrix.example.com");
expect(sanitizedLogs).not.toContain("support-host");
expect(sanitizedLogs).toContain('"omitted":"unparsed"');
const status = JSON.parse(entries["status/gateway-status.json"] ?? "{}") as {
data?: {
service?: {
command?: {
programArguments?: string[];
environment?: Record<string, string>;
};
};
};
};
expect(status.data?.service?.command?.programArguments).toEqual([
"openclaw",
"gateway",
"run",
"--token",
"<redacted>",
]);
expect(status.data?.service?.command?.environment?.OPENCLAW_GATEWAY_TOKEN).toBe("<redacted>");
expect(JSON.stringify(status)).toContain(
"wss://<redacted>:<redacted>@gateway.example/ws?token=<redacted>",
);
const health = JSON.parse(entries["health/gateway-health.json"] ?? "{}") as {
data?: {
channels?: {
telegram?: {
accounts?: { count?: number };
};
};
};
};
expect(health.data?.channels?.telegram?.accounts).toEqual({ count: 1 });
const configShape = JSON.parse(entries["config/shape.json"] ?? "{}") as {
gateway?: { mode?: string; authMode?: string };
channels?: { ids?: string[] };
};
expect(configShape.gateway).toMatchObject({
mode: "local",
authMode: "token",
});
expect(configShape.channels?.ids).toEqual(["telegram"]);
const sanitizedConfig = JSON.parse(entries["config/sanitized.json"] ?? "{}") as {
gateway?: {
mode?: string;
port?: number;
auth?: {
mode?: string;
token?: string;
};
};
channels?: {
telegram?: {
accounts?: Record<
string,
{ botToken?: string; allowFrom?: { redacted?: boolean }; ownerId?: string }
>;
};
};
logging?: {
redactSensitive?: string;
};
agents?: Array<{ name?: string; instructions?: string }>;
};
expect(sanitizedConfig.gateway).toMatchObject({
mode: "local",
port: 18789,
auth: {
mode: "token",
token: "<redacted>",
},
});
expect(sanitizedConfig.logging).toMatchObject({
redactSensitive: "off",
});
expect(Object.keys(sanitizedConfig.channels?.telegram?.accounts ?? {})).toEqual([
"<redacted-account-1>",
]);
const sanitizedTelegramAccount =
sanitizedConfig.channels?.telegram?.accounts?.["<redacted-account-1>"];
expect(sanitizedTelegramAccount?.botToken).toBe("<redacted>");
expect(sanitizedTelegramAccount?.allowFrom).toEqual({ redacted: true, count: 1 });
expect(sanitizedTelegramAccount?.ownerId).toBe("<redacted>");
expect(sanitizedConfig.agents?.[0]?.name).toBe("personal-agent");
expect(sanitizedConfig.agents?.[0]?.instructions).toBe("<redacted>");
});
it("redacts numeric private fields in support snapshots and config", () => {
const redaction = {
env: {
HOME: tempDir,
OPENCLAW_STATE_DIR: tempDir,
},
stateDir: tempDir,
};
expect(sanitizeSupportSnapshotValue(15555551212, redaction, "chatId")).toBe("<redacted>");
expect(sanitizeSupportSnapshotValue(15555551212, redaction, "messageId")).toBe("<redacted>");
expect(sanitizeSupportSnapshotValue(200, redaction, "statusCode")).toBe(200);
expect(sanitizeSupportConfigValue(15555551212, redaction, "ownerId")).toBe("<redacted>");
expect(sanitizeSupportConfigValue(18789, redaction, "port")).toBe(18789);
});
it("blocks prototype keys and caps support sanitizer width", () => {
const redaction = {
env: {
HOME: tempDir,
OPENCLAW_STATE_DIR: tempDir,
},
stateDir: tempDir,
};
const wideSnapshot: Record<string, unknown> = {
["__proto__"]: "polluted",
constructor: "polluted",
prototype: "polluted",
};
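// 1005 generated fields plus the three blocked keys above give 1008 input keys,
// which exceeds the 1000-key limit asserted below.
for (let index = 0; index < 1005; index += 1) {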
wideSnapshot[`field${String(index).padStart(4, "0")}`] = index;
}
const snapshot = sanitizeSupportSnapshotValue(wideSnapshot, redaction) as Record<
string,
unknown
>;
expect(Object.getPrototypeOf(snapshot)).toBe(null);
expect(snapshot.__proto__).toBeUndefined();
expect(snapshot.constructor).toBeUndefined();
expect(snapshot.prototype).toBeUndefined();
expect(snapshot.field0000).toBe(0);
expect(snapshot.field0999).toBe(999);
expect(snapshot.field1000).toBeUndefined();
expect(snapshot["<truncated>"]).toEqual({
truncated: true,
count: 1008,
limit: 1000,
});
const array = sanitizeSupportConfigValue(
Array.from({ length: 1005 }, (_entry, index) => ({ name: `item-${index}` })),
redaction,
) as Record<string, unknown>;
expect(Array.isArray(array)).toBe(false);
expect((array.items as unknown[]).length).toBe(1000);
expect(array.truncated).toBe(true);
expect(array.count).toBe(1005);
expect(array.limit).toBe(1000);
});
it("redacts support text identifiers without hiding useful URL hosts", () => {
const fakeAwsKey = ["ASIA", "IOSFODNN7EXAMPLE"].join("");
const fakeJwt = [
"eyJhbGciOiJIUzI1NiIs",
"eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4i",
"SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c",
].join(".");
const cases = [
[
"connect wss://support-user:support-password@gateway.example/ws?token=short-token&ok=1",
"connect wss://<redacted>:<redacted>@gateway.example/ws?token=<redacted>",
],
[
"connect https://gateway.example/ws?access-token=short-token",
"connect https://gateway.example/ws?access-token=<redacted>",
],
[
"connect https://gateway.example/ws?hook-token=hook-secret",
"connect https://gateway.example/ws?hook-token=<redacted>",
],
["connect https://token@gateway.example/ws", "connect https://<redacted>@gateway.example/ws"],
["auth Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==", "auth Basic <redacted>"],
["Cookie: sid=secret; theme=light", "Cookie: <redacted>"],
[`aws ${fakeAwsKey}`, "aws <redacted-aws-key>"],
[`jwt ${fakeJwt}`, "jwt <redacted-jwt>"],
["email alice@example.com", "email <redacted-email>"],
["matrix @support-user:matrix.example.com", "matrix <redacted-matrix-user>"],
["room !support-room:matrix.example.com", "room <redacted-matrix-room>"],
["event $F0Zlxky8bavuqH6MK75Av_c7UWFLp550WTQ1EA-F0KM", "event <redacted-matrix-event>"],
["notify @support_bot now", "notify <redacted-handle> now"],
["phone 15555551212", "phone <redacted-id>"],
] as const;
for (const [input, expected] of cases) {
expect(redactTextForSupport(input)).toBe(expected);
}
});
it("redacts Windows USERPROFILE paths when HOME is unset", () => {
const userProfile = "C:\\Users\\support-user";
const stateDir = `${userProfile}\\AppData\\Roaming\\openclaw`;
const redaction = {
env: {
USERPROFILE: userProfile,
OPENCLAW_STATE_DIR: stateDir,
},
stateDir,
};
expect(redactSupportString(`${stateDir}\\logs\\gateway.log`, redaction)).toBe(
"$OPENCLAW_STATE_DIR\\logs\\gateway.log",
);
expect(
redactSupportString(`failed at ${userProfile}\\Documents\\snapshot-error.txt`, redaction),
).toBe("failed at ~\\Documents\\snapshot-error.txt");
expect(
redactSupportString(
"failed at c:\\users\\support-user\\Documents\\snapshot-error.txt",
redaction,
),
).toBe("failed at ~\\Documents\\snapshot-error.txt");
const status = sanitizeSupportSnapshotValue(
{
service: {
command: {
programArguments: [
"node",
`${userProfile}\\openclaw\\dist\\index.js`,
"--config",
`${stateDir}\\openclaw.json`,
],
sourcePath: "c:\\users\\support-user\\AppData\\Local\\openclaw\\gateway-service.json",
},
},
},
redaction,
);
const serialized = JSON.stringify(status);
expect(serialized).not.toContain("support-user");
expect(serialized).toContain("~\\\\openclaw\\\\dist\\\\index.js");
expect(serialized).toContain("$OPENCLAW_STATE_DIR\\\\openclaw.json");
expect(serialized).toContain("~\\\\AppData\\\\Local\\\\openclaw\\\\gateway-service.json");
});
it("keeps writing when status and health snapshots fail", async () => {
const fakeToken = "sk-test-support-export-secret-token-1234567890";
const outputPath = path.join(tempDir, "support-failed-snapshots.zip");
await writeDiagnosticSupportExport({
env: {
...process.env,
HOME: tempDir,
OPENCLAW_STATE_DIR: tempDir,
},
stateDir: tempDir,
outputPath,
now: new Date("2026-04-22T12:00:01.000Z"),
readLogTail: async () => ({
file: path.join(tempDir, "logs", "openclaw.log"),
cursor: 0,
size: 0,
truncated: false,
reset: false,
lines: [],
}),
readStatusSnapshot: async () => {
throw new Error(`status failed with token ${fakeToken}`);
},
readHealthSnapshot: async () => {
throw new Error("health failed with PASSWORD=hunter2");
},
});
const entries = await readZipTextEntries(outputPath);
expect(Object.keys(entries).toSorted()).toContain("status/gateway-status.json");
expect(Object.keys(entries).toSorted()).toContain("health/gateway-health.json");
const combined = Object.values(entries).join("\n");
expect(combined).not.toContain(fakeToken);
expect(combined).not.toContain("hunter2");
expect(combined).toContain('"status": "failed"');
expect(combined).toContain("status snapshot failed");
expect(combined).toContain("health snapshot failed");
});
it("keeps writing when log tail collection fails", async () => {
const fakeToken = "sk-test-log-tail-secret-token-1234567890";
const outputPath = path.join(tempDir, "support-failed-log-tail.zip");
await writeDiagnosticSupportExport({
env: {
...process.env,
HOME: tempDir,
OPENCLAW_STATE_DIR: tempDir,
},
stateDir: tempDir,
outputPath,
now: new Date("2026-04-22T12:00:02.000Z"),
readLogTail: async () => {
throw new Error(`log tail failed at ${tempDir}/openclaw.log with token ${fakeToken}`);
},
});
const entries = await readZipTextEntries(outputPath);
expect(Object.keys(entries).toSorted()).toContain("logs/openclaw-sanitized.jsonl");
const combined = Object.values(entries).join("\n");
expect(combined).not.toContain(fakeToken);
expect(combined).not.toContain(tempDir);
expect(combined).toContain("log-tail-read-failed");
expect(combined).toContain("sanitized log tail unavailable");
});
it("keeps writing when config stat fails", async () => {
const fakeToken = "sk-test-config-stat-secret-token-1234567890";
const configPath = path.join(tempDir, "openclaw.json");
const outputPath = path.join(tempDir, "support-failed-config-stat.zip");
fs.writeFileSync(configPath, "{}\n", "utf8");
const originalStatSync = fs.statSync.bind(fs);
const statSpy = vi.spyOn(fs, "statSync").mockImplementation((target, options) => {
if (target === configPath) {
throw new Error(`config stat failed with token ${fakeToken}`);
}
return originalStatSync(target, options as never);
});
try {
await writeDiagnosticSupportExport({
env: {
...process.env,
HOME: tempDir,
OPENCLAW_CONFIG_PATH: configPath,
OPENCLAW_STATE_DIR: tempDir,
},
stateDir: tempDir,
outputPath,
now: new Date("2026-04-22T12:00:03.000Z"),
readLogTail: async () => ({
file: path.join(tempDir, "logs", "openclaw.log"),
cursor: 0,
size: 0,
truncated: false,
reset: false,
lines: [],
}),
});
} finally {
statSpy.mockRestore();
}
const entries = await readZipTextEntries(outputPath);
const combined = Object.values(entries).join("\n");
expect(Object.keys(entries).toSorted()).toContain("config/shape.json");
expect(combined).not.toContain(fakeToken);
expect(combined).toContain('"parseOk": false');
expect(combined).toContain("config stat failed with token");
expect(combined).toContain("Attach this zip to the bug report");
});
});
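The URL cases in the redaction test above (userinfo credentials and token-like query parameters) can be illustrated with a minimal sketch. This is not the `redactTextForSupport` implementation from this change: the function name `redactUrlSecretsSketch` and both patterns are assumptions made for illustration. Unlike the expectation in the test, which drops the trailing `&ok=1`, this sketch keeps later query parameters.

```typescript
// Hypothetical sketch only; names and patterns are NOT taken from this change.

// Matches "scheme://user@" or "scheme://user:password@" inside free text.
const USERINFO_RE = /\b([a-z][a-z0-9+.-]*:\/\/)([^\s\/@:]+)(?::([^\s\/@]+))?@/giu;

// Matches query parameters whose name ends in "token", or is "password"/"secret".
const SECRET_QUERY_RE = /([?&](?:[a-z-]*token|password|secret)=)[^&\s]+/giu;

export function redactUrlSecretsSketch(text: string): string {
  return text
    .replace(
      USERINFO_RE,
      (_match: string, scheme: string, _user: string, password: string | undefined) =>
        // Keep the scheme and host visible; hide only the credentials.
        password === undefined ? `${scheme}<redacted>@` : `${scheme}<redacted>:<redacted>@`,
    )
    .replace(SECRET_QUERY_RE, "$1<redacted>");
}
```

The host stays readable in the output, which matches the stated goal of the test: identifiers are hidden without hiding useful URL hosts.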

@@ -0,0 +1,723 @@
import fs from "node:fs";
import path from "node:path";
import process from "node:process";
import JSZip from "jszip";
import { parseConfigJson5 } from "../config/io.js";
import { resolveConfigPath, resolveStateDir } from "../config/paths.js";
import { redactConfigObject } from "../config/redact-snapshot.js";
import { resolveHomeRelativePath } from "../infra/home-dir.js";
import { VERSION } from "../version.js";
import {
readDiagnosticStabilityBundleFileSync,
readLatestDiagnosticStabilityBundleSync,
type ReadDiagnosticStabilityBundleResult,
} from "./diagnostic-stability-bundle.js";
import { sanitizeSupportLogRecord } from "./diagnostic-support-log-redaction.js";
import {
redactPathForSupport,
redactSupportString,
redactTextForSupport,
sanitizeSupportConfigValue,
sanitizeSupportSnapshotValue,
type SupportRedactionContext,
} from "./diagnostic-support-redaction.js";
import { readConfiguredLogTail, type LogTailPayload } from "./log-tail.js";
export const DIAGNOSTIC_SUPPORT_EXPORT_VERSION = 1;
const DEFAULT_LOG_LIMIT = 5000;
const DEFAULT_LOG_MAX_BYTES = 1_000_000;
const SUPPORT_EXPORT_PREFIX = "openclaw-diagnostics-";
const SUPPORT_EXPORT_SUFFIX = ".zip";
type Awaitable<T> = T | Promise<T>;
type SupportSnapshotReader = () => Awaitable<unknown>;
export type DiagnosticSupportExportOptions = {
outputPath?: string;
cwd?: string;
env?: NodeJS.ProcessEnv;
stateDir?: string;
now?: Date;
logLimit?: number;
logMaxBytes?: number;
stabilityBundle?: string | false;
readLogTail?: typeof readConfiguredLogTail;
readStatusSnapshot?: SupportSnapshotReader;
readHealthSnapshot?: SupportSnapshotReader;
};
export type DiagnosticSupportExportManifest = {
version: typeof DIAGNOSTIC_SUPPORT_EXPORT_VERSION;
generatedAt: string;
openclawVersion: string;
platform: NodeJS.Platform;
arch: string;
node: string;
stateDir: string;
contents: Array<{
path: string;
mediaType: string;
bytes: number;
}>;
privacy: {
payloadFree: true;
rawLogsIncluded: false;
notes: string[];
};
};
export type DiagnosticSupportExportFile = {
path: string;
mediaType: string;
content: string;
};
export type DiagnosticSupportExportArtifact = {
manifest: DiagnosticSupportExportManifest;
files: DiagnosticSupportExportFile[];
};
export type WriteDiagnosticSupportExportResult = {
path: string;
bytes: number;
manifest: DiagnosticSupportExportManifest;
};
type ConfigShape = {
path: string;
exists: boolean;
parseOk: boolean;
bytes?: number;
mtime?: string;
error?: string;
topLevelKeys: string[];
gateway?: {
mode?: unknown;
bind?: unknown;
port?: unknown;
authMode?: unknown;
tailscale?: unknown;
};
channels?: {
count: number;
ids: string[];
};
plugins?: {
count: number;
ids: string[];
};
agents?: {
count: number;
};
};
type ConfigExport = {
shape: ConfigShape;
sanitized?: unknown;
};
type IncludedSanitizedLogTail = {
status: "included";
file: string;
cursor: number;
size: number;
lineCount: number;
truncated: boolean;
reset: boolean;
lines: Array<Record<string, unknown>>;
};
type FailedSanitizedLogTail = Omit<IncludedSanitizedLogTail, "status"> & {
status: "failed";
error: string;
};
type SanitizedLogTail = IncludedSanitizedLogTail | FailedSanitizedLogTail;
type SupportSnapshotStatus =
| {
status: "included";
path: string;
}
| {
status: "failed";
path: string;
error: string;
}
| {
status: "skipped";
};
type CollectedSupportSnapshot = {
summary: SupportSnapshotStatus;
file?: DiagnosticSupportExportFile;
};
function formatExportTimestamp(now: Date): string {
return now.toISOString().replace(/[:.]/g, "-");
}
function byteLength(content: string): number {
return Buffer.byteLength(content, "utf8");
}
function jsonFile(pathName: string, value: unknown): DiagnosticSupportExportFile {
return {
path: pathName,
mediaType: "application/json",
content: `${JSON.stringify(value, null, 2)}\n`,
};
}
function textFile(pathName: string, content: string): DiagnosticSupportExportFile {
return {
path: pathName,
mediaType: "text/plain; charset=utf-8",
content: content.endsWith("\n") ? content : `${content}\n`,
};
}
function normalizePositiveInteger(value: unknown, fallback: number): number {
const parsed = typeof value === "number" ? value : Number(value);
if (!Number.isFinite(parsed) || parsed < 1) {
return fallback;
}
return Math.floor(parsed);
}
function asRecord(value: unknown): Record<string, unknown> | undefined {
if (!value || typeof value !== "object" || Array.isArray(value)) {
return undefined;
}
return value as Record<string, unknown>;
}
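// Returns a value that is safe to show in the config shape: booleans and finite
// numbers pass through; a string is kept only when text redaction leaves it
// unchanged and it is a short token of letters, digits, and `_ . : -`, otherwise
// it becomes "<redacted>". Any other type yields undefined.
function safeScalar(value: unknown): unknown {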
if (typeof value === "boolean") {
return value;
}
if (typeof value === "number" && Number.isFinite(value)) {
return value;
}
if (typeof value === "string") {
const redacted = redactTextForSupport(value);
return redacted === value && /^[A-Za-z0-9_.:-]{1,120}$/u.test(value) ? value : "<redacted>";
}
return undefined;
}
function sortedObjectKeys(value: unknown): string[] {
return Object.keys(asRecord(value) ?? {}).toSorted((a, b) => a.localeCompare(b));
}
function sanitizeConfigShape(parsed: unknown, configPath: string, stat: fs.Stats): ConfigShape {
const root = asRecord(parsed) ?? {};
const gateway = asRecord(root.gateway);
const auth = asRecord(gateway?.auth);
const channels = asRecord(root.channels);
const plugins = asRecord(root.plugins);
const agents = Array.isArray(root.agents) ? root.agents : undefined;
const shape: ConfigShape = {
path: configPath,
exists: true,
parseOk: true,
bytes: stat.size,
mtime: stat.mtime.toISOString(),
topLevelKeys: sortedObjectKeys(root),
};
if (gateway) {
shape.gateway = {
mode: safeScalar(gateway.mode),
bind: safeScalar(gateway.bind),
port: safeScalar(gateway.port),
authMode: safeScalar(auth?.mode),
tailscale: safeScalar(gateway.tailscale),
};
}
if (channels) {
shape.channels = {
count: Object.keys(channels).length,
ids: sortedObjectKeys(channels),
};
}
if (plugins) {
shape.plugins = {
count: Object.keys(plugins).length,
ids: sortedObjectKeys(plugins),
};
}
if (agents) {
shape.agents = { count: agents.length };
}
return shape;
}
function sanitizeConfigDetails(parsed: unknown, redaction: SupportRedactionContext): unknown {
return sanitizeSupportConfigValue(redactConfigObject(parsed), redaction);
}
function configShapeReadFailure(params: {
configPath: string;
redaction: SupportRedactionContext;
stat?: fs.Stats;
error?: string;
}): ConfigShape {
const shape: ConfigShape = {
path: params.configPath,
exists: Boolean(params.stat),
parseOk: false,
topLevelKeys: [],
};
if (params.stat) {
shape.bytes = params.stat.size;
shape.mtime = params.stat.mtime.toISOString();
}
if (params.error) {
shape.error = redactSupportString(params.error, params.redaction);
}
return shape;
}
function isMissingPathError(error: unknown): boolean {
if (!error || typeof error !== "object" || !("code" in error)) {
return false;
}
return error.code === "ENOENT" || error.code === "ENOTDIR";
}
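// A config file that does not exist is not reported as an error (the shape
// already records `exists: false`); every other failure returns its message.
function configReadErrorMessage(error: unknown, stat?: fs.Stats): string | undefined {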
if (!stat && isMissingPathError(error)) {
return undefined;
}
return error instanceof Error ? error.message : String(error);
}
function readConfigExport(options: {
configPath: string;
env: NodeJS.ProcessEnv;
stateDir: string;
}): ConfigExport {
const redactedConfigPath = redactPathForSupport(options.configPath, options);
let stat: fs.Stats | undefined;
try {
stat = fs.statSync(options.configPath);
const parsed = parseConfigJson5(fs.readFileSync(options.configPath, "utf8"));
if (!parsed.ok) {
return {
shape: configShapeReadFailure({
configPath: redactedConfigPath,
redaction: options,
stat,
error: parsed.error,
}),
};
}
return {
shape: sanitizeConfigShape(parsed.parsed, redactedConfigPath, stat),
sanitized: sanitizeConfigDetails(parsed.parsed, options),
};
} catch (error) {
return {
shape: configShapeReadFailure({
configPath: redactedConfigPath,
redaction: options,
stat,
error: configReadErrorMessage(error, stat),
}),
};
}
}
function redactErrorForSupport(error: unknown, redaction: SupportRedactionContext): string {
return redactSupportString(error instanceof Error ? error.message : String(error), redaction);
}
async function collectSupportSnapshot(params: {
path: string;
reader?: SupportSnapshotReader;
generatedAt: string;
redaction: SupportRedactionContext;
}): Promise<CollectedSupportSnapshot> {
if (!params.reader) {
return { summary: { status: "skipped" } };
}
try {
const data = await params.reader();
return {
summary: {
status: "included",
path: params.path,
},
file: jsonFile(params.path, {
status: "ok",
capturedAt: params.generatedAt,
data: sanitizeSupportSnapshotValue(data, params.redaction),
}),
};
} catch (error) {
const redactedError = redactErrorForSupport(error, params.redaction);
return {
summary: {
status: "failed",
path: params.path,
error: redactedError,
},
file: jsonFile(params.path, {
status: "failed",
capturedAt: params.generatedAt,
error: redactedError,
}),
};
}
}
function readStabilityBundle(
target: DiagnosticSupportExportOptions["stabilityBundle"],
stateDir: string,
): ReadDiagnosticStabilityBundleResult {
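// `false` opts out of the stability bundle; report it as missing with a
// placeholder directory instead of reading from disk.
if (target === false) {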
return { status: "missing", dir: "$OPENCLAW_STATE_DIR/logs/stability" };
}
if (target === undefined || target === "latest") {
return readLatestDiagnosticStabilityBundleSync({ stateDir });
}
return readDiagnosticStabilityBundleFileSync(target);
}
function sanitizeLogTail(tail: LogTailPayload, options: SupportRedactionContext): SanitizedLogTail {
return {
status: "included",
file: redactPathForSupport(tail.file, options),
cursor: tail.cursor,
size: tail.size,
lineCount: tail.lines.length,
truncated: tail.truncated,
reset: tail.reset,
lines: tail.lines.map((line) => sanitizeSupportLogRecord(line, options)),
};
}
function failedLogTail(error: unknown, redaction: SupportRedactionContext): SanitizedLogTail {
const redactedError = redactErrorForSupport(error, redaction);
return {
status: "failed",
file: "unavailable",
cursor: 0,
size: 0,
lineCount: 0,
truncated: false,
reset: false,
error: redactedError,
lines: [
{
omitted: "log-tail-read-failed",
error: redactedError,
},
],
};
}
async function collectSupportLogTail(params: {
readLogTail: typeof readConfiguredLogTail;
limit: number;
maxBytes: number;
redaction: SupportRedactionContext;
}): Promise<SanitizedLogTail> {
try {
const tail = await params.readLogTail({
limit: params.limit,
maxBytes: params.maxBytes,
});
return sanitizeLogTail(tail, params.redaction);
} catch (error) {
return failedLogTail(error, params.redaction);
}
}
function describeStabilityForDiagnostics(
stability: ReadDiagnosticStabilityBundleResult,
redaction: SupportRedactionContext,
) {
if (stability.status === "found") {
return {
status: "found" as const,
path: redactPathForSupport(stability.path, redaction),
mtimeMs: stability.mtimeMs,
eventCount: stability.bundle.snapshot.count,
reason: stability.bundle.reason,
generatedAt: stability.bundle.generatedAt,
};
}
if (stability.status === "missing") {
return {
status: "missing" as const,
dir: redactPathForSupport(stability.dir, redaction),
};
}
return {
status: "failed" as const,
path: stability.path ? redactPathForSupport(stability.path, redaction) : undefined,
error: redactErrorForSupport(stability.error, redaction),
};
}
function renderSummary(params: {
generatedAt: string;
stability: ReadDiagnosticStabilityBundleResult;
logTail: SanitizedLogTail;
config: ConfigShape;
status: SupportSnapshotStatus;
health: SupportSnapshotStatus;
}): string {
const stabilityLine =
params.stability.status === "found"
? `included latest stability bundle (${params.stability.bundle.snapshot.count} event(s))`
: `no stability bundle included (${params.stability.status})`;
const configLine = params.config.exists
? `config shape included (${params.config.parseOk ? "parsed" : "parse failed"})`
: "config file not found";
const logTailLine =
params.logTail.status === "failed"
? `sanitized log tail unavailable (${params.logTail.error})`
: `sanitized log tail (${params.logTail.lineCount} line(s), inspected ${params.logTail.size} byte(s), raw messages omitted)`;
const supportSnapshotLine = (label: string, snapshot: SupportSnapshotStatus) => {
if (snapshot.status === "included") {
return `${label} snapshot included (${snapshot.path})`;
}
if (snapshot.status === "failed") {
return `${label} snapshot failed (${snapshot.error})`;
}
return `${label} snapshot skipped`;
};
return [
"# OpenClaw Diagnostics Export",
"",
"Attach this zip to the bug report. It is designed for maintainers to inspect without asking for raw logs first.",
"",
"## Generated",
"",
`Generated: ${params.generatedAt}`,
`OpenClaw: ${VERSION}`,
"",
"## Contents",
"",
`- ${stabilityLine}`,
`- ${logTailLine}`,
`- ${configLine}`,
`- ${supportSnapshotLine("gateway status", params.status)}`,
`- ${supportSnapshotLine("gateway health", params.health)}`,
"",
"## Maintainer Quick Read",
"",
"- `manifest.json`: file inventory and privacy notes",
"- `diagnostics.json`: top-level summary of config, logs, stability, status, and health",
"- `config/sanitized.json`: config values with credentials, private identifiers, and prompt text redacted",
"- `status/gateway-status.json`: sanitized service/connectivity snapshot",
"- `health/gateway-health.json`: sanitized Gateway health snapshot",
"- `logs/openclaw-sanitized.jsonl`: sanitized log summaries and metadata",
"- `stability/latest.json`: newest payload-free stability bundle, when available",
"",
"## Privacy",
"",
"- raw chat text, webhook bodies, tool outputs, tokens, cookies, and secrets are intentionally excluded",
"- log records keep operational summaries and safe metadata fields",
"- status and health snapshots redact secret fields, payload-like fields, and account/message identifiers",
"- config output keeps useful settings but redacts secrets, private identifiers, and prompt text",
].join("\n");
}
function defaultOutputPath(options: { now: Date; stateDir: string }): string {
return path.join(
options.stateDir,
"logs",
"support",
`${SUPPORT_EXPORT_PREFIX}${formatExportTimestamp(options.now)}-${process.pid}${SUPPORT_EXPORT_SUFFIX}`,
);
}
function resolveOutputPath(options: {
outputPath?: string;
cwd: string;
env: NodeJS.ProcessEnv;
stateDir: string;
now: Date;
}): string {
const raw = options.outputPath?.trim();
if (!raw) {
return defaultOutputPath(options);
}
const resolved =
path.isAbsolute(raw) || raw.startsWith("~")
? resolveHomeRelativePath(raw, { env: options.env })
: path.resolve(options.cwd, raw);
try {
if (fs.statSync(resolved).isDirectory()) {
return path.join(
resolved,
`${SUPPORT_EXPORT_PREFIX}${formatExportTimestamp(options.now)}-${process.pid}${SUPPORT_EXPORT_SUFFIX}`,
);
}
} catch {
// Non-existing output paths are treated as files.
}
return resolved;
}
export async function buildDiagnosticSupportExport(
options: DiagnosticSupportExportOptions = {},
): Promise<DiagnosticSupportExportArtifact> {
const env = options.env ?? process.env;
const stateDir = options.stateDir ?? resolveStateDir(env);
const now = options.now ?? new Date();
const generatedAt = now.toISOString();
const configPath = resolveConfigPath(env, stateDir);
const stability = readStabilityBundle(options.stabilityBundle, stateDir);
const redaction = { env, stateDir };
const logTail = await collectSupportLogTail({
readLogTail: options.readLogTail ?? readConfiguredLogTail,
limit: normalizePositiveInteger(options.logLimit, DEFAULT_LOG_LIMIT),
maxBytes: normalizePositiveInteger(options.logMaxBytes, DEFAULT_LOG_MAX_BYTES),
redaction,
});
const config = readConfigExport({ configPath, env, stateDir });
const [statusSnapshot, healthSnapshot] = await Promise.all([
collectSupportSnapshot({
path: "status/gateway-status.json",
reader: options.readStatusSnapshot,
generatedAt,
redaction,
}),
collectSupportSnapshot({
path: "health/gateway-health.json",
reader: options.readHealthSnapshot,
generatedAt,
redaction,
}),
]);
const diagnostics = {
generatedAt,
openclawVersion: VERSION,
process: {
platform: process.platform,
arch: process.arch,
node: process.versions.node,
pid: process.pid,
},
stateDir: redactPathForSupport(stateDir, redaction),
config: config.shape,
logs: {
file: logTail.file,
cursor: logTail.cursor,
size: logTail.size,
lineCount: logTail.lineCount,
truncated: logTail.truncated,
reset: logTail.reset,
},
stability: describeStabilityForDiagnostics(stability, redaction),
status: statusSnapshot.summary,
health: healthSnapshot.summary,
};
const files: DiagnosticSupportExportFile[] = [
jsonFile("diagnostics.json", diagnostics),
jsonFile("config/shape.json", config.shape),
jsonFile("config/sanitized.json", config.sanitized ?? null),
{
path: "logs/openclaw-sanitized.jsonl",
mediaType: "application/x-ndjson",
content: logTail.lines.map((line) => JSON.stringify(line)).join("\n") + "\n",
},
];
for (const snapshot of [statusSnapshot, healthSnapshot]) {
if (snapshot.file) {
files.push(snapshot.file);
}
}
if (stability.status === "found") {
files.push(jsonFile("stability/latest.json", stability.bundle));
}
files.push(
textFile(
"summary.md",
renderSummary({
generatedAt,
stability,
logTail,
config: config.shape,
status: statusSnapshot.summary,
health: healthSnapshot.summary,
}),
),
);
const manifest: DiagnosticSupportExportManifest = {
version: DIAGNOSTIC_SUPPORT_EXPORT_VERSION,
generatedAt,
openclawVersion: VERSION,
platform: process.platform,
arch: process.arch,
node: process.versions.node,
stateDir: redactPathForSupport(stateDir, redaction),
contents: files.map((file) => ({
path: file.path,
mediaType: file.mediaType,
bytes: byteLength(file.content),
})),
privacy: {
payloadFree: true,
rawLogsIncluded: false,
notes: [
"Stability bundles are payload-free diagnostic snapshots.",
"Logs keep operational summaries and safe metadata fields; payload-like fields are omitted.",
"Status and health snapshots redact secrets, payload-like fields, and account/message identifiers.",
"Config output includes useful settings with credentials, private identifiers, and prompt text redacted.",
],
},
};
return {
manifest,
files: [jsonFile("manifest.json", manifest), ...files],
};
}
export async function writeDiagnosticSupportExport(
options: DiagnosticSupportExportOptions = {},
): Promise<WriteDiagnosticSupportExportResult> {
const env = options.env ?? process.env;
const stateDir = options.stateDir ?? resolveStateDir(env);
const now = options.now ?? new Date();
const outputPath = resolveOutputPath({
outputPath: options.outputPath,
cwd: options.cwd ?? process.cwd(),
env,
stateDir,
now,
});
const artifact = await buildDiagnosticSupportExport({ ...options, env, stateDir, now });
const zip = new JSZip();
for (const file of artifact.files) {
zip.file(file.path, file.content);
}
const buffer = await zip.generateAsync({
type: "nodebuffer",
compression: "DEFLATE",
compressionOptions: { level: 6 },
});
fs.mkdirSync(path.dirname(outputPath), { recursive: true, mode: 0o700 });
fs.writeFileSync(outputPath, buffer, { mode: 0o600 });
return {
path: outputPath,
bytes: buffer.length,
manifest: artifact.manifest,
};
}
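The export above keeps writing even when a status, health, or log reader throws: each failure is converted into a redacted `failed` record rather than aborting the bundle. A minimal sketch of that pattern follows. All names (`SectionResult`, `collectSectionSketch`, `collectAllSketch`, `redactErrorSketch`) are hypothetical, and the one-line redactor is a stand-in assumption, not the `redactSupportString` used by this change.

```typescript
// Hypothetical sketch only; none of these names come from this change.
type SectionResult =
  | { status: "included"; data: unknown }
  | { status: "failed"; error: string }
  | { status: "skipped" };

type SectionReader = () => unknown | Promise<unknown>;

// Stand-in redactor (assumption): masks whatever follows the word "token".
export function redactErrorSketch(message: string): string {
  return message.replace(/\b(token)\s+\S+/giu, "$1 <redacted>");
}

// Runs one reader and never throws: a missing reader is "skipped",
// a throwing reader becomes a "failed" record with a redacted message.
export async function collectSectionSketch(reader?: SectionReader): Promise<SectionResult> {
  if (!reader) {
    return { status: "skipped" };
  }
  try {
    return { status: "included", data: await reader() };
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    return { status: "failed", error: redactErrorSketch(message) };
  }
}

// Collects all sections concurrently; one failure does not affect the others.
export async function collectAllSketch(
  readers: Record<string, SectionReader | undefined>,
): Promise<Record<string, SectionResult>> {
  const entries = await Promise.all(
    Object.entries(readers).map(
      async ([name, reader]) => [name, await collectSectionSketch(reader)] as const,
    ),
  );
  return Object.fromEntries(entries);
}
```

Because every reader is wrapped before `Promise.all` sees it, the combined promise cannot reject on a reader error, so the caller can always go on to write the archive.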

@@ -0,0 +1,221 @@
import { isBlockedObjectKey } from "../infra/prototype-keys.js";
import {
redactSupportString,
type SupportRedactionContext,
} from "./diagnostic-support-redaction.js";
const LOG_STRING_FIELD_RE =
/^(?:action|channel|code|component|endpoint|event|handshake|kind|level|localAddr|logger|method|model|module|msg|name|outcome|phase|pluginId|provider|reason|remoteAddr|requestId|runId|service|source|status|subsystem|surface|target|time|traceId|type)$/iu;
const LOG_SCALAR_FIELD_RE =
/^(?:active|attempt|bytes|count|durationMs|enabled|exitCode|intervalMs|jobs|limitBytes|localPort|nextWakeAtMs|pid|port|queueDepth|queued|remotePort|statusCode|waitMs|waiting)$/iu;
const OMITTED_LOG_FIELD_RE =
/(?:authorization|body|chat|content|cookie|credential|detail|error|header|instruction|message|password|payload|prompt|result|secret|session[-_]?id|session[-_]?key|text|token|tool|transcript|url)/iu;
const UNSAFE_LOG_MESSAGE_RE =
/(?:\b(?:ai response|assistant said|chat text|message contents|prompt|raw webhook body|tool output|tool result|transcript|user said|webhook body)\b|auto-responding\b.*:\s*["']|partial for\b.*:)/iu;
const MAX_LOG_STRING_LENGTH = 240;
const LOGTAPE_META_FIELD = "_meta";
const LOGTAPE_ARG_FIELD_RE = /^\d+$/u;
const LOGTAPE_META_STRING_FIELDS = new Map([
["logLevelName", "level"],
["name", "logger"],
]);
function byteLength(content: string): number {
return Buffer.byteLength(content, "utf8");
}
function asRecord(value: unknown): Record<string, unknown> | undefined {
if (!value || typeof value !== "object" || Array.isArray(value)) {
return undefined;
}
return value as Record<string, unknown>;
}
function createLogRecord(): Record<string, unknown> {
return Object.create(null) as Record<string, unknown>;
}
export function sanitizeSupportLogRecord(
line: string,
redaction: SupportRedactionContext,
): Record<string, unknown> {
let parsed: unknown;
try {
parsed = JSON.parse(line);
} catch {
return {
omitted: "unparsed",
bytes: byteLength(line),
};
}
const source = asRecord(parsed);
if (!source) {
return {
omitted: "non-object",
bytes: byteLength(line),
};
}
const sanitized = createLogRecord();
addNamedLogFields(sanitized, source, redaction);
addLogTapeMetaFields(sanitized, source, redaction);
addLogTapeArgFields(sanitized, source, redaction);
return Object.keys(sanitized).length > 0
? sanitized
: {
omitted: "no-safe-fields",
bytes: byteLength(line),
};
}
function addNamedLogFields(
sanitized: Record<string, unknown>,
source: Record<string, unknown>,
redaction: SupportRedactionContext,
): void {
for (const [key, value] of Object.entries(source)) {
if (key === LOGTAPE_META_FIELD || LOGTAPE_ARG_FIELD_RE.test(key)) {
continue;
}
addSafeLogField(sanitized, key, value, redaction);
}
}
function addLogTapeMetaFields(
sanitized: Record<string, unknown>,
source: Record<string, unknown>,
redaction: SupportRedactionContext,
): void {
const meta = asRecord(source[LOGTAPE_META_FIELD]);
if (!meta) {
return;
}
for (const [sourceKey, outputKey] of LOGTAPE_META_STRING_FIELDS) {
if (sanitized[outputKey] !== undefined) {
continue;
}
const value = meta[sourceKey];
if (typeof value === "string") {
if (sourceKey === "name") {
const record = parseJsonRecord(value);
if (record) {
addLogObjectFields(sanitized, record, redaction);
continue;
}
}
sanitized[outputKey] = sanitizeLogString(value, redaction);
}
}
}
function addLogTapeArgFields(
sanitized: Record<string, unknown>,
source: Record<string, unknown>,
redaction: SupportRedactionContext,
): void {
const args = Object.entries(source)
.filter(([key]) => LOGTAPE_ARG_FIELD_RE.test(key))
.toSorted(([left], [right]) => Number(left) - Number(right));
for (const [, value] of args) {
const record = typeof value === "string" ? parseJsonRecord(value) : asRecord(value);
if (record) {
addLogObjectFields(sanitized, record, redaction);
continue;
}
if (typeof value === "string") {
addLogTapeMessageField(sanitized, value, redaction);
}
}
}
function addLogTapeMessageField(
sanitized: Record<string, unknown>,
value: string,
redaction: SupportRedactionContext,
): void {
const message = sanitizeLogString(value, redaction);
if (sanitized.msg === undefined && message && !UNSAFE_LOG_MESSAGE_RE.test(message)) {
sanitized.msg = message;
return;
}
addOmittedLogMessageMetadata(sanitized, value);
}
function addOmittedLogMessageMetadata(sanitized: Record<string, unknown>, value: string): void {
sanitized.omitted = "log-message";
sanitized.omittedLogMessageBytes =
numericLogMetadata(sanitized.omittedLogMessageBytes) + byteLength(value);
sanitized.omittedLogMessageCount = numericLogMetadata(sanitized.omittedLogMessageCount) + 1;
}
function numericLogMetadata(value: unknown): number {
return typeof value === "number" && Number.isFinite(value) ? value : 0;
}
function parseJsonRecord(value: string): Record<string, unknown> | undefined {
const trimmed = value.trim();
if (!trimmed.startsWith("{") || !trimmed.endsWith("}")) {
return undefined;
}
try {
return asRecord(JSON.parse(trimmed));
} catch {
return undefined;
}
}
function addLogObjectFields(
sanitized: Record<string, unknown>,
source: Record<string, unknown>,
redaction: SupportRedactionContext,
): void {
for (const [key, value] of Object.entries(source)) {
addSafeLogField(sanitized, key, value, redaction);
}
}
function addSafeLogField(
sanitized: Record<string, unknown>,
key: string,
value: unknown,
redaction: SupportRedactionContext,
): void {
if (OMITTED_LOG_FIELD_RE.test(key)) {
return;
}
if (isBlockedObjectKey(key)) {
return;
}
if (!isSafeLogField(key, value)) {
return;
}
if (typeof value === "string") {
const message = sanitizeLogString(value, redaction);
if (key === "msg" && (!message || UNSAFE_LOG_MESSAGE_RE.test(message))) {
addOmittedLogMessageMetadata(sanitized, value);
return;
}
sanitized[key] = message;
} else if (typeof value === "number" || typeof value === "boolean" || value === null) {
sanitized[key] = value;
}
}
function sanitizeLogString(value: string, redaction: SupportRedactionContext): string {
return redactSupportString(value, redaction, {
maxLength: MAX_LOG_STRING_LENGTH,
truncationSuffix: "",
});
}
function isSafeLogField(key: string, value: unknown): boolean {
if (typeof value === "string") {
return LOG_STRING_FIELD_RE.test(key);
}
return LOG_STRING_FIELD_RE.test(key) || LOG_SCALAR_FIELD_RE.test(key);
}
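The allowlist behavior of `sanitizeSupportLogRecord` can be sketched in isolation. This is a minimal sketch under stated assumptions, not the module's code: `SAFE_STRING_FIELD_RE`, `SAFE_SCALAR_FIELD_RE`, `sketchSanitizeLogLine`, and the local `byteLength` are hypothetical stand-ins, because the real `LOG_STRING_FIELD_RE` and `LOG_SCALAR_FIELD_RE` patterns are not shown in this hunk. String redaction and the LogTape meta/arg handling are left out.

```typescript
// Hypothetical allowlists; the real patterns are defined outside this hunk.
const SAFE_STRING_FIELD_RE = /^(?:level|subsystem|event)$/;
const SAFE_SCALAR_FIELD_RE = /^(?:durationMs|count|ok)$/;

type LogRecord = Record<string, unknown>;

// Hypothetical helper: UTF-8 byte length of a raw log line.
function byteLength(text: string): number {
  return new TextEncoder().encode(text).length;
}

function sketchSanitizeLogLine(line: string): LogRecord {
  let parsed: unknown;
  try {
    parsed = JSON.parse(line);
  } catch {
    // Unparseable lines are never copied; only their size is reported.
    return { omitted: "unparsed", bytes: byteLength(line) };
  }
  if (!parsed || typeof parsed !== "object" || Array.isArray(parsed)) {
    return { omitted: "non-object", bytes: byteLength(line) };
  }
  // Null-prototype object, as in createLogRecord().
  const sanitized: LogRecord = Object.create(null);
  for (const [key, value] of Object.entries(parsed as LogRecord)) {
    if (typeof value === "string") {
      // Strings must match the string allowlist.
      if (SAFE_STRING_FIELD_RE.test(key)) sanitized[key] = value;
    } else if (typeof value === "number" || typeof value === "boolean" || value === null) {
      // Scalars may match either allowlist; nested objects and arrays are dropped.
      if (SAFE_STRING_FIELD_RE.test(key) || SAFE_SCALAR_FIELD_RE.test(key)) {
        sanitized[key] = value;
      }
    }
  }
  return Object.keys(sanitized).length > 0
    ? sanitized
    : { omitted: "no-safe-fields", bytes: byteLength(line) };
}
```

The design choice worth noting is that every failure path returns only a reason and a byte count, so a malformed or unexpected line cannot leak its content into the export.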


@@ -0,0 +1,467 @@
import path from "node:path";
import { isSecretRefShape } from "../config/redact-snapshot.secret-ref.js";
import { isBlockedObjectKey } from "../infra/prototype-keys.js";
import { isSensitiveUrlQueryParamName } from "../shared/net/redact-sensitive-url.js";
import { redactSensitiveText } from "./redact.js";
const SECRET_SUPPORT_FIELD_RE =
/(?:authorization|cookie|credential|key|password|passwd|secret|token)/iu;
const PAYLOAD_SUPPORT_FIELD_RE =
/(?:body|chat|content|detail|error|header|instruction|message|payload|prompt|result|text|tool|transcript)/iu;
const IDENTIFIER_SUPPORT_FIELD_RE =
/(?:account[-_]?id|chat[-_]?id|conversation[-_]?id|email|message[-_]?id|phone|thread[-_]?id|user[-_]?id|username)/iu;
const PRIVATE_MAP_SUPPORT_FIELD_RE = /^(?:accounts|chats|conversations|messages|threads|users)$/iu;
const CONFIG_PRIVATE_FIELD_RE =
/(?:allow[-_]?from|allow[-_]?to|deny[-_]?from|deny[-_]?to|blocked[-_]?from|blocked[-_]?users|owner[-_]?id|sender[-_]?id|recipient[-_]?id)/iu;
const SENSITIVE_COMMAND_ARG_RE =
/^--(?:api[-_]?key|hook[-_]?token|password|password-file|passwd|secret|token)(?:=.*)?$/iu;
const BASIC_AUTH_RE = /\bBasic\s+[A-Za-z0-9+/]+={0,2}/giu;
const COOKIE_HEADER_RE = /\b(Cookie|Set-Cookie)\s*:\s*[^\r\n]+/giu;
const AWS_ACCESS_KEY_ID_RE = /\b(?:AKIA|ASIA)[A-Z0-9]{16}\b/gu;
const JWT_RE = /\beyJ[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}\b/gu;
const URL_USERINFO_RE = /\b([a-z][a-z0-9+.-]*:\/\/)([^/@\s:?#]+)(?::([^/@\s?#]+))?@/giu;
const URL_PARAM_RE = /([?&])([^=&\s]+)=([^&#\s]+)/giu;
const EMAIL_RE = /\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b/giu;
const MATRIX_USER_ID_RE = /@[A-Za-z0-9._=-]+:[A-Za-z0-9.-]+/gu;
const MATRIX_ROOM_ID_RE = /![A-Za-z0-9._=-]+:[A-Za-z0-9.-]+/gu;
const MATRIX_EVENT_ID_RE = /\$[A-Za-z0-9_-]{16,}/gu;
const HANDLE_RE = /(^|[^\w:/])@[A-Za-z0-9_]{5,}\b(?!\.)/gu;
const LONG_DECIMAL_ID_RE = /\b\d{9,}\b/gu;
const MAX_SUPPORT_STRING_LENGTH = 2000;
const MAX_SUPPORT_SNAPSHOT_DEPTH = 10;
const MAX_SUPPORT_ARRAY_ITEMS = 1000;
const MAX_SUPPORT_OBJECT_ENTRIES = 1000;
const DEFAULT_TRUNCATION_SUFFIX = "...<truncated>";
const TRUNCATED_SUPPORT_FIELD = "<truncated>";
export type SupportRedactionContext = {
env: NodeJS.ProcessEnv;
stateDir: string;
};
type RedactSupportStringOptions = {
maxLength?: number;
truncationSuffix?: string;
};
type PathRedactionPrefix = {
prefix: string;
label: string;
caseInsensitive: boolean;
};
type SupportObjectEntry = {
key: string;
value: unknown;
};
type LimitedSupportArray = {
count: number;
items: unknown[];
};
function asRecord(value: unknown): Record<string, unknown> | undefined {
if (!value || typeof value !== "object" || Array.isArray(value)) {
return undefined;
}
return value as Record<string, unknown>;
}
function isPrivateSupportField(key: string): boolean {
return (
SECRET_SUPPORT_FIELD_RE.test(key) ||
PAYLOAD_SUPPORT_FIELD_RE.test(key) ||
IDENTIFIER_SUPPORT_FIELD_RE.test(key)
);
}
function isPrivateConfigField(key: string): boolean {
return isPrivateSupportField(key) || CONFIG_PRIVATE_FIELD_RE.test(key);
}
function sanitizeSecretRefForSupport(value: Record<string, unknown>): Record<string, unknown> {
const sanitized = createSupportRecord();
if (typeof value.source === "string") {
sanitized.source = value.source;
}
if (typeof value.provider === "string") {
sanitized.provider = value.provider;
}
sanitized.id = "<redacted>";
return sanitized;
}
function privateMapEntryLabel(key: string): string {
const normalized = key.toLowerCase();
return normalized.endsWith("s") ? normalized.slice(0, -1) : normalized;
}
function createSupportRecord(): Record<string, unknown> {
return Object.create(null) as Record<string, unknown>;
}
function hasOwnRecordKey(record: Record<string, unknown>, key: string): boolean {
return Object.prototype.hasOwnProperty.call(record, key);
}
function countOwnObjectEntries(record: Record<string, unknown>): number {
let count = 0;
for (const key in record) {
if (hasOwnRecordKey(record, key)) {
count += 1;
}
}
return count;
}
function limitedSupportObjectEntries(record: Record<string, unknown>): {
count: number;
entries: SupportObjectEntry[];
} {
let count = 0;
const entries: SupportObjectEntry[] = [];
for (const key in record) {
if (!hasOwnRecordKey(record, key)) {
continue;
}
count += 1;
if (isBlockedObjectKey(key) || entries.length >= MAX_SUPPORT_OBJECT_ENTRIES) {
continue;
}
entries.push({ key, value: record[key] });
}
entries.sort((a, b) => a.key.localeCompare(b.key));
return { count, entries };
}
function limitedSupportArray(value: unknown[]): LimitedSupportArray {
return {
count: value.length,
items: value.slice(0, MAX_SUPPORT_ARRAY_ITEMS),
};
}
function addTruncationMetadata(sanitized: Record<string, unknown>, count: number): void {
if (count > MAX_SUPPORT_OBJECT_ENTRIES) {
sanitized[TRUNCATED_SUPPORT_FIELD] = {
truncated: true,
count,
limit: MAX_SUPPORT_OBJECT_ENTRIES,
};
}
}
function supportArrayResult(items: unknown[], count: number): unknown[] | Record<string, unknown> {
if (count <= MAX_SUPPORT_ARRAY_ITEMS) {
return items;
}
return {
items,
truncated: true,
count,
limit: MAX_SUPPORT_ARRAY_ITEMS,
};
}
function isWindowsAbsolutePath(value: string): boolean {
return /^(?:[A-Za-z]:[\\/]|\\\\)/u.test(value);
}
function normalizePathPrefix(value: string): string {
return isWindowsAbsolutePath(value) ? path.win32.resolve(value) : path.resolve(value);
}
function addPathPrefix(
prefixes: Map<string, PathRedactionPrefix>,
prefix: string,
label: string,
caseInsensitive: boolean,
): void {
if (!prefixes.has(prefix)) {
prefixes.set(prefix, { prefix, label, caseInsensitive });
}
}
function addPathPrefixVariants(
prefixes: Map<string, PathRedactionPrefix>,
value: string | undefined,
label: string,
): void {
if (!value) {
return;
}
const normalized = normalizePathPrefix(value);
const caseInsensitive = isWindowsAbsolutePath(normalized);
addPathPrefix(prefixes, normalized, label, caseInsensitive);
if (isWindowsAbsolutePath(normalized)) {
addPathPrefix(prefixes, normalized.replaceAll("\\", "/"), label, caseInsensitive);
}
}
function pathRedactionPrefixes(options: SupportRedactionContext): PathRedactionPrefix[] {
const prefixes = new Map<string, PathRedactionPrefix>();
addPathPrefixVariants(prefixes, options.stateDir, "$OPENCLAW_STATE_DIR");
addPathPrefixVariants(prefixes, options.env.HOME, "~");
addPathPrefixVariants(prefixes, options.env.USERPROFILE, "~");
return [...prefixes.values()].toSorted((a, b) => b.prefix.length - a.prefix.length);
}
function pathCandidates(file: string): string[] {
if (!isWindowsAbsolutePath(file)) {
return [path.resolve(file)];
}
const resolved = path.win32.resolve(file);
return [resolved, resolved.replaceAll("\\", "/")];
}
function hasPathPrefix(value: string, prefix: PathRedactionPrefix): boolean {
return prefix.caseInsensitive
? value.toLowerCase().startsWith(prefix.prefix.toLowerCase())
: value.startsWith(prefix.prefix);
}
function matchPathPrefix(file: string, prefix: PathRedactionPrefix): string | undefined {
if (file.length === prefix.prefix.length && hasPathPrefix(file, prefix)) {
return "";
}
if (!hasPathPrefix(file, prefix)) {
return undefined;
}
const next = file[prefix.prefix.length];
return next === "/" || next === "\\" ? file.slice(prefix.prefix.length) : undefined;
}
function isSupportAbsolutePath(value: string): boolean {
return path.isAbsolute(value) || isWindowsAbsolutePath(value);
}
export function redactPathForSupport(file: string, options: SupportRedactionContext): string {
if (file.startsWith("$")) {
return file;
}
const candidates = pathCandidates(file);
for (const next of candidates) {
for (const prefix of pathRedactionPrefixes(options)) {
const suffix = matchPathPrefix(next, prefix);
if (suffix !== undefined) {
return `${prefix.label}${suffix}`;
}
}
}
return redactSensitiveTextForSupport(candidates[0] ?? file);
}
function replaceKnownPathPrefix(value: string, prefix: PathRedactionPrefix): string {
const search = prefix.caseInsensitive ? prefix.prefix.toLowerCase() : prefix.prefix;
const haystack = prefix.caseInsensitive ? value.toLowerCase() : value;
let offset = 0;
let next = "";
while (offset < value.length) {
const index = haystack.indexOf(search, offset);
if (index === -1) {
next += value.slice(offset);
break;
}
next += value.slice(offset, index);
next += prefix.label;
offset = index + prefix.prefix.length;
}
return next;
}
function redactKnownPathPrefixesForSupport(
value: string,
redaction: SupportRedactionContext,
): string {
let next = value;
for (const prefix of pathRedactionPrefixes(redaction)) {
next = replaceKnownPathPrefix(next, prefix);
}
return next;
}
export function redactTextForSupport(value: string): string {
let redacted = redactSensitiveTextForSupport(value);
redacted = redactCommonCredentialTextForSupport(redacted);
redacted = redactUrlSecretsForSupport(redacted);
redacted = redactServiceIdentifiersForSupport(redacted);
redacted = redactContactIdentifiersForSupport(redacted);
return redactLongIdentifiersForSupport(redacted);
}
function redactSensitiveTextForSupport(value: string): string {
return redactSensitiveText(value, { mode: "tools" });
}
function redactCommonCredentialTextForSupport(value: string): string {
return value
.replace(BASIC_AUTH_RE, "Basic <redacted>")
.replace(COOKIE_HEADER_RE, "$1: <redacted>")
.replace(AWS_ACCESS_KEY_ID_RE, "<redacted-aws-key>")
.replace(JWT_RE, "<redacted-jwt>");
}
function redactUrlSecretsForSupport(value: string): string {
return value
.replace(URL_USERINFO_RE, (_match, scheme: string, _username: string, password?: string) =>
password ? `${scheme}<redacted>:<redacted>@` : `${scheme}<redacted>@`,
)
.replace(URL_PARAM_RE, (match, prefix: string, key: string) =>
isSensitiveUrlQueryParamName(key) ? `${prefix}${key}=<redacted>` : match,
);
}
function redactContactIdentifiersForSupport(value: string): string {
return value.replace(EMAIL_RE, "<redacted-email>").replace(HANDLE_RE, "$1<redacted-handle>");
}
function redactServiceIdentifiersForSupport(value: string): string {
return value
.replace(MATRIX_USER_ID_RE, "<redacted-matrix-user>")
.replace(MATRIX_ROOM_ID_RE, "<redacted-matrix-room>")
.replace(MATRIX_EVENT_ID_RE, "<redacted-matrix-event>");
}
function redactLongIdentifiersForSupport(value: string): string {
return value.replace(LONG_DECIMAL_ID_RE, "<redacted-id>");
}
export function redactSupportString(
value: string,
redaction: SupportRedactionContext,
options: RedactSupportStringOptions = {},
): string {
const maxLength = options.maxLength ?? MAX_SUPPORT_STRING_LENGTH;
const truncationSuffix = options.truncationSuffix ?? DEFAULT_TRUNCATION_SUFFIX;
const redacted = redactTextForSupport(value);
const pathRedacted = isSupportAbsolutePath(redacted)
? redactPathForSupport(redacted, redaction)
: redactKnownPathPrefixesForSupport(redacted, redaction);
if (pathRedacted.length <= maxLength) {
return pathRedacted;
}
return `${pathRedacted.slice(0, maxLength)}${truncationSuffix}`;
}
function sanitizeCommandArguments(args: unknown[], redaction: SupportRedactionContext): unknown[] {
let redactNext = false;
return args.map((arg) => {
if (typeof arg !== "string") {
return sanitizeSupportSnapshotValue(arg, redaction);
}
if (redactNext) {
redactNext = false;
return "<redacted>";
}
if (SENSITIVE_COMMAND_ARG_RE.test(arg)) {
const hasInlineValue = arg.includes("=");
if (!hasInlineValue) {
redactNext = true;
}
return hasInlineValue ? arg.replace(/=.*/u, "=<redacted>") : arg;
}
return redactSupportString(arg, redaction);
});
}
export function sanitizeSupportSnapshotValue(
value: unknown,
redaction: SupportRedactionContext,
key = "",
depth = 0,
): unknown {
if (value == null || typeof value === "boolean") {
return value;
}
if (typeof value === "number") {
return isPrivateSupportField(key) ? "<redacted>" : value;
}
if (typeof value === "string") {
return isPrivateSupportField(key) ? "<redacted>" : redactSupportString(value, redaction);
}
if (depth >= MAX_SUPPORT_SNAPSHOT_DEPTH) {
return "<truncated>";
}
if (Array.isArray(value)) {
const { count, items } = limitedSupportArray(value);
if (key === "programArguments") {
return supportArrayResult(sanitizeCommandArguments(items, redaction), count);
}
return supportArrayResult(
items.map((entry) => sanitizeSupportSnapshotValue(entry, redaction, key, depth + 1)),
count,
);
}
const record = asRecord(value);
if (!record) {
return "<unsupported>";
}
if (PRIVATE_MAP_SUPPORT_FIELD_RE.test(key)) {
return { count: countOwnObjectEntries(record) };
}
const sanitized = createSupportRecord();
const { count, entries } = limitedSupportObjectEntries(record);
for (const { key: entryKey, value: entryValue } of entries) {
sanitized[entryKey] = isPrivateSupportField(entryKey)
? "<redacted>"
: sanitizeSupportSnapshotValue(entryValue, redaction, entryKey, depth + 1);
}
addTruncationMetadata(sanitized, count);
return sanitized;
}
export function sanitizeSupportConfigValue(
value: unknown,
redaction: SupportRedactionContext,
key = "",
depth = 0,
): unknown {
if (value == null || typeof value === "boolean") {
return value;
}
if (typeof value === "number") {
return isPrivateConfigField(key) ? "<redacted>" : value;
}
if (typeof value === "string") {
return isPrivateConfigField(key) ? "<redacted>" : redactSupportString(value, redaction);
}
if (depth >= MAX_SUPPORT_SNAPSHOT_DEPTH) {
return "<truncated>";
}
if (Array.isArray(value)) {
if (isPrivateConfigField(key)) {
return {
redacted: true,
count: value.length,
};
}
const { count, items } = limitedSupportArray(value);
return supportArrayResult(
items.map((entry) => sanitizeSupportConfigValue(entry, redaction, key, depth + 1)),
count,
);
}
const record = asRecord(value);
if (!record) {
return "<unsupported>";
}
if (isPrivateConfigField(key)) {
return isSecretRefShape(record) ? sanitizeSecretRefForSupport(record) : "<redacted>";
}
const sanitized = createSupportRecord();
let privateEntryIndex = 0;
const redactEntryKeys = PRIVATE_MAP_SUPPORT_FIELD_RE.test(key);
const privateEntryLabel = redactEntryKeys ? privateMapEntryLabel(key) : "";
const { count, entries } = limitedSupportObjectEntries(record);
for (const { key: entryKey, value: entryValue } of entries) {
let outputKey = entryKey;
if (redactEntryKeys) {
privateEntryIndex += 1;
outputKey = `<redacted-${privateEntryLabel}-${privateEntryIndex}>`;
}
sanitized[outputKey] = sanitizeSupportConfigValue(entryValue, redaction, entryKey, depth + 1);
}
addTruncationMetadata(sanitized, count);
return sanitized;
}
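The flag-then-value handling in `sanitizeCommandArguments` can be sketched on its own. This is a minimal sketch, assuming string-only arguments: `sketchRedactArgs` and `SENSITIVE_ARG_RE` are hypothetical names, the pattern is adapted from `SENSITIVE_COMMAND_ARG_RE` with the `u` flag dropped, and the per-argument `redactSupportString` pass is omitted.

```typescript
// Adapted from SENSITIVE_COMMAND_ARG_RE in the diff above (flag `u` dropped for this sketch).
const SENSITIVE_ARG_RE =
  /^--(?:api[-_]?key|hook[-_]?token|password|password-file|passwd|secret|token)(?:=.*)?$/i;

function sketchRedactArgs(args: string[]): string[] {
  // Set when a sensitive flag arrived without an inline value,
  // which means the next argument carries the secret.
  let redactNext = false;
  return args.map((arg) => {
    if (redactNext) {
      redactNext = false;
      return "<redacted>";
    }
    if (SENSITIVE_ARG_RE.test(arg)) {
      if (!arg.includes("=")) {
        redactNext = true;
        return arg; // keep the flag name, redact the following value
      }
      return arg.replace(/=.*/, "=<redacted>");
    }
    return arg; // the real code also runs redactSupportString here
  });
}

const example = sketchRedactArgs([
  "gateway",
  "--token",
  "abc123",
  "--password=hunter2",
  "--port",
  "8080",
]);
console.log(example);
// → ["gateway", "--token", "<redacted>", "--password=<redacted>", "--port", "8080"]
```

Keeping the flag name while dropping its value preserves the shape of the command line for support staff without exposing the credential.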


@@ -1,7 +1,12 @@
import fs from "node:fs";
import { afterEach, beforeEach, describe, expect, it, vi } from "vitest";
import { importFreshModule } from "../../test/helpers/import-fresh.js";
-import { onDiagnosticEvent, resetDiagnosticEventsForTest } from "../infra/diagnostic-events.js";
+import {
+emitDiagnosticEvent,
+onDiagnosticEvent,
+resetDiagnosticEventsForTest,
+setDiagnosticsEnabledForProcess,
+} from "../infra/diagnostic-events.js";
import {
diagnosticSessionStates,
getDiagnosticSessionStateCountForTest,
@@ -9,6 +14,7 @@ import {
pruneDiagnosticSessionStates,
resetDiagnosticSessionStateForTest,
} from "./diagnostic-session-state.js";
import { getDiagnosticStabilitySnapshot } from "./diagnostic-stability.js";
import {
logSessionStateChange,
resetDiagnosticStateForTest,
@@ -16,6 +22,16 @@ import {
startDiagnosticHeartbeat,
} from "./diagnostic.js";
function createEmitMemorySampleMock() {
return vi.fn(() => ({
rssBytes: 100,
heapTotalBytes: 80,
heapUsedBytes: 40,
externalBytes: 10,
arrayBuffersBytes: 5,
}));
}
describe("diagnostic session state pruning", () => {
beforeEach(() => {
vi.useFakeTimers();
@@ -119,6 +135,81 @@ describe("stuck session diagnostics threshold", () => {
expect(events.filter((event) => event.type === "session.stuck")).toHaveLength(1);
});
it("starts and stops the stability recorder with the heartbeat lifecycle", () => {
startDiagnosticHeartbeat({
diagnostics: {
enabled: true,
},
});
logSessionStateChange({ sessionId: "s1", sessionKey: "main", state: "processing" });
expect(getDiagnosticStabilitySnapshot({ limit: 10 }).events).toContainEqual(
expect.objectContaining({
type: "session.state",
outcome: "processing",
}),
);
const [event] = getDiagnosticStabilitySnapshot({ limit: 10 }).events;
expect(event).not.toHaveProperty("sessionId");
expect(event).not.toHaveProperty("sessionKey");
resetDiagnosticStateForTest();
emitDiagnosticEvent({ type: "webhook.received", channel: "telegram" });
expect(getDiagnosticStabilitySnapshot({ limit: 10 }).events).toEqual([]);
});
it("does not track session state when diagnostics are disabled", () => {
const events: string[] = [];
const unsubscribe = onDiagnosticEvent((event) => events.push(event.type));
try {
setDiagnosticsEnabledForProcess(false);
logSessionStateChange({ sessionId: "s1", sessionKey: "main", state: "processing" });
} finally {
unsubscribe();
}
expect(events).toEqual([]);
expect(getDiagnosticSessionStateCountForTest()).toBe(0);
});
it("checks memory pressure every tick without recording idle samples", () => {
const emitMemorySample = createEmitMemorySampleMock();
startDiagnosticHeartbeat(
{
diagnostics: {
enabled: true,
},
},
{ emitMemorySample },
);
vi.advanceTimersByTime(30_000);
expect(emitMemorySample).toHaveBeenLastCalledWith({ emitSample: false });
logSessionStateChange({ sessionId: "s1", sessionKey: "main", state: "processing" });
vi.advanceTimersByTime(30_000);
expect(emitMemorySample).toHaveBeenLastCalledWith({ emitSample: true });
});
it("does not start the heartbeat when diagnostics are disabled by config", () => {
const emitMemorySample = createEmitMemorySampleMock();
startDiagnosticHeartbeat(
{
diagnostics: {
enabled: false,
},
},
{ emitMemorySample },
);
vi.advanceTimersByTime(30_000);
expect(emitMemorySample).not.toHaveBeenCalled();
});
it("falls back to default threshold when config is absent", () => {
const events: Array<{ type: string }> = [];
const unsubscribe = onDiagnosticEvent((event) => {


@@ -1,6 +1,11 @@
import { getRuntimeConfig } from "../config/config.js";
import type { OpenClawConfig } from "../config/types.openclaw.js";
-import { emitDiagnosticEvent } from "../infra/diagnostic-events.js";
+import {
+areDiagnosticsEnabledForProcess,
+emitDiagnosticEvent,
+isDiagnosticsEnabled,
+} from "../infra/diagnostic-events.js";
import { emitDiagnosticMemorySample, resetDiagnosticMemoryForTest } from "./diagnostic-memory.js";
import {
diagnosticLogger as diag,
getLastDiagnosticActivityAt,
@@ -16,6 +21,16 @@ import {
type SessionRef,
type SessionStateValue,
} from "./diagnostic-session-state.js";
import {
installDiagnosticStabilityFatalHook,
resetDiagnosticStabilityBundleForTest,
uninstallDiagnosticStabilityFatalHook,
} from "./diagnostic-stability-bundle.js";
import {
resetDiagnosticStabilityRecorderForTest,
startDiagnosticStabilityRecorder,
stopDiagnosticStabilityRecorder,
} from "./diagnostic-stability.js";
export { diagnosticLogger, logLaneDequeue, logLaneEnqueue } from "./diagnostic-runtime.js";
const webhookStats = {
@@ -28,15 +43,50 @@ const webhookStats = {
const DEFAULT_STUCK_SESSION_WARN_MS = 120_000;
const MIN_STUCK_SESSION_WARN_MS = 1_000;
const MAX_STUCK_SESSION_WARN_MS = 24 * 60 * 60 * 1000;
const RECENT_DIAGNOSTIC_ACTIVITY_MS = 120_000;
let commandPollBackoffRuntimePromise: Promise<
typeof import("../agents/command-poll-backoff.runtime.js")
> | null = null;
type EmitDiagnosticMemorySample = typeof emitDiagnosticMemorySample;
type DiagnosticWorkSnapshot = {
activeCount: number;
waitingCount: number;
queuedCount: number;
};
function loadCommandPollBackoffRuntime() {
commandPollBackoffRuntimePromise ??= import("../agents/command-poll-backoff.runtime.js");
return commandPollBackoffRuntimePromise;
}
function getDiagnosticWorkSnapshot(): DiagnosticWorkSnapshot {
let activeCount = 0;
let waitingCount = 0;
let queuedCount = 0;
for (const state of diagnosticSessionStates.values()) {
if (state.state === "processing") {
activeCount += 1;
} else if (state.state === "waiting") {
waitingCount += 1;
}
queuedCount += state.queueDepth;
}
return { activeCount, waitingCount, queuedCount };
}
function hasOpenDiagnosticWork(snapshot: DiagnosticWorkSnapshot): boolean {
return snapshot.activeCount > 0 || snapshot.waitingCount > 0 || snapshot.queuedCount > 0;
}
function hasRecentDiagnosticActivity(now: number): boolean {
const lastActivityAt = getLastDiagnosticActivityAt();
return lastActivityAt > 0 && now - lastActivityAt <= RECENT_DIAGNOSTIC_ACTIVITY_MS;
}
export function resolveStuckSessionWarnMs(config?: OpenClawConfig): number {
const raw = config?.diagnostics?.stuckSessionWarnMs;
if (typeof raw !== "number" || !Number.isFinite(raw)) {
@@ -54,6 +104,9 @@ export function logWebhookReceived(params: {
updateType?: string;
chatId?: number | string;
}) {
if (!areDiagnosticsEnabledForProcess()) {
return;
}
webhookStats.received += 1;
webhookStats.lastReceived = Date.now();
if (diag.isEnabled("debug")) {
@@ -78,6 +131,9 @@ export function logWebhookProcessed(params: {
chatId?: number | string;
durationMs?: number;
}) {
if (!areDiagnosticsEnabledForProcess()) {
return;
}
webhookStats.processed += 1;
if (diag.isEnabled("debug")) {
diag.debug(
@@ -104,6 +160,9 @@ export function logWebhookError(params: {
chatId?: number | string;
error: string;
}) {
if (!areDiagnosticsEnabledForProcess()) {
return;
}
webhookStats.errors += 1;
diag.error(
`webhook error: channel=${params.channel} type=${params.updateType ?? "unknown"} chatId=${
@@ -126,6 +185,9 @@ export function logMessageQueued(params: {
channel?: string;
source: string;
}) {
if (!areDiagnosticsEnabledForProcess()) {
return;
}
const state = getDiagnosticSessionState(params);
state.queueDepth += 1;
state.lastActivity = Date.now();
@@ -158,6 +220,9 @@ export function logMessageProcessed(params: {
reason?: string;
error?: string;
}) {
if (!areDiagnosticsEnabledForProcess()) {
return;
}
const wantsLog = params.outcome === "error" ? diag.isEnabled("error") : diag.isEnabled("debug");
if (wantsLog) {
const payload = `message processed: channel=${params.channel} chatId=${
@@ -196,6 +261,9 @@ export function logSessionStateChange(
reason?: string;
},
) {
if (!areDiagnosticsEnabledForProcess()) {
return;
}
const state = getDiagnosticSessionState(params);
const isProbeSession = state.sessionId?.startsWith("probe-") ?? false;
const prevState = state.state;
@@ -226,6 +294,9 @@ export function logSessionStateChange(
}
export function logSessionStuck(params: SessionRef & { state: SessionStateValue; ageMs: number }) {
if (!areDiagnosticsEnabledForProcess()) {
return;
}
const state = getDiagnosticSessionState(params);
diag.warn(
`stuck session: sessionId=${state.sessionId ?? "unknown"} sessionKey=${
@@ -244,6 +315,9 @@ export function logSessionStuck(params: SessionRef & { state: SessionStateValue;
}
export function logRunAttempt(params: SessionRef & { runId: string; attempt: number }) {
if (!areDiagnosticsEnabledForProcess()) {
return;
}
diag.debug(
`run attempt: sessionId=${params.sessionId ?? "unknown"} sessionKey=${
params.sessionKey ?? "unknown"
@@ -275,6 +349,9 @@ export function logToolLoopAction(
pairedToolName?: string;
},
) {
if (!areDiagnosticsEnabledForProcess()) {
return;
}
const payload = `tool loop: sessionId=${params.sessionId ?? "unknown"} sessionKey=${
params.sessionKey ?? "unknown"
} tool=${params.toolName} level=${params.level} action=${params.action} detector=${
@@ -301,12 +378,13 @@ export function logToolLoopAction(
}
export function logActiveRuns() {
if (!areDiagnosticsEnabledForProcess()) {
return;
}
const now = Date.now();
const activeSessions = Array.from(diagnosticSessionStates.entries())
.filter(([, s]) => s.state === "processing")
-.map(
-([id, s]) =>
-`${id}(q=${s.queueDepth},age=${Math.round((Date.now() - s.lastActivity) / 1000)}s)`,
-);
+.map(([id, s]) => `${id}(q=${s.queueDepth},age=${Math.round((now - s.lastActivity) / 1000)}s)`);
diag.debug(`active runs: count=${activeSessions.length} sessions=[${activeSessions.join(", ")}]`);
markActivity();
}
@@ -315,8 +393,13 @@ let heartbeatInterval: NodeJS.Timeout | null = null;
export function startDiagnosticHeartbeat(
config?: OpenClawConfig,
-opts?: { getConfig?: () => OpenClawConfig },
+opts?: { getConfig?: () => OpenClawConfig; emitMemorySample?: EmitDiagnosticMemorySample },
) {
if (!areDiagnosticsEnabledForProcess() || !isDiagnosticsEnabled(config)) {
return;
}
startDiagnosticStabilityRecorder();
installDiagnosticStabilityFatalHook();
if (heartbeatInterval) {
return;
}
@@ -332,31 +415,19 @@ export function startDiagnosticHeartbeat(
const stuckSessionWarnMs = resolveStuckSessionWarnMs(heartbeatConfig);
const now = Date.now();
pruneDiagnosticSessionStates(now, true);
-const activeCount = Array.from(diagnosticSessionStates.values()).filter(
-(s) => s.state === "processing",
-).length;
-const waitingCount = Array.from(diagnosticSessionStates.values()).filter(
-(s) => s.state === "waiting",
-).length;
-const totalQueued = Array.from(diagnosticSessionStates.values()).reduce(
-(sum, s) => sum + s.queueDepth,
-0,
-);
-const hasActivity =
-getLastDiagnosticActivityAt() > 0 ||
-webhookStats.received > 0 ||
-activeCount > 0 ||
-waitingCount > 0 ||
-totalQueued > 0;
-if (!hasActivity) {
-return;
-}
-if (now - getLastDiagnosticActivityAt() > 120_000 && activeCount === 0 && waitingCount === 0) {
+const work = getDiagnosticWorkSnapshot();
+const shouldRecordMemorySample =
+hasRecentDiagnosticActivity(now) || hasOpenDiagnosticWork(work);
+(opts?.emitMemorySample ?? emitDiagnosticMemorySample)({
+emitSample: shouldRecordMemorySample,
+});
+if (!shouldRecordMemorySample) {
return;
}
diag.debug(
-`heartbeat: webhooks=${webhookStats.received}/${webhookStats.processed}/${webhookStats.errors} active=${activeCount} waiting=${waitingCount} queued=${totalQueued}`,
+`heartbeat: webhooks=${webhookStats.received}/${webhookStats.processed}/${webhookStats.errors} active=${work.activeCount} waiting=${work.waitingCount} queued=${work.queuedCount}`,
);
emitDiagnosticEvent({
type: "diagnostic.heartbeat",
@@ -365,9 +436,9 @@ export function startDiagnosticHeartbeat(
processed: webhookStats.processed,
errors: webhookStats.errors,
},
-active: activeCount,
-waiting: waitingCount,
-queued: totalQueued,
+active: work.activeCount,
+waiting: work.waitingCount,
+queued: work.queuedCount,
});
void loadCommandPollBackoffRuntime()
@@ -400,6 +471,8 @@ export function stopDiagnosticHeartbeat() {
clearInterval(heartbeatInterval);
heartbeatInterval = null;
}
stopDiagnosticStabilityRecorder();
uninstallDiagnosticStabilityFatalHook();
}
export function getDiagnosticSessionStateCountForTest(): number {
@@ -414,4 +487,7 @@ export function resetDiagnosticStateForTest(): void {
webhookStats.errors = 0;
webhookStats.lastReceived = 0;
stopDiagnosticHeartbeat();
resetDiagnosticMemoryForTest();
resetDiagnosticStabilityRecorderForTest();
resetDiagnosticStabilityBundleForTest();
}
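The heartbeat's decision about whether to record a memory sample reduces to a pure function of the clock, the last activity timestamp, and the work snapshot. This is a minimal sketch under that reading of the diff: `sketchShouldRecordSample`, `WorkSnapshot`, and `RECENT_ACTIVITY_MS` are hypothetical names mirroring `hasRecentDiagnosticActivity`, `hasOpenDiagnosticWork`, and `RECENT_DIAGNOSTIC_ACTIVITY_MS`, and timestamps are passed in rather than read from module state.

```typescript
// Mirrors RECENT_DIAGNOSTIC_ACTIVITY_MS from the diff above.
const RECENT_ACTIVITY_MS = 120_000;

type WorkSnapshot = {
  activeCount: number;
  waitingCount: number;
  queuedCount: number;
};

function sketchShouldRecordSample(
  now: number,
  lastActivityAt: number, // 0 means no activity has been recorded yet
  work: WorkSnapshot,
): boolean {
  const recentActivity = lastActivityAt > 0 && now - lastActivityAt <= RECENT_ACTIVITY_MS;
  const openWork = work.activeCount > 0 || work.waitingCount > 0 || work.queuedCount > 0;
  // Memory pressure is still checked on every tick; this flag only controls
  // whether the tick records a sample and emits the heartbeat event.
  return recentActivity || openWork;
}

const idle: WorkSnapshot = { activeCount: 0, waitingCount: 0, queuedCount: 0 };
console.log(sketchShouldRecordSample(300_000, 0, idle)); // → false (never active)
console.log(sketchShouldRecordSample(300_000, 270_000, idle)); // → true (active 30 s ago)
console.log(sketchShouldRecordSample(300_000, 100_000, { ...idle, queuedCount: 1 })); // → true (queued work)
```

Compared with the removed `hasActivity` check, a process that was busy long ago and is now idle no longer keeps emitting heartbeat samples.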


@@ -48,6 +48,9 @@ describe("isSensitiveUrlQueryParamName", () => {
it("matches the auth-oriented query params used by MCP SSE config redaction", () => {
expect(isSensitiveUrlQueryParamName("token")).toBe(true);
expect(isSensitiveUrlQueryParamName("refresh_token")).toBe(true);
expect(isSensitiveUrlQueryParamName("access-token")).toBe(true);
expect(isSensitiveUrlQueryParamName("hook-token")).toBe(true);
expect(isSensitiveUrlQueryParamName("passwd")).toBe(true);
expect(isSensitiveUrlQueryParamName("signature")).toBe(true);
expect(isSensitiveUrlQueryParamName("safe")).toBe(false);
});
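The hyphen-insensitive matching that these assertions exercise can be sketched as follows. This is a minimal sketch, not the shared helper: `sketchIsSensitiveParam` and `SENSITIVE_NAMES` are hypothetical names, the set holds only a subset of the real `SENSITIVE_URL_QUERY_PARAM_NAMES`, and trimming plus lowercasing is an assumption about what `normalizeLowercaseStringOrEmpty` does.

```typescript
// Subset of the names covered by the assertions above; the real set is larger.
const SENSITIVE_NAMES = new Set([
  "token",
  "access_token",
  "hook_token",
  "refresh_token",
  "passwd",
  "signature",
]);

function sketchIsSensitiveParam(name: string): boolean {
  // Assumption: trim + lowercase approximates normalizeLowercaseStringOrEmpty.
  // Hyphens are folded to underscores so "access-token" matches "access_token".
  const normalized = name.trim().toLowerCase().replace(/-/g, "_");
  return SENSITIVE_NAMES.has(normalized);
}

console.log(sketchIsSensitiveParam("access-token")); // → true
console.log(sketchIsSensitiveParam("Hook-Token")); // → true
console.log(sketchIsSensitiveParam("safe")); // → false
```

Folding hyphens at lookup time keeps the set free of duplicate spellings for each name.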


@@ -10,16 +10,20 @@ const SENSITIVE_URL_QUERY_PARAM_NAMES = new Set([
"apikey",
"secret",
"access_token",
"auth_token",
"password",
"pass",
"passwd",
"auth",
"client_secret",
"hook_token",
"refresh_token",
"signature",
]);
export function isSensitiveUrlQueryParamName(name: string): boolean {
-return SENSITIVE_URL_QUERY_PARAM_NAMES.has(normalizeLowercaseStringOrEmpty(name));
+const normalized = normalizeLowercaseStringOrEmpty(name).replaceAll("-", "_");
+return SENSITIVE_URL_QUERY_PARAM_NAMES.has(normalized);
}
export function isSensitiveUrlConfigPath(path: string): boolean {