mirror of
https://github.com/openclaw/openclaw.git
synced 2026-04-21 14:11:26 +00:00
Skills: harden heap snapshot diffing
@@ -1,11 +1,11 @@
---
name: openclaw-test-heap-leaks
description: Investigate `pnpm test` memory growth, Vitest worker OOMs, and suspicious RSS increases in OpenClaw using the `scripts/test-parallel.mjs` heap snapshot tooling. Use when Codex needs to reproduce test-lane memory growth, collect repeated `.heapsnapshot` files, compare snapshots from the same worker PID, distinguish transformed-module retention from real data leaks, and fix or reduce the impact by patching cleanup logic or isolating hotspot tests.
description: Investigate `pnpm test` memory growth, Vitest worker OOMs, and suspicious RSS increases in OpenClaw using the `scripts/test-parallel.mjs` heap snapshot tooling. Use when Codex needs to reproduce test-lane memory growth, collect repeated `.heapsnapshot` files, compare snapshots from the same worker PID, triage likely transformed-module retention versus likely runtime leaks, and fix or reduce the impact by patching cleanup logic or isolating hotspot tests.
---

# OpenClaw Test Heap Leaks

Use this skill for test-memory investigations. Do not guess from RSS alone when heap snapshots are available.
Use this skill for test-memory investigations. Do not guess from RSS alone when heap snapshots are available. Treat snapshot-name deltas as triage evidence, not proof, until retainers or dominators support the call.

## Workflow

@@ -14,19 +14,23 @@ Use this skill for test-memory investigations. Do not guess from RSS alone when
  - `pnpm canvas:a2ui:bundle && OPENCLAW_TEST_MEMORY_TRACE=1 OPENCLAW_TEST_HEAPSNAPSHOT_INTERVAL_MS=60000 OPENCLAW_TEST_HEAPSNAPSHOT_DIR=.tmp/heapsnap OPENCLAW_TEST_WORKERS=2 OPENCLAW_TEST_MAX_OLD_SPACE_SIZE_MB=6144 pnpm test`
  - Keep `OPENCLAW_TEST_MEMORY_TRACE=1` enabled so the wrapper prints per-file RSS summaries alongside the snapshots.
  - If the report is about a specific shard or worker budget, preserve that shape.
  - Before you analyze snapshots, identify the real lane names from `[test-parallel] start ...` lines or `pnpm test --plan`. Do not assume a single `unit-fast` lane; local plans often split into `unit-fast-batch-*`.

2. Wait for repeated snapshots before concluding anything.
  - Take at least two intervals from the same lane.
  - Compare snapshots from the same PID inside one lane directory such as `.tmp/heapsnap/unit-fast/`.
  - Use `scripts/heapsnapshot-delta.mjs` to compare either two files directly or the earliest/latest pair per PID in one lane directory.
  - Compare snapshots from the same PID inside the real lane directory such as `.tmp/heapsnap/unit-fast-batch-2/`.
  - Use `.agents/skills/openclaw-test-heap-leaks/scripts/heapsnapshot-delta.mjs` to compare either two files directly or the earliest/latest pair per PID in one lane directory.
  - If the helper suggests transformed-module retention, confirm the top entries in DevTools retainers/dominators before calling it solved.

3. Classify the growth before choosing a fix.
  - If growth is dominated by Vite/Vitest transformed source strings, `Module`, `system / Context`, bytecode, descriptor arrays, or property maps, treat it as retained module graph growth in long-lived workers.
  - If growth is dominated by Vite/Vitest transformed source strings, `Module`, `system / Context`, bytecode, descriptor arrays, or property maps, treat it as likely retained module graph growth in long-lived workers.
  - If growth is dominated by app objects, caches, buffers, server handles, timers, mock state, sqlite state, or similar runtime objects, treat it as a likely cleanup or lifecycle leak.
  - If the names are ambiguous, stop short of a confident label and inspect retainers/dominators in DevTools for the top deltas.

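The bucketing rule in the classification step above can be pictured as a small triage helper. This sketch is purely illustrative: the function name, the regex families, and the 2x dominance threshold are assumptions, not part of the repo tooling, and real calls should still be confirmed with retainers/dominators.

```javascript
// Hypothetical triage helper: bucket positive snapshot-delta rows into the two
// growth families described above. Names and thresholds are illustrative only.
const MODULE_FAMILY = [/^Module$/, /system \/ Context/, /bytecode/i, /descriptor/i, /property map/i];
const RUNTIME_FAMILY = [/cache/i, /buffer/i, /timer/i, /handle/i, /sqlite/i, /mock/i];

function classifyDelta(rows) {
  let moduleBytes = 0;
  let runtimeBytes = 0;
  for (const { name, deltaBytes } of rows) {
    if (deltaBytes <= 0) continue; // only positive growth matters for triage
    if (MODULE_FAMILY.some((re) => re.test(name))) moduleBytes += deltaBytes;
    else if (RUNTIME_FAMILY.some((re) => re.test(name))) runtimeBytes += deltaBytes;
  }
  if (moduleBytes === 0 && runtimeBytes === 0) return "ambiguous";
  if (moduleBytes >= runtimeBytes * 2) return "likely-module-retention";
  if (runtimeBytes >= moduleBytes * 2) return "likely-runtime-leak";
  return "ambiguous";
}

console.log(classifyDelta([
  { name: "Module", deltaBytes: 4_000_000 },
  { name: "system / Context", deltaBytes: 2_000_000 },
  { name: "TimerCache", deltaBytes: 500_000 },
])); // → likely-module-retention
```

An "ambiguous" result here maps to the rule above: stop short of a confident label and go to DevTools.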
4. Fix the right layer.
  - For retained transformed-module growth in shared workers:
  - Move hotspot files out of `unit-fast` by updating `test/fixtures/test-parallel.behavior.json`.
  - For likely retained transformed-module growth in shared workers:
  - Prefer timing and hotspot-driven scheduling fixes first. Check whether the file is already represented in `test/fixtures/test-timings.unit.json` and whether `scripts/test-update-memory-hotspots.mjs` should refresh the measured hotspot manifest before hand-editing behavior overrides.
  - Move hotspot files out of the real shared lane by updating `test/fixtures/test-parallel.behavior.json` only when timing-driven peeling is insufficient.
  - Prefer `singletonIsolated` for files that are safe alone but inflate shared worker heaps.
  - If the file should already have been peeled out by timings but is absent from `test/fixtures/test-timings.unit.json`, call that out explicitly. Missing timings are a scheduling blind spot.
  - For real leaks:
@@ -40,24 +44,24 @@ Use this skill for test-memory investigations. Do not guess from RSS alone when

## Heuristics

- Do not call everything a leak. In this repo, large `unit-fast` growth can be a worker-lifetime problem rather than an application object leak.
- Do not call everything a leak. In this repo, large `unit-fast` or `unit-fast-batch-*` growth can be a worker-lifetime problem rather than an application object leak.
- `scripts/test-parallel.mjs` and `scripts/test-parallel-memory.mjs` are the primary control points for wrapper diagnostics.
- The lane names printed by `[test-parallel] start ...` and `[test-parallel][mem] summary ...` tell you where to focus.
- When one or two files account for most of the delta and they are missing from timings, reducing impact by isolating them is usually the first pragmatic fix.
- When the same retained object families grow across multiple intervals in the same worker PID, trust the snapshots over intuition.
- When the same retained object families grow across multiple intervals in the same worker PID, trust the snapshots over intuition, then confirm ambiguous calls with retainer evidence.

## Snapshot Comparison

- Direct comparison:
  - `node .agents/skills/openclaw-test-heap-leaks/scripts/heapsnapshot-delta.mjs before.heapsnapshot after.heapsnapshot`
- Auto-select earliest/latest snapshots per PID within one lane:
  - `node .agents/skills/openclaw-test-heap-leaks/scripts/heapsnapshot-delta.mjs --lane-dir .tmp/heapsnap/unit-fast`
  - `node .agents/skills/openclaw-test-heap-leaks/scripts/heapsnapshot-delta.mjs --lane-dir .tmp/heapsnap/unit-fast-batch-2`
- Useful flags:
  - `--top 40`
  - `--min-kb 32`
  - `--pid 16133`

Read the top positive deltas first. Large positive growth in module-transform artifacts suggests lane isolation; large positive growth in runtime objects suggests a real leak.
Read the top positive deltas first. Large positive growth in module-transform artifacts suggests lane isolation; large positive growth in runtime objects suggests a real leak. If the names alone do not settle it, open the same snapshot pair in DevTools and inspect retainers/dominators for the top rows before declaring root cause.

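The `--lane-dir` mode pairs the earliest and latest snapshot per PID. A minimal sketch of that pairing logic follows; the `<pid>-<timestamp>.heapsnapshot` filename shape is an assumption for illustration, not necessarily the wrapper's real naming.

```javascript
// Sketch: group snapshot filenames by PID and pick the earliest/latest pair.
// The `<pid>-<timestamp>.heapsnapshot` shape is assumed for illustration.
function pickPairsByPid(fileNames) {
  const byPid = new Map();
  for (const name of fileNames) {
    const match = name.match(/^(\d+)-(\d+)\.heapsnapshot$/);
    if (!match) continue; // ignore files that don't look like snapshots
    const [, pid, ts] = match;
    const list = byPid.get(pid) ?? [];
    list.push({ name, ts: Number(ts) });
    byPid.set(pid, list);
  }
  const pairs = [];
  for (const [pid, list] of byPid) {
    if (list.length < 2) continue; // need at least two snapshots to diff
    list.sort((a, b) => a.ts - b.ts);
    pairs.push({ pid, before: list[0].name, after: list[list.length - 1].name });
  }
  return pairs;
}

console.log(pickPairsByPid([
  "16133-100.heapsnapshot",
  "16133-300.heapsnapshot",
  "16133-200.heapsnapshot",
  "999-100.heapsnapshot", // only one snapshot: skipped
]));
```

PIDs with a single snapshot are skipped, which matches the workflow rule of waiting for at least two intervals before concluding anything.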
## Output Expectations

@@ -66,6 +70,6 @@ When using this skill, report:
- The exact reproduce command.
- Which lane and PID were compared.
- The dominant retained object families from the snapshot delta.
- Whether the issue is a real leak or shared-worker retained module growth.
- Whether the issue is a likely real leak or likely shared-worker retained module growth, plus whether retainers/dominators confirmed it.
- The concrete fix or impact-reduction patch.
- What you verified, and what snapshot overhead prevented you from verifying.

@@ -64,6 +64,243 @@ function parseArgs(argv) {
  return options;
}

class JsonStreamScanner {
  constructor(filePath) {
    this.stream = fs.createReadStream(filePath, {
      encoding: "utf8",
      highWaterMark: 1024 * 1024,
    });
    this.iterator = this.stream[Symbol.asyncIterator]();
    this.buffer = "";
    this.offset = 0;
    this.done = false;
  }

  compactBuffer() {
    if (this.offset > 65536) {
      this.buffer = this.buffer.slice(this.offset);
      this.offset = 0;
    }
  }

  async ensureAvailable(count = 1) {
    while (!this.done && this.buffer.length - this.offset < count) {
      const next = await this.iterator.next();
      if (next.done) {
        this.done = true;
        break;
      }
      this.buffer += next.value;
    }
  }

  async peek() {
    await this.ensureAvailable(1);
    return this.buffer[this.offset] ?? null;
  }

  async next() {
    await this.ensureAvailable(1);
    if (this.offset >= this.buffer.length) {
      return null;
    }
    const char = this.buffer[this.offset];
    this.offset += 1;
    this.compactBuffer();
    return char;
  }

  async skipWhitespace() {
    while (true) {
      const char = await this.peek();
      if (char === null || !/\s/u.test(char)) {
        return;
      }
      await this.next();
    }
  }

  async expectChar(expected) {
    const char = await this.next();
    if (char !== expected) {
      fail(`Expected ${expected} but found ${char ?? "<eof>"}`);
    }
  }

  async find(sequence) {
    let matched = 0;
    while (true) {
      const char = await this.next();
      if (char === null) {
        fail(`Could not find ${sequence}`);
      }
      if (char === sequence[matched]) {
        matched += 1;
        if (matched === sequence.length) {
          return;
        }
        continue;
      }
      matched = char === sequence[0] ? 1 : 0;
      if (matched === sequence.length) {
        return;
      }
    }
  }

  async readBalancedObject() {
    const start = await this.next();
    if (start !== "{") {
      fail(`Expected { but found ${start ?? "<eof>"}`);
    }
    let text = "{";
    let depth = 1;
    let inString = false;
    let escaped = false;
    while (depth > 0) {
      const char = await this.next();
      if (char === null) {
        fail("Unexpected EOF while reading JSON object");
      }
      text += char;
      if (inString) {
        if (escaped) {
          escaped = false;
        } else if (char === "\\") {
          escaped = true;
        } else if (char === '"') {
          inString = false;
        }
        continue;
      }
      if (char === '"') {
        inString = true;
      } else if (char === "{") {
        depth += 1;
      } else if (char === "}") {
        depth -= 1;
      }
    }
    return text;
  }

  async parseNumberArray(onValue) {
    await this.skipWhitespace();
    await this.expectChar("[");
    await this.skipWhitespace();
    if ((await this.peek()) === "]") {
      await this.next();
      return;
    }

    let token = "";
    let index = 0;
    const flush = () => {
      if (token.length === 0) {
        fail("Unexpected empty number token");
      }
      const value = Number.parseInt(token, 10);
      if (!Number.isFinite(value)) {
        fail(`Invalid numeric token: ${token}`);
      }
      onValue(value, index);
      index += 1;
      token = "";
    };

    while (true) {
      const char = await this.next();
      if (char === null) {
        fail("Unexpected EOF while reading number array");
      }
      if (char === "]") {
        flush();
        return;
      }
      if (char === ",") {
        flush();
        continue;
      }
      if (/\s/u.test(char)) {
        continue;
      }
      token += char;
    }
  }

  async readJsonString() {
    await this.expectChar('"');
    let value = "";
    while (true) {
      const char = await this.next();
      if (char === null) {
        fail("Unexpected EOF while reading JSON string");
      }
      if (char === '"') {
        return value;
      }
      if (char !== "\\") {
        value += char;
        continue;
      }
      const escaped = await this.next();
      if (escaped === null) {
        fail("Unexpected EOF while reading JSON string escape");
      }
      if (escaped === "u") {
        let hex = "";
        for (let index = 0; index < 4; index += 1) {
          const hexChar = await this.next();
          if (hexChar === null) {
            fail("Unexpected EOF while reading JSON unicode escape");
          }
          hex += hexChar;
        }
        value += String.fromCharCode(Number.parseInt(hex, 16));
        continue;
      }
      value +=
        escaped === "b"
          ? "\b"
          : escaped === "f"
            ? "\f"
            : escaped === "n"
              ? "\n"
              : escaped === "r"
                ? "\r"
                : escaped === "t"
                  ? "\t"
                  : escaped;
    }
  }

  async parseStringArray(onValue) {
    await this.skipWhitespace();
    await this.expectChar("[");
    await this.skipWhitespace();
    if ((await this.peek()) === "]") {
      await this.next();
      return;
    }

    let index = 0;
    while (true) {
      const value = await this.readJsonString();
      onValue(value, index);
      index += 1;
      await this.skipWhitespace();
      const separator = await this.next();
      if (separator === "]") {
        return;
      }
      if (separator !== ",") {
        fail(`Expected , or ] but found ${separator ?? "<eof>"}`);
      }
      await this.skipWhitespace();
    }
  }
}
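The brace-matching idea behind `readBalancedObject` above can be illustrated with a simplified synchronous sketch; the function name here is hypothetical and the logic is a plain-string analogue of the streaming version, including the string and escape handling.

```javascript
// Sketch of the brace-depth scan used above, in synchronous form: extract the
// first balanced {...} object from `text`, respecting strings and escapes.
function extractBalancedObject(text, start = 0) {
  const open = text.indexOf("{", start);
  if (open === -1) return null;
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = open; i < text.length; i += 1) {
    const char = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (char === "\\") escaped = true;
      else if (char === '"') inString = false;
      continue;
    }
    if (char === '"') inString = true;
    else if (char === "{") depth += 1;
    else if (char === "}") {
      depth -= 1;
      if (depth === 0) return text.slice(open, i + 1);
    }
  }
  return null; // unbalanced input
}

console.log(extractBalancedObject('{"a":{"b":"}"},"c":1} tail'));
// → {"a":{"b":"}"},"c":1}
```

Note the `}` inside the `"b"` string is correctly ignored because string state is tracked before brace depth, just as in the class above.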

function parseHeapFilename(filePath) {
  const base = path.basename(filePath);
  const match = base.match(
@@ -151,38 +388,89 @@ function resolvePair(options) {
  };
}

function loadSummary(filePath) {
  const data = JSON.parse(fs.readFileSync(filePath, "utf8"));
  const meta = data.snapshot?.meta;
async function parseSnapshotMeta(scanner) {
  await scanner.find('"snapshot":');
  await scanner.skipWhitespace();
  const metaObjectText = await scanner.readBalancedObject();
  const parsed = JSON.parse(metaObjectText);
  return parsed?.meta ?? null;
}

async function buildSummary(filePath) {
  const scanner = new JsonStreamScanner(filePath);
  const meta = await parseSnapshotMeta(scanner);
  if (!meta) {
    fail(`Invalid heap snapshot: ${filePath}`);
  }

  const nodeFieldCount = meta.node_fields.length;
  const typeNames = meta.node_types[0];
  const strings = data.strings;
  const typeIndex = meta.node_fields.indexOf("type");
  const nameIndex = meta.node_fields.indexOf("name");
  const selfSizeIndex = meta.node_fields.indexOf("self_size");
  if (typeIndex === -1 || nameIndex === -1 || selfSizeIndex === -1) {
    fail(`Unsupported heap snapshot schema: ${filePath}`);
  }

  const summary = new Map();
  for (let offset = 0; offset < data.nodes.length; offset += nodeFieldCount) {
    const type = typeNames[data.nodes[offset + typeIndex]];
    const name = strings[data.nodes[offset + nameIndex]];
    const selfSize = data.nodes[offset + selfSizeIndex];
    const key = `${type}\t${name}`;
    const current = summary.get(key) ?? {
      type,
      name,
  const summaryByIndex = new Map();
  let nodeCount = 0;
  let currentTypeId = 0;
  let currentNameId = 0;
  let currentSelfSize = 0;
  await scanner.find('"nodes":');
  await scanner.parseNumberArray((value, index) => {
    const fieldIndex = index % nodeFieldCount;
    if (fieldIndex === typeIndex) {
      currentTypeId = value;
      return;
    }
    if (fieldIndex === nameIndex) {
      currentNameId = value;
      return;
    }
    if (fieldIndex === selfSizeIndex) {
      currentSelfSize = value;
    }
    if (fieldIndex !== nodeFieldCount - 1) {
      return;
    }
    const key = `${currentTypeId}\t${currentNameId}`;
    const current = summaryByIndex.get(key) ?? {
      typeId: currentTypeId,
      nameId: currentNameId,
      selfSize: 0,
      count: 0,
    };
    current.selfSize += selfSize;
    current.selfSize += currentSelfSize;
    current.count += 1;
    summary.set(key, current);
    summaryByIndex.set(key, current);
    nodeCount += 1;
  });

  const requiredNameIds = new Set(
    Array.from(summaryByIndex.values(), (entry) => entry.nameId).filter((value) => value >= 0),
  );
  const nameStrings = new Map();
  await scanner.find('"strings":');
  await scanner.parseStringArray((value, index) => {
    if (requiredNameIds.has(index)) {
      nameStrings.set(index, value);
    }
  });

  const summary = new Map();
  for (const entry of summaryByIndex.values()) {
    const key = `${typeNames[entry.typeId] ?? "unknown"}\t${nameStrings.get(entry.nameId) ?? ""}`;
    summary.set(key, {
      type: typeNames[entry.typeId] ?? "unknown",
      name: nameStrings.get(entry.nameId) ?? "",
      selfSize: entry.selfSize,
      count: entry.count,
    });
  }

  return {
    nodeCount: data.snapshot.node_count,
    nodeCount,
    summary,
  };
}
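The per-(type, name) aggregation that `buildSummary` streams over the flat `nodes` array can be pictured with a small in-memory sketch. The three-field `[typeId, nameId, selfSize]` layout here is a deliberate simplification of the real `node_fields` schema, and `summarizeNodes` is a hypothetical name.

```javascript
// Sketch of the nodes-array aggregation above: a heap snapshot's `nodes` array
// is flat and field-major per node; group self_size and count by (type, name).
// A toy 3-field layout [typeId, nameId, selfSize] is used for illustration.
function summarizeNodes(nodes, typeNames, strings, fieldCount = 3) {
  const summary = new Map();
  for (let offset = 0; offset < nodes.length; offset += fieldCount) {
    const type = typeNames[nodes[offset]] ?? "unknown";
    const name = strings[nodes[offset + 1]] ?? "";
    const selfSize = nodes[offset + 2];
    const key = `${type}\t${name}`;
    const entry = summary.get(key) ?? { type, name, selfSize: 0, count: 0 };
    entry.selfSize += selfSize;
    entry.count += 1;
    summary.set(key, entry);
  }
  return summary;
}

const summary = summarizeNodes(
  [0, 0, 64, 0, 0, 32, 1, 1, 128], // three nodes, 3 fields each
  ["object", "string"],
  ["Module", "leaky cache"],
);
console.log(summary.get("object\tModule"));
// → { type: 'object', name: 'Module', selfSize: 96, count: 2 }
```

The streaming version above does the same grouping but keys by raw `typeId`/`nameId` first and resolves strings in a second pass, so it never holds the whole snapshot in memory.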
@@ -205,11 +493,11 @@ function truncate(text, maxLength) {
  return text.length <= maxLength ? text : `${text.slice(0, maxLength - 1)}…`;
}

function main() {
async function main() {
  const options = parseArgs(process.argv.slice(2));
  const pair = resolvePair(options);
  const before = loadSummary(pair.before);
  const after = loadSummary(pair.after);
  const before = await buildSummary(pair.before);
  const after = await buildSummary(pair.after);
  const minBytes = options.minKb * 1024;

  const rows = [];
@@ -262,4 +550,4 @@ function main() {
  }
}

main();
await main();