docs: document installer recovery cleanup

This commit is contained in:
Peter Steinberger
2026-04-27 00:25:56 +01:00
parent eccb79db99
commit 9be8d43c31
3 changed files with 14 additions and 673 deletions


@@ -67,6 +67,20 @@ Add `--no-onboard` to skip onboarding. To force a specific install type through
the installer, pass `--install-method git --no-onboard` or
`--install-method npm --no-onboard`.
If `openclaw update` fails after the npm package install phase, re-run the
installer. The installer does not call the old updater; it runs the global
package install directly and can recover a partially updated npm install.
```bash
curl -fsSL https://openclaw.ai/install.sh | bash -s -- --install-method npm
```
To pin the recovery to a specific version or dist-tag, add `--version`:
```bash
curl -fsSL https://openclaw.ai/install.sh | bash -s -- --install-method npm --version <version-or-dist-tag>
```
## Alternative: manual npm, pnpm, or bun
```bash


@@ -1,133 +0,0 @@
---
summary: "Investigation notes for duplicate async exec completion injection"
read_when:
- Debugging repeated node exec completion events
- Working on heartbeat/system-event dedupe
title: "Async exec duplicate completion investigation"
---
## Scope
- Session: `agent:main:telegram:group:-1003774691294:topic:1`
- Symptom: the same async exec completion for session/run `keen-nexus` was recorded twice in LCM as user turns.
- Goal: identify whether this is most likely duplicate session injection or plain outbound delivery retry.
## Conclusion
Most likely this is **duplicate session injection**, not a pure outbound delivery retry.
The strongest gateway-side gap is in the **node exec completion path**:
1. A node-side exec finish emits `exec.finished` with the full `runId`.
2. Gateway `server-node-events` converts that into a system event and requests a heartbeat.
3. The heartbeat run injects the drained system event block into the agent prompt.
4. The embedded runner persists that prompt as a new user turn in the session transcript.
If the same `exec.finished` reaches the gateway twice for the same `runId` for any reason (replay, reconnect duplicate, upstream resend, duplicated producer), OpenClaw currently has **no idempotency check keyed by `runId`/`contextKey`** on this path. The second copy will become a second user message with the same content.
## Exact Code Path
### 1. Producer: node exec completion event
- `src/node-host/invoke.ts:340-360`
- `sendExecFinishedEvent(...)` emits `node.event` with event `exec.finished`.
- Payload includes `sessionKey` and full `runId`.
### 2. Gateway event ingestion
- `src/gateway/server-node-events.ts:574-640`
- Handles `exec.finished`.
- Builds text:
- `Exec finished (node=..., id=<runId>, code ...)`
- Enqueues it via:
- ``enqueueSystemEvent(text, { sessionKey, contextKey: runId ? `exec:${runId}` : "exec", trusted: false })``
- Immediately requests a wake:
- `requestHeartbeatNow(scopedHeartbeatWakeOptions(sessionKey, { reason: "exec-event" }))`
### 3. System event dedupe weakness
- `src/infra/system-events.ts:90-115`
- `enqueueSystemEvent(...)` only suppresses **consecutive duplicate text**:
- `if (entry.lastText === cleaned) return false`
- It stores `contextKey`, but does **not** use `contextKey` for idempotency.
- After drain, duplicate suppression resets.
This means a replayed `exec.finished` with the same `runId` can be accepted again later, even though the code already had a stable idempotency candidate (`exec:<runId>`).
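This gap can be illustrated with a small sketch. Assuming a per-session queue entry object (the types and names below are hypothetical, not the real `src/infra/system-events.ts` internals), honoring `contextKey` for a bounded horizon would look roughly like:

```typescript
// Hypothetical queue entry: lastText mirrors the existing consecutive-duplicate
// check; seen tracks when each contextKey was last accepted.
type QueueEntry = { lastText: string; seen: Map<string, number> };

const DEDUPE_HORIZON_MS = 5 * 60 * 1000; // suppress contextKey repeats for 5 minutes

function shouldAccept(
  entry: QueueEntry,
  text: string,
  contextKey: string,
  now: number = Date.now(),
): boolean {
  if (entry.lastText === text) return false; // current behavior: consecutive text dedupe
  const lastSeen = entry.seen.get(contextKey);
  if (lastSeen !== undefined && now - lastSeen < DEDUPE_HORIZON_MS) {
    return false; // replayed exec:<runId> within the horizon
  }
  entry.seen.set(contextKey, now);
  entry.lastText = text;
  return true;
}
```

With something like this, a replayed `exec.finished` carrying the same `exec:<runId>` key would be dropped even after the queue drains, as long as it arrives within the horizon.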
### 4. Wake handling is not the primary duplicator
- `src/infra/heartbeat-wake.ts:79-117`
- Wakes are coalesced by `(agentId, sessionKey)`.
- Duplicate wake requests for the same target collapse to one pending wake entry.
This makes **duplicate wake handling alone** a weaker explanation than duplicate event ingestion.
### 5. Heartbeat consumes the event and turns it into prompt input
- `src/infra/heartbeat-runner.ts:535-574`
- Preflight peeks pending system events and classifies exec-event runs.
- `src/auto-reply/reply/session-system-events.ts:86-90`
- `drainFormattedSystemEvents(...)` drains the queue for the session.
- `src/auto-reply/reply/get-reply-run.ts:400-427`
- The drained system event block is prepended into the agent prompt body.
### 6. Transcript injection point
- `src/agents/pi-embedded-runner/run/attempt.ts:2000-2017`
- `activeSession.prompt(effectivePrompt)` submits the full prompt to the embedded PI session.
- That is the point where the completion-derived prompt becomes a persisted user turn.
So once the same system event is rebuilt into the prompt twice, duplicate LCM user messages are expected.
## Why plain outbound delivery retry is less likely
There is a real outbound failure path in the heartbeat runner:
- `src/infra/heartbeat-runner.ts:1194-1242`
- The reply is generated first.
- Outbound delivery happens later via `deliverOutboundPayloads(...)`.
- Failure there returns `{ status: "failed" }`.
However, for the same system event queue entry, this alone is **not sufficient** to explain the duplicate user turns:
- `src/auto-reply/reply/session-system-events.ts:86-90`
- The system event queue is already drained before outbound delivery.
So a channel send retry by itself would not recreate the exact same queued event. It could explain missing or failed external delivery, but not, on its own, a second identical user message in the session.
## Secondary, lower-confidence possibility
There is a full-run retry loop in the agent runner:
- `src/auto-reply/reply/agent-runner-execution.ts:741-1473`
- Certain transient failures can retry the whole run and resubmit the same `commandBody`.
That can duplicate a persisted user prompt **within the same reply execution** if the prompt was already appended before the retry condition triggered.
I rank this lower than duplicate `exec.finished` ingestion because:
- the observed gap was around 51 seconds, which looks more like a second wake/turn than an in-process retry;
- the report already mentions repeated message send failures, which points more toward a separate later turn than an immediate model/runtime retry.
## Root Cause Hypothesis
Highest-confidence hypothesis:
- The `keen-nexus` completion came through the **node exec event path**.
- The same `exec.finished` was delivered to `server-node-events` twice.
- Gateway accepted both because `enqueueSystemEvent(...)` does not dedupe by `contextKey` / `runId`.
- Each accepted event triggered a heartbeat and was injected as a user turn into the PI transcript.
## Proposed Tiny Surgical Fix
If a fix is wanted, the smallest high-value change is:
- make exec/system-event idempotency honor `contextKey` for a short horizon, at least for exact `(sessionKey, contextKey, text)` repeats;
- or add a dedicated dedupe in `server-node-events` for `exec.finished` keyed by `(sessionKey, runId, event kind)`.
That would directly block replayed `exec.finished` duplicates before they become session turns.
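As a sketch of the second option, a bounded seen-set keyed by `(sessionKey, runId)` could guard the gateway handler. The class and function names below are illustrative, not existing OpenClaw code:

```typescript
// Bounded FIFO set: constant memory even on long-running gateways.
class RecentKeys {
  private order: string[] = [];
  private seen = new Set<string>();
  constructor(private readonly cap: number = 1024) {}

  // Returns true the first time a key is seen, false on any repeat still in the window.
  addIfNew(key: string): boolean {
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    this.order.push(key);
    if (this.order.length > this.cap) this.seen.delete(this.order.shift()!);
    return true;
  }
}

const seenExecFinished = new RecentKeys();

// Guard to run before enqueueSystemEvent in the exec.finished handler.
function acceptExecFinished(sessionKey: string, runId: string): boolean {
  return seenExecFinished.addIfNew(`${sessionKey}:exec.finished:${runId}`);
}
```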
## Related
- [Exec tool](/tools/exec)
- [Session management](/concepts/session)


@@ -1,540 +0,0 @@
---
summary: "QA refactor plan for scenario catalog and harness consolidation"
read_when:
- Refactoring QA scenario definitions or qa-lab harness code
- Moving QA behavior between markdown scenarios and TypeScript harness logic
title: "QA refactor"
---
Status: foundational migration landed.
## Goal
Move OpenClaw QA from a split-definition model to a single source of truth:
- scenario metadata
- prompts sent to the model
- setup and teardown
- harness logic
- assertions and success criteria
- artifacts and report hints
The desired end state is a generic QA harness that loads powerful scenario definition files instead of hardcoding most behavior in TypeScript.
## Current State
Primary source of truth now lives in `qa/scenarios/index.md` plus one file per
scenario under `qa/scenarios/<theme>/*.md`.
Implemented:
- `qa/scenarios/index.md`
- canonical QA pack metadata
- operator identity
- kickoff mission
- `qa/scenarios/<theme>/*.md`
- one markdown file per scenario
- scenario metadata
- handler bindings
- scenario-specific execution config
- `extensions/qa-lab/src/scenario-catalog.ts`
- markdown pack parser + zod validation
- `extensions/qa-lab/src/qa-agent-bootstrap.ts`
- plan rendering from the markdown pack
- `extensions/qa-lab/src/qa-agent-workspace.ts`
- seeds generated compatibility files plus `QA_SCENARIOS.md`
- `extensions/qa-lab/src/suite.ts`
- selects executable scenarios through markdown-defined handler bindings
- QA bus protocol + UI
- generic inline attachments for image/video/audio/file rendering
Remaining split surfaces:
- `extensions/qa-lab/src/suite.ts`
- still owns most executable custom handler logic
- `extensions/qa-lab/src/report.ts`
- still derives report structure from runtime outputs
So the source-of-truth split is fixed, but execution is still mostly handler-backed rather than fully declarative.
## What The Real Scenario Surface Looks Like
Reading the current suite shows a few distinct scenario classes.
### Simple interaction
- channel baseline
- DM baseline
- threaded follow-up
- model switch
- approval followthrough
- reaction/edit/delete
### Config and runtime mutation
- config patch skill disable
- config apply restart wake-up
- config restart capability flip
- runtime inventory drift check
### Filesystem and repo assertions
- source/docs discovery report
- build Lobster Invaders
- generated image artifact lookup
### Memory orchestration
- memory recall
- memory tools in channel context
- memory failure fallback
- session memory ranking
- thread memory isolation
- memory dreaming sweep
### Tool and plugin integration
- MCP plugin-tools call
- skill visibility
- skill hot install
- native image generation
- image roundtrip
- image understanding from attachment
### Multi-turn and multi-actor
- subagent handoff
- subagent fanout synthesis
- restart recovery style flows
These categories matter because they drive DSL requirements. A flat list of prompt + expected text is not enough.
## Direction
### Single source of truth
Use `qa/scenarios/index.md` plus `qa/scenarios/<theme>/*.md` as the authored
source of truth.
The pack should stay:
- human-readable in review
- machine-parseable
- rich enough to drive:
- suite execution
- QA workspace bootstrap
- QA Lab UI metadata
- docs/discovery prompts
- report generation
### Preferred authoring format
Use markdown as the top-level format, with structured YAML inside it.
Recommended shape:
- YAML frontmatter
- id
- title
- surface
- tags
- docs refs
- code refs
- model/provider overrides
- prerequisites
- prose sections
- objective
- notes
- debugging hints
- fenced YAML blocks
- setup
- steps
- assertions
- cleanup
This gives:
- better PR readability than giant JSON
- richer context than pure YAML
- strict parsing and zod validation
Raw JSON is acceptable only as an intermediate generated form.
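A dependency-free sketch of the frontmatter checks can make the contract concrete. The real loader validates with zod; the field names below follow the recommended shape above, and the helper itself is illustrative:

```typescript
// Minimal validation sketch for the required frontmatter fields.
// The real scenario-catalog.ts uses zod; this hand-rolled version only
// shows the shape of the checks.
type ScenarioFrontmatter = {
  id: string;
  title: string;
  surface: string;
  tags: string[];
};

function parseFrontmatter(raw: Record<string, unknown>): ScenarioFrontmatter {
  const { id, title, surface, tags } = raw;
  if (typeof id !== "string" || !/^[a-z0-9][a-z0-9-]*$/.test(id)) {
    throw new Error("scenario frontmatter: id must be kebab-case");
  }
  if (typeof title !== "string" || title.length === 0) {
    throw new Error("scenario frontmatter: title is required");
  }
  if (typeof surface !== "string" || surface.length === 0) {
    throw new Error("scenario frontmatter: surface is required");
  }
  if (!Array.isArray(tags) || !tags.every((t) => typeof t === "string")) {
    throw new Error("scenario frontmatter: tags must be an array of strings");
  }
  return { id, title, surface, tags: tags as string[] };
}
```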
## Proposed Scenario File Shape
Example:
````md
---
id: image-generation-roundtrip
title: Image generation roundtrip
surface: image
tags: [media, image, roundtrip]
models:
primary: openai/gpt-5.4
requires:
tools: [image_generate]
plugins: [openai, qa-channel]
docsRefs:
- docs/help/testing.md
- docs/concepts/model-providers.md
codeRefs:
- extensions/qa-lab/src/suite.ts
- src/gateway/chat-attachments.ts
---
# Objective
Verify generated media is reattached on the follow-up turn.
# Setup
```yaml scenario.setup
- action: config.patch
patch:
agents:
defaults:
imageGenerationModel:
primary: openai/gpt-image-1
- action: session.create
key: agent:qa:image-roundtrip
```
# Steps
```yaml scenario.steps
- action: agent.send
session: agent:qa:image-roundtrip
message: |
Image generation check: generate a QA lighthouse image and summarize it in one short sentence.
- action: artifact.capture
kind: generated-image
promptSnippet: Image generation check
saveAs: lighthouseImage
- action: agent.send
session: agent:qa:image-roundtrip
message: |
Roundtrip image inspection check: describe the generated lighthouse attachment in one short sentence.
attachments:
- fromArtifact: lighthouseImage
```
# Expect
```yaml scenario.expect
- assert: outbound.textIncludes
value: lighthouse
- assert: requestLog.matches
where:
promptIncludes: Roundtrip image inspection check
imageInputCountGte: 1
- assert: artifact.exists
ref: lighthouseImage
```
````
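The named fenced `scenario.*` blocks above need to be pulled out of the markdown before YAML parsing. A minimal extraction sketch (the real loader then hands the raw YAML to a parser plus zod; function name is illustrative):

```typescript
// Extract named fenced YAML blocks like "yaml scenario.steps" into a map
// from block name to raw YAML text. \x60 is the backtick character, written
// as an escape so this sketch can itself sit inside a markdown code fence.
function extractNamedYamlBlocks(markdown: string): Map<string, string> {
  const blocks = new Map<string, string>();
  const fence = /\x60{3}yaml[ \t]+([\w.]+)\r?\n([\s\S]*?)\x60{3}/g;
  for (const match of markdown.matchAll(fence)) {
    blocks.set(match[1], match[2]);
  }
  return blocks;
}
```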
## Runner Capabilities The DSL Must Cover
Based on the current suite, the generic runner needs more than prompt execution.
### Environment and setup actions
- `bus.reset`
- `gateway.waitHealthy`
- `channel.waitReady`
- `session.create`
- `thread.create`
- `workspace.writeSkill`
### Agent turn actions
- `agent.send`
- `agent.wait`
- `bus.injectInbound`
- `bus.injectOutbound`
### Config and runtime actions
- `config.get`
- `config.patch`
- `config.apply`
- `gateway.restart`
- `tools.effective`
- `skills.status`
### File and artifact actions
- `file.write`
- `file.read`
- `file.delete`
- `file.touchTime`
- `artifact.captureGeneratedImage`
- `artifact.capturePath`
### Memory and cron actions
- `memory.indexForce`
- `memory.searchCli`
- `doctor.memory.status`
- `cron.list`
- `cron.run`
- `cron.waitCompletion`
- `sessionTranscript.write`
### MCP actions
- `mcp.callTool`
### Assertions
- `outbound.textIncludes`
- `outbound.inThread`
- `outbound.notInRoot`
- `tool.called`
- `tool.notPresent`
- `skill.visible`
- `skill.disabled`
- `file.contains`
- `memory.contains`
- `requestLog.matches`
- `sessionStore.matches`
- `cron.managedPresent`
- `artifact.exists`
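A registry keyed by these assertion names keeps the engine generic. The sketch below is illustrative (the `RunContext` shape and two sample checks are assumptions, not the real qa-lab types):

```typescript
// Hypothetical run context: what a scenario run has accumulated so far.
type RunContext = {
  outboundTexts: string[];
  artifacts: Map<string, string>;
};

type AssertionFn = (ctx: RunContext, payload: Record<string, unknown>) => void;

// Each assertion name from the DSL maps to a check that throws on failure.
const assertions = new Map<string, AssertionFn>([
  ["outbound.textIncludes", (ctx, p) => {
    const needle = String(p.value);
    if (!ctx.outboundTexts.some((t) => t.includes(needle))) {
      throw new Error(`no outbound message includes "${needle}"`);
    }
  }],
  ["artifact.exists", (ctx, p) => {
    if (!ctx.artifacts.has(String(p.ref))) {
      throw new Error(`artifact "${String(p.ref)}" was never captured`);
    }
  }],
]);

function runAssertion(ctx: RunContext, name: string, payload: Record<string, unknown>): void {
  const fn = assertions.get(name);
  if (!fn) throw new Error(`unknown assertion: ${name}`);
  fn(ctx, payload);
}
```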
## Variables and Artifact References
The DSL must support saved outputs and later references.
Examples from the current suite:
- create a thread, then reuse `threadId`
- create a session, then reuse `sessionKey`
- generate an image, then attach the file on the next turn
- generate a wake marker string, then assert that it appears later
Needed capabilities:
- `saveAs`
- `${vars.name}`
- `${artifacts.name}`
- typed references for paths, session keys, thread ids, markers, tool outputs
Without variable support, the harness will keep leaking scenario logic back into TypeScript.
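Resolution of `${vars.name}` and `${artifacts.name}` references can stay small. A sketch, assuming string-valued scopes (typed references would layer on top); unresolved references throw so typos fail the scenario early:

```typescript
// Resolve ${vars.x} and ${artifacts.y} placeholders against a run scope.
function interpolate(
  input: string,
  scope: { vars: Record<string, string>; artifacts: Record<string, string> },
): string {
  return input.replace(/\$\{(vars|artifacts)\.([\w-]+)\}/g, (_m, ns, name) => {
    const value = scope[ns as "vars" | "artifacts"][name];
    if (value === undefined) {
      throw new Error(`unresolved reference: ${ns}.${name}`);
    }
    return value;
  });
}
```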
## What Should Stay As Escape Hatches
A fully declarative runner is not realistic in phase 1.
Some scenarios are inherently orchestration-heavy:
- memory dreaming sweep
- config apply restart wake-up
- config restart capability flip
- generated image artifact resolution by timestamp/path
- discovery-report evaluation
These should use explicit custom handlers for now.
Recommended rule:
- 85-90% declarative
- explicit `customHandler` steps for the hard remainder
- named and documented custom handlers only
- no anonymous inline code in the scenario file
That keeps the generic engine clean while still allowing progress.
## Architecture Change
### Current
Scenario markdown is already the source of truth for:
- suite execution
- workspace bootstrap files
- QA Lab UI scenario catalog
- report metadata
- discovery prompts
Generated compatibility:
- seeded workspace still includes `QA_KICKOFF_TASK.md`
- seeded workspace still includes `QA_SCENARIO_PLAN.md`
- seeded workspace now also includes `QA_SCENARIOS.md`
## Refactor Plan
### Phase 1: loader and schema
Done.
- added `qa/scenarios/index.md`
- split scenarios into `qa/scenarios/<theme>/*.md`
- added parser for named markdown YAML pack content
- validated with zod
- switched consumers to the parsed pack
- removed repo-level `qa/seed-scenarios.json` and `qa/QA_KICKOFF_TASK.md`
### Phase 2: generic engine
- split `extensions/qa-lab/src/suite.ts` into:
- loader
- engine
- action registry
- assertion registry
- custom handlers
- keep existing helper functions as engine operations
Deliverable:
- engine executes simple declarative scenarios
### Phase 3: migrate simple scenarios
Start with scenarios that are mostly prompt + wait + assert:
- threaded follow-up
- image understanding from attachment
- skill visibility and invocation
- channel baseline
Deliverable:
- first real markdown-defined scenarios shipping through the generic engine
### Phase 4: migrate medium scenarios
- image generation roundtrip
- memory tools in channel context
- session memory ranking
- subagent handoff
- subagent fanout synthesis
Deliverable:
- variables, artifacts, tool assertions, request-log assertions proven out
### Phase 5: keep hard scenarios on custom handlers
- memory dreaming sweep
- config apply restart wake-up
- config restart capability flip
- runtime inventory drift
Deliverable:
- same authoring format, but with explicit custom-step blocks where needed
### Phase 6: delete hardcoded scenario map
Once the pack coverage is good enough:
- remove most scenario-specific TypeScript branching from `extensions/qa-lab/src/suite.ts`
## Fake Slack / Rich Media Support
The current QA bus is text-first.
Relevant files:
- `extensions/qa-channel/src/protocol.ts`
- `extensions/qa-lab/src/bus-state.ts`
- `extensions/qa-lab/src/bus-queries.ts`
- `extensions/qa-lab/src/bus-server.ts`
- `extensions/qa-lab/web/src/ui-render.ts`
Today the QA bus supports:
- text
- reactions
- threads
It does not yet model inline media attachments.
### Needed transport contract
Add a generic QA bus attachment model:
```ts
type QaBusAttachment = {
id: string;
kind: "image" | "video" | "audio" | "file";
mimeType: string;
fileName?: string;
inline?: boolean;
url?: string;
contentBase64?: string;
width?: number;
height?: number;
durationMs?: number;
altText?: string;
transcript?: string;
};
```
Then add `attachments?: QaBusAttachment[]` to:
- `QaBusMessage`
- `QaBusInboundMessageInput`
- `QaBusOutboundMessageInput`
### Why generic first
Do not build a Slack-only media model.
Instead:
- one generic QA transport model
- multiple renderers on top of it
- current QA Lab chat
- future fake Slack web
- any other fake transport views
This prevents duplicate logic and lets media scenarios stay transport-agnostic.
### UI work needed
Update the QA UI to render:
- inline image preview
- inline audio player
- inline video player
- file attachment chip
The current UI can already render threads and reactions, so attachment rendering should layer onto the same message card model.
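Dispatch by attachment kind keeps that layering simple. A sketch of the renderer split; the markup and class names are illustrative, not the actual `web/src/ui-render.ts` code:

```typescript
// One renderer per attachment kind, all feeding the same message card.
type AttachmentKind = "image" | "video" | "audio" | "file";

function renderAttachment(kind: AttachmentKind, url: string, fileName?: string): string {
  switch (kind) {
    case "image":
      return `<img class="qa-attachment" src="${url}" alt="${fileName ?? "attachment"}">`;
    case "video":
      return `<video class="qa-attachment" src="${url}" controls></video>`;
    case "audio":
      return `<audio class="qa-attachment" src="${url}" controls></audio>`;
    case "file":
      return `<a class="qa-attachment-chip" href="${url}" download>${fileName ?? "file"}</a>`;
  }
}
```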
### Scenario work enabled by media transport
Once attachments flow through QA bus, we can add richer fake-chat scenarios:
- inline image reply in fake Slack
- audio attachment understanding
- video attachment understanding
- mixed attachment ordering
- thread reply with media retained
## Recommendation
The next implementation chunk should be:
1. add markdown scenario loader + zod schema
2. generate the current catalog from markdown
3. migrate a few simple scenarios first
4. add generic QA bus attachment support
5. render inline image in the QA UI
6. then expand to audio and video
This is the smallest path that proves both goals:
- generic markdown-defined QA
- richer fake messaging surfaces
## Open Questions
- whether scenario files should allow embedded markdown prompt templates with variable interpolation
- whether setup/cleanup should be named sections or just ordered action lists
- whether artifact references should be strongly typed in schema or string-based
- whether custom handlers should live in one registry or per-surface registries
- whether the generated JSON compatibility file should remain checked in during migration
## Related
- [QA E2E automation](/concepts/qa-e2e-automation)