From 9be8d43c3182c2b773bbb25a79a08895320addab Mon Sep 17 00:00:00 2001
From: Peter Steinberger
Date: Mon, 27 Apr 2026 00:25:56 +0100
Subject: [PATCH] docs: document installer recovery cleanup

---
 docs/install/updating.md                      |  14 +
 ...exec-duplicate-completion-investigation.md | 133 -----
 docs/refactor/qa.md                           | 540 ------------------
 3 files changed, 14 insertions(+), 673 deletions(-)
 delete mode 100644 docs/refactor/async-exec-duplicate-completion-investigation.md
 delete mode 100644 docs/refactor/qa.md

diff --git a/docs/install/updating.md b/docs/install/updating.md
index 56af3187ebd..e5384bf450b 100644
--- a/docs/install/updating.md
+++ b/docs/install/updating.md
@@ -67,6 +67,20 @@ Add `--no-onboard` to skip onboarding.
 To force a specific install type through the installer, pass
 `--install-method git --no-onboard` or `--install-method npm --no-onboard`.
 
+If `openclaw update` fails after the npm package install phase, re-run the
+installer. The installer does not call the old updater; it runs the global
+package install directly and can recover a partially updated npm install.
+
+```bash
+curl -fsSL https://openclaw.ai/install.sh | bash -s -- --install-method npm
+```
+
+To pin the recovery to a specific version or dist-tag, add `--version`:
+
+```bash
+curl -fsSL https://openclaw.ai/install.sh | bash -s -- --install-method npm --version
+```
+
 ## Alternative: manual npm, pnpm, or bun
 
 ```bash
diff --git a/docs/refactor/async-exec-duplicate-completion-investigation.md b/docs/refactor/async-exec-duplicate-completion-investigation.md
deleted file mode 100644
index 8f92ae3ed0c..00000000000
--- a/docs/refactor/async-exec-duplicate-completion-investigation.md
+++ /dev/null
@@ -1,133 +0,0 @@
----
-summary: "Investigation notes for duplicate async exec completion injection"
-read_when:
-  - Debugging repeated node exec completion events
-  - Working on heartbeat/system-event dedupe
-title: "Async exec duplicate completion investigation"
----
-
-## Scope
-
-- Session: `agent:main:telegram:group:-1003774691294:topic:1`
-- Symptom: the same async exec completion for session/run `keen-nexus` was recorded twice in LCM as user turns.
-- Goal: identify whether this is most likely duplicate session injection or plain outbound delivery retry.
-
-## Conclusion
-
-Most likely this is **duplicate session injection**, not a pure outbound delivery retry.
-
-The strongest gateway-side gap is in the **node exec completion path**:
-
-1. A node-side exec finish emits `exec.finished` with the full `runId`.
-2. Gateway `server-node-events` converts that into a system event and requests a heartbeat.
-3. The heartbeat run injects the drained system event block into the agent prompt.
-4. The embedded runner persists that prompt as a new user turn in the session transcript.
-
-If the same `exec.finished` reaches the gateway twice for the same `runId` for any reason (replay, reconnect duplicate, upstream resend, duplicated producer), OpenClaw currently has **no idempotency check keyed by `runId`/`contextKey`** on this path. The second copy will become a second user message with the same content.
-
-## Exact Code Path
-
-### 1. Producer: node exec completion event
-
-- `src/node-host/invoke.ts:340-360`
-  - `sendExecFinishedEvent(...)` emits `node.event` with event `exec.finished`.
-  - Payload includes `sessionKey` and full `runId`.
-
-### 2. Gateway event ingestion
-
-- `src/gateway/server-node-events.ts:574-640`
-  - Handles `exec.finished`.
-  - Builds text:
-    - `Exec finished (node=..., id=, code ...)`
-  - Enqueues it via:
-    - `enqueueSystemEvent(text, { sessionKey, contextKey: runId ? \`exec:${runId}\` : "exec", trusted: false })`
-  - Immediately requests a wake:
-    - `requestHeartbeatNow(scopedHeartbeatWakeOptions(sessionKey, { reason: "exec-event" }))`
-
-### 3. System event dedupe weakness
-
-- `src/infra/system-events.ts:90-115`
-  - `enqueueSystemEvent(...)` only suppresses **consecutive duplicate text**:
-    - `if (entry.lastText === cleaned) return false`
-  - It stores `contextKey`, but does **not** use `contextKey` for idempotency.
-  - After drain, duplicate suppression resets.
-
-This means a replayed `exec.finished` with the same `runId` can be accepted again later, even though the code already had a stable idempotency candidate (`exec:`).
-
-### 4. Wake handling is not the primary duplicator
-
-- `src/infra/heartbeat-wake.ts:79-117`
-  - Wakes are coalesced by `(agentId, sessionKey)`.
-  - Duplicate wake requests for the same target collapse to one pending wake entry.
-
-This makes **duplicate wake handling alone** a weaker explanation than duplicate event ingestion.
-
-### 5. Heartbeat consumes the event and turns it into prompt input
-
-- `src/infra/heartbeat-runner.ts:535-574`
-  - Preflight peeks pending system events and classifies exec-event runs.
-- `src/auto-reply/reply/session-system-events.ts:86-90`
-  - `drainFormattedSystemEvents(...)` drains the queue for the session.
-- `src/auto-reply/reply/get-reply-run.ts:400-427`
-  - The drained system event block is prepended into the agent prompt body.
-
-### 6. Transcript injection point
-
-- `src/agents/pi-embedded-runner/run/attempt.ts:2000-2017`
-  - `activeSession.prompt(effectivePrompt)` submits the full prompt to the embedded PI session.
-  - That is the point where the completion-derived prompt becomes a persisted user turn.
-
-So once the same system event is rebuilt into the prompt twice, duplicate LCM user messages are expected.
-
-## Why plain outbound delivery retry is less likely
-
-There is a real outbound failure path in the heartbeat runner:
-
-- `src/infra/heartbeat-runner.ts:1194-1242`
-  - The reply is generated first.
-  - Outbound delivery happens later via `deliverOutboundPayloads(...)`.
-  - Failure there returns `{ status: "failed" }`.
-
-However, for the same system event queue entry, this alone is **not sufficient** to explain the duplicate user turns:
-
-- `src/auto-reply/reply/session-system-events.ts:86-90`
-  - The system event queue is already drained before outbound delivery.
-
-So a channel send retry by itself would not recreate the exact same queued event. It could explain missing/failed external delivery, but not by itself a second identical session user message.
-
-## Secondary, lower-confidence possibility
-
-There is a full-run retry loop in the agent runner:
-
-- `src/auto-reply/reply/agent-runner-execution.ts:741-1473`
-  - Certain transient failures can retry the whole run and resubmit the same `commandBody`.
-
-That can duplicate a persisted user prompt **within the same reply execution** if the prompt was already appended before the retry condition triggered.
-
-I rank this lower than duplicate `exec.finished` ingestion because:
-
-- the observed gap was around 51 seconds, which looks more like a second wake/turn than an in-process retry;
-- the report already mentions repeated message send failures, which points more toward a separate later turn than an immediate model/runtime retry.
-
-## Root Cause Hypothesis
-
-Highest-confidence hypothesis:
-
-- The `keen-nexus` completion came through the **node exec event path**.
-- The same `exec.finished` was delivered to `server-node-events` twice.
-- Gateway accepted both because `enqueueSystemEvent(...)` does not dedupe by `contextKey` / `runId`.
-- Each accepted event triggered a heartbeat and was injected as a user turn into the PI transcript.
-
-## Proposed Tiny Surgical Fix
-
-If a fix is wanted, the smallest high-value change is:
-
-- make exec/system-event idempotency honor `contextKey` for a short horizon, at least for exact `(sessionKey, contextKey, text)` repeats;
-- or add a dedicated dedupe in `server-node-events` for `exec.finished` keyed by `(sessionKey, runId, event kind)`.
-
-That would directly block replayed `exec.finished` duplicates before they become session turns.
-
-## Related
-
-- [Exec tool](/tools/exec)
-- [Session management](/concepts/session)
diff --git a/docs/refactor/qa.md b/docs/refactor/qa.md
deleted file mode 100644
index 4770aeafe7a..00000000000
--- a/docs/refactor/qa.md
+++ /dev/null
@@ -1,540 +0,0 @@
----
-summary: "QA refactor plan for scenario catalog and harness consolidation"
-read_when:
-  - Refactoring QA scenario definitions or qa-lab harness code
-  - Moving QA behavior between markdown scenarios and TypeScript harness logic
-title: "QA refactor"
----
-
-Status: foundational migration landed.
-
-## Goal
-
-Move OpenClaw QA from a split-definition model to a single source of truth:
-
-- scenario metadata
-- prompts sent to the model
-- setup and teardown
-- harness logic
-- assertions and success criteria
-- artifacts and report hints
-
-The desired end state is a generic QA harness that loads powerful scenario definition files instead of hardcoding most behavior in TypeScript.
-
-## Current State
-
-Primary source of truth now lives in `qa/scenarios/index.md` plus one file per
-scenario under `qa/scenarios//*.md`.
-
-Implemented:
-
-- `qa/scenarios/index.md`
-  - canonical QA pack metadata
-  - operator identity
-  - kickoff mission
-- `qa/scenarios//*.md`
-  - one markdown file per scenario
-  - scenario metadata
-  - handler bindings
-  - scenario-specific execution config
-- `extensions/qa-lab/src/scenario-catalog.ts`
-  - markdown pack parser + zod validation
-- `extensions/qa-lab/src/qa-agent-bootstrap.ts`
-  - plan rendering from the markdown pack
-- `extensions/qa-lab/src/qa-agent-workspace.ts`
-  - seeds generated compatibility files plus `QA_SCENARIOS.md`
-- `extensions/qa-lab/src/suite.ts`
-  - selects executable scenarios through markdown-defined handler bindings
-- QA bus protocol + UI
-  - generic inline attachments for image/video/audio/file rendering
-
-Remaining split surfaces:
-
-- `extensions/qa-lab/src/suite.ts`
-  - still owns most executable custom handler logic
-- `extensions/qa-lab/src/report.ts`
-  - still derives report structure from runtime outputs
-
-So the source-of-truth split is fixed, but execution is still mostly handler-backed rather than fully declarative.
-
-## What The Real Scenario Surface Looks Like
-
-Reading the current suite shows a few distinct scenario classes.
-
-### Simple interaction
-
-- channel baseline
-- DM baseline
-- threaded follow-up
-- model switch
-- approval followthrough
-- reaction/edit/delete
-
-### Config and runtime mutation
-
-- config patch skill disable
-- config apply restart wake-up
-- config restart capability flip
-- runtime inventory drift check
-
-### Filesystem and repo assertions
-
-- source/docs discovery report
-- build Lobster Invaders
-- generated image artifact lookup
-
-### Memory orchestration
-
-- memory recall
-- memory tools in channel context
-- memory failure fallback
-- session memory ranking
-- thread memory isolation
-- memory dreaming sweep
-
-### Tool and plugin integration
-
-- MCP plugin-tools call
-- skill visibility
-- skill hot install
-- native image generation
-- image roundtrip
-- image understanding from attachment
-
-### Multi-turn and multi-actor
-
-- subagent handoff
-- subagent fanout synthesis
-- restart recovery style flows
-
-These categories matter because they drive DSL requirements. A flat list of prompt + expected text is not enough.
-
-## Direction
-
-### Single source of truth
-
-Use `qa/scenarios/index.md` plus `qa/scenarios//*.md` as the authored
-source of truth.
-
-The pack should stay:
-
-- human-readable in review
-- machine-parseable
-- rich enough to drive:
-  - suite execution
-  - QA workspace bootstrap
-  - QA Lab UI metadata
-  - docs/discovery prompts
-  - report generation
-
-### Preferred authoring format
-
-Use markdown as the top-level format, with structured YAML inside it.
-
-Recommended shape:
-
-- YAML frontmatter
-  - id
-  - title
-  - surface
-  - tags
-  - docs refs
-  - code refs
-  - model/provider overrides
-  - prerequisites
-- prose sections
-  - objective
-  - notes
-  - debugging hints
-- fenced YAML blocks
-  - setup
-  - steps
-  - assertions
-  - cleanup
-
-This gives:
-
-- better PR readability than giant JSON
-- richer context than pure YAML
-- strict parsing and zod validation
-
-Raw JSON is acceptable only as an intermediate generated form.
-
-## Proposed Scenario File Shape
-
-Example:
-
-````md
----
-id: image-generation-roundtrip
-title: Image generation roundtrip
-surface: image
-tags: [media, image, roundtrip]
-models:
-  primary: openai/gpt-5.4
-requires:
-  tools: [image_generate]
-  plugins: [openai, qa-channel]
-docsRefs:
-  - docs/help/testing.md
-  - docs/concepts/model-providers.md
-codeRefs:
-  - extensions/qa-lab/src/suite.ts
-  - src/gateway/chat-attachments.ts
----
-
-# Objective
-
-Verify generated media is reattached on the follow-up turn.
-
-# Setup
-
-```yaml scenario.setup
-- action: config.patch
-  patch:
-    agents:
-      defaults:
-        imageGenerationModel:
-          primary: openai/gpt-image-1
-- action: session.create
-  key: agent:qa:image-roundtrip
-```
-
-# Steps
-
-```yaml scenario.steps
-- action: agent.send
-  session: agent:qa:image-roundtrip
-  message: |
-    Image generation check: generate a QA lighthouse image and summarize it in one short sentence.
-- action: artifact.capture
-  kind: generated-image
-  promptSnippet: Image generation check
-  saveAs: lighthouseImage
-- action: agent.send
-  session: agent:qa:image-roundtrip
-  message: |
-    Roundtrip image inspection check: describe the generated lighthouse attachment in one short sentence.
-  attachments:
-    - fromArtifact: lighthouseImage
-```
-
-# Expect
-
-```yaml scenario.expect
-- assert: outbound.textIncludes
-  value: lighthouse
-- assert: requestLog.matches
-  where:
-    promptIncludes: Roundtrip image inspection check
-    imageInputCountGte: 1
-- assert: artifact.exists
-  ref: lighthouseImage
-```
-````
-
-## Runner Capabilities The DSL Must Cover
-
-Based on the current suite, the generic runner needs more than prompt execution.
-
-### Environment and setup actions
-
-- `bus.reset`
-- `gateway.waitHealthy`
-- `channel.waitReady`
-- `session.create`
-- `thread.create`
-- `workspace.writeSkill`
-
-### Agent turn actions
-
-- `agent.send`
-- `agent.wait`
-- `bus.injectInbound`
-- `bus.injectOutbound`
-
-### Config and runtime actions
-
-- `config.get`
-- `config.patch`
-- `config.apply`
-- `gateway.restart`
-- `tools.effective`
-- `skills.status`
-
-### File and artifact actions
-
-- `file.write`
-- `file.read`
-- `file.delete`
-- `file.touchTime`
-- `artifact.captureGeneratedImage`
-- `artifact.capturePath`
-
-### Memory and cron actions
-
-- `memory.indexForce`
-- `memory.searchCli`
-- `doctor.memory.status`
-- `cron.list`
-- `cron.run`
-- `cron.waitCompletion`
-- `sessionTranscript.write`
-
-### MCP actions
-
-- `mcp.callTool`
-
-### Assertions
-
-- `outbound.textIncludes`
-- `outbound.inThread`
-- `outbound.notInRoot`
-- `tool.called`
-- `tool.notPresent`
-- `skill.visible`
-- `skill.disabled`
-- `file.contains`
-- `memory.contains`
-- `requestLog.matches`
-- `sessionStore.matches`
-- `cron.managedPresent`
-- `artifact.exists`
-
-## Variables and Artifact References
-
-The DSL must support saved outputs and later references.
-
-Examples from the current suite:
-
-- create a thread, then reuse `threadId`
-- create a session, then reuse `sessionKey`
-- generate an image, then attach the file on the next turn
-- generate a wake marker string, then assert that it appears later
-
-Needed capabilities:
-
-- `saveAs`
-- `${vars.name}`
-- `${artifacts.name}`
-- typed references for paths, session keys, thread ids, markers, tool outputs
-
-Without variable support, the harness will keep leaking scenario logic back into TypeScript.
-
-## What Should Stay As Escape Hatches
-
-A fully pure declarative runner is not realistic in phase 1.
-
-Some scenarios are inherently orchestration-heavy:
-
-- memory dreaming sweep
-- config apply restart wake-up
-- config restart capability flip
-- generated image artifact resolution by timestamp/path
-- discovery-report evaluation
-
-These should use explicit custom handlers for now.
-
-Recommended rule:
-
-- 85-90% declarative
-- explicit `customHandler` steps for the hard remainder
-- named and documented custom handlers only
-- no anonymous inline code in the scenario file
-
-That keeps the generic engine clean while still allowing progress.
-
-## Architecture Change
-
-### Current
-
-Scenario markdown already is the source of truth for:
-
-- suite execution
-- workspace bootstrap files
-- QA Lab UI scenario catalog
-- report metadata
-- discovery prompts
-
-Generated compatibility:
-
-- seeded workspace still includes `QA_KICKOFF_TASK.md`
-- seeded workspace still includes `QA_SCENARIO_PLAN.md`
-- seeded workspace now also includes `QA_SCENARIOS.md`
-
-## Refactor Plan
-
-### Phase 1: loader and schema
-
-Done.
-
-- added `qa/scenarios/index.md`
-- split scenarios into `qa/scenarios//*.md`
-- added parser for named markdown YAML pack content
-- validated with zod
-- switched consumers to the parsed pack
-- removed repo-level `qa/seed-scenarios.json` and `qa/QA_KICKOFF_TASK.md`
-
-### Phase 2: generic engine
-
-- split `extensions/qa-lab/src/suite.ts` into:
-  - loader
-  - engine
-  - action registry
-  - assertion registry
-  - custom handlers
-- keep existing helper functions as engine operations
-
-Deliverable:
-
-- engine executes simple declarative scenarios
-
-Start with scenarios that are mostly prompt + wait + assert:
-
-- threaded follow-up
-- image understanding from attachment
-- skill visibility and invocation
-- channel baseline
-
-Deliverable:
-
-- first real markdown-defined scenarios shipping through the generic engine
-
-### Phase 4: migrate medium scenarios
-
-- image generation roundtrip
-- memory tools in channel context
-- session memory ranking
-- subagent handoff
-- subagent fanout synthesis
-
-Deliverable:
-
-- variables, artifacts, tool assertions, request-log assertions proven out
-
-### Phase 5: keep hard scenarios on custom handlers
-
-- memory dreaming sweep
-- config apply restart wake-up
-- config restart capability flip
-- runtime inventory drift
-
-Deliverable:
-
-- same authoring format, but with explicit custom-step blocks where needed
-
-### Phase 6: delete hardcoded scenario map
-
-Once the pack coverage is good enough:
-
-- remove most scenario-specific TypeScript branching from `extensions/qa-lab/src/suite.ts`
-
-## Fake Slack / Rich Media Support
-
-The current QA bus is text-first.
-
-Relevant files:
-
-- `extensions/qa-channel/src/protocol.ts`
-- `extensions/qa-lab/src/bus-state.ts`
-- `extensions/qa-lab/src/bus-queries.ts`
-- `extensions/qa-lab/src/bus-server.ts`
-- `extensions/qa-lab/web/src/ui-render.ts`
-
-Today the QA bus supports:
-
-- text
-- reactions
-- threads
-
-It does not yet model inline media attachments.
-
-### Needed transport contract
-
-Add a generic QA bus attachment model:
-
-```ts
-type QaBusAttachment = {
-  id: string;
-  kind: "image" | "video" | "audio" | "file";
-  mimeType: string;
-  fileName?: string;
-  inline?: boolean;
-  url?: string;
-  contentBase64?: string;
-  width?: number;
-  height?: number;
-  durationMs?: number;
-  altText?: string;
-  transcript?: string;
-};
-```
-
-Then add `attachments?: QaBusAttachment[]` to:
-
-- `QaBusMessage`
-- `QaBusInboundMessageInput`
-- `QaBusOutboundMessageInput`
-
-### Why generic first
-
-Do not build a Slack-only media model.
-
-Instead:
-
-- one generic QA transport model
-- multiple renderers on top of it
-  - current QA Lab chat
-  - future fake Slack web
-  - any other fake transport views
-
-This prevents duplicate logic and lets media scenarios stay transport-agnostic.
-
-### UI work needed
-
-Update the QA UI to render:
-
-- inline image preview
-- inline audio player
-- inline video player
-- file attachment chip
-
-The current UI can already render threads and reactions, so attachment rendering should layer onto the same message card model.
-
-### Scenario work enabled by media transport
-
-Once attachments flow through QA bus, we can add richer fake-chat scenarios:
-
-- inline image reply in fake Slack
-- audio attachment understanding
-- video attachment understanding
-- mixed attachment ordering
-- thread reply with media retained
-
-## Recommendation
-
-The next implementation chunk should be:
-
-1. add markdown scenario loader + zod schema
-2. generate the current catalog from markdown
-3. migrate a few simple scenarios first
-4. add generic QA bus attachment support
-5. render inline image in the QA UI
-6. then expand to audio and video
-
-This is the smallest path that proves both goals:
-
-- generic markdown-defined QA
-- richer fake messaging surfaces
-
-## Open Questions
-
-- whether scenario files should allow embedded markdown prompt templates with variable interpolation
-- whether setup/cleanup should be named sections or just ordered action lists
-- whether artifact references should be strongly typed in schema or string-based
-- whether custom handlers should live in one registry or per-surface registries
-- whether the generated JSON compatibility file should remain checked in during migration
-
-## Related
-
-- [QA E2E automation](/concepts/qa-e2e-automation)