openclaw

Mirror of https://github.com/openclaw/openclaw.git, synced 2026-05-08 02:40:43 +00:00

Author	SHA1	Message	Date
pashpashpash	b13844732e	qa: salvage GPT-5.4 parity proof slice (#65664 ) * test(qa): gate parity prose scenarios on real tool calls Closes criterion 2 of the GPT-5.4 parity completion gate in #64227 ('no fake progress / fake tool completion') for the two first/second-wave parity scenarios that can currently pass with a prose-only reply. Background: the scenario framework already exposes tool-call assertions via /debug/requests on the mock server (see approval-turn-tool-followthrough for the pattern). Most parity scenarios use this seam to require a specific plannedToolName, but source-docs-discovery-report and subagent-handoff only checked the assistant's prose text, which means a model could fabricate: - a Worked / Failed / Blocked / Follow-up report without ever calling the read tool on the docs / source files the prompt named - three labeled 'Delegated task', 'Result', 'Evidence' sections without ever calling sessions_spawn to delegate Both gaps are fake-progress loopholes for the parity gate. Changes: - source-docs-discovery-report: require at least one read tool call tied to the 'worked, failed, blocked' prompt in /debug/requests. Failure message dumps the observed plannedToolName list for debugging. - subagent-handoff: require at least one sessions_spawn tool call tied to the 'delegate' / 'subagent handoff' prompt in /debug/requests. Same debug-friendly failure message. Both assertions are gated behind !env.mock so they no-op in live-frontier mode where the real provider exposes plannedToolName through a different channel (or not at all). Not touched: memory-recall is also in the parity pack but its pass path is legitimately 'read the fact from prior-turn context'. That is a valid recall strategy, not fake progress, so it is out of scope for this PR. memory-recall's fake-progress story (no real memory_search call) would require bigger mock-server changes and belongs in a follow-up that extends the mock memory pipeline. 
Validation: - pnpm test extensions/qa-lab/src/scenario-catalog.test.ts Refs #64227 * test(qa): fix case-sensitive tool-call assertions and dedupe debug fetch Addresses loop-6 review feedback on PR #64681: 1. Copilot / Greptile / codex-connector all flagged that the discovery scenario's .includes('worked, failed, blocked') assertion is case-sensitive but the real prompt says 'Worked, Failed, Blocked...', so the mock-mode assertion never matches. Fix: lowercase-normalize allInputText before the contains check. 2. Greptile P2: the expr and message.expr each called fetchJson separately, incurring two round-trips to /debug/requests. Fix: hoist the fetch to a set step (discoveryDebugRequests / subagentDebugRequests) and reuse the snapshot. 3. Copilot: the subagent-handoff assertion scanned the entire request log and matched the first request with 'delegate' in its input text, which could false-pass on a stale prior scenario. Fix: reverse the array and take the most recent matching request instead. Validation: pnpm test extensions/qa-lab/src/scenario-catalog.test.ts (4/4 pass). Refs #64227 * test(qa): narrow subagent-handoff tool-call assertion to pre-tool requests Pass-2 codex-connector P1 finding on #64681: the reverse-find pattern I used on pass 1 usually lands on the FOLLOW-UP request after the mock runs sessions_spawn, not the pre-tool planning request that actually has plannedToolName === 'sessions_spawn'. The mock only plans that tool on requests with !toolOutput (mock-openai-server.ts:662), so the post-tool request has plannedToolName unset and the assertion fails even when the handoff succeeded. Fix: switch the assertion back to a forward .some() match but add a !request.toolOutput filter so the match is pinned to the pre-tool planning phase. The case-insensitive regex, the fetchJson dedupe, and the failure-message diagnostic from pass 1 are unchanged. Validation: pnpm test extensions/qa-lab/src/scenario-catalog.test.ts (4/4 pass). 
Refs #64227 * test(qa): pin subagent-handoff tool-call assertion to scenario prompt Addresses the pass-3 codex-connector P1 on #64681: the pass-2 fix filtered to pre-tool requests but still used a broad `/delegate|subagent handoff/i` regex. The `subagent-fanout-synthesis` scenario runs BEFORE `subagent-handoff` in catalog order (scenarios are sorted by path), and the fanout prompt reads 'Subagent fanout synthesis check: delegate exactly two bounded subagents sequentially', which contains 'delegate' and also plans sessions_spawn pre-tool. That produces a cross-scenario false pass where the fanout's earlier sessions_spawn request satisfies the handoff assertion even when the handoff run never delegates. Fix: tighten the input-text match from `/delegate|subagent handoff/i` to `/delegate one bounded qa task/i`, which is the exact scenario-unique substring from the `subagent-handoff` config.prompt. That pins the assertion to this scenario's request window and closes the cross-scenario false positive. Validation: pnpm test extensions/qa-lab/src/scenario-catalog.test.ts (4/4 pass). Refs #64227 * test(qa): align parity assertion comments with actual filter logic Addresses two loop-7 Copilot findings on PR #64681: 1. source-docs-discovery-report.md: the explanatory comment said the debug request log was 'lowercased for case-insensitive matching', but the code actually lowercases each request's allInputText inline inside the .some() predicate, not the discoveryDebugRequests snapshot. Rewrite the comment to describe the inline-lowercase pattern so a future reader matches the code they see. 2. subagent-handoff.md: the comment said the assertion 'must be pinned to THIS scenario's request window' but the implementation actually relies on matching a scenario-unique prompt substring (/delegate one bounded qa task/i), not a request-window. Rewrite the comment to describe the substring pinning and keep the pre-tool filter rationale intact. 
No runtime change; comment-only fix to keep reviewer expectations aligned with the actual assertion shape. Validation: pnpm test extensions/qa-lab/src/scenario-catalog.test.ts (4/4 pass). Refs #64227 * test(qa): extend tool-call assertions to image-understanding, subagent-fanout, and capability-flip scenarios * Guard mock-only image parity assertions * Expand agentic parity second wave * test(qa): pad parity suspicious-pass isolation to second wave * qa-lab: parametrize parity report title and drop stale first-wave comment Addresses two loop-7 Copilot findings on PR #64662: 1. Hard-coded 'GPT-5.4 / Opus 4.6' markdown H1: the renderer now uses a template string that interpolates candidateLabel and baselineLabel, so any parity run (not only gpt-5.4 vs opus 4.6) renders an accurate title in saved reports. Default CLI flags still produce openai/gpt-5.4 vs anthropic/claude-opus-4-6 as the baseline pair. 2. Stale 'declared first-wave parity scenarios' comment in scopeSummaryToParityPack: the parity pack is now the ten-scenario first-wave+second-wave set (PR D + PR E). Comment updated to drop the first-wave qualifier and name the full QA_AGENTIC_PARITY_SCENARIOS constant the scope is filtering against. New regression: 'parametrizes the markdown header from the comparison labels' — asserts that non-default labels (openai/gpt-5.4-alt vs openai/gpt-5.4) render in the H1. Validation: pnpm test extensions/qa-lab/src/agentic-parity-report.test.ts (13/13 pass). 
Refs #64227 * qa-lab: fail parity gate on required scenario failures regardless of baseline parity * test(qa): update readable-report test to cover all 10 parity scenarios * qa-lab: strengthen parity-report fake-success detector and verify run.primaryProvider labels * Tighten parity label and scenario checks * fix: tighten parity label provenance checks * fix: scope parity tool-call metrics to tool lanes * Fix parity report label and fake-success checks * fix(qa): tighten parity report edge cases * qa-lab: add Anthropic /v1/messages mock route for parity baseline Closes the last local-runnability gap on criterion 5 of the GPT-5.4 parity completion gate in #64227 ('the parity gate shows GPT-5.4 matches or beats Opus 4.6 on the agreed metrics'). Background: the parity gate needs two comparable scenario runs - one against openai/gpt-5.4 and one against anthropic/claude-opus-4-6 - so the aggregate metrics and verdict in PR D (#64441) can be computed. Today the qa-lab mock server only implements /v1/responses, so the baseline run against Claude Opus 4.6 requires a real Anthropic API key. That makes the gate impossible to prove end-to-end from a local worktree and means the CI story is always 'two real providers + quota + keys'. This PR adds a /v1/messages Anthropic-compatible route to the existing mock OpenAI server. 
The route is a thin adapter that: - Parses Anthropic Messages API request shapes (system as string or [{type:text,text}], messages with string or block content, text and tool_result and tool_use and image blocks) - Translates them into the ResponsesInputItem[] shape the existing shared scenario dispatcher (buildResponsesPayload) already understands - Calls the shared dispatcher so both the OpenAI and Anthropic lanes run through the exact same scenario prompt-matching logic (same subagent fanout state machine, same extractRememberedFact helper, same '/debug/requests' telemetry) - Converts the resulting OpenAI-format events back into an Anthropic message response with text and tool_use content blocks and a correct stop_reason (tool_use vs end_turn) Non-streaming only: the QA suite runner falls back to non-streaming mock mode so real Anthropic SSE isn't necessary for the parity baseline. Also adds claude-opus-4-6 and claude-sonnet-4-6 to /v1/models so baseline model-list probes from the suite runner resolve without extra config. Tests added: - advertises Anthropic claude-opus-4-6 baseline model on /v1/models - dispatches an Anthropic /v1/messages read tool call for source discovery prompts (tool_use stop_reason, correct input path, /debug/requests records plannedToolName=read) - dispatches Anthropic /v1/messages tool_result follow-ups through the shared scenario logic (subagent-handoff two-stage flow: tool_use - tool_result - 'Delegated task / Evidence' prose summary) Local validation: - pnpm test extensions/qa-lab/src/mock-openai-server.test.ts (18/18 pass) - pnpm test extensions/qa-lab/src/mock-openai-server.test.ts extensions/qa-lab/src/cli.runtime.test.ts extensions/qa-lab/src/scenario-catalog.test.ts (47/47 pass) Refs #64227 Unblocks #64441 (parity harness) and the forthcoming qa parity run wrapper by giving the baseline lane a local-only mock path. 
* qa-lab: fix Anthropic tool_result ordering in messages adapter Addresses the loop-6 Copilot / Greptile finding on PR #64685: in `convertAnthropicMessagesToResponsesInput`, `tool_result` blocks were pushed to `items` inside the per-block loop while the surrounding user/assistant message was only pushed after the loop finished. That reordered the function_call_output BEFORE its parent user message whenever a user turn mixed `tool_result` with fresh text/image blocks, which broke `extractToolOutput` (it scans AFTER the last user-role index; function_call_output placed BEFORE that index is invisible to it) and made the downstream scenario dispatcher behave as if no tool output had been returned on mixed-content turns. Fix: buffer `tool_result` and `tool_use` blocks in local arrays during the per-block loop, push the parent role message first (when it has any text/image pieces), then push the accumulated function_call / function_call_output items in original order. tool_result-only user turns still omit the parent message as before, so the non-mixed subagent-fanout-synthesis two-stage flow that already worked keeps working. Regression added: - `places tool_result after the parent user message even in mixed-content turns` — sends a user turn that mixes a `tool_result` block with a trailing fresh text block, then inspects `/debug/last-request` to assert that `toolOutput === 'SUBAGENT-OK'` (extractToolOutput found the function_call_output AFTER the last user index) and `prompt === 'Keep going with the fanout.'` (extractLastUserText picked up the trailing fresh text). Local validation: pnpm test extensions/qa-lab/src/mock-openai-server.test.ts (19/19 pass). 
Refs #64227 * qa-lab: reject Anthropic streaming and empty model in messages mock * qa-lab: tag mock request snapshots with a provider variant so parity runs can diff per provider * Handle invalid Anthropic mock JSON * fix: wire mock parity providers by model ref * fix(qa): support Anthropic message streaming in mock parity lane * qa-lab: record provider/model/mode in qa-suite-summary.json Closes the 'summary cannot be label-verified' half of criterion 5 on the GPT-5.4 parity completion gate in #64227. Background: the parity gate in #64441 compares two qa-suite-summary.json files and trusts whatever candidateLabel / baselineLabel the caller passes. Today the summary JSON only contains { scenarios, counts }, so nothing in the summary records which provider/model the run actually used. If a maintainer swaps candidate and baseline summary paths in a parity-report call, the verdict is silently mislabeled and nobody can retroactively verify which run produced which summary. Changes: - Add a 'run' block to qa-suite-summary.json with startedAt, finishedAt, providerMode, primaryModel (+ provider and model splits), alternateModel (+ provider and model splits), fastMode, concurrency, scenarioIds (when explicitly filtered). - Extract a pure 'buildQaSuiteSummaryJson(params)' helper so the summary JSON shape is unit-testable and the parity gate (and any future parity wrapper) can import the exact same type rather than reverse-engineering the JSON shape at runtime. - Thread 'scenarioIds' from 'runQaSuite' into writeQaSuiteArtifacts so --scenario-ids flags are recorded in the summary. 
Unit tests added (src/suite.summary-json.test.ts, 5 cases): - records provider/model/mode so parity gates can verify labels - includes scenarioIds in run metadata when provided - records an Anthropic baseline lane cleanly for parity runs - leaves split fields null when a model ref is malformed - keeps scenarios and counts alongside the run metadata This is additive: existing callers of qa-suite-summary.json continue to see the same { scenarios, counts } shape, just with an extra run field. No existing consumers of the JSON need to change. The follow-up 'qa parity run' CLI wrapper (run the parity pack twice against candidate + baseline, emit two labeled summaries in one command) stacks cleanly on top of this change and will land as a separate PR once #64441 and #64662 merge so the wrapper can call runQaParityReportCommand directly. Local validation: - pnpm test extensions/qa-lab/src/suite.summary-json.test.ts (5/5 pass) - pnpm test extensions/qa-lab/src/suite.summary-json.test.ts extensions/qa-lab/src/cli.runtime.test.ts extensions/qa-lab/src/scenario-catalog.test.ts (34/34 pass) Refs #64227 Unblocks the final parity run for #64441 / #64662 by making summaries self-describing. * qa-lab: strengthen qa-suite-summary builder types and empty-array semantics Addresses 4 loop-6 Copilot / codex-connector findings on PR #64689 (re-opened as #64789): 1. P2 codex + Copilot: empty `scenarioIds` array was serialized as `[]` because of a truthiness check. The CLI passes an empty array when --scenario is omitted, so full-suite runs would incorrectly record an explicit empty selection. Fix: switch to a `length > 0` check so '[] or undefined' both encode as `null` in the summary run metadata. 2. Copilot: `buildQaSuiteSummaryJson` was exported for parity-gate consumers but its return type was `Record<string, unknown>`, which defeated the point of exporting it. Fix: introduce a concrete `QaSuiteSummaryJson` type that matches the JSON shape 1-for-1 and make the builder return it. 
Downstream code (parity gate, parity run wrapper) can now import the type and keep consumers type-checked. 3. Copilot: `QaSuiteSummaryJsonParams.providerMode` re-declared the `'mock-openai' | 'live-frontier'` string union even though `QaProviderMode` is already imported from model-selection.ts. Fix: reuse `QaProviderMode` so provider-mode additions flow through both types at once. 4. Copilot: test fixtures omitted `steps` from the fake scenario results, creating shape drift with the real suite scenario-result shape. Fix: pad the test fixtures with `steps: []` and tighten the scenarioIds assertion to read `json.run.scenarioIds` directly (the new concrete return type makes the type-cast unnecessary). New regression: `treats an empty scenarioIds array as unspecified (no filter)`, which passes `scenarioIds: []` and asserts the summary records `scenarioIds: null`. Validation: pnpm test extensions/qa-lab/src/suite.summary-json.test.ts (6/6 pass). Refs #64227 * qa-lab: record executed scenarioIds in summary run metadata Addresses the pass-3 codex-connector P2 on #64789 (replacement for #64689): `run.scenarioIds` was copied from the raw `params.scenarioIds` caller input, but `runQaSuite` normalizes that input through `selectQaSuiteScenarios` which dedupes via `Set` and reorders the selection to catalog order. When callers repeat --scenario ids or pass them in non-catalog order, the summary metadata drifted from the scenarios actually executed, which can make parity/report tooling treat equivalent runs as different or trust inaccurate provenance. Fix: both writeQaSuiteArtifacts call sites in runQaSuite now pass `selectedCatalogScenarios.map(scenario => scenario.id)` instead of `params?.scenarioIds`, so the summary records the post-selection executed list. 
This also covers the full-suite case automatically (the executed list is the full lane-filtered catalog), giving parity consumers a stable record of exactly which scenarios landed in the run regardless of how the caller phrased the request. buildQaSuiteSummaryJson's `length > 0 ? [...] : null` pass-2 semantics are preserved so the public helper still treats an empty array as 'unspecified' for any future caller that legitimately passes one. Validation: pnpm test extensions/qa-lab/src/suite.summary-json.test.ts (6/6 pass). Refs #64227 * qa-lab: preserve null scenarioIds for unfiltered suite runs Addresses the pass-4 codex-connector P2 on #64789: the pass-3 fix always passed `selectedCatalogScenarios.map(...)` to writeQaSuiteArtifacts, which made unfiltered full-suite runs indistinguishable from an explicit all-scenarios selection in the summary metadata. The 'unfiltered → null' semantic (documented in the buildQaSuiteSummaryJson JSDoc and exercised by the "treats an empty scenarioIds array as unspecified" regression) was lost. Fix: both writeQaSuiteArtifacts call sites now condition on the caller's original `params.scenarioIds`. When the caller passed an explicit non-empty filter, record the post-selection executed list (pass-3 behavior, preserving Set-dedupe + catalog-order normalization). When the caller passed undefined or an empty array, pass undefined to writeQaSuiteArtifacts so buildQaSuiteSummaryJson's length-check serializes null (pass-2 behavior, preserving unfiltered semantics). This keeps both codex-connector findings satisfied simultaneously: - explicit --scenario filter reorders/dedupes through the executed list, not the raw caller input - unfiltered full-suite run records null, not a full catalog dump that would shadow "explicit all-scenarios" selections Validation: pnpm test extensions/qa-lab/src/suite.summary-json.test.ts (6/6 pass). 
Refs #64227 * qa-lab: reuse QaProviderMode in writeQaSuiteArtifacts param type * qa-lab: stage mock auth profiles so the parity gate runs without real credentials * fix(qa): clean up mock auth staging follow-ups * ci: add parity-gate workflow that runs the GPT-5.4 vs Opus 4.6 gate end-to-end against the qa-lab mock * ci: use supported parity gate runner label * ci: watch gateway changes in parity gate * docs: pin parity runbook alternate models * fix(ci): watch qa-channel parity inputs * qa: roll up parity proof closeout * qa: harden mock parity review fixes * qa-lab: fix review findings — comment wording, placeholder key, exported type, ordering assertion, remove false-positive positive-tone detection * qa: fix memory-recall scenario count, update criterion 2 comment, cache fetchJson in model-switch * qa-lab: clean up positive-tone comment + fix stale test expectations * qa: pin workflow Node version to 22.14.0 + fix stale label-match wording * qa-lab: refresh mock provider routing expectation * docs: drop stale parity rollup rewrite from proof slice * qa: run parity gate against mock lane * deps: sync qa-lab lockfile * build: refresh a2ui bundle hash * ci: widen parity gate triggers --------- Co-authored-by: Eva <eva@100yen.org>	2026-04-13 13:01:54 +09:00
Josh Avant	3d07dfbb65	feat(qa-lab): add Convex credential broker and admin CLI (#65596 ) * QA Lab: add Convex credential source for Telegram lane * QA Lab: scaffold Convex credential broker * QA Lab: add Convex credential admin CLI * QA Lab: harden Convex credential security paths * QA Broker: validate Telegram payloads on admin add * fix: note QA Convex credential broker in changelog (#65596) (thanks @joshavant)	2026-04-12 22:03:42 -05:00
Peter Steinberger	5da237c887	fix(ci): refresh qa-lab lockfile	2026-04-12 19:45:46 -07:00
Peter Steinberger	20266c14cb	feat(qa-lab): add control ui qa-channel roundtrip scenario	2026-04-12 19:41:06 -07:00
Peter Steinberger	f682413f57	feat(qa-channel): forward inbound media attachments	2026-04-12 19:41:06 -07:00
Peter Steinberger	1a47660518	feat(browser): add qa web runtime support	2026-04-12 19:41:06 -07:00
pashpashpash	c848ebc8ce	agents: split GPT-5 prompt and retry behavior (#65597 ) * agents: split GPT-5 prompt and retry behavior * agents: fix GPT-5 review follow-ups * agents: address GPT-5 review follow-ups * agents: avoid replaying side-effectful GPT retries * agents: mark subagent control as mutating * agents: fail closed on single-action retries * commands: stabilize channel legacy doctor migration test * agents: narrow single-action retry promise trigger	2026-04-12 18:52:22 -07:00
pashpashpash	de1b6abf94	test(memory-core): freeze dreaming session-ingest clocks (#65605 )	2026-04-12 17:24:34 -07:00
Peter Steinberger	9dbbee8a02	fix(test): align trace directive type stubs	2026-04-13 00:20:52 +01:00
Peter Steinberger	cfd5f9e4e3	test(e2e): repair OpenShell prerelease smoke	2026-04-13 00:20:51 +01:00
Marcus Castro	9af8288c05	fix(whatsapp): send group reactions with target participant (#65512 )	2026-04-12 20:00:19 -03:00
Marcus Castro	403783a3b1	fix(tts): correct tagged TTS syntax guidance (#65573 )	2026-04-12 19:41:13 -03:00
pashpashpash	f5447aab88	OpenAI: strengthen heartbeat overlay guidance (#65148 )	2026-04-13 06:47:40 +09:00
pashpashpash	383c854313	CI: fix mainline regression blockers (#65269 ) * MSTeams: align logger test expectations * Gateway: fix CI follow-up regressions * Config: refresh generated schema baseline * VoiceCall: type webhook test doubles * CI: retrigger blocker workflow * CI: retrigger retry workflow * Agents: fix current mainline agentic regressions * Agents: type auth controller test mock * CI: retrigger blocker validation * Agents: repair OpenAI replay pairing order	2026-04-13 06:18:37 +09:00
Peter Steinberger	1ea332a658	fix: repair CI type checks	2026-04-12 12:04:59 -07:00
Peter Steinberger	fcee268373	feat(qa-lab): support scenario-defined plugin runs	2026-04-12 11:59:50 -07:00
Vincent Koc	ea71a59127	fix(imessage): repair monitor retry type checks	2026-04-12 19:57:37 +01:00
Peter Steinberger	e4841d767d	test: stabilize loaded full-suite checks	2026-04-12 11:52:56 -07:00
Peter Steinberger	d35cc6ef86	fix(discord): declare gateway heartbeat timeout state	2026-04-12 11:52:56 -07:00
Peter Steinberger	cb5a25d8d8	fix(discord): normalize legacy streaming aliases	2026-04-12 11:52:56 -07:00
Peter Steinberger	fa87c6334a	fix(imessage): align monitor retry types	2026-04-12 11:52:33 -07:00
saram ali	acdf2b1c8a	fix(memory-core): match daily notes stored in memory/ subdirectories (#64682 ) * fix(memory-core): match daily notes in memory/ subdirectories in isShortTermMemoryPath * fix(memory-core): exclude dream reports from short-term recall * fix(memory-core): widen short-term recall path matching * docs(changelog): note short-term recall fix --------- Co-authored-by: Vincent Koc <vincentkoc@ieee.org>	2026-04-12 19:40:59 +01:00
Vincent Koc	35a784c165	fix(imessage): retry watch.subscribe startup failures (#65482 ) * fix(imessage): retry watch.subscribe startup failures * fix(imessage): sanitize watch error logging	2026-04-12 19:40:19 +01:00
Peter Steinberger	c8347e70da	fix: align trace directive types	2026-04-12 11:30:44 -07:00
Peter Steinberger	e76c2812b7	style: apply oxfmt	2026-04-12 11:28:43 -07:00
Marcus Castro	aa023e4283	refactor(whatsapp): centralize account connection lifecycle (#65427 ) * refactor(whatsapp): centralize account connection lifecycle * fix(whatsapp): harden controller open failure cleanup * refactor(whatsapp): remove active listener fallback path * fix(whatsapp): isolate controller registry state * debug(whatsapp): trace typing presence updates * docs(changelog): add whatsapp lifecycle fix note * debug(whatsapp): log global presence mode * chore(whatsapp): remove debug presence logs --------- Co-authored-by: Vincent Koc <vincentkoc@ieee.org>	2026-04-12 15:24:49 -03:00
Tak Hoffman	c37e49f275	Add /trace toggle and fix Active Memory diagnostics	2026-04-12 13:20:22 -05:00
Peter Steinberger	910a0e40d2	chore: update dependencies	2026-04-12 19:19:06 +01:00
Vincent Koc	d4fb7d893d	fix(ci): repair main tsgo regressions	2026-04-12 19:14:00 +01:00
Peter Steinberger	c4412c6b0c	fix: compact discord allowlist resolution logs	2026-04-12 19:08:59 +01:00
Marcus Castro	000fc7f233	refactor(qa): add shared QA channel contract and harden worker startup (#64562 ) * refactor(qa): add shared transport contract and suite migration * refactor(qa): harden worker gateway startup * fix(qa): scope waits and sanitize shutdown artifacts * fix(qa): confine artifacts and redact preserved logs * fix(qa): block symlink escapes in artifact paths * fix(gateway): clear shutdown race timers * fix(qa): harden shutdown cleanup paths * fix(qa): sanitize gateway logs in thrown errors * fix(qa): harden suite startup and artifact paths * fix(qa): stage bundled plugins from mutated config * fix(qa): broaden gateway log bearer redaction * fix(qa-channel): restore runtime export * fix(qa): stop failed gateway startups as a process tree * fix(qa-channel): load runtime hook from api surface	2026-04-12 15:02:57 -03:00
jasonxargs-boop	2204753b62	fix(memory-core): fix macOS chokidar glob issue by watching memory dir directly (#64711 ) * fix(memory-core): fix macOS chokidar glob issue by watching memory dir directly * fix(memory-core): ignore non-markdown memory watch churn * fix(memory-core): allow multimodal watch events * test(memory-core): type watcher ignore callback --------- Co-authored-by: Vincent Koc <vincentkoc@ieee.org>	2026-04-12 18:53:20 +01:00
Peter Steinberger	15b86ac6d0	fix: narrow qmd defaults and clawblocker memory	2026-04-12 18:52:06 +01:00
saram ali	7995e408ce	fix(discord): clear stale heartbeat timers in SafeGatewayPlugin.connect() (#65087 ) * fix(discord): clear stale heartbeat timers in SafeGatewayPlugin.connect() The @buape/carbon@0.15.0 heartbeat setup has a race where stopHeartbeat() runs before heartbeatInterval is assigned, leaving a stale setInterval with a closed reconnectCallback. When the stale interval fires ~41s later it throws an uncaught exception that bypasses the EventEmitter error path and crashes the gateway process via process.on('uncaughtException'). Add a connect() override in SafeGatewayPlugin that unconditionally clears both heartbeatInterval and firstHeartbeatTimeout before calling super. The parent's connect() only calls stopHeartbeat() when isConnecting=false; when isConnecting=true it returns early without clearing — this override fills that gap. Fixes #65009. Related: #64011, #63387, #62038. * test(discord): assert super.connect() delegation in SafeGatewayPlugin tests * fix(ci): update raw-fetch allowlist line numbers for gateway-plugin.ts The connect() override added in the heartbeat fix shifted the two pre-existing fetch() callsites from lines 370/436 to 387/453. * docs(changelog): add discord heartbeat crash note * test(cli): align plugin registry load-context mock --------- Co-authored-by: Vincent Koc <vincentkoc@ieee.org>	2026-04-12 18:40:04 +01:00
Peter Steinberger	a8e140e395	chore: bump version to 2026.4.12	2026-04-12 10:37:18 -07:00
Anonymous Amit	42590106ab	improve memory fallback lexical ranking (#65395 ) * improve memory fallback lexical ranking * use neutral lexical fallback fixtures * fix(memory-core): keep lexical boosts out of hybrid search --------- Co-authored-by: Vincent Koc <vincentkoc@ieee.org>	2026-04-12 18:36:28 +01:00
Vincent Koc	8a4a63ca07	fix(memory-core): use all dreaming signals for light confidence	2026-04-12 18:30:35 +01:00
Vincent Koc	077cfca229	fix(memory-core): unblock dreaming-only promotion	2026-04-12 18:14:06 +01:00
zhouhe-xydt	879bb5dd91	fix(memory-wiki): support Unicode characters in slugifyWikiSegment (#64742 ) * fix(memory-wiki): support Unicode characters in slugifyWikiSegment Replace ASCII-only regex with Unicode-aware regex to preserve CJK, Cyrillic, Arabic, and other non-ASCII characters in wiki slugs. Fixes #64620 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * test(memory-wiki): cover Unicode slug regressions * fix(memory-wiki): preserve combining marks in slugs * fix(memory-wiki): cap composed source filenames --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Vincent Koc <vincentkoc@ieee.org>	2026-04-12 17:54:41 +01:00
MrBrain	346e38e275	fix(memory-core): isolate dreaming narrative sessions per workspace (#61674 ) * fix(memory-core): isolate dreaming narrative sessions per workspace * chore(changelog): add narrative isolation note --------- Co-authored-by: Vincent Koc <vincentkoc@ieee.org>	2026-04-12 17:39:28 +01:00
Sergiusz	079eb18bf7	fix: harden dreaming narrative session cleanup (#65320 ) * fix: harden dreaming narrative session cleanup * fix(memory-core): harden narrative cleanup * fix(memory-core): preserve fallback narrative sessions --------- Co-authored-by: Vincent Koc <vincentkoc@ieee.org>	2026-04-12 17:33:47 +01:00
Pengfei Ni	aff8a0c0e7	fix(config): resolve CLI command aliases against parent plugin in plugins.allow (#64748 ) (#64779 ) * fix(config): resolve CLI command aliases against parent plugin in plugins.allow (#64748) The CLI allow guard checked command names (e.g. 'wiki') directly against plugins.allow, missing the parent plugin ('memory-wiki'). Additionally, memory-wiki did not declare 'wiki' as a commandAlias, so doctor --fix would remove it as stale. - Add commandAliases entry for 'wiki' in memory-wiki plugin manifest - Check parent plugin ID in the CLI fallback allow guard - Add tests for both allow and deny cases * fix(cli): inject manifest registry for alias diagnostics * Update CHANGELOG.md --------- Co-authored-by: Vincent Koc <vincentkoc@ieee.org>	2026-04-12 17:32:11 +01:00
Leonard Sellem	c545e4605e	fix(memory-wiki): pass app config into CLI metadata registrar (#65012 ) * fix(memory-wiki): pass config into cli metadata registrar * fix(memory-wiki): use cli context config for metadata registrar * docs(changelog): note memory-wiki cli metadata fix --------- Co-authored-by: Vincent Koc <vincentkoc@ieee.org>	2026-04-12 17:30:54 +01:00
Vincent Koc	43cb94a39a	fix(doctor): preserve discord streaming downgrade compatibility	2026-04-12 17:09:08 +01:00
eric-fr4	ad826ea450	Fix WhatsApp media sends when mediaUrl is empty but mediaUrls is populated (#64394 ) * Fix WhatsApp media fallback Accept the first mediaUrls entry when mediaUrl is empty so outbound WhatsApp sends do not silently downgrade media messages to text. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * chore(changelog): credit WhatsApp mediaUrls fallback * fix(changelog): restore 2026.4.10 release block --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: Vincent Koc <vincentkoc@ieee.org>	2026-04-12 16:58:40 +01:00
Edder Talmor	5f92094d51	fix: gracefully handle missing QA scenario pack in npm distributions (closes #65082 ) (#65118 ) * fix: allow built-in chat commands to bypass plugins.allow check (closes #65083) The 'commands' CLI command is a built-in chat command registered in the chat commands registry, not a plugin-backed command. When plugins.allow is configured, the error message incorrectly suggests adding 'commands' to plugins.allow, which produces a second error because no 'commands' plugin exists. Check if the command has a plugin entry or manifest alias before suggesting plugins.allow. Built-in commands without plugin entries now proceed normally instead of showing misleading errors. * fix: gracefully handle missing QA scenario pack in npm distributions (closes #65082) The completion cache update fails with a fatal error when the qa/scenarios/index.md file is not present in the installed npm package, even though the directory is listed in package.json "files". Instead of throwing an error, return an empty QA scenario pack with default agent identity. This allows completion cache updates to succeed while QA scenarios remain unavailable in the npm distribution. The QA scenario pack is primarily used for internal testing and QA automation — it is not critical for end-user functionality. * revert: remove unintended run-main.ts changes from PR #65118 The scenario-catalog.ts fix is the correct change for this PR. The run-main.ts changes were accidentally included and cause a regression in plugins.allow error handling. * fix(qa): tolerate missing packaged scenario config --------- Co-authored-by: Vincent Koc <vincentkoc@ieee.org>	2026-04-12 16:50:58 +01:00
Yanhu	3ef8f0edd8	fix(dreaming): include timezone label in diary timestamps (#65057 ) Dream diary entries in DREAMS.md and the Control UI show bare timestamps without any timezone indicator. When users have not configured a timezone, timestamps are rendered in UTC but appear to be local time, causing confusion. Add timeZoneName: "short" to the Intl.DateTimeFormat options in formatNarrativeDate so timestamps always include a timezone abbreviation (e.g. "9:46 PM UTC" or "2:46 PM PDT"). Fixes #65027	2026-04-12 16:48:40 +01:00
neo1027144	7d9e349129	[AI-assisted] fix(dreaming): use host local timezone for diary timestamps (#65034 ) * fix(dreaming): use host local timezone when timezone is not configured When `memory.dreaming.timezone` is unset, `formatNarrativeDate()` previously defaulted to UTC, causing diary timestamps in DREAMS.md and the Control UI to display UTC time as though it were the user's local time. For example, a PDT user seeing 9:46 PM instead of the correct 2:46 PM. Drop the UTC fallback so `Intl.DateTimeFormat` automatically uses the host's timezone when no explicit timezone is provided. Users who have set `agents.defaults.userTimezone` or `dreaming.timezone` are unaffected. Fixes #65027 * docs(changelog): add dreaming timezone entry * Update CHANGELOG.md --------- Co-authored-by: Vincent Koc <vincentkoc@ieee.org>	2026-04-12 16:38:18 +01:00
Daniel Alkurdi	b8c95e5825	fix(memory-core): wake managed dreaming jobs immediately (#65053 ) * fix(memory-core): wake managed dreaming jobs immediately * docs(changelog): add dreaming wake entry --------- Co-authored-by: Vincent Koc <vincentkoc@ieee.org>	2026-04-12 16:37:21 +01:00
Peter Steinberger	659bcc5e5b	fix: tighten codex app-server lifecycle	2026-04-12 16:15:01 +01:00
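The tool-call assertions described in the first commit of this log (case-insensitive prompt matching, a pre-tool `!toolOutput` filter, and a scenario-unique prompt substring) can be sketched as one standalone predicate. This is a minimal sketch, not the qa-lab scenario code: the `DebugRequest` interface only borrows the field names the commit message mentions (`allInputText`, `plannedToolName`, `toolOutput`), its types are assumptions, `hasPreToolCall` and `observedToolNames` are hypothetical helpers, and the sample prompts are illustrative.

```typescript
// Hypothetical shape of one entry from the mock server's /debug/requests log.
// Field names come from the commit message; the types are assumptions.
interface DebugRequest {
  allInputText?: string;
  plannedToolName?: string;
  toolOutput?: string;
}

// True when some pre-tool planning request whose input matches `promptPattern`
// planned `toolName`. Post-tool follow-ups (which carry toolOutput and have no
// plannedToolName) are skipped, so a forward .some() lands on the planning step.
function hasPreToolCall(
  requests: readonly DebugRequest[],
  promptPattern: RegExp,
  toolName: string,
): boolean {
  return requests.some(
    (request) =>
      !request.toolOutput &&
      promptPattern.test(request.allInputText ?? "") &&
      request.plannedToolName === toolName,
  );
}

// Diagnostic for failure messages: every planned tool name that was observed.
function observedToolNames(requests: readonly DebugRequest[]): string[] {
  return requests
    .map((request) => request.plannedToolName)
    .filter((name): name is string => typeof name === "string");
}

// Illustrative log: an earlier fanout request mentions "delegate" but not the
// handoff-specific substring, so it cannot satisfy the handoff check.
const log: DebugRequest[] = [
  {
    allInputText: "Subagent fanout synthesis check: delegate exactly two bounded subagents",
    plannedToolName: "sessions_spawn",
  },
  { allInputText: "Delegate one bounded QA task", plannedToolName: "sessions_spawn" },
  { allInputText: "Delegate one bounded QA task", toolOutput: "SUBAGENT-OK" },
];

const handoff = /delegate one bounded qa task/i;
console.log(hasPreToolCall(log, handoff, "sessions_spawn")); // true
console.log(hasPreToolCall(log.slice(0, 1), handoff, "sessions_spawn")); // false
console.log(observedToolNames(log)); // [ 'sessions_spawn', 'sessions_spawn' ]

// The /i flag replaces inline lowercasing for the discovery prompt.
const discovery: DebugRequest[] = [
  { allInputText: "Worked, Failed, Blocked, Follow-up", plannedToolName: "read" },
];
console.log(hasPreToolCall(discovery, /worked, failed, blocked/i, "read")); // true
```

Matching on a prompt substring unique to the scenario, rather than a broad keyword, is what keeps an earlier scenario's request in the shared log from producing a cross-scenario false pass.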


5540 Commits
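The `scenarioIds` rules that the qa-suite-summary commits in this log settle on (a missing or empty filter records `null`; an explicit filter records the executed list, deduped and in catalog order) fit in a few lines. This is a sketch under assumptions: `summaryScenarioIds` is a hypothetical name, not the qa-lab `buildQaSuiteSummaryJson` or `runQaSuite` code, the catalog ids are sample values, and silently dropping unknown ids is an assumption the log does not describe.

```typescript
// Hypothetical helper mirroring the summary semantics described in the log:
// - no filter (undefined or []) records null, meaning "unfiltered run"
// - an explicit filter records what actually ran: deduped, in catalog order
function summaryScenarioIds(
  requested: readonly string[] | undefined,
  catalogIds: readonly string[],
): string[] | null {
  if (!requested || requested.length === 0) {
    return null;
  }
  const wanted = new Set(requested);
  // Walking the catalog (not the request) yields catalog order and drops
  // duplicates in one pass. Assumption: unknown ids are simply ignored here.
  return catalogIds.filter((id) => wanted.has(id));
}

// Sample catalog, sorted by path as the log says the real catalog is.
const catalog = [
  "approval-turn-tool-followthrough",
  "source-docs-discovery-report",
  "subagent-handoff",
];

console.log(JSON.stringify(summaryScenarioIds(undefined, catalog))); // null
console.log(JSON.stringify(summaryScenarioIds([], catalog))); // null
console.log(
  JSON.stringify(
    summaryScenarioIds(
      ["subagent-handoff", "source-docs-discovery-report", "subagent-handoff"],
      catalog,
    ),
  ),
); // ["source-docs-discovery-report","subagent-handoff"]
```

Returning the executed list only for explicit filters keeps an unfiltered full-suite run distinguishable from an explicit all-scenarios selection, which is the distinction the pass-4 fix restores.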