Commit Graph

5540 Commits

Author SHA1 Message Date
pashpashpash
b13844732e qa: salvage GPT-5.4 parity proof slice (#65664)
* test(qa): gate parity prose scenarios on real tool calls

Closes criterion 2 of the GPT-5.4 parity completion gate in #64227 ('no
fake progress / fake tool completion') for the two first/second-wave
parity scenarios that can currently pass with a prose-only reply.

Background: the scenario framework already exposes tool-call assertions
via /debug/requests on the mock server (see approval-turn-tool-followthrough
for the pattern). Most parity scenarios use this seam to require a specific
plannedToolName, but source-docs-discovery-report and subagent-handoff
only checked the assistant's prose text, which means a model could fabricate:

- a Worked / Failed / Blocked / Follow-up report without ever calling
  the read tool on the docs / source files the prompt named
- three labeled 'Delegated task', 'Result', 'Evidence' sections without
  ever calling sessions_spawn to delegate

Both gaps are fake-progress loopholes for the parity gate.

Changes:

- source-docs-discovery-report: require at least one read tool call tied
  to the 'worked, failed, blocked' prompt in /debug/requests. Failure
  message dumps the observed plannedToolName list for debugging.
- subagent-handoff: require at least one sessions_spawn tool call tied
  to the 'delegate' / 'subagent handoff' prompt in /debug/requests. Same
  debug-friendly failure message.

Both assertions are gated behind !env.mock so they no-op in live-frontier
mode where the real provider exposes plannedToolName through a different
channel (or not at all).

Not touched: memory-recall is also in the parity pack but its pass path
is legitimately 'read the fact from prior-turn context'. That is a valid
recall strategy, not fake progress, so it is out of scope for this PR.
memory-recall's fake-progress story (no real memory_search call) would
require bigger mock-server changes and belongs in a follow-up that
extends the mock memory pipeline.

Validation:

- pnpm test extensions/qa-lab/src/scenario-catalog.test.ts

Refs #64227

* test(qa): fix case-sensitive tool-call assertions and dedupe debug fetch

Addresses loop-6 review feedback on PR #64681:

1. Copilot / Greptile / codex-connector all flagged that the discovery
   scenario's .includes('worked, failed, blocked') assertion is
   case-sensitive but the real prompt says 'Worked, Failed, Blocked...',
   so the mock-mode assertion never matches. Fix: lowercase-normalize
   allInputText before the contains check.
2. Greptile P2: the expr and message.expr each called fetchJson
   separately, incurring two round-trips to /debug/requests. Fix: hoist
   the fetch to a set step (discoveryDebugRequests / subagentDebugRequests)
   and reuse the snapshot.
3. Copilot: the subagent-handoff assertion scanned the entire request
   log and matched the first request with 'delegate' in its input text,
   which could false-pass on a stale prior scenario. Fix: reverse the
   array and take the most recent matching request instead.

Validation: pnpm test extensions/qa-lab/src/scenario-catalog.test.ts
(4/4 pass).

Refs #64227

* test(qa): narrow subagent-handoff tool-call assertion to pre-tool requests

Pass-2 codex-connector P1 finding on #64681: the reverse-find pattern I
used on pass 1 usually lands on the FOLLOW-UP request after the mock
runs sessions_spawn, not the pre-tool planning request that actually
has plannedToolName === 'sessions_spawn'. The mock only plans that tool
on requests with !toolOutput (mock-openai-server.ts:662), so the
post-tool request has plannedToolName unset and the assertion fails
even when the handoff succeeded.

Fix: switch the assertion back to a forward .some() match but add a
!request.toolOutput filter so the match is pinned to the pre-tool
planning phase. The case-insensitive regex, the fetchJson dedupe, and
the failure-message diagnostic from pass 1 are unchanged.

Validation: pnpm test extensions/qa-lab/src/scenario-catalog.test.ts
(4/4 pass).

Refs #64227

* test(qa): pin subagent-handoff tool-call assertion to scenario prompt

Addresses the pass-3 codex-connector P1 on #64681: the pass-2 fix
filtered to pre-tool requests but still used a broad
`/delegate|subagent handoff/i` regex. The `subagent-fanout-synthesis`
scenario runs BEFORE `subagent-handoff` in catalog order (scenarios
are sorted by path), and the fanout prompt reads
'Subagent fanout synthesis check: delegate exactly two bounded
subagents sequentially' — which contains 'delegate' and also plans
sessions_spawn pre-tool. That produces a cross-scenario false pass
where the fanout's earlier sessions_spawn request satisfies the
handoff assertion even when the handoff run never delegates.

Fix: tighten the input-text match from `/delegate|subagent handoff/i`
to `/delegate one bounded qa task/i`, which is the exact scenario-
unique substring from the `subagent-handoff` config.prompt. That
pins the assertion to this scenario's request window and closes the
cross-scenario false positive.

Validation: pnpm test extensions/qa-lab/src/scenario-catalog.test.ts
(4/4 pass).

Refs #64227

* test(qa): align parity assertion comments with actual filter logic

Addresses two loop-7 Copilot findings on PR #64681:

1. source-docs-discovery-report.md: the explanatory comment said the
   debug request log was 'lowercased for case-insensitive matching',
   but the code actually lowercases each request's allInputText inline
   inside the .some() predicate, not the discoveryDebugRequests
   snapshot. Rewrite the comment to describe the inline-lowercase
   pattern so a future reader matches the code they see.

2. subagent-handoff.md: the comment said the assertion 'must be
   pinned to THIS scenario's request window' but the implementation
   actually relies on matching a scenario-unique prompt substring
   (/delegate one bounded qa task/i), not a request-window. Rewrite
   the comment to describe the substring pinning and keep the
   pre-tool filter rationale intact.

No runtime change; comment-only fix to keep reviewer expectations
aligned with the actual assertion shape.

Validation: pnpm test extensions/qa-lab/src/scenario-catalog.test.ts
(4/4 pass).

Refs #64227

* test(qa): extend tool-call assertions to image-understanding, subagent-fanout, and capability-flip scenarios

* Guard mock-only image parity assertions

* Expand agentic parity second wave

* test(qa): pad parity suspicious-pass isolation to second wave

* qa-lab: parametrize parity report title and drop stale first-wave comment

Addresses two loop-7 Copilot findings on PR #64662:

1. Hard-coded 'GPT-5.4 / Opus 4.6' markdown H1: the renderer now uses a
   template string that interpolates candidateLabel and baselineLabel, so
   any parity run (not only gpt-5.4 vs opus 4.6) renders an accurate
   title in saved reports. Default CLI flags still produce
   openai/gpt-5.4 vs anthropic/claude-opus-4-6 as the baseline pair.

2. Stale 'declared first-wave parity scenarios' comment in
   scopeSummaryToParityPack: the parity pack is now the ten-scenario
   first-wave+second-wave set (PR D + PR E). Comment updated to drop
   the first-wave qualifier and name the full QA_AGENTIC_PARITY_SCENARIOS
   constant the scope is filtering against.

New regression: 'parametrizes the markdown header from the comparison
labels' — asserts that non-default labels (openai/gpt-5.4-alt vs
openai/gpt-5.4) render in the H1.

Validation: pnpm test extensions/qa-lab/src/agentic-parity-report.test.ts
(13/13 pass).

Refs #64227

* qa-lab: fail parity gate on required scenario failures regardless of baseline parity

* test(qa): update readable-report test to cover all 10 parity scenarios

* qa-lab: strengthen parity-report fake-success detector and verify run.primaryProvider labels

* Tighten parity label and scenario checks

* fix: tighten parity label provenance checks

* fix: scope parity tool-call metrics to tool lanes

* Fix parity report label and fake-success checks

* fix(qa): tighten parity report edge cases

* qa-lab: add Anthropic /v1/messages mock route for parity baseline

Closes the last local-runnability gap on criterion 5 of the GPT-5.4 parity
completion gate in #64227 ('the parity gate shows GPT-5.4 matches or beats
Opus 4.6 on the agreed metrics').

Background: the parity gate needs two comparable scenario runs - one
against openai/gpt-5.4 and one against anthropic/claude-opus-4-6 - so the
aggregate metrics and verdict in PR D (#64441) can be computed. Today the
qa-lab mock server only implements /v1/responses, so the baseline run
against Claude Opus 4.6 requires a real Anthropic API key. That makes the
gate impossible to prove end-to-end from a local worktree and means the
CI story is always 'two real providers + quota + keys'.

This PR adds a /v1/messages Anthropic-compatible route to the existing
mock OpenAI server. The route is a thin adapter that:

- Parses Anthropic Messages API request shapes (system as string or
  [{type:text,text}], messages with string or block content, text and
  tool_result and tool_use and image blocks)
- Translates them into the ResponsesInputItem[] shape the existing shared
  scenario dispatcher (buildResponsesPayload) already understands
- Calls the shared dispatcher so both the OpenAI and Anthropic lanes run
  through the exact same scenario prompt-matching logic (same subagent
  fanout state machine, same extractRememberedFact helper, same
  '/debug/requests' telemetry)
- Converts the resulting OpenAI-format events back into an Anthropic
  message response with text and tool_use content blocks and a correct
  stop_reason (tool_use vs end_turn)

Non-streaming only: the QA suite runner falls back to non-streaming mock
mode so real Anthropic SSE isn't necessary for the parity baseline.

Also adds claude-opus-4-6 and claude-sonnet-4-6 to /v1/models so baseline
model-list probes from the suite runner resolve without extra config.

Tests added:

- advertises Anthropic claude-opus-4-6 baseline model on /v1/models
- dispatches an Anthropic /v1/messages read tool call for source discovery
  prompts (tool_use stop_reason, correct input path, /debug/requests
  records plannedToolName=read)
- dispatches Anthropic /v1/messages tool_result follow-ups through the
  shared scenario logic (subagent-handoff two-stage flow: tool_use -
  tool_result - 'Delegated task / Evidence' prose summary)

Local validation:

- pnpm test extensions/qa-lab/src/mock-openai-server.test.ts (18/18 pass)
- pnpm test extensions/qa-lab/src/mock-openai-server.test.ts extensions/qa-lab/src/cli.runtime.test.ts extensions/qa-lab/src/scenario-catalog.test.ts (47/47 pass)

Refs #64227
Unblocks #64441 (parity harness) and the forthcoming qa parity run wrapper
by giving the baseline lane a local-only mock path.

* qa-lab: fix Anthropic tool_result ordering in messages adapter

Addresses the loop-6 Copilot / Greptile finding on PR #64685: in
`convertAnthropicMessagesToResponsesInput`, `tool_result` blocks were
pushed to `items` inside the per-block loop while the surrounding
user/assistant message was only pushed after the loop finished. That
reordered the function_call_output BEFORE its parent user message
whenever a user turn mixed `tool_result` with fresh text/image blocks,
which broke `extractToolOutput` (it scans AFTER the last user-role
index; function_call_output placed BEFORE that index is invisible to it)
and made the downstream scenario dispatcher behave as if no tool output
had been returned on mixed-content turns.

Fix: buffer `tool_result` and `tool_use` blocks in local arrays during
the per-block loop, push the parent role message first (when it has any
text/image pieces), then push the accumulated function_call /
function_call_output items in original order. tool_result-only user
turns still omit the parent message as before, so the non-mixed
subagent-fanout-synthesis two-stage flow that already worked keeps
working.

Regression added:

- `places tool_result after the parent user message even in mixed-content
  turns` — sends a user turn that mixes a `tool_result` block with a
  trailing fresh text block, then inspects `/debug/last-request` to
  assert that `toolOutput === 'SUBAGENT-OK'` (extractToolOutput found
  the function_call_output AFTER the last user index) and
  `prompt === 'Keep going with the fanout.'` (extractLastUserText picked
  up the trailing fresh text).

Local validation: pnpm test extensions/qa-lab/src/mock-openai-server.test.ts
(19/19 pass).

Refs #64227

* qa-lab: reject Anthropic streaming and empty model in messages mock

* qa-lab: tag mock request snapshots with a provider variant so parity runs can diff per provider

* Handle invalid Anthropic mock JSON

* fix: wire mock parity providers by model ref

* fix(qa): support Anthropic message streaming in mock parity lane

* qa-lab: record provider/model/mode in qa-suite-summary.json

Closes the 'summary cannot be label-verified' half of criterion 5 on the
GPT-5.4 parity completion gate in #64227.

Background: the parity gate in #64441 compares two qa-suite-summary.json
files and trusts whatever candidateLabel / baselineLabel the caller
passes. Today the summary JSON only contains { scenarios, counts }, so
nothing in the summary records which provider/model the run actually
used. If a maintainer swaps candidate and baseline summary paths in a
parity-report call, the verdict is silently mislabeled and nobody can
retroactively verify which run produced which summary.

Changes:

- Add a 'run' block to qa-suite-summary.json with startedAt, finishedAt,
  providerMode, primaryModel (+ provider and model splits),
  alternateModel (+ provider and model splits), fastMode, concurrency,
  scenarioIds (when explicitly filtered).
- Extract a pure 'buildQaSuiteSummaryJson(params)' helper so the summary
  JSON shape is unit-testable and the parity gate (and any future parity
  wrapper) can import the exact same type rather than reverse-engineering
  the JSON shape at runtime.
- Thread 'scenarioIds' from 'runQaSuite' into writeQaSuiteArtifacts so
  --scenario-ids flags are recorded in the summary.

Unit tests added (src/suite.summary-json.test.ts, 5 cases):

- records provider/model/mode so parity gates can verify labels
- includes scenarioIds in run metadata when provided
- records an Anthropic baseline lane cleanly for parity runs
- leaves split fields null when a model ref is malformed
- keeps scenarios and counts alongside the run metadata

This is additive: existing callers of qa-suite-summary.json continue to
see the same { scenarios, counts } shape, just with an extra run field.
No existing consumers of the JSON need to change.

The follow-up 'qa parity run' CLI wrapper (run the parity pack twice
against candidate + baseline, emit two labeled summaries in one command)
stacks cleanly on top of this change and will land as a separate PR
once #64441 and #64662 merge so the wrapper can call runQaParityReportCommand
directly.

Local validation:

- pnpm test extensions/qa-lab/src/suite.summary-json.test.ts (5/5 pass)
- pnpm test extensions/qa-lab/src/suite.summary-json.test.ts extensions/qa-lab/src/cli.runtime.test.ts extensions/qa-lab/src/scenario-catalog.test.ts (34/34 pass)

Refs #64227
Unblocks the final parity run for #64441 / #64662 by making summaries
self-describing.

* qa-lab: strengthen qa-suite-summary builder types and empty-array semantics

Addresses 4 loop-6 Copilot / codex-connector findings on PR #64689
(re-opened as #64789):

1. P2 codex + Copilot: empty `scenarioIds` array was serialized as
   `[]` because of a truthiness check. The CLI passes an empty array
   when --scenario is omitted, so full-suite runs would incorrectly
   record an explicit empty selection. Fix: switch to a
   `length > 0` check so '[] or undefined' both encode as `null`
   in the summary run metadata.

2. Copilot: `buildQaSuiteSummaryJson` was exported for parity-gate
   consumers but its return type was `Record<string, unknown>`, which
   defeated the point of exporting it. Fix: introduce a concrete
   `QaSuiteSummaryJson` type that matches the JSON shape 1-for-1 and
   make the builder return it. Downstream code (parity gate, parity
   run wrapper) can now import the type and keep consumers
   type-checked.

3. Copilot: `QaSuiteSummaryJsonParams.providerMode` re-declared the
   `'mock-openai' | 'live-frontier'` string union even though
   `QaProviderMode` is already imported from model-selection.ts. Fix:
   reuse `QaProviderMode` so provider-mode additions flow through
   both types at once.

4. Copilot: test fixtures omitted `steps` from the fake scenario
   results, creating shape drift with the real suite scenario-result
   shape. Fix: pad the test fixtures with `steps: []` and tighten the
   scenarioIds assertion to read `json.run.scenarioIds` directly (the
   new concrete return type makes the type-cast unnecessary).

New regression: `treats an empty scenarioIds array as unspecified
(no filter)` — passes `scenarioIds: []` and asserts the summary
records `scenarioIds: null`.

Validation: pnpm test extensions/qa-lab/src/suite.summary-json.test.ts
(6/6 pass).

Refs #64227

* qa-lab: record executed scenarioIds in summary run metadata

Addresses the pass-3 codex-connector P2 on #64789 (repl of #64689):
`run.scenarioIds` was copied from the raw `params.scenarioIds`
caller input, but `runQaSuite` normalizes that input through
`selectQaSuiteScenarios` which dedupes via `Set` and reorders the
selection to catalog order. When callers repeat --scenario ids or
pass them in non-catalog order, the summary metadata drifted from
the scenarios actually executed, which can make parity/report
tooling treat equivalent runs as different or trust inaccurate
provenance.

Fix: both writeQaSuiteArtifacts call sites in runQaSuite now pass
`selectedCatalogScenarios.map(scenario => scenario.id)` instead of
`params?.scenarioIds`, so the summary records the post-selection
executed list. This also covers the full-suite case automatically
(the executed list is the full lane-filtered catalog), giving parity
consumers a stable record of exactly which scenarios landed in the
run regardless of how the caller phrased the request.

buildQaSuiteSummaryJson's `length > 0 ? [...] : null` pass-2
semantics are preserved so the public helper still treats an empty
array as 'unspecified' for any future caller that legitimately passes
one.

Validation: pnpm test extensions/qa-lab/src/suite.summary-json.test.ts
(6/6 pass).

Refs #64227

* qa-lab: preserve null scenarioIds for unfiltered suite runs

Addresses the pass-4 codex-connector P2 on #64789: the pass-3 fix
always passed `selectedCatalogScenarios.map(...)` to
writeQaSuiteArtifacts, which made unfiltered full-suite runs
indistinguishable from an explicit all-scenarios selection in the
summary metadata. The 'unfiltered → null' semantic (documented in
the buildQaSuiteSummaryJson JSDoc and exercised by the
"treats an empty scenarioIds array as unspecified" regression) was
lost.

Fix: both writeQaSuiteArtifacts call sites now condition on the
caller's original `params.scenarioIds`. When the caller passed an
explicit non-empty filter, record the post-selection executed list
(pass-3 behavior, preserving Set-dedupe + catalog-order
normalization). When the caller passed undefined or an empty array,
pass undefined to writeQaSuiteArtifacts so buildQaSuiteSummaryJson's
length-check serializes null (pass-2 behavior, preserving unfiltered
semantics).

This keeps both codex-connector findings satisfied simultaneously:
- explicit --scenario filter reorders/dedupes through the executed
  list, not the raw caller input
- unfiltered full-suite run records null, not a full catalog dump
  that would shadow "explicit all-scenarios" selections

Validation: pnpm test extensions/qa-lab/src/suite.summary-json.test.ts
(6/6 pass).

Refs #64227

* qa-lab: reuse QaProviderMode in writeQaSuiteArtifacts param type

* qa-lab: stage mock auth profiles so the parity gate runs without real credentials

* fix(qa): clean up mock auth staging follow-ups

* ci: add parity-gate workflow that runs the GPT-5.4 vs Opus 4.6 gate end-to-end against the qa-lab mock

* ci: use supported parity gate runner label

* ci: watch gateway changes in parity gate

* docs: pin parity runbook alternate models

* fix(ci): watch qa-channel parity inputs

* qa: roll up parity proof closeout

* qa: harden mock parity review fixes

* qa-lab: fix review findings — comment wording, placeholder key, exported type, ordering assertion, remove false-positive positive-tone detection

* qa: fix memory-recall scenario count, update criterion 2 comment, cache fetchJson in model-switch

* qa-lab: clean up positive-tone comment + fix stale test expectations

* qa: pin workflow Node version to 22.14.0 + fix stale label-match wording

* qa-lab: refresh mock provider routing expectation

* docs: drop stale parity rollup rewrite from proof slice

* qa: run parity gate against mock lane

* deps: sync qa-lab lockfile

* build: refresh a2ui bundle hash

* ci: widen parity gate triggers

---------

Co-authored-by: Eva <eva@100yen.org>
2026-04-13 13:01:54 +09:00
Josh Avant
3d07dfbb65 feat(qa-lab): add Convex credential broker and admin CLI (#65596)
* QA Lab: add Convex credential source for Telegram lane

* QA Lab: scaffold Convex credential broker

* QA Lab: add Convex credential admin CLI

* QA Lab: harden Convex credential security paths

* QA Broker: validate Telegram payloads on admin add

* fix: note QA Convex credential broker in changelog (#65596) (thanks @joshavant)
2026-04-12 22:03:42 -05:00
Peter Steinberger
5da237c887 fix(ci): refresh qa-lab lockfile 2026-04-12 19:45:46 -07:00
Peter Steinberger
20266c14cb feat(qa-lab): add control ui qa-channel roundtrip scenario 2026-04-12 19:41:06 -07:00
Peter Steinberger
f682413f57 feat(qa-channel): forward inbound media attachments 2026-04-12 19:41:06 -07:00
Peter Steinberger
1a47660518 feat(browser): add qa web runtime support 2026-04-12 19:41:06 -07:00
pashpashpash
c848ebc8ce agents: split GPT-5 prompt and retry behavior (#65597)
* agents: split GPT-5 prompt and retry behavior

* agents: fix GPT-5 review follow-ups

* agents: address GPT-5 review follow-ups

* agents: avoid replaying side-effectful GPT retries

* agents: mark subagent control as mutating

* agents: fail closed on single-action retries

* commands: stabilize channel legacy doctor migration test

* agents: narrow single-action retry promise trigger
2026-04-12 18:52:22 -07:00
pashpashpash
de1b6abf94 test(memory-core): freeze dreaming session-ingest clocks (#65605) 2026-04-12 17:24:34 -07:00
Peter Steinberger
9dbbee8a02 fix(test): align trace directive type stubs 2026-04-13 00:20:52 +01:00
Peter Steinberger
cfd5f9e4e3 test(e2e): repair OpenShell prerelease smoke 2026-04-13 00:20:51 +01:00
Marcus Castro
9af8288c05 fix(whatsapp): send group reactions with target participant (#65512) 2026-04-12 20:00:19 -03:00
Marcus Castro
403783a3b1 fix(tts): correct tagged TTS syntax guidance (#65573) 2026-04-12 19:41:13 -03:00
pashpashpash
f5447aab88 OpenAI: strengthen heartbeat overlay guidance (#65148) 2026-04-13 06:47:40 +09:00
pashpashpash
383c854313 CI: fix mainline regression blockers (#65269)
* MSTeams: align logger test expectations

* Gateway: fix CI follow-up regressions

* Config: refresh generated schema baseline

* VoiceCall: type webhook test doubles

* CI: retrigger blocker workflow

* CI: retrigger retry workflow

* Agents: fix current mainline agentic regressions

* Agents: type auth controller test mock

* CI: retrigger blocker validation

* Agents: repair OpenAI replay pairing order
2026-04-13 06:18:37 +09:00
Peter Steinberger
1ea332a658 fix: repair CI type checks 2026-04-12 12:04:59 -07:00
Peter Steinberger
fcee268373 feat(qa-lab): support scenario-defined plugin runs 2026-04-12 11:59:50 -07:00
Vincent Koc
ea71a59127 fix(imessage): repair monitor retry type checks 2026-04-12 19:57:37 +01:00
Peter Steinberger
e4841d767d test: stabilize loaded full-suite checks 2026-04-12 11:52:56 -07:00
Peter Steinberger
d35cc6ef86 fix(discord): declare gateway heartbeat timeout state 2026-04-12 11:52:56 -07:00
Peter Steinberger
cb5a25d8d8 fix(discord): normalize legacy streaming aliases 2026-04-12 11:52:56 -07:00
Peter Steinberger
fa87c6334a fix(imessage): align monitor retry types 2026-04-12 11:52:33 -07:00
saram ali
acdf2b1c8a fix(memory-core): match daily notes stored in memory/ subdirectories (#64682)
* fix(memory-core): match daily notes in memory/ subdirectories in isShortTermMemoryPath

* fix(memory-core): exclude dream reports from short-term recall

* fix(memory-core): widen short-term recall path matching

* docs(changelog): note short-term recall fix

---------

Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
2026-04-12 19:40:59 +01:00
Vincent Koc
35a784c165 fix(imessage): retry watch.subscribe startup failures (#65482)
* fix(imessage): retry watch.subscribe startup failures

* fix(imessage): sanitize watch error logging
2026-04-12 19:40:19 +01:00
Peter Steinberger
c8347e70da fix: align trace directive types 2026-04-12 11:30:44 -07:00
Peter Steinberger
e76c2812b7 style: apply oxfmt 2026-04-12 11:28:43 -07:00
Marcus Castro
aa023e4283 refactor(whatsapp): centralize account connection lifecycle (#65427)
* refactor(whatsapp): centralize account connection lifecycle

* fix(whatsapp): harden controller open failure cleanup

* refactor(whatsapp): remove active listener fallback path

* fix(whatsapp): isolate controller registry state

* debug(whatsapp): trace typing presence updates

* docs(changelog): add whatsapp lifecycle fix note

* debug(whatsapp): log global presence mode

* chore(whatsapp): remove debug presence logs

---------

Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
2026-04-12 15:24:49 -03:00
Tak Hoffman
c37e49f275 Add /trace toggle and fix Active Memory diagnostics 2026-04-12 13:20:22 -05:00
Peter Steinberger
910a0e40d2 chore: update dependencies 2026-04-12 19:19:06 +01:00
Vincent Koc
d4fb7d893d fix(ci): repair main tsgo regressions 2026-04-12 19:14:00 +01:00
Peter Steinberger
c4412c6b0c fix: compact discord allowlist resolution logs 2026-04-12 19:08:59 +01:00
Marcus Castro
000fc7f233 refactor(qa): add shared QA channel contract and harden worker startup (#64562)
* refactor(qa): add shared transport contract and suite migration

* refactor(qa): harden worker gateway startup

* fix(qa): scope waits and sanitize shutdown artifacts

* fix(qa): confine artifacts and redact preserved logs

* fix(qa): block symlink escapes in artifact paths

* fix(gateway): clear shutdown race timers

* fix(qa): harden shutdown cleanup paths

* fix(qa): sanitize gateway logs in thrown errors

* fix(qa): harden suite startup and artifact paths

* fix(qa): stage bundled plugins from mutated config

* fix(qa): broaden gateway log bearer redaction

* fix(qa-channel): restore runtime export

* fix(qa): stop failed gateway startups as a process tree

* fix(qa-channel): load runtime hook from api surface
2026-04-12 15:02:57 -03:00
jasonxargs-boop
2204753b62 fix(memory-core): fix macOS chokidar glob issue by watching memory dir directly (#64711)
* fix(memory-core): fix macOS chokidar glob issue by watching memory dir directly

* fix(memory-core): ignore non-markdown memory watch churn

* fix(memory-core): allow multimodal watch events

* test(memory-core): type watcher ignore callback

---------

Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
2026-04-12 18:53:20 +01:00
Peter Steinberger
15b86ac6d0 fix: narrow qmd defaults and clawblocker memory 2026-04-12 18:52:06 +01:00
saram ali
7995e408ce fix(discord): clear stale heartbeat timers in SafeGatewayPlugin.connect() (#65087)
* fix(discord): clear stale heartbeat timers in SafeGatewayPlugin.connect()

The @buape/carbon@0.15.0 heartbeat setup has a race where stopHeartbeat()
runs before heartbeatInterval is assigned, leaving a stale setInterval with
a closed reconnectCallback. When the stale interval fires ~41s later it
throws an uncaught exception that bypasses the EventEmitter error path and
crashes the gateway process via process.on('uncaughtException').

Add a connect() override in SafeGatewayPlugin that unconditionally clears
both heartbeatInterval and firstHeartbeatTimeout before calling super. The
parent's connect() only calls stopHeartbeat() when isConnecting=false; when
isConnecting=true it returns early without clearing — this override fills
that gap.

Fixes #65009. Related: #64011, #63387, #62038.

* test(discord): assert super.connect() delegation in SafeGatewayPlugin tests

* fix(ci): update raw-fetch allowlist line numbers for gateway-plugin.ts

The connect() override added in the heartbeat fix shifted the two
pre-existing fetch() callsites from lines 370/436 to 387/453.

* docs(changelog): add discord heartbeat crash note

* test(cli): align plugin registry load-context mock

---------

Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
2026-04-12 18:40:04 +01:00
Peter Steinberger
a8e140e395 chore: bump version to 2026.4.12 2026-04-12 10:37:18 -07:00
Anonymous Amit
42590106ab improve memory fallback lexical ranking (#65395)
* improve memory fallback lexical ranking

* use neutral lexical fallback fixtures

* fix(memory-core): keep lexical boosts out of hybrid search

---------

Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
2026-04-12 18:36:28 +01:00
Vincent Koc
8a4a63ca07 fix(memory-core): use all dreaming signals for light confidence 2026-04-12 18:30:35 +01:00
Vincent Koc
077cfca229 fix(memory-core): unblock dreaming-only promotion 2026-04-12 18:14:06 +01:00
zhouhe-xydt
879bb5dd91 fix(memory-wiki): support Unicode characters in slugifyWikiSegment (#64742)
* fix(memory-wiki): support Unicode characters in slugifyWikiSegment

Replace ASCII-only regex with Unicode-aware regex to preserve CJK,
Cyrillic, Arabic, and other non-ASCII characters in wiki slugs.

Fixes #64620

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test(memory-wiki): cover Unicode slug regressions

* fix(memory-wiki): preserve combining marks in slugs

* fix(memory-wiki): cap composed source filenames

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
2026-04-12 17:54:41 +01:00
MrBrain
346e38e275 fix(memory-core): isolate dreaming narrative sessions per workspace (#61674)
* fix(memory-core): isolate dreaming narrative sessions per workspace

* chore(changelog): add narrative isolation note

---------

Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
2026-04-12 17:39:28 +01:00
Sergiusz
079eb18bf7 fix: harden dreaming narrative session cleanup (#65320)
* fix: harden dreaming narrative session cleanup

* fix(memory-core): harden narrative cleanup

* fix(memory-core): preserve fallback narrative sessions

---------

Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
2026-04-12 17:33:47 +01:00
Pengfei Ni
aff8a0c0e7 fix(config): resolve CLI command aliases against parent plugin in plugins.allow (#64748) (#64779)
* fix(config): resolve CLI command aliases against parent plugin in plugins.allow (#64748)

The CLI allow guard checked command names (e.g. 'wiki') directly against
plugins.allow, missing the parent plugin ('memory-wiki'). Additionally,
memory-wiki did not declare 'wiki' as a commandAlias, so doctor --fix
would remove it as stale.

- Add commandAliases entry for 'wiki' in memory-wiki plugin manifest
- Check parent plugin ID in the CLI fallback allow guard
- Add tests for both allow and deny cases

* fix(cli): inject manifest registry for alias diagnostics

* Update CHANGELOG.md

---------

Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
2026-04-12 17:32:11 +01:00
Leonard Sellem
c545e4605e fix(memory-wiki): pass app config into CLI metadata registrar (#65012)
* fix(memory-wiki): pass config into cli metadata registrar

* fix(memory-wiki): use cli context config for metadata registrar

* docs(changelog): note memory-wiki cli metadata fix

---------

Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
2026-04-12 17:30:54 +01:00
Vincent Koc
43cb94a39a fix(doctor): preserve discord streaming downgrade compatibility 2026-04-12 17:09:08 +01:00
eric-fr4
ad826ea450 Fix WhatsApp media sends when mediaUrl is empty but mediaUrls is populated (#64394)
* Fix WhatsApp media fallback

Accept the first mediaUrls entry when mediaUrl is empty so outbound WhatsApp sends do not silently downgrade media messages to text.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore(changelog): credit WhatsApp mediaUrls fallback

* fix(changelog): restore 2026.4.10 release block

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
2026-04-12 16:58:40 +01:00
Edder Talmor
5f92094d51 fix: gracefully handle missing QA scenario pack in npm distributions (closes #65082) (#65118)
* fix: allow built-in chat commands to bypass plugins.allow check (closes #65083)

The 'commands' CLI command is a built-in chat command registered in the
chat commands registry, not a plugin-backed command. When plugins.allow
is configured, the error message incorrectly suggests adding 'commands'
to plugins.allow, which produces a second error because no 'commands'
plugin exists.

Check if the command has a plugin entry or manifest alias before
suggesting plugins.allow. Built-in commands without plugin entries
now proceed normally instead of showing misleading errors.

* fix: gracefully handle missing QA scenario pack in npm distributions (closes #65082)

The completion cache update fails with a fatal error when the
qa/scenarios/index.md file is not present in the installed npm package,
even though the directory is listed in package.json "files".

Instead of throwing an error, return an empty QA scenario pack with
default agent identity. This allows completion cache updates to succeed
while QA scenarios remain unavailable in the npm distribution.

The QA scenario pack is primarily used for internal testing and QA
automation — it is not critical for end-user functionality.

* revert: remove unintended run-main.ts changes from PR #65118

The scenario-catalog.ts fix is the correct change for this PR.
The run-main.ts changes were accidentally included and cause a
regression in plugins.allow error handling.

* fix(qa): tolerate missing packaged scenario config

---------

Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
2026-04-12 16:50:58 +01:00
Yanhu
3ef8f0edd8 fix(dreaming): include timezone label in diary timestamps (#65057)
Dream diary entries in DREAMS.md and the Control UI show bare
timestamps without any timezone indicator. When users have not
configured a timezone, timestamps are rendered in UTC but appear to be
local time, causing confusion.

Add timeZoneName: "short" to the Intl.DateTimeFormat options in
formatNarrativeDate so timestamps always include a timezone
abbreviation (e.g. "9:46 PM UTC" or "2:46 PM PDT").

Fixes #65027
2026-04-12 16:48:40 +01:00
neo1027144
7d9e349129 [AI-assisted] fix(dreaming): use host local timezone for diary timestamps (#65034)
* fix(dreaming): use host local timezone when timezone is not configured

When `memory.dreaming.timezone` is unset, `formatNarrativeDate()`
previously defaulted to UTC, causing diary timestamps in DREAMS.md and
the Control UI to display UTC time as though it were the user's local
time. For example, a PDT user seeing 9:46 PM instead of the correct
2:46 PM.

Drop the UTC fallback so `Intl.DateTimeFormat` automatically uses the
host's timezone when no explicit timezone is provided. Users who have
set `agents.defaults.userTimezone` or `dreaming.timezone` are
unaffected.

Fixes #65027

* docs(changelog): add dreaming timezone entry

* Update CHANGELOG.md

---------

Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
2026-04-12 16:38:18 +01:00
Daniel Alkurdi
b8c95e5825 fix(memory-core): wake managed dreaming jobs immediately (#65053)
* fix(memory-core): wake managed dreaming jobs immediately

* docs(changelog): add dreaming wake entry

---------

Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
2026-04-12 16:37:21 +01:00
Peter Steinberger
659bcc5e5b fix: tighten codex app-server lifecycle 2026-04-12 16:15:01 +01:00