Commit Graph

5546 Commits

Author SHA1 Message Date
Vincent Koc
98c2a38bc3 fix(matrix): mirror runtime deps for docker builds
(cherry picked from commit 1c843552b7)
2026-04-13 10:24:18 +01:00
Sliverp
ddb7a8dd80 Feat/fix qq ssrf url list (#65788)
* fix: update qqbot media host allowlist

* fix: update qqbot media host allowlist

* fix: update qqbot media host allowlist

* fix: update qqbot media host allowlist
2026-04-13 15:49:32 +08:00
Rugved Somwanshi
0cfb83edfa feat: LM Studio Integration (#53248)
* Feat: LM Studio Integration

* Format

* Support usage in streaming true

Fix token count

* Add custom window check

* Drop max tokens fallback

* tweak docs

Update generated

* Avoid error if stale header does not resolve

* Fix test

* Fix test

* Fix rebase issues

Trim code

* Fix tests

Drop keyless

Fixes

* Fix linter issues in tests

* Update generated artifacts

* Do not have fatal header resoltuion for discovery

* Do the same for API key as well

* fix: honor lmstudio preload runtime auth

* fix: clear stale lmstudio header auth

* fix: lazy-load lmstudio runtime facade

* fix: preserve lmstudio shared synthetic auth

* fix: clear stale lmstudio header auth in discovery

* fix: prefer lmstudio header auth for discovery

* fix: honor lmstudio header auth in warmup paths

* fix: clear stale lmstudio profile auth

* fix: ignore lmstudio env auth on header migration

* fix: use local lmstudio setup seam

* fix: resolve lmstudio rebase fallout

---------

Co-authored-by: Frank Yang <frank.ekn@gmail.com>
2026-04-13 15:22:44 +08:00
pashpashpash
83f6a26d77 qa: keep OpenAI live defaults when auth exists 2026-04-12 23:34:54 -07:00
pashpashpash
ae4b997a00 qa: prefer codex auth for live defaults 2026-04-12 23:34:54 -07:00
pashpashpash
eede525970 qa: relax repo-contract artifact matcher 2026-04-12 22:43:22 -07:00
pashpashpash
b13844732e qa: salvage GPT-5.4 parity proof slice (#65664)
* test(qa): gate parity prose scenarios on real tool calls

Closes criterion 2 of the GPT-5.4 parity completion gate in #64227 ('no
fake progress / fake tool completion') for the two first/second-wave
parity scenarios that can currently pass with a prose-only reply.

Background: the scenario framework already exposes tool-call assertions
via /debug/requests on the mock server (see approval-turn-tool-followthrough
for the pattern). Most parity scenarios use this seam to require a specific
plannedToolName, but source-docs-discovery-report and subagent-handoff
only checked the assistant's prose text, which means a model could fabricate:

- a Worked / Failed / Blocked / Follow-up report without ever calling
  the read tool on the docs / source files the prompt named
- three labeled 'Delegated task', 'Result', 'Evidence' sections without
  ever calling sessions_spawn to delegate

Both gaps are fake-progress loopholes for the parity gate.

Changes:

- source-docs-discovery-report: require at least one read tool call tied
  to the 'worked, failed, blocked' prompt in /debug/requests. Failure
  message dumps the observed plannedToolName list for debugging.
- subagent-handoff: require at least one sessions_spawn tool call tied
  to the 'delegate' / 'subagent handoff' prompt in /debug/requests. Same
  debug-friendly failure message.

Both assertions are gated behind !env.mock so they no-op in live-frontier
mode where the real provider exposes plannedToolName through a different
channel (or not at all).

Not touched: memory-recall is also in the parity pack but its pass path
is legitimately 'read the fact from prior-turn context'. That is a valid
recall strategy, not fake progress, so it is out of scope for this PR.
memory-recall's fake-progress story (no real memory_search call) would
require bigger mock-server changes and belongs in a follow-up that
extends the mock memory pipeline.

Validation:

- pnpm test extensions/qa-lab/src/scenario-catalog.test.ts

Refs #64227

* test(qa): fix case-sensitive tool-call assertions and dedupe debug fetch

Addresses loop-6 review feedback on PR #64681:

1. Copilot / Greptile / codex-connector all flagged that the discovery
   scenario's .includes('worked, failed, blocked') assertion is
   case-sensitive but the real prompt says 'Worked, Failed, Blocked...',
   so the mock-mode assertion never matches. Fix: lowercase-normalize
   allInputText before the contains check.
2. Greptile P2: the expr and message.expr each called fetchJson
   separately, incurring two round-trips to /debug/requests. Fix: hoist
   the fetch to a set step (discoveryDebugRequests / subagentDebugRequests)
   and reuse the snapshot.
3. Copilot: the subagent-handoff assertion scanned the entire request
   log and matched the first request with 'delegate' in its input text,
   which could false-pass on a stale prior scenario. Fix: reverse the
   array and take the most recent matching request instead.

Validation: pnpm test extensions/qa-lab/src/scenario-catalog.test.ts
(4/4 pass).

Refs #64227

* test(qa): narrow subagent-handoff tool-call assertion to pre-tool requests

Pass-2 codex-connector P1 finding on #64681: the reverse-find pattern I
used on pass 1 usually lands on the FOLLOW-UP request after the mock
runs sessions_spawn, not the pre-tool planning request that actually
has plannedToolName === 'sessions_spawn'. The mock only plans that tool
on requests with !toolOutput (mock-openai-server.ts:662), so the
post-tool request has plannedToolName unset and the assertion fails
even when the handoff succeeded.

Fix: switch the assertion back to a forward .some() match but add a
!request.toolOutput filter so the match is pinned to the pre-tool
planning phase. The case-insensitive regex, the fetchJson dedupe, and
the failure-message diagnostic from pass 1 are unchanged.

Validation: pnpm test extensions/qa-lab/src/scenario-catalog.test.ts
(4/4 pass).

Refs #64227

* test(qa): pin subagent-handoff tool-call assertion to scenario prompt

Addresses the pass-3 codex-connector P1 on #64681: the pass-2 fix
filtered to pre-tool requests but still used a broad
`/delegate|subagent handoff/i` regex. The `subagent-fanout-synthesis`
scenario runs BEFORE `subagent-handoff` in catalog order (scenarios
are sorted by path), and the fanout prompt reads
'Subagent fanout synthesis check: delegate exactly two bounded
subagents sequentially' — which contains 'delegate' and also plans
sessions_spawn pre-tool. That produces a cross-scenario false pass
where the fanout's earlier sessions_spawn request satisfies the
handoff assertion even when the handoff run never delegates.

Fix: tighten the input-text match from `/delegate|subagent handoff/i`
to `/delegate one bounded qa task/i`, which is the exact scenario-
unique substring from the `subagent-handoff` config.prompt. That
pins the assertion to this scenario's request window and closes the
cross-scenario false positive.

Validation: pnpm test extensions/qa-lab/src/scenario-catalog.test.ts
(4/4 pass).

Refs #64227

* test(qa): align parity assertion comments with actual filter logic

Addresses two loop-7 Copilot findings on PR #64681:

1. source-docs-discovery-report.md: the explanatory comment said the
   debug request log was 'lowercased for case-insensitive matching',
   but the code actually lowercases each request's allInputText inline
   inside the .some() predicate, not the discoveryDebugRequests
   snapshot. Rewrite the comment to describe the inline-lowercase
   pattern so a future reader matches the code they see.

2. subagent-handoff.md: the comment said the assertion 'must be
   pinned to THIS scenario's request window' but the implementation
   actually relies on matching a scenario-unique prompt substring
   (/delegate one bounded qa task/i), not a request-window. Rewrite
   the comment to describe the substring pinning and keep the
   pre-tool filter rationale intact.

No runtime change; comment-only fix to keep reviewer expectations
aligned with the actual assertion shape.

Validation: pnpm test extensions/qa-lab/src/scenario-catalog.test.ts
(4/4 pass).

Refs #64227

* test(qa): extend tool-call assertions to image-understanding, subagent-fanout, and capability-flip scenarios

* Guard mock-only image parity assertions

* Expand agentic parity second wave

* test(qa): pad parity suspicious-pass isolation to second wave

* qa-lab: parametrize parity report title and drop stale first-wave comment

Addresses two loop-7 Copilot findings on PR #64662:

1. Hard-coded 'GPT-5.4 / Opus 4.6' markdown H1: the renderer now uses a
   template string that interpolates candidateLabel and baselineLabel, so
   any parity run (not only gpt-5.4 vs opus 4.6) renders an accurate
   title in saved reports. Default CLI flags still produce
   openai/gpt-5.4 vs anthropic/claude-opus-4-6 as the baseline pair.

2. Stale 'declared first-wave parity scenarios' comment in
   scopeSummaryToParityPack: the parity pack is now the ten-scenario
   first-wave+second-wave set (PR D + PR E). Comment updated to drop
   the first-wave qualifier and name the full QA_AGENTIC_PARITY_SCENARIOS
   constant the scope is filtering against.

New regression: 'parametrizes the markdown header from the comparison
labels' — asserts that non-default labels (openai/gpt-5.4-alt vs
openai/gpt-5.4) render in the H1.

Validation: pnpm test extensions/qa-lab/src/agentic-parity-report.test.ts
(13/13 pass).

Refs #64227

* qa-lab: fail parity gate on required scenario failures regardless of baseline parity

* test(qa): update readable-report test to cover all 10 parity scenarios

* qa-lab: strengthen parity-report fake-success detector and verify run.primaryProvider labels

* Tighten parity label and scenario checks

* fix: tighten parity label provenance checks

* fix: scope parity tool-call metrics to tool lanes

* Fix parity report label and fake-success checks

* fix(qa): tighten parity report edge cases

* qa-lab: add Anthropic /v1/messages mock route for parity baseline

Closes the last local-runnability gap on criterion 5 of the GPT-5.4 parity
completion gate in #64227 ('the parity gate shows GPT-5.4 matches or beats
Opus 4.6 on the agreed metrics').

Background: the parity gate needs two comparable scenario runs - one
against openai/gpt-5.4 and one against anthropic/claude-opus-4-6 - so the
aggregate metrics and verdict in PR D (#64441) can be computed. Today the
qa-lab mock server only implements /v1/responses, so the baseline run
against Claude Opus 4.6 requires a real Anthropic API key. That makes the
gate impossible to prove end-to-end from a local worktree and means the
CI story is always 'two real providers + quota + keys'.

This PR adds a /v1/messages Anthropic-compatible route to the existing
mock OpenAI server. The route is a thin adapter that:

- Parses Anthropic Messages API request shapes (system as string or
  [{type:text,text}], messages with string or block content, text and
  tool_result and tool_use and image blocks)
- Translates them into the ResponsesInputItem[] shape the existing shared
  scenario dispatcher (buildResponsesPayload) already understands
- Calls the shared dispatcher so both the OpenAI and Anthropic lanes run
  through the exact same scenario prompt-matching logic (same subagent
  fanout state machine, same extractRememberedFact helper, same
  '/debug/requests' telemetry)
- Converts the resulting OpenAI-format events back into an Anthropic
  message response with text and tool_use content blocks and a correct
  stop_reason (tool_use vs end_turn)

Non-streaming only: the QA suite runner falls back to non-streaming mock
mode so real Anthropic SSE isn't necessary for the parity baseline.

Also adds claude-opus-4-6 and claude-sonnet-4-6 to /v1/models so baseline
model-list probes from the suite runner resolve without extra config.

Tests added:

- advertises Anthropic claude-opus-4-6 baseline model on /v1/models
- dispatches an Anthropic /v1/messages read tool call for source discovery
  prompts (tool_use stop_reason, correct input path, /debug/requests
  records plannedToolName=read)
- dispatches Anthropic /v1/messages tool_result follow-ups through the
  shared scenario logic (subagent-handoff two-stage flow: tool_use -
  tool_result - 'Delegated task / Evidence' prose summary)

Local validation:

- pnpm test extensions/qa-lab/src/mock-openai-server.test.ts (18/18 pass)
- pnpm test extensions/qa-lab/src/mock-openai-server.test.ts extensions/qa-lab/src/cli.runtime.test.ts extensions/qa-lab/src/scenario-catalog.test.ts (47/47 pass)

Refs #64227
Unblocks #64441 (parity harness) and the forthcoming qa parity run wrapper
by giving the baseline lane a local-only mock path.

* qa-lab: fix Anthropic tool_result ordering in messages adapter

Addresses the loop-6 Copilot / Greptile finding on PR #64685: in
`convertAnthropicMessagesToResponsesInput`, `tool_result` blocks were
pushed to `items` inside the per-block loop while the surrounding
user/assistant message was only pushed after the loop finished. That
reordered the function_call_output BEFORE its parent user message
whenever a user turn mixed `tool_result` with fresh text/image blocks,
which broke `extractToolOutput` (it scans AFTER the last user-role
index; function_call_output placed BEFORE that index is invisible to it)
and made the downstream scenario dispatcher behave as if no tool output
had been returned on mixed-content turns.

Fix: buffer `tool_result` and `tool_use` blocks in local arrays during
the per-block loop, push the parent role message first (when it has any
text/image pieces), then push the accumulated function_call /
function_call_output items in original order. tool_result-only user
turns still omit the parent message as before, so the non-mixed
subagent-fanout-synthesis two-stage flow that already worked keeps
working.

Regression added:

- `places tool_result after the parent user message even in mixed-content
  turns` — sends a user turn that mixes a `tool_result` block with a
  trailing fresh text block, then inspects `/debug/last-request` to
  assert that `toolOutput === 'SUBAGENT-OK'` (extractToolOutput found
  the function_call_output AFTER the last user index) and
  `prompt === 'Keep going with the fanout.'` (extractLastUserText picked
  up the trailing fresh text).

Local validation: pnpm test extensions/qa-lab/src/mock-openai-server.test.ts
(19/19 pass).

Refs #64227

* qa-lab: reject Anthropic streaming and empty model in messages mock

* qa-lab: tag mock request snapshots with a provider variant so parity runs can diff per provider

* Handle invalid Anthropic mock JSON

* fix: wire mock parity providers by model ref

* fix(qa): support Anthropic message streaming in mock parity lane

* qa-lab: record provider/model/mode in qa-suite-summary.json

Closes the 'summary cannot be label-verified' half of criterion 5 on the
GPT-5.4 parity completion gate in #64227.

Background: the parity gate in #64441 compares two qa-suite-summary.json
files and trusts whatever candidateLabel / baselineLabel the caller
passes. Today the summary JSON only contains { scenarios, counts }, so
nothing in the summary records which provider/model the run actually
used. If a maintainer swaps candidate and baseline summary paths in a
parity-report call, the verdict is silently mislabeled and nobody can
retroactively verify which run produced which summary.

Changes:

- Add a 'run' block to qa-suite-summary.json with startedAt, finishedAt,
  providerMode, primaryModel (+ provider and model splits),
  alternateModel (+ provider and model splits), fastMode, concurrency,
  scenarioIds (when explicitly filtered).
- Extract a pure 'buildQaSuiteSummaryJson(params)' helper so the summary
  JSON shape is unit-testable and the parity gate (and any future parity
  wrapper) can import the exact same type rather than reverse-engineering
  the JSON shape at runtime.
- Thread 'scenarioIds' from 'runQaSuite' into writeQaSuiteArtifacts so
  --scenario-ids flags are recorded in the summary.

Unit tests added (src/suite.summary-json.test.ts, 5 cases):

- records provider/model/mode so parity gates can verify labels
- includes scenarioIds in run metadata when provided
- records an Anthropic baseline lane cleanly for parity runs
- leaves split fields null when a model ref is malformed
- keeps scenarios and counts alongside the run metadata

This is additive: existing callers of qa-suite-summary.json continue to
see the same { scenarios, counts } shape, just with an extra run field.
No existing consumers of the JSON need to change.

The follow-up 'qa parity run' CLI wrapper (run the parity pack twice
against candidate + baseline, emit two labeled summaries in one command)
stacks cleanly on top of this change and will land as a separate PR
once #64441 and #64662 merge so the wrapper can call runQaParityReportCommand
directly.

Local validation:

- pnpm test extensions/qa-lab/src/suite.summary-json.test.ts (5/5 pass)
- pnpm test extensions/qa-lab/src/suite.summary-json.test.ts extensions/qa-lab/src/cli.runtime.test.ts extensions/qa-lab/src/scenario-catalog.test.ts (34/34 pass)

Refs #64227
Unblocks the final parity run for #64441 / #64662 by making summaries
self-describing.

* qa-lab: strengthen qa-suite-summary builder types and empty-array semantics

Addresses 4 loop-6 Copilot / codex-connector findings on PR #64689
(re-opened as #64789):

1. P2 codex + Copilot: empty `scenarioIds` array was serialized as
   `[]` because of a truthiness check. The CLI passes an empty array
   when --scenario is omitted, so full-suite runs would incorrectly
   record an explicit empty selection. Fix: switch to a
   `length > 0` check so '[] or undefined' both encode as `null`
   in the summary run metadata.

2. Copilot: `buildQaSuiteSummaryJson` was exported for parity-gate
   consumers but its return type was `Record<string, unknown>`, which
   defeated the point of exporting it. Fix: introduce a concrete
   `QaSuiteSummaryJson` type that matches the JSON shape 1-for-1 and
   make the builder return it. Downstream code (parity gate, parity
   run wrapper) can now import the type and keep consumers
   type-checked.

3. Copilot: `QaSuiteSummaryJsonParams.providerMode` re-declared the
   `'mock-openai' | 'live-frontier'` string union even though
   `QaProviderMode` is already imported from model-selection.ts. Fix:
   reuse `QaProviderMode` so provider-mode additions flow through
   both types at once.

4. Copilot: test fixtures omitted `steps` from the fake scenario
   results, creating shape drift with the real suite scenario-result
   shape. Fix: pad the test fixtures with `steps: []` and tighten the
   scenarioIds assertion to read `json.run.scenarioIds` directly (the
   new concrete return type makes the type-cast unnecessary).

New regression: `treats an empty scenarioIds array as unspecified
(no filter)` — passes `scenarioIds: []` and asserts the summary
records `scenarioIds: null`.

Validation: pnpm test extensions/qa-lab/src/suite.summary-json.test.ts
(6/6 pass).

Refs #64227

* qa-lab: record executed scenarioIds in summary run metadata

Addresses the pass-3 codex-connector P2 on #64789 (repl of #64689):
`run.scenarioIds` was copied from the raw `params.scenarioIds`
caller input, but `runQaSuite` normalizes that input through
`selectQaSuiteScenarios` which dedupes via `Set` and reorders the
selection to catalog order. When callers repeat --scenario ids or
pass them in non-catalog order, the summary metadata drifted from
the scenarios actually executed, which can make parity/report
tooling treat equivalent runs as different or trust inaccurate
provenance.

Fix: both writeQaSuiteArtifacts call sites in runQaSuite now pass
`selectedCatalogScenarios.map(scenario => scenario.id)` instead of
`params?.scenarioIds`, so the summary records the post-selection
executed list. This also covers the full-suite case automatically
(the executed list is the full lane-filtered catalog), giving parity
consumers a stable record of exactly which scenarios landed in the
run regardless of how the caller phrased the request.

buildQaSuiteSummaryJson's `length > 0 ? [...] : null` pass-2
semantics are preserved so the public helper still treats an empty
array as 'unspecified' for any future caller that legitimately passes
one.

Validation: pnpm test extensions/qa-lab/src/suite.summary-json.test.ts
(6/6 pass).

Refs #64227

* qa-lab: preserve null scenarioIds for unfiltered suite runs

Addresses the pass-4 codex-connector P2 on #64789: the pass-3 fix
always passed `selectedCatalogScenarios.map(...)` to
writeQaSuiteArtifacts, which made unfiltered full-suite runs
indistinguishable from an explicit all-scenarios selection in the
summary metadata. The 'unfiltered → null' semantic (documented in
the buildQaSuiteSummaryJson JSDoc and exercised by the
"treats an empty scenarioIds array as unspecified" regression) was
lost.

Fix: both writeQaSuiteArtifacts call sites now condition on the
caller's original `params.scenarioIds`. When the caller passed an
explicit non-empty filter, record the post-selection executed list
(pass-3 behavior, preserving Set-dedupe + catalog-order
normalization). When the caller passed undefined or an empty array,
pass undefined to writeQaSuiteArtifacts so buildQaSuiteSummaryJson's
length-check serializes null (pass-2 behavior, preserving unfiltered
semantics).

This keeps both codex-connector findings satisfied simultaneously:
- explicit --scenario filter reorders/dedupes through the executed
  list, not the raw caller input
- unfiltered full-suite run records null, not a full catalog dump
  that would shadow "explicit all-scenarios" selections

Validation: pnpm test extensions/qa-lab/src/suite.summary-json.test.ts
(6/6 pass).

Refs #64227

* qa-lab: reuse QaProviderMode in writeQaSuiteArtifacts param type

* qa-lab: stage mock auth profiles so the parity gate runs without real credentials

* fix(qa): clean up mock auth staging follow-ups

* ci: add parity-gate workflow that runs the GPT-5.4 vs Opus 4.6 gate end-to-end against the qa-lab mock

* ci: use supported parity gate runner label

* ci: watch gateway changes in parity gate

* docs: pin parity runbook alternate models

* fix(ci): watch qa-channel parity inputs

* qa: roll up parity proof closeout

* qa: harden mock parity review fixes

* qa-lab: fix review findings — comment wording, placeholder key, exported type, ordering assertion, remove false-positive positive-tone detection

* qa: fix memory-recall scenario count, update criterion 2 comment, cache fetchJson in model-switch

* qa-lab: clean up positive-tone comment + fix stale test expectations

* qa: pin workflow Node version to 22.14.0 + fix stale label-match wording

* qa-lab: refresh mock provider routing expectation

* docs: drop stale parity rollup rewrite from proof slice

* qa: run parity gate against mock lane

* deps: sync qa-lab lockfile

* build: refresh a2ui bundle hash

* ci: widen parity gate triggers

---------

Co-authored-by: Eva <eva@100yen.org>
2026-04-13 13:01:54 +09:00
Josh Avant
3d07dfbb65 feat(qa-lab): add Convex credential broker and admin CLI (#65596)
* QA Lab: add Convex credential source for Telegram lane

* QA Lab: scaffold Convex credential broker

* QA Lab: add Convex credential admin CLI

* QA Lab: harden Convex credential security paths

* QA Broker: validate Telegram payloads on admin add

* fix: note QA Convex credential broker in changelog (#65596) (thanks @joshavant)
2026-04-12 22:03:42 -05:00
Peter Steinberger
5da237c887 fix(ci): refresh qa-lab lockfile 2026-04-12 19:45:46 -07:00
Peter Steinberger
20266c14cb feat(qa-lab): add control ui qa-channel roundtrip scenario 2026-04-12 19:41:06 -07:00
Peter Steinberger
f682413f57 feat(qa-channel): forward inbound media attachments 2026-04-12 19:41:06 -07:00
Peter Steinberger
1a47660518 feat(browser): add qa web runtime support 2026-04-12 19:41:06 -07:00
pashpashpash
c848ebc8ce agents: split GPT-5 prompt and retry behavior (#65597)
* agents: split GPT-5 prompt and retry behavior

* agents: fix GPT-5 review follow-ups

* agents: address GPT-5 review follow-ups

* agents: avoid replaying side-effectful GPT retries

* agents: mark subagent control as mutating

* agents: fail closed on single-action retries

* commands: stabilize channel legacy doctor migration test

* agents: narrow single-action retry promise trigger
2026-04-12 18:52:22 -07:00
pashpashpash
de1b6abf94 test(memory-core): freeze dreaming session-ingest clocks (#65605) 2026-04-12 17:24:34 -07:00
Peter Steinberger
9dbbee8a02 fix(test): align trace directive type stubs 2026-04-13 00:20:52 +01:00
Peter Steinberger
cfd5f9e4e3 test(e2e): repair OpenShell prerelease smoke 2026-04-13 00:20:51 +01:00
Marcus Castro
9af8288c05 fix(whatsapp): send group reactions with target participant (#65512) 2026-04-12 20:00:19 -03:00
Marcus Castro
403783a3b1 fix(tts): correct tagged TTS syntax guidance (#65573) 2026-04-12 19:41:13 -03:00
pashpashpash
f5447aab88 OpenAI: strengthen heartbeat overlay guidance (#65148) 2026-04-13 06:47:40 +09:00
pashpashpash
383c854313 CI: fix mainline regression blockers (#65269)
* MSTeams: align logger test expectations

* Gateway: fix CI follow-up regressions

* Config: refresh generated schema baseline

* VoiceCall: type webhook test doubles

* CI: retrigger blocker workflow

* CI: retrigger retry workflow

* Agents: fix current mainline agentic regressions

* Agents: type auth controller test mock

* CI: retrigger blocker validation

* Agents: repair OpenAI replay pairing order
2026-04-13 06:18:37 +09:00
Peter Steinberger
1ea332a658 fix: repair CI type checks 2026-04-12 12:04:59 -07:00
Peter Steinberger
fcee268373 feat(qa-lab): support scenario-defined plugin runs 2026-04-12 11:59:50 -07:00
Vincent Koc
ea71a59127 fix(imessage): repair monitor retry type checks 2026-04-12 19:57:37 +01:00
Peter Steinberger
e4841d767d test: stabilize loaded full-suite checks 2026-04-12 11:52:56 -07:00
Peter Steinberger
d35cc6ef86 fix(discord): declare gateway heartbeat timeout state 2026-04-12 11:52:56 -07:00
Peter Steinberger
cb5a25d8d8 fix(discord): normalize legacy streaming aliases 2026-04-12 11:52:56 -07:00
Peter Steinberger
fa87c6334a fix(imessage): align monitor retry types 2026-04-12 11:52:33 -07:00
saram ali
acdf2b1c8a fix(memory-core): match daily notes stored in memory/ subdirectories (#64682)
* fix(memory-core): match daily notes in memory/ subdirectories in isShortTermMemoryPath

* fix(memory-core): exclude dream reports from short-term recall

* fix(memory-core): widen short-term recall path matching

* docs(changelog): note short-term recall fix

---------

Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
2026-04-12 19:40:59 +01:00
Vincent Koc
35a784c165 fix(imessage): retry watch.subscribe startup failures (#65482)
* fix(imessage): retry watch.subscribe startup failures

* fix(imessage): sanitize watch error logging
2026-04-12 19:40:19 +01:00
Peter Steinberger
c8347e70da fix: align trace directive types 2026-04-12 11:30:44 -07:00
Peter Steinberger
e76c2812b7 style: apply oxfmt 2026-04-12 11:28:43 -07:00
Marcus Castro
aa023e4283 refactor(whatsapp): centralize account connection lifecycle (#65427)
* refactor(whatsapp): centralize account connection lifecycle

* fix(whatsapp): harden controller open failure cleanup

* refactor(whatsapp): remove active listener fallback path

* fix(whatsapp): isolate controller registry state

* debug(whatsapp): trace typing presence updates

* docs(changelog): add whatsapp lifecycle fix note

* debug(whatsapp): log global presence mode

* chore(whatsapp): remove debug presence logs

---------

Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
2026-04-12 15:24:49 -03:00
Tak Hoffman
c37e49f275 Add /trace toggle and fix Active Memory diagnostics 2026-04-12 13:20:22 -05:00
Peter Steinberger
910a0e40d2 chore: update dependencies 2026-04-12 19:19:06 +01:00
Vincent Koc
d4fb7d893d fix(ci): repair main tsgo regressions 2026-04-12 19:14:00 +01:00
Peter Steinberger
c4412c6b0c fix: compact discord allowlist resolution logs 2026-04-12 19:08:59 +01:00
Marcus Castro
000fc7f233 refactor(qa): add shared QA channel contract and harden worker startup (#64562)
* refactor(qa): add shared transport contract and suite migration

* refactor(qa): harden worker gateway startup

* fix(qa): scope waits and sanitize shutdown artifacts

* fix(qa): confine artifacts and redact preserved logs

* fix(qa): block symlink escapes in artifact paths

* fix(gateway): clear shutdown race timers

* fix(qa): harden shutdown cleanup paths

* fix(qa): sanitize gateway logs in thrown errors

* fix(qa): harden suite startup and artifact paths

* fix(qa): stage bundled plugins from mutated config

* fix(qa): broaden gateway log bearer redaction

* fix(qa-channel): restore runtime export

* fix(qa): stop failed gateway startups as a process tree

* fix(qa-channel): load runtime hook from api surface
2026-04-12 15:02:57 -03:00
jasonxargs-boop
2204753b62 fix(memory-core): fix macOS chokidar glob issue by watching memory dir directly (#64711)
* fix(memory-core): fix macOS chokidar glob issue by watching memory dir directly

* fix(memory-core): ignore non-markdown memory watch churn

* fix(memory-core): allow multimodal watch events

* test(memory-core): type watcher ignore callback

---------

Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
2026-04-12 18:53:20 +01:00
Peter Steinberger
15b86ac6d0 fix: narrow qmd defaults and clawblocker memory 2026-04-12 18:52:06 +01:00
saram ali
7995e408ce fix(discord): clear stale heartbeat timers in SafeGatewayPlugin.connect() (#65087)
* fix(discord): clear stale heartbeat timers in SafeGatewayPlugin.connect()

The @buape/carbon@0.15.0 heartbeat setup has a race where stopHeartbeat()
runs before heartbeatInterval is assigned, leaving a stale setInterval with
a closed reconnectCallback. When the stale interval fires ~41s later it
throws an uncaught exception that bypasses the EventEmitter error path and
crashes the gateway process via process.on('uncaughtException').

Add a connect() override in SafeGatewayPlugin that unconditionally clears
both heartbeatInterval and firstHeartbeatTimeout before calling super. The
parent's connect() only calls stopHeartbeat() when isConnecting=false; when
isConnecting=true it returns early without clearing — this override fills
that gap.

Fixes #65009. Related: #64011, #63387, #62038.

* test(discord): assert super.connect() delegation in SafeGatewayPlugin tests

* fix(ci): update raw-fetch allowlist line numbers for gateway-plugin.ts

The connect() override added in the heartbeat fix shifted the two
pre-existing fetch() callsites from lines 370/436 to 387/453.

* docs(changelog): add discord heartbeat crash note

* test(cli): align plugin registry load-context mock

---------

Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
2026-04-12 18:40:04 +01:00
Peter Steinberger
a8e140e395 chore: bump version to 2026.4.12 2026-04-12 10:37:18 -07:00
Anonymous Amit
42590106ab improve memory fallback lexical ranking (#65395)
* improve memory fallback lexical ranking

* use neutral lexical fallback fixtures

* fix(memory-core): keep lexical boosts out of hybrid search

---------

Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
2026-04-12 18:36:28 +01:00
Vincent Koc
8a4a63ca07 fix(memory-core): use all dreaming signals for light confidence 2026-04-12 18:30:35 +01:00
Vincent Koc
077cfca229 fix(memory-core): unblock dreaming-only promotion 2026-04-12 18:14:06 +01:00
zhouhe-xydt
879bb5dd91 fix(memory-wiki): support Unicode characters in slugifyWikiSegment (#64742)
* fix(memory-wiki): support Unicode characters in slugifyWikiSegment

Replace ASCII-only regex with Unicode-aware regex to preserve CJK,
Cyrillic, Arabic, and other non-ASCII characters in wiki slugs.

Fixes #64620

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test(memory-wiki): cover Unicode slug regressions

* fix(memory-wiki): preserve combining marks in slugs

* fix(memory-wiki): cap composed source filenames

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
2026-04-12 17:54:41 +01:00
MrBrain
346e38e275 fix(memory-core): isolate dreaming narrative sessions per workspace (#61674)
* fix(memory-core): isolate dreaming narrative sessions per workspace

* chore(changelog): add narrative isolation note

---------

Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
2026-04-12 17:39:28 +01:00
Sergiusz
079eb18bf7 fix: harden dreaming narrative session cleanup (#65320)
* fix: harden dreaming narrative session cleanup

* fix(memory-core): harden narrative cleanup

* fix(memory-core): preserve fallback narrative sessions

---------

Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
2026-04-12 17:33:47 +01:00
Pengfei Ni
aff8a0c0e7 fix(config): resolve CLI command aliases against parent plugin in plugins.allow (#64748) (#64779)
* fix(config): resolve CLI command aliases against parent plugin in plugins.allow (#64748)

The CLI allow guard checked command names (e.g. 'wiki') directly against
plugins.allow, missing the parent plugin ('memory-wiki'). Additionally,
memory-wiki did not declare 'wiki' as a commandAlias, so doctor --fix
would remove it as stale.

- Add commandAliases entry for 'wiki' in memory-wiki plugin manifest
- Check parent plugin ID in the CLI fallback allow guard
- Add tests for both allow and deny cases

* fix(cli): inject manifest registry for alias diagnostics

* Update CHANGELOG.md

---------

Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
2026-04-12 17:32:11 +01:00
Leonard Sellem
c545e4605e fix(memory-wiki): pass app config into CLI metadata registrar (#65012)
* fix(memory-wiki): pass config into cli metadata registrar

* fix(memory-wiki): use cli context config for metadata registrar

* docs(changelog): note memory-wiki cli metadata fix

---------

Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
2026-04-12 17:30:54 +01:00
Vincent Koc
43cb94a39a fix(doctor): preserve discord streaming downgrade compatibility 2026-04-12 17:09:08 +01:00