openclaw

mirror of https://github.com/openclaw/openclaw.git synced 2026-06-24 00:48:12 +00:00

Author	SHA1	Message	Date
Peter Steinberger	86fb8278ad	build: refresh a2ui bundle hash	2026-05-02 01:55:51 +01:00
Peter Steinberger	25f832531c	build: refresh a2ui bundle hash	2026-05-01 12:53:57 +01:00
Peter Steinberger	c976cf6ebd	chore: refresh a2ui bundle hash	2026-04-30 05:22:04 +01:00
Shakker	07946a404d	chore: update a2ui bundle hash	2026-04-27 11:10:12 +01:00
Peter Steinberger	f1b1c3dc99	chore: update workspace dependencies	2026-04-25 22:48:44 +01:00
Peter Steinberger	56573185f2	perf: split canvas a2ui shared imports	2026-04-25 12:52:27 +01:00
Peter Steinberger	1b997bebd0	build(a2ui): refresh bundled canvas host asset	2026-04-24 16:55:08 +01:00
Peter Steinberger	bff9f10ea6	test: remove canvas reload sleep	2026-04-23 18:37:29 +01:00
Peter Steinberger	ba9589256c	build: refresh a2ui bundle hash	2026-04-22 15:07:23 +01:00
cxy	5e72e39c18	feat(qqbot): extract self-contained engine/ architecture with QR-code onboarding, approval handling (#67960) * feat(qqbot): add core architecture modules * feat(qqbot): extract engine modules with DI adapters * refactor(qqbot): remove plugin-level TTS, delegate to framework Remove qqbot's internal TTS implementation and unify voice synthesis through the framework's global TTS provider registry. - Delete engine/gateway/tts-config.ts (plugin-specific TTS config) - Simplify TTSProvider interface to textToSpeech + audioFileToSilkBase64 - Remove dual-strategy TTS in handleAudioPayload (plugin + global fallback) - Strip QQBotTtsSchema from config-schema, plugin.json, and tests - Remove TTS diagnostics logging and hasTTS system prompt from gateway - Delete ~260 lines of TTS code from utils/audio-convert.ts Made-with: Cursor * feat(qqbot): extract shared engine modules for config, tools, and audio Add engine-layer modules that are self-contained and portable across both the built-in and standalone qqbot packages: - engine/config: account resolution helpers, field readers - engine/tools: channel API proxy, remind scheduling logic - engine/utils: audio format conversion, duration/error formatting, debug logging Consolidate duplicate utility functions across the codebase: - Merge debug-log.ts into log.ts - Merge error-format.ts into format.ts with full .cause chain support - Unify normalizeLowercase/readNumber/readBoolean/readStringMap into string-normalize.ts, removing private copies in resolve.ts, remind-logic.ts, and audio-convert.ts - Remove dead formatDuration export from audio-convert.ts - Delete unused config/schema.ts and config/helpers.ts Made-with: Cursor * refactor(qqbot): streamline account configuration and credential management Refactor the QQBot account configuration logic by consolidating credential management into dedicated engine modules. Key changes include: - Migrate credential clearing and validation logic to engine/config/credentials.ts. 
- Simplify setup input validation and application in engine/config/setup-logic.ts. - Enhance account resolution and configuration application in engine/config/resolve.ts. - Update channel and messaging logic to utilize the new credential management functions. This refactor improves code maintainability and clarity by separating concerns and reducing duplication across the codebase. * feat(qqbot): simplify api architecture * feat: support binding a QQ bot by scanning a QR code * feat(qqbot): refactor gateway into inbound pipeline + outbound dispatch - Extract handleMessage (620 lines) into three modules: - inbound-context.ts: InboundContext type definition - inbound-pipeline.ts: buildInboundContext() - outbound-dispatch.ts: dispatchOutbound() - gateway.ts handleMessage reduced to ~35 line shell - Unify parseRefIndices: support both ext prefix formats + MSG_TYPE_QUOTE - Add ref/format-message-ref.ts for cache-miss quote formatting - Remove [QQBot] to= from agentBody, use GroupSystemPrompt instead - QueuedMessage: add msgType/msgElements for quote messages * fix(qqbot): fix markdownSupport loss + dynamic User-Agent Root cause: setOpenClawVersion() called _ensureInitialized(true) which cleared _appRegistry, destroying the MessageApi instance created by initApiConfig() with markdownSupport=true. Subsequent block deliver calls created a default markdownSupport=false instance, causing: 1. Markdown messages sent as plain text (msg_type=0 instead of 2) 2. message_reference incorrectly added (only suppressed in MD mode) Fix: ApiClient and TokenManager now accept userAgent as string | (() => string). sender.ts passes the buildUserAgent function reference, so UA changes propagate automatically on next request without rebuilding any objects. 
- ApiClient: userAgent -> resolveUserAgent getter, called per-request - TokenManager: same pattern - types.ts: ApiClientConfig.userAgent supports string | (() => string) - sender.ts: remove force re-init + _rebuildAppRegistry hack - initSender/setOpenClawVersion only update version variables - _ensureInitialized creates singletons once, never destroys them - _appRegistry is never cleared -> markdownSupport always preserved - runtime.ts: inject framework version via setOpenClawVersion(runtime.version) - gateway.ts: pass openclawVersion to initSender + registerPluginVersion - slash-commands-impl.ts: remove fragile require("../package.json") * feat(qqbot): implement native approval handling and configuration Add a new approval handling system for QQBot that integrates with the existing framework. Key features include: - Introduce `approval-handler.runtime.ts` for managing approval requests via QQ messages with inline keyboard support. - Create `approval-native.ts` as the entry point for QQBot's approval capability, allowing for simplified approval processes without explicit approver lists. - Implement configuration schema for exec approvals, enabling fine-grained control over who can approve requests. - Enhance messaging and interaction handling to support approval decisions through button interactions. This implementation streamlines the approval process, making it more user-friendly and efficient for QQBot users. * refactor(qqbot): enhance error handling across API and messaging modules This update introduces a centralized error formatting utility, `formatErrorMessage`, to improve consistency in error logging throughout the QQBot codebase. Key changes include: - Integration of `formatErrorMessage` in various API client, messaging, and gateway modules to standardize error messages. - Replacement of direct error message handling with the new utility to enhance readability and maintainability. 
These improvements streamline error reporting and provide clearer insights into issues encountered during operation. * refactor(qqbot): enhance API and messaging structure with type improvements This update refines the API and messaging modules by introducing type enhancements and restructuring function signatures for better clarity and maintainability. Key changes include: - Updated import statements to streamline type usage in and . - Refactored message sending functions to accept options objects, improving readability and flexibility. - Introduced a new method in to facilitate external message-sent notifications. - Enhanced error handling in the retry mechanism to ensure more robust behavior. These modifications aim to improve the overall code quality and developer experience within the QQBot framework. * feat: polish user-facing copy * refactor(qqbot): unify Logger interfaces + eliminate P0 code smells Logger unification (17 files): - Introduce single EngineLogger interface in engine/types.ts { info, error, warn?, debug? } - Delete 5 fragmented Logger interfaces: GatewayLogger, ReconnectLogger, MessageRefLogger, PathLogger, SenderLogger - Replace all references across engine/ to use EngineLogger directly P0 code smell fixes (sender.ts + messages.ts + outbound-dispatch.ts): - messages.ts: add public notifyMessageSent() method on MessageApi, replacing 8x 'as unknown as { messageSentHook }' private field hack - sender.ts: extract notifyMediaHook() helper, deduplicate 4 media send functions (sendImage/sendVoice/sendVideo/sendFile) - sender.ts: replace magic numbers 1/2/3/4 with MediaFileType enum - sender.ts: remove 4 redundant 'as MessageResponse' type assertions - outbound-dispatch.ts: remove 5 unnecessary 'as never' casts * feat(qqbot): add /bot-clear-storage command + consolidate utils/types into engine/ /bot-clear-storage (slash-commands-impl.ts): - Migrate from standalone version, aligned with its two-step flow: 1. 
No args: scan ~/.openclaw/media/qqbot/downloads/{appId}/ and display file list with confirmation button 2. --force: delete files + removeEmptyDirs cleanup - C2C only (group chat returns hint) - bot-help: exclude bot-upgrade and bot-clear-storage in group listings Consolidate into engine/: - Delete src/utils/audio-convert.ts (pure re-export shell, zero consumers) - Move 5 test files from src/utils/ to src/engine/utils/ (fix import paths) - Move src/types/silk-wasm.d.ts to src/engine/types/ - Remove empty src/utils/ and src/types/ directories * refactor(qqbot): restructure API and bridge components for improved modularity This update enhances the QQBot framework by reorganizing the API and bridge components, promoting better modularity and maintainability. Key changes include: - Refactored import paths to streamline access to bridge tools and configurations. - Introduced new bridge files for channel entry, runtime, and approval capabilities, centralizing related functionalities. - Updated existing functions to utilize the new bridge structure, ensuring consistency across the codebase. - Removed deprecated functions and types, simplifying the overall architecture. These modifications aim to improve code clarity and facilitate future development within the QQBot ecosystem. * refactor(qqbot): standardize engine log levels and unify log tag prefix - Rename client.ts to api-client.ts to match ApiClient class name - Downgrade ~60 non-critical info logs to debug level across 12 files (token request/response, HTTP request/response, session restore, media tag detection, image classification, quote detection, attachment download/transcode, retry attempts, etc.) 
- Unify log tag prefix to [qqbot:xxx] format across all engine modules ([core-api] -> [qqbot:api], [token:x] -> [qqbot:token:x], [retry] -> [qqbot:retry], [messages] -> [qqbot:messages], [sender:x] -> [qqbot:x]) - Remove unnecessary reqTs timestamp from api-client.ts log output - Add dispatch event debug log in gateway-connection.ts - Merge sendProactiveMessage into sendText, remove dead code (sendProactiveText import, getRefIdx, QQMessageResult type) - Narrow allow-from.ts type from unknown[] to Array<string | number> * refactor(qqbot): move interaction handler from bridge to engine - Move onInteraction approval handler into engine/gateway.ts as createApprovalInteractionHandler(), eliminating the callback indirection through CoreGatewayContext - Remove onInteraction from CoreGatewayContext interface and its unused InteractionEvent import from gateway/types.ts - Remove getPlatformAdapter, parseApprovalButtonData and InteractionEvent imports from bridge/gateway.ts * refactor(qqbot): route bridge and sender logs through framework logger - Add bridge/logger.ts as a shared logger holder for bridge-layer modules, injected with ctx.log during gateway startup - Replace all console.log/console.error in bridge/ with getBridgeLogger() calls (approval, bootstrap, tools) - Restore framework logger support in sender.ts via initSender() so API-layer logs flow through OpenClaw log system - Remove all direct debugLog/debugError imports from bridge/ * feat(qqbot): per-account isolated resource stack + multi-account logger - sender.ts: global singletons (ApiClient/TokenManager/MediaApi) -> per-account AccountContext - Add _accountRegistry: Map<appId, AccountContext> - Each account owns independent client/tokenMgr/mediaApi/messageApi/logger - registerAccount() atomically sets up all resources - resolveAccount() routes to correct resource stack by appId - Remove _sharedLogger/_loggerRegistry/_appRegistry and old structures - bridge/gateway.ts: createAccountLogger() with auto 
[accountId] prefix - registerAccount() merges logger + markdownSupport + full API resources - engine-wide: remove ~60 manual [qqbot:${accountId}] log prefixes - Prefixes now auto-injected by per-account logger - Remove prefix/logPrefix parameter chains (outbound/outbound-deliver/typing-keepalive etc) * feat(qqbot): completes fallback path for approval with multi-account isolation When the execApprovals are not configured, multiple QQBot accounts' handlers will attempt to deliver the same approval message. The openid is account-level, and cross-account delivery will trigger a QQ Bot API 500 error. - Add account ownership verification in the fallback shouldHandle: Only match the account's handler when the request includes turnSourceAccountId; if unbound, delivery is only permitted when the number of enabled+secret accounts is ≤1. - Consolidate account ownership determination into the unified export `matchesQQBotApprovalAccount` in `exec-approvals.ts`, with both capability and native runtime paths sharing the same logic to eliminate redundancy. * feat(qqbot): optimize permission validation strategy * feat(qqbot): show plugin version in /bot-version and /bot-help Align /bot-version output with the standalone openclaw-qqbot build so users see both the QQBot plugin version and the OpenClaw framework version. Append the plugin version as a footer in /bot-help as well, matching the standalone UX. Also fix the plugin version lookup that previously rendered as 'vunknown': the old code used a hardcoded '../../package.json' relative path which resolved to 'src/package.json' (non-existent) when executed from raw sources, so the require threw and the default 'unknown' value was retained. The same broken value also leaked into the QQ Bot API User-Agent header. 
Replace the hardcoded path with a dedicated helper (bridge/plugin-version.ts) that walks up the directory tree from import.meta.url and validates the manifest's name field (@openclaw/qqbot) to avoid misreading the monorepo root package.json. Covered by 6 unit tests. * feat(qqbot): trust shared ~/.openclaw/media root for payload files Add getOpenClawMediaDir() and include it alongside getQQBotMediaDir() in the allowed roots of resolveQQBotPayloadLocalFilePath, so framework-produced attachments under sibling directories (e.g. media/outbound/ written by saveMediaBuffer) are trusted by auto-routed sends without triggering the path-outside-storage guard. Covered by a new test case that verifies files under ~/.openclaw/media/outbound/ resolve successfully. * fix(qqbot): ensure PlatformAdapter is registered before approval delivery After the framework centralized approval handler bootstrap (#62135), the native approval handler is spawned by the framework layer outside the qqbot gateway startAccount context. This means channel.ts's side-effect `import "./bridge/bootstrap.js"` may not have run, leaving PlatformAdapter unregistered when deliverPending calls resolveQQBotAccount -> getPlatformAdapter(). Extract ensurePlatformAdapter() from bootstrap.ts as an idempotent, re-entrant helper and call it in both capability.ts (load callback) and handler-runtime.ts (deliverPending entry) to guarantee the adapter is available regardless of initialization order. * fix(qqbot): add lazy factory for PlatformAdapter to eliminate import-order dependency The bundler splits qqbot code into multiple chunks where the adapter singleton and its consumers may live in different modules. When a consumer chunk evaluates before the bootstrap side-effect chunk, getPlatformAdapter() throws because the singleton is still null. Introduce registerPlatformAdapterFactory() in adapter/index.ts so getPlatformAdapter() can auto-initialize the adapter on first access. 
bootstrap.ts registers the factory at module evaluation time alongside the existing eager registration path. Also add error logging in downloadFile's catch block to surface fetch failures. * feat(qqbot): add /bot-approve slash command for exec approval config management Add /bot-approve command to the built-in QQBot plugin, ported from the standalone openclaw-qqbot implementation. This command allows users to manage tools.exec.security and tools.exec.ask settings directly from QQ. Supported sub-commands: /bot-approve on - allowlist + on-miss (recommended) /bot-approve off - full + off (no approval) /bot-approve always - allowlist + always (strict mode) /bot-approve reset - remove overrides, restore framework defaults /bot-approve status - show current security/ask values The runtime config API is injected via registerApproveRuntimeGetter() following the existing dependency injection pattern used by registerVersionResolver() and registerPluginVersion(). * fix(qqbot): ACK INTERACTION_CREATE events before processing approval buttons Send PUT /interactions/{id} immediately upon receiving any INTERACTION_CREATE event to prevent QQ from showing a timeout error to the user. The ACK is fire-and-forget and does not block subsequent approval button resolution. Also resolve merge conflict in pnpm-lock.yaml (keep @tencent-connect/qqbot-connector@1.1.0 and newer @thi.ng/bitstream@2.4.46). * feat(qqbot): enhance reminder functionality with delivery context and credential backup This update improves the QQBot reminder system by introducing a delivery context for reminders, allowing for more flexible target resolution. Key changes include: - Updated reminder logic to utilize a delivery envelope, ensuring that reminders are sent with the correct context. - Implemented credential backup and recovery mechanisms to prevent loss of appId and clientSecret during hot upgrades. - Added tests for credential backup functionality and admin resolver to ensure reliability. 
- Enhanced the remind tool to automatically resolve the target from the current conversation context when not explicitly provided. These enhancements aim to improve the user experience and reliability of the reminder feature within the QQBot framework. * fix(qqbot): ensure PlatformAdapter is registered before gateway message processing Call ensurePlatformAdapter() at the start of bridge/gateway.ts's startGateway() to guarantee the adapter is available when engine code (e.g. downloadFile in file-utils.ts) calls getPlatformAdapter(). When the bundler splits code into separate chunks, bootstrap.ts's module-level side-effect registration may not have executed yet by the time the gateway processes its first inbound attachment download. Also fix the TS2339 error in registerApproveRuntimeGetter by using getQQBotRuntime() (full PluginRuntime with config) instead of getQQBotRuntimeForEngine() (GatewayPluginRuntime subset without config). * fix(qqbot): make isAudioFile safe when OutboundAudioAdapter is not registered sendMedia() calls isAudioFile() as part of its media-type dispatch logic before any actual audio processing. When the audio adapter is not yet registered (e.g. framework tool calls sendMedia before gateway startup), isAudioFile() would throw 'OutboundAudioAdapter not registered' even for non-audio files like images. Wrap the getAudio() call in isAudioFile() with try/catch to return false when the adapter is unavailable, allowing non-audio media sends to proceed normally. * refactor(qqbot): remove plugin startup/upgrade greeting pipeline Drop the startup / upgrade greeting feature that was folded into the previous reminder + credential-backup commit. The pipeline has proven unnecessary for the fused build and its supporting admin-resolver scaffolding has no other consumers, so both are removed wholesale. 
- Delete engine/session/startup-greeting.ts and its tests: the first-launch "soul online" / "updated to vX.Y.Z" messages, the per-(accountId, appId) startup marker, the failure cooldown, and the legacy startup-marker.json migration path are all gone. - Delete engine/session/admin-resolver.ts and its tests: admin openid persistence/resolution, upgrade-greeting-target load/clear and the sendStartupGreetings dispatcher only ever served the greeting flow and were not referenced elsewhere. - channel.ts: drop the sendStartupGreetings import and the READY / RESUMED hooks that triggered greetings; credential-backup snapshots stay untouched. - engine/utils/data-paths.ts: remove getAdminMarkerFile / getLegacyAdminMarkerFile / getUpgradeGreetingTargetFile / getStartupMarkerFile / getLegacyStartupMarkerFile along with the now-stale module docblock sections. Credential-backup helpers and safeName are preserved. Net -655 LOC across 6 files. tsc --noEmit passes on extensions/qqbot/tsconfig.json and no references to the removed symbols remain in the workspace. 
* fix(qqbot): resolve test failures in extension batch, contracts and bundled runtime deps - bootstrap: replace sync require() with static imports for secret-input and temp-path so vitest resolve.alias works correctly (require bypasses vitest aliases causing Cannot find module errors) - format: handle null/undefined in formatErrorMessage before JSON.stringify since JSON.stringify(undefined) returns JS undefined, not a string - gateway/types: reword comment to avoid triggering the channel-import guardrail regex that forbids quoted openclaw/plugin-sdk references - package.json: mirror @tencent-connect/qqbot-connector ^1.1.0 in root dependencies as required by bundled plugin runtime dependency checks * chore: revert non-qqbot changes to align with upstream main Revert modifications to src/agents/system-prompt, src/auto-reply/reply/dispatch-from-config, and src/canvas-host/a2ui build artifacts that were inadvertently included in the qqbot feature branch. Also fix .gitignore Core/ pattern to match subdirectories. * fix(qqbot): remove unused logUnsupportedStructuredMediaTarget after API simplification * fix(qqbot): restore channel-plugin-api.ts for bundled plugin surface convention * fix(qqbot): update CI lint allowlists for restructured engine paths - Update raw fetch() allowlist in check-no-raw-channel-fetch.mjs to reflect engine/ directory restructure (src/api.ts → src/engine/api/api-client.ts, etc.) 
- Remove stale qqbot allowlist entry for deleted src/utils/audio-convert.ts * fix(qqbot): eliminate os.tmpdir() in engine layer via adapter injection - Make hasPlatformAdapter() also check for registered factory, so adapter is always discoverable once bootstrap has run - Remove os.tmpdir() fallbacks in platform.ts getHomeDir()/getTempDir(), delegate entirely to PlatformAdapter.getTempDir() which calls resolvePreferredOpenClawTmpDir() under the hood - Keeps engine/ layer free of openclaw/plugin-sdk imports * chore(qqbot): update CHANGELOG for engine architecture refactor (#67960) (thanks @cxyhhhhh) --------- Co-authored-by: Bobby <zkd8907@live.com> Co-authored-by: neilhwang <neilhwang@tencent.com> Co-authored-by: sliverp <870080352@qq.com>	2026-04-22 01:05:12 +08:00
Peter Steinberger	d7d1270ced	build: keep a2ui bundle stable	2026-04-21 04:11:01 +01:00
Peter Steinberger	6e58da9750	build: stabilize a2ui bundle inputs	2026-04-20 20:28:48 +01:00
Peter Steinberger	c4f628085d	build: refresh a2ui bundle	2026-04-20 19:38:59 +01:00
Peter Steinberger	38cfdad16b	test: share canvas host test helpers	2026-04-20 16:05:55 +01:00
Peter Steinberger	869950564f	build: update dependencies	2026-04-20 13:18:32 +01:00
Peter Steinberger	753183e081	build(deps): update workspace dependencies	2026-04-18 18:04:56 +01:00
Peter Steinberger	2745e5b3bd	test: narrow canvas and context hotspots	2026-04-17 19:42:59 +01:00
Gustavo Madeira Santana	ee0c8177bf	Fix canvas host header test type	2026-04-17 14:35:36 -04:00
Peter Steinberger	990bd81726	test: avoid canvas host socket setup	2026-04-17 19:29:42 +01:00
Peter Steinberger	12a59b0a18	test: trim hotspot wait overhead	2026-04-17 02:47:09 +01:00
Peter Steinberger	041266a669	chore: prepare 2026.4.15 release	2026-04-16 22:45:32 +01:00
Peter Steinberger	5ed9016914	fix: narrow a2ui bundle hash inputs	2026-04-15 00:46:40 +01:00
Peter Steinberger	956b04975d	build: refresh A2UI bundle hash	2026-04-15 00:42:05 +01:00
Peter Steinberger	f08b1cd972	build: refresh a2ui bundle hash	2026-04-14 15:19:36 +01:00
Peter Steinberger	64d237dd02	build: refresh a2ui bundle hash	2026-04-14 13:42:03 +01:00
Peter Steinberger	224cbd9ff6	chore(release): prepare 2026.4.14 beta	2026-04-14 03:06:46 +01:00
Peter Steinberger	b5fa2ed5cb	build: refresh a2ui bundle hash	2026-04-14 01:43:56 +01:00
Vincent Koc	955270fb73	fix(ci): repair telegram ui and watch regressions	2026-04-13 23:49:59 +01:00
Vincent Koc	21ca387eda	fix(ci): verify bundled plugin runtime deps	2026-04-13 11:09:13 +01:00
Peter Steinberger	abe33319d3	fix(release): allow matrix runtime pack size	2026-04-13 10:39:24 +01:00
Peter Steinberger	ee601ae993	fix(matrix): mirror staged runtime dependencies	2026-04-13 10:32:22 +01:00
Peter Steinberger	d63394247e	fix(build): refresh a2ui bundle hash	2026-04-13 10:28:03 +01:00
Peter Steinberger	72e56097ec	chore(release): prepare 2026.4.12	2026-04-13 09:49:01 +01:00
pashpashpash	b13844732e	qa: salvage GPT-5.4 parity proof slice (#65664) * test(qa): gate parity prose scenarios on real tool calls Closes criterion 2 of the GPT-5.4 parity completion gate in #64227 ('no fake progress / fake tool completion') for the two first/second-wave parity scenarios that can currently pass with a prose-only reply. Background: the scenario framework already exposes tool-call assertions via /debug/requests on the mock server (see approval-turn-tool-followthrough for the pattern). Most parity scenarios use this seam to require a specific plannedToolName, but source-docs-discovery-report and subagent-handoff only checked the assistant's prose text, which means a model could fabricate: - a Worked / Failed / Blocked / Follow-up report without ever calling the read tool on the docs / source files the prompt named - three labeled 'Delegated task', 'Result', 'Evidence' sections without ever calling sessions_spawn to delegate Both gaps are fake-progress loopholes for the parity gate. Changes: - source-docs-discovery-report: require at least one read tool call tied to the 'worked, failed, blocked' prompt in /debug/requests. Failure message dumps the observed plannedToolName list for debugging. - subagent-handoff: require at least one sessions_spawn tool call tied to the 'delegate' / 'subagent handoff' prompt in /debug/requests. Same debug-friendly failure message. Both assertions are gated behind !env.mock so they no-op in live-frontier mode where the real provider exposes plannedToolName through a different channel (or not at all). Not touched: memory-recall is also in the parity pack but its pass path is legitimately 'read the fact from prior-turn context'. That is a valid recall strategy, not fake progress, so it is out of scope for this PR. memory-recall's fake-progress story (no real memory_search call) would require bigger mock-server changes and belongs in a follow-up that extends the mock memory pipeline. 
Validation: - pnpm test extensions/qa-lab/src/scenario-catalog.test.ts Refs #64227 * test(qa): fix case-sensitive tool-call assertions and dedupe debug fetch Addresses loop-6 review feedback on PR #64681: 1. Copilot / Greptile / codex-connector all flagged that the discovery scenario's .includes('worked, failed, blocked') assertion is case-sensitive but the real prompt says 'Worked, Failed, Blocked...', so the mock-mode assertion never matches. Fix: lowercase-normalize allInputText before the contains check. 2. Greptile P2: the expr and message.expr each called fetchJson separately, incurring two round-trips to /debug/requests. Fix: hoist the fetch to a set step (discoveryDebugRequests / subagentDebugRequests) and reuse the snapshot. 3. Copilot: the subagent-handoff assertion scanned the entire request log and matched the first request with 'delegate' in its input text, which could false-pass on a stale prior scenario. Fix: reverse the array and take the most recent matching request instead. Validation: pnpm test extensions/qa-lab/src/scenario-catalog.test.ts (4/4 pass). Refs #64227 * test(qa): narrow subagent-handoff tool-call assertion to pre-tool requests Pass-2 codex-connector P1 finding on #64681: the reverse-find pattern I used on pass 1 usually lands on the FOLLOW-UP request after the mock runs sessions_spawn, not the pre-tool planning request that actually has plannedToolName === 'sessions_spawn'. The mock only plans that tool on requests with !toolOutput (mock-openai-server.ts:662), so the post-tool request has plannedToolName unset and the assertion fails even when the handoff succeeded. Fix: switch the assertion back to a forward .some() match but add a !request.toolOutput filter so the match is pinned to the pre-tool planning phase. The case-insensitive regex, the fetchJson dedupe, and the failure-message diagnostic from pass 1 are unchanged. Validation: pnpm test extensions/qa-lab/src/scenario-catalog.test.ts (4/4 pass). 
Refs #64227 * test(qa): pin subagent-handoff tool-call assertion to scenario prompt Addresses the pass-3 codex-connector P1 on #64681: the pass-2 fix filtered to pre-tool requests but still used a broad `/delegate|subagent handoff/i` regex. The `subagent-fanout-synthesis` scenario runs BEFORE `subagent-handoff` in catalog order (scenarios are sorted by path), and the fanout prompt reads 'Subagent fanout synthesis check: delegate exactly two bounded subagents sequentially', which contains 'delegate' and also plans sessions_spawn pre-tool. That produces a cross-scenario false pass where the fanout's earlier sessions_spawn request satisfies the handoff assertion even when the handoff run never delegates. Fix: tighten the input-text match from `/delegate|subagent handoff/i` to `/delegate one bounded qa task/i`, which is the exact scenario-unique substring from the `subagent-handoff` config.prompt. That pins the assertion to this scenario's request window and closes the cross-scenario false positive. Validation: pnpm test extensions/qa-lab/src/scenario-catalog.test.ts (4/4 pass). Refs #64227 * test(qa): align parity assertion comments with actual filter logic Addresses two loop-7 Copilot findings on PR #64681: 1. source-docs-discovery-report.md: the explanatory comment said the debug request log was 'lowercased for case-insensitive matching', but the code actually lowercases each request's allInputText inline inside the .some() predicate, not the discoveryDebugRequests snapshot. Rewrite the comment to describe the inline-lowercase pattern so a future reader matches the code they see. 2. subagent-handoff.md: the comment said the assertion 'must be pinned to THIS scenario's request window' but the implementation actually relies on matching a scenario-unique prompt substring (/delegate one bounded qa task/i), not a request-window. Rewrite the comment to describe the substring pinning and keep the pre-tool filter rationale intact. 
No runtime change; comment-only fix to keep reviewer expectations aligned with the actual assertion shape. Validation: pnpm test extensions/qa-lab/src/scenario-catalog.test.ts (4/4 pass). Refs #64227 * test(qa): extend tool-call assertions to image-understanding, subagent-fanout, and capability-flip scenarios * Guard mock-only image parity assertions * Expand agentic parity second wave * test(qa): pad parity suspicious-pass isolation to second wave * qa-lab: parametrize parity report title and drop stale first-wave comment Addresses two loop-7 Copilot findings on PR #64662: 1. Hard-coded 'GPT-5.4 / Opus 4.6' markdown H1: the renderer now uses a template string that interpolates candidateLabel and baselineLabel, so any parity run (not only gpt-5.4 vs opus 4.6) renders an accurate title in saved reports. Default CLI flags still produce openai/gpt-5.4 vs anthropic/claude-opus-4-6 as the baseline pair. 2. Stale 'declared first-wave parity scenarios' comment in scopeSummaryToParityPack: the parity pack is now the ten-scenario first-wave+second-wave set (PR D + PR E). Comment updated to drop the first-wave qualifier and name the full QA_AGENTIC_PARITY_SCENARIOS constant the scope is filtering against. New regression: 'parametrizes the markdown header from the comparison labels' — asserts that non-default labels (openai/gpt-5.4-alt vs openai/gpt-5.4) render in the H1. Validation: pnpm test extensions/qa-lab/src/agentic-parity-report.test.ts (13/13 pass). 
Refs #64227 * qa-lab: fail parity gate on required scenario failures regardless of baseline parity * test(qa): update readable-report test to cover all 10 parity scenarios * qa-lab: strengthen parity-report fake-success detector and verify run.primaryProvider labels * Tighten parity label and scenario checks * fix: tighten parity label provenance checks * fix: scope parity tool-call metrics to tool lanes * Fix parity report label and fake-success checks * fix(qa): tighten parity report edge cases * qa-lab: add Anthropic /v1/messages mock route for parity baseline Closes the last local-runnability gap on criterion 5 of the GPT-5.4 parity completion gate in #64227 ('the parity gate shows GPT-5.4 matches or beats Opus 4.6 on the agreed metrics'). Background: the parity gate needs two comparable scenario runs - one against openai/gpt-5.4 and one against anthropic/claude-opus-4-6 - so the aggregate metrics and verdict in PR D (#64441) can be computed. Today the qa-lab mock server only implements /v1/responses, so the baseline run against Claude Opus 4.6 requires a real Anthropic API key. That makes the gate impossible to prove end-to-end from a local worktree and means the CI story is always 'two real providers + quota + keys'. This PR adds a /v1/messages Anthropic-compatible route to the existing mock OpenAI server. 
The route is a thin adapter that: - Parses Anthropic Messages API request shapes (system as string or [{type:text,text}], messages with string or block content, text and tool_result and tool_use and image blocks) - Translates them into the ResponsesInputItem[] shape the existing shared scenario dispatcher (buildResponsesPayload) already understands - Calls the shared dispatcher so both the OpenAI and Anthropic lanes run through the exact same scenario prompt-matching logic (same subagent fanout state machine, same extractRememberedFact helper, same '/debug/requests' telemetry) - Converts the resulting OpenAI-format events back into an Anthropic message response with text and tool_use content blocks and a correct stop_reason (tool_use vs end_turn) Non-streaming only: the QA suite runner falls back to non-streaming mock mode so real Anthropic SSE isn't necessary for the parity baseline. Also adds claude-opus-4-6 and claude-sonnet-4-6 to /v1/models so baseline model-list probes from the suite runner resolve without extra config. Tests added: - advertises Anthropic claude-opus-4-6 baseline model on /v1/models - dispatches an Anthropic /v1/messages read tool call for source discovery prompts (tool_use stop_reason, correct input path, /debug/requests records plannedToolName=read) - dispatches Anthropic /v1/messages tool_result follow-ups through the shared scenario logic (subagent-handoff two-stage flow: tool_use - tool_result - 'Delegated task / Evidence' prose summary) Local validation: - pnpm test extensions/qa-lab/src/mock-openai-server.test.ts (18/18 pass) - pnpm test extensions/qa-lab/src/mock-openai-server.test.ts extensions/qa-lab/src/cli.runtime.test.ts extensions/qa-lab/src/scenario-catalog.test.ts (47/47 pass) Refs #64227 Unblocks #64441 (parity harness) and the forthcoming qa parity run wrapper by giving the baseline lane a local-only mock path. 
* qa-lab: fix Anthropic tool_result ordering in messages adapter Addresses the loop-6 Copilot / Greptile finding on PR #64685: in `convertAnthropicMessagesToResponsesInput`, `tool_result` blocks were pushed to `items` inside the per-block loop while the surrounding user/assistant message was only pushed after the loop finished. That reordered the function_call_output BEFORE its parent user message whenever a user turn mixed `tool_result` with fresh text/image blocks, which broke `extractToolOutput` (it scans AFTER the last user-role index; function_call_output placed BEFORE that index is invisible to it) and made the downstream scenario dispatcher behave as if no tool output had been returned on mixed-content turns. Fix: buffer `tool_result` and `tool_use` blocks in local arrays during the per-block loop, push the parent role message first (when it has any text/image pieces), then push the accumulated function_call / function_call_output items in original order. tool_result-only user turns still omit the parent message as before, so the non-mixed subagent-fanout-synthesis two-stage flow that already worked keeps working. Regression added: - `places tool_result after the parent user message even in mixed-content turns` — sends a user turn that mixes a `tool_result` block with a trailing fresh text block, then inspects `/debug/last-request` to assert that `toolOutput === 'SUBAGENT-OK'` (extractToolOutput found the function_call_output AFTER the last user index) and `prompt === 'Keep going with the fanout.'` (extractLastUserText picked up the trailing fresh text). Local validation: pnpm test extensions/qa-lab/src/mock-openai-server.test.ts (19/19 pass). 
Refs #64227 * qa-lab: reject Anthropic streaming and empty model in messages mock * qa-lab: tag mock request snapshots with a provider variant so parity runs can diff per provider * Handle invalid Anthropic mock JSON * fix: wire mock parity providers by model ref * fix(qa): support Anthropic message streaming in mock parity lane * qa-lab: record provider/model/mode in qa-suite-summary.json Closes the 'summary cannot be label-verified' half of criterion 5 on the GPT-5.4 parity completion gate in #64227. Background: the parity gate in #64441 compares two qa-suite-summary.json files and trusts whatever candidateLabel / baselineLabel the caller passes. Today the summary JSON only contains { scenarios, counts }, so nothing in the summary records which provider/model the run actually used. If a maintainer swaps candidate and baseline summary paths in a parity-report call, the verdict is silently mislabeled and nobody can retroactively verify which run produced which summary. Changes: - Add a 'run' block to qa-suite-summary.json with startedAt, finishedAt, providerMode, primaryModel (+ provider and model splits), alternateModel (+ provider and model splits), fastMode, concurrency, scenarioIds (when explicitly filtered). - Extract a pure 'buildQaSuiteSummaryJson(params)' helper so the summary JSON shape is unit-testable and the parity gate (and any future parity wrapper) can import the exact same type rather than reverse-engineering the JSON shape at runtime. - Thread 'scenarioIds' from 'runQaSuite' into writeQaSuiteArtifacts so --scenario-ids flags are recorded in the summary. 
Unit tests added (src/suite.summary-json.test.ts, 5 cases): - records provider/model/mode so parity gates can verify labels - includes scenarioIds in run metadata when provided - records an Anthropic baseline lane cleanly for parity runs - leaves split fields null when a model ref is malformed - keeps scenarios and counts alongside the run metadata This is additive: existing callers of qa-suite-summary.json continue to see the same { scenarios, counts } shape, just with an extra run field. No existing consumers of the JSON need to change. The follow-up 'qa parity run' CLI wrapper (run the parity pack twice against candidate + baseline, emit two labeled summaries in one command) stacks cleanly on top of this change and will land as a separate PR once #64441 and #64662 merge so the wrapper can call runQaParityReportCommand directly. Local validation: - pnpm test extensions/qa-lab/src/suite.summary-json.test.ts (5/5 pass) - pnpm test extensions/qa-lab/src/suite.summary-json.test.ts extensions/qa-lab/src/cli.runtime.test.ts extensions/qa-lab/src/scenario-catalog.test.ts (34/34 pass) Refs #64227 Unblocks the final parity run for #64441 / #64662 by making summaries self-describing. * qa-lab: strengthen qa-suite-summary builder types and empty-array semantics Addresses 4 loop-6 Copilot / codex-connector findings on PR #64689 (re-opened as #64789): 1. P2 codex + Copilot: empty `scenarioIds` array was serialized as `[]` because of a truthiness check. The CLI passes an empty array when --scenario is omitted, so full-suite runs would incorrectly record an explicit empty selection. Fix: switch to a `length > 0` check so '[] or undefined' both encode as `null` in the summary run metadata. 2. Copilot: `buildQaSuiteSummaryJson` was exported for parity-gate consumers but its return type was `Record<string, unknown>`, which defeated the point of exporting it. Fix: introduce a concrete `QaSuiteSummaryJson` type that matches the JSON shape 1-for-1 and make the builder return it. 
Downstream code (parity gate, parity run wrapper) can now import the type and keep consumers type-checked. 3. Copilot: `QaSuiteSummaryJsonParams.providerMode` re-declared the `'mock-openai' | 'live-frontier'` string union even though `QaProviderMode` is already imported from model-selection.ts. Fix: reuse `QaProviderMode` so provider-mode additions flow through both types at once. 4. Copilot: test fixtures omitted `steps` from the fake scenario results, creating shape drift with the real suite scenario-result shape. Fix: pad the test fixtures with `steps: []` and tighten the scenarioIds assertion to read `json.run.scenarioIds` directly (the new concrete return type makes the type-cast unnecessary). New regression: `treats an empty scenarioIds array as unspecified (no filter)` — passes `scenarioIds: []` and asserts the summary records `scenarioIds: null`. Validation: pnpm test extensions/qa-lab/src/suite.summary-json.test.ts (6/6 pass). Refs #64227 * qa-lab: record executed scenarioIds in summary run metadata Addresses the pass-3 codex-connector P2 on #64789 (repl of #64689): `run.scenarioIds` was copied from the raw `params.scenarioIds` caller input, but `runQaSuite` normalizes that input through `selectQaSuiteScenarios` which dedupes via `Set` and reorders the selection to catalog order. When callers repeat --scenario ids or pass them in non-catalog order, the summary metadata drifted from the scenarios actually executed, which can make parity/report tooling treat equivalent runs as different or trust inaccurate provenance. Fix: both writeQaSuiteArtifacts call sites in runQaSuite now pass `selectedCatalogScenarios.map(scenario => scenario.id)` instead of `params?.scenarioIds`, so the summary records the post-selection executed list. 
This also covers the full-suite case automatically (the executed list is the full lane-filtered catalog), giving parity consumers a stable record of exactly which scenarios landed in the run regardless of how the caller phrased the request. buildQaSuiteSummaryJson's `length > 0 ? [...] : null` pass-2 semantics are preserved so the public helper still treats an empty array as 'unspecified' for any future caller that legitimately passes one. Validation: pnpm test extensions/qa-lab/src/suite.summary-json.test.ts (6/6 pass). Refs #64227 * qa-lab: preserve null scenarioIds for unfiltered suite runs Addresses the pass-4 codex-connector P2 on #64789: the pass-3 fix always passed `selectedCatalogScenarios.map(...)` to writeQaSuiteArtifacts, which made unfiltered full-suite runs indistinguishable from an explicit all-scenarios selection in the summary metadata. The 'unfiltered → null' semantic (documented in the buildQaSuiteSummaryJson JSDoc and exercised by the "treats an empty scenarioIds array as unspecified" regression) was lost. Fix: both writeQaSuiteArtifacts call sites now condition on the caller's original `params.scenarioIds`. When the caller passed an explicit non-empty filter, record the post-selection executed list (pass-3 behavior, preserving Set-dedupe + catalog-order normalization). When the caller passed undefined or an empty array, pass undefined to writeQaSuiteArtifacts so buildQaSuiteSummaryJson's length-check serializes null (pass-2 behavior, preserving unfiltered semantics). This keeps both codex-connector findings satisfied simultaneously: - explicit --scenario filter reorders/dedupes through the executed list, not the raw caller input - unfiltered full-suite run records null, not a full catalog dump that would shadow "explicit all-scenarios" selections Validation: pnpm test extensions/qa-lab/src/suite.summary-json.test.ts (6/6 pass). 
Refs #64227 * qa-lab: reuse QaProviderMode in writeQaSuiteArtifacts param type * qa-lab: stage mock auth profiles so the parity gate runs without real credentials * fix(qa): clean up mock auth staging follow-ups * ci: add parity-gate workflow that runs the GPT-5.4 vs Opus 4.6 gate end-to-end against the qa-lab mock * ci: use supported parity gate runner label * ci: watch gateway changes in parity gate * docs: pin parity runbook alternate models * fix(ci): watch qa-channel parity inputs * qa: roll up parity proof closeout * qa: harden mock parity review fixes * qa-lab: fix review findings — comment wording, placeholder key, exported type, ordering assertion, remove false-positive positive-tone detection * qa: fix memory-recall scenario count, update criterion 2 comment, cache fetchJson in model-switch * qa-lab: clean up positive-tone comment + fix stale test expectations * qa: pin workflow Node version to 22.14.0 + fix stale label-match wording * qa-lab: refresh mock provider routing expectation * docs: drop stale parity rollup rewrite from proof slice * qa: run parity gate against mock lane * deps: sync qa-lab lockfile * build: refresh a2ui bundle hash * ci: widen parity gate triggers --------- Co-authored-by: Eva <eva@100yen.org>	2026-04-13 13:01:54 +09:00
Peter Steinberger	b42937908d	chore(release): prepare 2026.4.12-beta.1	2026-04-13 00:20:52 +01:00
Peter Steinberger	35b0586cb1	build: update A2UI bundle hash	2026-04-12 11:41:24 -07:00
Peter Steinberger	67af6f0baf	fix: restore main CI checks	2026-04-12 11:28:43 -07:00
Vincent Koc	766954d9a1	fix(build): refresh a2ui bundle hash	2026-04-12 12:50:42 +01:00
Vincent Koc	518e1b5e23	fix(build): refresh a2ui bundle hash	2026-04-12 09:50:20 +01:00
Peter Steinberger	769908ec3f	chore(release): prepare 2026.4.11	2026-04-12 01:05:56 +01:00
Vincent Koc	329a0f00ce	fix(canvas): refresh a2ui bundle hash	2026-04-12 00:37:14 +01:00
HDYA	26f633b604	feat(msteams): add federated credential support (certificate + managed identity) (#53615) * feat(msteams): add federated authentication support (certificate + managed identity + workload identity) * msteams: fix vitest 4.1.2 compat, type errors, and regenerate config baseline * msteams: fix lint errors, update fetch allowlist, regenerate protocol Swift * fix(msteams): gate secret-only delegated auth flows * fix(ci): unblock gateway watch and install smoke * fix(ci): restore mergeability for pr 53615 * fix(ci): restore channel registry helper typing * fix(ci): refresh raw fetch guard allowlist --------- Co-authored-by: Chudi Huang <Chudi.Huang@microsoft.com> Co-authored-by: Brad Groux <3053586+BradGroux@users.noreply.github.com>	2026-04-11 13:29:22 -05:00
Peter Steinberger	788c37a6c2	chore(release): prepare 2026.4.11-beta.1	2026-04-11 16:10:13 +01:00
Peter Steinberger	0d733a28e1	build(canvas): refresh a2ui input hash	2026-04-11 14:19:51 +01:00
Peter Steinberger	a8284e39de	build(canvas): stabilize a2ui bundle inputs	2026-04-11 14:19:25 +01:00
Peter Steinberger	9bde608f38	build: keep a2ui bundle generated	2026-04-11 14:18:04 +01:00
Peter Steinberger	0ed512bbdf	build: refresh a2ui bundle	2026-04-11 14:18:04 +01:00
Peter Steinberger	370efaa4a0	build(canvas): refresh a2ui bundle	2026-04-11 13:49:04 +01:00
Vincent Koc	74e7b8d47b	fix(cycles): bulk extract leaf type surfaces	2026-04-11 13:26:50 +01:00
Peter Steinberger	9e0d358695	refactor: simplify runtime conversions	2026-04-11 01:23:34 +01:00

179 Commits