Vincent Koc
e9720c27fa
fix(qa): accept Codex capped read evidence (#96366)
2026-06-24 18:07:13 +08:00
Vincent Koc
8242923fe3
fix(qa): allow async runtime fixture starts
2026-06-24 17:52:16 +08:00
Dallin Romney
9666db607e
test(qa): clean up smoke taxonomy profile (#96320)
2026-06-24 00:43:00 -07:00
Vincent Koc
0e71ae5df4
fix(qa): enforce fanout completion drain
2026-06-24 08:29:37 +08:00
Vincent Koc
e457c4c324
fix(qa): drain fanout child completions
2026-06-24 08:29:37 +08:00
Vincent Koc
ab9d3ad6d7
fix(qa): settle channel no-reply check
2026-06-24 08:29:37 +08:00
Vincent Koc
960b9fa4f3
fix(qa): scope no-outbound waits
2026-06-24 08:29:37 +08:00
Vincent Koc
6f80552ee9
fix(qa): prove direct reply routing via qa channel
2026-06-24 00:41:28 +08:00
Dallin Romney
f6b2a5ffb4
test(qa): harden all-profile evidence scenarios (#96003)
2026-06-23 00:07:51 -07:00
Vincent Koc
f24b1a9c0c
test(qa): relax heartbeat target none startup probe
2026-06-23 08:18:55 +02:00
Vincent Koc
53f9b6a36b
test(qa): align release memory scenario assertions
2026-06-23 07:43:06 +02:00
Dallin Romney
438f208a76
perf(qa-lab): speed up unified QA suites (#95944)
* perf(qa-lab): speed up smoke ci suite
* fix(qa-lab): satisfy suite scheduler lint
* fix(qa-lab): settle unified partitions before retry
* fix(qa-lab): preserve isolated suite safeguards
* refactor(qa-lab): make suite isolation explicit
* fix(qa-lab): preserve channel-driver suite serialization
* fix(qa-lab): narrow flow-only isolation metadata
2026-06-22 21:55:54 -07:00
Vincent Koc
495a4f9b8e
test(qa): accept verified live fanout completions
2026-06-23 06:46:40 +02:00
Dallin Romney
d3dc7aaa87
docs: update maturity scorecard (#95933)
* docs: update maturity scorecard
* docs: rerender maturity scorecard from all evidence
2026-06-22 21:37:03 -07:00
Vincent Koc
68a1e00b73
fix(agents): retry silent subagent completion handoffs
2026-06-23 06:04:16 +02:00
Vincent Koc
54b2243de3
test(qa): tighten release profile scenario waits
2026-06-23 06:04:16 +02:00
Vincent Koc
a9024741c2
test(qa): pin live artifact scenario contracts
2026-06-23 05:13:35 +02:00
Vincent Koc
d1b268f7f7
fix(qa): normalize completed wait envelopes
2026-06-23 05:13:35 +02:00
Vincent Koc
c48dd3cdd1
fix(ci): align maturity score source with taxonomy
2026-06-23 10:46:07 +08:00
Dallin Romney
b71ddbf1b4
ci: simplify maturity scorecard QA evidence inputs (#95898)
* ci: simplify maturity scorecard evidence inputs
* ci: keep maturity renderer defaults runnable
* ci: validate maturity evidence source
* ci: split maturity scorecard codex agent
* ci: remove codex copy from maturity evidence workflow
* ci: narrow maturity evidence workflow secrets
2026-06-22 19:24:43 -07:00
Vincent Koc
d716dfd532
test(qa): wait for live history replies in flow scenarios
2026-06-23 04:01:11 +02:00
Vincent Koc
def4b51485
fix(qa): gate smoke profile scenarios by channel driver
2026-06-23 09:34:52 +08:00
Vincent Koc
43f2b61f3b
test(qa): keep image generation fixture on mock lane
2026-06-23 02:35:02 +02:00
Vincent Koc
086c629556
test(qa): scope provider-sensitive flow fixtures
2026-06-23 02:17:20 +02:00
Vincent Koc
befe04f465
test(qa): accept Sonnet max thinking support
2026-06-23 01:57:43 +02:00
Vincent Koc
264b37e9d2
test(qa): avoid redacted config cleanup patch
2026-06-23 01:39:39 +02:00
Dallin Romney
63b13ea837
feat(qa): crabline channel driver (#91502)
* feat(qa): add crabline channel driver seam
* feat: run crabline channel driver smoke
* chore: keep crabline qa dependency dev-only
* refactor(qa): keep crabline driver details opaque
* chore(qa): pin crabline to merged driver API
* feat(qa): drive channel driver from profiles
* fix(qa): declare crabline runtime peer
* feat(qa): resolve crabline channel from scenarios
* feat(qa): treat unsupported profile channels as coverage gaps
* Revert "feat(qa): treat unsupported profile channels as coverage gaps"
This reverts commit 65a9701655.
* fix(qa): adapt crabline driver to chat sdk cli
* refactor(qa): pass channel driver metadata directly
* chore(qa): update crabline provider pin
* chore(qa): default channel scenarios to driver
* chore: repair qa dependency lockfile
* chore: allow native qa dependency builds
* fix(qa): satisfy crabline driver lint
* fix(qa): satisfy crabline ci gates
* Use crabline transport for smoke QA profile
* fix(qa): keep crabline driver opt-in
* fix(qa): reuse crabline telegram driver token
* fix(qa): route smoke profile through crabline
* fix(qa): run full smoke profile lane
* fix(qa): remove smoke scenario workflow filter
* fix: stabilize crabline smoke qa profile
* fix: pin crabline qa dependency
* test: keep crabline smoke credential-free
* fix: skip visible reasoning lane for crabline smoke
* fix: unblock crabline qa ci
* Update crabline dependency
* Pin crabline to merged main
* Use Crabline fake provider servers
2026-06-22 15:24:59 -07:00
Vincent Koc
7cc21ef59d
fix(qa): stabilize smoke-ci scenarios
2026-06-22 15:41:53 +08:00
Vincent Koc
7d4d8a7f3d
test(agents): add large prompt cache coverage (#95653)
Adds large OpenAI and Anthropic prompt-cache live coverage plus a QA Lab long-context tool-result scenario.
Co-authored-by: vincentkoc <vincentkoc@users.noreply.github.com>
Reviewed-by: @vincentkoc
2026-06-22 11:39:37 +08:00
Dallin Romney
5dd30c3995
test: fold HTTP API script proof into QA Lab (#94700)
* test: fold HTTP API script proof into QA Lab
* test: remove folded HTTP API script tests
* test: relax QA native scenario catalog inventory
* test: trim folded QA Lab script cruft
* test: align folded QA coverage ids
* test: keep native QA evidence out of parity tiers
* test: update mirrored QA routing expectation
* test: preserve chat tools profile build guard
* test: avoid overclaiming gateway tool API coverage
* test: pin folded QA coverage ids
2026-06-20 23:10:35 -07:00
Jesse Merhi
5db2f6c1fc
Add stdout diagnostics OTEL log exporter
Adds stdout and both-mode diagnostics OTEL log export, with focused QA Lab smoke coverage and docs/config updates.
Prepared head SHA: efa2ef07ab
Verification: CI 27808480969 passed for the prepared head.
Reviewed-by: @jesse-merhi
2026-06-19 16:06:37 +10:00
Dallin Romney
e12cf72b17
Standardize QA coverage IDs on dotted names (#94702)
* fix: standardize qa coverage ids
* test: avoid qa coverage id assertion spread
2026-06-18 17:25:26 -07:00
Dallin Romney
4ca0e52d0e
test: fold channel message flows into qa e2e (#93174)
* test: fold channel flows into qa e2e
* test: keep channel flow skill pointed at qa
* test: move channel flow proof under telegram
2026-06-18 15:45:33 -07:00
Dallin Romney
c4ae2be947
fix: taxonomy coverage id cleanup (#94304)
* fix: split taxonomy coverage id features
* fix: clean taxonomy feature row names
* docs: clarify taxonomy coverage id semantics
* docs: tighten coverage id guidance
* fix: keep taxonomy features product shaped
* fix: narrow sdk artifact coverage bundle
* fix: name taxonomy coverage ids clearly
* fix: polish taxonomy feature descriptions
2026-06-18 15:16:58 -07:00
Colin Johnson
d5a27b0b96
test: add QA Lab UX Matrix evidence scenario (#94306)
* test: add qa lab ux matrix script scenario
* fix(qa-lab): annotate UX Matrix producer catch callback as unknown for oxlint
---------
Co-authored-by: Dallin Romney <dallinromney@gmail.com>
2026-06-18 14:10:06 -07:00
Dallin Romney
e17d111990
fix: require all taxonomy coverage ids (#94296)
2026-06-17 16:38:14 -07:00
Colin Johnson
591313e80a
qa-lab: support script-backed evidence scenarios (#94276)
* qa: add script scenario execution kind
* fix(qa-lab): carry suite profile into script producer evidence and simplify artifact path resolution
* fix(qa-lab): keep out-of-repo producer artifacts absolute to avoid ../ traversal refs
---------
Co-authored-by: Dallin Romney <dallinromney@gmail.com>
2026-06-17 15:09:25 -07:00
Vincent Koc
8288b4d4c9
fix(qa-lab): stabilize web search parity fixture
2026-06-18 00:07:36 +02:00
Dallin Romney
f7e5132ffd
test: fold gateway smoke into qa e2e (#93178)
2026-06-17 14:55:28 -07:00
Dallin Romney
fae4a01d0d
test: fold otel smoke into qa e2e (#93181)
* test: fold otel smoke into qa e2e
* test: eliminate otel smoke script
2026-06-17 14:54:58 -07:00
Dallin Romney
0a6736af09
test: fold lifecycle and package proof into QA Lab (#93114)
* test: fold script coverage into qa scenarios
* test: migrate script checks into qa e2e
* test: point qa code refs at migrated e2e
* test: fold plugin lifecycle probe into qa e2e
* test: use shared temp dirs in plugin lifecycle probe
* test: fold plugin lifecycle sweep into qa lab
* test: trim lifecycle docker text assertions
* test: keep followup script conversions split
* test: make lifecycle docker runner script-safe
* test: update changed helper routing expectation
2026-06-17 14:22:04 -07:00
Dallin Romney
450060d7a2
test(qa): expand smoke-ci and release categories and coverage (#93175)
* test(qa): add smoke ci primary coverage evidence
* test(qa): remove overstated primary coverage claims
* test(qa): make release profile include smoke ci
* test(qa): trim taxonomy formatting churn
* test(qa): avoid hardcoded profile names in coverage test
* test(qa): make release profile cover taxonomy
* test(qa): type profile fixture all category flag
* test(qa): include channel delivery in smoke ci profile
2026-06-15 18:05:52 -07:00
Dallin Romney
fef8394079
Convert QA scenarios to YAML files (#92915)
* refactor: load QA scenarios from YAML
* docs: update personal QA scenario docs
* test: keep QA scenarios YAML-only
2026-06-14 17:31:18 -07:00
NVIDIAN
ecaebfc51b
fix(agents): retry thinking-only errored turns (#92191)
Retry replay-safe reasoning-only provider errors before assistant failover while preserving classified fallback and terminal-output ownership. Adds deterministic Anthropic gateway fault-injection coverage and focused regression tests.
Co-authored-by: ai-hpc <mail.speedy.hpc@hotmail.com>
2026-06-14 09:39:27 -07:00
Dallin Romney
a3e9dfee0e
Simplify QA scorecard mapping shape (#92558)
* test(qa): simplify scorecard mapping shape
* test(qa): use typed scorecard evidence refs
* test(qa): map scorecard categories by coverage id
* feat: align qa coverage with taxonomy features
* refactor: keep qa coverage ids canonical
* refactor: minimize qa coverage id churn
* test: align qa coverage id assertions
* test: update qa evidence coverage expectations
* refactor qa taxonomy coverage ids
* style qa taxonomy coverage ids
* test qa coverage lint fix
* test qa coverage type fix
2026-06-14 00:16:33 -07:00
Dallin Romney
561b293c7a
Run Vitest and Playwright scenarios from qa suite (#92606)
* test(qa): run vitest and playwright scenarios from qa suite
* fix(qa): harden scenario suite dispatch
* refactor(qa): share scenario path utilities
* refactor(qa): share test file scenario runner
* refactor(qa): route test file scenarios through suite runtime
* refactor(qa): use explicit suite runtime result kind
* test(qa): write suite evidence artifact
* refactor(qa): clarify suite execution dispatch
* fix(qa): keep test-file scenarios out of flow-only runners
* refactor(qa): export mixed scenario suite runner
2026-06-13 01:06:10 -07:00
Vincent Koc
7d3e8dc963
test(qa): restore memory fallback config safely
2026-06-10 18:03:15 +09:00
Vincent Koc
2c146261a2
fix(qa): accept completed mock image turns
2026-06-10 17:48:33 +09:00
Vincent Koc
87abb8defb
fix(qa): preserve Matrix recovery state in sqlite
2026-06-10 17:35:41 +09:00
Vincent Koc
3b180d5d99
fix(qa): wait for restart wake before capability check
2026-06-10 17:09:18 +09:00