58 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Peter Steinberger | b5d2bd6f41 | fix(qa): tighten frontier scope evals | 2026-04-07 20:32:42 +01:00 |
| Peter Steinberger | 4e69a9b329 | fix(qa): restore safe no-fork gateway runtime | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | cde12e63e7 | perf(qa): lazy-load runner catalog for lab ui | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | f312d6c106 | fix(qa): preserve gateway cli auth in no-fork rpc path | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | e7538b4499 | perf(qa): drop per-rpc gateway cli forks | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | 02bd9e8c10 | perf(qa): trim frontier direct-agent waits | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | 35eb70f1f5 | test(qa): retry flaky local fetches in lab server tests | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | 986536ff6b | fix(qa): keep direct self-check outputs under repo root | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | f6544a0a3b | fix(qa): anchor runner artifacts to repo root | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | daeff2fa89 | fix(qa): default docker artifacts from repo root | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | 76bde3d42b | fix(qa): support neutral-cwd docker commands | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | 816a3eae8a | chore(qa): align qa cli provider input types | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | 5aa4fd3216 | fix(qa): normalize qa cli lane inputs | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | 7d18b145f8 | fix(qa): keep manual alternate model aligned | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | cdf18c16b4 | fix(qa): default manual lanes by provider mode | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | 3182588ad4 | fix(qa): allow random qa-lab control-ui origins | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | 82535771cd | fix(qa): pin gateway child control ui root | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | f9f38a48e6 | fix(qa): align mock model-switch continuity | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | 9a106f7e3c | fix(qa): support neutral-cwd suite runs | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | f93b217834 | feat(qa): add manual harness lane | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | 63e6bb026c | fix(qa): isolate gateway child runtime | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | 4f421fa0f1 | fix(qa): harden frontier claude bakeoffs | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | 18fb171179 | feat(qa): add frontier harness bakeoff loop | 2026-04-07 20:32:41 +01:00 |
| Peter Steinberger | a3d5630232 | test: stabilize scoped runners and qa ports | 2026-04-07 15:28:46 +01:00 |
| Peter Steinberger | ad605052bf | refactor: dedupe provider lowercase helpers | 2026-04-07 15:12:31 +01:00 |
| Peter Steinberger | 1cec37184c | fix: harden qa memory dreaming sweep | 2026-04-07 12:57:33 +01:00 |
| Peter Steinberger | c541a9c110 | Tests: fix flaky shard expectations | 2026-04-07 12:22:51 +01:00 |
| Peter Steinberger | f2494aa33f | feat: streamline qa lab live runs | 2026-04-07 10:05:49 +01:00 |
| Peter Steinberger | 124cd5e307 | feat: auto-reload qa lab fast refresh | 2026-04-07 09:54:59 +01:00 |
| Peter Steinberger | 54a884865e | feat: add fast qa lab ui refresh mode | 2026-04-07 09:45:11 +01:00 |
| Peter Steinberger | 17085ec1a4 | fix: make qa lab docker boot resilient | 2026-04-07 09:04:18 +01:00 |
| Peter Steinberger | e169fcd263 | refactor: dedupe qa and diff error formatting | 2026-04-07 05:06:54 +01:00 |
| Peter Steinberger | f6c3474342 | fix(qa-lab): bump health timeout to 360s, add settle delay, show compose path in error | 2026-04-06 22:51:06 +01:00 |
| Peter Steinberger | 80c8567f9d | fix: resolve merge conflicts and preserve runtime test fixes | 2026-04-06 22:46:33 +01:00 |
| Peter Steinberger | cfebdee073 | refactor: dedupe qa cli shutdown handling | 2026-04-06 22:21:01 +01:00 |
| Peter Steinberger | 1a893132f6 | refactor: dedupe qa mock input text extraction | 2026-04-06 22:21:00 +01:00 |
| Peter Steinberger | 27dc1bd0fc | fix(qa-lab): improve health timeout error message and fix port-free check | 2026-04-06 19:35:18 +01:00 |
| Peter Steinberger | 37b7e22e13 | fix(qa-lab): increase health check timeout to 240s | 2026-04-06 19:35:15 +01:00 |
| Peter Steinberger | 41da6faa9e | fix(qa-lab): tear down previous docker stack before starting new one | 2026-04-06 18:41:17 +01:00 |
| Peter Steinberger | bb29c8696a | fix: harden qa lab docker launcher startup | 2026-04-06 18:01:08 +01:00 |
| Peter Steinberger | b4e1747391 | feat: add one-command qa lab docker launcher | 2026-04-06 17:47:17 +01:00 |
| Peter Steinberger | 5f906c926d | refactor: remove qa-e2e compatibility facade | 2026-04-06 17:23:35 +01:00 |
| Peter Steinberger | 350238d402 | feat: add interactive qa lab suite runner | 2026-04-06 17:23:35 +01:00 |
| Peter Steinberger | d60149c655 | test: move provider tests into owning extensions | 2026-04-06 16:47:03 +01:00 |
| Peter Steinberger | af62a2c2e4 | style: fix extension lint violations | 2026-04-06 14:53:55 +01:00 |
| Peter Steinberger | 4f1cbcdcd9 | feat(qa): add attachment understanding scenario | 2026-04-06 04:46:28 +01:00 |
| Peter Steinberger | 2285bacd21 | fix(qa): support image understanding inputs | 2026-04-06 04:46:27 +01:00 |
| Peter Steinberger | 1373ac6c9e | feat(qa): execute ten new repo-backed scenarios | 2026-04-06 04:28:33 +01:00 |
| Peter Steinberger | 746b112dac | fix(openai): allow qa image generation mock routing | 2026-04-06 04:28:33 +01:00 |
| Peter Steinberger | 979409eab5 | fix(qa): harden new scenario suite | 2026-04-06 02:41:03 +01:00 |