Commit Graph

89 Commits

Author SHA1 Message Date
Peter Steinberger 97dfbe0fe1 feat: add qa character vibes eval 2026-04-08 12:05:24 +01:00
Peter Steinberger 9bf3482470 refactor: finish markdown-only qa runner 2026-04-08 11:56:02 +01:00
Peter Steinberger 95e397a266 refactor: dedupe repeated test helpers 2026-04-08 09:58:22 +01:00
Peter Steinberger 492e98a88a refactor: move qa suite logic into scenario markdown 2026-04-08 09:13:57 +01:00
Vincent Koc be530f085d refactor(plugin-sdk): share tool payload extraction 2026-04-08 09:07:28 +01:00
Vincent Koc 4260ac4cf6 perf(plugins): narrow boundary compile sdk imports 2026-04-08 08:52:51 +01:00
Peter Steinberger 21d9bac5ec fix: stabilize live qa scenario suite 2026-04-08 08:17:59 +01:00
Peter Steinberger 8cbd60d203 chore: prepare 2026.4.9 release 2026-04-08 08:02:53 +01:00
Peter Steinberger b73d8ef7d7 refactor: split qa scenarios into per-file markdown defs 2026-04-08 05:37:17 +01:00
Peter Steinberger 4f8471617a chore: prepare 2026.4.8 2026-04-08 04:21:51 +01:00
Peter Steinberger 0e91c25c0b chore: prepare 2026.4.7 2026-04-08 02:14:59 +01:00
Peter Steinberger c0aed59fca refactor: move qa suite definitions into markdown 2026-04-07 23:39:50 +01:00
Peter Steinberger e0ad3e79e6 refactor: dedupe normalization lowercase helpers 2026-04-07 22:57:52 +01:00
Peter Steinberger 83bdba2bae fix: resolve rebase regressions for ci landing 2026-04-07 21:02:04 +01:00
Peter Steinberger 5b090561fb refactor: dedupe browser whatsapp qa lowercase helpers 2026-04-07 20:58:01 +01:00
Peter Steinberger a00b01f5ed fix: harden complex qa suite scenarios 2026-04-07 20:35:39 +01:00
Peter Steinberger b5d2bd6f41 fix(qa): tighten frontier scope evals 2026-04-07 20:32:42 +01:00
Peter Steinberger 4e69a9b329 fix(qa): restore safe no-fork gateway runtime 2026-04-07 20:32:42 +01:00
Vincent Koc cde12e63e7 perf(qa): lazy-load runner catalog for lab ui 2026-04-07 20:32:42 +01:00
Vincent Koc f312d6c106 fix(qa): preserve gateway cli auth in no-fork rpc path 2026-04-07 20:32:42 +01:00
Vincent Koc e7538b4499 perf(qa): drop per-rpc gateway cli forks 2026-04-07 20:32:42 +01:00
Vincent Koc 02bd9e8c10 perf(qa): trim frontier direct-agent waits 2026-04-07 20:32:42 +01:00
Vincent Koc 35eb70f1f5 test(qa): retry flaky local fetches in lab server tests 2026-04-07 20:32:42 +01:00
Vincent Koc 986536ff6b fix(qa): keep direct self-check outputs under repo root 2026-04-07 20:32:42 +01:00
Vincent Koc f6544a0a3b fix(qa): anchor runner artifacts to repo root 2026-04-07 20:32:42 +01:00
Vincent Koc daeff2fa89 fix(qa): default docker artifacts from repo root 2026-04-07 20:32:42 +01:00
Vincent Koc 76bde3d42b fix(qa): support neutral-cwd docker commands 2026-04-07 20:32:42 +01:00
Vincent Koc 816a3eae8a chore(qa): align qa cli provider input types 2026-04-07 20:32:42 +01:00
Vincent Koc 5aa4fd3216 fix(qa): normalize qa cli lane inputs 2026-04-07 20:32:42 +01:00
Vincent Koc 7d18b145f8 fix(qa): keep manual alternate model aligned 2026-04-07 20:32:42 +01:00
Vincent Koc cdf18c16b4 fix(qa): default manual lanes by provider mode 2026-04-07 20:32:42 +01:00
Vincent Koc 3182588ad4 fix(qa): allow random qa-lab control-ui origins 2026-04-07 20:32:42 +01:00
Vincent Koc 82535771cd fix(qa): pin gateway child control ui root 2026-04-07 20:32:42 +01:00
Vincent Koc f9f38a48e6 fix(qa): align mock model-switch continuity 2026-04-07 20:32:42 +01:00
Vincent Koc 9a106f7e3c fix(qa): support neutral-cwd suite runs 2026-04-07 20:32:42 +01:00
Vincent Koc f93b217834 feat(qa): add manual harness lane 2026-04-07 20:32:42 +01:00
Vincent Koc 63e6bb026c fix(qa): isolate gateway child runtime 2026-04-07 20:32:42 +01:00
Vincent Koc 4f421fa0f1 fix(qa): harden frontier claude bakeoffs 2026-04-07 20:32:42 +01:00
Vincent Koc 18fb171179 feat(qa): add frontier harness bakeoff loop 2026-04-07 20:32:41 +01:00
Peter Steinberger a3d5630232 test: stabilize scoped runners and qa ports 2026-04-07 15:28:46 +01:00
Peter Steinberger ad605052bf refactor: dedupe provider lowercase helpers 2026-04-07 15:12:31 +01:00
Peter Steinberger 1cec37184c fix: harden qa memory dreaming sweep 2026-04-07 12:57:33 +01:00
Peter Steinberger c541a9c110 Tests: fix flaky shard expectations 2026-04-07 12:22:51 +01:00
Peter Steinberger 524951e124 fix(ci): route qa-lab web imports through package barrels 2026-04-07 10:24:02 +01:00
Peter Steinberger f2494aa33f feat: streamline qa lab live runs 2026-04-07 10:05:49 +01:00
Peter Steinberger 124cd5e307 feat: auto-reload qa lab fast refresh 2026-04-07 09:54:59 +01:00
Peter Steinberger 54a884865e feat: add fast qa lab ui refresh mode 2026-04-07 09:45:11 +01:00
Peter Steinberger 36aeef30c2 style: add padding to qa lab scenario list 2026-04-07 09:45:11 +01:00
Peter Steinberger 1baf5533aa feat(qa-lab): add Clawfather/Claw avatars and live-watch mode for scenario runs 2026-04-07 09:24:26 +01:00
Peter Steinberger 282188a326 fix(qa-lab): widen sidebar to 360px and allow scenario titles to wrap 2026-04-07 09:21:25 +01:00