Commit Graph

88 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Peter Steinberger | 0c278bb93c | refactor: break runtime import cycles | 2026-04-09 03:56:22 +01:00 |
| Peter Steinberger | be46d0ddc6 | test: update character eval public panel | 2026-04-09 01:25:59 +01:00 |
| Peter Steinberger | 39cc6b7dc7 | fix: stabilize character eval and Qwen model routing | 2026-04-09 01:04:09 +01:00 |
| Shakker | d66e2d5b33 | test: cover curated qa missing-key reply classification | 2026-04-08 21:55:39 +01:00 |
| Shakker | c63d25bd9b | fix: classify curated qa missing-key replies | 2026-04-08 21:55:39 +01:00 |
| Shakker | 9cfa152962 | test: cover mixed-traffic qa wait cursors | 2026-04-08 21:55:39 +01:00 |
| Shakker | 204d766b27 | fix: align qa wait cursor semantics | 2026-04-08 21:55:39 +01:00 |
| Shakker | a6d76df4f0 | test: cover qa scenario wait failure replies | 2026-04-08 21:55:39 +01:00 |
| Shakker | b3f3cfd598 | fix: fail fast across qa scenario wait paths | 2026-04-08 21:55:39 +01:00 |
| Shakker | 491e216c45 | fix: fail fast on qa live auth errors | 2026-04-08 21:55:39 +01:00 |
| Peter Steinberger | 21ef1bf8de | feat: parallelize character eval runs | 2026-04-08 20:05:55 +01:00 |
| Peter Steinberger | f1e75d3259 | fix: load QA live provider overrides | 2026-04-08 20:05:55 +01:00 |
| Peter Steinberger | a3d21539ef | test: stabilize full-suite execution | 2026-04-08 19:40:57 +01:00 |
| Peter Steinberger | 4a51a1031d | feat: add character eval model options | 2026-04-08 17:05:30 +01:00 |
| Peter Steinberger | 4bbf78e566 | test: make character eval scenario natural | 2026-04-08 17:05:30 +01:00 |
| Peter Steinberger | 3101d81053 | feat: add QA character eval reports | 2026-04-08 15:52:49 +01:00 |
| Peter Steinberger | aa3b1357cb | fix: support Codex CLI QA auth | 2026-04-08 15:52:01 +01:00 |
| Peter Steinberger | 97dfbe0fe1 | feat: add qa character vibes eval | 2026-04-08 12:05:24 +01:00 |
| Peter Steinberger | 9bf3482470 | refactor: finish markdown-only qa runner | 2026-04-08 11:56:02 +01:00 |
| Peter Steinberger | 95e397a266 | refactor: dedupe repeated test helpers | 2026-04-08 09:58:22 +01:00 |
| Peter Steinberger | 492e98a88a | refactor: move qa suite logic into scenario markdown | 2026-04-08 09:13:57 +01:00 |
| Vincent Koc | be530f085d | refactor(plugin-sdk): share tool payload extraction | 2026-04-08 09:07:28 +01:00 |
| Vincent Koc | 4260ac4cf6 | perf(plugins): narrow boundary compile sdk imports | 2026-04-08 08:52:51 +01:00 |
| Peter Steinberger | 21d9bac5ec | fix: stabilize live qa scenario suite | 2026-04-08 08:17:59 +01:00 |
| Peter Steinberger | b73d8ef7d7 | refactor: split qa scenarios into per-file markdown defs | 2026-04-08 05:37:17 +01:00 |
| Peter Steinberger | c0aed59fca | refactor: move qa suite definitions into markdown | 2026-04-07 23:39:50 +01:00 |
| Peter Steinberger | e0ad3e79e6 | refactor: dedupe normalization lowercase helpers | 2026-04-07 22:57:52 +01:00 |
| Peter Steinberger | 83bdba2bae | fix: resolve rebase regressions for ci landing | 2026-04-07 21:02:04 +01:00 |
| Peter Steinberger | 5b090561fb | refactor: dedupe browser whatsapp qa lowercase helpers | 2026-04-07 20:58:01 +01:00 |
| Peter Steinberger | a00b01f5ed | fix: harden complex qa suite scenarios | 2026-04-07 20:35:39 +01:00 |
| Peter Steinberger | b5d2bd6f41 | fix(qa): tighten frontier scope evals | 2026-04-07 20:32:42 +01:00 |
| Peter Steinberger | 4e69a9b329 | fix(qa): restore safe no-fork gateway runtime | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | cde12e63e7 | perf(qa): lazy-load runner catalog for lab ui | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | f312d6c106 | fix(qa): preserve gateway cli auth in no-fork rpc path | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | e7538b4499 | perf(qa): drop per-rpc gateway cli forks | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | 02bd9e8c10 | perf(qa): trim frontier direct-agent waits | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | 35eb70f1f5 | test(qa): retry flaky local fetches in lab server tests | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | 986536ff6b | fix(qa): keep direct self-check outputs under repo root | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | f6544a0a3b | fix(qa): anchor runner artifacts to repo root | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | daeff2fa89 | fix(qa): default docker artifacts from repo root | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | 76bde3d42b | fix(qa): support neutral-cwd docker commands | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | 816a3eae8a | chore(qa): align qa cli provider input types | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | 5aa4fd3216 | fix(qa): normalize qa cli lane inputs | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | 7d18b145f8 | fix(qa): keep manual alternate model aligned | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | cdf18c16b4 | fix(qa): default manual lanes by provider mode | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | 3182588ad4 | fix(qa): allow random qa-lab control-ui origins | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | 82535771cd | fix(qa): pin gateway child control ui root | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | f9f38a48e6 | fix(qa): align mock model-switch continuity | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | 9a106f7e3c | fix(qa): support neutral-cwd suite runs | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | f93b217834 | feat(qa): add manual harness lane | 2026-04-07 20:32:42 +01:00 |