Commit Graph

18 Commits

Author SHA1 Message Date
Peter Steinberger
39cc6b7dc7 fix: stabilize character eval and Qwen model routing 2026-04-09 01:04:09 +01:00
Peter Steinberger
21ef1bf8de feat: parallelize character eval runs 2026-04-08 20:05:55 +01:00
Peter Steinberger
4bbf78e566 test: make character eval scenario natural 2026-04-08 17:05:30 +01:00
Peter Steinberger
3101d81053 feat: add QA character eval reports 2026-04-08 15:52:49 +01:00
Peter Steinberger
97dfbe0fe1 feat: add qa character vibes eval 2026-04-08 12:05:24 +01:00
Peter Steinberger
9bf3482470 refactor: finish markdown-only qa runner 2026-04-08 11:56:02 +01:00
Peter Steinberger
492e98a88a refactor: move qa suite logic into scenario markdown 2026-04-08 09:13:57 +01:00
Peter Steinberger
21d9bac5ec fix: stabilize live qa scenario suite 2026-04-08 08:17:59 +01:00
Peter Steinberger
b73d8ef7d7 refactor: split qa scenarios into per-file markdown defs 2026-04-08 05:37:17 +01:00
Peter Steinberger
c0aed59fca refactor: move qa suite definitions into markdown 2026-04-07 23:39:50 +01:00
Peter Steinberger
a00b01f5ed fix: harden complex qa suite scenarios 2026-04-07 20:35:39 +01:00
Vincent Koc
e8b446b985 docs(qa): expand frontier bakeoff runbook 2026-04-07 20:32:42 +01:00
Vincent Koc
18fb171179 feat(qa): add frontier harness bakeoff loop 2026-04-07 20:32:41 +01:00
Peter Steinberger
1cec37184c fix: harden qa memory dreaming sweep 2026-04-07 12:57:33 +01:00
Peter Steinberger
4f1cbcdcd9 feat(qa): add attachment understanding scenario 2026-04-06 04:46:28 +01:00
Peter Steinberger
1373ac6c9e feat(qa): execute ten new repo-backed scenarios 2026-04-06 04:28:33 +01:00
Peter Steinberger
979409eab5 fix(qa): harden new scenario suite 2026-04-06 02:41:03 +01:00
Peter Steinberger
8e1c81e707 feat(qa): recreate qa lab docker stack 2026-04-05 23:21:56 +01:00