Commit Graph

17 Commits

Author             SHA1        Message                                                   Date
Peter Steinberger  58531530d9  test: tighten qa live scenarios                           2026-04-11 00:58:40 +01:00
Peter Steinberger  debe372c9a  test: add medium game qa scenarios                        2026-04-10 23:29:58 +01:00
Peter Steinberger  d5df4cd4e5  test: add Anthropic Opus QA smokes                        2026-04-10 17:24:54 +01:00
Peter Steinberger  07e7222e28  test: split Claude CLI QA auth modes                      2026-04-10 14:56:36 +01:00
Peter Steinberger  6286810388  test: add Claude CLI provider QA scenario                 2026-04-10 14:23:19 +01:00
Peter Steinberger  8763614d1e  test: cover bundled plugin skill runtime                  2026-04-10 10:11:35 +01:00
Peter Steinberger  68b4b36a90  test: harden qa eval scenarios                            2026-04-10 10:11:35 +01:00
Peter Steinberger  50f5091979  test: strengthen character eval judging                   2026-04-10 08:04:49 +01:00
Peter Steinberger  39cc6b7dc7  fix: stabilize character eval and Qwen model routing      2026-04-09 01:04:09 +01:00
Peter Steinberger  21ef1bf8de  feat: parallelize character eval runs                     2026-04-08 20:05:55 +01:00
Peter Steinberger  4bbf78e566  test: make character eval scenario natural                2026-04-08 17:05:30 +01:00
Peter Steinberger  3101d81053  feat: add QA character eval reports                       2026-04-08 15:52:49 +01:00
Peter Steinberger  97dfbe0fe1  feat: add qa character vibes eval                         2026-04-08 12:05:24 +01:00
Peter Steinberger  9bf3482470  refactor: finish markdown-only qa runner                  2026-04-08 11:56:02 +01:00
Peter Steinberger  492e98a88a  refactor: move qa suite logic into scenario markdown      2026-04-08 09:13:57 +01:00
Peter Steinberger  21d9bac5ec  fix: stabilize live qa scenario suite                     2026-04-08 08:17:59 +01:00
Peter Steinberger  b73d8ef7d7  refactor: split qa scenarios into per-file markdown defs  2026-04-08 05:37:17 +01:00