| Peter Steinberger | 58531530d9 | test: tighten qa live scenarios | 2026-04-11 00:58:40 +01:00 |
| Peter Steinberger | debe372c9a | test: add medium game qa scenarios | 2026-04-10 23:29:58 +01:00 |
| Peter Steinberger | d5df4cd4e5 | test: add Anthropic Opus QA smokes | 2026-04-10 17:24:54 +01:00 |
| Peter Steinberger | 07e7222e28 | test: split Claude CLI QA auth modes | 2026-04-10 14:56:36 +01:00 |
| Peter Steinberger | 6286810388 | test: add Claude CLI provider QA scenario | 2026-04-10 14:23:19 +01:00 |
| Peter Steinberger | 8763614d1e | test: cover bundled plugin skill runtime | 2026-04-10 10:11:35 +01:00 |
| Peter Steinberger | 68b4b36a90 | test: harden qa eval scenarios | 2026-04-10 10:11:35 +01:00 |
| Peter Steinberger | 50f5091979 | test: strengthen character eval judging | 2026-04-10 08:04:49 +01:00 |
| Peter Steinberger | 39cc6b7dc7 | fix: stabilize character eval and Qwen model routing | 2026-04-09 01:04:09 +01:00 |
| Peter Steinberger | 21ef1bf8de | feat: parallelize character eval runs | 2026-04-08 20:05:55 +01:00 |
| Peter Steinberger | 4bbf78e566 | test: make character eval scenario natural | 2026-04-08 17:05:30 +01:00 |
| Peter Steinberger | 3101d81053 | feat: add QA character eval reports | 2026-04-08 15:52:49 +01:00 |
| Peter Steinberger | 97dfbe0fe1 | feat: add qa character vibes eval | 2026-04-08 12:05:24 +01:00 |
| Peter Steinberger | 9bf3482470 | refactor: finish markdown-only qa runner | 2026-04-08 11:56:02 +01:00 |
| Peter Steinberger | 492e98a88a | refactor: move qa suite logic into scenario markdown | 2026-04-08 09:13:57 +01:00 |
| Peter Steinberger | 21d9bac5ec | fix: stabilize live qa scenario suite | 2026-04-08 08:17:59 +01:00 |
| Peter Steinberger | b73d8ef7d7 | refactor: split qa scenarios into per-file markdown defs | 2026-04-08 05:37:17 +01:00 |
| Peter Steinberger | c0aed59fca | refactor: move qa suite definitions into markdown | 2026-04-07 23:39:50 +01:00 |
| Peter Steinberger | a00b01f5ed | fix: harden complex qa suite scenarios | 2026-04-07 20:35:39 +01:00 |
| Vincent Koc | e8b446b985 | docs(qa): expand frontier bakeoff runbook | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | 18fb171179 | feat(qa): add frontier harness bakeoff loop | 2026-04-07 20:32:41 +01:00 |
| Peter Steinberger | 1cec37184c | fix: harden qa memory dreaming sweep | 2026-04-07 12:57:33 +01:00 |
| Peter Steinberger | 4f1cbcdcd9 | feat(qa): add attachment understanding scenario | 2026-04-06 04:46:28 +01:00 |
| Peter Steinberger | 1373ac6c9e | feat(qa): execute ten new repo-backed scenarios | 2026-04-06 04:28:33 +01:00 |
| Peter Steinberger | 979409eab5 | fix(qa): harden new scenario suite | 2026-04-06 02:41:03 +01:00 |
| Peter Steinberger | 8e1c81e707 | feat(qa): recreate qa lab docker stack | 2026-04-05 23:21:56 +01:00 |