boocode/docs/adr/0002-arena-dedicated-tables-not-flow-runner.md
indifferentketchup d6d246c15b feat(web,coder): arena pane — compare 2-6 AI competitors on same prompt
Arena is a new pane kind for competitive AI evaluation. A Battle runs
the same prompt against 2-6 Contestants across two concurrent lanes:
a local lane (llama-swap models, serial) and a cloud lane (parallel).

battle_* frames added to all three frame registries: @boocode/contracts
WsFrameSchema, server InferenceFrame, and web WsFrame.

Backend (apps/coder):
- arena-runner: battle scheduler, lane classifier, benchmark, results
  writer, resume, user winner override
- arena-analyzer: two-stage digest→judge analysis on DEFAULT_MODEL
- arena-decisions: status transitions and resume logic (unit-tested)
- arena-analyzer-helpers: pure helper functions (unit-tested)
- arena-model-call: model call utility for analysis
- arena routes: create/get/list/stop/analyze/cross-examine/winner/diff
- schema: battles, contestants, cross_examinations tables (idempotent)
- remove old /api/arena* routes and tasks.arena_id column

Frontend (apps/web):
- ArenaLauncherDialog: battle type, prompt, contestant selection
- ArenaPane: live roster, streaming output, analysis, cross-exam
- DiffView: unified diff with line-by-line color for coding contests
- Winner override per-row dropdown (Trophy icon)
- battle_updated WS handler for live winner/analysis updates
- arena pane kind in Workspace, ChatTabBar, useSidebar

Cross-app:
- ArenaState and ArenaContestantShape/WsFrame types (contracts)
- battle_* frames in WsFrameSchema, InferenceFrame, and web WsFrame
- manifest.json written per battle results folder
- /Arena added to .gitignore

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 23:25:29 +00:00

Arena gets dedicated battles/contestants tables and replaces the old API-only arena

The Arena feature reuses the dispatcher, the onTaskTerminal advance hook, the streaming→WS-frame pipeline, and the pane pattern from the Orchestrator, but it persists to its own battles and contestants tables rather than to the Orchestrator's flow_runs/flow_steps. A Battle is not shaped like a flow: it has two scheduling lanes, per-contestant benchmarks, on-disk results folders, a two-stage analysis, and cross-examinations, so modelling it as flow steps would fight the schema. Each Contestant links to a real tasks row via task_id and thereby inherits all of the worktree, streaming, and dispatch machinery.

This decision also replaces the earlier v2.0.5 API-only arena (POST /api/arena, tasks.arena_id, select-winner). That feature had no UI and no users, and the new Arena is a strict superset of it, so the old routes and the tasks.arena_id column are removed rather than left behind as a second, competing "arena" concept.
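The two scheduling lanes can be pictured with a minimal TypeScript sketch. It is illustrative only, not the arena-runner's code: the Contestant shape, classifyLane, runBattle, and RunFn are hypothetical names; a model is assumed to be "local" when it appears in a caller-supplied set of llama-swap model IDs; and an awaited callback stands in for dispatching the contestant's tasks row and waiting for the onTaskTerminal hook to fire.

```typescript
type Lane = "local" | "cloud";

// Hypothetical contestant shape; the real contestants columns are not listed in this ADR.
interface Contestant {
  id: string;
  battleId: string;
  model: string;
  // Link to a real tasks row, which carries the worktree/streaming/dispatch state.
  taskId: string | null;
}

// Hypothetical stand-in for "dispatch this contestant's task and wait until it is terminal".
type RunFn = (c: Contestant) => Promise<void>;

// Hypothetical classifier: models served by llama-swap go to the local lane.
function classifyLane(model: string, localModels: ReadonlySet<string>): Lane {
  return localModels.has(model) ? "local" : "cloud";
}

async function runBattle(
  contestants: Contestant[],
  localModels: ReadonlySet<string>,
  run: RunFn,
): Promise<void> {
  if (contestants.length < 2 || contestants.length > 6) {
    throw new Error("a battle needs 2-6 contestants");
  }
  const local = contestants.filter((c) => classifyLane(c.model, localModels) === "local");
  const cloud = contestants.filter((c) => classifyLane(c.model, localModels) === "cloud");

  // Local lane: serial, one contestant finishes before the next starts.
  const localLane = (async () => {
    for (const c of local) {
      await run(c);
    }
  })();

  // Cloud lane: every contestant starts immediately.
  const cloudLane = Promise.all(cloud.map((c) => run(c)));

  // The two lanes run concurrently with each other.
  await Promise.all([localLane, cloudLane]);
}
```

Because each lane is its own promise, a slow local model does not hold up the cloud contestants, and the cloud lane's parallelism does not leak into the local lane.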

Consequences

  • Analysis and cross-examination run through a small, pluggable Analyzer seam (v1 is the default-model, two-stage judge). A v2 analyzer that drives a Han Orchestrator flow can slot in behind that seam without a schema change.
  • The arena pane kind, ArenaState, and battle_* WS frames are added alongside the Orchestrator's equivalents rather than folded into them, and they mirror its patterns.
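The Analyzer seam can be pictured as one interface with the v1 two-stage judge as a single implementation. This is a hedged sketch, not the arena-analyzer module: Analyzer, DefaultModelAnalyzer, ModelCall, the Analysis shape, and the prompt wording are all hypothetical. ModelCall stands in for the arena-model-call utility, and the model string passed to the constructor stands in for DEFAULT_MODEL.

```typescript
// Hypothetical stand-in for arena-model-call: model + prompt in, completion text out.
type ModelCall = (model: string, prompt: string) => Promise<string>;

interface ContestantOutput {
  contestantId: string;
  output: string;
}

interface Analysis {
  digests: Record<string, string>;
  verdict: string;
}

// The seam: anything that can turn contestant outputs into an analysis.
interface Analyzer {
  analyze(prompt: string, outputs: ContestantOutput[]): Promise<Analysis>;
}

// v1: two-stage digest -> judge, both stages on one default model.
class DefaultModelAnalyzer implements Analyzer {
  private readonly call: ModelCall;
  private readonly model: string;

  constructor(call: ModelCall, model: string) {
    this.call = call;
    this.model = model;
  }

  async analyze(prompt: string, outputs: ContestantOutput[]): Promise<Analysis> {
    // Stage 1: digest each contestant's output independently, in parallel.
    const pairs = await Promise.all(
      outputs.map(async (o) => {
        const digest = await this.call(
          this.model,
          `Summarise this answer to the task "${prompt}":\n${o.output}`,
        );
        return [o.contestantId, digest] as const;
      }),
    );

    // Stage 2: a single judge call that sees the digests, not the raw outputs.
    const judgePrompt = [
      `Task: ${prompt}`,
      ...pairs.map(([id, digest]) => `[${id}] ${digest}`),
      "Pick the best contestant and explain why.",
    ].join("\n");
    const verdict = await this.call(this.model, judgePrompt);

    return { digests: Object.fromEntries(pairs), verdict };
  }
}
```

A v2 analyzer backed by a Han Orchestrator flow would be another class implementing Analyzer; code that depends only on the interface, and the stored results, would not need to change.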