feat(web,coder): arena pane — compare 2-6 AI competitors on same prompt

Arena is a new pane kind for competitive AI evaluation. A Battle runs
the same prompt against 2-6 Contestants across two concurrent lanes:
local lane (llama-swap models, serial) and cloud lane (parallel).

Added to all three registries: @boocode/contracts WsFrameSchema,
server InferenceFrame, and web WsFrame.

Backend (apps/coder):
- arena-runner: battle scheduler, lane classifier, benchmark, results
  writer, resume, user winner override
- arena-analyzer: two-stage digest→judge analysis on DEFAULT_MODEL
- arena-decisions: status transitions and resume logic (unit-tested)
- arena-analyzer-helpers: pure helper functions (unit-tested)
- arena-model-call: model call utility for analysis
- arena routes: create/get/list/stop/analyze/cross-examine/winner/diff
- schema: battles, contestants, cross_examinations tables (idempotent)
- remove old /api/arena* routes and tasks.arena_id column

Frontend (apps/web):
- ArenaLauncherDialog: battle type, prompt, contestant selection
- ArenaPane: live roster, streaming output, analysis, cross-exam
- DiffView: unified diff with line-by-line color for coding contests
- Winner override per-row dropdown (Trophy icon)
- battle_updated WS handler for live winner/analysis updates
- arena pane kind in Workspace, ChatTabBar, useSidebar

Cross-app:
- ArenaState and ArenaContestantShape/WsFrame types (contracts)
- battle_* frames in WsFrameSchema, InferenceFrame, and web WsFrame
- manifest.json written per battle results folder
- /Arena added to .gitignore
This commit is contained in:
2026-06-06 23:25:29 +00:00
parent 84a024a5a4
commit 3474be4865
34 changed files with 4581 additions and 146 deletions

View File

@@ -78,7 +78,7 @@ BooCoder at port 9502: `curl http://100.114.205.53:9502/api/health`. Runs as `bo
- `FAST_MODEL` (optional) — cheaper model for titles, summaries, labeling (auto_name.ts, tool-summaries.ts). Falls back to session model or DEFAULT_MODEL. Set to a small llama-swap model (e.g. `nemotron-nano-4b`) to avoid loading the 35B for 20-token calls.
- Qwen Code dispatch: `OPENAI_BASE_URL=http://100.101.41.16:8401/v1 OPENAI_API_KEY=dummy qwen -p "<task>" --output-format stream-json`. Install: `npm install -g @qwen-code/qwen-code@latest`. Node ≥22 on host (container stays Node 20; BooCoder dispatches via direct spawn on host). No `--yolo` flag — `-p` runs autonomously without prompts. ACP bridge is an HTTP daemon (not stdio); use PTY dispatch.
- Arena: `POST /api/arena {project_id, input, contestants: [{agent?, model?}]}` dispatches the same task to N models/agents in parallel; each contestant gets its own task + worktree. `GET /api/arena/:id` for results; `POST /api/arena/:id/select/:task_id` picks a winner.
- Arena: `POST /api/battles {project_id, battle_type, prompt, contestants}` starts a battle; `GET /api/battles/:id` returns battle + contestants + cross-examinations; `POST /api/battles/:id/stop` cancels; `POST /api/battles/:id/analyze` triggers/re-triggers two-stage digest→judge analysis; `GET /api/battles/:id/analysis` reads `analysis.md`; `POST /api/battles/:id/cross-examine {identity, model}` runs a cross-examination. All `/api/battles*` routes are served by `apps/coder` at port 9502 (proxied through `apps/server` as `/api/coder/battles*`).
## Workflow