Arena is a new pane kind for competitive AI evaluation. A Battle runs the same prompt against 2-6 Contestants across two concurrent lanes: local lane (llama-swap models, serial) and cloud lane (parallel). Added to all three registries: @boocode/contracts WsFrameSchema, server InferenceFrame, and web WsFrame.

Backend (apps/coder):
- arena-runner: battle scheduler, lane classifier, benchmark, results writer, resume, user winner override
- arena-analyzer: two-stage digest→judge analysis on DEFAULT_MODEL
- arena-decisions: status transitions and resume logic (unit-tested)
- arena-analyzer-helpers: pure helper functions (unit-tested)
- arena-model-call: model call utility for analysis
- arena routes: create/get/list/stop/analyze/cross-examine/winner/diff
- schema: battles, contestants, cross_examinations tables (idempotent)
- remove old /api/arena* routes and tasks.arena_id column

Frontend (apps/web):
- ArenaLauncherDialog: battle type, prompt, contestant selection
- ArenaPane: live roster, streaming output, analysis, cross-exam
- DiffView: unified diff with line-by-line color for coding contests
- Winner override per-row dropdown (Trophy icon)
- battle_updated WS handler for live winner/analysis updates
- arena pane kind in Workspace, ChatTabBar, useSidebar

Cross-app:
- ArenaState and ArenaContestantShape/WsFrame types (contracts)
- battle_* frames in WsFrameSchema, InferenceFrame, and web WsFrame
- manifest.json written per battle results folder
- /Arena added to .gitignore

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Context: BooCode
Glossary of the domain language. Terms only — no implementation detail.
Workspace
- Pane: one tile in the multi-pane workspace. Each pane has a kind: Chat (BooChat), Coder (BooCoder), Terminal (BooTerm), Orchestrator, Arena, plus artifact/settings kinds.
- Backend: an AI engine a task is dispatched to, either native (BooChat inference on a local llama-swap model) or an external CLI agent (Claude Code, OpenCode, Qwen, Goose). Code sometimes calls this the "agent" (`tasks.agent`).
- BooChat Agent (a.k.a. persona): a preset from the `data/AGENTS.md` registry (e.g. "Code Reviewer", "Debugger"), made up of a system prompt, a tool whitelist, and sampling knobs, that runs on the native backend with a chosen model. Distinct from a Backend: this is the overloaded sense of "agent" that the UI's Agent picker selects.
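The Backend versus BooChat Agent distinction above can be pictured as two separate types. This is a minimal sketch only: the type names, field names, CLI identifiers, and sampling knob names are all hypothetical and are not taken from the BooCode source.

```typescript
// Hypothetical shapes; only the concepts come from the glossary.

// A Backend is the engine a task is dispatched to.
type Backend =
  | { kind: "native" } // BooChat inference on a local llama-swap model
  | { kind: "external"; cli: "claude-code" | "opencode" | "qwen" | "goose" };

// A BooChat Agent (persona) is a preset, not an engine.
// It always runs on the native backend with a chosen model.
interface BooChatAgent {
  name: string; // e.g. "Code Reviewer"
  systemPrompt: string;
  toolWhitelist: string[];
  sampling: { temperature?: number; topP?: number }; // assumed knob names
}

// Human-readable label for a Backend.
function describeBackend(b: Backend): string {
  return b.kind === "native" ? "native (BooChat)" : `external CLI: ${b.cli}`;
}

// An example persona value (contents invented for illustration).
const debuggerPersona: BooChatAgent = {
  name: "Debugger",
  systemPrompt: "Find the root cause before proposing a fix.",
  toolWhitelist: ["read_file", "run_tests"],
  sampling: { temperature: 0.2 },
};
```

Keeping the two as unrelated types makes the overloaded word "agent" harder to misuse: a persona cannot be passed where an engine is expected.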
Arena
A way to run the same prompt against several AI competitors at once and pick the best result.
- Battle: one Arena run. Dated. Produces a results folder at `/<project-root>/Arena/<dated-battle>/`. (The earlier API-only feature called this an "arena"; a Battle is one such run.)
- Battle Type: what is being compared:
  - Coding: Contestants change code; a result is the diff they produced (plus their explanation). Each Contestant works in its own worktree.
  - Q&A: Contestants answer a prompt; a result is the text answer. No code changes.
- Contestant: one competitor in a Battle, given the Battle's prompt. What defines a Contestant depends on Battle Type:
  - Coding: a Backend + Model (e.g. Claude Code + opus, native BooCode + 35b). Each works in its own isolated git worktree (a branched on-disk copy of the project). Contestants do not see each other's work.
  - Q&A: a BooChat Agent (persona) + Model (e.g. Debugger + 35b), running on the native backend only. No worktree (no code changes). The same model can appear under two Contestants, so a Contestant's identity is the (backend-or-persona, model) pair, not the model alone.
- Benchmark: per-Contestant performance captured during a Battle. Wall-clock duration is recorded for every Contestant; throughput (tokens/sec) is recorded only for local (llama-swap) models, the only ones for which a speed comparison is meaningful.
- Arena results folder (`/<project-root>/Arena/<dated-battle>/`): where a Battle's results are written (not the working copies; those stay in each Contestant's worktree). Holds the per-Contestant result and the final analysis.
- Lane: how a Battle's Contestants are scheduled. The local lane holds every llama-swap-backed Contestant and runs them strictly one at a time (the local server can only load one model at a time, which also keeps their speed Benchmark fair). The cloud lane holds cloud-backed Contestants (Claude Code, OpenCode-on-cloud) and runs them all in parallel. The two lanes run concurrently with each other.
- Analysis: an end-of-Battle judgement of the Contestants' results, produced by the default BooChat model, naming a Winner.
- Cross-examination: an after-the-Battle step where a chosen model (from any agent) is pointed at the Battle's results to interrogate / compare them.
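The Lane and Benchmark rules above can be illustrated with a small scheduling sketch. It is not the arena-runner implementation. The names `Contestant`, `RunFn`, `runBattle`, and the `outputTokens` field are hypothetical. Only the rules come from the glossary: the local lane runs serially, the cloud lane runs in parallel, the two lanes run concurrently, and throughput is recorded for local Contestants only.

```typescript
// Hypothetical shapes; only the scheduling rules come from the glossary.
type Lane = "local" | "cloud";

interface Contestant {
  id: string; // stands in for the (backend-or-persona, model) pair
  lane: Lane;
}

interface Benchmark {
  contestantId: string;
  durationMs: number; // wall-clock, recorded for every Contestant
  tokensPerSec?: number; // recorded for local-lane Contestants only
}

// Runs one Contestant to completion and reports how many tokens it produced.
type RunFn = (c: Contestant) => Promise<{ outputTokens: number }>;

// Times a single run and derives throughput where it applies.
async function timed(c: Contestant, run: RunFn): Promise<Benchmark> {
  const start = Date.now();
  const { outputTokens } = await run(c);
  const durationMs = Date.now() - start;
  const bench: Benchmark = { contestantId: c.id, durationMs };
  if (c.lane === "local" && durationMs > 0) {
    bench.tokensPerSec = outputTokens / (durationMs / 1000);
  }
  return bench;
}

// Local lane: strictly one Contestant at a time.
async function runLocalLane(cs: Contestant[], run: RunFn): Promise<Benchmark[]> {
  const results: Benchmark[] = [];
  for (const c of cs) {
    results.push(await timed(c, run));
  }
  return results;
}

// Cloud lane: every Contestant starts immediately.
function runCloudLane(cs: Contestant[], run: RunFn): Promise<Benchmark[]> {
  return Promise.all(cs.map((c) => timed(c, run)));
}

// The two lanes are started together and awaited together.
async function runBattle(cs: Contestant[], run: RunFn): Promise<Benchmark[]> {
  const local = cs.filter((c) => c.lane === "local");
  const cloud = cs.filter((c) => c.lane === "cloud");
  const [localResults, cloudResults] = await Promise.all([
    runLocalLane(local, run),
    runCloudLane(cloud, run),
  ]);
  return [...localResults, ...cloudResults];
}
```

Because the local lane awaits each run before starting the next, no two local models are ever loaded at once, which is also what keeps the tokens/sec figures comparable.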