Arena is a new pane kind for competitive AI evaluation. A Battle runs
the same prompt against 2-6 Contestants across two concurrent lanes: a
local lane (llama-swap models, run serially) and a cloud lane (run in
parallel). The battle_* frames are added to all three registries:
@boocode/contracts WsFrameSchema, server InferenceFrame, and web WsFrame.
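The two-lane scheduling described above can be sketched roughly as follows. This is a minimal illustration, not the arena-runner code: the `Contestant` shape, its `provider` field, and the `classifyLane` and `runBattle` names are assumptions made for the example.

```typescript
// Hypothetical types: the real Contestant shape is not shown in this commit.
type Lane = "local" | "cloud";

interface Contestant {
  id: string;
  provider: string; // assumed field used to pick a lane
}

// Assumption: contestants served through llama-swap go to the local lane.
function classifyLane(c: Contestant): Lane {
  return c.provider === "llama-swap" ? "local" : "cloud";
}

// Runs both lanes concurrently: the local lane awaits one contestant at a
// time, the cloud lane starts all of its contestants at once.
async function runBattle<T>(
  contestants: Contestant[],
  run: (c: Contestant) => Promise<T>,
): Promise<T[]> {
  if (contestants.length < 2 || contestants.length > 6) {
    throw new RangeError("a Battle takes 2-6 Contestants");
  }
  const local = contestants.filter((c) => classifyLane(c) === "local");
  const cloud = contestants.filter((c) => classifyLane(c) === "cloud");

  const runLocalLane = async (): Promise<T[]> => {
    const results: T[] = [];
    for (const c of local) {
      results.push(await run(c)); // serial: next model starts after this one ends
    }
    return results;
  };
  const runCloudLane = (): Promise<T[]> =>
    Promise.all(cloud.map((c) => run(c))); // parallel

  const [localResults, cloudResults] = await Promise.all([
    runLocalLane(),
    runCloudLane(),
  ]);
  return [...localResults, ...cloudResults];
}
```

Results come back grouped by lane (local first, then cloud), not in the original roster order; a real scheduler would likely key results by contestant id instead.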
Backend (apps/coder):
- arena-runner: battle scheduler, lane classifier, benchmark, results
writer, resume, user winner override
- arena-analyzer: two-stage digest→judge analysis on DEFAULT_MODEL
- arena-decisions: status transitions and resume logic (unit-tested)
- arena-analyzer-helpers: pure helper functions (unit-tested)
- arena-model-call: model call utility for analysis
- arena routes: create/get/list/stop/analyze/cross-examine/winner/diff
- schema: battles, contestants, cross_examinations tables (idempotent)
- remove old /api/arena* routes and tasks.arena_id column
Frontend (apps/web):
- ArenaLauncherDialog: battle type, prompt, contestant selection
- ArenaPane: live roster, streaming output, analysis, cross-exam
- DiffView: unified diff with line-by-line color for coding contests
- Winner override per-row dropdown (Trophy icon)
- battle_updated WS handler for live winner/analysis updates
- arena pane kind in Workspace, ChatTabBar, useSidebar
Cross-app:
- ArenaState and ArenaContestantShape/WsFrame types (contracts)
- battle_* frames in WsFrameSchema, InferenceFrame, and web WsFrame
- manifest.json written per battle results folder
- /Arena added to .gitignore
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Brings the deterministic Han-flow conductor into BooCode: launch any read-only
flow from BooChat or BooCoder, watch each agent stream live in a Paseo-style
run pane, and get an evidence-disciplined report, all on local Qwen, persisted
and resumable. Read-only is hard-enforced via qwen --approval-mode plan:
orchestrator tasks fail closed if qwen is unavailable and never fall back to
write-capable native.
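The fail-closed rule above might look like the sketch below. Only the `qwen --approval-mode plan` invocation comes from this commit; the `ReadOnlyCommand` type, the `buildReadOnlyCommand` name, and the boolean availability check are hypothetical stand-ins for whatever the dispatcher actually does.

```typescript
// Hypothetical command descriptor for a read-only orchestrator task.
interface ReadOnlyCommand {
  command: "qwen";
  args: string[];
}

// Fail closed: if qwen is unavailable, throw instead of choosing any other
// runner, so a task can never silently gain write capability.
function buildReadOnlyCommand(qwenAvailable: boolean): ReadOnlyCommand {
  if (!qwenAvailable) {
    throw new Error(
      "qwen unavailable: refusing to run orchestrator task without read-only enforcement",
    );
  }
  return { command: "qwen", args: ["--approval-mode", "plan"] };
}
```

The point of the design is that there is no alternative branch to reach: the only successful return value carries the plan-mode flag.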
Backend (apps/coder): re-homed conductor defs, flow_runs/flow_steps schema,
flow-runner + dispatcher onTaskTerminal hook, restart-resume, runs routes
(launch/list/get/cancel), user-channel WS. Contracts: two flow_run_* frames.
Web: orchestrator pane kind + OrchestratorPane, Workflow button + slash flows
(BooChat/BooCoder parity), FlowLauncherDialog, "New Orchestrator" in the + and
split menus, runs history + export. Plan: openspec/changes/orchestrator.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Second checkpoint of in-flight work (sessions route, api types, ChatTabBar,
PaneHeaderActions, Workspace, useWorkspacePanes) so the Orchestrator branch
can rebase onto current main before merge.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
In-flight workspace UX work.
- Extract a shared PaneHeaderActions cluster (+/Split/Reopen/History/Close)
used by ChatTabBar + the Workspace coder/terminal pane headers, replacing the
divergent per-header copies; SessionLandingPage history + useWorkspacePanes
tweaks.
- Fix coder-side correctness bug: resolveChatId read sessions.workspace_panes as
a bare WorkspacePane[] but v2.6.5 widened it to a WorkspaceState envelope, so
it mis-read panes and clobbered tabNumbers/nextTabNumber/closedPaneStack on
every pane-chat write. New normalizeWorkspaceState handles either shape and
preserves the envelope (+ regression test).
- CLAUDE.md doc-sync (coder vitest suite, deploy-by-surface, dual-remote push,
in-flight-web-WIP staging, release-branch naming).
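The normalizeWorkspaceState fix in the second bullet above can be sketched as follows. The field names tabNumbers, nextTabNumber, and closedPaneStack come from this commit; the field types, the `WorkspacePane` shape, and the default values are assumptions for illustration, not the shipped implementation.

```typescript
// Assumed shapes: only the envelope field names are taken from the commit.
interface WorkspacePane {
  id: string;
  chatId?: string;
}

interface WorkspaceState {
  panes: WorkspacePane[];
  tabNumbers: Record<string, number>;
  nextTabNumber: number;
  closedPaneStack: WorkspacePane[];
}

// Accepts either the legacy bare WorkspacePane[] or the WorkspaceState
// envelope and always returns an envelope, keeping existing envelope fields.
function normalizeWorkspaceState(raw: unknown): WorkspaceState {
  if (Array.isArray(raw)) {
    // Legacy shape: wrap the bare array with assumed defaults.
    return {
      panes: raw as WorkspacePane[],
      tabNumbers: {},
      nextTabNumber: 1,
      closedPaneStack: [],
    };
  }
  if (raw !== null && typeof raw === "object") {
    const envelope = raw as Partial<WorkspaceState>;
    return {
      ...envelope, // preserve any envelope fields not listed here
      panes: Array.isArray(envelope.panes) ? envelope.panes : [],
      tabNumbers: envelope.tabNumbers ?? {},
      nextTabNumber: envelope.nextTabNumber ?? 1,
      closedPaneStack: envelope.closedPaneStack ?? [],
    };
  }
  // Missing or malformed value: start from an empty envelope.
  return { panes: [], tabNumbers: {}, nextTabNumber: 1, closedPaneStack: [] };
}
```

A caller that updates one pane's chat would then write back `{ ...state, panes: updatedPanes }`, leaving the other envelope fields intact.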
Web tsc + coder build + coder tests green. Builds on v2.7.6.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>