Arena is a new pane kind for competitive AI evaluation. A Battle runs
the same prompt against 2-6 Contestants across two concurrent lanes:
local lane (llama-swap models, serial) and cloud lane (parallel).
Added to all three registries: @boocode/contracts WsFrameSchema,
server InferenceFrame, and web WsFrame.
Backend (apps/coder):
- arena-runner: battle scheduler, lane classifier, benchmark, results
writer, resume, user winner override
- arena-analyzer: two-stage digest→judge analysis on DEFAULT_MODEL
- arena-decisions: status transitions and resume logic (unit-tested)
- arena-analyzer-helpers: pure helper functions (unit-tested)
- arena-model-call: model call utility for analysis
- arena routes: create/get/list/stop/analyze/cross-examine/winner/diff
- schema: battles, contestants, cross_examinations tables (idempotent)
- remove old /api/arena* routes and tasks.arena_id column
Frontend (apps/web):
- ArenaLauncherDialog: battle type, prompt, contestant selection
- ArenaPane: live roster, streaming output, analysis, cross-exam
- DiffView: unified diff with line-by-line color for coding contests
- Winner override per-row dropdown (Trophy icon)
- battle_updated WS handler for live winner/analysis updates
- arena pane kind in Workspace, ChatTabBar, useSidebar
Cross-app:
- ArenaState and ArenaContestantShape/WsFrame types (contracts)
- battle_* frames in WsFrameSchema, InferenceFrame, and web WsFrame
- manifest.json written per battle results folder
- /Arena added to .gitignore
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Brings the deterministic Han-flow conductor into BooCode: launch any read-only
flow from BooChat or BooCoder, watch each agent stream live in a Paseo-style
run pane, get an evidence-disciplined report — on local Qwen, persisted and
resumable. Read-only enforced hard via qwen --approval-mode plan (orchestrator
tasks fail closed if qwen is unavailable; never fall to write-capable native).
Backend (apps/coder): re-homed conductor defs, flow_runs/flow_steps schema,
flow-runner + dispatcher onTaskTerminal hook, restart-resume, runs routes
(launch/list/get/cancel), user-channel WS. Contracts: two flow_run_* frames.
Web: orchestrator pane kind + OrchestratorPane, Workflow button + slash flows
(BooChat/BooCoder parity), FlowLauncherDialog, "New Orchestrator" in the + and
split menus, runs history + export. Plan: openspec/changes/orchestrator.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Five independent items from the post-review backlog. F1: Stop on an external
agent task now aborts the running child via a per-task AbortController registry
reachable from the cancel route, and finalizes the assistant message as
cancelled (fixing two latent bugs — catch blocks left the message streaming,
and warm success-paths wrote complete on an aborted turn); warm pools/worktrees
are preserved and the native path is unchanged. F2/F3: prune the tool-call
parser to its two load-bearing exports (unexport eight zero-caller symbols, add
a gate test for the <invoke>-as-text fallback) and route placeholder-rejection
logging through pino. F6: a 90s per-chunk stall-timeout wraps native inference's
fullStream via AbortSignal.any so a hung stream finalizes the message instead of
hanging — no retry (a pure classifyStreamError helper is added). F7: a read-only
view_session_history MCP tool (newest-N, chronological). F9: retire the unused
apps/coder/web :9502 fallback SPA, keeping every API/WS/health/MCP route.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Move all hand-synced cross-app wire contracts into one built workspace
package, @boocode/contracts, consumed by server/web/coder/coder-web via
workspace:* + a per-subpath exports map. The ws-frames and provider-config
Zod schemas are schema-first (z.infer); MessageMetadata, ErrorReason,
AgentSessionConfig, the provider snapshot types, and WorktreeRiskReport are
each single-sourced. Deletes the byte-identical copies and their parity
tests, fixes a live AgentSessionConfig drift (coder dead copy removed,
unified to the web required/nullable shape), removes the dead pending_change
WS arms in the fallback SPA, and inverts the build order (contracts builds
first) across root build, Dockerfile, and the coder deploy docs. Reverses
the shared-package decision declined in v2.5.12.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>