feat(web,coder): arena pane — compare 2-6 AI competitors on same prompt

Arena is a new pane kind for competitive AI evaluation. A Battle runs the same prompt against 2-6 Contestants across two concurrent lanes: local lane (llama-swap models, serial) and cloud lane (parallel). Added to all three registries: @boocode/contracts WsFrameSchema, server InferenceFrame, and web WsFrame.

Backend (apps/coder):
- arena-runner: battle scheduler, lane classifier, benchmark, results writer, resume, user winner override
- arena-analyzer: two-stage digest→judge analysis on DEFAULT_MODEL
- arena-decisions: status transitions and resume logic (unit-tested)
- arena-analyzer-helpers: pure helper functions (unit-tested)
- arena-model-call: model call utility for analysis
- arena routes: create/get/list/stop/analyze/cross-examine/winner/diff
- schema: battles, contestants, cross_examinations tables (idempotent)
- remove old /api/arena* routes and tasks.arena_id column

Frontend (apps/web):
- ArenaLauncherDialog: battle type, prompt, contestant selection
- ArenaPane: live roster, streaming output, analysis, cross-exam
- DiffView: unified diff with line-by-line color for coding contests
- Winner override per-row dropdown (Trophy icon)
- battle_updated WS handler for live winner/analysis updates
- arena pane kind in Workspace, ChatTabBar, useSidebar

Cross-app:
- ArenaState and ArenaContestantShape/WsFrame types (contracts)
- battle_* frames in WsFrameSchema, InferenceFrame, and web WsFrame
- manifest.json written per battle results folder
- /Arena added to .gitignore

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
feat(coder): unified Plan/Ask/Bypass permission picker
2026-06-06 23:25:29 +00:00 · 2026-06-05 15:14:21 +00:00 · 2026-06-03 17:00:49 +00:00 · 2026-06-03 17:00:49 +00:00 · 2026-06-03 16:48:50 +00:00 · 2026-06-03 16:43:53 +00:00
58 changed files with 4828 additions and 203 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -1,6 +1,8 @@
 node_modules
 dist
 .env
 .env.*
 !.env.example
 # Claude / Cursor (local agent & IDE config — CLAUDE.md and AGENTS.md stay tracked)
 .claude/
@@ -18,3 +20,4 @@ data/*
 !data/mcp.example.json
 !data/coder-providers.example.json
 codecontext/fork.tar.gz
+/Arena
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,6 +2,10 @@
 All notable changes per release tag. Most recent on top, ordered by tag creation date (which matches the git history). Tag names follow `vMAJOR.MINOR.PATCH-slug` — the slug describes what shipped, so the tag name alone is enough to recall the batch.
 ## v2.7.18-permission-modes — 2026-06-05
 Adds a unified **permission picker** to the BooCoder composer — Plan / Ask Permission / Bypass — replacing the old raw per-agent mode dropdown that exposed each agent's full native vocabulary with inconsistent labels. The three options map generically onto every provider's existing mode metadata: the `plan`-id mode → Plan, the default mode → Ask, the `isUnattended` mode → Bypass (claude `bypassPermissions`, qwen `yolo`, opencode `full-access`); goose has no modes so it shows no picker, exactly as before. `modeId` stays the single wire field — the active unified mode is derived from it, so no contracts change was needed. Native BooCode gains its own mode set (registered in the manifest and exposed by the snapshot): **Ask** stages edits to the pending-changes queue as today, **Bypass** auto-applies the queue to disk after the turn (both the interactive messages path and the task-based dispatcher path), and **Plan** falls back to Ask — the shared `apps/server` inference engine is deliberately left untouched. A supporting fix preserves the `isUnattended` flag on live-probed ACP modes (`acp-derive.ts`) so opencode's bypass mode is still detectable from the wire. Coder 373 tests green, coder + web typecheck clean. Built on `v2.7.17-orchestrator`.
 ## v2.7.17-orchestrator — 2026-06-03
 Brings the deterministic multi-agent "conductor" into the app as the **Orchestrator**: launch any read-only Han flow (research, code-review, investigate, architectural-analysis, security-review, …) from BooChat or BooCoder and watch each specialist agent stream live in a Paseo-style run pane, ending with an evidence-disciplined, adversarially-validated report — all on free local Qwen, persisted and resumable. Built and audited end-to-end via `paseo-epic` in an isolated worktree, on top of the prior `/opt/boocode/conductor` standalone CLI: the conductor's 22 flow definitions, Spine factory, and Han evidence/YAGNI contracts were re-homed into `apps/coder/src/conductor`, and a new DB-backed flow-runner (`flow_runs`/`flow_steps`) dispatches each step as a real BooCoder task through the existing dispatcher — reusing its streaming→WS-frame pipeline and worktree-as-read-snapshot, with an `onTaskTerminal` hook that advances the wave and a startup resume that re-dispatches in-flight steps after a coder restart. Read-only is enforced hard: every step is dispatched `qwen --approval-mode plan`, an adversarial-security review caught and closed a bypass where a qwen-unavailable task silently fell through to write-capable native inference (now fails closed), and the ACP path's mode-set was made fail-closed too. The UI adds a fourth `orchestrator` pane kind (collapsed agent roster, expand-one live stream, report on top), a Workflow button + slash flows on the shared `ChatInput` for full BooChat/BooCoder parity, a "New Orchestrator" entry in the + and split menus, a category-grouped launcher dialog, runs history, and export (copy / save-to-file / send-to-chat) — fed by two new `flow_run_*` WS frames on a coder user channel. Qwen-only by design (Claude Code remains the Claude path); the existing model-competition Arena stays a separate feature. 
The flow launcher and the `/` slash menu both carry chevron-expandable per-item explanations (an always-on one-liner expands to a 1–2 sentence what-it-does / when-to-use blurb, condensed from each Han skill's own description), with a "read-only" pill pinned in the launcher and the fast/concise toggle wired through to the workers. Spec/plan in `openspec/changes/orchestrator`; coder 373 tests green (42 new scheduler/resume/read-only decision tests), contracts/coder/server builds + web tsc clean. Built on `v2.7.16-container-git-safedir`; pairs conceptually with the earlier `v2.7.12-audit-cleanup` multi-agent orchestration.
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -74,11 +74,11 @@ Schema CHECK migration order when renaming allowed values: (1) `ALTER TABLE ...
 Required: `DATABASE_URL`, `LLAMA_SWAP_URL`. Optional: `PORT` (3000), `HOST` (0.0.0.0), `PROJECT_ROOT_WHITELIST` (/opt, read-only add-existing scope), `BOOTSTRAP_ROOT` (/opt/projects, writable bootstrap mkdir target — host must `mkdir -p` it before container start), `DEFAULT_MODEL`, `LOG_LEVEL`, `SEARXNG_URL` (default `http://100.114.205.53:8888` — internal Tailscale; the public host is behind Authelia, unusable from server context), `BOOCODE_TOOLS` (`core`|`standard`|`all`, default `all`; a ceiling, never expands an agent's whitelist), `MCP_CONFIG_PATH` (default `/data/mcp.json`, opencode `mcpServers` shape; missing = no MCP), `CONTEXT7_API_KEY` (the Context7 MCP key, referenced from `data/mcp.json` as `"{env:CONTEXT7_API_KEY}"`). `data/mcp.json` is **gitignored** but no longer holds secrets — string values support opencode-style `{env:VAR}` substitution (`mcp-config.ts:substituteEnvVars`, applied before Zod validation; unset var → `''` + warn), so real keys live in `.env`; template `data/mcp.example.json`. A config-only edit there needs only `docker compose restart boocode` (data/ is bind-mounted); changing a referenced secret edits `.env`. MCP loads at server startup with per-server graceful degradation; the coder does NOT load MCP (BooChat only).
-BooCoder at port 9502: `curl http://100.114.205.53:9502/api/health`. Runs as `boocoder.service` on the host (not Docker). Deploy: `pnpm -C packages/contracts build && pnpm -C apps/server build && pnpm -C apps/coder build && sudo systemctl restart boocoder`. Health reports tool count: `{"ok":true,"db":true,"tools":33}`.
+BooCoder at port 9502: `curl http://100.114.205.53:9502/api/health`. Runs as `boocoder.service` on the host (not Docker). Its env file `apps/coder/.env.host` is gitignored (`.env.*`, with `!.env.example`) — a fresh host recreates it from `.env.example` (incl. `CLAUDE_SDK_BACKEND=1` for the Claude Agent-SDK backend). Deploy: `pnpm -C packages/contracts build && pnpm -C apps/server build && pnpm -C apps/coder build && sudo systemctl restart boocoder`. Health reports tool count: `{"ok":true,"db":true,"tools":33}`.
 - `FAST_MODEL` (optional) — cheaper model for titles, summaries, labeling (auto_name.ts, tool-summaries.ts). Falls back to session model or DEFAULT_MODEL. Set to a small llama-swap model (e.g. `nemotron-nano-4b`) to avoid loading the 35B for 20-token calls.
 - Qwen Code dispatch: `OPENAI_BASE_URL=http://100.101.41.16:8401/v1 OPENAI_API_KEY=dummy qwen -p "<task>" --output-format stream-json`. Install: `npm install -g @qwen-code/qwen-code@latest`. Node ≥22 on host (container stays Node 20; BooCoder dispatches via direct spawn on host). No `--yolo` flag — `-p` runs autonomously without prompts. ACP bridge is an HTTP daemon (not stdio); use PTY dispatch.
- Arena: `POST /api/arena {project_id, input, contestants: [{agent?, model?}]}` dispatches the same task to N models/agents in parallel; each contestant gets its own task + worktree. `GET /api/arena/:id` for results; `POST /api/arena/:id/select/:task_id` picks a winner.
+- Arena: `POST /api/battles {project_id, battle_type, prompt, contestants}` starts a battle; `GET /api/battles/:id` returns battle + contestants + cross-examinations; `POST /api/battles/:id/stop` cancels; `POST /api/battles/:id/analyze` triggers/re-triggers two-stage digest→judge analysis; `GET /api/battles/:id/analysis` reads `analysis.md`; `POST /api/battles/:id/cross-examine {identity, model}` runs a cross-examination. All `/api/battles*` routes are served by `apps/coder` at port 9502 (proxied through `apps/server` as `/api/coder/battles*`).
 ## Workflow
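A minimal client-side sketch of the `POST /api/battles` request described in the Arena bullet above. The field names (`project_id`, `battle_type`, `prompt`, `contestants` with `identity` and `model`) and the 2-6 contestant limit come from this change; the helper `buildCreateBattleRequest`, its error messages, and the returned shape are hypothetical and exist only for illustration.

```typescript
// Shapes follow the request body named in the Arena bullet; the helper below
// is an illustrative assumption, not part of the codebase.
type BattleType = 'coding' | 'qa';

interface ContestantInput {
  identity: string; // backend name (coding) or persona slug (Q&A)
  model: string;
}

interface CreateBattleBody {
  project_id: string;
  battle_type: BattleType;
  prompt: string;
  contestants: ContestantInput[];
}

interface BattleRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Builds the request without sending it, so the shape can be checked offline.
function buildCreateBattleRequest(baseUrl: string, body: CreateBattleBody): BattleRequest {
  if (body.contestants.length < 2 || body.contestants.length > 6) {
    throw new Error('a battle needs 2-6 contestants');
  }
  if (body.prompt.length === 0) {
    throw new Error('prompt must not be empty');
  }
  return {
    url: `${baseUrl}/api/battles`,
    init: {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify(body),
    },
  };
}
```

The result can be passed straight to `fetch(url, init)`; checking the contestant count on the client only mirrors the server-side validation and does not replace it.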
--- a/CONTEXT.md
+++ b/CONTEXT.md
@@ -0,0 +1,67 @@
 # Context: BooCode
 Glossary of the domain language. Terms only — no implementation detail.
 ## Workspace
 - **Pane** — one tile in the multi-pane workspace. Each pane has a *kind*:
  Chat (BooChat), Coder (BooCoder), Terminal (BooTerm), Orchestrator, Arena,
  plus artifact/settings kinds.
 - **Backend** — an AI engine a task is dispatched to: *native* (BooChat
  inference on a local llama-swap model) or an *external* CLI agent (Claude Code,
  OpenCode, Qwen, Goose). Code sometimes calls this the "agent" (`tasks.agent`).
 - **BooChat Agent** (a.k.a. *persona*) — a preset from the `data/AGENTS.md`
  registry (e.g. "Code Reviewer", "Debugger"): a system prompt + tool whitelist +
  sampling knobs that runs **on the native backend** with a chosen model.
  Distinct from a Backend — this is the overloaded sense of "agent" the UI's
  Agent picker selects.
 ## Arena
 A way to run the **same prompt** against several AI competitors at once and pick
 the best result.
 - **Battle** — one Arena run. Dated. Produces a results folder at
  `/<project-root>/Arena/<dated-battle>/`. (The earlier API-only feature called
  this an "arena"; a Battle is one such run.)
 - **Battle Type** — what is being compared:
  - *Coding* — Contestants change code; a result is the **diff** they produced
    (plus their explanation). Each Contestant works in its own worktree.
  - *Q&A* — Contestants answer a prompt; a result is the **text answer**. No
    code changes.
 - **Contestant** — one competitor in a Battle, given the Battle's prompt. What
  defines a Contestant depends on Battle Type:
  - *Coding* — a **Backend + Model** (e.g. Claude Code + opus, native BooCode +
    35b). Each works in its own isolated git **worktree** (a branched on-disk
    copy of the project). Contestants do not see each other's work.
  - *Q&A* — a **BooChat Agent (persona) + Model** (e.g. Debugger + 35b), running
    on the native backend only. No worktree (no code changes).
  The same model can appear under two Contestants, so a Contestant's identity is
  the (backend-or-persona, model) pair, not the model alone.
 - **Benchmark** — per-Contestant performance captured during a Battle. Wall-clock
  **duration** is recorded for every Contestant; **throughput** (tokens/sec) is
  recorded only for local (llama-swap) models, which are the ones the speed
  comparison is meaningful for.
 - **Arena results folder** (`/<project-root>/Arena/<dated-battle>/`) — where a
  Battle's *results* are written (not the working copies — those stay in each
  Contestant's worktree). Holds the per-Contestant result and the final
  analysis.
 - **Lane** — how a Battle's Contestants are scheduled. The *local lane* holds
  every llama-swap-backed Contestant and runs them strictly one at a time (the
  local server can only load one model at a time, which also keeps their speed
  Benchmark fair). The *cloud lane* holds cloud-backed Contestants (Claude Code,
  OpenCode-on-cloud) and runs them all in parallel. The two lanes run
  concurrently with each other.
 - **Analysis** — an end-of-Battle judgement of the Contestants' results,
  produced by the default BooChat model, naming a **Winner**.
 - **Cross-examination** — an after-the-Battle step where a chosen model (from any
  agent) is pointed at the Battle's results to interrogate / compare them.
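The Lane definition above can be sketched roughly as follows. This is an illustrative model only: the `Contestant` and `RunFn` shapes, `splitLanes`, and `runBattle` are hypothetical names, and classifying a Contestant as local purely by its model ID is an assumption, not necessarily what `arena-runner` does.

```typescript
// Hypothetical shapes for illustration only.
type Contestant = { identity: string; model: string };
type RunFn = (c: Contestant) => Promise<string>;

// Assumption: a contestant is "local" when its model is in the llama-swap set.
function splitLanes(
  contestants: Contestant[],
  localModels: Set<string>,
): { local: Contestant[]; cloud: Contestant[] } {
  const local: Contestant[] = [];
  const cloud: Contestant[] = [];
  for (const c of contestants) {
    (localModels.has(c.model) ? local : cloud).push(c);
  }
  return { local, cloud };
}

// Local lane: strictly one contestant at a time.
// Cloud lane: all contestants at once.
// The two lanes run concurrently with each other.
async function runBattle(
  contestants: Contestant[],
  localModels: Set<string>,
  run: RunFn,
): Promise<string[]> {
  const { local, cloud } = splitLanes(contestants, localModels);

  const localLane = (async () => {
    const out: string[] = [];
    for (const c of local) {
      out.push(await run(c)); // wait for each before starting the next
    }
    return out;
  })();
  const cloudLane = Promise.all(cloud.map((c) => run(c)));

  const [localOut, cloudOut] = await Promise.all([localLane, cloudLane]);
  return [...localOut, ...cloudOut];
}
```

Running the local lane serially is what keeps only one llama-swap model loaded at a time, which is also the condition the glossary gives for a fair speed Benchmark.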
--- a/CURRENT.md
+++ b/CURRENT.md
@@ -1,9 +1,9 @@
 # Current focus
-Last updated: 2026-06-02
+Last updated: 2026-06-05
- **Last shipped:** `v2.7.8-ember-coder-tabs-model-chips` (2026-06-01)
+- **Last shipped:** `v2.7.18-permission-modes` (2026-06-05) — unified Plan/Ask/Bypass permission picker in the BooCoder composer (incl. native-BooCode auto-apply on Bypass).
- **Branch:** `codebase-audit-cleanup` (audit + cleanup epic, off main HEAD)
+- **Branch:** `main`
- **In progress:** Phase 3 — stale comments + docs refresh
+- **In progress:** nothing committed — dogfooding the Orchestrator to surface the next real backlog. Claude Agent-SDK backend enabled (`CLAUDE_SDK_BACKEND`). Optional/exploratory: verify-gate ensembler over pending changes.
 See `CHANGELOG.md` for the full shipped history. That file is always authoritative; this file is a quick orientation pointer only.
--- a/README.md
+++ b/README.md
@@ -1,10 +1,10 @@
 # boocode
-Self-hosted single-user developer chat app. 3-app monorepo: BooChat (read-only chat), BooCoder (write tools + agent dispatch), BooTerm (PTY terminals).
+Self-hosted single-user developer chat app. 3-app monorepo: BooChat (read-only chat), BooCoder (write tools + agent dispatch), BooTerm (PTY terminals) — plus the in-app **Orchestrator**, a deterministic multi-agent conductor that runs read-only Han analysis/review flows on local Qwen.
-**Latest release:** `v2.2.1-pane-scoped-chats` (2026-05-26) · [`CHANGELOG.md`](CHANGELOG.md) · **Current focus:** [`CURRENT.md`](CURRENT.md)
+**Latest release:** `v2.7.17-orchestrator` (2026-06-03) · [`CHANGELOG.md`](CHANGELOG.md) · **Current focus:** [`CURRENT.md`](CURRENT.md)
-**Agent navigation:** [`AGENTS.md`](AGENTS.md) · **Architecture:** [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md) · **Engineering reference:** [`CLAUDE.md`](CLAUDE.md)
+**Architecture:** [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md) · **Engineering reference:** [`CLAUDE.md`](CLAUDE.md) · **Roadmap:** [`boocode_roadmap.md`](boocode_roadmap.md)
 ## Stack
@@ -75,15 +75,16 @@ curl http://100.114.205.53:9502/api/health
 ## What's shipped
-See [`boocode_roadmap.md`](boocode_roadmap.md) for full version history. Highlights as of **v2.2.1**:
+See [`boocode_roadmap.md`](boocode_roadmap.md) and [`CHANGELOG.md`](CHANGELOG.md) for full version history. Highlights as of **v2.7.17**:
- **BooChat**: streaming chat, file-read tools, compaction, reasoning support, HTML/Markdown artifact panes, cross-repo read grants, MCP client (multi-server + stdio), tool-cost tracking, skills system, builtin agent registry, multi-pane workspace (chat / terminal / coder)
+- **BooChat**: streaming chat, file-read tools, compaction, reasoning support, HTML/Markdown artifact panes, cross-repo read grants, MCP client (multi-server + stdio), tool-cost tracking, skills system, builtin agent registry, multi-pane workspace (chat / terminal / coder / orchestrator)
 - **BooTerm**: in-browser terminal panes via tmux + xterm.js, per-session tmux sessions, SSH-out support
- **BooCoder (v2.2)**: write tools (`edit_file`, `create_file`, `delete_file`, `apply_pending`, `rewind`), pending-changes queue with diff UI, Paseo-style provider snapshot (7 providers: boocode, cursor, claude, opencode, goose, qwen, copilot), `AgentComposerBar` (provider / mode / model / thinking), ACP dispatch with inline permission prompts + tool/reasoning streaming, PTY fallback, Arena, MCP server (6 tools, stdio), CLI client, human inbox, Boomerang orchestration, path-guard fuzz suite, **pane-scoped chats** (v2.2.1 — each coder/terminal pane owns its chat)
+- **BooCoder**: write tools (`edit_file` with fuzzy matching, `create_file`, `delete_file`, `apply_pending`, `rewind`, git-ref checkpoints), pending-changes queue + a **Files/Git diff panel** (stage / commit / discard), provider snapshot (5 providers: boocode, claude, opencode, goose, qwen — cursor/copilot retired), `AgentComposerBar`, warm ACP + **persistent agent sessions** (opencode HTTP server; claude via the Agent SDK with native session resume) + PTY fallback, config-backed provider lifecycle, Arena (same task → N models), MCP server, CLI client, human inbox, Boomerang orchestration, pane-scoped chats
 - **Orchestrator** (v2.7.17): launch any of 22 read-only Han flows (research, code-review, investigate, architectural-analysis, …) from BooChat or BooCoder via the Workflow button, a slash command, or **+ menu → New Orchestrator**; each step runs as a bounded agent on local Qwen (hard read-only via `qwen --approval-mode plan`), streaming live in a Paseo-style run pane with an evidence-disciplined, adversarially-validated report. Persisted + resumable. `@boocode/contracts` single-sources the cross-app wire contracts (v2.7.13).
 ## Planned
- **v2.3 provider lifecycle** — config-backed provider registry (`/data/coder-providers.json`), enable/disable toggles, two-tier probe (openspec drafted). See [`CURRENT.md`](CURRENT.md).
+Most prior roadmap milestones have shipped (see [`boocode_roadmap.md`](boocode_roadmap.md)). What remains is optional/exploratory — e.g. a verify-gate ensembler over pending changes (majority-vote diff ranking). No committed milestones currently in flight.
 ## License
--- a/apps/coder/.env.host
+++ b/apps/coder/.env.host
@@ -1,17 +0,0 @@
 NODE_ENV=production
 PORT=9502
 HOST=100.114.205.53
 DATABASE_URL=postgres://boocode:devpass@127.0.0.1:5500/boochat
 LLAMA_SWAP_URL=http://100.101.41.16:8401
 PROJECT_ROOT_WHITELIST=/opt
 BOOTSTRAP_ROOT=/opt/projects
 DEFAULT_MODEL=qwen3.6-35b-a3b-mxfp4
 LOG_LEVEL=info
 SEARXNG_URL=http://100.114.205.53:8888
 GITEA_BASE_URL=https://git.indifferentketchup.com
 GITEA_USER=indifferentketchup
 GITEA_SSH_HOST=100.114.205.53:2222
 MCP_CONFIG_PATH=/data/mcp.json
 SKILLS_ROOT=/opt/boocode/data/skills
 CODER_PROVIDERS_PATH=/opt/boocode/data/coder-providers.json
 CLAUDE_SDK_BACKEND=1
--- a/apps/coder/CLAUDE.md
+++ b/apps/coder/CLAUDE.md
@@ -32,3 +32,8 @@
 - **Claude SDK backend tool RESULTS arrive as `type:'user'` SDK messages** (tool_result content blocks): `mapSdkMessage` (`claude-sdk-map.ts`) MUST map the `user` case → a terminal `tool_update` (completed/failed + output), else the tool_call persists `status:'running'` and the UI spinner never stops. The dispatcher's `tool_update` path then publishes + persists it.
 - **ACP command discovery is async**: `acp-probe.ts` must poll after `newSession` for `available_commands_update` (commands arrive in a later notification; reading synchronously captures 0). PTY providers (claude) discover from disk via `claude-command-discovery.ts` (`~/.claude/commands` + `enabledPlugins`, bare names, deduped). `AgentCommand.kind` tags `'command'` vs `'skill'`; `CoderPane`'s `slashGroups` splits them into icon'd groups. `SlashCommandPicker`'s `groups?` prop is opt-in.
 - **A new per-message coder field silently drops unless you update every mapper**: the HTTP read SELECT + `mapCoderMessageRow` (`apps/coder/src/routes/messages.ts`), **the WS `snapshot` SELECT (`apps/coder/src/routes/ws.ts`)** — it has its OWN column list and the client's `snapshot` handler `setMessages`-overwrites the HTTP load, so a field present in the HTTP route but absent here shows live yet vanishes on refresh — `CoderPane.tsx` (`RawCoderMessage`/`CoderMessage`/`mapCoderTimelineRow` + the live `message_complete` WS reducer), `CoderMessageWire` (`CoderMessageList.tsx`), and `api/types.ts`. The client `mapCoderTimelineRow` whitelists fields — easiest to forget. This bit `model` twice: the client chain (`v2.7.9`) and then the WS snapshot SELECT (`v2.7.11`) — the chip showed live but vanished on coder refresh until both were fixed.
 ## Orchestrator (v2.7.17)
 - **In-app multi-agent conductor**: `services/flow-runner.ts` runs a flow by inserting each step as a `tasks` row (the existing dispatcher runs it) and advancing on a new `onTaskTerminal` dispatcher-deps hook; persisted in `flow_runs`/`flow_steps` (resumed at startup via `initResume`). The 22 conductor flow defs + Spine factory are re-homed under `src/conductor/`. Pure scheduler/resume helpers in `flow-runner-decisions.ts`. Full design: `openspec/changes/archived/orchestrator/`.
 - **Read-only is load-bearing — don't add a dispatch path that bypasses it.** Every step dispatches `agent='qwen', mode_id='plan'`; `dispatcher.ts` force-routes qwen+plan to the PTY `--approval-mode plan` gate and HARD-FAILS the task (never falls to write-capable native inference) when qwen is unavailable (`shouldFailOnMissingAgent`). `BOOCODE_TOOLS` gates BooChat's NATIVE inference tools only — it does NOT govern an external CLI agent (qwen/opencode bring their own write tools); read-only for a dispatched agent is the agent-layer mode (PTY `--approval-mode plan`; ACP `setSessionMode` is fail-OPEN by default, fail-CLOSED for `plan` via `READ_ONLY_MODE_IDS` in `acp-dispatch.ts`).
--- a/apps/coder/src/index.ts
+++ b/apps/coder/src/index.ts
@@ -23,8 +23,8 @@ import { registerAgentSessionRoutes } from './routes/agent-sessions.js';
 import { registerTaskRoutes } from './routes/tasks.js';
 import { registerInboxRoutes } from './routes/inbox.js';
 import { registerStatsRoutes } from './routes/stats.js';
-import { registerArenaRoutes } from './routes/arena.js';
 import { registerRunsRoutes } from './routes/runs.js';
+import { registerArenaRoutes } from './routes/arena.js';
 import { registerProviderRoutes } from './routes/providers.js';
 import { registerWorktreeSafetyRoutes } from './routes/worktree-safety.js';
 import { registerLifecycleRoutes } from './routes/lifecycle.js';
@@ -34,10 +34,13 @@ import { createDispatcher } from './services/dispatcher.js';
 // Orchestrator (Phase 2): DB-backed flow-runner; advances on the dispatcher's
 // onTaskTerminal hook.
 import { createFlowRunner } from './services/flow-runner.js';
 // Arena: DB-backed battle-runner; also advances on the onTaskTerminal hook.
 import { createBattleRunner, type DispatchContestantFn } from './services/arena-runner.js';
 import { createAnalyzer } from './services/arena-analyzer.js';
 import { agentPool } from './services/agent-pool.js';
 import { createOrphanWorktreeReaper } from './services/orphan-worktree-reaper.js';
 import { probeAgents } from './services/agent-probe.js';
-import { getProviderSnapshot, persistProbedModels } from './services/provider-snapshot.js';
+import { getProviderSnapshot, persistProbedModels, fetchLlamaSwapModels } from './services/provider-snapshot.js';
 import { setPermissionHooks } from './services/permission-waiter.js';
 import { publishAgentStatus } from './services/agent-status-publish.js';
 import { homedir } from 'node:os';
@@ -220,31 +223,119 @@ async function main() {
  // Orchestrator (Phase 2): the flow-runner reacts to the dispatcher's
  // onTaskTerminal hook to advance flow_runs. Created before the dispatcher so its
-  // terminal callback can be wired in. Its launch() is driven by the runs route
+  // terminal callback can be wired in.
  // (a later phase); resume on startup is a later phase too.
  const flowRunner = createFlowRunner({ sql, broker, log: app.log, config });
-  // Phase 4: dispatcher — polls tasks table and runs inference. onTaskTerminal
-  // notifies the flow-runner when a step's task settles (D-2).
+  // Arena SEAM (a): build the local-model set from the live llama-swap model list.
+  // Both bare IDs ('qwen3.6-35b') and prefixed IDs ('llama-swap/qwen3.6-35b') are
  // included so opencode-style prefixed contestants and native-style bare contestants
  // both classify correctly as local.
  const localModelsList = await fetchLlamaSwapModels(config).catch(() => []);
  const localModels = new Set([
    ...localModelsList.map((m) => m.id),
    ...localModelsList.map((m) => `llama-swap/${m.id}`),
  ]);
  // Arena dispatch function — Phase 4 SEAM (b).
  // Coding: insert a tasks row with agent=identity (null for native/boocode);
  //   the dispatcher creates a worktree and runs the external agent (or native).
  // Q&A: pre-create a session with agent_id stamped to the persona slug so native
  //   inference loads the persona's system_prompt + tools from AGENTS.md;
  //   task.session_id is pre-set so runNativeInference reuses the session.
  const dispatchContestant: DispatchContestantFn = async ({
    projectId,
    prompt,
    identity,
    model,
    battleType,
  }) => {
    if (battleType === 'qa') {
      const sessionName = `Arena Q&A [${identity}]: ${prompt.slice(0, 30)}`;
      const [session] = await sql<{ id: string }[]>`
        INSERT INTO sessions (project_id, name, model, agent_id, status)
        VALUES (${projectId}, ${sessionName}, ${model}, ${identity}, 'open')
        RETURNING id
      `;
      const [task] = await sql<{ id: string }[]>`
        INSERT INTO tasks (project_id, input, model, session_id)
        VALUES (${projectId}, ${prompt}, ${model}, ${session!.id})
        RETURNING id
      `;
      return { taskId: task!.id, sessionId: session!.id };
    }
    // Coding: boocode = native inference (no external agent); any other identity
    // is an external agent name (claude, opencode, qwen, goose) that maps to
    // available_agents and gets its own per-task worktree via runExternalAgent.
    // Session is created lazily by the dispatcher, so sessionId is unknown here.
    const agentName = identity === 'boocode' ? null : identity;
    const [task] = await sql<{ id: string }[]>`
      INSERT INTO tasks (project_id, input, agent, model)
      VALUES (${projectId}, ${prompt}, ${agentName}, ${model})
      RETURNING id
    `;
    return { taskId: task!.id, sessionId: null };
  };
  // Arena analyzer: two-stage digest→judge (v1). Pluggable seam — a v2 Han
  // Orchestrator flow can replace this without schema changes.
  const analyzer = createAnalyzer({
    sql,
    broker,
    log: app.log,
    config,
    localModels,
  });
  // Arena battle-runner: notified on the same onTaskTerminal hook as the flow-runner.
  const battleRunner = createBattleRunner({
    sql,
    broker,
    log: app.log,
    dispatch: dispatchContestant,
    onBattleComplete: (battleId) => {
      void analyzer.analyze(battleId);
    },
    onCrossExamStart: ({ battleId, crossExamId, identity, model }) => {
      void analyzer.crossExamine(battleId, crossExamId, { identity, model });
    },
    localModels,
  });
  // Compose onTaskTerminal: both flow-runner and battle-runner are notified.
  // Each ignores tasks it doesn't own (flow-runner checks flow_steps.task_id;
  // battle-runner checks contestants.task_id).
  const onTaskTerminal = (taskId: string, state: string): void => {
    flowRunner.handleTaskTerminal(taskId, state);
    battleRunner.handleTaskTerminal(taskId, state);
  };
  // Phase 4: dispatcher — polls tasks table and runs inference. The composed
  // onTaskTerminal hook notifies both the flow-runner and the battle-runner when
  // any task settles.
  const dispatcher = createDispatcher({
    sql,
    inference: inferenceApi,
    broker,
    log: app.log,
    config,
-    onTaskTerminal: flowRunner.handleTaskTerminal,
+    onTaskTerminal,
  });
  dispatcher.start();
-  // Phase 5: re-advance any flow_runs that were 'running' when the service last
-  // stopped (D-9). Runs AFTER dispatcher.start() so re-dispatched 'pending' tasks
+  // Re-advance in-flight flow_runs and battles after a coder restart. Both run
+  // AFTER dispatcher.start() so re-dispatched 'pending' tasks are picked up.
  // are picked up by the dispatcher's startup poll.
  void flowRunner.initResume().catch((err) => {
    app.log.error(
      { err: err instanceof Error ? err.message : String(err) },
      'flow-runner: initResume failed',
    );
  });
  void battleRunner.initResume().catch((err) => {
    app.log.error(
      { err: err instanceof Error ? err.message : String(err) },
      'arena: initResume failed',
    );
  });
  // v2.6 Phase 3: configure + start the agent-pool lifecycle sweep (idle-TTL +
  // LRU-cap eviction of warm backends, plus each backend's proactive health probe)
@@ -281,8 +372,8 @@ async function main() {
  registerTaskRoutes(app, sql, inferenceApi, dispatcher.cancelExternalTask);
  registerInboxRoutes(app, sql);
  registerStatsRoutes(app, sql);
-  registerArenaRoutes(app, sql);
  registerRunsRoutes(app, sql, flowRunner, dispatcher.cancelExternalTask);
+  registerArenaRoutes(app, sql, battleRunner, dispatcher.cancelExternalTask, config);
  registerProviderRoutes(app, sql, config);
  registerWorktreeSafetyRoutes(app, sql);
  registerLifecycleRoutes(app, sql);
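Two ideas from the `index.ts` changes above, sketched in isolation: the local-model set that holds both bare and `llama-swap/`-prefixed IDs, and the composed `onTaskTerminal` hook where each listener ignores tasks it does not own. The helper names (`buildLocalModelSet`, `composeTerminalHandlers`, `makeOwningHandler`) are hypothetical, and the in-memory `Set` stands in for the `flow_steps.task_id` and `contestants.task_id` lookups the real runners perform.

```typescript
type TerminalHandler = (taskId: string, state: string) => void;

// Includes each model ID twice: bare ('qwen3.6-35b') and prefixed
// ('llama-swap/qwen3.6-35b'), so either spelling classifies as local.
function buildLocalModelSet(ids: string[], prefix = 'llama-swap'): Set<string> {
  return new Set([...ids, ...ids.map((id) => `${prefix}/${id}`)]);
}

// Fans one terminal event out to every handler, in order.
function composeTerminalHandlers(...handlers: TerminalHandler[]): TerminalHandler {
  return (taskId, state) => {
    for (const handle of handlers) {
      handle(taskId, state);
    }
  };
}

// A handler that reacts only to tasks it owns and silently skips the rest.
// `owned` is an in-memory stand-in for a database ownership check.
function makeOwningHandler(owned: Set<string>, seen: string[]): TerminalHandler {
  return (taskId, state) => {
    if (!owned.has(taskId)) return;
    seen.push(`${taskId}:${state}`);
  };
}
```

Because every listener filters by ownership, adding a third consumer of the hook would only need one more argument to the composer rather than a change to the dispatcher.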
--- a/apps/coder/src/routes/arena.ts
+++ b/apps/coder/src/routes/arena.ts
@@ -1,136 +1,412 @@
 /**
- * v2.0.5: Arena routes — competitive dispatch of the same task to multiple agents.
+ * Arena routes — HTTP surface for the Battle UI.
 *
- * POST /api/arena        — create an arena with 2-5 contestants
- * GET  /api/arena/:id    — get all tasks in an arena
- * POST /api/arena/:id/select/:task_id — mark a task as the arena winner
+ * POST /api/battles                         — launch a battle
+ * GET  /api/battles?project_id=             — list battles for a project
+ * GET  /api/battles/:id                     — one battle + contestants + cross-exams
 * POST /api/battles/:id/stop                — cancel a running battle
 * POST /api/battles/:id/analyze             — trigger analysis (Phase 5 fills the logic)
 * POST /api/battles/:id/cross-examine       — start a cross-examination (Phase 5 fills the logic)
 *
 * Mirrors the shape of runs.ts (Orchestrator routes). Battle creation delegates to
 * the battle-runner; cancellation calls cancelBattle then aborts in-flight tasks
 * via the dispatcher's cancelExternalTask.
 */
 import type { FastifyInstance } from 'fastify';
 import { z } from 'zod';
 import { readFile } from 'node:fs/promises';
 import { join } from 'node:path';
 import type { Sql } from '../db.js';
 import type { Config } from '../config.js';
 import type { BattleRunner } from '../services/arena-runner.js';
 import type { ExternalCancelFn } from './tasks.js';
 import { arenaModelCall } from '../services/arena-model-call.js';
-const ContestantSchema = z.object({
-  agent: z.string().max(100).optional(),
-  model: z.string().max(200).optional(),
-  mode_id: z.string().max(200).optional(),
-  thinking_option_id: z.string().max(200).optional(),
+// ─── Validation schemas ───────────────────────────────────────────────────────
+
+const UuidParam = z.string().uuid();
+
+const ContestantInput = z.object({
+  identity: z.string().min(1).max(200),
+  model: z.string().min(1).max(200),
 });
-const CreateArenaBody = z.object({
+const CreateBattleBody = z.object({
  project_id: z.string().uuid(),
-  input: z.string().min(1).max(64_000),
-  contestants: z.array(ContestantSchema).min(2).max(5),
+  battle_type: z.enum(['coding', 'qa']),
+  prompt: z.string().min(1).max(64_000),
+  contestants: z
+    .array(ContestantInput)
+    .min(2, 'at least 2 contestants required')
+    .max(6, 'at most 6 contestants allowed'),
 });
-interface TaskRow {
+const ListBattlesQuery = z.object({
-  id: string;
+  project_id: z.string().uuid(),
-  agent: string | null;
+});
  model: string | null;
  mode_id: string | null;
  thinking_option_id: string | null;
  state: string;
 }
-export function registerArenaRoutes(app: FastifyInstance, sql: Sql): void {
+const CrossExamineBody = z.object({
-  // POST /api/arena — create a new arena
+  identity: z.string().min(1).max(200),
-  app.post('/api/arena', async (req, reply) => {
+  model: z.string().min(1).max(200),
-    const parsed = CreateArenaBody.safeParse(req.body);
+});
 const SetWinnerBody = z.object({
  winner_contestant_id: z.string().uuid().nullable(),
 });
 const GeneratePromptBody = z.object({
  description: z.string().min(1).max(2_000),
 });
 // ─── Route registration ───────────────────────────────────────────────────────
 export function registerArenaRoutes(
  app: FastifyInstance,
  sql: Sql,
  battleRunner: BattleRunner,
  cancelExternal: ExternalCancelFn,
  config: Config,
 ): void {
  // POST /api/battles/generate-prompt — draft a fuller battle prompt from a
  // short description using the default BooChat model. One-shot, non-streaming.
  // Must be registered BEFORE /api/battles/:id so the literal 'generate-prompt'
  // path is not mistaken for a UUID param.
  app.post('/api/battles/generate-prompt', async (req, reply) => {
    const parsed = GeneratePromptBody.safeParse(req.body);
    if (!parsed.success) {
      reply.code(400);
      return { error: 'invalid body', details: parsed.error.flatten() };
    }
-    const { project_id, input, contestants } = parsed.data;
+    const { description } = parsed.data;
    const arenaId = crypto.randomUUID();
-    const tasks: TaskRow[] = [];
+    try {
-    for (const contestant of contestants) {
+      const prompt = await arenaModelCall({
-      const [task] = await sql<TaskRow[]>`
+        config,
-        INSERT INTO tasks (project_id, input, agent, model, mode_id, thinking_option_id, arena_id)
+        model: config.DEFAULT_MODEL,
-        VALUES (
+        system: [
-          ${project_id},
+          'You are a battle-prompt writer for an AI Arena.',
-          ${input},
+          'The user gives you a short description of a coding or Q&A challenge.',
-          ${contestant.agent ?? null},
+          'Expand it into a clear, self-contained prompt (2–6 sentences) that any AI model can act on.',
-          ${contestant.model ?? null},
+          'Include specific acceptance criteria where helpful.',
-          ${contestant.mode_id ?? null},
+          'Output ONLY the prompt — no preamble, no labels, no meta-commentary.',
-          ${contestant.thinking_option_id ?? null},
+        ].join(' '),
-          ${arenaId}
+        user: description,
-        )
+        maxTokens: 400,
-        RETURNING id, agent, model, mode_id, thinking_option_id, state
+        temperature: 0.6,
-      `;
+      });
-      tasks.push(task!);
+      return { prompt };
    } catch (err) {
      app.log.warn(
        { err: err instanceof Error ? err.message : String(err) },
        'arena generate-prompt: model call failed',
      );
      reply.code(502);
      return { error: 'model call failed' };
    }
  });
  // POST /api/battles — launch a battle
  app.post('/api/battles', async (req, reply) => {
    const parsed = CreateBattleBody.safeParse(req.body);
    if (!parsed.success) {
      reply.code(400);
      return { error: 'invalid body', details: parsed.error.flatten() };
    }
    const { project_id, battle_type, prompt, contestants } = parsed.data;
    // Reject duplicate (identity, model) pairs up front — the schema UNIQUE
    // constraint would catch it too, but an early 422 is friendlier.
    const seen = new Set<string>();
    for (const c of contestants) {
      // JSON-encode the pair so a '::' inside either field cannot cause a false match.
      const key = JSON.stringify([c.identity, c.model]);
      if (seen.has(key)) {
        reply.code(422);
        return {
          error: 'duplicate_contestant',
          message: `duplicate contestant: identity="${c.identity}" model="${c.model}"`,
        };
      }
      seen.add(key);
    }
    // Verify project exists
    const [proj] = await sql<{ id: string }[]>`SELECT id FROM projects WHERE id = ${project_id}`;
    if (!proj) {
      reply.code(404);
      return { error: 'project not found' };
    }
    const { battleId } = await battleRunner.startBattle({
      projectId: project_id,
      battleType: battle_type,
      prompt,
      contestants,
    });
    reply.code(201);
-    return {
+    return { battle_id: battleId };
      arena_id: arenaId,
      tasks: tasks.map((t) => ({
        id: t.id,
        agent: t.agent,
        model: t.model,
        mode_id: t.mode_id,
        thinking_option_id: t.thinking_option_id,
        state: t.state,
      })),
    };
  });
-  // GET /api/arena/:arena_id — list all tasks in an arena
+  // GET /api/battles?project_id= — list battles, most-recent-first
-  app.get<{ Params: { arena_id: string } }>('/api/arena/:arena_id', async (req, reply) => {
+  app.get('/api/battles', async (req, reply) => {
-    const { arena_id } = req.params;
+    const parsed = ListBattlesQuery.safeParse(req.query);
-
+    if (!parsed.success) {
    // Validate UUID format
    const uuidRegex = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
    if (!uuidRegex.test(arena_id)) {
      reply.code(400);
-      return { error: 'invalid arena_id format' };
+      return { error: 'invalid query', details: parsed.error.flatten() };
    }
-    const tasks = await sql`
+    const battles = await sql`
-      SELECT id, project_id, state, input, output_summary, agent, model, mode_id, thinking_option_id, execution_path, session_id, started_at, ended_at, created_at, arena_id
+      SELECT id, project_id, battle_type, prompt, status,
-      FROM tasks
+             winner_contestant_id, results_path, error,
-      WHERE arena_id = ${arena_id}
+             created_at, updated_at
-      ORDER BY created_at
+      FROM battles
      WHERE project_id = ${parsed.data.project_id}
      ORDER BY created_at DESC
      LIMIT 100
    `;
-    if (tasks.length === 0) {
+    return { battles };
      reply.code(404);
      return { error: 'arena not found' };
    }
    return { arena_id, tasks };
  });
-  // POST /api/arena/:arena_id/select/:task_id — mark the winner
+  // GET /api/battles/:id — one battle + its contestants + cross-examinations
-  app.post<{ Params: { arena_id: string; task_id: string } }>(
+  app.get<{ Params: { id: string } }>('/api/battles/:id', async (req, reply) => {
-    '/api/arena/:arena_id/select/:task_id',
+    const parsedId = UuidParam.safeParse(req.params.id);
-    async (req, reply) => {
+    if (!parsedId.success) {
-      const { arena_id, task_id } = req.params;
+      reply.code(400);
-
+      return { error: 'invalid id' };
      // Verify the task belongs to this arena
      const rows = await sql<{ id: string; state: string; arena_id: string | null }[]>`
        SELECT id, state, arena_id FROM tasks WHERE id = ${task_id}
      `;
      if (rows.length === 0) {
        reply.code(404);
        return { error: 'task not found' };
      }
      const task = rows[0]!;
      if (task.arena_id !== arena_id) {
        reply.code(409);
        return { error: 'task does not belong to this arena' };
      }
      // Mark as selected via output_summary prefix (lightweight — no schema change)
      await sql`
        UPDATE tasks
        SET output_summary = COALESCE('[SELECTED] ' || output_summary, '[SELECTED]')
        WHERE id = ${task_id}
      `;
      return { selected: true, task_id, arena_id };
    }
-  );
+    const id = parsedId.data;
    const [battle] = await sql<{
      id: string;
      project_id: string;
      battle_type: string;
      prompt: string;
      status: string;
      winner_contestant_id: string | null;
      results_path: string | null;
      error: string | null;
      created_at: unknown;
      updated_at: unknown;
    }[]>`
      SELECT id, project_id, battle_type, prompt, status,
             winner_contestant_id, results_path, error,
             created_at, updated_at
      FROM battles WHERE id = ${id}
    `;
    if (!battle) {
      reply.code(404);
      return { error: 'battle not found' };
    }
    const contestants = await sql`
      SELECT id, battle_id, identity, model, lane, task_id, worktree_id,
             status, duration_ms, tokens_per_sec, cost_tokens, result_path, error,
             created_at, updated_at
      FROM contestants
      WHERE battle_id = ${id}
      ORDER BY created_at ASC
    `;
    const crossExaminations = await sql`
      SELECT id, battle_id, identity, model, verdict, created_at
      FROM cross_examinations
      WHERE battle_id = ${id}
      ORDER BY created_at ASC
    `;
    return { battle, contestants, cross_examinations: crossExaminations };
  });
  // POST /api/battles/:id/stop — cancel a running battle
  app.post<{ Params: { id: string } }>('/api/battles/:id/stop', async (req, reply) => {
    const parsedId = UuidParam.safeParse(req.params.id);
    if (!parsedId.success) {
      reply.code(400);
      return { error: 'invalid id' };
    }
    const id = parsedId.data;
    const [row] = await sql<{ id: string; status: string }[]>`
      SELECT id, status FROM battles WHERE id = ${id}
    `;
    if (!row) {
      reply.code(404);
      return { error: 'battle not found' };
    }
    if (row.status !== 'running') {
      reply.code(409);
      return { error: `cannot stop battle in status '${row.status}'` };
    }
    const { cancelled, taskIds } = await battleRunner.cancelBattle(id);
    if (!cancelled) {
      reply.code(409);
      return { error: 'battle is no longer running' };
    }
    // Abort any in-flight dispatcher tasks (cloud contestants running externally).
    for (const taskId of taskIds) {
      cancelExternal(taskId);
    }
    return { cancelled: true };
  });
  // GET /api/battles/:id/analysis — read analysis.md from the battle's results_path
  app.get<{ Params: { id: string } }>('/api/battles/:id/analysis', async (req, reply) => {
    const parsedId = UuidParam.safeParse(req.params.id);
    if (!parsedId.success) {
      reply.code(400);
      return { error: 'invalid id' };
    }
    const id = parsedId.data;
    const [row] = await sql<{ results_path: string | null }[]>`
      SELECT results_path FROM battles WHERE id = ${id}
    `;
    if (!row) {
      reply.code(404);
      return { error: 'battle not found' };
    }
    if (!row.results_path) {
      reply.code(404);
      return { error: 'analysis not ready' };
    }
    try {
      const text = await readFile(join(row.results_path, 'analysis.md'), 'utf8');
      return { text };
    } catch {
      reply.code(404);
      return { error: 'analysis not ready' };
    }
  });
  // POST /api/battles/:id/analyze — trigger or re-trigger analysis
  app.post<{ Params: { id: string } }>('/api/battles/:id/analyze', async (req, reply) => {
    const parsedId = UuidParam.safeParse(req.params.id);
    if (!parsedId.success) {
      reply.code(400);
      return { error: 'invalid id' };
    }
    const id = parsedId.data;
    const [row] = await sql<{ id: string; status: string }[]>`
      SELECT id, status FROM battles WHERE id = ${id}
    `;
    if (!row) {
      reply.code(404);
      return { error: 'battle not found' };
    }
    if (row.status === 'running') {
      reply.code(409);
      return { error: 'battle is still running — wait for all contestants to finish' };
    }
    const result = await battleRunner.triggerAnalysis(id);
    if (!result.triggered) {
      reply.code(404);
      return { error: 'battle not found' };
    }
    reply.code(202);
    return { triggered: true };
  });
  // PATCH /api/battles/:id/winner — manually set or clear the winner.
  // Validates the contestant belongs to the battle; publishes battle_updated so
  // the pane badge reflects the override immediately. Human is authoritative.
  app.patch<{ Params: { id: string } }>('/api/battles/:id/winner', async (req, reply) => {
    const parsedId = UuidParam.safeParse(req.params.id);
    if (!parsedId.success) {
      reply.code(400);
      return { error: 'invalid id' };
    }
    const parsed = SetWinnerBody.safeParse(req.body);
    if (!parsed.success) {
      reply.code(400);
      return { error: 'invalid body', details: parsed.error.flatten() };
    }
    const result = await battleRunner.setWinner(parsedId.data, parsed.data.winner_contestant_id);
    if (!result.ok) {
      if (result.notFound) {
        reply.code(404);
        return { error: 'battle not found' };
      }
      if (result.invalidContestant) {
        reply.code(422);
        return { error: 'contestant not found in this battle' };
      }
      reply.code(500);
      return { error: 'unknown error' };
    }
    return { ok: true };
  });
  // GET /api/battles/:id/contestants/:cid/diff — read the diff.patch for a coding contestant.
  app.get<{ Params: { id: string; cid: string } }>('/api/battles/:id/contestants/:cid/diff', async (req, reply) => {
    const parsedId = UuidParam.safeParse(req.params.id);
    const parsedCid = UuidParam.safeParse(req.params.cid);
    if (!parsedId.success || !parsedCid.success) {
      reply.code(400);
      return { error: 'invalid id' };
    }
    const [contestant] = await sql<{ result_path: string | null }[]>`
      SELECT result_path FROM contestants
      WHERE id = ${parsedCid.data} AND battle_id = ${parsedId.data}
    `;
    if (!contestant) {
      reply.code(404);
      return { error: 'contestant not found' };
    }
    if (!contestant.result_path) {
      reply.code(404);
      return { error: 'diff not available' };
    }
    try {
      const text = await readFile(join(contestant.result_path, 'diff.patch'), 'utf8');
      return { diff: text };
    } catch {
      reply.code(404);
      return { error: 'diff not available' };
    }
  });
  // POST /api/battles/:id/cross-examine — start a cross-examination
  app.post<{ Params: { id: string } }>('/api/battles/:id/cross-examine', async (req, reply) => {
    const parsedId = UuidParam.safeParse(req.params.id);
    if (!parsedId.success) {
      reply.code(400);
      return { error: 'invalid id' };
    }
    const id = parsedId.data;
    const parsed = CrossExamineBody.safeParse(req.body);
    if (!parsed.success) {
      reply.code(400);
      return { error: 'invalid body', details: parsed.error.flatten() };
    }
    const [row] = await sql<{ id: string; status: string }[]>`
      SELECT id, status FROM battles WHERE id = ${id}
    `;
    if (!row) {
      reply.code(404);
      return { error: 'battle not found' };
    }
    if (row.status === 'running') {
      reply.code(409);
      return { error: 'battle is still running — cross-examine after all contestants finish' };
    }
    const { crossExamId } = await battleRunner.startCrossExam(id, {
      identity: parsed.data.identity,
      model: parsed.data.model,
    });
    reply.code(202);
    return { cross_exam_id: crossExamId };
  });
 }
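The duplicate-contestant guard in `POST /api/battles` is plain set bookkeeping, so it could also be written as a pure function and unit-tested alongside the other arena helpers. The block below is a minimal sketch under that assumption. `findDuplicateContestant` and the local `ContestantInput` interface are hypothetical names for illustration and are not part of this diff.

```typescript
// Hypothetical shape mirroring the two fields the route validates.
interface ContestantInput {
  identity: string;
  model: string;
}

// Returns the first contestant whose (identity, model) pair was already seen,
// or null when every pair is unique.
function findDuplicateContestant(
  contestants: readonly ContestantInput[],
): ContestantInput | null {
  const seen = new Set<string>();
  for (const c of contestants) {
    // A JSON-encoded tuple keeps the two fields unambiguous even when either
    // one contains the characters a plain separator would use.
    const key = JSON.stringify([c.identity, c.model]);
    if (seen.has(key)) return c;
    seen.add(key);
  }
  return null;
}
```

A route handler could call this once and answer 422 when the result is non-null. The same identity with two different models is still treated as two distinct contestants.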
--- a/apps/coder/src/routes/messages.ts
+++ b/apps/coder/src/routes/messages.ts
@@ -4,6 +4,7 @@ import type { Sql } from '../db.js';
 import type { Broker } from '@boocode/server/broker';
 import type { WsFrame } from '@boocode/contracts/ws-frames';
 import { resolveChatId } from './chat-resolve.js';
 import { applyAll } from '../services/pending_changes.js';
 const AnswerUserInputBody = z.object({
  tool_call_id: z.string().min(1),
@@ -247,6 +248,35 @@ export function registerMessageRoutes(
      inference.enqueue(sessionId, chatId, assistantMsg!.id, 'default');
      // Bypass permission mode (native BooCode): auto-apply staged edits to disk
      // once the turn settles. `enqueue` registers synchronously, so hasActive is
      // true immediately; poll until it clears, apply, then re-publish
      // message_complete so the DiffPanel reflects the now-applied (non-pending)
      // state. Best-effort — failures stay in the pending queue for manual apply.
      if (mode_id === 'bypass') {
        const projectId = sessionRows[0]!.project_id;
        const assistantId = assistantMsg!.id;
        void (async () => {
          try {
            const [proj] = await sql<{ path: string }[]>`SELECT path FROM projects WHERE id = ${projectId}`;
            if (!proj?.path) return;
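            // Poll once per second, capped at 1200 iterations (about 20 minutes),
            // so a stuck turn cannot keep this background task alive indefinitely.
            for (let i = 0; i < 1200 && inference.hasActive(chatId); i++) {
              await new Promise((r) => setTimeout(r, 1000));
            }
            // Cap reached with the turn still active: skip auto-apply rather than
            // applying edits mid-turn; the changes stay staged for manual apply.
            if (inference.hasActive(chatId)) return;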
            const applied = await applyAll(sql, sessionId, proj.path);
            if (applied.length > 0) {
              broker.publishFrame(sessionId, {
                type: 'message_complete',
                message_id: assistantId,
                chat_id: chatId,
              } as unknown as WsFrame);
            }
          } catch {
            /* best-effort auto-apply — leave staged changes for manual apply */
          }
        })();
      }
      reply.code(202);
      return { user_message_id: userMsg!.id, assistant_message_id: assistantMsg!.id };
    },
--- a/apps/coder/src/schema.sql
+++ b/apps/coder/src/schema.sql
@@ -54,9 +54,6 @@ DO $$ BEGIN
  END IF;
 END $$;
 -- v2.0.5: arena support — group tasks into competitive arenas.
 ALTER TABLE tasks ADD COLUMN IF NOT EXISTS arena_id UUID;
 -- Human inbox: tasks needing attention
 CREATE OR REPLACE VIEW human_inbox AS
  SELECT * FROM tasks WHERE state IN ('blocked', 'failed');
@@ -81,6 +78,7 @@ ALTER TABLE tasks ADD COLUMN IF NOT EXISTS thinking_option_id TEXT;
 DROP VIEW IF EXISTS human_inbox;
 ALTER TABLE tasks DROP COLUMN IF EXISTS feature_values;
 ALTER TABLE tasks DROP COLUMN IF EXISTS worktree_path;
 ALTER TABLE tasks DROP COLUMN IF EXISTS arena_id;
 CREATE OR REPLACE VIEW human_inbox AS
  SELECT * FROM tasks WHERE state IN ('blocked', 'failed');
@@ -157,7 +155,7 @@ CREATE UNIQUE INDEX IF NOT EXISTS worktrees_active_path_uidx ON worktrees(path)
 DROP TABLE IF EXISTS session_worktrees;
 -- Dispatch hint: which chat (tab) a task belongs to. The coder message route and
-- skills route set it from the frontend tab; session-less creators (arena, MCP,
+-- skills route set it from the frontend tab; session-less creators (MCP,
 -- new_task, generic /api/tasks) leave it NULL and the dispatcher creates a chat.
 ALTER TABLE tasks ADD COLUMN IF NOT EXISTS chat_id UUID REFERENCES chats(id) ON DELETE SET NULL;
@@ -271,7 +269,7 @@ ALTER TABLE agent_sessions ADD CONSTRAINT agent_sessions_backend_chk
  CHECK (backend IN ('opencode_server', 'acp_warm', 'claude_sdk'));
 -- LISTEN/NOTIFY fast path: every tasks INSERT (from any call site — routes,
-- new_task tool, arena, MCP server) fires pg_notify('tasks_new') in the same
+-- new_task tool, MCP server) fires pg_notify('tasks_new') in the same
 -- transaction, so the dispatcher reacts immediately instead of waiting for the
 -- fallback poll. Postgres holds the notification until COMMIT, so the listener
 -- always sees the committed row. A trigger covers all insert paths with no
@@ -357,3 +355,71 @@ DO $$ BEGIN
      CHECK (status IN ('pending', 'running', 'completed', 'failed', 'skipped', 'cancelled'));
  END IF;
 END $$;
 -- Arena: battles + contestants + cross_examinations.
 -- project_id carries no FK (matches tasks.project_id + flow_runs.project_id convention).
 -- winner_contestant_id gets its FK only after contestants is created (forward
 -- reference); see the guarded ALTER below.
 CREATE TABLE IF NOT EXISTS battles (
  id                   UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  project_id           UUID NOT NULL,
  battle_type          TEXT NOT NULL,
  prompt               TEXT NOT NULL,
  status               TEXT NOT NULL DEFAULT 'pending',
  winner_contestant_id UUID,
  results_path         TEXT,
  error                TEXT,
  created_at           TIMESTAMPTZ NOT NULL DEFAULT clock_timestamp(),
  updated_at           TIMESTAMPTZ NOT NULL DEFAULT clock_timestamp(),
  CONSTRAINT battles_type_chk   CHECK (battle_type IN ('coding', 'qa')),
  CONSTRAINT battles_status_chk CHECK (status IN ('pending', 'running', 'completed', 'failed', 'cancelled'))
 );
 CREATE TABLE IF NOT EXISTS contestants (
  id             UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  battle_id      UUID NOT NULL REFERENCES battles(id) ON DELETE CASCADE,
  identity       TEXT NOT NULL,
  model          TEXT NOT NULL,
  lane           TEXT NOT NULL,
  task_id        UUID REFERENCES tasks(id) ON DELETE SET NULL,
  worktree_id    UUID REFERENCES worktrees(id) ON DELETE SET NULL,
  status         TEXT NOT NULL DEFAULT 'queued',
  duration_ms    INTEGER,
  tokens_per_sec DOUBLE PRECISION,
  cost_tokens    INTEGER,
  result_path    TEXT,
  error          TEXT,
  created_at     TIMESTAMPTZ NOT NULL DEFAULT clock_timestamp(),
  updated_at     TIMESTAMPTZ NOT NULL DEFAULT clock_timestamp(),
  CONSTRAINT contestants_lane_chk   CHECK (lane   IN ('local', 'cloud')),
  CONSTRAINT contestants_status_chk CHECK (status IN ('queued', 'running', 'done', 'error')),
  UNIQUE (battle_id, identity, model)
 );
 CREATE TABLE IF NOT EXISTS cross_examinations (
  id         UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  battle_id  UUID NOT NULL REFERENCES battles(id) ON DELETE CASCADE,
  identity   TEXT NOT NULL,
  model      TEXT NOT NULL,
  verdict    TEXT,
  created_at TIMESTAMPTZ NOT NULL DEFAULT clock_timestamp()
 );
 -- Add the winner FK now that contestants exists.
 DO $$ BEGIN
  IF NOT EXISTS (SELECT 1 FROM pg_constraint WHERE conname = 'battles_winner_contestant_id_fkey') THEN
    ALTER TABLE battles ADD CONSTRAINT battles_winner_contestant_id_fkey
      FOREIGN KEY (winner_contestant_id) REFERENCES contestants(id) ON DELETE SET NULL;
  END IF;
 END $$;
 -- Battle listing per project, newest first (GET /api/battles?project_id=).
 CREATE INDEX IF NOT EXISTS battles_project_created_idx ON battles(project_id, created_at DESC);
 -- Lane-scheduler advance scans (contestants WHERE battle_id = ? AND status = ?).
 CREATE INDEX IF NOT EXISTS contestants_battle_status_idx ON contestants(battle_id, status);
 -- onTaskTerminal callback: look up the contestant owning a completed task.
 CREATE INDEX IF NOT EXISTS contestants_task_id_idx ON contestants(task_id);
 -- Cross-examination listing per battle.
 CREATE INDEX IF NOT EXISTS cross_examinations_battle_idx ON cross_examinations(battle_id);
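The CHECK constraints above define three closed value sets: battle status, contestant status, and lane. The route code currently types these columns as plain `string`. The block below is a minimal sketch of how the same sets could be mirrored on the TypeScript side. The constant names, the derived types, and the `isOneOf` guard are all hypothetical and not part of this diff. Only the literal values are copied from the schema.

```typescript
// Values copied from battles_status_chk, contestants_status_chk, contestants_lane_chk.
const BATTLE_STATUSES = ['pending', 'running', 'completed', 'failed', 'cancelled'] as const;
const CONTESTANT_STATUSES = ['queued', 'running', 'done', 'error'] as const;
const LANES = ['local', 'cloud'] as const;

type BattleStatus = (typeof BATTLE_STATUSES)[number];
type ContestantStatus = (typeof CONTESTANT_STATUSES)[number];
type Lane = (typeof LANES)[number];

// Generic type guard: narrows an arbitrary string to one of the allowed literals.
function isOneOf<T extends string>(allowed: readonly T[], value: string): value is T {
  return allowed.some((candidate) => candidate === value);
}
```

With a guard like this, a handler could narrow `row.status` before comparing it against `'running'`, so a typo in a status literal would surface at compile time.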
--- a/apps/coder/src/services/tests/arena-analyzer-helpers.test.ts
+++ b/apps/coder/src/services/tests/arena-analyzer-helpers.test.ts
@@ -0,0 +1,254 @@
 import { describe, it, expect } from 'vitest';
 import {
  buildDigestPrompt,
  buildJudgePrompt,
  buildCrossExamPrompt,
  extractWinner,
  shouldNameWinner,
  type ContestantDigest,
  type ContestantDigestInput,
 } from '../arena-analyzer-helpers.js';
 // ─── shouldNameWinner ─────────────────────────────────────────────────────────
 describe('shouldNameWinner', () => {
  it('returns false with 0 succeeded contestants', () => {
    expect(shouldNameWinner(0)).toBe(false);
  });
  it('returns false with exactly 1 succeeded contestant', () => {
    expect(shouldNameWinner(1)).toBe(false);
  });
  it('returns true with exactly 2 succeeded contestants', () => {
    expect(shouldNameWinner(2)).toBe(true);
  });
  it('returns true with more than 2 succeeded contestants', () => {
    expect(shouldNameWinner(3)).toBe(true);
    expect(shouldNameWinner(6)).toBe(true);
  });
 });
 // ─── extractWinner ────────────────────────────────────────────────────────────
 describe('extractWinner', () => {
  it('extracts identity and model from a WINNER: line', () => {
    const output = 'Some analysis\n\nWINNER: claude/opus-4-5\n\nMore text.';
    expect(extractWinner(output)).toEqual({ identity: 'claude', model: 'opus-4-5' });
  });
  it('is case-insensitive for the WINNER keyword', () => {
    expect(extractWinner('winner: boocode/qwen3.6-35b')).toEqual({
      identity: 'boocode',
      model: 'qwen3.6-35b',
    });
    expect(extractWinner('Winner: opencode/some-model')).toEqual({
      identity: 'opencode',
      model: 'some-model',
    });
  });
  it('returns null when NO_WINNER is declared', () => {
    expect(extractWinner('WINNER: NO_WINNER')).toBeNull();
    expect(extractWinner('winner: no_winner')).toBeNull();
  });
  it('returns null when no WINNER line is present', () => {
    expect(extractWinner('Just some analysis text with no verdict.')).toBeNull();
    expect(extractWinner('')).toBeNull();
  });
  it('returns null when the WINNER line has no slash separator', () => {
    expect(extractWinner('WINNER: justidentity')).toBeNull();
  });
  it('returns null when the WINNER line is empty after the colon', () => {
    expect(extractWinner('WINNER:')).toBeNull();
    expect(extractWinner('WINNER:   ')).toBeNull();
  });
  it('handles leading and trailing whitespace around the slash parts', () => {
    const result = extractWinner('WINNER:  claude / opus-4-5 ');
    expect(result).toEqual({ identity: 'claude', model: 'opus-4-5' });
  });
  it('picks the first WINNER line when multiple are present', () => {
    const output = 'WINNER: claude/opus-4-5\nWINNER: opencode/other-model';
    expect(extractWinner(output)).toEqual({ identity: 'claude', model: 'opus-4-5' });
  });
  it('handles model names that contain slashes by splitting at the first slash only', () => {
    // edge case: model name with a slash — should still split at first slash
    // identity = 'native', model = 'llama-swap/qwen3.6'
    const result = extractWinner('WINNER: native/llama-swap/qwen3.6');
    expect(result).toEqual({ identity: 'native', model: 'llama-swap/qwen3.6' });
  });
 });
 // ─── buildDigestPrompt ────────────────────────────────────────────────────────
 describe('buildDigestPrompt', () => {
  const base: ContestantDigestInput = {
    identity: 'claude',
    model: 'opus-4-5',
    resultMd: '# Output\n\nSome result content.',
    benchmarkLine: '12000ms',
  };
  it('returns an object with non-empty system and user strings', () => {
    const { system, user } = buildDigestPrompt(base);
    expect(system.length).toBeGreaterThan(0);
    expect(user.length).toBeGreaterThan(0);
  });
  it('includes the contestant identity and model in the user prompt', () => {
    const { user } = buildDigestPrompt(base);
    expect(user).toContain('claude');
    expect(user).toContain('opus-4-5');
  });
  it('includes the benchmark line in the user prompt', () => {
    const { user } = buildDigestPrompt(base);
    expect(user).toContain('12000ms');
  });
  it('includes the result.md content in the user prompt', () => {
    const { user } = buildDigestPrompt(base);
    expect(user).toContain('Some result content.');
  });
  it('includes the diff.patch when provided', () => {
    const input: ContestantDigestInput = { ...base, diffPatch: '--- a/foo.ts\n+++ b/foo.ts\n+added' };
    const { user } = buildDigestPrompt(input);
    expect(user).toContain('added');
    expect(user).toContain('```diff');
  });
  it('omits the diff section when diffPatch is undefined', () => {
    const { user } = buildDigestPrompt(base);
    expect(user).not.toContain('```diff');
  });
  it('truncates resultMd longer than 8000 characters', () => {
    const longResult = 'x'.repeat(10_000);
    const { user } = buildDigestPrompt({ ...base, resultMd: longResult });
    // The truncated content must not exceed 8000 chars in the sliced section.
    // We just check the total user string doesn't balloon unreasonably.
    expect(user.length).toBeLessThan(15_000);
  });
  it('truncates diffPatch longer than 5000 characters', () => {
    const longDiff = '+' + 'x'.repeat(10_000);
    const { user } = buildDigestPrompt({ ...base, diffPatch: longDiff });
    expect(user.length).toBeLessThan(16_000);
  });
 });
 // ─── buildJudgePrompt ─────────────────────────────────────────────────────────
 describe('buildJudgePrompt', () => {
  const digests: ContestantDigest[] = [
    { identity: 'claude', model: 'opus-4-5', digest: 'Good result.', benchmarkLine: '5000ms' },
    { identity: 'opencode', model: 'qwen3.6', digest: 'Decent result.', benchmarkLine: '8000ms' },
  ];
  it('includes the original prompt in the user section', () => {
    const { user } = buildJudgePrompt('Write a sorting algorithm', digests);
    expect(user).toContain('Write a sorting algorithm');
  });
  it('includes each contestant heading in the user section', () => {
    const { user } = buildJudgePrompt('prompt', digests);
    expect(user).toContain('claude');
    expect(user).toContain('opus-4-5');
    expect(user).toContain('opencode');
    expect(user).toContain('qwen3.6');
  });
  it('includes each contestant digest text', () => {
    const { user } = buildJudgePrompt('prompt', digests);
    expect(user).toContain('Good result.');
    expect(user).toContain('Decent result.');
  });
  it('instructs the model to name a WINNER when 2+ digests are provided', () => {
    const { system } = buildJudgePrompt('prompt', digests);
    expect(system).toContain('WINNER:');
  });
  it('instructs the model NOT to name a winner when fewer than 2 digests are provided', () => {
    const oneDigest = digests.slice(0, 1);
    const { system } = buildJudgePrompt('prompt', oneDigest);
    expect(system).toContain('NO_WINNER');
    expect(system).not.toContain('WINNER: <identity>');
  });
  it('instructs NO_WINNER when digests list is empty', () => {
    const { system } = buildJudgePrompt('prompt', []);
    expect(system).toContain('NO_WINNER');
  });
  it('truncates originalPrompt longer than 2000 characters', () => {
    const longPrompt = 'p'.repeat(5_000);
    const { user } = buildJudgePrompt(longPrompt, digests);
    // Should not contain more than 2000 chars of the prompt.
    const promptSection = user.split('# Contestant Digests')[0] ?? '';
    expect(promptSection.length).toBeLessThan(3_000);
  });
 });
 // ─── buildCrossExamPrompt ─────────────────────────────────────────────────────
 describe('buildCrossExamPrompt', () => {
  const digests: ContestantDigest[] = [
    { identity: 'claude', model: 'opus-4-5', digest: 'Strong result.', benchmarkLine: '5000ms' },
    { identity: 'boocode', model: 'qwen3.6-35b', digest: 'Decent result.', benchmarkLine: '12000ms' },
  ];
  const baseOpts = {
    originalPrompt: 'Write a sorting algorithm.',
    digests,
    analysisContent: '# Arena Analysis\n\nClaude did better.\n\nWINNER: claude/opus-4-5',
    proposedWinner: 'claude/opus-4-5',
    examinerIdentity: 'goose',
    examinerModel: 'gpt-4o',
  };
  it('includes the examiner identity and model in the system prompt', () => {
    const { system } = buildCrossExamPrompt(baseOpts);
    expect(system).toContain('goose');
    expect(system).toContain('gpt-4o');
  });
  it('includes the original prompt in the user section', () => {
    const { user } = buildCrossExamPrompt(baseOpts);
    expect(user).toContain('Write a sorting algorithm.');
  });
  it('includes each contestant digest', () => {
    const { user } = buildCrossExamPrompt(baseOpts);
    expect(user).toContain('Strong result.');
    expect(user).toContain('Decent result.');
  });
  it('includes the proposed analysis content', () => {
    const { user } = buildCrossExamPrompt(baseOpts);
    expect(user).toContain('Claude did better.');
  });
  it('includes the proposed winner when set', () => {
    const { user } = buildCrossExamPrompt(baseOpts);
    expect(user).toContain('claude/opus-4-5');
  });
  it('notes that no winner was proposed when proposedWinner is null', () => {
    const { user } = buildCrossExamPrompt({ ...baseOpts, proposedWinner: null });
    expect(user).toContain('No winner was proposed');
  });
  it('instructs the examiner to provide a VERDICT line', () => {
    const { system } = buildCrossExamPrompt(baseOpts);
    expect(system).toContain('VERDICT:');
  });
 });
--- a/apps/coder/src/services/tests/arena-decisions.test.ts
+++ b/apps/coder/src/services/tests/arena-decisions.test.ts
@@ -0,0 +1,332 @@
 import { describe, it, expect } from 'vitest';
 import {
  classifyLane,
  nextLocalContestant,
  isBattleComplete,
  computeBenchmark,
  sanitizeSlug,
  buildBattleSlug,
  buildContestantDir,
  reconcileContestantResume,
  reconcileContestants,
  type ContestantSlot,
 } from '../arena-decisions.js';
 // Local models = what the llama-swap server actually serves.
 const LOCAL_MODELS: ReadonlySet<string> = new Set([
  'qwen3.6-35b-a3b-mxfp4',
  'qwen2.5-coder-7b',
 ]);
 // ─── classifyLane ────────────────────────────────────────────────────────────
 describe('classifyLane', () => {
  it('classifies qa battles as local regardless of identity or model', () => {
    expect(classifyLane('qa', 'boocode', 'qwen3.6-35b-a3b-mxfp4', LOCAL_MODELS)).toBe('local');
    expect(classifyLane('qa', 'claude', 'claude-opus-4-5', LOCAL_MODELS)).toBe('local');
    expect(classifyLane('qa', 'Debugger', 'cloud-model', new Set())).toBe('local');
    expect(classifyLane('qa', 'opencode', 'any-model', LOCAL_MODELS)).toBe('local');
  });
  it('classifies coding contestants as local when model is in localModels', () => {
    expect(classifyLane('coding', 'boocode', 'qwen3.6-35b-a3b-mxfp4', LOCAL_MODELS)).toBe('local');
    expect(classifyLane('coding', 'opencode', 'qwen3.6-35b-a3b-mxfp4', LOCAL_MODELS)).toBe('local');
    expect(classifyLane('coding', 'qwen', 'qwen2.5-coder-7b', LOCAL_MODELS)).toBe('local');
  });
  it('classifies coding contestants as cloud when model is not in localModels', () => {
    expect(classifyLane('coding', 'claude', 'claude-opus-4-5', LOCAL_MODELS)).toBe('cloud');
    expect(classifyLane('coding', 'opencode', 'claude-opus-4-5', LOCAL_MODELS)).toBe('cloud');
    expect(classifyLane('coding', 'goose', 'gpt-4o', LOCAL_MODELS)).toBe('cloud');
    expect(classifyLane('coding', 'qwen', 'unknown-remote-model', LOCAL_MODELS)).toBe('cloud');
  });
  it('uses the injected localModels set, not a hardcoded list', () => {
    const custom = new Set(['my-local-model']);
    expect(classifyLane('coding', 'any-agent', 'my-local-model', custom)).toBe('local');
    expect(classifyLane('coding', 'boocode', 'other-model', custom)).toBe('cloud');
  });
  it('defaults to cloud for an empty localModels set', () => {
    expect(classifyLane('coding', 'boocode', 'qwen3.6-35b-a3b-mxfp4', new Set())).toBe('cloud');
    expect(classifyLane('coding', 'native', 'any-local-model', new Set())).toBe('cloud');
  });
 });
 // ─── nextLocalContestant ─────────────────────────────────────────────────────
 describe('nextLocalContestant', () => {
  it('returns null for an empty list', () => {
    expect(nextLocalContestant([])).toBeNull();
  });
  it('returns null when no local contestants are queued', () => {
    const slots: ContestantSlot[] = [
      { id: 'c1', lane: 'local', status: 'running' },
      { id: 'c2', lane: 'cloud', status: 'queued' },
    ];
    expect(nextLocalContestant(slots)).toBeNull();
  });
  it('returns the first queued local contestant in order', () => {
    const slots: ContestantSlot[] = [
      { id: 'c1', lane: 'local', status: 'done' },
      { id: 'c2', lane: 'local', status: 'queued' },
      { id: 'c3', lane: 'local', status: 'queued' },
    ];
    expect(nextLocalContestant(slots)).toBe('c2');
  });
  it('skips done/error local contestants and cloud contestants', () => {
    const slots: ContestantSlot[] = [
      { id: 'c1', lane: 'cloud', status: 'queued' },
      { id: 'c2', lane: 'local', status: 'error' },
      { id: 'c3', lane: 'local', status: 'queued' },
    ];
    expect(nextLocalContestant(slots)).toBe('c3');
  });
  it('returns null when all local contestants are done or error', () => {
    const slots: ContestantSlot[] = [
      { id: 'c1', lane: 'local', status: 'done' },
      { id: 'c2', lane: 'local', status: 'error' },
    ];
    expect(nextLocalContestant(slots)).toBeNull();
  });
 });
 // ─── isBattleComplete ────────────────────────────────────────────────────────
 describe('isBattleComplete', () => {
  it('returns false for an empty list', () => {
    expect(isBattleComplete([])).toBe(false);
  });
  it('returns true when all contestants are done', () => {
    expect(isBattleComplete([{ status: 'done' }, { status: 'done' }])).toBe(true);
  });
  it('returns true when all contestants are error', () => {
    expect(isBattleComplete([{ status: 'error' }, { status: 'error' }])).toBe(true);
  });
  it('returns true for a mixed done/error result', () => {
    expect(isBattleComplete([{ status: 'done' }, { status: 'error' }, { status: 'done' }])).toBe(true);
  });
  it('returns false while any contestant is still running', () => {
    expect(isBattleComplete([{ status: 'done' }, { status: 'running' }])).toBe(false);
  });
  it('returns false while any contestant is still queued', () => {
    expect(isBattleComplete([{ status: 'done' }, { status: 'queued' }])).toBe(false);
  });
 });
 // ─── computeBenchmark ────────────────────────────────────────────────────────
 describe('computeBenchmark', () => {
  const t0 = new Date('2026-06-06T10:00:00.000Z');
  const t1 = new Date('2026-06-06T10:00:05.000Z'); // +5 000ms
  it('computes duration in ms for both lanes', () => {
    const local = computeBenchmark(t0, t1, 100, 'local');
    expect(local.durationMs).toBe(5000);
    const cloud = computeBenchmark(t0, t1, null, 'cloud');
    expect(cloud.durationMs).toBe(5000);
  });
  it('computes tokens/sec for local lane when costTokens is known', () => {
    const bench = computeBenchmark(t0, t1, 500, 'local');
    expect(bench.tokensPerSec).toBeCloseTo(100, 5); // 500 / 5 = 100 tok/s
  });
  it('omits tokens/sec for cloud lane regardless of costTokens', () => {
    const bench = computeBenchmark(t0, t1, 500, 'cloud');
    expect(bench.tokensPerSec).toBeNull();
  });
  it('omits tokens/sec for local lane when costTokens is null', () => {
    const bench = computeBenchmark(t0, t1, null, 'local');
    expect(bench.tokensPerSec).toBeNull();
  });
  it('returns durationMs = 0 and null tokensPerSec when timestamps are equal', () => {
    const bench = computeBenchmark(t0, t0, 100, 'local');
    expect(bench.durationMs).toBe(0);
    expect(bench.tokensPerSec).toBeNull();
  });
  it('clamps negative duration to 0 (clock skew)', () => {
    const bench = computeBenchmark(t1, t0, 50, 'local');
    expect(bench.durationMs).toBe(0);
    expect(bench.tokensPerSec).toBeNull();
  });
 });
 // ─── sanitizeSlug ────────────────────────────────────────────────────────────
 describe('sanitizeSlug', () => {
  it('lowercases and preserves alphanumeric + hyphens', () => {
    expect(sanitizeSlug('claude')).toBe('claude');
    expect(sanitizeSlug('claude-opus-4-5')).toBe('claude-opus-4-5');
  });
  it('replaces spaces and special characters with hyphens', () => {
    expect(sanitizeSlug('Code Reviewer')).toBe('code-reviewer');
    expect(sanitizeSlug('native/boocode')).toBe('native-boocode');
    expect(sanitizeSlug('qwen2.5-coder-35b')).toBe('qwen2-5-coder-35b');
  });
  it('collapses consecutive non-alphanumeric runs to a single hyphen', () => {
    expect(sanitizeSlug('foo  bar---baz')).toBe('foo-bar-baz');
  });
  it('strips leading and trailing hyphens', () => {
    expect(sanitizeSlug('---foo---')).toBe('foo');
  });
  it('truncates to 64 characters', () => {
    const long = 'a'.repeat(100);
    expect(sanitizeSlug(long).length).toBe(64);
  });
 });
 // ─── buildBattleSlug ─────────────────────────────────────────────────────────
 describe('buildBattleSlug', () => {
  it('builds a deterministic dated slug from id, type, and createdAt', () => {
    const id = 'a1b2c3d4-e5f6-7890-abcd-ef1234567890';
    const createdAt = new Date('2026-06-06T12:00:00.000Z');
    const slug = buildBattleSlug(id, 'coding', createdAt);
    expect(slug).toBe('2026-06-06-coding-a1b2c3d4');
  });
  it('includes the battle type in the slug', () => {
    const id = 'aaaaaaaa-0000-0000-0000-000000000000';
    const createdAt = new Date('2026-01-01T00:00:00.000Z');
    expect(buildBattleSlug(id, 'qa', createdAt)).toContain('-qa-');
    expect(buildBattleSlug(id, 'coding', createdAt)).toContain('-coding-');
  });
  it('uses the first 8 hex chars of the uuid (dashes stripped)', () => {
    const id = 'deadbeef-0000-0000-0000-000000000000';
    const slug = buildBattleSlug(id, 'coding', new Date('2026-06-06T00:00:00Z'));
    expect(slug.endsWith('-deadbeef')).toBe(true);
  });
 });
 // ─── buildContestantDir ──────────────────────────────────────────────────────
 describe('buildContestantDir', () => {
  it('joins sanitized identity and model with a hyphen', () => {
    expect(buildContestantDir('claude', 'claude-opus-4-5')).toBe('claude-claude-opus-4-5');
  });
  it('sanitizes both parts independently', () => {
    expect(buildContestantDir('Code Reviewer', 'qwen2.5-35b')).toBe('code-reviewer-qwen2-5-35b');
  });
 });
 // ─── reconcileContestantResume ───────────────────────────────────────────────
 describe('reconcileContestantResume', () => {
  it('keeps non-running contestants regardless of task state', () => {
    for (const status of ['queued', 'done', 'error']) {
      expect(reconcileContestantResume(status, 'tid', 'completed')).toBe('keep');
      expect(reconcileContestantResume(status, null, null)).toBe('keep');
    }
  });
  it('re-dispatches a running contestant with no task_id', () => {
    expect(reconcileContestantResume('running', null, null)).toBe('re-dispatch');
  });
  it('re-dispatches a running contestant whose task row is absent', () => {
    expect(reconcileContestantResume('running', 'tid', null)).toBe('re-dispatch');
  });
  it('marks done when the task completed before the terminal callback ran', () => {
    expect(reconcileContestantResume('running', 'tid', 'completed')).toBe('mark-done');
  });
  it('marks error when the task failed', () => {
    expect(reconcileContestantResume('running', 'tid', 'failed')).toBe('mark-error');
  });
  it('marks cancelled when the task was cancelled', () => {
    expect(reconcileContestantResume('running', 'tid', 'cancelled')).toBe('mark-cancelled');
  });
  it('keeps a running contestant whose task is pending (dispatcher handles it)', () => {
    expect(reconcileContestantResume('running', 'tid', 'pending')).toBe('keep');
  });
  it('re-dispatches when the task is stuck running (process died)', () => {
    expect(reconcileContestantResume('running', 'tid', 'running')).toBe('re-dispatch');
  });
  it('re-dispatches when the task is blocked (permission dialog gone on restart)', () => {
    expect(reconcileContestantResume('running', 'tid', 'blocked')).toBe('re-dispatch');
  });
 });
 // ─── reconcileContestants ────────────────────────────────────────────────────
 describe('reconcileContestants', () => {
  it('returns one decision per contestant', () => {
    const contestants = [
      { contestantId: 'c1', taskId: null, status: 'done' },
      { contestantId: 'c2', taskId: 't1', status: 'running' },
      { contestantId: 'c3', taskId: 't2', status: 'running' },
    ];
    const taskStates = new Map([['t1', 'completed'], ['t2', 'running']]);
    const decisions = reconcileContestants(contestants, taskStates);
    expect(decisions).toHaveLength(3);
    expect(decisions[0]).toEqual({ contestantId: 'c1', action: 'keep' });
    expect(decisions[1]).toEqual({ contestantId: 'c2', action: 'mark-done' });
    expect(decisions[2]).toEqual({ contestantId: 'c3', action: 're-dispatch' });
  });
  it('re-dispatches a running contestant whose taskId is absent from taskStates', () => {
    const contestants = [{ contestantId: 'c1', taskId: 'orphan', status: 'running' }];
    const decisions = reconcileContestants(contestants, new Map());
    expect(decisions[0]?.action).toBe('re-dispatch');
  });
  it('re-dispatches a running contestant with null taskId', () => {
    const contestants = [{ contestantId: 'c1', taskId: null, status: 'running' }];
    const decisions = reconcileContestants(contestants, new Map());
    expect(decisions[0]?.action).toBe('re-dispatch');
  });
  it('returns empty array for no contestants', () => {
    expect(reconcileContestants([], new Map())).toEqual([]);
  });
  it('keeps a running contestant whose task is pending', () => {
    const contestants = [{ contestantId: 'c1', taskId: 't1', status: 'running' }];
    const taskStates = new Map([['t1', 'pending']]);
    const decisions = reconcileContestants(contestants, taskStates);
    expect(decisions[0]?.action).toBe('keep');
  });
  it('handles a mixed battle: done/queued kept, stale running re-dispatched', () => {
    const contestants = [
      { contestantId: 'c1', taskId: 't1', status: 'done' },
      { contestantId: 'c2', taskId: null, status: 'queued' },
      { contestantId: 'c3', taskId: 't2', status: 'running' },
      { contestantId: 'c4', taskId: 't3', status: 'running' },
    ];
    const taskStates = new Map([
      ['t1', 'completed'],
      ['t2', 'running'],  // stuck — process dead
      ['t3', 'pending'],  // dispatcher will handle
    ]);
    const decisions = reconcileContestants(contestants, taskStates);
    expect(decisions.find((d) => d.contestantId === 'c1')?.action).toBe('keep');
    expect(decisions.find((d) => d.contestantId === 'c2')?.action).toBe('keep');
    expect(decisions.find((d) => d.contestantId === 'c3')?.action).toBe('re-dispatch');
    expect(decisions.find((d) => d.contestantId === 'c4')?.action).toBe('keep');
  });
 });
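
The resume tests above pin down a small decision table for contestants found in `running` state after a restart. The block below is a minimal sketch of functions that would satisfy those tests. It is not the code in `arena-decisions.ts`, which this hunk does not show: the names `reconcileResumeSketch`, `reconcileAllSketch`, and `ResumeAction` are hypothetical, and the handling of unknown task states is an assumption.

```typescript
// Hypothetical sketch: NOT the code in arena-decisions.ts, only functions
// shaped to satisfy the resume tests above.
type ResumeAction = 'keep' | 're-dispatch' | 'mark-done' | 'mark-error' | 'mark-cancelled';

function reconcileResumeSketch(
  contestantStatus: string,
  taskId: string | null,
  taskState: string | null,
): ResumeAction {
  // Only contestants left in 'running' can be stale after a restart.
  if (contestantStatus !== 'running') return 'keep';
  // No task id, or no matching task row: nothing is in flight, so run it again.
  if (taskId === null || taskState === null) return 're-dispatch';
  switch (taskState) {
    case 'completed':
      return 'mark-done';
    case 'failed':
      return 'mark-error';
    case 'cancelled':
      return 'mark-cancelled';
    case 'pending':
      // The dispatcher will still pick this task up.
      return 'keep';
    case 'running':
    case 'blocked':
      // The owning process or permission dialog is gone after a restart.
      return 're-dispatch';
    default:
      // Assumption: unknown task states are left untouched.
      return 'keep';
  }
}

// Batch form: look each task id up in a map of task states.
function reconcileAllSketch(
  contestants: { contestantId: string; taskId: string | null; status: string }[],
  taskStates: Map<string, string>,
): { contestantId: string; action: ResumeAction }[] {
  return contestants.map((c) => ({
    contestantId: c.contestantId,
    action: reconcileResumeSketch(
      c.status,
      c.taskId,
      c.taskId === null ? null : (taskStates.get(c.taskId) ?? null),
    ),
  }));
}
```

Keeping the decision a pure function of three values is what lets the tests above cover every branch without a database.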
--- a/apps/coder/src/services/acp-derive.ts
+++ b/apps/coder/src/services/acp-derive.ts
@@ -68,11 +68,18 @@ export function deriveModesFromACP(
 ): { modes: ProviderMode[]; currentModeId: string | null } {
  if (modeState?.availableModes?.length) {
    return {
-      modes: modeState.availableModes.map((mode) => ({
-        id: mode.id,
-        label: mode.name,
-        description: mode.description ?? undefined,
-      })),
+      // ACP omits the unattended flag; inherit it from the manifest fallback by
+      // id so the unified permission picker can still detect each agent's bypass
+      // mode (e.g. opencode `full-access`) from live-probed modes.
+      modes: modeState.availableModes.map((mode) => {
+        const fb = fallbackModes.find((f) => f.id === mode.id);
+        return {
+          id: mode.id,
+          label: mode.name,
+          description: mode.description ?? undefined,
+          ...(fb?.isUnattended ? { isUnattended: true } : {}),
+        };
+      }),
      currentModeId: modeState.currentModeId ?? null,
    };
  }
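
The hunk above copies one flag from the manifest fallback list onto the live-probed modes, matching by id. The block below is a minimal self-contained sketch of that merge. `LiveModeSketch` and `ModeSketch` are simplified stand-ins for the ACP mode state and `ProviderMode` (assumptions, not the real types), and `mergeUnattendedSketch` is a hypothetical name.

```typescript
// Simplified stand-ins for the real types (assumptions for illustration).
interface LiveModeSketch {
  id: string;
  name: string;
  description?: string | null;
}
interface ModeSketch {
  id: string;
  label: string;
  description?: string | undefined;
  isUnattended?: boolean;
}

// Map live modes to the provider shape, inheriting isUnattended from the
// fallback entry with the same id. The key is omitted when the flag is unset.
function mergeUnattendedSketch(live: LiveModeSketch[], fallback: ModeSketch[]): ModeSketch[] {
  return live.map((mode) => {
    const fb = fallback.find((f) => f.id === mode.id);
    return {
      id: mode.id,
      label: mode.name,
      description: mode.description ?? undefined,
      ...(fb?.isUnattended ? { isUnattended: true } : {}),
    };
  });
}
```

The conditional spread means a mode without the flag carries no `isUnattended` key at all, rather than an explicit `false`.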
--- a/apps/coder/src/services/arena-analyzer-helpers.ts
+++ b/apps/coder/src/services/arena-analyzer-helpers.ts
@@ -0,0 +1,191 @@
 /**
 * Pure, side-effect-free helpers for the Arena analyzer.
 * No DB, no IO, no network — safe to unit-test directly.
 *
 * Covers: digest-prompt assembly, judge-prompt assembly, winner extraction
 * from the judge output, the <2-survivors no-winner rule, and the
 * cross-examination prompt.
 */
 // ─── Shared types ─────────────────────────────────────────────────────────────
 export interface ContestantDigestInput {
  identity: string;
  model: string;
  resultMd: string;
  diffPatch?: string;
  benchmarkLine: string;
 }
 export interface ContestantDigest {
  identity: string;
  model: string;
  digest: string;
  benchmarkLine: string;
 }
 // ─── Digest stage ─────────────────────────────────────────────────────────────
 /**
 * Build the system + user prompts for the per-contestant digest call.
 * The digest is a short structured summary; it keeps each call's context small
 * so the downstream judge only sees digests (not raw diffs).
 */
 export function buildDigestPrompt(input: ContestantDigestInput): { system: string; user: string } {
  const system =
    'You are an expert technical analyst evaluating the output of an AI coding or Q&A battle. ' +
    'Produce a concise structured digest (under 300 words, Markdown bullet points) covering: ' +
    '(1) correctness and quality, (2) completeness, (3) notable strengths, (4) notable weaknesses or issues. ' +
    'Do not reference the battle or other contestants — focus only on this submission.';
  const parts: string[] = [
    `# Contestant: ${input.identity} / ${input.model}`,
    `\nBenchmark: ${input.benchmarkLine}`,
    '\n## Result\n',
    input.resultMd.slice(0, 8_000),
  ];
  if (input.diffPatch) {
    parts.push('\n## Code Changes (diff)\n```diff');
    parts.push(input.diffPatch.slice(0, 5_000));
    parts.push('```');
  }
  return { system, user: parts.join('\n') };
 }
 // ─── Judge stage ──────────────────────────────────────────────────────────────
 /**
 * Build the system + user prompts for the comparative judge call.
 * Receives contestant digests (NOT raw diffs) to keep context bounded.
 *
 * The judge output must contain a line starting with WINNER: or NO_WINNER.
 * The caller extracts it with extractWinner().
 */
 export function buildJudgePrompt(
  originalPrompt: string,
  digests: ContestantDigest[],
 ): { system: string; user: string } {
  const canName = shouldNameWinner(digests.length);
  const winnerInstruction = canName
    ? 'After your comparative analysis, name the best submission on its own line in this exact format:\n' +
      'WINNER: <identity>/<model>\n' +
      'where <identity> and <model> exactly match the heading above. No other text on that line.'
    : 'Fewer than 2 contestants succeeded. Do NOT name a winner. Write the following on its own line:\nNO_WINNER';
  const system =
    'You are an expert judge for an AI battle. You have received digest summaries of each ' +
    "contestant's work on the same task. Write a comparative analysis, then follow these instructions:\n" +
    winnerInstruction;
  const parts: string[] = [
    '# Original Task Prompt\n',
    originalPrompt.slice(0, 2_000),
    '\n# Contestant Digests\n',
  ];
  for (const d of digests) {
    parts.push(`\n## ${d.identity} / ${d.model}`);
    parts.push(`Benchmark: ${d.benchmarkLine}`);
    parts.push(d.digest);
  }
  parts.push(
    '\n# Instructions\nCompare the contestants and follow the winner-naming instructions above.',
  );
  return { system, user: parts.join('\n') };
 }
 // ─── No-winner rule ───────────────────────────────────────────────────────────
 /**
 * Returns true when enough contestants succeeded to name a winner.
 * Rule: at least 2 must have produced a result. With 0 or 1 success the
 * analysis must NOT name a winner (no meaningful comparison possible).
 */
 export function shouldNameWinner(succeededCount: number): boolean {
  return succeededCount >= 2;
 }
 // ─── Winner extraction ────────────────────────────────────────────────────────
 /**
 * Parse the judge's text output and extract the declared winner.
 * Looks for a line matching: WINNER: <identity>/<model>
  * Returns null when no line starts with WINNER: (a bare NO_WINNER line falls in
  * this case), when the value is empty or NO_WINNER, or when it has no slash.
 *
 * The parse is lenient on surrounding whitespace and case for the keyword.
 */
 export function extractWinner(judgeOutput: string): { identity: string; model: string } | null {
  for (const line of judgeOutput.split('\n')) {
    const trimmed = line.trim();
    if (!trimmed.toUpperCase().startsWith('WINNER:')) continue;
    const rest = trimmed.slice('WINNER:'.length).trim();
    if (rest.toUpperCase() === 'NO_WINNER' || rest === '') return null;
    const slashIdx = rest.indexOf('/');
    if (slashIdx === -1) return null;
    const identity = rest.slice(0, slashIdx).trim();
    const model = rest.slice(slashIdx + 1).trim();
    if (identity && model) return { identity, model };
  }
  return null;
 }
 // ─── Cross-examination stage ──────────────────────────────────────────────────
 /**
 * Build the system + user prompts for a cross-examination call.
 * The cross-examiner sees the original prompt, contestant digests, and the
 * proposed analysis, and is asked to challenge the result.
 */
 export function buildCrossExamPrompt(opts: {
  originalPrompt: string;
  digests: ContestantDigest[];
  analysisContent: string;
  proposedWinner: string | null;
  examinerIdentity: string;
  examinerModel: string;
 }): { system: string; user: string } {
  const system =
    `You are ${opts.examinerIdentity} (model: ${opts.examinerModel}), acting as an independent ` +
    'cross-examiner in an AI battle. Your role is to critically challenge the proposed analysis ' +
    'and winner, then give your own verdict. Be rigorous but fair. ' +
    'End your response with your verdict on its own line:\n' +
    'VERDICT: <identity>/<model>  — if you agree or disagree with the proposed winner but can name one\n' +
    'VERDICT: NO_WINNER  — if no clear winner exists';
  const parts: string[] = [
    '# Original Task Prompt\n',
    opts.originalPrompt.slice(0, 2_000),
    '\n# Contestant Digests\n',
  ];
  for (const d of opts.digests) {
    parts.push(`\n## ${d.identity} / ${d.model}`);
    parts.push(`Benchmark: ${d.benchmarkLine}`);
    parts.push(d.digest);
  }
  parts.push('\n# Proposed Analysis\n');
  parts.push(opts.analysisContent.slice(0, 5_000));
  if (opts.proposedWinner) {
    parts.push(`\n*(Proposed winner: ${opts.proposedWinner})*`);
  } else {
    parts.push('\n*(No winner was proposed.)*');
  }
  parts.push(
    '\n# Your Cross-Examination\n' +
      'Challenge the analysis above, then give your independent verdict (VERDICT: … on its own line).',
  );
  return { system, user: parts.join('\n') };
 }
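
The cross-examination prompt above asks the examiner to end with a `VERDICT:` line, but this file only exports a parser for the judge's `WINNER:` line. The block below is a sketch of how a verdict parser could mirror `extractWinner`. `extractVerdictSketch` and `VerdictSketch` are hypothetical names and are not part of `arena-analyzer-helpers.ts`; scanning from the last line backwards is an assumption based on the prompt asking for the verdict at the end.

```typescript
// Hypothetical helper, not part of arena-analyzer-helpers.ts.
type VerdictSketch =
  | { kind: 'winner'; identity: string; model: string }
  | { kind: 'no_winner' };

function extractVerdictSketch(output: string): VerdictSketch | null {
  const keyword = 'VERDICT:';
  const lines = output.split('\n');
  // Assumption: the prompt asks for the verdict last, so scan from the end.
  for (let i = lines.length - 1; i >= 0; i--) {
    const trimmed = (lines[i] ?? '').trim();
    // Case-insensitive keyword match, as extractWinner does for WINNER:.
    if (!trimmed.toUpperCase().startsWith(keyword)) continue;
    const rest = trimmed.slice(keyword.length).trim();
    if (rest.toUpperCase() === 'NO_WINNER') return { kind: 'no_winner' };
    const slashIdx = rest.indexOf('/');
    if (slashIdx === -1) continue; // malformed line, keep looking
    const identity = rest.slice(0, slashIdx).trim();
    const model = rest.slice(slashIdx + 1).trim();
    if (identity && model) return { kind: 'winner', identity, model };
  }
  // No parseable verdict line anywhere in the output.
  return null;
}
```

Returning a tagged union keeps "the examiner said no winner" distinct from "no verdict line could be parsed", which a plain `null` would conflate.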
--- a/apps/coder/src/services/arena-analyzer.ts
+++ b/apps/coder/src/services/arena-analyzer.ts
@@ -0,0 +1,496 @@
 /**
 * Arena Analyzer — pluggable seam for battle analysis and cross-examination.
 *
 * The Analyzer interface is the plug point: a v2 Han Orchestrator flow can
 * replace the v1 two-stage digest→judge implementation without a schema change.
 *
 * v1 implementation uses DEFAULT_MODEL via direct llama-swap calls (arenaModelCall):
 *   Digest stage  — one call per succeeded contestant, concurrent; produces a
 *                   bounded summary of each result (result.md + diff.patch for
 *                   coding, result.md for Q&A).
 *   Judge stage   — one call with all digests + the original prompt; writes
 *                   analysis.md, names a winner (unless < 2 succeeded), and
 *                   updates battles.winner_contestant_id.
 *
 * Cross-examination:
 *   Local model   — direct arenaModelCall to llama-swap with the chosen model.
 *   Cloud model   — inserts a tasks row (triggers the dispatcher via pg_notify);
 *                   polls for completion; reads output_summary as the verdict.
 *   In both cases the verdict is written to cross_examinations.verdict, appended
 *   to <resultsPath>/cross-exam.md, and a battle_updated frame is published.
 *
 * Never throws — all errors are caught, logged, and swallowed so the caller
 * (arena-runner's onBattleComplete / onCrossExamStart) is never wedged.
 */
 import { readFile, writeFile, mkdir } from 'node:fs/promises';
 import { join } from 'node:path';
 import type { Sql } from '../db.js';
 import type { Broker } from '@boocode/server/broker';
 import type { WsFrame } from '@boocode/contracts/ws-frames';
 import type { FastifyBaseLogger } from 'fastify';
 import type { Config } from '../config.js';
 import type { BattleType } from '@boocode/contracts/arena';
 import { arenaModelCall } from './arena-model-call.js';
 import {
  buildDigestPrompt,
  buildJudgePrompt,
  buildCrossExamPrompt,
  extractWinner,
  shouldNameWinner,
  type ContestantDigest,
 } from './arena-analyzer-helpers.js';
 // ─── Public interface ─────────────────────────────────────────────────────────
 /** Pluggable analysis seam — swap to a Han Orchestrator flow in v2. */
 export interface Analyzer {
  /** Run the two-stage digest→judge analysis for a completed battle. */
  analyze(battleId: string): Promise<void>;
  /**
   * Run a cross-examination for an already-inserted cross_examinations row.
   * The result is written back to that row and a battle_updated frame is published.
   */
  crossExamine(
    battleId: string,
    crossExamId: string,
    opts: { identity: string; model: string },
  ): Promise<void>;
 }
 // ─── Internal DB row types ────────────────────────────────────────────────────
 interface BattleRow {
  id: string;
  project_id: string;
  battle_type: BattleType;
  prompt: string;
  status: string;
  results_path: string | null;
  winner_contestant_id: string | null;
 }
 interface ContestantRow {
  id: string;
  identity: string;
  model: string;
  lane: string;
  status: string;
  result_path: string | null;
  duration_ms: number | null;
  tokens_per_sec: number | null;
 }
 // ─── Factory ──────────────────────────────────────────────────────────────────
 interface AnalyzerDeps {
  sql: Sql;
  broker: Broker;
  log: FastifyBaseLogger;
  config: Pick<Config, 'LLAMA_SWAP_URL' | 'DEFAULT_MODEL'>;
  /** Model IDs served by local llama-swap — cross-exam routing uses this. */
  localModels: ReadonlySet<string>;
 }
 export function createAnalyzer(deps: AnalyzerDeps): Analyzer {
  const { sql, broker, log, config, localModels } = deps;
  // ─── analyze ──────────────────────────────────────────────────────────────
  async function analyze(battleId: string): Promise<void> {
    try {
      await runAnalysis(battleId);
    } catch (err) {
      log.error(
        { err: errMsg(err), battleId },
        'arena-analyzer: analysis failed',
      );
    }
  }
  async function runAnalysis(battleId: string): Promise<void> {
    const battle = await loadBattle(battleId);
    if (!battle) {
      log.warn({ battleId }, 'arena-analyzer: battle not found');
      return;
    }
    const contestants = await loadContestants(battleId);
    const succeeded = contestants.filter((c) => c.status === 'done' && c.result_path);
    log.info(
      { battleId, total: contestants.length, succeeded: succeeded.length },
      'arena-analyzer: starting analysis',
    );
    // Digest stage — concurrent, one call per succeeded contestant.
    const digests = (
      await Promise.all(succeeded.map((c) => digestContestant(battle, c)))
    ).filter((d): d is ContestantDigest => d !== null);
    // Failed contestants are noted in the analysis even if they produced no digest.
    const failedNotes = contestants
      .filter((c) => c.status === 'error')
      .map((c) => `- **${c.identity} / ${c.model}**: failed (no result)\n`);
    // Judge stage — single call with all digests.
    const { analysisText, winner } = await judgeContestants(battle, digests, failedNotes);
    // Write analysis.md to the battle results folder.
    const resultsPath = battle.results_path;
    if (resultsPath) {
      await mkdir(resultsPath, { recursive: true });
      await writeFile(join(resultsPath, 'analysis.md'), analysisText, 'utf8');
    }
    // Resolve the winner to a contestant id and update the battle row.
    let winnerId: string | null = null;
    if (winner && shouldNameWinner(succeeded.length)) {
      const winnerContestant = contestants.find(
        (c) => c.identity === winner.identity && c.model === winner.model,
      );
      if (winnerContestant) {
        winnerId = winnerContestant.id;
        await sql`
          UPDATE battles
          SET winner_contestant_id = ${winnerId}, updated_at = clock_timestamp()
          WHERE id = ${battleId}
        `;
        log.info({ battleId, winnerId, identity: winner.identity, model: winner.model }, 'arena-analyzer: winner set');
      } else {
        log.warn({ battleId, winner }, 'arena-analyzer: judge named a winner not found in contestants');
      }
    }
    publishUser({
      type: 'battle_updated',
      battle_id: battleId,
      winner_contestant_id: winnerId,
      analysis_ready: true,
    });
    log.info({ battleId }, 'arena-analyzer: analysis complete');
  }
  // ─── crossExamine ─────────────────────────────────────────────────────────
  async function crossExamine(
    battleId: string,
    crossExamId: string,
    opts: { identity: string; model: string },
  ): Promise<void> {
    try {
      await runCrossExam(battleId, crossExamId, opts);
    } catch (err) {
      log.error(
        { err: errMsg(err), battleId, crossExamId },
        'arena-analyzer: cross-exam failed',
      );
    }
  }
  async function runCrossExam(
    battleId: string,
    crossExamId: string,
    opts: { identity: string; model: string },
  ): Promise<void> {
    const battle = await loadBattle(battleId);
    if (!battle) {
      log.warn({ battleId }, 'arena-analyzer: battle not found for cross-exam');
      return;
    }
    const contestants = await loadContestants(battleId);
    // Re-read the digests (if contestants have results) for context.
    const succeeded = contestants.filter((c) => c.status === 'done' && c.result_path);
    const digests = (
      await Promise.all(succeeded.map((c) => digestContestant(battle, c)))
    ).filter((d): d is ContestantDigest => d !== null);
    // Read analysis.md for the proposed analysis content.
    let analysisContent = '';
    if (battle.results_path) {
      analysisContent = await readFile(
        join(battle.results_path, 'analysis.md'), 'utf8',
      ).catch(() => '');
    }
    // Resolve proposed winner label.
    let proposedWinner: string | null = null;
    if (battle.winner_contestant_id) {
      const w = contestants.find((c) => c.id === battle.winner_contestant_id);
      if (w) proposedWinner = `${w.identity}/${w.model}`;
    }
    const { system, user } = buildCrossExamPrompt({
      originalPrompt: battle.prompt,
      digests,
      analysisContent,
      proposedWinner,
      examinerIdentity: opts.identity,
      examinerModel: opts.model,
    });
    log.info({ battleId, crossExamId, identity: opts.identity, model: opts.model }, 'arena-analyzer: running cross-exam');
    const verdict = await executeModelCall({
      battleId,
      projectId: battle.project_id,
      identity: opts.identity,
      model: opts.model,
      system,
      user,
    });
    // Persist verdict and append to cross-exam.md.
    await sql`
      UPDATE cross_examinations
      SET verdict = ${verdict}
      WHERE id = ${crossExamId}
    `;
    if (battle.results_path) {
      const crossExamPath = join(battle.results_path, 'cross-exam.md');
      const section =
        `\n---\n\n# Cross-Examination by ${opts.identity} / ${opts.model}\n\n` +
        `${verdict}\n`;
      await writeFile(crossExamPath, section, { flag: 'a', encoding: 'utf8' });
    }
    publishUser({
      type: 'battle_updated',
      battle_id: battleId,
      cross_exam_id: crossExamId,
    });
    log.info({ battleId, crossExamId }, 'arena-analyzer: cross-exam complete');
  }
  // ─── Model call routing ───────────────────────────────────────────────────
  /**
   * Route a one-shot model call to llama-swap (local) or the task dispatcher
   * (cloud). Cloud dispatch inserts a tasks row and polls for completion.
   */
  async function executeModelCall(opts: {
    battleId: string;
    projectId: string;
    identity: string;
    model: string;
    system: string;
    user: string;
  }): Promise<string> {
    const isLocal = localModels.has(opts.model) || localModels.has(`llama-swap/${opts.model}`);
    if (isLocal) {
      return arenaModelCall({
        config,
        model: opts.model,
        system: opts.system,
        user: opts.user,
        maxTokens: 2_000,
        temperature: 0.3,
      });
    }
    // Cloud path: dispatch through the task system and poll for completion.
    return executeCloudModelCall(opts);
  }
  async function executeCloudModelCall(opts: {
    projectId: string;
    identity: string;
    model: string;
    system: string;
    user: string;
  }): Promise<string> {
    // The cross-exam prompt is the full input to the external agent. We embed
    // the system prompt as a preamble in the user message (external agents don't
    // take a separate system arg through the tasks dispatcher).
    const input = `${opts.system}\n\n${opts.user}`;
    // For well-known external agents, stamp the agent name so the dispatcher
    // routes via PTY/ACP. For unknown identities fall back to native inference
    // (agent = null → DEFAULT_MODEL text generation).
    const knownAgents = new Set(['claude', 'opencode', 'qwen', 'goose']);
    const agentName = knownAgents.has(opts.identity) ? opts.identity : null;
    const [task] = await sql<{ id: string }[]>`
      INSERT INTO tasks (project_id, input, agent, model)
      VALUES (${opts.projectId}, ${input}, ${agentName}, ${opts.model})
      RETURNING id
    `;
    const taskId = task!.id;
    log.info({ taskId, identity: opts.identity, model: opts.model }, 'arena-analyzer: cloud cross-exam task dispatched');
    // Poll until terminal (up to 5 minutes).
    const timeoutMs = 5 * 60 * 1_000;
    const pollMs = 2_000;
    const deadline = Date.now() + timeoutMs;
    while (Date.now() < deadline) {
      await sleep(pollMs);
      const [row] = await sql<{ state: string; output_summary: string | null }[]>`
        SELECT state, output_summary FROM tasks WHERE id = ${taskId}
      `;
      if (!row) throw new Error(`cross-exam task ${taskId} no longer exists`);
      if (row.state === 'completed') return row.output_summary ?? '';
      if (row.state === 'failed' || row.state === 'cancelled') {
        throw new Error(`cross-exam task ${row.state}: ${row.output_summary ?? ''}`);
      }
    }
    throw new Error(`cloud cross-exam task timed out after ${timeoutMs / 1000}s`);
  }
  // ─── Digest helper ────────────────────────────────────────────────────────
  async function digestContestant(
    battle: BattleRow,
    c: ContestantRow,
  ): Promise<ContestantDigest | null> {
    if (!c.result_path) return null;
    const resultMd = await readFile(join(c.result_path, 'result.md'), 'utf8').catch(() => '');
    let diffPatch: string | undefined;
    if (battle.battle_type === 'coding') {
      diffPatch = await readFile(join(c.result_path, 'diff.patch'), 'utf8').catch(
        () => undefined,
      );
    }
    const benchmarkLine = formatBenchmarkLine(c);
    const { system, user } = buildDigestPrompt({
      identity: c.identity,
      model: c.model,
      resultMd,
      diffPatch,
      benchmarkLine,
    });
    let digest: string;
    try {
      digest = await arenaModelCall({
        config,
        model: config.DEFAULT_MODEL,
        system,
        user,
        maxTokens: 500,
        temperature: 0.3,
      });
    } catch (err) {
      log.warn(
        { err: errMsg(err), identity: c.identity, model: c.model },
        'arena-analyzer: digest call failed — skipping contestant',
      );
      return null;
    }
    return { identity: c.identity, model: c.model, digest, benchmarkLine };
  }
  // ─── Judge helper ─────────────────────────────────────────────────────────
  async function judgeContestants(
    battle: BattleRow,
    digests: ContestantDigest[],
    failedNotes: string[],
  ): Promise<{ analysisText: string; winner: { identity: string; model: string } | null }> {
    const { system, user } = buildJudgePrompt(battle.prompt, digests);
    let judgeOutput = '';
    try {
      judgeOutput = await arenaModelCall({
        config,
        model: config.DEFAULT_MODEL,
        system,
        user,
        maxTokens: 2_000,
        temperature: 0.3,
      });
    } catch (err) {
      log.error({ err: errMsg(err), battleId: battle.id }, 'arena-analyzer: judge call failed');
      judgeOutput = '*(Judge call failed — no comparison produced.)*';
    }
    const winner = shouldNameWinner(digests.length) ? extractWinner(judgeOutput) : null;
    const sections: string[] = [
      `# Arena Analysis`,
      `\n**Battle type:** ${battle.battle_type}`,
    ];
    if (failedNotes.length > 0) {
      sections.push('\n## Failed Contestants\n');
      sections.push(...failedNotes);
    }
    if (digests.length > 0) {
      sections.push('\n## Contestant Digests\n');
      for (const d of digests) {
        sections.push(`### ${d.identity} / ${d.model}`);
        sections.push(`*Benchmark: ${d.benchmarkLine}*\n`);
        sections.push(d.digest);
      }
    }
    sections.push("\n## Judge's Verdict\n");
    sections.push(judgeOutput);
    if (winner) {
      sections.push(`\n## Winner\n**${winner.identity} / ${winner.model}**`);
    } else {
      const reason =
        digests.length < 2
          ? 'fewer than 2 contestants produced results'
          : 'no clear winner identified';
      sections.push(`\n## Winner\n*No winner named (${reason}).*`);
    }
    return { analysisText: sections.join('\n'), winner };
  }
  // ─── DB helpers ───────────────────────────────────────────────────────────
  async function loadBattle(battleId: string): Promise<BattleRow | null> {
    const [b] = await sql<BattleRow[]>`
      SELECT id, project_id, battle_type, prompt, status, results_path, winner_contestant_id
      FROM battles WHERE id = ${battleId}
    `;
    return b ?? null;
  }
  async function loadContestants(battleId: string): Promise<ContestantRow[]> {
    return sql<ContestantRow[]>`
      SELECT id, identity, model, lane, status, result_path, duration_ms, tokens_per_sec
      FROM contestants WHERE battle_id = ${battleId}
      ORDER BY created_at ASC
    `;
  }
  // ─── Misc helpers ─────────────────────────────────────────────────────────
  function formatBenchmarkLine(c: ContestantRow): string {
    const parts: string[] = [];
    if (c.duration_ms !== null) parts.push(`${c.duration_ms}ms`);
    if (c.tokens_per_sec !== null) parts.push(`${c.tokens_per_sec.toFixed(1)} tok/s`);
    return parts.length > 0 ? parts.join(', ') : 'no benchmark';
  }
  function publishUser(frame: Record<string, unknown>): void {
    broker.publishUserFrame('default', frame as unknown as WsFrame);
  }
  function sleep(ms: number): Promise<void> {
    return new Promise((resolve) => setTimeout(resolve, ms));
  }
  return { analyze, crossExamine };
 }
 function errMsg(e: unknown): string {
  return e instanceof Error ? e.message : String(e);
 }
--- a/apps/coder/src/services/arena-decisions.ts
+++ b/apps/coder/src/services/arena-decisions.ts
@@ -0,0 +1,186 @@
 /**
 * Pure scheduling and classification decisions for the Arena battle-runner.
 * No database, no IO. Mirrors the pattern of flow-runner-decisions.ts.
 *
 * Vocabulary:
 *   local lane  — llama-swap-backed contestants, run strictly one at a time
 *   cloud lane  — cloud-backed contestants, run all in parallel
 *
 * A contestant's status lifecycle:
 *   queued → running → done | error
 */
 import type { BattleType, ContestantLane } from '@boocode/contracts/arena';
 // ─── Lane classification ──────────────────────────────────────────────────────
 /**
 * Classify a contestant into a lane.
 *
 * Q&A contestants always run on the native (llama-swap) backend → local.
 * Coding contestants: their MODEL is checked against the localModels set
 * (all model IDs served by the local llama-swap server). This means an
 * opencode or qwen contestant pointed at a local model counts as local and
 * joins the serial queue, which avoids GPU contention and keeps benchmarks
 * fair (ADR 0001).
 *
 * @param battleType  'coding' | 'qa'
 * @param identity    backend name (coding) or persona name (qa) — not used for lane logic
 * @param model       the contestant's model id
 * @param localModels set of model IDs served by the local llama-swap server
 */
 export function classifyLane(
  battleType: BattleType,
  _identity: string,
  model: string,
  localModels: ReadonlySet<string>,
 ): ContestantLane {
  if (battleType === 'qa') return 'local';
  return localModels.has(model) ? 'local' : 'cloud';
 }
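 // Illustrative usage (a sketch; the model ids and persona name are made up):
 //   const local = new Set(['qwen3-coder']);
 //   classifyLane('qa', 'skeptic', 'some-cloud-model', local);     // 'local' (Q&A is always local)
 //   classifyLane('coding', 'opencode', 'qwen3-coder', local);     // 'local'
 //   classifyLane('coding', 'claude', 'some-cloud-model', local);  // 'cloud'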
 // ─── Local-lane queue ─────────────────────────────────────────────────────────
 export interface ContestantSlot {
  id: string;
  lane: ContestantLane;
  status: string;
 }
 /**
 * The next queued local contestant to dispatch — the first 'queued' contestant
 * in the local lane, in creation order (caller must supply rows in created_at ASC).
 * Returns null when the local queue is empty or all local slots are non-queued.
 */
 export function nextLocalContestant(contestants: readonly ContestantSlot[]): string | null {
  for (const c of contestants) {
    if (c.lane === 'local' && c.status === 'queued') return c.id;
  }
  return null;
 }
 // ─── Battle completion ────────────────────────────────────────────────────────
 /**
 * True when every contestant has reached a terminal state (done | error).
 * Returns false for an empty list — a battle with no contestants never completes.
 */
 export function isBattleComplete(contestants: readonly { status: string }[]): boolean {
  if (contestants.length === 0) return false;
  return contestants.every((c) => c.status === 'done' || c.status === 'error');
 }
 // ─── Benchmark ────────────────────────────────────────────────────────────────
 export interface Benchmark {
  durationMs: number;
  tokensPerSec: number | null;
 }
 /**
 * Compute the benchmark for a contestant.
 * Wall-clock duration is captured for every contestant; tokens/sec is only
 * meaningful for local (llama-swap) contestants where the model has sole
 * access to the GPU and the measurement is fair.
 */
 export function computeBenchmark(
  startedAt: Date,
  endedAt: Date,
  costTokens: number | null,
  lane: ContestantLane,
 ): Benchmark {
  const durationMs = Math.max(0, endedAt.getTime() - startedAt.getTime());
  const tokensPerSec =
    lane === 'local' && costTokens !== null && durationMs > 0
      ? (costTokens / durationMs) * 1000
      : null;
  return { durationMs, tokensPerSec };
 }
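 // Illustrative usage (a sketch with made-up timestamps and token counts):
 //   const start = new Date(0);
 //   const end = new Date(2_000);
 //   computeBenchmark(start, end, 500, 'local'); // { durationMs: 2000, tokensPerSec: 250 }
 //   computeBenchmark(start, end, 500, 'cloud'); // { durationMs: 2000, tokensPerSec: null }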
 // ─── Slug / path helpers ──────────────────────────────────────────────────────
 /**
 * Sanitize a string for use as a directory name component.
 * Lowercases, replaces non-alphanumeric runs with '-', trims leading/trailing
 * dashes, and caps at 64 characters. The cap is applied before the final trim
 * so truncation cannot leave a trailing dash.
 */
 export function sanitizeSlug(s: string): string {
  return s
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+/, '')
    .slice(0, 64)
    .replace(/-+$/, '');
 }
 /**
 * Build the dated battle slug used as the Arena results folder name.
 * Format: YYYY-MM-DD-<battleType>-<first-8-hex-of-uuid> (the date is UTC).
 * Deterministic: callers can rebuild it from (id, type, created_at) on resume.
 */
 export function buildBattleSlug(battleId: string, battleType: BattleType, createdAt: Date): string {
  const date = createdAt.toISOString().slice(0, 10);
  const shortId = battleId.replace(/-/g, '').slice(0, 8);
  return `${date}-${battleType}-${shortId}`;
 }
 /**
 * Build the per-contestant results directory name within a battle folder.
 * Format: <sanitized-identity>-<sanitized-model>
 */
 export function buildContestantDir(identity: string, model: string): string {
  return `${sanitizeSlug(identity)}-${sanitizeSlug(model)}`;
 }
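 // Illustrative usage (a sketch; the battle id, date, and model names are made up):
 //   sanitizeSlug('Qwen3 Coder:30B');                      // 'qwen3-coder-30b'
 //   buildContestantDir('opencode', 'llama-swap/qwen3');   // 'opencode-llama-swap-qwen3'
 //   buildBattleSlug('3f2a9c1e-7b4d-4e8a-9c2f-1a2b3c4d5e6f', 'coding',
 //     new Date('2026-06-05T15:14:21Z'));                  // '2026-06-05-coding-3f2a9c1e'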
 // ─── Resume reconciliation ────────────────────────────────────────────────────
 export type ContestantResumeAction =
  | 'keep'
  | 're-dispatch'
  | 'mark-done'
  | 'mark-error'
  | 'mark-cancelled';
 export interface ContestantResumeDecision {
  contestantId: string;
  action: ContestantResumeAction;
 }
 /**
 * Decide what to do with ONE contestant during startup resume.
 * Mirrors reconcileResumeStep from flow-runner-decisions.ts.
 *
 * @param status    contestants.status
 * @param taskId    contestants.task_id (null when not yet dispatched)
 * @param taskState tasks.state for taskId, or null if the task row is absent
 */
 export function reconcileContestantResume(
  status: string,
  taskId: string | null,
  taskState: string | null,
 ): ContestantResumeAction {
  if (status !== 'running') return 'keep';
  if (!taskId || taskState === null) return 're-dispatch';
  switch (taskState) {
    case 'completed': return 'mark-done';
    case 'failed':    return 'mark-error';
    case 'cancelled': return 'mark-cancelled';
    case 'pending':   return 'keep'; // dispatcher startup poll will run it normally
    default:          return 're-dispatch'; // 'running'/'blocked' — process is dead
  }
 }
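 // Illustrative usage (a sketch; 't1' is a placeholder task id):
 //   reconcileContestantResume('queued', null, null);         // 'keep'
 //   reconcileContestantResume('running', null, null);        // 're-dispatch'
 //   reconcileContestantResume('running', 't1', 'completed'); // 'mark-done'
 //   reconcileContestantResume('running', 't1', 'running');   // 're-dispatch'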
 /**
 * Reconcile every contestant of an in-flight battle for startup resume.
 * Returns one decision per contestant. Pure — no IO.
 */
 export function reconcileContestants(
  contestants: ReadonlyArray<{ contestantId: string; taskId: string | null; status: string }>,
  taskStates: ReadonlyMap<string, string>,
 ): ContestantResumeDecision[] {
  return contestants.map((c) => ({
    contestantId: c.contestantId,
    action: reconcileContestantResume(
      c.status,
      c.taskId,
      c.taskId ? (taskStates.get(c.taskId) ?? null) : null,
    ),
  }));
 }
--- a/apps/coder/src/services/arena-model-call.ts
+++ b/apps/coder/src/services/arena-model-call.ts
@@ -0,0 +1,70 @@
 /**
 * One-shot model completion for the Arena analyzer.
 *
 * Calls the local llama-swap server directly for a single non-streaming
 * completion. Used for the digest and judge stages (always DEFAULT_MODEL)
 * and for local-model cross-examinations (any local model).
 *
 * Mirrors apps/server/src/services/task-model.ts but targets the coder's
 * config shape and uses a longer timeout appropriate for analysis calls.
 */
 import type { Config } from '../config.js';
 const TIMEOUT_MS = 120_000;
 export async function arenaModelCall(opts: {
  config: Pick<Config, 'LLAMA_SWAP_URL'>;
  model: string;
  system: string;
  user: string;
  maxTokens?: number;
  temperature?: number;
 }): Promise<string> {
  const { config, model, system, user } = opts;
  const maxTokens = opts.maxTokens ?? 2_000;
  const temperature = opts.temperature ?? 0.3;
  const res = await fetch(`${config.LLAMA_SWAP_URL}/v1/chat/completions`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model,
      messages: [
        { role: 'system', content: system },
        { role: 'user', content: user },
      ],
      max_tokens: maxTokens,
      temperature,
      stream: false,
      chat_template_kwargs: { enable_thinking: false },
    }),
    signal: AbortSignal.timeout(TIMEOUT_MS),
  });
  if (!res.ok) {
    const text = await res.text().catch(() => '');
    throw new Error(`llama-swap responded ${res.status}: ${text.slice(0, 200)}`);
  }
  const data = (await res.json()) as {
    choices?: Array<{
      message?: { content?: string; reasoning_content?: string };
    }>;
  };
  const choice = data.choices?.[0]?.message;
  if (!choice) return '';
  const content = (choice.content ?? '').trim();
  if (content.length > 0) return content;
  // Thinking-mode models sometimes leave content empty and put the answer in
  // reasoning_content; fall back to its last non-empty line.
  const reasoning = (choice.reasoning_content ?? '').trim();
  if (reasoning.length > 0) {
    const lines = reasoning.split('\n').filter((l) => l.trim().length > 0);
    return lines[lines.length - 1] ?? '';
  }
  return '';
 }
--- a/apps/coder/src/services/arena-runner.ts
+++ b/apps/coder/src/services/arena-runner.ts
@@ -0,0 +1,895 @@
 /**
 * Arena battle-runner — DB-backed execution engine for Arena battles.
 *
 * Mirrors flow-runner.ts but implements the Arena's two-lane scheduler instead
 * of the Orchestrator's wave scheduler. Persists to battles/contestants tables
 * (not flow_runs/flow_steps). Each contestant is dispatched as a real tasks row
 * via an injected DispatchContestantFn (Phase 4 wires this to the dispatcher).
 * Advances on the dispatcher's onTaskTerminal hook.
 *
 * Scheduling:
 *   - Cloud lane: all contestants start immediately, in parallel.
 *   - Local lane: contestants run strictly one at a time (serial queue). Only
 *     the first local contestant runs at start; the next is dispatched when the
 *     current one terminates. Both lanes run concurrently with each other.
 *
 * Results:
 *   Written to <projectRoot>/Arena/<battleSlug>/<identity>-<model>/
 *   Coding: result.md + diff.patch (from the contestant's worktree).
 *   Q&A:    result.md with the text answer.
 *
 * Analyzer seam:
 *   onBattleComplete is called when all contestants are terminal. Phase 5 wires
 *   this to the two-stage digest→judge analyzer. A failed contestant does NOT
 *   abort the battle — others continue and the analyzer judges survivors.
 */
 import type { Sql } from '../db.js';
 import type { Broker } from '@boocode/server/broker';
 import type { WsFrame } from '@boocode/contracts/ws-frames';
 import type { FastifyBaseLogger } from 'fastify';
 import type { BattleType, ContestantLane } from '@boocode/contracts/arena';
 import { mkdir, writeFile } from 'node:fs/promises';
 import { join } from 'node:path';
 import { diffWorktree } from './worktrees.js';
 import {
  buildBattleSlug,
  buildContestantDir,
  classifyLane,
  computeBenchmark,
  isBattleComplete,
  nextLocalContestant,
  reconcileContestants,
  type ContestantResumeAction,
  type ContestantSlot,
 } from './arena-decisions.js';
 // ─── Public types ─────────────────────────────────────────────────────────────
 export interface ContestantSpec {
  /** Backend name (coding) or persona name (qa). */
  identity: string;
  model: string;
 }
 export interface BattleStartOpts {
  projectId: string;
  battleType: BattleType;
  prompt: string;
  /** 2–6 contestants. Duplicate (identity, model) pairs are rejected by the schema UNIQUE constraint. */
  contestants: ContestantSpec[];
 }
 /**
 * Injected dispatch function — Phase 4 wires this to the real task inserter.
 * Must INSERT a tasks row and return its id. The arena-runner sets the
 * contestant's task_id and status after this call.
 * `sessionId` is returned when already known (Q&A pre-creates the session);
 * null for coding contestants whose session is created lazily by the dispatcher.
 */
 export type DispatchContestantFn = (opts: {
  projectId: string;
  contestantId: string;
  prompt: string;
  identity: string;
  model: string;
  battleType: BattleType;
 }) => Promise<{ taskId: string; sessionId: string | null }>;
 /**
 * Called once when every contestant in a battle has reached a terminal state.
 * Phase 5 wires this to the two-stage digest→judge analyzer.
 * Must never throw — the caller swallows errors.
 */
 export type OnBattleComplete = (battleId: string) => void;
 /**
 * Called after a cross_examinations row has been inserted, with its id.
 * Phase 5 wires this to the analyzer's cross-examination runner.
 * Must never throw — the caller swallows errors.
 */
 export type OnCrossExamStart = (opts: {
  battleId: string;
  crossExamId: string;
  identity: string;
  model: string;
 }) => void;
 export interface BattleRunner {
  /** Start a battle: persist it + its contestants, classify lanes, dispatch initial wave. */
  startBattle(opts: BattleStartOpts): Promise<{ battleId: string }>;
  /**
   * Wire to createDispatcher({ onTaskTerminal }). Fires when ANY task settles;
   * the runner ignores tasks it doesn't own. Never throws.
   */
  handleTaskTerminal(taskId: string, state: string): void;
  /**
   * Re-advance any battles still marked 'running' after a coder restart.
   * Mirrors flow-runner's initResume (D-9). Never throws.
   */
  initResume(): Promise<void>;
  /**
   * Cancel a running battle. Marks it and all non-terminal contestants cancelled,
   * publishes frames, and returns the task_ids of in-flight contestants so the
   * route can abort them via the dispatcher's cancelExternalTask.
   */
  cancelBattle(battleId: string): Promise<{ cancelled: boolean; taskIds: string[] }>;
  /**
   * Trigger analysis for a completed (or manually re-analyzed) battle.
   * Phase 5 wires this to the two-stage digest→judge analyzer. For now, calls
   * the injected onBattleComplete seam directly.
   */
  triggerAnalysis(battleId: string): Promise<{ triggered: boolean }>;
  /**
   * Start a cross-examination on a battle. Inserts a cross_examinations row and
   * invokes the analyzer seam. Phase 5 fills the actual verdict logic.
   */
  startCrossExam(
    battleId: string,
    opts: { identity: string; model: string },
  ): Promise<{ crossExamId: string }>;
  /**
   * Manually set (or clear) the winner. Validates the contestant belongs to the
   * battle, updates battles.winner_contestant_id, and publishes a battle_updated
   * frame so the pane reflects the override immediately.
   */
  setWinner(battleId: string, winnerId: string | null): Promise<{
    ok: boolean;
    notFound?: boolean;
    invalidContestant?: boolean;
  }>;
 }
 // ─── Internal row shapes ──────────────────────────────────────────────────────
 interface ContestantRow {
  id: string;
  battle_id: string;
  identity: string;
  model: string;
  lane: ContestantLane;
  task_id: string | null;
  worktree_id: string | null;
  status: string;
 }
 interface BattleRow {
  id: string;
  project_id: string;
  battle_type: BattleType;
  prompt: string;
  status: string;
  results_path: string | null;
  created_at: Date;
 }
 // ─── Deps / factory ───────────────────────────────────────────────────────────
 interface Deps {
  sql: Sql;
  broker: Broker;
  log: FastifyBaseLogger;
  dispatch: DispatchContestantFn;
  onBattleComplete: OnBattleComplete;
  /**
   * Called after a cross_examinations row is inserted. Phase 5 wires this to
   * the analyzer's cross-examination runner. Optional: absent → no cross-exam
   * logic runs (stub behaviour for tests).
   */
  onCrossExamStart?: OnCrossExamStart;
  /**
   * Model IDs served by the local llama-swap server. Used for lane classification:
   * a contestant whose model is in this set runs in the local lane (serial, GPU-fair).
   * Q&A contestants are always local regardless of this set.
   * Defaults to an empty set → all coding contestants go to the cloud lane.
   */
  localModels?: ReadonlySet<string>;
 }
 const DEFAULT_LOCAL_MODELS: ReadonlySet<string> = new Set();
 export function createBattleRunner(deps: Deps): BattleRunner {
  const { sql, broker, log, dispatch, onBattleComplete, onCrossExamStart } = deps;
  const localModels = deps.localModels ?? DEFAULT_LOCAL_MODELS;
  // Serialize local-lane advance per battle so two near-simultaneous terminal
  // callbacks don't double-dispatch the next local contestant.
  const advanceChain = new Map<string, Promise<void>>();
  // Delta bridge: per-contestant broker unsubscribe functions.
  // 'terminated' sentinel prevents a late-arriving setupDeltaBridge from
  // registering a subscription that would never be cleaned up.
  const deltaUnsubs = new Map<string, (() => void) | 'terminated'>();
  function publishUser(frame: Record<string, unknown>): void {
    broker.publishUserFrame('default', frame as unknown as WsFrame);
  }
  /**
   * Subscribe to the contestant's inference session and forward delta frames
   * to the user channel as contestant_updated{delta}. Polls for session_id
   * when not immediately known (coding contestants whose session is created
   * lazily by the dispatcher). Unsubscribes on termination or max retries.
   */
  async function setupDeltaBridge(
    battleId: string,
    contestantId: string,
    taskId: string,
    knownSessionId: string | null,
  ): Promise<void> {
    let sessionId = knownSessionId;
    if (!sessionId) {
      // Coding contestant: session_id is written by the dispatcher just before
      // inference starts. Poll until it appears or the contestant terminates.
      for (let i = 0; i < 50; i++) {
        if (deltaUnsubs.get(contestantId) === 'terminated') return;
        const [row] = await sql<{ session_id: string | null }[]>`
          SELECT session_id FROM tasks WHERE id = ${taskId}
        `.catch(() => []);
        if (row?.session_id) { sessionId = row.session_id; break; }
        await new Promise((r) => setTimeout(r, 200));
      }
    }
    if (!sessionId) return;
    if (deltaUnsubs.get(contestantId) === 'terminated') return;
    const unsub = broker.subscribe(sessionId, (frame) => {
      if (frame.type === 'delta') {
        const deltaContent = (frame as unknown as { content?: unknown }).content;
        if (typeof deltaContent === 'string') {
          publishUser({
            type: 'contestant_updated',
            battle_id: battleId,
            contestant_id: contestantId,
            delta: deltaContent,
          });
        }
      }
    });
    const existing = deltaUnsubs.get(contestantId);
    if (existing === 'terminated') {
      unsub();
    } else {
      deltaUnsubs.set(contestantId, unsub);
    }
  }
  function teardownDeltaBridge(contestantId: string): void {
    const entry = deltaUnsubs.get(contestantId);
    if (typeof entry === 'function') {
      entry();
      deltaUnsubs.delete(contestantId);
    } else {
      deltaUnsubs.set(contestantId, 'terminated');
    }
  }
  // ─── startBattle ────────────────────────────────────────────────────────────
  async function startBattle(opts: BattleStartOpts): Promise<{ battleId: string }> {
    if (opts.contestants.length < 2 || opts.contestants.length > 6) {
      throw new Error(`battle requires 2–6 contestants; got ${opts.contestants.length}`);
    }
    const [proj] = await sql<{ path: string }[]>`SELECT path FROM projects WHERE id = ${opts.projectId}`;
    if (!proj) throw new Error(`project not found: ${opts.projectId}`);
    // Insert the battle row as 'running'; update results_path once we have the id.
    const [battle] = await sql<{ id: string; created_at: Date }[]>`
      INSERT INTO battles (project_id, battle_type, prompt, status)
      VALUES (${opts.projectId}, ${opts.battleType}, ${opts.prompt}, 'running')
      RETURNING id, created_at
    `;
    const battleId = battle!.id;
    const battleSlug = buildBattleSlug(battleId, opts.battleType, battle!.created_at);
    const resultsPath = join(proj.path, 'Arena', battleSlug);
    await sql`
      UPDATE battles SET results_path = ${resultsPath}, updated_at = clock_timestamp()
      WHERE id = ${battleId}
    `;
    // Insert all contestant rows with lane classification.
    const contestantRows: Array<{ id: string; identity: string; model: string; lane: ContestantLane }> = [];
    for (const spec of opts.contestants) {
      const lane = classifyLane(opts.battleType, spec.identity, spec.model, localModels);
      const [row] = await sql<{ id: string }[]>`
        INSERT INTO contestants (battle_id, identity, model, lane, status)
        VALUES (${battleId}, ${spec.identity}, ${spec.model}, ${lane}, 'queued')
        RETURNING id
      `;
      contestantRows.push({ id: row!.id, identity: spec.identity, model: spec.model, lane });
    }
    // Write initial manifest so the results folder is always populated.
    await writeManifest(
      battleId, resultsPath, opts.battleType, opts.prompt, battle!.created_at,
      contestantRows.map((c) => ({ identity: c.identity, model: c.model, lane: c.lane })),
      null,
    ).catch((err) => {
      log.warn({ err: errMsg(err), battleId }, 'arena-runner: initial manifest write failed');
    });
    publishUser({
      type: 'battle_started',
      battle_id: battleId,
      battle_type: opts.battleType,
      prompt: opts.prompt,
      contestants: contestantRows.map((c) => ({
        id: c.id,
        identity: c.identity,
        model: c.model,
        lane: c.lane,
      })),
    });
    // Dispatch: cloud lane starts all contestants in parallel; local lane starts
    // only the first queued contestant (serial queue).
    let localStarted = false;
    for (const c of contestantRows) {
      if (c.lane === 'cloud') {
        await dispatchContestant(battleId, opts.projectId, opts.battleType, opts.prompt, c);
      } else if (!localStarted) {
        await dispatchContestant(battleId, opts.projectId, opts.battleType, opts.prompt, c);
        localStarted = true;
        // remaining local contestants stay 'queued' until this one finishes
      }
    }
    return { battleId };
  }
  async function dispatchContestant(
    battleId: string,
    projectId: string,
    battleType: BattleType,
    prompt: string,
    c: { id: string; identity: string; model: string; lane: ContestantLane },
  ): Promise<void> {
    const { taskId, sessionId } = await dispatch({
      projectId,
      contestantId: c.id,
      prompt,
      identity: c.identity,
      model: c.model,
      battleType,
    });
    await sql`
      UPDATE contestants
      SET task_id = ${taskId}, status = 'running', updated_at = clock_timestamp()
      WHERE id = ${c.id}
    `;
    publishContestantFrame(battleId, c.id, { status: 'running' });
    // Start the delta bridge in the background; unsubscribe when the contestant
    // terminates (teardownDeltaBridge called in handleTaskTerminal).
    void setupDeltaBridge(battleId, c.id, taskId, sessionId ?? null);
  }
  // ─── local-lane advance (serialized per battle) ───────────────────────────
  function advanceLocalLane(battleId: string): Promise<void> {
    const prev = advanceChain.get(battleId) ?? Promise.resolve();
    const next = prev
      .catch(() => {})
      .then(() =>
        advanceLocalLaneInner(battleId).catch((err) => {
          log.error({ err: errMsg(err), battleId }, 'arena-runner: advanceLocalLane failed');
        }),
      );
    advanceChain.set(battleId, next);
    void next.finally(() => {
      if (advanceChain.get(battleId) === next) advanceChain.delete(battleId);
    });
    return next;
  }
  async function advanceLocalLaneInner(battleId: string): Promise<void> {
    const battle = await loadBattle(battleId);
    if (!battle || battle.status !== 'running') return;
    const contestants = await loadContestants(battleId);
    const slots: ContestantSlot[] = contestants.map((c) => ({
      id: c.id,
      lane: c.lane,
      status: c.status,
    }));
    // Nothing to do if the local lane is still busy.
    const localRunning = slots.some((c) => c.lane === 'local' && c.status === 'running');
    if (localRunning) return;
    const nextId = nextLocalContestant(slots);
    if (!nextId) return; // local queue is exhausted
    const next = contestants.find((c) => c.id === nextId)!;
    await dispatchContestant(battleId, battle.project_id, battle.battle_type, battle.prompt, {
      id: next.id,
      identity: next.identity,
      model: next.model,
      lane: next.lane,
    });
  }
  // ─── handleTaskTerminal ───────────────────────────────────────────────────
  function handleTaskTerminal(taskId: string, state: string): void {
    void (async () => {
      // Look up which contestant owns this task (contestants_task_id_idx).
      const [row] = await sql<ContestantRow[]>`
        SELECT id, battle_id, identity, model, lane, task_id, worktree_id, status
        FROM contestants WHERE task_id = ${taskId}
      `;
      if (!row) return; // not an arena task — ignore
      if (row.status !== 'running') return; // already settled (idempotent)
      const battle = await loadBattle(row.battle_id);
      // Pull the task row for benchmark + output.
      const [task] = await sql<{
        chat_id: string | null;
        started_at: Date | null;
        ended_at: Date | null;
        cost_tokens: number | null;
      }[]>`SELECT chat_id, started_at, ended_at, cost_tokens FROM tasks WHERE id = ${taskId}`;
      const endedAt = task?.ended_at ?? new Date();
      if (state === 'completed') {
        const startedAt = task?.started_at ?? endedAt;
        const bench = computeBenchmark(startedAt, endedAt, task?.cost_tokens ?? null, row.lane);
        const output = task?.chat_id ? await readChatOutput(task.chat_id) : '';
        const resultPath = battle
          ? await writeContestantResults(battle, row, output, bench).catch((err) => {
              log.warn({ err: errMsg(err), contestantId: row.id }, 'arena-runner: result write failed');
              return null;
            })
          : null;
        await sql`
          UPDATE contestants
          SET status      = 'done',
              duration_ms = ${Math.round(bench.durationMs)},
              tokens_per_sec = ${bench.tokensPerSec},
              cost_tokens = ${task?.cost_tokens ?? null},
              result_path = ${resultPath},
              updated_at  = clock_timestamp()
          WHERE id = ${row.id} AND status = 'running'
        `;
        teardownDeltaBridge(row.id);
        // Check if this was the last contestant.
        const allContestants = await loadContestants(row.battle_id);
        const battleDone = isBattleComplete(allContestants);
        publishContestantFrame(row.battle_id, row.id, {
          status: 'done',
          duration_ms: Math.round(bench.durationMs),
          ...(bench.tokensPerSec !== null ? { tokens_per_sec: bench.tokensPerSec } : {}),
          ...(battleDone ? { battle_status: 'completed' } : {}),
        });
        if (battleDone) {
          await completeBattle(row.battle_id);
        } else if (row.lane === 'local') {
          void advanceLocalLane(row.battle_id);
        }
      } else {
        // failed or cancelled: the battle continues; this contestant is marked as error.
        const errorMsg = state === 'cancelled' ? 'cancelled' : `task ${state}`;
        await sql`
          UPDATE contestants
          SET status = 'error', error = ${errorMsg}, updated_at = clock_timestamp()
          WHERE id = ${row.id} AND status = 'running'
        `;
        teardownDeltaBridge(row.id);
        const allContestants = await loadContestants(row.battle_id);
        const battleDone = isBattleComplete(allContestants);
        publishContestantFrame(row.battle_id, row.id, {
          status: 'error',
          error: errorMsg,
          ...(battleDone ? { battle_status: 'completed' } : {}),
        });
        if (battleDone) {
          await completeBattle(row.battle_id);
        } else if (row.lane === 'local') {
          void advanceLocalLane(row.battle_id);
        }
      }
    })().catch((err) => {
      log.error({ err: errMsg(err), taskId }, 'arena-runner: handleTaskTerminal failed');
    });
  }
  // ─── battle finalization ──────────────────────────────────────────────────
  async function completeBattle(battleId: string): Promise<void> {
    const updated = await sql`
      UPDATE battles SET status = 'completed', updated_at = clock_timestamp()
      WHERE id = ${battleId} AND status = 'running'
    `;
    if (updated.count === 0) return; // already terminal (race guard)
    log.info({ battleId }, 'arena-runner: battle completed');
    // Update manifest with finished_at timestamp.
    const completedBattle = await loadBattle(battleId);
    if (completedBattle?.results_path) {
      const contestants = await loadContestants(battleId);
      await writeManifest(
        battleId,
        completedBattle.results_path,
        completedBattle.battle_type,
        completedBattle.prompt,
        completedBattle.created_at,
        contestants.map((c) => ({ identity: c.identity, model: c.model, lane: c.lane })),
        new Date(),
      ).catch((err) => {
        log.warn({ err: errMsg(err), battleId }, 'arena-runner: manifest update failed');
      });
    }
    onBattleComplete(battleId);
  }
  // ─── manifest writer ─────────────────────────────────────────────────────
  async function writeManifest(
    battleId: string,
    resultsPath: string,
    battleType: BattleType,
    prompt: string,
    createdAt: Date,
    contestants: Array<{ identity: string; model: string; lane: ContestantLane }>,
    finishedAt: Date | null,
  ): Promise<void> {
    await mkdir(resultsPath, { recursive: true });
    const manifest = {
      id: battleId,
      battle_type: battleType,
      prompt,
      contestants,
      created_at: createdAt.toISOString(),
      finished_at: finishedAt?.toISOString() ?? null,
    };
    await writeFile(join(resultsPath, 'manifest.json'), JSON.stringify(manifest, null, 2), 'utf8');
  }
  // ─── results writer ───────────────────────────────────────────────────────
  async function writeContestantResults(
    battle: BattleRow,
    contestant: { identity: string; model: string; lane: ContestantLane; worktree_id: string | null },
    output: string,
    bench: { durationMs: number; tokensPerSec: number | null },
  ): Promise<string> {
    const resultsPath = await getOrBuildResultsPath(battle);
    if (!resultsPath) throw new Error('cannot resolve results path for battle ' + battle.id);
    const contestantDir = buildContestantDir(contestant.identity, contestant.model);
    const dir = join(resultsPath, contestantDir);
    await mkdir(dir, { recursive: true });
    const benchLines = [
      `duration: ${bench.durationMs}ms`,
      bench.tokensPerSec != null ? `tokens/sec: ${bench.tokensPerSec.toFixed(1)}` : null,
    ]
      .filter(Boolean)
      .join('\n');
    const resultMd =
      `# ${contestant.identity} / ${contestant.model}\n\n` +
      `## Benchmark\n\n${benchLines}\n\n` +
      `## Output\n\n${output}\n`;
    await writeFile(join(dir, 'result.md'), resultMd, 'utf8');
    if (battle.battle_type === 'coding' && contestant.worktree_id) {
      const [wt] = await sql<{ path: string; base_commit: string | null }[]>`
        SELECT path, base_commit FROM worktrees WHERE id = ${contestant.worktree_id}
      `;
      if (wt) {
        const [proj] = await sql<{ path: string }[]>`
          SELECT path FROM projects WHERE id = ${battle.project_id}
        `;
        if (proj) {
          const diff = await diffWorktree(wt.path, proj.path, {
            baseRef: wt.base_commit ?? undefined,
          }).catch(() => '');
          await writeFile(join(dir, 'diff.patch'), diff, 'utf8');
        }
      }
    }
    return dir;
  }
  /** Resolve or rebuild results_path for a battle (handles crash-before-UPDATE). */
  async function getOrBuildResultsPath(battle: BattleRow): Promise<string | null> {
    if (battle.results_path) return battle.results_path;
    const [proj] = await sql<{ path: string }[]>`SELECT path FROM projects WHERE id = ${battle.project_id}`;
    if (!proj) return null;
    const slug = buildBattleSlug(battle.id, battle.battle_type, battle.created_at);
    const resultsPath = join(proj.path, 'Arena', slug);
    await sql`
      UPDATE battles SET results_path = ${resultsPath}, updated_at = clock_timestamp()
      WHERE id = ${battle.id}
    `;
    return resultsPath;
  }
  // ─── helpers ──────────────────────────────────────────────────────────────
  async function readChatOutput(chatId: string): Promise<string> {
    const [m] = await sql<{ content: string | null }[]>`
      SELECT content FROM messages
      WHERE chat_id = ${chatId} AND role = 'assistant'
      ORDER BY created_at DESC LIMIT 1
    `;
    return m?.content ?? '';
  }
  async function loadBattle(battleId: string): Promise<BattleRow | null> {
    const [b] = await sql<BattleRow[]>`
      SELECT id, project_id, battle_type, prompt, status, results_path, created_at
      FROM battles WHERE id = ${battleId}
    `;
    return b ?? null;
  }
  async function loadContestants(battleId: string): Promise<ContestantRow[]> {
    return sql<ContestantRow[]>`
      SELECT id, battle_id, identity, model, lane, task_id, worktree_id, status
      FROM contestants WHERE battle_id = ${battleId}
      ORDER BY created_at ASC
    `;
  }
  function publishContestantFrame(
    battleId: string,
    contestantId: string,
    extra: Record<string, unknown>,
  ): void {
    publishUser({
      type: 'contestant_updated',
      battle_id: battleId,
      contestant_id: contestantId,
      ...extra,
    });
  }
  // ─── initResume ───────────────────────────────────────────────────────────
  async function initResume(): Promise<void> {
    const battles = await sql<BattleRow[]>`
      SELECT id, project_id, battle_type, prompt, status, results_path, created_at
      FROM battles WHERE status = 'running'
    `;
    if (battles.length === 0) return;
    log.info({ count: battles.length }, 'arena-runner: resuming in-flight battles on startup');
    for (const battle of battles) {
      await resumeBattle(battle).catch((err) => {
        log.error({ err: errMsg(err), battleId: battle.id }, 'arena-runner: initResume failed for battle');
      });
    }
  }
  async function resumeBattle(battle: BattleRow): Promise<void> {
    const contestants = await loadContestants(battle.id);
    const taskIds = contestants.map((c) => c.task_id).filter((id): id is string => id !== null);
    const taskStates = new Map<string, string>();
    if (taskIds.length > 0) {
      const tasks = await sql<{ id: string; state: string }[]>`
        SELECT id, state FROM tasks WHERE id = ANY(${taskIds})
      `;
      for (const t of tasks) taskStates.set(t.id, t.state);
    }
    const decisions = reconcileContestants(
      contestants.map((c) => ({ contestantId: c.id, taskId: c.task_id, status: c.status })),
      taskStates,
    );
    for (const decision of decisions) {
      if (decision.action === 'keep') continue;
      const contestant = contestants.find((c) => c.id === decision.contestantId)!;
      await applyResumeDecision(battle, contestant, decision.action);
    }
    // Re-check completion after applying decisions.
    const updated = await loadContestants(battle.id);
    if (isBattleComplete(updated)) {
      await completeBattle(battle.id);
    } else {
      // Advance local lane in case a slot opened up.
      void advanceLocalLane(battle.id);
    }
    log.info({ battleId: battle.id }, 'arena-runner: battle resumed');
  }
  async function applyResumeDecision(
    battle: BattleRow,
    contestant: ContestantRow,
    action: ContestantResumeAction,
  ): Promise<void> {
    switch (action) {
      case 'keep': break;
      case 'mark-done': {
        const taskRow = contestant.task_id
          ? (await sql<{ started_at: Date | null; ended_at: Date | null; cost_tokens: number | null; chat_id: string | null }[]>`
              SELECT started_at, ended_at, cost_tokens, chat_id FROM tasks WHERE id = ${contestant.task_id}`)[0]
          : null;
        const endedAt = taskRow?.ended_at ?? new Date();
        const startedAt = taskRow?.started_at ?? endedAt;
        const bench = computeBenchmark(startedAt, endedAt, taskRow?.cost_tokens ?? null, contestant.lane);
        const output = taskRow?.chat_id ? await readChatOutput(taskRow.chat_id) : '';
        // `battle` is a non-null BattleRow here, so no null guard is needed.
        const resultPath = await writeContestantResults(battle, contestant, output, bench).catch((err) => {
          log.warn({ err: errMsg(err), contestantId: contestant.id }, 'arena-runner: resume result write failed');
          return null;
        });
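        // Persist cost_tokens too, matching the live handleTaskTerminal path.
        await sql`
          UPDATE contestants
          SET status = 'done',
              duration_ms = ${Math.round(bench.durationMs)},
              tokens_per_sec = ${bench.tokensPerSec},
              cost_tokens = ${taskRow?.cost_tokens ?? null},
              result_path = ${resultPath},
              updated_at = clock_timestamp()
          WHERE id = ${contestant.id}
        `;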
        break;
      }
      case 'mark-error':
        await sql`
          UPDATE contestants
          SET status = 'error', error = 'task failed before callback',
              updated_at = clock_timestamp()
          WHERE id = ${contestant.id}
        `;
        break;
      case 'mark-cancelled':
        await sql`
          UPDATE contestants
          SET status = 'error', error = 'cancelled before callback',
              updated_at = clock_timestamp()
          WHERE id = ${contestant.id}
        `;
        break;
      case 're-dispatch': {
        const { taskId } = await dispatch({
          projectId: battle.project_id,
          contestantId: contestant.id,
          prompt: battle.prompt,
          identity: contestant.identity,
          model: contestant.model,
          battleType: battle.battle_type,
        });
        await sql`
          UPDATE contestants
          SET task_id = ${taskId}, updated_at = clock_timestamp()
          WHERE id = ${contestant.id}
        `;
        log.info(
          { battleId: battle.id, contestantId: contestant.id, taskId },
          'arena-runner: contestant re-dispatched on resume',
        );
        break;
      }
    }
  }
  // ─── cancelBattle ─────────────────────────────────────────────────────────
  async function cancelBattle(battleId: string): Promise<{ cancelled: boolean; taskIds: string[] }> {
    const updated = await sql`
      UPDATE battles SET status = 'cancelled', updated_at = clock_timestamp()
      WHERE id = ${battleId} AND status = 'running'
    `;
    if (updated.count === 0) return { cancelled: false, taskIds: [] };
    // Mark all non-terminal contestants as error ('battle cancelled') and collect in-flight task_ids.
    const contestants = await sql<{ id: string; task_id: string | null; status: string }[]>`
      SELECT id, task_id, status FROM contestants
      WHERE battle_id = ${battleId} AND status NOT IN ('done', 'error')
    `;
    if (contestants.length > 0) {
      await sql`
        UPDATE contestants
        SET status = 'error', error = 'battle cancelled', updated_at = clock_timestamp()
        WHERE battle_id = ${battleId} AND status NOT IN ('done', 'error')
      `;
      for (const c of contestants) {
        publishContestantFrame(battleId, c.id, {
          status: 'error',
          error: 'battle cancelled',
          battle_status: 'cancelled',
        });
      }
    }
    const taskIds = contestants
      .filter(
        (c): c is typeof c & { task_id: string } =>
          c.task_id !== null && c.status === 'running',
      )
      .map((c) => c.task_id);
    log.info({ battleId }, 'arena-runner: battle cancelled by request');
    return { cancelled: true, taskIds };
  }
  // ─── triggerAnalysis (Phase 5 seam) ──────────────────────────────────────
  async function triggerAnalysis(battleId: string): Promise<{ triggered: boolean }> {
    const battle = await loadBattle(battleId);
    if (!battle) return { triggered: false };
    log.info({ battleId }, 'arena-runner: triggerAnalysis requested');
    // Calls the injected onBattleComplete seam — Phase 5 replaces this with the
    // real two-stage digest→judge analyzer (see ADR 0002 + plan Phase 5).
    onBattleComplete(battleId);
    return { triggered: true };
  }
  // ─── startCrossExam (Phase 5 seam) ───────────────────────────────────────
  async function startCrossExam(
    battleId: string,
    opts: { identity: string; model: string },
  ): Promise<{ crossExamId: string }> {
    const [row] = await sql<{ id: string }[]>`
      INSERT INTO cross_examinations (battle_id, identity, model)
      VALUES (${battleId}, ${opts.identity}, ${opts.model})
      RETURNING id
    `;
    const crossExamId = row!.id;
    log.info({ battleId, crossExamId, ...opts }, 'arena-runner: cross-exam inserted, triggering analyzer');
    if (onCrossExamStart) {
      try {
        onCrossExamStart({ battleId, crossExamId, identity: opts.identity, model: opts.model });
      } catch (err) {
        log.error({ err: errMsg(err), battleId, crossExamId }, 'arena-runner: onCrossExamStart threw');
      }
    }
    return { crossExamId };
  }
  // ─── setWinner (user override) ────────────────────────────────────────────
  async function setWinner(
    battleId: string,
    winnerId: string | null,
  ): Promise<{ ok: boolean; notFound?: boolean; invalidContestant?: boolean }> {
    const [row] = await sql<{ id: string }[]>`SELECT id FROM battles WHERE id = ${battleId}`;
    if (!row) return { ok: false, notFound: true };
    if (winnerId !== null) {
      const [c] = await sql<{ id: string }[]>`
        SELECT id FROM contestants WHERE id = ${winnerId} AND battle_id = ${battleId}
      `;
      if (!c) return { ok: false, invalidContestant: true };
    }
    await sql`
      UPDATE battles SET winner_contestant_id = ${winnerId}, updated_at = clock_timestamp()
      WHERE id = ${battleId}
    `;
    publishUser({ type: 'battle_updated', battle_id: battleId, winner_contestant_id: winnerId });
    return { ok: true };
  }
  return { startBattle, handleTaskTerminal, initResume, cancelBattle, triggerAnalysis, startCrossExam, setWinner };
 }
 function errMsg(e: unknown): string {
  return e instanceof Error ? e.message : String(e);
 }
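The runner relies on `isBattleComplete` from arena-decisions, which this diff does not show. A minimal sketch of such a check follows, assuming (as `cancelBattle` suggests with its `status NOT IN ('done', 'error')` filter) that `done` and `error` are the only terminal contestant statuses. The names `ContestantStatusSketch` and `isBattleCompleteSketch` are hypothetical, and the empty-roster rule is an assumption rather than confirmed behavior.

```typescript
// Hypothetical sketch; the real isBattleComplete lives in arena-decisions
// and may differ. Status values are taken from the contestant_updated frame.
type ContestantStatusSketch = 'queued' | 'running' | 'done' | 'error';

// Assumption: 'done' and 'error' are the only terminal statuses.
const TERMINAL_STATUSES = new Set<ContestantStatusSketch>(['done', 'error']);

function isBattleCompleteSketch(
  contestants: Array<{ status: ContestantStatusSketch }>,
): boolean {
  // Assumption: a battle with no contestants is never treated as complete.
  return contestants.length > 0 && contestants.every((c) => TERMINAL_STATUSES.has(c.status));
}
```

Under these assumptions a contestant that errors does not block completion, which matches the "battle continues" behavior in `handleTaskTerminal`.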
--- a/apps/coder/src/services/claude-command-discovery.ts
+++ b/apps/coder/src/services/claude-command-discovery.ts
@@ -12,12 +12,48 @@ import { homedir } from 'node:os';
 import { join } from 'node:path';
 import type { AgentCommand } from './provider-types.js';
-/** Minimal frontmatter reader — single-line `key: value` between `---` fences. */
+/**
 * Frontmatter reader between `---` fences. Handles single-line `key: value`
 * AND YAML block scalars (`key: >` folded / `key: |` literal) whose value
 * spans the following more-indented lines — the shape most plugin SKILL.md
 * descriptions use (`description: >`).
 */
 function frontmatterField(content: string, field: string): string | undefined {
  const block = content.match(/^---\r?\n([\s\S]*?)\r?\n---/);
  if (!block?.[1]) return undefined;
-  const m = block[1].match(new RegExp(`^${field}:\\s*(.+)$`, 'm'));
-  return m?.[1]?.trim().replace(/^["']|["']$/g, '') || undefined;
+  const lines = block[1].split(/\r?\n/);
+  const keyRe = new RegExp(`^(\\s*)${field}:\\s*(.*)$`);
  for (let i = 0; i < lines.length; i++) {
    const m = lines[i]?.match(keyRe);
    if (!m) continue;
    const keyIndent = (m[1] ?? '').length;
    const inline = (m[2] ?? '').trim();
    // Block scalar: `>` (folded) or `|` (literal), optional chomping `+`/`-`.
    if (/^[>|][+-]?$/.test(inline)) {
      const folded = inline[0] === '>';
      const body: string[] = [];
      for (let j = i + 1; j < lines.length; j++) {
        const line = lines[j] ?? '';
        if (line.trim() === '') {
          body.push('');
          continue;
        }
        const indent = line.length - line.trimStart().length;
        if (indent <= keyIndent) break; // dedent ends the block
        body.push(line.slice(keyIndent + 1));
      }
      const joined = folded
        ? body
            .map((l) => l.trim())
            .join(' ')
            .replace(/\s+/g, ' ')
            .trim()
        : body.join('\n').replace(/\n+$/, '');
      return joined || undefined;
    }
    return inline.replace(/^["']|["']$/g, '').trim() || undefined;
  }
  return undefined;
 }
 function readCommandDir(dir: string): AgentCommand[] {
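One way to check the new block-scalar handling is to run a copy of the reader on a small SKILL.md-style input. The sketch below reproduces the logic of the hunk above under the hypothetical name `frontmatterFieldSketch`; the sample frontmatter is invented for illustration and is not taken from any real plugin.

```typescript
// Copy of the frontmatter reader from the diff, renamed so it can run standalone.
function frontmatterFieldSketch(content: string, field: string): string | undefined {
  const block = content.match(/^---\r?\n([\s\S]*?)\r?\n---/);
  if (!block?.[1]) return undefined;
  const lines = block[1].split(/\r?\n/);
  const keyRe = new RegExp(`^(\\s*)${field}:\\s*(.*)$`);
  for (let i = 0; i < lines.length; i++) {
    const m = lines[i]?.match(keyRe);
    if (!m) continue;
    const keyIndent = (m[1] ?? '').length;
    const inline = (m[2] ?? '').trim();
    // Block scalar: `>` (folded) or `|` (literal), optional chomping indicator.
    if (/^[>|][+-]?$/.test(inline)) {
      const folded = inline[0] === '>';
      const body: string[] = [];
      for (let j = i + 1; j < lines.length; j++) {
        const line = lines[j] ?? '';
        if (line.trim() === '') {
          body.push('');
          continue;
        }
        const indent = line.length - line.trimStart().length;
        if (indent <= keyIndent) break; // a dedent ends the block
        body.push(line.slice(keyIndent + 1));
      }
      const joined = folded
        ? body.map((l) => l.trim()).join(' ').replace(/\s+/g, ' ').trim()
        : body.join('\n').replace(/\n+$/, '');
      return joined || undefined;
    }
    // Plain single-line value: strip one pair of surrounding quotes.
    return inline.replace(/^["']|["']$/g, '').trim() || undefined;
  }
  return undefined;
}

// Invented sample input (not from a real plugin).
const sampleSkill = [
  '---',
  'name: "review"',
  'description: >',
  '  Reviews a pull request',
  '  and summarizes risks.',
  '---',
  '# Body',
].join('\n');

const sampleDescription = frontmatterFieldSketch(sampleSkill, 'description');
const sampleName = frontmatterFieldSketch(sampleSkill, 'name');
```

With the old single-line regex, `description` in this sample would have resolved to the bare `>` indicator; the folded branch instead joins the indented lines into one sentence.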
--- a/apps/coder/src/services/dispatcher.ts
+++ b/apps/coder/src/services/dispatcher.ts
@@ -4,6 +4,7 @@ import type { Broker } from '@boocode/server/broker';
 import type { WsFrame } from '@boocode/contracts/ws-frames';
 import type { Config } from '../config.js';
 import { createWorktree, diffWorktree, cleanupWorktree, ensureSessionWorktree } from './worktrees.js';
 import { applyAll } from './pending_changes.js';
 import { createCheckpoint } from './checkpoints.js';
 import { makeDcpStreamStripper } from './dcp-strip.js';
 import { dispatchViaAcp } from './acp-dispatch.js';
@@ -305,10 +306,13 @@ export function createDispatcher(deps: Deps): {
  // ─── Path A: Native Inference ───────────────────────────────────────────────
-  async function runNativeInference(task: { id: string; project_id: string; input: string; agent: string | null; model: string | null; session_id: string | null }): Promise<void> {
+  async function runNativeInference(task: { id: string; project_id: string; input: string; agent: string | null; model: string | null; mode_id: string | null; session_id: string | null }): Promise<void> {
    const taskId = task.id;
    log.info({ taskId }, 'dispatcher: starting task (path A — native)');
    // Declared before try so the catch block can write it back on the task row.
    let chatId: string | null = null;
    try {
      // Mark running
      await sql`
@@ -317,26 +321,29 @@ export function createDispatcher(deps: Deps): {
        WHERE id = ${taskId}
      `;
-      // Create session + chat for this task
+      // Session setup: reuse a pre-created session (e.g. Q&A arena contestants
+      // whose persona is stamped on the session via agent_id) or create a fresh one.
       const model = task.model ?? config.DEFAULT_MODEL;
-      const sessionName = 'Task: ' + task.input.slice(0, 40);
-
-      const [session] = await sql<{ id: string }[]>`
-        INSERT INTO sessions (project_id, name, model, status)
-        VALUES (${task.project_id}, ${sessionName}, ${model}, 'open')
-        RETURNING id
-      `;
-      const sessionId = session!.id;
+      let sessionId: string;
+      if (task.session_id) {
+        sessionId = task.session_id;
+      } else {
+        const sessionName = 'Task: ' + task.input.slice(0, 40);
+        const [session] = await sql<{ id: string }[]>`
+          INSERT INTO sessions (project_id, name, model, status)
+          VALUES (${task.project_id}, ${sessionName}, ${model}, 'open')
+          RETURNING id
+        `;
+        sessionId = session!.id;
+        await sql`UPDATE tasks SET session_id = ${sessionId} WHERE id = ${taskId}`;
+      }
      const [chat] = await sql<{ id: string }[]>`
        INSERT INTO chats (session_id, name, status)
        VALUES (${sessionId}, 'Task execution', 'open')
        RETURNING id
      `;
-      const chatId = chat!.id;
+      chatId = chat!.id;
      // Link task to session
      await sql`UPDATE tasks SET session_id = ${sessionId} WHERE id = ${taskId}`;
      // Create user message + streaming assistant
      await sql<{ id: string }[]>`
@@ -381,10 +388,26 @@ export function createDispatcher(deps: Deps): {
        const summary = (msg?.content ?? '').slice(0, 500);
        await sql`
          UPDATE tasks
-          SET state = 'completed', ended_at = clock_timestamp(), output_summary = ${summary}, cost_tokens = ${costTokens}
+          SET state = 'completed', ended_at = clock_timestamp(), output_summary = ${summary}, cost_tokens = ${costTokens}, chat_id = ${chatId}
          WHERE id = ${taskId}
        `;
        log.info({ taskId, costTokens }, 'dispatcher: task completed (native)');
        // Bypass permission mode: auto-apply the staged edits to disk after the
        // turn. Ask/Plan leave them in the pending-changes queue for review.
        if (task.mode_id === 'bypass') {
          try {
            const [proj] = await sql<{ path: string }[]>`SELECT path FROM projects WHERE id = ${task.project_id}`;
            if (proj?.path) {
              const applied = await applyAll(sql, sessionId, proj.path);
              log.info({ taskId, applied: applied.length }, 'dispatcher: native bypass auto-applied pending changes');
            }
          } catch (applyErr) {
            log.warn(
              { taskId, err: applyErr instanceof Error ? applyErr.message : String(applyErr) },
              'dispatcher: native bypass auto-apply failed',
            );
          }
        }
      } else {
        const [msg] = await sql<{ content: string | null }[]>`
          SELECT content FROM messages WHERE id = ${assistantId}
@@ -392,7 +415,7 @@ export function createDispatcher(deps: Deps): {
        const summary = (msg?.content ?? 'Inference failed').slice(0, 500);
        await sql`
          UPDATE tasks
-          SET state = 'failed', ended_at = clock_timestamp(), output_summary = ${summary}, cost_tokens = ${costTokens}
+          SET state = 'failed', ended_at = clock_timestamp(), output_summary = ${summary}, cost_tokens = ${costTokens}, chat_id = ${chatId}
          WHERE id = ${taskId}
        `;
        log.warn({ taskId, finalStatus }, 'dispatcher: task failed (native)');
@@ -402,7 +425,7 @@ export function createDispatcher(deps: Deps): {
      log.error({ taskId, err: errMsg }, 'dispatcher: task error (native)');
      await sql`
        UPDATE tasks
-        SET state = 'failed', ended_at = clock_timestamp(), output_summary = ${errMsg.slice(0, 500)}
+        SET state = 'failed', ended_at = clock_timestamp(), output_summary = ${errMsg.slice(0, 500)}, chat_id = ${chatId}
        WHERE id = ${taskId}
      `.catch(() => {});
    }
--- a/apps/coder/src/services/provider-manifest.ts
+++ b/apps/coder/src/services/provider-manifest.ts
@@ -32,6 +32,18 @@ const QWEN_PTY_MODES: ProviderMode[] = [
  { id: 'yolo', label: 'YOLO', description: 'Auto-approve all tools', isUnattended: true },
 ];
 // Native BooCode (llama-swap) has no agent-native mode vocabulary, so we define
 // one that matches the unified permission ladder. `bypass` is the only mode that
 // changes behavior (auto-apply staged edits after the turn — dispatcher.ts);
 // `plan` falls back to `ask` semantics for native (writes still stage to the
 // pending-changes queue). External agents map the same three unified modes onto
 // THEIR native ids via the `plan`-id / default / `isUnattended` shape.
 const BOOCODE_MODES: ProviderMode[] = [
  { id: 'plan', label: 'Plan', description: 'Read-only analysis (native BooCode falls back to Ask)' },
  { id: 'ask', label: 'Ask Permission', description: 'Stage edits to the pending-changes queue for review' },
  { id: 'bypass', label: 'Bypass', description: 'Auto-apply edits to disk after the turn', isUnattended: true },
 ];
 const CLAUDE_THINKING = [
  { id: 'low', label: 'Low' },
  { id: 'medium', label: 'Medium' },
@@ -41,6 +53,10 @@ const CLAUDE_THINKING = [
 ];
 export const PROVIDER_MANIFEST: Record<string, ProviderManifestEntry> = {
  boocode: {
    defaultModeId: 'ask',
    modes: BOOCODE_MODES,
  },
  claude: {
    defaultModeId: 'default',
    modes: CLAUDE_MODES,
--- a/apps/coder/src/services/provider-snapshot.ts
+++ b/apps/coder/src/services/provider-snapshot.ts
@@ -122,12 +122,14 @@ async function buildProviderEntry(
    };
  }
-  // 2. Native boocode → always ready (llama-swap models).
+  // 2. Native boocode → always ready (llama-swap models). Exposes the unified
  // permission modes (plan/ask/bypass) so the composer's permission picker works
  // for native BooCode too; `bypass` auto-applies staged edits (dispatcher.ts).
  if (isNative) {
    return {
      name, label: resolved.label, transport, status: 'ready',
-      enabled: true, installed: true, models: withConfigModels(llamaModels), modes: [],
-      defaultModeId: null, commands: manifestCommands,
+      enabled: true, installed: true, models: withConfigModels(llamaModels),
+      modes: fallbackModes, defaultModeId, commands: manifestCommands,
    };
  }
--- a/apps/server/src/schema.sql
+++ b/apps/server/src/schema.sql
@@ -372,13 +372,12 @@ ALTER TABLE messages ADD COLUMN IF NOT EXISTS tail_start_id UUID REFERENCES mess
 ALTER TABLE chats ADD COLUMN IF NOT EXISTS needs_compaction BOOLEAN NOT NULL DEFAULT FALSE;
 CREATE INDEX IF NOT EXISTS idx_messages_chat_compacted ON messages (chat_id, compacted_at);
--- tasks table (provider dispatch, arena)
+-- tasks table (provider dispatch)
 CREATE TABLE IF NOT EXISTS tasks (
  id                UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  project_id        UUID NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
  session_id        UUID REFERENCES sessions(id) ON DELETE CASCADE,
  parent_task_id    UUID REFERENCES tasks(id),
-  arena_id          UUID,
  state             TEXT NOT NULL DEFAULT 'pending'
                    CHECK (state IN ('pending','running','completed','failed','blocked','cancelled')),
  input             TEXT NOT NULL,
@@ -405,3 +404,6 @@ DO $$ BEGIN
      FOREIGN KEY (session_id) REFERENCES sessions(id) ON DELETE CASCADE;
  END IF;
 END $$;
 -- Remove the v2.0.5 arena_id column (replaced by the new Arena feature).
 ALTER TABLE tasks DROP COLUMN IF EXISTS arena_id;
--- a/apps/server/src/services/inference/types.ts
+++ b/apps/server/src/services/inference/types.ts
@@ -44,7 +44,11 @@ export interface InferenceFrame {
    | 'chat_renamed'
    | 'error'
    | 'flow_run_started'
-    | 'flow_run_step_updated';
+    | 'flow_run_step_updated'
    // arena frames
    | 'battle_started'
    | 'contestant_updated'
    | 'battle_updated';
  message_id?: string;
  message_ids?: string[];
  chat_id?: string;
@@ -84,6 +88,19 @@ export interface InferenceFrame {
  status?: string;
  run_status?: 'running' | 'completed' | 'failed' | 'cancelled';
  report?: string;
  // arena frames
  battle_id?: string;
  battle_type?: 'coding' | 'qa';
  prompt?: string;
  contestants?: Array<{ id: string; identity: string; model: string; lane: 'local' | 'cloud' }>;
  contestant_id?: string;
  battle_status?: 'pending' | 'running' | 'completed' | 'failed' | 'cancelled';
  duration_ms?: number;
  tokens_per_sec?: number;
  winner_contestant_id?: string | null;
  analysis_ready?: boolean;
  cross_exam_id?: string;
  delta?: string;
 }
 export type FramePublisher = (sessionId: string, frame: InferenceFrame) => void;
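The arena fields added to `InferenceFrame` (`contestant_id`, `status`, `duration_ms`, `delta`) imply that a consumer accumulates streamed output per contestant. A minimal sketch of such a reducer follows. `ContestantViewSketch`, `ContestantUpdatedSketch`, and `applyContestantFrame` are hypothetical names, not code from this PR, and the "unknown contestant starts as queued" rule is an assumption.

```typescript
// Hypothetical client-side view of one contestant; not a type from the codebase.
interface ContestantViewSketch {
  status: 'queued' | 'running' | 'done' | 'error';
  output: string;
  durationMs?: number;
}

// Subset of the contestant_updated fields declared in the diff.
interface ContestantUpdatedSketch {
  type: 'contestant_updated';
  battle_id: string;
  contestant_id: string;
  status?: 'queued' | 'running' | 'done' | 'error';
  duration_ms?: number;
  delta?: string;
}

function applyContestantFrame(
  roster: Record<string, ContestantViewSketch>,
  frame: ContestantUpdatedSketch,
): Record<string, ContestantViewSketch> {
  // Assumption: a contestant we have not seen yet starts as queued with no output.
  const prev: ContestantViewSketch = roster[frame.contestant_id] ?? { status: 'queued', output: '' };
  const next: ContestantViewSketch = {
    ...prev,
    status: frame.status ?? prev.status,
    // Deltas are appended in arrival order; frames without a delta leave output unchanged.
    output: prev.output + (frame.delta ?? ''),
    ...(frame.duration_ms !== undefined ? { durationMs: frame.duration_ms } : {}),
  };
  // Return a new object so the update is safe to use in immutable state stores.
  return { ...roster, [frame.contestant_id]: next };
}
```

Because every field except the ids is optional, the same reducer handles pure streaming frames (only `delta`) and terminal frames (only `status` and `duration_ms`).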
--- a/apps/web/src/App.tsx
+++ b/apps/web/src/App.tsx
@@ -16,6 +16,7 @@ import { RightRailDrawerProvider, useRightRailDrawer } from '@/hooks/useRightRai
 import { useViewport } from '@/hooks/useViewport';
 import { ThemeFx } from '@/components/fx/ThemeFx';
 import { FlowLauncherDialog } from '@/components/FlowLauncherDialog';
 import { ArenaLauncherDialog } from '@/components/ArenaLauncherDialog';
 function SessionRightRail() {
  const { id } = useParams<{ id: string }>();
@@ -102,6 +103,7 @@ function AppShell() {
        </Routes>
        <Toaster position="bottom-right" />
        <FlowLauncherDialog />
        <ArenaLauncherDialog />
      </div>
    </>
  );
--- a/apps/web/src/api/client.ts
+++ b/apps/web/src/api/client.ts
@@ -27,6 +27,9 @@ import type {
  WorkspaceState,
  FlowRunRow,
  FlowStepRow,
  BattleShape,
  ContestantShape,
  CrossExaminationShape,
 } from './types';
 // v2.6 Phase 1-UX §9b: chat-scoped agent-session rows. Returned by
@@ -518,6 +521,63 @@ export const api = {
      request<AgentsResponse>(`/api/projects/${projectId}/agents`),
  },
  // Arena battle API — proxied to boocoder at /api/coder/battles/*.
  battles: {
    create: (body: {
      project_id: string;
      battle_type: 'coding' | 'qa';
      prompt: string;
      contestants: Array<{ identity: string; model: string }>;
    }) =>
      request<{ battle_id: string }>('/api/coder/battles', {
        method: 'POST',
        body: JSON.stringify(body),
      }),
    list: (projectId: string) =>
      request<{ battles: BattleShape[] }>(
        `/api/coder/battles?project_id=${encodeURIComponent(projectId)}`,
      ),
    get: (battleId: string) =>
      request<{
        battle: BattleShape;
        contestants: ContestantShape[];
        cross_examinations: CrossExaminationShape[];
      }>(`/api/coder/battles/${encodeURIComponent(battleId)}`),
    stop: (battleId: string) =>
      request<{ cancelled: boolean }>(
        `/api/coder/battles/${encodeURIComponent(battleId)}/stop`,
        { method: 'POST' },
      ),
    analyze: (battleId: string) =>
      request<{ triggered: boolean }>(
        `/api/coder/battles/${encodeURIComponent(battleId)}/analyze`,
        { method: 'POST' },
      ),
    crossExamine: (battleId: string, body: { identity: string; model: string }) =>
      request<{ cross_exam_id: string }>(
        `/api/coder/battles/${encodeURIComponent(battleId)}/cross-examine`,
        { method: 'POST', body: JSON.stringify(body) },
      ),
    getAnalysis: (battleId: string) =>
      request<{ text: string }>(
        `/api/coder/battles/${encodeURIComponent(battleId)}/analysis`,
      ),
    generatePrompt: (description: string) =>
      request<{ prompt: string }>('/api/coder/battles/generate-prompt', {
        method: 'POST',
        body: JSON.stringify({ description }),
      }),
    setWinner: (battleId: string, body: { winner_contestant_id: string | null }) =>
      request<{ ok: boolean }>(
        `/api/coder/battles/${encodeURIComponent(battleId)}/winner`,
        { method: 'PATCH', body: JSON.stringify(body) },
      ),
    getDiff: (battleId: string, contestantId: string) =>
      request<{ diff: string }>(
        `/api/coder/battles/${encodeURIComponent(battleId)}/contestants/${encodeURIComponent(contestantId)}/diff`,
      ),
  },
  skills: {
    list: () => request<{ skills: Skill[] }>('/api/skills'),
  },
--- a/apps/web/src/api/types.ts
+++ b/apps/web/src/api/types.ts
@@ -391,7 +391,8 @@ export type WorkspacePaneKind =
  | 'settings'
  | 'markdown_artifact'
  | 'html_artifact'
-  | 'orchestrator';
+  | 'orchestrator'
  | 'arena';
 // Mixed tabs: a pane can hold tabs of different kinds (a BooChat tab next to a
 // BooCode tab next to a Terminal tab). Each tab carries its own kind; the active
@@ -424,6 +425,10 @@ export interface OrchestratorState {
  band: 'small' | 'medium' | 'large';
 }
 // Arena pane state — single-sourced in @boocode/contracts; edit the package, not here.
 import type { ArenaState, BattleShape, ContestantShape, CrossExaminationShape, BattleType, BattleStatus, ContestantStatus, ContestantLane } from '@boocode/contracts/arena';
 export type { ArenaState, BattleShape, ContestantShape, CrossExaminationShape, BattleType, BattleStatus, ContestantStatus, ContestantLane };
 // Orchestrator run API types (returned by GET /api/coder/runs/:id).
 export interface FlowRunRow {
  id: string;
@@ -475,6 +480,8 @@ export interface WorkspacePane {
  html_artifact_state?: HtmlArtifactState;
  // orchestrator pane: populated only when kind === 'orchestrator'.
  orchestrator_state?: OrchestratorState;
  // arena pane: populated only when kind === 'arena'.
  arena_state?: ArenaState;
 }
 // Reopen LIFO stack entry. Shape unchanged from the prior module-level stack;
@@ -592,4 +599,31 @@ export type WsFrame =
      status: 'pending' | 'running' | 'completed' | 'failed' | 'skipped' | 'cancelled';
      run_status?: 'running' | 'completed' | 'failed' | 'cancelled';
      report?: string;
    }
  // arena frames: battle lifecycle + per-contestant streaming
  | {
      type: 'battle_started';
      battle_id: string;
      battle_type: 'coding' | 'qa';
      prompt: string;
      contestants: Array<{ id: string; identity: string; model: string; lane: 'local' | 'cloud' }>;
    }
  | {
      type: 'contestant_updated';
      battle_id: string;
      contestant_id: string;
      status?: 'queued' | 'running' | 'done' | 'error';
      duration_ms?: number;
      tokens_per_sec?: number;
      battle_status?: 'pending' | 'running' | 'completed' | 'failed' | 'cancelled';
      delta?: string;
      error?: string;
    }
  | {
      type: 'battle_updated';
      battle_id: string;
      status?: 'pending' | 'running' | 'completed' | 'failed' | 'cancelled';
      winner_contestant_id?: string | null;
      analysis_ready?: boolean;
      cross_exam_id?: string;
    };
--- a/apps/web/src/components/AgentComposerBar.tsx
+++ b/apps/web/src/components/AgentComposerBar.tsx
@@ -1,5 +1,5 @@
 import { useEffect, useMemo, useRef, useState } from 'react';
-import { Check, ChevronDown, RefreshCw, Loader2, Shield, Brain, Bot } from 'lucide-react';
+import { Check, ChevronDown, RefreshCw, Loader2, Shield, ShieldAlert, Eye, Brain, Bot } from 'lucide-react';
 import { api } from '@/api/client';
 import type { AgentSessionConfig, ProviderSnapshotEntry, AgentCommand } from '@/api/types';
 import { useProviderSnapshot, refreshProviderSnapshot } from '@/hooks/useProviderSnapshot';
@@ -14,8 +14,22 @@ import {
 import { BottomSheet } from '@/components/BottomSheet';
 import { useViewport } from '@/hooks/useViewport';
 import { formatModelLabel } from '@/lib/model-label';
 import {
  availablePermissionModes,
  permissionForModeId,
  nativeModeForPermission,
  type PermissionMode,
 } from '@/lib/permission-mode';
 import { cn } from '@/lib/utils';
 // Permission picker icon — varies with the active mode so the (icon-only) control
 // is glanceable: Eye = Plan (read-only), Shield = Ask, ShieldAlert = Bypass.
 function permissionIcon(mode: PermissionMode): React.ReactNode {
  if (mode === 'plan') return <Eye className="size-3 shrink-0" />;
  if (mode === 'bypass') return <ShieldAlert className="size-3 shrink-0 text-amber-500" />;
  return <Shield className="size-3 shrink-0" />;
 }
 const PREFS_KEY = 'boocode.coder.agent-prefs';
@@ -350,7 +364,11 @@ export function AgentComposerBar({ projectPath, value, onChange, onProviderComma
  }
  const providerOptions = entries.map((e) => ({ id: e.name, label: e.label }));
-  const modeOptions = (currentEntry?.modes ?? []).map((m) => ({ id: m.id, label: m.label }));
+  // Unified permission ladder (Plan / Ask / Bypass) mapped onto this provider's
+  // native modes. `value.modeId` stays the wire field; the active unified mode is
+  // derived from it.
+  const permissionModes = availablePermissionModes(currentEntry?.modes ?? []);
+  const currentPermission = permissionForModeId(value.modeId, currentEntry?.modes ?? []);
  const modelOptions = (currentEntry?.models ?? []).map((m) => ({ id: m.id, label: formatModelLabel(m.label) }));
  const thinkingOpts = thinkingOptions.map((t) => ({ id: t.id, label: t.label }));
@@ -380,15 +398,25 @@ export function AgentComposerBar({ projectPath, value, onChange, onProviderComma
          </>
        }
      />
-      {/* Mode (shield) only when the provider actually exposes modes. Native
-          BooCoder has none, so it's hidden rather than shown disabled. */}
-      {modeOptions.length > 0 && (
+      {/* Permission ladder (Plan / Ask / Bypass) — shown when the provider exposes
+          modes. Picks the unified mode; we resolve it to the provider's native
+          modeId. Icon varies with the active mode (Bypass is amber). */}
+      {permissionModes.length > 0 && (
        <CompactPicker
-          label="Mode"
-          value={value.modeId ?? ''}
-          options={modeOptions}
-          onPick={(modeId) => persist({ ...value, modeId })}
-          icon={<Shield className="size-3 shrink-0" />}
+          label="Permission"
+          value={currentPermission}
+          options={permissionModes}
+          onPick={(perm) =>
+            persist({
+              ...value,
+              modeId: nativeModeForPermission(
+                perm as PermissionMode,
+                currentEntry?.modes ?? [],
+                currentEntry?.defaultModeId ?? null,
+              ),
+            })
+          }
+          icon={permissionIcon(currentPermission)}
          iconOnly
        />
      )}
--- a/apps/web/src/components/ArenaLauncherDialog.tsx
+++ b/apps/web/src/components/ArenaLauncherDialog.tsx
@@ -0,0 +1,410 @@
 // ArenaLauncherDialog — mirrors FlowLauncherDialog.
 // Opens via sessionEvents 'open_arena_launcher'.
 // Flow: pick Battle Type → write/generate prompt → add 2–6 contestants → Start.
 import { useCallback, useEffect, useRef, useState } from 'react';
 import { Loader2, Minus, Plus, Swords, TriangleAlert, X } from 'lucide-react';
 import { toast } from 'sonner';
 import {
  Dialog,
  DialogContent,
  DialogFooter,
  DialogHeader,
  DialogTitle,
 } from '@/components/ui/dialog';
 import { Button } from '@/components/ui/button';
 import { Label } from '@/components/ui/label';
 import { api } from '@/api/client';
 import type { Agent, ProviderSnapshotEntry } from '@/api/types';
 import { sessionEvents } from '@/hooks/sessionEvents';
 import { useProviderSnapshot } from '@/hooks/useProviderSnapshot';
 import { cn } from '@/lib/utils';
 // ─── types ────────────────────────────────────────────────────────────────────
 type BattleType = 'coding' | 'qa';
 interface Contestant {
  key: string; // local unique key for React
  identity: string;
  model: string;
 }
 // ─── helpers ─────────────────────────────────────────────────────────────────
 function newContestant(): Contestant {
  return { key: crypto.randomUUID(), identity: '', model: '' };
 }
 function isDuplicate(contestants: Contestant[], c: Contestant): boolean {
  const dups = contestants.filter(
    (x) => x.key !== c.key && x.identity === c.identity && x.model === c.model && x.identity !== '',
  );
  return dups.length > 0;
 }
 function hasDuplicatePair(contestants: Contestant[]): boolean {
  return contestants.some((c) => isDuplicate(contestants, c));
 }
 function localCount(battleType: BattleType, contestants: Contestant[], snapshot: ProviderSnapshotEntry[] | null): number {
  if (battleType === 'qa') return contestants.filter((c) => c.identity !== '').length;
  const boocode = snapshot?.find((e) => e.name === 'boocode');
  const localModelIds = new Set(boocode?.models.map((m) => m.id) ?? []);
  return contestants.filter((c) => {
    // Match bare IDs (boocode/native) and llama-swap/-prefixed IDs used by
    // opencode and other external agents pointing at the local llama-swap server.
    return localModelIds.has(c.model) || localModelIds.has(c.model.replace(/^llama-swap\//, ''));
  }).length;
 }
 // ─── ContestantRow ────────────────────────────────────────────────────────────
 function ContestantRow({
  contestant,
  battleType,
  snapshot,
  agents,
  allContestants,
  onUpdate,
  onRemove,
  removable,
 }: {
  contestant: Contestant;
  battleType: BattleType;
  snapshot: ProviderSnapshotEntry[] | null;
  agents: Agent[];
  allContestants: Contestant[];
  onUpdate: (patch: Partial<Contestant>) => void;
  onRemove: () => void;
  removable: boolean;
 }) {
  const dup = isDuplicate(allContestants, contestant);
  // Identity options for Coding: installed provider names.
  // Identity options for Q&A: agents by id.
  const identityOptions =
    battleType === 'coding'
      ? (snapshot ?? [])
          .filter((e) => e.installed && e.enabled)
          .map((e) => ({ value: e.name, label: e.label }))
      : agents.map((a) => ({ value: a.id, label: a.name }));
  // Model options: for Coding use the selected provider's models; for Q&A use boocode models.
  const modelOptions: { value: string; label: string }[] = (() => {
    if (battleType === 'coding') {
      const provider = (snapshot ?? []).find((e) => e.name === contestant.identity);
      return (provider?.models ?? []).map((m) => ({ value: m.id, label: m.label }));
    }
    // Q&A: native backend only — use boocode models
    const boocode = (snapshot ?? []).find((e) => e.name === 'boocode');
    return (boocode?.models ?? []).map((m) => ({ value: m.id, label: m.label }));
  })();
  function handleIdentityChange(value: string) {
    // Reset model when identity changes so stale model doesn't persist.
    onUpdate({ identity: value, model: '' });
  }
  function handleModelChange(value: string) {
    onUpdate({ model: value });
  }
  return (
    <div className={cn('flex items-center gap-2', dup && 'opacity-60')}>
      <select
        value={contestant.identity}
        onChange={(e) => handleIdentityChange(e.target.value)}
        className="flex-1 min-w-0 text-xs border border-border rounded bg-background px-2 py-1.5 text-foreground focus:outline-none focus:ring-1 focus:ring-ring"
        aria-label={battleType === 'coding' ? 'Backend' : 'Persona'}
      >
        <option value="">{battleType === 'coding' ? 'Backend…' : 'Persona…'}</option>
        {identityOptions.map((o) => (
          <option key={o.value} value={o.value}>{o.label}</option>
        ))}
      </select>
      <select
        value={contestant.model}
        onChange={(e) => handleModelChange(e.target.value)}
        disabled={!contestant.identity}
        className="flex-1 min-w-0 text-xs border border-border rounded bg-background px-2 py-1.5 text-foreground focus:outline-none focus:ring-1 focus:ring-ring disabled:opacity-50"
        aria-label="Model"
      >
        <option value="">Model…</option>
        {modelOptions.map((o) => (
          <option key={o.value} value={o.value}>{o.label}</option>
        ))}
      </select>
      {dup && (
        <span title="Duplicate contestant" className="shrink-0 text-destructive">
          <TriangleAlert size={12} />
        </span>
      )}
      {removable && (
        <button
          type="button"
          onClick={onRemove}
          className="shrink-0 inline-flex items-center justify-center p-1 rounded text-muted-foreground hover:bg-muted hover:text-foreground"
          aria-label="Remove contestant"
        >
          <Minus size={12} />
        </button>
      )}
    </div>
  );
 }
 // ─── ArenaLauncherDialog ──────────────────────────────────────────────────────
 export function ArenaLauncherDialog() {
  const [open, setOpen] = useState(false);
  const [projectId, setProjectId] = useState('');
  const [placement, setPlacement] = useState<'new' | 'split'>('new');
  const [battleType, setBattleType] = useState<BattleType>('coding');
  const [prompt, setPrompt] = useState('');
  const [contestants, setContestants] = useState<Contestant[]>(() => [
    newContestant(),
    newContestant(),
  ]);
  const [generating, setGenerating] = useState(false);
  const [starting, setStarting] = useState(false);
  const [agents, setAgents] = useState<Agent[]>([]);
  const promptRef = useRef<HTMLTextAreaElement>(null);
  const snapshot = useProviderSnapshot();
  useEffect(() => {
    return sessionEvents.subscribe((ev) => {
      if (ev.type !== 'open_arena_launcher') return;
      setProjectId(ev.project_id);
      setPlacement(ev.placement ?? 'new');
      setBattleType('coding');
      setPrompt('');
      setContestants([newContestant(), newContestant()]);
      setGenerating(false);
      setStarting(false);
      setOpen(true);
    });
  }, []);
  // Load agents list when dialog opens (for Q&A mode).
  useEffect(() => {
    if (!open || !projectId) return;
    api.agents.list(projectId)
      .then((r) => setAgents(r.agents))
      .catch(() => {});
  }, [open, projectId]);
  const handleGeneratePrompt = useCallback(async () => {
    const description = prompt.trim();
    if (!description || generating) return;
    setGenerating(true);
    try {
      const { prompt: generated } = await api.battles.generatePrompt(description);
      setPrompt(generated);
      promptRef.current?.focus();
    } catch (err) {
      toast.error(err instanceof Error ? err.message : 'Generate failed');
    } finally {
      setGenerating(false);
    }
  }, [prompt, generating]);
  function updateContestant(key: string, patch: Partial<Contestant>) {
    setContestants((prev) => prev.map((c) => (c.key === key ? { ...c, ...patch } : c)));
  }
  function removeContestant(key: string) {
    setContestants((prev) => prev.filter((c) => c.key !== key));
  }
  function addContestant() {
    if (contestants.length >= 6) return;
    setContestants((prev) => [...prev, newContestant()]);
  }
  const canStart =
    !starting &&
    prompt.trim().length > 0 &&
    contestants.length >= 2 &&
    contestants.every((c) => c.identity !== '' && c.model !== '') &&
    !hasDuplicatePair(contestants);
  const localLaneCount = localCount(battleType, contestants, snapshot);
  const showLocalWarning = localLaneCount >= 3;
  async function handleStart() {
    if (!canStart) return;
    setStarting(true);
    try {
      const { battle_id } = await api.battles.create({
        project_id: projectId,
        battle_type: battleType,
        prompt: prompt.trim(),
        contestants: contestants.map((c) => ({ identity: c.identity, model: c.model })),
      });
      sessionEvents.emit({
        type: 'open_arena_pane',
        state: { battle_id, battle_type: battleType, prompt: prompt.trim() },
        placement,
      });
      setOpen(false);
    } catch (err) {
      toast.error(err instanceof Error ? err.message : 'Failed to start battle');
    } finally {
      setStarting(false);
    }
  }
  return (
    <Dialog open={open} onOpenChange={setOpen}>
      <DialogContent
        className="flex flex-col gap-0 p-0 max-h-[85vh] sm:max-w-lg overflow-hidden"
        showCloseButton={false}
      >
        <DialogHeader className="gap-1.5 px-4 pt-4 pb-3 border-b shrink-0">
          <div className="flex items-center gap-2">
            <Swords size={14} className="text-muted-foreground shrink-0" />
            <DialogTitle className="text-sm font-medium">New Arena Battle</DialogTitle>
          </div>
          <p className="text-xs text-muted-foreground">
            Run the same prompt against multiple AI competitors and pick the best result.
          </p>
        </DialogHeader>
        <div className="flex flex-col gap-4 overflow-y-auto overscroll-contain px-4 py-3">
          {/* Battle type */}
          <div className="flex flex-col gap-1.5">
            <Label className="text-xs text-muted-foreground">Battle type</Label>
            <div className="flex gap-1.5">
              {(['coding', 'qa'] as const).map((t) => (
                <button
                  key={t}
                  type="button"
                  onClick={() => { setBattleType(t); setContestants([newContestant(), newContestant()]); }}
                  aria-pressed={battleType === t}
                  className={cn(
                    'flex-1 rounded-lg border py-1.5 text-xs transition-colors capitalize',
                    battleType === t
                      ? 'border-primary bg-primary/10 text-primary font-medium'
                      : 'border-border text-muted-foreground hover:bg-muted hover:text-foreground',
                  )}
                >
                  {t === 'coding' ? 'Coding' : 'Q&A'}
                </button>
              ))}
            </div>
            <p className="text-xs text-muted-foreground">
              {battleType === 'coding'
                ? 'Each contestant works in its own isolated worktree. Results include a diff.'
                : 'Contestants answer the prompt as text. No code changes.'}
            </p>
          </div>
          {/* Prompt */}
          <div className="flex flex-col gap-1.5">
            <div className="flex items-center justify-between">
              <Label htmlFor="arena-prompt" className="text-xs text-muted-foreground">
                Prompt
              </Label>
              <button
                type="button"
                onClick={() => void handleGeneratePrompt()}
                disabled={generating || prompt.trim().length === 0}
                className="text-xs text-primary hover:text-primary/80 disabled:opacity-40 disabled:cursor-default flex items-center gap-1"
                title="Expand your description into a fuller battle prompt"
              >
                {generating && <Loader2 size={10} className="animate-spin" />}
                Generate prompt
              </button>
            </div>
            <textarea
              id="arena-prompt"
              ref={promptRef}
              value={prompt}
              onChange={(e) => setPrompt(e.target.value)}
              placeholder={
                battleType === 'coding'
                  ? 'Describe a coding task, or enter a short description and click Generate prompt…'
                  : 'Ask a question or describe a topic, or enter a short description and click Generate prompt…'
              }
              rows={4}
              className="w-full text-sm border border-border rounded bg-background px-3 py-2 text-foreground placeholder:text-muted-foreground focus:outline-none focus:ring-1 focus:ring-ring resize-none"
            />
          </div>
          {/* Contestants */}
          <div className="flex flex-col gap-2">
            <div className="flex items-center justify-between">
              <Label className="text-xs text-muted-foreground">
                Contestants ({contestants.length}/6)
              </Label>
              <span className="text-xs text-muted-foreground">
                {battleType === 'coding' ? 'Backend + Model' : 'Persona + Model'}
              </span>
            </div>
            <div className="flex flex-col gap-1.5">
              {contestants.map((c) => (
                <ContestantRow
                  key={c.key}
                  contestant={c}
                  battleType={battleType}
                  snapshot={snapshot}
                  agents={agents}
                  allContestants={contestants}
                  onUpdate={(patch) => updateContestant(c.key, patch)}
                  onRemove={() => removeContestant(c.key)}
                  removable={contestants.length > 2}
                />
              ))}
            </div>
            {contestants.length < 6 && (
              <button
                type="button"
                onClick={addContestant}
                className="flex items-center gap-1.5 text-xs text-muted-foreground hover:text-foreground py-1"
              >
                <Plus size={12} /> Add contestant
              </button>
            )}
            {hasDuplicatePair(contestants) && (
              <div className="flex items-center gap-1.5 text-xs text-destructive">
                <TriangleAlert size={12} />
                Duplicate contestants (same identity + model) are not allowed.
              </div>
            )}
            {showLocalWarning && (
              <div className="flex items-center gap-1.5 text-xs text-amber-600 dark:text-amber-400">
                <TriangleAlert size={12} />
                {localLaneCount} local contestants will run serially (one GPU load at a time). This battle will take a while.
              </div>
            )}
          </div>
        </div>
        <DialogFooter className="px-4 py-3 border-t shrink-0 flex items-center justify-between">
          <button
            type="button"
            onClick={() => setOpen(false)}
            className="flex items-center gap-1.5 text-xs text-muted-foreground hover:text-foreground"
          >
            <X size={12} /> Cancel
          </button>
          <Button
            type="button"
            size="sm"
            onClick={() => void handleStart()}
            disabled={!canStart}
          >
            {starting ? <Loader2 className="animate-spin" /> : <Swords size={14} />}
            Start battle
          </Button>
        </DialogFooter>
      </DialogContent>
    </Dialog>
  );
 }
--- a/apps/web/src/components/ChatTabBar.tsx
+++ b/apps/web/src/components/ChatTabBar.tsx
@@ -37,6 +37,7 @@ interface Props {
  onNewTab: (kind: WorkspaceTabKind) => void;
  onSplitPane: (kind: 'chat' | 'terminal' | 'coder') => void;
  onNewOrchestrator?: () => void;
  onNewArena?: () => void;
  onReopenPane?: () => void;
  onShowHistory: () => void;
  onRename: (chatId: string, name: string) => Promise<void>;
@@ -69,6 +70,7 @@ export function ChatTabBar({
  onNewTab,
  onSplitPane,
  onNewOrchestrator,
  onNewArena,
  onReopenPane,
  onShowHistory,
  onRename,
@@ -230,6 +232,7 @@ export function ChatTabBar({
          onNewTab={onNewTab}
          onSplitPane={onSplitPane}
          onNewOrchestrator={onNewOrchestrator}
          onNewArena={onNewArena}
          onReopenPane={onReopenPane}
          onShowHistory={onShowHistory}
          onRemovePane={onRemovePane}
--- a/apps/web/src/components/PaneHeaderActions.tsx
+++ b/apps/web/src/components/PaneHeaderActions.tsx
@@ -1,4 +1,4 @@
-import { Code, Columns2, History, MessageSquare, Plus, RotateCcw, Terminal, Workflow, X } from 'lucide-react';
+import { Code, Columns2, History, MessageSquare, Plus, RotateCcw, Swords, Terminal, Workflow, X } from 'lucide-react';
 import {
  DropdownMenu,
  DropdownMenuContent,
@@ -19,6 +19,8 @@ interface Props {
  // When provided, shows a "New Orchestrator" item that opens the flow launcher.
  // Orchestrators are always split (run-bound; can't live as a tab in another pane).
  onNewOrchestrator?: () => void;
  // When provided, shows a "New Arena" item that opens the arena launcher.
  onNewArena?: () => void;
  onReopenPane?: () => void;
  onShowHistory: () => void;
  onRemovePane?: () => void;
@@ -35,6 +37,7 @@ export function PaneHeaderActions({
  onNewTab,
  onSplitPane,
  onNewOrchestrator,
  onNewArena,
  onReopenPane,
  onShowHistory,
  onRemovePane,
@@ -71,6 +74,11 @@ export function PaneHeaderActions({
              <Workflow size={14} /> New Orchestrator
            </DropdownMenuItem>
          )}
          {onNewArena && (
            <DropdownMenuItem onSelect={onNewArena}>
              <Swords size={14} /> New Arena
            </DropdownMenuItem>
          )}
        </DropdownMenuContent>
      </DropdownMenu>
@@ -101,6 +109,11 @@ export function PaneHeaderActions({
              <Workflow size={14} /> New Orchestrator
            </DropdownMenuItem>
          )}
          {onNewArena && (
            <DropdownMenuItem onSelect={onNewArena}>
              <Swords size={14} /> New Arena
            </DropdownMenuItem>
          )}
        </DropdownMenuContent>
      </DropdownMenu>
--- a/apps/web/src/components/SlashCommandPicker.tsx
+++ b/apps/web/src/components/SlashCommandPicker.tsx
@@ -225,7 +225,7 @@ export function SlashCommandPicker({
              setHighlightIndex(i);
              setExpandedIndex((prev) => (prev === i ? null : i));
            }}
-            className="-mr-1 -mt-0.5 flex shrink-0 items-center justify-center rounded p-1 text-muted-foreground/60 transition-colors hover:bg-foreground/10 hover:text-foreground max-md:min-h-[36px] max-md:min-w-[36px]"
+            className="-mr-1 -mt-0.5 flex shrink-0 items-center justify-center rounded-md border border-border bg-background p-1 text-muted-foreground transition-colors hover:bg-muted hover:text-foreground aria-expanded:bg-muted aria-expanded:text-foreground max-md:min-h-[36px] max-md:min-w-[36px]"
          >
            <ChevronRight
              className={cn(
--- a/apps/web/src/components/Workspace.tsx
+++ b/apps/web/src/components/Workspace.tsx
@@ -13,6 +13,7 @@ import { CoderPane } from '@/components/panes/CoderPane';
 import { MarkdownArtifactPane } from '@/components/MarkdownArtifactPane';
 import { HtmlArtifactPane } from '@/components/HtmlArtifactPane';
 import { OrchestratorPane } from '@/components/panes/OrchestratorPane';
 import { ArenaPane } from '@/components/panes/ArenaPane';
 import { ChatTabBar, type TabDescriptor } from '@/components/ChatTabBar';
 import { SessionLandingPage } from '@/components/SessionLandingPage';
 import { cn } from '@/lib/utils';
@@ -134,6 +135,14 @@ export function Workspace({
    });
  }
  function handleNewArena() {
    sessionEvents.emit({
      type: 'open_arena_launcher',
      project_id: projectId,
      placement: 'split',
    });
  }
  // v1.10 booterm + mixed tabs: per-terminal-TAB label, keyed by the terminal
  // tab id (which keys its tmux session). Numbered across the workspace.
  const terminalLabels = useMemo(() => {
@@ -180,6 +189,7 @@ export function Workspace({
          const isTerminal = pane.kind === 'terminal';
          const isCoder = pane.kind === 'coder';
          const isOrchestrator = pane.kind === 'orchestrator';
          const isArena = pane.kind === 'arena';
          const isArtifact = pane.kind === 'markdown_artifact' || pane.kind === 'html_artifact';
          // v1.9: when maximized, hide every pane except the settings one.
          // display:none keeps the React tree mounted so streams / drafts
@@ -192,8 +202,8 @@ export function Workspace({
            }
            return null;
          }
-          // Terminal + coder + orchestrator panes own their tab strip (no chats, no ChatTabBar).
-          const isChromeless = isSettings || isTerminal || isCoder || isArtifact || isOrchestrator;
+          // Terminal + coder + orchestrator + arena panes own their tab strip (no chats, no ChatTabBar).
+          const isChromeless = isSettings || isTerminal || isCoder || isArtifact || isOrchestrator || isArena;
          return (
          <div
            key={pane.id}
@@ -218,7 +228,7 @@ export function Workspace({
                  (chat / coder / terminal / empty-landing). The "+" adds a tab
                  of any kind; Split adds a pane. Settings/artifact panes own
                  their own headers. Hidden on mobile (mobile uses pane panes). */}
-              {!isMobile && !isSettings && !isArtifact && !isOrchestrator && (
+              {!isMobile && !isSettings && !isArtifact && !isOrchestrator && !isArena && (
                <ChatTabBar
                  pane={pane}
                  tabs={paneTabs(pane)}
@@ -231,6 +241,7 @@ export function Workspace({
                  onNewTab={(kind) => void createTab(idx, kind)}
                  onSplitPane={(kind) => onAddPane(kind)}
                  onNewOrchestrator={handleNewOrchestrator}
                  onNewArena={handleNewArena}
                  onReopenPane={hasClosedPanes ? reopenPane : undefined}
                  onShowHistory={() => openSessionHistory(idx)}
                  onRename={renameChat}
@@ -277,6 +288,12 @@ export function Workspace({
                  state={pane.orchestrator_state}
                  onClose={() => removePane(idx)}
                />
              ) : pane.kind === 'arena' && pane.arena_state ? (
                <ArenaPane
                  state={pane.arena_state}
                  projectId={projectId}
                  onClose={() => removePane(idx)}
                />
              ) : pane.kind === 'markdown_artifact' && pane.markdown_artifact_state ? (
                <MarkdownArtifactPane
                  chatId={pane.markdown_artifact_state.chat_id}
--- a/apps/web/src/components/panes/ArenaPane.tsx
+++ b/apps/web/src/components/panes/ArenaPane.tsx
@@ -0,0 +1,664 @@
 // ArenaPane — live view for an Arena battle.
 // Mirrors OrchestratorPane: header with status/winner, contestant roster
 // (collapsed rows, expand-one), analysis panel, cross-examination control.
 //
 // Subscribes to the coder user channel (via useCoderUserEvents → sessionEvents)
 // for battle_started / contestant_updated / battle_updated frames.
 import { useCallback, useEffect, useRef, useState } from 'react';
 import { ChevronDown, ChevronRight, Loader2, MoreHorizontal, RotateCcw, Swords, Trophy, X } from 'lucide-react';
 import { toast } from 'sonner';
 import { api } from '@/api/client';
 import type { ArenaState, BattleShape, ContestantShape, CrossExaminationShape, ProviderSnapshotEntry } from '@/api/types';
 import { sessionEvents } from '@/hooks/sessionEvents';
 import { useProviderSnapshot } from '@/hooks/useProviderSnapshot';
 import {
  DropdownMenu,
  DropdownMenuContent,
  DropdownMenuItem,
  DropdownMenuTrigger,
 } from '@/components/ui/dropdown-menu';
 import { cn } from '@/lib/utils';
 // ─── Status dot (mirrors FlowStepStatusDot) ───────────────────────────────────
 function ContestantStatusDot({ status }: { status: ContestantShape['status'] }) {
  if (status === 'running') {
    return (
      <span
        aria-label="running"
        className="inline-block w-3 h-3 rounded-full border-2 border-emerald-500 border-t-transparent animate-spin shrink-0"
      />
    );
  }
  const cls =
    status === 'done'
      ? 'bg-emerald-500'
      : status === 'error'
        ? 'bg-destructive'
        : 'bg-muted-foreground/40'; // queued
  return <span aria-label={status} className={cn('inline-block w-1.5 h-1.5 rounded-full shrink-0', cls)} />;
 }
 // ─── Lane badge ───────────────────────────────────────────────────────────────
 function LaneBadge({ lane }: { lane: ContestantShape['lane'] }) {
  return (
    <span
      className={cn(
        'text-[10px] px-1 py-0.5 rounded shrink-0',
        lane === 'local'
          ? 'bg-sky-500/10 text-sky-600 dark:text-sky-400'
          : 'bg-violet-500/10 text-violet-600 dark:text-violet-400',
      )}
    >
      {lane}
    </span>
  );
 }
 // ─── Duration formatter ───────────────────────────────────────────────────────
 function formatDuration(ms: number | null): string {
  if (ms == null) return '';
  const s = Math.round(ms / 1000);
  if (s < 60) return `${s}s`;
  return `${Math.floor(s / 60)}m${String(s % 60).padStart(2, '0')}s`;
 }
 // ─── Live ticker for running contestants ─────────────────────────────────────
 function LiveDuration({ startedAt }: { startedAt: number }) {
  const [elapsed, setElapsed] = useState(() => Date.now() - startedAt);
  useEffect(() => {
    const id = setInterval(() => setElapsed(Date.now() - startedAt), 1000);
    return () => clearInterval(id);
  }, [startedAt]);
  return <span>{formatDuration(elapsed)}</span>;
 }
 // ─── DiffView ─────────────────────────────────────────────────────────────────
 function DiffView({ diff }: { diff: string }) {
  const lines = diff.split('\n');
  return (
    <div className="border-t border-border/50">
      <div className="px-3 pt-2 pb-1 text-[10px] font-medium uppercase tracking-wide text-muted-foreground">
        Diff
      </div>
      <pre className="px-3 pb-3 text-xs font-mono whitespace-pre leading-relaxed overflow-x-auto">
        {lines.map((line, i) => {
          const cls =
            line.startsWith('+') && !line.startsWith('+++')
              ? 'text-emerald-600 dark:text-emerald-400'
              : line.startsWith('-') && !line.startsWith('---')
                ? 'text-destructive'
                : line.startsWith('@@')
                  ? 'text-violet-500 dark:text-violet-400'
                  : 'text-muted-foreground';
          return (
            <span key={i} className={cn('block', cls)}>
              {line || ' '}
            </span>
          );
        })}
      </pre>
    </div>
  );
 }
 // ─── ContestantRow ────────────────────────────────────────────────────────────
 interface ContestantRowState {
  data: ContestantShape;
  output: string;
  startedAt: number | null;
 }
 function ContestantRow({
  row,
  isExpanded,
  onToggle,
  isWinner,
  battleId,
  battleType,
 }: {
  row: ContestantRowState;
  isExpanded: boolean;
  onToggle: () => void;
  isWinner: boolean;
  battleId: string;
  battleType: 'coding' | 'qa';
 }) {
  const { data, output, startedAt } = row;
  const label = `${data.identity} / ${data.model}`;
  // Lazy-fetch diff for coding contestants once they are done and expanded.
  const [diff, setDiff] = useState<string | null>(null);
  useEffect(() => {
    if (!isExpanded || battleType !== 'coding' || data.status !== 'done') return;
    if (diff !== null) return;
    api.battles.getDiff(battleId, data.id)
      .then(({ diff: d }) => setDiff(d))
      .catch(() => setDiff(''));
  }, [isExpanded, battleType, data.status, data.id, battleId, diff]);
  async function handleSetWinner(contestantId: string | null) {
    try {
      await api.battles.setWinner(battleId, { winner_contestant_id: contestantId });
    } catch {
      // WS frame updates the badge; a failed call just leaves it unchanged
    }
  }
  return (
    <div>
      <div className="flex items-center hover:bg-muted/30 transition-colors">
        <button
          type="button"
          onClick={onToggle}
          aria-expanded={isExpanded}
          className="flex-1 min-w-0 flex items-center gap-2 pl-3 pr-2 py-2.5 text-left"
        >
          <ContestantStatusDot status={data.status} />
          <span className="text-sm flex-1 truncate min-w-0">{label}</span>
          {isWinner && (
            <Trophy size={11} className="shrink-0 text-emerald-500" aria-label="winner" />
          )}
          <LaneBadge lane={data.lane} />
          {data.status === 'running' && startedAt != null ? (
            <span className="text-xs text-muted-foreground shrink-0 tabular-nums">
              <LiveDuration startedAt={startedAt} />
            </span>
          ) : data.duration_ms != null ? (
            <span className="text-xs text-muted-foreground shrink-0 tabular-nums">
              {formatDuration(data.duration_ms)}
            </span>
          ) : null}
          {data.tokens_per_sec != null && (
            <span className="text-xs text-muted-foreground shrink-0 hidden sm:block tabular-nums">
              {data.tokens_per_sec.toFixed(1)} tok/s
            </span>
          )}
          {data.status === 'error' && (
            <span className="text-xs text-destructive shrink-0 hidden sm:block truncate max-w-[100px]" title={data.error ?? ''}>
              {data.error ?? 'error'}
            </span>
          )}
          {isExpanded ? (
            <ChevronDown size={12} className="shrink-0 text-muted-foreground" />
          ) : (
            <ChevronRight size={12} className="shrink-0 text-muted-foreground" />
          )}
        </button>
        {/* Row menu: winner override. Rendered as a sibling of the toggle button, because a <button> must not contain another <button>. */}
        <div className="shrink-0 pr-3">
          <DropdownMenu>
            <DropdownMenuTrigger asChild>
              <button
                type="button"
                className="shrink-0 p-0.5 rounded text-muted-foreground hover:text-foreground hover:bg-muted"
                aria-label="Contestant options"
              >
                <MoreHorizontal size={12} />
              </button>
            </DropdownMenuTrigger>
            <DropdownMenuContent align="end">
              {!isWinner && (
                <DropdownMenuItem onSelect={() => void handleSetWinner(data.id)}>
                  <Trophy size={12} /> Set as winner
                </DropdownMenuItem>
              )}
              {isWinner && (
                <DropdownMenuItem onSelect={() => void handleSetWinner(null)}>
                  Clear winner
                </DropdownMenuItem>
              )}
            </DropdownMenuContent>
          </DropdownMenu>
        </div>
      </div>
      {isExpanded && (
        <div className="border-t border-border/50 bg-muted/10 max-h-[55vh] overflow-y-auto">
          {output.length === 0 ? (
            <div className="flex items-center justify-center py-6 text-sm text-muted-foreground">
              {data.status === 'queued'
                ? 'Waiting to start…'
                : data.status === 'error'
                  ? data.error ?? 'Error'
                  : 'Connecting…'}
            </div>
          ) : (
            <pre className="p-3 text-xs font-mono whitespace-pre-wrap leading-relaxed break-all text-foreground">
              {output}
            </pre>
          )}
          {battleType === 'coding' && data.status === 'done' && diff && (
            <DiffView diff={diff} />
          )}
        </div>
      )}
    </div>
  );
 }
 // ─── CrossExaminationPanel ────────────────────────────────────────────────────
 function CrossExaminationPanel({
  battleId,
  crossExams,
  snapshot,
 }: {
  battleId: string;
  crossExams: CrossExaminationShape[];
  snapshot: ProviderSnapshotEntry[] | null;
 }) {
  const [identity, setIdentity] = useState('');
  const [model, setModel] = useState('');
  const [running, setRunning] = useState(false);
  const identityOptions = (snapshot ?? [])
    .filter((e) => e.installed && e.enabled)
    .map((e) => ({ value: e.name, label: e.label }));
  const modelOptions = (() => {
    const provider = (snapshot ?? []).find((e) => e.name === identity);
    return (provider?.models ?? []).map((m) => ({ value: m.id, label: m.label }));
  })();
  async function handleRun() {
    if (!identity || !model || running) return;
    setRunning(true);
    try {
      await api.battles.crossExamine(battleId, { identity, model });
      // The verdict arrives via battle_updated frame; ArenaPane will refetch.
    } catch (err) {
      toast.error(err instanceof Error ? err.message : 'Cross-examination failed');
    } finally {
      setRunning(false);
    }
  }
  return (
    <div className="border-t border-border p-4 flex flex-col gap-3">
      <div className="text-xs font-medium text-muted-foreground uppercase tracking-wide">
        Cross-examination
      </div>
      <p className="text-xs text-muted-foreground">
        Challenge the results with any model. The verdict is advisory and never changes the recorded winner.
      </p>
      <div className="flex gap-2 items-center flex-wrap">
        <select
          value={identity}
          onChange={(e) => { setIdentity(e.target.value); setModel(''); }}
          className="flex-1 min-w-[120px] text-xs border border-border rounded bg-background px-2 py-1.5 text-foreground focus:outline-none focus:ring-1 focus:ring-ring"
          aria-label="Backend"
        >
          <option value="">Backend…</option>
          {identityOptions.map((o) => (
            <option key={o.value} value={o.value}>{o.label}</option>
          ))}
        </select>
        <select
          value={model}
          onChange={(e) => setModel(e.target.value)}
          disabled={!identity}
          className="flex-1 min-w-[120px] text-xs border border-border rounded bg-background px-2 py-1.5 text-foreground focus:outline-none focus:ring-1 focus:ring-ring disabled:opacity-50"
          aria-label="Model"
        >
          <option value="">Model…</option>
          {modelOptions.map((o) => (
            <option key={o.value} value={o.value}>{o.label}</option>
          ))}
        </select>
        <button
          type="button"
          onClick={() => void handleRun()}
          disabled={!identity || !model || running}
          className="inline-flex items-center gap-1 text-xs px-2 py-1.5 rounded border border-border text-foreground hover:bg-muted disabled:opacity-50"
        >
          {running && <Loader2 size={10} className="animate-spin" />}
          Run
        </button>
      </div>
      {crossExams.length > 0 && (
        <div className="flex flex-col gap-3 mt-1">
          {crossExams.map((xe) => (
            <div key={xe.id} className="rounded border border-border/50 bg-muted/20 p-3">
              <div className="text-xs font-medium text-muted-foreground mb-1.5">
                {xe.identity} / {xe.model}
              </div>
              {xe.verdict ? (
                <div className="text-sm whitespace-pre-wrap leading-relaxed">{xe.verdict}</div>
              ) : (
                <div className="text-xs text-muted-foreground flex items-center gap-1.5">
                  <Loader2 size={10} className="animate-spin" /> Running…
                </div>
              )}
            </div>
          ))}
        </div>
      )}
    </div>
  );
 }
 // ─── ArenaPane ────────────────────────────────────────────────────────────────
 interface Props {
  state: ArenaState;
  projectId: string; // available for future use (e.g. file browser affordance)
  onClose: () => void;
 }
 export function ArenaPane({ state, onClose }: Props) {
  const [battle, setBattle] = useState<BattleShape | null>(null);
  const [contestantRows, setContestantRows] = useState<ContestantRowState[]>([]);
  const [crossExams, setCrossExams] = useState<CrossExaminationShape[]>([]);
  const [analysis, setAnalysis] = useState<string | null>(null);
  const [expandedId, setExpandedId] = useState<string | null>(null);
  const [stopping, setStopping] = useState(false);
  const [reanalyzing, setReanalyzing] = useState(false);
  const startTimesRef = useRef<Map<string, number>>(new Map());
  const snapshot = useProviderSnapshot();
  // Fetch current battle state on mount / battle_id change.
  useEffect(() => {
    // Guard against a stale response landing after battle_id changed or the pane unmounted.
    let cancelled = false;
    setBattle(null);
    setContestantRows([]);
    setCrossExams([]);
    setAnalysis(null);
    setExpandedId(null);
    api.battles.get(state.battle_id)
      .then(({ battle: b, contestants, cross_examinations }) => {
        if (cancelled) return;
        setBattle(b);
        setContestantRows(
          contestants.map((c) => ({
            data: c,
            output: '',
            startedAt: c.status === 'running' ? Date.now() : null,
          })),
        );
        setCrossExams(cross_examinations);
        // Fetch analysis text if battle is already completed.
        if (b.status === 'completed') {
          api.battles.getAnalysis(state.battle_id)
            .then(({ text }) => {
              if (!cancelled) setAnalysis(text);
            })
            .catch(() => {});
        }
        // Auto-expand first running contestant.
        const firstRunning = contestants.find((c) => c.status === 'running');
        if (firstRunning) setExpandedId(firstRunning.id);
      })
      .catch(() => {});
    return () => {
      cancelled = true;
    };
  }, [state.battle_id]);
  // Subscribe to live battle/contestant frames.
  useEffect(() => {
    return sessionEvents.subscribe((ev) => {
      if (ev.type === 'battle_started' && ev.battle_id === state.battle_id) {
        setContestantRows((prev) => {
          if (prev.length > 0) return prev;
          return ev.contestants.map((c) => ({
            data: {
              id: c.id,
              battle_id: ev.battle_id,
              identity: c.identity,
              model: c.model,
              lane: c.lane,
              task_id: null,
              worktree_id: null,
              status: 'queued' as const,
              duration_ms: null,
              tokens_per_sec: null,
              cost_tokens: null,
              result_path: null,
              error: null,
              created_at: new Date().toISOString(),
              updated_at: new Date().toISOString(),
            },
            output: '',
            startedAt: null,
          }));
        });
      } else if (ev.type === 'contestant_updated' && ev.battle_id === state.battle_id) {
        // Side effects stay outside the state updater: React may invoke updaters
        // more than once (e.g. under StrictMode), so the updater itself must be pure.
        const now = Date.now();
        if (ev.status === 'running') {
          if (!startTimesRef.current.has(ev.contestant_id)) {
            startTimesRef.current.set(ev.contestant_id, now);
          }
          setExpandedId(ev.contestant_id);
        }
        setContestantRows((prev) =>
          prev.map((row) => {
            if (row.data.id !== ev.contestant_id) return row;
            const updatedData: ContestantShape = {
              ...row.data,
              ...(ev.status != null ? { status: ev.status } : {}),
              ...(ev.duration_ms != null ? { duration_ms: ev.duration_ms } : {}),
              ...(ev.tokens_per_sec != null ? { tokens_per_sec: ev.tokens_per_sec } : {}),
              ...(ev.error != null ? { error: ev.error } : {}),
            };
            const newStartedAt =
              ev.status === 'running' && row.startedAt == null
                ? now
                : ev.status === 'done' || ev.status === 'error'
                  ? null
                  : row.startedAt;
            return {
              data: updatedData,
              output: ev.delta ? row.output + ev.delta : row.output,
              startedAt: newStartedAt,
            };
          }),
        );
        if (ev.battle_status) {
          setBattle((prev) => prev ? { ...prev, status: ev.battle_status! } : prev);
        }
      } else if (ev.type === 'battle_updated' && ev.battle_id === state.battle_id) {
        setBattle((prev) => {
          if (!prev) return prev;
          return {
            ...prev,
            ...(ev.status != null ? { status: ev.status } : {}),
            ...(ev.winner_contestant_id !== undefined ? { winner_contestant_id: ev.winner_contestant_id } : {}),
          };
        });
        if (ev.analysis_ready) {
          api.battles.getAnalysis(state.battle_id)
            .then(({ text }) => setAnalysis(text))
            .catch(() => setAnalysis('Analysis ready — failed to load text.'));
        }
        if (ev.cross_exam_id) {
          // Refetch cross-exams to get the latest verdict.
          api.battles.get(state.battle_id)
            .then(({ cross_examinations }) => setCrossExams(cross_examinations))
            .catch(() => {});
        }
      }
    });
  }, [state.battle_id]);
  const toggleExpand = useCallback((id: string) => {
    setExpandedId((prev) => (prev === id ? null : id));
  }, []);
  async function handleStop() {
    if (stopping) return;
    setStopping(true);
    try {
      await api.battles.stop(state.battle_id);
    } catch {
      // non-fatal
    } finally {
      setStopping(false);
    }
  }
  async function handleReanalyze() {
    if (reanalyzing) return;
    setReanalyzing(true);
    try {
      await api.battles.analyze(state.battle_id);
      toast.success('Re-analysis triggered');
    } catch (err) {
      toast.error(err instanceof Error ? err.message : 'Re-analysis failed');
    } finally {
      setReanalyzing(false);
    }
  }
  function handleOpenResults() {
    if (!battle?.results_path) return;
    sessionEvents.emit({ type: 'open_file_in_browser', path: battle.results_path });
  }
  function handleCopyAnalysis() {
    if (!analysis) return;
    navigator.clipboard.writeText(analysis).catch(() => toast.error('Clipboard write failed'));
  }
  const battleStatus = battle?.status ?? 'running';
  const isRunning = battleStatus === 'running' || battleStatus === 'pending';
  const isCompleted = battleStatus === 'completed';
  const winnerId = battle?.winner_contestant_id;
  const winnerRow = winnerId ? contestantRows.find((r) => r.data.id === winnerId) : null;
  const winnerLabel = winnerRow ? `${winnerRow.data.identity} / ${winnerRow.data.model}` : null;
  return (
    <div className="flex flex-col h-full min-h-0 overflow-hidden">
      {/* Header */}
      <div className="flex items-center gap-2 border-b border-border bg-muted/20 px-3 py-2 shrink-0">
        <Swords size={13} className="text-muted-foreground shrink-0" />
        <span className="text-sm font-medium truncate min-w-0 flex-1" title={state.prompt}>
          {state.prompt.length > 60 ? state.prompt.slice(0, 60) + '…' : state.prompt}
        </span>
        <span className="text-xs text-muted-foreground shrink-0 capitalize">{state.battle_type}</span>
        {winnerLabel && (
          <span
            className="text-xs px-1.5 py-0.5 rounded bg-emerald-500/10 text-emerald-600 dark:text-emerald-400 shrink-0 hidden sm:block truncate max-w-[130px]"
            title={`Winner: ${winnerLabel}`}
          >
            ✓ {winnerLabel}
          </span>
        )}
        <div className="ml-auto flex items-center gap-1 shrink-0">
          {isRunning ? (
            <button
              type="button"
              onClick={() => void handleStop()}
              disabled={stopping}
              className="inline-flex items-center gap-1 text-xs px-1.5 py-0.5 rounded border border-border text-muted-foreground hover:text-foreground hover:bg-muted disabled:opacity-50"
              title="Stop battle"
            >
              Stop
            </button>
          ) : (
            <span
              className={cn(
                'text-xs px-1.5 py-0.5 rounded',
                isCompleted
                  ? 'text-emerald-600 bg-emerald-500/10'
                  : battleStatus === 'failed' || battleStatus === 'cancelled'
                    ? 'text-destructive bg-destructive/10'
                    : 'text-muted-foreground bg-muted/40',
              )}
            >
              {battleStatus}
            </span>
          )}
          {isCompleted && (
            <DropdownMenu>
              <DropdownMenuTrigger asChild>
                <button
                  type="button"
                  className="inline-flex items-center justify-center p-1 rounded text-muted-foreground hover:bg-muted hover:text-foreground"
                  aria-label="Battle options"
                >
                  <MoreHorizontal size={14} />
                </button>
              </DropdownMenuTrigger>
              <DropdownMenuContent align="end">
                <DropdownMenuItem onSelect={() => void handleReanalyze()} disabled={reanalyzing}>
                  <RotateCcw size={14} /> Re-analyze
                </DropdownMenuItem>
                {battle?.results_path && (
                  <DropdownMenuItem onSelect={handleOpenResults}>
                    Open results folder
                  </DropdownMenuItem>
                )}
                {analysis && (
                  <DropdownMenuItem onSelect={handleCopyAnalysis}>
                    Copy analysis
                  </DropdownMenuItem>
                )}
              </DropdownMenuContent>
            </DropdownMenu>
          )}
          <button
            type="button"
            onClick={onClose}
            className="inline-flex items-center justify-center p-1 rounded text-muted-foreground hover:bg-muted hover:text-foreground"
            aria-label="Close pane"
            title="Close pane"
          >
            <X size={12} />
          </button>
        </div>
      </div>
      {/* Body */}
      <div className="flex-1 min-h-0 overflow-y-auto">
        {/* Analysis panel */}
        {analysis && (
          <div className="border-b border-border p-4">
            <div className="text-xs font-medium text-muted-foreground uppercase tracking-wide mb-2 pb-1 border-b border-border/50">
              Analysis
            </div>
            <div className="text-sm text-foreground whitespace-pre-wrap leading-relaxed">
              {analysis}
            </div>
            {winnerLabel && (
              <div className="mt-2 text-sm font-medium text-emerald-600 dark:text-emerald-400">
                Winner: {winnerLabel}
              </div>
            )}
          </div>
        )}
        {/* Empty state */}
        {contestantRows.length === 0 && !analysis && (
          <div className="flex items-center justify-center h-24 text-sm text-muted-foreground">
            Starting battle…
          </div>
        )}
        {/* Contestant roster */}
        <div className="divide-y divide-border">
          {contestantRows.map((row) => (
            <ContestantRow
              key={row.data.id}
              row={row}
              isExpanded={expandedId === row.data.id}
              onToggle={() => toggleExpand(row.data.id)}
              isWinner={winnerId === row.data.id}
              battleId={state.battle_id}
              battleType={battle?.battle_type ?? state.battle_type}
            />
          ))}
        </div>
        {/* Cross-examination panel — available after battle finishes */}
        {!isRunning && (
          <CrossExaminationPanel
            battleId={state.battle_id}
            crossExams={crossExams}
            snapshot={snapshot}
          />
        )}
      </div>
    </div>
  );
 }
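
The per-line coloring in `DiffView` reduces to a small prefix classifier. A minimal standalone sketch of that rule follows. `DiffLineKind` and `classifyDiffLine` are hypothetical names introduced here for illustration and are not part of this change.

```typescript
// Hypothetical helper names; the PR inlines this logic inside DiffView.
type DiffLineKind = 'added' | 'removed' | 'hunk' | 'context';

function classifyDiffLine(line: string): DiffLineKind {
  // File headers ("+++ b/file", "--- a/file") are not counted as changes.
  if (line.startsWith('+') && !line.startsWith('+++')) return 'added';
  if (line.startsWith('-') && !line.startsWith('---')) return 'removed';
  if (line.startsWith('@@')) return 'hunk';
  return 'context';
}

// Illustrative input only; not taken from a real battle result.
const sample = [
  '--- a/src/sum.ts',
  '+++ b/src/sum.ts',
  '@@ -1,3 +1,3 @@',
  ' export function sum(a: number, b: number) {',
  '-  return a - b;',
  '+  return a + b;',
  ' }',
].join('\n');

const kinds = sample.split('\n').map(classifyDiffLine);
console.log(kinds);
```

One limitation of a pure prefix check: a deleted line whose own content begins with `--` (for example a removed SQL comment) shows up in the diff as `---…` and would be classified as context rather than as a removal. For a display-only view this is probably acceptable.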
--- a/apps/web/src/hooks/sessionEvents.ts
+++ b/apps/web/src/hooks/sessionEvents.ts
@@ -3,7 +3,11 @@
 // also refresh the sidebar's session list).
 import type {
  ArenaState,
  BattleShape,
  Chat,
  ContestantShape,
  CrossExaminationShape,
  ErrorReason,
  HtmlArtifactState,
  MarkdownArtifactState,
@@ -231,6 +235,53 @@ export interface FlowRunStepUpdatedEvent {
  report?: string;
 }
 // Arena: emitted by "New Arena" menu items to request the launcher dialog.
 export interface OpenArenaLauncherEvent {
  type: 'open_arena_launcher';
  project_id: string;
  placement?: 'new' | 'split';
 }
 // Arena: emitted after a battle is created to open/focus the arena pane.
 export interface OpenArenaPaneEvent {
  type: 'open_arena_pane';
  state: ArenaState;
  placement?: 'new' | 'split';
 }
 // Arena: battle lifecycle frames forwarded from the coder user channel.
 export interface BattleStartedEvent {
  type: 'battle_started';
  battle_id: string;
  battle_type: 'coding' | 'qa';
  prompt: string;
  contestants: Array<{ id: string; identity: string; model: string; lane: 'local' | 'cloud' }>;
 }
 export interface ContestantUpdatedEvent {
  type: 'contestant_updated';
  battle_id: string;
  contestant_id: string;
  status?: 'queued' | 'running' | 'done' | 'error';
  duration_ms?: number;
  tokens_per_sec?: number;
  battle_status?: 'pending' | 'running' | 'completed' | 'failed' | 'cancelled';
  delta?: string;
  error?: string;
 }
 export interface BattleUpdatedEvent {
  type: 'battle_updated';
  battle_id: string;
  status?: 'pending' | 'running' | 'completed' | 'failed' | 'cancelled';
  winner_contestant_id?: string | null;
  analysis_ready?: boolean;
  cross_exam_id?: string;
 }
 // Re-export arena API shapes for consumers that need the full battle data.
 export type { BattleShape, ContestantShape, CrossExaminationShape };
 export type SessionEvent =
  | SessionRenamedEvent
  | ProjectCreatedEvent
@@ -262,7 +313,12 @@ export type SessionEvent =
  | OpenOrchestratorPaneEvent
  | FlowRunStartedEvent
  | FlowRunStepUpdatedEvent
-  | OpenFlowLauncherEvent;
+  | OpenFlowLauncherEvent
  | OpenArenaLauncherEvent
  | OpenArenaPaneEvent
  | BattleStartedEvent
  | ContestantUpdatedEvent
  | BattleUpdatedEvent;
 type Listener = (event: SessionEvent) => void;
 const listeners = new Set<Listener>();
--- a/apps/web/src/hooks/useCoderUserEvents.ts
+++ b/apps/web/src/hooks/useCoderUserEvents.ts
@@ -8,7 +8,13 @@
 import { useEffect } from 'react';
 import { WsFrameSchema } from '@boocode/contracts/ws-frames';
 import { sessionEvents } from './sessionEvents';
-import type { FlowRunStartedEvent, FlowRunStepUpdatedEvent } from './sessionEvents';
+import type {
  BattleStartedEvent,
  BattleUpdatedEvent,
  ContestantUpdatedEvent,
  FlowRunStartedEvent,
  FlowRunStepUpdatedEvent,
 } from './sessionEvents';
 const RECONNECT_INITIAL_MS = 1000;
 const RECONNECT_MAX_MS = 30_000;
@@ -49,6 +55,12 @@ export function useCoderUserEvents(): void {
          sessionEvents.emit(frame as unknown as FlowRunStartedEvent);
        } else if (frame.type === 'flow_run_step_updated') {
          sessionEvents.emit(frame as unknown as FlowRunStepUpdatedEvent);
        } else if (frame.type === 'battle_started') {
          sessionEvents.emit(frame as unknown as BattleStartedEvent);
        } else if (frame.type === 'contestant_updated') {
          sessionEvents.emit(frame as unknown as ContestantUpdatedEvent);
        } else if (frame.type === 'battle_updated') {
          sessionEvents.emit(frame as unknown as BattleUpdatedEvent);
        }
      };
--- a/apps/web/src/hooks/useSessionStream.ts
+++ b/apps/web/src/hooks/useSessionStream.ts
@@ -204,6 +204,13 @@ function applyFrame(state: State, frame: WsFrame): State {
      // No-op here to keep TS exhaustiveness satisfied.
      return state;
    }
    case 'battle_started':
    case 'contestant_updated':
    case 'battle_updated': {
      // Arena frames consumed by ArenaPane's own subscription.
      // No-op here to keep TS exhaustiveness satisfied.
      return state;
    }
  }
 }
--- a/apps/web/src/hooks/useSidebar.ts
+++ b/apps/web/src/hooks/useSidebar.ts
@@ -195,6 +195,13 @@ function applyEvent(prev: SidebarResponse, event: import('./sessionEvents').Sess
    case 'flow_run_step_updated':
      // Consumed by useWorkspacePanes / OrchestratorPane / FlowLauncherDialog; sidebar has no stake.
      return prev;
    case 'open_arena_launcher':
    case 'open_arena_pane':
    case 'battle_started':
    case 'contestant_updated':
    case 'battle_updated':
      // Consumed by useWorkspacePanes / ArenaPane / ArenaLauncherDialog; sidebar has no stake.
      return prev;
    case 'project_archived': {
      const next = prev.projects.filter((p) => p.id !== event.project_id);
      if (next.length === prev.projects.length) return prev;
--- a/apps/web/src/hooks/useWorkspacePanes.ts
+++ b/apps/web/src/hooks/useWorkspacePanes.ts
@@ -3,6 +3,7 @@ import type { DragEvent } from 'react';
 import { toast } from 'sonner';
 import { api } from '@/api/client';
 import type {
  ArenaState,
  ClosedPaneEntry,
  HtmlArtifactState,
  MarkdownArtifactState,
@@ -187,6 +188,16 @@ function orchestratorPane(state: OrchestratorState): WorkspacePane {
  };
 }
 function arenaPane(state: ArenaState): WorkspacePane {
  return {
    id: generateId(),
    kind: 'arena',
    chatIds: [],
    activeChatIdx: -1,
    arena_state: state,
  };
 }
 // v1.9: settings panes are ephemeral. Filter them out before persisting so a
 // page reload always returns to a clean workspace; the user re-opens via the
 // sidebar Settings button when needed.
@@ -290,6 +301,8 @@ export interface UseWorkspacePanesResult {
  createTab: (paneIdx: number, kind: WorkspaceTabKind) => Promise<void>;
  /** Open an orchestrator run pane (or focus an existing one for the same run_id). */
  addOrchestratorPane: (state: OrchestratorState) => string | null;
  /** Open an arena battle pane (or focus an existing one for the same battle_id). */
  addArenaPane: (state: ArenaState) => string | null;
  /** Back-compat alias for createTab(paneIdx, 'coder'). */
  createCoderTab: (paneIdx: number) => Promise<void>;
  // Open-on-first-click, close-on-second-click. Singleton — settings panes
@@ -877,6 +890,38 @@ export function useWorkspacePanes(sessionId: string): UseWorkspacePanesResult {
    });
  }, [addOrchestratorPane]);
  const addArenaPane = useCallback((state: ArenaState): string | null => {
    let openedId: string | null = null;
    setPanes((prev) => {
      const existingIdx = prev.findIndex(
        (p) => p.kind === 'arena' && p.arena_state?.battle_id === state.battle_id,
      );
      if (existingIdx >= 0) {
        setActivePaneIdx(existingIdx);
        openedId = prev[existingIdx]!.id;
        return prev;
      }
      if (nonSettingsCount(prev) >= MAX_PANES) {
        toast.error(`Maximum ${MAX_PANES} panes`);
        return prev;
      }
      const newPane = arenaPane(state);
      openedId = newPane.id;
      const next = [...prev, newPane];
      setActivePaneIdx(next.length - 1);
      return next;
    });
    return openedId;
  }, []);
  // Arena pane: open via sessionEvents (fired by the launcher).
  useEffect(() => {
    return sessionEvents.subscribe((ev) => {
      if (ev.type !== 'open_arena_pane') return;
      addArenaPane(ev.state);
    });
  }, [addArenaPane]);
  // Returns the new settings pane id when one is OPENED (so mobile callers can
  // push ?pane= atomically — see addPaneAndSwitch), or null when it was closed.
  // Id generated outside the updater so a strict-mode double-invoke agrees.
@@ -1121,6 +1166,7 @@ export function useWorkspacePanes(sessionId: string): UseWorkspacePanesResult {
    addSplitPane,
    createTab,
    addOrchestratorPane,
    addArenaPane,
    createCoderTab,
    toggleSettingsPane,
    removePane,
--- a/apps/web/src/lib/permission-mode.ts
+++ b/apps/web/src/lib/permission-mode.ts
@@ -0,0 +1,55 @@
 // Unified permission ladder shown in the composer's permission picker. Maps a
 // curated three-option control (Plan / Ask Permission / Bypass) onto each
 // provider's native mode vocabulary, derived purely from the snapshot's mode
 // metadata (`plan` id, the default mode, and the `isUnattended` bypass mode).
 // `modeId` stays the single wire field sent to the dispatcher — there is no
 // separate persisted permission field; the active unified mode is derived from
 // the current `modeId`.
 import type { ProviderMode } from '@/api/types';
 export type PermissionMode = 'plan' | 'ask' | 'bypass';
 export const PERMISSION_LABELS: Record<PermissionMode, string> = {
  plan: 'Plan',
  ask: 'Ask Permission',
  bypass: 'Bypass',
 };
 /** The native modeId for a unified permission, or null when the provider has no
 *  modes (e.g. goose). `plan` → the `plan`-id mode; `bypass` → the `isUnattended`
 *  mode; `ask` → the non-unattended default. Falls back to defaultModeId. */
 export function nativeModeForPermission(
  mode: PermissionMode,
  modes: ProviderMode[],
  defaultModeId: string | null,
 ): string | null {
  if (modes.length === 0) return null;
  if (mode === 'plan') return modes.find((m) => m.id === 'plan')?.id ?? defaultModeId;
  if (mode === 'bypass') return modes.find((m) => m.isUnattended)?.id ?? defaultModeId;
  return (
    modes.find((m) => m.id === defaultModeId && !m.isUnattended)?.id ??
    modes.find((m) => !m.isUnattended && m.id !== 'plan')?.id ??
    defaultModeId
  );
 }
 /** Which unified permission a native modeId corresponds to (for picker state). */
 export function permissionForModeId(modeId: string | null, modes: ProviderMode[]): PermissionMode {
  if (!modeId) return 'ask';
  if (modeId === 'plan') return 'plan';
  if (modes.find((m) => m.id === modeId)?.isUnattended) return 'bypass';
  return 'ask';
 }
 /** The unified permission options a provider supports, in fixed Plan→Ask→Bypass
 *  order. Empty when the provider exposes no modes (no picker shown). */
 export function availablePermissionModes(
  modes: ProviderMode[],
 ): Array<{ id: PermissionMode; label: string }> {
  if (modes.length === 0) return [];
  const out: Array<{ id: PermissionMode; label: string }> = [];
  if (modes.some((m) => m.id === 'plan')) out.push({ id: 'plan', label: PERMISSION_LABELS.plan });
  out.push({ id: 'ask', label: PERMISSION_LABELS.ask });
  if (modes.some((m) => m.isUnattended)) out.push({ id: 'bypass', label: PERMISSION_LABELS.bypass });
  return out;
 }
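
A short usage sketch of the ladder above, under stated assumptions. The `ProviderMode` shape is reduced to the two fields the helpers read, the two mapping helpers are condensed copies so the block runs standalone, and the mode ids `default`, `yolo`, `auto`, and `manual` are invented for illustration rather than taken from any real provider.

```typescript
// Assumed minimal shape; the real ProviderMode in '@/api/types' may carry more fields.
interface ProviderMode { id: string; isUnattended?: boolean }
type PermissionMode = 'plan' | 'ask' | 'bypass';

// Condensed copy of the helper in this file, repeated so the sketch is self-contained.
function nativeModeForPermission(
  mode: PermissionMode,
  modes: ProviderMode[],
  defaultModeId: string | null,
): string | null {
  if (modes.length === 0) return null;
  if (mode === 'plan') return modes.find((m) => m.id === 'plan')?.id ?? defaultModeId;
  if (mode === 'bypass') return modes.find((m) => m.isUnattended)?.id ?? defaultModeId;
  return (
    modes.find((m) => m.id === defaultModeId && !m.isUnattended)?.id ??
    modes.find((m) => !m.isUnattended && m.id !== 'plan')?.id ??
    defaultModeId
  );
}

// Condensed copy of the reverse mapping.
function permissionForModeId(modeId: string | null, modes: ProviderMode[]): PermissionMode {
  if (!modeId) return 'ask';
  if (modeId === 'plan') return 'plan';
  if (modes.find((m) => m.id === modeId)?.isUnattended) return 'bypass';
  return 'ask';
}

// Hypothetical provider exposing all three rungs.
const modes: ProviderMode[] = [
  { id: 'default' },
  { id: 'plan' },
  { id: 'yolo', isUnattended: true },
];
const ladder: PermissionMode[] = ['plan', 'ask', 'bypass'];

// Unified permission -> native modeId, then back again.
const picked = ladder.map((p) => nativeModeForPermission(p, modes, 'default'));
const roundTrip = picked.map((id) => permissionForModeId(id, modes));

// Hypothetical provider whose default mode is itself unattended:
// 'ask' skips it and lands on the first attended, non-plan mode.
const askFallback = nativeModeForPermission(
  'ask',
  [{ id: 'auto', isUnattended: true }, { id: 'plan' }, { id: 'manual' }],
  'auto',
);

console.log(picked, roundTrip, askFallback);
```

Because `modeId` is the only field sent on the wire, the round trip matters: the picker state is always recomputed from the native id rather than stored separately.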
--- a/boocode_roadmap.md
+++ b/boocode_roadmap.md
@@ -1,9 +1,11 @@
 # BooCode roadmap (v1.x–v2.x)
-Last updated: 2026-05-31
+Last updated: 2026-06-03
 > **Companion doc:** `boocode_code_review.md` holds the full external-repo inventory, lift rationale, and license analysis. This document is the canonical source for shipping state, version ordering, and what's planned vs. shipped.
 > **Shipped since this doc's body was written (v2.7.12–v2.7.17, 2026-06-02→03; see `CHANGELOG.md` for detail):** `v2.7.12-audit-cleanup` (repo-wide dead-code/dedup pass, ~−4,600 LOC), `v2.7.13-contracts-ssot` (the `@boocode/contracts` shared wire-contract package — the "unified types" deferred item), `v2.7.14-backlog-hardening` (5 v2-review items incl. external task-cancel, stall-timeout, retire `:9502` SPA), `v2.7.15-git-diff-panel` + `v2.7.16-container-git-safedir` (Files/Git tab), and `v2.7.17-orchestrator` (the in-app multi-agent Orchestrator on local Qwen). The "Write/edit robustness" and "Claude provider SDK" milestones below — previously marked "planned" — are also now shipped (see those sections).
 ## Overview
 BooCode is a **3-app monorepo** at `/opt/boocode/` (locked 2026-05-22):
@@ -452,9 +454,9 @@ The original plan (kept for record): expose `boocoder acp` (JSON-RPC over stdio)
 -----
-## Write/edit robustness (planned)
+## Write/edit robustness — SHIPPED
-**Status: planned, not started.** From the v2 review (`boocode_code_review_v2.md` §5b; `cline/cline`, Apache-2.0 — code-liftable). Two lifts that harden BooCoder's write surface where it's weakest for local quantized models:
+**Status: SHIPPED (by v2.7.x).** Both lifts are live: the fuzzy patch applier (`apps/coder/src/services/fuzzy-match.ts`, consumed by `pending_changes.ts` — `edit_file` is no longer exact-match) and the `git`-ref checkpoint snapshot (`apps/coder/src/services/checkpoints.ts` → `createCheckpoint`, private `refs/boocode/checkpoints/<id>` ref). The original "planned" note below is retained for provenance. From the v2 review (`boocode_code_review_v2.md` §5b; `cline/cline`, Apache-2.0 — code-liftable). Two lifts that harden BooCoder's write surface where it's weakest for local quantized models:
 1. **Fuzzy patch applier for `edit_file`.** BooCoder's `edit_file` is exact-match today (`apps/coder/src/services/pending_changes.ts` — `if (!content.includes(oldStr)) throw`; no whitespace/unicode tolerance, no multi-occurrence guard). Lift cline's tiered match ladder (exact → `trimEnd` → `trim` → Levenshtein ≥0.66) + unicode canonicalization (dashes, curly quotes, nbsp) + multi-occurrence guard; unmatched → warning, not throw. `apply-patch-parser.ts:347-431`.
 2. **`git stash create` + private-ref checkpoint.** A per-turn workspace snapshot that captures **all** state — including edits made by dispatched external agents (opencode/claude/qwen/goose), build artifacts, test side-effects — which BooCoder's current `rewind` cannot (it only reverse-applies BooCoder's own queued `pending_changes`). Snapshot stored under a private `refs/…/checkpoints/…` ref, restorable with conversation-trim in sync. `checkpoint-hooks.ts:177-253`.
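The tiered match ladder in item 1 can be sketched per line as follows. This is a minimal illustration, not the code in `fuzzy-match.ts` or cline's applier: the function names are hypothetical, the similarity score (1 minus edit distance over the longer length) is an assumed definition, and unicode canonicalization and the multi-occurrence guard are left out.

```typescript
/** Dynamic-programming Levenshtein edit distance between two strings. */
function levenshtein(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

/** Assumed similarity score in [0, 1]; 1 means identical. */
function similarity(a: string, b: string): number {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest;
}

type MatchTier = 'exact' | 'trimEnd' | 'trim' | 'fuzzy' | 'none';

/** Walks the ladder from strictest to loosest and reports the first tier that matches. */
function matchTier(fileLine: string, wanted: string, threshold = 0.66): MatchTier {
  if (fileLine === wanted) return 'exact';
  if (fileLine.trimEnd() === wanted.trimEnd()) return 'trimEnd';
  if (fileLine.trim() === wanted.trim()) return 'trim';
  if (similarity(fileLine.trim(), wanted.trim()) >= threshold) return 'fuzzy';
  return 'none'; // the roadmap calls for a warning here rather than a throw
}
```

Reporting the tier rather than a boolean lets a caller log how loose the accepted match was, which is useful when reviewing edits produced by quantized local models.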
@@ -463,9 +465,9 @@ The original plan (kept for record): expose `boocoder acp` (JSON-RPC over stdio)
 -----
-## Claude provider — SDK transport + native session resume (planned)
+## Claude provider — SDK transport + native session resume — SHIPPED (enabled 2026-06-03)
-**Status: planned, not started.** From the v2 review (`boocode_code_review_v2.md` §5h–§5i) + a direct read of the published SDK `.d.ts` (`@anthropic-ai/claude-agent-sdk@0.3.158`, reviewed 2026-05-31). Today BooCoder dispatches `claude` one-shot via PTY (`claude --output-format stream-json`) with no continuity. Plan:
+**Status: BUILT and ENABLED.** The Agent-SDK backend (`apps/coder/src/services/backends/claude-sdk.ts`) and the `PostgresSessionStore` (`claude-session-store.ts`, keyed `(chat_id, agent)`) are implemented; it was shipped behind the `CLAUDE_SDK_BACKEND` env flag (off by default in code) and is **enabled in `apps/coder/.env.host` (`CLAUDE_SDK_BACKEND=1`, confirmed live in the running host service)** — chat-tab `claude` tasks route through the warm SDK backend with native session resume instead of one-shot PTY. The original "planned" note below is retained for provenance. From the v2 review (`boocode_code_review_v2.md` §5h–§5i) + a direct read of the published SDK `.d.ts` (`@anthropic-ai/claude-agent-sdk@0.3.158`, reviewed 2026-05-31). Today BooCoder dispatches `claude` one-shot via PTY (`claude --output-format stream-json`) with no continuity. Plan:
 1. **Adopt the Agent SDK** (`@anthropic-ai/claude-agent-sdk`) over the PTY path. `query({ prompt, options })` yields structured `SDKMessage`s — `SDKSystemMessage` (`subtype:'init'`, carries the session id + tool/skill/mcp lists), `SDKPartialAssistantMessage` (`type:'stream_event'` deltas), `SDKResultMessage` (turn end) — no stdout scraping. `happy` (`slopus/happy`) is the working existence-proof.
 2. **Native session resume via a pluggable `SessionStore`.** Implement `PostgresSessionStore implements SessionStore` (5 methods: `append`/`load`/`listSessions`/`delete`/`listSubkeys`) over BooCode's Postgres, keyed by `(chat_id, agent)`; drive turns with `query({ options: { sessionStore, resume } })` and the SDK materializes the stored session for the CLI subprocess. **This supersedes happy's SessionStart-hook + jsonl-watcher** — that was a workaround predating the feature (happy pins SDK `^0.2.96`; the `SessionStore` API is `0.3.x`). `importSessionToStore()` migrates an existing local session; `InMemorySessionStore` is the reference shape.
--- a/docs/DEFERRED-WORK.md
+++ b/docs/DEFERRED-WORK.md
@@ -1,5 +1,7 @@
 # Deferred work — post stale cleanup (2026-05-26)
 > **⚠️ SUPERSEDED (2026-06-03): most items in this doc have since shipped.** Task cancel → abort ACP/PTY child (v2.7.14), unified `packages/types` (v2.7.13 `@boocode/contracts`), retire `apps/coder/web/` fallback SPA (v2.7.14), `console.debug`→pino in the xml-parser (v2.7.14), and the large-file splits (v2.7.12) are all done; the ACP cold-probe skip shipped earlier (v2.3). Treat this doc as historical — see `CHANGELOG.md` (v2.7.12–v2.7.17) for what actually shipped. Kept for the design rationale in the detail sections below.
 This document describes work intentionally **not** shipped in the 2026-05-26 stale/simplify batch. Each item needs a product or architecture decision before implementation. See also [`STALE-DEPRECATED.md`](./STALE-DEPRECATED.md) for what was resolved in that batch.
 Last updated: 2026-05-29
--- a/docs/adr/0001-arena-two-lane-scheduling.md
+++ b/docs/adr/0001-arena-two-lane-scheduling.md
@@ -0,0 +1,19 @@
 # Arena schedules contestants in a local lane (serial) and a cloud lane (parallel)
 A Battle runs the same prompt against 2–6 Contestants. The local llama-swap
 server can only hold one model in memory at a time, so llama-swap-backed
 Contestants are placed in a **local lane** and run strictly one at a time, while
 cloud-backed Contestants (Claude Code, OpenCode-on-cloud) run all in parallel in
 a **cloud lane**; the two lanes run concurrently. We chose this over running
 everything serially (too slow for cloud) or everything in parallel (impossible
 for local, and it would corrupt the speed Benchmark) because the single-model
 constraint is physical and the serial local lane also gives each local model an
 uncontended, fair tokens/sec measurement.
 ## Consequences
 - A Battle's wall-clock is roughly `max(slowest cloud contestant, sum of local
  contestants)`. Deep local lanes (especially all-local Q&A battles) are slow by
  design; the launcher warns when the local lane is deep.
 - The speed Benchmark (tokens/sec) is only meaningful for local-lane Contestants,
  which is acceptable since external CLI agents don't report token usage anyway.
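The two-lane scheduling and the wall-clock estimate described above can be sketched as follows. This is an illustration under stated assumptions, not the `arena-runner` implementation: `runBattle`, `RunFn`, and `estimateWallClockMs` are hypothetical names, and error handling is reduced to letting a rejected run reject the whole battle.

```typescript
type Lane = 'local' | 'cloud';

interface Contestant {
  id: string;
  lane: Lane;
}

// Hypothetical runner: resolves when the contestant has finished.
type RunFn = (c: Contestant) => Promise<void>;

/** Local contestants run one at a time, cloud contestants all at once; the lanes overlap. */
async function runBattle(contestants: Contestant[], run: RunFn): Promise<void> {
  const local = contestants.filter((c) => c.lane === 'local');
  const cloud = contestants.filter((c) => c.lane === 'cloud');

  // Serial lane: only one llama-swap model is resident at a time.
  const localLane = (async () => {
    for (const c of local) await run(c);
  })();
  // Parallel lane: every cloud contestant starts immediately.
  const cloudLane = Promise.all(cloud.map((c) => run(c)));

  await Promise.all([localLane, cloudLane]);
}

/** max(slowest cloud contestant, sum of local contestants), in milliseconds. */
function estimateWallClockMs(entries: Array<{ lane: Lane; durationMs: number }>): number {
  let localSum = 0;
  let slowestCloud = 0;
  for (const e of entries) {
    if (e.lane === 'local') localSum += e.durationMs;
    else slowestCloud = Math.max(slowestCloud, e.durationMs);
  }
  return Math.max(slowestCloud, localSum);
}

// Two local runs of 30 s and 45 s dominate a single 60 s cloud run.
const estimate = estimateWallClockMs([
  { lane: 'local', durationMs: 30_000 },
  { lane: 'local', durationMs: 45_000 },
  { lane: 'cloud', durationMs: 60_000 },
]);
```

The estimate shows why a deep local lane is the usual bottleneck: adding a local contestant adds its full duration, while adding a cloud contestant only matters if it becomes the slowest one.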
--- a/docs/adr/0002-arena-dedicated-tables-not-flow-runner.md
+++ b/docs/adr/0002-arena-dedicated-tables-not-flow-runner.md
@@ -0,0 +1,22 @@
 # Arena gets dedicated battles/contestants tables and replaces the old API-only arena
 The Arena feature reuses the dispatcher, the `onTaskTerminal` advance hook, the
 streaming→WS-frame pipeline, and the pane pattern from the Orchestrator, but
 persists to its **own `battles` + `contestants` tables** rather than the
 Orchestrator's `flow_runs`/`flow_steps`. A Battle is not shaped like a flow — it
 has two scheduling lanes, per-contestant benchmarks, on-disk results folders, a
 two-stage analysis, and cross-examinations — so modelling it as flow steps would
 fight the schema. Each Contestant links to a real `tasks` row via `task_id`,
 inheriting all worktree/streaming/dispatch machinery. This also **replaces the
 earlier v2.0.5 API-only arena** (`POST /api/arena`, `tasks.arena_id`,
 select-winner): that feature had no UI and no users, and the new Arena is a
 strict superset, so the old routes and the `tasks.arena_id` column are removed
 rather than left as a second, competing "arena" concept.
 ## Consequences
 - Analysis and cross-examination run through a small pluggable **Analyzer** seam
  (v1 = default-model two-stage judge). A v2 that drives a Han Orchestrator flow
  as the analyzer slots in behind that seam without a schema change.
 - The `arena` pane kind, `ArenaState`, and `battle_*` WS frames are added
  alongside (not folded into) the Orchestrator's, mirroring its patterns.
--- a/openspec/changes/archived/contracts-ssot/proposal.md
+++ b/openspec/changes/archived/contracts-ssot/proposal.md
--- a/openspec/changes/archived/contracts-ssot/tasks.md
+++ b/openspec/changes/archived/contracts-ssot/tasks.md
--- a/openspec/changes/archived/orchestrator/artifacts/.discovery-notes.md
+++ b/openspec/changes/archived/orchestrator/artifacts/.discovery-notes.md
--- a/openspec/changes/archived/orchestrator/artifacts/design-context.md
+++ b/openspec/changes/archived/orchestrator/artifacts/design-context.md
--- a/openspec/changes/archived/orchestrator/artifacts/implementation-decision-log.md
+++ b/openspec/changes/archived/orchestrator/artifacts/implementation-decision-log.md
--- a/openspec/changes/archived/orchestrator/artifacts/implementation-iteration-history.md
+++ b/openspec/changes/archived/orchestrator/artifacts/implementation-iteration-history.md
--- a/openspec/changes/archived/orchestrator/design.md
+++ b/openspec/changes/archived/orchestrator/design.md
--- a/openspec/changes/archived/orchestrator/proposal.md
+++ b/openspec/changes/archived/orchestrator/proposal.md
--- a/openspec/changes/archived/orchestrator/tasks.md
+++ b/openspec/changes/archived/orchestrator/tasks.md
--- a/packages/contracts/package.json
+++ b/packages/contracts/package.json
@@ -28,6 +28,10 @@
    "./worktree-risk": {
      "types": "./dist/worktree-risk.d.ts",
      "default": "./dist/worktree-risk.js"
    },
    "./arena": {
      "types": "./dist/arena.d.ts",
      "default": "./dist/arena.js"
    }
  },
  "scripts": {
--- a/packages/contracts/src/arena.ts
+++ b/packages/contracts/src/arena.ts
@@ -0,0 +1,55 @@
 /** Arena types — single source of truth for cross-app Arena wire contracts. */
 export type BattleType = 'coding' | 'qa';
 export type BattleStatus = 'pending' | 'running' | 'completed' | 'failed' | 'cancelled';
 export type ContestantStatus = 'queued' | 'running' | 'done' | 'error';
 export type ContestantLane = 'local' | 'cloud';
 // Pane state — carried on the WorkspacePane row, mirrors OrchestratorState.
 export interface ArenaState {
  battle_id: string;
  battle_type: BattleType;
  prompt: string;
 }
 export interface BattleShape {
  id: string;
  project_id: string;
  battle_type: BattleType;
  prompt: string;
  status: BattleStatus;
  winner_contestant_id: string | null;
  results_path: string | null;
  error: string | null;
  created_at: string;
  updated_at: string;
 }
 export interface ContestantShape {
  id: string;
  battle_id: string;
  /** Backend name (coding) or persona name (qa). Unique per (battle, model) pair. */
  identity: string;
  model: string;
  lane: ContestantLane;
  task_id: string | null;
  worktree_id: string | null;
  status: ContestantStatus;
  duration_ms: number | null;
  tokens_per_sec: number | null;
  cost_tokens: number | null;
  result_path: string | null;
  error: string | null;
  created_at: string;
  updated_at: string;
 }
 export interface CrossExaminationShape {
  id: string;
  battle_id: string;
  /** Backend + model performing the cross-examination. */
  identity: string;
  model: string;
  verdict: string | null;
  created_at: string;
 }
--- a/packages/contracts/src/ws-frames.ts
+++ b/packages/contracts/src/ws-frames.ts
@@ -358,6 +358,53 @@ export const FlowRunStepUpdatedFrame = z.object({
  report: z.string().optional(),
 });
 // ---- arena frames ----------------------------------------------------------
 const ContestantManifestEntry = z.object({
  id: Uuid,
  identity: z.string().min(1),
  model: z.string().min(1),
  lane: z.enum(['local', 'cloud']),
 });
 // Published once when a battle starts. Carries the contestant roster so the
 // ArenaPane can build its grid immediately.
 export const BattleStartedFrame = z.object({
  type: z.literal('battle_started'),
  battle_id: Uuid,
  battle_type: z.enum(['coding', 'qa']),
  prompt: z.string(),
  contestants: z.array(ContestantManifestEntry),
 });
 // Published on every contestant status change or streaming update.
 // `delta` carries the latest chunk of streaming output while status='running'.
 // `battle_status` is present only on the final transition that closes the battle.
 export const ContestantUpdatedFrame = z.object({
  type: z.literal('contestant_updated'),
  battle_id: Uuid,
  contestant_id: Uuid,
  status: z.enum(['queued', 'running', 'done', 'error']).optional(),
  duration_ms: z.number().int().nonnegative().optional(),
  tokens_per_sec: z.number().nonnegative().optional(),
  battle_status: z.enum(['pending', 'running', 'completed', 'failed', 'cancelled']).optional(),
  delta: z.string().optional(),
  error: z.string().optional(),
 });
 // Published for battle-level state changes that don't ride on a contestant
 // update: analysis finished, winner set, cross-exam verdict ready. The pane
 // uses this to update its analysis panel and winner badge without a refetch.
 // All fields other than `type` and `battle_id` are optional; publishers
 // include only what changed.
 export const BattleUpdatedFrame = z.object({
  type: z.literal('battle_updated'),
  battle_id: Uuid,
  status: z.enum(['pending', 'running', 'completed', 'failed', 'cancelled']).optional(),
  winner_contestant_id: Uuid.nullable().optional(),
  analysis_ready: z.boolean().optional(),
  cross_exam_id: Uuid.optional(),
 });
 // ---- discriminated union ---------------------------------------------------
 export const WsFrameSchema = z.discriminatedUnion('type', [
@@ -381,6 +428,10 @@ export const WsFrameSchema = z.discriminatedUnion('type', [
  // orchestrator
  FlowRunStartedFrame,
  FlowRunStepUpdatedFrame,
  // arena
  BattleStartedFrame,
  ContestantUpdatedFrame,
  BattleUpdatedFrame,
  // per-user
  ChatStatusFrame,
  SessionUpdatedFrame,
@@ -425,6 +476,9 @@ export const KNOWN_FRAME_TYPES: readonly WsFrame['type'][] = [
  'agent_status_updated',
  'flow_run_started',
  'flow_run_step_updated',
  'battle_started',
  'contestant_updated',
  'battle_updated',
  'chat_status',
  'session_updated',
  'session_renamed',
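A consumer of `contestant_updated` frames might fold them into pane state along these lines. The frame fields mirror `ContestantUpdatedFrame` above; `BattleView`, `ContestantView`, and `applyContestantUpdate` are hypothetical names, and this is a sketch of one possible reducer rather than the ArenaPane code.

```typescript
type ContestantStatus = 'queued' | 'running' | 'done' | 'error';
type BattleStatus = 'pending' | 'running' | 'completed' | 'failed' | 'cancelled';

// Plain-type mirror of ContestantUpdatedFrame (no zod, to keep the sketch standalone).
interface ContestantUpdated {
  type: 'contestant_updated';
  battle_id: string;
  contestant_id: string;
  status?: ContestantStatus;
  duration_ms?: number;
  tokens_per_sec?: number;
  battle_status?: BattleStatus;
  delta?: string;
  error?: string;
}

// Hypothetical view model for one row of the roster.
interface ContestantView {
  status: ContestantStatus;
  output: string;
  durationMs: number | null;
  tokensPerSec: number | null;
  error: string | null;
}

interface BattleView {
  battleId: string;
  status: BattleStatus;
  contestants: Record<string, ContestantView>;
}

/** Returns a new view with the frame applied; frames for other battles or unknown contestants are ignored. */
function applyContestantUpdate(view: BattleView, frame: ContestantUpdated): BattleView {
  if (frame.battle_id !== view.battleId) return view;
  const prev = view.contestants[frame.contestant_id];
  if (!prev) return view;
  const next: ContestantView = {
    status: frame.status ?? prev.status,
    output: prev.output + (frame.delta ?? ''), // deltas are appended in arrival order
    durationMs: frame.duration_ms ?? prev.durationMs,
    tokensPerSec: frame.tokens_per_sec ?? prev.tokensPerSec,
    error: frame.error ?? prev.error,
  };
  return {
    ...view,
    status: frame.battle_status ?? view.status, // only set on the closing transition
    contestants: { ...view.contestants, [frame.contestant_id]: next },
  };
}

const initial: BattleView = {
  battleId: 'b1',
  status: 'running',
  contestants: {
    c1: { status: 'queued', output: '', durationMs: null, tokensPerSec: null, error: null },
  },
};
const afterDelta = applyContestantUpdate(initial, {
  type: 'contestant_updated', battle_id: 'b1', contestant_id: 'c1', status: 'running', delta: 'Hel',
});
const finished = applyContestantUpdate(
  applyContestantUpdate(afterDelta, {
    type: 'contestant_updated', battle_id: 'b1', contestant_id: 'c1', delta: 'lo',
  }),
  {
    type: 'contestant_updated', battle_id: 'b1', contestant_id: 'c1',
    status: 'done', duration_ms: 1200, battle_status: 'completed',
  },
);
```

Because every field except the ids is optional, the reducer treats each frame as a partial patch and keeps the previous value for anything the publisher omitted.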
Author	SHA1	Message	Date
indifferentketchup	d6d246c15b	feat(web,coder): arena pane — compare 2-6 AI competitors on same prompt Arena is a new pane kind for competitive AI evaluation. A Battle runs the same prompt against 2-6 Contestants across two concurrent lanes: local lane (llama-swap models, serial) and cloud lane (parallel). Added to all three registries: @boocode/contracts WsFrameSchema, server InferenceFrame, and web WsFrame. Backend (apps/coder): - arena-runner: battle scheduler, lane classifier, benchmark, results writer, resume, user winner override - arena-analyzer: two-stage digest→judge analysis on DEFAULT_MODEL - arena-decisions: status transitions and resume logic (unit-tested) - arena-analyzer-helpers: pure helper functions (unit-tested) - arena-model-call: model call utility for analysis - arena routes: create/get/list/stop/analyze/cross-examine/winner/diff - schema: battles, contestants, cross_examinations tables (idempotent) - remove old /api/arena* routes and tasks.arena_id column Frontend (apps/web): - ArenaLauncherDialog: battle type, prompt, contestant selection - ArenaPane: live roster, streaming output, analysis, cross-exam - DiffView: unified diff with line-by-line color for coding contests - Winner override per-row dropdown (Trophy icon) - battle_updated WS handler for live winner/analysis updates - arena pane kind in Workspace, ChatTabBar, useSidebar Cross-app: - ArenaState and ArenaContestantShape/WsFrame types (contracts) - battle_* frames in WsFrameSchema, InferenceFrame, and web WsFrame - manifest.json written per battle results folder - /Arena added to .gitignore Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-06 23:25:29 +00:00
indifferentketchup	e04d0fdaa8	feat(coder): unified Plan/Ask/Bypass permission picker Replace the raw per-agent mode dropdown in the BooCoder composer with a curated three-option permission ladder mapped generically onto each provider's native modes: `plan` id -> Plan, default -> Ask, isUnattended -> Bypass (claude bypassPermissions, qwen yolo, opencode full-access). modeId stays the single wire field; the active unified mode is derived from it (no contracts change). Native BooCode gains its own mode set: Ask stages to the pending-changes queue (today's behavior), Bypass auto-applies the queue to disk after the turn (interactive messages path + task dispatcher path), Plan falls back to Ask. The shared apps/server inference engine is left untouched. Also preserve isUnattended on live-probed ACP modes so opencode's bypass mode stays detectable from the wire. Coder 373 tests green; coder + web typecheck clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-05 15:14:21 +00:00
indifferentketchup	da36344d0b	style(web): outline the slash-picker chevron buttons Give the expand chevrons the BooCoder outline-button look (border-border bg-background, hover:bg-muted, filled when expanded) instead of the borderless ghost style. Applies to both BooChat's flat menu and BooCoder's grouped menu. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-03 17:00:49 +00:00
indifferentketchup	875cae0843	fix(coder): parse YAML block-scalar descriptions in slash command discovery Most plugin/han SKILL.md and command files write `description:` as a folded block scalar (`>` / `\|`) with the text on the following indented lines. The old single-line frontmatter reader captured the literal `>`, so the slash menu showed garbage/blank descriptions for nearly all of them. frontmatterField now collapses folded blocks (join with spaces) and preserves literal blocks. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-03 17:00:49 +00:00
indifferentketchup	4caa5f91ff	docs: CLAUDE.md notes for Orchestrator + gitignored .env.host Document the in-app Orchestrator engine and its load-bearing read-only invariant in apps/coder/CLAUDE.md, and note that apps/coder/.env.host is now gitignored (recreated from .env.example with CLAUDE_SDK_BACKEND=1). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-03 16:48:50 +00:00
indifferentketchup	1d416d0cf9	docs: refresh README + CURRENT.md for v2.7.17 (Orchestrator) Bring README current (was v2.2.1): add the Orchestrator, the Files/Git diff panel, persistent agent sessions + claude Agent-SDK, fix the provider list (5 — cursor/copilot retired), drop the broken AGENTS.md link, update latest release + planned. Refresh CURRENT.md to v2.7.17 on main. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-03 16:43:53 +00:00
indifferentketchup	bfda61e27e	chore: stop tracking apps/coder/.env.host Untrack the host env file (git rm --cached, kept on disk for the boocoder service) and widen .gitignore to .env.* (re-including .env.example) so env files no longer get committed. The file's prior contents (dev DB password + internal Tailscale URLs; no API keys) remain in history — left as-is given the single-user Tailscale-only threat model. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-03 16:32:03 +00:00
indifferentketchup	a734615480	docs: archive shipped openspec changes, refresh roadmap + DEFERRED-WORK Move openspec/changes/{contracts-ssot,orchestrator} → archived/ (both shipped, v2.7.13 and v2.7.17). Mark the roadmap's "Write/edit robustness" and "Claude provider SDK" milestones as shipped (fuzzy-match.ts + checkpoints.ts; the claude-sdk backend is live via CLAUDE_SDK_BACKEND in .env.host) and add a v2.7.12–v2.7.17 shipped summary. Flag DEFERRED-WORK.md as superseded. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-03 16:30:01 +00:00