Compare commits
10 Commits
v2.7.17-or ... v2.7.21-ed
| Author | SHA1 | Date | |
|---|---|---|---|
| | cce685b1a7 | | |
| | dbf1662982 | | |
| | d6d246c15b | | |
| | e04d0fdaa8 | | |
| | da36344d0b | | |
| | 875cae0843 | | |
| | 4caa5f91ff | | |
| | 1d416d0cf9 | | |
| | bfda61e27e | | |
| | a734615480 | | |
3
.gitignore
vendored
@@ -1,6 +1,8 @@
 node_modules
 dist
 .env
+.env.*
+!.env.example

 # Claude / Cursor (local agent & IDE config — CLAUDE.md and AGENTS.md stay tracked)
 .claude/
@@ -18,3 +20,4 @@ data/*
 !data/mcp.example.json
 !data/coder-providers.example.json
 codecontext/fork.tar.gz
+/Arena
@@ -2,6 +2,14 @@
All notable changes per release tag. Most recent on top, ordered by tag creation date (which matches the git history). Tag names follow `vMAJOR.MINOR.PATCH-slug` — the slug describes what shipped, so the tag name alone is enough to recall the batch.
## v2.7.20-arena-pane — 2026-06-06
Adds the **Arena** pane for running the same prompt against 2–6 AI competitors simultaneously and picking the best result. A Battle is one Arena run: pick a battle type (Coding — backend+model with git worktrees producing diffs; or Q&A — BooChat persona+model producing text), write or generate a prompt, add contestants, and hit Start. Contestants are scheduled in two concurrent lanes — the local lane (llama-swap models, serial) and the cloud lane (Claude Code, OpenCode-on-cloud, parallel). The lane scheduler captures wall-clock duration for every contestant and tokens/sec for local models. When all contestants finish, a two-stage analysis (digest then judge) auto-runs on the DEFAULT_MODEL, writing `analysis.md` naming a winner; the user can override the winner per-row or trigger cross-examination. Results land in `/<project-root>/Arena/<dated-battle>/` with per-contestant `result.md`, diff patches for coding, and `manifest.json`. Replaces the old API-only `POST /api/arena` with dedicated `battles`/`contestants`/`cross_examinations` tables and full UI. Also adds a `DiffView` component with line-by-line colored unified diff and a per-row dropdown for winner override. Built on `v2.7.18-permission-modes`; pairs conceptually with the earlier `v2.7.17-orchestrator` multi-agent work (both share the pane kind pattern and `onTaskTerminal` hook).
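The two-lane scheduling described above can be sketched as follows. This is a minimal illustration, not the shipped lane scheduler: the `Contestant`, `Benchmark`, and `RunFn` shapes and the `runBattle`/`timed` names are hypothetical, and the runner is assumed to report a token count.

```typescript
// Hypothetical shapes for illustration; the real scheduler's types are not shown in this changelog.
type Lane = 'local' | 'cloud';

interface Contestant {
  identity: string; // backend or persona, e.g. 'claude' or 'boocode'
  model: string;
  lane: Lane;
}

interface Benchmark {
  identity: string;
  model: string;
  durationMs: number;
  tokensPerSec?: number; // only filled in for local-lane contestants
}

// Assumed runner signature: resolves with the number of tokens generated.
type RunFn = (c: Contestant) => Promise<{ tokens: number }>;

// Runs one contestant and records wall-clock duration (plus throughput for local models).
async function timed(c: Contestant, run: RunFn): Promise<Benchmark> {
  const started = Date.now();
  const { tokens } = await run(c);
  const durationMs = Date.now() - started;
  const bench: Benchmark = { identity: c.identity, model: c.model, durationMs };
  if (c.lane === 'local' && durationMs > 0) {
    bench.tokensPerSec = tokens / (durationMs / 1000);
  }
  return bench;
}

async function runBattle(contestants: Contestant[], run: RunFn): Promise<Benchmark[]> {
  const local = contestants.filter((c) => c.lane === 'local');
  const cloud = contestants.filter((c) => c.lane === 'cloud');

  // Local lane: strictly one contestant at a time.
  const localLane = (async (): Promise<Benchmark[]> => {
    const out: Benchmark[] = [];
    for (const c of local) out.push(await timed(c, run));
    return out;
  })();

  // Cloud lane: every contestant starts immediately.
  const cloudLane = Promise.all(cloud.map((c) => timed(c, run)));

  // Both lanes are already in flight here, so they overlap with each other.
  const [localResults, cloudResults] = await Promise.all([localLane, cloudLane]);
  return [...localResults, ...cloudResults];
}
```

In this sketch a single rejected contestant rejects the whole battle; a real runner would more likely record a per-contestant failure and keep going.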
## v2.7.18-permission-modes — 2026-06-05
Adds a unified **permission picker** to the BooCoder composer — Plan / Ask Permission / Bypass — replacing the old raw per-agent mode dropdown that exposed each agent's full native vocabulary with inconsistent labels. The three options map generically onto every provider's existing mode metadata: the `plan`-id mode → Plan, the default mode → Ask, the `isUnattended` mode → Bypass (claude `bypassPermissions`, qwen `yolo`, opencode `full-access`); goose has no modes so it shows no picker, exactly as before. `modeId` stays the single wire field — the active unified mode is derived from it, so no contracts change was needed. Native BooCode gains its own mode set (registered in the manifest and exposed by the snapshot): **Ask** stages edits to the pending-changes queue as today, **Bypass** auto-applies the queue to disk after the turn (both the interactive messages path and the task-based dispatcher path), and **Plan** falls back to Ask — the shared `apps/server` inference engine is deliberately left untouched. A supporting fix preserves the `isUnattended` flag on live-probed ACP modes (`acp-derive.ts`) so opencode's bypass mode is still detectable from the wire. Coder 373 tests green, coder + web typecheck clean. Built on `v2.7.17-orchestrator`.
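A minimal sketch of the mapping described above, deriving the unified mode from `modeId` and back. The `ProviderMode` shape is an assumption: only the `plan` id and the `isUnattended` flag come from the text, while `isDefault` and all function names are hypothetical.

```typescript
type UnifiedMode = 'plan' | 'ask' | 'bypass';

// Assumed metadata shape; `isDefault` is a hypothetical stand-in for "the default mode".
interface ProviderMode {
  id: string;
  isDefault?: boolean;
  isUnattended?: boolean;
}

// Maps one native mode onto the unified vocabulary, or null if it has no equivalent.
function toUnified(mode: ProviderMode): UnifiedMode | null {
  if (mode.id === 'plan') return 'plan';
  if (mode.isUnattended) return 'bypass';
  if (mode.isDefault) return 'ask';
  return null; // assumption: other native modes are simply not offered in the picker
}

// Derives the active unified mode from the single wire field, `modeId`.
function deriveUnifiedMode(modes: ProviderMode[], modeId: string | undefined): UnifiedMode | null {
  if (modes.length === 0) return null; // a provider without modes shows no picker
  const active = modes.find((m) => m.id === modeId) ?? modes.find((m) => m.isDefault);
  return active ? toUnified(active) : null;
}

// Reverse direction: which native modeId to send for a picked unified mode.
function toModeId(modes: ProviderMode[], unified: UnifiedMode): string | undefined {
  return modes.find((m) => toUnified(m) === unified)?.id;
}
```

Because the unified mode is computed from `modeId` in both directions, nothing new has to travel over the wire, which matches the "no contracts change" point above.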
## v2.7.17-orchestrator — 2026-06-03
Brings the deterministic multi-agent "conductor" into the app as the **Orchestrator**: launch any read-only Han flow (research, code-review, investigate, architectural-analysis, security-review, …) from BooChat or BooCoder and watch each specialist agent stream live in a Paseo-style run pane, ending with an evidence-disciplined, adversarially-validated report — all on free local Qwen, persisted and resumable. Built and audited end-to-end via `paseo-epic` in an isolated worktree, on top of the prior `/opt/boocode/conductor` standalone CLI: the conductor's 22 flow definitions, Spine factory, and Han evidence/YAGNI contracts were re-homed into `apps/coder/src/conductor`, and a new DB-backed flow-runner (`flow_runs`/`flow_steps`) dispatches each step as a real BooCoder task through the existing dispatcher — reusing its streaming→WS-frame pipeline and worktree-as-read-snapshot, with an `onTaskTerminal` hook that advances the wave and a startup resume that re-dispatches in-flight steps after a coder restart. Read-only is enforced hard: every step is dispatched `qwen --approval-mode plan`, an adversarial-security review caught and closed a bypass where a qwen-unavailable task silently fell through to write-capable native inference (now fails closed), and the ACP path's mode-set was made fail-closed too. The UI adds a fourth `orchestrator` pane kind (collapsed agent roster, expand-one live stream, report on top), a Workflow button + slash flows on the shared `ChatInput` for full BooChat/BooCoder parity, a "New Orchestrator" entry in the + and split menus, a category-grouped launcher dialog, runs history, and export (copy / save-to-file / send-to-chat) — fed by two new `flow_run_*` WS frames on a coder user channel. Qwen-only by design (Claude Code remains the Claude path); the existing model-competition Arena stays a separate feature. 
The flow launcher and the `/` slash menu both carry chevron-expandable per-item explanations (an always-on one-liner expands to a 1–2 sentence what-it-does / when-to-use blurb, condensed from each Han skill's own description), with a "read-only" pill pinned in the launcher and the fast/concise toggle wired through to the workers. Spec/plan in `openspec/changes/orchestrator`; coder 373 tests green (42 new scheduler/resume/read-only decision tests), contracts/coder/server builds + web tsc clean. Built on `v2.7.16-container-git-safedir`; pairs conceptually with the earlier `v2.7.12-audit-cleanup` multi-agent orchestration.
@@ -74,11 +74,11 @@ Schema CHECK migration order when renaming allowed values: (1) `ALTER TABLE ...
Required: `DATABASE_URL`, `LLAMA_SWAP_URL`. Optional: `PORT` (3000), `HOST` (0.0.0.0), `PROJECT_ROOT_WHITELIST` (/opt, read-only add-existing scope), `BOOTSTRAP_ROOT` (/opt/projects, writable bootstrap mkdir target — host must `mkdir -p` it before container start), `DEFAULT_MODEL`, `LOG_LEVEL`, `SEARXNG_URL` (default `http://100.114.205.53:8888` — internal Tailscale; the public host is behind Authelia, unusable from server context), `BOOCODE_TOOLS` (`core`|`standard`|`all`, default `all`; a ceiling, never expands an agent's whitelist), `MCP_CONFIG_PATH` (default `/data/mcp.json`, opencode `mcpServers` shape; missing = no MCP), `CONTEXT7_API_KEY` (the Context7 MCP key, referenced from `data/mcp.json` as `"{env:CONTEXT7_API_KEY}"`). `data/mcp.json` is **gitignored** but no longer holds secrets — string values support opencode-style `{env:VAR}` substitution (`mcp-config.ts:substituteEnvVars`, applied before Zod validation; unset var → `''` + warn), so real keys live in `.env`; template `data/mcp.example.json`. A config-only edit there needs only `docker compose restart boocode` (data/ is bind-mounted); changing a referenced secret edits `.env`. MCP loads at server startup with per-server graceful degradation; the coder does NOT load MCP (BooChat only).
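The `{env:VAR}` substitution can be illustrated with a small recursive walk over the parsed config. This is a sketch under assumptions, not the code in `mcp-config.ts`: the function name, the `Json` type, the variable-name pattern, and the injected `env`/`warn` parameters are all hypothetical.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Replaces every `{env:NAME}` token inside string values; unset variables
// become '' and are reported through `warn`. Non-string values pass through.
function substituteEnvVarsSketch(
  value: Json,
  env: Record<string, string | undefined>,
  warn: (msg: string) => void,
): Json {
  if (typeof value === 'string') {
    return value.replace(/\{env:([A-Za-z_][A-Za-z0-9_]*)\}/g, (_match: string, name: string) => {
      const resolved = env[name];
      if (resolved === undefined) {
        warn(`env var ${name} is not set; substituting an empty string`);
        return '';
      }
      return resolved;
    });
  }
  if (Array.isArray(value)) {
    return value.map((item) => substituteEnvVarsSketch(item, env, warn));
  }
  if (value !== null && typeof value === 'object') {
    const out: { [key: string]: Json } = {};
    for (const [key, item] of Object.entries(value)) {
      out[key] = substituteEnvVarsSketch(item, env, warn);
    }
    return out;
  }
  return value;
}
```

Running the substitution before schema validation, as the text describes, means the validator only ever sees resolved strings.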
-BooCoder at port 9502: `curl http://100.114.205.53:9502/api/health`. Runs as `boocoder.service` on the host (not Docker). Deploy: `pnpm -C packages/contracts build && pnpm -C apps/server build && pnpm -C apps/coder build && sudo systemctl restart boocoder`. Health reports tool count: `{"ok":true,"db":true,"tools":33}`.
+BooCoder at port 9502: `curl http://100.114.205.53:9502/api/health`. Runs as `boocoder.service` on the host (not Docker). Its env file `apps/coder/.env.host` is gitignored (`.env.*`, with `!.env.example`) — a fresh host recreates it from `.env.example` (incl. `CLAUDE_SDK_BACKEND=1` for the Claude Agent-SDK backend). Deploy: `pnpm -C packages/contracts build && pnpm -C apps/server build && pnpm -C apps/coder build && sudo systemctl restart boocoder`. Health reports tool count: `{"ok":true,"db":true,"tools":33}`.

 - `FAST_MODEL` (optional) — cheaper model for titles, summaries, labeling (auto_name.ts, tool-summaries.ts). Falls back to session model or DEFAULT_MODEL. Set to a small llama-swap model (e.g. `nemotron-nano-4b`) to avoid loading the 35B for 20-token calls.
 - Qwen Code dispatch: `OPENAI_BASE_URL=http://100.101.41.16:8401/v1 OPENAI_API_KEY=dummy qwen -p "<task>" --output-format stream-json`. Install: `npm install -g @qwen-code/qwen-code@latest`. Node ≥22 on host (container stays Node 20; BooCoder dispatches via direct spawn on host). No `--yolo` flag — `-p` runs autonomously without prompts. ACP bridge is an HTTP daemon (not stdio); use PTY dispatch.
-- Arena: `POST /api/arena {project_id, input, contestants: [{agent?, model?}]}` dispatches the same task to N models/agents in parallel; each contestant gets its own task + worktree. `GET /api/arena/:id` for results; `POST /api/arena/:id/select/:task_id` picks a winner.
+- Arena: `POST /api/battles {project_id, battle_type, prompt, contestants}` starts a battle; `GET /api/battles/:id` returns battle + contestants + cross-examinations; `POST /api/battles/:id/stop` cancels; `POST /api/battles/:id/analyze` triggers/re-triggers two-stage digest→judge analysis; `GET /api/battles/:id/analysis` reads `analysis.md`; `POST /api/battles/:id/cross-examine {identity, model}` runs a cross-examination. All `/api/battles*` routes are served by `apps/coder` at port 9502 (proxied through `apps/server` as `/api/coder/battles*`).

 ## Workflow
67
CONTEXT.md
Normal file
@@ -0,0 +1,67 @@
+# Context: BooCode
+
+Glossary of the domain language. Terms only — no implementation detail.
+
+## Workspace
+
+- **Pane** — one tile in the multi-pane workspace. Each pane has a *kind*:
+  Chat (BooChat), Coder (BooCoder), Terminal (BooTerm), Orchestrator, Arena,
+  plus artifact/settings kinds.
+
+- **Backend** — an AI engine a task is dispatched to: *native* (BooChat
+  inference on a local llama-swap model) or an *external* CLI agent (Claude Code,
+  OpenCode, Qwen, Goose). Code sometimes calls this the "agent" (`tasks.agent`).
+
+- **BooChat Agent** (a.k.a. *persona*) — a preset from the `data/AGENTS.md`
+  registry (e.g. "Code Reviewer", "Debugger"): a system prompt + tool whitelist +
+  sampling knobs that runs **on the native backend** with a chosen model.
+  Distinct from a Backend — this is the overloaded sense of "agent" the UI's
+  Agent picker selects.
+
+## Arena
+
+A way to run the **same prompt** against several AI competitors at once and pick
+the best result.
+
+- **Battle** — one Arena run. Dated. Produces a results folder at
+  `/<project-root>/Arena/<dated-battle>/`. (The earlier API-only feature called
+  this an "arena"; a Battle is one such run.)
+
+- **Battle Type** — what is being compared:
+  - *Coding* — Contestants change code; a result is the **diff** they produced
+    (plus their explanation). Each Contestant works in its own worktree.
+  - *Q&A* — Contestants answer a prompt; a result is the **text answer**. No
+    code changes.
+
+- **Contestant** — one competitor in a Battle, given the Battle's prompt. What
+  defines a Contestant depends on Battle Type:
+  - *Coding* — a **Backend + Model** (e.g. Claude Code + opus, native BooCode +
+    35b). Each works in its own isolated git **worktree** (a branched on-disk
+    copy of the project). Contestants do not see each other's work.
+  - *Q&A* — a **BooChat Agent (persona) + Model** (e.g. Debugger + 35b), running
+    on the native backend only. No worktree (no code changes).
+  The same model can appear under two Contestants, so a Contestant's identity is
+  the (backend-or-persona, model) pair, not the model alone.
+
+- **Benchmark** — per-Contestant performance captured during a Battle. Wall-clock
+  **duration** is recorded for every Contestant; **throughput** (tokens/sec) is
+  recorded only for local (llama-swap) models, which are the ones the speed
+  comparison is meaningful for.
+
+- **Arena results folder** (`/<project-root>/Arena/<dated-battle>/`) — where a
+  Battle's *results* are written (not the working copies — those stay in each
+  Contestant's worktree). Holds the per-Contestant result and the final
+  analysis.
+
+- **Lane** — how a Battle's Contestants are scheduled. The *local lane* holds
+  every llama-swap-backed Contestant and runs them strictly one at a time (the
+  local server can only load one model at a time, which also keeps their speed
+  Benchmark fair). The *cloud lane* holds cloud-backed Contestants (Claude Code,
+  OpenCode-on-cloud) and runs them all in parallel. The two lanes run
+  concurrently with each other.
+
+- **Analysis** — an end-of-Battle judgement of the Contestants' results,
+  produced by the default BooChat model, naming a **Winner**.
+
+- **Cross-examination** — an after-the-Battle step where a chosen model (from any
+  agent) is pointed at the Battle's results to interrogate / compare them.
@@ -1,9 +1,9 @@
 # Current focus

-Last updated: 2026-06-02
+Last updated: 2026-06-05

-- **Last shipped:** `v2.7.8-ember-coder-tabs-model-chips` (2026-06-01)
-- **Branch:** `codebase-audit-cleanup` (audit + cleanup epic, off main HEAD)
-- **In progress:** Phase 3 — stale comments + docs refresh
+- **Last shipped:** `v2.7.18-permission-modes` (2026-06-05) — unified Plan/Ask/Bypass permission picker in the BooCoder composer (incl. native-BooCode auto-apply on Bypass).
+- **Branch:** `main`
+- **In progress:** nothing committed — dogfooding the Orchestrator to surface the next real backlog. Claude Agent-SDK backend enabled (`CLAUDE_SDK_BACKEND`). Optional/exploratory: verify-gate ensembler over pending changes.

 See `CHANGELOG.md` for the full shipped history. That file is always authoritative; this file is a quick orientation pointer only.
15
README.md
@@ -1,10 +1,10 @@
 # boocode

-Self-hosted single-user developer chat app. 3-app monorepo: BooChat (read-only chat), BooCoder (write tools + agent dispatch), BooTerm (PTY terminals).
+Self-hosted single-user developer chat app. 3-app monorepo: BooChat (read-only chat), BooCoder (write tools + agent dispatch), BooTerm (PTY terminals) — plus the in-app **Orchestrator**, a deterministic multi-agent conductor that runs read-only Han analysis/review flows on local Qwen.

-**Latest release:** `v2.2.1-pane-scoped-chats` (2026-05-26) · [`CHANGELOG.md`](CHANGELOG.md) · **Current focus:** [`CURRENT.md`](CURRENT.md)
+**Latest release:** `v2.7.17-orchestrator` (2026-06-03) · [`CHANGELOG.md`](CHANGELOG.md) · **Current focus:** [`CURRENT.md`](CURRENT.md)

-**Agent navigation:** [`AGENTS.md`](AGENTS.md) · **Architecture:** [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md) · **Engineering reference:** [`CLAUDE.md`](CLAUDE.md)
+**Architecture:** [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md) · **Engineering reference:** [`CLAUDE.md`](CLAUDE.md) · **Roadmap:** [`boocode_roadmap.md`](boocode_roadmap.md)

 ## Stack

@@ -75,15 +75,16 @@ curl http://100.114.205.53:9502/api/health

 ## What's shipped

-See [`boocode_roadmap.md`](boocode_roadmap.md) for full version history. Highlights as of **v2.2.1**:
+See [`boocode_roadmap.md`](boocode_roadmap.md) and [`CHANGELOG.md`](CHANGELOG.md) for full version history. Highlights as of **v2.7.17**:

-- **BooChat**: streaming chat, file-read tools, compaction, reasoning support, HTML/Markdown artifact panes, cross-repo read grants, MCP client (multi-server + stdio), tool-cost tracking, skills system, builtin agent registry, multi-pane workspace (chat / terminal / coder)
+- **BooChat**: streaming chat, file-read tools, compaction, reasoning support, HTML/Markdown artifact panes, cross-repo read grants, MCP client (multi-server + stdio), tool-cost tracking, skills system, builtin agent registry, multi-pane workspace (chat / terminal / coder / orchestrator)
 - **BooTerm**: in-browser terminal panes via tmux + xterm.js, per-session tmux sessions, SSH-out support
-- **BooCoder (v2.2)**: write tools (`edit_file`, `create_file`, `delete_file`, `apply_pending`, `rewind`), pending-changes queue with diff UI, Paseo-style provider snapshot (7 providers: boocode, cursor, claude, opencode, goose, qwen, copilot), `AgentComposerBar` (provider / mode / model / thinking), ACP dispatch with inline permission prompts + tool/reasoning streaming, PTY fallback, Arena, MCP server (6 tools, stdio), CLI client, human inbox, Boomerang orchestration, path-guard fuzz suite, **pane-scoped chats** (v2.2.1 — each coder/terminal pane owns its chat)
+- **BooCoder**: write tools (`edit_file` with fuzzy matching, `create_file`, `delete_file`, `apply_pending`, `rewind`, git-ref checkpoints), pending-changes queue + a **Files/Git diff panel** (stage / commit / discard), provider snapshot (5 providers: boocode, claude, opencode, goose, qwen — cursor/copilot retired), `AgentComposerBar`, warm ACP + **persistent agent sessions** (opencode HTTP server; claude via the Agent SDK with native session resume) + PTY fallback, config-backed provider lifecycle, Arena (same task → N models), MCP server, CLI client, human inbox, Boomerang orchestration, pane-scoped chats
+- **Orchestrator** (v2.7.17): launch any of 22 read-only Han flows (research, code-review, investigate, architectural-analysis, …) from BooChat or BooCoder via the Workflow button, a slash command, or **+ menu → New Orchestrator**; each step runs as a bounded agent on local Qwen (hard read-only via `qwen --approval-mode plan`), streaming live in a Paseo-style run pane with an evidence-disciplined, adversarially-validated report. Persisted + resumable. `@boocode/contracts` single-sources the cross-app wire contracts (v2.7.13).

 ## Planned

-- **v2.3 provider lifecycle** — config-backed provider registry (`/data/coder-providers.json`), enable/disable toggles, two-tier probe (openspec drafted). See [`CURRENT.md`](CURRENT.md).
+Most prior roadmap milestones have shipped (see [`boocode_roadmap.md`](boocode_roadmap.md)). What remains is optional/exploratory — e.g. a verify-gate ensembler over pending changes (majority-vote diff ranking). No committed milestones currently in flight.

 ## License
@@ -1,17 +0,0 @@
-NODE_ENV=production
-PORT=9502
-HOST=100.114.205.53
-DATABASE_URL=postgres://boocode:devpass@127.0.0.1:5500/boochat
-LLAMA_SWAP_URL=http://100.101.41.16:8401
-PROJECT_ROOT_WHITELIST=/opt
-BOOTSTRAP_ROOT=/opt/projects
-DEFAULT_MODEL=qwen3.6-35b-a3b-mxfp4
-LOG_LEVEL=info
-SEARXNG_URL=http://100.114.205.53:8888
-GITEA_BASE_URL=https://git.indifferentketchup.com
-GITEA_USER=indifferentketchup
-GITEA_SSH_HOST=100.114.205.53:2222
-MCP_CONFIG_PATH=/data/mcp.json
-SKILLS_ROOT=/opt/boocode/data/skills
-CODER_PROVIDERS_PATH=/opt/boocode/data/coder-providers.json
-CLAUDE_SDK_BACKEND=1
@@ -32,3 +32,8 @@
- **Claude SDK backend tool RESULTS arrive as `type:'user'` SDK messages** (tool_result content blocks): `mapSdkMessage` (`claude-sdk-map.ts`) MUST map the `user` case → a terminal `tool_update` (completed/failed + output), else the tool_call persists `status:'running'` and the UI spinner never stops. The dispatcher's `tool_update` path then publishes + persists it.
- **ACP command discovery is async**: `acp-probe.ts` must poll after `newSession` for `available_commands_update` (commands arrive in a later notification; reading synchronously captures 0). PTY providers (claude) discover from disk via `claude-command-discovery.ts` (`~/.claude/commands` + `enabledPlugins`, bare names, deduped). `AgentCommand.kind` tags `'command'` vs `'skill'`; `CoderPane`'s `slashGroups` splits them into icon'd groups. `SlashCommandPicker`'s `groups?` prop is opt-in.
- **A new per-message coder field silently drops unless you update every mapper**: the HTTP read SELECT + `mapCoderMessageRow` (`apps/coder/src/routes/messages.ts`), **the WS `snapshot` SELECT (`apps/coder/src/routes/ws.ts`)** — it has its OWN column list and the client's `snapshot` handler `setMessages`-overwrites the HTTP load, so a field present in the HTTP route but absent here shows live yet vanishes on refresh — `CoderPane.tsx` (`RawCoderMessage`/`CoderMessage`/`mapCoderTimelineRow` + the live `message_complete` WS reducer), `CoderMessageWire` (`CoderMessageList.tsx`), and `api/types.ts`. The client `mapCoderTimelineRow` whitelists fields — easiest to forget. This bit `model` twice: the client chain (`v2.7.9`) and then the WS snapshot SELECT (`v2.7.11`) — the chip showed live but vanished on coder refresh until both were fixed.
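The failure mode in the last bullet, a whitelisting mapper silently dropping a new column, can be reproduced in a few lines. Every name below is hypothetical and only stands in for the real mappers listed above; `missingFields` is one possible guard, not something the codebase is known to contain.

```typescript
// Hypothetical row shape: `model` is the newly added per-message field.
interface RawRow {
  id: string;
  content: string;
  model?: string | null;
}

// A mapper written before `model` existed. It whitelists fields, so the new
// column is dropped without any type error or runtime warning.
function mapRowBefore(raw: RawRow): { id: string; content: string } {
  return { id: raw.id, content: raw.content };
}

// The same mapper after the fix: the new field is carried through explicitly.
function mapRowAfter(raw: RawRow): { id: string; content: string; model: string | null } {
  return { id: raw.id, content: raw.content, model: raw.model ?? null };
}

// One way to catch the omission in a test: list the fields every mapper must
// emit and report which ones a mapped object lacks.
function missingFields(mapped: object, required: readonly string[]): string[] {
  return required.filter((field) => !(field in mapped));
}
```

A check like `missingFields` run against each mapper's output in a unit test would flag a forgotten mapper at build time instead of on the next refresh.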
+
+## Orchestrator (v2.7.17)
+
+- **In-app multi-agent conductor**: `services/flow-runner.ts` runs a flow by inserting each step as a `tasks` row (the existing dispatcher runs it) and advancing on a new `onTaskTerminal` dispatcher-deps hook; persisted in `flow_runs`/`flow_steps` (resumed at startup via `initResume`). The 22 conductor flow defs + Spine factory are re-homed under `src/conductor/`. Pure scheduler/resume helpers in `flow-runner-decisions.ts`. Full design: `openspec/changes/archived/orchestrator/`.
+- **Read-only is load-bearing — don't add a dispatch path that bypasses it.** Every step dispatches `agent='qwen', mode_id='plan'`; `dispatcher.ts` force-routes qwen+plan to the PTY `--approval-mode plan` gate and HARD-FAILS the task (never falls to write-capable native inference) when qwen is unavailable (`shouldFailOnMissingAgent`). `BOOCODE_TOOLS` gates BooChat's NATIVE inference tools only — it does NOT govern an external CLI agent (qwen/opencode bring their own write tools); read-only for a dispatched agent is the agent-layer mode (PTY `--approval-mode plan`; ACP `setSessionMode` is fail-OPEN by default, fail-CLOSED for `plan` via `READ_ONLY_MODE_IDS` in `acp-dispatch.ts`).
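The fail-closed rule for read-only steps can be sketched as a pure decision function. This is an illustration only: `decideReadOnlyStep` and the `Decision` type are hypothetical and do not reproduce the signature of `shouldFailOnMissingAgent`; only the `qwen` agent name and the `--approval-mode plan` arguments come from the text.

```typescript
// The decision type deliberately has no "native fallback" variant, so a
// read-only step can never be routed to write-capable native inference.
type Decision =
  | { action: 'dispatch-pty'; agent: 'qwen'; args: string[] }
  | { action: 'fail'; reason: string };

function decideReadOnlyStep(availableAgents: ReadonlySet<string>): Decision {
  if (!availableAgents.has('qwen')) {
    // Fail closed: refuse to run rather than degrade to another engine.
    return { action: 'fail', reason: 'qwen is unavailable; refusing to run a read-only step elsewhere' };
  }
  return { action: 'dispatch-pty', agent: 'qwen', args: ['--approval-mode', 'plan'] };
}
```

Encoding the invariant in the return type means a future fallback path has to change the type first, which makes the bypass visible in review.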
@@ -13,7 +13,7 @@ import type { WsFrame } from '@boocode/contracts/ws-frames';
 // v2.0.0 Phase 2C: write tools + adapter for BooChat ToolDef compatibility.
 import { WRITE_TOOLS } from './services/tools/index.js';
 import { adaptWriteTool } from './services/tools/adapter.js';
-import { setInferenceContext, clearInferenceContext } from './services/tools/inference_context.js';
+import { runWithInferenceContext } from './services/tools/inference_context.js';
 // Routes
 import { registerMessageRoutes } from './routes/messages.js';
 import { registerSkillRoutes } from './routes/skills.js';
@@ -23,8 +23,8 @@ import { registerAgentSessionRoutes } from './routes/agent-sessions.js';
 import { registerTaskRoutes } from './routes/tasks.js';
 import { registerInboxRoutes } from './routes/inbox.js';
 import { registerStatsRoutes } from './routes/stats.js';
-import { registerArenaRoutes } from './routes/arena.js';
 import { registerRunsRoutes } from './routes/runs.js';
+import { registerArenaRoutes } from './routes/arena.js';
 import { registerProviderRoutes } from './routes/providers.js';
 import { registerWorktreeSafetyRoutes } from './routes/worktree-safety.js';
 import { registerLifecycleRoutes } from './routes/lifecycle.js';
@@ -34,10 +34,13 @@ import { createDispatcher } from './services/dispatcher.js';
 // Orchestrator (Phase 2): DB-backed flow-runner; advances on the dispatcher's
 // onTaskTerminal hook.
 import { createFlowRunner } from './services/flow-runner.js';
+// Arena: DB-backed battle-runner; also advances on the onTaskTerminal hook.
+import { createBattleRunner, type DispatchContestantFn } from './services/arena-runner.js';
+import { createAnalyzer } from './services/arena-analyzer.js';
 import { agentPool } from './services/agent-pool.js';
 import { createOrphanWorktreeReaper } from './services/orphan-worktree-reaper.js';
 import { probeAgents } from './services/agent-probe.js';
-import { getProviderSnapshot, persistProbedModels } from './services/provider-snapshot.js';
+import { getProviderSnapshot, persistProbedModels, fetchLlamaSwapModels } from './services/provider-snapshot.js';
 import { setPermissionHooks } from './services/permission-waiter.js';
 import { publishAgentStatus } from './services/agent-status-publish.js';
 import { homedir } from 'node:os';
@@ -171,22 +174,27 @@ async function main() {
     }
   );

-  // Wrap the inference runner to set/clear the write-tool context around each run.
-  // The inference runner calls enqueue() which fires asynchronously — we hook
-  // into the enqueue to set context before the run starts.
+  // Wrap the inference runner to bind the write-tool context around each run.
+  // enqueue() starts its async loop synchronously, so wrapping the call in
+  // runWithInferenceContext propagates the per-run context (sql, sessionId, the
+  // Plan/Ask/Bypass gate) through every awaited tool execution — and concurrent
+  // runs (a user message racing a dispatcher-polled native task) each get their
+  // own, instead of clobbering a shared global.
   const inferenceApi = {
-    enqueue: (sessionId: string, chatId: string, assistantId: string, user: string) => {
-      // Set the inference context so write tools can access sql + sessionId.
-      // The context persists for the duration of the inference run. Since
-      // BooCoder is single-user and runs one inference at a time per session,
-      // this module-level state is safe.
-      setInferenceContext({ sql, sessionId, taskId: null });
-      inference.enqueue(sessionId, chatId, assistantId, user);
+    enqueue: (
+      sessionId: string,
+      chatId: string,
+      assistantId: string,
+      user: string,
+      permissionMode?: 'plan' | 'ask' | 'bypass',
+    ) => {
+      runWithInferenceContext({ sql, sessionId, taskId: null, permissionMode }, () => {
+        inference.enqueue(sessionId, chatId, assistantId, user);
+      });
     },
     cancel: async (sessionId: string, chatId: string) => {
-      const result = await inference.cancel(sessionId, chatId);
-      clearInferenceContext();
-      return result;
+      // No context to clear — AsyncLocalStorage scopes it to each run's own chain.
+      return inference.cancel(sessionId, chatId);
     },
     hasActive: (chatId: string) => inference.hasActive(chatId),
   };
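The per-run context binding used in this hunk can be illustrated with Node's `AsyncLocalStorage`. The sketch below is an assumption about how a helper like `runWithInferenceContext` might be built, not the contents of `inference_context.js`: the context shape is trimmed (no `sql`), and `currentInferenceContext`, `fakeWriteTool`, and `enqueue` are hypothetical stand-ins.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

type PermissionMode = 'plan' | 'ask' | 'bypass';

// Trimmed, assumed context shape for illustration.
interface InferenceContext {
  sessionId: string;
  taskId: string | null;
  permissionMode?: PermissionMode;
}

const storage = new AsyncLocalStorage<InferenceContext>();

// Binds `ctx` to everything started (synchronously or via await) inside `fn`.
function runWithInferenceContext<T>(ctx: InferenceContext, fn: () => T): T {
  return storage.run(ctx, fn);
}

function currentInferenceContext(): InferenceContext | undefined {
  return storage.getStore();
}

// Stand-in for a write tool that reads its context after some async work.
async function fakeWriteTool(delayMs: number): Promise<string> {
  await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
  const ctx = currentInferenceContext();
  return `${ctx?.sessionId}:${ctx?.permissionMode ?? 'unset'}`;
}

// Stand-in for enqueue(): kicks off async work and returns without awaiting it.
const results: Promise<string>[] = [];
function enqueue(delayMs: number): void {
  results.push(fakeWriteTool(delayMs));
}

// Two overlapping runs; the slower one starts first.
runWithInferenceContext({ sessionId: 'a', taskId: null, permissionMode: 'bypass' }, () => enqueue(20));
runWithInferenceContext({ sessionId: 'b', taskId: null }, () => enqueue(5));
```

Each run keeps its own context even though the two overlap, which is the property a shared module-level variable cannot provide.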
@@ -220,31 +228,119 @@ async function main() {

   // Orchestrator (Phase 2): the flow-runner reacts to the dispatcher's
   // onTaskTerminal hook to advance flow_runs. Created before the dispatcher so its
-  // terminal callback can be wired in. Its launch() is driven by the runs route
-  // (a later phase); resume on startup is a later phase too.
+  // terminal callback can be wired in.
   const flowRunner = createFlowRunner({ sql, broker, log: app.log, config });

-  // Phase 4: dispatcher — polls tasks table and runs inference. onTaskTerminal
-  // notifies the flow-runner when a step's task settles (D-2).
+  // Arena SEAM (a): build the local-model set from the live llama-swap model list.
+  // Both bare IDs ('qwen3.6-35b') and prefixed IDs ('llama-swap/qwen3.6-35b') are
+  // included so opencode-style prefixed contestants and native-style bare contestants
+  // both classify correctly as local.
+  const localModelsList = await fetchLlamaSwapModels(config).catch(() => []);
+  const localModels = new Set([
+    ...localModelsList.map((m) => m.id),
+    ...localModelsList.map((m) => `llama-swap/${m.id}`),
+  ]);
+
+  // Arena dispatch function — Phase 4 SEAM (b).
+  // Coding: insert a tasks row with agent=identity (null for native/boocode);
+  // the dispatcher creates a worktree and runs the external agent (or native).
+  // Q&A: pre-create a session with agent_id stamped to the persona slug so native
+  // inference loads the persona's system_prompt + tools from AGENTS.md;
+  // task.session_id is pre-set so runNativeInference reuses the session.
+  const dispatchContestant: DispatchContestantFn = async ({
+    projectId,
+    prompt,
+    identity,
+    model,
+    battleType,
+  }) => {
+    if (battleType === 'qa') {
+      const sessionName = `Arena Q&A [${identity}]: ${prompt.slice(0, 30)}`;
+      const [session] = await sql<{ id: string }[]>`
+        INSERT INTO sessions (project_id, name, model, agent_id, status)
+        VALUES (${projectId}, ${sessionName}, ${model}, ${identity}, 'open')
+        RETURNING id
+      `;
+      const [task] = await sql<{ id: string }[]>`
+        INSERT INTO tasks (project_id, input, model, session_id)
+        VALUES (${projectId}, ${prompt}, ${model}, ${session!.id})
+        RETURNING id
+      `;
+      return { taskId: task!.id, sessionId: session!.id };
+    }
+    // Coding: boocode = native inference (no external agent); any other identity
+    // is an external agent name (claude, opencode, qwen, goose) that maps to
+    // available_agents and gets its own per-task worktree via runExternalAgent.
+    // Session is created lazily by the dispatcher, so sessionId is unknown here.
+    const agentName = identity === 'boocode' ? null : identity;
+    const [task] = await sql<{ id: string }[]>`
+      INSERT INTO tasks (project_id, input, agent, model)
+      VALUES (${projectId}, ${prompt}, ${agentName}, ${model})
+      RETURNING id
+    `;
+    return { taskId: task!.id, sessionId: null };
+  };
+
+  // Arena analyzer: two-stage digest→judge (v1). Pluggable seam — a v2 Han
+  // Orchestrator flow can replace this without schema changes.
+  const analyzer = createAnalyzer({
+    sql,
+    broker,
+    log: app.log,
+    config,
+    localModels,
+  });
+
+  // Arena battle-runner: notified on the same onTaskTerminal hook as the flow-runner.
+  const battleRunner = createBattleRunner({
+    sql,
+    broker,
+    log: app.log,
+    dispatch: dispatchContestant,
+    onBattleComplete: (battleId) => {
+      void analyzer.analyze(battleId);
+    },
+    onCrossExamStart: ({ battleId, crossExamId, identity, model }) => {
+      void analyzer.crossExamine(battleId, crossExamId, { identity, model });
+    },
+    localModels,
+  });
+
+  // Compose onTaskTerminal: both flow-runner and battle-runner are notified.
+  // Each ignores tasks it doesn't own (flow-runner checks flow_steps.task_id;
+  // battle-runner checks contestants.task_id).
+  const onTaskTerminal = (taskId: string, state: string): void => {
+    flowRunner.handleTaskTerminal(taskId, state);
+    battleRunner.handleTaskTerminal(taskId, state);
+  };
+
+  // Phase 4: dispatcher — polls tasks table and runs inference. The composed
+  // onTaskTerminal hook notifies both the flow-runner and the battle-runner when
+  // any task settles.
   const dispatcher = createDispatcher({
     sql,
     inference: inferenceApi,
     broker,
     log: app.log,
     config,
-    onTaskTerminal: flowRunner.handleTaskTerminal,
+    onTaskTerminal,
   });
   dispatcher.start();

-  // Phase 5: re-advance any flow_runs that were 'running' when the service last
-  // stopped (D-9). Runs AFTER dispatcher.start() so re-dispatched 'pending' tasks
-  // are picked up by the dispatcher's startup poll.
+  // Re-advance in-flight flow_runs and battles after a coder restart. Both run
+  // AFTER dispatcher.start() so re-dispatched 'pending' tasks are picked up.
   void flowRunner.initResume().catch((err) => {
     app.log.error(
       { err: err instanceof Error ? err.message : String(err) },
       'flow-runner: initResume failed',
     );
   });
+  void battleRunner.initResume().catch((err) => {
+    app.log.error(
+      { err: err instanceof Error ? err.message : String(err) },
+      'arena: initResume failed',
+    );
+  });

   // v2.6 Phase 3: configure + start the agent-pool lifecycle sweep (idle-TTL +
   // LRU-cap eviction of warm backends, plus each backend's proactive health probe)
|
||||
@@ -281,8 +377,8 @@ async function main() {
|
||||
registerTaskRoutes(app, sql, inferenceApi, dispatcher.cancelExternalTask);
|
||||
registerInboxRoutes(app, sql);
|
||||
registerStatsRoutes(app, sql);
|
||||
registerArenaRoutes(app, sql);
|
||||
registerRunsRoutes(app, sql, flowRunner, dispatcher.cancelExternalTask);
|
||||
registerArenaRoutes(app, sql, battleRunner, dispatcher.cancelExternalTask, config);
|
||||
registerProviderRoutes(app, sql, config);
|
||||
registerWorktreeSafetyRoutes(app, sql);
|
||||
registerLifecycleRoutes(app, sql);
|
||||
|
||||
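The composed `onTaskTerminal` hook in the hunk above fans a single dispatcher callback out to both the flow-runner and the battle-runner. A minimal sketch of that pattern follows. It assumes a hypothetical `composeTerminalHooks` helper and `TerminalHandler` type, neither of which appears in the diff, and it adds per-handler error isolation that the diff itself does not show.

```typescript
// Hypothetical handler shape; mirrors handleTaskTerminal(taskId, state) in the diff.
type TerminalHandler = (taskId: string, state: string) => void;

// Illustrative helper (not in the diff): call every handler in order, and keep
// one throwing handler from starving the handlers registered after it.
function composeTerminalHooks(
  handlers: TerminalHandler[],
  onError: (err: unknown) => void = () => {},
): TerminalHandler {
  return (taskId, state) => {
    for (const handler of handlers) {
      try {
        handler(taskId, state);
      } catch (err) {
        onError(err);
      }
    }
  };
}
```

With such a helper, the wiring would read `composeTerminalHooks([flowRunner.handleTaskTerminal, battleRunner.handleTaskTerminal])`, provided both methods are safe to call unbound. Each runner still has to ignore tasks it does not own, as the diff's comment notes.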
@@ -1,136 +1,412 @@
|
||||
/**
|
||||
* v2.0.5: Arena routes — competitive dispatch of the same task to multiple agents.
|
||||
* Arena routes — HTTP surface for the Battle UI.
|
||||
*
|
||||
* POST /api/arena — create an arena with 2-5 contestants
|
||||
* GET /api/arena/:id — get all tasks in an arena
|
||||
* POST /api/arena/:id/select/:task_id — mark a task as the arena winner
|
||||
* POST /api/battles — launch a battle
|
||||
* GET /api/battles?project_id= — list battles for a project
|
||||
* GET /api/battles/:id — one battle + contestants + cross-exams
|
||||
* POST /api/battles/:id/stop — cancel a running battle
|
||||
* POST /api/battles/:id/analyze — trigger analysis (Phase 5 fills the logic)
|
||||
* POST /api/battles/:id/cross-examine — start a cross-examination (Phase 5 fills the logic)
|
||||
*
|
||||
* Mirrors the shape of runs.ts (Orchestrator routes). Battle creation delegates to
|
||||
* the battle-runner; cancellation calls cancelBattle then aborts in-flight tasks
|
||||
* via the dispatcher's cancelExternalTask.
|
||||
*/
|
||||
import type { FastifyInstance } from 'fastify';
|
||||
import { z } from 'zod';
|
||||
import { readFile } from 'node:fs/promises';
|
||||
import { join } from 'node:path';
|
||||
import type { Sql } from '../db.js';
|
||||
import type { Config } from '../config.js';
|
||||
import type { BattleRunner } from '../services/arena-runner.js';
|
||||
import type { ExternalCancelFn } from './tasks.js';
|
||||
import { arenaModelCall } from '../services/arena-model-call.js';
|
||||
|
||||
const ContestantSchema = z.object({
|
||||
agent: z.string().max(100).optional(),
|
||||
model: z.string().max(200).optional(),
|
||||
mode_id: z.string().max(200).optional(),
|
||||
thinking_option_id: z.string().max(200).optional(),
|
||||
// ─── Validation schemas ───────────────────────────────────────────────────────
|
||||
|
||||
const UuidParam = z.string().uuid();
|
||||
|
||||
const ContestantInput = z.object({
|
||||
identity: z.string().min(1).max(200),
|
||||
model: z.string().min(1).max(200),
|
||||
});
|
||||
|
||||
const CreateArenaBody = z.object({
|
||||
const CreateBattleBody = z.object({
|
||||
project_id: z.string().uuid(),
|
||||
input: z.string().min(1).max(64_000),
|
||||
contestants: z.array(ContestantSchema).min(2).max(5),
|
||||
battle_type: z.enum(['coding', 'qa']),
|
||||
prompt: z.string().min(1).max(64_000),
|
||||
contestants: z
|
||||
.array(ContestantInput)
|
||||
.min(2, 'at least 2 contestants required')
|
||||
.max(6, 'at most 6 contestants allowed'),
|
||||
});
|
||||
|
||||
interface TaskRow {
|
||||
id: string;
|
||||
agent: string | null;
|
||||
model: string | null;
|
||||
mode_id: string | null;
|
||||
thinking_option_id: string | null;
|
||||
state: string;
|
||||
}
|
||||
const ListBattlesQuery = z.object({
|
||||
project_id: z.string().uuid(),
|
||||
});
|
||||
|
||||
export function registerArenaRoutes(app: FastifyInstance, sql: Sql): void {
|
||||
// POST /api/arena — create a new arena
|
||||
app.post('/api/arena', async (req, reply) => {
|
||||
const parsed = CreateArenaBody.safeParse(req.body);
|
||||
const CrossExamineBody = z.object({
|
||||
identity: z.string().min(1).max(200),
|
||||
model: z.string().min(1).max(200),
|
||||
});
|
||||
|
||||
const SetWinnerBody = z.object({
|
||||
winner_contestant_id: z.string().uuid().nullable(),
|
||||
});
|
||||
|
||||
// ─── Route registration ───────────────────────────────────────────────────────
|
||||
|
||||
const GeneratePromptBody = z.object({
|
||||
description: z.string().min(1).max(2_000),
|
||||
});
|
||||
|
||||
export function registerArenaRoutes(
|
||||
app: FastifyInstance,
|
||||
sql: Sql,
|
||||
battleRunner: BattleRunner,
|
||||
cancelExternal: ExternalCancelFn,
|
||||
config: Config,
|
||||
): void {
|
||||
|
||||
// POST /api/battles/generate-prompt — draft a fuller battle prompt from a
|
||||
// short description using the default BooChat model. One-shot, non-streaming.
|
||||
// Must be registered BEFORE /api/battles/:id so the literal 'generate-prompt'
|
||||
// path is not mistaken for a UUID param.
|
||||
app.post('/api/battles/generate-prompt', async (req, reply) => {
|
||||
const parsed = GeneratePromptBody.safeParse(req.body);
|
||||
if (!parsed.success) {
|
||||
reply.code(400);
|
||||
return { error: 'invalid body', details: parsed.error.flatten() };
|
||||
}
|
||||
|
||||
const { project_id, input, contestants } = parsed.data;
|
||||
const arenaId = crypto.randomUUID();
|
||||
const { description } = parsed.data;
|
||||
|
||||
const tasks: TaskRow[] = [];
|
||||
for (const contestant of contestants) {
|
||||
const [task] = await sql<TaskRow[]>`
|
||||
INSERT INTO tasks (project_id, input, agent, model, mode_id, thinking_option_id, arena_id)
|
||||
VALUES (
|
||||
${project_id},
|
||||
${input},
|
||||
${contestant.agent ?? null},
|
||||
${contestant.model ?? null},
|
||||
${contestant.mode_id ?? null},
|
||||
${contestant.thinking_option_id ?? null},
|
||||
${arenaId}
|
||||
)
|
||||
RETURNING id, agent, model, mode_id, thinking_option_id, state
|
||||
`;
|
||||
tasks.push(task!);
|
||||
try {
|
||||
const prompt = await arenaModelCall({
|
||||
config,
|
||||
model: config.DEFAULT_MODEL,
|
||||
system: [
|
||||
'You are a battle-prompt writer for an AI Arena.',
|
||||
'The user gives you a short description of a coding or Q&A challenge.',
|
||||
'Expand it into a clear, self-contained prompt (2–6 sentences) that any AI model can act on.',
|
||||
'Include specific acceptance criteria where helpful.',
|
||||
'Output ONLY the prompt — no preamble, no labels, no meta-commentary.',
|
||||
].join(' '),
|
||||
user: description,
|
||||
maxTokens: 400,
|
||||
temperature: 0.6,
|
||||
});
|
||||
return { prompt };
|
||||
} catch (err) {
|
||||
app.log.warn(
|
||||
{ err: err instanceof Error ? err.message : String(err) },
|
||||
'arena generate-prompt: model call failed',
|
||||
);
|
||||
reply.code(502);
|
||||
return { error: 'model call failed' };
|
||||
}
|
||||
});
|
||||
|
||||
// POST /api/battles — launch a battle
|
||||
app.post('/api/battles', async (req, reply) => {
|
||||
const parsed = CreateBattleBody.safeParse(req.body);
|
||||
if (!parsed.success) {
|
||||
reply.code(400);
|
||||
return { error: 'invalid body', details: parsed.error.flatten() };
|
||||
}
|
||||
|
||||
const { project_id, battle_type, prompt, contestants } = parsed.data;
|
||||
|
||||
// Reject duplicate (identity, model) pairs up front — the schema UNIQUE
|
||||
// constraint would catch it too, but an early 422 is friendlier.
|
||||
const seen = new Set<string>();
|
||||
for (const c of contestants) {
|
||||
const key = `${c.identity}::${c.model}`;
|
||||
if (seen.has(key)) {
|
||||
reply.code(422);
|
||||
return {
|
||||
error: 'duplicate_contestant',
|
||||
message: `duplicate contestant: identity="${c.identity}" model="${c.model}"`,
|
||||
};
|
||||
}
|
||||
seen.add(key);
|
||||
}
|
||||
|
||||
// Verify project exists
|
||||
const [proj] = await sql<{ id: string }[]>`SELECT id FROM projects WHERE id = ${project_id}`;
|
||||
if (!proj) {
|
||||
reply.code(404);
|
||||
return { error: 'project not found' };
|
||||
}
|
||||
|
||||
const { battleId } = await battleRunner.startBattle({
|
||||
projectId: project_id,
|
||||
battleType: battle_type,
|
||||
prompt,
|
||||
contestants,
|
||||
});
|
||||
|
||||
reply.code(201);
|
||||
return {
|
||||
arena_id: arenaId,
|
||||
tasks: tasks.map((t) => ({
|
||||
id: t.id,
|
||||
agent: t.agent,
|
||||
model: t.model,
|
||||
mode_id: t.mode_id,
|
||||
thinking_option_id: t.thinking_option_id,
|
||||
state: t.state,
|
||||
})),
|
||||
};
|
||||
return { battle_id: battleId };
|
||||
});
|
||||
|
||||
// GET /api/arena/:arena_id — list all tasks in an arena
|
||||
app.get<{ Params: { arena_id: string } }>('/api/arena/:arena_id', async (req, reply) => {
|
||||
const { arena_id } = req.params;
|
||||
|
||||
// Validate UUID format
|
||||
const uuidRegex = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
|
||||
if (!uuidRegex.test(arena_id)) {
|
||||
// GET /api/battles?project_id= — list battles, most-recent-first
|
||||
app.get('/api/battles', async (req, reply) => {
|
||||
const parsed = ListBattlesQuery.safeParse(req.query);
|
||||
if (!parsed.success) {
|
||||
reply.code(400);
|
||||
return { error: 'invalid arena_id format' };
|
||||
return { error: 'invalid query', details: parsed.error.flatten() };
|
||||
}
|
||||
|
||||
const tasks = await sql`
|
||||
SELECT id, project_id, state, input, output_summary, agent, model, mode_id, thinking_option_id, execution_path, session_id, started_at, ended_at, created_at, arena_id
|
||||
FROM tasks
|
||||
WHERE arena_id = ${arena_id}
|
||||
ORDER BY created_at
|
||||
const battles = await sql`
|
||||
SELECT id, project_id, battle_type, prompt, status,
|
||||
winner_contestant_id, results_path, error,
|
||||
created_at, updated_at
|
||||
FROM battles
|
||||
WHERE project_id = ${parsed.data.project_id}
|
||||
ORDER BY created_at DESC
|
||||
LIMIT 100
|
||||
`;
|
||||
|
||||
if (tasks.length === 0) {
|
||||
reply.code(404);
|
||||
return { error: 'arena not found' };
|
||||
}
|
||||
|
||||
return { arena_id, tasks };
|
||||
return { battles };
|
||||
});
|
||||
|
||||
// POST /api/arena/:arena_id/select/:task_id — mark the winner
|
||||
app.post<{ Params: { arena_id: string; task_id: string } }>(
|
||||
'/api/arena/:arena_id/select/:task_id',
|
||||
async (req, reply) => {
|
||||
const { arena_id, task_id } = req.params;
|
||||
|
||||
// Verify the task belongs to this arena
|
||||
const rows = await sql<{ id: string; state: string; arena_id: string | null }[]>`
|
||||
SELECT id, state, arena_id FROM tasks WHERE id = ${task_id}
|
||||
`;
|
||||
|
||||
if (rows.length === 0) {
|
||||
reply.code(404);
|
||||
return { error: 'task not found' };
|
||||
}
|
||||
|
||||
const task = rows[0]!;
|
||||
if (task.arena_id !== arena_id) {
|
||||
reply.code(409);
|
||||
return { error: 'task does not belong to this arena' };
|
||||
}
|
||||
|
||||
// Mark as selected via output_summary prefix (lightweight — no schema change)
|
||||
await sql`
|
||||
UPDATE tasks
|
||||
SET output_summary = COALESCE('[SELECTED] ' || output_summary, '[SELECTED]')
|
||||
WHERE id = ${task_id}
|
||||
`;
|
||||
|
||||
return { selected: true, task_id, arena_id };
|
||||
// GET /api/battles/:id — one battle + its contestants + cross-examinations
|
||||
app.get<{ Params: { id: string } }>('/api/battles/:id', async (req, reply) => {
|
||||
const parsedId = UuidParam.safeParse(req.params.id);
|
||||
if (!parsedId.success) {
|
||||
reply.code(400);
|
||||
return { error: 'invalid id' };
|
||||
}
|
||||
);
|
||||
const id = parsedId.data;
|
||||
|
||||
const [battle] = await sql<{
|
||||
id: string;
|
||||
project_id: string;
|
||||
battle_type: string;
|
||||
prompt: string;
|
||||
status: string;
|
||||
winner_contestant_id: string | null;
|
||||
results_path: string | null;
|
||||
error: string | null;
|
||||
created_at: unknown;
|
||||
updated_at: unknown;
|
||||
}[]>`
|
||||
SELECT id, project_id, battle_type, prompt, status,
|
||||
winner_contestant_id, results_path, error,
|
||||
created_at, updated_at
|
||||
FROM battles WHERE id = ${id}
|
||||
`;
|
||||
|
||||
if (!battle) {
|
||||
reply.code(404);
|
||||
return { error: 'battle not found' };
|
||||
}
|
||||
|
||||
const contestants = await sql`
|
||||
SELECT id, battle_id, identity, model, lane, task_id, worktree_id,
|
||||
status, duration_ms, tokens_per_sec, cost_tokens, result_path, error,
|
||||
created_at, updated_at
|
||||
FROM contestants
|
||||
WHERE battle_id = ${id}
|
||||
ORDER BY created_at ASC
|
||||
`;
|
||||
|
||||
const crossExaminations = await sql`
|
||||
SELECT id, battle_id, identity, model, verdict, created_at
|
||||
FROM cross_examinations
|
||||
WHERE battle_id = ${id}
|
||||
ORDER BY created_at ASC
|
||||
`;
|
||||
|
||||
return { battle, contestants, cross_examinations: crossExaminations };
|
||||
});
|
||||
|
||||
// POST /api/battles/:id/stop — cancel a running battle
|
||||
app.post<{ Params: { id: string } }>('/api/battles/:id/stop', async (req, reply) => {
|
||||
const parsedId = UuidParam.safeParse(req.params.id);
|
||||
if (!parsedId.success) {
|
||||
reply.code(400);
|
||||
return { error: 'invalid id' };
|
||||
}
|
||||
const id = parsedId.data;
|
||||
|
||||
const [row] = await sql<{ id: string; status: string }[]>`
|
||||
SELECT id, status FROM battles WHERE id = ${id}
|
||||
`;
|
||||
if (!row) {
|
||||
reply.code(404);
|
||||
return { error: 'battle not found' };
|
||||
}
|
||||
if (row.status !== 'running') {
|
||||
reply.code(409);
|
||||
return { error: `cannot stop battle in status '${row.status}'` };
|
||||
}
|
||||
|
||||
const { cancelled, taskIds } = await battleRunner.cancelBattle(id);
|
||||
if (!cancelled) {
|
||||
reply.code(409);
|
||||
return { error: 'battle is no longer running' };
|
||||
}
|
||||
|
||||
// Abort any in-flight dispatcher tasks (cloud contestants running externally).
|
||||
for (const taskId of taskIds) {
|
||||
cancelExternal(taskId);
|
||||
}
|
||||
|
||||
return { cancelled: true };
|
||||
});
|
||||
|
||||
// GET /api/battles/:id/analysis — read analysis.md from the battle's results_path
|
||||
app.get<{ Params: { id: string } }>('/api/battles/:id/analysis', async (req, reply) => {
|
||||
const parsedId = UuidParam.safeParse(req.params.id);
|
||||
if (!parsedId.success) {
|
||||
reply.code(400);
|
||||
return { error: 'invalid id' };
|
||||
}
|
||||
const id = parsedId.data;
|
||||
|
||||
const [row] = await sql<{ results_path: string | null }[]>`
|
||||
SELECT results_path FROM battles WHERE id = ${id}
|
||||
`;
|
||||
if (!row) {
|
||||
reply.code(404);
|
||||
return { error: 'battle not found' };
|
||||
}
|
||||
if (!row.results_path) {
|
||||
reply.code(404);
|
||||
return { error: 'analysis not ready' };
|
||||
}
|
||||
|
||||
try {
|
||||
const text = await readFile(join(row.results_path, 'analysis.md'), 'utf8');
|
||||
return { text };
|
||||
} catch {
|
||||
reply.code(404);
|
||||
return { error: 'analysis not ready' };
|
||||
}
|
||||
});
|
||||
|
||||
// POST /api/battles/:id/analyze — trigger or re-trigger analysis
|
||||
app.post<{ Params: { id: string } }>('/api/battles/:id/analyze', async (req, reply) => {
|
||||
const parsedId = UuidParam.safeParse(req.params.id);
|
||||
if (!parsedId.success) {
|
||||
reply.code(400);
|
||||
return { error: 'invalid id' };
|
||||
}
|
||||
const id = parsedId.data;
|
||||
|
||||
const [row] = await sql<{ id: string; status: string }[]>`
|
||||
SELECT id, status FROM battles WHERE id = ${id}
|
||||
`;
|
||||
if (!row) {
|
||||
reply.code(404);
|
||||
return { error: 'battle not found' };
|
||||
}
|
||||
if (row.status === 'running') {
|
||||
reply.code(409);
|
||||
return { error: 'battle is still running — wait for all contestants to finish' };
|
||||
}
|
||||
|
||||
const result = await battleRunner.triggerAnalysis(id);
|
||||
if (!result.triggered) {
|
||||
reply.code(404);
|
||||
return { error: 'battle not found' };
|
||||
}
|
||||
|
||||
reply.code(202);
|
||||
return { triggered: true };
|
||||
});
|
||||
|
||||
// PATCH /api/battles/:id/winner — manually set or clear the winner.
|
||||
// Validates the contestant belongs to the battle; publishes battle_updated so
|
||||
// the pane badge reflects the override immediately. Human is authoritative.
|
||||
app.patch<{ Params: { id: string } }>('/api/battles/:id/winner', async (req, reply) => {
|
||||
const parsedId = UuidParam.safeParse(req.params.id);
|
||||
if (!parsedId.success) {
|
||||
reply.code(400);
|
||||
return { error: 'invalid id' };
|
||||
}
|
||||
|
||||
const parsed = SetWinnerBody.safeParse(req.body);
|
||||
if (!parsed.success) {
|
||||
reply.code(400);
|
||||
return { error: 'invalid body', details: parsed.error.flatten() };
|
||||
}
|
||||
|
||||
const result = await battleRunner.setWinner(parsedId.data, parsed.data.winner_contestant_id);
|
||||
if (!result.ok) {
|
||||
if (result.notFound) { reply.code(404); return { error: 'battle not found' }; }
|
||||
if (result.invalidContestant) { reply.code(422); return { error: 'contestant not found in this battle' }; }
|
||||
reply.code(500); return { error: 'unknown error' };
|
||||
}
|
||||
return { ok: true };
|
||||
});
|
||||
|
||||
// GET /api/battles/:id/contestants/:cid/diff — read the diff.patch for a coding contestant.
|
||||
app.get<{ Params: { id: string; cid: string } }>('/api/battles/:id/contestants/:cid/diff', async (req, reply) => {
|
||||
const parsedId = UuidParam.safeParse(req.params.id);
|
||||
const parsedCid = UuidParam.safeParse(req.params.cid);
|
||||
if (!parsedId.success || !parsedCid.success) {
|
||||
reply.code(400);
|
||||
return { error: 'invalid id' };
|
||||
}
|
||||
|
||||
const [contestant] = await sql<{ result_path: string | null }[]>`
|
||||
SELECT result_path FROM contestants
|
||||
WHERE id = ${parsedCid.data} AND battle_id = ${parsedId.data}
|
||||
`;
|
||||
if (!contestant) {
|
||||
reply.code(404);
|
||||
return { error: 'contestant not found' };
|
||||
}
|
||||
if (!contestant.result_path) {
|
||||
reply.code(404);
|
||||
return { error: 'diff not available' };
|
||||
}
|
||||
|
||||
try {
|
||||
const text = await readFile(join(contestant.result_path, 'diff.patch'), 'utf8');
|
||||
return { diff: text };
|
||||
} catch {
|
||||
reply.code(404);
|
||||
return { error: 'diff not available' };
|
||||
}
|
||||
});
|
||||
|
||||
// POST /api/battles/:id/cross-examine — start a cross-examination
|
||||
app.post<{ Params: { id: string } }>('/api/battles/:id/cross-examine', async (req, reply) => {
|
||||
const parsedId = UuidParam.safeParse(req.params.id);
|
||||
if (!parsedId.success) {
|
||||
reply.code(400);
|
||||
return { error: 'invalid id' };
|
||||
}
|
||||
const id = parsedId.data;
|
||||
|
||||
const parsed = CrossExamineBody.safeParse(req.body);
|
||||
if (!parsed.success) {
|
||||
reply.code(400);
|
||||
return { error: 'invalid body', details: parsed.error.flatten() };
|
||||
}
|
||||
|
||||
const [row] = await sql<{ id: string; status: string }[]>`
|
||||
SELECT id, status FROM battles WHERE id = ${id}
|
||||
`;
|
||||
if (!row) {
|
||||
reply.code(404);
|
||||
return { error: 'battle not found' };
|
||||
}
|
||||
if (row.status === 'running') {
|
||||
reply.code(409);
|
||||
return { error: 'battle is still running — cross-examine after all contestants finish' };
|
||||
}
|
||||
|
||||
const { crossExamId } = await battleRunner.startCrossExam(id, {
|
||||
identity: parsed.data.identity,
|
||||
model: parsed.data.model,
|
||||
});
|
||||
|
||||
reply.code(202);
|
||||
return { cross_exam_id: crossExamId };
|
||||
});
|
||||
}
|
||||
|
||||
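The duplicate check in `POST /api/battles` keys each contestant by its `(identity, model)` pair. The sketch below isolates the same idea in a hypothetical `findDuplicateContestant` helper (the route inlines this logic). It also swaps the `::`-joined key from the diff for a JSON-encoded tuple, so that values which themselves contain `::` cannot collide. That swap is an assumption of this sketch, not something the diff does.

```typescript
interface ContestantInput {
  identity: string;
  model: string;
}

// Hypothetical helper: returns the first contestant whose (identity, model)
// pair was already seen, or null when every pair is unique.
function findDuplicateContestant(contestants: ContestantInput[]): ContestantInput | null {
  const seen = new Set<string>();
  for (const c of contestants) {
    // A JSON-encoded tuple stays unambiguous even if a value contains '::'.
    const key = JSON.stringify([c.identity, c.model]);
    if (seen.has(key)) return c;
    seen.add(key);
  }
  return null;
}
```

A route handler could answer 422 whenever this returns a non-null value, matching the early rejection shown in the diff.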
@@ -4,6 +4,7 @@ import type { Sql } from '../db.js';
|
||||
import type { Broker } from '@boocode/server/broker';
|
||||
import type { WsFrame } from '@boocode/contracts/ws-frames';
|
||||
import { resolveChatId } from './chat-resolve.js';
|
||||
import { asPermissionMode } from '../services/tools/types.js';
|
||||
|
||||
const AnswerUserInputBody = z.object({
|
||||
tool_call_id: z.string().min(1),
|
||||
@@ -43,7 +44,13 @@ const SendBody = z.object({
|
||||
});
|
||||
|
||||
interface InferenceApi {
|
||||
enqueue: (sessionId: string, chatId: string, assistantId: string, user: string) => void;
|
||||
enqueue: (
|
||||
sessionId: string,
|
||||
chatId: string,
|
||||
assistantId: string,
|
||||
user: string,
|
||||
permissionMode?: 'plan' | 'ask' | 'bypass',
|
||||
) => void;
|
||||
cancel: (sessionId: string, chatId: string) => Promise<boolean>;
|
||||
hasActive: (chatId: string) => boolean;
|
||||
}
|
||||
@@ -245,7 +252,16 @@ export function registerMessageRoutes(
|
||||
RETURNING id
|
||||
`;
|
||||
|
||||
inference.enqueue(sessionId, chatId, assistantMsg!.id, 'default');
|
||||
// Native BooCode permission gate (plan/ask/bypass) — threaded into the
|
||||
// write-tool context so create/edit/delete and apply_pending honor it.
|
||||
// Plan = read-only, Ask = stage to the queue (agent can't self-apply),
|
||||
// Bypass = apply each write immediately. Other mode ids (e.g. an external
|
||||
// fallback's native mode) leave the gate undefined = legacy behavior.
|
||||
req.log.info(
|
||||
{ provider, mode_id, permissionMode: asPermissionMode(mode_id), chatId },
|
||||
'native enqueue — permission gate',
|
||||
);
|
||||
inference.enqueue(sessionId, chatId, assistantMsg!.id, 'default', asPermissionMode(mode_id));
|
||||
|
||||
reply.code(202);
|
||||
return { user_message_id: userMsg!.id, assistant_message_id: assistantMsg!.id };
|
||||
|
||||
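The comment in the hunk above says that only the native `plan`, `ask`, and `bypass` mode ids set the permission gate, and that any other mode id leaves it undefined. The real `asPermissionMode` lives in `services/tools/types.js` and is not part of this diff. The sketch below shows one way such a narrowing function could look, under the deliberately different name `toPermissionMode`.

```typescript
type PermissionMode = 'plan' | 'ask' | 'bypass';

// Hypothetical stand-in for asPermissionMode: narrow a free-form mode id to
// the three native permission modes, or return undefined (legacy behavior).
function toPermissionMode(modeId: string | null | undefined): PermissionMode | undefined {
  if (modeId === 'plan' || modeId === 'ask' || modeId === 'bypass') {
    return modeId;
  }
  return undefined;
}
```

Returning `undefined` rather than a default mode keeps the optional `permissionMode` parameter of `enqueue` unset for external mode ids, which is the legacy path the comment describes.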
@@ -54,9 +54,6 @@ DO $$ BEGIN
|
||||
END IF;
|
||||
END $$;
|
||||
|
||||
-- v2.0.5: arena support — group tasks into competitive arenas.
|
||||
ALTER TABLE tasks ADD COLUMN IF NOT EXISTS arena_id UUID;
|
||||
|
||||
-- Human inbox: tasks needing attention
|
||||
CREATE OR REPLACE VIEW human_inbox AS
|
||||
SELECT * FROM tasks WHERE state IN ('blocked', 'failed');
|
||||
@@ -81,6 +78,7 @@ ALTER TABLE tasks ADD COLUMN IF NOT EXISTS thinking_option_id TEXT;
|
||||
DROP VIEW IF EXISTS human_inbox;
|
||||
ALTER TABLE tasks DROP COLUMN IF EXISTS feature_values;
|
||||
ALTER TABLE tasks DROP COLUMN IF EXISTS worktree_path;
|
||||
ALTER TABLE tasks DROP COLUMN IF EXISTS arena_id;
|
||||
CREATE OR REPLACE VIEW human_inbox AS
|
||||
SELECT * FROM tasks WHERE state IN ('blocked', 'failed');
|
||||
|
||||
@@ -157,7 +155,7 @@ CREATE UNIQUE INDEX IF NOT EXISTS worktrees_active_path_uidx ON worktrees(path)
|
||||
DROP TABLE IF EXISTS session_worktrees;
|
||||
|
||||
-- Dispatch hint: which chat (tab) a task belongs to. The coder message route and
|
||||
-- skills route set it from the frontend tab; session-less creators (arena, MCP,
|
||||
-- skills route set it from the frontend tab; session-less creators (MCP,
|
||||
-- new_task, generic /api/tasks) leave it NULL and the dispatcher creates a chat.
|
||||
ALTER TABLE tasks ADD COLUMN IF NOT EXISTS chat_id UUID REFERENCES chats(id) ON DELETE SET NULL;
|
||||
|
||||
@@ -271,7 +269,7 @@ ALTER TABLE agent_sessions ADD CONSTRAINT agent_sessions_backend_chk
|
||||
CHECK (backend IN ('opencode_server', 'acp_warm', 'claude_sdk'));
|
||||
|
||||
-- LISTEN/NOTIFY fast path: every tasks INSERT (from any call site — routes,
|
||||
-- new_task tool, arena, MCP server) fires pg_notify('tasks_new') in the same
|
||||
-- new_task tool, MCP server) fires pg_notify('tasks_new') in the same
|
||||
-- transaction, so the dispatcher reacts immediately instead of waiting for the
|
||||
-- fallback poll. Postgres holds the notification until COMMIT, so the listener
|
||||
-- always sees the committed row. A trigger covers all insert paths with no
|
||||
@@ -357,3 +355,71 @@ DO $$ BEGIN
|
||||
CHECK (status IN ('pending', 'running', 'completed', 'failed', 'skipped', 'cancelled'));
|
||||
END IF;
|
||||
END $$;
|
||||
|
||||
-- Arena: battles + contestants + cross_examinations.
-- project_id carries no FK (matches tasks.project_id + flow_runs.project_id convention).
-- winner_contestant_id FK is deferred (forward reference): added via guarded ALTER below.
CREATE TABLE IF NOT EXISTS battles (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  project_id UUID NOT NULL,
  battle_type TEXT NOT NULL,
  prompt TEXT NOT NULL,
  status TEXT NOT NULL DEFAULT 'pending',
  winner_contestant_id UUID,
  results_path TEXT,
  error TEXT,
  created_at TIMESTAMPTZ NOT NULL DEFAULT clock_timestamp(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT clock_timestamp(),
  CONSTRAINT battles_type_chk CHECK (battle_type IN ('coding', 'qa')),
  CONSTRAINT battles_status_chk CHECK (status IN ('pending', 'running', 'completed', 'failed', 'cancelled'))
);

CREATE TABLE IF NOT EXISTS contestants (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  battle_id UUID NOT NULL REFERENCES battles(id) ON DELETE CASCADE,
  identity TEXT NOT NULL,
  model TEXT NOT NULL,
  lane TEXT NOT NULL,
  task_id UUID REFERENCES tasks(id) ON DELETE SET NULL,
  worktree_id UUID REFERENCES worktrees(id) ON DELETE SET NULL,
  status TEXT NOT NULL DEFAULT 'queued',
  duration_ms INTEGER,
  tokens_per_sec DOUBLE PRECISION,
  cost_tokens INTEGER,
  result_path TEXT,
  error TEXT,
  created_at TIMESTAMPTZ NOT NULL DEFAULT clock_timestamp(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT clock_timestamp(),
  CONSTRAINT contestants_lane_chk CHECK (lane IN ('local', 'cloud')),
  CONSTRAINT contestants_status_chk CHECK (status IN ('queued', 'running', 'done', 'error')),
  UNIQUE (battle_id, identity, model)
);

CREATE TABLE IF NOT EXISTS cross_examinations (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  battle_id UUID NOT NULL REFERENCES battles(id) ON DELETE CASCADE,
  identity TEXT NOT NULL,
  model TEXT NOT NULL,
  verdict TEXT,
  created_at TIMESTAMPTZ NOT NULL DEFAULT clock_timestamp()
);

-- Add the winner FK now that contestants exists.
DO $$ BEGIN
  IF NOT EXISTS (SELECT 1 FROM pg_constraint WHERE conname = 'battles_winner_contestant_id_fkey') THEN
    ALTER TABLE battles ADD CONSTRAINT battles_winner_contestant_id_fkey
      FOREIGN KEY (winner_contestant_id) REFERENCES contestants(id) ON DELETE SET NULL;
  END IF;
END $$;

-- battles query (GET /api/battles?project_id=).
CREATE INDEX IF NOT EXISTS battles_project_created_idx ON battles(project_id, created_at DESC);

-- Lane-scheduler advance scans (contestants WHERE battle_id = ? AND status = ?).
CREATE INDEX IF NOT EXISTS contestants_battle_status_idx ON contestants(battle_id, status);

-- onTaskTerminal callback: look up the contestant owning a completed task.
CREATE INDEX IF NOT EXISTS contestants_task_id_idx ON contestants(task_id);

-- Cross-examination listing per battle.
CREATE INDEX IF NOT EXISTS cross_examinations_battle_idx ON cross_examinations(battle_id);
|
||||
|
||||
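The `CHECK` constraints in the schema above fix the allowed values for `battles.status`, `contestants.status`, and `contestants.lane`. Application code reading those rows could mirror the constraints as literal unions with runtime guards. The sketch below shows one way to do that. Every name in it (`BATTLE_STATUSES`, `isBattleStatus`, and so on) is hypothetical and does not come from the diff.

```typescript
// Value lists copied from the CHECK constraints in the schema hunk.
const BATTLE_STATUSES = ['pending', 'running', 'completed', 'failed', 'cancelled'] as const;
const CONTESTANT_STATUSES = ['queued', 'running', 'done', 'error'] as const;
const LANES = ['local', 'cloud'] as const;

type BattleStatus = (typeof BATTLE_STATUSES)[number];
type ContestantStatus = (typeof CONTESTANT_STATUSES)[number];
type Lane = (typeof LANES)[number];

// Runtime guards: narrow an untyped column value to the matching union.
function isBattleStatus(value: string): value is BattleStatus {
  return (BATTLE_STATUSES as readonly string[]).indexOf(value) !== -1;
}

function isContestantStatus(value: string): value is ContestantStatus {
  return (CONTESTANT_STATUSES as readonly string[]).indexOf(value) !== -1;
}

function isLane(value: string): value is Lane {
  return (LANES as readonly string[]).indexOf(value) !== -1;
}
```

Note that the two status vocabularies differ: a finished battle is `completed`, while a finished contestant is `done`. Keeping the unions separate makes that mismatch a compile-time concern rather than a runtime surprise.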
254
apps/coder/src/services/__tests__/arena-analyzer-helpers.test.ts
Normal file
@@ -0,0 +1,254 @@
|
||||
import { describe, it, expect } from 'vitest';
import {
  buildDigestPrompt,
  buildJudgePrompt,
  buildCrossExamPrompt,
  extractWinner,
  shouldNameWinner,
  type ContestantDigest,
  type ContestantDigestInput,
} from '../arena-analyzer-helpers.js';

// ─── shouldNameWinner ─────────────────────────────────────────────────────────

describe('shouldNameWinner', () => {
  it('returns false with 0 succeeded contestants', () => {
    expect(shouldNameWinner(0)).toBe(false);
  });

  it('returns false with exactly 1 succeeded contestant', () => {
    expect(shouldNameWinner(1)).toBe(false);
  });

  it('returns true with exactly 2 succeeded contestants', () => {
    expect(shouldNameWinner(2)).toBe(true);
  });

  it('returns true with more than 2 succeeded contestants', () => {
    expect(shouldNameWinner(3)).toBe(true);
    expect(shouldNameWinner(6)).toBe(true);
  });
});

// ─── extractWinner ────────────────────────────────────────────────────────────

describe('extractWinner', () => {
  it('extracts identity and model from a WINNER: line', () => {
    const output = 'Some analysis\n\nWINNER: claude/opus-4-5\n\nMore text.';
    expect(extractWinner(output)).toEqual({ identity: 'claude', model: 'opus-4-5' });
  });

  it('is case-insensitive for the WINNER keyword', () => {
    expect(extractWinner('winner: boocode/qwen3.6-35b')).toEqual({
      identity: 'boocode',
      model: 'qwen3.6-35b',
    });
    expect(extractWinner('Winner: opencode/some-model')).toEqual({
      identity: 'opencode',
      model: 'some-model',
    });
  });

  it('returns null when NO_WINNER is declared', () => {
    expect(extractWinner('WINNER: NO_WINNER')).toBeNull();
    expect(extractWinner('winner: no_winner')).toBeNull();
  });

  it('returns null when no WINNER line is present', () => {
    expect(extractWinner('Just some analysis text with no verdict.')).toBeNull();
    expect(extractWinner('')).toBeNull();
  });

  it('returns null when the WINNER line has no slash separator', () => {
    expect(extractWinner('WINNER: justidentity')).toBeNull();
  });

  it('returns null when the WINNER line is empty after the colon', () => {
    expect(extractWinner('WINNER:')).toBeNull();
    expect(extractWinner('WINNER: ')).toBeNull();
  });

  it('handles leading and trailing whitespace around the slash parts', () => {
    const result = extractWinner('WINNER: claude / opus-4-5 ');
    expect(result).toEqual({ identity: 'claude', model: 'opus-4-5' });
  });

  it('picks the first WINNER line when multiple are present', () => {
    const output = 'WINNER: claude/opus-4-5\nWINNER: opencode/other-model';
    expect(extractWinner(output)).toEqual({ identity: 'claude', model: 'opus-4-5' });
  });

  it('handles model names that contain slashes by splitting at the first slash only', () => {
    // edge case: model name with a slash — should still split at first slash
    // identity = 'native', model = 'llama-swap/qwen3.6'
    const result = extractWinner('WINNER: native/llama-swap/qwen3.6');
    expect(result).toEqual({ identity: 'native', model: 'llama-swap/qwen3.6' });
  });
});
|
||||
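The two `describe` blocks above pin down the observable behavior of `shouldNameWinner` and `extractWinner`, but the implementation in `arena-analyzer-helpers.ts` is not shown in this part of the diff. The sketch below is one implementation that would satisfy every case listed above. It is an assumption about how the helpers might be written, not the shipped code, and the `Winner` interface name is hypothetical.

```typescript
interface Winner {
  identity: string;
  model: string;
}

// A winner is only meaningful when at least two contestants succeeded.
function shouldNameWinner(succeededCount: number): boolean {
  return succeededCount >= 2;
}

// Parse the first line of the form "WINNER: identity/model" (keyword is
// case-insensitive). Returns null for NO_WINNER, a missing line, an empty
// verdict, or a verdict without a slash.
function extractWinner(output: string): Winner | null {
  // [ \t]* (not \s*) keeps the match on a single line.
  const match = /^[ \t]*winner:[ \t]*(.*)$/im.exec(output);
  if (!match) return null;

  const verdict = (match[1] ?? '').trim();
  if (verdict === '' || verdict.toUpperCase() === 'NO_WINNER') return null;

  // Split at the FIRST slash only, so model names may contain slashes.
  const slash = verdict.indexOf('/');
  if (slash === -1) return null;

  const identity = verdict.slice(0, slash).trim();
  const model = verdict.slice(slash + 1).trim();
  if (identity === '' || model === '') return null;

  return { identity, model };
}
```

One behavior the tests leave open is what happens when the first `WINNER:` line is malformed and a later one is valid. This sketch returns `null` in that situation rather than scanning further.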
|
||||
// ─── buildDigestPrompt ────────────────────────────────────────────────────────

describe('buildDigestPrompt', () => {
  const base: ContestantDigestInput = {
    identity: 'claude',
    model: 'opus-4-5',
    resultMd: '# Output\n\nSome result content.',
    benchmarkLine: '12000ms',
  };

  it('returns an object with non-empty system and user strings', () => {
    const { system, user } = buildDigestPrompt(base);
    expect(system.length).toBeGreaterThan(0);
    expect(user.length).toBeGreaterThan(0);
  });

  it('includes the contestant identity and model in the user prompt', () => {
    const { user } = buildDigestPrompt(base);
    expect(user).toContain('claude');
    expect(user).toContain('opus-4-5');
  });

  it('includes the benchmark line in the user prompt', () => {
    const { user } = buildDigestPrompt(base);
    expect(user).toContain('12000ms');
  });

  it('includes the result.md content in the user prompt', () => {
    const { user } = buildDigestPrompt(base);
    expect(user).toContain('Some result content.');
  });

  it('includes the diff.patch when provided', () => {
    const input: ContestantDigestInput = { ...base, diffPatch: '--- a/foo.ts\n+++ b/foo.ts\n+added' };
    const { user } = buildDigestPrompt(input);
    expect(user).toContain('added');
    expect(user).toContain('```diff');
  });

  it('omits the diff section when diffPatch is undefined', () => {
    const { user } = buildDigestPrompt(base);
    expect(user).not.toContain('```diff');
  });

  it('truncates resultMd longer than 8000 characters', () => {
    const longResult = 'x'.repeat(10_000);
    const { user } = buildDigestPrompt({ ...base, resultMd: longResult });
    // The truncated content must not exceed 8000 chars in the sliced section.
    // We just check the total user string doesn't balloon unreasonably.
    expect(user.length).toBeLessThan(15_000);
  });
|
||||
|
||||
it('truncates diffPatch longer than 5000 characters', () => {
|
||||
const longDiff = '+' + 'x'.repeat(10_000);
|
||||
const { user } = buildDigestPrompt({ ...base, diffPatch: longDiff });
|
||||
expect(user.length).toBeLessThan(16_000);
|
||||
});
|
||||
});
|
||||
|
||||
// ─── buildJudgePrompt ─────────────────────────────────────────────────────────

describe('buildJudgePrompt', () => {
  const digests: ContestantDigest[] = [
    { identity: 'claude', model: 'opus-4-5', digest: 'Good result.', benchmarkLine: '5000ms' },
    { identity: 'opencode', model: 'qwen3.6', digest: 'Decent result.', benchmarkLine: '8000ms' },
  ];

  it('includes the original prompt in the user section', () => {
    const { user } = buildJudgePrompt('Write a sorting algorithm', digests);
    expect(user).toContain('Write a sorting algorithm');
  });

  it('includes each contestant heading in the user section', () => {
    const { user } = buildJudgePrompt('prompt', digests);
    expect(user).toContain('claude');
    expect(user).toContain('opus-4-5');
    expect(user).toContain('opencode');
    expect(user).toContain('qwen3.6');
  });

  it('includes each contestant digest text', () => {
    const { user } = buildJudgePrompt('prompt', digests);
    expect(user).toContain('Good result.');
    expect(user).toContain('Decent result.');
  });

  it('instructs the model to name a WINNER when 2+ digests are provided', () => {
    const { system } = buildJudgePrompt('prompt', digests);
    expect(system).toContain('WINNER:');
  });

  it('instructs the model NOT to name a winner when fewer than 2 digests are provided', () => {
    const oneDigest = digests.slice(0, 1);
    const { system } = buildJudgePrompt('prompt', oneDigest);
    expect(system).toContain('NO_WINNER');
    expect(system).not.toContain('WINNER: <identity>');
  });

  it('instructs NO_WINNER when digests list is empty', () => {
    const { system } = buildJudgePrompt('prompt', []);
    expect(system).toContain('NO_WINNER');
  });

  it('truncates originalPrompt longer than 2000 characters', () => {
    const longPrompt = 'p'.repeat(5_000);
    const { user } = buildJudgePrompt(longPrompt, digests);
    // Should not contain more than 2000 chars of the prompt.
    const promptSection = user.split('# Contestant Digests')[0] ?? '';
    expect(promptSection.length).toBeLessThan(3_000);
  });
});

// ─── buildCrossExamPrompt ─────────────────────────────────────────────────────

describe('buildCrossExamPrompt', () => {
  const digests: ContestantDigest[] = [
    { identity: 'claude', model: 'opus-4-5', digest: 'Strong result.', benchmarkLine: '5000ms' },
    { identity: 'boocode', model: 'qwen3.6-35b', digest: 'Decent result.', benchmarkLine: '12000ms' },
  ];

  const baseOpts = {
    originalPrompt: 'Write a sorting algorithm.',
    digests,
    analysisContent: '# Arena Analysis\n\nClaude did better.\n\nWINNER: claude/opus-4-5',
    proposedWinner: 'claude/opus-4-5',
    examinerIdentity: 'goose',
    examinerModel: 'gpt-4o',
  };

  it('includes the examiner identity and model in the system prompt', () => {
    const { system } = buildCrossExamPrompt(baseOpts);
    expect(system).toContain('goose');
    expect(system).toContain('gpt-4o');
  });

  it('includes the original prompt in the user section', () => {
    const { user } = buildCrossExamPrompt(baseOpts);
    expect(user).toContain('Write a sorting algorithm.');
  });

  it('includes each contestant digest', () => {
    const { user } = buildCrossExamPrompt(baseOpts);
    expect(user).toContain('Strong result.');
    expect(user).toContain('Decent result.');
  });

  it('includes the proposed analysis content', () => {
    const { user } = buildCrossExamPrompt(baseOpts);
    expect(user).toContain('Claude did better.');
  });

  it('includes the proposed winner when set', () => {
    const { user } = buildCrossExamPrompt(baseOpts);
    expect(user).toContain('claude/opus-4-5');
  });

  it('notes that no winner was proposed when proposedWinner is null', () => {
    const { user } = buildCrossExamPrompt({ ...baseOpts, proposedWinner: null });
    expect(user).toContain('No winner was proposed');
  });

  it('instructs the examiner to provide a VERDICT line', () => {
    const { system } = buildCrossExamPrompt(baseOpts);
    expect(system).toContain('VERDICT:');
  });
});
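The `extractWinner` cases above pin down how the judge's verdict line is parsed. The sketch below is one way to satisfy them, not the code in `arena-analyzer-helpers.ts`: `extractWinnerSketch` and the `Winner` type are hypothetical names, and it assumes the verdict starts its own line, which the tests do not settle.

```typescript
// Hypothetical shape of the parsed verdict (matches the objects the tests expect).
interface Winner {
  identity: string;
  model: string;
}

// Sketch only: scans line by line and decides on the FIRST "WINNER:" line found.
function extractWinnerSketch(output: string): Winner | null {
  for (const line of output.split(/\r?\n/)) {
    // Case-insensitive match; assumes the verdict begins the line.
    const match = /^\s*winner:(.*)$/i.exec(line);
    if (!match) continue;

    const value = (match[1] ?? '').trim();
    // Empty verdict or an explicit NO_WINNER both mean "no winner".
    if (value === '' || value.toUpperCase() === 'NO_WINNER') return null;

    // Split at the first slash only, so model names may contain slashes.
    const slash = value.indexOf('/');
    if (slash === -1) return null;

    const identity = value.slice(0, slash).trim();
    const model = value.slice(slash + 1).trim();
    return identity !== '' && model !== '' ? { identity, model } : null;
  }
  return null;
}
```

Splitting at the first slash keeps a model such as `llama-swap/qwen3.6` intact, which is the edge case the last test in the block covers.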
332
apps/coder/src/services/__tests__/arena-decisions.test.ts
Normal file
@@ -0,0 +1,332 @@
import { describe, it, expect } from 'vitest';
import {
  classifyLane,
  nextLocalContestant,
  isBattleComplete,
  computeBenchmark,
  sanitizeSlug,
  buildBattleSlug,
  buildContestantDir,
  reconcileContestantResume,
  reconcileContestants,
  type ContestantSlot,
} from '../arena-decisions.js';

// Local models = what the llama-swap server actually serves.
const LOCAL_MODELS: ReadonlySet<string> = new Set([
  'qwen3.6-35b-a3b-mxfp4',
  'qwen2.5-coder-7b',
]);

// ─── classifyLane ────────────────────────────────────────────────────────────

describe('classifyLane', () => {
  it('classifies qa battles as local regardless of identity or model', () => {
    expect(classifyLane('qa', 'boocode', 'qwen3.6-35b-a3b-mxfp4', LOCAL_MODELS)).toBe('local');
    expect(classifyLane('qa', 'claude', 'claude-opus-4-5', LOCAL_MODELS)).toBe('local');
    expect(classifyLane('qa', 'Debugger', 'cloud-model', new Set())).toBe('local');
    expect(classifyLane('qa', 'opencode', 'any-model', LOCAL_MODELS)).toBe('local');
  });

  it('classifies coding contestants as local when model is in localModels', () => {
    expect(classifyLane('coding', 'boocode', 'qwen3.6-35b-a3b-mxfp4', LOCAL_MODELS)).toBe('local');
    expect(classifyLane('coding', 'opencode', 'qwen3.6-35b-a3b-mxfp4', LOCAL_MODELS)).toBe('local');
    expect(classifyLane('coding', 'qwen', 'qwen2.5-coder-7b', LOCAL_MODELS)).toBe('local');
  });

  it('classifies coding contestants as cloud when model is not in localModels', () => {
    expect(classifyLane('coding', 'claude', 'claude-opus-4-5', LOCAL_MODELS)).toBe('cloud');
    expect(classifyLane('coding', 'opencode', 'claude-opus-4-5', LOCAL_MODELS)).toBe('cloud');
    expect(classifyLane('coding', 'goose', 'gpt-4o', LOCAL_MODELS)).toBe('cloud');
    expect(classifyLane('coding', 'qwen', 'unknown-remote-model', LOCAL_MODELS)).toBe('cloud');
  });

  it('uses the injected localModels set, not a hardcoded list', () => {
    const custom = new Set(['my-local-model']);
    expect(classifyLane('coding', 'any-agent', 'my-local-model', custom)).toBe('local');
    expect(classifyLane('coding', 'boocode', 'other-model', custom)).toBe('cloud');
  });

  it('defaults to cloud for an empty localModels set', () => {
    expect(classifyLane('coding', 'boocode', 'qwen3.6-35b-a3b-mxfp4', new Set())).toBe('cloud');
    expect(classifyLane('coding', 'native', 'any-local-model', new Set())).toBe('cloud');
  });
});

// ─── nextLocalContestant ─────────────────────────────────────────────────────

describe('nextLocalContestant', () => {
  it('returns null for an empty list', () => {
    expect(nextLocalContestant([])).toBeNull();
  });

  it('returns null when no local contestants are queued', () => {
    const slots: ContestantSlot[] = [
      { id: 'c1', lane: 'local', status: 'running' },
      { id: 'c2', lane: 'cloud', status: 'queued' },
    ];
    expect(nextLocalContestant(slots)).toBeNull();
  });

  it('returns the first queued local contestant in order', () => {
    const slots: ContestantSlot[] = [
      { id: 'c1', lane: 'local', status: 'done' },
      { id: 'c2', lane: 'local', status: 'queued' },
      { id: 'c3', lane: 'local', status: 'queued' },
    ];
    expect(nextLocalContestant(slots)).toBe('c2');
  });

  it('skips done/error local contestants and cloud contestants', () => {
    const slots: ContestantSlot[] = [
      { id: 'c1', lane: 'cloud', status: 'queued' },
      { id: 'c2', lane: 'local', status: 'error' },
      { id: 'c3', lane: 'local', status: 'queued' },
    ];
    expect(nextLocalContestant(slots)).toBe('c3');
  });

  it('returns null when all local contestants are done or error', () => {
    const slots: ContestantSlot[] = [
      { id: 'c1', lane: 'local', status: 'done' },
      { id: 'c2', lane: 'local', status: 'error' },
    ];
    expect(nextLocalContestant(slots)).toBeNull();
  });
});

// ─── isBattleComplete ────────────────────────────────────────────────────────

describe('isBattleComplete', () => {
  it('returns false for an empty list', () => {
    expect(isBattleComplete([])).toBe(false);
  });

  it('returns true when all contestants are done', () => {
    expect(isBattleComplete([{ status: 'done' }, { status: 'done' }])).toBe(true);
  });

  it('returns true when all contestants are error', () => {
    expect(isBattleComplete([{ status: 'error' }, { status: 'error' }])).toBe(true);
  });

  it('returns true for a mixed done/error result', () => {
    expect(isBattleComplete([{ status: 'done' }, { status: 'error' }, { status: 'done' }])).toBe(true);
  });

  it('returns false while any contestant is still running', () => {
    expect(isBattleComplete([{ status: 'done' }, { status: 'running' }])).toBe(false);
  });

  it('returns false while any contestant is still queued', () => {
    expect(isBattleComplete([{ status: 'done' }, { status: 'queued' }])).toBe(false);
  });
});

// ─── computeBenchmark ────────────────────────────────────────────────────────

describe('computeBenchmark', () => {
  const t0 = new Date('2026-06-06T10:00:00.000Z');
  const t1 = new Date('2026-06-06T10:00:05.000Z'); // +5 000ms

  it('computes duration in ms for both lanes', () => {
    const local = computeBenchmark(t0, t1, 100, 'local');
    expect(local.durationMs).toBe(5000);
    const cloud = computeBenchmark(t0, t1, null, 'cloud');
    expect(cloud.durationMs).toBe(5000);
  });

  it('computes tokens/sec for local lane when costTokens is known', () => {
    const bench = computeBenchmark(t0, t1, 500, 'local');
    expect(bench.tokensPerSec).toBeCloseTo(100, 5); // 500 / 5 = 100 tok/s
  });

  it('omits tokens/sec for cloud lane regardless of costTokens', () => {
    const bench = computeBenchmark(t0, t1, 500, 'cloud');
    expect(bench.tokensPerSec).toBeNull();
  });

  it('omits tokens/sec for local lane when costTokens is null', () => {
    const bench = computeBenchmark(t0, t1, null, 'local');
    expect(bench.tokensPerSec).toBeNull();
  });

  it('returns durationMs = 0 and null tokensPerSec when timestamps are equal', () => {
    const bench = computeBenchmark(t0, t0, 100, 'local');
    expect(bench.durationMs).toBe(0);
    expect(bench.tokensPerSec).toBeNull();
  });

  it('clamps negative duration to 0 (clock skew)', () => {
    const bench = computeBenchmark(t1, t0, 50, 'local');
    expect(bench.durationMs).toBe(0);
    expect(bench.tokensPerSec).toBeNull();
  });
});

// ─── sanitizeSlug ────────────────────────────────────────────────────────────

describe('sanitizeSlug', () => {
  it('lowercases and preserves alphanumeric + hyphens', () => {
    expect(sanitizeSlug('claude')).toBe('claude');
    expect(sanitizeSlug('claude-opus-4-5')).toBe('claude-opus-4-5');
  });

  it('replaces spaces and special characters with hyphens', () => {
    expect(sanitizeSlug('Code Reviewer')).toBe('code-reviewer');
    expect(sanitizeSlug('native/boocode')).toBe('native-boocode');
    expect(sanitizeSlug('qwen2.5-coder-35b')).toBe('qwen2-5-coder-35b');
  });

  it('collapses consecutive non-alphanumeric runs to a single hyphen', () => {
    expect(sanitizeSlug('foo bar---baz')).toBe('foo-bar-baz');
  });

  it('strips leading and trailing hyphens', () => {
    expect(sanitizeSlug('---foo---')).toBe('foo');
  });

  it('truncates to 64 characters', () => {
    const long = 'a'.repeat(100);
    expect(sanitizeSlug(long).length).toBe(64);
  });
});

// ─── buildBattleSlug ─────────────────────────────────────────────────────────

describe('buildBattleSlug', () => {
  it('builds a deterministic dated slug from id, type, and createdAt', () => {
    const id = 'a1b2c3d4-e5f6-7890-abcd-ef1234567890';
    const createdAt = new Date('2026-06-06T12:00:00.000Z');
    const slug = buildBattleSlug(id, 'coding', createdAt);
    expect(slug).toBe('2026-06-06-coding-a1b2c3d4');
  });

  it('includes the battle type in the slug', () => {
    const id = 'aaaaaaaa-0000-0000-0000-000000000000';
    const createdAt = new Date('2026-01-01T00:00:00.000Z');
    expect(buildBattleSlug(id, 'qa', createdAt)).toContain('-qa-');
    expect(buildBattleSlug(id, 'coding', createdAt)).toContain('-coding-');
  });

  it('uses the first 8 hex chars of the uuid (dashes stripped)', () => {
    const id = 'deadbeef-0000-0000-0000-000000000000';
    const slug = buildBattleSlug(id, 'coding', new Date('2026-06-06T00:00:00Z'));
    expect(slug.endsWith('-deadbeef')).toBe(true);
  });
});

// ─── buildContestantDir ──────────────────────────────────────────────────────

describe('buildContestantDir', () => {
  it('joins sanitized identity and model with a hyphen', () => {
    expect(buildContestantDir('claude', 'claude-opus-4-5')).toBe('claude-claude-opus-4-5');
  });

  it('sanitizes both parts independently', () => {
    expect(buildContestantDir('Code Reviewer', 'qwen2.5-35b')).toBe('code-reviewer-qwen2-5-35b');
  });
});

// ─── reconcileContestantResume ───────────────────────────────────────────────

describe('reconcileContestantResume', () => {
  it('keeps non-running contestants regardless of task state', () => {
    for (const status of ['queued', 'done', 'error']) {
      expect(reconcileContestantResume(status, 'tid', 'completed')).toBe('keep');
      expect(reconcileContestantResume(status, null, null)).toBe('keep');
    }
  });

  it('re-dispatches a running contestant with no task_id', () => {
    expect(reconcileContestantResume('running', null, null)).toBe('re-dispatch');
  });

  it('re-dispatches a running contestant whose task row is absent', () => {
    expect(reconcileContestantResume('running', 'tid', null)).toBe('re-dispatch');
  });

  it('marks done when the task completed before the terminal callback ran', () => {
    expect(reconcileContestantResume('running', 'tid', 'completed')).toBe('mark-done');
  });

  it('marks error when the task failed', () => {
    expect(reconcileContestantResume('running', 'tid', 'failed')).toBe('mark-error');
  });

  it('marks cancelled when the task was cancelled', () => {
    expect(reconcileContestantResume('running', 'tid', 'cancelled')).toBe('mark-cancelled');
  });

  it('keeps a running contestant whose task is pending (dispatcher handles it)', () => {
    expect(reconcileContestantResume('running', 'tid', 'pending')).toBe('keep');
  });

  it('re-dispatches when the task is stuck running (process died)', () => {
    expect(reconcileContestantResume('running', 'tid', 'running')).toBe('re-dispatch');
  });

  it('re-dispatches when the task is blocked (permission dialog gone on restart)', () => {
    expect(reconcileContestantResume('running', 'tid', 'blocked')).toBe('re-dispatch');
  });
});

// ─── reconcileContestants ────────────────────────────────────────────────────

describe('reconcileContestants', () => {
  it('returns one decision per contestant', () => {
    const contestants = [
      { contestantId: 'c1', taskId: null, status: 'done' },
      { contestantId: 'c2', taskId: 't1', status: 'running' },
      { contestantId: 'c3', taskId: 't2', status: 'running' },
    ];
    const taskStates = new Map([['t1', 'completed'], ['t2', 'running']]);
    const decisions = reconcileContestants(contestants, taskStates);
    expect(decisions).toHaveLength(3);
    expect(decisions[0]).toEqual({ contestantId: 'c1', action: 'keep' });
    expect(decisions[1]).toEqual({ contestantId: 'c2', action: 'mark-done' });
    expect(decisions[2]).toEqual({ contestantId: 'c3', action: 're-dispatch' });
  });

  it('re-dispatches a running contestant whose taskId is absent from taskStates', () => {
    const contestants = [{ contestantId: 'c1', taskId: 'orphan', status: 'running' }];
    const decisions = reconcileContestants(contestants, new Map());
    expect(decisions[0]?.action).toBe('re-dispatch');
  });

  it('re-dispatches a running contestant with null taskId', () => {
    const contestants = [{ contestantId: 'c1', taskId: null, status: 'running' }];
    const decisions = reconcileContestants(contestants, new Map());
    expect(decisions[0]?.action).toBe('re-dispatch');
  });

  it('returns empty array for no contestants', () => {
    expect(reconcileContestants([], new Map())).toEqual([]);
  });

  it('keeps a running contestant whose task is pending', () => {
    const contestants = [{ contestantId: 'c1', taskId: 't1', status: 'running' }];
    const taskStates = new Map([['t1', 'pending']]);
    const decisions = reconcileContestants(contestants, taskStates);
    expect(decisions[0]?.action).toBe('keep');
  });

  it('handles a mixed battle: done/queued kept, stale running re-dispatched', () => {
    const contestants = [
      { contestantId: 'c1', taskId: 't1', status: 'done' },
      { contestantId: 'c2', taskId: null, status: 'queued' },
      { contestantId: 'c3', taskId: 't2', status: 'running' },
      { contestantId: 'c4', taskId: 't3', status: 'running' },
    ];
    const taskStates = new Map([
      ['t1', 'completed'],
      ['t2', 'running'], // stuck — process dead
      ['t3', 'pending'], // dispatcher will handle
    ]);
    const decisions = reconcileContestants(contestants, taskStates);
    expect(decisions.find((d) => d.contestantId === 'c1')?.action).toBe('keep');
    expect(decisions.find((d) => d.contestantId === 'c2')?.action).toBe('keep');
    expect(decisions.find((d) => d.contestantId === 'c3')?.action).toBe('re-dispatch');
    expect(decisions.find((d) => d.contestantId === 'c4')?.action).toBe('keep');
  });
});
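The `arena-decisions` tests above describe three pure decisions: lane classification, benchmark computation, and resume reconciliation after a restart. The sketch below mirrors the tested behaviour; it is not the shipped `arena-decisions.ts`. The `*Sketch` names and the type aliases are hypothetical, and treating any unrecognised task state as `re-dispatch` is an assumption, since the tests only cover `running` and `blocked`.

```typescript
type Lane = 'local' | 'cloud';
type BattleType = 'coding' | 'qa';
type ResumeAction = 'keep' | 're-dispatch' | 'mark-done' | 'mark-error' | 'mark-cancelled';

// Q&A battles always run in the local lane; coding contestants are local only
// when their model is in the injected set of locally served models.
function classifyLaneSketch(
  type: BattleType,
  _identity: string, // unused here: the tests show identity does not affect the lane
  model: string,
  localModels: ReadonlySet<string>,
): Lane {
  if (type === 'qa') return 'local';
  return localModels.has(model) ? 'local' : 'cloud';
}

// Duration is clamped at 0 to absorb clock skew; tokens/sec is reported only
// for local contestants with a known token count and a positive duration.
function computeBenchmarkSketch(
  startedAt: Date,
  finishedAt: Date,
  costTokens: number | null,
  lane: Lane,
): { durationMs: number; tokensPerSec: number | null } {
  const durationMs = Math.max(0, finishedAt.getTime() - startedAt.getTime());
  const tokensPerSec =
    lane === 'local' && costTokens !== null && durationMs > 0
      ? costTokens / (durationMs / 1000)
      : null;
  return { durationMs, tokensPerSec };
}

// Decide what to do with one contestant row when the server comes back up.
function reconcileResumeSketch(
  status: string,
  taskId: string | null,
  taskState: string | null,
): ResumeAction {
  if (status !== 'running') return 'keep'; // queued/done/error rows are untouched
  if (taskId === null || taskState === null) return 're-dispatch'; // no task to wait on
  switch (taskState) {
    case 'completed':
      return 'mark-done';
    case 'failed':
      return 'mark-error';
    case 'cancelled':
      return 'mark-cancelled';
    case 'pending':
      return 'keep'; // the dispatcher will still pick it up
    default:
      return 're-dispatch'; // running/blocked after a restart (assumed for unknown states too)
  }
}
```

Keeping these as pure functions over plain values is what lets the tests run without a database or a live llama-swap server.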
@@ -161,6 +161,52 @@ describe('locateMatch — strategy 4: Levenshtein', () => {
  });
});

describe('locateMatch — strategy 4: fail-closed on ambiguity (corruption guard)', () => {
  it('refuses (ambiguous) when two equally-similar anchored blocks both clear the bar', () => {
    // The repetitive-file case that duplicated blocks: two blocks share the same
    // first+last anchor lines and their middle lines are EQUALLY similar to the
    // (drifted) needle. Tier 4 must refuse rather than splice over one of them.
    const content = [
      'const x = {',
      ' total = aa;',
      '};',
      'const x = {',
      ' total = bb;',
      '};',
    ].join('\n');
    const needle = ['const x = {', ' total = ab;', '};'].join('\n');
    const result = locateMatch(content, needle);
    expect(result.kind).toBe('ambiguous');
  });

  it('refuses a below-threshold near-miss that the old 0.66 floor would have spliced', () => {
    // ~0.7 similar: under the raised 0.85 floor this is now not_found, so the
    // caller surfaces a correctable error instead of corrupting the file.
    const content = 'const grandTotalAmount = a + b;\n';
    const needle = 'const totalValue = a + b;';
    const result = locateMatch(content, needle);
    expect(result).toEqual({ kind: 'not_found' });
  });

  it('still matches a single genuine high-similarity drift uniquely', () => {
    const content = 'const total = sum + tax;\n';
    const needle = 'const totals = sum + tax;'; // one-char typo, ~0.96
    const result = locateMatch(content, needle);
    expect(result.kind).toBe('fuzzy');
    const { start, end } = span(result);
    expect(content.slice(start, end)).toBe('const total = sum + tax;');
  });

  it('requires an exact first+last line anchor for multi-line needles', () => {
    // First line drifted too far to anchor → no window is scored → not_found,
    // even though the middle lines are identical.
    const content = ['function compute() {', ' return a + b;', ' return done;', '}'].join('\n');
    const needle = ['totally different opener', ' return a + b;', '}'].join('\n');
    const result = locateMatch(content, needle);
    expect(result).toEqual({ kind: 'not_found' });
  });
});

describe('locateMatch — edge cases', () => {
  it('returns not_found for an empty needle', () => {
    expect(locateMatch('anything', '')).toEqual({ kind: 'not_found' });
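The fail-closed rule these tests describe (a 0.85 similarity floor, and refusal when candidates tie) can be illustrated with a much smaller single-line matcher. This is a sketch under stated assumptions, not `locateMatch` itself: it scores whole lines only, ignores the multi-line first+last anchor logic, and treats "ambiguous" as "more than one line shares the best score". `fuzzyLineMatch`, `similarity`, and `LineMatch` are hypothetical names.

```typescript
type LineMatch =
  | { kind: 'fuzzy'; line: number; score: number }
  | { kind: 'ambiguous' }
  | { kind: 'not_found' };

// Classic two-row Levenshtein edit distance.
function levenshtein(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j]! + 1, curr[j - 1]! + 1, prev[j - 1]! + cost);
    }
    prev = curr;
  }
  return prev[b.length]!;
}

// Normalised similarity in [0, 1]: 1 means identical strings.
function similarity(a: string, b: string): number {
  const maxLen = Math.max(a.length, b.length);
  return maxLen === 0 ? 1 : 1 - levenshtein(a, b) / maxLen;
}

// Returns a match only when exactly one line is the best candidate above the floor.
function fuzzyLineMatch(content: string, needle: string, floor = 0.85): LineMatch {
  const candidates = content
    .split('\n')
    .map((text, line) => ({ line, score: similarity(text, needle) }))
    .filter((c) => c.score >= floor);
  if (candidates.length === 0) return { kind: 'not_found' };

  const best = Math.max(...candidates.map((c) => c.score));
  const top = candidates.filter((c) => c.score === best);
  // Fail closed: a tie means we cannot tell which block the edit was meant for.
  if (top.length > 1) return { kind: 'ambiguous' };
  return { kind: 'fuzzy', line: top[0]!.line, score: best };
}
```

Returning `ambiguous` or `not_found` instead of guessing pushes the problem back to the caller as a correctable error, which is the trade-off the test comments argue for.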
@@ -83,6 +83,53 @@ describe.runIf(!!process.env.DATABASE_URL)('pending_changes integration', () =>
    expect(existsSync(resolve(testDir, 'deleteme.txt'))).toBe(false);
  });

  it('re-emitted identical edits dedupe at queue and never duplicate on apply', async () => {
    // Regression: the 2-3x block-stamping corruption. An anchored insert queued
    // three times (a local model re-emitting the same tool call) must collapse to
    // ONE pending row and apply exactly once.
    await queueCreate(sql, testSessionId, null, 'dup.js', '<script>\nrender();\n', projectRoot)
      .then((c) => applyOne(sql, c.id, projectRoot));

    const oldStr = '<script>';
    const newStr = '<script>\nconst recordFormats = ["gif"];';
    const a = await queueEdit(sql, testSessionId, null, 'dup.js', oldStr, newStr, projectRoot);
    const b = await queueEdit(sql, testSessionId, null, 'dup.js', oldStr, newStr, projectRoot);
    const c = await queueEdit(sql, testSessionId, null, 'dup.js', oldStr, newStr, projectRoot);
    // All three calls return the SAME pending row (deduped).
    expect(b.id).toBe(a.id);
    expect(c.id).toBe(a.id);

    await applyOne(sql, a.id, projectRoot);
    let content = await readFile(resolve(testDir, 'dup.js'), 'utf8');
    expect((content.match(/const recordFormats/g) || []).length).toBe(1);

    // Even a fresh, separately-queued identical edit re-applied is a no-op, not a stamp.
    const again = await queueEdit(sql, testSessionId, null, 'dup.js', oldStr, newStr, projectRoot);
    const res = await applyOne(sql, again.id, projectRoot);
    expect(res.success).toBe(true);
    content = await readFile(resolve(testDir, 'dup.js'), 'utf8');
    expect((content.match(/const recordFormats/g) || []).length).toBe(1);
  });

  it('preserves CRLF line endings on edit', async () => {
    await queueCreate(sql, testSessionId, null, 'crlf.txt', 'line one\r\nline two\r\nline three\r\n', projectRoot)
      .then((c) => applyOne(sql, c.id, projectRoot));
    const edit = await queueEdit(sql, testSessionId, null, 'crlf.txt', 'line two', 'line TWO', projectRoot);
    const res = await applyOne(sql, edit.id, projectRoot);
    expect(res.success).toBe(true);
    const content = await readFile(resolve(testDir, 'crlf.txt'), 'utf8');
    expect(content).toBe('line one\r\nline TWO\r\nline three\r\n');
  });

  it('refuses an edit that matches multiple locations instead of corrupting', async () => {
    await queueCreate(sql, testSessionId, null, 'ambig.js', 'x=1;\ny=2;\nx=1;\n', projectRoot)
      .then((ch) => applyOne(sql, ch.id, projectRoot));
    const edit = await queueEdit(sql, testSessionId, null, 'ambig.js', 'x=1;', 'x=9;', projectRoot);
    const res = await applyOne(sql, edit.id, projectRoot);
    expect(res.success).toBe(false);
    expect(res.error).toMatch(/matches 2 locations/);
  });

  it('rewindOne → verify reverted', async () => {
    // Setup: create and apply a file
    const createChange = await queueCreate(sql, testSessionId, null, 'rewindable.txt', 'initial', projectRoot);
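The first regression test in this hunk relies on identical pending edits collapsing to one row at queue time. A minimal in-memory sketch of that dedupe rule follows. It is an assumption-laden illustration: the real `queueEdit` takes a SQL handle and a project root, and `PendingQueueSketch`, `PendingEditRow`, and `markApplied` are hypothetical names. It also assumes that only rows still in the `pending` state take part in deduplication.

```typescript
// Hypothetical row shape; the real pending_changes table is not shown in this diff.
interface PendingEditRow {
  id: number;
  sessionId: string;
  path: string;
  oldStr: string;
  newStr: string;
  status: 'pending' | 'applied';
}

class PendingQueueSketch {
  private rows: PendingEditRow[] = [];
  private nextId = 1;

  // Returns the existing pending row when an identical edit is queued again,
  // so a model re-emitting the same tool call never creates a second row.
  queueEdit(sessionId: string, path: string, oldStr: string, newStr: string): PendingEditRow {
    const existing = this.rows.find(
      (r) =>
        r.status === 'pending' &&
        r.sessionId === sessionId &&
        r.path === path &&
        r.oldStr === oldStr &&
        r.newStr === newStr,
    );
    if (existing) return existing;

    const row: PendingEditRow = { id: this.nextId++, sessionId, path, oldStr, newStr, status: 'pending' };
    this.rows.push(row);
    return row;
  }

  // Marks a row as applied; later identical edits then get a fresh row.
  markApplied(id: number): void {
    const row = this.rows.find((r) => r.id === id);
    if (row) row.status = 'applied';
  }
}
```

Queue-time dedupe alone does not cover an identical edit queued after the first one was applied, which is why the test also checks that applying such an edit is a no-op.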
69
apps/coder/src/services/__tests__/plan-edit.test.ts
Normal file
@@ -0,0 +1,69 @@
import { describe, it, expect } from 'vitest';
import { planEdit } from '../pending_changes.js';

// planEdit is the pure core of applyOne's edit splice. These tests pin the
// idempotency guards that stop the "block stamped 2-3x" corruption: applying the
// same queued edit more than once must be a no-op, never a duplicate.

describe('planEdit — normal application', () => {
  it('applies a unique exact edit', () => {
    const content = 'a\nfoo\nb\n';
    const plan = planEdit(content, 'foo', 'bar');
    expect(plan).toEqual({ kind: 'apply', updated: 'a\nbar\nb\n' });
  });

  it('reports ambiguous when old_string occurs more than once', () => {
    const content = 'foo\nx\nfoo\n';
    const plan = planEdit(content, 'foo', 'bar');
    expect(plan).toEqual({ kind: 'ambiguous', count: 2 });
  });

  it('reports not_found when old_string is absent and new is not present', () => {
    const content = 'alpha\nbeta\n';
    const plan = planEdit(content, 'gamma that is clearly nowhere', 'delta');
    expect(plan).toEqual({ kind: 'not_found' });
  });
});

describe('planEdit — idempotency (the corruption guard)', () => {
  it('treats a re-applied anchored insert as already-applied (no duplicate)', () => {
    // The exact mechanism that tripled `const recordFormats` in settings.html:
    // an anchored insert (old=anchor, new=anchor+block) where the anchor still
    // matches uniquely after the first apply.
    const oldStr = '<script>';
    const newStr = '<script>\nconst recordFormats = ["gif","mp4"];';
    const before = '<script>\nfunction render() {}\n</script>\n';

    const first = planEdit(before, oldStr, newStr);
    expect(first.kind).toBe('apply');
    const after = first.kind === 'apply' ? first.updated : '';
    expect((after.match(/const recordFormats/g) || []).length).toBe(1);

    // Re-applying the identical edit to the already-edited content is a no-op.
    const second = planEdit(after, oldStr, newStr);
    expect(second).toEqual({ kind: 'noop', reason: 'already-applied' });
  });

  it('treats an edit whose old_string is gone but new_string is present as already-applied', () => {
    const content = 'const total = sum + tax;\n';
    const plan = planEdit(content, 'const subtotal = sum;', 'const total = sum + tax;');
    expect(plan).toEqual({ kind: 'noop', reason: 'already-applied' });
  });

  it('treats a no-change splice as a noop', () => {
    const content = 'a\nfoo\nb\n';
    const plan = planEdit(content, 'foo', 'foo');
    expect(plan).toEqual({ kind: 'noop', reason: 'identical' });
  });

  it('does not duplicate across three repeated applications', () => {
    const oldStr = 'function f() {';
    const newStr = 'function f() {\n const x = 1;';
    let content = 'function f() {\n return x;\n}\n';
    for (let i = 0; i < 3; i++) {
      const plan = planEdit(content, oldStr, newStr);
      if (plan.kind === 'apply') content = plan.updated;
    }
    expect((content.match(/const x = 1;/g) || []).length).toBe(1);
  });
});
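The idempotency guards these `planEdit` tests pin can be expressed with exact string matching alone. The sketch below reproduces the tested outcomes, but it is not the function exported from `pending_changes.ts`, which may also go through the fuzzy `locateMatch` tiers. `planEditSketch`, `countOccurrences`, and the `EditPlan` union are hypothetical names inferred from the expected values in the tests.

```typescript
type EditPlan =
  | { kind: 'apply'; updated: string }
  | { kind: 'ambiguous'; count: number }
  | { kind: 'not_found' }
  | { kind: 'noop'; reason: 'already-applied' | 'identical' };

// Counts non-overlapping occurrences of needle in haystack.
function countOccurrences(haystack: string, needle: string): number {
  if (needle === '') return 0;
  let count = 0;
  let from = 0;
  for (;;) {
    const idx = haystack.indexOf(needle, from);
    if (idx === -1) return count;
    count++;
    from = idx + needle.length;
  }
}

function planEditSketch(content: string, oldStr: string, newStr: string): EditPlan {
  // Replacing a string with itself changes nothing.
  if (oldStr === newStr) return { kind: 'noop', reason: 'identical' };

  const count = countOccurrences(content, oldStr);
  const newPresent = content.includes(newStr);

  // Already applied when the replacement text is in the file and either the
  // old text is gone, or the edit is an anchored insert (new contains old), in
  // which case the anchor still matches after the first apply.
  if (newPresent && (count === 0 || newStr.includes(oldStr))) {
    return { kind: 'noop', reason: 'already-applied' };
  }
  if (count === 0) return { kind: 'not_found' };
  if (count > 1) return { kind: 'ambiguous', count };

  // Splice by index rather than String.replace so "$" in newStr stays literal.
  const idx = content.indexOf(oldStr);
  return {
    kind: 'apply',
    updated: content.slice(0, idx) + newStr + content.slice(idx + oldStr.length),
  };
}
```

The anchored-insert check is the important branch: without it, `'<script>'` keeps matching uniquely after each apply and the inserted block is stamped again every time.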
@@ -68,11 +68,18 @@ export function deriveModesFromACP(
|
||||
 ): { modes: ProviderMode[]; currentModeId: string | null } {
   if (modeState?.availableModes?.length) {
     return {
-      modes: modeState.availableModes.map((mode) => ({
-        id: mode.id,
-        label: mode.name,
-        description: mode.description ?? undefined,
-      })),
+      // ACP omits the unattended flag; inherit it from the manifest fallback by
+      // id so the unified permission picker can still detect each agent's bypass
+      // mode (e.g. opencode `full-access`) from live-probed modes.
+      modes: modeState.availableModes.map((mode) => {
+        const fb = fallbackModes.find((f) => f.id === mode.id);
+        return {
+          id: mode.id,
+          label: mode.name,
+          description: mode.description ?? undefined,
+          ...(fb?.isUnattended ? { isUnattended: true } : {}),
+        };
+      }),
       currentModeId: modeState.currentModeId ?? null,
     };
   }
|
||||
|
||||
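The flag inheritance in this hunk can be tried in isolation with a small sketch. The `ProviderMode` and `LiveMode` shapes below are trimmed-down assumptions, and `inheritUnattended` is a hypothetical name; the real types likely carry more fields.

```typescript
// Illustrative types; the real ProviderMode and ACP mode shapes may differ.
interface ProviderMode {
  id: string;
  label: string;
  description?: string;
  isUnattended?: boolean;
}
interface LiveMode {
  id: string;
  name: string;
  description?: string | null;
}

function inheritUnattended(live: LiveMode[], fallback: ProviderMode[]): ProviderMode[] {
  // Index the unattended fallback ids once instead of scanning per live mode.
  const unattendedIds = new Set(fallback.filter((f) => f.isUnattended).map((f) => f.id));
  return live.map((mode) => ({
    id: mode.id,
    label: mode.name,
    description: mode.description ?? undefined,
    // Add the key only when true so other modes keep the property absent.
    ...(unattendedIds.has(mode.id) ? { isUnattended: true } : {}),
  }));
}

const merged = inheritUnattended(
  [
    { id: 'default', name: 'Default' },
    { id: 'full-access', name: 'Full access', description: null },
  ],
  [{ id: 'full-access', label: 'Full access', isUnattended: true }],
);
```

Here only `full-access` ends up flagged; `default` has no `isUnattended` key at all, which matches the conditional-spread behaviour in the hunk.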
191
apps/coder/src/services/arena-analyzer-helpers.ts
Normal file
@@ -0,0 +1,191 @@
|
||||
/**
|
||||
* Pure, side-effect-free helpers for the Arena analyzer.
|
||||
* No DB, no IO, no network — safe to unit-test directly.
|
||||
*
|
||||
* Covers: digest-prompt assembly, judge-prompt assembly, winner extraction
|
||||
* from the judge output, the <2-survivors no-winner rule, and the
|
||||
* cross-examination prompt.
|
||||
*/
|
||||
|
||||
// ─── Shared types ─────────────────────────────────────────────────────────────
|
||||
|
||||
export interface ContestantDigestInput {
|
||||
identity: string;
|
||||
model: string;
|
||||
resultMd: string;
|
||||
diffPatch?: string;
|
||||
benchmarkLine: string;
|
||||
}
|
||||
|
||||
export interface ContestantDigest {
|
||||
identity: string;
|
||||
model: string;
|
||||
digest: string;
|
||||
benchmarkLine: string;
|
||||
}
|
||||
|
||||
// ─── Digest stage ─────────────────────────────────────────────────────────────
|
||||
|
||||
/**
|
||||
* Build the system + user prompts for the per-contestant digest call.
|
||||
* The digest is a short structured summary; it keeps each call's context small
|
||||
* so the downstream judge only sees digests (not raw diffs).
|
||||
*/
|
||||
export function buildDigestPrompt(input: ContestantDigestInput): { system: string; user: string } {
|
||||
const system =
|
||||
'You are an expert technical analyst evaluating the output of an AI coding or Q&A battle. ' +
|
||||
'Produce a concise structured digest (under 300 words, Markdown bullet points) covering: ' +
|
||||
'(1) correctness and quality, (2) completeness, (3) notable strengths, (4) notable weaknesses or issues. ' +
|
||||
'Do not reference the battle or other contestants — focus only on this submission.';
|
||||
|
||||
const parts: string[] = [
|
||||
`# Contestant: ${input.identity} / ${input.model}`,
|
||||
`\nBenchmark: ${input.benchmarkLine}`,
|
||||
'\n## Result\n',
|
||||
input.resultMd.slice(0, 8_000),
|
||||
];
|
||||
|
||||
if (input.diffPatch) {
|
||||
parts.push('\n## Code Changes (diff)\n```diff');
|
||||
parts.push(input.diffPatch.slice(0, 5_000));
|
||||
parts.push('```');
|
||||
}
|
||||
|
||||
return { system, user: parts.join('\n') };
|
||||
}
|
||||
|
||||
// ─── Judge stage ──────────────────────────────────────────────────────────────
|
||||
|
||||
/**
|
||||
* Build the system + user prompts for the comparative judge call.
|
||||
* Receives contestant digests (NOT raw diffs) to keep context bounded.
|
||||
*
|
||||
* The judge output must contain a line starting with WINNER: or NO_WINNER.
|
||||
* The caller extracts it with extractWinner().
|
||||
*/
|
||||
export function buildJudgePrompt(
|
||||
originalPrompt: string,
|
||||
digests: ContestantDigest[],
|
||||
): { system: string; user: string } {
|
||||
const canName = shouldNameWinner(digests.length);
|
||||
|
||||
const winnerInstruction = canName
|
||||
? 'After your comparative analysis, name the best submission on its own line in this exact format:\n' +
|
||||
'WINNER: <identity>/<model>\n' +
|
||||
"where <identity> and <model> exactly match the contestant's digest heading. No other text on that line."
|
||||
: 'Fewer than 2 contestants succeeded. Do NOT name a winner. Write the following on its own line:\nNO_WINNER';
|
||||
|
||||
const system =
|
||||
'You are an expert judge for an AI battle. You have received digest summaries of each ' +
|
||||
"contestant's work on the same task. Write a comparative analysis, then follow these instructions:\n" +
|
||||
winnerInstruction;
|
||||
|
||||
const parts: string[] = [
|
||||
'# Original Task Prompt\n',
|
||||
originalPrompt.slice(0, 2_000),
|
||||
'\n# Contestant Digests\n',
|
||||
];
|
||||
|
||||
for (const d of digests) {
|
||||
parts.push(`\n## ${d.identity} / ${d.model}`);
|
||||
parts.push(`Benchmark: ${d.benchmarkLine}`);
|
||||
parts.push(d.digest);
|
||||
}
|
||||
|
||||
parts.push(
|
||||
'\n# Instructions\nCompare the contestants and follow the winner-naming instructions above.',
|
||||
);
|
||||
|
||||
return { system, user: parts.join('\n') };
|
||||
}
|
||||
|
||||
// ─── No-winner rule ───────────────────────────────────────────────────────────
|
||||
|
||||
/**
|
||||
* Returns true when enough contestants succeeded to name a winner.
|
||||
* Rule: at least 2 must have produced a result. With 0 or 1 success the
|
||||
* analysis must NOT name a winner (no meaningful comparison possible).
|
||||
*/
|
||||
export function shouldNameWinner(succeededCount: number): boolean {
|
||||
return succeededCount >= 2;
|
||||
}
|
||||
|
||||
// ─── Winner extraction ────────────────────────────────────────────────────────
|
||||
|
||||
/**
|
||||
* Parse the judge's text output and extract the declared winner.
|
||||
* Looks for a line matching: WINNER: <identity>/<model>
|
||||
* Returns null when no valid winner line is found, or when the line contains
|
||||
* NO_WINNER.
|
||||
*
|
||||
* The parse is lenient on surrounding whitespace and case for the keyword.
|
||||
*/
|
||||
export function extractWinner(judgeOutput: string): { identity: string; model: string } | null {
|
||||
for (const line of judgeOutput.split('\n')) {
|
||||
const trimmed = line.trim();
|
||||
if (!trimmed.toUpperCase().startsWith('WINNER:')) continue;
|
||||
|
||||
const rest = trimmed.slice('WINNER:'.length).trim();
|
||||
if (rest.toUpperCase() === 'NO_WINNER' || rest === '') return null;
|
||||
|
||||
const slashIdx = rest.indexOf('/');
|
||||
if (slashIdx === -1) return null;
|
||||
|
||||
const identity = rest.slice(0, slashIdx).trim();
|
||||
const model = rest.slice(slashIdx + 1).trim();
|
||||
if (identity && model) return { identity, model };
|
||||
}
|
||||
return null;
|
||||
}
|
||||
|
||||
// ─── Cross-examination stage ──────────────────────────────────────────────────
|
||||
|
||||
/**
|
||||
* Build the system + user prompts for a cross-examination call.
|
||||
* The cross-examiner sees the original prompt, contestant digests, and the
|
||||
* proposed analysis, and is asked to challenge the result.
|
||||
*/
|
||||
export function buildCrossExamPrompt(opts: {
|
||||
originalPrompt: string;
|
||||
digests: ContestantDigest[];
|
||||
analysisContent: string;
|
||||
proposedWinner: string | null;
|
||||
examinerIdentity: string;
|
||||
examinerModel: string;
|
||||
}): { system: string; user: string } {
|
||||
const system =
|
||||
`You are ${opts.examinerIdentity} (model: ${opts.examinerModel}), acting as an independent ` +
|
||||
'cross-examiner in an AI battle. Your role is to critically challenge the proposed analysis ' +
|
||||
'and winner, then give your own verdict. Be rigorous but fair. ' +
|
||||
'End your response with your verdict on its own line:\n' +
|
||||
'VERDICT: <identity>/<model> — if you agree or disagree with the proposed winner but can name one\n' +
|
||||
'VERDICT: NO_WINNER — if no clear winner exists';
|
||||
|
||||
const parts: string[] = [
|
||||
'# Original Task Prompt\n',
|
||||
opts.originalPrompt.slice(0, 2_000),
|
||||
'\n# Contestant Digests\n',
|
||||
];
|
||||
|
||||
for (const d of opts.digests) {
|
||||
parts.push(`\n## ${d.identity} / ${d.model}`);
|
||||
parts.push(`Benchmark: ${d.benchmarkLine}`);
|
||||
parts.push(d.digest);
|
||||
}
|
||||
|
||||
parts.push('\n# Proposed Analysis\n');
|
||||
parts.push(opts.analysisContent.slice(0, 5_000));
|
||||
|
||||
if (opts.proposedWinner) {
|
||||
parts.push(`\n*(Proposed winner: ${opts.proposedWinner})*`);
|
||||
} else {
|
||||
parts.push('\n*(No winner was proposed — fewer than 2 contestants succeeded.)*');
|
||||
}
|
||||
|
||||
parts.push(
|
||||
'\n# Your Cross-Examination\n' +
|
||||
'Challenge the analysis above, then give your independent verdict (VERDICT: … on its own line).',
|
||||
);
|
||||
|
||||
return { system, user: parts.join('\n') };
|
||||
}
|
||||
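To see how the winner parse behaves on realistic judge output, the sketch below repeats `extractWinner` from this file so it runs standalone and feeds it a few sample strings. The sample outputs and model names are invented for illustration.

```typescript
// Copy of extractWinner from the file above, repeated so the sketch is self-contained.
function extractWinner(judgeOutput: string): { identity: string; model: string } | null {
  for (const line of judgeOutput.split('\n')) {
    const trimmed = line.trim();
    if (!trimmed.toUpperCase().startsWith('WINNER:')) continue;
    const rest = trimmed.slice('WINNER:'.length).trim();
    if (rest.toUpperCase() === 'NO_WINNER' || rest === '') return null;
    const slashIdx = rest.indexOf('/');
    if (slashIdx === -1) return null;
    const identity = rest.slice(0, slashIdx).trim();
    const model = rest.slice(slashIdx + 1).trim();
    if (identity && model) return { identity, model };
  }
  return null;
}

// Keyword case and surrounding whitespace are tolerated.
const named = extractWinner('Both are solid.\n  winner: opencode / example-model\n');
// A bare NO_WINNER line never matches the WINNER: prefix, so the result is null.
const bare = extractWinner('Only one contestant finished.\nNO_WINNER\n');
// Only the first slash splits, so model ids that contain slashes survive intact.
const nested = extractWinner('WINNER: claude/vendor/example-model');
// Markdown decoration in front of the keyword defeats the prefix check.
const bold = extractWinner('**WINNER: claude/example-model**');
```

The last case is worth noting: a judge that bolds the winner line produces no match, so the battle would end with no winner recorded. Whether to strip leading `*` or `#` characters before the prefix check is a design choice this sketch leaves open.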
496
apps/coder/src/services/arena-analyzer.ts
Normal file
@@ -0,0 +1,496 @@
|
||||
/**
|
||||
* Arena Analyzer — pluggable seam for battle analysis and cross-examination.
|
||||
*
|
||||
* The Analyzer interface is the plug point: a v2 Han Orchestrator flow can
|
||||
* replace the v1 two-stage digest→judge implementation without a schema change.
|
||||
*
|
||||
* v1 implementation uses DEFAULT_MODEL via direct llama-swap calls (arenaModelCall):
|
||||
* Digest stage — one call per succeeded contestant, concurrent; produces a
|
||||
* bounded summary of each result (result.md + diff.patch for
|
||||
* coding, result.md for Q&A).
|
||||
* Judge stage — one call with all digests + the original prompt; writes
|
||||
* analysis.md, names a winner (unless < 2 succeeded), and
|
||||
* updates battles.winner_contestant_id.
|
||||
*
|
||||
* Cross-examination:
|
||||
* Local model — direct arenaModelCall to llama-swap with the chosen model.
|
||||
* Cloud model — inserts a tasks row (triggers the dispatcher via pg_notify);
|
||||
* polls for completion; reads output_summary as the verdict.
|
||||
* In both cases the verdict is written to cross_examinations.verdict, appended
|
||||
* to <resultsPath>/cross-exam.md, and a battle_updated frame is published.
|
||||
*
|
||||
* Never throws — all errors are caught, logged, and swallowed so the caller
|
||||
* (arena-runner's onBattleComplete / onCrossExamStart) is never wedged.
|
||||
*/
|
||||
|
||||
import { readFile, writeFile, mkdir } from 'node:fs/promises';
|
||||
import { join } from 'node:path';
|
||||
import type { Sql } from '../db.js';
|
||||
import type { Broker } from '@boocode/server/broker';
|
||||
import type { WsFrame } from '@boocode/contracts/ws-frames';
|
||||
import type { FastifyBaseLogger } from 'fastify';
|
||||
import type { Config } from '../config.js';
|
||||
import type { BattleType } from '@boocode/contracts/arena';
|
||||
import { arenaModelCall } from './arena-model-call.js';
|
||||
import {
|
||||
buildDigestPrompt,
|
||||
buildJudgePrompt,
|
||||
buildCrossExamPrompt,
|
||||
extractWinner,
|
||||
shouldNameWinner,
|
||||
type ContestantDigest,
|
||||
} from './arena-analyzer-helpers.js';
|
||||
|
||||
// ─── Public interface ─────────────────────────────────────────────────────────
|
||||
|
||||
/** Pluggable analysis seam — swap to a Han Orchestrator flow in v2. */
|
||||
export interface Analyzer {
|
||||
/** Run the two-stage digest→judge analysis for a completed battle. */
|
||||
analyze(battleId: string): Promise<void>;
|
||||
/**
|
||||
* Run a cross-examination for an already-inserted cross_examinations row.
|
||||
* The result is written back to that row and a battle_updated frame is published.
|
||||
*/
|
||||
crossExamine(
|
||||
battleId: string,
|
||||
crossExamId: string,
|
||||
opts: { identity: string; model: string },
|
||||
): Promise<void>;
|
||||
}
|
||||
|
||||
// ─── Internal DB row types ────────────────────────────────────────────────────
|
||||
|
||||
interface BattleRow {
|
||||
id: string;
|
||||
project_id: string;
|
||||
battle_type: BattleType;
|
||||
prompt: string;
|
||||
status: string;
|
||||
results_path: string | null;
|
||||
winner_contestant_id: string | null;
|
||||
}
|
||||
|
||||
interface ContestantRow {
|
||||
id: string;
|
||||
identity: string;
|
||||
model: string;
|
||||
lane: string;
|
||||
status: string;
|
||||
result_path: string | null;
|
||||
duration_ms: number | null;
|
||||
tokens_per_sec: number | null;
|
||||
}
|
||||
|
||||
// ─── Factory ──────────────────────────────────────────────────────────────────
|
||||
|
||||
interface AnalyzerDeps {
|
||||
sql: Sql;
|
||||
broker: Broker;
|
||||
log: FastifyBaseLogger;
|
||||
config: Pick<Config, 'LLAMA_SWAP_URL' | 'DEFAULT_MODEL'>;
|
||||
/** Model IDs served by local llama-swap — cross-exam routing uses this. */
|
||||
localModels: ReadonlySet<string>;
|
||||
}
|
||||
|
||||
export function createAnalyzer(deps: AnalyzerDeps): Analyzer {
|
||||
const { sql, broker, log, config, localModels } = deps;
|
||||
|
||||
// ─── analyze ──────────────────────────────────────────────────────────────
|
||||
|
||||
async function analyze(battleId: string): Promise<void> {
|
||||
try {
|
||||
await runAnalysis(battleId);
|
||||
} catch (err) {
|
||||
log.error(
|
||||
{ err: errMsg(err), battleId },
|
||||
'arena-analyzer: analysis failed',
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
async function runAnalysis(battleId: string): Promise<void> {
|
||||
const battle = await loadBattle(battleId);
|
||||
if (!battle) {
|
||||
log.warn({ battleId }, 'arena-analyzer: battle not found');
|
||||
return;
|
||||
}
|
||||
|
||||
const contestants = await loadContestants(battleId);
|
||||
const succeeded = contestants.filter((c) => c.status === 'done' && c.result_path);
|
||||
|
||||
log.info(
|
||||
{ battleId, total: contestants.length, succeeded: succeeded.length },
|
||||
'arena-analyzer: starting analysis',
|
||||
);
|
||||
|
||||
// Digest stage — concurrent, one call per succeeded contestant.
|
||||
const digests = (
|
||||
await Promise.all(succeeded.map((c) => digestContestant(battle, c)))
|
||||
).filter((d): d is ContestantDigest => d !== null);
|
||||
|
||||
// Failed contestants are noted in the analysis even if they produced no digest.
|
||||
const failedNotes = contestants
|
||||
.filter((c) => c.status === 'error')
|
||||
.map((c) => `- **${c.identity} / ${c.model}**: failed (no result)\n`);
|
||||
|
||||
// Judge stage — single call with all digests.
|
||||
const { analysisText, winner } = await judgeContestants(battle, digests, failedNotes);
|
||||
|
||||
// Write analysis.md to the battle results folder.
|
||||
const resultsPath = battle.results_path;
|
||||
if (resultsPath) {
|
||||
await mkdir(resultsPath, { recursive: true });
|
||||
await writeFile(join(resultsPath, 'analysis.md'), analysisText, 'utf8');
|
||||
}
|
||||
|
||||
// Resolve the winner to a contestant id and update the battle row.
|
||||
let winnerId: string | null = null;
|
||||
if (winner && shouldNameWinner(succeeded.length)) {
|
||||
const winnerContestant = contestants.find(
|
||||
(c) => c.identity === winner.identity && c.model === winner.model,
|
||||
);
|
||||
if (winnerContestant) {
|
||||
winnerId = winnerContestant.id;
|
||||
await sql`
|
||||
UPDATE battles
|
||||
SET winner_contestant_id = ${winnerId}, updated_at = clock_timestamp()
|
||||
WHERE id = ${battleId}
|
||||
`;
|
||||
log.info({ battleId, winnerId, identity: winner.identity, model: winner.model }, 'arena-analyzer: winner set');
|
||||
} else {
|
||||
log.warn({ battleId, winner }, 'arena-analyzer: judge named a winner not found in contestants');
|
||||
}
|
||||
}
|
||||
|
||||
publishUser({
|
||||
type: 'battle_updated',
|
||||
battle_id: battleId,
|
||||
winner_contestant_id: winnerId,
|
||||
analysis_ready: true,
|
||||
});
|
||||
|
||||
log.info({ battleId }, 'arena-analyzer: analysis complete');
|
||||
}
|
||||
|
||||
// ─── crossExamine ─────────────────────────────────────────────────────────
|
||||
|
||||
async function crossExamine(
|
||||
battleId: string,
|
||||
crossExamId: string,
|
||||
opts: { identity: string; model: string },
|
||||
): Promise<void> {
|
||||
try {
|
||||
await runCrossExam(battleId, crossExamId, opts);
|
||||
} catch (err) {
|
||||
log.error(
|
||||
{ err: errMsg(err), battleId, crossExamId },
|
||||
'arena-analyzer: cross-exam failed',
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
async function runCrossExam(
|
||||
battleId: string,
|
||||
crossExamId: string,
|
||||
opts: { identity: string; model: string },
|
||||
): Promise<void> {
|
||||
const battle = await loadBattle(battleId);
|
||||
if (!battle) {
|
||||
log.warn({ battleId }, 'arena-analyzer: battle not found for cross-exam');
|
||||
return;
|
||||
}
|
||||
|
||||
const contestants = await loadContestants(battleId);
|
||||
|
||||
// Regenerate the digests for context: digestContestant issues a fresh model
// call for each contestant that has a result.
|
||||
const succeeded = contestants.filter((c) => c.status === 'done' && c.result_path);
|
||||
const digests = (
|
||||
await Promise.all(succeeded.map((c) => digestContestant(battle, c)))
|
||||
).filter((d): d is ContestantDigest => d !== null);
|
||||
|
||||
// Read analysis.md for the proposed analysis content.
|
||||
let analysisContent = '';
|
||||
if (battle.results_path) {
|
||||
analysisContent = await readFile(
|
||||
join(battle.results_path, 'analysis.md'), 'utf8',
|
||||
).catch(() => '');
|
||||
}
|
||||
|
||||
// Resolve proposed winner label.
|
||||
let proposedWinner: string | null = null;
|
||||
if (battle.winner_contestant_id) {
|
||||
const w = contestants.find((c) => c.id === battle.winner_contestant_id);
|
||||
if (w) proposedWinner = `${w.identity}/${w.model}`;
|
||||
}
|
||||
|
||||
const { system, user } = buildCrossExamPrompt({
|
||||
originalPrompt: battle.prompt,
|
||||
digests,
|
||||
analysisContent,
|
||||
proposedWinner,
|
||||
examinerIdentity: opts.identity,
|
||||
examinerModel: opts.model,
|
||||
});
|
||||
|
||||
log.info({ battleId, crossExamId, identity: opts.identity, model: opts.model }, 'arena-analyzer: running cross-exam');
|
||||
|
||||
const verdict = await executeModelCall({
|
||||
battleId,
|
||||
projectId: battle.project_id,
|
||||
identity: opts.identity,
|
||||
model: opts.model,
|
||||
system,
|
||||
user,
|
||||
});
|
||||
|
||||
// Persist verdict and append to cross-exam.md.
|
||||
await sql`
|
||||
UPDATE cross_examinations
|
||||
SET verdict = ${verdict}
|
||||
WHERE id = ${crossExamId}
|
||||
`;
|
||||
|
||||
if (battle.results_path) {
|
||||
const crossExamPath = join(battle.results_path, 'cross-exam.md');
|
||||
const section =
|
||||
`\n---\n\n# Cross-Examination by ${opts.identity} / ${opts.model}\n\n` +
|
||||
`${verdict}\n`;
|
||||
await writeFile(crossExamPath, section, { flag: 'a', encoding: 'utf8' });
|
||||
}
|
||||
|
||||
publishUser({
|
||||
type: 'battle_updated',
|
||||
battle_id: battleId,
|
||||
cross_exam_id: crossExamId,
|
||||
});
|
||||
|
||||
log.info({ battleId, crossExamId }, 'arena-analyzer: cross-exam complete');
|
||||
}
|
||||
|
||||
// ─── Model call routing ───────────────────────────────────────────────────
|
||||
|
||||
/**
|
||||
* Route a one-shot model call to llama-swap (local) or the task dispatcher
|
||||
* (cloud). Cloud dispatch inserts a tasks row and polls for completion.
|
||||
*/
|
||||
async function executeModelCall(opts: {
|
||||
battleId: string;
|
||||
projectId: string;
|
||||
identity: string;
|
||||
model: string;
|
||||
system: string;
|
||||
user: string;
|
||||
}): Promise<string> {
|
||||
const isLocal = localModels.has(opts.model) || localModels.has(`llama-swap/${opts.model}`);
|
||||
|
||||
if (isLocal) {
|
||||
return arenaModelCall({
|
||||
config,
|
||||
model: opts.model,
|
||||
system: opts.system,
|
||||
user: opts.user,
|
||||
maxTokens: 2_000,
|
||||
temperature: 0.3,
|
||||
});
|
||||
}
|
||||
|
||||
// Cloud path: dispatch through the task system and poll for completion.
|
||||
return executeCloudModelCall(opts);
|
||||
}
|
||||
|
||||
async function executeCloudModelCall(opts: {
|
||||
projectId: string;
|
||||
identity: string;
|
||||
model: string;
|
||||
system: string;
|
||||
user: string;
|
||||
}): Promise<string> {
|
||||
// The cross-exam prompt is the full input to the external agent. We embed
|
||||
// the system prompt as a preamble in the user message (external agents don't
|
||||
// take a separate system arg through the tasks dispatcher).
|
||||
const input = `${opts.system}\n\n${opts.user}`;
|
||||
|
||||
// For well-known external agents, stamp the agent name so the dispatcher
|
||||
// routes via PTY/ACP. For unknown identities fall back to native inference
|
||||
// (agent = null → DEFAULT_MODEL text generation).
|
||||
const knownAgents = new Set(['claude', 'opencode', 'qwen', 'goose']);
|
||||
const agentName = knownAgents.has(opts.identity) ? opts.identity : null;
|
||||
|
||||
const [task] = await sql<{ id: string }[]>`
|
||||
INSERT INTO tasks (project_id, input, agent, model)
|
||||
VALUES (${opts.projectId}, ${input}, ${agentName}, ${opts.model})
|
||||
RETURNING id
|
||||
`;
|
||||
const taskId = task!.id;
|
||||
|
||||
log.info({ taskId, identity: opts.identity, model: opts.model }, 'arena-analyzer: cloud cross-exam task dispatched');
|
||||
|
||||
// Poll until terminal (up to 5 minutes).
|
||||
const timeoutMs = 5 * 60 * 1_000;
|
||||
const pollMs = 2_000;
|
||||
const deadline = Date.now() + timeoutMs;
|
||||
|
||||
while (Date.now() < deadline) {
|
||||
await sleep(pollMs);
|
||||
const [row] = await sql<{ state: string; output_summary: string | null }[]>`
|
||||
SELECT state, output_summary FROM tasks WHERE id = ${taskId}
|
||||
`;
|
||||
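// A missing row means the task was deleted; report that rather than a timeout.
if (!row) throw new Error(`cross-exam task ${taskId} disappeared while polling`);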
|
||||
if (row.state === 'completed') return row.output_summary ?? '';
|
||||
if (row.state === 'failed' || row.state === 'cancelled') {
|
||||
throw new Error(`cross-exam task ${row.state}: ${row.output_summary ?? ''}`);
|
||||
}
|
||||
}
|
||||
|
||||
throw new Error(`cloud cross-exam task timed out after ${timeoutMs / 1000}s`);
|
||||
}
|
||||
|
||||
// ─── Digest helper ────────────────────────────────────────────────────────
|
||||
|
||||
async function digestContestant(
|
||||
battle: BattleRow,
|
||||
c: ContestantRow,
|
||||
): Promise<ContestantDigest | null> {
|
||||
if (!c.result_path) return null;
|
||||
|
||||
const resultMd = await readFile(join(c.result_path, 'result.md'), 'utf8').catch(() => '');
|
||||
|
||||
let diffPatch: string | undefined;
|
||||
if (battle.battle_type === 'coding') {
|
||||
diffPatch = await readFile(join(c.result_path, 'diff.patch'), 'utf8').catch(
|
||||
() => undefined,
|
||||
);
|
||||
}
|
||||
|
||||
const benchmarkLine = formatBenchmarkLine(c);
|
||||
const { system, user } = buildDigestPrompt({
|
||||
identity: c.identity,
|
||||
model: c.model,
|
||||
resultMd,
|
||||
diffPatch,
|
||||
benchmarkLine,
|
||||
});
|
||||
|
||||
let digest: string;
|
||||
try {
|
||||
digest = await arenaModelCall({
|
||||
config,
|
||||
model: config.DEFAULT_MODEL,
|
||||
system,
|
||||
user,
|
||||
maxTokens: 500,
|
||||
temperature: 0.3,
|
||||
});
|
||||
} catch (err) {
|
||||
log.warn(
|
||||
{ err: errMsg(err), identity: c.identity, model: c.model },
|
||||
'arena-analyzer: digest call failed — skipping contestant',
|
||||
);
|
||||
return null;
|
||||
}
|
||||
|
||||
return { identity: c.identity, model: c.model, digest, benchmarkLine };
|
||||
}
|
||||
|
||||
// ─── Judge helper ─────────────────────────────────────────────────────────
|
||||
|
||||
async function judgeContestants(
|
||||
battle: BattleRow,
|
||||
digests: ContestantDigest[],
|
||||
failedNotes: string[],
|
||||
): Promise<{ analysisText: string; winner: { identity: string; model: string } | null }> {
|
||||
const { system, user } = buildJudgePrompt(battle.prompt, digests);
|
||||
|
||||
let judgeOutput = '';
|
||||
try {
|
||||
judgeOutput = await arenaModelCall({
|
||||
config,
|
||||
model: config.DEFAULT_MODEL,
|
||||
system,
|
||||
user,
|
||||
maxTokens: 2_000,
|
||||
temperature: 0.3,
|
||||
});
|
||||
} catch (err) {
|
||||
log.error({ err: errMsg(err), battleId: battle.id }, 'arena-analyzer: judge call failed');
|
||||
judgeOutput = '*(Judge call failed — no comparison produced.)*';
|
||||
}
|
||||
|
||||
const winner = shouldNameWinner(digests.length) ? extractWinner(judgeOutput) : null;
|
||||
|
||||
const sections: string[] = [
|
||||
`# Arena Analysis`,
|
||||
`\n**Battle type:** ${battle.battle_type}`,
|
||||
];
|
||||
|
||||
if (failedNotes.length > 0) {
|
||||
sections.push('\n## Failed Contestants\n');
|
||||
sections.push(...failedNotes);
|
||||
}
|
||||
|
||||
if (digests.length > 0) {
|
||||
sections.push('\n## Contestant Digests\n');
|
||||
for (const d of digests) {
|
||||
sections.push(`### ${d.identity} / ${d.model}`);
|
||||
sections.push(`*Benchmark: ${d.benchmarkLine}*\n`);
|
||||
sections.push(d.digest);
|
||||
}
|
||||
}
|
||||
|
||||
sections.push("\n## Judge's Verdict\n");
|
||||
sections.push(judgeOutput);
|
||||
|
||||
if (winner) {
|
||||
sections.push(`\n## Winner\n**${winner.identity} / ${winner.model}**`);
|
||||
} else {
|
||||
const reason =
|
||||
digests.length < 2
|
||||
? 'fewer than 2 contestants produced results'
|
||||
: 'no clear winner identified';
|
||||
sections.push(`\n## Winner\n*No winner named (${reason}).*`);
|
||||
}
|
||||
|
||||
return { analysisText: sections.join('\n'), winner };
|
||||
}
|
||||
|
||||
// ─── DB helpers ───────────────────────────────────────────────────────────
|
||||
|
||||
async function loadBattle(battleId: string): Promise<BattleRow | null> {
|
||||
const [b] = await sql<BattleRow[]>`
|
||||
SELECT id, project_id, battle_type, prompt, status, results_path, winner_contestant_id
|
||||
FROM battles WHERE id = ${battleId}
|
||||
`;
|
||||
return b ?? null;
|
||||
}
|
||||
|
||||
async function loadContestants(battleId: string): Promise<ContestantRow[]> {
|
||||
return sql<ContestantRow[]>`
|
||||
SELECT id, identity, model, lane, status, result_path, duration_ms, tokens_per_sec
|
||||
FROM contestants WHERE battle_id = ${battleId}
|
||||
ORDER BY created_at ASC
|
||||
`;
|
||||
}
|
||||
|
||||
// ─── Misc helpers ─────────────────────────────────────────────────────────
|
||||
|
||||
function formatBenchmarkLine(c: ContestantRow): string {
|
||||
const parts: string[] = [];
|
||||
if (c.duration_ms !== null) parts.push(`${c.duration_ms}ms`);
|
||||
if (c.tokens_per_sec !== null) parts.push(`${c.tokens_per_sec.toFixed(1)} tok/s`);
|
||||
return parts.length > 0 ? parts.join(', ') : 'no benchmark';
|
||||
}
|
||||
|
||||
function publishUser(frame: Record<string, unknown>): void {
|
||||
broker.publishUserFrame('default', frame as unknown as WsFrame);
|
||||
}
|
||||
|
||||
function sleep(ms: number): Promise<void> {
|
||||
return new Promise((resolve) => setTimeout(resolve, ms));
|
||||
}
|
||||
|
||||
return { analyze, crossExamine };
|
||||
}
|
||||
|
||||
function errMsg(e: unknown): string {
|
||||
return e instanceof Error ? e.message : String(e);
|
||||
}
|
||||
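The local-versus-cloud routing in `executeModelCall` and the agent stamping in `executeCloudModelCall` can be expressed as one pure decision, in the same spirit as the `*-decisions.ts` files. The sketch below is an illustration only: `routeCrossExam` and the `Route` type are hypothetical names, and the model ids in the usage lines are invented.

```typescript
// Hypothetical pure routing decision mirroring the checks in the analyzer above.
type Route =
  | { kind: 'local'; model: string }
  | { kind: 'cloud'; agent: string | null; model: string };

// Same agent names the analyzer treats as well-known external agents.
const KNOWN_AGENTS: ReadonlySet<string> = new Set(['claude', 'opencode', 'qwen', 'goose']);

function routeCrossExam(
  identity: string,
  model: string,
  localModels: ReadonlySet<string>,
): Route {
  // A model counts as local with or without the llama-swap/ prefix.
  const isLocal = localModels.has(model) || localModels.has(`llama-swap/${model}`);
  if (isLocal) return { kind: 'local', model };
  // Unknown identities get agent = null, which the dispatcher treats as native inference.
  return { kind: 'cloud', agent: KNOWN_AGENTS.has(identity) ? identity : null, model };
}

const servedLocally = new Set(['llama-swap/example-local']); // invented id
const a = routeCrossExam('opencode', 'example-local', servedLocally);
const b = routeCrossExam('claude', 'example-hosted', servedLocally);
const c = routeCrossExam('some-persona', 'example-hosted', servedLocally);
```

Keeping the decision free of IO makes the prefix rule and the known-agent fallback unit-testable without a database or a llama-swap server.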
186
apps/coder/src/services/arena-decisions.ts
Normal file
@@ -0,0 +1,186 @@
|
||||
/**
|
||||
* Pure scheduling and classification decisions for the Arena battle-runner.
|
||||
* No database, no IO. Mirrors the pattern of flow-runner-decisions.ts.
|
||||
*
|
||||
* Vocabulary:
|
||||
* local lane — llama-swap-backed contestants, run strictly one at a time
|
||||
* cloud lane — cloud-backed contestants, run all in parallel
|
||||
*
|
||||
* A contestant's status lifecycle:
|
||||
* queued → running → done | error
|
||||
*/
|
||||
import type { BattleType, ContestantLane } from '@boocode/contracts/arena';
|
||||
|
||||
// ─── Lane classification ──────────────────────────────────────────────────────
|
||||
|
||||
/**
|
||||
* Classify a contestant into a lane.
|
||||
*
|
||||
* Q&A contestants always run on the native (llama-swap) backend → local.
|
||||
* Coding contestants: their MODEL is checked against the localModels set
|
||||
* (all model IDs served by the local llama-swap server). This means an
|
||||
* opencode or qwen contestant pointed at a local model counts as local,
|
||||
* which correctly captures GPU-contention and fair benchmarking (ADR 0001).
|
||||
*
|
||||
* @param battleType 'coding' | 'qa'
|
||||
* @param identity backend name (coding) or persona name (qa) — not used for lane logic
|
||||
* @param model the contestant's model id
|
||||
* @param localModels set of model IDs served by the local llama-swap server
|
||||
*/
|
||||
export function classifyLane(
|
||||
battleType: BattleType,
|
||||
_identity: string,
|
||||
model: string,
|
||||
localModels: ReadonlySet<string>,
|
||||
): ContestantLane {
|
||||
if (battleType === 'qa') return 'local';
|
||||
return localModels.has(model) ? 'local' : 'cloud';
|
||||
}
|
||||
|
||||
// ─── Local-lane queue ─────────────────────────────────────────────────────────
|
||||
|
||||
export interface ContestantSlot {
|
||||
id: string;
|
||||
lane: ContestantLane;
|
||||
status: string;
|
||||
}
|
||||
|
||||
/**
|
||||
* The next queued local contestant to dispatch — the first 'queued' contestant
|
||||
* in the local lane, in creation order (caller must supply rows in created_at ASC).
|
||||
* Returns null when the local queue is empty or all local slots are non-queued.
|
||||
*/
|
||||
export function nextLocalContestant(contestants: readonly ContestantSlot[]): string | null {
|
||||
for (const c of contestants) {
|
||||
if (c.lane === 'local' && c.status === 'queued') return c.id;
|
||||
}
|
||||
return null;
|
||||
}
|
||||
|
||||
// ─── Battle completion ────────────────────────────────────────────────────────
|
||||
|
||||
/**
|
||||
* True when every contestant has reached a terminal state (done | error).
|
||||
* Returns false for an empty list — a battle with no contestants never completes.
|
||||
*/
|
||||
export function isBattleComplete(contestants: readonly { status: string }[]): boolean {
|
||||
if (contestants.length === 0) return false;
|
||||
return contestants.every((c) => c.status === 'done' || c.status === 'error');
|
||||
}
|
||||
|
||||
// ─── Benchmark ────────────────────────────────────────────────────────────────
|
||||
|
||||
export interface Benchmark {
|
||||
durationMs: number;
|
||||
tokensPerSec: number | null;
|
||||
}
|
||||
|
||||
/**
|
||||
* Compute the benchmark for a contestant.
|
||||
* Wall-clock duration is captured for every contestant; tokens/sec is only
|
||||
* meaningful for local (llama-swap) contestants where the model has sole
|
||||
* access to the GPU and the measurement is fair.
|
||||
*/
|
||||
export function computeBenchmark(
|
||||
startedAt: Date,
|
||||
endedAt: Date,
|
||||
costTokens: number | null,
|
||||
lane: ContestantLane,
|
||||
): Benchmark {
|
||||
const durationMs = Math.max(0, endedAt.getTime() - startedAt.getTime());
|
||||
const tokensPerSec =
|
||||
lane === 'local' && costTokens !== null && durationMs > 0
|
||||
? (costTokens / durationMs) * 1000
|
||||
: null;
|
||||
return { durationMs, tokensPerSec };
|
||||
}
|
||||
|
||||
// ─── Slug / path helpers ──────────────────────────────────────────────────────
|
||||
|
||||
/**
|
||||
* Sanitize a string for use as a directory name component.
|
||||
* Lowercases, replaces non-alphanumeric runs with '-', trims leading/trailing
|
||||
* dashes, and caps at 64 characters.
|
||||
*/
|
||||
export function sanitizeSlug(s: string): string {
|
||||
return s
|
||||
.toLowerCase()
|
||||
.replace(/[^a-z0-9]+/g, '-')
|
||||
.replace(/^-+|-+$/g, '')
|
||||
.slice(0, 64);
|
||||
}
|
||||
|
||||
/**
|
||||
* Build the dated battle slug used as the Arena results folder name.
|
||||
* Format: YYYY-MM-DD-<battleType>-<first-8-hex-of-uuid>
|
||||
* Deterministic: callers can rebuild it from (id, type, created_at) on resume.
|
||||
*/
|
||||
export function buildBattleSlug(battleId: string, battleType: BattleType, createdAt: Date): string {
|
||||
const date = createdAt.toISOString().slice(0, 10);
|
||||
const shortId = battleId.replace(/-/g, '').slice(0, 8);
|
||||
return `${date}-${battleType}-${shortId}`;
|
||||
}
|
||||
|
||||
/**
|
||||
* Build the per-contestant results directory name within a battle folder.
|
||||
* Format: <sanitized-identity>-<sanitized-model>
|
||||
*/
|
||||
export function buildContestantDir(identity: string, model: string): string {
|
||||
return `${sanitizeSlug(identity)}-${sanitizeSlug(model)}`;
|
||||
}

// ─── Resume reconciliation ────────────────────────────────────────────────────

export type ContestantResumeAction =
  | 'keep'
  | 're-dispatch'
  | 'mark-done'
  | 'mark-error'
  | 'mark-cancelled';

export interface ContestantResumeDecision {
  contestantId: string;
  action: ContestantResumeAction;
}

/**
 * Decide what to do with ONE contestant during startup resume.
 * Mirrors reconcileResumeStep from flow-runner-decisions.ts.
 *
 * @param status    contestants.status
 * @param taskId    contestants.task_id (null when not yet dispatched)
 * @param taskState tasks.state for taskId, or null if the task row is absent
 */
export function reconcileContestantResume(
  status: string,
  taskId: string | null,
  taskState: string | null,
): ContestantResumeAction {
  if (status !== 'running') return 'keep';
  if (!taskId || taskState === null) return 're-dispatch';
  switch (taskState) {
    case 'completed': return 'mark-done';
    case 'failed':    return 'mark-error';
    case 'cancelled': return 'mark-cancelled';
    case 'pending':   return 'keep';        // dispatcher startup poll will run it normally
    default:          return 're-dispatch'; // 'running'/'blocked': the process is dead
  }
}

/**
 * Reconcile every contestant of an in-flight battle for startup resume.
 * Returns one decision per contestant. Pure: no IO.
 */
export function reconcileContestants(
  contestants: ReadonlyArray<{ contestantId: string; taskId: string | null; status: string }>,
  taskStates: ReadonlyMap<string, string>,
): ContestantResumeDecision[] {
  return contestants.map((c) => ({
    contestantId: c.contestantId,
    action: reconcileContestantResume(
      c.status,
      c.taskId,
      c.taskId ? (taskStates.get(c.taskId) ?? null) : null,
    ),
  }));
}
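Review note: the resume reconciliation is easiest to read as a decision table. The sketch below re-declares the per-contestant rule under a hypothetical name (`reconcileOneSketch`) and runs it over four made-up contestants; the ids and task states are illustrative only.

```typescript
type ResumeActionSketch = 'keep' | 're-dispatch' | 'mark-done' | 'mark-error' | 'mark-cancelled';

// Local copy of the per-contestant rule, for illustration only.
function reconcileOneSketch(
  status: string,
  taskId: string | null,
  taskState: string | null,
): ResumeActionSketch {
  if (status !== 'running') return 'keep';                 // queued / done / error rows are left alone
  if (!taskId || taskState === null) return 're-dispatch'; // never dispatched, or task row missing
  switch (taskState) {
    case 'completed': return 'mark-done';
    case 'failed': return 'mark-error';
    case 'cancelled': return 'mark-cancelled';
    case 'pending': return 'keep';
    default: return 're-dispatch';                         // e.g. 'running' with no live process
  }
}

const resumeContestants = [
  { contestantId: 'c1', taskId: 't1', status: 'running' }, // task finished while the coder was down
  { contestantId: 'c2', taskId: 't2', status: 'running' }, // task row still says 'running'
  { contestantId: 'c3', taskId: null, status: 'queued' },  // waiting in the local lane
  { contestantId: 'c4', taskId: 't4', status: 'running' }, // task row is missing
];
const resumeTaskStates = new Map<string, string>([
  ['t1', 'completed'],
  ['t2', 'running'],
]);

const resumeDecisions = resumeContestants.map((c) => ({
  contestantId: c.contestantId,
  action: reconcileOneSketch(
    c.status,
    c.taskId,
    c.taskId ? (resumeTaskStates.get(c.taskId) ?? null) : null,
  ),
}));
// c1: mark-done, c2: re-dispatch, c3: keep, c4: re-dispatch
```

Keeping this logic pure (no database access) means a unit test can cover every branch without standing up the battles/contestants tables.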
70
apps/coder/src/services/arena-model-call.ts
Normal file
@@ -0,0 +1,70 @@
/**
 * One-shot model completion for the Arena analyzer.
 *
 * Calls the local llama-swap server directly for a single non-streaming
 * completion. Used for the digest and judge stages (always DEFAULT_MODEL)
 * and for local-model cross-examinations (any local model).
 *
 * Mirrors apps/server/src/services/task-model.ts but targets the coder's
 * config shape and uses a longer timeout appropriate for analysis calls.
 */

import type { Config } from '../config.js';

const TIMEOUT_MS = 120_000;

export async function arenaModelCall(opts: {
  config: Pick<Config, 'LLAMA_SWAP_URL'>;
  model: string;
  system: string;
  user: string;
  maxTokens?: number;
  temperature?: number;
}): Promise<string> {
  const { config, model, system, user } = opts;
  const maxTokens = opts.maxTokens ?? 2_000;
  const temperature = opts.temperature ?? 0.3;

  const res = await fetch(`${config.LLAMA_SWAP_URL}/v1/chat/completions`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model,
      messages: [
        { role: 'system', content: system },
        { role: 'user', content: user },
      ],
      max_tokens: maxTokens,
      temperature,
      stream: false,
      chat_template_kwargs: { enable_thinking: false },
    }),
    signal: AbortSignal.timeout(TIMEOUT_MS),
  });

  if (!res.ok) {
    const text = await res.text().catch(() => '');
    throw new Error(`llama-swap responded ${res.status}: ${text.slice(0, 200)}`);
  }

  const data = (await res.json()) as {
    choices?: Array<{
      message?: { content?: string; reasoning_content?: string };
    }>;
  };

  const choice = data.choices?.[0]?.message;
  if (!choice) return '';

  const content = (choice.content ?? '').trim();
  if (content.length > 0) return content;

  // Thinking-mode models sometimes leave content empty and put the answer in
  // reasoning_content; fall back to the last non-empty line of the reasoning.
  const reasoning = (choice.reasoning_content ?? '').trim();
  if (reasoning.length > 0) {
    const lines = reasoning.split('\n').filter((l) => l.trim().length > 0);
    return lines[lines.length - 1] ?? '';
  }

  return '';
}
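Review note: the response handling at the end of `arenaModelCall` could be isolated as a pure function, which would make the `reasoning_content` fallback testable without a running llama-swap server. The sketch below is one possible shape; `extractAnswerSketch` and `ChatMessageSketch` are hypothetical names, and unlike the code above it trims every reasoning line before picking the last one.

```typescript
// Hypothetical pure helper mirroring the content / reasoning_content fallback.
interface ChatMessageSketch {
  content?: string;
  reasoning_content?: string;
}

function extractAnswerSketch(message: ChatMessageSketch | undefined): string {
  if (!message) return '';

  // Prefer the regular content field whenever it holds non-whitespace text.
  const content = (message.content ?? '').trim();
  if (content.length > 0) return content;

  // Otherwise fall back to the last non-empty line of the reasoning trace.
  const lines = (message.reasoning_content ?? '')
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  return lines[lines.length - 1] ?? '';
}

const fromContent = extractAnswerSketch({ content: '  Winner: B  ' });
const fromReasoning = extractAnswerSketch({
  content: '',
  reasoning_content: 'Compare A and B.\n\nWinner: A\n',
});
const fromNothing = extractAnswerSketch(undefined);
```

Taking only the last reasoning line is a heuristic: it suits a one-line verdict but would discard most of a multi-paragraph answer that landed in `reasoning_content`, so callers expecting long analysis text may want a different fallback.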
895
apps/coder/src/services/arena-runner.ts
Normal file
@@ -0,0 +1,895 @@
/**
 * Arena battle-runner: DB-backed execution engine for Arena battles.
 *
 * Mirrors flow-runner.ts but implements the Arena's two-lane scheduler instead
 * of the Orchestrator's wave scheduler. Persists to battles/contestants tables
 * (not flow_runs/flow_steps). Each contestant is dispatched as a real tasks row
 * via an injected DispatchContestantFn (Phase 4 wires this to the dispatcher).
 * Advances on the dispatcher's onTaskTerminal hook.
 *
 * Scheduling:
 *   - Cloud lane: all contestants start immediately, in parallel.
 *   - Local lane: contestants run strictly one at a time (serial queue). Only
 *     the first local contestant runs at start; the next is dispatched when the
 *     current one terminates. Both lanes run concurrently with each other.
 *
 * Results:
 *   Written to <projectRoot>/Arena/<battleSlug>/<identity>-<model>/
 *   Coding: result.md + diff.patch (from the contestant's worktree).
 *   Q&A:    result.md with the text answer.
 *
 * Analyzer seam:
 *   onBattleComplete is called when all contestants are terminal. Phase 5 wires
 *   this to the two-stage digest→judge analyzer. A failed contestant does NOT
 *   abort the battle; the others continue and the analyzer judges the survivors.
 */
|
||||
import type { Sql } from '../db.js';
|
||||
import type { Broker } from '@boocode/server/broker';
|
||||
import type { WsFrame } from '@boocode/contracts/ws-frames';
|
||||
import type { FastifyBaseLogger } from 'fastify';
|
||||
import type { BattleType, ContestantLane } from '@boocode/contracts/arena';
|
||||
import { mkdir, writeFile } from 'node:fs/promises';
|
||||
import { join } from 'node:path';
|
||||
import { diffWorktree } from './worktrees.js';
|
||||
import {
|
||||
buildBattleSlug,
|
||||
buildContestantDir,
|
||||
classifyLane,
|
||||
computeBenchmark,
|
||||
isBattleComplete,
|
||||
nextLocalContestant,
|
||||
reconcileContestants,
|
||||
type ContestantResumeAction,
|
||||
type ContestantSlot,
|
||||
} from './arena-decisions.js';
|
||||
|
||||
// ─── Public types ─────────────────────────────────────────────────────────────
|
||||
|
||||
export interface ContestantSpec {
|
||||
/** Backend name (coding) or persona name (qa). */
|
||||
identity: string;
|
||||
model: string;
|
||||
}
|
||||
|
||||
export interface BattleStartOpts {
|
||||
projectId: string;
|
||||
battleType: BattleType;
|
||||
prompt: string;
|
||||
/** 2–6 contestants. Duplicate (identity, model) pairs are rejected by the schema UNIQUE constraint. */
|
||||
contestants: ContestantSpec[];
|
||||
}
|
||||
|
||||
/**
|
||||
* Injected dispatch function — Phase 4 wires this to the real task inserter.
|
||||
* Must INSERT a tasks row and return its id. The arena-runner sets the
|
||||
* contestant's task_id and status after this call.
|
||||
* `sessionId` is returned when already known (Q&A pre-creates the session);
|
||||
* null for coding contestants whose session is created lazily by the dispatcher.
|
||||
*/
|
||||
export type DispatchContestantFn = (opts: {
|
||||
projectId: string;
|
||||
contestantId: string;
|
||||
prompt: string;
|
||||
identity: string;
|
||||
model: string;
|
||||
battleType: BattleType;
|
||||
}) => Promise<{ taskId: string; sessionId: string | null }>;
|
||||
|
||||
/**
|
||||
* Called once when every contestant in a battle has reached a terminal state.
|
||||
* Phase 5 wires this to the two-stage digest→judge analyzer.
|
||||
* Must never throw — the caller swallows errors.
|
||||
*/
|
||||
export type OnBattleComplete = (battleId: string) => void;
|
||||
|
||||
/**
|
||||
* Called after a cross_examinations row has been inserted, with its id.
|
||||
* Phase 5 wires this to the analyzer's cross-examination runner.
|
||||
* Must never throw — the caller swallows errors.
|
||||
*/
|
||||
export type OnCrossExamStart = (opts: {
|
||||
battleId: string;
|
||||
crossExamId: string;
|
||||
identity: string;
|
||||
model: string;
|
||||
}) => void;
|
||||
|
||||
export interface BattleRunner {
|
||||
/** Start a battle: persist it + its contestants, classify lanes, dispatch initial wave. */
|
||||
startBattle(opts: BattleStartOpts): Promise<{ battleId: string }>;
|
||||
/**
|
||||
* Wire to createDispatcher({ onTaskTerminal }). Fires when ANY task settles;
|
||||
* the runner ignores tasks it doesn't own. Never throws.
|
||||
*/
|
||||
handleTaskTerminal(taskId: string, state: string): void;
|
||||
/**
|
||||
* Re-advance any battles still marked 'running' after a coder restart.
|
||||
* Mirrors flow-runner's initResume (D-9). Never throws.
|
||||
*/
|
||||
initResume(): Promise<void>;
|
||||
/**
|
||||
* Cancel a running battle. Marks it and all non-terminal contestants cancelled,
|
||||
* publishes frames, and returns the task_ids of in-flight contestants so the
|
||||
* route can abort them via the dispatcher's cancelExternalTask.
|
||||
*/
|
||||
cancelBattle(battleId: string): Promise<{ cancelled: boolean; taskIds: string[] }>;
|
||||
/**
|
||||
* Trigger analysis for a completed (or manually re-analyzed) battle.
|
||||
* Phase 5 wires this to the two-stage digest→judge analyzer. For now, calls
|
||||
* the injected onBattleComplete seam directly.
|
||||
*/
|
||||
triggerAnalysis(battleId: string): Promise<{ triggered: boolean }>;
|
||||
/**
|
||||
* Start a cross-examination on a battle. Inserts a cross_examinations row and
|
||||
* invokes the analyzer seam. Phase 5 fills the actual verdict logic.
|
||||
*/
|
||||
startCrossExam(
|
||||
battleId: string,
|
||||
opts: { identity: string; model: string },
|
||||
): Promise<{ crossExamId: string }>;
|
||||
/**
|
||||
* Manually set (or clear) the winner. Validates the contestant belongs to the
|
||||
* battle, updates battles.winner_contestant_id, and publishes a battle_updated
|
||||
* frame so the pane reflects the override immediately.
|
||||
*/
|
||||
setWinner(battleId: string, winnerId: string | null): Promise<{
|
||||
ok: boolean;
|
||||
notFound?: boolean;
|
||||
invalidContestant?: boolean;
|
||||
}>;
|
||||
}
|
||||
|
||||
// ─── Internal row shapes ──────────────────────────────────────────────────────
|
||||
|
||||
interface ContestantRow {
|
||||
id: string;
|
||||
battle_id: string;
|
||||
identity: string;
|
||||
model: string;
|
||||
lane: ContestantLane;
|
||||
task_id: string | null;
|
||||
worktree_id: string | null;
|
||||
status: string;
|
||||
}
|
||||
|
||||
interface BattleRow {
|
||||
id: string;
|
||||
project_id: string;
|
||||
battle_type: BattleType;
|
||||
prompt: string;
|
||||
status: string;
|
||||
results_path: string | null;
|
||||
created_at: Date;
|
||||
}
|
||||
|
||||
// ─── Deps / factory ───────────────────────────────────────────────────────────
|
||||
|
||||
interface Deps {
|
||||
sql: Sql;
|
||||
broker: Broker;
|
||||
log: FastifyBaseLogger;
|
||||
dispatch: DispatchContestantFn;
|
||||
onBattleComplete: OnBattleComplete;
|
||||
/**
|
||||
* Called after a cross_examinations row is inserted. Phase 5 wires this to
|
||||
* the analyzer's cross-examination runner. Optional: absent → no cross-exam
|
||||
* logic runs (stub behaviour for tests).
|
||||
*/
|
||||
onCrossExamStart?: OnCrossExamStart;
|
||||
/**
|
||||
* Model IDs served by the local llama-swap server. Used for lane classification:
|
||||
* a contestant whose model is in this set runs in the local lane (serial, GPU-fair).
|
||||
* Q&A contestants are always local regardless of this set.
|
||||
* Defaults to an empty set → all coding contestants go to the cloud lane.
|
||||
*/
|
||||
localModels?: ReadonlySet<string>;
|
||||
}
|
||||
|
||||
const DEFAULT_LOCAL_MODELS: ReadonlySet<string> = new Set();
|
||||
|
||||
export function createBattleRunner(deps: Deps): BattleRunner {
|
||||
const { sql, broker, log, dispatch, onBattleComplete, onCrossExamStart } = deps;
|
||||
const localModels = deps.localModels ?? DEFAULT_LOCAL_MODELS;
|
||||
|
||||
// Serialize local-lane advance per battle so two near-simultaneous terminal
|
||||
// callbacks don't double-dispatch the next local contestant.
|
||||
const advanceChain = new Map<string, Promise<void>>();
|
||||
|
||||
// Delta bridge: per-contestant broker unsubscribe functions.
|
||||
// 'terminated' sentinel prevents a late-arriving setupDeltaBridge from
|
||||
// registering a subscription that would never be cleaned up.
|
||||
const deltaUnsubs = new Map<string, (() => void) | 'terminated'>();
|
||||
|
||||
function publishUser(frame: Record<string, unknown>): void {
|
||||
broker.publishUserFrame('default', frame as unknown as WsFrame);
|
||||
}
|
||||
|
||||
/**
 * Subscribe to the contestant's inference session and forward delta frames
 * to the user channel as contestant_updated{delta}. Polls for session_id
 * when not immediately known (coding contestants whose session is created
 * lazily by the dispatcher), giving up after 50 polls of 200 ms (about 10 s).
 * The subscription is removed by teardownDeltaBridge when the contestant
 * terminates.
 */
|
||||
async function setupDeltaBridge(
|
||||
battleId: string,
|
||||
contestantId: string,
|
||||
taskId: string,
|
||||
knownSessionId: string | null,
|
||||
): Promise<void> {
|
||||
let sessionId = knownSessionId;
|
||||
if (!sessionId) {
|
||||
// Coding contestant: session_id is written by the dispatcher just before
|
||||
// inference starts. Poll until it appears or the contestant terminates.
|
||||
for (let i = 0; i < 50; i++) {
|
||||
if (deltaUnsubs.get(contestantId) === 'terminated') return;
|
||||
const [row] = await sql<{ session_id: string | null }[]>`
|
||||
SELECT session_id FROM tasks WHERE id = ${taskId}
|
||||
`.catch(() => []);
|
||||
if (row?.session_id) { sessionId = row.session_id; break; }
|
||||
await new Promise((r) => setTimeout(r, 200));
|
||||
}
|
||||
}
|
||||
if (!sessionId) return;
|
||||
if (deltaUnsubs.get(contestantId) === 'terminated') return;
|
||||
|
||||
const unsub = broker.subscribe(sessionId, (frame) => {
|
||||
if (frame.type === 'delta') {
|
||||
const deltaContent = (frame as unknown as { content?: unknown }).content;
|
||||
if (typeof deltaContent === 'string') {
|
||||
publishUser({
|
||||
type: 'contestant_updated',
|
||||
battle_id: battleId,
|
||||
contestant_id: contestantId,
|
||||
delta: deltaContent,
|
||||
});
|
||||
}
|
||||
}
|
||||
});
|
||||
|
||||
const existing = deltaUnsubs.get(contestantId);
|
||||
if (existing === 'terminated') {
|
||||
unsub();
|
||||
} else {
|
||||
deltaUnsubs.set(contestantId, unsub);
|
||||
}
|
||||
}
|
||||
|
||||
function teardownDeltaBridge(contestantId: string): void {
|
||||
const entry = deltaUnsubs.get(contestantId);
|
||||
if (typeof entry === 'function') {
|
||||
entry();
|
||||
deltaUnsubs.delete(contestantId);
|
||||
} else {
|
||||
deltaUnsubs.set(contestantId, 'terminated');
|
||||
}
|
||||
}
|
||||
|
||||
// ─── startBattle ────────────────────────────────────────────────────────────
|
||||
|
||||
async function startBattle(opts: BattleStartOpts): Promise<{ battleId: string }> {
|
||||
if (opts.contestants.length < 2 || opts.contestants.length > 6) {
|
||||
throw new Error(`battle requires 2–6 contestants; got ${opts.contestants.length}`);
|
||||
}
|
||||
|
||||
const [proj] = await sql<{ path: string }[]>`SELECT path FROM projects WHERE id = ${opts.projectId}`;
|
||||
if (!proj) throw new Error(`project not found: ${opts.projectId}`);
|
||||
|
||||
// Insert the battle row as 'running'; update results_path once we have the id.
|
||||
const [battle] = await sql<{ id: string; created_at: Date }[]>`
|
||||
INSERT INTO battles (project_id, battle_type, prompt, status)
|
||||
VALUES (${opts.projectId}, ${opts.battleType}, ${opts.prompt}, 'running')
|
||||
RETURNING id, created_at
|
||||
`;
|
||||
const battleId = battle!.id;
|
||||
const battleSlug = buildBattleSlug(battleId, opts.battleType, battle!.created_at);
|
||||
const resultsPath = join(proj.path, 'Arena', battleSlug);
|
||||
|
||||
await sql`
|
||||
UPDATE battles SET results_path = ${resultsPath}, updated_at = clock_timestamp()
|
||||
WHERE id = ${battleId}
|
||||
`;
|
||||
|
||||
// Insert all contestant rows with lane classification.
|
||||
const contestantRows: Array<{ id: string; identity: string; model: string; lane: ContestantLane }> = [];
|
||||
for (const spec of opts.contestants) {
|
||||
const lane = classifyLane(opts.battleType, spec.identity, spec.model, localModels);
|
||||
const [row] = await sql<{ id: string }[]>`
|
||||
INSERT INTO contestants (battle_id, identity, model, lane, status)
|
||||
VALUES (${battleId}, ${spec.identity}, ${spec.model}, ${lane}, 'queued')
|
||||
RETURNING id
|
||||
`;
|
||||
contestantRows.push({ id: row!.id, identity: spec.identity, model: spec.model, lane });
|
||||
}
|
||||
|
||||
// Write initial manifest so the results folder is always populated.
|
||||
await writeManifest(
|
||||
battleId, resultsPath, opts.battleType, opts.prompt, battle!.created_at,
|
||||
contestantRows.map((c) => ({ identity: c.identity, model: c.model, lane: c.lane })),
|
||||
null,
|
||||
).catch((err) => {
|
||||
log.warn({ err: errMsg(err), battleId }, 'arena-runner: initial manifest write failed');
|
||||
});
|
||||
|
||||
publishUser({
|
||||
type: 'battle_started',
|
||||
battle_id: battleId,
|
||||
battle_type: opts.battleType,
|
||||
prompt: opts.prompt,
|
||||
contestants: contestantRows.map((c) => ({
|
||||
id: c.id,
|
||||
identity: c.identity,
|
||||
model: c.model,
|
||||
lane: c.lane,
|
||||
})),
|
||||
});
|
||||
|
||||
// Dispatch: cloud lane starts all contestants in parallel; local lane starts
|
||||
// only the first queued contestant (serial queue).
|
||||
let localStarted = false;
|
||||
for (const c of contestantRows) {
|
||||
if (c.lane === 'cloud') {
|
||||
await dispatchContestant(battleId, opts.projectId, opts.battleType, opts.prompt, c);
|
||||
} else if (!localStarted) {
|
||||
await dispatchContestant(battleId, opts.projectId, opts.battleType, opts.prompt, c);
|
||||
localStarted = true;
|
||||
// remaining local contestants stay 'queued' until this one finishes
|
||||
}
|
||||
}
|
||||
|
||||
return { battleId };
|
||||
}
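Review note: the initial dispatch rule in `startBattle` (every cloud contestant, plus only the first local one) can be expressed as a pure selection function. This is a sketch under that reading; `initialWaveSketch`, `SlotSketch`, and the contestant ids are hypothetical.

```typescript
type LaneKindSketch = 'local' | 'cloud';

interface SlotSketch {
  id: string;
  lane: LaneKindSketch;
}

// Returns the ids that would be dispatched at battle start, in insertion order.
function initialWaveSketch(slots: readonly SlotSketch[]): string[] {
  const wave: string[] = [];
  let localStarted = false;
  for (const slot of slots) {
    if (slot.lane === 'cloud') {
      wave.push(slot.id);   // cloud lane: every contestant starts immediately
    } else if (!localStarted) {
      wave.push(slot.id);   // local lane: only the first queued contestant starts
      localStarted = true;
    }
    // remaining local contestants stay queued until the running one terminates
  }
  return wave;
}

const initialWave = initialWaveSketch([
  { id: 'claude', lane: 'cloud' },
  { id: 'qwen', lane: 'local' },
  { id: 'opencode', lane: 'cloud' },
  { id: 'llama', lane: 'local' },
]);
// ['claude', 'qwen', 'opencode']
```

In the loop above each `dispatchContestant` call is awaited before the next one starts, so "parallel" describes the tasks once they have been inserted; the dispatch calls themselves are issued one after another.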
|
||||
|
||||
async function dispatchContestant(
|
||||
battleId: string,
|
||||
projectId: string,
|
||||
battleType: BattleType,
|
||||
prompt: string,
|
||||
c: { id: string; identity: string; model: string; lane: ContestantLane },
|
||||
): Promise<void> {
|
||||
const { taskId, sessionId } = await dispatch({
|
||||
projectId,
|
||||
contestantId: c.id,
|
||||
prompt,
|
||||
identity: c.identity,
|
||||
model: c.model,
|
||||
battleType,
|
||||
});
|
||||
await sql`
|
||||
UPDATE contestants
|
||||
SET task_id = ${taskId}, status = 'running', updated_at = clock_timestamp()
|
||||
WHERE id = ${c.id}
|
||||
`;
|
||||
publishContestantFrame(battleId, c.id, { status: 'running' });
|
||||
// Start the delta bridge in the background; unsubscribe when the contestant
|
||||
// terminates (teardownDeltaBridge called in handleTaskTerminal).
|
||||
void setupDeltaBridge(battleId, c.id, taskId, sessionId ?? null);
|
||||
}
|
||||
|
||||
// ─── local-lane advance (serialized per battle) ───────────────────────────
|
||||
|
||||
function advanceLocalLane(battleId: string): Promise<void> {
|
||||
const prev = advanceChain.get(battleId) ?? Promise.resolve();
|
||||
const next = prev
|
||||
.catch(() => {})
|
||||
.then(() =>
|
||||
advanceLocalLaneInner(battleId).catch((err) => {
|
||||
log.error({ err: errMsg(err), battleId }, 'arena-runner: advanceLocalLane failed');
|
||||
}),
|
||||
);
|
||||
advanceChain.set(battleId, next);
|
||||
void next.finally(() => {
|
||||
if (advanceChain.get(battleId) === next) advanceChain.delete(battleId);
|
||||
});
|
||||
return next;
|
||||
}
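Review note: `advanceLocalLane` serializes work per battle by chaining each call onto the previous promise stored in a map. The sketch below shows the same pattern in isolation; `serializeSketch`, `chainsSketch`, and the `'battle-1'` key are hypothetical, and errors are silently swallowed here where the real code logs them.

```typescript
const chainsSketch = new Map<string, Promise<void>>();

// Run `job` after every previously queued job for the same key has settled.
function serializeSketch(key: string, job: () => Promise<void>): Promise<void> {
  const prev = chainsSketch.get(key) ?? Promise.resolve();
  const next = prev
    .catch(() => {})                     // a failed predecessor must not block the queue
    .then(() => job().catch(() => {}));  // swallow this job's own failure as well
  chainsSketch.set(key, next);
  void next.finally(() => {
    // Drop the entry only if no later call has replaced it in the meantime.
    if (chainsSketch.get(key) === next) chainsSketch.delete(key);
  });
  return next;
}

const sleepSketch = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function serializeDemo(): Promise<string[]> {
  const order: string[] = [];
  await Promise.all([
    serializeSketch('battle-1', async () => {
      order.push('a:start');
      await sleepSketch(20);
      order.push('a:end');
    }),
    serializeSketch('battle-1', async () => {
      order.push('b:start');
      await sleepSketch(1);
      order.push('b:end');
    }),
  ]);
  return order; // a:start, a:end, b:start, b:end
}
```

Although job `b` is much shorter, it cannot start until `a` has finished, which is the property that keeps two near-simultaneous terminal callbacks from dispatching the same queued local contestant twice.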
|
||||
|
||||
async function advanceLocalLaneInner(battleId: string): Promise<void> {
|
||||
const battle = await loadBattle(battleId);
|
||||
if (!battle || battle.status !== 'running') return;
|
||||
|
||||
const contestants = await loadContestants(battleId);
|
||||
const slots: ContestantSlot[] = contestants.map((c) => ({
|
||||
id: c.id,
|
||||
lane: c.lane,
|
||||
status: c.status,
|
||||
}));
|
||||
|
||||
// Nothing to do if the local lane is still busy.
|
||||
const localRunning = slots.some((c) => c.lane === 'local' && c.status === 'running');
|
||||
if (localRunning) return;
|
||||
|
||||
const nextId = nextLocalContestant(slots);
|
||||
if (!nextId) return; // local queue is exhausted
|
||||
|
||||
const next = contestants.find((c) => c.id === nextId)!;
|
||||
await dispatchContestant(battleId, battle.project_id, battle.battle_type, battle.prompt, {
|
||||
id: next.id,
|
||||
identity: next.identity,
|
||||
model: next.model,
|
||||
lane: next.lane,
|
||||
});
|
||||
}
|
||||
|
||||
// ─── handleTaskTerminal ───────────────────────────────────────────────────
|
||||
|
||||
function handleTaskTerminal(taskId: string, state: string): void {
|
||||
void (async () => {
|
||||
// Look up which contestant owns this task (contestants_task_id_idx).
|
||||
const [row] = await sql<ContestantRow[]>`
|
||||
SELECT id, battle_id, identity, model, lane, task_id, worktree_id, status
|
||||
FROM contestants WHERE task_id = ${taskId}
|
||||
`;
|
||||
if (!row) return; // not an arena task — ignore
|
||||
if (row.status !== 'running') return; // already settled (idempotent)
|
||||
|
||||
const battle = await loadBattle(row.battle_id);
|
||||
|
||||
// Pull the task row for benchmark + output.
|
||||
const [task] = await sql<{
|
||||
chat_id: string | null;
|
||||
started_at: Date | null;
|
||||
ended_at: Date | null;
|
||||
cost_tokens: number | null;
|
||||
}[]>`SELECT chat_id, started_at, ended_at, cost_tokens FROM tasks WHERE id = ${taskId}`;
|
||||
|
||||
const endedAt = task?.ended_at ?? new Date();
|
||||
|
||||
if (state === 'completed') {
|
||||
const startedAt = task?.started_at ?? endedAt;
|
||||
const bench = computeBenchmark(startedAt, endedAt, task?.cost_tokens ?? null, row.lane);
|
||||
|
||||
const output = task?.chat_id ? await readChatOutput(task.chat_id) : '';
|
||||
|
||||
const resultPath = battle
|
||||
? await writeContestantResults(battle, row, output, bench).catch((err) => {
|
||||
log.warn({ err: errMsg(err), contestantId: row.id }, 'arena-runner: result write failed');
|
||||
return null;
|
||||
})
|
||||
: null;
|
||||
|
||||
await sql`
|
||||
UPDATE contestants
|
||||
SET status = 'done',
|
||||
duration_ms = ${Math.round(bench.durationMs)},
|
||||
tokens_per_sec = ${bench.tokensPerSec},
|
||||
cost_tokens = ${task?.cost_tokens ?? null},
|
||||
result_path = ${resultPath},
|
||||
updated_at = clock_timestamp()
|
||||
WHERE id = ${row.id} AND status = 'running'
|
||||
`;
|
||||
teardownDeltaBridge(row.id);
|
||||
|
||||
// Check if this was the last contestant.
|
||||
const allContestants = await loadContestants(row.battle_id);
|
||||
const battleDone = isBattleComplete(allContestants);
|
||||
|
||||
publishContestantFrame(row.battle_id, row.id, {
|
||||
status: 'done',
|
||||
duration_ms: Math.round(bench.durationMs),
|
||||
...(bench.tokensPerSec !== null ? { tokens_per_sec: bench.tokensPerSec } : {}),
|
||||
...(battleDone ? { battle_status: 'completed' } : {}),
|
||||
});
|
||||
|
||||
if (battleDone) {
|
||||
await completeBattle(row.battle_id);
|
||||
} else if (row.lane === 'local') {
|
||||
void advanceLocalLane(row.battle_id);
|
||||
}
|
||||
} else {
|
||||
// failed or cancelled: the battle continues, and this contestant is marked 'error'.
|
||||
const errorMsg = state === 'cancelled' ? 'cancelled' : `task ${state}`;
|
||||
await sql`
|
||||
UPDATE contestants
|
||||
SET status = 'error', error = ${errorMsg}, updated_at = clock_timestamp()
|
||||
WHERE id = ${row.id} AND status = 'running'
|
||||
`;
|
||||
teardownDeltaBridge(row.id);
|
||||
|
||||
const allContestants = await loadContestants(row.battle_id);
|
||||
const battleDone = isBattleComplete(allContestants);
|
||||
|
||||
publishContestantFrame(row.battle_id, row.id, {
|
||||
status: 'error',
|
||||
error: errorMsg,
|
||||
...(battleDone ? { battle_status: 'completed' } : {}),
|
||||
});
|
||||
|
||||
if (battleDone) {
|
||||
await completeBattle(row.battle_id);
|
||||
} else if (row.lane === 'local') {
|
||||
void advanceLocalLane(row.battle_id);
|
||||
}
|
||||
}
|
||||
})().catch((err) => {
|
||||
log.error({ err: errMsg(err), taskId }, 'arena-runner: handleTaskTerminal failed');
|
||||
});
|
||||
}
|
||||
|
||||
// ─── battle finalization ──────────────────────────────────────────────────
|
||||
|
||||
async function completeBattle(battleId: string): Promise<void> {
|
||||
const updated = await sql`
|
||||
UPDATE battles SET status = 'completed', updated_at = clock_timestamp()
|
||||
WHERE id = ${battleId} AND status = 'running'
|
||||
`;
|
||||
if (updated.count === 0) return; // already terminal (race guard)
|
||||
log.info({ battleId }, 'arena-runner: battle completed');
|
||||
|
||||
// Update manifest with finished_at timestamp.
|
||||
const completedBattle = await loadBattle(battleId);
|
||||
if (completedBattle?.results_path) {
|
||||
const contestants = await loadContestants(battleId);
|
||||
await writeManifest(
|
||||
battleId,
|
||||
completedBattle.results_path,
|
||||
completedBattle.battle_type,
|
||||
completedBattle.prompt,
|
||||
completedBattle.created_at,
|
||||
contestants.map((c) => ({ identity: c.identity, model: c.model, lane: c.lane })),
|
||||
new Date(),
|
||||
).catch((err) => {
|
||||
log.warn({ err: errMsg(err), battleId }, 'arena-runner: manifest update failed');
|
||||
});
|
||||
}
|
||||
|
||||
onBattleComplete(battleId);
|
||||
}
|
||||
|
||||
// ─── manifest writer ─────────────────────────────────────────────────────
|
||||
|
||||
async function writeManifest(
|
||||
battleId: string,
|
||||
resultsPath: string,
|
||||
battleType: BattleType,
|
||||
prompt: string,
|
||||
createdAt: Date,
|
||||
contestants: Array<{ identity: string; model: string; lane: ContestantLane }>,
|
||||
finishedAt: Date | null,
|
||||
): Promise<void> {
|
||||
await mkdir(resultsPath, { recursive: true });
|
||||
const manifest = {
|
||||
id: battleId,
|
||||
battle_type: battleType,
|
||||
prompt,
|
||||
contestants,
|
||||
created_at: createdAt.toISOString(),
|
||||
finished_at: finishedAt?.toISOString() ?? null,
|
||||
};
|
||||
await writeFile(join(resultsPath, 'manifest.json'), JSON.stringify(manifest, null, 2), 'utf8');
|
||||
}
|
||||
|
||||
// ─── results writer ───────────────────────────────────────────────────────
|
||||
|
||||
async function writeContestantResults(
|
||||
battle: BattleRow,
|
||||
contestant: { identity: string; model: string; lane: ContestantLane; worktree_id: string | null },
|
||||
output: string,
|
||||
bench: { durationMs: number; tokensPerSec: number | null },
|
||||
): Promise<string> {
|
||||
const resultsPath = await getOrBuildResultsPath(battle);
|
||||
if (!resultsPath) throw new Error('cannot resolve results path for battle ' + battle.id);
|
||||
|
||||
const contestantDir = buildContestantDir(contestant.identity, contestant.model);
|
||||
const dir = join(resultsPath, contestantDir);
|
||||
await mkdir(dir, { recursive: true });
|
||||
|
||||
const benchLines = [
|
||||
`duration: ${bench.durationMs}ms`,
|
||||
bench.tokensPerSec != null ? `tokens/sec: ${bench.tokensPerSec.toFixed(1)}` : null,
|
||||
]
|
||||
.filter(Boolean)
|
||||
.join('\n');
|
||||
|
||||
const resultMd =
|
||||
`# ${contestant.identity} / ${contestant.model}\n\n` +
|
||||
`## Benchmark\n\n${benchLines}\n\n` +
|
||||
`## Output\n\n${output}\n`;
|
||||
await writeFile(join(dir, 'result.md'), resultMd, 'utf8');
|
||||
|
||||
if (battle.battle_type === 'coding' && contestant.worktree_id) {
|
||||
const [wt] = await sql<{ path: string; base_commit: string | null }[]>`
|
||||
SELECT path, base_commit FROM worktrees WHERE id = ${contestant.worktree_id}
|
||||
`;
|
||||
if (wt) {
|
||||
const [proj] = await sql<{ path: string }[]>`
|
||||
SELECT path FROM projects WHERE id = ${battle.project_id}
|
||||
`;
|
||||
if (proj) {
|
||||
const diff = await diffWorktree(wt.path, proj.path, {
|
||||
baseRef: wt.base_commit ?? undefined,
|
||||
}).catch(() => '');
|
||||
await writeFile(join(dir, 'diff.patch'), diff, 'utf8');
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return dir;
|
||||
}
|
||||
|
||||
/** Resolve or rebuild results_path for a battle (handles crash-before-UPDATE). */
|
||||
async function getOrBuildResultsPath(battle: BattleRow): Promise<string | null> {
|
||||
if (battle.results_path) return battle.results_path;
|
||||
const [proj] = await sql<{ path: string }[]>`SELECT path FROM projects WHERE id = ${battle.project_id}`;
|
||||
if (!proj) return null;
|
||||
const slug = buildBattleSlug(battle.id, battle.battle_type, battle.created_at);
|
||||
const resultsPath = join(proj.path, 'Arena', slug);
|
||||
await sql`
|
||||
UPDATE battles SET results_path = ${resultsPath}, updated_at = clock_timestamp()
|
||||
WHERE id = ${battle.id}
|
||||
`;
|
||||
return resultsPath;
|
||||
}
|
||||
|
||||
// ─── helpers ──────────────────────────────────────────────────────────────
|
||||
|
||||
async function readChatOutput(chatId: string): Promise<string> {
|
||||
const [m] = await sql<{ content: string | null }[]>`
|
||||
SELECT content FROM messages
|
||||
WHERE chat_id = ${chatId} AND role = 'assistant'
|
||||
ORDER BY created_at DESC LIMIT 1
|
||||
`;
|
||||
return m?.content ?? '';
|
||||
}
|
||||
|
||||
async function loadBattle(battleId: string): Promise<BattleRow | null> {
|
||||
const [b] = await sql<BattleRow[]>`
|
||||
SELECT id, project_id, battle_type, prompt, status, results_path, created_at
|
||||
FROM battles WHERE id = ${battleId}
|
||||
`;
|
||||
return b ?? null;
|
||||
}
|
||||
|
||||
async function loadContestants(battleId: string): Promise<ContestantRow[]> {
|
||||
return sql<ContestantRow[]>`
|
||||
SELECT id, battle_id, identity, model, lane, task_id, worktree_id, status
|
||||
FROM contestants WHERE battle_id = ${battleId}
|
||||
ORDER BY created_at ASC
|
||||
`;
|
||||
}
|
||||
|
||||
function publishContestantFrame(
|
||||
battleId: string,
|
||||
contestantId: string,
|
||||
extra: Record<string, unknown>,
|
||||
): void {
|
||||
publishUser({
|
||||
type: 'contestant_updated',
|
||||
battle_id: battleId,
|
||||
contestant_id: contestantId,
|
||||
...extra,
|
||||
});
|
||||
}
|
||||
|
||||
// ─── initResume ───────────────────────────────────────────────────────────
|
||||
|
||||
async function initResume(): Promise<void> {
|
||||
const battles = await sql<BattleRow[]>`
|
||||
SELECT id, project_id, battle_type, prompt, status, results_path, created_at
|
||||
FROM battles WHERE status = 'running'
|
||||
`;
|
||||
if (battles.length === 0) return;
|
||||
log.info({ count: battles.length }, 'arena-runner: resuming in-flight battles on startup');
|
||||
for (const battle of battles) {
|
||||
await resumeBattle(battle).catch((err) => {
|
||||
log.error({ err: errMsg(err), battleId: battle.id }, 'arena-runner: initResume failed for battle');
|
||||
});
|
||||
}
|
||||
}
|
||||
|
||||
async function resumeBattle(battle: BattleRow): Promise<void> {
|
||||
const contestants = await loadContestants(battle.id);
|
||||
|
||||
const taskIds = contestants.map((c) => c.task_id).filter((id): id is string => id !== null);
|
||||
const taskStates = new Map<string, string>();
|
||||
if (taskIds.length > 0) {
|
||||
const tasks = await sql<{ id: string; state: string }[]>`
|
||||
SELECT id, state FROM tasks WHERE id = ANY(${taskIds})
|
||||
`;
|
||||
for (const t of tasks) taskStates.set(t.id, t.state);
|
||||
}
|
||||
|
||||
const decisions = reconcileContestants(
|
||||
contestants.map((c) => ({ contestantId: c.id, taskId: c.task_id, status: c.status })),
|
||||
taskStates,
|
||||
);
|
||||
|
||||
for (const decision of decisions) {
|
||||
if (decision.action === 'keep') continue;
|
||||
const contestant = contestants.find((c) => c.id === decision.contestantId)!;
|
||||
await applyResumeDecision(battle, contestant, decision.action);
|
||||
}
|
||||
|
||||
// Re-check completion after applying decisions.
|
||||
const updated = await loadContestants(battle.id);
|
||||
if (isBattleComplete(updated)) {
|
||||
await completeBattle(battle.id);
|
||||
} else {
|
||||
// Advance local lane in case a slot opened up.
|
||||
void advanceLocalLane(battle.id);
|
||||
}
|
||||
|
||||
log.info({ battleId: battle.id }, 'arena-runner: battle resumed');
|
||||
}
|
||||
|
||||
async function applyResumeDecision(
|
||||
battle: BattleRow,
|
||||
contestant: ContestantRow,
|
||||
action: ContestantResumeAction,
|
||||
): Promise<void> {
|
||||
switch (action) {
|
||||
case 'keep': break;
|
||||
|
||||
case 'mark-done': {
|
||||
const taskRow = contestant.task_id
|
||||
? (await sql<{ started_at: Date | null; ended_at: Date | null; cost_tokens: number | null; chat_id: string | null }[]>`
|
||||
SELECT started_at, ended_at, cost_tokens, chat_id FROM tasks WHERE id = ${contestant.task_id}`)[0]
|
||||
: null;
|
||||
const endedAt = taskRow?.ended_at ?? new Date();
|
||||
const startedAt = taskRow?.started_at ?? endedAt;
|
||||
const bench = computeBenchmark(startedAt, endedAt, taskRow?.cost_tokens ?? null, contestant.lane);
|
||||
const output = taskRow?.chat_id ? await readChatOutput(taskRow.chat_id) : '';
|
||||
// `battle` is a non-nullable BattleRow here, so no null guard is needed.
const resultPath = await writeContestantResults(battle, contestant, output, bench).catch((err) => {
  log.warn({ err: errMsg(err), contestantId: contestant.id }, 'arena-runner: resume result write failed');
  return null;
});
|
||||
await sql`
|
||||
UPDATE contestants
|
||||
SET status = 'done',
|
||||
duration_ms = ${Math.round(bench.durationMs)},
|
||||
tokens_per_sec = ${bench.tokensPerSec},
|
||||
result_path = ${resultPath},
|
||||
updated_at = clock_timestamp()
|
||||
WHERE id = ${contestant.id}
|
||||
`;
|
||||
break;
|
||||
}
|
||||
|
||||
case 'mark-error':
|
||||
await sql`
|
||||
UPDATE contestants
|
||||
SET status = 'error', error = 'task failed before callback',
|
||||
updated_at = clock_timestamp()
|
||||
WHERE id = ${contestant.id}
|
||||
`;
|
||||
break;
|
||||
|
||||
case 'mark-cancelled':
|
||||
await sql`
|
||||
UPDATE contestants
|
||||
SET status = 'error', error = 'cancelled before callback',
|
||||
updated_at = clock_timestamp()
|
||||
WHERE id = ${contestant.id}
|
||||
`;
|
||||
break;
|
||||
|
||||
case 're-dispatch': {
|
||||
const { taskId } = await dispatch({
|
||||
projectId: battle.project_id,
|
||||
contestantId: contestant.id,
|
||||
prompt: battle.prompt,
|
||||
identity: contestant.identity,
|
||||
model: contestant.model,
|
||||
battleType: battle.battle_type,
|
||||
});
|
||||
await sql`
|
||||
UPDATE contestants
|
||||
SET task_id = ${taskId}, updated_at = clock_timestamp()
|
||||
WHERE id = ${contestant.id}
|
||||
`;
|
||||
log.info(
|
||||
{ battleId: battle.id, contestantId: contestant.id, taskId },
|
||||
'arena-runner: contestant re-dispatched on resume',
|
||||
);
|
||||
break;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// ─── cancelBattle ─────────────────────────────────────────────────────────
|
||||
|
||||
async function cancelBattle(battleId: string): Promise<{ cancelled: boolean; taskIds: string[] }> {
|
||||
const updated = await sql`
|
||||
UPDATE battles SET status = 'cancelled', updated_at = clock_timestamp()
|
||||
WHERE id = ${battleId} AND status = 'running'
|
||||
`;
|
||||
if (updated.count === 0) return { cancelled: false, taskIds: [] };
|
||||
|
||||
// Mark all non-terminal contestants as errored ('battle cancelled') and collect in-flight task_ids.
|
||||
const contestants = await sql<{ id: string; task_id: string | null; status: string }[]>`
|
||||
SELECT id, task_id, status FROM contestants
|
||||
WHERE battle_id = ${battleId} AND status NOT IN ('done', 'error')
|
||||
`;
|
||||
|
||||
if (contestants.length > 0) {
|
||||
await sql`
|
||||
UPDATE contestants
|
||||
SET status = 'error', error = 'battle cancelled', updated_at = clock_timestamp()
|
||||
WHERE battle_id = ${battleId} AND status NOT IN ('done', 'error')
|
||||
`;
|
||||
for (const c of contestants) {
|
||||
publishContestantFrame(battleId, c.id, {
|
||||
status: 'error',
|
||||
error: 'battle cancelled',
|
||||
battle_status: 'cancelled',
|
||||
});
|
||||
}
|
||||
}
|
||||
|
||||
const taskIds = contestants
|
||||
.filter(
|
||||
(c): c is typeof c & { task_id: string } =>
|
||||
c.task_id !== null && c.status === 'running',
|
||||
)
|
||||
.map((c) => c.task_id);
|
||||
|
||||
log.info({ battleId }, 'arena-runner: battle cancelled by request');
|
||||
return { cancelled: true, taskIds };
|
||||
}
|
||||
|
||||
// ─── triggerAnalysis (Phase 5 seam) ──────────────────────────────────────
|
||||
|
||||
async function triggerAnalysis(battleId: string): Promise<{ triggered: boolean }> {
|
||||
const battle = await loadBattle(battleId);
|
||||
if (!battle) return { triggered: false };
|
||||
log.info({ battleId }, 'arena-runner: triggerAnalysis requested');
|
||||
// Calls the injected onBattleComplete seam — Phase 5 replaces this with the
|
||||
// real two-stage digest→judge analyzer (see ADR 0002 + plan Phase 5).
|
||||
onBattleComplete(battleId);
|
||||
return { triggered: true };
|
||||
}
|
||||
|
||||
// ─── startCrossExam (Phase 5 seam) ───────────────────────────────────────
|
||||
|
||||
async function startCrossExam(
|
||||
battleId: string,
|
||||
opts: { identity: string; model: string },
|
||||
): Promise<{ crossExamId: string }> {
|
||||
const [row] = await sql<{ id: string }[]>`
|
||||
INSERT INTO cross_examinations (battle_id, identity, model)
|
||||
VALUES (${battleId}, ${opts.identity}, ${opts.model})
|
||||
RETURNING id
|
||||
`;
|
||||
const crossExamId = row!.id;
|
||||
log.info({ battleId, crossExamId, ...opts }, 'arena-runner: cross-exam inserted, triggering analyzer');
|
||||
if (onCrossExamStart) {
|
||||
try {
|
||||
onCrossExamStart({ battleId, crossExamId, identity: opts.identity, model: opts.model });
|
||||
} catch (err) {
|
||||
log.error({ err: errMsg(err), battleId, crossExamId }, 'arena-runner: onCrossExamStart threw');
|
||||
}
|
||||
}
|
||||
return { crossExamId };
|
||||
}
|
||||
|
||||
// ─── setWinner (user override) ────────────────────────────────────────────
|
||||
|
||||
async function setWinner(
|
||||
battleId: string,
|
||||
winnerId: string | null,
|
||||
): Promise<{ ok: boolean; notFound?: boolean; invalidContestant?: boolean }> {
|
||||
const [row] = await sql<{ id: string }[]>`SELECT id FROM battles WHERE id = ${battleId}`;
|
||||
if (!row) return { ok: false, notFound: true };
|
||||
|
||||
if (winnerId !== null) {
|
||||
const [c] = await sql<{ id: string }[]>`
|
||||
SELECT id FROM contestants WHERE id = ${winnerId} AND battle_id = ${battleId}
|
||||
`;
|
||||
if (!c) return { ok: false, invalidContestant: true };
|
||||
}
|
||||
|
||||
await sql`
|
||||
UPDATE battles SET winner_contestant_id = ${winnerId}, updated_at = clock_timestamp()
|
||||
WHERE id = ${battleId}
|
||||
`;
|
||||
publishUser({ type: 'battle_updated', battle_id: battleId, winner_contestant_id: winnerId });
|
||||
return { ok: true };
|
||||
}
|
||||
|
||||
return { startBattle, handleTaskTerminal, initResume, cancelBattle, triggerAnalysis, startCrossExam, setWinner };
|
||||
}
|
||||
|
||||
function errMsg(e: unknown): string {
|
||||
return e instanceof Error ? e.message : String(e);
|
||||
}
|
||||
@@ -12,12 +12,48 @@ import { homedir } from 'node:os';
 import { join } from 'node:path';
 import type { AgentCommand } from './provider-types.js';

-/** Minimal frontmatter reader: single-line `key: value` between `---` fences. */
+/**
+ * Frontmatter reader between `---` fences. Handles single-line `key: value`
+ * AND YAML block scalars (`key: >` folded / `key: |` literal) whose value
+ * spans the following more-indented lines, the shape most plugin SKILL.md
+ * descriptions use (`description: >`).
+ */
 function frontmatterField(content: string, field: string): string | undefined {
   const block = content.match(/^---\r?\n([\s\S]*?)\r?\n---/);
   if (!block?.[1]) return undefined;
-  const m = block[1].match(new RegExp(`^${field}:\\s*(.+)$`, 'm'));
-  return m?.[1]?.trim().replace(/^["']|["']$/g, '') || undefined;
+  const lines = block[1].split(/\r?\n/);
+  const keyRe = new RegExp(`^(\\s*)${field}:\\s*(.*)$`);
+  for (let i = 0; i < lines.length; i++) {
+    const m = lines[i]?.match(keyRe);
+    if (!m) continue;
+    const keyIndent = (m[1] ?? '').length;
+    const inline = (m[2] ?? '').trim();
+    // Block scalar: `>` (folded) or `|` (literal), optional chomping `+`/`-`.
+    if (/^[>|][+-]?$/.test(inline)) {
+      const folded = inline[0] === '>';
+      const body: string[] = [];
+      for (let j = i + 1; j < lines.length; j++) {
+        const line = lines[j] ?? '';
+        if (line.trim() === '') {
+          body.push('');
+          continue;
+        }
+        const indent = line.length - line.trimStart().length;
+        if (indent <= keyIndent) break; // dedent ends the block
+        body.push(line.slice(keyIndent + 1));
+      }
+      const joined = folded
+        ? body
+            .map((l) => l.trim())
+            .join(' ')
+            .replace(/\s+/g, ' ')
+            .trim()
+        : body.join('\n').replace(/\n+$/, '');
+      return joined || undefined;
+    }
+    return inline.replace(/^["']|["']$/g, '').trim() || undefined;
+  }
+  return undefined;
 }

 function readCommandDir(dir: string): AgentCommand[] {
|
||||
|
||||
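The folded-scalar handling in the hunk above can be illustrated with a small standalone sketch. `readField` and the `skill` document below are hypothetical names made up for this example; the sketch covers only plain values and the `>` folded form, and it is not the project's `frontmatterField`.

```typescript
// Hypothetical helper for illustration only: plain `key: value` plus `key: >` folded scalars.
function readField(doc: string, field: string): string | undefined {
  const block = doc.match(/^---\r?\n([\s\S]*?)\r?\n---/);
  if (!block || block[1] === undefined) return undefined;
  const lines = block[1].split(/\r?\n/);
  const keyRe = new RegExp(`^(\\s*)${field}:\\s*(.*)$`);
  for (let i = 0; i < lines.length; i++) {
    const m = (lines[i] ?? '').match(keyRe);
    if (!m) continue;
    const keyIndent = (m[1] ?? '').length;
    const inline = (m[2] ?? '').trim();
    if (inline !== '>') {
      // Plain single-line value: strip one pair of surrounding quotes.
      return inline.replace(/^["']|["']$/g, '') || undefined;
    }
    // Folded scalar: collect the more-indented lines and join them with spaces.
    const parts: string[] = [];
    for (let j = i + 1; j < lines.length; j++) {
      const line = lines[j] ?? '';
      if (line.trim() === '') continue; // blank lines fold away in this sketch
      const indent = line.search(/\S/); // column of the first non-space character
      if (indent <= keyIndent) break; // a dedent ends the block
      parts.push(line.trim());
    }
    return parts.join(' ') || undefined;
  }
  return undefined;
}

// Made-up SKILL.md style document.
const skill = [
  '---',
  'name: demo-skill',
  'description: >',
  '  Summarizes a repository',
  '  in two sentences.',
  'version: "1.0"',
  '---',
  '# Body',
].join('\n');

console.log(readField(skill, 'description'));
console.log(readField(skill, 'version'));
```

The dedent check is what stops the folded value from swallowing the `version` key that follows it.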
@@ -4,6 +4,7 @@ import type { Broker } from '@boocode/server/broker';
|
||||
import type { WsFrame } from '@boocode/contracts/ws-frames';
|
||||
import type { Config } from '../config.js';
|
||||
import { createWorktree, diffWorktree, cleanupWorktree, ensureSessionWorktree } from './worktrees.js';
|
||||
import { asPermissionMode } from './tools/types.js';
|
||||
import { createCheckpoint } from './checkpoints.js';
|
||||
import { makeDcpStreamStripper } from './dcp-strip.js';
|
||||
import { dispatchViaAcp } from './acp-dispatch.js';
|
||||
@@ -31,7 +32,13 @@ import {
|
||||
import { shouldFailOnMissingAgent } from './flow-runner-decisions.js';
|
||||
|
||||
interface InferenceRunner {
|
||||
enqueue: (sessionId: string, chatId: string, assistantId: string, user: string) => void;
|
||||
enqueue: (
|
||||
sessionId: string,
|
||||
chatId: string,
|
||||
assistantId: string,
|
||||
user: string,
|
||||
permissionMode?: 'plan' | 'ask' | 'bypass',
|
||||
) => void;
|
||||
cancel: (sessionId: string, chatId: string) => Promise<boolean>;
|
||||
hasActive: (chatId: string) => boolean;
|
||||
}
|
||||
@@ -305,10 +312,13 @@ export function createDispatcher(deps: Deps): {
|
||||
|
||||
// ─── Path A: Native Inference ───────────────────────────────────────────────
|
||||
|
||||
async function runNativeInference(task: { id: string; project_id: string; input: string; agent: string | null; model: string | null; session_id: string | null }): Promise<void> {
|
||||
async function runNativeInference(task: { id: string; project_id: string; input: string; agent: string | null; model: string | null; mode_id: string | null; session_id: string | null }): Promise<void> {
|
||||
const taskId = task.id;
|
||||
log.info({ taskId }, 'dispatcher: starting task (path A — native)');
|
||||
|
||||
// Declared before try so the catch block can write it back on the task row.
|
||||
let chatId: string | null = null;
|
||||
|
||||
try {
|
||||
// Mark running
|
||||
await sql`
|
||||
@@ -317,26 +327,29 @@ export function createDispatcher(deps: Deps): {
|
||||
WHERE id = ${taskId}
|
||||
`;
|
||||
|
||||
// Create session + chat for this task
|
||||
// Session setup: reuse a pre-created session (e.g. Q&A arena contestants
|
||||
// whose persona is stamped on the session via agent_id) or create a fresh one.
|
||||
const model = task.model ?? config.DEFAULT_MODEL;
|
||||
const sessionName = 'Task: ' + task.input.slice(0, 40);
|
||||
|
||||
const [session] = await sql<{ id: string }[]>`
|
||||
INSERT INTO sessions (project_id, name, model, status)
|
||||
VALUES (${task.project_id}, ${sessionName}, ${model}, 'open')
|
||||
RETURNING id
|
||||
`;
|
||||
const sessionId = session!.id;
|
||||
let sessionId: string;
|
||||
if (task.session_id) {
|
||||
sessionId = task.session_id;
|
||||
} else {
|
||||
const sessionName = 'Task: ' + task.input.slice(0, 40);
|
||||
const [session] = await sql<{ id: string }[]>`
|
||||
INSERT INTO sessions (project_id, name, model, status)
|
||||
VALUES (${task.project_id}, ${sessionName}, ${model}, 'open')
|
||||
RETURNING id
|
||||
`;
|
||||
sessionId = session!.id;
|
||||
await sql`UPDATE tasks SET session_id = ${sessionId} WHERE id = ${taskId}`;
|
||||
}
|
||||
|
||||
const [chat] = await sql<{ id: string }[]>`
|
||||
INSERT INTO chats (session_id, name, status)
|
||||
VALUES (${sessionId}, 'Task execution', 'open')
|
||||
RETURNING id
|
||||
`;
|
||||
const chatId = chat!.id;
|
||||
|
||||
// Link task to session
|
||||
await sql`UPDATE tasks SET session_id = ${sessionId} WHERE id = ${taskId}`;
|
||||
chatId = chat!.id;
|
||||
|
||||
// Create user message + streaming assistant
|
||||
await sql<{ id: string }[]>`
|
||||
@@ -351,8 +364,9 @@ export function createDispatcher(deps: Deps): {
|
||||
`;
|
||||
const assistantId = assistantMsg!.id;
|
||||
|
||||
// Enqueue inference
|
||||
inference.enqueue(sessionId, chatId, assistantId, 'default');
|
||||
// Enqueue inference — pass the native permission gate (plan/ask/bypass)
|
||||
// through to the write-tool context. Non-unified mode ids → undefined.
|
||||
inference.enqueue(sessionId, chatId, assistantId, 'default', asPermissionMode(task.mode_id));
|
||||
|
||||
// Wait for inference to complete (poll message status)
|
||||
const finalStatus = await waitForCompletion(assistantId);
|
||||
@@ -381,7 +395,7 @@ export function createDispatcher(deps: Deps): {
|
||||
const summary = (msg?.content ?? '').slice(0, 500);
|
||||
await sql`
|
||||
UPDATE tasks
|
||||
SET state = 'completed', ended_at = clock_timestamp(), output_summary = ${summary}, cost_tokens = ${costTokens}
|
||||
SET state = 'completed', ended_at = clock_timestamp(), output_summary = ${summary}, cost_tokens = ${costTokens}, chat_id = ${chatId}
|
||||
WHERE id = ${taskId}
|
||||
`;
|
||||
log.info({ taskId, costTokens }, 'dispatcher: task completed (native)');
|
||||
@@ -392,7 +406,7 @@ export function createDispatcher(deps: Deps): {
|
||||
const summary = (msg?.content ?? 'Inference failed').slice(0, 500);
|
||||
await sql`
|
||||
UPDATE tasks
|
||||
SET state = 'failed', ended_at = clock_timestamp(), output_summary = ${summary}, cost_tokens = ${costTokens}
|
||||
SET state = 'failed', ended_at = clock_timestamp(), output_summary = ${summary}, cost_tokens = ${costTokens}, chat_id = ${chatId}
|
||||
WHERE id = ${taskId}
|
||||
`;
|
||||
log.warn({ taskId, finalStatus }, 'dispatcher: task failed (native)');
|
||||
@@ -402,7 +416,7 @@ export function createDispatcher(deps: Deps): {
|
||||
log.error({ taskId, err: errMsg }, 'dispatcher: task error (native)');
|
||||
await sql`
|
||||
UPDATE tasks
|
||||
SET state = 'failed', ended_at = clock_timestamp(), output_summary = ${errMsg.slice(0, 500)}
|
||||
SET state = 'failed', ended_at = clock_timestamp(), output_summary = ${errMsg.slice(0, 500)}, chat_id = ${chatId}
|
||||
WHERE id = ${taskId}
|
||||
`.catch(() => {});
|
||||
}
|
||||
|
||||
@@ -21,7 +21,16 @@
|
||||
// punctuation to ASCII on both sides; the match is
|
||||
// mapped back to original offsets.
|
||||
// 4. levenshtein — best line-window by normalized edit-distance
|
||||
// similarity; accepted only at >= SIMILARITY_THRESHOLD.
|
||||
// similarity; accepted only at >= SIMILARITY_THRESHOLD,
|
||||
// anchored on an exact first+last line for multi-line
|
||||
// needles, and REFUSED (ambiguous) when a second window
|
||||
// scores within AMBIGUITY_EPSILON of the best. Like the
|
||||
// exact/whitespace tiers, this tier fails CLOSED — it
|
||||
// never splices over a merely-plausible guess, because a
|
||||
// wrong-window splice corrupts the file (it leaves the
|
||||
// real target intact and duplicates it). This mirrors
|
||||
// opencode/cline/qwen, whose fuzzy tiers all keep the
|
||||
// unique-match requirement rather than picking a winner.
|
||||
//
|
||||
// Pure and dependency-free (Levenshtein is the standard iterative two-row DP),
|
||||
// reimplemented from the general technique — no vendored source.
|
||||
@@ -31,8 +40,31 @@ export type MatchResult =
|
||||
| { kind: 'ambiguous'; count: number }
|
||||
| { kind: 'not_found' };
|
||||
|
||||
/** Levenshtein similarity floor for the final fuzzy fallback (strategy 4). */
|
||||
export const SIMILARITY_THRESHOLD = 0.66;
|
||||
/**
|
||||
* Levenshtein similarity floor for the final fuzzy fallback (strategy 4).
|
||||
* 0.66 was far too low — at two-thirds similarity a structurally-wrong window
|
||||
* (e.g. one of three near-identical form blocks) clears the bar and gets spliced
|
||||
* over, leaving the real target intact and duplicated. Competent agents anchor
|
||||
* far tighter (opencode's BlockAnchor needs an exact anchor; cline needs exact
|
||||
* first+last lines). 0.85 keeps genuine quantized-model drift (a typo, an indent
|
||||
* shift) while refusing a different block.
|
||||
*/
|
||||
export const SIMILARITY_THRESHOLD = 0.85;
|
||||
|
||||
/**
|
||||
* If a second candidate window scores within this of the best, the match is
|
||||
* ambiguous and tier 4 refuses rather than guessing — the same fail-closed
|
||||
* stance the exact and whitespace tiers take on multiple hits. Repetitive files
|
||||
* (the duplicate-block corruption case) produce near-tied windows; this is what
|
||||
* turns that into a clean "add more context" error instead of a wrong splice.
|
||||
*/
|
||||
export const AMBIGUITY_EPSILON = 0.05;
|
||||
|
||||
/** Multi-line needles at or above this length must anchor on an exact (after
|
||||
* trim + unicode-fold) first AND last line before similarity is even scored —
|
||||
* the cline/opencode block-anchor rule. Below it, threshold + uniqueness alone
|
||||
* guard the match. */
|
||||
const ANCHOR_MIN_LINES = 3;
|
||||
|
||||
export function locateMatch(content: string, needle: string): MatchResult {
|
||||
// Empty needle has no meaningful match.
|
||||
@@ -252,20 +284,39 @@ function locateByLevenshtein(content: string, needle: string): MatchResult | nul

   const needleJoined = needleLines.map((l) => l.trim()).join('\n');

-  let best = -1;
-  let bestSpan: { start: number; end: number } | null = null;
+  // Block-anchor gate for multi-line needles: the first and last lines must match
+  // exactly (after trim + unicode-fold) or the window is not even scored. This
+  // stops a high interior-similarity from dragging a structurally-wrong window
+  // over the threshold: the failure that duplicates blocks in repetitive files.
+  const anchored = n >= ANCHOR_MIN_LINES;
+  const needleFirst = canonicalize(needleLines[0]!.trim());
+  const needleLast = canonicalize(needleLines[n - 1]!.trim());
+
+  const scored: Array<{ score: number; start: number; end: number }> = [];
   for (let i = 0; i + n <= contentLines.length; i++) {
     const window = contentLines.slice(i, i + n);
-    const windowJoined = window.map((l) => l.text.trim()).join('\n');
-    const score = similarity(windowJoined, needleJoined);
-    if (score > best) {
-      best = score;
-      bestSpan = { start: window[0]!.start, end: window[n - 1]!.end };
+    if (anchored) {
+      const winFirst = canonicalize(window[0]!.text.trim());
+      const winLast = canonicalize(window[n - 1]!.text.trim());
+      if (winFirst !== needleFirst || winLast !== needleLast) continue;
     }
+    const windowJoined = window.map((l) => l.text.trim()).join('\n');
+    scored.push({
+      score: similarity(windowJoined, needleJoined),
+      start: window[0]!.start,
+      end: window[n - 1]!.end,
+    });
   }

-  if (bestSpan && best >= SIMILARITY_THRESHOLD) {
-    return { kind: 'fuzzy', start: bestSpan.start, end: bestSpan.end };
-  }
-  return null;
+  if (scored.length === 0) return null;
+  scored.sort((a, b) => b.score - a.score);
+  const best = scored[0]!;
+  if (best.score < SIMILARITY_THRESHOLD) return null;
+
+  // Uniqueness guard: refuse when a second window is within epsilon of the best.
+  // Fail closed (ambiguous) rather than silently splicing one of several lookalikes.
+  const tied = scored.filter((s) => s.score >= best.score - AMBIGUITY_EPSILON);
+  if (tied.length > 1) return { kind: 'ambiguous', count: tied.length };
+
+  return { kind: 'fuzzy', start: best.start, end: best.end };
 }
|
||||
|
||||
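A minimal sketch of the threshold-plus-uniqueness rule described above, assuming similarity is normalized as `1 - distance / longerLength`. The diff does not show the body of `similarity`, so that normalization is an assumption, and `pickUnique` and `Choice` are hypothetical names for this example. The two constants reuse the values from the diff.

```typescript
const THRESHOLD = 0.85; // SIMILARITY_THRESHOLD value from the diff
const EPSILON = 0.05; // AMBIGUITY_EPSILON value from the diff

// Standard iterative two-row Levenshtein edit distance.
function levenshtein(a: string, b: string): number {
  let prev: number[] = [];
  let curr: number[] = [];
  for (let j = 0; j <= b.length; j++) {
    prev.push(j);
    curr.push(0);
  }
  for (let i = 1; i <= a.length; i++) {
    curr[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j]! + 1, curr[j - 1]! + 1, prev[j - 1]! + cost);
    }
    [prev, curr] = [curr, prev];
  }
  return prev[b.length]!;
}

// Assumed normalization: 1 means identical, 0 means nothing in common.
function similarity(a: string, b: string): number {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest;
}

type Choice =
  | { kind: 'fuzzy'; index: number }
  | { kind: 'ambiguous'; count: number }
  | null;

// Accept the best score only if it clears the floor and no other score is within EPSILON of it.
function pickUnique(scores: number[]): Choice {
  if (scores.length === 0) return null;
  let bestIndex = 0;
  for (let i = 1; i < scores.length; i++) {
    if (scores[i]! > scores[bestIndex]!) bestIndex = i;
  }
  const best = scores[bestIndex]!;
  if (best < THRESHOLD) return null;
  const tied = scores.filter((s) => s >= best - EPSILON).length;
  return tied > 1 ? { kind: 'ambiguous', count: tied } : { kind: 'fuzzy', index: bestIndex };
}

console.log(similarity('const total = a + b;', 'const total = a + b'));
console.log(pickUnique([0.95, 0.93, 0.4]));
console.log(pickUnique([0.95, 0.6]));
```

Two near-tied windows produce `ambiguous` rather than a winner, which is the fail-closed behavior the comments above argue for.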
@@ -1,9 +1,120 @@
|
||||
import { readFile, writeFile, unlink, mkdir } from 'node:fs/promises';
|
||||
import { dirname } from 'node:path';
|
||||
import { readFile, writeFile, unlink, mkdir, rename, realpath } from 'node:fs/promises';
|
||||
import { dirname, join, basename } from 'node:path';
|
||||
import { randomBytes } from 'node:crypto';
|
||||
import type { Sql } from '../db.js';
|
||||
import { resolveWritePath } from './write_guard.js';
|
||||
import { locateMatch } from './fuzzy-match.js';
|
||||
|
||||
/**
|
||||
* Write a file atomically: stage to a sibling temp file, then rename over the
|
||||
* target. rename(2) on the same filesystem is atomic, so a crash mid-write can
|
||||
* never leave a half-written (truncated/corrupt) source file — readers see
|
||||
* either the old content or the complete new content. The temp lives in the same
|
||||
* directory to guarantee a same-filesystem rename.
|
||||
*
|
||||
* Symlinks: a plain writeFile FOLLOWS a symlink and writes through to its target;
|
||||
* a bare rename would REPLACE the link with a regular file. We realpath an
|
||||
* existing target first so the rename lands on the real file and the link
|
||||
* survives — preserving the prior follow-through behavior. A missing target
|
||||
* (create, or a broken link) just writes the literal path.
|
||||
*/
|
||||
async function writeFileAtomic(filePath: string, content: string): Promise<void> {
|
||||
let target = filePath;
|
||||
try {
|
||||
target = await realpath(filePath);
|
||||
} catch {
|
||||
// ENOENT (new file) or broken link — write the literal path.
|
||||
}
|
||||
const tmp = join(dirname(target), `.${basename(target)}.tmp.${process.pid}.${randomBytes(6).toString('hex')}`);
|
||||
await writeFile(tmp, content, 'utf8');
|
||||
try {
|
||||
await rename(tmp, target);
|
||||
} catch (err) {
|
||||
await unlink(tmp).catch(() => {});
|
||||
throw err;
|
||||
}
|
||||
}
|
||||
|
||||
/** Detect a file's dominant line ending so an edit can preserve it. */
|
||||
function detectEol(text: string): '\r\n' | '\n' {
|
||||
return text.includes('\r\n') ? '\r\n' : '\n';
|
||||
}
|
||||
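As a rough illustration of why the line ending is detected before editing, the sketch below normalizes to LF, applies an edit, and writes the result back in the file's original EOL. `editPreservingEol` and `toLf` are hypothetical names for this example; only `detectEol` mirrors the helper above.

```typescript
type Eol = '\r\n' | '\n';

// Same rule as the helper above: any CRLF in the text marks the file as CRLF.
function detectEol(text: string): Eol {
  return text.indexOf('\r\n') !== -1 ? '\r\n' : '\n';
}

function toLf(text: string): string {
  return text.replace(/\r\n/g, '\n');
}

// Run `edit` on LF-normalized text, then restore the file's original line ending.
function editPreservingEol(raw: string, edit: (lf: string) => string): string {
  const eol = detectEol(raw);
  const updated = toLf(edit(toLf(raw))); // guard against an edit that emits CRLF itself
  return eol === '\r\n' ? updated.replace(/\n/g, '\r\n') : updated;
}

// A CRLF file edited with LF-only replacement text stays CRLF throughout.
const crlfResult = editPreservingEol('a\r\nb\r\n', (t) => t.replace('b', 'c\nd'));
console.log(JSON.stringify(crlfResult));
```

Without the round trip, an LF-emitting model editing a CRLF file would leave mixed line endings behind.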
|
||||
/**
|
||||
* Serialize the read-modify-write of a single file so two concurrent applies
|
||||
* (e.g. two chat tabs sharing one worktree, or a Bypass write racing an
|
||||
* apply_pending) can't lose an update. In-process keying is sufficient —
|
||||
* BooCoder is a single Fastify process. One Map entry per distinct path.
|
||||
*/
|
||||
const fileLocks = new Map<string, Promise<void>>();
|
||||
async function withFileLock<T>(filePath: string, fn: () => Promise<T>): Promise<T> {
|
||||
const prev = fileLocks.get(filePath) ?? Promise.resolve();
|
||||
let release!: () => void;
|
||||
const current = new Promise<void>((r) => { release = r; });
|
||||
fileLocks.set(filePath, prev.then(() => current));
|
||||
await prev.catch(() => {});
|
||||
try {
|
||||
return await fn();
|
||||
} finally {
|
||||
release();
|
||||
}
|
||||
}
|
||||
|
||||
// --- Edit-apply planning (pure, unit-tested) ---------------------------------
|
||||
|
||||
/**
|
||||
* Decision for applying one queued edit to a file's current content. Pulled out
|
||||
* of `applyOne` so the splice — the part that actually corrupted files — is pure
|
||||
* and testable without a DB or filesystem. Mirrors how opencode/cline/qwen keep
|
||||
* their matchers fail-closed and idempotent.
|
||||
*/
|
||||
export type EditPlan =
|
||||
| { kind: 'apply'; updated: string }
|
||||
| { kind: 'noop'; reason: 'identical' | 'already-applied' }
|
||||
| { kind: 'ambiguous'; count: number }
|
||||
| { kind: 'not_found' };
|
||||
|
||||
/**
|
||||
* Decide how (or whether) to apply an `old → new` edit to `content`.
|
||||
*
|
||||
* Idempotency is the whole point here: a queued edit can legitimately be
|
||||
* re-applied (a local model re-emits the same tool call; a turn is retried; the
|
||||
* same change sits in the queue twice). A naive splice stamps the new text again
|
||||
* each time (the 2–3× block duplication). Three checks make re-application a no-op:
|
||||
*
|
||||
* - already-applied (anchored insert): when `new` is `old` + an appended block
|
||||
* (`old="anchor"`, `new="anchor\n<block>"`), `old` still matches uniquely after
|
||||
* the first apply, so a second apply would duplicate `<block>`. If the full
|
||||
* `new` text is already present at the match site, the edit is already applied.
|
||||
* - already-applied (old gone): if `old` can't be located but `new` is already
|
||||
* in the file, the change landed on a prior pass — treat as a no-op, not an error.
|
||||
* - identical: the splice would not change the file.
|
||||
*
|
||||
* Anything ambiguous or genuinely absent fails CLOSED so the caller surfaces a
|
||||
* correctable error instead of writing a guess.
|
||||
*/
|
||||
export function planEdit(content: string, oldStr: string, newStr: string): EditPlan {
|
||||
const match = locateMatch(content, oldStr);
|
||||
|
||||
if (match.kind === 'ambiguous') return { kind: 'ambiguous', count: match.count };
|
||||
|
||||
if (match.kind === 'not_found') {
|
||||
if (newStr.length > 0 && content.includes(newStr)) {
|
||||
return { kind: 'noop', reason: 'already-applied' };
|
||||
}
|
||||
return { kind: 'not_found' };
|
||||
}
|
||||
|
||||
const updated = content.slice(0, match.start) + newStr + content.slice(match.end);
|
||||
// No-change splice first (covers old === new), then the anchored re-stamp guard:
|
||||
// the full replacement already sits at the match site (re-emitted anchored insert).
|
||||
if (updated === content) return { kind: 'noop', reason: 'identical' };
|
||||
if (content.slice(match.start, match.start + newStr.length) === newStr) {
|
||||
return { kind: 'noop', reason: 'already-applied' };
|
||||
}
|
||||
return { kind: 'apply', updated };
|
||||
}
|
||||
|
||||
// --- Types -------------------------------------------------------------------
|
||||
|
||||
export interface PendingChange {
|
||||
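The idempotency cases that `planEdit` documents can be walked through with a standalone sketch. Here `locateExact` is a hypothetical exact-match-only stand-in for `locateMatch` (the real matcher also has whitespace, unicode and fuzzy tiers), and `planEditSketch` copies the decision order shown in the diff.

```typescript
type Match =
  | { kind: 'exact'; start: number; end: number }
  | { kind: 'ambiguous'; count: number }
  | { kind: 'not_found' };

type Plan =
  | { kind: 'apply'; updated: string }
  | { kind: 'noop'; reason: 'identical' | 'already-applied' }
  | { kind: 'ambiguous'; count: number }
  | { kind: 'not_found' };

// Hypothetical stand-in for locateMatch: exact substring search, refusing multiple hits.
function locateExact(content: string, needle: string): Match {
  if (needle.length === 0) return { kind: 'not_found' };
  const first = content.indexOf(needle);
  if (first === -1) return { kind: 'not_found' };
  let count = 0;
  for (let at = first; at !== -1; at = content.indexOf(needle, at + 1)) count++;
  if (count > 1) return { kind: 'ambiguous', count };
  return { kind: 'exact', start: first, end: first + needle.length };
}

function planEditSketch(content: string, oldStr: string, newStr: string): Plan {
  const match = locateExact(content, oldStr);
  if (match.kind === 'ambiguous') return { kind: 'ambiguous', count: match.count };
  if (match.kind === 'not_found') {
    // Old text is gone but the new text is present: the edit landed on a prior pass.
    if (newStr.length > 0 && content.indexOf(newStr) !== -1) {
      return { kind: 'noop', reason: 'already-applied' };
    }
    return { kind: 'not_found' };
  }
  const updated = content.slice(0, match.start) + newStr + content.slice(match.end);
  if (updated === content) return { kind: 'noop', reason: 'identical' };
  // Anchored insert re-emitted: the full replacement already sits at the match site.
  if (content.slice(match.start, match.start + newStr.length) === newStr) {
    return { kind: 'noop', reason: 'already-applied' };
  }
  return { kind: 'apply', updated };
}

// Applying the same anchored insert twice: the second pass is a no-op.
const firstPass = planEditSketch('anchor\nrest', 'anchor', 'anchor\nblock');
const afterFirst = firstPass.kind === 'apply' ? firstPass.updated : 'anchor\nrest';
const secondPass = planEditSketch(afterFirst, 'anchor', 'anchor\nblock');
console.log(firstPass.kind, secondPass.kind);
```

The second call finds `anchor` again but sees the whole replacement already in place, so it reports `noop` instead of stamping a second copy of the block.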
@@ -47,6 +158,13 @@ export async function queueEdit(
|
||||
const resolved = resolveWritePath(projectRoot, filePath);
|
||||
const diff = JSON.stringify({ old: oldString, new: newString });
|
||||
|
||||
// Idempotent queue: collapse an identical edit that is still pending. Local
|
||||
// quantized models re-emit the same edit_file call within a turn, and a retried
|
||||
// turn re-queues — each duplicate row would apply and stamp another copy. One
|
||||
// pending row per (session, file, operation, diff) is enough.
|
||||
const existing = await findPendingDuplicate(sql, sessionId, resolved, 'edit', diff);
|
||||
if (existing) return existing;
|
||||
|
||||
const [row] = await sql<PendingChange[]>`
|
||||
INSERT INTO pending_changes (session_id, task_id, file_path, operation, diff, agent)
|
||||
VALUES (${sessionId}, ${taskId}, ${resolved}, 'edit', ${diff}, ${agent})
|
||||
@@ -55,6 +173,28 @@ export async function queueEdit(
|
||||
return row!;
|
||||
}
|
||||
|
||||
/** Return an identical still-pending change for this (session, file, op, diff),
|
||||
* or undefined. Used to keep the queue idempotent against re-emitted edits. */
|
||||
async function findPendingDuplicate(
|
||||
sql: Sql,
|
||||
sessionId: string,
|
||||
resolvedPath: string,
|
||||
operation: 'create' | 'edit' | 'delete',
|
||||
diff: string,
|
||||
): Promise<PendingChange | undefined> {
|
||||
const [row] = await sql<PendingChange[]>`
|
||||
SELECT * FROM pending_changes
|
||||
WHERE session_id = ${sessionId}
|
||||
AND file_path = ${resolvedPath}
|
||||
AND operation = ${operation}
|
||||
AND diff = ${diff}
|
||||
AND status = 'pending'
|
||||
ORDER BY created_at ASC
|
||||
LIMIT 1
|
||||
`;
|
||||
return row;
|
||||
}
|
||||
|
||||
export async function queueCreate(
|
||||
sql: Sql,
|
||||
sessionId: string,
|
||||
@@ -68,6 +208,9 @@ export async function queueCreate(
|
||||
): Promise<PendingChange> {
|
||||
const resolved = resolveWritePath(projectRoot, filePath);
|
||||
|
||||
const existing = await findPendingDuplicate(sql, sessionId, resolved, 'create', content);
|
||||
if (existing) return existing;
|
||||
|
||||
const [row] = await sql<PendingChange[]>`
|
||||
INSERT INTO pending_changes (session_id, task_id, file_path, operation, diff, agent)
|
||||
VALUES (${sessionId}, ${taskId}, ${resolved}, 'create', ${content}, ${agent})
|
||||
@@ -87,6 +230,9 @@ export async function queueDelete(
|
||||
): Promise<PendingChange> {
|
||||
const resolved = resolveWritePath(projectRoot, filePath);
|
||||
|
||||
const existing = await findPendingDuplicate(sql, sessionId, resolved, 'delete', '');
|
||||
if (existing) return existing;
|
||||
|
||||
const [row] = await sql<PendingChange[]>`
|
||||
INSERT INTO pending_changes (session_id, task_id, file_path, operation, diff, agent)
|
||||
VALUES (${sessionId}, ${taskId}, ${resolved}, 'delete', '', ${agent})
|
||||
@@ -110,48 +256,60 @@ export async function applyOne(
   }

   try {
-    // Re-validate path in case projectRoot has shifted
-    resolveWritePath(projectRoot, change.file_path);
+    return await withFileLock(change.file_path, async () => {
+      // Re-validate path in case projectRoot has shifted
+      resolveWritePath(projectRoot, change.file_path);

-    switch (change.operation) {
-      case 'create': {
-        await mkdir(dirname(change.file_path), { recursive: true });
-        await writeFile(change.file_path, change.diff, 'utf8');
-        break;
-      }
-      case 'edit': {
-        const { old: oldStr, new: newStr } = JSON.parse(change.diff) as { old: string; new: string };
-        const content = await readFile(change.file_path, 'utf8');
-        const match = locateMatch(content, oldStr);
-        if (match.kind === 'ambiguous') {
-          throw new Error(
-            `old_string matches ${match.count} locations — add surrounding context to disambiguate`,
-          );
-        }
-        if (match.kind === 'not_found') {
-          throw new Error(
-            'old_string not found in file (even fuzzily) — file may have changed since the edit was queued',
-          );
-        }
-        const updated = content.slice(0, match.start) + newStr + content.slice(match.end);
-        await writeFile(change.file_path, updated, 'utf8');
-        break;
-      }
-      case 'delete': {
-        // Stash current content in diff for potential rewind
-        try {
-          const existing = await readFile(change.file_path, 'utf8');
-          await sql`UPDATE pending_changes SET diff = ${existing} WHERE id = ${changeId}`;
-        } catch {
-          // File may already be gone — proceed with status update
-        }
-        await unlink(change.file_path);
-        break;
-      }
-    }
+      switch (change.operation) {
+        case 'create': {
+          await mkdir(dirname(change.file_path), { recursive: true });
+          await writeFileAtomic(change.file_path, change.diff);
+          break;
+        }
+        case 'edit': {
+          const { old: oldStr, new: newStr } = JSON.parse(change.diff) as { old: string; new: string };
+          const raw = await readFile(change.file_path, 'utf8');
+          // Normalize to LF for matching, then write back in the file's native EOL
+          // so an LF-emitting model doesn't leave a CRLF file with mixed endings.
+          const eol = detectEol(raw);
+          const toLf = (t: string) => t.replaceAll('\r\n', '\n');
+          const plan = planEdit(toLf(raw), toLf(oldStr), toLf(newStr));
+          if (plan.kind === 'ambiguous') {
+            throw new Error(
+              `old_string matches ${plan.count} locations — add surrounding context to disambiguate`,
+            );
+          }
+          if (plan.kind === 'not_found') {
+            throw new Error(
+              'old_string not found in file (even fuzzily) — file may have changed since the edit was queued',
+            );
+          }
+          if (plan.kind === 'apply') {
+            const out = eol === '\r\n' ? plan.updated.replaceAll('\n', '\r\n') : plan.updated;
+            await writeFileAtomic(change.file_path, out);
+          } else {
+            // noop: the edit is already applied (re-emitted / retried) or a no-change.
+            // Mark it applied without rewriting so it can't stamp a duplicate.
+            console.log(`[pending] edit ${change.file_path} is a no-op (${plan.reason}) — not rewriting`);
+          }
+          break;
+        }
+        case 'delete': {
+          // Stash current content in diff for potential rewind
+          try {
+            const existing = await readFile(change.file_path, 'utf8');
+            await sql`UPDATE pending_changes SET diff = ${existing} WHERE id = ${changeId}`;
+          } catch {
+            // File may already be gone — proceed with status update
+          }
+          await unlink(change.file_path);
+          break;
+        }
+      }

-    await sql`UPDATE pending_changes SET status = 'applied' WHERE id = ${changeId}`;
-    return { id: change.id, file_path: change.file_path, operation: change.operation, success: true };
+      await sql`UPDATE pending_changes SET status = 'applied' WHERE id = ${changeId}`;
+      return { id: change.id, file_path: change.file_path, operation: change.operation, success: true };
+    });
   } catch (err) {
     const message = err instanceof Error ? err.message : String(err);
     return { id: change.id, file_path: change.file_path, operation: change.operation, success: false, error: message };
@@ -220,13 +378,13 @@ export async function rewindOne(
           );
         }
         const reverted = content.slice(0, match.start) + oldStr + content.slice(match.end);
-        await writeFile(change.file_path, reverted, 'utf8');
+        await writeFileAtomic(change.file_path, reverted);
         break;
       }
       case 'delete': {
         // Reverse a delete: recreate the file (diff holds the original content stashed at apply time)
         await mkdir(dirname(change.file_path), { recursive: true });
-        await writeFile(change.file_path, change.diff, 'utf8');
+        await writeFileAtomic(change.file_path, change.diff);
         break;
       }
     }
|
||||
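The EOL handling in the new `edit` branch can be sketched in isolation. This is a minimal illustration under stated assumptions, not the repository's code: `detectEol` is called in the diff but its body is not shown, so the version below is a guess, and the exact single-match replacement is a simplified stand-in for `planEdit` (no fuzzy matching, no no-op detection).

```typescript
// Hypothetical body: the diff calls detectEol() but does not show it.
function detectEol(text: string): '\r\n' | '\n' {
  return text.includes('\r\n') ? '\r\n' : '\n';
}

// Normalize CRLF to LF so matching is independent of the file's line endings.
const toLf = (t: string): string => t.replace(/\r\n/g, '\n');

// Simplified stand-in for planEdit(): exact match, must be unique.
function editPreservingEol(raw: string, oldStr: string, newStr: string): string {
  const eol = detectEol(raw);
  const content = toLf(raw);
  const needle = toLf(oldStr);

  const start = content.indexOf(needle);
  if (start === -1) throw new Error('old_string not found');
  if (content.indexOf(needle, start + 1) !== -1) throw new Error('old_string is ambiguous');

  const updated = content.slice(0, start) + toLf(newStr) + content.slice(start + needle.length);
  // Write back in the file's native EOL so a CRLF file never ends up mixed.
  return eol === '\r\n' ? updated.replace(/\n/g, '\r\n') : updated;
}
```

An LF-only replacement applied to a CRLF file comes back fully CRLF, which is the property the comment in the hunk describes.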
@@ -32,6 +32,18 @@ const QWEN_PTY_MODES: ProviderMode[] = [
|
||||
{ id: 'yolo', label: 'YOLO', description: 'Auto-approve all tools', isUnattended: true },
|
||||
];
|
||||
|
||||
// Native BooCode (llama-swap) has no agent-native mode vocabulary, so we define
|
||||
// one that matches the unified permission ladder. `bypass` is the only mode that
|
||||
// changes behavior (auto-apply staged edits after the turn — dispatcher.ts);
|
||||
// `plan` falls back to `ask` semantics for native (writes still stage to the
|
||||
// pending-changes queue). External agents map the same three unified modes onto
|
||||
// THEIR native ids via the `plan`-id / default / `isUnattended` shape.
|
||||
const BOOCODE_MODES: ProviderMode[] = [
|
||||
{ id: 'plan', label: 'Plan', description: 'Read-only analysis (native BooCode falls back to Ask)' },
|
||||
{ id: 'ask', label: 'Ask Permission', description: 'Stage edits to the pending-changes queue for review' },
|
||||
{ id: 'bypass', label: 'Bypass', description: 'Auto-apply edits to disk after the turn', isUnattended: true },
|
||||
];
|
||||
|
||||
const CLAUDE_THINKING = [
|
||||
{ id: 'low', label: 'Low' },
|
||||
{ id: 'medium', label: 'Medium' },
|
||||
@@ -41,6 +53,10 @@ const CLAUDE_THINKING = [
|
||||
];
|
||||
|
||||
export const PROVIDER_MANIFEST: Record<string, ProviderManifestEntry> = {
|
||||
boocode: {
|
||||
defaultModeId: 'ask',
|
||||
modes: BOOCODE_MODES,
|
||||
},
|
||||
claude: {
|
||||
defaultModeId: 'default',
|
||||
modes: CLAUDE_MODES,
|
||||
|
||||
@@ -122,12 +122,14 @@ async function buildProviderEntry(
     };
   }

-  // 2. Native boocode → always ready (llama-swap models).
+  // 2. Native boocode → always ready (llama-swap models). Exposes the unified
+  // permission modes (plan/ask/bypass) so the composer's permission picker works
+  // for native BooCode too; `bypass` auto-applies staged edits (dispatcher.ts).
   if (isNative) {
     return {
       name, label: resolved.label, transport, status: 'ready',
-      enabled: true, installed: true, models: withConfigModels(llamaModels), modes: [],
-      defaultModeId: null, commands: manifestCommands,
+      enabled: true, installed: true, models: withConfigModels(llamaModels),
+      modes: fallbackModes, defaultModeId, commands: manifestCommands,
     };
   }
|
||||
|
||||
@@ -0,0 +1,38 @@
|
||||
import { describe, it, expect } from 'vitest';
|
||||
import { runWithInferenceContext, getInferenceContext } from '../inference_context.js';
|
||||
import type { Sql } from '../../../db.js';
|
||||
|
||||
const fakeSql = {} as unknown as Sql;
|
||||
|
||||
describe('inference context (AsyncLocalStorage isolation)', () => {
|
||||
it('throws when read outside a run', () => {
|
||||
expect(() => getInferenceContext()).toThrow(/outside inference context/);
|
||||
});
|
||||
|
||||
it('keeps each run its own context across overlapping awaits', async () => {
|
||||
// The race the global `let current` had: run B starts (and would overwrite a
|
||||
// shared global) while run A is awaiting. After A resumes it must still read
|
||||
// its OWN sessionId, not B's.
|
||||
const run = (id: string, delay: number) =>
|
||||
runWithInferenceContext({ sql: fakeSql, sessionId: id, taskId: null }, async () => {
|
||||
await new Promise((r) => setTimeout(r, delay));
|
||||
return getInferenceContext().sessionId;
|
||||
});
|
||||
|
||||
const [a, b] = await Promise.all([run('A', 20), run('B', 5)]);
|
||||
expect(a).toBe('A');
|
||||
expect(b).toBe('B');
|
||||
});
|
||||
|
||||
it('carries permissionMode and taskId per run', async () => {
|
||||
const result = await runWithInferenceContext(
|
||||
{ sql: fakeSql, sessionId: 's1', taskId: 't1', permissionMode: 'bypass' },
|
||||
async () => {
|
||||
await Promise.resolve();
|
||||
const ctx = getInferenceContext();
|
||||
return { taskId: ctx.taskId, mode: ctx.permissionMode };
|
||||
},
|
||||
);
|
||||
expect(result).toEqual({ taskId: 't1', mode: 'bypass' });
|
||||
});
|
||||
});
|
||||
@@ -26,6 +26,15 @@ export const applyPendingTool: ToolDef<ApplyPendingInputT> = {
|
||||
},
|
||||
},
|
||||
async execute(_input: ApplyPendingInputT, projectRoot: string, context: ToolContext): Promise<unknown> {
|
||||
// Under Ask (and Plan) the human approves via the Pending Changes panel — the
|
||||
// agent must not auto-apply. Bypass and legacy (undefined) may apply.
|
||||
if (context.permissionMode === 'ask' || context.permissionMode === 'plan') {
|
||||
return {
|
||||
status: 'denied',
|
||||
message:
|
||||
'Permission mode is Ask or Plan; staged changes must be approved by the user in the Pending Changes panel, not applied by the agent.',
|
||||
};
|
||||
}
|
||||
const results = await applyAll(context.sql, context.sessionId, projectRoot);
|
||||
const succeeded = results.filter((r) => r.success).length;
|
||||
const failed = results.filter((r) => !r.success).length;
|
||||
|
||||
@@ -1,6 +1,7 @@
|
||||
import { z } from 'zod';
|
||||
import type { ToolDef, ToolContext } from './types.js';
|
||||
import { queueCreate } from '../pending_changes.js';
|
||||
import { denyReadOnly, finalizeWrite } from './write-gate.js';
|
||||
|
||||
const CreateFileInput = z.object({
|
||||
file_path: z.string().min(1),
|
||||
@@ -32,6 +33,7 @@ export const createFileTool: ToolDef<CreateFileInputT> = {
|
||||
},
|
||||
},
|
||||
async execute(input: CreateFileInputT, projectRoot: string, context: ToolContext): Promise<unknown> {
|
||||
if (context.permissionMode === 'plan') return denyReadOnly('create_file');
|
||||
const change = await queueCreate(
|
||||
context.sql,
|
||||
context.sessionId,
|
||||
@@ -40,12 +42,11 @@ export const createFileTool: ToolDef<CreateFileInputT> = {
       input.content,
       projectRoot,
     );
-    return {
-      status: 'queued',
-      change_id: change.id,
-      file_path: change.file_path,
-      operation: 'create',
-      message: `File creation queued: ${change.file_path}. Use apply_pending to write changes to disk.`,
-    };
+    return finalizeWrite(
+      context,
+      projectRoot,
+      change,
+      `File creation queued: ${change.file_path}. Use apply_pending to write changes to disk.`,
+    );
   },
 };
|
||||
@@ -1,6 +1,7 @@
|
||||
import { z } from 'zod';
|
||||
import type { ToolDef, ToolContext } from './types.js';
|
||||
import { queueDelete } from '../pending_changes.js';
|
||||
import { denyReadOnly, finalizeWrite } from './write-gate.js';
|
||||
|
||||
const DeleteFileInput = z.object({
|
||||
file_path: z.string().min(1),
|
||||
@@ -30,6 +31,7 @@ export const deleteFileTool: ToolDef<DeleteFileInputT> = {
|
||||
},
|
||||
},
|
||||
async execute(input: DeleteFileInputT, projectRoot: string, context: ToolContext): Promise<unknown> {
|
||||
if (context.permissionMode === 'plan') return denyReadOnly('delete_file');
|
||||
const change = await queueDelete(
|
||||
context.sql,
|
||||
context.sessionId,
|
||||
@@ -37,12 +39,11 @@ export const deleteFileTool: ToolDef<DeleteFileInputT> = {
       input.file_path,
       projectRoot,
     );
-    return {
-      status: 'queued',
-      change_id: change.id,
-      file_path: change.file_path,
-      operation: 'delete',
-      message: `File deletion queued: ${change.file_path}. Use apply_pending to write changes to disk.`,
-    };
+    return finalizeWrite(
+      context,
+      projectRoot,
+      change,
+      `File deletion queued: ${change.file_path}. Use apply_pending to write changes to disk.`,
+    );
   },
 };
|
||||
@@ -1,6 +1,7 @@
|
||||
import { z } from 'zod';
|
||||
import type { ToolDef, ToolContext } from './types.js';
|
||||
import { queueEdit } from '../pending_changes.js';
|
||||
import { denyReadOnly, finalizeWrite } from './write-gate.js';
|
||||
|
||||
const EditFileInput = z.object({
|
||||
file_path: z.string().min(1),
|
||||
@@ -34,6 +35,7 @@ export const editFileTool: ToolDef<EditFileInputT> = {
|
||||
},
|
||||
},
|
||||
async execute(input: EditFileInputT, projectRoot: string, context: ToolContext): Promise<unknown> {
|
||||
if (context.permissionMode === 'plan') return denyReadOnly('edit_file');
|
||||
const change = await queueEdit(
|
||||
context.sql,
|
||||
context.sessionId,
|
||||
@@ -43,12 +45,11 @@ export const editFileTool: ToolDef<EditFileInputT> = {
       input.new_string,
       projectRoot,
     );
-    return {
-      status: 'queued',
-      change_id: change.id,
-      file_path: change.file_path,
-      operation: 'edit',
-      message: `Edit queued for ${change.file_path}. Use apply_pending to write changes to disk.`,
-    };
+    return finalizeWrite(
+      context,
+      projectRoot,
+      change,
+      `Edit queued for ${change.file_path}. Use apply_pending to write changes to disk.`,
+    );
   },
 };
|
||||
@@ -1,36 +1,49 @@
+import { AsyncLocalStorage } from 'node:async_hooks';
 import type { Sql } from '../../db.js';
+import type { PermissionMode } from './types.js';

 /**
- * Module-level inference context for write tools.
+ * Per-run inference context for write tools.
  *
- * Set via `setInferenceContext()` before each inference run starts.
- * Write tools read it via `getInferenceContext()` during execute.
- * Same pattern as BooChat's `loadConfig()` singleton — tools need
- * ambient state that can't be threaded through the tool-phase execute
- * signature (which is `execute(input, projectRoot, extraRoots?)`).
+ * Write tools need ambient state (sql, sessionId, the permission gate) that the
+ * BooChat tool-phase `execute(input, projectRoot, extraRoots?)` signature can't
+ * carry. This used to be a single module-level `let current` — but the inference
+ * runner's `enqueue()` is fire-and-forget, so two overlapping runs (a user
+ * message racing a dispatcher-polled native task; two chat tabs streaming) would
+ * clobber each other's context, and `cancel()` cleared it for ALL in-flight runs.
+ *
+ * AsyncLocalStorage gives each run its own context: `enqueue()` starts its async
+ * loop synchronously inside `runWithInferenceContext`, so the store propagates
+ * through every awaited tool execution in that run — and only that run.
  */

 export interface InferenceContext {
   sql: Sql;
   sessionId: string;
   taskId: string | null;
+  /** Native-BooCode permission gate, set per run from the request/task mode. */
+  permissionMode?: PermissionMode;
 }

-let current: InferenceContext | null = null;
+const storage = new AsyncLocalStorage<InferenceContext>();

-export function setInferenceContext(ctx: InferenceContext): void {
-  current = ctx;
-}
-
-export function clearInferenceContext(): void {
-  current = null;
+/**
+ * Bind `ctx` for the duration of the (possibly detached) async chain `fn` starts.
+ * The inference runner kicks off its loop synchronously within this call, so all
+ * downstream `await`s — including write-tool `execute` via the adapter — read the
+ * same store. Concurrent runs each get their own; nothing is shared or cleared
+ * out from under an in-flight run.
+ */
+export function runWithInferenceContext<T>(ctx: InferenceContext, fn: () => T): T {
+  return storage.run(ctx, fn);
 }

 export function getInferenceContext(): InferenceContext {
-  if (!current) {
+  const ctx = storage.getStore();
+  if (!ctx) {
     throw new Error(
-      'Write tool called outside inference context — setInferenceContext() was not called before this run',
+      'Write tool called outside inference context — runWithInferenceContext() did not wrap this run',
     );
   }
-  return current;
+  return ctx;
 }
|
||||
@@ -1,6 +1,22 @@
|
||||
import type { z } from 'zod';
|
||||
import type { Sql } from '../../db.js';
|
||||
|
||||
/**
|
||||
* Unified permission ladder for native BooCode inference. Gates the write tools:
|
||||
* plan — read-only: create/edit/delete are denied (no staging).
|
||||
* ask — stage to the pending-changes queue; `apply_pending` is denied so the
|
||||
* agent cannot self-apply (the human approves via the Diff panel).
|
||||
* bypass — apply each write immediately (no queue, no approval).
|
||||
* Undefined preserves the historical behavior (stage + `apply_pending` allowed).
|
||||
*/
|
||||
export type PermissionMode = 'plan' | 'ask' | 'bypass';
|
||||
|
||||
/** Narrow a raw task/request mode id to a unified PermissionMode, else undefined
|
||||
* (e.g. an external agent's native mode id, or null). */
|
||||
export function asPermissionMode(id: string | null | undefined): PermissionMode | undefined {
|
||||
return id === 'plan' || id === 'ask' || id === 'bypass' ? id : undefined;
|
||||
}
|
||||
|
||||
export interface ToolJsonSchema {
|
||||
type: 'function';
|
||||
function: {
|
||||
@@ -21,6 +37,8 @@ export interface ToolContext {
|
||||
sql: Sql;
|
||||
sessionId: string;
|
||||
taskId: string | null;
|
||||
/** Native-BooCode permission gate for write tools (undefined = legacy behavior). */
|
||||
permissionMode?: PermissionMode;
|
||||
}
|
||||
|
||||
export interface ToolDef<TInput> {
|
||||
|
||||
53
apps/coder/src/services/tools/write-gate.ts
Normal file
@@ -0,0 +1,53 @@
|
||||
/**
|
||||
* Permission-gate helpers for native BooCode write tools. The gate comes from
|
||||
* the per-run inference context (`ToolContext.permissionMode`):
|
||||
* plan — deny the write (read-only); nothing is staged.
|
||||
* bypass — apply the staged change immediately (no queue, no approval).
|
||||
* ask / undefined — leave it in the pending-changes queue for review.
|
||||
*/
|
||||
import type { ToolContext } from './types.js';
|
||||
import { applyOne } from '../pending_changes.js';
|
||||
|
||||
/** Result returned when a write is denied under Plan (read-only) mode. */
|
||||
export function denyReadOnly(operation: string): unknown {
|
||||
return {
|
||||
status: 'denied',
|
||||
operation,
|
||||
message: `Read-only (Plan) permission mode — ${operation} is not permitted. Switch to Ask or Bypass to make changes.`,
|
||||
};
|
||||
}
|
||||
|
||||
/** Finalize a just-staged change per the permission gate: apply now under Bypass,
|
||||
* otherwise return it as queued for the human to approve. */
|
||||
export async function finalizeWrite(
|
||||
context: ToolContext,
|
||||
projectRoot: string,
|
||||
change: { id: string; file_path: string; operation: string },
|
||||
queuedHint: string,
|
||||
): Promise<unknown> {
|
||||
if (context.permissionMode === 'bypass') {
|
||||
const res = await applyOne(context.sql, change.id, projectRoot);
|
||||
console.log(
|
||||
`[write-gate] bypass apply ${change.operation} ${change.file_path} -> ${res.success ? 'applied' : 'FAILED: ' + (res.error ?? '?')}`,
|
||||
);
|
||||
return {
|
||||
status: res.success ? 'applied' : 'failed',
|
||||
change_id: change.id,
|
||||
file_path: change.file_path,
|
||||
operation: change.operation,
|
||||
message: res.success
|
||||
? `${change.operation} applied to ${change.file_path}.`
|
||||
: `Apply failed for ${change.file_path}: ${res.error ?? 'unknown error'}. Left in the pending queue.`,
|
||||
};
|
||||
}
|
||||
console.log(
|
||||
`[write-gate] ${context.permissionMode ?? 'legacy'} queued ${change.operation} ${change.file_path}`,
|
||||
);
|
||||
return {
|
||||
status: 'queued',
|
||||
change_id: change.id,
|
||||
file_path: change.file_path,
|
||||
operation: change.operation,
|
||||
message: queuedHint,
|
||||
};
|
||||
}
|
||||
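The permission ladder described in `types.ts` and `write-gate.ts` can be summarized as a small decision table. The sketch below is illustrative only: `PermissionMode` and `asPermissionMode` mirror the diff, while `writeOutcome` and `mayApplyPending` are hypothetical names introduced here to show the outcomes side by side (in the real code the decision is spread across `denyReadOnly`, `finalizeWrite`, and the `apply_pending` tool).

```typescript
type PermissionMode = 'plan' | 'ask' | 'bypass';

// Mirrors asPermissionMode() from the diff: unknown ids become undefined (legacy).
function asPermissionMode(id: string | null | undefined): PermissionMode | undefined {
  return id === 'plan' || id === 'ask' || id === 'bypass' ? id : undefined;
}

type WriteOutcome = 'denied' | 'queued' | 'applied';

// Hypothetical helper: what a create/edit/delete tool call results in.
// Under bypass the real code reports 'failed' instead if applyOne() fails.
function writeOutcome(mode: PermissionMode | undefined): WriteOutcome {
  if (mode === 'plan') return 'denied'; // read-only, nothing is staged
  if (mode === 'bypass') return 'applied'; // staged, then applied immediately
  return 'queued'; // ask and legacy (undefined) wait in the pending queue
}

// Hypothetical helper: whether the agent itself may call apply_pending.
function mayApplyPending(mode: PermissionMode | undefined): boolean {
  return mode !== 'ask' && mode !== 'plan';
}
```

An external agent's native mode id (anything other than the three unified ids) narrows to `undefined`, so it keeps the historical stage-then-`apply_pending` behavior.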
@@ -372,13 +372,12 @@ ALTER TABLE messages ADD COLUMN IF NOT EXISTS tail_start_id UUID REFERENCES mess
 ALTER TABLE chats ADD COLUMN IF NOT EXISTS needs_compaction BOOLEAN NOT NULL DEFAULT FALSE;
 CREATE INDEX IF NOT EXISTS idx_messages_chat_compacted ON messages (chat_id, compacted_at);

--- tasks table (provider dispatch, arena)
+-- tasks table (provider dispatch)
 CREATE TABLE IF NOT EXISTS tasks (
   id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
   project_id UUID NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
   session_id UUID REFERENCES sessions(id) ON DELETE CASCADE,
   parent_task_id UUID REFERENCES tasks(id),
-  arena_id UUID,
   state TEXT NOT NULL DEFAULT 'pending'
     CHECK (state IN ('pending','running','completed','failed','blocked','cancelled')),
   input TEXT NOT NULL,
@@ -405,3 +404,6 @@ DO $$ BEGIN
|
||||
FOREIGN KEY (session_id) REFERENCES sessions(id) ON DELETE CASCADE;
|
||||
END IF;
|
||||
END $$;
|
||||
|
||||
-- Remove the v2.0.5 arena_id column (replaced by the new Arena feature).
|
||||
ALTER TABLE tasks DROP COLUMN IF EXISTS arena_id;
|
||||
|
||||
@@ -44,7 +44,11 @@ export interface InferenceFrame {
     | 'chat_renamed'
     | 'error'
     | 'flow_run_started'
-    | 'flow_run_step_updated';
+    | 'flow_run_step_updated'
+    // arena frames
+    | 'battle_started'
+    | 'contestant_updated'
+    | 'battle_updated';
   message_id?: string;
   message_ids?: string[];
   chat_id?: string;
@@ -84,6 +88,19 @@ export interface InferenceFrame {
|
||||
status?: string;
|
||||
run_status?: 'running' | 'completed' | 'failed' | 'cancelled';
|
||||
report?: string;
|
||||
// arena frames
|
||||
battle_id?: string;
|
||||
battle_type?: 'coding' | 'qa';
|
||||
prompt?: string;
|
||||
contestants?: Array<{ id: string; identity: string; model: string; lane: 'local' | 'cloud' }>;
|
||||
contestant_id?: string;
|
||||
battle_status?: 'pending' | 'running' | 'completed' | 'failed' | 'cancelled';
|
||||
duration_ms?: number;
|
||||
tokens_per_sec?: number;
|
||||
winner_contestant_id?: string | null;
|
||||
analysis_ready?: boolean;
|
||||
cross_exam_id?: string;
|
||||
delta?: string;
|
||||
}
|
||||
|
||||
export type FramePublisher = (sessionId: string, frame: InferenceFrame) => void;
|
||||
|
||||
@@ -16,6 +16,7 @@ import { RightRailDrawerProvider, useRightRailDrawer } from '@/hooks/useRightRai
|
||||
import { useViewport } from '@/hooks/useViewport';
|
||||
import { ThemeFx } from '@/components/fx/ThemeFx';
|
||||
import { FlowLauncherDialog } from '@/components/FlowLauncherDialog';
|
||||
import { ArenaLauncherDialog } from '@/components/ArenaLauncherDialog';
|
||||
|
||||
function SessionRightRail() {
|
||||
const { id } = useParams<{ id: string }>();
|
||||
@@ -102,6 +103,7 @@ function AppShell() {
|
||||
</Routes>
|
||||
<Toaster position="bottom-right" />
|
||||
<FlowLauncherDialog />
|
||||
<ArenaLauncherDialog />
|
||||
</div>
|
||||
</>
|
||||
);
|
||||
|
||||
@@ -27,6 +27,9 @@ import type {
|
||||
WorkspaceState,
|
||||
FlowRunRow,
|
||||
FlowStepRow,
|
||||
BattleShape,
|
||||
ContestantShape,
|
||||
CrossExaminationShape,
|
||||
} from './types';
|
||||
|
||||
// v2.6 Phase 1-UX §9b: chat-scoped agent-session rows. Returned by
|
||||
@@ -518,6 +521,63 @@ export const api = {
|
||||
request<AgentsResponse>(`/api/projects/${projectId}/agents`),
|
||||
},
|
||||
|
||||
// Arena battle API — proxied to boocoder at /api/coder/battles/*.
|
||||
battles: {
|
||||
create: (body: {
|
||||
project_id: string;
|
||||
battle_type: 'coding' | 'qa';
|
||||
prompt: string;
|
||||
contestants: Array<{ identity: string; model: string }>;
|
||||
}) =>
|
||||
request<{ battle_id: string }>('/api/coder/battles', {
|
||||
method: 'POST',
|
||||
body: JSON.stringify(body),
|
||||
}),
|
||||
list: (projectId: string) =>
|
||||
request<{ battles: BattleShape[] }>(
|
||||
`/api/coder/battles?project_id=${encodeURIComponent(projectId)}`,
|
||||
),
|
||||
get: (battleId: string) =>
|
||||
request<{
|
||||
battle: BattleShape;
|
||||
contestants: ContestantShape[];
|
||||
cross_examinations: CrossExaminationShape[];
|
||||
}>(`/api/coder/battles/${encodeURIComponent(battleId)}`),
|
||||
stop: (battleId: string) =>
|
||||
request<{ cancelled: boolean }>(
|
||||
`/api/coder/battles/${encodeURIComponent(battleId)}/stop`,
|
||||
{ method: 'POST' },
|
||||
),
|
||||
analyze: (battleId: string) =>
|
||||
request<{ triggered: boolean }>(
|
||||
`/api/coder/battles/${encodeURIComponent(battleId)}/analyze`,
|
||||
{ method: 'POST' },
|
||||
),
|
||||
crossExamine: (battleId: string, body: { identity: string; model: string }) =>
|
||||
request<{ cross_exam_id: string }>(
|
||||
`/api/coder/battles/${encodeURIComponent(battleId)}/cross-examine`,
|
||||
{ method: 'POST', body: JSON.stringify(body) },
|
||||
),
|
||||
getAnalysis: (battleId: string) =>
|
||||
request<{ text: string }>(
|
||||
`/api/coder/battles/${encodeURIComponent(battleId)}/analysis`,
|
||||
),
|
||||
generatePrompt: (description: string) =>
|
||||
request<{ prompt: string }>('/api/coder/battles/generate-prompt', {
|
||||
method: 'POST',
|
||||
body: JSON.stringify({ description }),
|
||||
}),
|
||||
setWinner: (battleId: string, body: { winner_contestant_id: string | null }) =>
|
||||
request<{ ok: boolean }>(
|
||||
`/api/coder/battles/${encodeURIComponent(battleId)}/winner`,
|
||||
{ method: 'PATCH', body: JSON.stringify(body) },
|
||||
),
|
||||
getDiff: (battleId: string, contestantId: string) =>
|
||||
request<{ diff: string }>(
|
||||
`/api/coder/battles/${encodeURIComponent(battleId)}/contestants/${encodeURIComponent(contestantId)}/diff`,
|
||||
),
|
||||
},
|
||||
|
||||
skills: {
|
||||
list: () => request<{ skills: Skill[] }>('/api/skills'),
|
||||
},
|
||||
|
||||
@@ -391,7 +391,8 @@ export type WorkspacePaneKind =
   | 'settings'
   | 'markdown_artifact'
   | 'html_artifact'
-  | 'orchestrator';
+  | 'orchestrator'
+  | 'arena';

 // Mixed tabs: a pane can hold tabs of different kinds (a BooChat tab next to a
 // BooCode tab next to a Terminal tab). Each tab carries its own kind; the active
@@ -424,6 +425,10 @@ export interface OrchestratorState {
|
||||
band: 'small' | 'medium' | 'large';
|
||||
}
|
||||
|
||||
// Arena pane state — single-sourced in @boocode/contracts; edit the package, not here.
|
||||
import type { ArenaState, BattleShape, ContestantShape, CrossExaminationShape, BattleType, BattleStatus, ContestantStatus, ContestantLane } from '@boocode/contracts/arena';
|
||||
export type { ArenaState, BattleShape, ContestantShape, CrossExaminationShape, BattleType, BattleStatus, ContestantStatus, ContestantLane };
|
||||
|
||||
// Orchestrator run API types (returned by GET /api/coder/runs/:id).
|
||||
export interface FlowRunRow {
|
||||
id: string;
|
||||
@@ -475,6 +480,8 @@ export interface WorkspacePane {
|
||||
html_artifact_state?: HtmlArtifactState;
|
||||
// orchestrator pane: populated only when kind === 'orchestrator'.
|
||||
orchestrator_state?: OrchestratorState;
|
||||
// arena pane: populated only when kind === 'arena'.
|
||||
arena_state?: ArenaState;
|
||||
}
|
||||
|
||||
// Reopen LIFO stack entry. Shape unchanged from the prior module-level stack;
|
||||
@@ -592,4 +599,31 @@ export type WsFrame =
|
||||
status: 'pending' | 'running' | 'completed' | 'failed' | 'skipped' | 'cancelled';
|
||||
run_status?: 'running' | 'completed' | 'failed' | 'cancelled';
|
||||
report?: string;
|
||||
}
|
||||
// arena frames: battle lifecycle + per-contestant streaming
|
||||
| {
|
||||
type: 'battle_started';
|
||||
battle_id: string;
|
||||
battle_type: 'coding' | 'qa';
|
||||
prompt: string;
|
||||
contestants: Array<{ id: string; identity: string; model: string; lane: 'local' | 'cloud' }>;
|
||||
}
|
||||
| {
|
||||
type: 'contestant_updated';
|
||||
battle_id: string;
|
||||
contestant_id: string;
|
||||
status?: 'queued' | 'running' | 'done' | 'error';
|
||||
duration_ms?: number;
|
||||
tokens_per_sec?: number;
|
||||
battle_status?: 'pending' | 'running' | 'completed' | 'failed' | 'cancelled';
|
||||
delta?: string;
|
||||
error?: string;
|
||||
}
|
||||
| {
|
||||
type: 'battle_updated';
|
||||
battle_id: string;
|
||||
status?: 'pending' | 'running' | 'completed' | 'failed' | 'cancelled';
|
||||
winner_contestant_id?: string | null;
|
||||
analysis_ready?: boolean;
|
||||
cross_exam_id?: string;
|
||||
};
|
||||
|
||||
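One way a client could fold the arena frames above into view state is a small reducer over the discriminated union. This is a sketch under assumptions: the frame shapes are trimmed copies of the `WsFrame` members in the diff, while `BattleView`, `ContestantView`, and `reduceArenaFrame` are hypothetical names that do not appear in the source.

```typescript
// Trimmed copies of two arena members of WsFrame (fields not used here are omitted).
type ArenaFrame =
  | {
      type: 'contestant_updated';
      battle_id: string;
      contestant_id: string;
      status?: 'queued' | 'running' | 'done' | 'error';
      duration_ms?: number;
      delta?: string;
    }
  | {
      type: 'battle_updated';
      battle_id: string;
      winner_contestant_id?: string | null;
      analysis_ready?: boolean;
    };

// Hypothetical view-state shapes, for illustration only.
interface ContestantView { status: string; output: string; durationMs?: number }
interface BattleView {
  contestants: Record<string, ContestantView>;
  winnerId: string | null;
  analysisReady: boolean;
}

function reduceArenaFrame(state: BattleView, frame: ArenaFrame): BattleView {
  switch (frame.type) {
    case 'contestant_updated': {
      const prev = state.contestants[frame.contestant_id] ?? { status: 'queued', output: '' };
      const next: ContestantView = {
        status: frame.status ?? prev.status,
        output: prev.output + (frame.delta ?? ''), // streamed text is appended
        durationMs: frame.duration_ms ?? prev.durationMs,
      };
      return { ...state, contestants: { ...state.contestants, [frame.contestant_id]: next } };
    }
    case 'battle_updated':
      return {
        ...state,
        // undefined means "unchanged"; null explicitly clears the winner.
        winnerId: frame.winner_contestant_id === undefined ? state.winnerId : frame.winner_contestant_id,
        analysisReady: frame.analysis_ready ?? state.analysisReady,
      };
  }
}
```

Narrowing on `frame.type` lets the compiler check that each branch only reads fields that exist on that frame variant.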
@@ -1,5 +1,5 @@
 import { useEffect, useMemo, useRef, useState } from 'react';
-import { Check, ChevronDown, RefreshCw, Loader2, Shield, Brain, Bot } from 'lucide-react';
+import { Check, ChevronDown, RefreshCw, Loader2, Shield, ShieldAlert, Eye, Brain, Bot } from 'lucide-react';
 import { api } from '@/api/client';
 import type { AgentSessionConfig, ProviderSnapshotEntry, AgentCommand } from '@/api/types';
 import { useProviderSnapshot, refreshProviderSnapshot } from '@/hooks/useProviderSnapshot';
@@ -14,8 +14,22 @@ import {
|
||||
import { BottomSheet } from '@/components/BottomSheet';
|
||||
import { useViewport } from '@/hooks/useViewport';
|
||||
import { formatModelLabel } from '@/lib/model-label';
|
||||
import {
|
||||
availablePermissionModes,
|
||||
permissionForModeId,
|
||||
nativeModeForPermission,
|
||||
type PermissionMode,
|
||||
} from '@/lib/permission-mode';
|
||||
import { cn } from '@/lib/utils';
|
||||
|
||||
// Permission picker icon — varies with the active mode so the (icon-only) control
|
||||
// is glanceable: Eye = Plan (read-only), Shield = Ask, ShieldAlert = Bypass.
|
||||
function permissionIcon(mode: PermissionMode): React.ReactNode {
|
||||
if (mode === 'plan') return <Eye className="size-3 shrink-0" />;
|
||||
if (mode === 'bypass') return <ShieldAlert className="size-3 shrink-0 text-amber-500" />;
|
||||
return <Shield className="size-3 shrink-0" />;
|
||||
}
|
||||
|
||||
const PREFS_KEY = 'boocode.coder.agent-prefs';
|
||||
|
||||
|
||||
@@ -350,7 +364,11 @@ export function AgentComposerBar({ projectPath, value, onChange, onProviderComma
   }

   const providerOptions = entries.map((e) => ({ id: e.name, label: e.label }));
-  const modeOptions = (currentEntry?.modes ?? []).map((m) => ({ id: m.id, label: m.label }));
+  // Unified permission ladder (Plan / Ask / Bypass) mapped onto this provider's
+  // native modes. `value.modeId` stays the wire field; the active unified mode is
+  // derived from it.
+  const permissionModes = availablePermissionModes(currentEntry?.modes ?? []);
+  const currentPermission = permissionForModeId(value.modeId, currentEntry?.modes ?? []);
   const modelOptions = (currentEntry?.models ?? []).map((m) => ({ id: m.id, label: formatModelLabel(m.label) }));
   const thinkingOpts = thinkingOptions.map((t) => ({ id: t.id, label: t.label }));
|
||||
@@ -380,15 +398,25 @@ export function AgentComposerBar({ projectPath, value, onChange, onProviderComma
           </>
         }
       />
-      {/* Mode (shield) only when the provider actually exposes modes. Native
-          BooCoder has none, so it's hidden rather than shown disabled. */}
-      {modeOptions.length > 0 && (
+      {/* Permission ladder (Plan / Ask / Bypass) — shown when the provider exposes
+          modes. Picks the unified mode; we resolve it to the provider's native
+          modeId. Icon varies with the active mode (Bypass is amber). */}
+      {permissionModes.length > 0 && (
         <CompactPicker
-          label="Mode"
-          value={value.modeId ?? ''}
-          options={modeOptions}
-          onPick={(modeId) => persist({ ...value, modeId })}
-          icon={<Shield className="size-3 shrink-0" />}
+          label="Permission"
+          value={currentPermission}
+          options={permissionModes}
+          onPick={(perm) =>
+            persist({
+              ...value,
+              modeId: nativeModeForPermission(
+                perm as PermissionMode,
+                currentEntry?.modes ?? [],
+                currentEntry?.defaultModeId ?? null,
+              ),
+            })
+          }
+          icon={permissionIcon(currentPermission)}
           iconOnly
         />
       )}
|
||||
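The composer imports `permissionForModeId` and `nativeModeForPermission` from `@/lib/permission-mode`, but that module is not part of this excerpt. The sketch below is therefore a guess at one plausible mapping, based only on the manifest comment that external agents map the unified modes through "the `plan`-id / default / `isUnattended` shape". The function bodies, the trimmed `ProviderMode` type, and the `'auto'` mode id are all assumptions.

```typescript
type PermissionMode = 'plan' | 'ask' | 'bypass';

// Trimmed version of ProviderMode; only the fields used here.
interface ProviderMode { id: string; label: string; isUnattended?: boolean }

// Assumed rule: id 'plan' maps to Plan, any unattended mode maps to Bypass, everything else is Ask.
function permissionForModeId(modeId: string | null | undefined, modes: ProviderMode[]): PermissionMode {
  const mode = modes.find((m) => m.id === modeId);
  if (!mode) return 'ask';
  if (mode.id === 'plan') return 'plan';
  return mode.isUnattended ? 'bypass' : 'ask';
}

// Assumed inverse: resolve a unified mode back to the provider's native mode id,
// falling back to the provider default when no native equivalent exists.
function nativeModeForPermission(
  perm: PermissionMode,
  modes: ProviderMode[],
  defaultModeId: string | null,
): string | null {
  if (perm === 'plan') return modes.find((m) => m.id === 'plan')?.id ?? defaultModeId;
  if (perm === 'bypass') return modes.find((m) => m.isUnattended)?.id ?? defaultModeId;
  return defaultModeId;
}
```

With this shape `value.modeId` can stay the wire field, as the hunk's comment says, because each picker choice is translated to a native id before it is persisted.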
410
apps/web/src/components/ArenaLauncherDialog.tsx
Normal file
@@ -0,0 +1,410 @@
|
||||
// ArenaLauncherDialog — mirrors FlowLauncherDialog.
|
||||
// Opens via sessionEvents 'open_arena_launcher'.
|
||||
// Flow: pick Battle Type → write/generate prompt → add 2–6 contestants → Start.
|
||||
|
||||
import { useCallback, useEffect, useRef, useState } from 'react';
|
||||
import { Loader2, Minus, Plus, Swords, TriangleAlert, X } from 'lucide-react';
|
||||
import { toast } from 'sonner';
|
||||
import {
|
||||
Dialog,
|
||||
DialogContent,
|
||||
DialogFooter,
|
||||
DialogHeader,
|
||||
DialogTitle,
|
||||
} from '@/components/ui/dialog';
|
||||
import { Button } from '@/components/ui/button';
|
||||
import { Label } from '@/components/ui/label';
|
||||
import { api } from '@/api/client';
|
||||
import type { Agent, ProviderSnapshotEntry } from '@/api/types';
|
||||
import { sessionEvents } from '@/hooks/sessionEvents';
|
||||
import { useProviderSnapshot } from '@/hooks/useProviderSnapshot';
|
||||
import { cn } from '@/lib/utils';
|
||||
|
||||
// ─── types ────────────────────────────────────────────────────────────────────
|
||||
|
||||
type BattleType = 'coding' | 'qa';
|
||||
|
||||
interface Contestant {
|
||||
key: string; // local unique key for React
|
||||
identity: string;
|
||||
model: string;
|
||||
}
|
||||
|
||||
// ─── helpers ─────────────────────────────────────────────────────────────────
|
||||
|
||||
function newContestant(): Contestant {
|
||||
return { key: crypto.randomUUID(), identity: '', model: '' };
|
||||
}
|
||||
|
||||
function isDuplicate(contestants: Contestant[], c: Contestant): boolean {
  // Half-filled rows (no identity or no model yet) never count as duplicates;
  // otherwise two rows with the same backend and no model picked would be flagged.
  if (c.identity === '' || c.model === '') return false;
  return contestants.some(
    (x) => x.key !== c.key && x.identity === c.identity && x.model === c.model,
  );
}
|
||||
|
||||
function hasDuplicatePair(contestants: Contestant[]): boolean {
|
||||
return contestants.some((c) => isDuplicate(contestants, c));
|
||||
}
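The pairwise check in `isDuplicate` / `hasDuplicatePair` is quadratic in the number of rows, which is harmless at the dialog's cap of six contestants. As a hedged sketch only, the same rule could be expressed as a single pass over a map of composite keys. `ContestantLike`, `pairKey`, and `findDuplicateKeys` are hypothetical names that are not part of this change, and the sketch assumes rows with an empty identity or model are ignored.

```typescript
// Hypothetical stand-in for the dialog's Contestant shape.
interface ContestantLike {
  key: string;
  identity: string;
  model: string;
}

// Composite key for an (identity, model) pair. JSON.stringify avoids the
// collisions a plain separator could cause if an id contained that separator.
function pairKey(c: ContestantLike): string {
  return JSON.stringify([c.identity, c.model]);
}

// Returns the row keys of every contestant whose (identity, model) pair
// occurs more than once. Incomplete rows are skipped.
function findDuplicateKeys(contestants: ContestantLike[]): Set<string> {
  const firstSeen = new Map<string, string>(); // pairKey -> key of the first row
  const dupes = new Set<string>();
  for (const c of contestants) {
    if (c.identity === '' || c.model === '') continue;
    const k = pairKey(c);
    const first = firstSeen.get(k);
    if (first === undefined) {
      firstSeen.set(k, c.key);
    } else {
      dupes.add(first);
      dupes.add(c.key);
    }
  }
  return dupes;
}
```

With a set like this, a row would be flagged via `dupes.has(row.key)` and the dialog-level check would reduce to `dupes.size > 0`.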
|
||||
|
||||
function localCount(battleType: BattleType, contestants: Contestant[], snapshot: ProviderSnapshotEntry[] | null): number {
|
||||
if (battleType === 'qa') return contestants.filter((c) => c.identity !== '').length;
|
||||
const boocode = snapshot?.find((e) => e.name === 'boocode');
|
||||
const localModelIds = new Set(boocode?.models.map((m) => m.id) ?? []);
|
||||
return contestants.filter((c) => {
|
||||
// Match bare IDs (boocode/native) and llama-swap/-prefixed IDs used by
|
||||
// opencode and other external agents pointing at the local llama-swap server.
|
||||
return localModelIds.has(c.model) || localModelIds.has(c.model.replace(/^llama-swap\//, ''));
|
||||
}).length;
|
||||
}
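The local-lane match in `localCount` accepts both bare model ids and `llama-swap/`-prefixed ids. A minimal sketch of that rule as a standalone predicate follows; `LLAMA_SWAP_PREFIX`, `bareModelId`, and `isLocalModel` are hypothetical names, and the model ids in the usage note are made up for illustration.

```typescript
// Prefix that external agents put in front of ids served by the local llama-swap server.
const LLAMA_SWAP_PREFIX = 'llama-swap/';

// Strips the prefix when present; bare ids are returned unchanged.
function bareModelId(modelId: string): string {
  return modelId.startsWith(LLAMA_SWAP_PREFIX)
    ? modelId.slice(LLAMA_SWAP_PREFIX.length)
    : modelId;
}

// A model is treated as local when either its exact id or its de-prefixed id
// appears in the set of locally served model ids.
function isLocalModel(modelId: string, localModelIds: ReadonlySet<string>): boolean {
  return localModelIds.has(modelId) || localModelIds.has(bareModelId(modelId));
}
```

For example, with `new Set(['example-local-model'])`, both `'example-local-model'` and `'llama-swap/example-local-model'` count as local, while an id that is not in the set does not.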
|
||||
|
||||
// ─── ContestantRow ────────────────────────────────────────────────────────────
|
||||
|
||||
function ContestantRow({
|
||||
contestant,
|
||||
battleType,
|
||||
snapshot,
|
||||
agents,
|
||||
allContestants,
|
||||
onUpdate,
|
||||
onRemove,
|
||||
removable,
|
||||
}: {
|
||||
contestant: Contestant;
|
||||
battleType: BattleType;
|
||||
snapshot: ProviderSnapshotEntry[] | null;
|
||||
agents: Agent[];
|
||||
allContestants: Contestant[];
|
||||
onUpdate: (patch: Partial<Contestant>) => void;
|
||||
onRemove: () => void;
|
||||
removable: boolean;
|
||||
}) {
|
||||
const dup = isDuplicate(allContestants, contestant);
|
||||
|
||||
// Identity options for Coding: installed provider names.
|
||||
// Identity options for Q&A: agents by id.
|
||||
const identityOptions =
|
||||
battleType === 'coding'
|
||||
? (snapshot ?? [])
|
||||
.filter((e) => e.installed && e.enabled)
|
||||
.map((e) => ({ value: e.name, label: e.label }))
|
||||
: agents.map((a) => ({ value: a.id, label: a.name }));
|
||||
|
||||
// Model options: for Coding use the selected provider's models; for Q&A use boocode models.
|
||||
const modelOptions: { value: string; label: string }[] = (() => {
|
||||
if (battleType === 'coding') {
|
||||
const provider = (snapshot ?? []).find((e) => e.name === contestant.identity);
|
||||
return (provider?.models ?? []).map((m) => ({ value: m.id, label: m.label }));
|
||||
}
|
||||
// Q&A: native backend only — use boocode models
|
||||
const boocode = (snapshot ?? []).find((e) => e.name === 'boocode');
|
||||
return (boocode?.models ?? []).map((m) => ({ value: m.id, label: m.label }));
|
||||
})();
|
||||
|
||||
function handleIdentityChange(value: string) {
|
||||
// Reset model when identity changes so stale model doesn't persist.
|
||||
onUpdate({ identity: value, model: '' });
|
||||
}
|
||||
|
||||
function handleModelChange(value: string) {
|
||||
onUpdate({ model: value });
|
||||
}
|
||||
|
||||
return (
|
||||
<div className={cn('flex items-center gap-2', dup && 'opacity-60')}>
|
||||
<select
|
||||
value={contestant.identity}
|
||||
onChange={(e) => handleIdentityChange(e.target.value)}
|
||||
className="flex-1 min-w-0 text-xs border border-border rounded bg-background px-2 py-1.5 text-foreground focus:outline-none focus:ring-1 focus:ring-ring"
|
||||
aria-label={battleType === 'coding' ? 'Backend' : 'Persona'}
|
||||
>
|
||||
<option value="">{battleType === 'coding' ? 'Backend…' : 'Persona…'}</option>
|
||||
{identityOptions.map((o) => (
|
||||
<option key={o.value} value={o.value}>{o.label}</option>
|
||||
))}
|
||||
</select>
|
||||
<select
|
||||
value={contestant.model}
|
||||
onChange={(e) => handleModelChange(e.target.value)}
|
||||
disabled={!contestant.identity}
|
||||
className="flex-1 min-w-0 text-xs border border-border rounded bg-background px-2 py-1.5 text-foreground focus:outline-none focus:ring-1 focus:ring-ring disabled:opacity-50"
|
||||
aria-label="Model"
|
||||
>
|
||||
<option value="">Model…</option>
|
||||
{modelOptions.map((o) => (
|
||||
<option key={o.value} value={o.value}>{o.label}</option>
|
||||
))}
|
||||
</select>
|
||||
{dup && (
|
||||
<span title="Duplicate contestant" className="shrink-0 text-destructive">
|
||||
<TriangleAlert size={12} />
|
||||
</span>
|
||||
)}
|
||||
{removable && (
|
||||
<button
|
||||
type="button"
|
||||
onClick={onRemove}
|
||||
className="shrink-0 inline-flex items-center justify-center p-1 rounded text-muted-foreground hover:bg-muted hover:text-foreground"
|
||||
aria-label="Remove contestant"
|
||||
>
|
||||
<Minus size={12} />
|
||||
</button>
|
||||
)}
|
||||
</div>
|
||||
);
|
||||
}
|
||||
|
||||
// ─── ArenaLauncherDialog ──────────────────────────────────────────────────────
|
||||
|
||||
export function ArenaLauncherDialog() {
|
||||
const [open, setOpen] = useState(false);
|
||||
const [projectId, setProjectId] = useState('');
|
||||
const [placement, setPlacement] = useState<'new' | 'split'>('new');
|
||||
const [battleType, setBattleType] = useState<BattleType>('coding');
|
||||
const [prompt, setPrompt] = useState('');
|
||||
const [contestants, setContestants] = useState<Contestant[]>(() => [
|
||||
newContestant(),
|
||||
newContestant(),
|
||||
]);
|
||||
const [generating, setGenerating] = useState(false);
|
||||
const [starting, setStarting] = useState(false);
|
||||
const [agents, setAgents] = useState<Agent[]>([]);
|
||||
const promptRef = useRef<HTMLTextAreaElement>(null);
|
||||
|
||||
const snapshot = useProviderSnapshot();
|
||||
|
||||
useEffect(() => {
|
||||
return sessionEvents.subscribe((ev) => {
|
||||
if (ev.type !== 'open_arena_launcher') return;
|
||||
setProjectId(ev.project_id);
|
||||
setPlacement(ev.placement ?? 'new');
|
||||
setBattleType('coding');
|
||||
setPrompt('');
|
||||
setContestants([newContestant(), newContestant()]);
|
||||
setGenerating(false);
|
||||
setStarting(false);
|
||||
setOpen(true);
|
||||
});
|
||||
}, []);
|
||||
|
||||
// Load agents list when dialog opens (for Q&A mode).
|
||||
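useEffect(() => {
  if (!open || !projectId) return;
  // Ignore a response that arrives after the dialog closed or the project changed.
  let cancelled = false;
  api.agents.list(projectId)
    .then((r) => { if (!cancelled) setAgents(r.agents); })
    .catch(() => {
      // Best effort: on failure the persona list simply keeps its previous value.
    });
  return () => { cancelled = true; };
}, [open, projectId]);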
|
||||
|
||||
const handleGeneratePrompt = useCallback(async () => {
|
||||
const description = prompt.trim();
|
||||
if (!description || generating) return;
|
||||
setGenerating(true);
|
||||
try {
|
||||
const { prompt: generated } = await api.battles.generatePrompt(description);
|
||||
setPrompt(generated);
|
||||
promptRef.current?.focus();
|
||||
} catch (err) {
|
||||
toast.error(err instanceof Error ? err.message : 'Generate failed');
|
||||
} finally {
|
||||
setGenerating(false);
|
||||
}
|
||||
}, [prompt, generating]);
|
||||
|
||||
function updateContestant(key: string, patch: Partial<Contestant>) {
|
||||
setContestants((prev) => prev.map((c) => (c.key === key ? { ...c, ...patch } : c)));
|
||||
}
|
||||
|
||||
function removeContestant(key: string) {
|
||||
setContestants((prev) => prev.filter((c) => c.key !== key));
|
||||
}
|
||||
|
||||
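function addContestant() {
  // Enforce the 6-contestant cap inside the updater so it is checked against
  // the latest state, not the value captured by this render.
  setContestants((prev) => (prev.length >= 6 ? prev : [...prev, newContestant()]));
}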
|
||||
|
||||
const canStart =
|
||||
!starting &&
|
||||
prompt.trim().length > 0 &&
|
||||
contestants.length >= 2 &&
|
||||
contestants.every((c) => c.identity !== '' && c.model !== '') &&
|
||||
!hasDuplicatePair(contestants);
|
||||
|
||||
const localLaneCount = localCount(battleType, contestants, snapshot);
|
||||
const showLocalWarning = localLaneCount >= 3;
|
||||
|
||||
async function handleStart() {
|
||||
if (!canStart) return;
|
||||
setStarting(true);
|
||||
try {
|
||||
const { battle_id } = await api.battles.create({
|
||||
project_id: projectId,
|
||||
battle_type: battleType,
|
||||
prompt: prompt.trim(),
|
||||
contestants: contestants.map((c) => ({ identity: c.identity, model: c.model })),
|
||||
});
|
||||
sessionEvents.emit({
|
||||
type: 'open_arena_pane',
|
||||
state: { battle_id, battle_type: battleType, prompt: prompt.trim() },
|
||||
placement,
|
||||
});
|
||||
setOpen(false);
|
||||
} catch (err) {
|
||||
toast.error(err instanceof Error ? err.message : 'Failed to start battle');
|
||||
} finally {
|
||||
setStarting(false);
|
||||
}
|
||||
}
|
||||
|
||||
return (
|
||||
<Dialog open={open} onOpenChange={setOpen}>
|
||||
<DialogContent
|
||||
className="flex flex-col gap-0 p-0 max-h-[85vh] sm:max-w-lg overflow-hidden"
|
||||
showCloseButton={false}
|
||||
>
|
||||
<DialogHeader className="gap-1.5 px-4 pt-4 pb-3 border-b shrink-0">
|
||||
<div className="flex items-center gap-2">
|
||||
<Swords size={14} className="text-muted-foreground shrink-0" />
|
||||
<DialogTitle className="text-sm font-medium">New Arena Battle</DialogTitle>
|
||||
</div>
|
||||
<p className="text-xs text-muted-foreground">
|
||||
Run the same prompt against multiple AI competitors and pick the best result.
|
||||
</p>
|
||||
</DialogHeader>
|
||||
|
||||
<div className="flex flex-col gap-4 overflow-y-auto overscroll-contain px-4 py-3">
|
||||
{/* Battle type */}
|
||||
<div className="flex flex-col gap-1.5">
|
||||
<Label className="text-xs text-muted-foreground">Battle type</Label>
|
||||
<div className="flex gap-1.5">
|
||||
{(['coding', 'qa'] as const).map((t) => (
|
||||
<button
|
||||
key={t}
|
||||
type="button"
|
||||
onClick={() => { setBattleType(t); setContestants([newContestant(), newContestant()]); }}
|
||||
aria-pressed={battleType === t}
|
||||
className={cn(
|
||||
'flex-1 rounded-lg border py-1.5 text-xs transition-colors capitalize',
|
||||
battleType === t
|
||||
? 'border-primary bg-primary/10 text-primary font-medium'
|
||||
: 'border-border text-muted-foreground hover:bg-muted hover:text-foreground',
|
||||
)}
|
||||
>
|
||||
{t === 'coding' ? 'Coding' : 'Q&A'}
|
||||
</button>
|
||||
))}
|
||||
</div>
|
||||
<p className="text-xs text-muted-foreground">
|
||||
{battleType === 'coding'
|
||||
? 'Each contestant works in its own isolated worktree. Results include a diff.'
|
||||
: 'Contestants answer the prompt as text. No code changes.'}
|
||||
</p>
|
||||
</div>
|
||||
|
||||
{/* Prompt */}
|
||||
<div className="flex flex-col gap-1.5">
|
||||
<div className="flex items-center justify-between">
|
||||
<Label htmlFor="arena-prompt" className="text-xs text-muted-foreground">
|
||||
Prompt
|
||||
</Label>
|
||||
<button
|
||||
type="button"
|
||||
onClick={() => void handleGeneratePrompt()}
|
||||
disabled={generating || prompt.trim().length === 0}
|
||||
className="text-xs text-primary hover:text-primary/80 disabled:opacity-40 disabled:cursor-default flex items-center gap-1"
|
||||
title="Expand your description into a fuller battle prompt"
|
||||
>
|
||||
{generating && <Loader2 size={10} className="animate-spin" />}
|
||||
Generate prompt
|
||||
</button>
|
||||
</div>
|
||||
<textarea
|
||||
id="arena-prompt"
|
||||
ref={promptRef}
|
||||
value={prompt}
|
||||
onChange={(e) => setPrompt(e.target.value)}
|
||||
placeholder={
|
||||
battleType === 'coding'
|
||||
? 'Describe a coding task, or enter a short description and click Generate prompt…'
|
||||
: 'Ask a question or describe a topic, or enter a short description and click Generate prompt…'
|
||||
}
|
||||
rows={4}
|
||||
className="w-full text-sm border border-border rounded bg-background px-3 py-2 text-foreground placeholder:text-muted-foreground focus:outline-none focus:ring-1 focus:ring-ring resize-none"
|
||||
/>
|
||||
</div>
|
||||
|
||||
{/* Contestants */}
|
||||
<div className="flex flex-col gap-2">
|
||||
<div className="flex items-center justify-between">
|
||||
<Label className="text-xs text-muted-foreground">
|
||||
Contestants ({contestants.length}/6)
|
||||
</Label>
|
||||
<span className="text-xs text-muted-foreground">
|
||||
{battleType === 'coding' ? 'Backend + Model' : 'Persona + Model'}
|
||||
</span>
|
||||
</div>
|
||||
|
||||
<div className="flex flex-col gap-1.5">
|
||||
{contestants.map((c) => (
|
||||
<ContestantRow
|
||||
key={c.key}
|
||||
contestant={c}
|
||||
battleType={battleType}
|
||||
snapshot={snapshot}
|
||||
agents={agents}
|
||||
allContestants={contestants}
|
||||
onUpdate={(patch) => updateContestant(c.key, patch)}
|
||||
onRemove={() => removeContestant(c.key)}
|
||||
removable={contestants.length > 2}
|
||||
/>
|
||||
))}
|
||||
</div>
|
||||
|
||||
{contestants.length < 6 && (
|
||||
<button
|
||||
type="button"
|
||||
onClick={addContestant}
|
||||
className="flex items-center gap-1.5 text-xs text-muted-foreground hover:text-foreground py-1"
|
||||
>
|
||||
<Plus size={12} /> Add contestant
|
||||
</button>
|
||||
)}
|
||||
|
||||
{hasDuplicatePair(contestants) && (
|
||||
<div className="flex items-center gap-1.5 text-xs text-destructive">
|
||||
<TriangleAlert size={12} />
|
||||
Duplicate contestants (same identity + model) are not allowed.
|
||||
</div>
|
||||
)}
|
||||
|
||||
{showLocalWarning && (
|
||||
<div className="flex items-center gap-1.5 text-xs text-amber-600 dark:text-amber-400">
|
||||
<TriangleAlert size={12} />
|
||||
{localLaneCount} local contestants will run serially (one GPU load at a time). This battle will take a while.
|
||||
</div>
|
||||
)}
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<DialogFooter className="px-4 py-3 border-t shrink-0 flex items-center justify-between">
|
||||
<button
|
||||
type="button"
|
||||
onClick={() => setOpen(false)}
|
||||
className="flex items-center gap-1.5 text-xs text-muted-foreground hover:text-foreground"
|
||||
>
|
||||
<X size={12} /> Cancel
|
||||
</button>
|
||||
<Button
|
||||
type="button"
|
||||
size="sm"
|
||||
onClick={() => void handleStart()}
|
||||
disabled={!canStart}
|
||||
>
|
||||
{starting ? <Loader2 className="animate-spin" /> : <Swords size={14} />}
|
||||
Start battle
|
||||
</Button>
|
||||
</DialogFooter>
|
||||
</DialogContent>
|
||||
</Dialog>
|
||||
);
|
||||
}
|
||||
@@ -37,6 +37,7 @@ interface Props {
|
||||
onNewTab: (kind: WorkspaceTabKind) => void;
|
||||
onSplitPane: (kind: 'chat' | 'terminal' | 'coder') => void;
|
||||
onNewOrchestrator?: () => void;
|
||||
onNewArena?: () => void;
|
||||
onReopenPane?: () => void;
|
||||
onShowHistory: () => void;
|
||||
onRename: (chatId: string, name: string) => Promise<void>;
|
||||
@@ -69,6 +70,7 @@ export function ChatTabBar({
|
||||
onNewTab,
|
||||
onSplitPane,
|
||||
onNewOrchestrator,
|
||||
onNewArena,
|
||||
onReopenPane,
|
||||
onShowHistory,
|
||||
onRename,
|
||||
@@ -230,6 +232,7 @@ export function ChatTabBar({
|
||||
onNewTab={onNewTab}
|
||||
onSplitPane={onSplitPane}
|
||||
onNewOrchestrator={onNewOrchestrator}
|
||||
onNewArena={onNewArena}
|
||||
onReopenPane={onReopenPane}
|
||||
onShowHistory={onShowHistory}
|
||||
onRemovePane={onRemovePane}
|
||||
|
||||
@@ -1,4 +1,4 @@
|
||||
import { Code, Columns2, History, MessageSquare, Plus, RotateCcw, Terminal, Workflow, X } from 'lucide-react';
|
||||
import { Code, Columns2, History, MessageSquare, Plus, RotateCcw, Swords, Terminal, Workflow, X } from 'lucide-react';
|
||||
import {
|
||||
DropdownMenu,
|
||||
DropdownMenuContent,
|
||||
@@ -19,6 +19,8 @@ interface Props {
|
||||
// When provided, shows a "New Orchestrator" item that opens the flow launcher.
|
||||
// Orchestrators are always split (run-bound; can't live as a tab in another pane).
|
||||
onNewOrchestrator?: () => void;
|
||||
// When provided, shows a "New Arena" item that opens the arena launcher.
|
||||
onNewArena?: () => void;
|
||||
onReopenPane?: () => void;
|
||||
onShowHistory: () => void;
|
||||
onRemovePane?: () => void;
|
||||
@@ -35,6 +37,7 @@ export function PaneHeaderActions({
|
||||
onNewTab,
|
||||
onSplitPane,
|
||||
onNewOrchestrator,
|
||||
onNewArena,
|
||||
onReopenPane,
|
||||
onShowHistory,
|
||||
onRemovePane,
|
||||
@@ -71,6 +74,11 @@ export function PaneHeaderActions({
|
||||
<Workflow size={14} /> New Orchestrator
|
||||
</DropdownMenuItem>
|
||||
)}
|
||||
{onNewArena && (
|
||||
<DropdownMenuItem onSelect={onNewArena}>
|
||||
<Swords size={14} /> New Arena
|
||||
</DropdownMenuItem>
|
||||
)}
|
||||
</DropdownMenuContent>
|
||||
</DropdownMenu>
|
||||
|
||||
@@ -101,6 +109,11 @@ export function PaneHeaderActions({
|
||||
<Workflow size={14} /> New Orchestrator
|
||||
</DropdownMenuItem>
|
||||
)}
|
||||
{onNewArena && (
|
||||
<DropdownMenuItem onSelect={onNewArena}>
|
||||
<Swords size={14} /> New Arena
|
||||
</DropdownMenuItem>
|
||||
)}
|
||||
</DropdownMenuContent>
|
||||
</DropdownMenu>
|
||||
|
||||
|
||||
@@ -225,7 +225,7 @@ export function SlashCommandPicker({
|
||||
setHighlightIndex(i);
|
||||
setExpandedIndex((prev) => (prev === i ? null : i));
|
||||
}}
|
||||
className="-mr-1 -mt-0.5 flex shrink-0 items-center justify-center rounded p-1 text-muted-foreground/60 transition-colors hover:bg-foreground/10 hover:text-foreground max-md:min-h-[36px] max-md:min-w-[36px]"
|
||||
className="-mr-1 -mt-0.5 flex shrink-0 items-center justify-center rounded-md border border-border bg-background p-1 text-muted-foreground transition-colors hover:bg-muted hover:text-foreground aria-expanded:bg-muted aria-expanded:text-foreground max-md:min-h-[36px] max-md:min-w-[36px]"
|
||||
>
|
||||
<ChevronRight
|
||||
className={cn(
|
||||
|
||||
@@ -13,6 +13,7 @@ import { CoderPane } from '@/components/panes/CoderPane';
|
||||
import { MarkdownArtifactPane } from '@/components/MarkdownArtifactPane';
|
||||
import { HtmlArtifactPane } from '@/components/HtmlArtifactPane';
|
||||
import { OrchestratorPane } from '@/components/panes/OrchestratorPane';
|
||||
import { ArenaPane } from '@/components/panes/ArenaPane';
|
||||
import { ChatTabBar, type TabDescriptor } from '@/components/ChatTabBar';
|
||||
import { SessionLandingPage } from '@/components/SessionLandingPage';
|
||||
import { cn } from '@/lib/utils';
|
||||
@@ -134,6 +135,14 @@ export function Workspace({
|
||||
});
|
||||
}
|
||||
|
||||
function handleNewArena() {
|
||||
sessionEvents.emit({
|
||||
type: 'open_arena_launcher',
|
||||
project_id: projectId,
|
||||
placement: 'split',
|
||||
});
|
||||
}
|
||||
|
||||
// v1.10 booterm + mixed tabs: per-terminal-TAB label, keyed by the terminal
|
||||
// tab id (which keys its tmux session). Numbered across the workspace.
|
||||
const terminalLabels = useMemo(() => {
|
||||
@@ -180,6 +189,7 @@ export function Workspace({
|
||||
const isTerminal = pane.kind === 'terminal';
|
||||
const isCoder = pane.kind === 'coder';
|
||||
const isOrchestrator = pane.kind === 'orchestrator';
|
||||
const isArena = pane.kind === 'arena';
|
||||
const isArtifact = pane.kind === 'markdown_artifact' || pane.kind === 'html_artifact';
|
||||
// v1.9: when maximized, hide every pane except the settings one.
|
||||
// display:none keeps the React tree mounted so streams / drafts
|
||||
@@ -192,8 +202,8 @@ export function Workspace({
|
||||
}
|
||||
return null;
|
||||
}
|
||||
// Terminal + coder + orchestrator panes own their tab strip (no chats, no ChatTabBar).
|
||||
const isChromeless = isSettings || isTerminal || isCoder || isArtifact || isOrchestrator;
|
||||
// Terminal + coder + orchestrator + arena panes own their tab strip (no chats, no ChatTabBar).
|
||||
const isChromeless = isSettings || isTerminal || isCoder || isArtifact || isOrchestrator || isArena;
|
||||
return (
|
||||
<div
|
||||
key={pane.id}
|
||||
@@ -218,7 +228,7 @@ export function Workspace({
|
||||
(chat / coder / terminal / empty-landing). The "+" adds a tab
|
||||
of any kind; Split adds a pane. Settings/artifact panes own
|
||||
their own headers. Hidden on mobile (mobile uses pane panes). */}
|
||||
{!isMobile && !isSettings && !isArtifact && !isOrchestrator && (
|
||||
{!isMobile && !isSettings && !isArtifact && !isOrchestrator && !isArena && (
|
||||
<ChatTabBar
|
||||
pane={pane}
|
||||
tabs={paneTabs(pane)}
|
||||
@@ -231,6 +241,7 @@ export function Workspace({
|
||||
onNewTab={(kind) => void createTab(idx, kind)}
|
||||
onSplitPane={(kind) => onAddPane(kind)}
|
||||
onNewOrchestrator={handleNewOrchestrator}
|
||||
onNewArena={handleNewArena}
|
||||
onReopenPane={hasClosedPanes ? reopenPane : undefined}
|
||||
onShowHistory={() => openSessionHistory(idx)}
|
||||
onRename={renameChat}
|
||||
@@ -277,6 +288,12 @@ export function Workspace({
|
||||
state={pane.orchestrator_state}
|
||||
onClose={() => removePane(idx)}
|
||||
/>
|
||||
) : pane.kind === 'arena' && pane.arena_state ? (
|
||||
<ArenaPane
|
||||
state={pane.arena_state}
|
||||
projectId={projectId}
|
||||
onClose={() => removePane(idx)}
|
||||
/>
|
||||
) : pane.kind === 'markdown_artifact' && pane.markdown_artifact_state ? (
|
||||
<MarkdownArtifactPane
|
||||
chatId={pane.markdown_artifact_state.chat_id}
|
||||
|
||||
664
apps/web/src/components/panes/ArenaPane.tsx
Normal file
@@ -0,0 +1,664 @@
|
||||
// ArenaPane — live view for an Arena battle.
|
||||
// Mirrors OrchestratorPane: header with status/winner, contestant roster
|
||||
// (collapsed rows, expand-one), analysis panel, cross-examination control.
|
||||
//
|
||||
// Subscribes to the coder user channel (via useCoderUserEvents → sessionEvents)
|
||||
// for battle_started / contestant_updated / battle_updated frames.
|
||||
|
||||
import { useCallback, useEffect, useRef, useState } from 'react';
|
||||
import { ChevronDown, ChevronRight, Loader2, MoreHorizontal, RotateCcw, Swords, Trophy, X } from 'lucide-react';
|
||||
import { toast } from 'sonner';
|
||||
import { api } from '@/api/client';
|
||||
import type { ArenaState, BattleShape, ContestantShape, CrossExaminationShape, ProviderSnapshotEntry } from '@/api/types';
|
||||
import { sessionEvents } from '@/hooks/sessionEvents';
|
||||
import { useProviderSnapshot } from '@/hooks/useProviderSnapshot';
|
||||
import {
|
||||
DropdownMenu,
|
||||
DropdownMenuContent,
|
||||
DropdownMenuItem,
|
||||
DropdownMenuTrigger,
|
||||
} from '@/components/ui/dropdown-menu';
|
||||
import { cn } from '@/lib/utils';
|
||||
|
||||
// ─── Status dot (mirrors FlowStepStatusDot) ───────────────────────────────────
|
||||
|
||||
function ContestantStatusDot({ status }: { status: ContestantShape['status'] }) {
|
||||
if (status === 'running') {
|
||||
return (
|
||||
<span
|
||||
aria-label="running"
|
||||
className="inline-block w-3 h-3 rounded-full border-2 border-emerald-500 border-t-transparent animate-spin shrink-0"
|
||||
/>
|
||||
);
|
||||
}
|
||||
const cls =
|
||||
status === 'done'
|
||||
? 'bg-emerald-500'
|
||||
: status === 'error'
|
||||
? 'bg-destructive'
|
||||
: 'bg-muted-foreground/40'; // queued
|
||||
return <span aria-label={status} className={cn('inline-block w-1.5 h-1.5 rounded-full shrink-0', cls)} />;
|
||||
}
|
||||
|
||||
// ─── Lane badge ───────────────────────────────────────────────────────────────
|
||||
|
||||
function LaneBadge({ lane }: { lane: ContestantShape['lane'] }) {
|
||||
return (
|
||||
<span
|
||||
className={cn(
|
||||
'text-[10px] px-1 py-0.5 rounded shrink-0',
|
||||
lane === 'local'
|
||||
? 'bg-sky-500/10 text-sky-600 dark:text-sky-400'
|
||||
: 'bg-violet-500/10 text-violet-600 dark:text-violet-400',
|
||||
)}
|
||||
>
|
||||
{lane}
|
||||
</span>
|
||||
);
|
||||
}
|
||||
|
||||
// ─── Duration formatter ───────────────────────────────────────────────────────
|
||||
|
||||
function formatDuration(ms: number | null): string {
|
||||
if (ms == null) return '';
|
||||
const s = Math.round(ms / 1000);
|
||||
if (s < 60) return `${s}s`;
|
||||
return `${Math.floor(s / 60)}m${String(s % 60).padStart(2, '0')}s`;
|
||||
}
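To make the rounding and padding behaviour of `formatDuration` concrete, here is a small worked example. The function body is repeated verbatim so the block runs on its own; the sample inputs are arbitrary.

```typescript
// Copy of formatDuration from above, repeated so this block is self-contained.
function formatDuration(ms: number | null): string {
  if (ms == null) return '';
  const s = Math.round(ms / 1000);
  if (s < 60) return `${s}s`;
  return `${Math.floor(s / 60)}m${String(s % 60).padStart(2, '0')}s`;
}

const samples: Array<[number | null, string]> = [
  [null, ''],          // no duration recorded yet
  [0, '0s'],
  [59499, '59s'],      // rounds down to 59 seconds
  [59500, '1m00s'],    // rounds up to 60 seconds, which rolls into the minutes form
  [125000, '2m05s'],   // seconds are zero-padded to two digits
  [3600000, '60m00s'], // minutes are not rolled into hours
];

for (const [input, expected] of samples) {
  console.log(JSON.stringify(formatDuration(input)), 'expected', JSON.stringify(expected));
}
```

Note that hour-long runs stay in the minutes form (`60m00s`); whether that is the intended display for very long battles is not stated in the change.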
|
||||
|
||||
// ─── Live ticker for running contestants ─────────────────────────────────────
|
||||
|
||||
function LiveDuration({ startedAt }: { startedAt: number }) {
|
||||
const [elapsed, setElapsed] = useState(() => Date.now() - startedAt);
|
||||
useEffect(() => {
|
||||
const id = setInterval(() => setElapsed(Date.now() - startedAt), 1000);
|
||||
return () => clearInterval(id);
|
||||
}, [startedAt]);
|
||||
return <span>{formatDuration(elapsed)}</span>;
|
||||
}
|
||||
|
||||
// ─── DiffView ─────────────────────────────────────────────────────────────────
|
||||
|
||||
function DiffView({ diff }: { diff: string }) {
|
||||
const lines = diff.split('\n');
|
||||
return (
|
||||
<div className="border-t border-border/50">
|
||||
<div className="px-3 pt-2 pb-1 text-[10px] font-medium uppercase tracking-wide text-muted-foreground">
|
||||
Diff
|
||||
</div>
|
||||
<pre className="px-3 pb-3 text-xs font-mono whitespace-pre leading-relaxed overflow-x-auto">
|
||||
{lines.map((line, i) => {
|
||||
const cls =
|
||||
line.startsWith('+') && !line.startsWith('+++')
|
||||
? 'text-emerald-600 dark:text-emerald-400'
|
||||
: line.startsWith('-') && !line.startsWith('---')
|
||||
? 'text-destructive'
|
||||
: line.startsWith('@@')
|
||||
? 'text-violet-500 dark:text-violet-400'
|
||||
: 'text-muted-foreground';
|
||||
return (
|
||||
<span key={i} className={cn('block', cls)}>
|
||||
{line || ' '}
|
||||
</span>
|
||||
);
|
||||
})}
|
||||
</pre>
|
||||
</div>
|
||||
);
|
||||
}
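The colouring rule inside `DiffView` can be read as a pure classification of each unified-diff line. The sketch below isolates that rule so it can be checked without rendering; `DiffLineKind` and `classifyDiffLine` are hypothetical names, not part of this change.

```typescript
type DiffLineKind = 'added' | 'removed' | 'hunk' | 'context';

// Mirrors the prefix checks used by DiffView: '+' and '-' lines are additions
// and removals unless they look like the '+++' / '---' file headers, '@@'
// starts a hunk header, and everything else is treated as context.
function classifyDiffLine(line: string): DiffLineKind {
  if (line.startsWith('+') && !line.startsWith('+++')) return 'added';
  if (line.startsWith('-') && !line.startsWith('---')) return 'removed';
  if (line.startsWith('@@')) return 'hunk';
  return 'context';
}
```

One limitation of this prefix-only rule is worth keeping in mind: an added line whose own content begins with `++` (for instance `+++i;`) is indistinguishable from a file header and falls through to `context`. Telling them apart would require tracking whether the parser is currently inside a hunk.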
|
||||
|
||||
// ─── ContestantRow ────────────────────────────────────────────────────────────
|
||||
|
||||
interface ContestantRowState {
|
||||
data: ContestantShape;
|
||||
output: string;
|
||||
startedAt: number | null;
|
||||
}
|
||||
|
||||
function ContestantRow({
|
||||
row,
|
||||
isExpanded,
|
||||
onToggle,
|
||||
isWinner,
|
||||
battleId,
|
||||
battleType,
|
||||
}: {
|
||||
row: ContestantRowState;
|
||||
isExpanded: boolean;
|
||||
onToggle: () => void;
|
||||
isWinner: boolean;
|
||||
battleId: string;
|
||||
battleType: 'coding' | 'qa';
|
||||
}) {
|
||||
const { data, output, startedAt } = row;
|
||||
const label = `${data.identity} / ${data.model}`;
|
||||
|
||||
// Lazy-fetch diff for coding contestants once they are done and expanded.
|
||||
const [diff, setDiff] = useState<string | null>(null);
|
||||
useEffect(() => {
|
||||
if (!isExpanded || battleType !== 'coding' || data.status !== 'done') return;
|
||||
if (diff !== null) return;
|
||||
api.battles.getDiff(battleId, data.id)
|
||||
.then(({ diff: d }) => setDiff(d))
|
||||
.catch(() => setDiff(''));
|
||||
}, [isExpanded, battleType, data.status, data.id, battleId, diff]);
|
||||
|
||||
async function handleSetWinner(contestantId: string | null) {
  try {
    await api.battles.setWinner(battleId, { winner_contestant_id: contestantId });
  } catch (err) {
    // The badge only changes when the WS frame arrives, so a failed call would
    // otherwise be invisible; surface it to the user.
    toast.error(err instanceof Error ? err.message : 'Failed to set winner');
  }
}
|
||||
|
||||
return (
|
||||
<div>
|
||||
{/* The row toggle and the options menu are siblings: nesting the menu's
    <button> inside the toggle <button> is invalid HTML. */}
<div className="flex items-center hover:bg-muted/30 transition-colors">
  <button
    type="button"
    onClick={onToggle}
    aria-expanded={isExpanded}
    className="flex-1 min-w-0 flex items-center gap-2 pl-3 pr-2 py-2.5 text-left"
  >
    <ContestantStatusDot status={data.status} />
    <span className="text-sm flex-1 truncate min-w-0">{label}</span>
    {isWinner && (
      <Trophy size={11} className="shrink-0 text-emerald-500" aria-label="winner" />
    )}
    <LaneBadge lane={data.lane} />
    {data.status === 'running' && startedAt != null ? (
      <span className="text-xs text-muted-foreground shrink-0 tabular-nums">
        <LiveDuration startedAt={startedAt} />
      </span>
    ) : data.duration_ms != null ? (
      <span className="text-xs text-muted-foreground shrink-0 tabular-nums">
        {formatDuration(data.duration_ms)}
      </span>
    ) : null}
    {data.tokens_per_sec != null && (
      <span className="text-xs text-muted-foreground shrink-0 hidden sm:block tabular-nums">
        {data.tokens_per_sec.toFixed(1)} tok/s
      </span>
    )}
    {data.status === 'error' && (
      <span className="text-xs text-destructive shrink-0 hidden sm:block truncate max-w-[100px]" title={data.error ?? ''}>
        {data.error ?? 'error'}
      </span>
    )}
    {isExpanded ? (
      <ChevronDown size={12} className="shrink-0 text-muted-foreground" />
    ) : (
      <ChevronRight size={12} className="shrink-0 text-muted-foreground" />
    )}
  </button>
  {/* Row menu: winner override. It sits outside the toggle button, so opening
      it never toggles the row. */}
  <div className="shrink-0 pr-3">
    <DropdownMenu>
      <DropdownMenuTrigger asChild>
        <button
          type="button"
          className="shrink-0 p-0.5 rounded text-muted-foreground hover:text-foreground hover:bg-muted"
          aria-label="Contestant options"
        >
          <MoreHorizontal size={12} />
        </button>
      </DropdownMenuTrigger>
      <DropdownMenuContent align="end">
        {!isWinner && (
          <DropdownMenuItem onSelect={() => void handleSetWinner(data.id)}>
            <Trophy size={12} /> Set as winner
          </DropdownMenuItem>
        )}
        {isWinner && (
          <DropdownMenuItem onSelect={() => void handleSetWinner(null)}>
            Clear winner
          </DropdownMenuItem>
        )}
      </DropdownMenuContent>
    </DropdownMenu>
  </div>
</div>
|
||||
|
||||
{isExpanded && (
|
||||
<div className="border-t border-border/50 bg-muted/10 max-h-[55vh] overflow-y-auto">
|
||||
{output.length === 0 ? (
|
||||
<div className="flex items-center justify-center py-6 text-sm text-muted-foreground">
|
||||
{data.status === 'queued'
|
||||
? 'Waiting to start…'
|
||||
: data.status === 'error'
|
||||
? data.error ?? 'Error'
|
||||
: 'Connecting…'}
|
||||
</div>
|
||||
) : (
|
||||
<pre className="p-3 text-xs font-mono whitespace-pre-wrap leading-relaxed break-all text-foreground">
|
||||
{output}
|
||||
</pre>
|
||||
)}
|
||||
{battleType === 'coding' && data.status === 'done' && diff && (
|
||||
<DiffView diff={diff} />
|
||||
)}
|
||||
</div>
|
||||
)}
|
||||
</div>
|
||||
);
|
||||
}
|
||||
|
||||
// ─── CrossExaminationPanel ────────────────────────────────────────────────────
|
||||
|
||||
function CrossExaminationPanel({
|
||||
battleId,
|
||||
crossExams,
|
||||
snapshot,
|
||||
}: {
|
||||
battleId: string;
|
||||
crossExams: CrossExaminationShape[];
|
||||
snapshot: ProviderSnapshotEntry[] | null;
|
||||
}) {
|
||||
const [identity, setIdentity] = useState('');
|
||||
const [model, setModel] = useState('');
|
||||
const [running, setRunning] = useState(false);
|
||||
|
||||
const identityOptions = (snapshot ?? [])
|
||||
.filter((e) => e.installed && e.enabled)
|
||||
.map((e) => ({ value: e.name, label: e.label }));
|
||||
|
||||
const modelOptions = (() => {
|
||||
const provider = (snapshot ?? []).find((e) => e.name === identity);
|
||||
return (provider?.models ?? []).map((m) => ({ value: m.id, label: m.label }));
|
||||
})();
|
||||
|
||||
async function handleRun() {
|
||||
if (!identity || !model || running) return;
|
||||
setRunning(true);
|
||||
try {
|
||||
await api.battles.crossExamine(battleId, { identity, model });
|
||||
// The verdict arrives via battle_updated frame; ArenaPane will refetch.
|
||||
} catch (err) {
|
||||
toast.error(err instanceof Error ? err.message : 'Cross-examination failed');
|
||||
} finally {
|
||||
setRunning(false);
|
||||
}
|
||||
}
|
||||
|
||||
return (
|
||||
<div className="border-t border-border p-4 flex flex-col gap-3">
|
||||
<div className="text-xs font-medium text-muted-foreground uppercase tracking-wide">
|
||||
Cross-examination
|
||||
</div>
|
||||
<p className="text-xs text-muted-foreground">
|
||||
Challenge the results with any model. The verdict is advisory and never changes the recorded winner.
|
||||
</p>
|
||||
<div className="flex gap-2 items-center flex-wrap">
|
||||
<select
|
||||
value={identity}
|
||||
onChange={(e) => { setIdentity(e.target.value); setModel(''); }}
|
||||
className="flex-1 min-w-[120px] text-xs border border-border rounded bg-background px-2 py-1.5 text-foreground focus:outline-none focus:ring-1 focus:ring-ring"
|
||||
aria-label="Backend"
|
||||
>
|
||||
<option value="">Backend…</option>
|
||||
{identityOptions.map((o) => (
|
||||
<option key={o.value} value={o.value}>{o.label}</option>
|
||||
))}
|
||||
</select>
|
||||
<select
|
||||
value={model}
|
||||
onChange={(e) => setModel(e.target.value)}
|
||||
disabled={!identity}
|
||||
className="flex-1 min-w-[120px] text-xs border border-border rounded bg-background px-2 py-1.5 text-foreground focus:outline-none focus:ring-1 focus:ring-ring disabled:opacity-50"
|
||||
aria-label="Model"
|
||||
>
|
||||
<option value="">Model…</option>
|
||||
{modelOptions.map((o) => (
|
||||
<option key={o.value} value={o.value}>{o.label}</option>
|
||||
))}
|
||||
</select>
|
||||
<button
|
||||
type="button"
|
||||
onClick={() => void handleRun()}
|
||||
disabled={!identity || !model || running}
|
||||
className="inline-flex items-center gap-1 text-xs px-2 py-1.5 rounded border border-border text-foreground hover:bg-muted disabled:opacity-50"
|
||||
>
|
||||
{running && <Loader2 size={10} className="animate-spin" />}
|
||||
Run
|
||||
</button>
|
||||
</div>
|
||||
|
||||
{crossExams.length > 0 && (
|
||||
<div className="flex flex-col gap-3 mt-1">
|
||||
{crossExams.map((xe) => (
|
||||
<div key={xe.id} className="rounded border border-border/50 bg-muted/20 p-3">
|
||||
<div className="text-xs font-medium text-muted-foreground mb-1.5">
|
||||
{xe.identity} / {xe.model}
|
||||
</div>
|
||||
{xe.verdict ? (
|
||||
<div className="text-sm whitespace-pre-wrap leading-relaxed">{xe.verdict}</div>
|
||||
) : (
|
||||
<div className="text-xs text-muted-foreground flex items-center gap-1.5">
|
||||
<Loader2 size={10} className="animate-spin" /> Running…
|
||||
</div>
|
||||
)}
|
||||
</div>
|
||||
))}
|
||||
</div>
|
||||
)}
|
||||
</div>
|
||||
);
|
||||
}
|
||||
|
||||
// ─── ArenaPane ────────────────────────────────────────────────────────────────
|
||||
|
||||
interface Props {
|
||||
state: ArenaState;
|
||||
projectId: string; // available for future use (e.g. file browser affordance)
|
||||
onClose: () => void;
|
||||
}
|
||||
|
||||
export function ArenaPane({ state, onClose }: Props) {
|
||||
const [battle, setBattle] = useState<BattleShape | null>(null);
|
||||
const [contestantRows, setContestantRows] = useState<ContestantRowState[]>([]);
|
||||
const [crossExams, setCrossExams] = useState<CrossExaminationShape[]>([]);
|
||||
const [analysis, setAnalysis] = useState<string | null>(null);
|
||||
const [expandedId, setExpandedId] = useState<string | null>(null);
|
||||
const [stopping, setStopping] = useState(false);
|
||||
const [reanalyzing, setReanalyzing] = useState(false);
|
||||
const startTimesRef = useRef<Map<string, number>>(new Map());
|
||||
|
||||
const snapshot = useProviderSnapshot();
|
||||
|
||||
// Fetch current battle state on mount / battle_id change.
|
||||
useEffect(() => {
|
||||
setBattle(null);
|
||||
setContestantRows([]);
|
||||
setCrossExams([]);
|
||||
setAnalysis(null);
|
||||
setExpandedId(null);
|
||||
|
||||
api.battles.get(state.battle_id)
|
||||
.then(({ battle: b, contestants, cross_examinations }) => {
|
||||
setBattle(b);
|
||||
setContestantRows(
|
||||
contestants.map((c) => ({
|
||||
data: c,
|
||||
output: '',
|
||||
startedAt: c.status === 'running' ? Date.now() : null,
|
||||
})),
|
||||
);
|
||||
setCrossExams(cross_examinations);
|
||||
// Fetch analysis text if battle is already completed.
|
||||
if (b.status === 'completed') {
|
||||
api.battles.getAnalysis(state.battle_id)
|
||||
.then(({ text }) => setAnalysis(text))
|
||||
.catch(() => {});
|
||||
}
|
||||
// Auto-expand first running contestant.
|
||||
const firstRunning = contestants.find((c) => c.status === 'running');
|
||||
if (firstRunning) setExpandedId(firstRunning.id);
|
||||
})
|
||||
.catch(() => {});
|
||||
}, [state.battle_id]);
|
||||
|
||||
// Subscribe to live battle/contestant frames.
|
||||
useEffect(() => {
|
||||
return sessionEvents.subscribe((ev) => {
|
||||
if (ev.type === 'battle_started' && ev.battle_id === state.battle_id) {
|
||||
setContestantRows((prev) => {
|
||||
if (prev.length > 0) return prev;
|
||||
return ev.contestants.map((c) => ({
|
||||
data: {
|
||||
id: c.id,
|
||||
battle_id: ev.battle_id,
|
||||
identity: c.identity,
|
||||
model: c.model,
|
||||
lane: c.lane,
|
||||
task_id: null,
|
||||
worktree_id: null,
|
||||
status: 'queued' as const,
|
||||
duration_ms: null,
|
||||
tokens_per_sec: null,
|
||||
cost_tokens: null,
|
||||
result_path: null,
|
||||
error: null,
|
||||
created_at: new Date().toISOString(),
|
||||
updated_at: new Date().toISOString(),
|
||||
},
|
||||
output: '',
|
||||
startedAt: null,
|
||||
}));
|
||||
});
|
||||
} else if (ev.type === 'contestant_updated' && ev.battle_id === state.battle_id) {
|
||||
setContestantRows((prev) =>
|
||||
prev.map((row) => {
|
||||
if (row.data.id !== ev.contestant_id) return row;
|
||||
const updatedData: ContestantShape = {
|
||||
...row.data,
|
||||
...(ev.status != null ? { status: ev.status } : {}),
|
||||
...(ev.duration_ms != null ? { duration_ms: ev.duration_ms } : {}),
|
||||
...(ev.tokens_per_sec != null ? { tokens_per_sec: ev.tokens_per_sec } : {}),
|
||||
...(ev.error != null ? { error: ev.error } : {}),
|
||||
};
|
||||
const newStartedAt =
|
||||
ev.status === 'running' && row.startedAt == null
|
||||
? Date.now()
|
||||
: ev.status === 'done' || ev.status === 'error'
|
||||
? null
|
||||
: row.startedAt;
|
||||
if (ev.status === 'running') {
|
||||
startTimesRef.current.set(ev.contestant_id, newStartedAt ?? Date.now());
|
||||
setExpandedId(ev.contestant_id);
|
||||
}
|
||||
return {
|
||||
data: updatedData,
|
||||
output: ev.delta ? row.output + ev.delta : row.output,
|
||||
startedAt: newStartedAt,
|
||||
};
|
||||
}),
|
||||
);
|
||||
if (ev.battle_status) {
|
||||
setBattle((prev) => prev ? { ...prev, status: ev.battle_status! } : prev);
|
||||
}
|
||||
} else if (ev.type === 'battle_updated' && ev.battle_id === state.battle_id) {
|
||||
setBattle((prev) => {
|
||||
if (!prev) return prev;
|
||||
return {
|
||||
...prev,
|
||||
...(ev.status != null ? { status: ev.status } : {}),
|
||||
...(ev.winner_contestant_id !== undefined ? { winner_contestant_id: ev.winner_contestant_id } : {}),
|
||||
};
|
||||
});
|
||||
if (ev.analysis_ready) {
|
||||
api.battles.getAnalysis(state.battle_id)
|
||||
.then(({ text }) => setAnalysis(text))
|
||||
.catch(() => setAnalysis('Analysis ready — failed to load text.'));
|
||||
}
|
||||
if (ev.cross_exam_id) {
|
||||
// Refetch cross-exams to get the latest verdict.
|
||||
api.battles.get(state.battle_id)
|
||||
.then(({ cross_examinations }) => setCrossExams(cross_examinations))
|
||||
.catch(() => {});
|
||||
}
|
||||
}
|
||||
});
|
||||
}, [state.battle_id]);
|
||||
|
||||
const toggleExpand = useCallback((id: string) => {
|
||||
setExpandedId((prev) => (prev === id ? null : id));
|
||||
}, []);
|
||||
|
||||
async function handleStop() {
|
||||
if (stopping) return;
|
||||
setStopping(true);
|
||||
try {
|
||||
await api.battles.stop(state.battle_id);
|
||||
} catch {
|
||||
// non-fatal
|
||||
} finally {
|
||||
setStopping(false);
|
||||
}
|
||||
}
|
||||
|
||||
async function handleReanalyze() {
|
||||
if (reanalyzing) return;
|
||||
setReanalyzing(true);
|
||||
try {
|
||||
await api.battles.analyze(state.battle_id);
|
||||
toast.success('Re-analysis triggered');
|
||||
} catch (err) {
|
||||
toast.error(err instanceof Error ? err.message : 'Re-analysis failed');
|
||||
} finally {
|
||||
setReanalyzing(false);
|
||||
}
|
||||
}
|
||||
|
||||
function handleOpenResults() {
|
||||
if (!battle?.results_path) return;
|
||||
sessionEvents.emit({ type: 'open_file_in_browser', path: battle.results_path });
|
||||
}
|
||||
|
||||
function handleCopyAnalysis() {
|
||||
if (!analysis) return;
|
||||
navigator.clipboard.writeText(analysis).catch(() => toast.error('Clipboard write failed'));
|
||||
}
|
||||
|
||||
const battleStatus = battle?.status ?? 'running';
|
||||
const isRunning = battleStatus === 'running' || battleStatus === 'pending';
|
||||
const isCompleted = battleStatus === 'completed';
|
||||
const winnerId = battle?.winner_contestant_id;
|
||||
const winnerRow = winnerId ? contestantRows.find((r) => r.data.id === winnerId) : null;
|
||||
const winnerLabel = winnerRow ? `${winnerRow.data.identity} / ${winnerRow.data.model}` : null;
|
||||
|
||||
return (
|
||||
<div className="flex flex-col h-full min-h-0 overflow-hidden">
|
||||
{/* Header */}
|
||||
<div className="flex items-center gap-2 border-b border-border bg-muted/20 px-3 py-2 shrink-0">
|
||||
<Swords size={13} className="text-muted-foreground shrink-0" />
|
||||
<span className="text-sm font-medium truncate min-w-0 flex-1" title={state.prompt}>
|
||||
{state.prompt.length > 60 ? state.prompt.slice(0, 60) + '…' : state.prompt}
|
||||
</span>
|
||||
<span className="text-xs text-muted-foreground shrink-0 capitalize">{state.battle_type}</span>
|
||||
|
||||
{winnerLabel && (
|
||||
<span
|
||||
className="text-xs px-1.5 py-0.5 rounded bg-emerald-500/10 text-emerald-600 dark:text-emerald-400 shrink-0 hidden sm:block truncate max-w-[130px]"
|
||||
title={`Winner: ${winnerLabel}`}
|
||||
>
|
||||
✓ {winnerLabel}
|
||||
</span>
|
||||
)}
|
||||
|
||||
<div className="ml-auto flex items-center gap-1 shrink-0">
|
||||
{isRunning ? (
|
||||
<button
|
||||
type="button"
|
||||
onClick={() => void handleStop()}
|
||||
disabled={stopping}
|
||||
className="inline-flex items-center gap-1 text-xs px-1.5 py-0.5 rounded border border-border text-muted-foreground hover:text-foreground hover:bg-muted disabled:opacity-50"
|
||||
title="Stop battle"
|
||||
>
|
||||
Stop
|
||||
</button>
|
||||
) : (
|
||||
<span
|
||||
className={cn(
|
||||
'text-xs px-1.5 py-0.5 rounded',
|
||||
isCompleted
|
||||
? 'text-emerald-600 bg-emerald-500/10'
|
||||
: battleStatus === 'failed' || battleStatus === 'cancelled'
|
||||
? 'text-destructive bg-destructive/10'
|
||||
: 'text-muted-foreground bg-muted/40',
|
||||
)}
|
||||
>
|
||||
{battleStatus}
|
||||
</span>
|
||||
)}
|
||||
|
||||
{isCompleted && (
|
||||
<DropdownMenu>
|
||||
<DropdownMenuTrigger asChild>
|
||||
<button
|
||||
type="button"
|
||||
className="inline-flex items-center justify-center p-1 rounded text-muted-foreground hover:bg-muted hover:text-foreground"
|
||||
aria-label="Battle options"
|
||||
>
|
||||
<MoreHorizontal size={14} />
|
||||
</button>
|
||||
</DropdownMenuTrigger>
|
||||
<DropdownMenuContent align="end">
|
||||
<DropdownMenuItem onSelect={() => void handleReanalyze()} disabled={reanalyzing}>
|
||||
<RotateCcw size={14} /> Re-analyze
|
||||
</DropdownMenuItem>
|
||||
{battle?.results_path && (
|
||||
<DropdownMenuItem onSelect={handleOpenResults}>
|
||||
Open results folder
|
||||
</DropdownMenuItem>
|
||||
)}
|
||||
{analysis && (
|
||||
<DropdownMenuItem onSelect={handleCopyAnalysis}>
|
||||
Copy analysis
|
||||
</DropdownMenuItem>
|
||||
)}
|
||||
</DropdownMenuContent>
|
||||
</DropdownMenu>
|
||||
)}
|
||||
|
||||
<button
|
||||
type="button"
|
||||
onClick={onClose}
|
||||
className="inline-flex items-center justify-center p-1 rounded text-muted-foreground hover:bg-muted hover:text-foreground"
|
||||
aria-label="Close pane"
|
||||
title="Close pane"
|
||||
>
|
||||
<X size={12} />
|
||||
</button>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
{/* Body */}
|
||||
<div className="flex-1 min-h-0 overflow-y-auto">
|
||||
{/* Analysis panel */}
|
||||
{analysis && (
|
||||
<div className="border-b border-border p-4">
|
||||
<div className="text-xs font-medium text-muted-foreground uppercase tracking-wide mb-2 pb-1 border-b border-border/50">
|
||||
Analysis
|
||||
</div>
|
||||
<div className="text-sm text-foreground whitespace-pre-wrap leading-relaxed">
|
||||
{analysis}
|
||||
</div>
|
||||
{winnerLabel && (
|
||||
<div className="mt-2 text-sm font-medium text-emerald-600 dark:text-emerald-400">
|
||||
Winner: {winnerLabel}
|
||||
</div>
|
||||
)}
|
||||
</div>
|
||||
)}
|
||||
|
||||
{/* Empty state */}
|
||||
{contestantRows.length === 0 && !analysis && (
|
||||
<div className="flex items-center justify-center h-24 text-sm text-muted-foreground">
|
||||
Starting battle…
|
||||
</div>
|
||||
)}
|
||||
|
||||
{/* Contestant roster */}
|
||||
<div className="divide-y divide-border">
|
||||
{contestantRows.map((row) => (
|
||||
<ContestantRow
|
||||
key={row.data.id}
|
||||
row={row}
|
||||
isExpanded={expandedId === row.data.id}
|
||||
onToggle={() => toggleExpand(row.data.id)}
|
||||
isWinner={winnerId === row.data.id}
|
||||
battleId={state.battle_id}
|
||||
battleType={battle?.battle_type ?? state.battle_type}
|
||||
/>
|
||||
))}
|
||||
</div>
|
||||
|
||||
{/* Cross-examination panel — available after battle finishes */}
|
||||
{!isRunning && (
|
||||
<CrossExaminationPanel
|
||||
battleId={state.battle_id}
|
||||
crossExams={crossExams}
|
||||
snapshot={snapshot}
|
||||
/>
|
||||
)}
|
||||
</div>
|
||||
</div>
|
||||
);
|
||||
}
|
||||
@@ -3,7 +3,11 @@
|
||||
// also refresh the sidebar's session list).
|
||||
|
||||
import type {
|
||||
ArenaState,
|
||||
BattleShape,
|
||||
Chat,
|
||||
ContestantShape,
|
||||
CrossExaminationShape,
|
||||
ErrorReason,
|
||||
HtmlArtifactState,
|
||||
MarkdownArtifactState,
|
||||
@@ -231,6 +235,53 @@ export interface FlowRunStepUpdatedEvent {
|
||||
report?: string;
|
||||
}
|
||||
|
||||
// Arena: emitted by "New Arena" menu items to request the launcher dialog.
|
||||
export interface OpenArenaLauncherEvent {
|
||||
type: 'open_arena_launcher';
|
||||
project_id: string;
|
||||
placement?: 'new' | 'split';
|
||||
}
|
||||
|
||||
// Arena: emitted after a battle is created to open/focus the arena pane.
|
||||
export interface OpenArenaPaneEvent {
|
||||
type: 'open_arena_pane';
|
||||
state: ArenaState;
|
||||
placement?: 'new' | 'split';
|
||||
}
|
||||
|
||||
// Arena: battle lifecycle frames forwarded from the coder user channel.
|
||||
export interface BattleStartedEvent {
|
||||
type: 'battle_started';
|
||||
battle_id: string;
|
||||
battle_type: 'coding' | 'qa';
|
||||
prompt: string;
|
||||
contestants: Array<{ id: string; identity: string; model: string; lane: 'local' | 'cloud' }>;
|
||||
}
|
||||
|
||||
export interface ContestantUpdatedEvent {
|
||||
type: 'contestant_updated';
|
||||
battle_id: string;
|
||||
contestant_id: string;
|
||||
status?: 'queued' | 'running' | 'done' | 'error';
|
||||
duration_ms?: number;
|
||||
tokens_per_sec?: number;
|
||||
battle_status?: 'pending' | 'running' | 'completed' | 'failed' | 'cancelled';
|
||||
delta?: string;
|
||||
error?: string;
|
||||
}
|
||||
|
||||
export interface BattleUpdatedEvent {
|
||||
type: 'battle_updated';
|
||||
battle_id: string;
|
||||
status?: 'pending' | 'running' | 'completed' | 'failed' | 'cancelled';
|
||||
winner_contestant_id?: string | null;
|
||||
analysis_ready?: boolean;
|
||||
cross_exam_id?: string;
|
||||
}
|
||||
|
||||
// Re-export arena API shapes for consumers that need the full battle data.
|
||||
export type { BattleShape, ContestantShape, CrossExaminationShape };
|
||||
|
||||
export type SessionEvent =
|
||||
| SessionRenamedEvent
|
||||
| ProjectCreatedEvent
|
||||
@@ -262,7 +313,12 @@ export type SessionEvent =
|
||||
| OpenOrchestratorPaneEvent
|
||||
| FlowRunStartedEvent
|
||||
| FlowRunStepUpdatedEvent
|
||||
| OpenFlowLauncherEvent;
|
||||
| OpenFlowLauncherEvent
|
||||
| OpenArenaLauncherEvent
|
||||
| OpenArenaPaneEvent
|
||||
| BattleStartedEvent
|
||||
| ContestantUpdatedEvent
|
||||
| BattleUpdatedEvent;
|
||||
type Listener = (event: SessionEvent) => void;
|
||||
|
||||
const listeners = new Set<Listener>();
|
||||
|
||||
@@ -8,7 +8,13 @@
|
||||
import { useEffect } from 'react';
|
||||
import { WsFrameSchema } from '@boocode/contracts/ws-frames';
|
||||
import { sessionEvents } from './sessionEvents';
|
||||
import type { FlowRunStartedEvent, FlowRunStepUpdatedEvent } from './sessionEvents';
|
||||
import type {
|
||||
BattleStartedEvent,
|
||||
BattleUpdatedEvent,
|
||||
ContestantUpdatedEvent,
|
||||
FlowRunStartedEvent,
|
||||
FlowRunStepUpdatedEvent,
|
||||
} from './sessionEvents';
|
||||
|
||||
const RECONNECT_INITIAL_MS = 1000;
|
||||
const RECONNECT_MAX_MS = 30_000;
|
||||
@@ -49,6 +55,12 @@ export function useCoderUserEvents(): void {
|
||||
sessionEvents.emit(frame as unknown as FlowRunStartedEvent);
|
||||
} else if (frame.type === 'flow_run_step_updated') {
|
||||
sessionEvents.emit(frame as unknown as FlowRunStepUpdatedEvent);
|
||||
} else if (frame.type === 'battle_started') {
|
||||
sessionEvents.emit(frame as unknown as BattleStartedEvent);
|
||||
} else if (frame.type === 'contestant_updated') {
|
||||
sessionEvents.emit(frame as unknown as ContestantUpdatedEvent);
|
||||
} else if (frame.type === 'battle_updated') {
|
||||
sessionEvents.emit(frame as unknown as BattleUpdatedEvent);
|
||||
}
|
||||
};
|
||||
|
||||
|
||||
@@ -204,6 +204,13 @@ function applyFrame(state: State, frame: WsFrame): State {
|
||||
// No-op here to keep TS exhaustiveness satisfied.
|
||||
return state;
|
||||
}
|
||||
case 'battle_started':
|
||||
case 'contestant_updated':
|
||||
case 'battle_updated': {
|
||||
// Arena frames consumed by ArenaPane's own subscription.
|
||||
// No-op here to keep TS exhaustiveness satisfied.
|
||||
return state;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
@@ -195,6 +195,13 @@ function applyEvent(prev: SidebarResponse, event: import('./sessionEvents').Sess
|
||||
case 'flow_run_step_updated':
|
||||
// Consumed by useWorkspacePanes / OrchestratorPane / FlowLauncherDialog; sidebar has no stake.
|
||||
return prev;
|
||||
case 'open_arena_launcher':
|
||||
case 'open_arena_pane':
|
||||
case 'battle_started':
|
||||
case 'contestant_updated':
|
||||
case 'battle_updated':
|
||||
// Consumed by useWorkspacePanes / ArenaPane / ArenaLauncherDialog; sidebar has no stake.
|
||||
return prev;
|
||||
case 'project_archived': {
|
||||
const next = prev.projects.filter((p) => p.id !== event.project_id);
|
||||
if (next.length === prev.projects.length) return prev;
|
||||
|
||||
@@ -3,6 +3,7 @@ import type { DragEvent } from 'react';
|
||||
import { toast } from 'sonner';
|
||||
import { api } from '@/api/client';
|
||||
import type {
|
||||
ArenaState,
|
||||
ClosedPaneEntry,
|
||||
HtmlArtifactState,
|
||||
MarkdownArtifactState,
|
||||
@@ -187,6 +188,16 @@ function orchestratorPane(state: OrchestratorState): WorkspacePane {
|
||||
};
|
||||
}
|
||||
|
||||
function arenaPane(state: ArenaState): WorkspacePane {
|
||||
return {
|
||||
id: generateId(),
|
||||
kind: 'arena',
|
||||
chatIds: [],
|
||||
activeChatIdx: -1,
|
||||
arena_state: state,
|
||||
};
|
||||
}
|
||||
|
||||
// v1.9: settings panes are ephemeral. Filter them out before persisting so a
|
||||
// page reload always returns to a clean workspace; the user re-opens via the
|
||||
// sidebar Settings button when needed.
|
||||
@@ -290,6 +301,8 @@ export interface UseWorkspacePanesResult {
|
||||
createTab: (paneIdx: number, kind: WorkspaceTabKind) => Promise<void>;
|
||||
/** Open an orchestrator run pane (or focus an existing one for the same run_id). */
|
||||
addOrchestratorPane: (state: OrchestratorState) => string | null;
|
||||
/** Open an arena battle pane (or focus an existing one for the same battle_id). */
|
||||
addArenaPane: (state: ArenaState) => string | null;
|
||||
/** Back-compat alias for createTab(paneIdx, 'coder'). */
|
||||
createCoderTab: (paneIdx: number) => Promise<void>;
|
||||
// Open-on-first-click, close-on-second-click. Singleton — settings panes
|
||||
@@ -877,6 +890,38 @@ export function useWorkspacePanes(sessionId: string): UseWorkspacePanesResult {
|
||||
});
|
||||
}, [addOrchestratorPane]);
|
||||
|
||||
const addArenaPane = useCallback((state: ArenaState): string | null => {
|
||||
let openedId: string | null = null;
|
||||
setPanes((prev) => {
|
||||
const existingIdx = prev.findIndex(
|
||||
(p) => p.kind === 'arena' && p.arena_state?.battle_id === state.battle_id,
|
||||
);
|
||||
if (existingIdx >= 0) {
|
||||
setActivePaneIdx(existingIdx);
|
||||
openedId = prev[existingIdx]!.id;
|
||||
return prev;
|
||||
}
|
||||
if (nonSettingsCount(prev) >= MAX_PANES) {
|
||||
toast.error(`Maximum ${MAX_PANES} panes`);
|
||||
return prev;
|
||||
}
|
||||
const newPane = arenaPane(state);
|
||||
openedId = newPane.id;
|
||||
const next = [...prev, newPane];
|
||||
setActivePaneIdx(next.length - 1);
|
||||
return next;
|
||||
});
|
||||
return openedId;
|
||||
}, []);
|
||||
|
||||
// Arena pane: open via sessionEvents (fired by the launcher).
|
||||
useEffect(() => {
|
||||
return sessionEvents.subscribe((ev) => {
|
||||
if (ev.type !== 'open_arena_pane') return;
|
||||
addArenaPane(ev.state);
|
||||
});
|
||||
}, [addArenaPane]);
|
||||
|
||||
// Returns the new settings pane id when one is OPENED (so mobile callers can
|
||||
// push ?pane= atomically — see addPaneAndSwitch), or null when it was closed.
|
||||
// Id generated outside the updater so a strict-mode double-invoke agrees.
|
||||
@@ -1121,6 +1166,7 @@ export function useWorkspacePanes(sessionId: string): UseWorkspacePanesResult {
|
||||
addSplitPane,
|
||||
createTab,
|
||||
addOrchestratorPane,
|
||||
addArenaPane,
|
||||
createCoderTab,
|
||||
toggleSettingsPane,
|
||||
removePane,
|
||||
|
||||
55
apps/web/src/lib/permission-mode.ts
Normal file
@@ -0,0 +1,55 @@
|
||||
// Unified permission ladder shown in the composer's permission picker. Maps a
|
||||
// curated three-option control (Plan / Ask Permission / Bypass) onto each
|
||||
// provider's native mode vocabulary, derived purely from the snapshot's mode
|
||||
// metadata (`plan` id, the default mode, and the `isUnattended` bypass mode).
|
||||
// `modeId` stays the single wire field sent to the dispatcher — there is no
|
||||
// separate persisted permission field; the active unified mode is derived from
|
||||
// the current `modeId`.
|
||||
import type { ProviderMode } from '@/api/types';
|
||||
|
||||
export type PermissionMode = 'plan' | 'ask' | 'bypass';
|
||||
|
||||
export const PERMISSION_LABELS: Record<PermissionMode, string> = {
|
||||
plan: 'Plan',
|
||||
ask: 'Ask Permission',
|
||||
bypass: 'Bypass',
|
||||
};
|
||||
|
||||
/** The native modeId for a unified permission, or null when the provider has no
|
||||
* modes (e.g. goose). `plan` → the `plan`-id mode; `bypass` → the `isUnattended`
|
||||
* mode; `ask` → the non-unattended default. Falls back to defaultModeId. */
|
||||
export function nativeModeForPermission(
|
||||
mode: PermissionMode,
|
||||
modes: ProviderMode[],
|
||||
defaultModeId: string | null,
|
||||
): string | null {
|
||||
if (modes.length === 0) return null;
|
||||
if (mode === 'plan') return modes.find((m) => m.id === 'plan')?.id ?? defaultModeId;
|
||||
if (mode === 'bypass') return modes.find((m) => m.isUnattended)?.id ?? defaultModeId;
|
||||
return (
|
||||
modes.find((m) => m.id === defaultModeId && !m.isUnattended)?.id ??
|
||||
modes.find((m) => !m.isUnattended && m.id !== 'plan')?.id ??
|
||||
defaultModeId
|
||||
);
|
||||
}
|
||||
|
||||
/** Which unified permission a native modeId corresponds to (for picker state). */
|
||||
export function permissionForModeId(modeId: string | null, modes: ProviderMode[]): PermissionMode {
|
||||
if (!modeId) return 'ask';
|
||||
if (modeId === 'plan') return 'plan';
|
||||
if (modes.find((m) => m.id === modeId)?.isUnattended) return 'bypass';
|
||||
return 'ask';
|
||||
}
|
||||
|
||||
/** The unified permission options a provider supports, in fixed Plan→Ask→Bypass
|
||||
* order. Empty when the provider exposes no modes (no picker shown). */
|
||||
export function availablePermissionModes(
|
||||
modes: ProviderMode[],
|
||||
): Array<{ id: PermissionMode; label: string }> {
|
||||
if (modes.length === 0) return [];
|
||||
const out: Array<{ id: PermissionMode; label: string }> = [];
|
||||
if (modes.some((m) => m.id === 'plan')) out.push({ id: 'plan', label: PERMISSION_LABELS.plan });
|
||||
out.push({ id: 'ask', label: PERMISSION_LABELS.ask });
|
||||
if (modes.some((m) => m.isUnattended)) out.push({ id: 'bypass', label: PERMISSION_LABELS.bypass });
|
||||
return out;
|
||||
}
|
||||
@@ -1,9 +1,11 @@
|
||||
# BooCode roadmap (v1.x–v2.x)
|
||||
|
||||
Last updated: 2026-05-31
|
||||
Last updated: 2026-06-03
|
||||
|
||||
> **Companion doc:** `boocode_code_review.md` holds the full external-repo inventory, lift rationale, and license analysis. This document is the canonical source for shipping state, version ordering, and what's planned vs. shipped.
|
||||
|
||||
> **Shipped since this doc's body was written (v2.7.12–v2.7.17, 2026-06-02→03; see `CHANGELOG.md` for detail):** `v2.7.12-audit-cleanup` (repo-wide dead-code/dedup pass, ~−4,600 LOC), `v2.7.13-contracts-ssot` (the `@boocode/contracts` shared wire-contract package — the "unified types" deferred item), `v2.7.14-backlog-hardening` (5 v2-review items incl. external task-cancel, stall-timeout, retire `:9502` SPA), `v2.7.15-git-diff-panel` + `v2.7.16-container-git-safedir` (Files/Git tab), and `v2.7.17-orchestrator` (the in-app multi-agent Orchestrator on local Qwen). The "Write/edit robustness" and "Claude provider SDK" milestones below — previously marked "planned" — are also now shipped (see those sections).
|
||||
|
||||
## Overview
|
||||
|
||||
BooCode is a **3-app monorepo** at `/opt/boocode/` (locked 2026-05-22):
|
||||
@@ -452,9 +454,9 @@ The original plan (kept for record): expose `boocoder acp` (JSON-RPC over stdio)
|
||||
|
||||
-----
|
||||
|
||||
## Write/edit robustness (planned)
|
||||
## Write/edit robustness — SHIPPED
|
||||
|
||||
**Status: planned, not started.** From the v2 review (`boocode_code_review_v2.md` §5b; `cline/cline`, Apache-2.0 — code-liftable). Two lifts that harden BooCoder's write surface where it's weakest for local quantized models:
|
||||
**Status: SHIPPED (by v2.7.x).** Both lifts are live: the fuzzy patch applier (`apps/coder/src/services/fuzzy-match.ts`, consumed by `pending_changes.ts` — `edit_file` is no longer exact-match) and the `git`-ref checkpoint snapshot (`apps/coder/src/services/checkpoints.ts` → `createCheckpoint`, private `refs/boocode/checkpoints/<id>` ref). The original "planned" note below is retained for provenance. From the v2 review (`boocode_code_review_v2.md` §5b; `cline/cline`, Apache-2.0 — code-liftable). Two lifts that harden BooCoder's write surface where it's weakest for local quantized models:
|
||||
|
||||
1. **Fuzzy patch applier for `edit_file`.** BooCoder's `edit_file` is exact-match today (`apps/coder/src/services/pending_changes.ts` — `if (!content.includes(oldStr)) throw`; no whitespace/unicode tolerance, no multi-occurrence guard). Lift cline's tiered match ladder (exact → `trimEnd` → `trim` → Levenshtein ≥0.66) + unicode canonicalization (dashes, curly quotes, nbsp) + multi-occurrence guard; unmatched → warning, not throw. `apply-patch-parser.ts:347-431`.
|
||||
2. **`git stash create` + private-ref checkpoint.** A per-turn workspace snapshot that captures **all** state — including edits made by dispatched external agents (opencode/claude/qwen/goose), build artifacts, test side-effects — which BooCoder's current `rewind` cannot (it only reverse-applies BooCoder's own queued `pending_changes`). Snapshot stored under a private `refs/…/checkpoints/…` ref, restorable with conversation-trim in sync. `checkpoint-hooks.ts:177-253`.
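
The tiered match ladder described in item 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not cline's `apply-patch-parser.ts` and not BooCoder's `fuzzy-match.ts`: it assumes line-by-line comparison, applies the 0.66 threshold per line, and reports multiple hits instead of guessing. Every name in it (`canonicalize`, `similarity`, `findBlock`, `MatchResult`) is hypothetical.

```typescript
// Hypothetical sketch of a tiered match ladder; names and shapes are assumptions.
type Tier = 'exact' | 'trimEnd' | 'trim' | 'fuzzy';
type MatchResult =
  | { kind: 'match'; tier: Tier; startLine: number }
  | { kind: 'ambiguous'; tier: Tier; startLines: number[] }
  | { kind: 'none' };

// Unicode canonicalization: typographic dashes, curly quotes, and nbsp.
function canonicalize(s: string): string {
  return s
    .replace(/[\u2010-\u2015\u2212]/g, '-')
    .replace(/[\u2018\u2019\u201A\u201B]/g, "'")
    .replace(/[\u201C\u201D\u201E\u201F]/g, '"')
    .replace(/\u00A0/g, ' ');
}

// Classic single-row Levenshtein edit distance.
function levenshtein(a: string, b: string): number {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diag = row[0];
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j];
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diag + cost);
      diag = above;
    }
  }
  return row[b.length];
}

// Similarity in [0, 1]; two empty strings count as identical.
function similarity(a: string, b: string): number {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest;
}

// Tiers are tried strictest first; the first tier that yields any hit decides.
const TIERS: Array<[Tier, (a: string, b: string) => boolean]> = [
  ['exact', (a, b) => a === b],
  ['trimEnd', (a, b) => a.trimEnd() === b.trimEnd()],
  ['trim', (a, b) => a.trim() === b.trim()],
  ['fuzzy', (a, b) => similarity(a.trim(), b.trim()) >= 0.66],
];

function findBlock(haystack: string, needle: string): MatchResult {
  const hay = canonicalize(haystack).split('\n');
  const pat = canonicalize(needle).split('\n');
  for (const [tier, same] of TIERS) {
    const starts: number[] = [];
    for (let i = 0; i + pat.length <= hay.length; i++) {
      if (pat.every((p, k) => same(hay[i + k], p))) starts.push(i);
    }
    if (starts.length === 1) return { kind: 'match', tier, startLine: starts[0] };
    // Multi-occurrence guard: refuse to pick one of several candidates.
    if (starts.length > 1) return { kind: 'ambiguous', tier, startLines: starts };
  }
  // Unmatched: the caller can surface a warning instead of throwing.
  return { kind: 'none' };
}
```

Returning a tagged result rather than throwing lets the caller decide how to report an unmatched or ambiguous edit, which is in the spirit of the "unmatched → warning, not throw" behavior noted above.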
|
||||
@@ -463,9 +465,9 @@ The original plan (kept for record): expose `boocoder acp` (JSON-RPC over stdio)
|
||||
|
||||
-----
|
||||
|
||||
## Claude provider — SDK transport + native session resume (planned)
|
||||
## Claude provider — SDK transport + native session resume — SHIPPED (enabled 2026-06-03)
|
||||
|
||||
**Status: planned, not started.** From the v2 review (`boocode_code_review_v2.md` §5h–§5i) + a direct read of the published SDK `.d.ts` (`@anthropic-ai/claude-agent-sdk@0.3.158`, reviewed 2026-05-31). Today BooCoder dispatches `claude` one-shot via PTY (`claude --output-format stream-json`) with no continuity. Plan:
|
||||
**Status: BUILT and ENABLED.** The Agent-SDK backend (`apps/coder/src/services/backends/claude-sdk.ts`) and the `PostgresSessionStore` (`claude-session-store.ts`, keyed `(chat_id, agent)`) are implemented; it was shipped behind the `CLAUDE_SDK_BACKEND` env flag (off by default in code) and is **enabled in `apps/coder/.env.host` (`CLAUDE_SDK_BACKEND=1`, confirmed live in the running host service)** — chat-tab `claude` tasks route through the warm SDK backend with native session resume instead of one-shot PTY. The original "planned" note below is retained for provenance. From the v2 review (`boocode_code_review_v2.md` §5h–§5i) + a direct read of the published SDK `.d.ts` (`@anthropic-ai/claude-agent-sdk@0.3.158`, reviewed 2026-05-31). Today BooCoder dispatches `claude` one-shot via PTY (`claude --output-format stream-json`) with no continuity. Plan:
|
||||
|
||||
1. **Adopt the Agent SDK** (`@anthropic-ai/claude-agent-sdk`) over the PTY path. `query({ prompt, options })` yields structured `SDKMessage`s — `SDKSystemMessage` (`subtype:'init'`, carries the session id + tool/skill/mcp lists), `SDKPartialAssistantMessage` (`type:'stream_event'` deltas), `SDKResultMessage` (turn end) — no stdout scraping. `happy` (`slopus/happy`) is the working existence-proof.
|
||||
2. **Native session resume via a pluggable `SessionStore`.** Implement `PostgresSessionStore implements SessionStore` (5 methods: `append`/`load`/`listSessions`/`delete`/`listSubkeys`) over BooCode's Postgres, keyed by `(chat_id, agent)`; drive turns with `query({ options: { sessionStore, resume } })` and the SDK materializes the stored session for the CLI subprocess. **This supersedes happy's SessionStart-hook + jsonl-watcher** — that was a workaround predating the feature (happy pins SDK `^0.2.96`; the `SessionStore` API is `0.3.x`). `importSessionToStore()` migrates an existing local session; `InMemorySessionStore` is the reference shape.
|
||||
|
||||
@@ -1,5 +1,7 @@
|
||||
# Deferred work — post stale cleanup (2026-05-26)
|
||||
|
||||
> **⚠️ SUPERSEDED (2026-06-03): most items in this doc have since shipped.** Task cancel → abort ACP/PTY child (v2.7.14), unified `packages/types` (v2.7.13 `@boocode/contracts`), retire `apps/coder/web/` fallback SPA (v2.7.14), `console.debug`→pino in the xml-parser (v2.7.14), and the large-file splits (v2.7.12) are all done; the ACP cold-probe skip shipped earlier (v2.3). Treat this doc as historical — see `CHANGELOG.md` (v2.7.12–v2.7.17) for what actually shipped. Kept for the design rationale in the detail sections below.
|
||||
|
||||
This document describes work intentionally **not** shipped in the 2026-05-26 stale/simplify batch. Each item needs a product or architecture decision before implementation. See also [`STALE-DEPRECATED.md`](./STALE-DEPRECATED.md) for what was resolved in that batch.
|
||||
|
||||
Last updated: 2026-05-29
|
||||
|
||||
19
docs/adr/0001-arena-two-lane-scheduling.md
Normal file
@@ -0,0 +1,19 @@
|
||||
# Arena schedules contestants in a local lane (serial) and a cloud lane (parallel)

A Battle runs the same prompt against 2–6 Contestants. The local llama-swap
server can only hold one model in memory at a time, so llama-swap-backed
Contestants are placed in a **local lane** and run strictly one at a time, while
cloud-backed Contestants (Claude Code, OpenCode-on-cloud) run all in parallel in
a **cloud lane**; the two lanes run concurrently. We chose this over running
everything serially (too slow for cloud) or everything in parallel (impossible
for local, and it would corrupt the speed Benchmark) because the single-model
constraint is physical and the serial local lane also gives each local model an
uncontended, fair tokens/sec measurement.

## Consequences

- A Battle's wall-clock is roughly `max(slowest cloud contestant, sum of local
  contestants)`. Deep local lanes (especially all-local Q&A battles) are slow by
  design; the launcher warns when the local lane is deep.
- The speed Benchmark (tokens/sec) is only meaningful for local-lane Contestants,
  which is acceptable since external CLI agents don't report token usage anyway.
|
||||
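The two-lane rule in this ADR can be sketched as follows. This is a minimal illustration, not the Arena scheduler itself: `Contestant`, `Runner`, `runBattle`, and `estimateWallClockMs` are hypothetical names, and error handling is omitted (a real scheduler would presumably record a failed contestant and keep the lane moving).

```typescript
// Hypothetical sketch of the two-lane scheduling described in the ADR.
// None of these names come from the Arena source.
type Lane = 'local' | 'cloud';

interface Contestant {
  id: string;
  lane: Lane;
}

interface RunResult {
  id: string;
  durationMs: number;
}

// Assumed callback that runs one contestant to completion.
type Runner = (c: Contestant) => Promise<void>;

// Wrap a single run with a wall-clock measurement.
async function timed(c: Contestant, run: Runner): Promise<RunResult> {
  const start = Date.now();
  await run(c);
  return { id: c.id, durationMs: Date.now() - start };
}

// Local lane: strictly one at a time, because only one model fits in memory.
async function runLocalLane(cs: Contestant[], run: Runner): Promise<RunResult[]> {
  const results: RunResult[] = [];
  for (const c of cs) {
    results.push(await timed(c, run));
  }
  return results;
}

// Cloud lane: every contestant starts immediately.
function runCloudLane(cs: Contestant[], run: Runner): Promise<RunResult[]> {
  return Promise.all(cs.map((c) => timed(c, run)));
}

// Both lanes are started together and awaited together.
async function runBattle(contestants: Contestant[], run: Runner): Promise<RunResult[]> {
  const local = contestants.filter((c) => c.lane === 'local');
  const cloud = contestants.filter((c) => c.lane === 'cloud');
  const [localResults, cloudResults] = await Promise.all([
    runLocalLane(local, run),
    runCloudLane(cloud, run),
  ]);
  return [...localResults, ...cloudResults];
}

// Rough wall-clock from the ADR: max(slowest cloud contestant, sum of local).
function estimateWallClockMs(localMs: number[], cloudMs: number[]): number {
  const localSum = localMs.reduce((a, b) => a + b, 0);
  const slowestCloud = cloudMs.length > 0 ? Math.max(...cloudMs) : 0;
  return Math.max(slowestCloud, localSum);
}
```

With two local contestants taking 30 s and 40 s and one cloud contestant taking 50 s, the estimate is dominated by the local lane (70 s), which is the "deep local lane" case the launcher warns about.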
22
docs/adr/0002-arena-dedicated-tables-not-flow-runner.md
Normal file
@@ -0,0 +1,22 @@
# Arena gets dedicated battles/contestants tables and replaces the old API-only arena

The Arena feature reuses the dispatcher, the `onTaskTerminal` advance hook, the
streaming→WS-frame pipeline, and the pane pattern from the Orchestrator, but
persists to its **own `battles` + `contestants` tables** rather than the
Orchestrator's `flow_runs`/`flow_steps`. A Battle is not shaped like a flow — it
has two scheduling lanes, per-contestant benchmarks, on-disk results folders, a
two-stage analysis, and cross-examinations — so modelling it as flow steps would
fight the schema. Each Contestant links to a real `tasks` row via `task_id`,
inheriting all worktree/streaming/dispatch machinery. This also **replaces the
earlier v2.0.5 API-only arena** (`POST /api/arena`, `tasks.arena_id`,
select-winner): that feature had no UI and no users, and the new Arena is a
strict superset, so the old routes and the `tasks.arena_id` column are removed
rather than left as a second, competing "arena" concept.

## Consequences

- Analysis and cross-examination run through a small pluggable **Analyzer** seam
  (v1 = default-model two-stage judge). A v2 that drives a Han Orchestrator flow
  as the analyzer slots in behind that seam without a schema change.
- The `arena` pane kind, `ArenaState`, and `battle_*` WS frames are added
  alongside (not folded into) the Orchestrator's, mirroring its patterns.
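One way to picture the Analyzer seam mentioned above is a single-method interface with the v1 two-stage judge behind it. The sketch below is an assumption about shape only: `Analyzer`, `TwoStageAnalyzer`, `Complete`, and `pickWinner` are hypothetical names, the prompts are placeholders, and the winner-extraction rule (first contestant id mentioned in the judge text) is invented for illustration.

```typescript
// Hypothetical sketch of a pluggable Analyzer seam; not the Arena source.
interface ContestantResult {
  contestantId: string;
  output: string;
}

interface Verdict {
  winnerContestantId: string | null;
  analysis: string;
}

// The seam: a v2 analyzer would implement the same interface.
interface Analyzer {
  analyze(prompt: string, results: ContestantResult[]): Promise<Verdict>;
}

// Assumed text-completion callback bound to the default model.
type Complete = (text: string) => Promise<string>;

// Illustrative rule: the id that appears earliest in the judge text wins.
function pickWinner(judgeText: string, ids: string[]): string | null {
  let best: string | null = null;
  let bestIndex = Infinity;
  for (const id of ids) {
    const i = judgeText.indexOf(id);
    if (i !== -1 && i < bestIndex) {
      best = id;
      bestIndex = i;
    }
  }
  return best;
}

class TwoStageAnalyzer implements Analyzer {
  complete: Complete;

  constructor(complete: Complete) {
    this.complete = complete;
  }

  async analyze(prompt: string, results: ContestantResult[]): Promise<Verdict> {
    // Stage 1 (digest): summarize each result, one at a time.
    const digests: string[] = [];
    for (const r of results) {
      const digest = await this.complete(`Summarize result ${r.contestantId}:\n${r.output}`);
      digests.push(`${r.contestantId}: ${digest}`);
    }
    // Stage 2 (judge): compare the digests against the original prompt.
    const analysis = await this.complete(
      `Prompt:\n${prompt}\n\nDigests:\n${digests.join('\n')}\n\nName the winner by id.`,
    );
    const ids = results.map((r) => r.contestantId);
    return { winnerContestantId: pickWinner(analysis, ids), analysis };
  }
}
```

Because callers depend only on `Analyzer`, swapping in a flow-driven implementation would not touch the `battles` or `contestants` tables, which is the property the ADR is after.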
@@ -28,6 +28,10 @@
    "./worktree-risk": {
      "types": "./dist/worktree-risk.d.ts",
      "default": "./dist/worktree-risk.js"
    },
    "./arena": {
      "types": "./dist/arena.d.ts",
      "default": "./dist/arena.js"
    }
  },
  "scripts": {
55
packages/contracts/src/arena.ts
Normal file
@@ -0,0 +1,55 @@
/** Arena types — single source of truth for cross-app Arena wire contracts. */

export type BattleType = 'coding' | 'qa';
export type BattleStatus = 'pending' | 'running' | 'completed' | 'failed' | 'cancelled';
export type ContestantStatus = 'queued' | 'running' | 'done' | 'error';
export type ContestantLane = 'local' | 'cloud';

// Pane state — carried on the WorkspacePane row, mirrors OrchestratorState.
export interface ArenaState {
  battle_id: string;
  battle_type: BattleType;
  prompt: string;
}

export interface BattleShape {
  id: string;
  project_id: string;
  battle_type: BattleType;
  prompt: string;
  status: BattleStatus;
  winner_contestant_id: string | null;
  results_path: string | null;
  error: string | null;
  created_at: string;
  updated_at: string;
}

export interface ContestantShape {
  id: string;
  battle_id: string;
  /** Backend name (coding) or persona name (qa). Unique per (battle, model) pair. */
  identity: string;
  model: string;
  lane: ContestantLane;
  task_id: string | null;
  worktree_id: string | null;
  status: ContestantStatus;
  duration_ms: number | null;
  tokens_per_sec: number | null;
  cost_tokens: number | null;
  result_path: string | null;
  error: string | null;
  created_at: string;
  updated_at: string;
}

export interface CrossExaminationShape {
  id: string;
  battle_id: string;
  /** Backend + model performing the cross-examination. */
  identity: string;
  model: string;
  verdict: string | null;
  created_at: string;
}
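As a small illustration of how the nullable benchmark fields on a contestant might be consumed, the sketch below picks the fastest finished local contestant. `ContestantView` is a hypothetical narrowed copy of a few `ContestantShape` fields so the example stands alone, and `fastestLocal` is not a function from the source.

```typescript
// Hypothetical narrowed view of ContestantShape; only the fields used here.
interface ContestantView {
  id: string;
  lane: 'local' | 'cloud';
  status: 'queued' | 'running' | 'done' | 'error';
  tokens_per_sec: number | null;
}

// Returns the finished local contestant with the highest tokens/sec,
// or null when no local contestant has a measurement yet.
function fastestLocal(contestants: ContestantView[]): ContestantView | null {
  let best: ContestantView | null = null;
  for (const c of contestants) {
    // Skip cloud rows, unfinished rows, and rows without a measurement.
    if (c.lane !== 'local' || c.status !== 'done' || c.tokens_per_sec === null) continue;
    if (best === null || c.tokens_per_sec > (best.tokens_per_sec ?? 0)) {
      best = c;
    }
  }
  return best;
}
```

Cloud rows are skipped outright because, per the scheduling ADR, tokens/sec is only meaningful for the local lane.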
@@ -358,6 +358,53 @@ export const FlowRunStepUpdatedFrame = z.object({
  report: z.string().optional(),
});

// ---- arena frames ----------------------------------------------------------

const ContestantManifestEntry = z.object({
  id: Uuid,
  identity: z.string().min(1),
  model: z.string().min(1),
  lane: z.enum(['local', 'cloud']),
});

// Published once when a battle starts. Carries the contestant roster so the
// ArenaPane can build its grid immediately.
export const BattleStartedFrame = z.object({
  type: z.literal('battle_started'),
  battle_id: Uuid,
  battle_type: z.enum(['coding', 'qa']),
  prompt: z.string(),
  contestants: z.array(ContestantManifestEntry),
});

// Published on every contestant status change or streaming update.
// `delta` carries the latest chunk of streaming output while status='running'.
// `battle_status` is present only on the final transition that closes the battle.
export const ContestantUpdatedFrame = z.object({
  type: z.literal('contestant_updated'),
  battle_id: Uuid,
  contestant_id: Uuid,
  status: z.enum(['queued', 'running', 'done', 'error']).optional(),
  duration_ms: z.number().int().nonnegative().optional(),
  tokens_per_sec: z.number().nonnegative().optional(),
  battle_status: z.enum(['pending', 'running', 'completed', 'failed', 'cancelled']).optional(),
  delta: z.string().optional(),
  error: z.string().optional(),
});

// Published when battle-level state changes that don't ride on a contestant
// update: analysis finished, winner set, cross-exam verdict ready. The pane
// uses this to update its analysis panel and winner badge without a refetch.
// Fields are all optional — publishers include only what changed.
export const BattleUpdatedFrame = z.object({
  type: z.literal('battle_updated'),
  battle_id: Uuid,
  status: z.enum(['pending', 'running', 'completed', 'failed', 'cancelled']).optional(),
  winner_contestant_id: Uuid.nullable().optional(),
  analysis_ready: z.boolean().optional(),
  cross_exam_id: Uuid.optional(),
});

// ---- discriminated union ---------------------------------------------------

export const WsFrameSchema = z.discriminatedUnion('type', [
@@ -381,6 +428,10 @@ export const WsFrameSchema = z.discriminatedUnion('type', [
  // orchestrator
  FlowRunStartedFrame,
  FlowRunStepUpdatedFrame,
  // arena
  BattleStartedFrame,
  ContestantUpdatedFrame,
  BattleUpdatedFrame,
  // per-user
  ChatStatusFrame,
  SessionUpdatedFrame,
@@ -425,6 +476,9 @@ export const KNOWN_FRAME_TYPES: readonly WsFrame['type'][] = [
  'agent_status_updated',
  'flow_run_started',
  'flow_run_step_updated',
  'battle_started',
  'contestant_updated',
  'battle_updated',
  'chat_status',
  'session_updated',
  'session_renamed',
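The frame comments above say that `contestant_updated` carries only what changed and that `delta` is a streaming chunk. A pane-side consumer could fold those frames into per-contestant rows roughly as sketched here. The plain TypeScript types stand in for the zod-inferred ones so the example has no dependencies; `Row` and `applyContestantUpdate` are hypothetical names, not part of the diff.

```typescript
// Hypothetical stand-in for the inferred type of ContestantUpdatedFrame.
type ContestantStatus = 'queued' | 'running' | 'done' | 'error';

interface ContestantUpdated {
  type: 'contestant_updated';
  battle_id: string;
  contestant_id: string;
  status?: ContestantStatus;
  duration_ms?: number;
  tokens_per_sec?: number;
  delta?: string;
  error?: string;
}

// Hypothetical per-contestant view state held by the pane.
interface Row {
  status: ContestantStatus;
  output: string;
  duration_ms: number | null;
  tokens_per_sec: number | null;
  error: string | null;
}

const emptyRow: Row = {
  status: 'queued',
  output: '',
  duration_ms: null,
  tokens_per_sec: null,
  error: null,
};

// Returns a new map with the frame applied; absent fields keep their old value
// and `delta` is appended to the accumulated output.
function applyContestantUpdate(
  rows: ReadonlyMap<string, Row>,
  frame: ContestantUpdated,
): Map<string, Row> {
  const prev = rows.get(frame.contestant_id) ?? emptyRow;
  const next: Row = {
    status: frame.status ?? prev.status,
    output: prev.output + (frame.delta ?? ''),
    duration_ms: frame.duration_ms ?? prev.duration_ms,
    tokens_per_sec: frame.tokens_per_sec ?? prev.tokens_per_sec,
    error: frame.error ?? prev.error,
  };
  const copy = new Map(rows);
  copy.set(frame.contestant_id, next);
  return copy;
}
```

Returning a fresh map rather than mutating in place keeps the reducer easy to use with state containers that compare by reference; that is a design preference of this sketch, not something the source prescribes.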