New plans table (id, project_id, title, description, status, flow_run_id,
progress_pct, items_total, items_completed, metadata, timestamps) with
CHECK constraints and indexes.
Plan store (plan-store.ts): createPlan, getPlan, listPlans, listActivePlans,
updatePlan, updatePlanFromRun, findPlanWithRunningRun, planStatusFromRun.
Flow-runner integration: onRunTerminal callback fires on every terminal
transition (complete/fail/cancel) and updates linked plans automatically.
5 API endpoints: GET /api/plans, GET /api/plans/active, GET /api/plans/:id,
POST /api/plans, PATCH /api/plans/:id.
484 tests pass, build clean.
Implements audit-harness-inspired session lifecycle: audit session
creation/end/recover/report-daily with JSONL buffer and graded context
recovery (L0-L4). Guideline service for behavioral compliance rules
(condition/action model with criticality). Correction service for
persistent user correction tracking across agent sessions.
8 supporting skills: audit-start/end/report-daily/recover + command
variants for slash-command integration.
Adds Inference tab to SettingsPane with controls for temperature, top-p,
top-k, min-p, and other inference parameters. Server-side route and
provider config wiring to pass overrides through the inference pipeline.
New /analytics route: token usage dashboard with aggregate summary,
per-session breakdown, context window stats, and per-category token
distribution. Data served from existing agent_sessions + tool_cost_stats.
New /results route: browsable archive of orchestrator flow runs and
arena battles. Two-tab layout (Analysis Runs / Arena Battles) using
existing API endpoints (no new backend).
Sidebar gains Results (ScrollText icon) and Token Analytics (BarChart3
icon) nav buttons above Settings.
- Approval gate steps pause and await human resolution
- appendStepEvent wired into markStep, failRun, dispatchAgentStep
- Trigger rule unit tests (6 variants)
- New parallel-research flow with one_success trigger
- TriggerRule type (all_success/one_success/all_done) for parallel deps
- Variable substitution ($stepId.output.field) in agent step prompts
- Approval gate step kind (pauses flow via permission frames)
- flow_step_events table for append-only event-sourced step log
- evaluateTriggerRule pure function in flow-runner-decisions
- AgentCapabilitiesSchema with supportsStreaming/Reasoning/Background flags
- supportsStreaming and supportsReasoningStream fields in ProviderSnapshotEntry
- new_task tool: background mode flag for non-blocking subtask dispatch
- lsp/ module: types, config, JSON-RPC client, server-manager, operations
- lsp_diagnostics: TypeScript/JavaScript diagnostics for a file
- lsp_goto_definition: find symbol definition at position
- lsp_find_references: find all references to a symbol
- Registered as READ_TOOLS in tool index
Root cause: two proven corruption mechanisms — (M1) non-idempotent apply
stamped the same block N times when a quantized model re-emitted the same
edit_file call or a turn was retried; (M2) Levenshtein tier 4 was fail-open
with no uniqueness guard, silently splicing into the wrong location.
Fixes applied at every layer of the pipeline:
Matcher (fuzzy-match.ts): raise SIMILARITY_THRESHOLD 0.66 → 0.85; add
AMBIGUITY_EPSILON uniqueness guard — two windows within 0.05 of the top
score → ambiguous, not a guess; add block-anchor gate (≥3-line needles
require first+last line exact match before a window is scored).
Edit planner (pending_changes.ts): extract planEdit() as a pure function;
idempotency guards detect already-applied states (anchored insert re-stamp,
old-gone-but-new-present); findPendingDuplicate() collapses identical
pending rows at queue time so M1 never reaches applyOne.
Atomic writes (pending_changes.ts): temp-file + rename on the same
filesystem so a crash can't leave a half-written source file; realpath()
first so symlinks survive the rename.
Per-file mutex (pending_changes.ts): withFileLock() serializes concurrent
read-modify-write on the same path via a chained-Promise Map.
EOL preservation (pending_changes.ts): normalize CRLF → LF for matching,
restore native line ending on write so Windows-style files stay clean.
Context isolation (inference_context.ts): replace module-level singleton
with AsyncLocalStorage so concurrent inference runs (arena parallel
dispatch, dispatcher poll racing a user message) each get their own
scoped context with no clobbering.
Tests: plan-edit.test.ts (pure planEdit unit tests), extended fuzzy-match
and pending_changes_integration suites, ALS isolation test that proves
overlapping runs get correct session IDs.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Arena is a new pane kind for competitive AI evaluation. A Battle runs
the same prompt against 2-6 Contestants across two concurrent lanes:
local lane (llama-swap models, serial) and cloud lane (parallel).
Added to all three registries: @boocode/contracts WsFrameSchema,
server InferenceFrame, and web WsFrame.
Backend (apps/coder):
- arena-runner: battle scheduler, lane classifier, benchmark, results
writer, resume, user winner override
- arena-analyzer: two-stage digest→judge analysis on DEFAULT_MODEL
- arena-decisions: status transitions and resume logic (unit-tested)
- arena-analyzer-helpers: pure helper functions (unit-tested)
- arena-model-call: model call utility for analysis
- arena routes: create/get/list/stop/analyze/cross-examine/winner/diff
- schema: battles, contestants, cross_examinations tables (idempotent)
- remove old /api/arena* routes and tasks.arena_id column
Frontend (apps/web):
- ArenaLauncherDialog: battle type, prompt, contestant selection
- ArenaPane: live roster, streaming output, analysis, cross-exam
- DiffView: unified diff with line-by-line color for coding contests
- Winner override per-row dropdown (Trophy icon)
- battle_updated WS handler for live winner/analysis updates
- arena pane kind in Workspace, ChatTabBar, useSidebar
Cross-app:
- ArenaState and ArenaContestantShape/WsFrame types (contracts)
- battle_* frames in WsFrameSchema, InferenceFrame, and web WsFrame
- manifest.json written per battle results folder
- /Arena added to .gitignore
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replace the raw per-agent mode dropdown in the BooCoder composer with a
curated three-option permission ladder mapped generically onto each
provider's native modes: `plan` id -> Plan, default -> Ask, isUnattended
-> Bypass (claude bypassPermissions, qwen yolo, opencode full-access).
modeId stays the single wire field; the active unified mode is derived
from it (no contracts change).
Native BooCode gains its own mode set: Ask stages to the pending-changes
queue (today's behavior), Bypass auto-applies the queue to disk after the
turn (interactive messages path + task dispatcher path), Plan falls back
to Ask. The shared apps/server inference engine is left untouched.
Also preserve isUnattended on live-probed ACP modes so opencode's bypass
mode stays detectable from the wire.
Coder 373 tests green; coder + web typecheck clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Most plugin/han SKILL.md and command files write `description:` as a folded
block scalar (`>` / `|`) with the text on the following indented lines. The
old single-line frontmatter reader captured the literal `>`, so the slash
menu showed garbage/blank descriptions for nearly all of them. frontmatterField
now collapses folded blocks (join with spaces) and preserves literal blocks.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Document the in-app Orchestrator engine and its load-bearing read-only
invariant in apps/coder/CLAUDE.md, and note that apps/coder/.env.host is
now gitignored (recreated from .env.example with CLAUDE_SDK_BACKEND=1).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Untrack the host env file (git rm --cached, kept on disk for the boocoder
service) and widen .gitignore to .env.* (re-including .env.example) so env
files no longer get committed. The file's prior contents (dev DB password +
internal Tailscale URLs; no API keys) remain in history — left as-is given the
single-user Tailscale-only threat model.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Brings the deterministic Han-flow conductor into BooCode: launch any read-only
flow from BooChat or BooCoder, watch each agent stream live in a Paseo-style
run pane, get an evidence-disciplined report — on local Qwen, persisted and
resumable. Read-only enforced hard via qwen --approval-mode plan (orchestrator
tasks fail closed if qwen is unavailable; never fall to write-capable native).
Backend (apps/coder): re-homed conductor defs, flow_runs/flow_steps schema,
flow-runner + dispatcher onTaskTerminal hook, restart-resume, runs routes
(launch/list/get/cancel), user-channel WS. Contracts: two flow_run_* frames.
Web: orchestrator pane kind + OrchestratorPane, Workflow button + slash flows
(BooChat/BooCoder parity), FlowLauncherDialog, "New Orchestrator" in the + and
split menus, runs history + export. Plan: openspec/changes/orchestrator.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Checkpoint of in-flight work so the orchestrator branch can rebase onto a
clean main: ContextBar → ContextMeter, model-label helper, model/agent picker
+ provider-snapshot/registry changes, inference payload + message-columns.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Five independent items from the post-review backlog. F1: Stop on an external
agent task now aborts the running child via a per-task AbortController registry
reachable from the cancel route, and finalizes the assistant message as
cancelled (fixing two latent bugs — catch blocks left the message streaming,
and warm success-paths wrote complete on an aborted turn); warm pools/worktrees
are preserved and the native path is unchanged. F2/F3: prune the tool-call
parser to its two load-bearing exports (unexport eight zero-caller symbols, add
a gate test for the <invoke>-as-text fallback) and route placeholder-rejection
logging through pino. F6: a 90s per-chunk stall-timeout wraps native inference's
fullStream via AbortSignal.any so a hung stream finalizes the message instead of
hanging — no retry (a pure classifyStreamError helper is added). F7: a read-only
view_session_history MCP tool (newest-N, chronological). F9: retire the unused
apps/coder/web :9502 fallback SPA, keeping every API/WS/health/MCP route.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The audit-cleanup migration dropped tasks.feature_values/worktree_path, but
human_inbox is `SELECT * FROM tasks` and pins every column, so the DROP COLUMN
failed (2BP01) on any existing DB and crash-looped boocoder on boot. Drop the
view, drop the columns, then recreate it — idempotent on fresh and existing DBs.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
worktree-risk.ts now returns the package's WorktreeRiskReport (local RiskReport interface removed); frame-emitter.ts imports WsFrame from @boocode/contracts/ws-frames (the deleted @boocode/server/ws-frames subpath).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Move all hand-synced cross-app wire contracts into one built workspace
package, @boocode/contracts, consumed by server/web/coder/coder-web via
workspace:* + a per-subpath exports map. The ws-frames and provider-config
Zod schemas are schema-first (z.infer); MessageMetadata, ErrorReason,
AgentSessionConfig, the provider snapshot types, and WorktreeRiskReport are
each single-sourced. Deletes the byte-identical copies and their parity
tests, fixes a live AgentSessionConfig drift (coder dead copy removed,
unified to the web required/nullable shape), removes the dead pending_change
WS arms in the fallback SPA, and inverts the build order (contracts builds
first) across root build, Dockerfile, and the coder deploy docs. Reverses
the shared-package decision declined in v2.5.12.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
CoderPane hydrates from the HTTP listMessages fetch (SELECT has model) AND the WS snapshot frame, and the snapshot handler setMessages-overwrites the HTTP load. The snapshot query in apps/coder/src/routes/ws.ts had its own column list that omitted model, so on coder refresh the chip's model was lost (it showed live via the message_complete frame). One-column fix: add model to that SELECT. CLAUDE.md mapper-chain note updated to list the WS snapshot SELECT.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
In-flight workspace UX work.
- Extract a shared PaneHeaderActions cluster (+/Split/Reopen/History/Close)
used by ChatTabBar + the Workspace coder/terminal pane headers, replacing the
divergent per-header copies; SessionLandingPage history + useWorkspacePanes
tweaks.
- Fix coder-side correctness bug: resolveChatId read sessions.workspace_panes as
a bare WorkspacePane[] but v2.6.5 widened it to a WorkspaceState envelope, so
it mis-read panes and clobbered tabNumbers/nextTabNumber/closedPaneStack on
every pane-chat write. New normalizeWorkspaceState handles either shape and
preserves the envelope (+ regression test).
- CLAUDE.md doc-sync (coder vitest suite, deploy-by-surface, dual-remote push,
in-flight-web-WIP staging, release-branch naming).
Web tsc + coder build + coder tests green. Builds on v2.7.6.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Scoped half of boocode_code_review_v2 §1 #10 — publish the agent status
BooCoder already observes (the config-injection notify-hook is the documented
follow-on, clean-room from superset ELv2).
- agent_status_updated WS frame (working|blocked|idle|error), server+web parity.
- Published from the dispatcher's turn boundaries (warm-acp/opencode/sdk/pty:
working at start, idle/error at end) + the permission flow (blocked/working).
Best-effort, never breaks a turn.
- Clean-room normalizeAgentEvent helper (superset's vendor-event -> Start/blocked
/Stop collapse, event names as facts) + 25 tests — reused by the follow-on.
- AgentComposerBar status dot (distinct from the WS-liveness dot), tracked per
(chat,agent) by a useAgentStatus map in CoderPane.
Built by 2 parallel agents vs a pinned frame contract. Server 545 + coder 294
tests passing (25 new); web tsc + builds clean; ws-frames parity green. Clears
the actionable review backlog (#1/#3/#4/#6-#12). Builds on v2.7.5.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Lands the lean-SDK direction (boocode_code_review_v2 §1 #9) behind a flag.
Adds @anthropic-ai/claude-agent-sdk@0.3.159 (Commercial Terms, runtime dep).
- PostgresSessionStore: clean-room impl of the SDK's real SessionStore type
over a new claude_session_entries table. Typechecks against the SDK type;
8 DB-integration tests.
- ClaudeSdkBackend (implements AgentBackend): one warm query() per (chat,claude)
in streaming-input mode via a pushable async-iterable pump, sessionStore +
resume continuity, pure mapSdkMessage->AgentEvent, session_id from init,
usage/cost onto agent_sessions (backend CHECK gains 'claude_sdk').
- Routing env-gated by CLAUDE_SDK_BACKEND (default off) -> PTY path UNCHANGED.
- Built against real SDK 0.3.159 types (install paid off: partial=stream_event
needing includePartialMessages, MessageParam, result error arm).
- Fix latent test-infra deadlock: serialize DB suites (fileParallelism:false).
Coder 269 passing default / 290 with DB; tsc clean vs SDK types; builds clean.
LIVE pump + resume + actual claude turn need a host smoke (CLAUDE_SDK_BACKEND=1
+ claude binary + auth). zod peer-dep wants ^4 (workspace 3.25). Builds on v2.7.4.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Three small wins from boocode_code_review_v2 §1 #11/#7/#8.
#11 sampling knobs: top_n_sigma + dry_* family as first-class Agent fields,
threaded into the request body via providerOptions.openaiCompatible. Fixes a
latent bug — top_k (rejected by the AI-SDK provider) and min_p (never passed to
streamText) were dead on the wire; both now route through the same channel.
--reasoning-budget documented in data/AGENTS.md.
#7 live PTY stream-json: new stream-json-parser.ts line-buffers qwen/claude
NDJSON and emits text/reasoning/tool frames live + persists, with a fallback to
the old opaque slice. claude gets --output-format stream-json --verbose.
#8 token UI: agent_sessions input/output_tokens/cost now flow through the route
+ type and render beside the AgentComposerBar session chip.
Built by 3 parallel agents. Server 523 + coder 245 tests passing; builds + web
tsc clean. Builds on v2.7.2. openspec sampling-streamjson-tokens.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Flagged by the automated push security review on v2.7.1.
- GET /checkpoints?chat_id= : the chat_id branch filtered by chat_id alone
(any session's chat_id read its checkpoints). Now joins chats and gates on
chats.session_id.
- restoreCheckpoint scope guard was fail-open: `cp.session_id && cp.session_id
!== sessionId` fell through on a null denormalized session_id, allowing a
cross-session restore (worktree reset + transcript trim). Now resolves the
owning session via the checkpoint's chat and denies on missing/mismatch.
- Adds a DB-integration regression for the null-session_id cross-session case.
Both scope authoritatively through chats.session_id (checkpoints.session_id is
a nullable hint). Coder suite 234 passing; 7/7 checkpoint tests (incl. the
regression) against live postgres+git; typecheck clean. Hotfix on v2.7.1.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
#3 Fuzzy patch applier: new pure fuzzy-match.ts (locateMatch, exact→trim→
unicode-canon→Levenshtein≥0.66, refuse-on-ambiguous) wired into pending_changes
applyOne/rewindOne so local-model whitespace/unicode drift in old_string no
longer loses the edit.
#4 Worktree checkpoint + conversation-trim: checkpoints table + checkpoints.ts
(shadow-commit of tracked+untracked into refs/boocode/checkpoints, hooked into
the 3 external-agent dispatcher paths) + POST restore route (reset --hard +
clean -fd -> transcript trim -> backend-session reset) + "Restore to here" UI.
Built by 3 parallel agents; DB-integration testing caught a created_at
self-deletion bug. Coder suite 234 passing; server+coder build + web tsc clean.
Builds on v2.7.0-mit. openspec write-edit-robustness.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
WarmAcpBackend (AgentBackend) holds one persistent goose acp / qwen --acp child + ClientSideConnection + ACP session per (chat,agent); initialize+session/new once, reused across turns. Abort = session/cancel the prompt only (never kills the child); child exit -> agent_sessions.status='crashed' -> re-spawn next turn. Dispatcher routes goose/qwen chat-tab tasks to the pooled warm backend via pure shouldUseWarmBackend (needs session_id+chat_id); one-shot runExternalAgent kept as fallback for arena/MCP/new_task. handleSessionUpdate extracted to a shared pure acp-event-map.ts (one-shot path byte-identical). SDK: installed @agentclientprotocol/sdk@^0.22.1 has stable resumeSession/loadSession; resume moot in the warm hot path, deferred to Phase 3. 15 new tests (warm-acp-routing, acp-event-map); 180 coder tests pass; tsc + build clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
pending_changes.agent stamped at every queue site (native -> 'boocode', dispatched external -> task.agent, manual RightRail -> NULL) + flows through listPending. New GET /api/sessions/:id/agent-sessions -> [{agent,status,has_session,last_active_at}] per (chat,agent). opencode warm server consumes session.next.step.ended, accumulating input_tokens/output_tokens/cost onto agent_sessions (new idempotent columns) via a pure opencode-usage.ts mapper. Tests: agent-sessions.routes (3) + opencode-usage (6); tsc clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
opencode emits one trailing session.idle/error for a turn cancelled via client.session.abort(), carrying only a sessionID (no turn id). The warm-server backend settled activeTurn on that event, so after Stop + an immediate new message the orphan idle settled the NEXT turn early as success (one-click reachable since v2.6.5's Send->Stop composer).
Adds a pure per-session guard (backends/turn-guard.ts: armAbortGuard / noteTurnActivity / consumeTerminal over swallowNextTerminal) wired into opencode-server.ts: abort arms it, the next terminal is swallowed once, and a new turn's first delta self-heals so a never-arriving orphan can't strand a real turn. Test-first; 3 regression tests in turn-guard.test.ts. Paseo parallel: 1d38aac.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The P1.5-b re-key block (cb1846c) re-adds session_id_fkey as ON DELETE
SET NULL, but the whole block is guarded on chat_id_fkey's absence. A DB
already re-keyed to (chat_id, agent) while session_id_fkey was still
ON DELETE CASCADE never re-enters that block, so applySchema leaves it at
'c' forever — diverging from the schema's stated intent, from worktree_id
(already SET NULL), and from the v2.6.3 changelog's own claim that
session_id is informational SET NULL.
Add a standalone confdeltype-guarded block (mirroring the session_worktrees
defang) that flips session_id_fkey CASCADE -> SET NULL independently of the
re-key gate. Idempotent: fires only while the FK is still 'c' — a no-op on a
fresh deploy (already 'n' from the re-key block) and on every re-run. The
live DB was converged by hand with the identical statements; \d
agent_sessions now shows session_id ... ON DELETE SET NULL.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The tab (a chat) is the context unit: two opencode tabs in one session are two independent agent contexts sharing one worktree. agent_sessions re-keys from (session_id, agent) to (chat_id, agent) — chat_id FK ON DELETE CASCADE (closing a tab ends its context); worktree_id and session_id become informational SET NULL columns. New worktrees table (one-per-session, survives session delete via session_id SET NULL) supersedes session_worktrees, which is defanged (CASCADE dropped) not yet removed. chat_id is threaded end-to-end: tasks.chat_id added, written by the coder message + skills routes from the frontend tab, read by runOpenCodeServerTask which falls back to resolve-or-create a chat for session-less creators (arena/MCP/new_task/generic) so ensureSession never gets a null key. Idempotent migration with a backfill-verify gate (0-row assertion after the test session was deleted). config_hash fingerprint logic preserved; one-worktree-per-session unchanged; runExternalAgent untouched. Column rename worktree_path -> path repointed at all five readers (server delete-guard, risk/stash endpoints, ensureSessionWorktree). Supersedes the earlier (worktree_id) draft.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>