- Parallel batch execution: batch field on Step, batchConfig on Flow,
batch-aware readySteps with maxConcurrent gating, getReadyInBatch helper
- SWITCH branching step: new 'switch' StepKind with cases/programmed conditions,
resolveSwitch() pure function, switch-excluded steps tracked in
SchedulerState, non-selected branches excluded from execution
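A minimal sketch of the maxConcurrent gating described above. The Step and
BatchConfig shapes and the getReadyInBatch signature are assumptions made for
illustration; the commit does not show the real types.

```typescript
// Hypothetical shapes: the real Step and batchConfig types are not shown here.
interface Step {
  id: string;
  batch?: string; // steps sharing a batch name may run in parallel
}

interface BatchConfig {
  maxConcurrent: number;
}

// Returns the ready steps of one batch that may start now, leaving room for
// the steps of that batch that are already running.
function getReadyInBatch(
  ready: Step[],
  running: Step[],
  batch: string,
  config: BatchConfig,
): Step[] {
  const runningInBatch = running.filter((s) => s.batch === batch).length;
  const freeSlots = Math.max(0, config.maxConcurrent - runningInBatch);
  return ready.filter((s) => s.batch === batch).slice(0, freeSlots);
}
```

With one step of a batch already running and maxConcurrent set to 2, only one
more ready step from that batch is released.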
- Add 'timed_out' to flow_runs/flow_steps CHECK constraints
- Add retry_count and max_retries columns to flow_steps
- Add timeout detection in advanceInner loop (configurable FLOW_STEP_TIMEOUT_MS)
- Add retriable logic: re-dispatch on timeout if maxRetries > 0 and retryCount < maxRetries
- Add isRetriable() + shouldRetry() pure decision functions
- Add timed_out handling to reconcileResumeStep and reconcileRun
- Add 'timed_out' to ws-frames enum, publishStep status type
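The retry rule above (re-dispatch on timeout if maxRetries > 0 and
retryCount < maxRetries) can be sketched as two pure functions. The
StepRetryState shape and the hasTimedOut helper are assumptions for
illustration, not the actual signatures.

```typescript
// Hypothetical view of the flow_steps columns relevant to retry decisions.
interface StepRetryState {
  status: string;
  retryCount: number;
  maxRetries: number;
}

// A step is retriable at all only if it was configured with retries.
function isRetriable(step: StepRetryState): boolean {
  return step.maxRetries > 0;
}

// Retry only timed-out steps that still have retry budget left.
function shouldRetry(step: StepRetryState): boolean {
  return (
    step.status === "timed_out" &&
    isRetriable(step) &&
    step.retryCount < step.maxRetries
  );
}

// Assumed timeout check: the step has run longer than the configured limit.
function hasTimedOut(startedAtMs: number, nowMs: number, timeoutMs: number): boolean {
  return nowMs - startedAtMs > timeoutMs;
}
```

Keeping the decision free of I/O means it can be unit-tested without a
database or a running flow.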
- Remove codecontext service block from docker-compose.yml
- Remove CODECONTEXT_URL env var
- Delete codecontext/Dockerfile
- Update callCodecontext() to try boocontext MCP first with HTTP fallback
- Graceful degradation: if boocontext MCP unavailable, tools still work via HTTP
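The MCP-first, HTTP-fallback behaviour might look roughly like the sketch
below. The ToolCall type and callWithFallback name are hypothetical; the two
transports are injected so the sketch does not guess at the real
callCodecontext() internals.

```typescript
// Hypothetical transport signature: takes a tool name and arguments.
type ToolCall = (tool: string, args: Record<string, unknown>) => Promise<string>;

interface FallbackResult {
  via: "mcp" | "http";
  output: string;
}

// Try the MCP transport first; on any error, fall back to HTTP.
// An HTTP failure still propagates to the caller.
async function callWithFallback(
  viaMcp: ToolCall,
  viaHttp: ToolCall,
  tool: string,
  args: Record<string, unknown>,
): Promise<FallbackResult> {
  try {
    return { via: "mcp", output: await viaMcp(tool, args) };
  } catch {
    return { via: "http", output: await viaHttp(tool, args) };
  }
}
```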
New plans table (id, project_id, title, description, status, flow_run_id,
progress_pct, items_total, items_completed, metadata, timestamps) with
CHECK constraints and indexes.
Plan store (plan-store.ts): createPlan, getPlan, listPlans, listActivePlans,
updatePlan, updatePlanFromRun, findPlanWithRunningRun, planStatusFromRun.
Flow-runner integration: onRunTerminal callback fires on every terminal
transition (complete/fail/cancel) and updates linked plans automatically.
5 API endpoints: GET /api/plans, GET /api/plans/active, GET /api/plans/:id,
POST /api/plans, PATCH /api/plans/:id.
484 tests pass, build clean.
Shared boocontext MCP client (boocontext_client.ts) wrapping the existing
mcp-client.ts callTool() infrastructure with 32KB truncation and error
handling. Used by get_code_health.
4 new first-class agent tools backed by the boocontext MCP server:
- get_code_health — A-F grades per file across 7 dimensions, project health
summary, refactoring candidates (wraps boocontext_health)
- get_code_impact — merged symbol trace + blast radius in one call (wraps
boocontext_impact, replaces two-step get_symbol_info+get_blast_radius)
- get_type_info — TypeScript type recovery via type-inject MCP (wraps
boocontext_types, returns signatures, interfaces, generics, JSDoc)
- get_code_map — DCP-compressed context map with compress toggle (wraps
boocontext_map, 10x token reduction vs full scan)
All 4 registered in ALL_TOOLS as read-only tools.
In-memory SessionMeta registry tracks active terminal sessions with
paneId, sessionId, projectPath, title, createdAt, lastActivityAt.
GET /api/term/sessions returns all active sessions as JSON array.
Registry is updated on WS attach and cleaned up on disconnect.
KV cache quantization (--cache-type-k q4_0) and ngram speculative decoding
(--spec-type ngram-mod) are high-value llama.cpp features: the former cuts
KV cache VRAM usage and the latter raises tokens/sec. Removing both flags from
the shadowing lists allows agents to enable them via llama_extra_args.
Implements audit-harness-inspired session lifecycle: audit session
creation/end/recover/report-daily with JSONL buffer and graded context
recovery (L0-L4). Guideline service for behavioral compliance rules
(condition/action model with criticality). Correction service for
persistent user correction tracking across agent sessions.
8 supporting skills: audit-start/end/report-daily/recover + command
variants for slash-command integration.
Adds Inference tab to SettingsPane with controls for temperature, top-p,
top-k, min-p, and other inference parameters. Server-side route and
provider config wiring to pass overrides through the inference pipeline.
New /analytics route: token usage dashboard with aggregate summary,
per-session breakdown, context window stats, and per-category token
distribution. Data served from existing agent_sessions + tool_cost_stats.
New /results route: browsable archive of orchestrator flow runs and
arena battles. Two-tab layout (Analysis Runs / Arena Battles) using
existing API endpoints (no new backend).
Sidebar gains Results (ScrollText icon) and Token Analytics (BarChart3
icon) nav buttons above Settings.
- Approval gate steps pause and await human resolution
- appendStepEvent wired into markStep, failRun, dispatchAgentStep
- Trigger rule unit tests (6 variants)
- New parallel-research flow with one_success trigger
- TriggerRule type (all_success/one_success/all_done) for parallel deps
- Variable substitution ($stepId.output.field) in agent step prompts
- Approval gate step kind (pauses flow via permission frames)
- flow_step_events table for append-only event-sourced step log
- evaluateTriggerRule pure function in flow-runner-decisions
- AgentCapabilitiesSchema with supportsStreaming/Reasoning/Background flags
- supportsStreaming and supportsReasoningStream fields in ProviderSnapshotEntry
- new_task tool: background mode flag for non-blocking subtask dispatch
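One plausible reading of the three trigger rules, as a pure function. The
DepStatus values and the empty-dependency behaviour are assumptions; the
commit only names the rule variants.

```typescript
type TriggerRule = "all_success" | "one_success" | "all_done";

// Hypothetical dependency statuses for illustration.
type DepStatus = "pending" | "running" | "succeeded" | "failed";

const isDone = (s: DepStatus): boolean => s === "succeeded" || s === "failed";

// Decides whether a step may start, given the statuses of its dependencies.
function evaluateTriggerRule(rule: TriggerRule, deps: DepStatus[]): boolean {
  // Assumption: a step with no dependencies is always ready.
  if (deps.length === 0) return true;
  switch (rule) {
    case "all_success":
      return deps.every((s) => s === "succeeded");
    case "one_success":
      return deps.some((s) => s === "succeeded");
    case "all_done":
      return deps.every(isDone);
  }
}
```

Under this reading, one_success lets a parallel-research style flow continue
as soon as any branch succeeds, while all_done waits for every branch to
finish regardless of outcome.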
- File-based memory under .boocode/memory/ (project/user/reference topics)
- Hierarchical 4-scope scan: global → home → project → session
- Keyword/tag relevance matching for query-based recall
- Injected as <boocode-memory> block in system prompt at assembly
- v1 recall-only (extract/dream deferred to v2)
- lsp/ module: types, config, JSON-RPC client, server-manager, operations
- lsp_diagnostics: TypeScript/JavaScript diagnostics for a file
- lsp_goto_definition: find symbol definition at position
- lsp_find_references: find all references to a symbol
- Registered as READ_TOOLS in tool index
Root cause: two proven corruption mechanisms — (M1) non-idempotent apply
stamped the same block N times when a quantized model re-emitted the same
edit_file call or a turn was retried; (M2) Levenshtein tier 4 was fail-open
with no uniqueness guard, silently splicing into the wrong location.
Fixes applied at every layer of the pipeline:
Matcher (fuzzy-match.ts): raise SIMILARITY_THRESHOLD 0.66 → 0.85; add
AMBIGUITY_EPSILON uniqueness guard — two windows within 0.05 of the top
score → ambiguous, not a guess; add block-anchor gate (≥3-line needles
require first+last line exact match before a window is scored).
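A rough sketch of the two matcher guards, taking pre-computed similarity
scores as input. The constants come from the text above; pickBestWindow,
passesBlockAnchor, and the result shape are hypothetical names, and whether
a rival must itself clear the threshold is an assumption (here it need not).

```typescript
const SIMILARITY_THRESHOLD = 0.85;
const AMBIGUITY_EPSILON = 0.05;

type WindowPick =
  | { kind: "match"; index: number; score: number }
  | { kind: "ambiguous" }
  | { kind: "none" };

// Block-anchor gate: needles of 3+ lines must match the candidate window
// exactly on their first and last lines before the window is scored.
function passesBlockAnchor(needle: string[], candidate: string[]): boolean {
  if (needle.length < 3) return true;
  if (candidate.length !== needle.length) return false;
  return (
    needle[0] === candidate[0] &&
    needle[needle.length - 1] === candidate[candidate.length - 1]
  );
}

// Uniqueness guard: refuse to guess when another window scores within
// AMBIGUITY_EPSILON of the best one.
function pickBestWindow(scores: number[]): WindowPick {
  let index = -1;
  let score = -Infinity;
  for (let i = 0; i < scores.length; i++) {
    const s = scores[i] ?? -Infinity;
    if (s > score) {
      score = s;
      index = i;
    }
  }
  if (index === -1 || score < SIMILARITY_THRESHOLD) return { kind: "none" };
  const hasRival = scores.some(
    (s, i) => i !== index && score - s <= AMBIGUITY_EPSILON,
  );
  return hasRival ? { kind: "ambiguous" } : { kind: "match", index, score };
}
```

Returning an explicit ambiguous result makes the matcher fail closed: the
caller can reject the edit instead of splicing into the wrong location.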
Edit planner (pending_changes.ts): extract planEdit() as a pure function;
idempotency guards detect already-applied states (anchored insert re-stamp,
old-gone-but-new-present); findPendingDuplicate() collapses identical
pending rows at queue time so M1 never reaches applyOne.
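The idempotency guards could be sketched as below for a plain search/replace
edit. This is an illustration of the two already-applied states named above,
not the actual planEdit() signature; planReplace and EditPlan are
hypothetical names.

```typescript
type EditPlan =
  | { kind: "apply"; result: string }
  | { kind: "already_applied" }
  | { kind: "not_found" };

// Pure planner: inspects the current content and decides what to do,
// without touching the filesystem.
function planReplace(content: string, oldText: string, newText: string): EditPlan {
  // Anchored insert re-stamp: newText contains oldText (the anchor), and the
  // full new block is already in the file.
  if (newText.includes(oldText) && content.includes(newText)) {
    return { kind: "already_applied" };
  }
  const at = content.indexOf(oldText);
  if (at !== -1) {
    const result = content.slice(0, at) + newText + content.slice(at + oldText.length);
    return { kind: "apply", result };
  }
  // Old text is gone but the new text is present: a previous apply succeeded.
  if (content.includes(newText)) return { kind: "already_applied" };
  return { kind: "not_found" };
}
```

Planning the same edit twice then yields apply followed by already_applied,
so a re-emitted edit_file call does not stamp the block a second time.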
Atomic writes (pending_changes.ts): temp-file + rename on the same
filesystem so a crash can't leave a half-written source file; realpath()
first so symlinks survive the rename.
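A minimal synchronous sketch of the temp-file + rename pattern using Node's
fs module. atomicWriteSync and the temp-file naming scheme are assumptions
for illustration; the sketch also omits fsync, so it guards against
half-written files but makes no durability claim across power loss.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

function atomicWriteSync(target: string, data: string): void {
  // Resolve symlinks first so the rename replaces the real file, not the link.
  const real = fs.existsSync(target) ? fs.realpathSync(target) : path.resolve(target);
  // The temp file sits next to the target, hence on the same filesystem.
  const tmp = path.join(
    path.dirname(real),
    `.${path.basename(real)}.${process.pid}.tmp`,
  );
  try {
    fs.writeFileSync(tmp, data);
    // Readers see either the old or the new content, never a partial write.
    fs.renameSync(tmp, real);
  } catch (err) {
    // Best-effort cleanup of the temp file before rethrowing.
    fs.rmSync(tmp, { force: true });
    throw err;
  }
}
```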
Per-file mutex (pending_changes.ts): withFileLock() serializes concurrent
read-modify-write on the same path via a chained-Promise Map.
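A chained-Promise mutex of this kind can be sketched in a few lines. The
signature below is an assumption, not the actual withFileLock()
implementation.

```typescript
// One promise chain per path; each new caller queues behind the previous one.
const fileLocks = new Map<string, Promise<unknown>>();

function withFileLock<T>(filePath: string, fn: () => Promise<T>): Promise<T> {
  const previous = fileLocks.get(filePath) ?? Promise.resolve();
  // Start fn only after the previous holder has settled.
  const run = previous.then(() => fn());
  // The stored tail never rejects, so one failure does not poison the queue.
  const tail = run.catch(() => undefined);
  fileLocks.set(filePath, tail);
  // Drop the entry once nothing else is queued, so the map does not grow.
  void tail.then(() => {
    if (fileLocks.get(filePath) === tail) fileLocks.delete(filePath);
  });
  return run;
}
```

Callers on different paths never wait on each other, since each path has its
own chain.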
EOL preservation (pending_changes.ts): normalize CRLF → LF for matching,
restore native line ending on write so Windows-style files stay clean.
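The normalize-then-restore step might look like the sketch below. The helper
names are hypothetical, and treating any file that contains CRLF as a CRLF
file is a simplifying assumption (mixed-ending files are not handled
specially).

```typescript
type Eol = "\r\n" | "\n";

// Assumption: a file containing any CRLF is treated as a CRLF file.
function detectEol(text: string): Eol {
  return text.includes("\r\n") ? "\r\n" : "\n";
}

function toLf(text: string): string {
  return text.replace(/\r\n/g, "\n");
}

function restoreEol(text: string, eol: Eol): string {
  return eol === "\n" ? text : text.replace(/\n/g, "\r\n");
}

// Runs an edit against LF-normalized text, then writes back the native ending.
function editPreservingEol(original: string, edit: (lf: string) => string): string {
  const eol = detectEol(original);
  return restoreEol(edit(toLf(original)), eol);
}
```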
Context isolation (inference_context.ts): replace module-level singleton
with AsyncLocalStorage so concurrent inference runs (arena parallel
dispatch, dispatcher poll racing a user message) each get their own
scoped context with no clobbering.
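A small sketch of the AsyncLocalStorage pattern from Node's async_hooks
module. The InferenceContext shape and the helper names are assumptions; only
AsyncLocalStorage itself is the real API.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Hypothetical context shape for illustration.
interface InferenceContext {
  sessionId: string;
}

const inferenceStorage = new AsyncLocalStorage<InferenceContext>();

// Everything called (and awaited) inside fn sees ctx as its own context.
function runWithContext<T>(ctx: InferenceContext, fn: () => T): T {
  return inferenceStorage.run(ctx, fn);
}

function currentSessionId(): string {
  const ctx = inferenceStorage.getStore();
  if (!ctx) throw new Error("no inference context in scope");
  return ctx.sessionId;
}
```

Unlike a module-level variable, two overlapping runs each read back their own
session ID even when their awaits interleave.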
Tests: plan-edit.test.ts (pure planEdit unit tests), extended fuzzy-match
and pending_changes_integration suites, ALS isolation test that proves
overlapping runs get correct session IDs.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Arena is a new pane kind for competitive AI evaluation. A Battle runs
the same prompt against 2-6 Contestants across two concurrent lanes:
local lane (llama-swap models, serial) and cloud lane (parallel).
Added to all three registries: @boocode/contracts WsFrameSchema,
server InferenceFrame, and web WsFrame.
Backend (apps/coder):
- arena-runner: battle scheduler, lane classifier, benchmark, results
writer, resume, user winner override
- arena-analyzer: two-stage digest→judge analysis on DEFAULT_MODEL
- arena-decisions: status transitions and resume logic (unit-tested)
- arena-analyzer-helpers: pure helper functions (unit-tested)
- arena-model-call: model call utility for analysis
- arena routes: create/get/list/stop/analyze/cross-examine/winner/diff
- schema: battles, contestants, cross_examinations tables (idempotent)
- remove old /api/arena* routes and tasks.arena_id column
Frontend (apps/web):
- ArenaLauncherDialog: battle type, prompt, contestant selection
- ArenaPane: live roster, streaming output, analysis, cross-exam
- DiffView: unified diff with line-by-line color for coding contests
- Winner override per-row dropdown (Trophy icon)
- battle_updated WS handler for live winner/analysis updates
- arena pane kind in Workspace, ChatTabBar, useSidebar
Cross-app:
- ArenaState and ArenaContestantShape/WsFrame types (contracts)
- battle_* frames in WsFrameSchema, InferenceFrame, and web WsFrame
- manifest.json written per battle results folder
- /Arena added to .gitignore
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replace the raw per-agent mode dropdown in the BooCoder composer with a
curated three-option permission ladder mapped generically onto each
provider's native modes: `plan` id -> Plan, default -> Ask, isUnattended
-> Bypass (claude bypassPermissions, qwen yolo, opencode full-access).
modeId stays the single wire field; the active unified mode is derived
from it (no contracts change).
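The derivation of the unified mode from a provider's native mode could be
sketched as follows. The NativeMode shape and function names are
hypothetical, and giving the `plan` id precedence over isUnattended is an
assumption.

```typescript
type UnifiedMode = "plan" | "ask" | "bypass";

// Hypothetical view of a provider-native mode entry.
interface NativeMode {
  id: string;
  isUnattended?: boolean;
}

// Maps a native mode onto the three-option ladder.
function toUnifiedMode(mode: NativeMode): UnifiedMode {
  if (mode.id === "plan") return "plan";
  if (mode.isUnattended) return "bypass";
  return "ask";
}

// Picks the native modeId to send on the wire for a chosen ladder option.
function toNativeModeId(modes: NativeMode[], wanted: UnifiedMode): string | undefined {
  return modes.find((m) => toUnifiedMode(m) === wanted)?.id;
}
```

Because the ladder position is computed from modeId, the wire format carries
no extra field.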
Native BooCode gains its own mode set: Ask stages to the pending-changes
queue (today's behavior), Bypass auto-applies the queue to disk after the
turn (interactive messages path + task dispatcher path), Plan falls back
to Ask. The shared apps/server inference engine is left untouched.
Also preserve isUnattended on live-probed ACP modes so opencode's bypass
mode stays detectable from the wire.
Coder 373 tests green; coder + web typecheck clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Give the expand chevrons the BooCoder outline-button look (border-border
bg-background, hover:bg-muted, filled when expanded) instead of the borderless
ghost style. Applies to both BooChat's flat menu and BooCoder's grouped menu.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Most plugin/han SKILL.md and command files write `description:` as a folded
block scalar (`>` / `|`) with the text on the following indented lines. The
old single-line frontmatter reader captured the literal `>`, so the slash
menu showed garbage/blank descriptions for nearly all of them. frontmatterField
now collapses folded blocks (join with spaces) and preserves literal blocks.
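A simplified sketch of the block-scalar handling, assuming the frontmatter
text has already been extracted. It is not a full YAML parser: it ignores
chomping and indentation indicators beyond recognizing them, strips each
line's indentation, and treats the first unindented line as the next key.

```typescript
function frontmatterField(frontmatter: string, key: string): string | undefined {
  const lines = frontmatter.split(/\r?\n/);
  const start = lines.findIndex((l) => l.startsWith(`${key}:`));
  if (start === -1) return undefined;

  const inline = (lines[start] ?? "").slice(key.length + 1).trim();
  const indicator = /^([>|])[+-]?$/.exec(inline);
  // Plain single-line value: return it as-is.
  if (!indicator) return inline;

  // Collect the indented lines that belong to the block scalar.
  const block: string[] = [];
  for (const line of lines.slice(start + 1)) {
    if (line.trim() !== "" && !/^\s/.test(line)) break; // next top-level key
    block.push(line.trim());
  }
  while (block.length > 0 && block[block.length - 1] === "") block.pop();

  // Folded (>) joins lines with spaces; literal (|) keeps the line breaks.
  return indicator[1] === ">"
    ? block.filter((l) => l !== "").join(" ")
    : block.join("\n");
}
```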
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Document the in-app Orchestrator engine and its load-bearing read-only
invariant in apps/coder/CLAUDE.md, and note that apps/coder/.env.host is
now gitignored (recreated from .env.example with CLAUDE_SDK_BACKEND=1).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Untrack the host env file (git rm --cached, kept on disk for the boocoder
service) and widen .gitignore to .env.* (re-including .env.example) so env
files no longer get committed. The file's prior contents (dev DB password +
internal Tailscale URLs; no API keys) remain in history — left as-is given the
single-user Tailscale-only threat model.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Each flow row in the launcher and each command in the / slash picker now
shows an always-on one-liner with a chevron that expands a 1-2 sentence
what/when blurb (condensed from the Han skill descriptions). Launcher gets a
read-only pill and a per-row Run separate from expand; the fast/concise
toggle is now wired through to the conductor workers. Shared ChatInput, so
the slash explanations cover both BooChat and BooCoder. Web tsc clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>