Arena is a new pane kind for competitive AI evaluation. A Battle runs
the same prompt against 2-6 Contestants across two concurrent lanes:
local lane (llama-swap models, serial) and cloud lane (parallel).
Added to all three registries: @boocode/contracts WsFrameSchema,
server InferenceFrame, and web WsFrame.
Backend (apps/coder):
- arena-runner: battle scheduler, lane classifier, benchmark, results
writer, resume, user winner override
- arena-analyzer: two-stage digest→judge analysis on DEFAULT_MODEL
- arena-decisions: status transitions and resume logic (unit-tested)
- arena-analyzer-helpers: pure helper functions (unit-tested)
- arena-model-call: model call utility for analysis
- arena routes: create/get/list/stop/analyze/cross-examine/winner/diff
- schema: battles, contestants, cross_examinations tables (idempotent)
- remove old /api/arena* routes and tasks.arena_id column
Frontend (apps/web):
- ArenaLauncherDialog: battle type, prompt, contestant selection
- ArenaPane: live roster, streaming output, analysis, cross-exam
- DiffView: unified diff with line-by-line color for coding contests
- Winner override per-row dropdown (Trophy icon)
- battle_updated WS handler for live winner/analysis updates
- arena pane kind in Workspace, ChatTabBar, useSidebar
Cross-app:
- ArenaState and ArenaContestantShape/WsFrame types (contracts)
- battle_* frames in WsFrameSchema, InferenceFrame, and web WsFrame
- manifest.json written per battle results folder
- /Arena added to .gitignore
Task model infrastructure for cheap LLM calls (auto-naming, search
rewrite, tags, summaries) via a dedicated llama-server instance at
TASK_MODEL_URL, falling back to LLAMA_SWAP_URL with FAST_MODEL when
unset. Replaces the inline fetch in auto_name.ts with taskModelCompletion.
Adds search query rewriting: on step 0 when web tools are enabled, the
user's message is summarized into a search intent hint appended to the
system prompt, improving web_search relevance.
Schema: tasks table for provider dispatch and arena, sessions.tags column.
Removes the dual-write into messages.tool_calls / messages.tool_results JSON
columns and drops the columns. message_parts is now the only source of truth
for tool calls and tool results.
10 dual-write sites stripped (5 in tool-phase.ts, 2 in routes/skills.ts, 2 in
routes/messages.ts, 1 in routes/chats.ts fork-clone). The recon-driven grep
caught 2 sites beyond the original v1.13.2 roadmap inventory and an extra
fixture file (tool_cost_stats.test.ts) with a direct legacy-column INSERT.
messages_with_parts view rewritten to parts-only subselects (COALESCE
fallbacks gone). View runs via CREATE OR REPLACE so it lands before the
column DROPs in startup DDL — Postgres rejects column-drop on view-referenced
cols. v1.12.1 cleanup DO block (DROP CONSTRAINT messages_status_check /
messages_role_check) removed; those one-shots have done their work.
Adversarial review caught a runtime bug the green test suite missed: the
discard_stale endpoint (chats.ts) had a RETURNING ... tool_calls, tool_results
clause that would have crashed on every 60s-no-token-activity recovery in
production. Fixed by switching to two-step UPDATE returning id, then SELECT
from messages_with_parts so parts-synthesized fields keep flowing on the wire.
Message API type retains tool_calls? / tool_results? — the view synthesizes
those keys from parts so the wire shape is unchanged; frontend reads need no
update. Override on the original v1.13.2 plan, captured in the openspec
proposal.
339/339 server tests passing (including 7 DB-integration tests that applied
the schema migration to a live DB and ran the parts-only view end-to-end).
tsc + web build clean.
Pairs with v1.13.0-ai-sdk-v6 (introduced the dual-write) and v1.13.1-B (moved
the read path to messages_with_parts). Umbrella v1.13 tag ships on this same
commit, marking the strangler-fig closed.
CLAUDE.md picks up Sam's pre-existing edits documenting tag-naming and
CHANGELOG conventions — both already in use by v1.13.19 / v1.13.20.
Every assistant message gets an "Open in pane" affordance that opens the
message in the workspace splitter — Markdown pane (Copy + Download .md) by
default; HTML pane (Download .html only) when the model emits a self-contained
<!DOCTYPE html> or fenced ```html artifact. BOOCHAT.md rule keeps Markdown
default at every length; HTML opt-in on explicit user request.
Backend: services/artifacts.ts (slug derivation + write helpers with
symlink-escape guard via realpath-after-mkdir), routes/artifacts.ts (POST
download + GET stream with nosniff + CSP sandbox defense-in-depth), HTML
detection in finalizeCompletion writing a new message_parts.kind='html_artifact'
row (schema CHECK extended via v1.13.13 pattern), graceful 1MB cap via the
pure decideHtmlArtifactWrite helper. PartKind union extended.
Frontend: MarkdownRenderer.tsx extracted from MessageBubble's inline
MarkdownBody for reuse; MarkdownArtifactPane.tsx + HtmlArtifactPane.tsx with
loading/error states; pane state is reference-only ({chat_id, message_id,
title}) — content fetched on mount to keep workspace_panes jsonb small and
avoid 1MB blobs riding session_workspace_updated frames. iframe sandbox
locked to allow-scripts allow-clipboard-write allow-downloads with no
allow-same-origin, srcDoc not src. openInPane discriminates 404 (expected
fallback) from real errors (toast + bail). PanelRightOpen icon button with
mobile 44px tap-target.
31 new server unit tests including a real-symlink filesystem case; 332/332
server tests passing, tsc clean both sides, pnpm -C apps/web build green.
Smoke deferred to first deploy.
When the agent needed context from another repo, pathGuard rejected every read
with no recovery path. This batch adds a reactive request_read_access flow:
pathGuard's error now hints at the tool, the model emits a structured request,
the inference loop pauses (same mechanism as ask_user_input), the user picks
Allow/Deny via inline chips, and subsequent reads under the granted root succeed
for the rest of the session.
Schema: sessions.allowed_read_paths TEXT[] NOT NULL DEFAULT ARRAY[]::TEXT[]
(idempotent ADD COLUMN IF NOT EXISTS).
Grant unit (design D1): nearest registered projects.path ancestor →
nearest repo-shaped ancestor (.git/ / package.json / go.mod / Cargo.toml)
under PROJECT_ROOT_WHITELIST → else refuse. grant_resolver.ts walks
ancestors with a per-iteration whitelist invariant check so symlinked
input can't escape the whitelist mid-walk (Sam's checkpoint-1 ask).
Path-guard: optional extraRoots arg threaded from session.allowed_read_paths
through executeToolCall to view_file / list_dir / grep / find_files. The
ToolDef.execute signature gets an optional third param; non-FS tools
ignore it. view_file re-anchors the secret-guard check on basename(real)
whenever a relative path starts with "../" so .env / id_rsa* etc. still
deny across grant roots.
Endpoint: POST /api/chats/:id/grant_read_access mirrors /answer_user_input.
On 'allow' it re-resolves the grant root (state may have changed since
prompt — auto-falls to denial reason text on failure, not 500), array_appends
to sessions.allowed_read_paths with in-memory dedup, then publishes
tool_result + session_updated frames and enqueues the next assistant turn.
PATCH /api/sessions/:id allowed_read_paths supports revocation only. Zod
refines absolute + no traversal markers; runtime findUnauthorizedAdditions
guard rejects any entry not already present in the row, so a malicious
curl -X PATCH -d '{"allowed_read_paths":["/etc"]}' returns 400 instead of
bypassing the grant flow (Sam's compliance-review action item).
Frontend: RequestReadAccessCard renders pending (path + reason + Allow/Deny)
and answered (granted/denied summary with the resolved root) variants;
MessageList.flatten/group special-cases the tool name; SettingsPane adds a
per-session grants list with per-row revoke that PATCHes the shortened
array.
Tests: 11 grant_resolver, 8 path_guard, 8 sessions PATCH subset, including
explicit cases for symlink escape mid-walk, walk-bound termination at
whitelist root, /etc bypass attempt via PATCH, and nearest-project
disambiguation. 292 total server tests green.
Pairs with v1.13.16-xml-parser — the model now self-recovers from both
a wrong tool name AND from a refused path.
After a codecontext overview-class tool call lands (get_codebase_overview,
get_framework_analysis, get_semantic_neighborhoods), the pipeline runs a
second inference pass that replaces the recursive runAssistantTurn. The
synth pass auto-fetches the top-N source files referenced in the
codecontext output plus project docs (BOOCHAT.md, AGENTS.md,
*roadmap*.md, CONTEXT.md), applies a 32k-token budget with explicit
drop-priority, and streams a structured response that grounds the model
in real load-bearing code rather than relying on the codecontext summary
alone. Smoke #1 (default) and #2 (Architect) both cite the correct
inference/turn.ts + tool-phase.ts + stream-phase.ts files; smoke #6
(fault injection) verifies the fall-through path marks the synth message
status='failed' and yields cleanly to the recursive turn.
## Truncation-aware extraction
codecontext's wrapper inline-truncates results at 32k chars. Without the
expansion step, the top-N file selection only saw the alphabetical head
of the codebase (apps/booterm/dist/*) and auto-fetched the wrong sources.
The pipeline now calls in-process readTruncation(outputPath) before
extracting referenced files, so top-N selection sees the full 80k+ char
output. The 32k truncated head still ships to the synth model — the
expansion is reference-extraction-only, preserving the token-budget
contract. Graceful degradation on readTruncation null/throw: log warn,
fall back to the truncated head.
## Schema deviation from dispatch
The dispatch claimed no schema migration was needed for the new
'synthesis' part kind. Reality: message_parts.kind has an explicit
CHECK constraint (schema.sql:54) that would reject the new value. Added
a DROP CONSTRAINT IF EXISTS + DO $$ pg_constraint idempotency-guarded
re-add matching the CLAUDE.md migration pattern. The inline CREATE TABLE
constraint also updated so fresh installs land with the extended enum.
## User-abort marks synth-message failed
Deviation from review-time spec ("user-abort path does NOT mark the
message failed"). The outer abort handler in error-handler.ts operates
on the parent turn's assistantMessageId, not the new synth row that
runSynthesisPass created. Without explicit marking, the synth row would
sit in status='streaming' until the 5-min stale-streaming sweeper
(v1.13.1-cleanup-bundle), tripping the frontend's 60s no-token-activity
banner in the meantime — exactly the UX bug class the v1.13.1 sweeper
was added to handle. Marking failed on every catch path (including
user-abort) closes the gap. Cost: one extra DB write + one publish on
the rare user-abort-during-synth path.
## Race-safe synth-tool capture
tool-phase.ts uses synthEntries: Array<{tc, output, error?}> with
per-callback push under Promise.all. find() picks the first non-error
entry by call-order (toolCalls array index). Multiple synth-tools in
one batch are uncommon but handled deterministically.
## Roadmap rebase
Updated boocode_roadmap.md retrospective section + cleanup-order tracker
+ schema-changes summary to use the new vMAJOR.MINOR.PATCH-slug tag
names per the 2026-05-22 retag (CHANGELOG.md is the canonical record).
v1.13.15 listed as "this batch, tag pending"; a one-line follow-up
commit will remove that qualifier after the tag lands.
Surfaces per-tool prompt/completion-token rolling averages in
AgentPicker for at-a-glance agent-cost hints. Implementation is a
SQL view on top of messages_with_parts plus a read endpoint and
AgentPicker tooltip extension. No new write site; all source data
already lands via the existing tool-phase.ts:94-95 / error-handler.ts:
109-110 / sentinel-summaries.ts UPDATEs that v1.13.7's includeUsage:
true fix made non-NULL.
(1) schema.sql — new tool_cost_stats view. Window-functions over
messages_with_parts.tool_calls with LATERAL jsonb_array_elements.
Attribution: equal split — multi-tool turn divides tokens N-ways;
the 100-call rolling mean absorbs split noise. Filters: status=
'complete' + metadata.kind NOT IN ('cap_hit','doom_loop') exclude
failed turns and sentinels respectively; tool_calls IS NOT NULL is
defense-in-depth since sentinels are role='system' rows. CREATE OR
REPLACE means schema apply is idempotent.
(2) routes/tools.ts NEW + index.ts wire-in. GET /api/tools/cost_stats
returns { stats: ToolCostStat[] } with mean_prompt_tokens / mean_
completion_tokens computed at read time (sum / n_calls). Sorted by
tool_name ASC. No pagination — ≤30 tools.
(3) __tests__/tool_cost_stats.test.ts NEW — 7 integration tests
keyed off DATABASE_URL env var. Tests skip gracefully when unset
(no-DB default). beforeAll applies the schema via sql.unsafe(read
FileSync(schema.sql)) for self-contained runs. Helper insertAssistant
Turn shared across cases. Covers: empty state, single-tool attribution,
multi-tool equal split, 100-call FIFO window, NULL-tokens exclusion,
parts-authoritative read via messages_with_parts, failed/sentinel
exclusion.
(4) web/api/types.ts + client.ts — ToolCostStat interface + api.tools.
costStats() method binding.
(5) AgentPicker.tsx — fetch costStats on mount, compute per-agent
sum-of-means across whitelisted tools, render muted cost line below
description: "~5.2k prompt / 280 completion · 6/8 tools · last call
3h ago". Skips line entirely when no tool history; preserves existing
native title= for layout backward-compat. formatK/formatAgo colocated.
Tests: 202/202 pass (195 prior + 7 new view-integration). Server +
web tsc clean.
Smoke: schema applied cleanly; GET /api/tools/cost_stats returns
canonical JSON; view + endpoint agree. Single-row result expected
given the v1.13.1-A → v1.13.7 NULL latent regression window; new
traffic populates organically.
Roadmap row at boocode_roadmap.md:114 plus schema row at :474 both
match. View vs table decision documented in handoff_v1.13.10_per_
tool_cost.md (rollback-safe, microsecond-fast at BooCode scale).
~270 LoC across 8 files (5 modified + 3 new).
- message_parts.hidden_at timestamptz column (NULL by default) with a
partial index on (message_id) WHERE hidden_at IS NULL for the common
visible-parts filter.
- messages_with_parts view changed from COALESCE(parts, legacy) to
CASE WHEN EXISTS(any parts of kind) THEN visible-parts ELSE legacy.
COALESCE would have leaked hidden parts back via the legacy fallback
when every part was pruned (smoke caught it pre-commit). The CASE
distinguishes "no parts at all → fall back to legacy column for
pre-v1.13.0 history" from "all parts hidden → return null/empty so
the row drops out of the model payload" exactly.
- prune.ts: scans tool_result parts newest-first, protects the last 40k
tokens (PROTECTED_TOKENS), marks older candidates hidden when their
combined estimate clears 20k (PRUNE_TRIGGER_TOKENS — equal to
COMPACTION_BUFFER from v1.11.0, so a successful prune is exactly the
budget the summary path would have freed). Stops at chats.tail_start_id
so it doesn't double-erase across the last summary boundary. Pure
decision helper selectPruneTargets exported separately for unit tests.
- Wired into maybeFlagForCompaction: prune runs synchronously when
overflow is detected; if it freed >= PRUNE_TRIGGER_TOKENS, the
needs_compaction flag is NOT set and the (expensive) summary inference
call is skipped this turn. The next turn's overflow check re-evaluates
from scratch.
- 6 new unit tests in prune.test.ts cover: empty input, protection-only
(no candidates), candidates below trigger, candidates above trigger,
candidates straddling a summary boundary, exactly-protection-tokens.
179 tests total (was 173).
Smoke verified post-rebuild:
- \\d message_parts shows hidden_at + partial index.
- View definition shows AND p.hidden_at IS NULL filters on all three
subselects.
- Synthetic hide-then-restore confirmed the view drops the tool_result
jsonb to null when its only part is hidden, and restores when un-hidden.
- EXPLAIN ANALYZE on the 42-message stress chat: 0.325ms (faster than
v1.13.1-B's 1.018ms — EXISTS short-circuits cleanly for the common
no-parts case).
- Normal turn (plain text prompt) completes unaffected.
Closes a v1.11.0 design item that was scoped but never implemented. With
v1.13's parts table the prune is dramatically cheaper to write — pre-parts
it would have meant editing JSON blobs in-place; now it's a hidden_at
flag and a view subselect.
Four independent items, all owed from prior dispatches.
- statement_timeout at the database level via:
ALTER DATABASE boocode SET statement_timeout = '30s';
Applied operationally; documented as a comment at the top of schema.sql
(ALTER DATABASE can't run inside a DO block, so it's not idempotent
inside applySchema). Re-apply after a volume reset.
- Tool registry alpha-sorted at module load. llama.cpp's prompt cache
hits on byte-identical prefixes; any reordering of the tool list near
the top of the system prompt would invalidate every cached turn.
Single-source sort at the ALL_TOOLS export so toolJsonSchemas() and
TOOLS_BY_NAME inherit the order automatically. New tools.test.ts
asserts the invariant; total tests 173 (was 172).
- Periodic in-process stuck-row sweeper. Runs every 60s, marks
'streaming' rows older than 5 minutes as 'failed', and publishes
chat_status='idle' on the user channel so the UI dot drops without a
refresh. Closes the mid-session crash UX gap; the v1.12.1 boot sweep
only fires once at startup, so sessions used to stay stuck until next
reboot. setInterval cleaned up via app.addHook('onClose'). Mirrors
handleAbortOrError's publish pattern.
- experimental_repairToolCall wired through AI SDK v6 streamText. Pass-
through implementation: log + return the original toolCall so the
stream keeps going. executeToolPhase's existing error paths (unknown
tool name → 'unknown tool: X' result; zod-reject → 'tool X rejected
— field: required') already surface bad calls to the model; the value
here is preventing the AI SDK from THROWING on parse errors and
killing the whole stream. Owed since v1.13.1-A.
Smoke verified:
- statement_timeout = '30s' confirmed via SHOW.
- Tool path normal flow intact (list_dir prompt → tool_call → result
→ final assistant). No malformed tool calls in the test run; repair
log will surface them when qwen3.6 actually emits one.
- Alpha order verified at runtime via the dist bundle: match: true.
- Sweeper logic not traffic-tested (no stuck rows to find), but the
SQL UPDATE + broker.publishUser pattern is identical to handleAbort
and the boot sweep — synthesis-only verification.
- schema.sql: new messages_with_parts view. tool_calls aggregates parts
with kind='tool_call' as a jsonb array of {id, name, args}; tool_results
picks the single sequence=0 part with kind='tool_result' as a jsonb
{tool_call_id, output, truncated, error?}. COALESCE against the legacy
jsonb columns means pre-v1.13.0 history (no parts rows) still reads
correctly via the fallback, and fresh inserts (where parts dual-write
follows the row INSERT) hit the legacy columns until the parts land.
- reasoning_parts column added to the view but not selected by any caller
yet — v1.13.1-C extends the Message type and pulls it into the model
payload alongside the type extension.
- Read sites switched to FROM messages_with_parts:
- routes/chats.ts:427 (chat history GET)
- routes/messages.ts:95 (session history GET)
- routes/ws.ts:27 (WS snapshot on session connect, resume path)
- services/inference/payload.ts (loadContext for model assembly)
- services/compaction.ts (compaction's payload assembly)
- chats.ts:394 (discard_stale UPDATE RETURNING) unchanged — UPDATEs target
messages directly and the returned shape is for a freshly-modified row
where the legacy column is dual-written and correct.
- messages.ts:478/549 (ask_user_input correlation) intentionally not
migrated — those query a different shape, ported in v1.13.1-C.
- Writes still target `messages` directly; the view is read-only.
Smoke verified against the live container:
- Equivalence: 5/5 messages with both legacy column and parts row return
identical tool_calls jsonb between FROM messages and FROM messages_with_parts.
- Perf: EXPLAIN ANALYZE on the 42-message stress chat returns in ~1ms
(50ms threshold). Bitmap Index Scan on message_parts_msg_seq_idx
carries the parts lookups.
- API contract: GET /api/chats/:id/messages returns identical
{id, name, args} tool_calls and {tool_call_id, output, truncated, error}
tool_results shapes to frontend consumers — no UI changes needed.
- Inference path: sent a view_file prompt; assistant turn 1 emitted the
tool_call, tool message captured the result, follow-up assistant turn
read the result back via loadContext (now view-backed) and answered
correctly. End-to-end loop intact.
v1.13.2 drops the dual-write + the JSON columns + simplifies the view
to just SELECT FROM message_parts.
Adds a granular message_parts table (one row per text/tool_call/tool_result
chunk) without changing any read path. Old messages.content / tool_calls /
tool_results columns remain authoritative for v1.13.0; this dispatch is
write-only mirroring so the AI SDK migration in v1.13.1 can flip read
authority without a backfill window.
Schema:
CREATE TABLE message_parts (id, message_id FK ON DELETE CASCADE,
sequence int, kind text CHECK (text|tool_call|tool_result|reasoning|step_start),
payload jsonb, created_at, UNIQUE (message_id, sequence))
New module services/inference/parts.ts with two pure derive helpers
(partsFromAssistantMessage, partsFromToolMessage) and insertParts that
fan-outs a multi-row INSERT via postgres-js.
Wired dual-write at every site that writes tool_calls or tool_results:
- tool-phase.ts: assistant finalize UPDATE, executed-tool UPDATE,
ask_user_input sentinel UPDATE
- messages.ts answer flow: DELETE pending tool_result part + INSERT
answered one inside the existing sql.begin
- skills.ts: synthetic assistant + tool INSERTs both inside existing tx
- chats.ts fork: CTE clones parts via ROW_NUMBER pairing (source→dest
message id mapping in one statement, no N+1)
- error-handler.ts finalizeCompletion: text part for plain text-only
assistant turns
Deviation: tool-phase.ts finalize UPDATEs and finalizeCompletion text-part
write are not wrapped in fresh sql.begin transactions. Safe in v1.13.0
because JSON columns are authoritative for reads. v1.13.1 must wrap these
sites before flipping read authority — TODO comments added at each
unwrapped site referencing v1.13.1.
Tests: 8 new unit tests for the derive helpers in
services/__tests__/parts.test.ts. Existing 162 tests untouched. 170 total.
- handleAbortOrError now writes status='cancelled' on user stop; rows
no longer stuck 'streaming' forever
- Drop stale messages_status_check constraint (only messages_status_chk
remains, allowing 'cancelled' via TS MESSAGE_STATUSES)
- Remove detectSameNameLoop and DOOM_LOOP_SAME_NAME_THRESHOLD (added
during 2026-05-21 debugging spike, never fired in any real run,
existing detectDoomLoop covers actual failure modes)
- Remove 12 ctx.log.info diagnostic markers added during the same
spike (verbose for production)
- Bundles workspace pane sync + status indicator overhaul +
startup hung-row sweep landed earlier in v1.12.1 work
Status indicator (StatusDot): drops the flat amber pulse for a richer set
of states — orbiting amber for streaming, spinning sky ring for tool_running,
static violet for waiting_for_input, plus the existing idle/error. Backend
chat_status frame widens from 'working|idle|error' to discriminate streaming
vs tool execution vs paused for user input.
Workspace pane sync: pane layout moves from per-device localStorage to
server-side sessions.workspace_panes jsonb. PATCH /api/sessions/:id/workspace
broadcasts session_workspace_updated on the user channel for cross-device live
sync. Echo dedup via JSON comparison so the round-trip frame doesn't loop.
Legacy localStorage seeds the server on first hydrate, then is deleted.
Deprecated session_panes table dropped.
Resilience: startup sweep marks any stale 'streaming' message older than
5 minutes as 'failed' so v1.12.0-style hung rows clear on container restart.
useWorkspacePanes gains validatePanes() to prune dead chatId references from
saved pane state when the chat list lands.
Adds a singleton, ephemeral 'settings' pane kind to the workspace.
Opened via a new bottom-pinned button in ProjectSidebar (emits an
open_settings_pane event when a session is mounted; navigates to
/settings otherwise). Pane has three sections — Session, Project,
Theme — and a maximize toggle that hides sibling pane columns via
display:none on desktop only. Settings panes don't count toward
MAX_PANES and are filtered out of the localStorage persistence layer
so reload always restores a clean workspace.
Schema (additive):
- projects.default_system_prompt TEXT NOT NULL DEFAULT ''
- projects.default_web_search_enabled BOOLEAN NOT NULL DEFAULT false
- sessions.web_search_enabled BOOLEAN (nullable; null = inherit)
Inference resolves user_prompt = session.system_prompt.trim() ||
project.default_system_prompt.trim() — empty/whitespace at either
layer means "no override". Keeps the columns NOT NULL and matches
the existing inherit semantics.
Server routes:
- GET /api/projects/:id (new; settings pane refetches on
project_updated)
- PATCH /api/projects/:id accepts default_system_prompt,
default_web_search_enabled
- PATCH /api/sessions/:id accepts web_search_enabled (tri-state)
- POST /api/projects/:id/sessions/archive-all + GET
/api/projects/:id/sessions/open-count
- POST /api/sessions/:id/chats/archive-all + GET
/api/sessions/:id/chats/open-count
- PATCH /api/sessions/:id now broadcasts session_updated on every
successful PATCH (was rename-only). Lets SettingsPane open in
another tab pick up edits without a refetch.
Bulk-archive publishes one session_archived / chat_archived frame
per affected id so useSidebar's existing reducer cases handle them
incrementally — no new frame type, no payload widening.
ModelPicker refactored: shared ModelList inside a responsive shell.
Desktop = labeled trigger + DropdownMenu, mobile = icon-only Cpu
button + BottomSheet. Header in Session.tsx drops the pill wrap on
mobile since the new trigger is the visual.
ChatInput gains an icon-only '+' DropdownMenu next to AgentPicker
when sessionId + webSearchEnabled props are provided. One item for
now — Web search — with a checkmark reflecting the stored value
(true), not the effective one. Click PATCHes the override; to
restore inherit-from-project the user opens SettingsPane.
ThemePicker lifted out of pages/Settings.tsx into a reusable
component. The standalone /settings route is now a thin wrapper
that mounts <ThemePicker /> with a Back button on top
(navigate(-1) with fallback to '/'); the SettingsPane Theme tab
renders the same picker bare.
Project section delete-flow removed (button + confirm dialog +
handler). Replaced with "Archive all sessions" using the same
two-step count → confirm → fire pattern as "Archive all chats" in
the Session section. api.projects.remove() stays in the client
because useProjects.ts still uses it.
Hand-rolled Switch primitive in SettingsPane (no shadcn switch in
the project; spec said no new deps). Section nav is plain buttons
(no shadcn Tabs).
Adds 18 preset themes (16 dual-mode + 2 light-only) selectable from
a new /settings route. Persists per-user via the existing key-value
settings table — no schema refactor. Default on first load is
obsidian dark.
Storage: two new seeded keys (theme_id, theme_mode) inserted
idempotently from schema.sql. PATCH /api/settings tightens validation
with a discriminated branch — theme_id must be one of the 18
whitelisted ids, theme_mode ∈ {dark,light,system}, anything else
rejects 400. Other keys pass through the loose record schema.
CSS layer: 18 files in apps/web/src/styles/themes/, each declaring
.theme-<id> (light) and .theme-<id>.dark (dark) — except ivory and
chalk which are light-only. Anchor-to-token mapping per spec §3.
--destructive stays red across all themes. --radius unchanged at
0.625rem (spec parenthetical was about "not per-theme", not a
specific value swap).
Frontend: lib/theme.ts owns THEMES, applyTheme(), setTheme(), and
useTheme() — module-singleton with optimistic PATCH + revert on
failure (mirrors useChatStatus / useSidebar pattern). Settings.tsx
renders a 3-col (md) / 2-col (mobile) grid of shadcn Card swatches
with a Dark/Light/System radio group on top. App.tsx mounts
useTheme() at AppShell top and wires the /settings route.
index.html ships a pre-React FOUC script that reads localStorage
'boocode.theme' and stamps the className on <html> before any
paint. Stripped two pre-existing dark-mode lock-ins (AppShell's
hardcoded 'dark' className and body's neutral-950/100 tailwind
utilities) that would have fought theme tokens.
Light-only + dark request → falls back to obsidian dark in three
places: lib/theme.ts effectiveThemeId(), the FOUC script, and the
picker's "Light only" badge. No inline message; matches spec §8
decision 1.
shadcn primitives card and radio-group installed via shadcn CLI
(no hand-rolling). card.tsx and radio-group.tsx are the only ui/
additions.
Old hardcoded MAX_TOOL_LOOP_DEPTH=15 replaced by per-agent
max_tool_calls (1-100, AGENTS.md frontmatter) with defaults: 30 for
read-only-only agents, 10 for agents that include any non-read-only
tool, 15 for raw chat. When the loop hits cap, fire one final summary
call with tools disabled, stream the wrap-up into the in-flight
assistant message, then insert a system sentinel with
metadata.kind='cap_hit'. The sentinel renders an amber bubble with a
Continue button (latest sentinel only) that POSTs to a new
/api/chats/:id/continue route to extend. Hard ceiling: 3 cap-hits per
chat (2 continues max) — third sentinel reports can_continue=false.
Error frames carry a machine-readable reason code alongside human
error text. Failed messages persist the reason via
metadata.kind='error' so the bubble renders specifics on reload (WS
error frame is one-shot).
Tool call UI rewired: ToolCallLine renders inline (↳ name args
spinner/check/✗, expand-on-tap for args+result); ToolCallGroup
collapses 3+ consecutive same-tool runs into a compact card.
MessageList owns a three-pass pre-render (flatten + fold tool
results onto matching runs by id + group same-tool runs + number
sentinels). MessageBubble drops tool rendering and adds the
sentinel / error-reason branches. ToolCallCard deleted.
Roadmap follow-up logged: add explicit max_tool_calls: 30 to the 6
agents in /data/AGENTS.md and /opt/boocode/AGENTS.md post-ship for
discoverability (defaults handle behavior identically).
Six builtin defaults (Code Reviewer, Debugger, Refactorer, Architect,
Security Auditor, Prompt Builder) with no model field so session.model
wins. Project root AGENTS.md parsed on demand with mtime cache; when
present, only its agents are shown. sessions.agent_id resolves per turn
into effective system prompt, temperature, and a tool whitelist applied
in inference. AgentPicker mounts in the ChatInput toolbar; SettingsDrawer
agent surface deferred to Batch 7.
- Fork: POST /api/chats/:id/fork creates a new chat in the same session,
copies messages up to target (status=complete) with row-offset
clock_timestamp() for stable ordering. Client emits open_chat_in_active_pane
event; Workspace opens it in the active pane. No maybeAutoNameChat on forks.
- Delete: DELETE /api/chats/:id/messages/:message_id with 409 if the chat is
currently streaming. Cascading-forward delete (created_at >= target).
MessageBubble Trash button + confirm Dialog.
- Header: Projects -> Project -> Session breadcrumb, model badge pill,
inline session rename, active file path via new useActivePane() hook.
Server now publishes session_renamed on PATCH /api/sessions/:id;
client-side dup emit removed from Session.tsx.
- Housekeeping: NOW() -> clock_timestamp() in schema.sql defaults, dead
PaneTab.tsx and panes/PaneShell.tsx removed, session_panes backfill
INSERT removed (CREATE TABLE retained), Tailnet trust comment near
app.listen().
Feature 1 — Tab close menu (pure local pane state, no API):
- ChatTabBar context menu: Rename / sep / Close / Close others / Close to right / Close all
- Workspace bulk-tab primitives: closeOtherTabs, closeTabsToRight, closeAllTabs (manipulate panes[].chatIds, no fetch)
- Drop in-bar Delete; landing card's name-typed Delete is the canonical destructive path
Feature 2 — Chat archive + delete:
- chats.status vocabulary aligned with projects ('open' | 'archived'); DROP old inline CHECK, UPDATE 'closed' → 'archived', ADD new named chats_status_chk
- POST /api/chats/:id/archive (204) + POST /api/chats/:id/unarchive (200) + GET /api/sessions/:id/chats?status=archived; DELETE publishes chat_deleted; PATCH simplified to name-only
- 3 new WS frames: chat_archived, chat_unarchived, chat_deleted (renamed from chat_closed)
- Same dedup discipline: server-only publish, no local sessionEvents.emit in client
- SessionLandingPage: right-click ContextMenu (Open / Rename / Archive / sep / Delete-destructive), inline rename, archive confirm dialog, delete dialog with name-typed Input gated until typed text === chat.name, Archived chats collapsible section with Restore
- Card-level Archive + Delete icon buttons reusing the same dialog state setters; stopPropagation on both so card click still opens the chat; archived cards keep only Restore
UX — chat content width cap:
- ChatPane content (MessageList, queue chips, stop button, ChatInput) wrapped in inner max-w-[1000px] mx-auto w-full so messages center; outer border-t / scroll containers stay full-width so pane chrome and backgrounds remain edge-to-edge
- No new deps, no media queries (narrow viewports collapse to width naturally)
Server:
- projects.status + projects.gitea_remote (additive) with CHECK ('open','archived')
- GET /api/projects?status=archived; PATCH /api/projects/:id (rename);
POST /api/projects/:id/archive | unarchive; POST /api/projects/create
- POST /api/projects ON CONFLICT (path) DO UPDATE SET status='open': re-add
of archived path restores existing row (preserves id + FKs); already-open
path returns 409. Detected-repos picker now excludes only status='open'.
- New gitea.ts (createGiteaRepo + GiteaRepoExistsError) and
project_bootstrap.ts (sanitize name, mkdir under PROJECT_ROOT_WHITELIST,
git init -b main + first commit with -c user.name/email per-command, optional
Gitea repo create + remote add + push; all via execFile, no shell).
- 3 new user-stream frames: project_archived, project_unarchived, project_updated.
- sidebar.ts now selects path + gitea_remote and filters status='open'.
- Gitea env added to config.ts (GITEA_BASE_URL, GITEA_USER, GITEA_TOKEN,
GITEA_SSH_HOST).
- docker-compose.yml /opt mount flipped to rw so create-project can mkdir.
- auto_name.ts gate relaxed from `!== 1` to `< 1` (fires on every turn while
chat name is empty, not only the first).
Web:
- ProjectSidebar: project rows use proper Radix ContextMenu; items Rename /
Archive / Open in Gitea. Inline rename, archive confirm dialog.
Removed obsolete handleRemove + DropdownMenu hack.
- Home: Add-existing + Create-new buttons; collapsible Archived Projects
section with Restore.
- New CreateProjectModal: name + live folder preview, commit msg, Private/
Public radio, create-Gitea-remote checkbox, toast on success/warnings.
- New projectUrls.ts giteaUrlFor() — uses gitea_remote when present,
falls back to convention URL.
- 3 new event types in sessionEvents.ts with idempotent useSidebar handlers.
- SidebarProject extended with path + gitea_remote so Open-in-Gitea can
resolve without a separate fetch.
Session 1:N Chat data model with backfill. Workspace switches to client-side
multi-tab pane management. Right-rail file browser with float-over viewer and
click-drag line selection replaces FileBrowserPane. Adds /compact streaming
summarizer (respects compact markers in context builder), force-send (cancels
in-flight, persists partial as 'cancelled', awaits cancellation completion via
deferred Promise + 5s timeout), message queue, stop generation, chat
auto-rename, session archive/unarchive with Closed Sessions section on repo
landing page. CHECK constraints on sessions.status, messages.role,
messages.status with KEEP IN SYNC comments tying to MESSAGE_ROLES /
MESSAGE_STATUSES const arrays. Deletes dead pane routes/hook and the
api.panes.* client block.
Sessions created before Batch 3 have no rows in session_panes, so the
Workspace renders "No panes" on first open. Idempotent INSERT inserts
a default chat pane at position 0 for any session without one. NOT
EXISTS guard makes the statement a no-op after the first run.
- Restructure Pane as a tagged union over kind so checking kind narrows state
- Rename session_panes_session_idx -> idx_session_panes_session for naming
consistency with idx_sessions_project, idx_messages_session
Adds the session_panes table, Pane/PaneState/PaneCreate/PaneUpdate types,
UserStreamFrame discriminated union, and extends SidebarSession with
project_id (also added to the sidebar SELECT).
Schema (idempotent):
ALTER TABLE sessions ADD COLUMN IF NOT EXISTS updated_at TIMESTAMPTZ NOT NULL DEFAULT clock_timestamp();
The column already exists from v1 (DEFAULT NOW()); ALTER is a no-op kept for
self-documentation. Explicit clock_timestamp() bumps now run wherever the
column actually matters — see services/inference.ts and routes/sessions.ts.
Backend updated_at maintenance:
- services/inference.ts: after each terminal status UPDATE on the assistant
message (failure / tool-call complete / clean complete), also bump
sessions.updated_at = clock_timestamp() so the parent session jumps to
the top of recency ordering on every assistant turn.
- routes/sessions.ts PATCH: NOW() → clock_timestamp() for consistency.
New endpoint GET /api/sidebar (routes/sidebar.ts):
{ projects: [{ id, name, recent_sessions[≤6], total_sessions }] }
One outer query for projects ordered added_at DESC; per-project Promise.all
over (recent_sessions LIMIT 6 ORDER BY updated_at DESC) and COUNT(*)::int.
Outer Promise.all parallelizes across projects. Two queries per project; the
composite idx_sessions_project(project_id, updated_at DESC) serves the inner
query. Auth via the global Remote-User hook. types/api.ts gains
SidebarSession / SidebarProject / SidebarResponse; index.ts wires the route.
Frontend foundations:
- api/types.ts mirrors the three sidebar interfaces.
- api/client.ts: api.sidebar.get() → Promise<SidebarResponse>.
- hooks/sessionEvents.ts: five-variant union — added project_created,
project_deleted, session_created, session_deleted. session_renamed
unchanged from Batch 1. Bus internals untouched (still a dumb
Set<Listener>, no validation).
New hooks/useSidebar.ts (module-singleton):
- Module-scope sharedData/sharedError/sharedLoading/initialized/fetchInFlight/
subscribers; a single sessionEvents.subscribe at module-top-level mutates
sharedData via an exhaustive switch over the five events. load() dedupes
parallel calls via fetchInFlight. Hook is a thin subscription layer: any
number of mount points share state and the very first one triggers the
single GET /api/sidebar. Subsequent mounts read cached state synchronously
(no skeleton flash). Public shape: { data, error, loading, retry }.
- Lift to module-scope was driven by the "ONE sidebar request on mount"
spec promise — both ProjectSidebar AND Home consume the hook now, and
they share the singleton.
Frontend UI:
- components/ProjectSidebar.tsx (rewrite, 234 lines): per-project chevron +
folder + name; chevron toggles expand, name navigates /project/:id.
Expanded → ≤5 sessions with MessageSquare + name + muted relTime()
timestamp. "View all (N)" link when total_sessions > 5, routing to
/project/:id. Active session row uses bg-sidebar-accent. Active project
always renders expanded (URL-derived: direct /project/:id or scan of
recent_sessions for /session/:id). Expanded ids persisted in
localStorage['boocode.sidebar.expanded'] with try/catch on both read and
write. Loading shows 4 muted-pulse skeleton blocks; empty + error +
retry button; error toast guarded by ref so it fires once per distinct
message and resets on recovery. Remove path calls api.projects.remove
directly + explicit project_deleted emit (replaced the prior
useProjects() dependency which fired a redundant /api/projects on
mount, violating the one-fetch promise).
- components/AddProjectModal.tsx: captures returned Project and emits
project_created before onAdded() / onOpenChange(false).
- pages/Project.tsx: emits session_created after create(); trash button is
now async with try/catch — emits session_deleted on success,
toast.error on failure.
- pages/Home.tsx: switched from useProjects to useSidebar so loading /
fires exactly one /api/sidebar, with no parallel /api/projects.
- pages/Session.tsx: manual inline rename now emits session_renamed on
the success path so the sidebar updates live without a refresh (also
fixes the regression made visible by Batch 2 — the sidebar caches
session names where the project page used to re-fetch on every visit).
useProjects.ts retains a project_deleted emit inside remove for any future
caller; no live consumer uses it (ProjectSidebar calls api.projects.remove
directly). Acknowledged dead code, to be removed in the next cleanup pass
along with three remaining NOW() → clock_timestamp() consistency flips at
routes/messages.ts:70, routes/messages.ts:127, and services/auto_name.ts:144.
Cross-tab parity for session_created/session_deleted/project_created/
project_deleted is deferred — those events are tab-local in Batch 2 per
spec. session_renamed continues to propagate cross-tab via the existing
WS frame from Batch 1.
Four features land together on this branch:
1. Markdown rendering — assistant messages go through react-markdown +
remark-gfm. Fenced code blocks render via existing CodeBlock (with copy
button); inline `code` is styled inline. User messages stay plain text.
No raw HTML (no rehype-raw).
2. Per-message Copy + Regenerate. New endpoint
POST /api/sessions/:id/messages/:message_id/regenerate validates the
target (404/400/409), atomically deletes the target plus any later
messages in the session, inserts a fresh streaming assistant row, and
enqueues a normal inference run. The DELETE bound uses a SQL subquery
(`created_at >= (SELECT created_at FROM messages WHERE id = $1)`)
instead of a JS round-trip so postgres TIMESTAMPTZ µs precision is
preserved — otherwise sub-ms clock_timestamp() differences between the
user row and the assistant row collapsed to the same JS Date, pulling
the triggering user message into the >= bound. New `messages_deleted`
WS frame so already-connected clients prune the stale tail without
needing a full snapshot resend.
3. tok/s + ctx counter. Five new nullable message columns: tokens_used,
ctx_used, ctx_max, started_at, finished_at. started_at is set right
before the OpenAI call in services/inference.ts (not in the route, not
in the frame handler); finished_at + tokens_used + ctx_used + ctx_max
are committed in the same UPDATE that flips status to 'complete'. The
inference request now opts into stream_options.include_usage so the
final chunk carries usage; defensive parsing also picks up timings.n_ctx
when llama.cpp emits it (currently absent for our llama-swap models, so
ctx_max stays NULL and the UI just shows `<used> ctx`). message_complete
frame extended with tokens_used / ctx_used / ctx_max / started_at /
finished_at / model. Frontend StatsLine in MessageBubble computes tok/s
client-side from the timestamps and renders muted mono text below the
body of completed assistant messages.
4. AI chat naming after the first turn. Backend services/auto_name.ts
runs via setImmediate after the top-level inference resolves; it
checks that there is exactly one completed assistant message and that
the session has not been user-renamed (`name IS NULL OR name = '' OR
name = 'New session'`), then fires a single non-streaming chat
completion with the spec prompt. Qwen3 chat templates emit chain-of-
thought into reasoning_content and burn the entire max_tokens budget
without producing visible output, so the request includes
`chat_template_kwargs: { enable_thinking: false }` and max_tokens=30.
Title is trimmed, quote-stripped, "Title:" prefix dropped, and
truncated to 60 chars before a guarded UPDATE on sessions.name. New
`session_renamed` WS frame propagates to the open session view
directly and to the project's session list via a tiny module-scope
event bus (apps/web/src/hooks/sessionEvents.ts) — kept dumb: one event
type, two methods, no library.
Cleanups: dropped the now-unused splitCodeBlocks export from CodeBlock.tsx
(react-markdown supersedes it), and added a long-form NOTE in auto_name.ts
documenting the enable_thinking + max_tokens pattern for any future Qwen-
family non-streaming utility calls (planned: fork-message, agent-routing,
web-search summarization).
Schema bootstrap remains idempotent (ADD COLUMN IF NOT EXISTS). Auth,
broker, clock_timestamp() conventions, and zod validation all unchanged.