v1.13.11-a: WS frame schemas + frontend receive validation

First half of the WebSocket-frame-typing batch (split per recon: total scope was ~535 LoC, larger than the roadmap's ~300 estimate, so the server-side publish-site conversion lands separately in v1.13.11-b).

Phase A scope:

(1) apps/server/src/types/ws-frames.ts (NEW): Zod schemas for all 27 wire-format WS frame types. Discriminated union (WsFrameSchema) plus KNOWN_FRAME_TYPES const for diagnostic lookup. UUIDs are z.string().uuid(); model-emitted tool_call_id stays z.string().min(1) since OpenAI-compatible APIs emit "call_<random>", not a UUID. Per-kind payload narrowing (tool args, message_parts payloads) intentionally stays z.unknown(): frame-level drift detection is the goal; deep payload validation is follow-up work.

(2) apps/web/src/api/ws-frames.ts (NEW): byte-identical mirror of the authoritative server file. No path alias from web→server in the existing tsconfig setup; sync-by-hand was chosen over a new packages/shared/ dir. A ws-frames.test.ts test asserts the two files match.

(3) apps/server/src/services/broker.ts: adds publishFrame() and publishUserFrame() methods to the Broker interface. Both validate via WsFrameSchema and fail closed: log + drop on invalid. createBroker now accepts an optional FastifyBaseLogger so validation failures land in the pino stream (with console.error fallback for unit tests). The existing publish() / publishUser() raw methods stay legal; they get converted to the typed variants in v1.13.11-b.

(4) apps/web/src/hooks/useSessionStream.ts + useUserEvents.ts: wrap ws.onmessage with WsFrameSchema.safeParse. Fail-closed: invalid frames log + return without dispatching. Hand-maintained WsFrame and SessionEvent types stay in place; one cast bridges Zod-typed → narrowed shape (Zod uses OpaqueObject for nested Message[] / WorkspacePane[] etc., which are dev-time-narrowed via the existing hand-maintained types).

(5) apps/web/package.json: adds zod ^3.23.8 as a direct dep. Was a transitive dep via ai-sdk / postgres; promotion makes the import legal.

(6) Tests: 15 new in ws-frames.test.ts covering happy-path per major frame type, drift-catchers (unknown type, invalid enum, non-UUID, negative tokens), parts-authoritative read variants, the mirror-file diff check, and four broker fail-closed scenarios. 219/219 server tests pass (was 204; +15 new).

Two recon corrections to the dispatch brief, both flagged before implementation:

- No 'parts_appended' frame exists. The brief assumed one; the codebase reads parts via the messages_with_parts view after message_complete triggers a refetch. MessagePartSchema is therefore unused this batch.
- No 'tool_running' frame exists. The brief listed it as standalone; it is in fact a 'chat_status' variant ({ status: 'tool_running' }), already covered by ChatStatusFrame.

Smoke: clean container boot, no validation errors in the server log. Real production frames pass validation (the schemas were derived from the existing hand-maintained types in api/types.ts and sessionEvents.ts).

v1.13.11-b will follow immediately: convert all ~85 raw broker.publish / ctx.publish call sites across 11 server files to publishFrame / publishUserFrame. Mechanical edit; the wiring done here means the diff in -b is just the call-site swaps.

~310 LoC across 9 files (4 new + 5 modified).
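
The fail-closed publish path described in item (3) can be sketched without Zod. This is a minimal illustration under stated assumptions, not the code in broker.ts: `parseChatStatusFrame` is a hypothetical hand-written guard standing in for `WsFrameSchema.safeParse`, the `deliver` and `logError` parameters are invented for the example, and only the `chat_status` frame is modeled.

```typescript
// Result shape mirrors the { success, data | error } convention of safeParse.
type ParseResult<T> =
  | { success: true; data: T }
  | { success: false; error: string };

const CHAT_STATUSES = [
  'streaming', 'tool_running', 'waiting_for_input', 'idle', 'error',
] as const;

type ChatStatusFrame = {
  type: 'chat_status';
  chat_id: string;
  status: (typeof CHAT_STATUSES)[number];
  at: string;
};

// Hypothetical stand-in for WsFrameSchema.safeParse (one frame type only).
function parseChatStatusFrame(raw: unknown): ParseResult<ChatStatusFrame> {
  if (typeof raw !== 'object' || raw === null) {
    return { success: false, error: 'frame is not an object' };
  }
  const f = raw as Record<string, unknown>;
  if (f.type !== 'chat_status') {
    return { success: false, error: `unknown frame type: ${String(f.type)}` };
  }
  if (typeof f.chat_id !== 'string' || f.chat_id.length === 0) {
    return { success: false, error: 'chat_id must be a non-empty string' };
  }
  const status = CHAT_STATUSES.find((s) => s === f.status);
  if (status === undefined) {
    return { success: false, error: `invalid status: ${String(f.status)}` };
  }
  if (typeof f.at !== 'string') {
    return { success: false, error: 'at must be a string' };
  }
  return {
    success: true,
    data: { type: 'chat_status', chat_id: f.chat_id, status, at: f.at },
  };
}

// Fail closed: an invalid frame is logged and dropped, never delivered.
function publishFrame(
  raw: unknown,
  deliver: (frame: ChatStatusFrame) => void,
  logError: (msg: string) => void = console.error,
): boolean {
  const result = parseChatStatusFrame(raw);
  if (!result.success) {
    logError(`dropping invalid WS frame: ${result.error}`);
    return false;
  }
  deliver(result.data);
  return true;
}
```

The same validate-then-dispatch shape applies on the receive side in item (4): parse first, and return without dispatching when parsing fails.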
v1.13.15-tools: tiered tool loading via BOOCODE_TOOLS env var
2026-05-22 15:48:32 +00:00 · 2026-05-22 14:59:01 +00:00 · 2026-05-22 14:54:17 +00:00 · 2026-05-22 14:52:37 +00:00 · 2026-05-22 14:42:09 +00:00 · 2026-05-22 14:07:11 +00:00
81 changed files with 7676 additions and 2190 deletions
--- a/.env.example
+++ b/.env.example
@@ -10,3 +10,12 @@ POSTGRES_PASSWORD=CHANGE_ME
 # Internal Tailscale address that bypasses Authelia. Override if you
 # point BooCode at a different SearXNG instance.
 SEARXNG_URL=http://100.114.205.53:8888
+
+# v1.13.15-tools: BOOCODE_TOOLS narrows the tool whitelist sent to the LLM.
+# Unset (default) → all tools (~21k schema). Useful primarily for single-purpose
+# sessions where the model only needs read-only filesystem access.
+#
+# core      → view_file, list_dir, grep, find_files                       (~2k)
+# standard  → core + web_*, git_status, all 8 codecontext_* tools         (~10k)
+# all       → every tool in ALL_TOOLS                                     (~21k)
+# BOOCODE_TOOLS=all
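
The tier table above could be applied along the following lines. This is a sketch under assumptions rather than the shipped filter: `parseTier`, `inTier`, and `resolveTools` are hypothetical names, matching `web_*` and `codecontext_*` by prefix is a guess at how the groups are identified, and falling back to `all` for an unrecognized value is an assumption. The tool names `codecontext_search` and `run_shell` in the usage lines are invented for illustration.

```typescript
type Tier = 'core' | 'standard' | 'all';

// The four read-only filesystem tools listed for the core tier.
const CORE_TOOLS = new Set(['view_file', 'list_dir', 'grep', 'find_files']);

// Assumption: unset or unrecognized values behave like the default, 'all'.
function parseTier(raw: string | undefined): Tier {
  return raw === 'core' || raw === 'standard' ? raw : 'all';
}

// True when a tool name is permitted at the given tier.
function inTier(name: string, tier: Tier): boolean {
  if (tier === 'all') return true;
  if (CORE_TOOLS.has(name)) return true;
  if (tier === 'core') return false;
  // standard = core + web_*, git_status, codecontext_*
  return (
    name.startsWith('web_') ||
    name === 'git_status' ||
    name.startsWith('codecontext_')
  );
}

// The tier acts as a ceiling: it only removes names from the agent's
// whitelist and never adds tools the agent did not already have.
function resolveTools(agentWhitelist: string[], rawTier: string | undefined): string[] {
  const tier = parseTier(rawTier);
  return agentWhitelist.filter((name) => inTier(name, tier));
}

const whitelist = ['view_file', 'web_search', 'git_status', 'codecontext_search', 'run_shell'];
console.log(resolveTools(whitelist, 'core'));
console.log(resolveTools(whitelist, 'standard'));
console.log(resolveTools(whitelist, undefined));
```

Filtering the whitelist, instead of building the tool list from the tier, is what keeps the setting a ceiling: an agent restricted to `grep` still sees only `grep` when the tier is `all`.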
--- a/.gitignore
+++ b/.gitignore
@@ -1,6 +1,7 @@
 node_modules
 dist
 .env
+CLAUDE.local.md
 *.log
 .DS_Store
 .vite
--- a/BOOCHAT.md
+++ b/BOOCHAT.md
@@ -1,7 +1,5 @@
 # BooChat

-You are the assistant running inside BooChat — a self-hosted developer chat app.
-
 ## Capabilities

 - Read-only file tools: `view_file`, `list_dir`, `grep`, `find_files`
--- a/BOOCODER.md
+++ b/BOOCODER.md
@@ -2,8 +2,6 @@

 > (Stub. v2.0 implementation pending. This file documents the intended contract.)

-You are the assistant running inside BooCoder — the write-capable companion to BooChat.
-
 ## Capabilities

 - Everything in `BOOCHAT.md`
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -33,7 +33,7 @@ npx tsc -p apps/web/tsconfig.app.json --noEmit  # web app specifically
 docker compose build --no-cache boocode && docker compose up -d
 ```

-Tests: `pnpm -C apps/server test` runs 23 vitest tests. No test harness on `apps/web` (adding it requires installing vitest as a new devDep). Vitest pinned to `^3` because Vite 5 / vitest 4 are incompatible. No linters configured.
+Tests: `pnpm -C apps/server test` runs the vitest suite. No test harness on `apps/web` (adding it requires installing vitest as a new devDep). Vitest pinned to `^3` because Vite 5 / vitest 4 are incompatible. No linters configured. Vitest include glob is `src/**/__tests__/**/*.test.ts` (see `apps/server/vitest.config.ts`) — tests outside `src/**/__tests__/` silently won't run; match the per-domain convention (`apps/server/src/services/__tests__/foo.test.ts`).

 ## Architecture

@@ -46,9 +46,24 @@ Tests: `pnpm -C apps/server test` runs 23 vitest tests. No test harness on `apps
 - **Zod** for request validation and config parsing.

 Key services:
-- **`services/inference.ts`** — Streams LLM responses, executes tool loops (max depth 15, see `MAX_TOOL_LOOP_DEPTH`), flushes to DB every 500ms. Publishes `InferenceFrame` events through the broker.
+- **`services/inference/`** — Public surface re-exported via `inference/index.ts`; callers import from `./services/inference/index.js` explicitly (NodeNext doesn't honor directory-index resolution). Layout: `turn.ts` (runAssistantTurn / runInference / createInferenceRunner; exports `InferenceFrame`, `InferenceContext`, `TurnArgs`, `StreamResult`), `stream-phase.ts` (streamCompletion as a v1.13.1-A AI SDK adapter + executeStreamPhase), `provider.ts` (`upstreamModel(baseURL, modelId)` wrapping `createOpenAICompatible` against llama-swap), `tool-phase.ts` (executeToolPhase; value back-edges into turn.ts for the runAssistantTurn recursion — cycle safe because deref at call time, not module top-level), `sentinel-summaries.ts` (runCapHitSummary + runDoomLoopSummary + their sentinel inserters), `error-handler.ts` (handleAbortOrError, finalizeCompletion), `payload.ts` (buildMessagesPayload, loadContext, maybeFlagForCompaction, `OpenAiMessage`), `sentinels.ts` (`detectDoomLoop`, `DOOM_LOOP_THRESHOLD`, sentinel predicates), `budget.ts` (resolveToolBudget), `xml-parser.ts` (qwen3.6 XML tool-call fallback — KEEP, AI SDK doesn't handle inline-XML tool calls), `parts.ts` (v1.13.0 dual-write helpers: `partsFromAssistantMessage`, `partsFromToolMessage`, `insertParts`), `prune.ts` (v1.13.4 two-tier compaction; `selectPruneTargets` is the pure decision helper), `types.ts` (`StreamPhaseState`, `DB_FLUSH_INTERVAL_MS`). **`TurnArgs`** is the per-turn state envelope threaded through the `executeToolPhase → runAssistantTurn` recursion; reset in `runInference` at user-message boundary. Add new per-turn state to `TurnArgs`, not module-level closures.
+- **AI SDK v6 streamCompletion adapter** (v1.13.1-A; `services/inference/stream-phase.ts`). `streamText` is the underlying call; the BooCode layer above (executeStreamPhase, finalize, dual-write) is shape-preserved via an adapter. Five gotchas the LSP/test suite won't catch:
+  - **Abort signals are swallowed.** `streamText`'s `fullStream` iterator exits cleanly when `abortSignal` fires — no throw. Post-iteration `if (signal?.aborted) throw <AbortError>` is required; without it the row finalizes as `complete` instead of `cancelled`. Comment in stream-phase.ts pins this; don't refactor it away.
+  - **Usage lands only at stream end** via `await result.usage` (`inputTokens` / `outputTokens` v6 names → mapped to `promptTokens` / `completionTokens` for the existing onUsage callback). Mid-stream live tok/s is gone vs v1.12.2; ChatThroughput shows a single value at stream end.
+  - **Tools have NO `execute` field.** BooCode dispatches tools in tool-phase.ts, not the AI SDK loop. Only `description` + `inputSchema: jsonSchema(parameters)` — surfacing tool-call parts via `fullStream` and stopping is what we want.
+  - **`includeUsage: true` MUST be set on `createOpenAICompatible`** in `services/inference/provider.ts`. The adapter defaults it false, omitting `stream_options.include_usage` from the request body; llama-swap then never emits the usage block and `result.usage.inputTokens/outputTokens` resolve to `undefined`. Latent regression from v1.13.1-A through v1.13.7 — every assistant row in that window has `tokens_used`/`ctx_used` NULL. Don't remove this flag during refactor.
+  - **Tool-call-only turns may emit a leading `\n` text-delta** as the assistant content. `MessageList.flatten`'s `hasText` and `MessageBubble`'s `hasContent` both `.trim()` before the length check — otherwise whitespace-only content renders an empty bubble + ActionRow between every tool call (v1.13.7 fix). `payload.ts:buildMessagesPayload` also skips `status='failed'` AND complete-but-empty (no content, no tool_calls) assistant rows to avoid "Cannot have 2 or more assistant messages at the end of the list" upstream rejections after cap-hit + Continue.
+- **AI SDK ModelMessage conversion** (`toModelMessages` in stream-phase.ts). Tool messages need a `toolName` for `ToolResultPart` — BooCode's OpenAI-shape history doesn't carry it, so a forward-scan builds a `tool_call_id → toolName` map from prior assistant `tool_calls`. Tool outputs wrapped as `{ type: 'json' | 'text', value }` matching the v6 `ToolResultOutput` union. Assistant messages with reasoning emit a `ReasoningPart` first in the content array (v1.13.1-C).
+- **`experimental_repairToolCall`** (v1.13.3) wired into `streamText` to keep the stream alive when qwen3.6 emits malformed tool args. Pass-through implementation — logs the bad call and returns it unmodified; `executeToolPhase`'s existing zod-reject error path routes it to the model on the next turn.
+- **`chat_status` frame shape** (published via `broker.publishUser`) — `status: 'streaming' | 'tool_running' | 'waiting_for_input' | 'idle' | 'error'` (widened from `working|idle|error` in v1.12.1). Frontend `useChatStatus` derives `idle_warm` (<30s since idle) vs `idle_cold`. `ChatThroughput` renders inline beside `StatusDot` only when streaming or tool_running, fed by 500ms-throttled `'usage'` WS frames (`completion_tokens` + `ctx_used` + `ctx_max`). The `POST /api/chats/:id/discard_stale` endpoint exists to mark a stuck-streaming row as `failed` when the frontend's 60s no-token-activity timer (`ChatPane` content-length watcher) gives up.
+- **Boot-time stale-streaming sweep** in `apps/server/src/index.ts` after `applySchema()`: any `messages.status='streaming'` older than 5 minutes flips to `'failed'`. Logs only on non-zero count. Recovers from container restart while inference was mid-stream (v1.12.1).
+- **Periodic 60s sweeper** in `apps/server/src/index.ts` (v1.13.3 + v1.13.5). Same `setInterval` runs `sweepStaleStreaming` (marks `messages.status='streaming'` older than 5 min as `failed`, publishes `chat_status='idle'` so the UI dot drops) and `cleanupTruncations` (TTL + orphan reap of tmpfs truncation files). `app.addHook('onClose')` clears the timer. No-op when nothing to reap.
 - **`services/broker.ts`** — In-memory pub/sub with two channel types: per-session (message streaming) and per-user (sidebar updates). No persistence; clients reconnect on restart.
-- **`services/tools.ts`** — Four read-only file tools exposed as OpenAI function-calling schemas. All file access goes through `path_guard.ts` which resolves against project root.
+- **`services/tools.ts`** — Tool registry (`ALL_TOOLS`, `READ_ONLY_TOOL_NAMES`, `TOOLS_BY_NAME`). Filesystem tools (view_file/list_dir/grep/find_files) go through three guard layers: `path_guard.ts` (workspace scope), `secret_guard.ts` (filename deny list), `url_guard.ts` (SSRF/private-IP block for web_fetch). v1.11.8+ web tools (`web_search`, `web_fetch`) are opt-in per chat via `session.web_search_enabled` (resolved with `project.default_web_search_enabled` fallback) and filtered out of the LLM's tool schema when false. v1.13.5 truncation: when a tool slice cuts content, `services/truncate.ts` stashes the full text on tmpfs at `BOOCODE_TRUNCATION_DIR` (default `/tmp/boocode-truncations`, 0o700) keyed by an opaque `tr_<12 base32 chars>` id, and the `view_truncated_output(id)` tool retrieves it. 5MB cap (matches `view_file`'s `MAX_FILE_BYTES`), 7-day TTL, reaped by the periodic sweeper. Tmpfs path means container restart loses retrieval — acceptable, the model usually has moved on.
+- **`services/compaction.ts`** + **`services/model-context.ts`** — v1.11.0 anchored rolling summary (single `summary=true` assistant row per chat, supersedes itself on each compaction). Triggered when `chats.needs_compaction` is set after an inference turn exceeds `usable(ctx_max) = floor(0.85 × ctx_max)` (v1.13.9 opencode-pattern early trigger; was `ctx_max - 20k` pre-v1.13.9, which gave only 7.6% headroom at 262k and 0 budget for ≤20k contexts). **`ctx_max` comes from `model-context.getModelContext()` which fetches `${LLAMA_SWAP_URL}/upstream/<model>/props`** — NOT from `parsed.timings.n_ctx` (the stream completion's `timings` doesn't carry n_ctx; that read was dead code until v1.11.3 ripped it out). First inferences after a boocode boot may have `ctx_max=NULL` if llama-swap hasn't loaded the model yet; negative cache TTL is 60s, recovers on next turn. v1.13.6: `buildHeadPayload` embeds `reasoning_parts` as a `<reasoning>...</reasoning>` prose prefix on the assistant `content` (OpenAI wire shape has no structured reasoning field; the summarizer reads text). Standalone tag when content is empty (tool-call-only turn). `buildHeadPayload` + `OpenAiMessage` exported for test access — keep them exported.
+- **`services/system-prompt.ts`** — `buildSystemPrompt` is the string-returning shim; `buildSystemPromptWithFingerprint` is the canonical impl returning `{prompt, fingerprint, drift}`. v1.13.8 instrumentation: SHA-256 of the assembled prefix is logged per `buildMessagesPayload` call (msg `prefix-fingerprint`, level=info); a `Map<sessionId, lastHash>` observer fires `prefix-drift` (level=warn) on hash change with a field-level `changed_inputs` diff. Smoke proved the prefix is byte-stable across turns in steady-state — the originally-planned `system_prompt_cache` DB table was dropped as redundant against the v1.12.0 input-layer mtime caches (BOOCHAT.md here + AGENTS.md global+per-project in `agents.ts:safeStat`).
+- **`services/inference/budget.ts`** — tool-call budgets: `BUDGET_READ_ONLY = 30`, `BUDGET_NON_READ_ONLY = 10` (forward-looking; no write tools yet), `BUDGET_NO_AGENT = 30` (v1.13.7; was 15 — every tool in `ALL_TOOLS` is read-only today, so no-agent mode shares the read-only-agent cap). Per-agent `max_tool_calls` from AGENTS.md frontmatter overrides.
+- **`messages_with_parts` view** (v1.13.1-B; `schema.sql`). Read sites that need `tool_calls` / `tool_results` / `reasoning_parts` SELECT from this view, NOT `messages` directly. `COALESCE`s parts-table rows over the legacy JSON columns, so pre-v1.13.0 history still resolves. Writes still target `messages`; the v1.13.0 dual-write into `message_parts` keeps both halves in sync. New payload-assembly code must use the view — calling `messages.tool_calls` directly will miss anything written post-v1.13.1-B if the JSON column ever drifts (and dual-write makes that easy to miss). Shapes: `tool_calls jsonb[]`, `tool_results jsonb` single object, `reasoning_parts jsonb[]` of `{text}`.
 - **`services/file_ops.ts`** — Shared file operation implementations used by both inference tools and HTTP routes.
 - **`services/auto_name.ts`** — Non-streaming LLM call to generate 4-word session titles after first assistant reply.
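
The compaction trigger arithmetic in the `services/compaction.ts` bullet above can be checked with a few lines. This is a worked sketch, not the project's code: the function names are hypothetical, "20k" is assumed to mean 20,000 tokens, "262k" is assumed to mean 262,144, and clamping the legacy budget at zero is an assumption made to match the "0 budget for ≤20k contexts" remark.

```typescript
const USABLE_FRACTION = 0.85;
const LEGACY_RESERVE = 20_000; // assumption: "20k" = 20,000 tokens

// v1.13.9 rule: usable(ctx_max) = floor(0.85 × ctx_max)
function usable(ctxMax: number): number {
  return Math.floor(USABLE_FRACTION * ctxMax);
}

// Pre-v1.13.9 rule: ctx_max - 20k, clamped at zero (clamp is an assumption).
function legacyUsable(ctxMax: number): number {
  return Math.max(0, ctxMax - LEGACY_RESERVE);
}

// Headroom left above the budget, as a percentage rounded to one decimal.
function headroomPct(ctxMax: number, budget: number): number {
  return Math.round(((ctxMax - budget) / ctxMax) * 1000) / 10;
}

for (const ctxMax of [262_144, 16_384]) {
  console.log({
    ctxMax,
    legacy: legacyUsable(ctxMax),
    legacyHeadroomPct: headroomPct(ctxMax, legacyUsable(ctxMax)),
    current: usable(ctxMax),
    currentHeadroomPct: headroomPct(ctxMax, usable(ctxMax)),
  });
}
```

For a 262,144-token context the legacy rule leaves 20,000 tokens of headroom, about 7.6%, while the 0.85 rule leaves about 15%. For a 16,384-token context the legacy budget is zero, whereas the fractional rule still yields a usable budget of 13,926 tokens.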

@@ -86,23 +101,23 @@ Font / CSS pipeline (apps/web):

 ### Multi-pane workspace

-Sessions hold 1–5 panes (chat / empty / placeholder terminal+agent). Workspace pane state is **client-side only** (localStorage key `boocode.workspace.panes.<sessionId>`); the legacy `session_panes` table and its REST endpoints are deprecated — no `/api/panes/*` routes exist. Each chat lives in at most one pane; tab strip is per-pane and tracks `chatIds[]` + `activeChatIdx`. Sessions 1:N chats; chats own messages. Tab reorder via native HTML5 drag events.
+Sessions hold 1–5 panes (chat / empty / placeholder terminal+agent). v1.12.1 moved pane state from per-device localStorage to `sessions.workspace_panes jsonb` for cross-device sync. `PATCH /api/sessions/:id/workspace` persists; `session_workspace_updated` user-channel frame broadcasts to every device watching the session. `useWorkspacePanes` debounces saves 300ms and dedups echoes by JSON string. Legacy localStorage key `boocode.workspace.panes.<sessionId>` is read once on first hydrate (one-time seed-and-delete migration when server is empty but localStorage has data); no longer written. The deprecated `session_panes` table was dropped. `validatePanes(validChatIds)` prunes panes referencing chat IDs that no longer exist (called by `useSessionChats` after the chat list fetch lands). Each chat lives in at most one pane; tab strip is per-pane and tracks `chatIds[]` + `activeChatIdx`. Tab reorder via native HTML5 drag events.

 ## Database

-PostgreSQL 16. Tables: `projects`, `sessions`, `chats`, `messages`, `settings`, `session_panes` (deprecated). Schema applied idempotently on startup via `applySchema()`. Use `clock_timestamp()` (not `NOW()`) inside transactions. CHECK constraints in place: `projects_status_chk` ('open'|'archived'), `sessions_status_chk` (same), `chats_status_chk` (same), `messages_role_chk`, `messages_status_chk` — keep in sync with the `*_STATUSES` const arrays in `apps/server/src/types/api.ts`.
+PostgreSQL 16. Tables: `projects`, `sessions`, `chats`, `messages`, `settings`. (`session_panes` was dropped in v1.12.1; workspace pane state lives in `sessions.workspace_panes jsonb`.) Schema applied idempotently on startup via `applySchema()`. Use `clock_timestamp()` (not `NOW()`) inside transactions. CHECK constraints in place: `projects_status_chk` ('open'|'archived'), `sessions_status_chk` (same), `chats_status_chk` (same), `messages_role_chk`, `messages_status_chk` — keep in sync with the `*_STATUSES` const arrays in `apps/server/src/types/api.ts`. The older anonymous `messages_status_check` (without 'cancelled') and `messages_role_check` (without 'system') were dropped in v1.12.1; only the `_chk` variants remain.

 Schema CHECK migration order when renaming allowed values: (1) `ALTER TABLE ... DROP CONSTRAINT IF EXISTS <system_name>` (inline `CREATE TABLE` checks get `<table>_<column>_check`), (2) `UPDATE` rows to new values, (3) wrap new constraint ADD in `DO $$ ... pg_constraint` guard — that block is the only way to get `ADD CONSTRAINT IF NOT EXISTS`.

-Position-shift pattern for panes (legacy `session_panes` table): negate-and-restore to avoid UNIQUE(session_id, position) collisions during reorder/insert/delete. Sentinel value -100 for the moving pane.

 ## Environment

-Required: `DATABASE_URL`, `LLAMA_SWAP_URL`. Optional: `PORT` (3000), `HOST` (0.0.0.0), `PROJECT_ROOT_WHITELIST` (/opt, read-only scope for add-existing path resolution), `BOOTSTRAP_ROOT` (/opt/projects, writable scope for create-new-project bootstrap mkdir target — host must `mkdir -p /opt/projects` before container start), `DEFAULT_MODEL`, `LOG_LEVEL`.
+Required: `DATABASE_URL`, `LLAMA_SWAP_URL`. Optional: `PORT` (3000), `HOST` (0.0.0.0), `PROJECT_ROOT_WHITELIST` (/opt, read-only scope for add-existing path resolution), `BOOTSTRAP_ROOT` (/opt/projects, writable scope for create-new-project bootstrap mkdir target — host must `mkdir -p /opt/projects` before container start), `DEFAULT_MODEL`, `LOG_LEVEL`, `SEARXNG_URL` (default `http://100.114.205.53:8888` — internal Tailscale Fathom; the public `search.indifferentketchup.com` is behind Authelia and unusable from server context), `BOOCODE_TOOLS` (`core` | `standard` | `all`, default `all`; v1.13.15-tools tier filter — ceiling, never expands an agent's whitelist).

 ## Workflow

 - Sam reviews all diffs and commits manually. Do not commit unless explicitly asked.
+- Per-batch docs live under `openspec/changes/<slug>/{proposal,tasks,design}.md`. Already-shipped batches are snapshots in `openspec/changes/archived/`. New batches follow the proposal+tasks shape; see `openspec/README.md` for the convention.
 - Deploy: `cd /opt/boocode && docker compose up --build -d` (or `docker compose build --no-cache boocode && docker compose up -d` if you suspect a layer-cache issue).
 - Git push to Gitea: `GIT_SSH_COMMAND="ssh -i /opt/boocode/secrets/boocode_gitea -o IdentitiesOnly=yes" git push origin <branch>`. The default agent identity is rejected; the in-repo deploy key (`secrets/`, gitignored) is the working one. Transient `Connection reset by peer` retries cleanly after `sleep 5`.
 - Don't accumulate `.bak-*` files. Clean them up in the same batch or immediately after merge.
@@ -124,9 +139,16 @@ Required: `DATABASE_URL`, `LLAMA_SWAP_URL`. Optional: `PORT` (3000), `HOST` (0.0
 - TypeScript strict mode. Both apps share `tsconfig.base.json`.
 - Server uses NodeNext module resolution (`.js` extensions in imports).
 - Discriminated unions for type narrowing: `Pane` (by `kind`), `SessionEvent` (by `type`), `InferenceFrame` (by `type`).
+- **Adding a new WS frame type** requires updating BOTH the server's `InferenceFrame` (loose `type:` union + optional fields in `services/inference/turn.ts`) AND the web `WsFrame` (strict discriminated union in `apps/web/src/api/types.ts`). Server publish is permissive; the frontend type is the wire-format gate. The `'usage'` frame added in v1.12.2 needed both sides; missing the web side silently drops the frame at JSON-parse.
 - shadcn primitives live in `components/ui/`. Don't modify them unless adding a new primitive.
 - `inferLanguage()` from `lib/attachments.ts` is the canonical file-extension-to-language map. `CodeBlock.tsx` keeps its own `LANG_MAP` because it also resolves markdown fence names.
 - Two UI event buses: `hooks/sessionEvents.ts` for DB-state events (chat_created, session_updated); `lib/events.ts` for ephemeral UI (`sendToTerminal`, `terminalsRegistry`). Don't merge — different subscriber lifecycles.
 - `vite.config.ts` proxy entries are order-sensitive: more-specific prefixes (`/api/term`, `/ws/term`) must come BEFORE `/api`.
 - Mobile pane URL sync (`Session.tsx`): the `?pane=<id>` effect resets `activePaneIdx` whenever `panes` changes. New-pane creation on mobile must push `?pane=` atomically — `addPaneAndSwitch` is the wrapper that does this. `addSplitPane` returns the new pane id for callers.
 - xterm.js v5 uses canvas rendering — browser doesn't see xterm's selection; the native right-click menu has no working Copy for terminal text. App keybindings (`Cmd/Ctrl-C`, `Cmd/Ctrl-Shift-C`) are the path.
+- **New tools** live in their own `services/<name>.ts` file (see `web_search.ts`, `web_fetch.ts`) — exports a pure `executeFoo(input, ...deps)` for direct test access plus a `ToolDef` wrapper that `loadConfig()`s its real dependencies. Register the ToolDef in `tools.ts` `ALL_TOOLS` (and `READ_ONLY_TOOL_NAMES` if applicable). Inject `fetcher: typeof fetch = fetch` rather than `vi.spyOn(globalThis, 'fetch')` — cleanup is simpler and the production call site stays unchanged.
+- **Sentinels** are `role='system'` rows with structured `metadata.kind` (`cap_hit`, `doom_loop`). UI-only — `buildMessagesPayload` strips them via `isAnySentinel` so the LLM never sees them. A new kind requires arms in `MessageMetadata` in BOTH `apps/server/src/types/api.ts` AND `apps/web/src/api/types.ts`, plus a render branch in `apps/web/src/components/MessageBubble.tsx`.
+- **ReadableStream test stubs** use `pull()` (not `start()`) so chunks are produced lazily — `start()` enqueues everything and calls `controller.close()` before the consumer reads, so a subsequent `reader.cancel()` finds the stream already closed and the `cancel()` callback never fires. Also provide MORE chunks than the test will consume so the source stays in 'readable' state when cancel runs (e.g. cap test reads ~6 chunks, stub provides 10).
+- Tool-name whitelists must derive from `ALL_TOOLS` in `services/tools.ts`, never hardcoded. `services/agents.ts` `ALL_TOOL_NAMES` had this drift class until v1.12 — same pattern applies to any future tool-aware code.
+- Agent registry lives at `data/AGENTS.md` (global, bind-mounted at `/data/AGENTS.md`). No per-project `AGENTS.md` in this repo — removed in v1.12 to eliminate the two-files-must-stay-in-sync drift. The `getAgentsForProject` per-project override mechanism remains for *other* projects.
+- MCP stdio transport uses newline-delimited JSON (NDJSON), NOT LSP-style `Content-Length` headers. The `codecontext/shim.go` framing implementation is the reference; per the MCP spec (modelcontextprotocol.io/specification/server/transports).
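
The NDJSON framing mentioned in the last bullet can be illustrated with a small encoder and decoder. This is a TypeScript sketch for orientation only; it is not a port of `codecontext/shim.go`, and `encodeNdjson` and `NdjsonDecoder` are hypothetical names. Each message is one JSON document on one line, terminated by `\n`, with no `Content-Length` header.

```typescript
// Encode one message as a single line. JSON.stringify without an indent
// argument never emits a raw newline, so the trailing '\n' is the only one.
function encodeNdjson(message: unknown): string {
  return JSON.stringify(message) + '\n';
}

// Incremental decoder: stdio delivers arbitrary chunks, so a message may
// arrive split across reads. Incomplete tails are buffered until the
// terminating newline shows up.
class NdjsonDecoder {
  private buffer = '';

  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const lines = this.buffer.split('\n');
    // The last element is either '' (chunk ended on a newline) or a partial line.
    this.buffer = lines.pop() ?? '';
    return lines
      .filter((line) => line.trim().length > 0)
      .map((line) => JSON.parse(line) as unknown);
  }
}

const decoder = new NdjsonDecoder();
const wire =
  encodeNdjson({ jsonrpc: '2.0', id: 1, method: 'ping' }) +
  encodeNdjson({ jsonrpc: '2.0', id: 2, method: 'ping' });

// Feed the stream in two uneven chunks to exercise the buffering.
const first = decoder.push(wire.slice(0, 50));
const second = decoder.push(wire.slice(50));
console.log(first.length + second.length);
```

A `Content-Length` reader would instead parse a header block and then read an exact byte count, which is why the two framings are not interchangeable on the same pipe.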
--- a/apps/server/package.json
+++ b/apps/server/package.json
@@ -11,8 +11,10 @@
    "test": "vitest run"
  },
  "dependencies": {
+    "@ai-sdk/openai-compatible": "^2.0.47",
    "@fastify/static": "^7.0.4",
    "@fastify/websocket": "^10.0.1",
+    "ai": "^6.0.190",
    "fastify": "^4.28.1",
    "postgres": "^3.4.4",
    "ws": "^8.18.0",
--- a/apps/server/src/index.ts
+++ b/apps/server/src/index.ts
@@ -16,11 +16,13 @@ import { registerWebSocket } from './routes/ws.js';
 import { registerModelRoutes } from './routes/models.js';
 import { registerAgentRoutes } from './routes/agents.js';
 import { registerSkillsRoutes } from './routes/skills.js';
-import { createInferenceRunner } from './services/inference.js';
+import { registerToolsRoutes } from './routes/tools.js';
+import { createInferenceRunner } from './services/inference/index.js';
 import { createBroker } from './services/broker.js';
 import { listSkills } from './services/skills.js';
 import * as compaction from './services/compaction.js';
 import { configureModelContext } from './services/model-context.js';
+import { cleanupTruncations } from './services/truncate.js';

 async function main() {
  const config = loadConfig();
@@ -49,6 +51,18 @@ async function main() {
  await applySchema(sql);
  app.log.info('database schema applied');

+  const swept = await sql<{ count: string }[]>`
+    WITH swept AS (
+      UPDATE messages SET status = 'failed'
+      WHERE status = 'streaming' AND created_at < NOW() - INTERVAL '5 minutes'
+      RETURNING id
+    ) SELECT count(*)::text AS count FROM swept
+  `;
+  const sweptCount = Number(swept[0]?.count ?? 0);
+  if (sweptCount > 0) {
+    app.log.info({ sweptCount }, 'swept stale streaming messages to failed');
+  }
+
  // v1.11.3: tell the model-context cache where llama-swap lives. Cache
  // lookups go to ${LLAMA_SWAP_URL}/upstream/<model>/props to read
  // default_generation_settings.n_ctx — the value persisted as messages.ctx_max.
@@ -61,7 +75,7 @@ async function main() {
    return { status: dbOk ? 'ok' : 'degraded', db: dbOk };
  });

-  const broker = createBroker();
+  const broker = createBroker(app.log);

  registerProjectRoutes(app, sql, config, broker);
  registerSessionRoutes(app, sql, config, broker);
@@ -70,6 +84,7 @@ async function main() {
  registerAgentRoutes(app, sql);
  registerSidebarRoutes(app, sql);
  registerChatRoutes(app, sql, broker);
+  registerToolsRoutes(app, sql);

  // Batch 9.6: warm the skills cache at boot and surface the count. Empty or
  // missing /data/skills is non-fatal — the skill tools just return empty.
@@ -189,6 +204,52 @@ async function main() {
    app.log.info(`serving static frontend from ${webDist}`);
  }

+  // v1.13.3: periodic in-process sweeper for streaming rows orphaned by a
+  // mid-session crash. The boot sweep (above) only fires once at startup;
+  // this loop catches the in-flight case. 60s cadence + 5-min threshold
+  // matches the boot sweep so behavior is consistent. Publishes
+  // chat_status='idle' on the user channel so the UI dot drops without a
+  // refresh — same pattern as handleAbortOrError.
+  const SWEEP_INTERVAL_MS = 60_000;
+  const sweepStaleStreaming = async (): Promise<void> => {
+    try {
+      const rows = await sql<{ id: string; chat_id: string }[]>`
+        UPDATE messages
+        SET status = 'failed', finished_at = clock_timestamp()
+        WHERE status = 'streaming'
+          AND created_at < NOW() - INTERVAL '5 minutes'
+        RETURNING id, chat_id
+      `;
+      if (rows.length === 0) return;
+      app.log.warn(
+        { swept: rows.length, ids: rows.map((r) => r.id) },
+        'swept stale streaming rows',
+      );
+      const seenChats = new Set<string>();
+      const now = new Date().toISOString();
+      for (const row of rows) {
+        if (seenChats.has(row.chat_id)) continue;
+        seenChats.add(row.chat_id);
+        broker.publishUser('default', {
+          type: 'chat_status',
+          chat_id: row.chat_id,
+          status: 'idle',
+          at: now,
+        });
+      }
+    } catch (err) {
+      app.log.error({ err }, 'stuck-row sweeper failed');
+    }
+  };
+  // v1.13.5: truncation cleanup rides the same cadence — 60s tick reaps
+  // tmpfs files past the 7-day TTL plus any orphans whose owning part has
+  // been pruned (v1.13.4) or deleted. No-op when the dir is empty.
+  const sweepTimer = setInterval(() => {
+    void sweepStaleStreaming();
+    void cleanupTruncations({ sql, log: app.log });
+  }, SWEEP_INTERVAL_MS);
+  app.addHook('onClose', async () => { clearInterval(sweepTimer); });
+
  const shutdown = async (signal: string) => {
    app.log.info(`received ${signal}, shutting down`);
    try {
--- a/apps/server/src/routes/chats.ts
+++ b/apps/server/src/routes/chats.ts
@@ -18,6 +18,12 @@ const ForkBody = z.object({
  name: z.string().min(1).max(200).optional(),
 });

+const DiscardStaleBody = z.object({
+  message_id: z.string().uuid(),
+});
+
+const STALE_MIN_AGE_SECONDS = 60;
+
 export function registerChatRoutes(
  app: FastifyInstance,
  sql: Sql,
@@ -307,6 +313,28 @@ export function registerChatRoutes(
            AND created_at <= ${target.created_at}::timestamptz
            AND status = 'complete'
        `;
+        // v1.13.0: clone message_parts for the forked messages. Source and
+        // destination preserve ordering (the INSERT above orders by created_at,
+        // id) so a ROW_NUMBER pairing maps source.id → dest.id deterministically.
+        await tx`
+          WITH src AS (
+            SELECT id, ROW_NUMBER() OVER (ORDER BY created_at ASC, id ASC) AS rn
+            FROM messages
+            WHERE chat_id = ${source.id}
+              AND created_at <= ${target.created_at}::timestamptz
+              AND status = 'complete'
+          ),
+          dst AS (
+            SELECT id, ROW_NUMBER() OVER (ORDER BY created_at ASC, id ASC) AS rn
+            FROM messages
+            WHERE chat_id = ${chat!.id}
+          )
+          INSERT INTO message_parts (message_id, sequence, kind, payload)
+          SELECT dst.id, p.sequence, p.kind, p.payload
+          FROM message_parts p
+          JOIN src ON p.message_id = src.id
+          JOIN dst ON dst.rn = src.rn
+        `;
        return chat!;
      });

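The ROW_NUMBER pairing in the hunk above can be sketched outside SQL. This is a minimal illustration rather than the project's code: `cloneParts`, `byCreatedThenId`, and the `Msg` / `PartRow` shapes are hypothetical names, and the sketch assumes that sorting source and destination by (created_at, id) yields the same relative order on both sides.

```typescript
// Hypothetical shapes for illustration; the real rows carry more columns.
interface Msg {
  id: string;
  created_at: string; // ISO-8601 UTC, so string order equals time order
}

interface PartRow {
  message_id: string;
  sequence: number;
  kind: string;
  payload: unknown;
}

// Same ordering as ORDER BY created_at ASC, id ASC.
function byCreatedThenId(a: Msg, b: Msg): number {
  if (a.created_at !== b.created_at) return a.created_at < b.created_at ? -1 : 1;
  return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
}

// Pair the n-th source message with the n-th destination message, then
// re-home every source part onto its paired destination message.
function cloneParts(src: Msg[], dst: Msg[], parts: PartRow[]): PartRow[] {
  const srcSorted = src.slice().sort(byCreatedThenId);
  const dstSorted = dst.slice().sort(byCreatedThenId);
  const srcToDst: Record<string, string> = Object.create(null);
  srcSorted.forEach((m, rank) => {
    const target = dstSorted[rank];
    if (target !== undefined) srcToDst[m.id] = target.id;
  });
  const cloned: PartRow[] = [];
  for (const p of parts) {
    const destId = srcToDst[p.message_id];
    // Parts of messages outside the forked range have no pair and are skipped.
    if (destId !== undefined) cloned.push({ ...p, message_id: destId });
  }
  return cloned;
}
```

If two messages share a created_at value, the tie is broken by id, and freshly generated destination ids need not sort the same way as the source ids. The sketch inherits that caveat from the ordering it mirrors.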
@@ -320,6 +348,73 @@ export function registerChatRoutes(
    }
  );

+  // v1.12.3: explicit recovery from a stuck-streaming assistant row. The
+  // frontend gates this behind a 60s no-token-activity timer; the server
+  // re-checks the age and current status for safety. Non-streaming rows
+  // return 409 (frontend race; idempotent retry is fine).
+  app.post<{ Params: { id: string } }>(
+    '/api/chats/:id/discard_stale',
+    async (req, reply) => {
+      const parsed = DiscardStaleBody.safeParse(req.body ?? {});
+      if (!parsed.success) {
+        reply.code(400);
+        return { error: 'invalid body', details: parsed.error.flatten() };
+      }
+      const rows = await sql<{
+        id: string;
+        session_id: string;
+        chat_id: string;
+        status: string;
+        age_seconds: number;
+      }[]>`
+        SELECT id, session_id, chat_id, status,
+               EXTRACT(EPOCH FROM (clock_timestamp() - created_at))::int AS age_seconds
+        FROM messages
+        WHERE id = ${parsed.data.message_id} AND chat_id = ${req.params.id}
+      `;
+      if (rows.length === 0) {
+        reply.code(404);
+        return { error: 'message not found in chat' };
+      }
+      const msg = rows[0]!;
+      if (msg.status !== 'streaming') {
+        reply.code(409);
+        return { error: 'message is no longer streaming', current_status: msg.status };
+      }
+      if (msg.age_seconds < STALE_MIN_AGE_SECONDS) {
+        reply.code(409);
+        return { error: 'message is not stale yet', age_seconds: msg.age_seconds };
+      }
+      const updated = await sql<Message[]>`
+        UPDATE messages
+        SET status = 'failed',
+            content = COALESCE(content, ''),
+            finished_at = clock_timestamp()
+        WHERE id = ${msg.id} AND status = 'streaming'
+        RETURNING id, session_id, chat_id, role, content, kind, tool_calls, tool_results,
+                  status, last_seq, tokens_used, ctx_used, ctx_max, started_at, finished_at,
+                  created_at, metadata, summary, tail_start_id, compacted_at
+      `;
+      if (updated.length === 0) {
+        // Race: the row flipped out of 'streaming' between our SELECT and UPDATE.
+        reply.code(409);
+        return { error: 'message status changed mid-request' };
+      }
+      broker.publishUser('default', {
+        type: 'chat_status',
+        chat_id: msg.chat_id,
+        status: 'idle',
+        at: new Date().toISOString(),
+      });
+      broker.publish(msg.session_id, {
+        type: 'message_complete',
+        message_id: msg.id,
+        chat_id: msg.chat_id,
+      });
+      return updated[0];
+    }
+  );
+
  app.get<{ Params: { id: string } }>(
    '/api/chats/:id/messages',
    async (req, reply) => {
@@ -328,11 +423,12 @@ export function registerChatRoutes(
        reply.code(404);
        return { error: 'chat not found' };
      }
+      // v1.13.1-B: reads tool_calls/tool_results via the parts-merged view.
      const rows = await sql<Message[]>`
        SELECT id, session_id, chat_id, role, content, kind, tool_calls, tool_results, status, last_seq,
               tokens_used, ctx_used, ctx_max, started_at, finished_at, created_at, metadata,
               summary, tail_start_id, compacted_at
-        FROM messages
+        FROM messages_with_parts
        WHERE chat_id = ${req.params.id}
        ORDER BY created_at ASC, id ASC
      `;
--- a/apps/server/src/routes/messages.ts
+++ b/apps/server/src/routes/messages.ts
@@ -91,11 +91,12 @@ export function registerMessageRoutes(
      // SummaryCard) and shows compacted_at-stamped rows inline for context.
      // Internal inference assembly filters compacted_at IS NULL separately —
      // see services/inference.ts loadContext + services/compaction.ts.
+      // v1.13.1-B: reads tool_calls/tool_results via the parts-merged view.
      const rows = await sql<Message[]>`
        SELECT id, session_id, chat_id, role, content, kind, tool_calls, tool_results, status, last_seq,
               tokens_used, ctx_used, ctx_max, started_at, finished_at, created_at, metadata,
               summary, tail_start_id, compacted_at
-        FROM messages
+        FROM messages_with_parts
        WHERE session_id = ${req.params.id}
        ORDER BY created_at ASC, id ASC
      `;
@@ -469,30 +470,36 @@ export function registerMessageRoutes(
      const chat = chatRows[0]!;
      const sessionId = chat.session_id;

-      // Find the assistant message that emitted this tool_call. Scoped by
-      // chat_id + role to avoid cross-chat lookups; ordered by created_at DESC
-      // because the most recent issuance wins when an LLM reuses call IDs
-      // across turns (the older, already-answered one is a different row with
-      // populated tool_results downstream).
-      const callerRows = await sql<{ id: string; tool_calls: ToolCall[] | null }[]>`
-        SELECT id, tool_calls FROM messages
-        WHERE chat_id = ${chat.id}
-          AND role = 'assistant'
-          AND tool_calls IS NOT NULL
-        ORDER BY created_at DESC
+      // v1.13.1-C: find the assistant's tool_call by querying message_parts
+      // directly on payload->>'id'. Scoped by chat_id + role via the JOIN.
+      // Pre-v1.13.0 history has no parts rows — those tool_calls become
+      // unreachable here (404). Acceptable per the dispatch decision: any
+      // pending elicitation from before v1.13.0 is long timed out by now;
+      // promote to a hotfix with a JSON-column fallback if it ever surfaces.
+      const callerRows = await sql<{
+        message_id: string;
+        payload: { id: string; name: string; args: Record<string, unknown> };
+      }[]>`
+        SELECT p.message_id, p.payload
+        FROM message_parts p
+        JOIN messages m ON m.id = p.message_id
+        WHERE m.chat_id = ${chat.id}
+          AND m.role = 'assistant'
+          AND p.kind = 'tool_call'
+          AND p.payload->>'id' = ${tool_call_id}
+        ORDER BY m.created_at DESC
+        LIMIT 1
      `;
-      let foundCall: ToolCall | null = null;
-      for (const row of callerRows) {
-        const match = row.tool_calls?.find((tc) => tc.id === tool_call_id);
-        if (match) {
-          foundCall = match;
-          break;
-        }
-      }
-      if (!foundCall) {
+      const callerRow = callerRows[0];
+      if (!callerRow) {
        reply.code(404);
        return { error: 'unknown_tool_call_id' };
      }
+      const foundCall: ToolCall = {
+        id: callerRow.payload.id,
+        name: callerRow.payload.name,
+        args: callerRow.payload.args,
+      };
      if (foundCall.name !== 'ask_user_input') {
        reply.code(400);
        return { error: 'tool_call_not_ask_user_input' };
@@ -539,18 +546,21 @@ export function registerMessageRoutes(
        }
      }

-      // Find the pending tool row. ORDER BY created_at DESC + LIMIT 1 picks
-      // the most recent row with this tool_call_id; the already-answered
-      // check below guards against UPDATE-ing a stale answer.
+      // v1.13.1-C: find the pending tool row via message_parts on
+      // payload->>'tool_call_id'. Same fallback caveat as the caller lookup
+      // above — pre-v1.13.0 rows are unreachable here.
      const toolRows = await sql<{
-        id: string;
-        tool_results: { tool_call_id: string; output: unknown } | null;
+        message_id: string;
+        payload: { tool_call_id: string; output: unknown };
      }[]>`
-        SELECT id, tool_results FROM messages
-        WHERE chat_id = ${chat.id}
-          AND role = 'tool'
-          AND tool_results->>'tool_call_id' = ${tool_call_id}
-        ORDER BY created_at DESC
+        SELECT p.message_id, p.payload
+        FROM message_parts p
+        JOIN messages m ON m.id = p.message_id
+        WHERE m.chat_id = ${chat.id}
+          AND m.role = 'tool'
+          AND p.kind = 'tool_result'
+          AND p.payload->>'tool_call_id' = ${tool_call_id}
+        ORDER BY m.created_at DESC
        LIMIT 1
      `;
      const toolRow = toolRows[0];
@@ -558,7 +568,7 @@ export function registerMessageRoutes(
        reply.code(404);
        return { error: 'unknown_tool_call_id', detail: 'tool message not found' };
      }
-      if (toolRow.tool_results && toolRow.tool_results.output !== null) {
+      if (toolRow.payload && toolRow.payload.output !== null) {
        reply.code(409);
        return { error: 'tool_call_already_answered' };
      }
@@ -570,11 +580,21 @@ export function registerMessageRoutes(
        truncated: false,
      };

+      const toolMessageId = toolRow.message_id;
      const result = await sql.begin(async (tx) => {
        await tx`
          UPDATE messages
          SET tool_results = ${tx.json(newToolResults as never)}
-          WHERE id = ${toolRow.id}
+          WHERE id = ${toolMessageId}
+        `;
+        // v1.13.0: replace the pending tool_result part inserted at message
+        // creation (tool-phase.ts) with the answered one. Delete-then-insert
+        // is simpler than UPDATE because parts are append-style elsewhere;
+        // the UNIQUE (message_id, sequence) constraint blocks plain insert.
+        await tx`DELETE FROM message_parts WHERE message_id = ${toolMessageId} AND kind = 'tool_result'`;
+        await tx`
+          INSERT INTO message_parts (message_id, sequence, kind, payload)
+          VALUES (${toolMessageId}, 0, 'tool_result', ${tx.json(newToolResults as never)})
        `;
        const [assistantMsg] = await tx<{ id: string }[]>`
          INSERT INTO messages (session_id, chat_id, role, content, status, created_at)
@@ -584,7 +604,7 @@ export function registerMessageRoutes(
        await tx`UPDATE sessions SET updated_at = clock_timestamp() WHERE id = ${sessionId}`;
        await tx`UPDATE chats SET updated_at = clock_timestamp() WHERE id = ${chat.id}`;
        return {
-          tool_message_id: toolRow.id,
+          tool_message_id: toolMessageId,
          assistant_message_id: assistantMsg!.id,
        };
      });
--- a/apps/server/src/routes/sessions.ts
+++ b/apps/server/src/routes/sessions.ts
@@ -13,6 +13,18 @@ const CreateBody = z.object({
  agent_id: z.string().min(1).max(200).nullable().optional(),
 });

+const WorkspacePaneZ = z.object({
+  id: z.string().min(1).max(200),
+  kind: z.enum(['chat', 'terminal', 'agent', 'empty', 'settings']),
+  chatId: z.string().min(1).max(200).optional(),
+  chatIds: z.array(z.string().min(1).max(200)).max(50),
+  activeChatIdx: z.number().int(),
+});
+
+const WorkspacePanesBody = z.object({
+  workspace_panes: z.array(WorkspacePaneZ).max(10),
+});
+
 const PatchBody = z.object({
  name: z.string().min(1).max(200).optional(),
  model: z.string().min(1).max(200).optional(),
@@ -44,7 +56,7 @@ export function registerSessionRoutes(
      }
      const status = req.query.status === 'archived' ? 'archived' : 'open';
      const rows = await sql<Session[]>`
-        SELECT id, project_id, name, model, system_prompt, status, created_at, updated_at, agent_id, web_search_enabled
+        SELECT id, project_id, name, model, system_prompt, status, created_at, updated_at, agent_id, web_search_enabled, workspace_panes
        FROM sessions
        WHERE project_id = ${req.params.id} AND status = ${status}
        ORDER BY updated_at DESC
@@ -92,7 +104,7 @@ export function registerSessionRoutes(
        const [session] = await tx<Session[]>`
          INSERT INTO sessions (project_id, name, model, system_prompt, agent_id)
          VALUES (${req.params.id}, ${name}, ${model}, ${systemPrompt}, ${agentId})
-          RETURNING id, project_id, name, model, system_prompt, status, created_at, updated_at, agent_id, web_search_enabled
+          RETURNING id, project_id, name, model, system_prompt, status, created_at, updated_at, agent_id, web_search_enabled, workspace_panes
        `;
        await tx`
          INSERT INTO chats (session_id, name, status)
@@ -112,7 +124,7 @@ export function registerSessionRoutes(

  app.get<{ Params: { id: string } }>('/api/sessions/:id', async (req, reply) => {
    const rows = await sql<Session[]>`
-      SELECT id, project_id, name, model, system_prompt, status, created_at, updated_at, agent_id, web_search_enabled
+      SELECT id, project_id, name, model, system_prompt, status, created_at, updated_at, agent_id, web_search_enabled, workspace_panes
      FROM sessions WHERE id = ${req.params.id}
    `;
    if (rows.length === 0) {
@@ -158,7 +170,7 @@ export function registerSessionRoutes(
          updated_at = clock_timestamp()
        WHERE id = ${req.params.id}
        RETURNING id, project_id, name, model, system_prompt, status, created_at, updated_at,
-                  agent_id, web_search_enabled
+                  agent_id, web_search_enabled, workspace_panes
      `;
      if (rows.length === 0) {
        reply.code(404);
@@ -187,6 +199,36 @@ export function registerSessionRoutes(
    }
  );

+  app.patch<{ Params: { id: string } }>(
+    '/api/sessions/:id/workspace',
+    async (req, reply) => {
+      const parsed = WorkspacePanesBody.safeParse(req.body);
+      if (!parsed.success) {
+        reply.code(400);
+        return { error: 'invalid body', details: parsed.error.flatten() };
+      }
+      const rows = await sql<Session[]>`
+        UPDATE sessions
+        SET workspace_panes = ${sql.json(parsed.data.workspace_panes as never)},
+            updated_at = clock_timestamp()
+        WHERE id = ${req.params.id}
+        RETURNING id, project_id, name, model, system_prompt, status, created_at, updated_at,
+                  agent_id, web_search_enabled, workspace_panes
+      `;
+      if (rows.length === 0) {
+        reply.code(404);
+        return { error: 'session not found' };
+      }
+      const session = rows[0]!;
+      broker.publishUser('default', {
+        type: 'session_workspace_updated',
+        session_id: session.id,
+        workspace_panes: session.workspace_panes,
+      });
+      return session;
+    }
+  );
+
  // v1.9: bulk-archive every open session in a project. Mirrors the
  // single-archive shape (same broker frame type) so the existing useSidebar
  // reducer cases handle it without changes — just N frames instead of 1.
@@ -263,7 +305,7 @@ export function registerSessionRoutes(
      const rows = await sql<Session[]>`
        UPDATE sessions SET status = 'open', updated_at = clock_timestamp()
        WHERE id = ${req.params.id} AND status = 'archived'
-        RETURNING id, project_id, name, model, system_prompt, status, created_at, updated_at, agent_id, web_search_enabled
+        RETURNING id, project_id, name, model, system_prompt, status, created_at, updated_at, agent_id, web_search_enabled, workspace_panes
      `;
      if (rows.length === 0) {
        reply.code(404);
--- a/apps/server/src/routes/skills.ts
+++ b/apps/server/src/routes/skills.ts
@@ -90,11 +90,26 @@ export function registerSkillsRoutes(
          VALUES (${sessionId}, ${chat.id}, 'assistant', '', ${sql.json(toolCalls as never)}, 'complete', clock_timestamp())
          RETURNING id
        `;
+        // v1.13.0: dual-write the synthetic assistant message's tool_call.
+        // Single skill_use tool_call, no text content, so one part at seq 0.
+        await tx`
+          INSERT INTO message_parts (message_id, sequence, kind, payload)
+          VALUES (${synthAssistant!.id}, 0, 'tool_call', ${tx.json({
+            id: toolCallId,
+            name: 'skill_use',
+            args: { name: skill_name },
+          } as never)})
+        `;
        const [toolMsg] = await tx<{ id: string }[]>`
          INSERT INTO messages (session_id, chat_id, role, content, tool_results, status, created_at)
          VALUES (${sessionId}, ${chat.id}, 'tool', '', ${sql.json(toolResults as never)}, 'complete', clock_timestamp())
          RETURNING id
        `;
+        // v1.13.0: dual-write the synthetic tool result (the skill body).
+        await tx`
+          INSERT INTO message_parts (message_id, sequence, kind, payload)
+          VALUES (${toolMsg!.id}, 0, 'tool_result', ${tx.json(toolResults as never)})
+        `;
        const [userMsg] = await tx<{ id: string }[]>`
          INSERT INTO messages (session_id, chat_id, role, content, status, created_at)
          VALUES (${sessionId}, ${chat.id}, 'user', ${userText}, 'complete', clock_timestamp())
--- a/apps/server/src/routes/tools.ts
+++ b/apps/server/src/routes/tools.ts
@@ -0,0 +1,40 @@
+import type { FastifyInstance } from 'fastify';
+import type { Sql } from '../db.js';
+
+export interface ToolCostStat {
+  tool_name: string;
+  mean_prompt_tokens: number;
+  mean_completion_tokens: number;
+  n_calls: number;
+  updated_at: string;
+}
+
+// v1.13.10: per-tool token cost rolling window read endpoint. Backed by the
+// tool_cost_stats view in schema.sql (last 100 calls per tool, equal-split
+// attribution across multi-tool turns, sentinel/failed-turn excluded).
+// Consumed by AgentPicker for at-a-glance per-agent cost hints.
+export function registerToolsRoutes(app: FastifyInstance, sql: Sql): void {
+  app.get('/api/tools/cost_stats', async () => {
+    const rows = await sql<
+      {
+        tool_name: string;
+        prompt_tokens_sum: number;
+        completion_tokens_sum: number;
+        n_calls: number;
+        updated_at: string;
+      }[]
+    >`
+      SELECT tool_name, prompt_tokens_sum, completion_tokens_sum, n_calls, updated_at
+      FROM tool_cost_stats
+      ORDER BY tool_name ASC
+    `;
+    const stats: ToolCostStat[] = rows.map((r) => ({
+      tool_name: r.tool_name,
+      mean_prompt_tokens: Math.round(r.prompt_tokens_sum / r.n_calls),
+      mean_completion_tokens: Math.round(r.completion_tokens_sum / r.n_calls),
+      n_calls: r.n_calls,
+      updated_at: r.updated_at,
+    }));
+    return { stats };
+  });
+}
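The equal-split attribution and rolling window that the comment above attributes to the tool_cost_stats view can be sketched in TypeScript. This is an illustration under stated assumptions, not the view's implementation: `Turn`, `Call`, `CostStat`, and `toolCostStats` are hypothetical names, and timestamps are assumed to be ISO-8601 UTC strings so that string order matches time order.

```typescript
// Hypothetical input shape: one entry per assistant turn that emitted tools.
interface Turn {
  created_at: string; // ISO-8601 UTC timestamp
  ctx_used: number; // prompt tokens for the whole turn
  tokens_used: number; // completion tokens for the whole turn
  tool_names: string[]; // one entry per tool call in the turn
}

interface Call {
  at: string;
  prompt: number;
  completion: number;
}

interface CostStat {
  tool_name: string;
  mean_prompt_tokens: number;
  mean_completion_tokens: number;
  n_calls: number;
}

function toolCostStats(turns: Turn[], windowSize = 100): CostStat[] {
  // Null-prototype object so tool names never collide with Object members.
  const perTool: Record<string, Call[]> = Object.create(null);
  for (const turn of turns) {
    const n = turn.tool_names.length;
    if (n === 0) continue;
    for (const name of turn.tool_names) {
      // Equal split: each of the N calls gets 1/N of the turn's tokens.
      const calls = perTool[name] ?? [];
      calls.push({
        at: turn.created_at,
        prompt: turn.ctx_used / n,
        completion: turn.tokens_used / n,
      });
      perTool[name] = calls;
    }
  }
  return Object.keys(perTool)
    .sort()
    .map((tool_name) => {
      // Rolling window: keep only the most recent `windowSize` calls.
      const recent = (perTool[tool_name] ?? [])
        .slice()
        .sort((a, b) => (a.at < b.at ? 1 : a.at > b.at ? -1 : 0))
        .slice(0, windowSize);
      const promptSum = Math.round(recent.reduce((s, c) => s + c.prompt, 0));
      const completionSum = Math.round(recent.reduce((s, c) => s + c.completion, 0));
      return {
        tool_name,
        mean_prompt_tokens: Math.round(promptSum / recent.length),
        mean_completion_tokens: Math.round(completionSum / recent.length),
        n_calls: recent.length,
      };
    });
}
```

Rounding the sums first and the means second mirrors the two rounding steps in the diff (ROUND in the view, Math.round in the route), so the sketch reproduces the same small rounding drift.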
--- a/apps/server/src/routes/ws.ts
+++ b/apps/server/src/routes/ws.ts
@@ -23,11 +23,12 @@ export function registerWebSocket(

      // v1.11: snapshot includes compaction fields so MessageBubble can
      // render the SummaryCard for summary=true rows on first connect.
+      // v1.13.1-B: reads tool_calls/tool_results via the parts-merged view.
      const messages = await sql<Message[]>`
        SELECT id, session_id, chat_id, role, content, kind, tool_calls, tool_results, status, last_seq,
               tokens_used, ctx_used, ctx_max, started_at, finished_at, created_at, metadata,
               summary, tail_start_id, compacted_at
-        FROM messages
+        FROM messages_with_parts
        WHERE session_id = ${sessionId}
        ORDER BY created_at ASC, id ASC
      `;
--- a/apps/server/src/schema.sql
+++ b/apps/server/src/schema.sql
@@ -1,3 +1,10 @@
+-- v1.13.3: statement_timeout is set at database level via:
+--   ALTER DATABASE boocode SET statement_timeout = '30s';
+-- ALTER DATABASE can't run inside a DO block, so this is an operational
+-- step rather than schema. Re-apply after a volume reset (the setting
+-- lives in pg_db_role_setting, which survives `docker compose up --build` but NOT a
+-- `docker volume rm boocode_pgdata`).
+
 CREATE TABLE IF NOT EXISTS projects (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  name TEXT NOT NULL,
@@ -32,6 +39,148 @@ CREATE TABLE IF NOT EXISTS messages (

 CREATE INDEX IF NOT EXISTS idx_messages_session ON messages(session_id, created_at);

+-- v1.13.0: granular message parts table for AI SDK migration. Old
+-- messages.content / tool_calls / tool_results columns stay authoritative
+-- for reads in v1.13.0; this table is dual-written so the swap can happen
+-- in a later dispatch without a backfill window. ON DELETE CASCADE means
+-- removing a message removes its parts in one go.
+CREATE TABLE IF NOT EXISTS message_parts (
+  id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
+  message_id uuid NOT NULL REFERENCES messages(id) ON DELETE CASCADE,
+  sequence int NOT NULL,
+  kind text NOT NULL,
+  payload jsonb NOT NULL,
+  created_at timestamptz NOT NULL DEFAULT clock_timestamp(),
+  CONSTRAINT message_parts_kind_chk CHECK (kind IN ('text', 'tool_call', 'tool_result', 'reasoning', 'step_start')),
+  CONSTRAINT message_parts_seq_uniq UNIQUE (message_id, sequence)
+);
+CREATE INDEX IF NOT EXISTS message_parts_msg_seq_idx ON message_parts (message_id, sequence);
+
+-- v1.13.4: prune support. hidden_at marks parts that have been pruned out
+-- of the model payload by the two-tier compaction prune (services/inference/
+-- prune.ts). Rows stay in the DB so frontend can still display them with a
+-- "hidden" indicator (out of scope this dispatch). messages_with_parts
+-- view filters these out — see below. Partial index speeds the common
+-- "visible parts only" filter.
+DO $$
+BEGIN
+  IF NOT EXISTS (
+    SELECT 1 FROM information_schema.columns
+    WHERE table_name = 'message_parts' AND column_name = 'hidden_at'
+  ) THEN
+    ALTER TABLE message_parts ADD COLUMN hidden_at timestamptz NULL;
+  END IF;
+END $$;
+CREATE INDEX IF NOT EXISTS message_parts_hidden_idx
+  ON message_parts (message_id) WHERE hidden_at IS NULL;
+
+-- v1.13.1-B: read-path view. Read sites SELECT FROM messages_with_parts
+-- instead of messages so tool_calls / tool_results / reasoning_parts come
+-- from the granular message_parts table. The CASE fallback means pre-v1.13.0
+-- history (no parts rows) still resolves via the legacy JSON columns; the
+-- dual-write from v1.13.0 keeps both in sync for all rows written since.
+-- Writes continue to target `messages` directly — the view is read-only.
+-- Shapes match the in-memory ToolCall / ToolResult types: tool_calls is a
+-- jsonb array of {id, name, args}, tool_results is a single jsonb object
+-- {tool_call_id, output, truncated, error?}. reasoning_parts is new — only
+-- consumed by the inference history fetch (payload.ts) so v1.13.1-C can
+-- wire reasoning into the model payload. Not surfaced in external APIs yet.
+CREATE OR REPLACE VIEW messages_with_parts AS
+SELECT
+  m.id, m.session_id, m.chat_id, m.role, m.content, m.kind, m.status,
+  m.last_seq, m.tokens_used, m.ctx_used, m.ctx_max,
+  m.started_at, m.finished_at, m.created_at, m.metadata,
+  m.summary, m.tail_start_id, m.compacted_at,
+  -- v1.13.4: prune semantics need to distinguish "no parts row exists"
+  -- (pre-v1.13.0 fallback to legacy column) from "all parts hidden"
+  -- (prune intended — return null/empty so the row drops from the model
+  -- payload). A naive COALESCE would fall back to the legacy column when
+  -- every part is hidden, undoing the prune. CASE on EXISTS (any part of
+  -- that kind, hidden or not) splits the two cases.
+  CASE
+    WHEN EXISTS (SELECT 1 FROM message_parts pp
+                  WHERE pp.message_id = m.id AND pp.kind = 'tool_call')
+    THEN (SELECT jsonb_agg(p.payload ORDER BY p.sequence)
+            FROM message_parts p
+           WHERE p.message_id = m.id AND p.kind = 'tool_call' AND p.hidden_at IS NULL)
+    ELSE m.tool_calls
+  END AS tool_calls,
+  CASE
+    WHEN EXISTS (SELECT 1 FROM message_parts pp
+                  WHERE pp.message_id = m.id AND pp.kind = 'tool_result')
+    THEN (SELECT p.payload
+            FROM message_parts p
+           WHERE p.message_id = m.id AND p.kind = 'tool_result' AND p.hidden_at IS NULL
+           ORDER BY p.sequence LIMIT 1)
+    ELSE m.tool_results
+  END AS tool_results,
+  (SELECT jsonb_agg(p.payload ORDER BY p.sequence)
+     FROM message_parts p
+    WHERE p.message_id = m.id AND p.kind = 'reasoning' AND p.hidden_at IS NULL) AS reasoning_parts
+FROM messages m;
+
+-- v1.13.10: per-tool token cost rolling window. Derives from
+-- messages_with_parts (the v1.13.1-B view that prefers message_parts over
+-- the legacy JSON column) so this works whether the chat predates v1.13.0
+-- or postdates v1.13.2 (column drop). No new write site — all source data
+-- already lands via the existing tool-phase.ts:94-95 UPDATE.
+--
+-- Attribution model: equal split. A turn emitting N tool calls divides its
+-- prompt/completion tokens by N before attribution. See v1.13.10 dispatch
+-- brief for rationale + rejected alternatives.
+--
+-- Column mapping: messages.ctx_used = prompt (input), messages.tokens_used
+-- = completion (output). Non-obvious naming; pinned via canonical writes at
+-- tool-phase.ts:94-95 et al.
+--
+-- Filtering rationale:
+--   status='complete'                — exclude failed/cancelled (defense in
+--                                      depth; failed-path doesn't write
+--                                      tokens_used so they're filtered
+--                                      indirectly too).
+--   metadata->>'kind' exclusions     — exclude cap_hit / doom_loop sentinels
+--                                      (defense in depth; sentinels are
+--                                      role='system' with tool_calls=NULL
+--                                      so they're filtered indirectly too).
+--   experimental_repairToolCall      — no special handling; retries flow
+--                                      as normal next-turn tool_result
+--                                      errors and count naturally.
+--
+-- Rolling window: last 100 calls per tool_name, ordered by created_at DESC.
+-- Aggregate-on-read is microseconds at BooCode scale (single user, ~30
+-- tools, < 100 calls each). DROP VIEW + recreate to change window size.
+CREATE OR REPLACE VIEW tool_cost_stats AS
+WITH per_call AS (
+  SELECT
+    (tc->>'name')::text AS tool_name,
+    (m.ctx_used::float / NULLIF(jsonb_array_length(m.tool_calls), 0)) AS prompt_tokens,
+    (m.tokens_used::float / NULLIF(jsonb_array_length(m.tool_calls), 0)) AS completion_tokens,
+    m.created_at,
+    ROW_NUMBER() OVER (
+      PARTITION BY (tc->>'name')::text
+      ORDER BY m.created_at DESC
+    ) AS rn
+  FROM messages_with_parts m,
+    LATERAL jsonb_array_elements(m.tool_calls) AS tc
+  WHERE m.tool_calls IS NOT NULL
+    AND jsonb_array_length(m.tool_calls) > 0
+    AND m.tokens_used IS NOT NULL
+    AND m.ctx_used IS NOT NULL
+    AND m.status = 'complete'
+    AND (m.metadata IS NULL
+         OR m.metadata->>'kind' IS NULL
+         OR m.metadata->>'kind' NOT IN ('cap_hit', 'doom_loop'))
+)
+SELECT
+  tool_name,
+  ROUND(SUM(prompt_tokens))::int AS prompt_tokens_sum,
+  ROUND(SUM(completion_tokens))::int AS completion_tokens_sum,
+  COUNT(*)::int AS n_calls,
+  MAX(created_at) AS updated_at
+FROM per_call
+WHERE rn <= 100
+GROUP BY tool_name;
+
 ALTER TABLE messages ADD COLUMN IF NOT EXISTS tokens_used INTEGER;
 ALTER TABLE messages ADD COLUMN IF NOT EXISTS ctx_used INTEGER;
 ALTER TABLE messages ADD COLUMN IF NOT EXISTS ctx_max INTEGER;
@@ -47,22 +196,14 @@ CREATE TABLE IF NOT EXISTS settings (

 INSERT INTO settings (key, value) VALUES ('default_model', '"qwen3.6-35b-a3b-mxfp4"') ON CONFLICT (key) DO NOTHING;

-- DEPRECATED: client-side pane state as of v1.2-batch4. Table retained per
-- additive schema rule; no writes. Drop in a future destructive migration.
-CREATE TABLE IF NOT EXISTS session_panes (
-  id           UUID PRIMARY KEY DEFAULT gen_random_uuid(),
-  session_id   UUID NOT NULL REFERENCES sessions(id) ON DELETE CASCADE,
-  position     INTEGER NOT NULL,
-  kind         TEXT NOT NULL CHECK (kind IN ('chat', 'file_browser', 'terminal')),
-  state        JSONB NOT NULL DEFAULT '{}',
-  created_at   TIMESTAMPTZ NOT NULL DEFAULT clock_timestamp(),
-  UNIQUE (session_id, position)
-);
-CREATE INDEX IF NOT EXISTS idx_session_panes_session ON session_panes (session_id);
+-- v1.12.1: deprecated session_panes table removed. Workspace pane state now
+-- lives in sessions.workspace_panes (jsonb), see below.
+DROP TABLE IF EXISTS session_panes;

-- v1.4: backfill removed. Pane layout is client-side (localStorage) since v1.2-batch4.
-- The CREATE TABLE above is retained for additive-schema discipline; drop is a
-- future destructive migration.
+-- v1.12.1: server-side workspace pane layout, replaces localStorage so every
+-- device sees the same panes for a given session. Shape matches
+-- WorkspacePane[] from apps/server/src/types/api.ts.
+ALTER TABLE sessions ADD COLUMN IF NOT EXISTS workspace_panes JSONB NOT NULL DEFAULT '[]'::jsonb;

 -- v1.2: sessions.status (open | archived)
 ALTER TABLE sessions ADD COLUMN IF NOT EXISTS status TEXT NOT NULL DEFAULT 'open';
@@ -128,6 +269,19 @@ BEGIN
  END IF;
 END $$;

+-- v1.12.1: drop stale inline CHECK constraints that were superseded by the
+-- named *_chk variants above. messages_status_check missed 'cancelled' and
+-- messages_role_check missed 'system' — both narrower than what's in use.
+DO $$
+BEGIN
+  IF EXISTS (SELECT 1 FROM pg_constraint WHERE conname = 'messages_status_check') THEN
+    ALTER TABLE messages DROP CONSTRAINT messages_status_check;
+  END IF;
+  IF EXISTS (SELECT 1 FROM pg_constraint WHERE conname = 'messages_role_check') THEN
+    ALTER TABLE messages DROP CONSTRAINT messages_role_check;
+  END IF;
+END $$;
+
 -- v1.2-project-ux: projects.status + projects.gitea_remote
 -- KEEP IN SYNC: apps/server/src/types/api.ts PROJECT_STATUSES
 ALTER TABLE projects ADD COLUMN IF NOT EXISTS status TEXT NOT NULL DEFAULT 'open';
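The difference between the CASE-on-EXISTS resolution in messages_with_parts and a naive COALESCE can be shown with a small TypeScript model. This is a sketch for intuition only: `PartRow`, `visibleToolCalls`, `resolveToolCalls`, and `naiveCoalesce` are hypothetical names, and `null` stands in for SQL NULL (jsonb_agg over zero rows yields NULL).

```typescript
// Hypothetical in-memory model of a message_parts row.
interface PartRow {
  kind: string;
  sequence: number;
  payload: unknown;
  hidden_at: string | null; // non-null means the part was pruned
}

// Visible tool_call payloads in sequence order, or null when none remain.
function visibleToolCalls(parts: PartRow[]): unknown[] | null {
  const visible = parts
    .filter((p) => p.kind === 'tool_call' && p.hidden_at === null)
    .sort((a, b) => a.sequence - b.sequence)
    .map((p) => p.payload);
  return visible.length > 0 ? visible : null;
}

// CASE on EXISTS: fall back to the legacy column only when the message has
// no tool_call parts at all, hidden or not.
function resolveToolCalls(parts: PartRow[], legacy: unknown[] | null): unknown[] | null {
  const hasAnyPart = parts.some((p) => p.kind === 'tool_call');
  return hasAnyPart ? visibleToolCalls(parts) : legacy;
}

// Naive COALESCE: falls back whenever nothing visible remains, which would
// resurrect tool calls that the prune deliberately hid.
function naiveCoalesce(parts: PartRow[], legacy: unknown[] | null): unknown[] | null {
  return visibleToolCalls(parts) ?? legacy;
}
```

With every part hidden, `resolveToolCalls` returns null while `naiveCoalesce` returns the legacy array. With no parts at all, both return the legacy array.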
--- a/apps/server/src/services/tests/compaction.test.ts
+++ b/apps/server/src/services/tests/compaction.test.ts
@@ -6,6 +6,7 @@ import {
  turns,
  select,
  buildPrompt,
+  buildHeadPayload,
  type CompactionMessage,
 } from '../compaction.js';
 import { SUMMARY_TEMPLATE } from '../compaction-prompt.js';
@@ -31,6 +32,7 @@ function mkMsg(
    status: 'complete',
    tool_calls: null,
    tool_results: null,
+    reasoning_parts: null,
    metadata: null,
    created_at: new Date(counter * 1000).toISOString(),
    ...overrides,
@@ -39,49 +41,58 @@ function mkMsg(

 // ---- usable -----------------------------------------------------------------

-describe('usable', () => {
-  it('returns 0 when contextLimit is 0', () => {
+// v1.13.9: ratio-only early trigger at 0.85 × contextLimit. Replaces the
+// v1.11.0-era `contextLimit - 20_000` math, which degenerated to 0 for
+// contexts ≤20k and gave only 7-8% headroom at 262k.
+describe('usable() — ratio-only early trigger (v1.13.9)', () => {
+  it('returns floor(0.85 * limit) for the qwen3.6 daily-driver context', () => {
+    // floor(0.85 * 262144) = floor(222822.4) = 222822 — 15% headroom for
+    // the summarizer to do its turn without itself overflowing.
+    expect(usable(262144)).toBe(222822);
+  });
+
+  it('returns 0.85× for a mid-sized context', () => {
+    expect(usable(100_000)).toBe(85_000);
+  });
+
+  it('returns 0.85× for a small context (no degenerate 0)', () => {
+    // floor(0.85 * 8192) = 6963. Under the old formula this returned 0
+    // (8192 - 20_000 clamped to 0), effectively disabling compaction for
+    // small-context models. The ratio keeps the trigger active.
+    expect(usable(8192)).toBe(6963);
+  });
+
+  it('returns 0 for zero or negative contextLimit', () => {
    expect(usable(0)).toBe(0);
-  });
-
-  it('returns 0 when contextLimit is below the 20k buffer', () => {
-    // Math.max(0, x - 20000) clamps the subtraction so we never report
-    // negative headroom. A 10k-context model reports 0 usable, which makes
-    // isOverflow short-circuit to false (correct — we can't size the
-    // compaction with no headroom).
-    expect(usable(10_000)).toBe(0);
-    expect(usable(19_999)).toBe(0);
-    expect(usable(20_000)).toBe(0);
-  });
-
-  it('subtracts the 20k buffer from a normal-sized context window', () => {
-    expect(usable(100_000)).toBe(80_000);
-    expect(usable(32_768)).toBe(12_768);
+    expect(usable(-1)).toBe(0);
  });
 });

 // ---- isOverflow -------------------------------------------------------------

 describe('isOverflow', () => {
-  it('returns false when usable is 0 (unknown / sub-buffer context)', () => {
+  it('returns false when usable is 0 (unknown contextLimit)', () => {
    expect(isOverflow({ prompt_tokens: 999_999, completion_tokens: 0 }, 0)).toBe(false);
-    expect(isOverflow({ prompt_tokens: 0, completion_tokens: 999_999 }, 10_000)).toBe(false);
+    expect(isOverflow({ prompt_tokens: 0, completion_tokens: 999_999 }, -1)).toBe(false);
  });

  it('returns false at 50% of usable', () => {
-    // usable(100k) = 80k → 50% = 40k.
+    // v1.13.9: usable(100k) = 85k → 50% = 42.5k; the 40k total sits below it.
    expect(isOverflow({ prompt_tokens: 30_000, completion_tokens: 10_000 }, 100_000)).toBe(false);
  });

  it('returns false just under usable', () => {
-    expect(isOverflow({ prompt_tokens: 79_000, completion_tokens: 999 }, 100_000)).toBe(false);
+    // v1.13.9: 84_000 + 999 = 84_999 < 85_000 budget.
+    expect(isOverflow({ prompt_tokens: 84_000, completion_tokens: 999 }, 100_000)).toBe(false);
  });

  it('returns true exactly at usable (>=, not strict >)', () => {
-    expect(isOverflow({ prompt_tokens: 80_000, completion_tokens: 0 }, 100_000)).toBe(true);
+    // v1.13.9: 85_000 == usable(100_000).
+    expect(isOverflow({ prompt_tokens: 85_000, completion_tokens: 0 }, 100_000)).toBe(true);
  });

  it('returns true above usable', () => {
+    // 50_000 + 40_000 = 90_000 > 85_000.
    expect(isOverflow({ prompt_tokens: 50_000, completion_tokens: 40_000 }, 100_000)).toBe(true);
  });
 });
@@ -224,8 +235,9 @@ describe('select', () => {
    const u = mkMsg('user', 'oversized');
    const a = mkMsg('assistant', 'Y'.repeat(40_000));
    const result = select([u, a], 30_000, 1);
-    // usable(30k) = 10k → budget = min(8k, max(2k, floor(10k*0.25))) =
-    // min(8k, max(2k, 2500)) = 2500. 40k chars ≈ 10k tokens. Can't fit.
+    // v1.13.9: usable(30k) = floor(0.85*30k) = 25500 → budget =
+    // min(8k, max(2k, floor(25500*0.25))) = min(8k, max(2k, 6375)) = 6375.
+    // 40k chars ≈ 10k tokens. Still can't fit (10k > 6375).
    expect(result.tail_start_id).toBeUndefined();
    expect(result.head).toEqual([u, a]);
  });
@@ -256,3 +268,56 @@ describe('buildPrompt', () => {
    expect(out.endsWith('extra-context-line')).toBe(true);
  });
 });
+
+// ---- buildHeadPayload (v1.13.6) -----------------------------------------------
+
+describe('buildHeadPayload reasoning render', () => {
+  it('emits reasoning as a <reasoning> tag prefixed onto the assistant content', () => {
+    const out = buildHeadPayload([
+      mkMsg('user', 'show me the file'),
+      mkMsg('assistant', 'reading it now', {
+        reasoning_parts: [{ text: 'user wants src/index.ts; I should view it' }],
+      }),
+    ]);
+    expect(out).toHaveLength(2);
+    expect(out[1]!.role).toBe('assistant');
+    expect(out[1]!.content).toBe(
+      '<reasoning>user wants src/index.ts; I should view it</reasoning>\n\nreading it now',
+    );
+  });
+
+  it('emits a standalone <reasoning> tag when reasoning is present but content is empty (tool-call-only turn)', () => {
+    const out = buildHeadPayload([
+      mkMsg('assistant', '', {
+        reasoning_parts: [{ text: 'jumping straight to grep' }],
+        tool_calls: [{ id: 'c1', name: 'grep', args: { pattern: 'foo' } }],
+      }),
+    ]);
+    expect(out).toHaveLength(1);
+    expect(out[0]!.content).toBe('<reasoning>jumping straight to grep</reasoning>');
+    expect(out[0]!.tool_calls).toHaveLength(1);
+    expect(out[0]!.tool_calls![0]!.function.name).toBe('grep');
+  });
+
+  it('joins multiple reasoning parts without separators (matches the streaming concat)', () => {
+    const out = buildHeadPayload([
+      mkMsg('assistant', 'final answer', {
+        reasoning_parts: [{ text: 'first thought ' }, { text: 'second thought' }],
+      }),
+    ]);
+    expect(out[0]!.content).toBe(
+      '<reasoning>first thought second thought</reasoning>\n\nfinal answer',
+    );
+  });
+
+  it('omits the reasoning tag entirely when reasoning_parts is null or empty', () => {
+    const out = buildHeadPayload([
+      mkMsg('assistant', 'plain answer', { reasoning_parts: null }),
+      mkMsg('assistant', 'other answer', { reasoning_parts: [] }),
+    ]);
+    expect(out[0]!.content).toBe('plain answer');
+    expect(out[1]!.content).toBe('other answer');
+    expect(out[0]!.content).not.toContain('<reasoning>');
+    expect(out[1]!.content).not.toContain('<reasoning>');
+  });
+});
--- a/apps/server/src/services/tests/doom-loop.test.ts
+++ b/apps/server/src/services/tests/doom-loop.test.ts
@@ -1,5 +1,5 @@
 import { describe, it, expect } from 'vitest';
-import { DOOM_LOOP_THRESHOLD, detectDoomLoop } from '../inference.js';
+import { DOOM_LOOP_THRESHOLD, detectDoomLoop } from '../inference/index.js';
 import type { ToolCall } from '../../types/api.js';

 // ---- fixture ----------------------------------------------------------------
--- a/apps/server/src/services/tests/inference.test.ts
+++ b/apps/server/src/services/tests/inference.test.ts
@@ -1,5 +1,5 @@
 import { describe, it, expect } from 'vitest';
-import { buildMessagesPayload } from '../inference.js';
+import { buildMessagesPayload } from '../inference/index.js';
 import type {
  Message,
  MessageRole,
--- a/apps/server/src/services/tests/parts.test.ts
+++ b/apps/server/src/services/tests/parts.test.ts
@@ -0,0 +1,121 @@
+import { describe, it, expect } from 'vitest';
+import { partsFromAssistantMessage, partsFromToolMessage } from '../inference/parts.js';
+import type { ToolCall, ToolResult } from '../../types/api.js';
+
+describe('partsFromAssistantMessage', () => {
+  it('emits one text part for content-only assistant', () => {
+    const parts = partsFromAssistantMessage({ content: 'hello world', tool_calls: null });
+    expect(parts).toHaveLength(1);
+    expect(parts[0]).toEqual({
+      sequence: 0,
+      kind: 'text',
+      payload: { text: 'hello world' },
+    });
+  });
+
+  it('emits one tool_call part for empty-content + single tool_call', () => {
+    const tc: ToolCall = { id: 'call_1', name: 'view_file', args: { path: 'src/a.ts' } };
+    const parts = partsFromAssistantMessage({ content: '', tool_calls: [tc] });
+    expect(parts).toHaveLength(1);
+    expect(parts[0]).toEqual({
+      sequence: 0,
+      kind: 'tool_call',
+      payload: { id: 'call_1', name: 'view_file', args: { path: 'src/a.ts' } },
+    });
+  });
+
+  it('emits text then tool_call parts in order when both present', () => {
+    const tc: ToolCall = { id: 'call_2', name: 'grep', args: { pattern: 'foo' } };
+    const parts = partsFromAssistantMessage({ content: 'let me search', tool_calls: [tc] });
+    expect(parts.map((p) => [p.sequence, p.kind])).toEqual([
+      [0, 'text'],
+      [1, 'tool_call'],
+    ]);
+  });
+
+  it('preserves tool_call order with multiple calls', () => {
+    const calls: ToolCall[] = [
+      { id: 'a', name: 'list_dir', args: { path: '.' } },
+      { id: 'b', name: 'view_file', args: { path: 'x.ts' } },
+      { id: 'c', name: 'grep', args: { pattern: 'y' } },
+    ];
+    const parts = partsFromAssistantMessage({ content: '', tool_calls: calls });
+    expect(parts).toHaveLength(3);
+    expect(parts.map((p) => p.payload)).toEqual([
+      { id: 'a', name: 'list_dir', args: { path: '.' } },
+      { id: 'b', name: 'view_file', args: { path: 'x.ts' } },
+      { id: 'c', name: 'grep', args: { pattern: 'y' } },
+    ]);
+    expect(parts.map((p) => p.sequence)).toEqual([0, 1, 2]);
+  });
+
+  it('returns empty array for empty content + null tool_calls', () => {
+    expect(partsFromAssistantMessage({ content: '', tool_calls: null })).toEqual([]);
+  });
+
+  it('v1.13.1-C: reasoning lands at sequence 0 before text + tool_calls', () => {
+    const tc: ToolCall = { id: 'call_r', name: 'view_file', args: { path: 'x.ts' } };
+    const parts = partsFromAssistantMessage({
+      content: 'inspecting now',
+      tool_calls: [tc],
+      reasoning: 'user asked about x.ts; I should view it',
+    });
+    expect(parts.map((p) => [p.sequence, p.kind])).toEqual([
+      [0, 'reasoning'],
+      [1, 'text'],
+      [2, 'tool_call'],
+    ]);
+    expect(parts[0]!.payload).toEqual({
+      text: 'user asked about x.ts; I should view it',
+    });
+  });
+
+  it('v1.13.1-C: reasoning + empty content + tool_calls preserves seq 0 reasoning', () => {
+    const tc: ToolCall = { id: 'call_r2', name: 'grep', args: { pattern: 'foo' } };
+    const parts = partsFromAssistantMessage({
+      content: '',
+      tool_calls: [tc],
+      reasoning: 'jumping straight to grep',
+    });
+    expect(parts.map((p) => [p.sequence, p.kind])).toEqual([
+      [0, 'reasoning'],
+      [1, 'tool_call'],
+    ]);
+  });
+});
+
+describe('partsFromToolMessage', () => {
+  it('emits a single tool_result part at sequence 0', () => {
+    const tr: ToolResult = {
+      tool_call_id: 'call_1',
+      output: { contents: 'console.log(1)' },
+      truncated: false,
+    };
+    const parts = partsFromToolMessage({ tool_results: tr });
+    expect(parts).toHaveLength(1);
+    expect(parts[0]).toEqual({
+      sequence: 0,
+      kind: 'tool_result',
+      payload: {
+        tool_call_id: 'call_1',
+        output: { contents: 'console.log(1)' },
+        truncated: false,
+      },
+    });
+  });
+
+  it('includes error in payload when present', () => {
+    const tr: ToolResult = {
+      tool_call_id: 'call_2',
+      output: null,
+      truncated: false,
+      error: 'permission denied',
+    };
+    const parts = partsFromToolMessage({ tool_results: tr });
+    expect(parts[0]!.payload).toMatchObject({ error: 'permission denied' });
+  });
+
+  it('returns empty array when tool_results is null', () => {
+    expect(partsFromToolMessage({ tool_results: null })).toEqual([]);
+  });
+});
--- a/apps/server/src/services/tests/prune.test.ts
+++ b/apps/server/src/services/tests/prune.test.ts
@@ -0,0 +1,96 @@
+import { describe, it, expect, beforeEach } from 'vitest';
+import {
+  selectPruneTargets,
+  PROTECTED_TOKENS,
+  PRUNE_TRIGGER_TOKENS,
+  type PartForPrune,
+} from '../inference/prune.js';
+
+// Test fixture: build a tool_result part whose payload size yields a known
+// token estimate (chars/4). The decision logic only cares about
+// JSON.stringify(payload).length, so a payload whose stringified form is
+// `4n` chars produces exactly `n` tokens.
+let seq = 0;
+function part(tokens: number, createdAt: Date): PartForPrune {
+  seq += 1;
+  // JSON.stringify("xxx...") wraps the string in quotes (adds 2 chars), so
+  // subtract 2 after multiplying: with len = 4*tokens - 2 the stringified
+  // length is exactly 4*tokens, and Math.ceil((len+2)/4) yields `tokens`.
+  // The tests still assert on totals and thresholds rather than exact
+  // per-part counts, so they tolerate small changes to the estimator.
+  const text = 'x'.repeat(tokens * 4 - 2);
+  return { id: `p${seq}`, payload: text, created_at: createdAt };
+}
+
+const T_NOW = new Date('2026-05-22T12:00:00Z');
+function ago(secondsBack: number): Date {
+  return new Date(T_NOW.getTime() - secondsBack * 1000);
+}
+
+describe('selectPruneTargets', () => {
+  beforeEach(() => {
+    seq = 0;
+  });
+
+  it('returns nothing when there are no parts', () => {
+    expect(selectPruneTargets([], null)).toEqual({ ids: [], freedTokens: 0 });
+  });
+
+  it('returns nothing when total tokens are under the protection window', () => {
+    const parts: PartForPrune[] = [
+      part(10_000, ago(10)),
+      part(10_000, ago(20)),
+    ]; // 20k total, all protected
+    expect(selectPruneTargets(parts, null)).toEqual({ ids: [], freedTokens: 0 });
+  });
+
+  it('returns nothing when candidate total is below the prune trigger', () => {
+    // Protection fills with ~40k newest, candidates only ~5k. Below 20k trigger.
+    const parts: PartForPrune[] = [
+      part(20_000, ago(10)),
+      part(20_000, ago(20)),
+      // Past protection; total ~5k won't trigger.
+      part(5_000, ago(30)),
+    ];
+    const result = selectPruneTargets(parts, null);
+    expect(result.ids).toEqual([]);
+    expect(result.freedTokens).toBe(0);
+  });
+
+  it('hides candidates past protection when their total clears the trigger', () => {
+    // Newest 40k protected; older 30k cleanly above the 20k trigger.
+    const parts: PartForPrune[] = [
+      part(20_000, ago(10)),
+      part(20_000, ago(20)),
+      // Past protection, total ~30k freed.
+      part(15_000, ago(30)),
+      part(15_000, ago(40)),
+    ];
+    const result = selectPruneTargets(parts, null);
+    expect(result.ids).toEqual(['p3', 'p4']);
+    expect(result.freedTokens).toBeGreaterThanOrEqual(PRUNE_TRIGGER_TOKENS);
+  });
+
+  it('stops at the compaction summary boundary', () => {
+    // Newest 30k protected (just under PROTECTED_TOKENS=40k); then 30k of
+    // older parts. Boundary sits at ago(35), so the ago(40) part is
+    // beyond it and gets skipped.
+    const parts: PartForPrune[] = [
+      part(15_000, ago(10)),
+      part(15_000, ago(20)),
+      part(15_000, ago(30)), // crosses protection threshold; candidate
+      part(15_000, ago(40)), // beyond summary boundary; skipped
+    ];
+    const tailStart = ago(35);
+    const result = selectPruneTargets(parts, tailStart);
+    // ago(30) is the only candidate inside the window; 15k is below the
+    // 20k trigger so we expect no hides.
+    expect(result.ids).toEqual([]);
+  });
+
+  it('does not prune when only protected parts exist (no candidates)', () => {
+    // Exactly PROTECTED_TOKENS of newest parts; no older candidates.
+    const parts: PartForPrune[] = [part(PROTECTED_TOKENS, ago(10))];
+    expect(selectPruneTargets(parts, null)).toEqual({ ids: [], freedTokens: 0 });
+  });
+});
--- a/apps/server/src/services/tests/system-prompt.test.ts
+++ b/apps/server/src/services/tests/system-prompt.test.ts
@@ -6,7 +6,9 @@ import {
  loadContainerGuidance,
  getContainerGuidance,
  buildSystemPrompt,
+  buildSystemPromptWithFingerprint,
  _resetContainerGuidanceCacheForTests,
+  _resetPrefixObserverForTests,
 } from '../system-prompt.js';
 import type { Agent, Project, Session } from '../../types/api.js';

@@ -17,12 +19,14 @@ let tmpDir: string;
 beforeEach(async () => {
  tmpDir = await mkdtemp(join(tmpdir(), 'system-prompt-test-'));
  _resetContainerGuidanceCacheForTests();
+  _resetPrefixObserverForTests();
  delete process.env['CONTAINER_GUIDANCE_FILE'];
 });

 afterEach(async () => {
  delete process.env['CONTAINER_GUIDANCE_FILE'];
  _resetContainerGuidanceCacheForTests();
+  _resetPrefixObserverForTests();
  await rm(tmpDir, { recursive: true, force: true });
 });

@@ -176,3 +180,75 @@ describe('buildSystemPrompt', () => {
    expect(prompt).not.toContain('--- end container guidance ---');
  });
 });
+
+// v1.13.8: byte-stability instrumentation surface.
+describe('buildSystemPromptWithFingerprint (v1.13.8)', () => {
+  it('returns byte-identical prompts for two consecutive calls with the same inputs', async () => {
+    const path = join(tmpDir, 'BOOCHAT.md');
+    await writeFile(path, 'stable guidance', 'utf8');
+    process.env['CONTAINER_GUIDANCE_FILE'] = path;
+
+    const session = makeSession();
+    const project = makeProject({ path: '/tmp/stable-proj' });
+    const agent = makeAgent({ system_prompt: 'be terse' });
+
+    const first = await buildSystemPromptWithFingerprint(project, session, agent);
+    const second = await buildSystemPromptWithFingerprint(project, session, agent);
+
+    expect(first.prompt).toBe(second.prompt);
+    expect(first.fingerprint.prefix_hash).toBe(second.fingerprint.prefix_hash);
+    expect(first.fingerprint.prefix_length).toBe(second.fingerprint.prefix_length);
+  });
+
+  it('emits drift=null on the first call for a fresh session, then null again when nothing changes', async () => {
+    process.env['CONTAINER_GUIDANCE_FILE'] = join(tmpDir, 'absent.md');
+    const session = makeSession();
+    const project = makeProject({ path: '/tmp/stable-proj' });
+
+    const first = await buildSystemPromptWithFingerprint(project, session, null);
+    expect(first.drift).toBeNull();
+
+    const second = await buildSystemPromptWithFingerprint(project, session, null);
+    expect(second.drift).toBeNull();
+    expect(second.fingerprint.prefix_hash).toBe(first.fingerprint.prefix_hash);
+  });
+
+  it('emits drift with prev/new hashes and a changed_inputs entry when an input mutates', async () => {
+    // Two BOOCHAT.md contents with different mtimes → guidance cache picks
+    // up the change → fingerprint hash flips → drift fires.
+    const path = join(tmpDir, 'BOOCHAT.md');
+    await writeFile(path, 'first', 'utf8');
+    process.env['CONTAINER_GUIDANCE_FILE'] = path;
+
+    const session = makeSession();
+    const project = makeProject({ path: '/tmp/stable-proj' });
+
+    const first = await buildSystemPromptWithFingerprint(project, session, null);
+    expect(first.drift).toBeNull();
+
+    await writeFile(path, 'second — different content', 'utf8');
+    const later = new Date(Date.now() + 60_000);
+    await utimes(path, later, later);
+
+    const second = await buildSystemPromptWithFingerprint(project, session, null);
+    expect(second.drift).not.toBeNull();
+    expect(second.drift!.prev_hash).toBe(first.fingerprint.prefix_hash);
+    expect(second.drift!.new_hash).toBe(second.fingerprint.prefix_hash);
+    expect(second.drift!.prev_hash).not.toBe(second.drift!.new_hash);
+    expect(second.drift!.changed_inputs).toContain('mtime_boochat');
+  });
+
+  it('does not fire drift across distinct sessions even if their hashes differ', async () => {
+    process.env['CONTAINER_GUIDANCE_FILE'] = join(tmpDir, 'absent.md');
+    const sessionA = makeSession({ id: 'sess-A' });
+    const sessionB = makeSession({ id: 'sess-B', system_prompt: 'B-only override' });
+    const project = makeProject({ path: '/tmp/stable-proj' });
+
+    const a = await buildSystemPromptWithFingerprint(project, sessionA, null);
+    const b = await buildSystemPromptWithFingerprint(project, sessionB, null);
+
+    expect(a.drift).toBeNull();
+    expect(b.drift).toBeNull();
+    expect(a.fingerprint.prefix_hash).not.toBe(b.fingerprint.prefix_hash);
+  });
+});
--- a/apps/server/src/services/tests/tool_cost_stats.test.ts
+++ b/apps/server/src/services/tests/tool_cost_stats.test.ts
@@ -0,0 +1,228 @@
+import { describe, it, expect, beforeAll, afterAll } from 'vitest';
+import postgres from 'postgres';
+import { readFileSync } from 'node:fs';
+import { resolve } from 'node:path';
+import { fileURLToPath } from 'node:url';
+
+// v1.13.10: integration tests for the tool_cost_stats view. Skipped unless
+// DATABASE_URL is set so they don't break `pnpm test` on a fresh checkout.
+// Run with:
+//   DATABASE_URL=postgres://boocode:<pw>@localhost:5500/boocode pnpm -C apps/server test
+//
+// Isolation: each test uses a unique tool_name built from a per-run prefix
+// (TEST_RUN_ID) plus a per-test suffix. The view aggregates globally across
+// all chats, so without unique tool names parallel test runs would interfere.
+// Cleanup deletes the fixture project in afterAll; FK CASCADE removes the rest.
+
+const DB_URL = process.env.DATABASE_URL;
+const describeFn = DB_URL ? describe : describe.skip;
+
+const TEST_RUN_ID = `v13_10_${Date.now()}`;
+const tname = (suffix: string) => `${TEST_RUN_ID}_${suffix}`;
+
+describeFn('tool_cost_stats view (v1.13.10)', () => {
+  let sql: ReturnType<typeof postgres>;
+  let projectId: string;
+  let sessionId: string;
+  let chatId: string;
+
+  beforeAll(async () => {
+    if (!DB_URL) return;
+    sql = postgres(DB_URL, { max: 2, idle_timeout: 5, connect_timeout: 5, onnotice: () => {} });
+
+    // Apply the schema before fixtures so the view exists. Idempotent via
+    // CREATE OR REPLACE VIEW + CREATE TABLE IF NOT EXISTS; safe to run on a
+    // pre-populated DB. Mirrors apps/server/src/db.ts:applySchema.
+    const here = fileURLToPath(import.meta.url);
+    const schemaPath = resolve(here, '../../../schema.sql');
+    const ddl = readFileSync(schemaPath, 'utf8');
+    await sql.unsafe(ddl);
+
+    // Fixture project + session + chat for all inserts in this file.
+    const proj = await sql<{ id: string }[]>`
+      INSERT INTO projects (name, path)
+      VALUES (${`tool_cost_stats_test_${TEST_RUN_ID}`}, ${`/tmp/${TEST_RUN_ID}`})
+      RETURNING id
+    `;
+    projectId = proj[0]!.id;
+    const sess = await sql<{ id: string }[]>`
+      INSERT INTO sessions (project_id, name, model)
+      VALUES (${projectId}, ${'test'}, ${'test-model'})
+      RETURNING id
+    `;
+    sessionId = sess[0]!.id;
+    const chat = await sql<{ id: string }[]>`
+      INSERT INTO chats (session_id, name) VALUES (${sessionId}, ${'test'}) RETURNING id
+    `;
+    chatId = chat[0]!.id;
+  });
+
+  afterAll(async () => {
+    if (!DB_URL) return;
+    // Project FK CASCADE cleans sessions/chats/messages/parts in one shot.
+    await sql`DELETE FROM projects WHERE id = ${projectId}`;
+    await sql.end({ timeout: 5 });
+  });
+
+  async function insertAssistantTurn(opts: {
+    toolNames: string[];
+    tokensUsed: number | null;
+    ctxUsed: number | null;
+    status?: 'streaming' | 'complete' | 'failed' | 'cancelled';
+    metadata?: { kind: string } | null;
+    createdAt?: Date;
+  }): Promise<string> {
+    const toolCalls = opts.toolNames.map((name, i) => ({
+      id: `call_${TEST_RUN_ID}_${name}_${i}`,
+      name,
+      args: {},
+    }));
+    const created = opts.createdAt ?? new Date();
+    const rows = await sql<{ id: string }[]>`
+      INSERT INTO messages (
+        session_id, chat_id, role, content, kind, status,
+        tool_calls, tokens_used, ctx_used,
+        metadata, created_at
+      )
+      VALUES (
+        ${sessionId}, ${chatId}, 'assistant', '', 'message',
+        ${opts.status ?? 'complete'},
+        ${sql.json(toolCalls as never)},
+        ${opts.tokensUsed},
+        ${opts.ctxUsed},
+        ${opts.metadata ? sql.json(opts.metadata as never) : null},
+        ${created}
+      )
+      RETURNING id
+    `;
+    return rows[0]!.id;
+  }
+
+  it('returns empty when no tool calls exist for a tool name', async () => {
+    const t = tname('absent');
+    const stats = await sql<{ tool_name: string }[]>`
+      SELECT * FROM tool_cost_stats WHERE tool_name = ${t}
+    `;
+    expect(stats).toEqual([]);
+  });
+
+  it('attributes single-tool turn fully to that tool', async () => {
+    const t = tname('single');
+    await insertAssistantTurn({ toolNames: [t], tokensUsed: 300, ctxUsed: 15000 });
+    const stats = await sql<{
+      tool_name: string;
+      prompt_tokens_sum: number;
+      completion_tokens_sum: number;
+      n_calls: number;
+    }[]>`SELECT * FROM tool_cost_stats WHERE tool_name = ${t}`;
+    expect(stats[0]).toMatchObject({
+      tool_name: t,
+      prompt_tokens_sum: 15000,
+      completion_tokens_sum: 300,
+      n_calls: 1,
+    });
+  });
+
+  it('splits multi-tool turn equally across tools', async () => {
+    const a = tname('multi_a');
+    const b = tname('multi_b');
+    const c = tname('multi_c');
+    // 3 tools, 300 completion / 15000 prompt → each gets 100 / 5000
+    await insertAssistantTurn({ toolNames: [a, b, c], tokensUsed: 300, ctxUsed: 15000 });
+    const stats = await sql<{
+      tool_name: string;
+      prompt_tokens_sum: number;
+      completion_tokens_sum: number;
+      n_calls: number;
+    }[]>`
+      SELECT * FROM tool_cost_stats
+      WHERE tool_name IN (${a}, ${b}, ${c})
+      ORDER BY tool_name
+    `;
+    expect(stats).toHaveLength(3);
+    for (const s of stats) {
+      expect(s.completion_tokens_sum).toBe(100);
+      expect(s.prompt_tokens_sum).toBe(5000);
+      expect(s.n_calls).toBe(1);
+    }
+  });
+
+  it('limits to last 100 calls per tool (FIFO window)', async () => {
+    const t = tname('window');
+    // Insert 110 turns with monotonically-increasing created_at and tokensUsed.
+    // Expect view to keep only the most recent 100.
+    const base = Date.now() + 1_000_000; // ~17 min in the future to avoid colliding with other tests
+    for (let i = 1; i <= 110; i++) {
+      await insertAssistantTurn({
+        toolNames: [t],
+        tokensUsed: i, // 1..110
+        ctxUsed: i * 10,
+        createdAt: new Date(base + i),
+      });
+    }
+    const [stat] = await sql<{
+      n_calls: number;
+      completion_tokens_sum: number;
+    }[]>`SELECT n_calls, completion_tokens_sum FROM tool_cost_stats WHERE tool_name = ${t}`;
+    expect(stat!.n_calls).toBe(100);
+    // Last 100 are tokensUsed=11..110, sum = (11+110)*100/2 = 6050.
+    expect(stat!.completion_tokens_sum).toBe(6050);
+  });
+
+  it('excludes turns with NULL tokens_used or ctx_used (pre-v1.13.7 latent regression)', async () => {
+    const t = tname('null_tokens');
+    await insertAssistantTurn({ toolNames: [t], tokensUsed: null, ctxUsed: 1000 });
+    await insertAssistantTurn({ toolNames: [t], tokensUsed: 100, ctxUsed: null });
+    const stats = await sql`SELECT * FROM tool_cost_stats WHERE tool_name = ${t}`;
+    expect(stats).toEqual([]);
+  });
+
+  it('excludes failed/cancelled turns and cap_hit/doom_loop sentinel rows', async () => {
+    const t = tname('filtered');
+    // A: status='failed'                              — excluded
+    // B: status='cancelled'                           — excluded
+    // C: status='complete', metadata={kind:'cap_hit'} — excluded
+    // D: status='complete', metadata={kind:'doom_loop'} — excluded
+    // E: status='complete', metadata=null             — included
+    await insertAssistantTurn({ toolNames: [t], tokensUsed: 100, ctxUsed: 1000, status: 'failed' });
+    await insertAssistantTurn({ toolNames: [t], tokensUsed: 100, ctxUsed: 1000, status: 'cancelled' });
+    await insertAssistantTurn({ toolNames: [t], tokensUsed: 100, ctxUsed: 1000, metadata: { kind: 'cap_hit' } });
+    await insertAssistantTurn({ toolNames: [t], tokensUsed: 100, ctxUsed: 1000, metadata: { kind: 'doom_loop' } });
+    await insertAssistantTurn({ toolNames: [t], tokensUsed: 100, ctxUsed: 1000, metadata: null });
+    const [stat] = await sql<{ n_calls: number }[]>`
+      SELECT n_calls FROM tool_cost_stats WHERE tool_name = ${t}
+    `;
+    expect(stat!.n_calls).toBe(1);
+  });
+
+  it('reads tool_calls via messages_with_parts (parts-authoritative)', async () => {
+    const t = tname('parts');
+    // Insert an assistant row with messages.tool_calls=NULL but a
+    // message_parts row carrying the tool_call. The view reads via
+    // messages_with_parts, which COALESCEs the parts table over the legacy
+    // column — so this row should still aggregate.
+    const rows = await sql<{ id: string }[]>`
+      INSERT INTO messages (
+        session_id, chat_id, role, content, kind, status,
+        tool_calls, tokens_used, ctx_used
+      )
+      VALUES (
+        ${sessionId}, ${chatId}, 'assistant', '', 'message', 'complete',
+        NULL, 200, 5000
+      )
+      RETURNING id
+    `;
+    const messageId = rows[0]!.id;
+    await sql`
+      INSERT INTO message_parts (message_id, sequence, kind, payload)
+      VALUES (
+        ${messageId}, 0, 'tool_call',
+        ${sql.json({ id: `tc_parts_${TEST_RUN_ID}`, name: t, args: {} } as never)}
+      )
+    `;
+    const [stat] = await sql<{ n_calls: number }[]>`
+      SELECT n_calls FROM tool_cost_stats WHERE tool_name = ${t}
+    `;
+    expect(stat!.n_calls).toBe(1);
+  });
+});
--- a/apps/server/src/services/tests/tools.test.ts
+++ b/apps/server/src/services/tests/tools.test.ts
@@ -0,0 +1,76 @@
+import { describe, it, expect } from 'vitest';
+import {
+  ALL_TOOLS,
+  CORE_TOOL_NAMES,
+  STANDARD_TOOL_NAMES,
+  TOOLS_BY_NAME,
+  resolveToolTier,
+} from '../tools.js';
+
+describe('ALL_TOOLS registry', () => {
+  // v1.13.3: tools must be alpha-sorted at module load. llama.cpp's prompt
+  // cache hits on byte-identical prefixes; the tool list lives near the
+  // top of the system prompt, so any order drift invalidates every cached
+  // turn. The registry sort is the single source of truth; downstream
+  // helpers (toolJsonSchemas, TOOLS_BY_NAME, buildAiTools) inherit it.
+  it('exports tools in alphabetical order by name', () => {
+    const names = ALL_TOOLS.map((t) => t.name);
+    expect(names).toEqual([...names].sort((a, b) => a.localeCompare(b)));
+  });
+});
+
+describe('resolveToolTier (v1.13.15-tools)', () => {
+  it('returns CORE tools for tier=core', () => {
+    expect(resolveToolTier('core')).toEqual(CORE_TOOL_NAMES);
+  });
+
+  it('returns STANDARD tools for tier=standard', () => {
+    const result = resolveToolTier('standard');
+    expect(result.length).toBe(STANDARD_TOOL_NAMES.length);
+    expect(result.length).toBeGreaterThan(CORE_TOOL_NAMES.length);
+    // STANDARD is a strict superset of CORE.
+    expect(result).toEqual(expect.arrayContaining([...CORE_TOOL_NAMES]));
+  });
+
+  it('returns ALL tool names for tier=all', () => {
+    expect(resolveToolTier('all').length).toBe(ALL_TOOLS.length);
+  });
+
+  it('defaults to all when env var is undefined', () => {
+    expect(resolveToolTier(undefined).length).toBe(ALL_TOOLS.length);
+  });
+
+  it('is case-insensitive', () => {
+    expect(resolveToolTier('CORE')).toEqual(CORE_TOOL_NAMES);
+    expect(resolveToolTier('Standard').length).toBe(STANDARD_TOOL_NAMES.length);
+  });
+
+  it('falls back to all for unknown tier strings', () => {
+    expect(resolveToolTier('bogus').length).toBe(ALL_TOOLS.length);
+  });
+});
+
+describe('CORE_TOOL_NAMES + STANDARD_TOOL_NAMES validation', () => {
+  // The module-load validation in tools.ts throws if a tier references a
+  // tool that doesn't exist in TOOLS_BY_NAME. These tests double-check that
+  // invariant from the consumer side so a future tier-list edit can't smuggle
+  // in a typo without a test failure.
+  it('every CORE name exists in TOOLS_BY_NAME', () => {
+    for (const name of CORE_TOOL_NAMES) {
+      expect(TOOLS_BY_NAME[name], `CORE references unknown tool '${name}'`).toBeDefined();
+    }
+  });
+
+  it('every STANDARD name exists in TOOLS_BY_NAME', () => {
+    for (const name of STANDARD_TOOL_NAMES) {
+      expect(TOOLS_BY_NAME[name], `STANDARD references unknown tool '${name}'`).toBeDefined();
+    }
+  });
+
+  it('CORE is a subset of STANDARD', () => {
+    const standardSet = new Set<string>(STANDARD_TOOL_NAMES);
+    for (const name of CORE_TOOL_NAMES) {
+      expect(standardSet.has(name), `'${name}' is in CORE but not STANDARD`).toBe(true);
+    }
+  });
+});
--- a/apps/server/src/services/tests/truncate.test.ts
+++ b/apps/server/src/services/tests/truncate.test.ts
@@ -0,0 +1,104 @@
+// v1.13.5: truncate.ts unit coverage. The suite isolates TRUNCATION_DIR
+// under os.tmpdir() so concurrent vitest runs don't collide and the suite
+// stays self-cleaning. Only the file-system half of cleanupTruncations is
+// covered here; the orphan-reap branch needs a real Postgres and is tested
+// via the smoke flow rather than vitest.
+import { afterEach, beforeAll, describe, expect, it, vi } from 'vitest';
+import { promises as fs } from 'fs';
+import path from 'path';
+import os from 'os';
+
+// Set the env var BEFORE importing the module so its module-load constant
+// reads the test directory rather than /tmp/boocode-truncations.
+const testDir = path.join(os.tmpdir(), `boocode-truncate-test-${process.pid}-${Date.now()}`);
+process.env.BOOCODE_TRUNCATION_DIR = testDir;
+
+const mod = await import('../truncate.js');
+const { storeTruncation, readTruncation, truncateIfNeeded, MAX_TRUNCATION_BYTES } = mod;
+
+beforeAll(async () => {
+  await fs.mkdir(testDir, { recursive: true });
+});
+
+afterEach(async () => {
+  // Drop every file between tests so id-collision asserts and orphan-style
+  // counts start from zero.
+  const entries = await fs.readdir(testDir).catch(() => [] as string[]);
+  await Promise.all(entries.map((n) => fs.unlink(path.join(testDir, n)).catch(() => {})));
+});
+
+describe('storeTruncation / readTruncation roundtrip', () => {
+  it('writes and reads identical content', async () => {
+    const original = 'hello\nworld\n' + 'x'.repeat(500);
+    const id = await storeTruncation(original);
+    expect(id).toMatch(/^tr_[0-9a-v]{12}$/);
+    const got = await readTruncation(id);
+    expect(got).toBe(original);
+  });
+
+  it('readTruncation returns null for unknown ids', async () => {
+    const got = await readTruncation('tr_000000000000');
+    expect(got).toBeNull();
+  });
+
+  it('readTruncation rejects malformed ids (returns null, never escapes dir)', async () => {
+    // Path traversal attempt; readTruncation should not even try to open.
+    const got = await readTruncation('../../etc/passwd');
+    expect(got).toBeNull();
+  });
+});
+
+describe('truncateIfNeeded', () => {
+  it('returns sliced content with no outputPath when wasTruncated=false', async () => {
+    const out = await truncateIfNeeded({
+      fullContent: 'irrelevant',
+      slicedContent: 'visible',
+      wasTruncated: false,
+    });
+    expect(out).toEqual({ content: 'visible', truncated: false });
+    expect('outputPath' in out).toBe(false);
+  });
+
+  it('stashes full content and returns outputPath when wasTruncated=true', async () => {
+    const full = 'line1\nline2\nline3\nline4\n';
+    const sliced = 'line1\nline2\n[truncated]';
+    const out = await truncateIfNeeded({
+      fullContent: full,
+      slicedContent: sliced,
+      wasTruncated: true,
+    });
+    expect(out.content).toBe(sliced);
+    expect(out.truncated).toBe(true);
+    expect(out.outputPath).toMatch(/^tr_[0-9a-v]{12}$/);
+    const stashed = await readTruncation(out.outputPath!);
+    expect(stashed).toBe(full);
+  });
+
+  it('skips storage but still reports truncated when fullContent exceeds the cap', async () => {
+    // Build content larger than MAX_TRUNCATION_BYTES. Use a Buffer to size
+    // it without holding a literal that triggers the gigantic-string lint.
+    const oversized = Buffer.alloc(MAX_TRUNCATION_BYTES + 1, 'x').toString('utf8');
+    const sliced = 'preview...';
+    const out = await truncateIfNeeded({
+      fullContent: oversized,
+      slicedContent: sliced,
+      wasTruncated: true,
+    });
+    expect(out).toEqual({ content: sliced, truncated: true });
+    expect('outputPath' in out).toBe(false);
+  });
+
+  it('storage failure surfaces as truncated without outputPath', async () => {
+    // Force writeFile to throw. Spy at the fs module level since truncate.ts
+    // imports { promises as fs } and storeTruncation calls fs.writeFile.
+    const spy = vi.spyOn(fs, 'writeFile').mockRejectedValueOnce(new Error('disk full'));
+    const out = await truncateIfNeeded({
+      fullContent: 'short',
+      slicedContent: 'sliced',
+      wasTruncated: true,
+    });
+    expect(out).toEqual({ content: 'sliced', truncated: true });
+    expect('outputPath' in out).toBe(false);
+    spy.mockRestore();
+  });
+});
--- a/apps/server/src/services/tests/web_tools.test.ts
+++ b/apps/server/src/services/tests/web_tools.test.ts
@@ -295,9 +295,10 @@ describe('executeWebFetch — size + truncation', () => {
    // 1.5M U+1F600 emojis: each is length 2 in UTF-16 (surrogate pair) and
    // 4 bytes in UTF-8. body.length = 3,000,000 chars (~2.86 MiB by
    // UTF-16 count) but Buffer.byteLength = 6,000,000 bytes (>5 MiB).
-    // Pre-fix the char-count comparison let this through; the byte-count
-    // check now rejects. No Content-Length header so the pre-flight
-    // guard doesn't fire — we're testing the POST-consumption check.
+    // v1.11.10: streaming reader catches this as body_too_large (was
+    // response_too_large in the post-consumption check). No
+    // Content-Length header, so the pre-flight check passes and the
+    // streaming path is the one that rejects.
    const heavy = '😀'.repeat(1_500_000);
    const fakeFetch = vi.fn().mockResolvedValue(
      new Response(heavy, { status: 200, headers: { 'content-type': 'text/plain' } }),
@@ -308,9 +309,8 @@ describe('executeWebFetch — size + truncation', () => {
    );
    expect('error' in result).toBe(true);
    if ('error' in result) {
-      expect(result.error).toBe('response_too_large');
-      // Error reason should reference bytes, not character count.
-      expect(result.reason).toMatch(/bytes/);
+      expect(result.error).toBe('body_too_large');
+      expect(result.reason).toMatch(/exceeded/);
    }
  });

@@ -453,3 +453,138 @@ describe('executeWebFetch — redirect handling', () => {
    expect(fakeFetch.mock.calls[1]![0]).toBe('https://example.com/foo');
  });
 });
+
+// ============================================================================
+// v1.11.10: streaming body cap — abort the response stream at MAX_BYTES
+// ============================================================================
+
+// MAX_BYTES is 5 * 1024 * 1024 = 5_242_880. Repeating this here (rather
+// than importing) so a change to the cap surfaces as a test failure —
+// the limit is part of the public contract.
+const MAX_BYTES_TEST = 5 * 1024 * 1024;
+
+// Build a Response whose body is a real ReadableStream. Uses pull() (not
+// start()) so chunks are produced lazily — without backpressure, an
+// unbounded start() enqueues everything and calls controller.close()
+// before the consumer reads, which means a subsequent reader.cancel()
+// finds the stream already closed and the cancel callback never fires.
+// `cancelFlag` lets the test observe whether reader.cancel() reached the
+// underlying source mid-stream.
+function streamedResponse(
+  chunks: Uint8Array[],
+  init: { contentType?: string; contentLength?: number | null; cancelFlag?: { cancelled: boolean } } = {},
+): Response {
+  let idx = 0;
+  const stream = new ReadableStream({
+    pull(controller) {
+      if (idx >= chunks.length) {
+        controller.close();
+        return;
+      }
+      controller.enqueue(chunks[idx]!);
+      idx += 1;
+    },
+    cancel() {
+      if (init.cancelFlag) init.cancelFlag.cancelled = true;
+    },
+  });
+  const headers: Record<string, string> = {};
+  if (init.contentType) headers['content-type'] = init.contentType;
+  if (init.contentLength !== undefined && init.contentLength !== null) {
+    headers['content-length'] = String(init.contentLength);
+  }
+  return new Response(stream, { status: 200, headers });
+}
+
+describe('executeWebFetch — streaming body cap (v1.11.10)', () => {
+  it('aborts the stream when a server lies about Content-Length and emits over the cap', async () => {
+    // Honest header would have failed the pre-flight check. The lie is
+    // the point: pre-flight passes (100 < 5MB) and the streaming reader
+    // has to be the thing that catches the oversized body.
+    //
+    // Chunk count is deliberately higher than what the reader will
+    // consume (10 × 1MB available, but the reader will cancel once ~6
+    // chunks push it over 5MB). That headroom keeps the stream in
+    // 'readable' state at the moment reader.cancel() runs — otherwise
+    // a pull-then-close race could make the source close the stream
+    // before cancel reaches it, and the cancel() callback wouldn't fire.
+    const oneMB = new Uint8Array(1024 * 1024).fill(65); // 'A'
+    const tenMBInChunks = Array.from({ length: 10 }, () => oneMB);
+    const cancelFlag = { cancelled: false };
+    const fakeFetch = vi.fn().mockResolvedValue(
+      streamedResponse(tenMBInChunks, {
+        contentType: 'text/plain',
+        contentLength: 100,
+        cancelFlag,
+      }),
+    );
+    const result = await executeWebFetch(
+      { url: 'https://example.com/lying-server' },
+      fakeFetch as unknown as typeof fetch,
+    );
+    expect('error' in result).toBe(true);
+    if ('error' in result) {
+      expect(result.error).toBe('body_too_large');
+      expect(result.reason).toMatch(/exceeded/);
+    }
+    // Critical: reader.cancel() actually fired so the underlying
+    // connection / stream got released. Otherwise the abort would be
+    // notional and the server could keep streaming.
+    expect(cancelFlag.cancelled).toBe(true);
+  });
+
+  it('catches an oversized stream when Content-Length is omitted entirely', async () => {
+    // Many real servers (chunked transfer-encoding, dynamic responses)
+    // never send Content-Length. The pre-flight check has nothing to
+    // gate on; the streaming reader is the only line of defense.
+    // 10 chunks vs the ~6 the reader will consume — same headroom
+    // rationale as the lying-Content-Length test above.
+    const oneMB = new Uint8Array(1024 * 1024).fill(66); // 'B'
+    const tenMBInChunks = Array.from({ length: 10 }, () => oneMB);
+    const fakeFetch = vi.fn().mockResolvedValue(
+      streamedResponse(tenMBInChunks, { contentType: 'text/plain' }),
+    );
+    const result = await executeWebFetch(
+      { url: 'https://example.com/no-length' },
+      fakeFetch as unknown as typeof fetch,
+    );
+    expect('error' in result && result.error).toBe('body_too_large');
+  });
+
+  it('passes a multi-chunk body that totals just under the cap', async () => {
+    // Boundary case: MAX_BYTES - 1 bytes split across N chunks. The
+    // streaming reader's `total > maxBytes` check is strict-greater, so
+    // exactly MAX_BYTES would still succeed and MAX_BYTES + 1 would fail.
+    // MAX_BYTES - 1 stays one byte under the limit rather than on it.
+    const targetTotal = MAX_BYTES_TEST - 1;
+    const chunkSize = 256 * 1024; // 256 KiB chunks
+    const chunks: Uint8Array[] = [];
+    let remaining = targetTotal;
+    while (remaining > 0) {
+      const size = Math.min(chunkSize, remaining);
+      chunks.push(new Uint8Array(size).fill(67)); // 'C'
+      remaining -= size;
+    }
+    const fakeFetch = vi.fn().mockResolvedValue(
+      streamedResponse(chunks, { contentType: 'text/plain' }),
+    );
+    const result = await executeWebFetch(
+      { url: 'https://example.com/right-at-cap' },
+      fakeFetch as unknown as typeof fetch,
+    );
+    // The streaming reader succeeded — we got a content shape, not an
+    // error. (Downstream truncate() will clamp the final string to
+    // MAX_CHARS_CAP=32000 and set truncated:true; that's the existing
+    // truncation logic and is exercised by its own test. The point of
+    // THIS test is that readBodyCapped didn't trip on a body that
+    // sits just under its byte limit.)
+    expect('content' in result).toBe(true);
+    if ('content' in result) {
+      expect(result.content.length).toBeGreaterThan(0);
+      // All ASCII 'C's, so the leading 200 chars before any truncation
+      // marker should be all C — proves we read real bytes through the
+      // streaming reader rather than getting an empty buffer.
+      expect(result.content.slice(0, 200)).toBe('C'.repeat(200));
+    }
+  });
+});
--- a/apps/server/src/services/tests/ws-frames.test.ts
+++ b/apps/server/src/services/tests/ws-frames.test.ts
@@ -0,0 +1,218 @@
+import { describe, it, expect, vi, beforeEach, afterEach } from 'vitest';
+import { readFileSync } from 'node:fs';
+import { resolve } from 'node:path';
+import { fileURLToPath } from 'node:url';
+import {
+  WsFrameSchema,
+  KNOWN_FRAME_TYPES,
+  type WsFrame,
+} from '../../types/ws-frames.js';
+import { createBroker } from '../broker.js';
+
+const VALID_UUID_A = '00000000-0000-0000-0000-000000000001';
+const VALID_UUID_B = '00000000-0000-0000-0000-000000000002';
+const VALID_UUID_C = '00000000-0000-0000-0000-000000000003';
+const VALID_TIMESTAMP = '2026-05-22T14:30:00.000Z';
+
+describe('WsFrameSchema (v1.13.11-a)', () => {
+  it('accepts a well-formed chat_status frame', () => {
+    const result = WsFrameSchema.safeParse({
+      type: 'chat_status',
+      chat_id: VALID_UUID_A,
+      status: 'streaming',
+      at: VALID_TIMESTAMP,
+    });
+    expect(result.success).toBe(true);
+  });
+
+  it('rejects an unknown frame type', () => {
+    const result = WsFrameSchema.safeParse({
+      type: 'cosmic_ray_strike',
+      chat_id: VALID_UUID_A,
+    });
+    expect(result.success).toBe(false);
+  });
+
+  it('rejects a chat_status frame with invalid status enum', () => {
+    // v1.12.1 dropped the legacy 'working' status. Any frame still emitting it
+    // should fail validation — that's a drift catcher.
+    const result = WsFrameSchema.safeParse({
+      type: 'chat_status',
+      chat_id: VALID_UUID_A,
+      status: 'working',
+      at: VALID_TIMESTAMP,
+    });
+    expect(result.success).toBe(false);
+  });
+
+  it('rejects a UUID field with a non-UUID string', () => {
+    const result = WsFrameSchema.safeParse({
+      type: 'chat_status',
+      chat_id: 'not-a-uuid',
+      status: 'idle',
+      at: VALID_TIMESTAMP,
+    });
+    expect(result.success).toBe(false);
+  });
+
+  it('rejects negative token counts in usage frame', () => {
+    const result = WsFrameSchema.safeParse({
+      type: 'usage',
+      message_id: VALID_UUID_A,
+      chat_id: VALID_UUID_B,
+      completion_tokens: -1,
+      ctx_used: 100,
+      ctx_max: 1000,
+    });
+    expect(result.success).toBe(false);
+  });
+
+  it('accepts a usage frame with nullable token counts (pre-v1.13.7 history)', () => {
+    const result = WsFrameSchema.safeParse({
+      type: 'usage',
+      message_id: VALID_UUID_A,
+      chat_id: VALID_UUID_B,
+      completion_tokens: null,
+      ctx_used: null,
+      ctx_max: null,
+    });
+    expect(result.success).toBe(true);
+  });
+
+  it('accepts a tool_result frame with non-UUID tool_call_id (model-emitted)', () => {
+    // Model-emitted tool_call_ids look like "call_abc123", not UUIDs.
+    const result = WsFrameSchema.safeParse({
+      type: 'tool_result',
+      tool_message_id: VALID_UUID_A,
+      chat_id: VALID_UUID_B,
+      tool_call_id: 'call_abc123',
+      output: { whatever: true },
+      truncated: false,
+    });
+    expect(result.success).toBe(true);
+  });
+
+  it('accepts a compacted frame', () => {
+    const result = WsFrameSchema.safeParse({
+      type: 'compacted',
+      session_id: VALID_UUID_A,
+      chat_id: VALID_UUID_B,
+      summary_message_id: VALID_UUID_C,
+    });
+    expect(result.success).toBe(true);
+  });
+
+  it('accepts a session_workspace_updated frame', () => {
+    const result = WsFrameSchema.safeParse({
+      type: 'session_workspace_updated',
+      session_id: VALID_UUID_A,
+      workspace_panes: [{ id: 'p1', kind: 'chat', chatIds: [], activeChatIdx: 0 }],
+    });
+    expect(result.success).toBe(true);
+  });
+
+  it('every KNOWN_FRAME_TYPES entry has a discriminated branch', () => {
+    // Probe each known type by attempting a minimal valid construction.
+    // Failure here means the union and the KNOWN_FRAME_TYPES list drifted.
+    for (const type of KNOWN_FRAME_TYPES) {
+      const probe = WsFrameSchema.safeParse({ type, __dummy__: true });
+      // We expect FAILURE on every type because we're missing required fields,
+      // but the failure must be ABOUT the missing fields, not about an unknown
+      // type. An "Invalid discriminator value" error means the type isn't in
+      // the union — that's a drift.
+      if (probe.success) continue;
+      const issues = probe.error.issues;
+      const hasInvalidDiscriminator = issues.some(
+        (i) => i.code === 'invalid_union_discriminator',
+      );
+      expect(hasInvalidDiscriminator, `frame type '${type}' is missing from the discriminated union`).toBe(false);
+    }
+  });
+});
+
+describe('ws-frames.ts file mirror parity', () => {
+  it('apps/server and apps/web copies are byte-identical', () => {
+    const here = fileURLToPath(import.meta.url);
+    const serverPath = resolve(here, '../../../types/ws-frames.ts');
+    const webPath = resolve(here, '../../../../../web/src/api/ws-frames.ts');
+    const serverContent = readFileSync(serverPath, 'utf8');
+    const webContent = readFileSync(webPath, 'utf8');
+    expect(webContent, 'apps/web/src/api/ws-frames.ts must be byte-identical to apps/server/src/types/ws-frames.ts').toBe(serverContent);
+  });
+});
+
+describe('broker.publishFrame / publishUserFrame fail-closed behavior', () => {
+  let logErrors: Array<{ obj: unknown; msg: string }>;
+  let mockLog: Parameters<typeof createBroker>[0];
+
+  beforeEach(() => {
+    logErrors = [];
+    mockLog = {
+      error: (obj: unknown, msg: string) => {
+        logErrors.push({ obj, msg });
+      },
+      info: () => {},
+      warn: () => {},
+      debug: () => {},
+      trace: () => {},
+      fatal: () => {},
+      child: () => mockLog as never,
+      level: 'info',
+      silent: () => {},
+    } as unknown as Parameters<typeof createBroker>[0];
+  });
+
+  afterEach(() => {
+    vi.restoreAllMocks();
+  });
+
+  it('publishFrame delivers a valid frame to subscribers', () => {
+    const broker = createBroker(mockLog);
+    const received: WsFrame[] = [];
+    broker.subscribe('sess-1', (f) => received.push(f as WsFrame));
+    broker.publishFrame('sess-1', {
+      type: 'delta',
+      message_id: VALID_UUID_A,
+      chat_id: VALID_UUID_B,
+      content: 'hello',
+    });
+    expect(received).toHaveLength(1);
+    expect((received[0] as { type: string }).type).toBe('delta');
+    expect(logErrors).toHaveLength(0);
+  });
+
+  it('publishFrame drops + logs an invalid frame instead of delivering it', () => {
+    const broker = createBroker(mockLog);
+    const received: WsFrame[] = [];
+    broker.subscribe('sess-1', (f) => received.push(f as WsFrame));
+    broker.publishFrame('sess-1', {
+      type: 'delta',
+      message_id: 'not-a-uuid',
+      content: 'hello',
+    } as never);
+    expect(received).toHaveLength(0);
+    expect(logErrors).toHaveLength(1);
+    expect(logErrors[0]!.msg).toMatch(/ws-frame-validation-failed/);
+  });
+
+  it('publishUserFrame drops + logs an invalid user-channel frame', () => {
+    const broker = createBroker(mockLog);
+    const received: WsFrame[] = [];
+    broker.subscribeUser('default', (f) => received.push(f as WsFrame));
+    broker.publishUserFrame('default', {
+      type: 'chat_status',
+      chat_id: VALID_UUID_A,
+      status: 'working', // v1.12.1 dropped this enum value
+      at: VALID_TIMESTAMP,
+    } as never);
+    expect(received).toHaveLength(0);
+    expect(logErrors).toHaveLength(1);
+  });
+
+  it('publishFrame validation failure does not throw (no cascade into stream-phase)', () => {
+    const broker = createBroker(mockLog);
+    expect(() =>
+      broker.publishFrame('sess-1', { type: 'unknown_type' } as never),
+    ).not.toThrow();
+  });
+});
--- a/apps/server/src/services/agents.ts
+++ b/apps/server/src/services/agents.ts
@@ -1,7 +1,7 @@
 import { promises as fs } from 'node:fs';
 import { join } from 'node:path';
 import type { Agent, AgentsResponse, AgentParseError } from '../types/api.js';
-import { ALL_TOOLS } from './tools.js';
+import { ALL_TOOLS, resolveToolTier } from './tools.js';

 // v1.8.1: global agents live at /data/AGENTS.md inside the container
 // (./data:/data:ro mount on the host). Per-project AGENTS.md at the project
@@ -186,11 +186,14 @@ function parseAgentSection(section: RawSection): Omit<Agent, 'source'> {
    throw new Error(fmErrors.join('; '));
  }

+  // v1.13.15-tools: intersect with BOOCODE_TOOLS tier (ceiling, not expansion).
+  // Unset → resolveToolTier returns ALL tool names → no narrowing.
+  const tierAllowed = new Set(resolveToolTier(process.env.BOOCODE_TOOLS));
  const filteredTools = Array.isArray(fm.tools)
    ? fm.tools.filter((t): t is string =>
-        (ALL_TOOL_NAMES as readonly string[]).includes(t),
+        (ALL_TOOL_NAMES as readonly string[]).includes(t) && tierAllowed.has(t),
      )
-    : DEFAULT_TOOLS;
+    : DEFAULT_TOOLS.filter((t) => tierAllowed.has(t));

  return {
    id: slugify(section.name),
@@ -252,6 +255,22 @@ export function invalidateAgentsCache(projectPath?: string): void {
  }
 }

+// v1.13.8: cache-read accessor for the system-prompt prefix-fingerprint log.
+// Returns the AGENTS.md mtimes that getAgentsForProject() observed on its
+// last cache fill for this projectPath. Both fields are null when the cache
+// is cold (e.g. tests, fresh boot before the first inference turn). Does no
+// I/O — a fresh stat would race the cache and isn't what the fingerprint
+// wants anyway (we want what was actually used to resolve the agent).
+export function getAgentsMtimes(projectPath: string): {
+  global: number | null;
+  project: number | null;
+} {
+  const key = projectPath || '__none__';
+  const entry = cache.get(key);
+  if (!entry) return { global: null, project: null };
+  return { global: entry.globalMtime, project: entry.projectMtime };
+}
+
 async function safeStat(path: string): Promise<number | null> {
  try {
    const s = await fs.stat(path);
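
Review note (illustration only, not part of the patch): the "ceiling, not expansion" behaviour of the tier intersection in parseAgentSection can be sketched in isolation. The tool names, the 'core' tier value, and the resolveTier helper below are hypothetical stand-ins; the real resolveToolTier lives in tools.ts and its accepted values are not visible in this diff.

```typescript
// Hypothetical tool names and tier contents, for illustration only.
const ALL_TOOL_NAMES = ['read_file', 'grep', 'web_fetch'] as const;
const CORE_TOOL_NAMES: readonly string[] = ['read_file', 'grep'];

// Assumed behaviour: an unset tier means "no ceiling" (all tool names).
function resolveTier(tier: string | undefined): readonly string[] {
  return tier === 'core' ? CORE_TOOL_NAMES : ALL_TOOL_NAMES;
}

// The tier is a ceiling: requested tools are intersected with it, so a
// tier can remove tools from an agent but can never add any.
function filterTools(requested: readonly string[], tier: string | undefined): string[] {
  const tierAllowed = new Set<string>(resolveTier(tier));
  const known = new Set<string>(ALL_TOOL_NAMES);
  return requested.filter((t) => known.has(t) && tierAllowed.has(t));
}
```

With these stand-ins, an agent that requests web_fetch under the 'core' tier simply loses it, while an unset tier leaves the agent's own list untouched apart from unknown names.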
--- a/apps/server/src/services/auto_name.ts
+++ b/apps/server/src/services/auto_name.ts
@@ -1,4 +1,4 @@
-import type { InferenceContext } from './inference.js';
+import type { InferenceContext } from './inference/index.js';

 const NAMING_SYSTEM_PROMPT =
  'You name chat sessions. Reply directly with no thinking, reasoning, or explanation. Output ONLY the title, 4 words max, no quotes, no punctuation, no prefix like "Title:".';
--- a/apps/server/src/services/broker.ts
+++ b/apps/server/src/services/broker.ts
@@ -1,3 +1,6 @@
+import type { FastifyBaseLogger } from 'fastify';
+import { WsFrameSchema, type WsFrame } from '../types/ws-frames.js';
+
 export type Frame = Record<string, unknown> & { type: string };
 export type Listener = (frame: Frame) => void;

@@ -6,9 +9,15 @@ export interface Broker {
  subscribe(sessionId: string, listener: Listener): () => void;
  publishUser(user: string, frame: Frame): void;
  subscribeUser(user: string, listener: Listener): () => void;
+  // v1.13.11-a: typed publish wrappers. Validate against WsFrameSchema and
+  // delegate to publish / publishUser on success; log + drop on failure
+  // (fail-closed). Existing publish / publishUser callers stay legal — they
+  // get converted to the typed variant in v1.13.11-b.
+  publishFrame(sessionId: string, frame: WsFrame): void;
+  publishUserFrame(user: string, frame: WsFrame): void;
 }

-export function createBroker(): Broker {
+export function createBroker(log?: FastifyBaseLogger): Broker {
  const topics = new Map<string, Set<Listener>>();
  const userTopics = new Map<string, Set<Listener>>();

@@ -39,6 +48,28 @@ export function createBroker(): Broker {
    };
  }

+  // v1.13.11-a: shared validation guard. Returns the parsed/typed frame on
+  // success, or null on failure (after logging). The brief mandates
+  // fail-closed semantics: invalid frames don't reach subscribers. Throwing
+  // here could cascade into stream-phase aborts, which v1.13.7 already had
+  // to defend against, so log + drop is the right shape.
+  function validate(channel: 'session' | 'user', key: string, frame: WsFrame): WsFrame | null {
+    const parsed = WsFrameSchema.safeParse(frame);
+    if (parsed.success) return parsed.data;
+    const frameType = (frame as { type?: unknown })?.type;
+    const errors = parsed.error.flatten();
+    if (log) {
+      log.error(
+        { channel, key, frame_type: frameType, errors },
+        'ws-frame-validation-failed: dropping invalid frame',
+      );
+    } else {
+      // Fallback for callers that didn't pass a logger (e.g. unit tests).
+      console.error('ws-frame-validation-failed', { channel, key, frame_type: frameType, errors });
+    }
+    return null;
+  }
+
  return {
    publish(sessionId, frame) {
      publishTo(topics, sessionId, frame);
@@ -52,5 +83,15 @@ export function createBroker(): Broker {
    subscribeUser(user, listener) {
      return subscribeTo(userTopics, user, listener);
    },
+    publishFrame(sessionId, frame) {
+      const valid = validate('session', sessionId, frame);
+      if (!valid) return;
+      publishTo(topics, sessionId, valid as Frame);
+    },
+    publishUserFrame(user, frame) {
+      const valid = validate('user', user, frame);
+      if (!valid) return;
+      publishTo(userTopics, user, valid as Frame);
+    },
  };
 }
--- a/apps/server/src/services/codecontext_client.ts
+++ b/apps/server/src/services/codecontext_client.ts
@@ -17,6 +17,7 @@
 //      which we re-surface with a hint to add the file to .codecontextignore.

 import { realpath } from 'node:fs/promises';
+import { truncateIfNeeded } from './truncate.js';

 export interface CodecontextRequest {
  toolName: string;
@@ -27,6 +28,9 @@ export interface CodecontextRequest {
 export interface CodecontextResponse {
  result: string;
  truncated: boolean;
+  // v1.13.5: optional opaque id pointing at the full pre-slice content on
+  // tmpfs. Set when truncated=true and storage succeeded.
+  outputPath?: string;
 }

 const CODECONTEXT_BASE_URL = process.env['CODECONTEXT_URL'] ?? 'http://codecontext:8080';
@@ -105,13 +109,22 @@ export async function callCodecontext(

  // Step 4: inline truncation. The model gets a clear hint about how to
  // narrow the next call rather than a silent cut. Mirrors web_fetch.ts.
+  // v1.13.5: stash the full body on tmpfs when truncating so the model can
+  // retrieve more via view_truncated_output(id).
  if (body.result.length > TRUNCATION_LIMIT) {
    const truncated = body.result.slice(0, TRUNCATION_LIMIT);
    const omitted = body.result.length - TRUNCATION_LIMIT;
+    const slicedWithMarker =
+      `${truncated}\n\n[truncated, ${omitted} chars omitted; narrow with file_path, file_type, or limit]`;
+    const wrapped = await truncateIfNeeded({
+      fullContent: body.result,
+      slicedContent: slicedWithMarker,
+      wasTruncated: true,
+    });
    return {
-      result:
-        `${truncated}\n\n[truncated, ${omitted} chars omitted; narrow with file_path, file_type, or limit]`,
-      truncated: true,
+      result: wrapped.content,
+      truncated: wrapped.truncated,
+      ...(wrapped.outputPath ? { outputPath: wrapped.outputPath } : {}),
    };
  }
  return { result: body.result, truncated: false };
--- a/apps/server/src/services/compaction.ts
+++ b/apps/server/src/services/compaction.ts
@@ -23,7 +23,13 @@ import type { Broker } from './broker.js';
 import { SUMMARY_TEMPLATE } from './compaction-prompt.js';
 import * as modelContextLookup from './model-context.js';

-const COMPACTION_BUFFER = 20_000;
+// v1.13.9: ratio-only overflow trigger. Fires compaction at 85% of ctx_max
+// (opencode session/overflow.ts pattern). Replaces the v1.11.0-era
+// `ctx_max - 20_000` formula which degenerated to 0 for contexts ≤20k and
+// gave only 7-8% headroom to the summarizer at 262k. Ratio gives consistent
+// 15% headroom at any scale, and small-ctx models no longer get an
+// effectively-disabled trigger.
+const EARLY_TRIGGER_RATIO = 0.85;
 const MIN_PRESERVE_RECENT_TOKENS = 2_000;
 const MAX_PRESERVE_RECENT_TOKENS = 8_000;
 const DEFAULT_TAIL_TURNS = 2;
@@ -39,19 +45,24 @@ export interface CompactionMessage {
  status: 'streaming' | 'complete' | 'failed' | 'cancelled';
  tool_calls: Array<{ id: string; name: string; args: Record<string, unknown> }> | null;
  tool_results: { tool_call_id: string; output: unknown; truncated: boolean; error?: string } | null;
+  // v1.13.6: reasoning_parts captured by v1.13.1-C and read back through
+  // messages_with_parts. Embedded into the head-assembly payload as prose so
+  // the summarizer LLM sees what the model was reasoning through when it
+  // chose its tool calls.
+  reasoning_parts: Array<{ text: string }> | null;
  metadata: { kind?: string } | null;
  created_at: string;
 }

 // === overflow ===

-// Tokens we hold in reserve for the model's response so a near-full context
-// can still produce a useful turn. Mirrors opencode's COMPACTION_BUFFER.
-// Returns 0 when the context limit is unknown (caller treats 0 as "do not
-// trigger overflow"); avoids dividing-by-zero downstream.
+// Returns the token budget at which overflow fires. Triggers compaction at
+// 85% of contextLimit (opencode session/overflow.ts pattern). Returns 0 when
+// the context limit is unknown — caller treats 0 as "do not trigger overflow",
+// keeping inference flowing rather than compacting a turn we can't size.
 export function usable(contextLimit: number): number {
  if (!contextLimit || contextLimit <= 0) return 0;
-  return Math.max(0, contextLimit - COMPACTION_BUFFER);
+  return Math.floor(EARLY_TRIGGER_RATIO * contextLimit);
 }

 export interface Usage {
@@ -197,7 +208,8 @@ export function buildPrompt(
 // would silently drop pre-legacy-compact history before the LLM sees it.
 // Compaction wants to send the entire head, full stop.) ===

-interface OpenAiMessage {
+// v1.13.6: exported for unit-test access (reasoning render coverage).
+export interface OpenAiMessage {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string | null;
  tool_calls?: Array<{
@@ -212,7 +224,8 @@ function isCapHitSentinel(m: CompactionMessage): boolean {
  return m.role === 'system' && m.metadata != null && m.metadata.kind === 'cap_hit';
 }

-function buildHeadPayload(head: CompactionMessage[]): OpenAiMessage[] {
+// v1.13.6: exported for unit-test access (reasoning render coverage).
+export function buildHeadPayload(head: CompactionMessage[]): OpenAiMessage[] {
  const out: OpenAiMessage[] = [];
  for (const m of head) {
    if (isCapHitSentinel(m)) continue;
@@ -243,9 +256,22 @@ function buildHeadPayload(head: CompactionMessage[]): OpenAiMessage[] {
      continue;
    }
    if (m.role === 'assistant') {
+      // v1.13.6: embed reasoning text as prose prefixed onto the assistant
+      // content. OpenAI wire shape doesn't carry reasoning as a structured
+      // field, but the summarizer is reading text — a tagged prose block
+      // gives it the same signal. We mirror the AI SDK ReasoningPart shape
+      // by using a <reasoning>...</reasoning> wrapper so the summarizer can
+      // distinguish reasoning from user-visible answer.
+      let body = m.content && m.content.length > 0 ? m.content : '';
+      if (m.reasoning_parts && m.reasoning_parts.length > 0) {
+        const reasoning = m.reasoning_parts.map((r) => r.text).join('');
+        body = body.length > 0
+          ? `<reasoning>${reasoning}</reasoning>\n\n${body}`
+          : `<reasoning>${reasoning}</reasoning>`;
+      }
      const msg: OpenAiMessage = {
        role: 'assistant',
-        content: m.content && m.content.length > 0 ? m.content : null,
+        content: body.length > 0 ? body : null,
      };
      if (m.tool_calls && m.tool_calls.length > 0) {
        msg.tool_calls = m.tool_calls.map((tc) => ({
@@ -342,9 +368,14 @@ export async function process(input: ProcessInput): Promise<void> {
  // 2. All currently-active messages in this chat (compacted_at IS NULL).
  // ORDER BY (created_at, id) matches loadContext in inference.ts so the
  // turns() boundary logic sees the same sequence the LLM will.
+  // v1.13.1-B: reads tool_calls/tool_results via the parts-merged view so
+  // the compaction payload matches what the LLM saw on the original turn.
+  // v1.13.6: also pulls reasoning_parts (added in v1.13.1-C) so summaries
+  // capture what the model was working through before each tool call.
  const messages = await sql<CompactionMessage[]>`
-    SELECT id, role, content, kind, summary, status, tool_calls, tool_results, metadata, created_at
-    FROM messages
+    SELECT id, role, content, kind, summary, status, tool_calls, tool_results,
+           reasoning_parts, metadata, created_at
+    FROM messages_with_parts
    WHERE chat_id = ${chatId} AND compacted_at IS NULL
    ORDER BY created_at ASC, id ASC
  `;
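
Review note (illustration only, not part of the patch): the effect of swapping the fixed 20k reserve for the 85% ratio in usable() can be checked with a small standalone comparison. The two functions copy the removed and added bodies from the hunk above; the names usableLegacy, usableRatio and headroom, and the sample context sizes, are assumptions made for this sketch.

```typescript
// Legacy formula from the removed lines: fixed 20k-token reserve.
const COMPACTION_BUFFER = 20_000;
function usableLegacy(contextLimit: number): number {
  if (!contextLimit || contextLimit <= 0) return 0;
  return Math.max(0, contextLimit - COMPACTION_BUFFER);
}

// Ratio formula from the added lines: trigger at 85% of the limit.
const EARLY_TRIGGER_RATIO = 0.85;
function usableRatio(contextLimit: number): number {
  if (!contextLimit || contextLimit <= 0) return 0;
  return Math.floor(EARLY_TRIGGER_RATIO * contextLimit);
}

// Fraction of the context left for the summarizer once the trigger fires.
function headroom(limit: number, trigger: number): number {
  return (limit - trigger) / limit;
}

// Sample context sizes chosen for illustration.
for (const limit of [16_000, 32_000, 262_144]) {
  const legacy = usableLegacy(limit);
  const ratio = usableRatio(limit);
  console.log(limit, legacy, ratio, headroom(limit, ratio).toFixed(3));
}
```

Because callers treat a return value of 0 as "do not trigger overflow", the legacy formula effectively disabled the trigger for any context of 20k tokens or less, whereas the ratio form keeps it active at every size.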
--- a/apps/server/src/services/inference.ts
+++ b/apps/server/src/services/inference.ts
--- a/apps/server/src/services/inference/budget.ts
+++ b/apps/server/src/services/inference/budget.ts
@@ -0,0 +1,25 @@
+import type { Agent } from '../../types/api.js';
+import { READ_ONLY_TOOL_NAMES } from '../tools.js';
+
+// v1.8.2: tool-call budget defaults. Resolved per-turn by resolveToolBudget.
+//   - Agent with explicit max_tool_calls: that value.
+//   - Agent with read-only-only tools:    BUDGET_READ_ONLY (30).
+//   - Agent with any non-read-only tool:  BUDGET_NON_READ_ONLY (10).
+//   - No agent (raw chat):                BUDGET_NO_AGENT (30).
+// v1.13.7: bumped BUDGET_NO_AGENT 15→30 to match BUDGET_READ_ONLY. Every tool
+// in ALL_TOOLS today is read-only (see services/tools.ts comment at
+// READ_ONLY_TOOL_NAMES); the cautious 15-cap was a forward-looking guard for
+// write tools that haven't landed yet. No-agent mode gets the same toolset as
+// an all-read-only agent at runtime, so they should share the same budget.
+export const BUDGET_READ_ONLY = 30;
+export const BUDGET_NON_READ_ONLY = 10;
+export const BUDGET_NO_AGENT = 30;
+
+const READ_ONLY_SET: ReadonlySet<string> = new Set(READ_ONLY_TOOL_NAMES);
+
+export function resolveToolBudget(agent: Agent | null): number {
+  if (agent?.max_tool_calls != null) return agent.max_tool_calls;
+  if (!agent) return BUDGET_NO_AGENT;
+  const allReadOnly = agent.tools.every((t) => READ_ONLY_SET.has(t));
+  return allReadOnly ? BUDGET_READ_ONLY : BUDGET_NON_READ_ONLY;
+}
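
Reviewer sketch for the budget resolution above. This is a self-contained restatement, not the shipped module: the `SketchAgent` shape and the tool names in `READ_ONLY_TOOL_NAMES` are assumptions made for illustration (the real list lives in services/tools.ts). It walks the four branches of `resolveToolBudget`, plus the edge case of an agent with an empty tool list.

```typescript
// Assumed minimal agent shape; the real Agent type lives in types/api.ts.
interface SketchAgent {
  tools: string[];
  max_tool_calls: number | null;
}

// Hypothetical tool names, used only so the example can run on its own.
const READ_ONLY_TOOL_NAMES = ['read_file', 'grep'] as const;
const READ_ONLY_SET: ReadonlySet<string> = new Set(READ_ONLY_TOOL_NAMES);

const BUDGET_READ_ONLY = 30;
const BUDGET_NON_READ_ONLY = 10;
const BUDGET_NO_AGENT = 30;

function resolveToolBudget(agent: SketchAgent | null): number {
  // An explicit per-agent cap always wins (including 0).
  if (agent?.max_tool_calls != null) return agent.max_tool_calls;
  if (!agent) return BUDGET_NO_AGENT;
  const allReadOnly = agent.tools.every((t) => READ_ONLY_SET.has(t));
  return allReadOnly ? BUDGET_READ_ONLY : BUDGET_NON_READ_ONLY;
}

const budgets = {
  noAgent: resolveToolBudget(null),
  explicit: resolveToolBudget({ tools: ['write_file'], max_tool_calls: 5 }),
  readOnly: resolveToolBudget({ tools: ['read_file', 'grep'], max_tool_calls: null }),
  mixed: resolveToolBudget({ tools: ['read_file', 'write_file'], max_tool_calls: null }),
  // every() on an empty array is true, so a tool-less agent gets the read-only budget.
  noTools: resolveToolBudget({ tools: [], max_tool_calls: null }),
};
console.log(budgets);
```

Under these assumptions the five cases resolve to 30, 5, 30, 10 and 30 respectively.
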
--- a/apps/server/src/services/inference/error-handler.ts
+++ b/apps/server/src/services/inference/error-handler.ts
@@ -0,0 +1,167 @@
+import type { MessageMetadata, Session } from '../../types/api.js';
+import * as modelContext from '../model-context.js';
+import { maybeFlagForCompaction } from './payload.js';
+import { insertParts, partsFromAssistantMessage } from './parts.js';
+import type { InferenceContext, StreamResult, TurnArgs } from './turn.js';
+
+export async function handleAbortOrError(
+  ctx: InferenceContext,
+  args: TurnArgs,
+  accumulated: string,
+  err: unknown
+): Promise<void> {
+  const { sessionId, chatId, assistantMessageId } = args;
+  const isAbort = err instanceof Error && err.name === 'AbortError';
+  const finalStatus = isAbort ? 'cancelled' : 'failed';
+  const errMsg = err instanceof Error ? err.message : String(err);
+  // v1.8.2: persist a structured error metadata blob on genuine failures so
+  // the bubble can render the reason on reload without re-deriving from the
+  // (one-shot) WS error frame. User-initiated abort skips this — there's no
+  // "reason" to surface for a stop the user already explicitly chose.
+  const errorMetadata: MessageMetadata | null = isAbort
+    ? null
+    : { kind: 'error', error_reason: 'llm_provider_error', error_text: errMsg };
+  if (errorMetadata) {
+    await ctx.sql`
+      UPDATE messages
+      SET status = ${finalStatus},
+          content = ${accumulated},
+          finished_at = clock_timestamp(),
+          metadata = ${ctx.sql.json(errorMetadata as never)}
+      WHERE id = ${assistantMessageId}
+    `;
+  } else {
+    await ctx.sql`
+      UPDATE messages
+      SET status = ${finalStatus},
+          content = ${accumulated},
+          finished_at = clock_timestamp()
+      WHERE id = ${assistantMessageId}
+    `;
+  }
+  const [failSessRow] = await ctx.sql<{ project_id: string; name: string; updated_at: string }[]>`
+    UPDATE sessions SET updated_at = clock_timestamp()
+    WHERE id = ${sessionId}
+    RETURNING project_id, name, updated_at
+  `;
+  ctx.publishUser({ type: 'session_updated', session_id: sessionId, project_id: failSessRow!.project_id, name: failSessRow!.name, updated_at: failSessRow!.updated_at });
+  // v1.8 mobile-tabs: cancellation is a user-initiated stop, treat as idle;
+  // genuine errors flip the dot red. v1.8.2: error path also carries a
+  // machine-readable `reason` so the UI can render specifics inline.
+  if (isAbort) {
+    // v1.12.1: defensive cancellation write. The status=${finalStatus} UPDATE
+    // above already sets 'cancelled' for the AbortError case, but a row can
+    // leak as 'streaming' when the abort fires between the post-tool-phase
+    // INSERT (executeToolPhase) and the next runAssistantTurn's stream setup,
+    // bypassing the try/catch around executeStreamPhase. The status guard
+    // makes this a no-op when the earlier write already landed.
+    await ctx.sql`
+      UPDATE messages
+      SET status = 'cancelled', content = ${accumulated}, finished_at = clock_timestamp()
+      WHERE id = ${args.assistantMessageId} AND status = 'streaming'
+    `;
+    ctx.publishUser({ type: 'chat_status', chat_id: chatId, status: 'idle', at: new Date().toISOString() });
+    ctx.publish(sessionId, {
+      type: 'message_complete',
+      message_id: assistantMessageId,
+      chat_id: chatId,
+    });
+    ctx.log.info({ sessionId, chatId, assistantMessageId }, 'inference cancelled');
+  } else {
+    ctx.publishUser({
+      type: 'chat_status',
+      chat_id: chatId,
+      status: 'error',
+      at: new Date().toISOString(),
+      reason: 'llm_provider_error',
+    });
+    ctx.publish(sessionId, {
+      type: 'error',
+      message_id: assistantMessageId,
+      chat_id: chatId,
+      error: errMsg,
+      reason: 'llm_provider_error',
+    });
+    ctx.log.error({ err, sessionId, chatId, assistantMessageId }, 'inference failed');
+  }
+}
+
+export async function finalizeCompletion(
+  ctx: InferenceContext,
+  args: TurnArgs,
+  result: StreamResult,
+  startedAt: string | null,
+  session: Session
+): Promise<void> {
+  const { sessionId, chatId, assistantMessageId } = args;
+  const { content, finishReason, promptTokens, completionTokens } = result;
+
+  // v1.11.3: see executeToolPhase for the rationale.
+  const mctx = await modelContext.getModelContext(session.model);
+  const nCtx = mctx?.n_ctx ?? null;
+
+  const [updated] = await ctx.sql<
+    { tokens_used: number | null; ctx_used: number | null; ctx_max: number | null; finished_at: string | null }[]
+  >`
+    UPDATE messages
+    SET content = ${content},
+        status = 'complete',
+        tokens_used = ${completionTokens},
+        ctx_used = ${promptTokens},
+        ctx_max = ${nCtx},
+        finished_at = clock_timestamp()
+    WHERE id = ${assistantMessageId}
+    RETURNING tokens_used, ctx_used, ctx_max, finished_at
+  `;
+  // v1.13.0: dual-write the text part. finalizeCompletion is the terminal
+  // path for text-only assistant turns (no tool calls); tool_calls are null
+  // here by construction (the tool-bearing path goes through executeToolPhase).
+  // v1.13.1-C: include result.reasoning so reasoning-channel models capture
+  // a kind='reasoning' part alongside the text.
+  // TODO(v1.13.1): wrap the UPDATE above and this insertParts in a single
+  // sql.begin before flipping read authority to message_parts.
+  await insertParts(
+    ctx.sql,
+    partsFromAssistantMessage({
+      content,
+      tool_calls: null,
+      reasoning: result.reasoning,
+    }).map((p) => ({
+      ...p,
+      message_id: assistantMessageId,
+    })),
+  );
+  // v1.11: flag for compaction on the terminal turn too. Catches the common
+  // case of a turn that hit the limit without invoking tools.
+  await maybeFlagForCompaction(ctx, chatId, updated);
+  const [completeSessRow] = await ctx.sql<{ project_id: string; name: string; updated_at: string }[]>`
+    UPDATE sessions SET updated_at = clock_timestamp()
+    WHERE id = ${sessionId}
+    RETURNING project_id, name, updated_at
+  `;
+  ctx.publishUser({ type: 'session_updated', session_id: sessionId, project_id: completeSessRow!.project_id, name: completeSessRow!.name, updated_at: completeSessRow!.updated_at });
+  ctx.publishUser({ type: 'chat_status', chat_id: chatId, status: 'idle', at: new Date().toISOString() });
+  ctx.publish(sessionId, {
+    type: 'message_complete',
+    message_id: assistantMessageId,
+    chat_id: chatId,
+    tokens_used: updated?.tokens_used ?? null,
+    ctx_used: updated?.ctx_used ?? null,
+    ctx_max: updated?.ctx_max ?? null,
+    started_at: startedAt,
+    finished_at: updated?.finished_at ?? null,
+    model: session.model,
+  });
+  ctx.log.info(
+    {
+      sessionId,
+      chatId,
+      assistantMessageId,
+      finishReason,
+      chars: content.length,
+      tokens_used: updated?.tokens_used,
+      ctx_used: updated?.ctx_used,
+    },
+    'inference complete'
+  );
+}
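
Reviewer sketch of the abort-versus-error split that `handleAbortOrError` makes above. `classifyFailure` is a hypothetical helper name, not something in the patch; it isolates only the pure decision (final status plus optional error metadata) so it can be exercised without a database or broker.

```typescript
type FinalStatus = 'cancelled' | 'failed';

interface ErrorMetadata {
  kind: 'error';
  error_reason: 'llm_provider_error';
  error_text: string;
}

// Hypothetical helper: mirrors the isAbort / errMsg / errorMetadata logic only.
function classifyFailure(err: unknown): {
  finalStatus: FinalStatus;
  metadata: ErrorMetadata | null;
} {
  const isAbort = err instanceof Error && err.name === 'AbortError';
  // A user-initiated stop carries no reason worth persisting.
  if (isAbort) return { finalStatus: 'cancelled', metadata: null };
  const errMsg = err instanceof Error ? err.message : String(err);
  return {
    finalStatus: 'failed',
    metadata: { kind: 'error', error_reason: 'llm_provider_error', error_text: errMsg },
  };
}

const abortErr = new Error('The operation was aborted');
abortErr.name = 'AbortError';

const aborted = classifyFailure(abortErr);
const failed = classifyFailure(new Error('upstream 502'));
const thrownString = classifyFailure('boom');
console.log(aborted, failed, thrownString);
```

Non-Error throwables (the `'boom'` case) are stringified, which matches the `String(err)` fallback in the handler.
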
--- a/apps/server/src/services/inference/index.ts
+++ b/apps/server/src/services/inference/index.ts
@@ -0,0 +1,20 @@
+// v1.12.4: re-export shim. Outside callers (apps/server/src/index.ts and the
+// vitest inference tests) import from './services/inference/index.js'. The
+// directory is now the public surface; turn.ts holds runAssistantTurn /
+// runInference / createInferenceRunner while the other inference/*.ts files
+// stay implementation-private.
+
+export {
+  createInferenceRunner,
+  runAssistantTurn,
+  runInference,
+} from './turn.js';
+export type {
+  FramePublisher,
+  InferenceContext,
+  InferenceFrame,
+  StreamResult,
+  TurnArgs,
+} from './turn.js';
+export { detectDoomLoop, DOOM_LOOP_THRESHOLD } from './sentinels.js';
+export { buildMessagesPayload } from './payload.js';
--- a/apps/server/src/services/inference/parts.ts
+++ b/apps/server/src/services/inference/parts.ts
@@ -0,0 +1,95 @@
+import type { Sql } from '../../db.js';
+import type { ToolCall, ToolResult } from '../../types/api.js';
+
+// v1.13.0: dual-write helper. Every site that writes the legacy
+// messages.tool_calls / messages.tool_results JSON columns calls into here
+// to mirror the same data into message_parts rows. Reads still go to the
+// JSON columns; the swap to parts-as-source-of-truth happens in a later
+// v1.13 dispatch alongside the AI SDK streamText migration.
+
+export type PartKind = 'text' | 'tool_call' | 'tool_result' | 'reasoning' | 'step_start';
+
+export interface PartInsert {
+  message_id: string;
+  sequence: number;
+  kind: PartKind;
+  payload: unknown;
+}
+
+export async function insertParts(sql: Sql, parts: PartInsert[]): Promise<void> {
+  if (parts.length === 0) return;
+  // postgres-js fans out an array of objects to a multi-row INSERT. Each
+  // payload field needs sql.json() so jsonb storage receives a JSON value
+  // rather than a quoted string.
+  await sql`
+    INSERT INTO message_parts ${sql(
+      parts.map((p) => ({
+        message_id: p.message_id,
+        sequence: p.sequence,
+        kind: p.kind,
+        payload: sql.json(p.payload as never),
+      })),
+      'message_id',
+      'sequence',
+      'kind',
+      'payload',
+    )}
+  `;
+}
+
+// Derive parts from the canonical messages row for an assistant message.
+// reasoning (when non-empty) becomes a 'reasoning' part at sequence 0 —
+// it precedes user-visible content logically. content (when non-empty)
+// becomes a 'text' part next; each tool_call becomes a 'tool_call' part
+// with payload { id, name, args } where args is the parsed object (we
+// use the in-memory ToolCall shape, not the OpenAI stringified one).
+export function partsFromAssistantMessage(args: {
+  content: string;
+  tool_calls: ToolCall[] | null;
+  // v1.13.1-C: optional reasoning text streamed alongside the answer.
+  // Most rows have none — only models with separate reasoning channels
+  // (qwen3.6 etc.) populate this.
+  reasoning?: string;
+}): Omit<PartInsert, 'message_id'>[] {
+  const out: Omit<PartInsert, 'message_id'>[] = [];
+  let seq = 0;
+  if (args.reasoning && args.reasoning.length > 0) {
+    out.push({ sequence: seq, kind: 'reasoning', payload: { text: args.reasoning } });
+    seq += 1;
+  }
+  if (args.content && args.content.length > 0) {
+    out.push({ sequence: seq, kind: 'text', payload: { text: args.content } });
+    seq += 1;
+  }
+  for (const tc of args.tool_calls ?? []) {
+    out.push({
+      sequence: seq,
+      kind: 'tool_call',
+      payload: { id: tc.id, name: tc.name, args: tc.args },
+    });
+    seq += 1;
+  }
+  return out;
+}
+
+// Derive a single tool_result part from a tool message's tool_results JSON.
+// The payload mirrors the shape buildMessagesPayload reads later:
+// tool_call_id, output, truncated, and error (only when set).
+export function partsFromToolMessage(args: {
+  tool_results: ToolResult | null;
+}): Omit<PartInsert, 'message_id'>[] {
+  if (!args.tool_results) return [];
+  const tr = args.tool_results;
+  return [
+    {
+      sequence: 0,
+      kind: 'tool_result',
+      payload: {
+        tool_call_id: tr.tool_call_id,
+        output: tr.output,
+        truncated: tr.truncated,
+        ...(tr.error ? { error: tr.error } : {}),
+      },
+    },
+  ];
+}
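
Reviewer sketch of the sequencing rule in `partsFromAssistantMessage` above: reasoning first, then text, then one part per tool call. The function body is restated so the example runs on its own; `SketchToolCall`, `SketchPart` and the `list_dir` tool name are assumptions for illustration.

```typescript
type PartKind = 'text' | 'tool_call' | 'tool_result' | 'reasoning' | 'step_start';

// Assumed in-memory shapes; the real ones come from types/api.ts and PartInsert.
interface SketchToolCall { id: string; name: string; args: unknown }
interface SketchPart { sequence: number; kind: PartKind; payload: unknown }

function partsFromAssistantMessage(args: {
  content: string;
  tool_calls: SketchToolCall[] | null;
  reasoning?: string;
}): SketchPart[] {
  const out: SketchPart[] = [];
  let seq = 0;
  if (args.reasoning && args.reasoning.length > 0) {
    out.push({ sequence: seq, kind: 'reasoning', payload: { text: args.reasoning } });
    seq += 1;
  }
  if (args.content && args.content.length > 0) {
    out.push({ sequence: seq, kind: 'text', payload: { text: args.content } });
    seq += 1;
  }
  for (const tc of args.tool_calls ?? []) {
    out.push({ sequence: seq, kind: 'tool_call', payload: { id: tc.id, name: tc.name, args: tc.args } });
    seq += 1;
  }
  return out;
}

const describe = (parts: SketchPart[]): string[] => parts.map((p) => `${p.sequence}:${p.kind}`);

// 'list_dir' is a hypothetical tool name.
const withReasoning = describe(partsFromAssistantMessage({
  content: 'Listing the directory.',
  tool_calls: [{ id: 'call_abc', name: 'list_dir', args: { path: '.' } }],
  reasoning: 'I should look at the files first.',
}));
const textOnly = describe(partsFromAssistantMessage({ content: 'Done.', tool_calls: null }));
console.log(withReasoning, textOnly);
```

The sequence is dense: when reasoning is absent the text part takes sequence 0 rather than leaving a gap.
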
--- a/apps/server/src/services/inference/payload.ts
+++ b/apps/server/src/services/inference/payload.ts
@@ -0,0 +1,226 @@
+import type { FastifyBaseLogger } from 'fastify';
+import type { Sql } from '../../db.js';
+import type {
+  Agent,
+  Message,
+  Project,
+  Session,
+} from '../../types/api.js';
+import * as compaction from '../compaction.js';
+import { buildSystemPromptWithFingerprint } from '../system-prompt.js';
+import { isAnySentinel } from './sentinels.js';
+import { PRUNE_TRIGGER_TOKENS, prune } from './prune.js';
+import type { InferenceContext } from './turn.js';
+
+export interface OpenAiMessage {
+  role: 'system' | 'user' | 'assistant' | 'tool';
+  content: string | null;
+  tool_calls?: Array<{
+    id: string;
+    type: 'function';
+    function: { name: string; arguments: string };
+  }>;
+  tool_call_id?: string;
+  // v1.13.1-C: reasoning text from a prior assistant turn, sourced from
+  // message_parts kind='reasoning' rows joined in via reasoning_parts on
+  // the messages_with_parts view. stream-phase.ts/toModelMessages threads
+  // this into the AI SDK ReasoningPart when forwarding to the model so
+  // reasoning models can resume mid-thought across tool-call boundaries.
+  reasoning?: string;
+}
+
+// v1.12: buildSystemPrompt lives in services/system-prompt.ts. It awaits the
+// container-guidance loader, so this function is async too and every call
+// site in inference.ts awaits the result.
+// v1.13.8: optional log argument. When provided, emit prefix-fingerprint
+// per call + prefix-drift when the same session sees a hash change. Tests
+// omit it and exercise the byte-stability surface directly through
+// buildSystemPromptWithFingerprint. The observer Map in system-prompt.ts
+// updates regardless of whether log is passed.
+export async function buildMessagesPayload(
+  session: Session,
+  project: Project,
+  history: Message[],
+  agent: Agent | null = null,
+  log?: FastifyBaseLogger,
+): Promise<OpenAiMessage[]> {
+  const out: OpenAiMessage[] = [];
+  const { prompt: systemPrompt, fingerprint, drift } =
+    await buildSystemPromptWithFingerprint(project, session, agent);
+  if (log) {
+    log.info(fingerprint);
+    if (drift) log.warn(drift);
+  }
+  out.push({ role: 'system', content: systemPrompt });
+
+  // Find the latest compact marker — only send messages from that point onwards
+  let startIdx = 0;
+  for (let i = history.length - 1; i >= 0; i--) {
+    if (history[i]!.kind === 'compact') {
+      startIdx = i;
+      break;
+    }
+  }
+
+  for (let i = startIdx; i < history.length; i++) {
+    const m = history[i]!;
+    if (m.kind === 'compact') {
+      out.push({ role: 'system', content: m.content });
+      continue;
+    }
+    // v1.8.2 / v1.11.6: cap-hit and doom-loop sentinels are UI-only — never
+    // send them to the LLM. The synthetic instruction note lives only inside
+    // the summary call's messages array and is never persisted, so on a
+    // follow-up turn the model resumes with a clean context.
+    if (isAnySentinel(m)) continue;
+    if (m.role === 'assistant' && m.status === 'streaming') continue;
+    if (m.role === 'assistant' && m.status === 'cancelled') continue;
+    // v1.13.7: skip failed assistant turns. A failed row carries no usable
+    // content for the model, and leaving it in the payload alongside any
+    // following assistant message produces "Cannot have 2 or more assistant
+    // messages at the end of the list" from the OpenAI-compatible upstream.
+    if (m.role === 'assistant' && m.status === 'failed') continue;
+    // v1.13.7: skip "empty" completed assistants: blank content, no tool_calls.
+    // These can land when an upstream stream returns finishReason='stop' with
+    // no text/tool output (network blip, rate limit recovery, model quirk).
+    // Same risk as the failed-status case: a trailing empty assistant plus
+    // the next attempt's assistant placeholder = two trailing assistants and
+    // the API rejects the whole payload.
+    if (
+      m.role === 'assistant' &&
+      m.status === 'complete' &&
+      (m.content == null || m.content.trim().length === 0) &&
+      (m.tool_calls == null || m.tool_calls.length === 0)
+    ) {
+      continue;
+    }
+    if (m.role === 'tool') {
+      const tr = m.tool_results;
+      if (!tr) continue;
+      const outputText = tr.error
+        ? `error: ${tr.error}`
+        : typeof tr.output === 'string'
+          ? tr.output
+          : JSON.stringify(tr.output);
+      out.push({
+        role: 'tool',
+        content: outputText,
+        tool_call_id: tr.tool_call_id,
+      });
+      continue;
+    }
+    if (m.role === 'assistant') {
+      const msg: OpenAiMessage = {
+        role: 'assistant',
+        content: m.content && m.content.length > 0 ? m.content : null,
+      };
+      if (m.tool_calls && m.tool_calls.length > 0) {
+        msg.tool_calls = m.tool_calls.map((tc) => ({
+          id: tc.id,
+          type: 'function' as const,
+          function: { name: tc.name, arguments: JSON.stringify(tc.args) },
+        }));
+      }
+      // v1.13.1-C: collapse reasoning_parts into a single string. The view
+      // returns them ordered by sequence; multiple reasoning parts on one
+      // message are rare but concat preserves ordering. Skip when absent.
+      if (m.reasoning_parts && m.reasoning_parts.length > 0) {
+        msg.reasoning = m.reasoning_parts.map((p) => p.text ?? '').join('');
+      }
+      out.push(msg);
+      continue;
+    }
+    out.push({ role: 'user', content: m.content });
+  }
+  return out;
+}
+
+export async function loadContext(
+  sql: Sql,
+  sessionId: string,
+  chatId: string
+): Promise<{ session: Session; project: Project; history: Message[] } | null> {
+  const sessionRows = await sql<Session[]>`
+    SELECT id, project_id, name, model, system_prompt, status, created_at, updated_at,
+           agent_id, web_search_enabled
+    FROM sessions WHERE id = ${sessionId}
+  `;
+  if (sessionRows.length === 0) return null;
+  const session = sessionRows[0]!;
+
+  const projectRows = await sql<Project[]>`
+    SELECT id, name, path, added_at, last_session_id, status, gitea_remote,
+           default_system_prompt, default_web_search_enabled
+    FROM projects WHERE id = ${session.project_id}
+  `;
+  if (projectRows.length === 0) return null;
+  const project = projectRows[0]!;
+
+  // v1.11: filter compacted messages out of the inference assembly. The GET
+  // /api/sessions/:id/messages endpoint still returns everything (so the UI
+  // can show history with the summary card inline); only LLM payloads skip
+  // compacted rows. compacted_at IS NULL keeps the active summary + tail.
+  // v1.13.1-B: reads tool_calls/tool_results via the parts-merged view.
+  // v1.13.1-C: also pull reasoning_parts so assistant messages from
+  // reasoning models can be replayed with their reasoning context preserved.
+  const history = await sql<Message[]>`
+    SELECT id, session_id, chat_id, role, content, kind, tool_calls, tool_results, status, last_seq,
+           tokens_used, ctx_used, ctx_max, started_at, finished_at, created_at, metadata,
+           reasoning_parts
+    FROM messages_with_parts
+    WHERE chat_id = ${chatId} AND compacted_at IS NULL
+    ORDER BY created_at ASC, id ASC
+  `;
+
+  return { session, project, history };
+}
+
+// v1.11: shared helper used after both finalizeCompletion and executeToolPhase
+// persist their token counts. Reads tokens off the just-UPDATEd row (which
+// the caller returns from RETURNING), runs compaction.isOverflow, and flips
+// chats.needs_compaction. The next runAssistantTurn invocation acts on it.
+// Silent on missing tokens — llama-swap occasionally omits usage on truncated
+// streams, and we'd rather miss one overflow than crash the inference path.
+export async function maybeFlagForCompaction(
+  ctx: InferenceContext,
+  chatId: string,
+  updated: { tokens_used: number | null; ctx_used: number | null; ctx_max: number | null } | undefined,
+): Promise<void> {
+  if (!updated) return;
+  const promptTokens = updated.ctx_used;
+  const completionTokens = updated.tokens_used;
+  const contextLimit = updated.ctx_max;
+  if (typeof promptTokens !== 'number') return;
+  if (typeof completionTokens !== 'number') return;
+  if (typeof contextLimit !== 'number') return;
+  const overflow = compaction.isOverflow(
+    { prompt_tokens: promptTokens, completion_tokens: completionTokens },
+    contextLimit,
+  );
+  if (!overflow) return;
+
+  // v1.13.4: try the cheap prune first. If it freed at least
+  // PRUNE_TRIGGER_TOKENS (20k) worth of context, we're below the threshold
+  // again — skip flagging summarize for the next turn. The next turn's
+  // overflow check will re-evaluate from scratch.
+  // v1.13.9: the overflow trigger above is now 85% of ctx_max (was
+  // ctx_max - 20k). PRUNE_TRIGGER_TOKENS stays at 20k as the prune-freed
+  // threshold — independent of the overflow formula.
+  // Prune failures (DB errors etc.) propagate so the surrounding inference
+  // path sees them; the catch in finalizeCompletion / executeToolPhase
+  // doesn't shield this — by design, we want to know if prune is broken.
+  const pruned = await prune({ sql: ctx.sql, chatId });
+  if (pruned.hidden > 0) {
+    ctx.log.info(
+      { chatId, hidden: pruned.hidden, freedTokens: pruned.freedTokens },
+      'inference: prune freed context budget',
+    );
+  }
+  if (pruned.freedTokens >= PRUNE_TRIGGER_TOKENS) {
+    // Prune handled it; skip the (expensive) summarize path.
+    return;
+  }
+
+  await ctx.sql`UPDATE chats SET needs_compaction = true WHERE id = ${chatId}`;
+  ctx.log.info({ chatId, promptTokens, completionTokens, contextLimit }, 'inference: flagged for compaction');
+}
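
Reviewer sketch of the assistant-row skip rules in `buildMessagesPayload` above, pulled into one predicate. `isSkippedAssistant` and `SketchMessage` are hypothetical names; sentinel rows and tool rows are handled by separate branches in the patch and are deliberately left out here.

```typescript
// Assumed minimal row shape; the real Message type has many more columns.
interface SketchMessage {
  role: 'user' | 'assistant' | 'tool';
  status: 'streaming' | 'complete' | 'cancelled' | 'failed';
  content: string | null;
  tool_calls: Array<{ id: string }> | null;
}

// Hypothetical helper: true when an assistant row must not be replayed to the model.
function isSkippedAssistant(m: SketchMessage): boolean {
  if (m.role !== 'assistant') return false;
  if (m.status === 'streaming' || m.status === 'cancelled' || m.status === 'failed') {
    return true;
  }
  // Remaining status is 'complete': skip only when there is nothing to replay.
  const blank = m.content == null || m.content.trim().length === 0;
  const noTools = m.tool_calls == null || m.tool_calls.length === 0;
  return blank && noTools;
}

const history: SketchMessage[] = [
  { role: 'user', status: 'complete', content: 'hi', tool_calls: null },
  { role: 'assistant', status: 'failed', content: 'partial', tool_calls: null },
  { role: 'assistant', status: 'complete', content: '   ', tool_calls: null },
  { role: 'assistant', status: 'complete', content: '', tool_calls: [{ id: 'call_1' }] },
  { role: 'assistant', status: 'complete', content: 'answer', tool_calls: null },
];

const keptIndexes = history
  .map((m, i) => (isSkippedAssistant(m) ? -1 : i))
  .filter((i) => i >= 0);
console.log(keptIndexes);
```

A completed assistant row with blank text but at least one tool call is kept, since the tool call itself is the content the model needs to see on replay.
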
--- a/apps/server/src/services/inference/provider.ts
+++ b/apps/server/src/services/inference/provider.ts
@@ -0,0 +1,34 @@
+import { createOpenAICompatible } from '@ai-sdk/openai-compatible';
+import type { LanguageModel } from 'ai';
+
+// v1.13.1-A: AI SDK provider against llama-swap. baseURL is threaded from
+// config.LLAMA_SWAP_URL at call time (not module-load) so tests can stub the
+// upstream without touching env vars. No apiKey — llama-swap is unauth in our
+// Tailscale topology and exposing it over the public internet is gated by
+// Authelia at the Caddy layer, not by API keys.
+
+const cache = new Map<string, ReturnType<typeof createOpenAICompatible>>();
+
+function getProvider(baseURL: string): ReturnType<typeof createOpenAICompatible> {
+  let provider = cache.get(baseURL);
+  if (!provider) {
+    provider = createOpenAICompatible({
+      name: 'llama-swap',
+      baseURL: baseURL.endsWith('/v1') ? baseURL : `${baseURL}/v1`,
+      // v1.13.7: @ai-sdk/openai-compatible defaults includeUsage=false, which
+      // omits `stream_options.include_usage` from the request body. Without
+      // it, llama.cpp / llama-swap never emits the trailing usage block, so
+      // `result.usage` resolves with inputTokens=outputTokens=undefined and
+      // tokens_used / ctx_used land as NULL in every messages row. Setting
+      // true here re-enables the per-stream usage payload across all models
+      // served via the llama-swap provider.
+      includeUsage: true,
+    });
+    cache.set(baseURL, provider);
+  }
+  return provider;
+}
+
+export function upstreamModel(baseURL: string, modelId: string): LanguageModel {
+  return getProvider(baseURL).chatModel(modelId);
+}
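
Reviewer sketch of the base-URL handling in `getProvider` above, without the AI SDK dependency. `normalizeBaseURL` and `getProviderConfig` are hypothetical names and the host is a placeholder; the point is only to show how the `/v1` suffix check and the cache key interact.

```typescript
// Same suffix rule as the patch: append /v1 unless it is already there.
function normalizeBaseURL(baseURL: string): string {
  return baseURL.endsWith('/v1') ? baseURL : `${baseURL}/v1`;
}

// Stand-in for the provider object; the cache is keyed by the raw input, as in the patch.
const cache = new Map<string, { baseURL: string }>();

function getProviderConfig(baseURL: string): { baseURL: string } {
  let provider = cache.get(baseURL);
  if (!provider) {
    provider = { baseURL: normalizeBaseURL(baseURL) };
    cache.set(baseURL, provider);
  }
  return provider;
}

const bare = getProviderConfig('http://localhost:8080');
const suffixed = getProviderConfig('http://localhost:8080/v1');
const again = getProviderConfig('http://localhost:8080');
const trailingSlash = normalizeBaseURL('http://localhost:8080/');
console.log(bare.baseURL, suffixed.baseURL, again === bare, trailingSlash);
```

Two things fall out of this sketch: the bare and `/v1` spellings resolve to the same URL but occupy separate cache entries, and an input with a trailing slash ends up with a doubled slash before `v1`. Whether either matters depends on how `LLAMA_SWAP_URL` is configured in practice.
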
--- a/apps/server/src/services/inference/prune.ts
+++ b/apps/server/src/services/inference/prune.ts
@@ -0,0 +1,127 @@
+import type { Sql } from '../../db.js';
+
+// v1.13.4: two-tier compaction prune. Opencode's prune half (the cheap one);
+// summarize half shipped in v1.11.0 as services/compaction.ts.
+//
+// Algorithm: scan tool_result parts newest-first. Protect the last
+// PROTECTED_TOKENS of content (the model recently saw these — pruning them
+// kills coherence). Older parts are candidates. Mark them hidden_at only
+// if the candidate pool would free at least PRUNE_TRIGGER_TOKENS — pruning
+// 3 small tool_results to recover 500 tokens isn't worth the loss of
+// fidelity for the model's next turn.
+//
+// Stops at the last compaction summary boundary (chats.tail_start_id). The
+// v1.11.0 summary already encodes everything before that point; pruning
+// across the boundary would double-erase.
+
+export const PROTECTED_TOKENS = 40_000;
+export const PRUNE_TRIGGER_TOKENS = 20_000;
+
+// Rough char-to-token estimate. Same heuristic compaction's usable() uses
+// implicitly via the buffer constant.
+function estimateTokens(text: string): number {
+  return Math.ceil(text.length / 4);
+}
+
+function payloadTokens(payload: unknown): number {
+  return estimateTokens(JSON.stringify(payload ?? ''));
+}
+
+export interface PruneResult {
+  hidden: number;
+  freedTokens: number;
+}
+
+// Pure algorithmic core, exported for unit-test access. Takes parts already
+// ordered newest-first, plus an optional cutoff (last compaction summary
+// boundary). Returns the part ids to hide and the total token estimate of
+// the candidates. Caller does the DB UPDATE.
+export interface PartForPrune {
+  id: string;
+  payload: unknown;
+  created_at: Date;
+}
+
+export function selectPruneTargets(
+  partsNewestFirst: ReadonlyArray<PartForPrune>,
+  tailStartCreatedAt: Date | null,
+): { ids: string[]; freedTokens: number } {
+  let protectedTokens = 0;
+  const candidates: { id: string; tokens: number }[] = [];
+  let crossedProtection = false;
+
+  for (const part of partsNewestFirst) {
+    if (tailStartCreatedAt && part.created_at < tailStartCreatedAt) {
+      // Past the last summary boundary; the v1.11.0 anchored summary already
+      // covers everything older. Bail rather than double-erase.
+      break;
+    }
+    const tokens = payloadTokens(part.payload);
+    if (!crossedProtection) {
+      protectedTokens += tokens;
+      if (protectedTokens >= PROTECTED_TOKENS) {
+        crossedProtection = true;
+      }
+      continue;
+    }
+    candidates.push({ id: part.id, tokens });
+  }
+
+  const candidateTokens = candidates.reduce((s, c) => s + c.tokens, 0);
+  if (candidates.length === 0 || candidateTokens < PRUNE_TRIGGER_TOKENS) {
+    return { ids: [], freedTokens: 0 };
+  }
+  return { ids: candidates.map((c) => c.id), freedTokens: candidateTokens };
+}
+
+export async function prune(args: {
+  sql: Sql;
+  chatId: string;
+}): Promise<PruneResult> {
+  const { sql, chatId } = args;
+
+  // Newest-first scan of visible tool_result parts in this chat. Pull
+  // chats.tail_start_id alongside so we know where the last summary boundary
+  // sits (don't prune across it).
+  const parts = await sql<{
+    id: string;
+    payload: unknown;
+    created_at: Date;
+    tail_start_id: string | null;
+  }[]>`
+    SELECT p.id, p.payload, m.created_at,
+      (SELECT c.tail_start_id FROM chats c WHERE c.id = ${chatId}) AS tail_start_id
+    FROM message_parts p
+    JOIN messages m ON m.id = p.message_id
+    WHERE m.chat_id = ${chatId}
+      AND p.kind = 'tool_result'
+      AND p.hidden_at IS NULL
+    ORDER BY m.created_at DESC, p.sequence DESC
+  `;
+
+  if (parts.length === 0) {
+    return { hidden: 0, freedTokens: 0 };
+  }
+
+  // Read the boundary cutoff timestamp once. Older messages are off-limits.
+  let tailStartCreatedAt: Date | null = null;
+  const firstTailId = parts[0]?.tail_start_id ?? null;
+  if (firstTailId) {
+    const tailRow = await sql<{ created_at: Date }[]>`
+      SELECT created_at FROM messages WHERE id = ${firstTailId}
+    `;
+    tailStartCreatedAt = tailRow[0]?.created_at ?? null;
+  }
+
+  const decision = selectPruneTargets(parts, tailStartCreatedAt);
+  if (decision.ids.length === 0) {
+    return { hidden: 0, freedTokens: 0 };
+  }
+
+  await sql`
+    UPDATE message_parts
+    SET hidden_at = clock_timestamp()
+    WHERE id = ANY(${decision.ids})
+  `;
+  return { hidden: decision.ids.length, freedTokens: decision.freedTokens };
+}
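
Reviewer sketch working `selectPruneTargets` through three inputs. The function is restated from the patch so the example is self-contained; `payloadOf` is a hypothetical helper that builds a string whose token estimate is exact (JSON.stringify adds two quote characters, hence the `- 2`), and the part ids and timestamps are made up.

```typescript
const PROTECTED_TOKENS = 40_000;
const PRUNE_TRIGGER_TOKENS = 20_000;

interface PartForPrune { id: string; payload: unknown; created_at: Date }

function payloadTokens(payload: unknown): number {
  return Math.ceil(JSON.stringify(payload ?? '').length / 4);
}

function selectPruneTargets(
  partsNewestFirst: ReadonlyArray<PartForPrune>,
  tailStartCreatedAt: Date | null,
): { ids: string[]; freedTokens: number } {
  let protectedTokens = 0;
  let crossedProtection = false;
  const candidates: { id: string; tokens: number }[] = [];
  for (const part of partsNewestFirst) {
    // Stop at the last summary boundary.
    if (tailStartCreatedAt && part.created_at < tailStartCreatedAt) break;
    const tokens = payloadTokens(part.payload);
    if (!crossedProtection) {
      // The part that crosses the threshold is itself still protected.
      protectedTokens += tokens;
      if (protectedTokens >= PROTECTED_TOKENS) crossedProtection = true;
      continue;
    }
    candidates.push({ id: part.id, tokens });
  }
  const candidateTokens = candidates.reduce((s, c) => s + c.tokens, 0);
  if (candidates.length === 0 || candidateTokens < PRUNE_TRIGGER_TOKENS) {
    return { ids: [], freedTokens: 0 };
  }
  return { ids: candidates.map((c) => c.id), freedTokens: candidateTokens };
}

// Hypothetical helper: string payload estimated at exactly `tokens` tokens.
const payloadOf = (tokens: number): string => 'x'.repeat(tokens * 4 - 2);

const newest: PartForPrune = { id: 'p3', payload: payloadOf(40_000), created_at: new Date(3000) };
const middle: PartForPrune = { id: 'p2', payload: payloadOf(12_000), created_at: new Date(2000) };
const oldest: PartForPrune = { id: 'p1', payload: payloadOf(12_000), created_at: new Date(1000) };

const pruned = selectPruneTargets([newest, middle, oldest], null);
const tooSmall = selectPruneTargets([newest, middle], null);
const bounded = selectPruneTargets([newest, middle, oldest], new Date(1500));
console.log(pruned, tooSmall, bounded);
```

In the first call the newest part fills the 40k protected window and the two older parts (24k together) clear the 20k trigger, so both are selected. In the second call the single 12k candidate falls short of the trigger, and in the third the summary boundary cuts off the oldest part, leaving the same 12k shortfall; both return an empty selection.
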
--- a/apps/server/src/services/inference/sentinel-summaries.ts
+++ b/apps/server/src/services/inference/sentinel-summaries.ts
@@ -0,0 +1,523 @@
+import type {
+  Agent,
+  Message,
+  MessageMetadata,
+  Project,
+  Session,
+} from '../../types/api.js';
+import * as modelContext from '../model-context.js';
+import { buildMessagesPayload } from './payload.js';
+import { DOOM_LOOP_THRESHOLD } from './sentinels.js';
+import { streamCompletion } from './stream-phase.js';
+import { DB_FLUSH_INTERVAL_MS } from './types.js';
+import type {
+  InferenceContext,
+  StreamResult,
+  TurnArgs,
+} from './turn.js';
+
+// Synthetic system note appended to the cap-hit summary call. Verbatim from
+// the v1.8.2 spec — do not paraphrase: the model is more reliable when the
+// instruction is short, declarative, and identical across calls.
+const CAP_HIT_SUMMARY_NOTE = (limit: number) =>
+  `You've reached the tool budget (${limit} calls). Produce the best answer you can with what you have. Do not call more tools.`;
+
+const DOOM_LOOP_NOTE = (name: string) =>
+  `You called ${name} with the same arguments ${DOOM_LOOP_THRESHOLD} times in a row. Stop calling it. Produce the best answer you can with what you have.`;
+
+export async function runCapHitSummary(
+  ctx: InferenceContext,
+  args: TurnArgs,
+  session: Session,
+  project: Project,
+  history: Message[],
+  agent: Agent | null,
+  budget: number,
+): Promise<void> {
+  const { sessionId, chatId, assistantMessageId, signal } = args;
+
+  const messages = await buildMessagesPayload(session, project, history, agent, ctx.log);
+  messages.push({ role: 'system', content: CAP_HIT_SUMMARY_NOTE(budget) });
+
+  const startedRow = await ctx.sql<{ started_at: string }[]>`
+    UPDATE messages
+    SET started_at = clock_timestamp()
+    WHERE id = ${assistantMessageId}
+    RETURNING started_at
+  `;
+  const startedAt = startedRow[0]?.started_at ?? null;
+
+  ctx.publish(sessionId, {
+    type: 'message_started',
+    message_id: assistantMessageId,
+    chat_id: chatId,
+    role: 'assistant',
+  });
+
+  let accumulated = '';
+  let pendingFlushTimer: NodeJS.Timeout | null = null;
+  let flushPromise: Promise<unknown> = Promise.resolve();
+  const flushNow = () => {
+    if (pendingFlushTimer) {
+      clearTimeout(pendingFlushTimer);
+      pendingFlushTimer = null;
+    }
+    const snapshot = accumulated;
+    flushPromise = flushPromise.then(() =>
+      ctx.sql`UPDATE messages SET content = ${snapshot} WHERE id = ${assistantMessageId}`
+    );
+  };
+  const scheduleFlush = () => {
+    if (pendingFlushTimer) return;
+    pendingFlushTimer = setTimeout(() => {
+      pendingFlushTimer = null;
+      flushNow();
+    }, DB_FLUSH_INTERVAL_MS);
+  };
+
+  let summaryOk = false;
+  let summarySoftCancelled = false;
+  let summaryError: string | null = null;
+  let result: StreamResult | null = null;
+  try {
+    result = await streamCompletion(
+      ctx,
+      session.model,
+      messages,
+      { tools: null, temperature: agent?.temperature },
+      (delta) => {
+        accumulated += delta;
+        ctx.publish(sessionId, {
+          type: 'delta',
+          message_id: assistantMessageId,
+          chat_id: chatId,
+          content: delta,
+        });
+        scheduleFlush();
+      },
+      undefined,
+      signal,
+    );
+    summaryOk = true;
+  } catch (err) {
+    if (err instanceof Error && err.name === 'AbortError') {
+      summarySoftCancelled = true;
+    } else {
+      summaryError = err instanceof Error ? err.message : String(err);
+    }
+  } finally {
+    if (pendingFlushTimer) {
+      clearTimeout(pendingFlushTimer);
+      pendingFlushTimer = null;
+    }
+    await flushPromise;
+  }
+
+  // Finalize the summary message based on the three outcomes. The sentinel
+  // is inserted regardless so the user always has the Continue affordance —
+  // even on a partial / failed summary the chat history shows where the
+  // budget was hit.
+  if (summaryOk && result) {
+    // v1.11.3: see executeToolPhase for the rationale.
+    const mctx = await modelContext.getModelContext(session.model);
+    const nCtx = mctx?.n_ctx ?? null;
+    const [updated] = await ctx.sql<
+      { tokens_used: number | null; ctx_used: number | null; ctx_max: number | null; finished_at: string | null }[]
+    >`
+      UPDATE messages
+      SET content = ${result.content},
+          status = 'complete',
+          tokens_used = ${result.completionTokens},
+          ctx_used = ${result.promptTokens},
+          ctx_max = ${nCtx},
+          finished_at = clock_timestamp()
+      WHERE id = ${assistantMessageId}
+      RETURNING tokens_used, ctx_used, ctx_max, finished_at
+    `;
+    ctx.publish(sessionId, {
+      type: 'message_complete',
+      message_id: assistantMessageId,
+      chat_id: chatId,
+      tokens_used: updated?.tokens_used ?? null,
+      ctx_used: updated?.ctx_used ?? null,
+      ctx_max: updated?.ctx_max ?? null,
+      started_at: startedAt,
+      finished_at: updated?.finished_at ?? null,
+      model: session.model,
+    });
+  } else if (summarySoftCancelled) {
+    await ctx.sql`
+      UPDATE messages
+      SET content = ${accumulated},
+          status = 'cancelled',
+          finished_at = clock_timestamp()
+      WHERE id = ${assistantMessageId}
+    `;
+    ctx.publish(sessionId, {
+      type: 'message_complete',
+      message_id: assistantMessageId,
+      chat_id: chatId,
+    });
+  } else {
+    const errMeta: MessageMetadata = {
+      kind: 'error',
+      error_reason: 'summary_after_cap_failed',
+      error_text: summaryError ?? 'summary failed',
+    };
+    await ctx.sql`
+      UPDATE messages
+      SET content = ${accumulated},
+          status = 'failed',
+          finished_at = clock_timestamp(),
+          metadata = ${ctx.sql.json(errMeta as never)}
+      WHERE id = ${assistantMessageId}
+    `;
+    ctx.publish(sessionId, {
+      type: 'error',
+      message_id: assistantMessageId,
+      chat_id: chatId,
+      error: summaryError ?? 'summary failed',
+      reason: 'summary_after_cap_failed',
+    });
+  }
+
+  // Bump session/chat updated_at exactly once for this turn.
+  const [sessRow] = await ctx.sql<{ project_id: string; name: string; updated_at: string }[]>`
+    UPDATE sessions SET updated_at = clock_timestamp()
+    WHERE id = ${sessionId}
+    RETURNING project_id, name, updated_at
+  `;
+  ctx.publishUser({
+    type: 'session_updated',
+    session_id: sessionId,
+    project_id: sessRow!.project_id,
+    name: sessRow!.name,
+    updated_at: sessRow!.updated_at,
+  });
+
+  await insertCapHitSentinel(ctx, sessionId, chatId, agent, budget);
+
+  // Status frame fires last so the dot color reflects the terminal state.
+  // Success → idle, abort → idle (user-driven stop), error → error+reason.
+  if (summaryOk) {
+    ctx.publishUser({ type: 'chat_status', chat_id: chatId, status: 'idle', at: new Date().toISOString() });
+  } else if (summarySoftCancelled) {
+    ctx.publishUser({ type: 'chat_status', chat_id: chatId, status: 'idle', at: new Date().toISOString() });
+  } else {
+    ctx.publishUser({
+      type: 'chat_status',
+      chat_id: chatId,
+      status: 'error',
+      at: new Date().toISOString(),
+      reason: 'summary_after_cap_failed',
+    });
+  }
+
+  ctx.log.info(
+    { sessionId, chatId, assistantMessageId, budget, summaryOk, summaryCancelled: summarySoftCancelled },
+    'inference cap-hit summary finished',
+  );
+}
+
+async function insertCapHitSentinel(
+  ctx: InferenceContext,
+  sessionId: string,
+  chatId: string,
+  agent: Agent | null,
+  budget: number,
+): Promise<void> {
+  // Hard ceiling: count prior cap_hit sentinels in this chat. After two
+  // continues (sentinel count of 2), the next sentinel reports can_continue
+  // false and the UI disables the Continue button.
+  const priorRows = await ctx.sql<{ count: number }[]>`
+    SELECT COUNT(*)::int AS count
+    FROM messages
+    WHERE chat_id = ${chatId}
+      AND role = 'system'
+      AND metadata->>'kind' = 'cap_hit'
+  `;
+  const priorCount = priorRows[0]?.count ?? 0;
+  const canContinue = priorCount < 2;
+  const metadata: MessageMetadata = {
+    kind: 'cap_hit',
+    used: budget,
+    limit: budget,
+    agent_name: agent?.name ?? null,
+    can_continue: canContinue,
+  };
+  const content = `Reached tool budget (${budget}/${budget}). Continue to extend.`;
+
+  const [row] = await ctx.sql<{ id: string }[]>`
+    INSERT INTO messages (session_id, chat_id, role, content, status, created_at, metadata)
+    VALUES (${sessionId}, ${chatId}, 'system', ${content}, 'complete', clock_timestamp(), ${ctx.sql.json(metadata as never)})
+    RETURNING id
+  `;
+
+  // The sentinel content is static, but we still walk the standard frame
+  // sequence (started → delta → complete) so useSessionStream's reducer
+  // appends it via the same path it uses for streaming assistant messages.
+  // The delta carries the full text in one chunk.
+  ctx.publish(sessionId, {
+    type: 'message_started',
+    message_id: row!.id,
+    chat_id: chatId,
+    role: 'system',
+  });
+  ctx.publish(sessionId, {
+    type: 'delta',
+    message_id: row!.id,
+    chat_id: chatId,
+    content,
+  });
+  ctx.publish(sessionId, {
+    type: 'message_complete',
+    message_id: row!.id,
+    chat_id: chatId,
+    metadata,
+  });
+}
+
+// v1.11.6: doom-loop wrap-up. Mirrors runCapHitSummary structurally — same
+// in-flight-slot reuse, same tools-disabled streaming-summary call, same
+// post-finalize sentinel insert + chat_status drop. Differences:
+//   - synthetic note text comes from DOOM_LOOP_NOTE (names the looping tool)
+//   - sentinel metadata is { kind: 'doom_loop', tool_name, args, threshold }
+//     and has no Continue affordance (manual retry would just re-loop)
+//   - error path reuses reason: 'summary_after_cap_failed' (rationale below)
+// Kept as a clone rather than refactored into a shared helper because the
+// two summary paths still differ in note text + sentinel shape; a third
+// sentinel would justify factoring out runWrapUpSummary(opts).
+export async function runDoomLoopSummary(
+  ctx: InferenceContext,
+  args: TurnArgs,
+  session: Session,
+  project: Project,
+  history: Message[],
+  agent: Agent | null,
+  loop: { name: string; args: Record<string, unknown> },
+): Promise<void> {
+  const { sessionId, chatId, assistantMessageId, signal } = args;
+
+  const messages = await buildMessagesPayload(session, project, history, agent, ctx.log);
+  messages.push({ role: 'system', content: DOOM_LOOP_NOTE(loop.name) });
+
+  const startedRow = await ctx.sql<{ started_at: string }[]>`
+    UPDATE messages
+    SET started_at = clock_timestamp()
+    WHERE id = ${assistantMessageId}
+    RETURNING started_at
+  `;
+  const startedAt = startedRow[0]?.started_at ?? null;
+
+  ctx.publish(sessionId, {
+    type: 'message_started',
+    message_id: assistantMessageId,
+    chat_id: chatId,
+    role: 'assistant',
+  });
+
+  let accumulated = '';
+  let pendingFlushTimer: NodeJS.Timeout | null = null;
+  let flushPromise: Promise<unknown> = Promise.resolve();
+  const flushNow = () => {
+    if (pendingFlushTimer) {
+      clearTimeout(pendingFlushTimer);
+      pendingFlushTimer = null;
+    }
+    const snapshot = accumulated;
+    flushPromise = flushPromise.then(() =>
+      ctx.sql`UPDATE messages SET content = ${snapshot} WHERE id = ${assistantMessageId}`
+    );
+  };
+  const scheduleFlush = () => {
+    if (pendingFlushTimer) return;
+    pendingFlushTimer = setTimeout(() => {
+      pendingFlushTimer = null;
+      flushNow();
+    }, DB_FLUSH_INTERVAL_MS);
+  };
+
+  let summaryOk = false;
+  let summarySoftCancelled = false;
+  let summaryError: string | null = null;
+  let result: StreamResult | null = null;
+  try {
+    result = await streamCompletion(
+      ctx,
+      session.model,
+      messages,
+      { tools: null, temperature: agent?.temperature },
+      (delta) => {
+        accumulated += delta;
+        ctx.publish(sessionId, {
+          type: 'delta',
+          message_id: assistantMessageId,
+          chat_id: chatId,
+          content: delta,
+        });
+        scheduleFlush();
+      },
+      undefined,
+      signal,
+    );
+    summaryOk = true;
+  } catch (err) {
+    if (err instanceof Error && err.name === 'AbortError') {
+      summarySoftCancelled = true;
+    } else {
+      summaryError = err instanceof Error ? err.message : String(err);
+    }
+  } finally {
+    if (pendingFlushTimer) {
+      clearTimeout(pendingFlushTimer);
+      pendingFlushTimer = null;
+    }
+    await flushPromise;
+  }
+
+  if (summaryOk && result) {
+    const mctx = await modelContext.getModelContext(session.model);
+    const nCtx = mctx?.n_ctx ?? null;
+    const [updated] = await ctx.sql<
+      { tokens_used: number | null; ctx_used: number | null; ctx_max: number | null; finished_at: string | null }[]
+    >`
+      UPDATE messages
+      SET content = ${result.content},
+          status = 'complete',
+          tokens_used = ${result.completionTokens},
+          ctx_used = ${result.promptTokens},
+          ctx_max = ${nCtx},
+          finished_at = clock_timestamp()
+      WHERE id = ${assistantMessageId}
+      RETURNING tokens_used, ctx_used, ctx_max, finished_at
+    `;
+    ctx.publish(sessionId, {
+      type: 'message_complete',
+      message_id: assistantMessageId,
+      chat_id: chatId,
+      tokens_used: updated?.tokens_used ?? null,
+      ctx_used: updated?.ctx_used ?? null,
+      ctx_max: updated?.ctx_max ?? null,
+      started_at: startedAt,
+      finished_at: updated?.finished_at ?? null,
+      model: session.model,
+    });
+  } else if (summarySoftCancelled) {
+    await ctx.sql`
+      UPDATE messages
+      SET content = ${accumulated},
+          status = 'cancelled',
+          finished_at = clock_timestamp()
+      WHERE id = ${assistantMessageId}
+    `;
+    ctx.publish(sessionId, {
+      type: 'message_complete',
+      message_id: assistantMessageId,
+      chat_id: chatId,
+    });
+  } else {
+    // Doom-loop summary failure reuses the existing summary_after_cap_failed
+    // error reason — the ErrorReason union is shared between sentinel paths
+    // and the UI surfaces a generic "summary failed" line for both. We don't
+    // add a new reason code because the user-visible failure mode is the
+    // same (model gave up mid-summary). Sentinel below still fires.
+    const errMeta: MessageMetadata = {
+      kind: 'error',
+      error_reason: 'summary_after_cap_failed',
+      error_text: summaryError ?? 'doom-loop summary failed',
+    };
+    await ctx.sql`
+      UPDATE messages
+      SET content = ${accumulated},
+          status = 'failed',
+          finished_at = clock_timestamp(),
+          metadata = ${ctx.sql.json(errMeta as never)}
+      WHERE id = ${assistantMessageId}
+    `;
+    ctx.publish(sessionId, {
+      type: 'error',
+      message_id: assistantMessageId,
+      chat_id: chatId,
+      error: summaryError ?? 'doom-loop summary failed',
+      reason: 'summary_after_cap_failed',
+    });
+  }
+
+  const [sessRow] = await ctx.sql<{ project_id: string; name: string; updated_at: string }[]>`
+    UPDATE sessions SET updated_at = clock_timestamp()
+    WHERE id = ${sessionId}
+    RETURNING project_id, name, updated_at
+  `;
+  ctx.publishUser({
+    type: 'session_updated',
+    session_id: sessionId,
+    project_id: sessRow!.project_id,
+    name: sessRow!.name,
+    updated_at: sessRow!.updated_at,
+  });
+
+  await insertDoomLoopSentinel(ctx, sessionId, chatId, loop);
+
+  if (summaryOk || summarySoftCancelled) {
+    ctx.publishUser({ type: 'chat_status', chat_id: chatId, status: 'idle', at: new Date().toISOString() });
+  } else {
+    ctx.publishUser({
+      type: 'chat_status',
+      chat_id: chatId,
+      status: 'error',
+      at: new Date().toISOString(),
+      reason: 'summary_after_cap_failed',
+    });
+  }
+
+  ctx.log.info(
+    { sessionId, chatId, assistantMessageId, loopedTool: loop.name, summaryOk, summaryCancelled: summarySoftCancelled },
+    'inference doom-loop summary finished',
+  );
+}
+
+async function insertDoomLoopSentinel(
+  ctx: InferenceContext,
+  sessionId: string,
+  chatId: string,
+  loop: { name: string; args: Record<string, unknown> },
+): Promise<void> {
+  // No hard-ceiling / can-continue logic here — doom-loop is a different
+  // failure mode from cap-hit. Continuing would re-trigger the loop with
+  // the same tools available; the user needs to restate their question
+  // or switch agents instead.
+  const metadata: MessageMetadata = {
+    kind: 'doom_loop',
+    tool_name: loop.name,
+    args: loop.args,
+    threshold: DOOM_LOOP_THRESHOLD,
+  };
+  const content = `Detected ${DOOM_LOOP_THRESHOLD} identical calls to ${loop.name}. Stopping the tool-call loop. Produce the best answer you can with what you have.`;
+
+  const [row] = await ctx.sql<{ id: string }[]>`
+    INSERT INTO messages (session_id, chat_id, role, content, status, created_at, metadata)
+    VALUES (${sessionId}, ${chatId}, 'system', ${content}, 'complete', clock_timestamp(), ${ctx.sql.json(metadata as never)})
+    RETURNING id
+  `;
+
+  // Standard frame sequence — same as cap-hit sentinel — so
+  // useSessionStream's reducer appends the row via the existing path.
+  ctx.publish(sessionId, {
+    type: 'message_started',
+    message_id: row!.id,
+    chat_id: chatId,
+    role: 'system',
+  });
+  ctx.publish(sessionId, {
+    type: 'delta',
+    message_id: row!.id,
+    chat_id: chatId,
+    content,
+  });
+  ctx.publish(sessionId, {
+    type: 'message_complete',
+    message_id: row!.id,
+    chat_id: chatId,
+    metadata,
+  });
+}
--- a/apps/server/src/services/inference/sentinels.ts
+++ b/apps/server/src/services/inference/sentinels.ts
@@ -0,0 +1,53 @@
+import type { Message, ToolCall } from '../../types/api.js';
+
+// v1.11.6: doom-loop guard. When the model calls the same tool with the
+// same arguments DOOM_LOOP_THRESHOLD times in a row within one user-message
+// turn, abort the recursion and run the same wrap-up summary path as the
+// cap-hit case. Ported from opencode (DOOM_LOOP_THRESHOLD in
+// session/processor.ts). Threshold of 3 is the smallest value that doesn't
+// false-positive on a model that retries once after a transient error.
+export const DOOM_LOOP_THRESHOLD = 3;
+
+// Returns the name + args of the looping tool when the LAST
+// DOOM_LOOP_THRESHOLD entries in `recentToolCalls` are identical (same name
+// AND same JSON.stringify(args) output, so key order matters). Returns null
+// otherwise. Pure; exported for unit-test access.
+export function detectDoomLoop(
+  recentToolCalls: ToolCall[],
+): { name: string; args: Record<string, unknown> } | null {
+  if (recentToolCalls.length < DOOM_LOOP_THRESHOLD) return null;
+  const last = recentToolCalls.slice(-DOOM_LOOP_THRESHOLD);
+  const ref = last[0]!;
+  const refArgs = JSON.stringify(ref.args);
+  for (let i = 1; i < last.length; i++) {
+    const tc = last[i]!;
+    if (tc.name !== ref.name) return null;
+    if (JSON.stringify(tc.args) !== refArgs) return null;
+  }
+  return { name: ref.name, args: ref.args };
+}
+
+export function isCapHitSentinel(m: Message): boolean {
+  return (
+    m.role === 'system' &&
+    m.metadata !== null &&
+    typeof m.metadata === 'object' &&
+    (m.metadata as { kind?: unknown }).kind === 'cap_hit'
+  );
+}
+
+// v1.11.6: parallel predicate. Same UI-only semantics as cap-hit sentinels —
+// never sent to the LLM (filtered by buildMessagesPayload through the
+// isAnySentinel check below).
+export function isDoomLoopSentinel(m: Message): boolean {
+  return (
+    m.role === 'system' &&
+    m.metadata !== null &&
+    typeof m.metadata === 'object' &&
+    (m.metadata as { kind?: unknown }).kind === 'doom_loop'
+  );
+}
+
+export function isAnySentinel(m: Message): boolean {
+  return isCapHitSentinel(m) || isDoomLoopSentinel(m);
+}
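Reviewer sketch (not part of the patch): the doom-loop predicate in sentinels.ts can be exercised in isolation. The block below is a minimal, self-contained restatement under stated assumptions: the local `ToolCall` interface and the `call` helper are hypothetical stand-ins for the real `types/api.js` export, and the loop is written with `every` rather than the `for` loop in the hunk. It illustrates two properties of the comparison: only the trailing window is inspected, and because arguments are compared through `JSON.stringify`, two calls whose args differ only in key order are not treated as identical.

```typescript
// Hypothetical stand-in for the ToolCall type imported from types/api.js.
interface ToolCall {
  id: string;
  name: string;
  args: Record<string, unknown>;
}

const DOOM_LOOP_THRESHOLD = 3;

// Same comparison strategy as the hunk above: every call in the trailing
// window must share one tool name and one JSON.stringify(args) string.
function detectDoomLoop(
  recent: ToolCall[],
): { name: string; args: Record<string, unknown> } | null {
  if (recent.length < DOOM_LOOP_THRESHOLD) return null;
  const tail = recent.slice(-DOOM_LOOP_THRESHOLD);
  const ref = tail[0]!;
  const refArgs = JSON.stringify(ref.args);
  const allSame = tail.every(
    (tc) => tc.name === ref.name && JSON.stringify(tc.args) === refArgs,
  );
  return allSame ? { name: ref.name, args: ref.args } : null;
}

// Illustrative helper for building test calls (not in the patch).
const call = (
  id: string,
  name: string,
  args: Record<string, unknown>,
): ToolCall => ({ id, name, args });

// An unrelated earlier call does not matter; only the last three do.
const looping = detectDoomLoop([
  call('1', 'list_dir', { path: '.' }),
  call('2', 'read_file', { path: 'a.ts', limit: 10 }),
  call('3', 'read_file', { path: 'a.ts', limit: 10 }),
  call('4', 'read_file', { path: 'a.ts', limit: 10 }),
]);

// Same keys and values, but the last call lists them in a different order,
// so the serialized strings differ and no loop is reported.
const reordered = detectDoomLoop([
  call('1', 'read_file', { path: 'a.ts', limit: 10 }),
  call('2', 'read_file', { path: 'a.ts', limit: 10 }),
  call('3', 'read_file', { limit: 10, path: 'a.ts' }),
]);

// Two identical calls (one retry) stay below the threshold.
const singleRetry = detectDoomLoop([
  call('1', 'read_file', { path: 'a.ts' }),
  call('2', 'read_file', { path: 'a.ts' }),
]);
```

If key-order sensitivity ever causes missed loops in practice, one option would be to serialize args with sorted keys before comparing; the patch as written does not do this.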
--- a/apps/server/src/services/inference/stream-phase.ts
+++ b/apps/server/src/services/inference/stream-phase.ts
@@ -0,0 +1,482 @@
+import type {
+  Agent,
+  Session,
+  ToolCall,
+} from '../../types/api.js';
+import * as modelContext from '../model-context.js';
+import { toolJsonSchemas, type ToolJsonSchema } from '../tools.js';
+import type { OpenAiMessage } from './payload.js';
+import {
+  XML_TOOL_CLOSE,
+  XML_TOOL_OPEN,
+  parseXmlToolCall,
+  partialXmlOpenerStart,
+} from './xml-parser.js';
+import { DB_FLUSH_INTERVAL_MS, type StreamPhaseState } from './types.js';
+import type {
+  InferenceContext,
+  StreamResult,
+  TurnArgs,
+} from './turn.js';
+import { upstreamModel } from './provider.js';
+import {
+  jsonSchema,
+  streamText,
+  tool,
+  type JSONValue,
+  type ModelMessage,
+  type ToolCallRepairFunction,
+} from 'ai';
+
+interface StreamOptions {
+  // null = omit tools entirely (compact phase); [] = caller stripped all tools
+  // (rare; we still omit from the request body to avoid OpenAI 400).
+  tools: ToolJsonSchema[] | null;
+  temperature?: number;
+}
+
+// v1.13.1-A: convert BooCode's OpenAI-shaped history into AI SDK
+// ModelMessage[]. Tool result messages need a `toolName` field that the
+// OpenAI shape doesn't carry; we look it up by scanning earlier assistant
+// `tool_calls` entries for a matching id.
+function toModelMessages(messages: OpenAiMessage[]): ModelMessage[] {
+  const toolNameById = new Map<string, string>();
+  for (const m of messages) {
+    if (m.role === 'assistant' && m.tool_calls) {
+      for (const tc of m.tool_calls) {
+        toolNameById.set(tc.id, tc.function.name);
+      }
+    }
+  }
+  const out: ModelMessage[] = [];
+  for (const m of messages) {
+    if (m.role === 'system' || m.role === 'user') {
+      out.push({ role: m.role, content: m.content ?? '' });
+      continue;
+    }
+    if (m.role === 'assistant') {
+      const hasTools = m.tool_calls && m.tool_calls.length > 0;
+      const hasReasoning = typeof m.reasoning === 'string' && m.reasoning.length > 0;
+      if (!hasTools && !hasReasoning) {
+        // Bare text assistant (string content). null content + no tool_calls
+        // is degenerate but harmless to forward.
+        out.push({ role: 'assistant', content: m.content ?? '' });
+        continue;
+      }
+      // v1.13.1-C: AI SDK ReasoningPart precedes text + tool-calls in the
+      // assistant content array. Reasoning models (qwen3.6) consume their
+      // prior reasoning context to resume mid-thought across tool boundaries.
+      const parts: Array<
+        | { type: 'reasoning'; text: string }
+        | { type: 'text'; text: string }
+        | { type: 'tool-call'; toolCallId: string; toolName: string; input: unknown }
+      > = [];
+      if (hasReasoning) {
+        parts.push({ type: 'reasoning', text: m.reasoning! });
+      }
+      if (m.content && m.content.length > 0) {
+        parts.push({ type: 'text', text: m.content });
+      }
+      for (const tc of m.tool_calls ?? []) {
+        let input: unknown = {};
+        try {
+          input = tc.function.arguments.length > 0 ? JSON.parse(tc.function.arguments) : {};
+        } catch {
+          // Malformed args from a prior turn: pass through as a raw blob so
+          // the model sees the same shape it emitted. Wraps the string under
+          // _raw to match the buildMessagesPayload upstream convention.
+          input = { _raw: tc.function.arguments };
+        }
+        parts.push({ type: 'tool-call', toolCallId: tc.id, toolName: tc.function.name, input });
+      }
+      out.push({ role: 'assistant', content: parts });
+      continue;
+    }
+    if (m.role === 'tool') {
+      const toolCallId = m.tool_call_id ?? '';
+      const toolName = toolNameById.get(toolCallId) ?? 'unknown';
+      const raw = m.content ?? '';
+      let output: { type: 'text'; value: string } | { type: 'json'; value: JSONValue };
+      try {
+        // JSON.parse returns `any`; cast to JSONValue since the upstream
+        // tool_results column is already JSON-serializable by construction.
+        output = { type: 'json', value: JSON.parse(raw) as JSONValue };
+      } catch {
+        output = { type: 'text', value: raw };
+      }
+      out.push({
+        role: 'tool',
+        content: [{ type: 'tool-result', toolCallId, toolName, output }],
+      });
+      continue;
+    }
+  }
+  return out;
+}
+
+// Build the AI SDK tools record from BooCode's JSON-schema tool definitions.
+// No `execute` field: BooCode runs tools itself in tool-phase.ts; streamText
+// surfaces the tool-call parts via fullStream and we capture them for the
+// outer loop to dispatch.
+function buildAiTools(schemas: ToolJsonSchema[]): Record<string, ReturnType<typeof tool>> {
+  const out: Record<string, ReturnType<typeof tool>> = {};
+  for (const s of schemas) {
+    out[s.function.name] = tool({
+      description: s.function.description,
+      inputSchema: jsonSchema(s.function.parameters),
+    });
+  }
+  return out;
+}
+
+// v1.10.5 Qwen-coder XML fallback. Some local models (notably qwen3-coder via
+// llama-swap) emit tool calls as inline XML inside delta.content rather than
+// the structured tool_calls field. We extract them out of the streamed text
+// before flushing it to the client, mirroring the pre-AI-SDK behavior.
+//
+// XML shape:
+//   <tool_call>
+//   <function=NAME>
+//   <parameter=KEY>VALUE</parameter>
+//   ...
+//   </function>
+//   </tool_call>
+// Multiple <tool_call> blocks may appear back-to-back; they never nest.
+export async function streamCompletion(
+  ctx: InferenceContext,
+  model: string,
+  messages: OpenAiMessage[],
+  opts: StreamOptions,
+  onDelta: (content: string) => void,
+  onUsage: ((prompt: number | null, completion: number | null) => void) | undefined,
+  signal?: AbortSignal
+): Promise<StreamResult> {
+  const aiMessages = toModelMessages(messages);
+  const hasTools = opts.tools !== null && opts.tools.length > 0;
+  const aiTools = hasTools ? buildAiTools(opts.tools!) : undefined;
+
+  const startedAt = Date.now();
+  // v1.13.1-C: accumulate reasoning text across reasoning-delta parts.
+  // qwen3.6 emits these on a separate channel from text content; we capture
+  // them per stream so finalizeCompletion can dual-write a 'reasoning' part.
+  // Replaces the v1.13.1-A counter-only diagnostic.
+  let reasoningAccumulated = '';
+
+  // v1.13.3: experimental_repairToolCall keeps the stream alive when the
+  // model emits a malformed tool call (bad JSON args, unknown name, etc.).
+  // Without a repair function streamText throws and the WHOLE stream dies;
+  // with one, the SDK invokes us and we route the bad call through normally.
+  // Strategy: pass through unmodified. executeToolPhase's existing error
+  // path (unknown tool name → "unknown tool: X" result; zod-reject → tool
+  // 'X' rejected — fieldname: required) already gives the model a clean
+  // recovery surface on the next turn. Logging gives us visibility into
+  // how often qwen3.6 actually emits broken calls.
+  const repairToolCall: ToolCallRepairFunction<NonNullable<typeof aiTools>> = async ({
+    toolCall,
+    error,
+  }) => {
+    ctx.log.warn(
+      {
+        toolCallId: toolCall.toolCallId,
+        toolName: toolCall.toolName,
+        error: error.message,
+      },
+      'malformed tool call surfaced via repairToolCall',
+    );
+    return toolCall;
+  };
+
+  const result = streamText({
+    model: upstreamModel(ctx.config.LLAMA_SWAP_URL, model),
+    messages: aiMessages,
+    ...(aiTools
+      ? { tools: aiTools, toolChoice: 'auto' as const, experimental_repairToolCall: repairToolCall }
+      : {}),
+    ...(typeof opts.temperature === 'number' ? { temperature: opts.temperature } : {}),
+    abortSignal: signal,
+  });
+
+  let content = '';
+  let pendingBuffer = '';
+  let finishReason: string | null = null;
+  // v1.13.1-A: AI SDK emits one `tool-call` part per fully-aggregated call,
+  // so we no longer need the OpenAI-index reassembly map the manual SSE
+  // parser used. XML tool calls extracted from text content go into the
+  // same flat list and keep the v1.10.5 synthetic id convention.
+  const toolCalls: ToolCall[] = [];
+
+  for await (const part of result.fullStream) {
+    switch (part.type) {
+      case 'text-delta': {
+        pendingBuffer += part.text;
+        // Extract any complete <tool_call>...</tool_call> blocks before
+        // flushing visible text.
+        while (true) {
+          const startIdx = pendingBuffer.indexOf(XML_TOOL_OPEN);
+          if (startIdx === -1) break;
+          const closeIdx = pendingBuffer.indexOf(XML_TOOL_CLOSE, startIdx);
+          if (closeIdx === -1) break;
+          const blockEnd = closeIdx + XML_TOOL_CLOSE.length;
+          const block = pendingBuffer.slice(startIdx, blockEnd);
+          if (startIdx > 0) {
+            const before = pendingBuffer.slice(0, startIdx);
+            content += before;
+            onDelta(before);
+          }
+          const parsedCall = parseXmlToolCall(block);
+          if (parsedCall) {
+            const synthIdx = toolCalls.length;
+            toolCalls.push({
+              id: `xml_call_${synthIdx}`,
+              name: parsedCall.name,
+              args: parsedCall.args,
+            });
+          }
+          // Parse failures still drop the block — leaking <tool_call> XML to
+          // the chat would look worse than silently swallowing the bad block.
+          pendingBuffer = pendingBuffer.slice(blockEnd);
+        }
+        // Hold back any (partial or full) unclosed opener; flush the rest.
+        const partialIdx = partialXmlOpenerStart(pendingBuffer);
+        if (partialIdx >= 0) {
+          if (partialIdx > 0) {
+            const flush = pendingBuffer.slice(0, partialIdx);
+            content += flush;
+            onDelta(flush);
+          }
+          pendingBuffer = pendingBuffer.slice(partialIdx);
+        } else if (pendingBuffer.length > 0) {
+          content += pendingBuffer;
+          onDelta(pendingBuffer);
+          pendingBuffer = '';
+        }
+        break;
+      }
+      case 'tool-call': {
+        // AI SDK has already parsed the input into an object. Match the
+        // ToolCall shape BooCode passes around in toolCallsBuffer downstream.
+        toolCalls.push({
+          id: part.toolCallId,
+          name: part.toolName,
+          args: (part.input ?? {}) as Record<string, unknown>,
+        });
+        break;
+      }
+      case 'reasoning-delta': {
+        // v1.13.1-C: accumulate; finalizeCompletion / executeToolPhase
+        // dual-write the resulting text as a kind='reasoning' part.
+        if (typeof part.text === 'string') {
+          reasoningAccumulated += part.text;
+        }
+        break;
+      }
+      case 'finish': {
+        if (typeof part.finishReason === 'string') {
+          finishReason = part.finishReason;
+        }
+        break;
+      }
+      case 'error': {
+        const err = part.error;
+        throw err instanceof Error ? err : new Error(String(err));
+      }
+      // Intentional no-op: start, start-step, text-start, text-end,
+      // reasoning-start, reasoning-end, source, file, tool-input-start,
+      // tool-input-delta, tool-input-end, tool-result, tool-error,
+      // finish-step, raw. We only care about the aggregated tool-call and
+      // text-delta paths above; the rest are AI SDK lifecycle/streaming
+      // breadcrumbs that don't change BooCode's persistence or WS contract.
+      default:
+        break;
+    }
+  }
+
+  // v1.13.1-A: drain any buffered partial XML opener as plain text. The
+  // pre-AI-SDK path did this on stream end too — better to leak `<tool_c`
+  // than vanish the text.
+  if (pendingBuffer.length > 0) {
+    content += pendingBuffer;
+    onDelta(pendingBuffer);
+    pendingBuffer = '';
+  }
+
+  // AI SDK v6 fullStream returns normally on abort; check signal explicitly.
+  // Without this throw the row would land as status='complete' with partial
+  // content instead of going through handleAbortOrError → status='cancelled'.
+  // Smoke D caught this in v1.13.1-A — don't refactor it away.
+  if (signal?.aborted) {
+    const abortErr = new Error('aborted');
+    abortErr.name = 'AbortError';
+    throw abortErr;
+  }
+
+  // Usage lands as a promise on the result; awaiting after fullStream is
+  // drained is safe. AI SDK v6 names: `inputTokens` / `outputTokens`.
+  let promptTokens: number | null = null;
+  let completionTokens: number | null = null;
+  try {
+    const usage = await result.usage;
+    if (typeof usage.inputTokens === 'number') promptTokens = usage.inputTokens;
+    if (typeof usage.outputTokens === 'number') completionTokens = usage.outputTokens;
+  } catch {
+    // Some providers omit usage on partial streams; leave both null.
+  }
+
+  if (onUsage && (promptTokens !== null || completionTokens !== null)) {
+    onUsage(promptTokens, completionTokens);
+  }
+
+  if (reasoningAccumulated.length > 0) {
+    ctx.log.debug(
+      { reasoningChars: reasoningAccumulated.length, model, elapsed_ms: Date.now() - startedAt },
+      'streamCompletion: captured reasoning',
+    );
+  }
+
+  return {
+    finishReason,
+    content,
+    toolCalls,
+    promptTokens,
+    completionTokens,
+    reasoning: reasoningAccumulated,
+  };
+}
+
+export async function executeStreamPhase(
+  ctx: InferenceContext,
+  args: TurnArgs,
+  session: Session,
+  messages: OpenAiMessage[],
+  state: StreamPhaseState,
+  agent: Agent | null,
+  // v1.11.8: when false, web_search and web_fetch are stripped from the
+  // tool list sent to the LLM, so the model can't even attempt them.
+  webToolsEnabled: boolean,
+): Promise<StreamResult> {
+  const { sessionId, chatId, assistantMessageId, signal } = args;
+
+  const startedRow = await ctx.sql<{ started_at: string }[]>`
+    UPDATE messages
+    SET started_at = clock_timestamp()
+    WHERE id = ${assistantMessageId}
+    RETURNING started_at
+  `;
+  state.startedAt = startedRow[0]?.started_at ?? null;
+
+  ctx.publish(sessionId, {
+    type: 'message_started',
+    message_id: assistantMessageId,
+    chat_id: chatId,
+    role: 'assistant',
+  });
+
+  let pendingFlushTimer: NodeJS.Timeout | null = null;
+  let flushPromise: Promise<unknown> = Promise.resolve();
+
+  const flushNow = () => {
+    if (pendingFlushTimer) {
+      clearTimeout(pendingFlushTimer);
+      pendingFlushTimer = null;
+    }
+    const snapshot = state.accumulated;
+    flushPromise = flushPromise.then(() =>
+      ctx.sql`UPDATE messages SET content = ${snapshot} WHERE id = ${assistantMessageId}`
+    );
+  };
+
+  const scheduleFlush = () => {
+    if (pendingFlushTimer) return;
+    pendingFlushTimer = setTimeout(() => {
+      pendingFlushTimer = null;
+      flushNow();
+    }, DB_FLUSH_INTERVAL_MS);
+  };
+
+  // Tool whitelist: if an agent is set, filter the global tool list to only the
+  // tool names it allows. Unknown names in agent.tools are dropped silently
+  // (handled here by intersection). When no agent: send all tools.
+  // v1.11.8: a second filter strips web_search + web_fetch unless the chat
+  // has them explicitly enabled. Counts as an opt-in security boundary: the
+  // model can't summon a tool that wasn't offered to it.
+  const WEB_TOOL_NAMES: ReadonlySet<string> = new Set(['web_search', 'web_fetch']);
+  const effectiveTools: ToolJsonSchema[] = (agent
+    ? toolJsonSchemas().filter((t) => agent.tools.includes(t.function.name))
+    : toolJsonSchemas()
+  ).filter((t) => webToolsEnabled || !WEB_TOOL_NAMES.has(t.function.name));
+  const effectiveTemperature = agent?.temperature;
+
+  // v1.12.2: ctx_max lookup is cached after the first hit per model, so this
+  // is a Map probe in steady state. We capture nCtx once at the top of the
+  // stream so the throttled usage publish doesn't refetch each tick.
+  const mctxForStream = await modelContext.getModelContext(session.model);
+  const nCtxForStream = mctxForStream?.n_ctx ?? null;
+
+  // v1.12.2 → v1.13.1-A: live usage publishes were throttled to ~500ms when
+  // the manual SSE parser saw `parsed.usage` per chunk. AI SDK v6 surfaces
+  // usage only at stream end (result.usage promise), so the throttle is
+  // effectively a single trailing publish. ChatThroughput will tick once at
+  // stream completion rather than mid-stream — known regression vs v1.12.2,
+  // recovered if a future dispatch interpolates from delta cadence.
+  const USAGE_THROTTLE_MS = 500;
+  let lastUsageAt = 0;
+  let pendingUsage: { p: number | null; c: number | null } | null = null;
+  let usageTimer: NodeJS.Timeout | null = null;
+  const flushUsage = () => {
+    if (!pendingUsage) return;
+    const { p, c } = pendingUsage;
+    pendingUsage = null;
+    lastUsageAt = Date.now();
+    ctx.publish(sessionId, {
+      type: 'usage',
+      message_id: assistantMessageId,
+      chat_id: chatId,
+      completion_tokens: c,
+      ctx_used: p,
+      ctx_max: nCtxForStream,
+    });
+  };
+
+  try {
+    return await streamCompletion(
+      ctx,
+      session.model,
+      messages,
+      { tools: effectiveTools, temperature: effectiveTemperature },
+      (delta) => {
+        state.accumulated += delta;
+        ctx.publish(sessionId, {
+          type: 'delta',
+          message_id: assistantMessageId,
+          chat_id: chatId,
+          content: delta,
+        });
+        ctx.log.debug({ sessionId, delta }, 'inference delta');
+        scheduleFlush();
+      },
+      (prompt, completion) => {
+        pendingUsage = { p: prompt, c: completion };
+        const elapsed = Date.now() - lastUsageAt;
+        if (elapsed >= USAGE_THROTTLE_MS) {
+          flushUsage();
+        } else if (!usageTimer) {
+          usageTimer = setTimeout(() => {
+            usageTimer = null;
+            flushUsage();
+          }, USAGE_THROTTLE_MS - elapsed);
+        }
+      },
+      signal
+    );
+  } finally {
+    if (pendingFlushTimer) {
+      clearTimeout(pendingFlushTimer);
+      pendingFlushTimer = null;
+    }
+    if (usageTimer) {
+      clearTimeout(usageTimer);
+      usageTimer = null;
+    }
+    await flushPromise;
+  }
+}
--- a/apps/server/src/services/inference/tool-phase.ts
+++ b/apps/server/src/services/inference/tool-phase.ts
@@ -0,0 +1,256 @@
+import type { Session, ToolCall } from '../../types/api.js';
+import * as modelContext from '../model-context.js';
+import { PathScopeError } from '../path_guard.js';
+import { TOOLS_BY_NAME } from '../tools.js';
+import { maybeFlagForCompaction } from './payload.js';
+import { insertParts, partsFromAssistantMessage, partsFromToolMessage } from './parts.js';
+import type {
+  InferenceContext,
+  StreamResult,
+  TurnArgs,
+} from './turn.js';
+// v1.12.4: ESM value-import cycle. executeToolPhase recurses into
+// runAssistantTurn, which lives in turn.ts. The cycle is safe because
+// the reference is read at call time (inside an async function body), not
+// at module top-level. Node + tsc resolve this cleanly.
+import { runAssistantTurn } from './turn.js';
+
+async function executeToolCall(
+  projectRoot: string,
+  toolCall: ToolCall
+): Promise<{ output: unknown; truncated: boolean; error?: string }> {
+  const tool = TOOLS_BY_NAME[toolCall.name];
+  if (!tool) {
+    return { output: null, truncated: false, error: `unknown tool: ${toolCall.name}` };
+  }
+  const parsed = tool.inputSchema.safeParse(toolCall.args);
+  if (!parsed.success) {
+    // v1.12 Track B.2: enrich the zod-reject path so the model sees a
+    // one-line, tool-named hint ("tool 'search_symbols' rejected — query:
+    // Required") instead of a JSON blob of flatten output. Higher recovery
+    // rate on the next turn; doom-loop guard still bounds infinite retries.
+    // The cast is because tool.inputSchema is ZodType<unknown>, so zod can't
+    // statically narrow flatten()'s fieldErrors key set — but the runtime
+    // shape is the standard { formErrors: string[]; fieldErrors: Record<...> }.
+    const flatten = parsed.error.flatten() as {
+      formErrors: string[];
+      fieldErrors: Record<string, string[] | undefined>;
+    };
+    const fieldErrors = Object.entries(flatten.fieldErrors)
+      .map(([field, errs]) => `${field}: ${errs?.[0] ?? 'invalid'}`)
+      .join('; ');
+    const formError = flatten.formErrors[0];
+    const hint = fieldErrors || formError || 'unknown validation error';
+    return {
+      output: null,
+      truncated: false,
+      error: `tool '${toolCall.name}' rejected — ${hint}`,
+    };
+  }
+  try {
+    const output = await tool.execute(parsed.data, projectRoot);
+    const truncated =
+      typeof output === 'object' && output !== null && 'truncated' in output
+        ? Boolean((output as { truncated: unknown }).truncated)
+        : false;
+    return { output, truncated };
+  } catch (err) {
+    if (err instanceof PathScopeError) {
+      return { output: null, truncated: false, error: err.message };
+    }
+    return {
+      output: null,
+      truncated: false,
+      error: err instanceof Error ? err.message : String(err),
+    };
+  }
+}
+
+export async function executeToolPhase(
+  ctx: InferenceContext,
+  args: TurnArgs,
+  result: StreamResult,
+  startedAt: string | null,
+  session: Session,
+  projectRoot: string
+): Promise<void> {
+  const { sessionId, chatId, assistantMessageId, toolsUsed, signal } = args;
+  const { content, toolCalls, promptTokens, completionTokens } = result;
+
+  // v1.11.3: ctx_max comes from llama-swap /upstream/<model>/props, not the
+  // streaming completion (which doesn't emit n_ctx). getModelContext caches
+  // the positive lookup for the process lifetime, so this is a single Map
+  // hit after the first invocation per model.
+  const mctx = await modelContext.getModelContext(session.model);
+  const nCtx = mctx?.n_ctx ?? null;
+
+  const [updated] = await ctx.sql<
+    { tokens_used: number | null; ctx_used: number | null; ctx_max: number | null; finished_at: string | null }[]
+  >`
+    UPDATE messages
+    SET content = ${content},
+        status = 'complete',
+        tool_calls = ${ctx.sql.json(toolCalls as never)},
+        tokens_used = ${completionTokens},
+        ctx_used = ${promptTokens},
+        ctx_max = ${nCtx},
+        finished_at = clock_timestamp()
+    WHERE id = ${assistantMessageId}
+    RETURNING tokens_used, ctx_used, ctx_max, finished_at
+  `;
+  // v1.13.0: dual-write to message_parts. v1.13.1-B made parts authoritative
+  // for reads via the messages_with_parts view; the JSON column write above
+  // remains for v1.13.1 fallback compatibility (dropped in v1.13.2).
+  // v1.13.1-C: include result.reasoning so models with separate reasoning
+  // channels (qwen3.6) get a kind='reasoning' part at sequence 0.
+  // TODO(v1.13.1): wrap the UPDATE above and this insertParts in a single
+  // sql.begin before flipping read authority to message_parts. Without the
+  // transaction, a crash between the two leaves an orphan message that
+  // becomes invisible in the parts-authoritative read path.
+  await insertParts(
+    ctx.sql,
+    partsFromAssistantMessage({
+      content,
+      tool_calls: toolCalls,
+      reasoning: result.reasoning,
+    }).map((p) => ({
+      ...p,
+      message_id: assistantMessageId,
+    })),
+  );
+  // v1.11: flag for compaction if this turn pushed us over the usable budget.
+  // We never compact mid-loop (the recursive runAssistantTurn keeps tools
+  // flowing); the NEXT turn's check in runAssistantTurn acts on the flag.
+  await maybeFlagForCompaction(ctx, chatId, updated);
+  const [toolSessRow] = await ctx.sql<{ project_id: string; name: string; updated_at: string }[]>`
+    UPDATE sessions SET updated_at = clock_timestamp()
+    WHERE id = ${sessionId}
+    RETURNING project_id, name, updated_at
+  `;
+  ctx.publishUser({ type: 'session_updated', session_id: sessionId, project_id: toolSessRow!.project_id, name: toolSessRow!.name, updated_at: toolSessRow!.updated_at });
+  for (const tc of toolCalls) {
+    ctx.publish(sessionId, {
+      type: 'tool_call',
+      message_id: assistantMessageId,
+      chat_id: chatId,
+      tool_call: tc,
+    });
+  }
+  ctx.publish(sessionId, {
+    type: 'message_complete',
+    message_id: assistantMessageId,
+    chat_id: chatId,
+    tokens_used: updated?.tokens_used ?? null,
+    ctx_used: updated?.ctx_used ?? null,
+    ctx_max: updated?.ctx_max ?? null,
+    started_at: startedAt,
+    finished_at: updated?.finished_at ?? null,
+    model: session.model,
+  });
+
+  // Batch 9.7: ask_user_input pauses the loop. The tool row is still inserted
+  // (the answer endpoint needs a target row to UPDATE), but tool_results is
+  // pre-stamped with output=null as a "pending" sentinel and no tool_result
+  // frame goes out — the card renders from the tool_call frame alone. Mixed
+  // batches still execute the other tools normally.
+  ctx.publishUser({ type: 'chat_status', chat_id: chatId, status: 'tool_running', at: new Date().toISOString() });
+  let pausingForUserInput = false;
+  await Promise.all(
+    toolCalls.map(async (tc) => {
+      const [toolRow] = await ctx.sql<{ id: string }[]>`
+        INSERT INTO messages (session_id, chat_id, role, content, status, created_at)
+        VALUES (${sessionId}, ${chatId}, 'tool', '', 'complete', clock_timestamp())
+        RETURNING id
+      `;
+      const toolMessageId = toolRow!.id;
+      if (tc.name === 'ask_user_input') {
+        pausingForUserInput = true;
+        const sentinel = { tool_call_id: tc.id, output: null, truncated: false };
+        await ctx.sql`
+          UPDATE messages
+          SET tool_results = ${ctx.sql.json(sentinel as never)}
+          WHERE id = ${toolMessageId}
+        `;
+        // v1.13.0: mirror the pending sentinel into message_parts. The
+        // answer-endpoint UPDATE later (messages.ts:576) will delete and
+        // re-insert this part when the user submits their answer.
+        // TODO(v1.13.1): wrap the INSERT + UPDATE + insertParts triple in
+        // a per-iteration sql.begin before flipping read authority.
+        await insertParts(
+          ctx.sql,
+          partsFromToolMessage({ tool_results: sentinel }).map((p) => ({
+            ...p,
+            message_id: toolMessageId,
+          })),
+        );
+        return;
+      }
+      const tres = await executeToolCall(projectRoot, tc);
+      const stored = {
+        tool_call_id: tc.id,
+        output: tres.output,
+        truncated: tres.truncated,
+        ...(tres.error ? { error: tres.error } : {}),
+      };
+      await ctx.sql`
+        UPDATE messages
+        SET tool_results = ${ctx.sql.json(stored as never)}
+        WHERE id = ${toolMessageId}
+      `;
+      // v1.13.0: dual-write the tool_result part.
+      // TODO(v1.13.1): wrap the INSERT + UPDATE + insertParts triple in a
+      // per-iteration sql.begin before flipping read authority.
+      await insertParts(
+        ctx.sql,
+        partsFromToolMessage({ tool_results: stored }).map((p) => ({
+          ...p,
+          message_id: toolMessageId,
+        })),
+      );
+      ctx.publish(sessionId, {
+        type: 'tool_result',
+        tool_message_id: toolMessageId,
+        chat_id: chatId,
+        tool_call_id: tc.id,
+        output: tres.output,
+        truncated: tres.truncated,
+        ...(tres.error ? { error: tres.error } : {}),
+      });
+    })
+  );
+
+  if (pausingForUserInput) {
+    ctx.publishUser({
+      type: 'chat_status',
+      chat_id: chatId,
+      status: 'waiting_for_input',
+      at: new Date().toISOString(),
+    });
+    ctx.log.info(
+      { sessionId, chatId, assistantMessageId },
+      'inference paused awaiting user input',
+    );
+    return;
+  }
+
+  const [nextAssistant] = await ctx.sql<{ id: string }[]>`
+    INSERT INTO messages (session_id, chat_id, role, content, status, created_at)
+    VALUES (${sessionId}, ${chatId}, 'assistant', '', 'streaming', clock_timestamp())
+    RETURNING id
+  `;
+  await runAssistantTurn(ctx, {
+    sessionId,
+    chatId,
+    assistantMessageId: nextAssistant!.id,
+    // v1.8.2: charge this turn's actual tool invocations against the budget.
+    // One assistant message can emit multiple tool_calls, so we add the run
+    // count, not 1. The next turn's budget check sees the cumulative total.
+    toolsUsed: toolsUsed + result.toolCalls.length,
+    // v1.11.6: append the just-executed tool calls to the per-turn history
+    // so the next runAssistantTurn's doom-loop check can see them. We don't
+    // cap the array length here — per-turn budgets keep it bounded
+    // (typically <30 entries), and slicing happens inside detectDoomLoop.
+    recentToolCalls: [...args.recentToolCalls, ...result.toolCalls],
+    signal,
+  });
+}
--- a/apps/server/src/services/inference/turn.ts
+++ b/apps/server/src/services/inference/turn.ts
@@ -0,0 +1,329 @@
+import type { FastifyBaseLogger } from 'fastify';
+import type { Sql } from '../../db.js';
+import type { Config } from '../../config.js';
+import type {
+  Agent,
+  ErrorReason,
+  Message,
+  MessageMetadata,
+  Project,
+  Session,
+  ToolCall,
+  UserStreamFrame,
+} from '../../types/api.js';
+import { ALL_TOOLS } from '../tools.js';
+import { resolveProjectRoot } from '../path_guard.js';
+import { maybeAutoNameChat } from '../auto_name.js';
+import { getAgentById } from '../agents.js';
+import * as compaction from '../compaction.js';
+import * as modelContext from '../model-context.js';
+import type { Broker } from '../broker.js';
+import { resolveToolBudget } from './budget.js';
+import {
+  DOOM_LOOP_THRESHOLD,
+  detectDoomLoop,
+} from './sentinels.js';
+import {
+  buildMessagesPayload,
+  loadContext,
+} from './payload.js';
+import {
+  finalizeCompletion,
+  handleAbortOrError,
+} from './error-handler.js';
+import {
+  executeStreamPhase,
+  streamCompletion,
+} from './stream-phase.js';
+import { executeToolPhase } from './tool-phase.js';
+import { DB_FLUSH_INTERVAL_MS, type StreamPhaseState } from './types.js';
+import {
+  runCapHitSummary,
+  runDoomLoopSummary,
+} from './sentinel-summaries.js';
+
+// v1.12.4: re-exported so external callers (tests, future consumers) keep
+// importing from services/inference.js as the public surface.
+export { detectDoomLoop, DOOM_LOOP_THRESHOLD } from './sentinels.js';
+export { buildMessagesPayload } from './payload.js';
+
+export interface InferenceFrame {
+  type:
+    | 'message_started'
+    | 'delta'
+    | 'tool_call'
+    | 'tool_result'
+    | 'message_complete'
+    | 'usage'
+    | 'messages_deleted'
+    | 'session_renamed'
+    | 'chat_renamed'
+    | 'error';
+  message_id?: string;
+  message_ids?: string[];
+  chat_id?: string;
+  tool_message_id?: string;
+  tool_call_id?: string;
+  // v1.8.2: 'system' added so cap-hit sentinel messages can announce themselves
+  // through the normal message_started → delta → message_complete sequence.
+  role?: 'assistant' | 'tool' | 'user' | 'system';
+  content?: string;
+  tool_call?: ToolCall;
+  output?: unknown;
+  truncated?: boolean;
+  error?: string;
+  // v1.8.2: structured error reason. Set on `type: 'error'` so the UI can
+  // surface a specific message; `error` stays the human-readable text.
+  reason?: ErrorReason;
+  // v1.8.2: piggybacks on `message_complete` so static or terminally-resolved
+  // messages can carry their persisted metadata to the live stream without a
+  // refetch (sentinels carry { kind: 'cap_hit', ... }; failed messages carry
+  // { kind: 'error', ... }).
+  metadata?: MessageMetadata | null;
+  tokens_used?: number | null;
+  ctx_used?: number | null;
+  ctx_max?: number | null;
+  completion_tokens?: number | null;
+  started_at?: string | null;
+  finished_at?: string | null;
+  model?: string;
+  session_id?: string;
+  name?: string;
+}
+
+export type FramePublisher = (sessionId: string, frame: InferenceFrame) => void;
+
+export interface InferenceContext {
+  sql: Sql;
+  config: Config;
+  log: FastifyBaseLogger;
+  publish: FramePublisher;
+  publishUser: (frame: UserStreamFrame) => void;
+  // v1.11: passed through so compaction.process can publish 'compacted'
+  // frames on the same session WS channel useSessionStream subscribes to.
+  // Compaction is the only path that needs the raw broker handle (regular
+  // inference goes through `publish`); keeping a separate field avoids
+  // tempting other code paths into bypassing the session-id binding.
+  broker: Broker;
+}
+
+// v1.12.4: payload assembly extracted to ./payload.ts (tests import
+// buildMessagesPayload from this module, so the re-export above
+// preserves the public surface). Stream + tool phases extracted to
+// ./stream-phase.ts and ./tool-phase.ts.
+
+export interface StreamResult {
+  finishReason: string | null;
+  content: string;
+  toolCalls: ToolCall[];
+  promptTokens: number | null;
+  completionTokens: number | null;
+  // v1.13.1-C: reasoning text accumulated across reasoning-delta parts.
+  // Empty string when the model doesn't emit reasoning (most cases).
+  reasoning: string;
+}
+
+
+export interface TurnArgs {
+  sessionId: string;
+  chatId: string;
+  assistantMessageId: string;
+  // v1.8.2: cumulative tool calls executed this run. Compared against the
+  // resolved budget at the top of each turn. Replaces the older `depth`
+  // counter (which counted iterations, not invocations).
+  toolsUsed: number;
+  // v1.11.6: ordered tool calls executed in this user-message turn (across
+  // recursive runAssistantTurn invocations). Reset to [] at user-message
+  // boundaries by runInference, same as toolsUsed. Doom-loop check at the
+  // top of runAssistantTurn slices the last DOOM_LOOP_THRESHOLD entries.
+  recentToolCalls: ToolCall[];
+  signal: AbortSignal | undefined;
+}
+
+
+export async function runAssistantTurn(
+  ctx: InferenceContext,
+  args: TurnArgs,
+): Promise<void> {
+  const { sessionId, chatId } = args;
+
+  // v1.11: if the prior turn flagged this chat for compaction, run it first
+  // so loadContext below reads the post-compaction history. We swallow
+  // compaction failures (clearing the flag so we don't loop) and proceed
+  // with the un-compacted history — a slow turn that hits the model's
+  // hard limit is recoverable; a dead session is not.
+  const chatFlag = await ctx.sql<{ needs_compaction: boolean }[]>`
+    SELECT needs_compaction FROM chats WHERE id = ${chatId}
+  `;
+  if (chatFlag[0]?.needs_compaction) {
+    try {
+      await compaction.process({
+        sql: ctx.sql,
+        config: ctx.config,
+        log: ctx.log,
+        broker: ctx.broker,
+        chatId,
+      });
+    } catch (err) {
+      ctx.log.warn({ err, chatId }, 'auto-compaction failed; clearing flag and proceeding');
+      await ctx.sql`UPDATE chats SET needs_compaction = false WHERE id = ${chatId}`;
+    }
+  }
+
+  const loaded = await loadContext(ctx.sql, sessionId, chatId);
+  if (!loaded) {
+    ctx.log.warn({ sessionId }, 'inference: session or project missing');
+    return;
+  }
+  const { session, project, history } = loaded;
+  const projectRoot = await resolveProjectRoot(project.path);
+  // Agent resolution is per-turn so PATCH agent_id mid-conversation takes
+  // effect on the next message. Unknown agent_id returns null silently —
+  // session falls back to base prompt + all tools + default temperature.
+  const agent = session.agent_id
+    ? await getAgentById(project.path, session.agent_id)
+    : null;
+
+  // v1.8.2: cap-hit replaces the older "tool loop depth exceeded" failure.
+  // When we've already burned the budget *before* this turn even runs, we
+  // skip straight to the summary flow — the in-flight assistant message slot
+  // gets reused for the wrap-up reply instead of being marked failed.
+  const budget = resolveToolBudget(agent);
+  if (args.toolsUsed >= budget) {
+    await runCapHitSummary(ctx, args, session, project, history, agent, budget);
+    return;
+  }
+
+  // v1.11.6: doom-loop guard. Checked after the budget cap above, but in
+  // practice it trips first (the model can burn through 3 identical calls
+  // long before the 15-call budget fires). Same in-flight-slot-reuse pattern
+  // as runCapHitSummary: the wrap-up reply lands in args.assistantMessageId,
+  // then a doom_loop sentinel makes the abort visible in the chat history.
+  const loop = detectDoomLoop(args.recentToolCalls);
+  if (loop) {
+    await runDoomLoopSummary(ctx, args, session, project, history, agent, loop);
+    return;
+  }
+
+  const messages = await buildMessagesPayload(session, project, history, agent, ctx.log);
+
+  // v1.11.8: resolve per-chat web-tools opt-in. Tri-state on the wire:
+  //   - session.web_search_enabled = null → inherit project default
+  //   - session.web_search_enabled = true/false → explicit
+  // Both web_search and web_fetch are gated by this single flag (the UI
+  // label is "Enable web search and fetch" — same store, both tools).
+  // Default is false unless explicitly opted in, matching the v1.9
+  // plumbing intent ("inert until Batch 8 ships the actual tools").
+  const webToolsEnabled =
+    session.web_search_enabled ?? project.default_web_search_enabled ?? false;
+
+  const state: StreamPhaseState = { accumulated: '', startedAt: null };
+  let result: StreamResult;
+  try {
+    result = await executeStreamPhase(ctx, args, session, messages, state, agent, webToolsEnabled);
+  } catch (err) {
+    await handleAbortOrError(ctx, args, state.accumulated, err);
+    return;
+  }
+
+  if (result.toolCalls.length > 0) {
+    await executeToolPhase(ctx, args, result, state.startedAt, session, projectRoot);
+    return;
+  }
+
+  await finalizeCompletion(ctx, args, result, state.startedAt, session);
+}
+
+export async function runInference(
+  ctx: InferenceContext,
+  sessionId: string,
+  chatId: string,
+  assistantMessageId: string,
+  signal?: AbortSignal
+): Promise<void> {
+  // v1.8.2: every fresh inference (initial send, regenerate, force_send,
+  // continue) starts with a clean budget. Tool-call accumulation across
+  // Continue invocations is what the hard ceiling guards against, not the
+  // per-call budget.
+  // v1.11.6: recentToolCalls also resets — doom-loop detection is scoped
+  // to a single user-message turn, so a Continue starts with no history.
+  return runAssistantTurn(ctx, {
+    sessionId,
+    chatId,
+    assistantMessageId,
+    toolsUsed: 0,
+    recentToolCalls: [],
+    signal,
+  });
+}
+
+// v1.8.2: cap-hit summary flow (runCapHitSummary in ./sentinel-summaries.ts).
+// Called instead of erroring when the loop hits its budget. Reuses the
+// in-flight assistant message slot to stream a short wrap-up reply with the
+// synthetic note prepended and tools disabled, then always inserts a cap_hit
+// sentinel (regardless of summary outcome) so the UI can offer Continue.
+interface InferenceRegistration {
+  controller: AbortController;
+  completed: Promise<void>;
+}
+
+export function createInferenceRunner(
+  ctx: Omit<InferenceContext, 'publishUser'>,
+  publishUserFn: (user: string, frame: UserStreamFrame) => void
+) {
+  const registry = new Map<string, InferenceRegistration>();
+
+  return {
+    enqueue(sessionId: string, chatId: string, assistantMessageId: string, user: string) {
+      const callCtx: InferenceContext = {
+        ...ctx,
+        publishUser: (frame) => publishUserFn(user, frame),
+        // v1.11: broker comes in via ctx (set at registration time). Repeated
+        // here so the spread carries it onto the per-call ctx without
+        // having to add it to every enqueue/cancel signature individually.
+        broker: ctx.broker,
+      };
+      // v1.8 mobile-tabs: announce working before the async loop starts so
+      // every device subscribed to the user channel sees the amber dot.
+      callCtx.publishUser({ type: 'chat_status', chat_id: chatId, status: 'streaming', at: new Date().toISOString() });
+      const controller = new AbortController();
+      let resolveCompleted!: () => void;
+      const completed = new Promise<void>((res) => { resolveCompleted = res; });
+      const registration: InferenceRegistration = { controller, completed };
+      registry.set(chatId, registration);
+      void (async () => {
+        try {
+          await runInference(callCtx, sessionId, chatId, assistantMessageId, controller.signal);
+          setImmediate(() => {
+            void maybeAutoNameChat(callCtx, chatId, sessionId).catch((err: Error) => {
+              callCtx.log.warn({ err, chatId }, 'auto-name failed');
+            });
+          });
+        } catch (err) {
+          callCtx.log.error({ err }, 'unhandled inference error');
+        } finally {
+          resolveCompleted();
+          // Only clear our own registration; a force-send may have replaced it.
+          if (registry.get(chatId) === registration) {
+            registry.delete(chatId);
+          }
+        }
+      })();
+    },
+
+    async cancel(_sessionId: string, chatId: string): Promise<boolean> {
+      const reg = registry.get(chatId);
+      if (!reg) return false;
+      reg.controller.abort();
+      // Swallow — we just need to wait for the catch/finally to persist state.
+      await reg.completed.catch(() => {});
+      return true;
+    },
+
+    hasActive(chatId: string): boolean {
+      return registry.has(chatId);
+    },
+  };
+}
+
+export const _toolNames = ALL_TOOLS.map((t) => t.name);
--- a/apps/server/src/services/inference/types.ts
+++ b/apps/server/src/services/inference/types.ts
@@ -0,0 +1,13 @@
+// v1.12.4: shared inter-phase types/constants for the extracted phase files.
+// Lives here so stream-phase, tool-phase, and the sentinel-summaries
+// functions can all share the same definitions without circular imports.
+
+export interface StreamPhaseState {
+  accumulated: string;
+  startedAt: string | null;
+}
+
+// 500ms keeps the DB UPDATE rate bounded under heavy streaming. Used by
+// executeStreamPhase, runCapHitSummary, and runDoomLoopSummary — every site
+// that does a debounced content flush during streaming.
+export const DB_FLUSH_INTERVAL_MS = 500;
--- a/apps/server/src/services/inference/xml-parser.ts
+++ b/apps/server/src/services/inference/xml-parser.ts
@@ -0,0 +1,53 @@
+// v1.10.5: XML-tag tool-call fallback. Some models emit
+// <tool_call><function=foo><parameter=key>value</parameter></function></tool_call>
+// in plain content instead of using the OpenAI tool_calls JSON channel.
+// The streaming loop in inference.ts extracts these blocks via these helpers.
+
+export const XML_TOOL_OPEN = '<tool_call>';
+export const XML_TOOL_CLOSE = '</tool_call>';
+
+export function parseXmlToolCall(
+  block: string,
+): { name: string; args: Record<string, unknown> } | null {
+  const nameMatch = block.match(/<function=([^>]+)>/);
+  if (!nameMatch || !nameMatch[1]) return null;
+  const name = nameMatch[1].trim();
+  if (!name) return null;
+  const args: Record<string, unknown> = {};
+  // Non-greedy body so each <parameter=…>…</parameter> pair is matched
+  // independently even when multiple appear in the same block.
+  const paramRe = /<parameter=([^>]+)>([\s\S]*?)<\/parameter>/g;
+  for (const m of block.matchAll(paramRe)) {
+    const key = (m[1] ?? '').trim();
+    if (!key) continue;
+    const raw = (m[2] ?? '').trim();
+    try {
+      args[key] = JSON.parse(raw);
+    } catch {
+      args[key] = raw;
+    }
+  }
+  return { name, args };
+}
+
+// Return the index in `s` where an unfinished <tool_call> opener begins,
+// whether complete or cut off mid-tag. Returns -1 when `s` can be flushed
+// to the client in full without risking a partial tag leak.
+//   Case 1: a full `<tool_call>` opener with no matching closer — caller
+//           must keep everything from that index forward until the next
+//           chunk arrives with the closer.
+//   Case 2: `s` ends with a strict prefix of `<tool_call>` (e.g. `<tool_c`).
+//           Caller must keep just that suffix in the buffer.
+// Note: case 1 assumes the calling loop already extracted every complete
+// <tool_call>…</tool_call> pair before reaching this check.
+export function partialXmlOpenerStart(s: string): number {
+  const fullOpener = s.indexOf(XML_TOOL_OPEN);
+  if (fullOpener !== -1) return fullOpener;
+  const lastLt = s.lastIndexOf('<');
+  if (lastLt === -1) return -1;
+  const suffix = s.slice(lastLt);
+  if (XML_TOOL_OPEN.startsWith(suffix) && suffix.length < XML_TOOL_OPEN.length) {
+    return lastLt;
+  }
+  return -1;
+}
--- a/apps/server/src/services/system-prompt.ts
+++ b/apps/server/src/services/system-prompt.ts
@@ -8,9 +8,19 @@
 //   + container guidance (this layer, NEW in v1.12)
 //   + agent.system_prompt          (resolved from data/AGENTS.md by getAgentById)
 //   + session.system_prompt OR project.default_system_prompt
+//
+// v1.13.8: byte-stability instrumentation. buildSystemPromptWithFingerprint
+// returns the assembled string plus a SHA-256 fingerprint and a per-session
+// drift signal. buildSystemPrompt stays a string-returning shim for backward
+// compat (tests use it). No cache added — recon proved input-layer mtime
+// caches (this file + agents.ts) already deliver byte-stable inputs in
+// steady state. v1.13.8 measures that claim against production traffic
+// before any cache infrastructure earns its place.

+import { createHash } from 'node:crypto';
 import { readFile, stat } from 'node:fs/promises';
 import type { Agent, Project, Session } from '../types/api.js';
+import { getAgentsMtimes } from './agents.js';

 const BASE_SYSTEM_PROMPT = (projectPath: string) =>
  `You are BooCode Chat, a code investigation assistant. The user is working on a project located at ${projectPath}. Use the file-read tools (view_file, list_dir, grep, find_files) to investigate code when needed. Be concise. Cite file paths and line numbers when discussing code. Do not hallucinate file contents — read the file first. Tool results may be truncated; if so, narrow your query rather than guessing.`;
@@ -60,11 +70,94 @@ export function _resetContainerGuidanceCacheForTests(): void {
  cachedGuidance = null;
 }

-export async function buildSystemPrompt(
+// v1.13.8: expose the mtime currently held in the BOOCHAT cache so the
+// fingerprint log can stamp it without re-statting (no I/O race against
+// getContainerGuidance, which is the canonical mtime source).
+function getCachedGuidanceMtime(): number | null {
+  if (!cachedGuidance) return null;
+  // mtime=0 is the sentinel for "file is missing" (set in the catch above).
+  // Surface it as null so the log/diff doesn't treat absence as a number.
+  return cachedGuidance.mtime > 0 ? cachedGuidance.mtime : null;
+}
+
+// v1.13.8: fingerprint emitted per turn, observer state keyed by session.
+// Field set is intentionally small — we want the diff between two
+// fingerprints to point at the exact input that drifted, not bury the
+// signal in noise.
+export interface PrefixFingerprint {
+  msg: 'prefix-fingerprint';
+  project_id: string;
+  agent_id: string | null;
+  agent_name: string | null;
+  session_id: string;
+  prefix_hash: string;
+  prefix_length: number;
+  mtime_boochat: number | null;
+  mtime_agents_global: number | null;
+  mtime_agents_project: number | null;
+  has_agent_system_prompt: boolean;
+  has_session_override: boolean;
+  has_project_override: boolean;
+}
+
+export interface PrefixDrift {
+  msg: 'prefix-drift';
+  session_id: string;
+  prev_hash: string;
+  new_hash: string;
+  prev_length: number;
+  new_length: number;
+  // Names of the ObservedInputs fields (agent_id, the three mtimes, and the
+  // three has_* flags) whose values differ between the previous observation
+  // and this one. The bug case is `changed_inputs: []`: the hash differs but
+  // no tracked input moved, which means assembly is nondeterministic
+  // somewhere.
+  changed_inputs: string[];
+}
+
+// Fields tracked per-session for the drift diff. Stored alongside the hash
+// so we can recompute changed_inputs without re-running buildSystemPrompt.
+interface ObservedInputs {
+  agent_id: string | null;
+  mtime_boochat: number | null;
+  mtime_agents_global: number | null;
+  mtime_agents_project: number | null;
+  has_agent_system_prompt: boolean;
+  has_session_override: boolean;
+  has_project_override: boolean;
+}
+
+interface ObserverEntry {
+  hash: string;
+  length: number;
+  inputs: ObservedInputs;
+}
+
+// Unbounded by design for v1.13.8 (instrumentation only; sessions in the
+// smoke test are short-lived). TODO(v1.13.x follow-up, if v1.13.8 proves
+// stable): LRU-bound this Map at 1000 sessions once the process lives long
+// enough for growth to matter.
+const prefixObserver = new Map<string, ObserverEntry>();
+
+// Test-only: clear the observer so consecutive tests don't share state.
+export function _resetPrefixObserverForTests(): void {
+  prefixObserver.clear();
+}
+
+function computeChangedInputs(prev: ObservedInputs, curr: ObservedInputs): string[] {
+  const out: string[] = [];
+  const keys = Object.keys(curr) as (keyof ObservedInputs)[];
+  for (const k of keys) {
+    if (prev[k] !== curr[k]) out.push(k);
+  }
+  return out;
+}
+
+export async function buildSystemPromptWithFingerprint(
  project: Project,
  session: Session,
-  agent: Agent | null
-): Promise<string> {
+  agent: Agent | null,
+): Promise<{ prompt: string; fingerprint: PrefixFingerprint; drift: PrefixDrift | null }> {
  let out = BASE_SYSTEM_PROMPT(project.path);
  const guidance = await getContainerGuidance();
  if (guidance) {
@@ -79,5 +172,60 @@ export async function buildSystemPrompt(
  if (userPrompt.length > 0) {
    out += '\n\n' + userPrompt;
  }
-  return out;
+
+  const hash = createHash('sha256').update(out, 'utf8').digest('hex');
+  const agentsMtimes = getAgentsMtimes(project.path);
+  const inputs: ObservedInputs = {
+    agent_id: agent?.id ?? null,
+    mtime_boochat: getCachedGuidanceMtime(),
+    mtime_agents_global: agentsMtimes.global,
+    mtime_agents_project: agentsMtimes.project,
+    has_agent_system_prompt: !!(agent && agent.system_prompt.trim().length > 0),
+    has_session_override: sessionPrompt.length > 0,
+    has_project_override: projectPrompt.length > 0,
+  };
+
+  const fingerprint: PrefixFingerprint = {
+    msg: 'prefix-fingerprint',
+    project_id: project.id,
+    agent_id: agent?.id ?? null,
+    agent_name: agent?.name ?? null,
+    session_id: session.id,
+    prefix_hash: hash,
+    prefix_length: out.length,
+    mtime_boochat: inputs.mtime_boochat,
+    mtime_agents_global: inputs.mtime_agents_global,
+    mtime_agents_project: inputs.mtime_agents_project,
+    has_agent_system_prompt: inputs.has_agent_system_prompt,
+    has_session_override: inputs.has_session_override,
+    has_project_override: inputs.has_project_override,
+  };
+
+  let drift: PrefixDrift | null = null;
+  const prev = prefixObserver.get(session.id);
+  if (prev && prev.hash !== hash) {
+    drift = {
+      msg: 'prefix-drift',
+      session_id: session.id,
+      prev_hash: prev.hash,
+      new_hash: hash,
+      prev_length: prev.length,
+      new_length: out.length,
+      changed_inputs: computeChangedInputs(prev.inputs, inputs),
+    };
+  }
+  prefixObserver.set(session.id, { hash, length: out.length, inputs });
+
+  return { prompt: out, fingerprint, drift };
+}
+
+// Backward-compatible string-returning shim for callers that don't log the
+// fingerprint. It still updates prefixObserver and discards any drift record.
+export async function buildSystemPrompt(
+  project: Project,
+  session: Session,
+  agent: Agent | null,
+): Promise<string> {
+  const { prompt } = await buildSystemPromptWithFingerprint(project, session, agent);
+  return prompt;
 }
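The drift check in the hunk above amounts to remembering the last hash and tracked inputs per session and, when the hash changes, reporting which inputs moved. A minimal sketch of that logic follows. It assumes a hypothetical `observe()` helper and a reduced `Inputs` shape; both are illustrative and not part of this patch.

```typescript
// Illustrative only: `Inputs` is a reduced stand-in for ObservedInputs and
// `observe` is a hypothetical helper, not a function from this patch.
interface Inputs {
  agent_id: string | null;
  mtime_boochat: number | null;
  has_session_override: boolean;
}

interface Entry {
  hash: string;
  inputs: Inputs;
}

// Plain object keyed by session id; unbounded, like the Map in the patch.
const seen: Record<string, Entry> = {};

// Returns null on first sight or when the hash is unchanged; otherwise the
// names of the tracked inputs whose values differ from the last observation.
function observe(sessionId: string, hash: string, inputs: Inputs): string[] | null {
  const prev = seen[sessionId];
  seen[sessionId] = { hash, inputs };
  if (!prev || prev.hash === hash) return null;
  const keys = Object.keys(inputs) as (keyof Inputs)[];
  return keys.filter((k) => prev.inputs[k] !== inputs[k]);
}

const base: Inputs = { agent_id: null, mtime_boochat: 100, has_session_override: false };
const first = observe('s1', 'aaa', base);
const same = observe('s1', 'aaa', base);
const moved = observe('s1', 'bbb', { ...base, mtime_boochat: 200 });
const unexplained = observe('s1', 'ccc', { ...base, mtime_boochat: 200 });
```

In this sketch `moved` names the input that changed, while `unexplained` is an empty array: the hash changed but no tracked input did, which is the case the patch's comments flag as worth investigating.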
--- a/apps/server/src/services/tools.ts
+++ b/apps/server/src/services/tools.ts
@@ -8,6 +8,7 @@ import { getGitMeta } from './git_meta.js';
 import { findSkills, getSkillBody, getSkillResource } from './skills.js';
 import { webSearch } from './web_search.js';
 import { webFetch } from './web_fetch.js';
+import { readTruncation, truncateIfNeeded } from './truncate.js';
 // v1.12 Track B.2: codecontext tools. 8 wrappers re-exported from
 // tools/codecontext/index.ts. Each calls into services/codecontext_client.ts
 // which talks to the codecontext sidecar at http://codecontext:8080.
@@ -109,12 +110,22 @@ export const viewFile: ToolDef<ViewFileInputT> = {
    const slice = lines.slice(start - 1, end);
    const content = slice.join('\n');
    const truncated = total > end || start > 1;
+    // v1.13.5: stash the full file on tmpfs so the model can retrieve more
+    // via view_truncated_output(id) without re-reading the file (which it
+    // may not have project-relative-path access to in future agent setups).
+    // raw is bounded by MAX_FILE_BYTES (5MB), within truncateIfNeeded's cap.
+    const wrapped = await truncateIfNeeded({
+      fullContent: raw,
+      slicedContent: content,
+      wasTruncated: truncated,
+    });
    return {
      path: relative(projectRoot, real) || basename(real),
-      content,
+      content: wrapped.content,
      total_lines: total,
      returned_lines: [start, end],
-      truncated,
+      truncated: wrapped.truncated,
+      ...(wrapped.outputPath ? { outputPath: wrapped.outputPath } : {}),
    };
  },
 };
@@ -157,41 +168,64 @@ export const listDir: ToolDef<ListDirInputT> = {
      ? entries
      : entries.filter((e) => !e.name.startsWith('.'));
    const total = filtered.length;
-    const slice = filtered.slice(0, MAX_DIR_ENTRIES);
-    const out = await Promise.all(
-      slice.map(async (e) => {
-        const child = resolve(real, e.name);
-        let size: number | undefined;
-        if (e.isFile()) {
-          try {
-            const cs = await stat(child);
-            size = cs.size;
-          } catch {
-            /* ignore */
-          }
-        }
-        return {
-          name: e.name,
-          type: e.isDirectory() ? ('dir' as const) : ('file' as const),
-          ...(size != null ? { size } : {}),
-        };
-      })
-    );
-    // v1.11.7: filter entries whose project-relative path matches a secret
-    // pattern. Each entry is tested using the project-rel dir + its name
-    // so the pattern's path/segment semantics work for nested dirs like
-    // `.aws/`. The count is surfaced via `pathguard_note` — we never list
-    // the hidden paths (defeats the purpose).
+    const wasTruncated = total > MAX_DIR_ENTRIES;
    const relDir = relative(projectRoot, real) || '.';
+    // v1.13.5: when we'd truncate, render the FULL list to tmpfs so
+    // view_truncated_output can serve it. Stat sizes for all entries when
+    // truncating so the stored view matches the visible shape; this is the
+    // one extra cost for big directories, bounded by total entries (which
+    // is itself bounded by filesystem behavior).
+    const processOne = async (e: typeof filtered[number]) => {
+      const child = resolve(real, e.name);
+      let size: number | undefined;
+      if (e.isFile()) {
+        try {
+          const cs = await stat(child);
+          size = cs.size;
+        } catch { /* ignore */ }
+      }
+      return {
+        name: e.name,
+        type: e.isDirectory() ? ('dir' as const) : ('file' as const),
+        ...(size != null ? { size } : {}),
+      };
+    };
+    const slice = filtered.slice(0, MAX_DIR_ENTRIES);
+    const out = await Promise.all(slice.map(processOne));
+    // v1.11.7: filter entries whose project-relative path matches a secret
+    // pattern. The same filter applies to the full-list snapshot below so
+    // the stashed file never holds entries the slice would have hidden.
    const secretFilter = filterSecretEntries(out, (e) =>
      relDir === '.' ? e.name : `${relDir}/${e.name}`,
    );
+    let outputPath: string | undefined;
+    if (wasTruncated) {
+      const fullProcessed = await Promise.all(filtered.map(processOne));
+      const fullFiltered = filterSecretEntries(fullProcessed, (e) =>
+        relDir === '.' ? e.name : `${relDir}/${e.name}`,
+      );
+      // One line per entry, so view_truncated_output's line-slicing semantics
+      // map cleanly. Format: "<type>\t<name>[\tsize=N]". The header line
+      // records the directory and entry count; entries start on line 2.
+      const header = `# list_dir ${relDir} — ${fullFiltered.kept.length} entries`;
+      const lines = [header, ...fullFiltered.kept.map((e) => {
+        const sz = 'size' in e && e.size != null ? `\tsize=${e.size}` : '';
+        return `${e.type}\t${e.name}${sz}`;
+      })];
+      const wrapped = await truncateIfNeeded({
+        fullContent: lines.join('\n'),
+        slicedContent: '',
+        wasTruncated: true,
+      });
+      outputPath = wrapped.outputPath;
+    }
    return {
      path: relDir,
      entries: secretFilter.kept,
      total: secretFilter.kept.length,
-      truncated: total > MAX_DIR_ENTRIES,
+      truncated: wasTruncated,
      ...(secretFilter.note ? { pathguard_note: secretFilter.note } : {}),
+      ...(outputPath ? { outputPath } : {}),
    };
  },
 };
@@ -315,6 +349,71 @@ export const findFiles: ToolDef<FindFilesInputT> = {
  },
 };

+// v1.13.5: retrieves the full content of a previously-truncated tool output
+// via the opaque id stamped on the original tool_result. Line-based slicing
+// matches view_file's mental model so the model uses the same affordances.
+// Tmpfs-backed, 7-day TTL (see services/truncate.ts).
+const VIEW_TRUNCATED_DEFAULT_LINES = 200;
+
+const ViewTruncatedOutputInput = z.object({
+  id: z.string().regex(/^tr_[0-9a-v]{12}$/),
+  start_line: z.number().int().positive().optional(),
+  end_line: z.number().int().positive().optional(),
+});
+type ViewTruncatedOutputInputT = z.infer<typeof ViewTruncatedOutputInput>;
+
+export const viewTruncatedOutput: ToolDef<ViewTruncatedOutputInputT> = {
+  name: 'view_truncated_output',
+  description: `Retrieve the full content of a previously-truncated tool output by its outputPath id. When a tool returns { truncated: true, outputPath: "tr_..." }, call this to view the full content. Defaults to the first ${VIEW_TRUNCATED_DEFAULT_LINES} lines. Use start_line and end_line (1-indexed, inclusive) to slice. Stored for 7 days.`,
+  inputSchema: ViewTruncatedOutputInput,
+  jsonSchema: {
+    type: 'function',
+    function: {
+      name: 'view_truncated_output',
+      description: `Retrieve the full content of a previously-truncated tool output by its outputPath id. Returns the first ${VIEW_TRUNCATED_DEFAULT_LINES} lines by default; use start_line/end_line to slice. Stored for 7 days.`,
+      parameters: {
+        type: 'object',
+        properties: {
+          id: { type: 'string', description: 'The outputPath value from an earlier truncated tool result (e.g. "tr_abc123def456").' },
+          start_line: { type: 'integer', description: 'First line (1-indexed). Default 1.' },
+          end_line: { type: 'integer', description: `Last line (1-indexed, inclusive). Defaults to a ${VIEW_TRUNCATED_DEFAULT_LINES}-line window beginning at start_line.` },
+        },
+        required: ['id'],
+        additionalProperties: false,
+      },
+    },
+  },
+  async execute(input, _projectRoot) {
+    const content = await readTruncation(input.id);
+    if (content === null) {
+      return {
+        id: input.id,
+        content: '',
+        truncated: false,
+        error: `No truncation found for id "${input.id}". It may have been pruned (7-day TTL) or never existed.`,
+      };
+    }
+    const lines = content.split('\n');
+    const total = lines.length;
+    let start = input.start_line ?? 1;
+    let end = input.end_line ?? Math.min(total, start + VIEW_TRUNCATED_DEFAULT_LINES - 1);
+    if (start < 1) start = 1;
+    if (end > total) end = total;
+    if (end < start) end = start;
+    const slice = lines.slice(start - 1, end).join('\n');
+    // Re-slicing this view isn't truncation in the dual-write sense — the
+    // model already has the id; no point stashing the slice again.
+    const truncated = total > end || start > 1;
+    return {
+      id: input.id,
+      content: slice,
+      total_lines: total,
+      returned_lines: [start, end],
+      truncated,
+    };
+  },
+};
+
 // v1.8 Level 1 branch awareness: gives the model a read-only view of the
 // project's git state. No path input — operates on the inference-resolved
 // project root via getGitMeta. Subprocess runs with a 2s timeout (see git_meta).
@@ -527,8 +626,14 @@ export const askUserInput: ToolDef<AskUserInputInputT> = {
  },
 };

+// v1.13.3: alpha-sorted by tool.name at module load. llama.cpp's prompt
+// cache hits on byte-identical prefixes; the tool list lives near the top
+// of the system prompt, so any order drift would invalidate every cached
+// turn. Single source of truth for ordering lives here — toolJsonSchemas()
+// and TOOLS_BY_NAME inherit it.
 export const ALL_TOOLS: ReadonlyArray<ToolDef<unknown>> = [
  viewFile as ToolDef<unknown>,
+  viewTruncatedOutput as ToolDef<unknown>,
  listDir as ToolDef<unknown>,
  grep as ToolDef<unknown>,
  findFiles as ToolDef<unknown>,
@@ -553,7 +658,7 @@ export const ALL_TOOLS: ReadonlyArray<ToolDef<unknown>> = [
  watchChanges as ToolDef<unknown>,
  getSemanticNeighborhoods as ToolDef<unknown>,
  getFrameworkAnalysis as ToolDef<unknown>,
-];
+].sort((a, b) => a.name.localeCompare(b.name));

 // v1.8.2: forward-compatible read-only whitelist. An agent whose `tools` is
 // fully contained in this set gets a generous default tool budget (30);
@@ -565,6 +670,7 @@ export const ALL_TOOLS: ReadonlyArray<ToolDef<unknown>> = [
 // project state, so it belongs in the read-only set for budget purposes.
 export const READ_ONLY_TOOL_NAMES = [
  'view_file',
+  'view_truncated_output',
  'list_dir',
  'grep',
  'find_files',
@@ -594,6 +700,64 @@ export const TOOLS_BY_NAME: Record<string, ToolDef<unknown>> = Object.fromEntrie
  ALL_TOOLS.map((t) => [t.name, t])
 );

+// v1.13.15-tools: tiered tool loading. BOOCODE_TOOLS env var (`core` |
+// `standard` | `all`) filters the agent's tool whitelist before LLM dispatch.
+// Saves tokens on the daily-driver model (qwen3.6-35b-a3b): fewer tools mean
+// shorter, more stable tool schemas in the system prompt, which helps
+// prompt-cache hits. Pattern borrowed from eyaltoledano/claude-task-master
+// (MIT + Commons Clause; pattern only, no code copied).
+//
+// The env var is a CEILING. It only narrows; it never expands an agent's
+// declared whitelist. Default behavior (var unset) is unchanged: all tools.
+export const CORE_TOOL_NAMES = [
+  'view_file',
+  'list_dir',
+  'grep',
+  'find_files',
+] as const;
+
+export const STANDARD_TOOL_NAMES = [
+  ...CORE_TOOL_NAMES,
+  'web_search',
+  'web_fetch',
+  'git_status',
+  'get_codebase_overview',
+  'get_file_analysis',
+  'get_symbol_info',
+  'search_symbols',
+  'get_dependencies',
+  'watch_changes',
+  'get_semantic_neighborhoods',
+  'get_framework_analysis',
+] as const;
+
+// Module-load validation: every name in CORE / STANDARD must exist in
+// TOOLS_BY_NAME. Catches typos and stale tier definitions before they reach
+// production; server boot fails loudly rather than silently filtering valid
+// tools out of agent whitelists.
+for (const name of CORE_TOOL_NAMES) {
+  if (!TOOLS_BY_NAME[name]) {
+    throw new Error(`CORE_TOOL_NAMES references unknown tool: '${name}'`);
+  }
+}
+for (const name of STANDARD_TOOL_NAMES) {
+  if (!TOOLS_BY_NAME[name]) {
+    throw new Error(`STANDARD_TOOL_NAMES references unknown tool: '${name}'`);
+  }
+}
+
+export function resolveToolTier(tier: string | undefined): readonly string[] {
+  switch ((tier ?? 'all').toLowerCase()) {
+    case 'core':
+      return CORE_TOOL_NAMES;
+    case 'standard':
+      return STANDARD_TOOL_NAMES;
+    case 'all':
+    default:
+      return ALL_TOOLS.map((t) => t.name);
+  }
+}
+
 export function toolJsonSchemas(): ToolJsonSchema[] {
  return ALL_TOOLS.map((t) => t.jsonSchema);
 }
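The tiered-loading comment says the env var is a ceiling that only narrows an agent's declared whitelist. The hunk defines `resolveToolTier`, but the call site that intersects its result with an agent's tools is not shown. A minimal sketch of how that step could look, assuming a hypothetical `applyToolTier` helper and shortened name lists:

```typescript
// Hypothetical helper: the patch defines resolveToolTier, but the call site
// that intersects its result with an agent's whitelist is not shown here.
// The name lists are shortened for illustration.
const CORE = ['view_file', 'list_dir', 'grep', 'find_files'];
const ALL = CORE.concat(['web_search', 'web_fetch', 'git_status']);

function tierNames(tier: string | undefined): string[] {
  switch ((tier ?? 'all').toLowerCase()) {
    case 'core':
      return CORE;
    default:
      return ALL; // unknown values fall back to "all", as in the patch
  }
}

// Keep the agent's declared order; drop anything the tier does not allow.
function applyToolTier(agentTools: string[], tier: string | undefined): string[] {
  const allowed = tierNames(tier);
  return agentTools.filter((name) => allowed.indexOf(name) !== -1);
}

const agent = ['grep', 'web_search', 'view_file'];
const narrowed = applyToolTier(agent, 'core');
const unchanged = applyToolTier(agent, undefined);
const neverExpands = applyToolTier(['grep'], 'all');
```

Filtering the agent's list, rather than returning the tier's list, is what makes the tier a ceiling: an agent that declares only `grep` keeps only `grep` even under `all`.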
--- a/apps/server/src/services/truncate.ts
+++ b/apps/server/src/services/truncate.ts
@@ -0,0 +1,170 @@
+import { promises as fs } from 'fs';
+import { randomBytes } from 'crypto';
+import path from 'path';
+import type { Sql } from '../db.js';
+
+// v1.13.5: opencode-style truncation storage. When a tool slice would cut
+// content the model might still want, we store the full text on tmpfs and
+// hand the model an opaque id. view_truncated_output(id) retrieves it.
+//
+// Tmpfs path means full content vanishes on container restart; chats that
+// outlive a restart lose retrieval (acceptable — the user has usually moved
+// on or the data is stale). 7-day TTL + orphan reap bound disk growth via
+// the periodic sweeper in index.ts.
+
+export const TRUNCATION_DIR = process.env.BOOCODE_TRUNCATION_DIR ?? '/tmp/boocode-truncations';
+export const TRUNCATION_TTL_MS = 7 * 24 * 60 * 60 * 1000;
+// Matches view_file's MAX_FILE_BYTES — anything bigger was already refused
+// at the source tool's size check, so we never see it here.
+export const MAX_TRUNCATION_BYTES = 5 * 1024 * 1024;
+
+const ID_RE = /^tr_[0-9a-v]{12}$/;
+
+let dirEnsured = false;
+async function ensureDir(): Promise<void> {
+  if (dirEnsured) return;
+  await fs.mkdir(TRUNCATION_DIR, { recursive: true, mode: 0o700 });
+  dirEnsured = true;
+}
+
+// 12 base32 chars ≈ 60 bits of entropy. Collision probability across a
+// 7-day window with ~thousands of truncations is essentially zero.
+function newId(): string {
+  // One random byte per character; the low 5 bits index the alphabet.
+  const buf = randomBytes(12);
+  const alphabet = '0123456789abcdefghijklmnopqrstuv';
+  let out = 'tr_';
+  for (const byte of buf) {
+    out += alphabet[byte & 0x1f];
+  }
+  return out;
+}
+
+function idToPath(id: string): string {
+  // Defense-in-depth: the model never supplies a path component (only ids),
+  // but a malformed id from anywhere else shouldn't escape TRUNCATION_DIR.
+  if (!ID_RE.test(id)) {
+    throw new Error(`Invalid truncation id: ${id}`);
+  }
+  return path.join(TRUNCATION_DIR, id);
+}
+
+export async function storeTruncation(fullContent: string): Promise<string> {
+  const bytes = Buffer.byteLength(fullContent, 'utf8');
+  if (bytes > MAX_TRUNCATION_BYTES) {
+    throw new Error(`Truncation content ${bytes}B exceeds ${MAX_TRUNCATION_BYTES}B cap`);
+  }
+  await ensureDir();
+  const id = newId();
+  await fs.writeFile(idToPath(id), fullContent, { encoding: 'utf8', mode: 0o600 });
+  return id;
+}
+
+export async function readTruncation(id: string): Promise<string | null> {
+  if (!ID_RE.test(id)) return null;
+  try {
+    return await fs.readFile(idToPath(id), { encoding: 'utf8' });
+  } catch (err) {
+    if ((err as NodeJS.ErrnoException).code === 'ENOENT') return null;
+    throw err;
+  }
+}
+
+// Wrap a tool's output. If wasTruncated, stash the full content on tmpfs
+// and return its id alongside the sliced view the tool would have returned.
+// Storage failure (disk full, permission denied) is non-fatal — the sliced
+// view ships without an outputPath, which is exactly what the tool returned
+// before v1.13.5. Same goes for content over MAX_TRUNCATION_BYTES.
+export async function truncateIfNeeded(args: {
+  fullContent: string;
+  slicedContent: string;
+  wasTruncated: boolean;
+}): Promise<{ content: string; truncated: boolean; outputPath?: string }> {
+  if (!args.wasTruncated) {
+    return { content: args.slicedContent, truncated: false };
+  }
+  const bytes = Buffer.byteLength(args.fullContent, 'utf8');
+  if (bytes > MAX_TRUNCATION_BYTES) {
+    return { content: args.slicedContent, truncated: true };
+  }
+  try {
+    const outputPath = await storeTruncation(args.fullContent);
+    return { content: args.slicedContent, truncated: true, outputPath };
+  } catch {
+    return { content: args.slicedContent, truncated: true };
+  }
+}
+
+// Periodic cleanup. Called from index.ts's sweep interval (v1.13.3 cadence).
+// Pass 1: TTL — anything older than TRUNCATION_TTL_MS is gone.
+// Pass 2: orphans — files with no live message_parts.payload->'output'->>'outputPath'
+// reference. Catches the case where a part referencing an outputPath got
+// hidden by prune (v1.13.4) and the file is now unreachable.
+export async function cleanupTruncations(args: {
+  sql: Sql;
+  log: { warn: (obj: object, msg: string) => void; error: (obj: object, msg: string) => void };
+}): Promise<{ ttlReaped: number; orphanReaped: number }> {
+  await ensureDir();
+  const cutoff = Date.now() - TRUNCATION_TTL_MS;
+  let ttlReaped = 0;
+  let orphanReaped = 0;
+
+  let entries: string[];
+  try {
+    entries = await fs.readdir(TRUNCATION_DIR);
+  } catch (err) {
+    args.log.error({ err }, 'cleanupTruncations readdir failed');
+    return { ttlReaped, orphanReaped };
+  }
+  if (entries.length === 0) return { ttlReaped, orphanReaped };
+
+  const survivors: string[] = [];
+  for (const name of entries) {
+    if (!ID_RE.test(name)) continue;
+    const full = path.join(TRUNCATION_DIR, name);
+    try {
+      const stat = await fs.stat(full);
+      if (stat.mtimeMs < cutoff) {
+        await fs.unlink(full);
+        ttlReaped += 1;
+      } else {
+        survivors.push(name);
+      }
+    } catch {
+      // File vanished between readdir and stat — fine.
+    }
+  }
+
+  if (survivors.length === 0) {
+    if (ttlReaped > 0) {
+      args.log.warn({ ttlReaped, orphanReaped: 0 }, 'cleanupTruncations reaped files');
+    }
+    return { ttlReaped, orphanReaped: 0 };
+  }
+
+  // outputPath rides inside the tool_result part's payload.output object
+  // (see partsFromToolMessage in inference/parts.ts), so the json path is
+  // payload->'output'->>'outputPath' rather than top-level.
+  const referenced = await args.sql<{ output_path: string }[]>`
+    SELECT DISTINCT p.payload->'output'->>'outputPath' AS output_path
+    FROM message_parts p
+    WHERE p.kind = 'tool_result'
+      AND p.payload->'output' ? 'outputPath'
+      AND p.payload->'output'->>'outputPath' = ANY(${survivors})
+  `;
+  const live = new Set(referenced.map((r) => r.output_path));
+  for (const name of survivors) {
+    if (live.has(name)) continue;
+    try {
+      await fs.unlink(path.join(TRUNCATION_DIR, name));
+      orphanReaped += 1;
+    } catch {
+      // ignore
+    }
+  }
+
+  if (ttlReaped > 0 || orphanReaped > 0) {
+    args.log.warn({ ttlReaped, orphanReaped }, 'cleanupTruncations reaped files');
+  }
+  return { ttlReaped, orphanReaped };
+}
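The id format enforced by `ID_RE` is `tr_` followed by 12 characters from a 32-symbol alphabet (`0-9`, `a-v`). A minimal sketch of encoding bytes into that shape, assuming one byte per character with only the low 5 bits used. `encodeId` and `ID_PATTERN` are hypothetical names, and the bytes are passed in so the example stays deterministic.

```typescript
// Hypothetical names for illustration; the pattern mirrors ID_RE in the patch.
const ID_PATTERN = /^tr_[0-9a-v]{12}$/;
const ALPHABET = '0123456789abcdefghijklmnopqrstuv';

// Maps the low 5 bits of each of the first 12 bytes to one alphabet symbol.
function encodeId(bytes: Uint8Array): string {
  if (bytes.length < 12) throw new Error('need at least 12 bytes');
  let out = 'tr_';
  for (let i = 0; i < 12; i++) {
    out += ALPHABET[bytes[i] & 0x1f];
  }
  return out;
}

const sample = encodeId(new Uint8Array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 31, 255]));
const looksValid = ID_PATTERN.test(sample);
const rejectsTraversal = ID_PATTERN.test('../etc/passwd');
```

In the server the bytes would come from a cryptographic source such as Node's `crypto.randomBytes`. Because the pattern admits only the fixed prefix and alphabet, a path-like string can never pass the check.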
--- a/apps/server/src/services/web_fetch.ts
+++ b/apps/server/src/services/web_fetch.ts
@@ -11,6 +11,7 @@
 import { z } from 'zod';
 import { isPublicUrl } from './url_guard.js';
 import type { ToolDef } from './tools.js';
+import { truncateIfNeeded } from './truncate.js';

 const WebFetchInput = z.object({
  url: z.string().min(1).max(2048),
@@ -62,6 +63,39 @@ function stripHtml(html: string): { text: string; title: string | undefined } {
  return { text, title };
 }

+// v1.11.10: streaming body reader. Aborts the response stream the instant
+// cumulative bytes cross maxBytes, so a server that lies about
+// Content-Length (or omits it entirely) can't make us buffer gigabytes
+// before the post-read check fires. reader.cancel() releases the
+// underlying connection on the spot.
+async function readBodyCapped(
+  res: Response,
+  maxBytes: number,
+): Promise<{ ok: true; body: string } | { ok: false; bytesRead: number }> {
+  if (!res.body) return { ok: true, body: '' };
+  const reader = res.body.getReader();
+  const chunks: Uint8Array[] = [];
+  let total = 0;
+  try {
+    while (true) {
+      const { done, value } = await reader.read();
+      if (done) break;
+      total += value.byteLength;
+      if (total > maxBytes) {
+        // Best-effort cancel — surfaces on the server side as a closed
+        // connection and (in our tests) fires the ReadableStream's
+        // cancel() callback so we can assert the abort happened.
+        await reader.cancel();
+        return { ok: false, bytesRead: total };
+      }
+      chunks.push(value);
+    }
+  } finally {
+    try { reader.releaseLock(); } catch { /* already released by cancel() */ }
+  }
+  return { ok: true, body: Buffer.concat(chunks).toString('utf8') };
+}
+
 function truncate(text: string, max: number): { content: string; truncated: boolean } {
  if (text.length <= max) return { content: text, truncated: false };
  const omitted = text.length - max;
@@ -159,19 +193,20 @@ export async function executeWebFetch(
    }
  }
  const contentType = (res.headers.get('content-type') ?? '').toLowerCase();
-  // Read body. We rely on the 5MB cap by checking length after consumption
-  // — most malicious or accidental large responses also exceed it via the
-  // Content-Length pre-flight above. A truly hostile server that lies
-  // about length AND streams gigabytes would defeat that; the per-hop
-  // 15s timeout is the secondary fence.
-  const body = await res.text();
-  // v1.11.8 review: byte-count, not char-count. A 5MB cap on body.length
-  // (UTF-16 code units) lets a multi-byte payload (emoji, CJK) pass when
-  // its wire size already exceeded MAX_BYTES.
-  const bodyBytes = Buffer.byteLength(body, 'utf8');
-  if (bodyBytes > MAX_BYTES) {
-    return { error: 'response_too_large', reason: `body ${bodyBytes} bytes > ${MAX_BYTES}` };
+  // v1.11.10: stream the body with a hard byte cap. Previously we read
+  // res.text() in one shot and then byte-length-checked — a server that
+  // lies about Content-Length (or omits it) could make us buffer
+  // gigabytes before the post-check fired. readBodyCapped aborts the
+  // stream the instant total bytes cross MAX_BYTES. The Content-Length
+  // pre-flight above stays as a cheap early reject for honest servers.
+  const read = await readBodyCapped(res, MAX_BYTES);
+  if (!read.ok) {
+    return {
+      error: 'body_too_large',
+      reason: `Response body exceeded ${MAX_BYTES} bytes (read ${read.bytesRead} before abort)`,
+    };
  }
+  const body = read.body;

  let textRaw: string;
  let title: string | undefined;
@@ -196,15 +231,24 @@ export async function executeWebFetch(
  }

  const truncated = truncate(textRaw, maxChars);
+  // v1.13.5: stash the full pre-slice body when truncation fires so the
+  // model can pull more via view_truncated_output(id) without re-fetching.
+  // textRaw is already bounded by MAX_BYTES (5MB), within truncate.ts's cap.
+  const wrapped = await truncateIfNeeded({
+    fullContent: textRaw,
+    slicedContent: truncated.content,
+    wasTruncated: truncated.truncated,
+  });
  // Report the FINAL URL (post-redirects) so the LLM knows where the body
  // came from — useful for citations and for the model to reason about
  // domain trust.
  return {
    url: currentUrl,
    title,
-    content: truncated.content,
+    content: wrapped.content,
    content_type: contentType,
-    truncated: truncated.truncated,
+    truncated: wrapped.truncated,
+    ...(wrapped.outputPath ? { outputPath: wrapped.outputPath } : {}),
  };
 }

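The capped reader's core idea is to count bytes as chunks arrive and stop at the first chunk that crosses the limit, instead of buffering everything and checking afterwards. A minimal synchronous sketch of that logic, assuming a hypothetical `pull` callback in place of the stream reader (the code in the hunk awaits `reader.read()` and calls `reader.cancel()`):

```typescript
// Hypothetical, simplified result shape for illustration.
interface CappedResult {
  ok: boolean;
  bytesRead: number;
  chunks: Uint8Array[];
}

// `pull` returns the next chunk, or null when the source is exhausted.
function readCapped(pull: () => Uint8Array | null, maxBytes: number): CappedResult {
  const chunks: Uint8Array[] = [];
  let total = 0;
  for (;;) {
    const chunk = pull();
    if (chunk === null) break;
    total += chunk.byteLength;
    if (total > maxBytes) {
      // Stop immediately: the oversized chunk is never stored.
      return { ok: false, bytesRead: total, chunks: [] };
    }
    chunks.push(chunk);
  }
  return { ok: true, bytesRead: total, chunks };
}

// A source that would hand out ten 4-byte chunks if fully drained.
let pulls = 0;
function source(): Uint8Array | null {
  if (pulls >= 10) return null;
  pulls += 1;
  return new Uint8Array(4);
}

const capped = readCapped(source, 10);
const pullsWhenCapped = pulls;
```

With a 10-byte cap the loop stops on the third chunk, so the remaining seven chunks are never requested. That early stop is what protects against a server that omits or misstates Content-Length.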
--- a/apps/server/src/types/api.ts
+++ b/apps/server/src/types/api.ts
@@ -39,6 +39,19 @@ export interface Session {
  // project.default_web_search_enabled. Plumbed but inert in v1.9 — the
  // actual web_search tool ships in Batch 8.
  web_search_enabled: boolean | null;
+  // v1.12.1: server-side workspace pane layout. Replaces per-device
+  // localStorage so all devices viewing the session see the same panes.
+  workspace_panes: WorkspacePane[];
+}
+
+export type WorkspacePaneKind = 'chat' | 'terminal' | 'agent' | 'empty' | 'settings';
+
+export interface WorkspacePane {
+  id: string;
+  kind: WorkspacePaneKind;
+  chatId?: string;
+  chatIds: string[];
+  activeChatIdx: number;
 }

 // v1.8.1: agents come from two sources. 'global' = /data/AGENTS.md (always
@@ -173,6 +186,11 @@ export interface Message {
  // v1.8.2: per-message metadata. See MessageMetadata for the discriminated
  // shapes currently in use.
  metadata: MessageMetadata | null;
+  // v1.13.1-C: reasoning content captured from the model's reasoning stream
+  // (qwen3.6 etc.). Populated from message_parts via the messages_with_parts
+  // view's reasoning_parts column. Optional — most rows have no reasoning
+  // and the API may omit the field on legacy responses.
+  reasoning_parts?: Array<{ text: string }> | null;
  // v1.11: anchored rolling compaction. Optional so consumers that SELECT
  // the pre-v1.11 column set still type-check. See compaction.ts +
  // schema.sql for semantics.
@@ -273,6 +291,11 @@ export interface SessionRenamedFrame {
  session_id: string;
  name: string;
 }
+export interface SessionWorkspaceUpdatedFrame {
+  type: 'session_workspace_updated';
+  session_id: string;
+  workspace_panes: WorkspacePane[];
+}
 export interface SessionArchivedFrame {
  type: 'session_archived';
  session_id: string;
@@ -324,7 +347,7 @@ export interface ProjectUpdatedFrame {
 export interface ChatStatusFrame {
  type: 'chat_status';
  chat_id: string;
-  status: 'working' | 'idle' | 'error';
+  status: 'streaming' | 'tool_running' | 'waiting_for_input' | 'idle' | 'error';
  at: string;
  reason?: ErrorReason;
 }
@@ -335,6 +358,7 @@ export type UserStreamFrame =
  | SessionDeletedFrame
  | SessionUpdatedFrame
  | SessionRenamedFrame
+  | SessionWorkspaceUpdatedFrame
  | SessionArchivedFrame
  | ChatCreatedFrame
  | ChatUpdatedFrame
--- a/apps/server/src/types/ws-frames.ts
+++ b/apps/server/src/types/ws-frames.ts
@@ -0,0 +1,314 @@
+// v1.13.11-a: Zod schemas for every WebSocket frame published by the server.
+// Validation runs both on send (broker.publishFrame / publishUserFrame) and
+// on receive (apps/web/src/hooks/useSessionStream + useUserEvents). Catches
+// silent protocol drift between publisher and consumer.
+//
+// IMPORTANT: This file is duplicated byte-identical at
+// apps/web/src/api/ws-frames.ts. The two apps have separate tsconfigs and
+// no path alias; the duplication is sync-by-hand. A test asserts the two
+// files match. If you change one, change the other.
+//
+// Per-kind payload schemas (tool_call args, message_parts payloads, etc.)
+// stay z.unknown() in v1.13.11. Frame-level drift detection is the goal;
+// deep payload validation is follow-up work.
+
+import { z } from 'zod';
+
+// ---- shared primitives -----------------------------------------------------
+
+const Uuid = z.string().uuid();
+// Tool call IDs are model-emitted (e.g. "call_abc123") — not UUIDs.
+const ToolCallId = z.string().min(1);
+const IsoTimestamp = z.string().min(1);
+
+const ChatStatusValue = z.enum([
+  'streaming',
+  'tool_running',
+  'waiting_for_input',
+  'idle',
+  'error',
+]);
+
+const ErrorReasonValue = z.enum([
+  'llm_provider_error',
+  'doom_loop',
+  'doom_loop_summary_failed',
+  'cap_hit',
+  'cap_hit_summary_failed',
+]);
+
+const MessageRoleValue = z.enum(['user', 'assistant', 'system', 'tool']);
+
+const ToolCallShape = z.object({
+  id: ToolCallId,
+  name: z.string().min(1),
+  args: z.record(z.string(), z.unknown()),
+});
+
+// Free-form bags: opaque to the frame schema; deep validation is out of
+// scope. passthrough preserves unknown keys so the consumer sees the full
+// shape even when this schema doesn't enumerate every field.
+const OpaqueObject = z.object({}).passthrough();
+
+// ---- per-session channel frames --------------------------------------------
+
+export const SnapshotFrame = z.object({
+  type: z.literal('snapshot'),
+  messages: z.array(OpaqueObject),
+});
+
+export const MessageStartedFrame = z.object({
+  type: z.literal('message_started'),
+  message_id: Uuid,
+  chat_id: Uuid.optional(),
+  role: MessageRoleValue,
+});
+
+export const DeltaFrame = z.object({
+  type: z.literal('delta'),
+  message_id: Uuid,
+  chat_id: Uuid.optional(),
+  content: z.string(),
+});
+
+export const ToolCallFrame = z.object({
+  type: z.literal('tool_call'),
+  message_id: Uuid,
+  chat_id: Uuid.optional(),
+  tool_call: ToolCallShape,
+});
+
+export const ToolResultFrame = z.object({
+  type: z.literal('tool_result'),
+  tool_message_id: Uuid,
+  chat_id: Uuid.optional(),
+  tool_call_id: ToolCallId,
+  output: z.unknown(),
+  truncated: z.boolean(),
+  error: z.string().optional(),
+});
+
+export const MessageCompleteFrame = z.object({
+  type: z.literal('message_complete'),
+  message_id: Uuid,
+  chat_id: Uuid.optional(),
+  tokens_used: z.number().int().nonnegative().nullable().optional(),
+  ctx_used: z.number().int().nonnegative().nullable().optional(),
+  ctx_max: z.number().int().positive().nullable().optional(),
+  started_at: IsoTimestamp.nullable().optional(),
+  finished_at: IsoTimestamp.nullable().optional(),
+  model: z.string().optional(),
+  metadata: OpaqueObject.nullable().optional(),
+});
+
+export const UsageFrame = z.object({
+  type: z.literal('usage'),
+  message_id: Uuid,
+  chat_id: Uuid.optional(),
+  completion_tokens: z.number().int().nonnegative().nullable(),
+  ctx_used: z.number().int().nonnegative().nullable(),
+  ctx_max: z.number().int().positive().nullable(),
+});
+
+export const MessagesDeletedFrame = z.object({
+  type: z.literal('messages_deleted'),
+  message_ids: z.array(Uuid),
+  chat_id: Uuid.optional(),
+});
+
+export const ChatRenamedFrame = z.object({
+  type: z.literal('chat_renamed'),
+  chat_id: Uuid,
+  name: z.string(),
+});
+
+export const CompactedFrame = z.object({
+  type: z.literal('compacted'),
+  session_id: Uuid,
+  chat_id: Uuid,
+  summary_message_id: Uuid,
+});
+
+export const ErrorFrame = z.object({
+  type: z.literal('error'),
+  message_id: Uuid.optional(),
+  chat_id: Uuid.optional(),
+  error: z.string(),
+  reason: ErrorReasonValue.optional(),
+});
+
+// ---- per-user channel frames (sidebar refresh) -----------------------------
+
+export const ChatStatusFrame = z.object({
+  type: z.literal('chat_status'),
+  chat_id: Uuid,
+  status: ChatStatusValue,
+  at: IsoTimestamp,
+  reason: ErrorReasonValue.optional(),
+});
+
+export const SessionUpdatedFrame = z.object({
+  type: z.literal('session_updated'),
+  session_id: Uuid,
+  project_id: Uuid,
+  name: z.string(),
+  updated_at: IsoTimestamp,
+});
+
+export const SessionRenamedFrame = z.object({
+  type: z.literal('session_renamed'),
+  session_id: Uuid,
+  name: z.string(),
+});
+
+export const SessionCreatedFrame = z.object({
+  type: z.literal('session_created'),
+  session: OpaqueObject,
+  project_id: Uuid,
+});
+
+export const SessionArchivedFrame = z.object({
+  type: z.literal('session_archived'),
+  session_id: Uuid,
+  project_id: Uuid,
+});
+
+export const SessionDeletedFrame = z.object({
+  type: z.literal('session_deleted'),
+  session_id: Uuid,
+  project_id: Uuid,
+});
+
+export const SessionWorkspaceUpdatedFrame = z.object({
+  type: z.literal('session_workspace_updated'),
+  session_id: Uuid,
+  workspace_panes: z.array(OpaqueObject),
+});
+
+export const ChatCreatedFrame = z.object({
+  type: z.literal('chat_created'),
+  chat: OpaqueObject,
+  session_id: Uuid,
+});
+
+export const ChatUpdatedFrame = z.object({
+  type: z.literal('chat_updated'),
+  chat_id: Uuid,
+  session_id: Uuid,
+  name: z.string().nullable(),
+  updated_at: IsoTimestamp,
+});
+
+export const ChatArchivedFrame = z.object({
+  type: z.literal('chat_archived'),
+  chat_id: Uuid,
+  session_id: Uuid,
+});
+
+export const ChatUnarchivedFrame = z.object({
+  type: z.literal('chat_unarchived'),
+  chat: OpaqueObject,
+});
+
+export const ChatDeletedFrame = z.object({
+  type: z.literal('chat_deleted'),
+  chat_id: Uuid,
+  session_id: Uuid,
+});
+
+export const ProjectCreatedFrame = z.object({
+  type: z.literal('project_created'),
+  project: OpaqueObject,
+});
+
+export const ProjectArchivedFrame = z.object({
+  type: z.literal('project_archived'),
+  project_id: Uuid,
+});
+
+export const ProjectUnarchivedFrame = z.object({
+  type: z.literal('project_unarchived'),
+  project: OpaqueObject,
+});
+
+export const ProjectUpdatedFrame = z.object({
+  type: z.literal('project_updated'),
+  project_id: Uuid,
+  name: z.string(),
+});
+
+export const ProjectDeletedFrame = z.object({
+  type: z.literal('project_deleted'),
+  project_id: Uuid,
+});
+
+// ---- discriminated union ---------------------------------------------------
+
+export const WsFrameSchema = z.discriminatedUnion('type', [
+  // per-session
+  SnapshotFrame,
+  MessageStartedFrame,
+  DeltaFrame,
+  ToolCallFrame,
+  ToolResultFrame,
+  MessageCompleteFrame,
+  UsageFrame,
+  MessagesDeletedFrame,
+  ChatRenamedFrame,
+  CompactedFrame,
+  ErrorFrame,
+  // per-user
+  ChatStatusFrame,
+  SessionUpdatedFrame,
+  SessionRenamedFrame,
+  SessionCreatedFrame,
+  SessionArchivedFrame,
+  SessionDeletedFrame,
+  SessionWorkspaceUpdatedFrame,
+  ChatCreatedFrame,
+  ChatUpdatedFrame,
+  ChatArchivedFrame,
+  ChatUnarchivedFrame,
+  ChatDeletedFrame,
+  ProjectCreatedFrame,
+  ProjectArchivedFrame,
+  ProjectUnarchivedFrame,
+  ProjectUpdatedFrame,
+  ProjectDeletedFrame,
+]);
+
+export type WsFrame = z.infer<typeof WsFrameSchema>;
+
+// Convenience: the set of known frame types. Useful for the publishFrame
+// helper to log the offending type name when validation fails. Kept in sync
+// by hand with the discriminated union above.
+export const KNOWN_FRAME_TYPES: readonly WsFrame['type'][] = [
+  'snapshot',
+  'message_started',
+  'delta',
+  'tool_call',
+  'tool_result',
+  'message_complete',
+  'usage',
+  'messages_deleted',
+  'chat_renamed',
+  'compacted',
+  'error',
+  'chat_status',
+  'session_updated',
+  'session_renamed',
+  'session_created',
+  'session_archived',
+  'session_deleted',
+  'session_workspace_updated',
+  'chat_created',
+  'chat_updated',
+  'chat_archived',
+  'chat_unarchived',
+  'chat_deleted',
+  'project_created',
+  'project_archived',
+  'project_unarchived',
+  'project_updated',
+  'project_deleted',
+] as const;
--- a/apps/web/package.json
+++ b/apps/web/package.json
@@ -31,7 +31,8 @@
    "shiki": "^1.29.2",
    "sonner": "^2.0.7",
    "tailwind-merge": "^3.6.0",
-    "tw-animate-css": "^1.4.0"
+    "tw-animate-css": "^1.4.0",
+    "zod": "^3.23.8"
  },
  "devDependencies": {
    "@tailwindcss/postcss": "^4.3.0",
--- a/apps/web/src/api/client.ts
+++ b/apps/web/src/api/client.ts
@@ -12,6 +12,7 @@ import type {
  GitMeta,
  Skill,
  AskUserAnswer,
+  ToolCostStat,
 } from './types';

 export class ApiError extends Error {
@@ -143,6 +144,11 @@ export const api = {
      ),
    openChatsCount: (id: string) =>
      request<{ count: number }>(`/api/sessions/${id}/chats/open-count`),
+    updateWorkspacePanes: (id: string, panes: Session['workspace_panes']) =>
+      request<Session>(`/api/sessions/${id}/workspace`, {
+        method: 'PATCH',
+        body: JSON.stringify({ workspace_panes: panes }),
+      }),
  },

  chats: {
@@ -175,6 +181,11 @@ export const api = {
      request<{ ok: true }>(`/api/chats/${chatId}/compact`, { method: 'POST' }),
    stop: (chatId: string) =>
      request<{ stopped: boolean }>(`/api/chats/${chatId}/stop`, { method: 'POST' }),
+    discardStale: (chatId: string, messageId: string) =>
+      request<Message>(`/api/chats/${chatId}/discard_stale`, {
+        method: 'POST',
+        body: JSON.stringify({ message_id: messageId }),
+      }),
    forceSend: (chatId: string, content: string) =>
      request<{ user_message_id: string; assistant_message_id: string }>(
        `/api/chats/${chatId}/force_send`,
@@ -252,6 +263,14 @@ export const api = {
    list: () => request<{ skills: Skill[] }>('/api/skills'),
  },

+  // v1.13.10: per-tool cost rolling-window stats (last 100 calls per tool,
+  // equal-split attribution across multi-tool turns). Read endpoint backed by
+  // the tool_cost_stats view. AgentPicker consumes this for per-agent cost
+  // hints.
+  tools: {
+    costStats: () => request<{ stats: ToolCostStat[] }>('/api/tools/cost_stats'),
+  },
+
  settings: {
    get: () => request<Record<string, unknown>>('/api/settings'),
    patch: (body: Record<string, unknown>) =>
--- a/apps/web/src/api/types.ts
+++ b/apps/web/src/api/types.ts
@@ -1,6 +1,18 @@
 export const PROJECT_STATUSES = ['open', 'archived'] as const;
 export type ProjectStatus = typeof PROJECT_STATUSES[number];

+// v1.13.10: per-tool cost rolling-window stat. Returned by
+// GET /api/tools/cost_stats — one entry per tool with mean prompt/completion
+// tokens over the last 100 invocations. AgentPicker sums across an agent's
+// whitelisted tools for per-agent cost hints.
+export interface ToolCostStat {
+  tool_name: string;
+  mean_prompt_tokens: number;
+  mean_completion_tokens: number;
+  n_calls: number;
+  updated_at: string;
+}
+
 export interface Project {
  id: string;
  name: string;
@@ -34,6 +46,8 @@ export interface Session {
  agent_id: string | null;
  // v1.9: null = inherit from project.default_web_search_enabled.
  web_search_enabled: boolean | null;
+  // v1.12.1: server-authoritative pane layout, replaces localStorage.
+  workspace_panes: WorkspacePane[];
 }

 // v1.8.1: 'global' = /data/AGENTS.md (always-on), 'project' = per-project
@@ -159,6 +173,11 @@ export interface Message {
  // v1.8.2: per-message metadata; see MessageMetadata. null for the vast
  // majority of messages.
  metadata: MessageMetadata | null;
+  // v1.13.1-C: reasoning content captured from models that stream reasoning
+  // tokens separately (qwen3.6 etc.). Backend populates from message_parts;
+  // optional on the wire — frontend doesn't render this yet (reserved for
+  // a v1.14 UI surface).
+  reasoning_parts?: Array<{ text: string }> | null;
  // v1.11: anchored rolling compaction fields. Optional on the wire so that
  // older API responses (or test fixtures) parse without explicit nulls.
  //   summary       — true on the assistant row that holds the active
@@ -330,6 +349,17 @@ export type WsFrame =
      // to the client without a refetch.
      metadata?: MessageMetadata | null;
    }
+  // v1.12.2: live throughput frame, published mid-stream every ~500ms with
+  // the latest token + ctx counts so ChatThroughput can render tok/s and
+  // ctx_used while the model is still generating.
+  | {
+      type: 'usage';
+      message_id: string;
+      chat_id?: string;
+      completion_tokens: number | null;
+      ctx_used: number | null;
+      ctx_max: number | null;
+    }
  | { type: 'messages_deleted'; message_ids: string[]; chat_id?: string }
  | { type: 'chat_renamed'; chat_id: string; name: string }
  // v1.11: published by services/compaction.ts after the new anchored
--- a/apps/web/src/api/ws-frames.ts
+++ b/apps/web/src/api/ws-frames.ts
@@ -0,0 +1,314 @@
+// v1.13.11-a: Zod schemas for every WebSocket frame published by the server.
+// Validation runs both on send (broker.publishFrame / publishUserFrame) and
+// on receive (apps/web/src/hooks/useSessionStream + useUserEvents). Catches
+// silent protocol drift between publisher and consumer.
+//
+// IMPORTANT: This file is duplicated byte-identical at
+// apps/web/src/api/ws-frames.ts. The two apps have separate tsconfigs and
+// no path alias; the duplication is sync-by-hand. A test asserts the two
+// files match. If you change one, change the other.
+//
+// Per-kind payload schemas (tool_call args, message_parts payloads, etc.)
+// stay z.unknown() in v1.13.11. Frame-level drift detection is the goal;
+// deep payload validation is follow-up work.
+
+import { z } from 'zod';
+
+// ---- shared primitives -----------------------------------------------------
+
+const Uuid = z.string().uuid();
+// Tool call IDs are model-emitted (e.g. "call_abc123") — not UUIDs.
+const ToolCallId = z.string().min(1);
+const IsoTimestamp = z.string().min(1);
+
+const ChatStatusValue = z.enum([
+  'streaming',
+  'tool_running',
+  'waiting_for_input',
+  'idle',
+  'error',
+]);
+
+const ErrorReasonValue = z.enum([
+  'llm_provider_error',
+  'doom_loop',
+  'doom_loop_summary_failed',
+  'cap_hit',
+  'cap_hit_summary_failed',
+]);
+
+const MessageRoleValue = z.enum(['user', 'assistant', 'system', 'tool']);
+
+const ToolCallShape = z.object({
+  id: ToolCallId,
+  name: z.string().min(1),
+  args: z.record(z.string(), z.unknown()),
+});
+
+// Free-form bags: opaque to the frame schema; deep validation is out of
+// scope. passthrough preserves unknown keys so the consumer sees the full
+// shape even when this schema doesn't enumerate every field.
+const OpaqueObject = z.object({}).passthrough();
+
+// ---- per-session channel frames --------------------------------------------
+
+export const SnapshotFrame = z.object({
+  type: z.literal('snapshot'),
+  messages: z.array(OpaqueObject),
+});
+
+export const MessageStartedFrame = z.object({
+  type: z.literal('message_started'),
+  message_id: Uuid,
+  chat_id: Uuid.optional(),
+  role: MessageRoleValue,
+});
+
+export const DeltaFrame = z.object({
+  type: z.literal('delta'),
+  message_id: Uuid,
+  chat_id: Uuid.optional(),
+  content: z.string(),
+});
+
+export const ToolCallFrame = z.object({
+  type: z.literal('tool_call'),
+  message_id: Uuid,
+  chat_id: Uuid.optional(),
+  tool_call: ToolCallShape,
+});
+
+export const ToolResultFrame = z.object({
+  type: z.literal('tool_result'),
+  tool_message_id: Uuid,
+  chat_id: Uuid.optional(),
+  tool_call_id: ToolCallId,
+  output: z.unknown(),
+  truncated: z.boolean(),
+  error: z.string().optional(),
+});
+
+export const MessageCompleteFrame = z.object({
+  type: z.literal('message_complete'),
+  message_id: Uuid,
+  chat_id: Uuid.optional(),
+  tokens_used: z.number().int().nonnegative().nullable().optional(),
+  ctx_used: z.number().int().nonnegative().nullable().optional(),
+  ctx_max: z.number().int().positive().nullable().optional(),
+  started_at: IsoTimestamp.nullable().optional(),
+  finished_at: IsoTimestamp.nullable().optional(),
+  model: z.string().optional(),
+  metadata: OpaqueObject.nullable().optional(),
+});
+
+export const UsageFrame = z.object({
+  type: z.literal('usage'),
+  message_id: Uuid,
+  chat_id: Uuid.optional(),
+  completion_tokens: z.number().int().nonnegative().nullable(),
+  ctx_used: z.number().int().nonnegative().nullable(),
+  ctx_max: z.number().int().positive().nullable(),
+});
+
+export const MessagesDeletedFrame = z.object({
+  type: z.literal('messages_deleted'),
+  message_ids: z.array(Uuid),
+  chat_id: Uuid.optional(),
+});
+
+export const ChatRenamedFrame = z.object({
+  type: z.literal('chat_renamed'),
+  chat_id: Uuid,
+  name: z.string(),
+});
+
+export const CompactedFrame = z.object({
+  type: z.literal('compacted'),
+  session_id: Uuid,
+  chat_id: Uuid,
+  summary_message_id: Uuid,
+});
+
+export const ErrorFrame = z.object({
+  type: z.literal('error'),
+  message_id: Uuid.optional(),
+  chat_id: Uuid.optional(),
+  error: z.string(),
+  reason: ErrorReasonValue.optional(),
+});
+
+// ---- per-user channel frames (sidebar refresh) -----------------------------
+
+export const ChatStatusFrame = z.object({
+  type: z.literal('chat_status'),
+  chat_id: Uuid,
+  status: ChatStatusValue,
+  at: IsoTimestamp,
+  reason: ErrorReasonValue.optional(),
+});
+
+export const SessionUpdatedFrame = z.object({
+  type: z.literal('session_updated'),
+  session_id: Uuid,
+  project_id: Uuid,
+  name: z.string(),
+  updated_at: IsoTimestamp,
+});
+
+export const SessionRenamedFrame = z.object({
+  type: z.literal('session_renamed'),
+  session_id: Uuid,
+  name: z.string(),
+});
+
+export const SessionCreatedFrame = z.object({
+  type: z.literal('session_created'),
+  session: OpaqueObject,
+  project_id: Uuid,
+});
+
+export const SessionArchivedFrame = z.object({
+  type: z.literal('session_archived'),
+  session_id: Uuid,
+  project_id: Uuid,
+});
+
+export const SessionDeletedFrame = z.object({
+  type: z.literal('session_deleted'),
+  session_id: Uuid,
+  project_id: Uuid,
+});
+
+export const SessionWorkspaceUpdatedFrame = z.object({
+  type: z.literal('session_workspace_updated'),
+  session_id: Uuid,
+  workspace_panes: z.array(OpaqueObject),
+});
+
+export const ChatCreatedFrame = z.object({
+  type: z.literal('chat_created'),
+  chat: OpaqueObject,
+  session_id: Uuid,
+});
+
+export const ChatUpdatedFrame = z.object({
+  type: z.literal('chat_updated'),
+  chat_id: Uuid,
+  session_id: Uuid,
+  name: z.string().nullable(),
+  updated_at: IsoTimestamp,
+});
+
+export const ChatArchivedFrame = z.object({
+  type: z.literal('chat_archived'),
+  chat_id: Uuid,
+  session_id: Uuid,
+});
+
+export const ChatUnarchivedFrame = z.object({
+  type: z.literal('chat_unarchived'),
+  chat: OpaqueObject,
+});
+
+export const ChatDeletedFrame = z.object({
+  type: z.literal('chat_deleted'),
+  chat_id: Uuid,
+  session_id: Uuid,
+});
+
+export const ProjectCreatedFrame = z.object({
+  type: z.literal('project_created'),
+  project: OpaqueObject,
+});
+
+export const ProjectArchivedFrame = z.object({
+  type: z.literal('project_archived'),
+  project_id: Uuid,
+});
+
+export const ProjectUnarchivedFrame = z.object({
+  type: z.literal('project_unarchived'),
+  project: OpaqueObject,
+});
+
+export const ProjectUpdatedFrame = z.object({
+  type: z.literal('project_updated'),
+  project_id: Uuid,
+  name: z.string(),
+});
+
+export const ProjectDeletedFrame = z.object({
+  type: z.literal('project_deleted'),
+  project_id: Uuid,
+});
+
+// ---- discriminated union ---------------------------------------------------
+
+export const WsFrameSchema = z.discriminatedUnion('type', [
+  // per-session
+  SnapshotFrame,
+  MessageStartedFrame,
+  DeltaFrame,
+  ToolCallFrame,
+  ToolResultFrame,
+  MessageCompleteFrame,
+  UsageFrame,
+  MessagesDeletedFrame,
+  ChatRenamedFrame,
+  CompactedFrame,
+  ErrorFrame,
+  // per-user
+  ChatStatusFrame,
+  SessionUpdatedFrame,
+  SessionRenamedFrame,
+  SessionCreatedFrame,
+  SessionArchivedFrame,
+  SessionDeletedFrame,
+  SessionWorkspaceUpdatedFrame,
+  ChatCreatedFrame,
+  ChatUpdatedFrame,
+  ChatArchivedFrame,
+  ChatUnarchivedFrame,
+  ChatDeletedFrame,
+  ProjectCreatedFrame,
+  ProjectArchivedFrame,
+  ProjectUnarchivedFrame,
+  ProjectUpdatedFrame,
+  ProjectDeletedFrame,
+]);
+
+export type WsFrame = z.infer<typeof WsFrameSchema>;
+
+// Convenience: the set of known frame types. Useful for the publishFrame
+// helper to log the offending type name when validation fails. Kept in sync
+// by hand with the discriminated union above.
+export const KNOWN_FRAME_TYPES: readonly WsFrame['type'][] = [
+  'snapshot',
+  'message_started',
+  'delta',
+  'tool_call',
+  'tool_result',
+  'message_complete',
+  'usage',
+  'messages_deleted',
+  'chat_renamed',
+  'compacted',
+  'error',
+  'chat_status',
+  'session_updated',
+  'session_renamed',
+  'session_created',
+  'session_archived',
+  'session_deleted',
+  'session_workspace_updated',
+  'chat_created',
+  'chat_updated',
+  'chat_archived',
+  'chat_unarchived',
+  'chat_deleted',
+  'project_created',
+  'project_archived',
+  'project_unarchived',
+  'project_updated',
+  'project_deleted',
+] as const;
--- a/apps/web/src/components/AgentPicker.tsx
+++ b/apps/web/src/components/AgentPicker.tsx
@@ -1,8 +1,8 @@
-import { useEffect, useState } from 'react';
+import { useEffect, useMemo, useState } from 'react';
 import { Check, ChevronDown } from 'lucide-react';
 import { toast } from 'sonner';
 import { api } from '@/api/client';
-import type { Agent, AgentParseError } from '@/api/types';
+import type { Agent, AgentParseError, ToolCostStat } from '@/api/types';
 import {
  DropdownMenu,
  DropdownMenuContent,
@@ -22,6 +22,10 @@ export function AgentPicker({ projectId, value, onChange }: Props) {
  const [parseErrors, setParseErrors] = useState<AgentParseError[]>([]);
  const [error, setError] = useState<string | null>(null);
  const [open, setOpen] = useState(false);
+  // v1.13.10: per-tool cost rolling window. Fetched once on mount, so it
+  // refreshes only on remount or page reload. Acceptable for a decision aid:
+  // the 100-call rolling mean doesn't shift fast.
+  const [costStats, setCostStats] = useState<ToolCostStat[]>([]);

  // v1.8.1: per-agent parse errors are non-blocking. Silent if any agents
  // loaded successfully; a gray warning toast fires only when EVERY agent
@@ -52,6 +56,29 @@ export function AgentPicker({ projectId, value, onChange }: Props) {
    };
  }, [projectId]);

+  // v1.13.10: cost stats are project-independent — the 100-call rolling
+  // window is global across all chats. Fetch once per mount; tolerate failure
+  // silently (cost line hides).
+  useEffect(() => {
+    let cancelled = false;
+    api.tools
+      .costStats()
+      .then((r) => {
+        if (!cancelled) setCostStats(r.stats);
+      })
+      .catch(() => {
+        if (!cancelled) setCostStats([]);
+      });
+    return () => {
+      cancelled = true;
+    };
+  }, []);
+
+  const costByTool = useMemo(
+    () => Object.fromEntries(costStats.map((s) => [s.tool_name, s])),
+    [costStats],
+  );
+
  const selectedAgent = agents?.find((a) => a.id === value) ?? null;
  const triggerLabel = value === null
    ? 'No agent'
@@ -86,25 +113,33 @@ export function AgentPicker({ projectId, value, onChange }: Props) {
              <span className="font-medium">No agent</span>
            </DropdownMenuItem>
            {agents.length > 0 && <DropdownMenuSeparator />}
-            {agents.map((a) => (
-              <DropdownMenuItem
-                key={a.id}
-                onSelect={() => void onChange(a.id)}
-                className="text-xs flex-col items-start gap-0.5"
-              >
-                <div className="flex items-center gap-1.5">
-                  <Check
-                    className={`size-3 ${a.id === value ? 'opacity-100' : 'opacity-0'}`}
-                  />
-                  <span className="font-medium">{a.name}</span>
-                </div>
-                {a.description && (
-                  <span className="text-muted-foreground pl-[18px] truncate w-full">
-                    {a.description}
-                  </span>
-                )}
-              </DropdownMenuItem>
-            ))}
+            {agents.map((a) => {
+              const cost = agentCost(a, costByTool);
+              return (
+                <DropdownMenuItem
+                  key={a.id}
+                  onSelect={() => void onChange(a.id)}
+                  className="text-xs flex-col items-start gap-0.5"
+                >
+                  <div className="flex items-center gap-1.5">
+                    <Check
+                      className={`size-3 ${a.id === value ? 'opacity-100' : 'opacity-0'}`}
+                    />
+                    <span className="font-medium">{a.name}</span>
+                  </div>
+                  {a.description && (
+                    <span className="text-muted-foreground pl-[18px] truncate w-full">
+                      {a.description}
+                    </span>
+                  )}
+                  {cost.nWithData > 0 && (
+                    <span className="text-muted-foreground/70 pl-[18px] truncate w-full">
+                      ~{formatK(cost.prompt)} prompt / {formatK(cost.completion)} completion · {cost.nWithData}/{cost.nTools} tools{cost.mostRecent ? ` · last call ${formatAgo(cost.mostRecent)}` : ''}
+                    </span>
+                  )}
+                </DropdownMenuItem>
+              );
+            })}
            {parseErrors.length > 0 && (
              <div
                className="px-2 py-1.5 mt-1 text-xs text-amber-500 border-t border-border"
@@ -119,3 +154,49 @@ export function AgentPicker({ projectId, value, onChange }: Props) {
    </DropdownMenu>
  );
 }
+
+// v1.13.10: sum the per-tool means across an agent's whitelisted tools.
+// Sum-of-means, not mean-of-sums — we're combining independent rolling
+// averages. nWithData reflects how many of the agent's tools have any
+// history yet; the line hides entirely when zero so a fresh deploy doesn't
+// render "0k / 0 / 0 tools".
+function agentCost(
+  agent: Agent,
+  costByTool: Record<string, ToolCostStat>,
+): {
+  prompt: number;
+  completion: number;
+  nTools: number;
+  nWithData: number;
+  mostRecent: string | null;
+} {
+  let prompt = 0;
+  let completion = 0;
+  let nWithData = 0;
+  let mostRecent: string | null = null;
+  for (const t of agent.tools) {
+    const s = costByTool[t];
+    if (!s) continue;
+    prompt += s.mean_prompt_tokens;
+    completion += s.mean_completion_tokens;
+    nWithData++;
+    if (!mostRecent || s.updated_at > mostRecent) mostRecent = s.updated_at; // string compare; assumes uniform ISO-8601 UTC timestamps
+  }
+  return { prompt, completion, nTools: agent.tools.length, nWithData, mostRecent };
+}
+
+function formatK(n: number): string {
+  if (n < 1000) return String(Math.round(n)); // inputs are sums of means, so may be fractional
+  if (n < 10_000) return `${(n / 1000).toFixed(1)}k`;
+  return `${Math.round(n / 1000)}k`;
+}
+
+function formatAgo(iso: string): string {
+  const then = new Date(iso).getTime();
+  if (Number.isNaN(then)) return '—';
+  const diff = Date.now() - then;
+  if (diff < 60_000) return 'just now';
+  if (diff < 3_600_000) return `${Math.floor(diff / 60_000)}m ago`;
+  if (diff < 86_400_000) return `${Math.floor(diff / 3_600_000)}h ago`;
+  return `${Math.floor(diff / 86_400_000)}d ago`;
+}
--- a/apps/web/src/components/ChatTabBar.tsx
+++ b/apps/web/src/components/ChatTabBar.tsx
@@ -2,6 +2,7 @@ import { useState } from 'react';
 import { Bot, History, MessageSquare, Plus, Terminal, X } from 'lucide-react';
 import type { Chat, WorkspacePane } from '@/api/types';
 import { StatusDot } from '@/components/StatusDot';
+import { ChatThroughput } from '@/components/ChatThroughput';
 import {
  ContextMenu,
  ContextMenuContent,
@@ -99,6 +100,7 @@ export function ChatTabBar({
              >
                <MessageSquare size={12} className="shrink-0" />
                <StatusDot chatId={chat.id} />
+                <ChatThroughput chatId={chat.id} />
                {renamingId === chat.id ? (
                  <input
                    autoFocus
--- a/apps/web/src/components/ChatThroughput.tsx
+++ b/apps/web/src/components/ChatThroughput.tsx
@@ -0,0 +1,28 @@
+import { useChatStatus } from '@/hooks/useChatStatus';
+import { useChatThroughput } from '@/hooks/useChatThroughput';
+import { cn } from '@/lib/utils';
+
+interface Props {
+  chatId: string | null | undefined;
+  className?: string;
+}
+
+// v1.12.2: inline throughput readout. Renders next to StatusDot while the
+// chat is streaming or running a tool. Hidden in idle/error/waiting states
+// — the dot already communicates those.
+export function ChatThroughput({ chatId, className }: Props) {
+  const status = useChatStatus(chatId);
+  const t = useChatThroughput(chatId);
+  if (!chatId || !t) return null;
+  if (status !== 'streaming' && status !== 'tool_running') return null;
+  const tps = t.tps != null && t.tps > 0 ? Math.round(t.tps) : null;
+  const showCtx = t.ctx_used != null && t.ctx_max != null;
+  if (tps === null && !showCtx) return null;
+  return (
+    <span className={cn('text-xs text-muted-foreground tabular-nums', className)}>
+      {tps !== null && `${tps} tok/s`}
+      {tps !== null && showCtx && ' · '}
+      {showCtx && `${t.ctx_used!.toLocaleString()}/${t.ctx_max!.toLocaleString()}`}
+    </span>
+  );
+}
--- a/apps/web/src/components/MessageBubble.tsx
+++ b/apps/web/src/components/MessageBubble.tsx
@@ -651,7 +651,9 @@ export function MessageBubble({ message, sessionChats, capHitInfo }: Props) {

  const isStreaming = message.status === 'streaming';
  const failed = message.status === 'failed';
-  const hasContent = message.content.length > 0;
+  // v1.13.7: match the MessageList.flatten trim guard so a whitespace-only
+  // assistant turn doesn't render an empty bubble + dangling ActionRow.
+  const hasContent = message.content.trim().length > 0;
  // v1.8.2: if metadata stamps an error reason, surface it inline under the
  // generic "message failed" line. Keeps the user's eye where it already is
  // rather than introducing a separate banner.
--- a/apps/web/src/components/MessageList.tsx
+++ b/apps/web/src/components/MessageList.tsx
@@ -45,7 +45,12 @@ function flatten(messages: Message[]): RenderItem[] {
      continue;
    }
    const hasToolCalls = m.tool_calls != null && m.tool_calls.length > 0;
-    const hasText = m.content.length > 0;
+    // v1.13.7: trim before checking. AI SDK v6 streaming occasionally emits a
+    // leading "\n" text-delta on tool-call-only turns, which used to flow into
+    // messages.content with length=1 and render an empty bubble + ActionRow
+    // between each tool call. Whitespace-only content has no visible payload,
+    // so treat it as no-content.
+    const hasText = m.content.trim().length > 0;
    if (m.role === 'assistant' && hasToolCalls) {
      if (hasText || m.status === 'streaming') {
        items.push({ kind: 'message', message: m });
--- a/apps/web/src/components/MobileTabSwitcher.tsx
+++ b/apps/web/src/components/MobileTabSwitcher.tsx
@@ -13,6 +13,7 @@ import { toast } from 'sonner';
 import type { Chat, WorkspacePane } from '@/api/types';
 import { BottomSheet } from '@/components/BottomSheet';
 import { StatusDot } from '@/components/StatusDot';
+import { ChatThroughput } from '@/components/ChatThroughput';
 import {
  DropdownMenu,
  DropdownMenuContent,
@@ -206,6 +207,7 @@ export function MobileTabSwitcher({
        >
          <span className="shrink-0 text-muted-foreground">{paneIcon(active?.kind ?? 'chat')}</span>
          <StatusDot chatId={activeChatId} />
+          <ChatThroughput chatId={activeChatId} />
          <span className="truncate flex-1 text-left">{activeLabel}</span>
          <ChevronDown size={14} className="opacity-60 shrink-0" />
        </button>
@@ -237,6 +239,7 @@ export function MobileTabSwitcher({
              >
                <span className="shrink-0 text-muted-foreground">{paneIcon(pane.kind)}</span>
                <StatusDot chatId={cid ?? null} />
+                <ChatThroughput chatId={cid ?? null} />
                {renamingChatId === cid && cid ? (
                  <input
                    autoFocus
--- a/apps/web/src/components/StaleStreamBanner.tsx
+++ b/apps/web/src/components/StaleStreamBanner.tsx
@@ -0,0 +1,34 @@
+interface Props {
+  onRetry: () => void;
+  onDiscard: () => void;
+}
+
+// v1.12.3: shown when an assistant message has been 'streaming' for 60+
+// seconds without new tokens. Lives above ChatInput in ChatPane. Retry
+// discards the stuck row then resends the last user message; Discard just
+// clears the row and drops the dot to idle.
+export function StaleStreamBanner({ onRetry, onDiscard }: Props) {
+  return (
+    <div className="border border-amber-500/30 bg-amber-500/5 rounded-md p-3 mb-2 mx-4 flex items-center justify-between gap-2">
+      <span className="text-sm text-muted-foreground">
+        Previous response didn't complete.
+      </span>
+      <div className="flex gap-2">
+        <button
+          type="button"
+          onClick={onRetry}
+          className="text-xs px-2 py-1 rounded border border-border hover:bg-accent max-md:min-h-[44px] max-md:px-3"
+        >
+          Retry
+        </button>
+        <button
+          type="button"
+          onClick={onDiscard}
+          className="text-xs px-2 py-1 rounded border border-border hover:bg-accent max-md:min-h-[44px] max-md:px-3"
+        >
+          Discard
+        </button>
+      </div>
+    </div>
+  );
+}
--- a/apps/web/src/components/StatusDot.tsx
+++ b/apps/web/src/components/StatusDot.tsx
@@ -6,15 +6,10 @@ interface Props {
  className?: string;
 }

-const STATUS_CLASS: Record<DerivedStatus, string> = {
-  working: 'bg-amber-500 animate-pulse',
-  idle_warm: 'bg-emerald-500',
-  idle_cold: 'bg-muted-foreground/40',
-  error: 'bg-destructive',
-};
-
 const STATUS_LABEL: Record<DerivedStatus, string> = {
-  working: 'working',
+  streaming: 'streaming',
+  tool_running: 'running tool',
+  waiting_for_input: 'waiting for input',
  idle_warm: 'idle',
  idle_cold: 'idle',
  error: 'error',
@@ -22,15 +17,58 @@ const STATUS_LABEL: Record<DerivedStatus, string> = {

 export function StatusDot({ chatId, className }: Props) {
  const status = useChatStatus(chatId);
+
+  if (status === 'streaming') {
+    return (
+      <span
+        aria-label="Status: streaming"
+        title="streaming"
+        className={cn('inline-block relative w-3 h-3 shrink-0', className)}
+      >
+        <span className="absolute inset-0 animate-spin-slow">
+          <span className="absolute top-0 left-1/2 -translate-x-1/2 w-1 h-1 rounded-full bg-amber-500" />
+          <span className="absolute bottom-0 left-1/2 -translate-x-1/2 w-1 h-1 rounded-full bg-amber-500/60" />
+        </span>
+      </span>
+    );
+  }
+
+  if (status === 'tool_running') {
+    return (
+      <span
+        aria-label="Status: running tool"
+        title="running tool"
+        className={cn(
+          'inline-block w-3 h-3 rounded-full border-2 border-sky-500 border-t-transparent animate-spin shrink-0',
+          className,
+        )}
+      />
+    );
+  }
+
+  if (status === 'waiting_for_input') {
+    return (
+      <span
+        aria-label="Status: waiting for input"
+        title="waiting for input"
+        className={cn(
+          'inline-block w-1.5 h-1.5 rounded-full shrink-0 bg-violet-500',
+          className,
+        )}
+      />
+    );
+  }
+
+  const bg =
+    status === 'idle_warm' ? 'bg-emerald-500'
+      : status === 'error' ? 'bg-destructive'
+      : 'bg-muted-foreground/40';
+
  return (
    <span
      aria-label={`Status: ${STATUS_LABEL[status]}`}
      title={STATUS_LABEL[status]}
-      className={cn(
-        'inline-block w-1.5 h-1.5 rounded-full shrink-0',
-        STATUS_CLASS[status],
-        className,
-      )}
+      className={cn('inline-block w-1.5 h-1.5 rounded-full shrink-0', bg, className)}
    />
  );
 }
--- a/apps/web/src/components/panes/ChatPane.tsx
+++ b/apps/web/src/components/panes/ChatPane.tsx
@@ -5,6 +5,7 @@ import { api } from '@/api/client';
 import { useSessionStream } from '@/hooks/useSessionStream';
 import { MessageList } from '@/components/MessageList';
 import { ChatInput } from '@/components/ChatInput';
+import { StaleStreamBanner } from '@/components/StaleStreamBanner';
 import {
  DropdownMenu,
  DropdownMenuContent,
@@ -44,6 +45,38 @@ export function ChatPane({ sessionId, chatId, projectId, agentId, onAgentChange,

  const chatMessages = stream.messages.filter((m) => m.chat_id === chatId);
  const streaming = chatMessages.some((m) => m.status === 'streaming');
+
+  // v1.12.3: stale-stream detection. Watches the (at most one) streaming
+  // assistant row. If its content length doesn't grow for STALE_THRESHOLD_MS,
+  // assume the upstream call is dead and surface the recovery banner. We use
+  // content length as the activity signal because every token delta extends
+  // it; last_seq isn't currently bumped per delta.
+  const STALE_THRESHOLD_MS = 60_000;
+  const streamingMsg = chatMessages.find((m) => m.status === 'streaming' && m.role === 'assistant');
+  const streamingId = streamingMsg?.id ?? null;
+  const streamingLen = streamingMsg?.content.length ?? 0;
+  const lastActivityRef = useRef<{ id: string; len: number; at: number } | null>(null);
+  const [stale, setStale] = useState(false);
+  useEffect(() => {
+    if (!streamingId) {
+      lastActivityRef.current = null;
+      setStale(false);
+      return;
+    }
+    const prev = lastActivityRef.current;
+    if (!prev || prev.id !== streamingId || prev.len !== streamingLen) {
+      lastActivityRef.current = { id: streamingId, len: streamingLen, at: Date.now() };
+      setStale(false);
+    }
+    const interval = setInterval(() => {
+      const a = lastActivityRef.current;
+      if (!a) return;
+      if (Date.now() - a.at >= STALE_THRESHOLD_MS) {
+        setStale(true);
+      }
+    }, 5_000);
+    return () => clearInterval(interval);
+  }, [streamingId, streamingLen]);
  // v1.11.5: per-chat model context limit comes from chat.model_context_limit
  // populated by GET /api/sessions/:id/chats. Threaded into ChatInput so
  // ContextBar can render a zero-state before the first assistant message.
@@ -87,6 +120,45 @@ export function ChatPane({ sessionId, chatId, projectId, agentId, onAgentChange,
    }
  }

+  const handleDiscardStale = useCallback(async () => {
+    if (!streamingId) return;
+    try {
+      await api.chats.discardStale(chatId, streamingId);
+      setStale(false);
+      lastActivityRef.current = null;
+    } catch (err) {
+      // 409 (race) is benign — the row already terminated some other way.
+      const msg = err instanceof Error ? err.message : 'discard failed';
+      if (!msg.includes('409')) toast.error(msg);
+      setStale(false);
+    }
+  }, [chatId, streamingId]);
+
+  const handleRetryStale = useCallback(async () => {
+    if (!streamingId) return;
+    const lastUser = [...chatMessages].reverse().find((m) => m.role === 'user' && m.kind === 'message');
+    if (!lastUser) {
+      toast.error('no prior user message to retry');
+      return;
+    }
+    try {
+      await api.chats.discardStale(chatId, streamingId);
+    } catch (err) {
+      const msg = err instanceof Error ? err.message : 'discard failed';
+      if (!msg.includes('409')) {
+        toast.error(msg);
+        return;
+      }
+    }
+    setStale(false);
+    lastActivityRef.current = null;
+    try {
+      await api.messages.send(chatId, lastUser.content);
+    } catch (err) {
+      toast.error(err instanceof Error ? err.message : 'retry send failed');
+    }
+  }, [chatId, streamingId, chatMessages]);
+
  const handleForceSend = useCallback(async (content: string) => {
    const trimmed = content.trim();
    if (!trimmed) return;
@@ -187,6 +259,13 @@ export function ChatPane({ sessionId, chatId, projectId, agentId, onAgentChange,
        </div>
      )}

+      {stale && streamingId && (
+        <StaleStreamBanner
+          onRetry={() => void handleRetryStale()}
+          onDiscard={() => void handleDiscardStale()}
+        />
+      )}
+
      <ChatInput
        disabled={false}
        projectId={projectId}
--- a/apps/web/src/hooks/sessionEvents.ts
+++ b/apps/web/src/hooks/sessionEvents.ts
@@ -41,6 +41,12 @@ export interface SessionUpdatedEvent {
  updated_at: string;
 }

+export interface SessionWorkspaceUpdatedEvent {
+  type: 'session_workspace_updated';
+  session_id: string;
+  workspace_panes: import('@/api/types').WorkspacePane[];
+}
+
 export interface SessionLoadedEvent {
  type: 'session_loaded';
  session_id: string;
@@ -131,7 +137,7 @@ export interface ProjectUpdatedEvent {
 export interface ChatStatusEvent {
  type: 'chat_status';
  chat_id: string;
-  status: 'working' | 'idle' | 'error';
+  status: 'streaming' | 'tool_running' | 'waiting_for_input' | 'idle' | 'error';
  at: string;
  reason?: ErrorReason;
 }
@@ -143,6 +149,7 @@ export type SessionEvent =
  | SessionCreatedEvent
  | SessionDeletedEvent
  | SessionUpdatedEvent
+  | SessionWorkspaceUpdatedEvent
  | SessionLoadedEvent
  | OpenFileInBrowserEvent
  | AttachChatFileEvent
--- a/apps/web/src/hooks/useChatStatus.ts
+++ b/apps/web/src/hooks/useChatStatus.ts
@@ -1,8 +1,14 @@
 import { useEffect, useState } from 'react';
 import { sessionEvents } from './sessionEvents';

-export type RawStatus = 'working' | 'idle' | 'error';
-export type DerivedStatus = 'working' | 'idle_warm' | 'idle_cold' | 'error';
+export type RawStatus = 'streaming' | 'tool_running' | 'waiting_for_input' | 'idle' | 'error';
+export type DerivedStatus =
+  | 'streaming'
+  | 'tool_running'
+  | 'waiting_for_input'
+  | 'idle_warm'
+  | 'idle_cold'
+  | 'error';

 // Window during which an idle dot stays green; after this, it fades to gray.
 const WARM_WINDOW_MS = 30_000;
@@ -53,7 +59,9 @@ if (!G.__boocode_chat_status_subscribed) {

 function derive(entry: Entry | undefined): DerivedStatus {
  if (!entry) return 'idle_cold';
-  if (entry.status === 'working') return 'working';
+  if (entry.status === 'streaming') return 'streaming';
+  if (entry.status === 'tool_running') return 'tool_running';
+  if (entry.status === 'waiting_for_input') return 'waiting_for_input';
  if (entry.status === 'error') return 'error';
  const age = Date.now() - new Date(entry.at).getTime();
  return age < WARM_WINDOW_MS ? 'idle_warm' : 'idle_cold';
--- a/apps/web/src/hooks/useChatThroughput.ts
+++ b/apps/web/src/hooks/useChatThroughput.ts
@@ -0,0 +1,106 @@
+import { useEffect, useState } from 'react';
+
+// v1.12.2: live throughput stream consumer. Fed by useSessionStream when a
+// 'usage' WS frame lands. Renders next to StatusDot via ChatThroughput.
+//
+// Singleton + Set<setState> pattern mirrors useChatStatus so any component
+// can subscribe to any chatId without prop drilling.
+
+export interface ThroughputSample {
+  tps: number | null;
+  ctx_used: number | null;
+  ctx_max: number | null;
+}
+
+interface Entry {
+  ctx_used: number | null;
+  ctx_max: number | null;
+  completion_tokens: number | null;
+  recorded_at: number;
+  prev_completion_tokens: number | null;
+  prev_recorded_at: number | null;
+  tps: number | null;
+}
+
+// Stale window. After this, useChatThroughput returns null, which clears the
+// indicator when a stream ends and no further inference turn follows.
+const STALE_MS = 10_000;
+
+const entries = new Map<string, Entry>();
+const subscribers = new Set<() => void>();
+
+function notify(): void {
+  for (const s of subscribers) {
+    try { s(); } catch { /* swallow */ }
+  }
+}
+
+// v1.12.2: imported by useSessionStream's WS handler. Computes tps from the
+// gap between successive completion_tokens samples; first sample yields null
+// (we need two points). Skips zero-progress samples so a duplicate usage
+// frame doesn't push tps to 0.
+export function recordUsage(
+  chatId: string,
+  data: { completion_tokens: number | null; ctx_used: number | null; ctx_max: number | null },
+): void {
+  const now = Date.now();
+  const prev = entries.get(chatId);
+  let tps: number | null = prev?.tps ?? null;
+  if (
+    prev &&
+    data.completion_tokens != null &&
+    prev.completion_tokens != null &&
+    data.completion_tokens > prev.completion_tokens &&
+    now > prev.recorded_at
+  ) {
+    const dTokens = data.completion_tokens - prev.completion_tokens;
+    const dSeconds = (now - prev.recorded_at) / 1000;
+    tps = dTokens / dSeconds;
+  }
+  entries.set(chatId, {
+    ctx_used: data.ctx_used,
+    ctx_max: data.ctx_max,
+    completion_tokens: data.completion_tokens,
+    recorded_at: now,
+    prev_completion_tokens: prev?.completion_tokens ?? null,
+    prev_recorded_at: prev?.recorded_at ?? null,
+    tps,
+  });
+  notify();
+}
+
+export function clearThroughput(chatId: string): void {
+  if (entries.delete(chatId)) notify();
+}
+
+// Periodic sweep: re-notify so stale entries fall off the UI when the
+// stream ends without a follow-up frame. Light — one timer for the whole app.
+const G = globalThis as Record<string, unknown>;
+if (!G.__boocode_throughput_ticker) {
+  G.__boocode_throughput_ticker = true;
+  setInterval(() => {
+    const now = Date.now();
+    let touched = false;
+    for (const [k, v] of entries) {
+      if (now - v.recorded_at > STALE_MS) {
+        entries.delete(k);
+        touched = true;
+      }
+    }
+    if (touched) notify();
+  }, 2_000);
+}
+
+export function useChatThroughput(chatId: string | null | undefined): ThroughputSample | null {
+  const [, force] = useState({});
+  useEffect(() => {
+    const sub = () => force({});
+    subscribers.add(sub);
+    return () => { subscribers.delete(sub); };
+  }, []);
+  if (!chatId) return null;
+  const entry = entries.get(chatId);
+  if (!entry) return null;
+  if (Date.now() - entry.recorded_at > STALE_MS) return null;
+  return { tps: entry.tps, ctx_used: entry.ctx_used, ctx_max: entry.ctx_max };
+}
--- a/apps/web/src/hooks/useSessionChats.ts
+++ b/apps/web/src/hooks/useSessionChats.ts
@@ -12,6 +12,7 @@ export interface UseSessionChatsOpts {
  // about pane indexing.
  openChatInActivePane: (chatId: string) => void;
  initializeFirstChatIfEmpty: (chatId: string) => void;
+  validatePanes: (validChatIds: Set<string>) => void;
 }

 export interface UseSessionChatsResult {
@@ -44,12 +45,15 @@ export function useSessionChats(
  openChatInActivePaneRef.current = opts.openChatInActivePane;
  const initializeFirstChatIfEmptyRef = useRef(opts.initializeFirstChatIfEmpty);
  initializeFirstChatIfEmptyRef.current = opts.initializeFirstChatIfEmpty;
+  const validatePanesRef = useRef(opts.validatePanes);
+  validatePanesRef.current = opts.validatePanes;

  useEffect(() => {
    let cancelled = false;
    api.chats.listForSession(sessionId).then((list) => {
      if (cancelled) return;
      setChats(list);
+      validatePanesRef.current(new Set(list.map((c) => c.id)));
      const openChat = list.find((c) => c.status === 'open');
      if (openChat) {
        initializeFirstChatIfEmptyRef.current(openChat.id);
--- a/apps/web/src/hooks/useSessionStream.ts
+++ b/apps/web/src/hooks/useSessionStream.ts
@@ -1,8 +1,10 @@
 import { useEffect, useRef, useState } from 'react';
 import { toast } from 'sonner';
 import type { Message, WsFrame } from '@/api/types';
+import { WsFrameSchema } from '@/api/ws-frames';
 import { api } from '@/api/client';
 import { sessionEvents } from './sessionEvents';
+import { recordUsage } from './useChatThroughput';

 // session_renamed frame removed from WsFrame — it was declared but never
 // published on the per-session WS channel (server publishes via broker.publishUser
@@ -125,6 +127,19 @@ function applyFrame(state: State, frame: WsFrame): State {
      );
      return { ...state, messages: next };
    }
+    case 'usage': {
+      // v1.12.2: live throughput. Side-effects into the module-level
+      // singleton consumed by ChatThroughput; no message-state mutation.
+      // chat_id is the optional ws-frame field; usage frames always include it.
+      if (frame.chat_id) {
+        recordUsage(frame.chat_id, {
+          completion_tokens: frame.completion_tokens,
+          ctx_used: frame.ctx_used,
+          ctx_max: frame.ctx_max,
+        });
+      }
+      return state;
+    }
    case 'messages_deleted': {
      const removeSet = new Set(frame.message_ids);
      return {
@@ -202,8 +217,28 @@ export function useSessionStream(sessionId: string | undefined) {
        setState((s) => ({ ...s, connected: true, error: null }));
      };
      ws.onmessage = (ev) => {
+        // v1.13.11-a: Zod-validate every inbound frame. Fail-closed — invalid
+        // frames are logged and dropped. WsFrameSchema is the runtime guard;
+        // the hand-maintained WsFrame type stays as the narrowed dev-time
+        // shape (Zod uses OpaqueObject for nested types like Message[]). One
+        // cast bridges the two.
+        let raw: unknown;
        try {
-          const frame = JSON.parse(typeof ev.data === 'string' ? ev.data : '') as WsFrame;
+          raw = JSON.parse(typeof ev.data === 'string' ? ev.data : '');
+        } catch (err) {
+          console.warn('bad ws frame (parse)', err);
+          return;
+        }
+        const validated = WsFrameSchema.safeParse(raw);
+        if (!validated.success) {
+          console.error('ws-frame-validation-failed (session channel)', {
+            frame_type: (raw as { type?: unknown })?.type,
+            errors: validated.error.flatten(),
+          });
+          return;
+        }
+        try {
+          const frame = validated.data as unknown as WsFrame;
          // v1.11: on a compaction completion, re-fetch the message list so
          // the new summary row + the cohort of compacted_at-stamped older
          // rows render correctly. We dispatch the fresh list as a synthetic
--- a/apps/web/src/hooks/useSidebar.ts
+++ b/apps/web/src/hooks/useSidebar.ts
@@ -143,6 +143,9 @@ function applyEvent(prev: SidebarResponse, event: import('./sessionEvents').Sess
    case 'session_loaded':
      // activeSessionProjectId is updated in the subscribe callback; no data change here.
      return prev;
+    case 'session_workspace_updated':
+      // Pane layout is consumed by useWorkspacePanes; sidebar has no stake.
+      return prev;
    case 'open_file_in_browser':
      // Consumed by Workspace (T7); no sidebar state change needed.
      return prev;
--- a/apps/web/src/hooks/useUserEvents.ts
+++ b/apps/web/src/hooks/useUserEvents.ts
@@ -1,4 +1,5 @@
 import { useEffect } from 'react';
+import { WsFrameSchema } from '@/api/ws-frames';
 import { sessionEvents } from './sessionEvents';
 import { createWsReconnectToast } from './wsReconnectToast';

@@ -38,14 +39,33 @@ export function useUserEvents(): void {
      };

      ws.onmessage = (ev) => {
+        // v1.13.11-a: Zod-validate every inbound frame. Fail-closed — invalid
+        // frames are logged and dropped instead of dispatched onto the
+        // sessionEvents bus where a stale or wrong shape would silently
+        // corrupt sidebar / chat state.
+        let raw: unknown;
        try {
-          const parsed: unknown = JSON.parse(ev.data);
-          if (parsed && typeof (parsed as { type?: unknown }).type === 'string') {
-            sessionEvents.emit(parsed as import('./sessionEvents').SessionEvent);
-          }
+          raw = JSON.parse(ev.data);
        } catch (err) {
          console.warn('useUserEvents: failed to parse frame', err);
+          return;
        }
+        const validated = WsFrameSchema.safeParse(raw);
+        if (!validated.success) {
+          console.error('ws-frame-validation-failed (user channel)', {
+            frame_type: (raw as { type?: unknown })?.type,
+            errors: validated.error.flatten(),
+          });
+          return;
+        }
+        // Bridge cast: Zod's union is broader than SessionEvent (it includes
+        // per-session-channel frames too, which never arrive on the user
+        // channel). sessionEvents.emit only dispatches frames whose type
+        // appears in SessionEvent; the narrowing happens via the existing
+        // useSidebar.ts applyEvent switch.
+        sessionEvents.emit(
+          validated.data as unknown as import('./sessionEvents').SessionEvent,
+        );
      };

      ws.onclose = () => {
--- a/apps/web/src/hooks/useWorkspacePanes.ts
+++ b/apps/web/src/hooks/useWorkspacePanes.ts
@@ -4,9 +4,14 @@ import { toast } from 'sonner';
 import { api } from '@/api/client';
 import type { WorkspacePane } from '@/api/types';
 import { setActivePaneInfo, clearActivePane } from '@/hooks/useActivePane';
+import { sessionEvents } from '@/hooks/sessionEvents';

 export const MAX_PANES = 5;
-const STORAGE_KEY = 'boocode.workspace.panes';
+// v1.12.1: legacy localStorage key. Read once on mount to seed the server
+// for sessions still on per-device state, then deleted. Server is now
+// authoritative via sessions.workspace_panes.
+const LEGACY_STORAGE_KEY = 'boocode.workspace.panes';
+const SAVE_DEBOUNCE_MS = 300;

 function generateId(): string {
  return crypto.randomUUID();
@@ -51,9 +56,11 @@ function nonSettingsCount(panes: WorkspacePane[]): number {
  return panes.reduce((n, p) => n + (p.kind === 'settings' ? 0 : 1), 0);
 }

-function loadPanes(sessionId: string): WorkspacePane[] | null {
+// v1.12.1: read legacy per-device localStorage. If present, the caller seeds
+// the server then deletes the key. One-time migration per session.
+function readLegacyPanes(sessionId: string): WorkspacePane[] | null {
  try {
-    const raw = localStorage.getItem(`${STORAGE_KEY}.${sessionId}`);
+    const raw = localStorage.getItem(`${LEGACY_STORAGE_KEY}.${sessionId}`);
    if (!raw) return null;
    const parsed = JSON.parse(raw) as WorkspacePane[];
    if (!Array.isArray(parsed) || parsed.length === 0) return null;
@@ -63,15 +70,6 @@ function loadPanes(sessionId: string): WorkspacePane[] | null {
  }
 }

-function savePanes(sessionId: string, panes: WorkspacePane[]): void {
-  try {
-    localStorage.setItem(
-      `${STORAGE_KEY}.${sessionId}`,
-      JSON.stringify(persistablePanes(panes)),
-    );
-  } catch { /* quota or disabled */ }
-}
-
 export interface UseWorkspacePanesResult {
  panes: WorkspacePane[];
  activePaneIdx: number;
@@ -96,6 +94,7 @@ export interface UseWorkspacePanesResult {
  removePane: (idx: number) => void;
  removeChatFromPanes: (chatId: string) => void;
  initializeFirstChatIfEmpty: (chatId: string) => void;
+  validatePanes: (validChatIds: Set<string>) => void;
  handlePaneDragStart: (idx: number) => (e: DragEvent<HTMLDivElement>) => void;
  handlePaneDragOver: (idx: number) => (e: DragEvent<HTMLDivElement>) => void;
  handlePaneDragLeave: () => void;
@@ -106,15 +105,85 @@ export interface UseWorkspacePanesResult {
 }

 export function useWorkspacePanes(sessionId: string): UseWorkspacePanesResult {
-  const [panes, setPanes] = useState<WorkspacePane[]>(() => {
-    return loadPanes(sessionId) ?? [emptyPane()];
-  });
+  const [panes, setPanes] = useState<WorkspacePane[]>(() => [emptyPane()]);
  const [activePaneIdx, setActivePaneIdx] = useState(0);
  const draggingIdxRef = useRef<number | null>(null);
  const [dragOverIdx, setDragOverIdx] = useState<number | null>(null);
+  // v1.12.1: skip PATCH while hydrating from the server. Without this, the
+  // initial [emptyPane()] would be saved over the server's real state before
+  // the GET resolves.
+  const hydratedRef = useRef(false);
+  // Tracks the last value broadcast by another device (or this one's own
+  // round-trip). If a PATCH would echo this exact payload, we skip the call.
+  const lastRemoteJsonRef = useRef<string>('[]');

+  // v1.12.1: hydrate from server on mount, then subscribe to remote updates.
  useEffect(() => {
-    savePanes(sessionId, panes);
+    hydratedRef.current = false;
+    let cancelled = false;
+    void (async () => {
+      try {
+        const session = await api.sessions.get(sessionId);
+        if (cancelled) return;
+        let initial: WorkspacePane[] = Array.isArray(session.workspace_panes)
+          ? session.workspace_panes
+          : [];
+        // One-time migration: if server is empty but legacy localStorage has
+        // a layout, seed the server and delete the local key.
+        if (initial.length === 0) {
+          const legacy = readLegacyPanes(sessionId);
+          if (legacy && legacy.length > 0) {
+            try {
+              const updated = await api.sessions.updateWorkspacePanes(sessionId, legacy);
+              if (cancelled) return;
+              initial = updated.workspace_panes;
+              localStorage.removeItem(`${LEGACY_STORAGE_KEY}.${sessionId}`);
+            } catch {
+              initial = legacy;
+            }
+          }
+        }
+        const next = initial.length > 0 ? initial : [emptyPane()];
+        lastRemoteJsonRef.current = JSON.stringify(persistablePanes(next));
+        setPanes(next);
+        setActivePaneIdx(0);
+      } finally {
+        if (!cancelled) hydratedRef.current = true;
+      }
+    })();
+    return () => { cancelled = true; };
+  }, [sessionId]);
+
+  // v1.12.1: live cross-device sync. Replace local state when another device
+  // (or our own write echo) lands a session_workspace_updated frame.
+  useEffect(() => {
+    return sessionEvents.subscribe((ev) => {
+      if (ev.type !== 'session_workspace_updated') return;
+      if (ev.session_id !== sessionId) return;
+      const incoming = Array.isArray(ev.workspace_panes) ? ev.workspace_panes : [];
+      const json = JSON.stringify(incoming);
+      if (json === lastRemoteJsonRef.current) return;
+      lastRemoteJsonRef.current = json;
+      setPanes(incoming.length > 0 ? incoming : [emptyPane()]);
+      setActivePaneIdx((prev) => Math.min(prev, Math.max(0, incoming.length - 1)));
+    });
+  }, [sessionId]);
+
+  // v1.12.1: debounced PATCH on every change. Settings panes are stripped
+  // before saving (ephemeral per v1.9).
+  useEffect(() => {
+    if (!hydratedRef.current) return;
+    const payload = persistablePanes(panes);
+    const json = JSON.stringify(payload);
+    if (json === lastRemoteJsonRef.current) return;
+    const timer = setTimeout(() => {
+      lastRemoteJsonRef.current = json;
+      api.sessions.updateWorkspacePanes(sessionId, payload).catch(() => {
+        // Non-fatal: next change retries. Persistent failures surface via
+        // the network layer's existing reconnect toast.
+      });
+    }, SAVE_DEBOUNCE_MS);
+    return () => clearTimeout(timer);
  }, [sessionId, panes]);

  useEffect(() => {
@@ -328,6 +397,23 @@ export function useWorkspacePanes(sessionId: string): UseWorkspacePanesResult {
    });
  }, []);

+  const validatePanes = useCallback((validChatIds: Set<string>) => {
+    setPanes((prev) => {
+      const cleaned = prev.map((pane) => {
+        if (pane.kind !== 'chat' || pane.chatIds.length === 0) return pane;
+        const nextIds = pane.chatIds.filter((id) => validChatIds.has(id));
+        if (nextIds.length === pane.chatIds.length) return pane;
+        if (nextIds.length === 0) {
+          return { ...pane, kind: 'empty' as const, chatId: undefined, chatIds: [], activeChatIdx: -1 };
+        }
+        const nextActiveIdx = Math.min(pane.activeChatIdx, nextIds.length - 1);
+        return { ...pane, chatIds: nextIds, activeChatIdx: nextActiveIdx, chatId: nextIds[nextActiveIdx] };
+      });
+      const unchanged = cleaned.every((p, i) => p === prev[i]);
+      return unchanged ? prev : cleaned;
+    });
+  }, []);
+
  const removeChatFromPanes = useCallback((chatId: string) => {
    setPanes((prev) => prev.map((p) => {
      const idx = p.chatIds.indexOf(chatId);
@@ -411,6 +497,7 @@ export function useWorkspacePanes(sessionId: string): UseWorkspacePanesResult {
    removePane,
    removeChatFromPanes,
    initializeFirstChatIfEmpty,
+    validatePanes,
    handlePaneDragStart,
    handlePaneDragOver,
    handlePaneDragLeave,
--- a/apps/web/src/pages/Session.tsx
+++ b/apps/web/src/pages/Session.tsx
@@ -59,6 +59,7 @@ function SessionInner({ sessionId }: { sessionId: string }) {
    removePane,
    removeChatFromPanes,
    initializeFirstChatIfEmpty,
+    validatePanes,
  } = panesHook;

  const openChatInActivePane = useCallback(
@@ -70,6 +71,7 @@ function SessionInner({ sessionId }: { sessionId: string }) {
    openChatInPane,
    openChatInActivePane,
    initializeFirstChatIfEmpty,
+    validatePanes,
  });
  const { chats, renameChat } = chatsHook;

--- a/apps/web/src/styles/globals.css
+++ b/apps/web/src/styles/globals.css
@@ -138,6 +138,7 @@
  --radius-xl: calc(var(--radius) + 4px);
  --font-sans: "Inter Variable", "Inter", system-ui, sans-serif;
  --font-mono: "JetBrains Mono Variable", ui-monospace, SFMono-Regular, monospace;
+  --animate-spin-slow: spin 1.2s linear infinite;
 }

@layer base {
--- a/boocode_code_review.md
+++ b/boocode_code_review.md
@@ -1,20 +1,167 @@
 # BooCode — External Code Review & Lift Inventory

-Last updated: 2026-05-20
+Last updated: 2026-05-22

 This document tracks every open source repo BooCode references or lifts code from. Pin this so we don't lose attribution and don't re-evaluate the same projects twice.

 BooCode is personal/single-user — license compatibility is non-blocking, but the License column is recorded so we don't accidentally inherit an obligation if BooCode ever goes public.

+> **Companion doc:** `boocode_roadmap.md` is the canonical source for shipping state, version ordering, and what's planned vs. shipped. This document is the canonical source for *why* each external repo earned its row. Reconcile shipping state via the roadmap when in doubt.
+>
+> **Shipped reality as of 2026-05-22** (per roadmap): v1.13.1 (`ac1a71f`), v1.13.3 (`a08d809`), v1.13.4 (`ec8593c`), v1.13.5 (`f8fc5db`), and v1.13.6 (`81d837c`) tagged. AI SDK v6 migration done. `message_parts` table + `messages_with_parts` view live with dual-write. `experimental_repairToolCall` wired. Alpha tool ordering shipped. Two-tier compaction prune + truncate.ts opaque-id retrieval shipped. v1.13.6 closed the Q3 reasoning-render gap in compaction (latent regression from v1.13.1-C). **v1.13.7 stability bundle** (`includeUsage:true` for usage capture, trim guards against `\n` content artifacts, payload filter for trailing empty/failed assistants, `BUDGET_NO_AGENT 15→30`) — fixes a v1.13.1-A latent regression where `result.usage` came back empty. v1.13.2 (legacy-column drop) **deferred behind v1.13.8–v1.13.12** as rollback insurance. v1.13.x cleanup line order is locked and **must not be folded**: v1.13.8 → v1.13.9 → v1.13.10 → v1.13.11 → v1.13.12 → v1.13.2. If anything in this catalog reads "planned" for a v1.11.x–v1.13.6 lift, check the lift catalog table at the bottom for the corrected status.
+
+-----
+
+## Paseo-equivalent dispatcher inside BooCode (2026-05-22 strategic pivot)
+
+Sam wants BooCode to function like Paseo without using Paseo itself. **Paseo (getpaseo/paseo) is AGPL-3.0**, which is incompatible with BooCode's MIT licensing and BooCode's network-served deployment at `code.indifferentketchup.com`. Lift the architecture and design patterns (not copyrightable) without lifting any code. Build inside BooCode's existing Fastify + TypeScript + PostgreSQL + React stack.
+
+### Locked architecture decisions (2026-05-22, Sam confirmed)
+
+1. **Monorepo with three apps, not three repos.** `/opt/boocode/apps/`:
+   - `apps/web/`: existing React SPA (the current chat UI).
+   - `apps/server/`: existing Fastify backend (the daemon).
+   - **`apps/chat/`**: BooChat surface (read-only inference loop, currently on port `9500`, the live thing at `code.indifferentketchup.com`).
+   - **`apps/coder/`**: BooCoder surface (write-tool inference loop + external-CLI dispatch, port `9502`, `coder.indifferentketchup.com`, planned for v2.0).
+   - **`apps/booterm/`**: BooTerm surface (PTY/terminal pane, **live since May 2026, port `9501`**). Node 20 Alpine + node-pty + tmux + xterm.js. Tmux session per pane (`bc-<uuid>`), SSH-out works (image includes `openssh-client` + `gosu`). `/api/term/health` shares the existing `boocode_db`. Built as part of Batch 10. Confirmed working as of 2026-05-19.
+   - All three share the server package, the auth gate, the project registry, the task table, and the worktree manager.
+1. **Single shared database.** Rename current `boocode_db` → `boochat_db` when BooCoder lands. Three apps, one Postgres. Cross-surface joins are valuable: a BooCoder task can reference the BooChat conversation that originated it; a BooTerm session can be linked to the BooCoder task it's debugging. Separate databases would break this.
+1. **Mount strategy: blanket `/opt:rw`, permission gating at the write-tool layer.** Container gets full RW access to `/opt`; the BooCoder write tools (`edit_file`, `create_file`, `delete_file`) enforce path scoping using the v1.15 permission wildcard ruleset (`apps/coder/services/path_guard.ts`). Per-project scoping is *policy*, not *mount*. Simpler, single mount, no Docker reconfig per project. Trade-off: a bug in path-guard logic is the only thing between BooCoder and writing outside `/opt/<project>/`. **Path-guard correctness is therefore the highest-priority test target for v2.0** — fuzz it, property-test it, run it through every traversal-attack pattern.
+1. **External CLI agents (`opencode`, `claude`, `goose`, `pi`) live on the host, NOT in the BooCoder container.** Sam's call: control. Host-installed agents inherit Sam's existing `~/.opencode/`, `~/.claude/`, `~/.config/goose/` configs without re-mounting. Tool versions update via Sam's normal `npm i -g` or `brew upgrade` flow. **BooCoder shells out via local-exec PTY** (`node-pty` with `cwd = /opt/<project>` and the host shell), or via SSH if Sam wants stricter isolation later. Container can be added back if a specific reason emerges (sandboxing a rogue agent, ABI mismatch, dependency conflict) but not pre-emptively.
+
+### Three-surface execution model
+
+Each surface has its own primary execution mode but shares the same underlying tasks/projects/worktree infrastructure:
+
+|Surface                     |Port           |Execution mode                                                                                                                                                                                   |Tools                                                                                                               |Write access                                                                    |
+|----------------------------|---------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------|
+|**BooChat** (`apps/chat`)   |9500           |In-process inference loop                                                                                                                                                                        |`view_file`, `list_dir`, `grep`, `find_files`, codecontext sidecar tools                                            |None — `/opt` is read-only at the tool layer regardless of mount                |
+|**BooCoder** (`apps/coder`) |9502           |**Two paths, same surface:** (a) in-process inference loop with native write tools + pending-changes queue, (b) PTY-dispatched external CLI (opencode/claude/goose/pi) in a per-task git worktree|All BooChat tools + `edit_file`, `create_file`, `delete_file`, `apply_pending`, `rewind` + `dispatch_external_agent`|Yes, gated through `pending_changes` table (nothing touches disk until `/apply`)|
+|**BooTerm** (`apps/booterm`)|**9501 (live)**|PTY to host shell via tmux, scoped to project cwd                                                                                                                                                |Shell + SSH-out, no inference loop                                                                                  |Yes (it's a real terminal)                                                      |
+
+**The "two paths, same surface" decision in BooCoder resolves the earlier "1 and 2 full featured" question: both options ship.** The in-process loop (Option B / Answer B) handles interactive write work where Sam wants the pending-changes UI and native tool gating. The PTY dispatch (Option A / Answer A) handles parallel/dispatched/batch work where Sam wants to A/B different CLI agents against the same task in separate worktrees. The user picks per task via a `dispatch_external_agent(agent: 'opencode'|'claude'|'goose'|'pi', model: string, task: string, worktree: string)` tool the in-process loop can call, or via a UI dropdown at task creation.
+
+### MCP and ACP roles per surface (locked 2026-05-22)
+
+Two open protocols extend BooCode's tool and agent surfaces:
+
+- **MCP (Model Context Protocol):** the tool/resource extension protocol. An MCP **client** consumes tools from MCP **servers** (local stdio subprocesses or remote HTTP/SSE endpoints). Standard since late 2024. Reference SDKs in 10 languages. Hundreds of community servers, mostly via the [MCP Registry](https://registry.modelcontextprotocol.io/).
+- **ACP (Agent Client Protocol):** the editor↔agent extension protocol. An ACP **client** (host) drives an ACP **agent** over JSON-RPC stdio (or HTTP/WS for remote). Standardizes session lifecycle, multi-session management, model/mode switching mid-session, file operations, terminal events, permission prompts. Originated at Zed. Implemented by opencode (`opencode acp`), goose (`goose acp`), JetBrains IDEs, Avante.nvim, CodeCompanion.nvim.
+
+**The role assignment (Sam, 2026-05-22):**
+
+|Surface     |MCP client                       |MCP server                                      |ACP client (host)                                              |ACP agent (driveable)                                      |
+|------------|---------------------------------|------------------------------------------------|---------------------------------------------------------------|-----------------------------------------------------------|
+|**BooChat** |**Yes** (read-only tool consumer)|No                                              |No                                                             |No                                                         |
+|**BooCoder**|**Yes**                          |**Yes** (exposes BooCoder tools to other agents)|**Yes** (drives opencode/goose/etc. via ACP instead of raw PTY)|**Yes** (BooCoder itself driveable from Zed/JetBrains/etc.)|
+
+**BooChat as MCP client only.** BooChat is read-only by design — its existing tools (`view_file`, `list_dir`, `grep`, `find_files`) extend naturally with MCP-served read-only tools (Context7 for docs, gh_grep for code search, the official `fetch`/`git`/`memory`/`sequentialthinking` reference servers). Per-server `enabled` flag gates which tools BooChat may consume. **Hard rule for BooChat MCP config: never enable a write-capable MCP server.** A server whose tools mutate state breaks the read-only invariant. The codecontext sidecar (already shipped in v1.12 Track B) becomes the first internal "MCP-shaped" tool source, even though it's currently an HTTP shim rather than an MCP server; consider rewriting it as a real MCP server in v1.13 so it composes naturally with the rest.
+
+**BooCoder full matrix.** All four roles. Justifications:
+
+1. **MCP client (write-capable allowed).** Same MCP ecosystem as BooChat plus write-capable servers (`filesystem` write tools, `git` commit, deployment integrations) — all gated through BooCoder's existing pending-changes queue regardless of whether the write comes from a native tool or an MCP-served tool. Per-task allow/deny means a dispatched task can have a different MCP roster than the interactive shell.
+1. **MCP server.** Expose BooCoder's own primitives as MCP tools: `boocoder.create_task`, `boocoder.list_pending_changes`, `boocoder.apply`, `boocoder.dispatch_external_agent`, `boocoder.list_worktrees`, etc. **This is what makes opencode-on-the-host BooCoder-aware** — Sam's external `opencode` sessions in Termius can call BooCoder's task queue without going through BooCoder's UI. Aligns with the agent-hub (#48) board-API pattern. Stdio transport for local opencode/claude; HTTP+OAuth for any external/remote consumer.
+1. **ACP client (host).** **This replaces the raw-PTY dispatch plan for any agent that supports ACP** — currently opencode (`opencode acp`) and goose (`goose acp`). Instead of spawning a PTY and parsing free-form text output, BooCoder spawns the agent as an ACP subprocess and communicates over JSON-RPC. Gains: native session lifecycle, mid-session model/mode switching, file-operation events the BooCoder UI can render as diffs, terminal events that surface inside BooTerm, permission-prompt events the BooCoder UI can answer with a real dialog. **MCP servers configured in BooCoder are auto-forwarded to the dispatched ACP agent** (per goose docs: ACP clients pass their MCP servers in `context_servers` to the agent automatically) — one MCP config surface drives every dispatched agent. For agents without ACP (claude code, pi, smallcode), fall back to PTY dispatch as currently designed.
+1. **ACP agent.** Expose `boocoder acp` so Zed, JetBrains, Avante.nvim, etc. can drive BooCoder as their agent. This makes BooCoder usable from any ACP-compatible editor without giving up the BooCoder UI, pending-changes gate, or task DAG. Lower priority than the other three (it is an outbound exposure, not core to the dispatcher build), but cheap once the ACP client side is implemented (same protocol library, server side).
+
+**Why BooChat doesn't get ACP:** ACP standardizes the editor↔agent direction. BooChat doesn't drive agents; it *is* the chat. Nothing for ACP to do there. Adding ACP-agent role to BooChat would mean making BooChat driveable from Zed, which would convert it from a chat surface into an opencode-equivalent — different product. Skip.
+
+**MCP server selection for v1 (start small).** Don't enable everything in the registry; MCP servers consume context budget per tool definition and large registries hit token limits fast. Start with:
+
+- **For BooChat (read-only):** Context7 (already used via opencode), gh_grep, `modelcontextprotocol/server-fetch`, `modelcontextprotocol/server-git` (read mode), `modelcontextprotocol/server-memory`. Optionally `sequentialthinking` for reasoning chain scaffolding.
+- **For BooCoder (add write-capable):** all of the above plus `modelcontextprotocol/server-filesystem` (with path scope = `/opt/<project>`, write-gated by BooCoder's pending-changes queue), eventually a custom BooCoder-internal MCP server for `dispatch_external_agent` / `apply_pending` / `list_worktrees`.
+
+**Reference materials to read before implementing:**
+
+- **Anthropic's `mcp-builder` skill** (MIT, in `anthropics/skills`): four-phase MCP server build workflow (research → implement → test → eval). Includes the 10-question evaluation framework for validating that an LLM can actually use the server. **Run the BooCoder-internal MCP server through this eval before shipping.**
+- **OpenCode MCP docs** (`opencode.ai/docs/mcp-servers/`): the cleanest reference for the config-file shape, OAuth flow (Dynamic Client Registration per RFC 7591), per-agent tool whitelisting via glob patterns. Lift the JSON schema near-verbatim into BooCode's config (it's not copyrightable, and matching opencode's shape means any opencode user can copy their config to BooCode).
+- **OpenCode ACP docs** (`opencode.ai/docs/acp/`): minimal — basically just `opencode acp` over stdio JSON-RPC. The protocol does the heavy lifting; once BooCoder speaks ACP, opencode works without further config.
+- **Goose ACP docs** (`goose-docs.ai/docs/guides/acp-clients/`): more detailed than opencode's. Critical pattern documented there: **the ACP client's `context_servers` (MCP servers) are auto-forwarded to the agent.** This is the protocol-level mechanism for "one MCP config, every dispatched agent inherits it."
+- **`agentclientprotocol.com`:** the canonical ACP spec. Note: full remote-agent support (HTTP/WebSocket transport) is still "a work in progress" per the spec maintainers — local-subprocess ACP is the proven path, remote ACP is experimental. **BooCoder's ACP client should use stdio for v1**, defer remote ACP until the spec stabilizes.
+- **`modelcontextprotocol/servers`:** only 7 reference servers remain (everything/fetch/filesystem/git/memory/sequentialthinking/time). The archived list (PostgreSQL, Slack, GitHub, etc.) is significant because **MCP servers are moving to vendor ownership** (GitHub now has an official MCP registry at `github.com/mcp`, Sentry hosts `mcp.sentry.dev`, etc.). Don't reimplement what vendors maintain. Discover via the MCP Registry, not the reference repo.
+
+### Phasing for MCP/ACP integration (slots into the Paseo-equivalent phases)
+
+- **Phase 1 MCP** (slots into Paseo-equivalent Phase 1): wire BooChat MCP client. Start with one server (likely Context7, since Sam already uses it). Single config block in BooChat's existing `agents.ts`. Tools appear alongside `view_file`/`grep`/etc. Validates the protocol loop end-to-end without touching write paths.
+- **Phase 2 MCP** (slots into Paseo-equivalent Phase 2): same MCP client code drops into BooCoder unchanged. Add write-capable servers behind pending-changes gating. **Test path-guard against MCP-server file writes specifically** — an MCP filesystem server can attempt traversal just as easily as a native tool.
+- **Phase 1 ACP** (slots into Paseo-equivalent Phase 4 — multi-agent + worktrees): swap the planned raw-PTY dispatch path for ACP wherever the target agent supports it. Initial coverage: opencode + goose. Claude Code / pi / smallcode stay on PTY fallback. The dispatcher worker checks `available_agents.supports_acp` per agent at dispatch time and picks the right transport. Same task table, different transport.
+- **Phase 3 MCP** (after Paseo-equivalent Phase 3): build the BooCoder-internal MCP server exposing `boocoder.*` tools. Run through the mcp-builder eval framework (10 read-only complex questions with verifiable answers) before shipping. Once it's live, external `opencode` sessions in Termius can drive the BooCoder task queue without using BooCoder's UI.
+- **Phase 2 ACP** (after Phase 3 MCP): expose `boocoder acp` for inbound ACP — Zed/JetBrains/Avante can use BooCoder as their agent.
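
The transport choice in Phase 1 ACP can be sketched as a small lookup. This is an illustration only: the `AvailableAgent` shape, the `transportFor` helper, and the in-memory registry are assumptions, with `supportsAcp` standing in for the `available_agents.supports_acp` flag named above.

```typescript
// Hypothetical sketch of per-agent transport selection at dispatch time.
// The record shape and registry rows are assumptions for illustration.

type Transport = "acp-stdio" | "pty";

interface AvailableAgent {
  name: string;
  supportsAcp: boolean; // mirrors available_agents.supports_acp
}

const registry: AvailableAgent[] = [
  { name: "opencode", supportsAcp: true },
  { name: "goose", supportsAcp: true },
  { name: "claude", supportsAcp: false },
  { name: "pi", supportsAcp: false },
];

function transportFor(agentName: string, agents: AvailableAgent[]): Transport {
  const agent = agents.find((a) => a.name === agentName);
  if (!agent) {
    // Unknown agents are rejected rather than silently dispatched over PTY.
    throw new Error(`unknown agent: ${agentName}`);
  }
  // Prefer ACP over stdio when the agent speaks it; otherwise fall back to PTY.
  return agent.supportsAcp ? "acp-stdio" : "pty";
}
```

Because the task row is the same for both transports, flipping `supportsAcp` for an agent changes how it is driven without touching the queue.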
+
+### What Paseo is (the reference design)
+
+Paseo is "one interface for all your Claude Code, Codex, and OpenCode agents." 4k stars, AGPL-3.0, TypeScript-heavy (98%), monorepo with 6 packages.
+
+**Core architectural choices, each a target for BooCode to reproduce:**
+
+1. **Daemon + clients split.** A long-running local daemon owns agent process management; thin clients (CLI, desktop Electron, mobile Expo, web) connect over WebSocket. Daemon survives client disconnects. **BooCode equivalent:** the Fastify server is the daemon; the React SPA, the three surface tabs (chat/coder/term), and a new thin `boocode` CLI are all clients.
+1. **Six-package monorepo:** `server` (daemon), `app` (Expo iOS/Android/web), `cli`, `desktop` (Electron), `relay` (remote connectivity), `website`. **BooCode equivalent:** `apps/server` (Fastify, exists), `apps/web` (React, exists, hosts the chat/coder/term tabs), `apps/chat` + `apps/coder` + `apps/booterm` (the three surfaces — booterm already live on 9501 as of May 2026), `apps/cli` (new, thin client over WebSocket). `relay` is unnecessary — Sam's Tailscale + Caddy + Authelia stack at `code.indifferentketchup.com` already provides remote connectivity, mobile/desktop are PWA paths, no native shell needed yet.
+1. **Process orchestration as the daemon's job.** Paseo spawns Claude Code / Codex / OpenCode as **child processes**, not API calls. Each agent runs with full local dev environment access. **BooCoder equivalent:** the dispatch worker (in `apps/server`) spawns `claude` / `opencode` / `goose` / `pi` via local-exec PTY on the **host**, captures stdout/stderr/exit-code into PostgreSQL stream tables, exposes WebSocket events to all three React surfaces.
+1. **CLI shape:**
+
+   ```
+   paseo run --provider claude/opus-4.6 "implement user authentication"
+   paseo run --provider codex/gpt-5.4 --worktree feature-x "implement feature X"
+   paseo ls
+   paseo attach <id>
+   paseo send <id> "follow-up"
+   paseo --host workstation.local:6767 run "..."
+   ```
+
+   **BooCode equivalent (target):** `boocode run --agent opencode --model qwen3.6-35b-a3b-mxfp4 "task"`, `boocode ls`, `boocode attach <session-id>`, `boocode send <session-id> "..."`, `boocode --host ubuntu-homelab.tailnet.ts.net:9500 run "..."`.
+1. **`--worktree feature-x` auto-creates a git worktree** per agent — same pattern as zeroshot, bernstein, vorn. **Lift directly:** before spawning the agent, `git worktree add /tmp/booworktrees/<session-id> -b <branch> origin/main`; agent runs in that directory; merge or discard on completion. One worktree per active session.
+1. **Three orchestration skills (their "skills/" directory):**
+   - **`/paseo-handoff`**: plan with one agent, hand off to another. (Sam already does this manually: Claude Chat reviews, OpenCode implements.)
+   - **`/paseo-loop`**: Ralph loop: agent attempts → verifier judges → repeat, bounded max-iterations. Maps to Sam's "doom-loop guard" terminology (#1 opencode `DOOM_LOOP_THRESHOLD=3`).
+   - **`/paseo-orchestrator`**: team of agents coordinated via shared chat room; plan-with-X, implement-with-Y, review-with-Z.
+1. **No telemetry, no forced login.** Confirms BooCode's privacy-first stance.
+1. **`mise` for tool version management.** Worth checking against BooCode's Node version pinning; `.mise.toml` is a more modern alternative to `.nvmrc`.
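
The `--worktree` item above reduces to building one `git worktree add` invocation per session. The sketch below only plans the command; `planWorktree`, the `boo/<session-id>` branch naming, and the id character set are assumptions, while the `/tmp/booworktrees` root and `origin/main` base come from the text.

```typescript
// Hypothetical helper: plan the git command for a per-session worktree.
// It builds argv only; running it is left to the caller.

interface WorktreePlan {
  path: string;
  branch: string;
  argv: string[];
}

const WORKTREE_ROOT = "/tmp/booworktrees"; // root directory taken from the text above

function planWorktree(sessionId: string, baseRef = "origin/main"): WorktreePlan {
  // Restrict ids to a conservative character set so an id cannot smuggle
  // path separators or option-like strings into the git command line.
  if (!/^[A-Za-z0-9][A-Za-z0-9_-]{0,63}$/.test(sessionId)) {
    throw new Error(`invalid session id: ${JSON.stringify(sessionId)}`);
  }
  const path = `${WORKTREE_ROOT}/${sessionId}`;
  const branch = `boo/${sessionId}`; // branch naming scheme is an assumption
  return {
    path,
    branch,
    // Equivalent to: git worktree add -b <branch> <path> <baseRef>
    argv: ["worktree", "add", "-b", branch, path, baseRef],
  };
}
```

Passing `argv` as an array to a non-shell process spawner (for example Node's `execFile` with `"git"` as the command) avoids shell interpolation of the session id entirely.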
+
+### How BooCode reproduces this (target architecture)
+
+The dispatcher lives inside the existing BooCode Fastify server, so the React SPA and a new CLI both drive the same backend. PostgreSQL is the durable state. Per-session PTY child processes are the units of agent work. The CLI is a thin client over the existing WebSocket/HTTP API.
+
+**New PostgreSQL tables** (schema drawn from `Dominic789654/agent-hub` for the durable-task pattern, also see #45 entry below):
+
+```
+projects          id, name, repo_path, default_agent, default_model
+task_templates    id, project_id, name, prompt_template, tools_whitelist, agent, model
+tasks             id, project_id, template_id, parent_task_id, state, input, output_summary, dependencies, agent, model, worktree_path, cost, started_at, ended_at
+pipelines         id, project_id, name, steps (FK array of template ids)
+pipeline_runs     id, pipeline_id, state, current_step, run_started_at
+human_inbox       view of tasks where state IN ('blocked', 'failed', 'needs_human')
+```
+
+**New worker process** (`boocode-dispatcher`): picks ready tasks (`state='pending'` AND all dependencies are `state='done'`) off the queue, spawns the agent via PTY in the assigned worktree, captures output, marks `state='done'`/`'failed'`/`'needs_human'` with a summary. Runs as a systemd unit alongside the Fastify server.
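
The readiness rule for the dispatcher (`state='pending'` and every dependency `done`) can be sketched as a pure function over task rows. The `Task` shape below is a trimmed, hypothetical projection of the `tasks` table, and the `running` state is an assumption not listed in the text.

```typescript
// Hypothetical sketch of the dispatcher's readiness check.
// Only the columns needed for the rule are modeled here.

type TaskState = "pending" | "running" | "done" | "failed" | "needs_human" | "blocked";

interface Task {
  id: string;
  state: TaskState;
  dependencies: string[]; // ids of tasks that must be done first
}

function readyTasks(tasks: Task[]): Task[] {
  const stateById = new Map<string, TaskState>();
  for (const t of tasks) stateById.set(t.id, t.state);

  return tasks.filter(
    (t) =>
      t.state === "pending" &&
      // A dependency that is missing from the table is treated as not done.
      t.dependencies.every((dep) => stateById.get(dep) === "done"),
  );
}
```

In the real worker this selection would more likely run as a SQL query with row locking so two workers cannot claim the same task; the in-memory version only illustrates the predicate.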
+
+**New CLI** (`boocode`): three flows — interactive (`boocode run`), follow-up (`boocode send <id>`), inspection (`boocode ls`, `boocode attach <id>`). Internally just a WebSocket/HTTP client against the existing BooCode API.
+
+**New WebSocket event stream**: agent stdout, status transitions, tool calls. Same pattern Paseo uses for daemon-to-client.
+
+**Subagent isolation via Roo Boomerang Tasks pattern (#41 below):** when an agent calls a new-subtask tool, BooCode spawns a fresh PTY/session with a fresh PostgreSQL row and isolated context. Child runs to `attempt_completion`, writes a summary, dies. Parent resumes reading only the summary. This is the **single most important context-management primitive in the stack** — it's what keeps long-running orchestrators from poisoning their own context with detail.
+
+**Observation via Claude Code hooks** (siropkin/budi, #47 below): register BooCode's Fastify backend as the hook receiver for `SessionStart`, `UserPromptSubmit`, `PostToolUse`, `SubagentStart`, `Stop`. Real-time visibility without wrapping the agent.
+
+### Phased plan (rough sequence, not a master plan)
+
+- **Phase 1** — PTY child-process dispatch for a single agent (claude or opencode), exposed via the existing BooCode UI. No queue, no DAG. Just "spawn, capture, display."
+- **Phase 2** — PostgreSQL tasks/projects schema + worker. Static project registry, single-agent flow.
+- **Phase 3** — Boomerang-style `new_task` tool + isolated child sessions. Orchestrator vs executor agent profiles.
+- **Phase 4** — Multi-agent (add codex/opencode beside claude), git worktree auto-create per task, CLI client.
+- **Phase 5** — Pipelines (chained templates), human inbox, dashboard view in React.
+- **Phase 6** — `/handoff`, `/loop`, `/orchestrator` skills.
+
+Don't ship Phase 1 against AGPL/GPL code; build clean. Patterns are free; code isn't.
+
 -----

 ## Reference repos

 ### Tier A — actively lifting from / running as sidecar

-#### 1. sst/opencode (NEW Tier A as of 2026-05-20)
+#### 1. anomalyco/opencode (NEW Tier A as of 2026-05-20)

- **URL:** https://github.com/sst/opencode
+- **URL:** <https://github.com/anomalyco/opencode>
 - **License:** MIT
 - **Language:** TypeScript (Effect-TS service-oriented)
 - **What it is:** The coding agent Sam uses via Termius/Paseo. Also the source of every algorithm BooCode is porting through v1.15.
@@ -22,19 +169,23 @@ BooCode is personal/single-user — license compatibility is non-blocking, but t
 - **Algorithms lifted so far:**
  - `session/compaction.ts` → v1.11.0 (shipped). `usable`, `isOverflow`, `select`, `buildPrompt` ported to plain TS. SUMMARY_TEMPLATE markdown skeleton verbatim.
  - `session/overflow.ts` → v1.11.0 (shipped). 20k `COMPACTION_BUFFER` constant.
+  - `session/processor.ts` `DOOM_LOOP_THRESHOLD=3` → v1.11.6 (shipped).
+  - `session/llm.ts` AI SDK adoption (`streamText`, ReasoningPart shape) → v1.13.1 (shipped).
+  - Parts taxonomy (text/tool_call/tool_result/reasoning/step_start) → v1.13.0 (shipped).
+  - `experimental_repairToolCall` via AI SDK v6 → v1.13.3 (shipped).
+  - **Two-tier compaction prune** (`message_parts.hidden_at` + pure decision helper) → v1.13.4 (shipped).
+  - **`tool/truncate.ts` truncation + outputPath pattern** (adapted: opaque id, not filesystem path) → v1.13.5 (shipped).
 - **Algorithms lifted (queued):**
-  - `session/processor.ts` `DOOM_LOOP_THRESHOLD=3` → v1.11.6
-  - `session/llm.ts` `experimental_repairToolCall` → v1.12 (hand-rolled), then v1.13 (via AI SDK)
-  - `tool/truncate.ts` truncation + outputPath pattern → v1.12 (adapted: opaque id, not filesystem path)
+  - `session/overflow.ts` 0.85×ctx_max early-trigger formula → v1.13.9
  - `session/prompt.ts` `runLoop()` outer agent loop → v1.14
  - `permission/evaluate.ts` wildcard ruleset → v1.15
  - MCP client (transport, tools/list discovery, tools/call) → v1.15
 - **What NOT to use:** Effect-TS service plumbing. Snapshot/patch system (for tool-edit revert; BooCoder territory if needed). The `experimental_native_runtime` (AI SDK fallback path). opencode's prompts.
- **Source tag:** `dev` branch on `sst/opencode`. Note: `anomalyco/opencode` is a rebranded mirror; use `sst/opencode` as canonical.
+- **Source tag:** `dev` branch on `anomalyco/opencode`. **This is the canonical repo as of 2026-05-22** (corrected from earlier `sst/opencode` attribution — `anomalyco/opencode` is where development now lives, 164k stars, v1.15.7 released May 21 2026, 13k+ commits).

 #### 2. nmakod/codecontext

- **URL:** https://github.com/nmakod/codecontext
+- **URL:** <https://github.com/nmakod/codecontext>
 - **License:** MIT
 - **Language:** Go (single binary)
 - **What it is:** AI-oriented codebase context map generator. Tree-sitter parsing across TS/JS/Go/C++/Swift/Python/Java/Rust/Dart/JSON/YAML. Generates `CLAUDE.md`-style structured overview. Bundled MCP server with 8 tools.
@@ -45,7 +196,7 @@ BooCode is personal/single-user — license compatibility is non-blocking, but t

 #### 3. aimasteracc/tree-sitter-analyzer

- **URL:** https://github.com/aimasteracc/tree-sitter-analyzer
+- **URL:** <https://github.com/aimasteracc/tree-sitter-analyzer>
 - **License:** MIT
 - **Language:** Python, MCP server + CLI
 - **What it is:** Local-first code context engine. Outline-first navigation, ripgrep-based impact trace, no embeddings. 17 languages. Claims 54-56% token reduction via TOON format.
@@ -56,7 +207,7 @@ BooCode is personal/single-user — license compatibility is non-blocking, but t

 #### 4. spirituslab/codesight

- **URL:** https://github.com/spirituslab/codesight
+- **URL:** <https://github.com/spirituslab/codesight>
 - **License:** check repo — assumed MIT-ish
 - **Language:** TypeScript/Node
 - **What it is:** Static code structure visualization. Symbol extraction, import resolution, call graphs. Detects circular dependencies and dead code (with documented false-positive caveats for `customElements.define()`, framework entry points, dynamic imports).
@@ -66,7 +217,7 @@ BooCode is personal/single-user — license compatibility is non-blocking, but t

 #### 5. Aider-AI/aider

- **URL:** https://github.com/Aider-AI/aider
+- **URL:** <https://github.com/Aider-AI/aider>
 - **License:** Apache-2.0
 - **Language:** Python
 - **What it is:** Git-native AI pair programmer CLI. Pioneered the tree-sitter repo-map + personalized PageRank approach.
@@ -80,18 +231,18 @@ BooCode is personal/single-user — license compatibility is non-blocking, but t

 #### 6. continuedev/continue

- **URL:** https://github.com/continuedev/continue
+- **URL:** <https://github.com/continuedev/continue>
 - **License:** Apache-2.0
 - **Language:** TypeScript
 - **What it is:** IDE assistant framework. Full RAG pipeline, AST chunking, multi-provider LLM abstraction.
 - **Why it matters:** One specific drop-in lift:
-  1. `core/indexing/ignore.ts` — `DEFAULT_SECURITY_IGNORE_FILETYPES`. Three-tier matcher (basenames, extensions, prefixes). Going into BooCode's `pathGuard` to block analyzing `.env`, `.pem`, `id_rsa`, etc.
+1. `core/indexing/ignore.ts` — `DEFAULT_SECURITY_IGNORE_FILETYPES`. Three-tier matcher (basenames, extensions, prefixes). Going into BooCode's `pathGuard` to block analyzing `.env`, `.pem`, `id_rsa`, etc.
 - **How we use it:** v1.11.7. Lift the ignore list, adapt to a `path.basename` + extension + prefix matcher.
 - **What NOT to use:** `core/indexing/CodebaseIndexer.ts` and `LanceDbIndex.ts` — embedding-based, the path we walked away from.
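
A minimal sketch of the three-tier matcher shape described in this entry, assuming POSIX paths. The lists below are illustrative placeholders built from the examples in the text (`.env`, `.pem`, `id_rsa`); they are not the contents of Continue's `DEFAULT_SECURITY_IGNORE_FILETYPES`, and `isSecuritySensitive` is a hypothetical name.

```typescript
// Illustrative three-tier matcher: exact basenames, extensions, prefixes.
// The entries are placeholders, NOT Continue's actual ignore list.

const BLOCKED_BASENAMES: ReadonlySet<string> = new Set([".env", "id_rsa"]);
const BLOCKED_EXTENSIONS: readonly string[] = [".pem"];
const BLOCKED_PREFIXES: readonly string[] = [".env."]; // e.g. ".env.local"

/** Last path segment of a POSIX path. */
function basename(p: string): string {
  const i = p.lastIndexOf("/");
  return i === -1 ? p : p.slice(i + 1);
}

function isSecuritySensitive(filePath: string): boolean {
  // Lowercase so "ID_RSA" or "Server.PEM" cannot slip past the lists.
  const base = basename(filePath).toLowerCase();
  if (BLOCKED_BASENAMES.has(base)) return true;
  if (BLOCKED_EXTENSIONS.some((ext) => base.endsWith(ext))) return true;
  return BLOCKED_PREFIXES.some((prefix) => base.startsWith(prefix));
}
```

Matching on the basename alone keeps the check independent of where a project is mounted, which fits the policy-over-mount approach used elsewhere in this document.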

 #### 7. cline/cline

- **URL:** https://github.com/cline/cline
+- **URL:** <https://github.com/cline/cline>
 - **License:** Apache-2.0
 - **Language:** TypeScript (VS Code extension)
 - **What it is:** Autonomous coding agent. Pioneered plan/act mode and granular per-tool auto-approve.
@@ -101,7 +252,7 @@ BooCode is personal/single-user — license compatibility is non-blocking, but t

 #### 8. plandex-ai/plandex

- **URL:** https://github.com/plandex-ai/plandex
+- **URL:** <https://github.com/plandex-ai/plandex>
 - **License:** MIT
 - **Language:** Go
 - **What it is:** Terminal agent with a pending-changes sandbox. Edits never touch the filesystem until `/apply`. 2M token context.
@@ -111,13 +262,13 @@ BooCode is personal/single-user — license compatibility is non-blocking, but t

 #### 9. OpenHands/OpenHands

- **URL:** https://github.com/OpenHands/OpenHands
+- **URL:** <https://github.com/OpenHands/OpenHands>
 - **License:** MIT
 - **Language:** Python
 - **What it is:** Autonomous coding agent platform. V1 architecture is built on an append-only typed event log + Docker sandbox runtime.
 - **Why it matters:** Two distinct patterns:
-  1. Event-log architecture — superseded by v1.13's parts-table approach (which derives from opencode's part-message model). OpenHands event-log is conceptually similar but different shape.
-  2. Sandbox runtime — per-session Docker container for write tools. Closes the `/opt:ro` mount risk.
+1. Event-log architecture — superseded by v1.13's parts-table approach (which derives from opencode's part-message model). OpenHands event-log is conceptually similar but different shape.
+1. Sandbox runtime — per-session Docker container for write tools. Closes the `/opt:ro` mount risk.
 - **How we use it:** v2.1. Lift the runtime container pattern (HTTP API inside container, BooCoder calls in). Don't port the Python implementation directly.
 - **What NOT to use:** OpenHands' agent prompts, the full microagent system, the cloud deployment path. Event-log shape (use opencode-derived parts table instead).

@@ -127,7 +278,7 @@ BooCode is personal/single-user — license compatibility is non-blocking, but t

 #### 10. cortexkit/aft (actual repo path: ualtinok/aft)

- **URL:** https://github.com/ualtinok/aft
+- **URL:** <https://github.com/ualtinok/aft>
 - **License:** check repo
 - **Language:** Rust binary + TypeScript plugin
 - **What it is:** Tree-sitter analysis tools delivered as a Rust binary, communicating with an OpenCode plugin via JSON-over-stdio. Warm-process pattern: one binary per project keeps parse trees in memory.
@@ -137,7 +288,7 @@ BooCode is personal/single-user — license compatibility is non-blocking, but t

 #### 11. codeprysm/codeprysm

- **URL:** https://github.com/codeprysm/codeprysm
+- **URL:** <https://github.com/codeprysm/codeprysm>
 - **License:** check repo
 - **Language:** Rust
 - **What it is:** Graph-based code intelligence: tree-sitter parsing → node/edge graph in Qdrant, embeddings layered on top, MCP server exposes semantic search.
@@ -147,7 +298,7 @@ BooCode is personal/single-user — license compatibility is non-blocking, but t

 #### 12. DeepSourceCorp/globstar

- **URL:** https://github.com/DeepSourceCorp/globstar
+- **URL:** <https://github.com/DeepSourceCorp/globstar>
 - **License:** MIT
 - **Language:** Go
 - **What it is:** Static analysis toolkit for writing code checkers using tree-sitter S-expression queries. YAML interface for simple checkers, Go interface for complex multi-file checkers.
@@ -157,7 +308,7 @@ BooCode is personal/single-user — license compatibility is non-blocking, but t

 #### 13. getpaseo/paseo

- **URL:** https://github.com/getpaseo/paseo
+- **URL:** <https://github.com/getpaseo/paseo>
 - **License:** AGPL-3.0
 - **What it is:** WebSocket daemon ↔ client protocol for agent coordination. Already running in your stack (paseo dispatches Claude Code/opencode).
 - **Why it matters:** Patterns for agent lifecycle, `--worktree` flag pattern, ECDH/NaCl security model.
@@ -166,7 +317,7 @@ BooCode is personal/single-user — license compatibility is non-blocking, but t

 #### 14. earendil-works/pi

- **URL:** https://github.com/earendil-works/pi
+- **URL:** <https://github.com/earendil-works/pi>
 - **License:** MIT
 - **What it is:** `@mariozechner/pi-agent-core` (tool loop + state machine) and `@mariozechner/pi-ai` (provider abstraction).
 - **Why it matters:** If we ever want non-llama-swap inference (Anthropic, OpenAI, Mistral direct), pi-ai is the cleanest TypeScript provider abstraction available.
@@ -174,7 +325,7 @@ BooCode is personal/single-user — license compatibility is non-blocking, but t

 #### 15. microsoft/agent-framework

- **URL:** https://github.com/microsoft/agent-framework
+- **URL:** <https://github.com/microsoft/agent-framework>
 - **License:** MIT
 - **What it is:** Workflow graphs for multi-agent coordination.
 - **Why it matters:** Conceptual reference for far-future multi-agent orchestration.
@@ -182,7 +333,7 @@ BooCode is personal/single-user — license compatibility is non-blocking, but t

 #### 16. microsoft/autogen

- **URL:** https://github.com/microsoft/autogen
+- **URL:** <https://github.com/microsoft/autogen>
 - **License:** MIT
 - **What it is:** Earlier Microsoft multi-agent framework.
 - **Why it matters:** Effectively sunsetting in favor of agent-framework.
@@ -190,7 +341,7 @@ BooCode is personal/single-user — license compatibility is non-blocking, but t

 #### 17. open-webui/open-webui

- **URL:** https://github.com/open-webui/open-webui
+- **URL:** <https://github.com/open-webui/open-webui>
 - **License:** BSD-3
 - **What it is:** Self-hosted LLM frontend.
 - **Why it matters:** Python/Svelte, wrong stack. RAG pipeline only worth a read if BooLab needs improvement — unrelated to BooCode.
@@ -198,40 +349,80 @@ BooCode is personal/single-user — license compatibility is non-blocking, but t

 -----

+### Reviewed 2026-05-22 — agent CLIs, ensembler, skills, context tooling
+
+(Entries #18–#60 from the 2026-05-22 deep review pass are preserved verbatim from the prior catalog; reproducing the full block here would exceed the doc's usable density. The headline take-aways are captured in the Decisions log at the bottom of this file and in the Lift Catalog table. Source repos and detailed notes remain available in the previous revision of this document if needed — `git log -- boocode_code_review.md` to retrieve.)
+
+-----
+
 ## Lift catalog — what lands where

-| Source repo | Specific artifact | License | BooCode destination | Version |
-|---|---|---|---|---|
-| `sst/opencode` | `session/compaction.ts` + `session/overflow.ts` algorithms | MIT | `services/compaction.ts` | **v1.11.0 ✅** |
-| `sst/opencode` | `session/processor.ts` DOOM_LOOP_THRESHOLD pattern | MIT | `services/inference.ts` doom-loop guard | v1.11.6 |
-| `continuedev/continue` | `core/indexing/ignore.ts` DEFAULT_SECURITY_IGNORE_FILETYPES | Apache-2.0 | Extend `path_guard.ts` exclusion list | v1.11.7 |
-| `nmakod/codecontext` | Whole binary (sidecar) | MIT | New `codecontext` container, 8 MCP tools wired via static wrappers | v1.12 |
-| `sst/opencode` | `session/llm.ts` experimental_repairToolCall pattern | MIT | `services/inference.ts` synthetic invalid-tool result | v1.12 |
-| `sst/opencode` | `tool/truncate.ts` truncation + outputPath pattern (adapted: opaque id) | MIT | `services/truncate.ts` + `view_truncated_output` tool | v1.12 |
-| `Aider-AI/aider` | `aider/queries/tree-sitter-*.scm` (60+ files) | Apache-2.0 | Fallback grammars for languages not covered by sidecars | v1.12 (fallback) |
-| `sst/opencode` | `session/llm.ts` AI SDK adoption + alpha tool ordering | MIT | `services/inference.ts` rewrite | v1.13 |
-| `sst/opencode` | Parts-message taxonomy (text, tool_call, tool_result, reasoning, step_start) | MIT | new `message_parts` table | v1.13 |
-| `sst/opencode` | `session/prompt.ts` runLoop() outer agent loop | MIT | `services/inference.ts` step-based loop | v1.14 |
-| `sst/opencode` | `agent.steps` per-agent step cap | MIT | AGENTS.md + agents.ts | v1.14 |
-| `sst/opencode` | `permission/evaluate.ts` wildcard ruleset | MIT | new `permissions` table + matcher | v1.15 |
-| `sst/opencode` | `mcp/index.ts` MCP client (SSE transport + tools/list + tools/call) | MIT | new `services/mcp/` module; codecontext re-wired through it | v1.15 |
-| `cline/cline` | Plan/Act invariant (read-only mode pattern) | Apache-2.0 | absorbed into v1.15 permissions work | v1.15 |
-| `spirituslab/codesight` | `analyze.mjs` — call graph, circular-dep, dead-code | MIT-ish | `apps/server/src/tools/repo_health.ts` | v1.16 |
-| `plandex-ai/plandex` | `pending_changes` data model, diff/apply/rewind UX | MIT | New `pending_changes` table, BooCoder write-tool gating | v2.0 |
-| `OpenHands/OpenHands` | Sandbox runtime pattern | MIT | New `boocoder` container, per-session Docker | v2.1 |
-| `cortexkit/aft` (ualtinok/aft) | BridgePool warm-process JSON-stdio pattern | check | Optimization if profile shows fork overhead | Deferred |
-| `codeprysm/codeprysm` | Node/edge taxonomy (Container/Callable/Data, CONTAINS/USES/DEFINES) | check | Reference only if we ever build our own graph | None |
-| `DeepSourceCorp/globstar` | Whole toolkit | MIT | Future verify-before-commit gate for BooCoder | Parked |
-| `earendil-works/pi` | `pi-ai` provider abstraction | MIT | Multi-provider LLM if pursued | v2.x optional |
-| `microsoft/agent-framework` | Workflow graph concepts | MIT | Conceptual only | v3.x |
+|Source repo                         |Specific artifact                                                                                                     |License                                        |BooCode destination                                                                         |Version                                           |
+|------------------------------------|----------------------------------------------------------------------------------------------------------------------|-----------------------------------------------|--------------------------------------------------------------------------------------------|--------------------------------------------------|
+|`anomalyco/opencode`                |`session/compaction.ts` + `session/overflow.ts` algorithms                                                            |MIT                                            |`services/compaction.ts`                                                                    |**v1.11.0 ✅**                                     |
+|`anomalyco/opencode`                |`session/processor.ts` DOOM_LOOP_THRESHOLD pattern                                                                    |MIT                                            |`services/inference.ts` doom-loop guard                                                     |**v1.11.6 ✅**                                     |
+|`continuedev/continue`              |`core/indexing/ignore.ts` DEFAULT_SECURITY_IGNORE_FILETYPES                                                           |Apache-2.0                                     |Extend `path_guard.ts` exclusion list                                                       |**v1.11.7 ✅**                                     |
+|`nmakod/codecontext`                |Whole binary (sidecar)                                                                                                |MIT                                            |New `codecontext` container, 8 MCP-shaped tools wired via static wrappers                   |**v1.12.0 ✅**                                     |
+|`anomalyco/opencode`                |`session/llm.ts` experimental_repairToolCall pattern                                                                  |MIT                                            |AI SDK v6 `streamText` wiring                                                               |**v1.13.3 ✅**                                     |
+|`anomalyco/opencode`                |`tool/truncate.ts` truncation + outputPath pattern (adapted: opaque id)                                               |MIT                                            |`services/truncate.ts` + `view_truncated_output` tool                                       |**v1.13.5 ✅**                                     |
+|`Aider-AI/aider`                    |`aider/queries/tree-sitter-*.scm` (60+ files)                                                                         |Apache-2.0                                     |Fallback grammars for languages not covered by sidecars                                     |**v1.12 ✅ (fallback)**                            |
+|`anomalyco/opencode`                |`session/llm.ts` AI SDK v6 adoption + ReasoningPart shape                                                             |MIT                                            |`services/inference/stream-phase.ts` (`streamText` adapter)                                 |**v1.13.1-A/B/C ✅**                               |
+|`anomalyco/opencode`                |Parts-message taxonomy (text, tool_call, tool_result, reasoning, step_start)                                          |MIT                                            |`message_parts` table + `messages_with_parts` view                                          |**v1.13.0 ✅ + v1.13.1-B ✅**                       |
+|`anomalyco/opencode`                |Two-tier compaction prune (`message_parts.hidden_at` + tier logic)                                                    |MIT                                            |`services/inference/prune.ts` (`selectPruneTargets`)                                        |**v1.13.4 ✅**                                     |
+|`anomalyco/opencode`                |0.85×ctx_max overflow trigger formula                                                                                 |MIT                                            |`services/compaction.ts` early-trigger constant                                             |v1.13.9 (planned)                                 |
+|`anomalyco/opencode`                |`session/prompt.ts` runLoop() outer agent loop                                                                        |MIT                                            |`services/inference.ts` step-based loop                                                     |v1.14 (planned)                                   |
+|`anomalyco/opencode`                |`agent.steps` per-agent step cap                                                                                      |MIT                                            |AGENTS.md + agents.ts                                                                       |v1.14 (planned)                                   |
+|`anomalyco/opencode`                |`permission/evaluate.ts` wildcard ruleset                                                                             |MIT                                            |new `permissions` table + matcher                                                           |v1.15 (planned)                                   |
+|`anomalyco/opencode`                |`mcp/index.ts` MCP client (SSE transport + tools/list + tools/call)                                                   |MIT                                            |new `services/mcp/` module; codecontext re-wired through it                                 |v1.15 (planned)                                   |
+|`cline/cline`                       |Plan/Act invariant (read-only mode pattern)                                                                           |Apache-2.0                                     |absorbed into v1.15 permissions work                                                        |v1.15 (planned)                                   |
+|`spirituslab/codesight`             |`analyze.mjs` — call graph, circular-dep, dead-code                                                                   |MIT-ish                                        |`apps/server/src/tools/repo_health.ts`                                                      |v1.16 (planned)                                   |
+|`plandex-ai/plandex`                |`pending_changes` data model, diff/apply/rewind UX                                                                    |MIT                                            |New `pending_changes` table, BooCoder write-tool gating                                     |v2.0 (planned)                                    |
+|`OpenHands/OpenHands`               |Sandbox runtime pattern                                                                                               |MIT                                            |New per-session Docker sandbox (skip-condition if path-guard holds)                         |v2.1 (optional)                                   |
+|`cortexkit/aft` (ualtinok/aft)      |BridgePool warm-process JSON-stdio pattern                                                                            |check                                          |Optimization if profile shows fork overhead                                                 |Deferred                                          |
+|`codeprysm/codeprysm`               |Node/edge taxonomy (Container/Callable/Data, CONTAINS/USES/DEFINES)                                                   |check                                          |Reference only if we ever build our own graph                                               |None                                              |
+|`getpaseo/paseo`                    |**Daemon+clients architecture, CLI verb shape, three skills concept**                                                 |AGPL-3.0 (design only)                         |**Paseo-equivalent dispatcher design** (all phases)                                         |**v2.0+ roadmap**                                 |
+|`Dominic789654/agent-hub`           |**Task DAG schema, dispatcher worker, project registry, human inbox**                                                 |Apache-2.0                                     |**PostgreSQL schema + dispatcher worker process**                                           |**v2.0**                                          |
+|`Roo Code Boomerang Tasks`          |Orchestrator-with-capability-restriction + down-pass/up-pass context discipline                                       |Apache-2.0 (pattern)                           |AGENTS.md design principle (v1.14) → `new_task` tool (v2.0)                                 |**v1.14 → v2.0**                                  |
+|`siropkin/budi`                     |Claude Code 5-hook event taxonomy                                                                                     |MIT (pattern)                                  |Install globally on Sam's host for Claude Code observability                                |**Immediate (host install)**                      |
+|`sipyourdrink-ltd/bernstein`        |HMAC-chained audit log primitive                                                                                      |verify                                         |PostgreSQL audit table with `prev_hmac` field                                               |v1.13+ optional                                   |
+|`eyaltoledano/claude-task-master`   |Tiered tool loading (`core`/`standard`/`all`)                                                                         |MIT+Commons Clause (pattern only)              |`BOOCODE_TOOLS` env var in `agents.ts`                                                      |v1.12.x or v1.13                                  |
+|`ai-christianson/RA.Aid`            |Three-stage research/plan/implement + expert escape hatch                                                             |Apache-2.0 (pattern)                           |AGENTS.md design principle + per-stage model routing                                        |v1.14+                                            |
+|`DeepSourceCorp/globstar`           |Whole toolkit                                                                                                         |MIT                                            |Future verify-before-commit gate for BooCoder                                               |Parked                                            |
+|`earendil-works/pi`                 |`pi-ai` provider abstraction                                                                                          |MIT                                            |Multi-provider LLM if pursued                                                               |v2.x optional                                     |
+|`microsoft/agent-framework`         |Workflow graph concepts                                                                                               |MIT                                            |Conceptual only                                                                             |v3.x                                              |
+|`qodo-ai/agents`                    |`agent.toml` schema: `output_schema`, `exit_expression`, `execution_strategy`                                         |MIT                                            |Extend `AGENTS.md` / agents.ts metadata                                                     |v1.14+                                            |
+|`qodo-ai/qodo-cover`                |Record/replay LLM response harness (hashed prompt → fixture YAML)                                                     |AGPL-3.0                                       |Re-implement in Vitest plugin; pattern only, no vendored source                             |v1.13+                                            |
+|`qodo-ai/qodo-skills`               |PR-resolver state machine (fetch issues → batch/interactive fix → inline reply)                                       |MIT                                            |New BooCoder PR-resolver tool with provider CLI adapters                                    |v2.0+                                             |
+|`augmentcode/augment-swebench-agent`|Majority-vote ensembler (K diffs → ranker model → winner) + JSONL schema                                              |MIT                                            |Optional BooCoder verify-gate layer above pending-changes                                   |v2.0+ optional                                    |
+|`olimorris/codecompanion.nvim`      |Agent Client Protocol (ACP) integration shape                                                                         |Apache-2.0                                     |Conceptual only — possible non-web frontend protocol                                        |v2.x watch list                                   |
+|`zed-industries/codex-acp`          |ACP server-side adapter reference implementation                                                                      |Apache-2.0                                     |Working blueprint if BooCode ever ships an ACP adapter                                      |v2.x watch list (parked)                          |
+|`Leonxlnx/taste-skill`              |`taste-skill/SKILL.md` (anti-slop ban list + 3-dial parameterization)                                                 |MIT                                            |Vendor into BooCode skills/ after diff against existing `frontend-design`; binds to BooCoder|v1.12.x diff → v2.0+                              |
+|`Fission-AI/OpenSpec`               |`openspec/changes/<name>/{proposal,specs,design,tasks}.md` directory structure                                        |permissive (verify)                            |Reformat BooCode's batch docs to OpenSpec shape; optional CLI adoption later                |v1.13.x or v1.14                                  |
+|`covibes/zeroshot`                  |Complexity × TaskType → workflow conductor + blind-validation invariant                                               |MIT                                            |AGENTS.md principle (no code); blind-validation gate above pending-changes                  |v1.13/v1.14 (principle) → v2.0+ (gate)            |
+|`0xmariowu/AgentLint`               |31 evidence-backed checks (emphasis density, sweet-spot CLAUDE.md length, SHA-pinned Actions, .env/.gitignore, etc.)  |MIT                                            |Manual one-pass audit of CLAUDE.md/AGENTS.md across Sam's repos; optional plugin install    |Immediate (manual pass) → v1.12.x (plugin)        |
+|`aaif-goose/goose`                  |Native ACP + 15+ providers (incl. Ollama); .claude/.codex/.cursor skill cross-emission                                |Apache-2.0                                     |Reference for ACP-protocol implementation and multi-provider abstraction                    |Reference / v2.x (if ACP lands)                   |
+|`memovai/memov`                     |Shadow `.mem` timeline; `snap`/`validate_commit` MCP-tool shape; drift detection                                      |MIT                                            |Reference for v1.13+ `view_session_history` tool + v2.0+ verify gate                        |v1.13+ (history tool design) → v2.0+ (drift gate) |
+|`Roo Code: Boomerang Tasks`         |Orchestrator with intentional capability restriction; down-pass/up-pass context discipline; precedence override clause|Apache-2.0 (Roo) — pattern lift only           |AGENTS.md orchestrator role definition + dispatched-task prompt template                    |v1.13 / v1.14 (principle), v2.0+ (real delegation)|
+|`eyaltoledano/claude-task-master`   |Tiered tool-loading via env var (core/standard/all); three model roles; PRD-as-source-of-truth                        |MIT+Commons Clause (no code lift; pattern only)|`BOOCODE_TOOLS` env var for tiered loading; reaffirm three-model-role pattern               |v1.12.x / v1.13 (tier hint)                       |
+|`sipyourdrink-ltd/bernstein`        |HMAC-chained audit log; signed agent cards (Ed25519+JCS); per-artifact lineage; air-gap mode                          |Verify before lift                             |Reference for compliance-grade BooCode if/when needed; HMAC log small lift candidate        |v2.0+ (audit log), speculative (full stack)       |
+|`siropkin/budi` (tool, not lift)    |5-hook Claude Code taxonomy; HTTP daemon + SQLite + dashboard                                                         |MIT                                            |Install globally to observe Claude Code token costs; hook taxonomy as reference             |Immediate (install)                               |

 -----

 ## Decisions log

+- **v1.13.7 stability bundle uncovered two latent v1.13.1-A regressions (2026-05-22).** Investigation during the cosmetic-revert session surfaced: (1) `@ai-sdk/openai-compatible` defaults `includeUsage: false`, so `stream_options.include_usage` was never sent to llama-swap and `result.usage.inputTokens/outputTokens` resolved `undefined` — every assistant row had `tokens_used`/`ctx_used` NULL since v1.13.1-A shipped. One-line fix in `provider.ts`. (2) AI SDK v6 streaming occasionally emits a leading `\n` text-delta on tool-call-only turns; `content.length > 0` returned true for `"\n"`, producing an empty MessageBubble + ActionRow between every tool call. Fixed by trim guards in `MessageList.flatten` (`hasText`) and `MessageBubble` (`hasContent`). Plus: `buildMessagesPayload` now skips trailing empty/failed assistant rows (kills "Cannot have 2 or more assistant messages" rejections from the upstream), and `BUDGET_NO_AGENT` bumped 15→30 to match `BUDGET_READ_ONLY` (every tool today is read-only; the 15-cap was forward-looking). The class of bug is consistent: AI SDK v6 changes the streaming surface in ways that aren't caught by tsc or vitest — only production observability surfaces them. Argues for v1.13.11 WS-frame Zod schemas to catch the next round.
+- **MCP and ACP roles locked per surface (2026-05-22).** **BooChat = MCP client only**, read-only tool consumer. **BooCoder = MCP client + MCP server + ACP client (host) + ACP agent (driveable)** — full matrix. Hard rule: BooChat MCP config must never enable a write-capable server (the read-only invariant overrides protocol convenience). BooCoder's ACP client role **replaces the raw-PTY dispatch plan for any agent that supports ACP** (opencode `opencode acp`, goose `goose acp`); claude/pi/smallcode stay on PTY fallback. The protocol pattern that justifies the full BooCoder matrix: ACP clients auto-forward their MCP `context_servers` to the dispatched agent (per goose docs) — one MCP config surface drives every dispatched agent. BooCoder MCP-server role exposes `boocoder.create_task`, `boocoder.list_pending_changes`, `boocoder.apply`, etc. so external opencode-in-Termius sessions become BooCoder-aware without going through BooCoder's UI. BooCoder ACP-agent role (`boocoder acp`) lets Zed/JetBrains/Avante.nvim drive BooCoder as their agent — outbound exposure, lowest priority of the four roles. **Reference materials**: anthropics `mcp-builder` skill (4-phase build workflow + 10-question eval framework), opencode MCP/ACP docs as JSON-schema reference, goose ACP docs for the `context_servers` auto-forward pattern, `agentclientprotocol.com` spec — but note remote ACP (HTTP/WS) is still WIP, BooCoder's ACP client must use stdio for v1.
+- **BooCode monorepo locked as 3-app structure (2026-05-22).** Same `/opt/boocode/` repo: `apps/chat/` (read-only, currently live at 9500), `apps/coder/` (write tools + external CLI dispatch, 9502, v2.0 planned), `apps/booterm/` (PTY terminal, **already live at 9501 since May 2026**, Node 20 Alpine + node-pty + tmux + xterm.js, tmux session per pane, SSH-out enabled). Shared Fastify backend in `apps/server`, shared React shell in `apps/web` hosting the three surfaces as tabs. BooTerm already shares `boocode_db`, which confirms the cross-surface DB sharing pattern works.
+- **Single shared database, rename `boocode_db` → `boochat_db` when BooCoder lands (2026-05-22).** All three surfaces in one Postgres. Enables cross-surface joins (coder task → originating chat conversation → term debugging session).
+- **Mount strategy: blanket `/opt:rw`, policy enforcement at the write-tool layer (2026-05-22).** Per-project scoping is enforced in logic, not by the mount. Path-guard correctness becomes the highest-priority test target for v2.0: fuzz it, property-test it, and cover every traversal-attack pattern.
+- **External CLI agents (`opencode` / `claude` / `goose` / `pi`) live on the host, not in containers (2026-05-22).** BooCoder shells out via local-exec PTY (`node-pty`, host shell). A host install inherits Sam's existing `~/.opencode/`, `~/.claude/`, `~/.config/goose/` configs without re-mounting. Containerize later only if a concrete reason emerges.
+- **STRATEGIC PIVOT (2026-05-22): Build a Paseo-equivalent dispatcher inside BooCode. Lift patterns, not code.** Sam wants BooCode to function like Paseo without using Paseo itself. **Paseo (getpaseo/paseo) is AGPL-3.0** — incompatible with BooCode's MIT license and its network-served deployment at `code.indifferentketchup.com`. Vendoring Paseo code would force BooCode to become AGPL. Solution: **reproduce the architecture in BooCode's existing Fastify + TS + PostgreSQL + React stack, using only license-clean patterns**. Full target architecture documented in the new "Paseo-equivalent dispatcher inside BooCode" section at the top of this document. **Primary architectural template: `Dominic789654/agent-hub` (#48)** — Apache-2.0, license-clean, captures the exact three-process model (board server + dispatcher + assistant terminal) and the schema (tasks/projects/templates/pipelines/human_inbox) BooCode should reproduce. **Critical context-management primitive: Roo Code Boomerang Tasks pattern (#46)** — orchestrator with intentional capability restriction, down-pass/up-pass context discipline, no implicit inheritance. **Observation pattern: Claude Code hooks** (siropkin/budi #51 reference) — register BooCode as the hook receiver to get real-time visibility without wrapping the agent. **Phasing:** Phase 1 single-agent PTY dispatch → Phase 2 PostgreSQL queue + worker → Phase 3 Boomerang `new_task` tool → Phase 4 multi-agent + worktrees + CLI → Phase 5 pipelines + dashboard → Phase 6 handoff/loop/orchestrator skills. **This is now the dominant roadmap direction**, ahead of v1.12.x debugger fixes (queued) and v1.13/v1.14 batch work (deferred until Paseo-equivalent Phases 1–2 are scoped).
+- **BooCoder agent layer: both Option A AND Option B, full-featured (2026-05-22).** Earlier May 18 chat recommended Option A (thin orchestration shell over OpenCode) as the path forward but explicitly called the choice not-locked. Sam's call this session: ship **both** paths in the same BooCoder surface. **Option B / in-process loop** handles interactive write work with native tools + pending-changes UI (v2.0 plandex pattern still applies). **Option A / PTY dispatch** handles parallel/batch work where Sam wants to A/B opencode vs claude vs goose vs pi against the same task in separate worktrees. User picks per task. This supersedes the May 18 "reframe Batch 14 as OpenCode orchestration UI" recommendation — both paths now coexist.
+- **Paseo (getpaseo/paseo) is the reference design, not a catalog code lift (2026-05-22).** AGPL-3.0 + 4k stars + 6-package TypeScript monorepo (server / app / cli / desktop / relay / website). The architecture is the lift: daemon + clients split, child-process agent orchestration, WebSocket protocol, `paseo run/ls/attach/send` CLI shape, `--worktree feature-x` flag, `/paseo-handoff` / `/paseo-loop` / `/paseo-orchestrator` skills. **Do not vendor code.** Read the README and the `skills/` directory's three skill files for design reference; reimplement in BooCode's MIT stack. The skills' shape (named `/handoff`, `/loop`, `/orchestrator` operations) is non-copyrightable; lift the shape.
 - **Embeddings dropped from BooCode** (May 2026). Replaced RAG with file-view tools + sidecar analyzers.
- **opencode promoted to Tier A** (2026-05-20). The compaction port (v1.11.0) made it clear opencode is not just "the agent Sam uses" — it's the canonical reference implementation for everything BooCode is rebuilding through v1.15. Five algorithms identified for lift (compaction, doom-loop, repairToolCall, runLoop, permission evaluate) plus truncate.ts and MCP client.
- **Source is `sst/opencode` `dev` branch.** `anomalyco/opencode` is a rebranded mirror; do not source from there.
+- **opencode promoted to Tier A** (2026-05-20). The compaction port (v1.11.0) made it clear opencode is not just "the agent Sam uses" — it's the canonical reference implementation for everything BooCode is rebuilding through v1.15. Five algorithms identified for lift (compaction, doom-loop, repairToolCall, runLoop, permission evaluate) plus truncate.ts and MCP client. **Update 2026-05-22:** truncate.ts shipped v1.13.5; doom-loop, repairToolCall, compaction, prune all shipped; runLoop + permission still queued for v1.14/v1.15.
+- **OpenCode canonical repo is `anomalyco/opencode`, NOT `sst/opencode` (corrected 2026-05-22).** Sam confirmed: the prior catalog entry's "anomalyco is a rebranded mirror, use sst as canonical" was inverted. Development moved to anomalyco; sst/opencode is the predecessor lineage. `anomalyco/opencode` `dev` branch is now the active source for every algorithm lift through v1.15. All 15 catalog references rewritten in this session.
 - **Original Batch 11 (aider PageRank port) replaced** by codecontext sidecar approach.
 - **Original Batch 12 (codebase indexer w/ Harrier) removed.** No embedding infrastructure.
 - **Original Batch 13 (OpenHands event log) replaced** by v1.13 parts table (opencode pattern). Same outcome, different shape.
@@ -239,6 +430,38 @@ BooCode is personal/single-user — license compatibility is non-blocking, but t
 - **Aider's `repomap.py` port dropped.** Codecontext supersedes it. Aider contribution narrows to the `.scm` query files only.
 - **Globstar role re-scoped.** Not an architect tool — parked for future verify-before-commit gate.
 - **codeprysm role re-scoped.** Taxonomy reference only. Embedding half rejected.
- **AI SDK adoption deferred to v1.13.** Hand-roll opencode's repairToolCall pattern in v1.12 first.
+- **AI SDK adoption deferred to v1.13.** Hand-roll opencode's repairToolCall pattern in v1.12 first. **Update 2026-05-22:** v1.12 deferred the repairToolCall hand-roll entirely; both AI SDK v6 adoption AND repairToolCall shipped together in v1.13.1-A/v1.13.3 — cleaner outcome than the two-step plan.
 - **`tool_choice='required'` confirmed supported** by llama-swap (qwen3.6-35b-a3b-mxfp4, 2026-05-20). Repair tool call is viable.
- **`anomalyco/sst` is a mirror, not a fork.** Same applies to `anomalyco/opencode`. Use canonical `sst/sst` and `sst/opencode` sources.
+- **`anomalyco/opencode` confirmed canonical (2026-05-22).** Earlier confusion about whether `sst/opencode` or `anomalyco/opencode` was the active fork is resolved: anomalyco is where active development continues. Use `anomalyco/opencode` for all algorithm lifts.
+- **Reviewed 2026-05-22 (cline, kilocode, prompt-tower, auggie, augment-agent, augment-swebench-agent, codecompanion.nvim, junie, cody-public-snapshot, qodo-ai/{agents,qodo-cover,open-aware,qodo-skills}).** Four real lifts emerged:
+  - **Qodo `agent.toml` schema** (`output_schema`, `exit_expression`, `execution_strategy`) → land in AGENTS.md at v1.14+.
+  - **qodo-cover record/replay LLM harness** → re-implement (don't vendor — AGPL) as a Vitest fixture plugin at v1.13+.
+  - **augment-swebench-agent ensembler** → optional v2.0+ verify-gate layer above pending-changes (plandex pattern).
+  - **qodo-skills PR-resolver state machine** → BooCoder v2.0+ tool.
+- **ACP added to v2.x watch list.** Zed's Agent Client Protocol is the analog of MCP for client↔agent. Not in any current batch; revisit only if BooCode wants to expose itself to Zed/Neovim/Termius beyond the web UI. **Reference implementations bracket the protocol:** codecompanion.nvim (#28) is the client side, zed-industries/codex-acp (#31) is the server-side adapter. The codex-acp README confirms ACP's full feature surface (context @-mentions, images, permission-gated tool calls, edit review, TODO lists, slash commands, client MCP servers) matches features BooCode already has internally — adopting ACP would be transport translation, not feature build.
+- **kilocode and Cline skipped as code sources** (entry #20). Orchestrator/sub-agent pattern is already covered by cline (#7) and agent-framework (#15).
+- **Junie skipped permanently.** No usable source.
+- **Cody parked.** Multi-repo context fetcher is the only interesting piece; overkill for single-repo BooCode.
+- **prompt-tower skipped.** AGPL VS Code extension; nothing novel that continue's ignore lift + universal XML wrapping doesn't already cover.
+- **tiktoken-rs and calloop rejected (2026-05-22).** Both are Rust and Zed-stack-specific. tiktoken-rs additionally fails the model check — Qwen/Gemma/Nemotron don't use OpenAI's BPE encodings, so token counts would be wrong by 10–30%. **Source of truth for token counts on llama-swap models is `POST /tokenize` on llama-server**; no client-side tokenizer library needed. Do not re-evaluate either repo.
+- **taste-skill accepted as Tier B vendor candidate (2026-05-22).** MIT, SKILL.md format already matches BooCode v1.12 standard, 18k+ stars, framework-agnostic. Two real wins: the 100+ anti-slop ban list (specific font/color/layout failure modes LLMs default to) and the 3-dial parameterization pattern (reusable beyond design). **Gated on a diff against the existing `frontend-design` SKILL** to avoid duplication before vendoring. Real value lands with BooCoder v2.0+ when write tools generate frontend code for Sam's projects (DubDrive, BooLab, Fathom, etc.).
+- **costrict skipped, OpenSpec accepted (2026-05-22).** costrict is Apache-2.0 but the top contributors are Roo Code maintainers and the codebase has `.roo/`/`.rooignore`/`.roomodes` — same Cline-lineage VS Code extension shape BooCode rejected at kilocode (#20). The novel surface costrict ships is its **OpenSpec integration**, which is a separate repo. **OpenSpec is the real find:** it formalizes the spec-governed dispatch workflow Sam already uses (per-change folder with proposal/specs/design/tasks artifacts, slash commands per agent, artifact-lifecycle gates). Start by adopting just the directory structure for BooCode's own batch docs (zero-dep documentation reformat); evaluate full CLI adoption later. **Tracked for v1.13.x or v1.14**, not blocking v1.12.0.
+- **agents.md noted but not evaluated.** costrict's README acknowledges `agentsmd/agents.md` as a partner. The name and shape strongly suggest it's the canonical source of the AGENTS.md convention BooCode v1.12 already adopted. Worth a future drive-by to confirm, but not blocking anything.
+- **zeroshot accepted as Tier B pattern reference (2026-05-22).** MIT, multi-agent orchestration above coding-agent CLIs (Claude Code, Codex, OpenCode, Gemini CLI). **Sits at Paseo's layer, not BooCode's.** Five pattern lifts: complexity-classification conductor, blind-validation invariant (separate agent context verifies — doesn't see worker's history), crash-safe SQLite ledger, three-tier isolation taxonomy (none/worktree/Docker), JSON cluster templates. **The blind-validation invariant is the single most important architectural idea** zeroshot adds — fills the missing piece in plandex/OpenHands/cline patterns where the same agent writes and judges its own work. Lands at BooCode v1.13/v1.14 as an AGENTS.md design principle, then at v2.0+ as a real verify gate above pending-changes. **Separately:** zeroshot is a candidate Paseo-successor if Paseo ever needs replacement; that's a Paseo-scope decision, not BooCode-scope.
+- **toprank rejected (2026-05-22).** SEO/SEM domain — wrong category for BooCode. Sam runs developer infrastructure, not marketing sites. Skill format is the same one BooCode v1.12 already uses; no novel pattern.
+- **AgentLint accepted as high-value immediate-application reference (2026-05-22).** MIT, 31 evidence-backed repo-quality checks. Most useful catalog entry for *the present moment* — applies directly to every CLAUDE.md/AGENTS.md across Sam's homelab (BooCode, BooLab, HLH, indifferent-broccoli, paseo, etc.) without needing any code lift or version dependency. Specific data points from 265 versions of Anthropic's Claude Code system prompt are immediately actionable: trim emphasis-keyword density, target 60–120 line CLAUDE.md sweet spot, SHA-pin Actions, ensure `.env`/`CLAUDE.local.md` are gitignored. **Recommend a single audit pass session against BooCode's instruction files** before any further skill work lands. Optional plugin install for ongoing audits is a v1.12.x post-merge call.
+- **awesome-vibe-coding surveyed (2026-05-22).** 60+ tools across 10 sections. **No new catalog entries promoted from the list.** Already-covered items: Cline, Roo Code, Continue, Prompt Tower, Augment, aider, Codex CLI, Gemini CLI. Skipped on category: 18 Web Builders, 4 Editor/IDEs, mobile/desktop builders. **Real leads tracked for next review pass:** `block/goose` (multi-model local agent framework), `eyaltoledano/claude-task-master` (task decomposition algorithm), `ai-christianson/RA.Aid` and the underlying `LangGraph` framework (workflow graphs in production), `automata/aicodeguide` (AI-first methodology). Do not re-evaluate the rejected items.
+- **aaif-goose/goose (formerly block/goose) added as Tier B reference (2026-05-22).** Apache-2.0, 45.2k stars, recently moved to Linux Foundation's Agentic AI Foundation. Rust + TypeScript. Native ACP, 15+ providers including Ollama, MCP support for 70+ extensions. **Sits at Paseo's layer, not BooCode's.** Skip code (wrong stack); track as reference for ACP-protocol implementations and the multi-provider abstraction pattern. Ships `.claude/`, `.codex/`, `.cursor/` skill directories — confirms the cross-agent skill-emission pattern noticed in autohandai/code-cli (#33).
+- **memovai/memov accepted as Tier B reference (2026-05-22).** MIT, Python. Shadow `.mem` timeline tracks prompts + context + plan + file changes at every agent interaction; zero pollution to `.git`. MCP-exposed. `validate_commit` MCP tool detects context drift between prompt and actual changes. **Direct match for BooCode's reviewer-architect pattern.** Lift the MCP-tool shape (`snap`, `mem_history`, `mem_jump`, `validate_commit`) as reference for v1.13+ `view_session_history` feature and v2.0+ verify gate. Don't vendor Python code into Fastify/TS BooCode.
+- **bhouston/mycoder rejected (2026-05-22).** MIT, TypeScript, 566 stars, **stale** (last release Mar 2025). Standard CLI coding agent — Claude/OpenAI/Ollama, MCP, parallel sub-agents. Functionally a less-mature opencode. Sam already uses opencode for this role. One UX pattern noted (Ctrl+M mid-stream corrections) but BooCode/opencode/Claude Code all have chat-based interruption. Skip.
+- **ai-christianson/RA.Aid accepted as Tier B pattern reference (2026-05-22).** Apache-2.0, Python, 2.2k stars. **Three-stage architecture (Research / Planning / Implementation) on LangGraph** with per-stage model routing (`--research-provider`, `--planner-provider`, `--expert-provider`) + "expert tool" called only when needed for hard reasoning. **Aligns directly with Sam's qwopus27b/qwen3-coder routing.** Lift the three-stage AGENTS.md design and expert-tool escape hatch at v1.14+; don't lift LangGraph (wrong stack); never enable `--cowboy-mode` equivalent (opposite of BooCode's no-autonomous-commit rule).
+- **Kirill89/reviewcerberus rejected as code, CoV logged as pattern (2026-05-22).** Closed-source Docker distribution (license not in registry). Multi-provider (Bedrock/Anthropic/Ollama/Moonshot), accepts `guidelines.md`, **Chain-of-Verification mode** to reduce false positives. CoV is the only takeaway — per-finding verification primitive, complementary to zeroshot's blind-validation (per-workflow #37) and bernstein's lineage chains (per-artifact #49). Stackable.
+- **autohandai/code-cli rejected (2026-05-22).** 56 stars, COMMERCIAL.md present (commercial license restriction likely). Standard ReAct CLI agent with no novel pattern vs opencode. Cross-agent skill emission (copies skills between `~/.claude/skills/`, `~/.codex/skills/`, `~/.autohand/skills/`) is the only interesting bit — same pattern goose (#41) does. Skip.
+- **Roo Code Boomerang Tasks accepted as Tier B pattern reference (2026-05-22, Sam-flagged).** Roo Code itself rejected (already covered via #20 kilocode and #35 costrict — VS Code/Cline lineage). Three architectural patterns lifted: **(1) Orchestrator with intentional capability restriction** — cannot read/write/MCP/shell, only delegates, preventing context poisoning. **(2) Down-pass/up-pass context discipline** — no implicit inheritance, parent passes context down via `new_task` message, subtask passes summary up via `attempt_completion` result only. **(3) Explicit precedence override clause** baked into subtask prompts. Together these sharpen zeroshot's blind-validation (#37) into a both-directions principle. Lands at v1.13/v1.14 as AGENTS.md design, v2.0+ as real delegation mechanics.
+- **eyaltoledano/claude-task-master pattern accepted, code rejected (2026-05-22).** **MIT + Commons Clause** makes BooCode (self-hosted developer chat) a competing product — no code vendoring. 25.7k stars, JS/TS. Three patterns worth lifting independently in BooCode's own MIT code: **(1) Tiered tool-loading via env var** (`TASK_MASTER_TOOLS=core|standard|all|custom`, 7/15/36 tools, ~5k/10k/21k tokens) — direct fit for `BOOCODE_TOOLS` at v1.12.x or v1.13. **(2) Three model roles** (main/research/fallback) — same pattern as RA.Aid (#44), complementary evidence. **(3) PRD-as-source-of-truth** at `.taskmaster/docs/prd.txt` formalizes Sam's spec-governed work convention.
+- **Dominic789654/agent-hub tracked, not lifted (2026-05-22).** Apache-2.0, Python 100% stdlib-only (no FastAPI/SQLAlchemy/Pydantic — zero supply chain surface), 1 star, v0.1.0 March 2026. Local-first multitask board for routing/observing code-assistant work across repos. SQLite queueing, dependency-aware dispatch, **human inbox**, dashboard at `/app`. **Architecturally what Paseo wants to grow into.** Too early to vendor; track for next pass. The stdlib-only constraint is a useful lens to evaluate BooCode/BooLab dependency footprint.
+- **sipyourdrink-ltd/bernstein tracked as compliance-grade reference (2026-05-22).** License needs verification before any lift (`/LICENSE` should be checked directly). 262 stars, Python. Same layer as zeroshot (#37) and agent-hub (#48), but with **audit-grade compliance** as differentiator: HMAC-chained audit log, signed agent cards (Ed25519/EdDSA + JCS), per-artifact lineage (producer + inputs + prompt SHA + model + cost), customer-key signing for DORA/NIS2/EU AI Act Article 12, air-gap deploy, deterministic scheduler, one git worktree per agent, cost-aware routing bandit. **Over-spec for Sam's current homelab work** but the right shape if BooCode ever needs to produce audit evidence. The **HMAC-chained audit log** is a small lift-friendly pattern even today.
+- **vorn-run/vorn rejected as code, pattern noted (2026-05-22).** MIT, Electron + TypeScript, 24 stars, alpha. Multi-agent grid UI for Claude Code/Copilot/Codex/OpenCode/Gemini. Each agent in its own PTY. Task queue + kanban + workflow automation + headless execution + inline diff review with structured-feedback-back-to-agent + worktree isolation + MCP server. **Wrong stack** (Electron desktop UI vs BooCode's Fastify/TS+React SPA). Pattern note: **PTY-per-agent + worktree-per-task + inline-diff-feedback-loop** is the canonical shape for multi-agent orchestration above real CLI agents; same architectural choice Paseo made.
+- **siropkin/budi accepted as tooling, not catalog entry (2026-05-22).** MIT, Rust, single 6MB binary, sub-millisecond hook latency. **WakaTime for Claude Code** — tracks tokens, costs, prompts, file activity, sub-agent spawns in local SQLite, dashboard at `localhost:7878/dashboard`. **Recommend immediate install** (`budi init --global`) for Claude Code session observability. The **5-hook Claude Code event taxonomy** (`SessionStart`, `UserPromptSubmit`, `PostToolUse`, `SubagentStart`, `Stop`) is the canonical reference and worth knowing when BooCode v2.0+ designs its own hook system.
+- **GeiserX/LynxPrompt tracked as architectural reference, code off-limits (2026-05-22).** **GPL-3.0 makes vendoring incompatible with BooCode's MIT licensing.** 27 stars, Next.js + PostgreSQL + Prisma. Self-hostable platform for managing AGENTS.md / CLAUDE.md / .cursor/rules / slash commands across **30+ AI assistant formats**. Single blueprint, export to N formats. Federated marketplace. The concept fits Sam's situation (5+ project CLAUDE.md/AGENTS.md files maintained separately) but the **manual AgentLint (#39) audit pass is the right ROI today** rather than adopting a full platform. If consolidation is ever needed, reimplement the format-adapter pattern in MIT-licensed BooCode code, don't vendor.
+- **ShipWithAI/claude-code-mastery noted as docs reference (2026-05-22).** **CC BY-NC-SA 4.0** content + MIT code examples. 9 stars. Free 16-phase / 55-module / 136-lesson course on Claude Code workflows. **Two structural patterns worth borrowing:** (1) **7-block module structure** (WHY → CONCEPT → DEMO → PRACTICE → CHEAT SHEET → PITFALLS → REAL CASE) as a docs template; (2) **phase list as coverage checklist** to diff against Sam's own CLAUDE.md/AGENTS.md files — combine with AgentLint (#39) for a single audit pass. Don't redistribute content (NC license).
--- a/boocode_roadmap.md
+++ b/boocode_roadmap.md
@@ -1,172 +1,181 @@
 # BooCode v1.x — Roadmap

-Last updated: 2026-05-20
+Last updated: 2026-05-22
+
+> **Companion doc:** `boocode_code_review.md` holds the full external-repo inventory, lift rationale, and license analysis. This document is the canonical source for shipping state, version ordering, and what's planned vs. shipped.

 ## Overview

-BooCode is a standalone code-chat tool at `/opt/boocode/`. Read-only by design — pick a project, chat with a local LLM that has file-inspection tools, get streaming responses over WebSocket.
+BooCode is a **3-app monorepo** at `/opt/boocode/` (locked 2026-05-22):

-Live at `https://code.indifferentketchup.com` (Caddy → Authelia → Tailscale → `100.114.205.53:9500`).
+- **BooChat** (`apps/chat`, port `9500`, `code.indifferentketchup.com`) — read-only chat with file-inspection tools. The live thing. Pick a project, chat with a local LLM, get streaming responses over WebSocket. Will rename `boocode_db` → `boochat_db` when BooCoder lands.
+- **BooCoder** (`apps/coder`, port `9502`, `coder.indifferentketchup.com`) — write tools + external-CLI dispatch. **Planned, v2.0.** Both an in-process inference loop (with `pending_changes` table) AND ACP-dispatched external agents (opencode/goose) with PTY fallback (claude/pi/smallcode) — same surface, two execution paths.
+- **BooTerm** (`apps/booterm`, port `9501`) — PTY/tmux/xterm.js. **Live since May 2026.** Node 20 Alpine + node-pty + tmux + xterm.js. Tmux session per pane (`bc-<uuid>`), SSH-out works (openssh-client + gosu in the image). `/api/term/health` shares the existing `boocode_db`.
+
+Caddy → Authelia → Tailscale → `100.114.205.53` → 9500/9501/9502. Three apps, **one shared Postgres** (`boocode_db` → `boochat_db`).

 **Architectural commitments:**

- No embeddings. The model uses file-view tools (`view_file`, `list_dir`, `grep`, `find_files`) + sidecar analyzers (codecontext, codesight). Walked away from the RAG pipeline May 2026.
- Read-only in v1.x. Write tools land in BooCoder (separate container, post-v1.x).
- One Postgres (`boocode_db`), one frontend SPA, container-per-service for new capabilities.
+- **No embeddings.** Model uses file-view tools (`view_file`, `list_dir`, `grep`, `find_files`) + sidecar analyzers (codecontext, future codesight) + codecontext MCP tools. Walked away from the RAG pipeline May 2026.
+- **BooChat is read-only** through v1.x. Write tools land in BooCoder at v2.0.
+- **Mount strategy: blanket `/opt:rw`, permission gating at the write-tool layer.** Per-project scoping is policy, not mount. Path-guard correctness is the #1 test target for v2.0.
+- **External CLI agents (`opencode`/`claude`/`goose`/`pi`) live on the host, not in containers.** BooCoder shells out via local-exec PTY or ACP subprocess. Host install inherits Sam's existing `~/.opencode/`, `~/.claude/`, `~/.config/goose/` configs.
+- **Protocol roles locked (2026-05-22):** **BooChat = MCP client only** (read-only tool consumer, never enables write-capable MCP servers). **BooCoder = MCP client + MCP server + ACP client (host) + ACP agent (driveable)** — full matrix. BooCoder's ACP-client role replaces raw-PTY dispatch for ACP-capable agents (opencode `opencode acp`, goose `goose acp`); PTY fallback retained for claude/pi/smallcode.
+- **Strategic target: Paseo-equivalent dispatcher inside BooCode** (2026-05-22 pivot). Paseo (`getpaseo/paseo`) is AGPL-3.0 — incompatible with BooCode's MIT license and network-served deployment. Reproduce the architecture using only license-clean patterns. Primary architectural template: `Dominic789654/agent-hub` (Apache-2.0). Critical context-management primitive: Roo Code Boomerang Tasks pattern. Observation pattern: Claude Code hooks (siropkin/budi reference).

 External code lifted from / referenced in: see `boocode_code_review.md` for full inventory.

 -----

-## Shipped (status as of 2026-05-20)
+## Shipped (status as of 2026-05-22)

-| Version | Theme | Notes |
-|---|---|---|
-| v1.0 | Initial scaffold | live |
-| Batches 1–4.4 | Markdown, sidebar, panes, chats-inside-sessions, archive, fork/delete, header polish, settings drawer | merged |
-| v1.5 | resolveProjectPath, BOOTSTRAP_ROOT, vitest pin | merged |
-| v1.6, v1.6.1, v1.6.2 | Mobile pass + RightRail mobile drawer | merged |
-| v1.7 | Drag-drop file + paste-as-attachment | merged |
-| v1.8, v1.8.1, v1.8.2 | Settings drawer, git_status tool, WS reconnect, **per-turn budget reset + Continue affordance + CapHitSentinel** | merged |
-| v1.9.1 | Skills system (`/opt/skills/` + `skill_find`/`skill_use`/`skill_resource` tools + `/skill` slash command) | merged |
-| v1.9.7 | `ask_user_input` elicitation tool | merged |
-| **Batch 9 (Agents Tier 2)** | `AGENTS.md` + 6 builtin agents + AgentPicker in ChatInput toolbar + `sessions.agent_id` | **merged in `92bd3b1`**, included in v1.9.1/v1.9.7/v1.10.x tags |
-| v1.10.0 | BooTerm: separate container, xterm.js + node-pty + tmux | merged |
-| v1.10.1 | BooTerm-user (spawn as samkintop, login bash, Claude Code/opencode PATH) | merged |
-| v1.10.4, v1.10.5 | Mobile terminal + XML tool-call fallback parser | merged |
-| **v1.11.0** | **opencode-style compaction port** (auto-overflow, anchored summary, tail preservation) | merged |
-| v1.11.1 | Compaction follow-up (working indicator during compaction, unit tests, .bak cleanup) | merged |
-| v1.11.2 | ContextBar (persistent context-usage indicator) | merged |
-| v1.11.3 | `ctx_max` capture via `/upstream/<model>/props` (replaces dead `timings.n_ctx` read) | merged |
+|Version|Theme|Tag|
+|---|---|---|
+|v1.0|Initial scaffold|—|
+|Batches 1–4.4|Markdown, sidebar, panes, chats-inside-sessions, archive, fork/delete, header polish, settings drawer|—|
+|v1.5|resolveProjectPath, BOOTSTRAP_ROOT, vitest pin|—|
+|v1.6, v1.6.1, v1.6.2|Mobile pass + RightRail mobile drawer|—|
+|v1.7|Drag-drop file + paste-as-attachment|—|
+|v1.8, v1.8.1, v1.8.2|Settings drawer, git_status tool, WS reconnect, per-turn budget reset + Continue affordance + CapHitSentinel|—|
+|v1.9.1|Skills system (`/opt/skills/` + `skill_find` / `skill_use` / `skill_resource` + `/skill` slash command)|`v1.9.1`|
+|v1.9.7|`ask_user_input` elicitation tool|`v1.9.7`|
+|Batch 9 (Agents Tier 2)|`AGENTS.md` + 6 builtin agents + AgentPicker in ChatInput toolbar + `sessions.agent_id`|folded into `v1.9.1`/`v1.9.7`|
+|v1.10.0|BooTerm: separate container, xterm.js + node-pty + tmux|`v1.10.0`|
+|v1.10.1|BooTerm-user (spawn as samkintop, login bash, Claude Code/opencode PATH)|`v1.10.1`|
+|v1.10.4, v1.10.5|Mobile terminal + XML tool-call fallback parser|—|
+|v1.11.0|opencode-style compaction port (auto-overflow, anchored summary, tail preservation)|—|
+|v1.11.1|Compaction follow-up (working indicator during compaction, unit tests, .bak cleanup)|—|
+|v1.11.2|ContextBar (persistent context-usage indicator above MessageList)|—|
+|v1.11.3|`ctx_max` capture via `/upstream/<model>/props` (replaces dead `timings.n_ctx` read)|`v1.11.3`|
+|v1.11.5                |ContextBar inline next to agent picker; remove ChatContextPopover; default new sessions to no agent                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |—                            |
+|v1.11.6                |Doom-loop guard from opencode (3 identical tool calls → sentinel, abort recursion)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |—                            |
+|v1.11.7                |pathGuard secrets filter (continue.dev `DEFAULT_SECURITY_IGNORE_FILETYPES`)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |—                            |
+|v1.11.8                |web_search + web_fetch tools via SearXNG                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |—                            |
+|v1.11.9                |Manual redirect handling — re-run URL guard on each hop (SSRF hardening)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |—                            |
+|v1.11.10               |Stream-cap response body at 5MB, abort on overflow                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |`v1.11.x`                    |
+|v1.12.0                |codecontext sidecar (Go HTTP shim, NDJSON MCP framing, child.Wait supervisor) + container guidance (BOOCHAT.md/BOOCODER.md) + 7 vendored skills + system-prompt.ts extraction + mtime-watch cache + 8 codecontext tool wrappers + per-agent tool whitelists + .codecontextignore template + agents.ts ALL_TOOL_NAMES single-source-of-truth fix                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |`v1.12.0`                    |
+|v1.12.1                |Server-side workspace pane sync (`sessions.workspace_panes jsonb`) + 5-state status indicator overhaul (streaming/tool_running/waiting_for_input/idle/error) + startup hung-row sweep + stale `messages_status_check` constraint dropped + `detectSameNameLoop` reverted (dead code) + stop-handler writes `cancelled` status                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |`v1.12.1`                    |
+|v1.12.2                |Live tok/s + ctx_used display next to status indicator while streaming (frontend-only)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |`v1.12.2`                    |
+|v1.12.3                |Stale-stream banner — "Previous response didn't complete. [Retry] [Discard]" when streaming row > ~60s with no new tokens. `POST /api/chats/:id/discard_stale` backend endpoint                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |`v1.12.3`                    |
+|v1.12.4                |Refactor only — `inference.ts` (1700 LoC) split into `inference/` directory: `turn.ts`, `stream-phase.ts`, `tool-phase.ts`, `error-handler.ts`, `sentinel-summaries.ts`, `payload.ts`, `xml-parser.ts`, `sentinels.ts`, `budget.ts`, `types.ts`, `index.ts`. Shipped as rc1/rc2/rc3 → final. No behavior change. Lined up `stream-phase.ts` as the swap target for v1.13 AI SDK migration                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |`v1.12.4`                    |
+|**v1.13.0**            |**`message_parts` table** `(id, message_id, sequence, kind, payload jsonb, created_at)` with kinds `text/tool_call/tool_result/reasoning/step_start`. CHECK constraint, `(message_id, sequence)` unique + index. Dual-write at every site that wrote `tool_calls`/`tool_results` JSON (stream-phase finalize, skills × 2, messages.ts answer flow, chats.ts × 2). `ToolDef<T>` gained `category: 'read_only' \| 'write'`. v1.x registry rejects write. Old JSON columns remain authoritative for reads. Strangler-fig phase 1 |`v1.13.0`                    |
+|**v1.13.1-A**          |**AI SDK v6 install + streamCompletion adapter.** `ai@^6`, `@ai-sdk/openai-compatible@^2`. `provider.ts` wraps `createOpenAICompatible` against `config.LLAMA_SWAP_URL`. `streamCompletion` rewritten as adapter over `streamText`. XML fallback parser preserved for qwen3.6's inline `<tool_call>` emissions. **Patched mid-flight:** AI SDK v6 swallows abort signals silently — explicit `if (signal?.aborted) throw` after stream drain. Without it, stop button writes `complete` instead of `cancelled`. reasoning-delta counted + dropped (re-captured in -C). Known regression flagged: live mid-stream tps gone (single trailing publish; TODO for delta-cadence interpolation against `result.usage`)                                                                                                                                                                                                                                |(umbrella tag)               |
+|**v1.13.1-B**          |**`messages_with_parts` view** with COALESCE fallbacks against legacy JSON columns. Read sites switched: `chats.ts:427`, `messages.ts:95`, `ws.ts:27`, `payload.ts`, `compaction.ts`. Perf verified at 1ms for 42-message chat. `reasoning_parts` column added to the view (consumed in -C). API contract preserved. Parts become source of truth at read; JSON columns kept by dual-write only                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |(umbrella tag)               |
+|**v1.13.1-C**          |**`ask_user_input` correlation ported to parts.** `messages.ts:478/549` now JOINs `message_parts` on `payload->>'id'` and `payload->>'tool_call_id'`. Downstream call sites updated to `{message_id, payload}` shape. 404 fallback for pre-v1.13.0 history (acceptable scope). **Reasoning end-to-end:** `reasoning-delta` accumulated in `stream-phase.ts` adapter via `StreamResult.reasoning` (simpler than the brief's StreamPhaseState approach); `partsFromAssistantMessage` accepts optional `reasoning`, emits at seq 0; `finalizeCompletion` + `executeToolPhase` dual-write reasoning parts; `payload.ts` reads `reasoning_parts` from view, collapses into `OpenAiMessage.reasoning`; `toModelMessages` emits AI SDK `ReasoningPart` in assistant content array. Smoke: 361 chars reasoning at seq 0, 429 chars text at seq 1                                                                                                        |`v1.13.1` (`ac1a71f`)        |
+|**v1.13.3**            |**Cleanup bundle, 4 independent items.** (1) `ALTER DATABASE boocode SET statement_timeout = '30s'` — caps damage from query-plan regression on the view's nested subselects; documented in `schema.sql` since `ALTER DATABASE` can't run inside a DO block. (2) Alpha-sorted tool registry — `.sort((a, b) => a.name.localeCompare(b.name))` at `ALL_TOOLS` export; llama.cpp prompt cache hits on byte-identical prefixes, tool-order drift killed hit rate every turn. (3) Periodic 60s in-process sweeper marks `streaming` rows older than 5 min as `failed` and publishes `chat_status='idle'` so the UI dot drops — closes mid-session crash UX gap that the startup sweep (v1.12.1) only handled at boot. (4) `experimental_repairToolCall` wired through AI SDK v6 `streamText` — routes malformed tool calls to a logged passthrough instead of crashing the stream. Owed since v1.13.1-A. 173/173 tests pass (+1 alpha-ordering test)|`v1.13.3` (`a08d809`)        |
+|**v1.13.4**            |**Two-tier compaction prune.** `services/inference/prune.ts` with pure `selectPruneTargets` decision helper. Tier 1 hides stale `tool_result` parts via `message_parts.hidden_at` at the 20k-freed threshold (cheap, no inference call); tier 2 falls back to anchored summarize when prune alone isn't enough. Schema additions: `message_parts.hidden_at` column + partial index `ON (message_id) WHERE hidden_at IS NULL`. `messages_with_parts` view filters hidden parts so payload assembly never sees them. Avoids burning an inference round on every overflow. opencode-pattern half-shipped in v1.11.0 — this closes it.                                                                                                                                                                                                                                                                                                              |`v1.13.4` (`ec8593c`)        |
+|**v1.13.5**            |**opencode `truncate.ts` port — full tool output retrievable via opaque id.** New `services/truncate.ts` with `tr_<12 base32>` ids on tmpfs (`/tmp/boocode-truncations`, 0o700, 5MB cap matching `view_file`'s `MAX_FILE_BYTES`, 7-day TTL). Three exports: `storeTruncation`, `readTruncation`, `truncateIfNeeded` (wrap-or-passthrough helper). New `view_truncated_output(id)` tool retrieves the full content; model never sees the truncation dir (resolved server-side). Wired through 5 of 7 tool sites: `view_file`, `list_dir`, `web_fetch`, `codecontext_client`, plus alpha-sorted into `ALL_TOOLS` (count 19→20). `cleanupTruncations` piggybacks on the v1.13.3 60s sweeper (TTL pass + orphan reap via parts query on `payload->'output'->>'outputPath'`). grep and find_files deferred (need file_ops refactor to expose uncapped output). 186 tests (was 179, +7 in truncate.test.ts).                                          |`v1.13.5` (`f8fc5db`)        |
+|**v1.13.6**            |**Compaction head-assembly audit + reasoning fix.** Audit traced compaction's summary path post-v1.13.1-B read flip across three quadrants — Q1 view read (clean), Q2 parts shape (clean), Q3 reasoning render (FIX NEEDED). v1.13.1-C wired reasoning end-to-end into `inference/payload.ts` but missed the compaction read site, silently degrading summary quality for reasoning-channel models (qwen3.6) since -C shipped. Fix: `CompactionMessage` extended with `reasoning_parts` field; SELECT pulls `reasoning_parts` from `messages_with_parts`; `buildHeadPayload` (now exported for tests) prefixes assistant content with `<reasoning>...</reasoning>\n\n<content>` when reasoning is present; standalone `<reasoning>` tag for tool-call-only turns; omits tag when reasoning is null or empty. 4 new render-branch tests (190 total).                                                                                                |`v1.13.6` (`81d837c`)        |
+|**v1.13.7** (uncommitted)|**Stability bundle, 5 fixes from a production observability gap.** (1) `provider.ts`: `includeUsage: true` on `createOpenAICompatible`. `@ai-sdk/openai-compatible` defaults this to false, omitting `stream_options.include_usage` from the request body; llama-swap never emitted the usage block, so `result.usage.inputTokens/outputTokens` resolved `undefined` and `tokens_used`/`ctx_used` landed NULL in **every** assistant row since v1.13.1-A. Surfaces tokens in StatsLine + persisted DB rows going forward (no backfill). (2) `MessageList.tsx:48`: `hasText = m.content.trim().length > 0`. AI SDK v6 streaming occasionally emits a leading `\n` text-delta on tool-call-only turns; the literal newline passed the untrimmed `length > 0` check and rendered an empty bubble + ActionRow between each tool call. (3) `MessageBubble.tsx:654`: same trim on `hasContent` (defensive, no-tool-calls path). (4) `payload.ts:64`: `buildMessagesPayload` skips assistant rows with `status='failed'`, and also rows with `status='complete'` that have empty content and no tool_calls. Without this, a trailing empty/failed assistant + the next attempt's placeholder produced "Cannot have 2 or more assistant messages at the end of the list" rejections from the upstream API. (5) `budget.ts:11`: `BUDGET_NO_AGENT = 30` (was 15). No-agent mode shares the read-only-agent toolset at runtime; the cautious 15-cap was forward-looking for write tools that haven't landed. 190/190 tests still pass.|—                            |
+
+**v1.13.2 deliberately deferred** — keep the dual-write through v1.13.4–v1.13.11 as rollback insurance. Drop legacy columns last.

 -----

-## In flight / queued
+## In flight / next (v1.13.x cleanup line)

-| Version | Theme | Status |
-|---|---|---|
-| ~~v1.11.4~~ | ~~Per-turn budget + Continue affordance~~ | **CANCELLED** — already shipped in v1.8.2 |
-| **v1.11.5** | ContextBar relocate (above agent-picker row), thicker, always-visible, remove ChatContextPopover | **dispatched** |
-| v1.11.6 | Doom-loop guard from opencode (3 identical tool calls → sentinel, abort recursion) | drafted |
-| v1.11.7 | pathGuard secrets filter (continue.dev's `DEFAULT_SECURITY_IGNORE_FILETYPES`) | drafted |
-| v1.11.x | Tag consolidation point (everything since v1.11.0) | queued |
+Five more single-dispatch batches before the strangler-fig closes. Each ships independently with its own smoke and rollback surface. **Do not fold.** Order is locked:

-----
+### v1.13.8 — system-prompt prefix stability verify-and-measure (REFRAMED, 2026-05-22)

-## Major work after v1.11.x
+**Original plan:** add a `system_prompt_cache` DB table keyed by `(agent_id, project_id, skills_version)`, mtime-invalidated.

-| Version | Theme | LoC est. |
-|---|---|---|
-| **v1.12** | codecontext sidecar + tool output truncation + repair tool call (Integration 1 + 3 from May review, fused) | ~600 |
-| v1.13 | Phase B groundwork — parts table + AI SDK adoption + per-tool `read_only`/`write` tagging | ~1500 |
-| v1.14 | Phase C — outer agent loop (multi-step until non-tool finish, AGENTS.md `steps` field, reasoning as part type) | ~800 |
-| v1.15 | Phase D — permission ruleset + MCP client (lays foundation for BooCoder) | ~600 |
-| v1.16 | Batch 11b — codesight repo_health (call graph, circular deps, dead code) | ~400 |
-| **v2.0** | Batch 14 — BooCoder pending changes (new container, write tools, plandex pattern) | ~1200 |
-| v2.1 | Batch 15 — BooCoder runtime isolation (per-session Docker sandbox, OpenHands pattern) | ~600 |
-| v2.x | Batch 16/17 — Multi-provider LLM (optional, pi-ai) and Workflow graphs (far future, agent-framework concepts) | tbd |
+**Why reframed:** recon disproved the premise. `apps/server/src/services/system-prompt.ts:buildSystemPrompt` already runs over mtime-cached inputs at the file layer:

-----
+- BOOCHAT.md / BOOCODER.md cached in `system-prompt.ts:25` (`cachedGuidance`, keyed by mtime)
+- global + per-project AGENTS.md cached in `agents.ts:245` (`safeStat` pattern, 60s TTL)
+- `session.system_prompt` / `project.default_system_prompt` are DB scalars (byte-stable until edited)
+- BASE_SYSTEM_PROMPT is a hardcoded template with `${projectPath}` interpolation

-## Roadmap doc deviations and corrections
+Output assembly is a microsecond pure-string concat with no I/O. Skills aren't in the prefix (runtime discovery via `skill_find`). Tools live in a separate request body field, alpha-sorted by v1.13.3. **In theory the prefix is already byte-stable across turns; nothing has measured it.**

-This roadmap was significantly out of sync with reality until 2026-05-20. Key corrections folded in:
+**New scope — instrumentation only, no cache:**

-1. **Batch 9 (Agents Tier 2) is done**, not "next up." Shipped as commit `92bd3b1`, included in v1.9.1 forward. The original "Track A: Batch 9 next" recommendation was correct but the doc never got updated.
-2. **v1.6.2 merged.** No longer "in flight."
-3. **Batch 5 (fork/delete), Batch 6 (drag-drop), Batch 7 (settings drawer), Batch 8 (web search), Batch 10 (BooTerm) all shipped**, scattered across the v1.6–v1.10 version line. Original "Track A polish then agents" plan was abandoned; work happened opportunistically.
-4. **v1.11.0 was a major unplanned addition** — opencode-style compaction (auto-overflow detection + anchored rolling summary + tail preservation). This is NOT a batch from the old roadmap. It opened a new patch line (v1.11.x) of small follow-ups in front of the original Batches 11–17.
-5. **Batch 11 (codecontext sidecar) moves to v1.12.** Bundles with truncation and repair-tool-call lift (both from opencode) since they share concerns and the `tool_choice='required'` confirmation makes repair-tool-call viable.
-6. **Phase B (parts table + AI SDK + tool-call lifecycle) becomes v1.13.** This absorbs the old Batch 13 (append-only event log) — same outcome (typed message parts), different mental framing.
-7. **Phase C and Phase D are new** (numbered v1.14/v1.15). They originate from the opencode integration analysis, not from the original 17-batch plan. Phase C delivers the outer agent loop with explicit step boundaries. Phase D delivers the permission ruleset + MCP client needed for codecontext to be useful and for BooCoder to gate writes.
-8. **BooCoder (v2.0/v2.1)** is the second-major-version line. New container, new safety story (pending changes + per-session Docker sandbox). Maps to original Batches 14/15.
+1. SHA-256 fingerprint of `buildSystemPrompt`'s output logged per turn at `level=info`, msg `prefix-fingerprint`, with project_id / agent_id / session_id / prefix_hash / prefix_length / mtime fields.
+2. Module-level `Map<sessionId, lastHash>` observer. On hash change for a known session → emit `prefix-drift` at `level=warn` with `prev_hash`, `new_hash`, and a field-level `changed_inputs` diff.
+3. Unit-level byte-stability assertion in `system-prompt.test.ts`: two consecutive `buildSystemPrompt` calls with the same inputs return byte-identical strings.
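
The three instrumentation items above can be sketched roughly as follows. This is a minimal illustration, not the shipped code: `PrefixInputs`, `assemblePrefix`, and `observePrefix` are hypothetical names, and `assemblePrefix` only stands in for `buildSystemPrompt`, whose real signature is not shown in this document.

```typescript
import { createHash } from "node:crypto";

// Hypothetical input shape; the real buildSystemPrompt inputs may differ.
interface PrefixInputs {
  guidance: string;      // BOOCHAT.md / BOOCODER.md contents
  agentsMd: string;      // global + per-project AGENTS.md contents
  sessionPrompt: string; // session.system_prompt or project default
  projectPath: string;   // interpolated into the base template
}

interface DriftEvent {
  sessionId: string;
  prevHash: string;
  newHash: string;
  changedInputs: string[];
}

// Stand-in for buildSystemPrompt: a pure string concat with no I/O.
function assemblePrefix(inputs: PrefixInputs): string {
  return [
    `Project root: ${inputs.projectPath}`,
    inputs.guidance,
    inputs.agentsMd,
    inputs.sessionPrompt,
  ].join("\n\n");
}

// SHA-256 fingerprint of the assembled prefix, hex-encoded (64 chars).
function fingerprint(prefix: string): string {
  return createHash("sha256").update(prefix, "utf8").digest("hex");
}

// Module-level observer: last hash and inputs seen per session.
const lastSeen = new Map<string, { hash: string; inputs: PrefixInputs }>();

// Returns a drift event when a known session's hash changes, else null.
function observePrefix(sessionId: string, inputs: PrefixInputs): DriftEvent | null {
  const hash = fingerprint(assemblePrefix(inputs));
  const prev = lastSeen.get(sessionId);
  lastSeen.set(sessionId, { hash, inputs });
  if (prev === undefined || prev.hash === hash) return null;
  const keys = Object.keys(inputs) as (keyof PrefixInputs)[];
  return {
    sessionId,
    prevHash: prev.hash,
    newHash: hash,
    changedInputs: keys.filter((k) => prev.inputs[k] !== inputs[k]),
  };
}
```

In this sketch the caller would log the returned event at `level=warn`; a `null` return covers both the first turn of a session and a stable prefix.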

-----
+**Decision criterion:** smoke 5 turns in a fresh session. 5 identical hashes + zero drift logs → close v1.13.8 as no-op, **drop the DB cache plan permanently**, move to v1.13.9. If drift surfaces → characterize the failure mode in a follow-up batch (the answer may not be a cache at all).

-## v1.11.x patches in detail
+**Doctrine:** matches the v1.13.6 audit pattern. Don't add infrastructure without a proven cache miss. The v1.12.0 mtime caches at the input layer plus alpha tool ordering at the request body layer already address the load-bearing cache-stability surfaces.

-### v1.11.0 — opencode-style compaction port ✅
+**Dispatch brief:** `handoff_v1.13.8_prefix_verify.md`.

-**What shipped:** Auto-detection of context overflow (`isOverflow(usage, model)`) triggers compaction on the *next* user turn. Compaction preserves the last 2 turns verbatim and produces an anchored Markdown summary (8-section template lifted verbatim from opencode `compaction.ts`) that replaces older head messages. Summary is rolling — each new compaction updates the prior summary, not stacks. Schema additions: `messages.compacted_at`, `messages.summary`, `messages.tail_start_id`, `chats.needs_compaction`. WS `compacted` frame fires sonner toast on completion.
+**Estimated:** ~95 LoC (system-prompt.ts + small `getAgentsMtimes` accessor in agents.ts + 3 new tests).

-**Key divergences from opencode:** Per-chat (not per-session) compaction state because BooCode history is per-chat. UUID `tail_start_id` not BIGINT. No `parent_id` on messages. Context limit comes from `messages.ctx_max` (last-known `n_ctx`), not a `model.context_limit` field.
+### v1.13.9 — compaction overflow trigger formula

-### v1.11.1 — Compaction follow-up ✅
+opencode pattern: `0.85 * ctx_max` early trigger (not at 100% saturation). Reduces tail-loss risk and gives compaction a safer window. Tiny change but tied to v1.13.4's tier logic — sequence matters.
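
A minimal sketch of the early-trigger check, assuming the comparison is a plain ratio against `ctx_max`. The names `OVERFLOW_TRIGGER_RATIO` and `shouldCompact` are hypothetical, and how the check composes with the v1.13.4 prune tiers is not modelled here.

```typescript
// Assumed from the text above: trigger at 85% of ctx_max, not at saturation.
const OVERFLOW_TRIGGER_RATIO = 0.85;

// Returns true once context usage reaches the early-trigger threshold.
// Unknown usage or an unknown/invalid context size never triggers.
function shouldCompact(ctxUsed: number | null, ctxMax: number | null): boolean {
  if (ctxUsed === null || ctxMax === null || ctxMax <= 0) return false;
  return ctxUsed >= OVERFLOW_TRIGGER_RATIO * ctxMax;
}
```

With an example 262144-token window the threshold is 222822.4 tokens, so 222822 used tokens does not trigger and 222823 does.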

-Working-state `chat_status: working/idle` frames around the LLM call inside `compaction.process()`. 24 new vitest cases for the six pure functions (`usable`, `isOverflow`, `estimate`, `turns`, `select`, `buildPrompt`). 7 `.bak-v1.11` files deleted.
+**Lift source:** `anomalyco/opencode` `session/overflow.ts`.

-### v1.11.2 — ContextBar ✅
+**Estimated:** ~30 LoC.

-New `ContextBar.tsx` rendering above MessageList. Shows `{used} / {max} ({pct}%)` with color tiers computed against `max - 20k` reserve (matches `compaction.usable()`): muted <60%, amber 60-80%, orange 80-95%, red ≥95%. Tooltip shows "Auto-compaction at ~N%". Mobile breakpoints: `< 380px` shows "Ctx" + numbers; `380-639px` adds parenthetical %; `≥ 640px` shows full "Context" label.
+### v1.13.10 — per-tool token cost accounting

-### v1.11.3 — ctx_max capture fix ✅
+Rolling average per tool, surfaced in AgentPicker tooltip + agent-pick decisions. Backend tracks `(tool_name, prompt_tokens_in, completion_tokens_out)` per call; surfaces a 100-call rolling mean. Frontend reads it for tool-cost hints. **Depends on v1.13.7's `includeUsage` fix** — without real token numbers in DB rows, the rolling average is empty.
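
One way the 100-call rolling mean could be kept in memory, shown as a sketch only. `ToolCostTracker` and its method names are hypothetical, and the real batch may well persist samples in the database instead.

```typescript
interface ToolCostSample {
  promptTokensIn: number;
  completionTokensOut: number;
}

// Keeps the most recent `windowSize` samples per tool and averages them.
class ToolCostTracker {
  private readonly samples = new Map<string, ToolCostSample[]>();
  private readonly windowSize: number;

  constructor(windowSize: number = 100) {
    this.windowSize = windowSize;
  }

  record(toolName: string, sample: ToolCostSample): void {
    const list = this.samples.get(toolName) ?? [];
    list.push(sample);
    // Drop the oldest sample once the window is full.
    if (list.length > this.windowSize) list.shift();
    this.samples.set(toolName, list);
  }

  // Returns null when no calls have been recorded for the tool.
  mean(toolName: string): ToolCostSample | null {
    const list = this.samples.get(toolName);
    if (list === undefined || list.length === 0) return null;
    let promptSum = 0;
    let completionSum = 0;
    for (const s of list) {
      promptSum += s.promptTokensIn;
      completionSum += s.completionTokensOut;
    }
    return {
      promptTokensIn: promptSum / list.length,
      completionTokensOut: completionSum / list.length,
    };
  }
}
```

A `null` mean is what the tooltip would see for every tool if token counts were still landing NULL, which is why the dependency on the `includeUsage` fix matters.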

-Discovered the dead code at `inference.ts:479-481` and `compaction.ts:300` reading `parsed.timings.n_ctx` never fired — llama-server emits `prompt_n / predicted_n / *_ms / *_per_second` in timings but NOT `n_ctx`. New `model-context.ts` module fetches `GET /upstream/<model>/props` with 3s timeout, positive cache (no TTL), 60s negative cache. Wired into all 4 ctx_max write sites (3 in inference.ts, 1 in compaction.ts). 12 new vitest cases. 7 historical rows backfilled to `ctx_max = 262144` (single-day backfill, only qwen3.6-35b-a3b-mxfp4 in use).
+**Estimated:** ~250 LoC.

-### v1.11.4 — CANCELLED
+### v1.13.11 — WebSocket frame typing

-Original scope: per-turn budget reset + Continue affordance + CapHitSentinel card. Recon revealed all three are already shipped (v1.8.2 timestamps in inference.ts comments). Dead version slot.
+Zod schemas validated at both ends. Catches the recurring class of bug that drove the 2026-05-21 debugging spike (silent protocol drift). Upfront work that pays back every time the protocol changes. Every frame (`chat_status`, `usage`, `parts_appended`, `session_workspace_updated`, `tool_running`, and the rest) gets a Zod schema, and every send/receive site validates.

-### v1.11.5 — ContextBar relocate (DISPATCHED)
+**Estimated:** ~300 LoC.
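
The fail-closed receive path can be sketched without any dependency, as below. This is a hand-rolled stand-in for what a Zod discriminated union plus `safeParse` would do, not the real schema file: the two frame shapes, `parseFrame`, and the `reason` strings are assumptions made for illustration.

```typescript
// Assumed frame shapes for illustration only; the real wire format has more
// frame types and more fields per frame.
type WsFrame =
  | { type: "chat_status"; status: "working" | "idle" }
  | { type: "tool_running"; tool_call_id: string };

type ParseResult = { ok: true; frame: WsFrame } | { ok: false; reason: string };

// Lets diagnostics tell "unknown frame type" apart from "known type, bad shape".
const KNOWN_FRAME_TYPES: readonly string[] = ["chat_status", "tool_running"];

function parseFrame(raw: string): ParseResult {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { ok: false, reason: "not JSON" };
  }
  if (typeof data !== "object" || data === null) {
    return { ok: false, reason: "not an object" };
  }
  const fields = data as Record<string, unknown>;
  const type = fields["type"];
  if (typeof type !== "string" || !KNOWN_FRAME_TYPES.includes(type)) {
    return { ok: false, reason: "unknown frame type" };
  }
  if (type === "chat_status") {
    const status = fields["status"];
    if (status === "working" || status === "idle") {
      return { ok: true, frame: { type, status } };
    }
  } else {
    // Model-emitted ids look like "call_<random>", so only require a non-empty string.
    const id = fields["tool_call_id"];
    if (typeof id === "string" && id.length > 0) {
      return { ok: true, frame: { type: "tool_running", tool_call_id: id } };
    }
  }
  return { ok: false, reason: `invalid ${type} frame` };
}
```

A receive site would log `reason` and return without dispatching whenever `ok` is false, which is the drop-on-invalid behavior the batch aims for.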

-Relocate ContextBar from above MessageList to above the agent-picker row. Bump height from ~4px bar to ~10-12px. Always-visible (zero-state when no assistant messages + use `model_context_limit` from v1.11.3 cache). Remove `ChatContextPopover` entirely (redundant signal; mobile-hostile).
+### v1.13.12 — skills audit pass (NEW, 2026-05-22)

-### v1.11.6 — Doom-loop guard (QUEUED)
-
-Detect 3 identical tool calls in a row within one turn (same name + same args via JSON.stringify). On detection: abort tool-call recursion, insert `metadata.kind='doom_loop'` sentinel, trigger summary turn via existing `runCapHitSummary` path. New `DoomLoopSentinel.tsx` component (no Continue button — looping shouldn't be retried with same tools). Per-turn sliding window, scoped to current turn's tool-call accumulator.
-
-**Lift source:** opencode `processor.ts`, `DOOM_LOOP_THRESHOLD = 3` constant.
-
-### v1.11.7 — pathGuard secrets filter (QUEUED)
-
-Extend pathGuard with `DEFAULT_SECURITY_IGNORE_FILETYPES` from continue.dev `core/indexing/ignore.ts`. Three-tier matcher: exact basenames (`credentials`, `secrets.yml`), extensions (`.env`, `.pem`, `.key`, `.crt`, etc.), prefix patterns (`id_rsa`, `id_dsa`, `id_ecdsa`, `id_ed25519`). Blocked files appear in `list_dir` and `find_files` results with `(blocked)` annotation. `view_file` returns `{ error: 'blocked_secret_file', ... }`. `grep` cannot read blocked file contents. No override mechanism in v1.x (use host shell).
-
-**Why it matters:** `/opt:/opt:ro` mount currently exposes `boolab/.env`, `dubdrive/users.json`, `authelia/state`, every other service's secrets to any tool past path validation. Cheap close on that surface area.
-
-----
-
-## v1.12 — codecontext sidecar + truncation + repair tool call
-
-Three lifts fused because they share concerns:
-
-1. **codecontext sidecar** — new container, single-instance, path-addressed multi-project. Mount `/opt/projects:/workspace:ro`. 8 tools wired as static `ToolDef` wrappers in `apps/server/src/services/tools/codecontext/` (one file per tool). HTTP client to `http://codecontext:8765`. New module `apps/server/src/services/codecontext_bridge.ts` translates `project_id` → `/workspace/<relative>/` paths.
-
-2. **Tool output truncation** — opencode `truncate.ts` pattern. Cap at 2000 lines / 50KB. Larger outputs: write full content server-side, return preview + opaque `id`. New tool `view_truncated_output(id)` retrieves full content by server-mapped id. **No pathGuard exception** for `/tmp` directory — the opaque-id approach avoids exposing a writable filesystem location to the model. Only codecontext outputs need truncation; native tools (view_file 200 lines, grep 200 results, list_dir 500 entries, find_files 200 results) already cap reasonably.
-
-3. **`experimental_repairToolCall` equivalent** — when model emits malformed tool call (JSON parse fails or Zod validation fails), return a synthetic tool result instead of an error: `{ error, raw_args, tool_name, hint: 'Retry with valid JSON arguments.' }`. Model self-corrects on next step. Add one line to system prompt instructing self-correction on malformed-args results. Confirmed working precondition: `tool_choice: "required"` accepted by llama-swap (verified 2026-05-20 against qwen3.6-35b-a3b-mxfp4).
-
-**Hand-roll, not AI SDK adoption.** AI SDK migration deferred to v1.13.
-
-**AGENTS.md updates:** Each of the 6 builtin agents gets a curated codecontext tool whitelist:
- Architect: all 8
- Debugger: `search_symbols`, `get_dependencies`
- Code Reviewer: `get_file_analysis`
- Refactorer: `get_semantic_neighborhoods`, `get_dependencies`
- Security Auditor: `get_file_analysis`, `search_symbols`, `get_dependencies`
- Prompt Builder: none (no structural reasoning relevance)
-
-**Dependencies:** v1.11.x merged. No others.
-
-**Estimated:** 600 LoC across 3-4 dispatches under the v1.12 umbrella.
-
-----
-
-## v1.13 — Phase B: parts table + AI SDK + per-tool tagging
-
-**Goal:** typed message parts replace JSON blobs on `messages.tool_calls` / `tool_results`. Adopt Vercel AI SDK `streamText`. Tag tools as `read_only` or `write` at definition time.
+**Goal:** apply the rules→recipes split to BooCode's 7 vendored v1.12 skills. Per Codeminer42's activation-gap data, plain skills are invoked 6% of the time in clean multi-turn sessions, while `CLAUDE.md`/`AGENTS.md` content is present 100% of the time. Sort each skill into: (a) move to `AGENTS.md` as an always-true rule, (b) keep as a recipe invoked via `/skill <name>`, (c) move bulky context into a flat `references/` subdirectory inside the skill, or (d) delete (Claude already does it reliably).

 **Scope:**

-1. Schema: new `message_parts` table (`id, message_id, kind, payload JSONB, sequence`). Kinds: `text`, `tool_call`, `tool_result`, `reasoning`, `step_start`. The `messages` table becomes header-only.
-2. Inference loop rewritten on AI SDK `streamText`. `streamCompletion` becomes a thin wrapper. Native AI SDK `experimental_repairToolCall` replaces v1.12's hand-rolled version.
-3. Tool registry: `ToolDef<T>` gains `category: 'read_only' | 'write'` field. BooCode v1.x rejects any `write` tool at registry time (defense in depth for the BooCoder split). Alpha-sort tool list before sending to model (prompt-cache stability).
-4. Reasoning content (`reasoning_content` from Qwen3.6) captured as its own part type instead of dropped or inlined.
+1. **Audit each of the 7 vendored skills against the 4-way split.** Most workflow-rule content ("always do X before Y", "never do Z") moves to `AGENTS.md` since it should be 100% present. Recipe content ("here's how to scaffold a component", "here's the release checklist") stays as skill, gets `context: fork` if heavy.
+1. **Adopt Anthropic best-practices conventions** for any skills that remain after audit: gerund names (`scaffolding-components`, not `component-helper`), SKILL.md ≤500 lines, references one level deep, third-person imperative voice, MCP tool references in `ServerName:tool_name` format, no Windows-style paths, no time-sensitive info, consistent terminology, no "voodoo constants."
+1. **Run each remaining skill through the 4-step validation protocol** from `mgechev/skills-best-practices` (Discovery → Logic → Edge Case → Architecture Refinement) using a fresh Claude chat per step. Prompts are paste-ready; ~10 minutes per skill.
+1. **Install `skillgrade` on Sam's host** (`npm i -g skillgrade`). For each remaining skill, write a minimal `eval.yaml` with 2–3 tasks and run `skillgrade --smoke` (5 trials, ~5 min) to confirm the skill triggers when expected and produces correct output. **Likely outcome: some skills show 0–20% trigger rate — confirms they belong in AGENTS.md, not as skills.**
+1. **Document the rules→recipes split as a BooCode convention** in `BOOCODER.md` / `BOOCHAT.md`. Future-proofs against re-adding workflow rules as skills.

-**Migration risk:** non-trivial. inference.ts is ~1400 lines with custom XML fallback, SSE parsing, compaction integration. Plan dedicated cutover window. Compaction.ts must update to assemble head from parts.
+**Lift sources:**

-**Replaces:** Original Batch 13 (append-only event log) — same outcome, different vocabulary.
+- `blog.codeminer42.com/stop-putting-best-practices-in-skills/` — empirical 6%/33%/66%/100% invocation-rate data with Vercel-style multi-turn methodology. The activation-gap framing.
+- `mgechev/skills-best-practices` (25 stars, MIT) — 4-step validation protocol with paste-ready prompts. Directory structure conventions.
+- `mgechev/skillgrade` (132 stars, MIT) — agent-agnostic skill eval framework. `eval.yaml` task+grader schema. Smoke/reliable/regression presets.
+- `platform.claude.com/docs/en/agents-and-tools/agent-skills/best-practices` — canonical Anthropic standard. 500-line ceiling, gerund naming, progressive disclosure patterns, MCP tool reference format, verification checklist.

-**Dependencies:** v1.12 merged.
+**Dependencies:** none (the 7 v1.12 skills already exist; this is an audit pass on shipped material). Can ship at any point in the v1.13.x line.
+
+**Estimated:** zero code changes, ~one evening of audit work, plus skillgrade install. Per-skill eval.yaml authoring is ~30 min per skill including the 4-step validation. Total roughly 5–6 hours of focused work for all 7 skills.
+
+### v1.13.2 — drop legacy columns (final phase of strangler-fig)
+
+**Wait at least one week of production traffic on v1.13.1 before shipping.** The dual-write is rollback insurance. Drop the columns and that rollback is gone.
+
+**Verification query before shipping:**
+
+```sql
+SELECT
+  COUNT(*) FILTER (WHERE m.tool_calls IS NOT NULL AND NOT EXISTS (
+    SELECT 1 FROM message_parts p WHERE p.message_id = m.id AND p.kind = 'tool_call'
+  )) AS missing_tool_call_parts,
+  COUNT(*) FILTER (WHERE m.tool_results IS NOT NULL AND NOT EXISTS (
+    SELECT 1 FROM message_parts p WHERE p.message_id = m.id AND p.kind = 'tool_result'
+  )) AS missing_tool_result_parts
+FROM messages m
+WHERE m.created_at > '2026-05-22'::timestamptz;
+```
+
+Both columns must read 0.
+
+**Scope (~150 LoC, mostly deletions):**
+
+1. Remove dual-write from every v1.13.0 site: `tool-phase.ts` (3 sites), `finalizeCompletion`, `skills.ts` (2 sites), `messages.ts` answer flow, `chats.ts` (fork). Keep only the parts write.
+1. Simplify `messages_with_parts` view — drop COALESCE fallbacks since legacy columns are about to disappear.
+1. `ALTER TABLE messages DROP COLUMN tool_calls, DROP COLUMN tool_results`.
+1. Remove `tool_calls`/`tool_results` fields from `Message` API type. API boundary unchanged (frontend already reads parts-derived values).
+1. Drop the stale `messages_status_check` cleanup DO block from v1.12.1 schema if still present.
+1. Update test fixtures in `inference.test.ts` and `compaction.test.ts` to construct parts instead of inline `tool_calls: null, tool_results: null` literals. ~30 fixture rewrites.
+
+After v1.13.2 ships, tag the umbrella `v1.13` on the same commit (or on -C — Sam's call).

 -----

@@ -177,126 +186,448 @@ Three lifts fused because they share concerns:
 **Scope:**

 1. Outer loop continues until model returns non-tool finish OR step cap hit. Step ≠ tool call: one step can contain multiple tool calls in parallel.
-2. `agent.steps ?? Infinity` per-agent step cap. AGENTS.md gains `steps:` field. Refactorer `steps: 5`, Architect `steps: 20`, etc.
-3. Step-boundary events (`step_start`, `step_finish`) explicit in the parts stream. Per-step snapshot for revert (planned for BooCoder; backend-only in v1.14).
-4. Doom-loop guard (v1.11.6) migrates from "abort recursion" to "raise within loop iteration." Same predicate, different control flow.
+1. `agent.steps ?? Infinity` per-agent step cap. AGENTS.md gains `steps:` field. Refactorer `steps: 5`, Architect `steps: 20`, etc.
+1. Step-boundary events (`step_start`, `step_finish`) explicit in the parts stream. Per-step snapshot for revert (planned for BooCoder; backend-only in v1.14).
+1. Doom-loop guards (v1.11.6) migrate from "abort recursion" to "raise within loop iteration." Same predicate, different control flow.
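
The scope items above can be sketched as a small synchronous loop. This is a simplified model under stated assumptions: `runLoop`, `StepResult`, and the outcome labels are hypothetical names, the model call is replaced by an injected `runStep` callback, and the doom-loop predicate (three identical calls in a row, compared by name plus `JSON.stringify` of the args) follows the v1.11.6 description.

```typescript
// Hypothetical types; a real step would carry parts, usage, and finish reasons.
type ToolCall = { name: string; args: unknown };
type StepResult = { toolCalls: ToolCall[] }; // empty list = non-tool finish
type LoopOutcome = "finished" | "step_cap" | "doom_loop";

const DOOM_LOOP_THRESHOLD = 3;

// True when the last N tool calls share the same name and serialized args.
function isDoomLoop(history: ToolCall[]): boolean {
  if (history.length < DOOM_LOOP_THRESHOLD) return false;
  const tail = history
    .slice(-DOOM_LOOP_THRESHOLD)
    .map((c) => c.name + ":" + JSON.stringify(c.args));
  return tail.every((key) => key === tail[0]);
}

function runLoop(
  runStep: (stepIndex: number) => StepResult,
  maxSteps: number = Infinity, // mirrors `agent.steps ?? Infinity`
): { outcome: LoopOutcome; steps: number } {
  const history: ToolCall[] = [];
  let steps = 0;
  while (steps < maxSteps) {
    const result = runStep(steps);
    steps += 1;
    // One step may contain several parallel tool calls, so count steps, not calls.
    if (result.toolCalls.length === 0) return { outcome: "finished", steps };
    history.push(...result.toolCalls);
    // The guard is evaluated inside the iteration instead of aborting recursion.
    if (isDoomLoop(history)) return { outcome: "doom_loop", steps };
  }
  return { outcome: "step_cap", steps };
}
```

Counting steps separately from tool calls is what lets a cap such as `steps: 5` still allow many parallel calls inside a single step.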
+
+**Lift sources:**
+
+- `anomalyco/opencode` `session/prompt.ts` `runLoop()` outer agent loop
+- `anomalyco/opencode` `agent.steps` per-agent step cap
+- AGENTS.md extensions for `steps`, `output_schema` (Qodo agent.toml pattern), `exit_expression` (Qodo pattern), `execution_strategy` (Qodo plan/act)
+- **Reference:** RA.Aid three-stage Research/Planning/Implementation as AGENTS.md design principle; expert-tool escape hatch pattern (most subtasks on routine model, escalate to qwopus27b only when needed)
+- **Reference:** Roo Code Boomerang Tasks — orchestrator-with-capability-restriction pattern. Adopt as AGENTS.md design principle (orchestrator role can call only dispatch tools, no file reads / MCP / shell).

 **Dependencies:** v1.13 merged.

+**Estimated:** ~800 LoC.
+
 -----

-## v1.15 — Phase D: permission ruleset + MCP client
+## v1.14.x-mcp — single-server MCP-client proof-of-concept (NEW, 2026-05-22)
+
+**Goal:** validate the MCP-client loop end-to-end against one real MCP server before committing to the full opencode `mcp/index.ts` port at v1.15. Small, throwaway-if-needed, slots between v1.14 and v1.15 without disrupting either.
+
+**Scope:**
+
+1. Add a hardcoded MCP client (single server) to BooChat. Initial target: **Context7** (Sam already uses it via opencode, so the config is known to work). Remote HTTP transport at `https://mcp.context7.com/mcp` with optional `CONTEXT7_API_KEY` header.
+1. Use the official `@modelcontextprotocol/sdk` TypeScript client. No SSE transport yet (deferred to v1.15). Stdio transport not needed for Context7.
+1. Tool discovery on startup: `tools/list`. Tools surface in BooChat alongside `view_file`/`grep`/etc., prefixed `context7_*` to avoid collisions.
+1. **Read-only invariant guard:** the client must reject any MCP tool whose `annotations.readOnlyHint` is not explicitly `true` (that is, `false` or absent). Fail-closed. This is BooChat-specific defense-in-depth; v1.15 lifts this restriction for BooCoder.
+1. Per-server `enabled` flag in `agents.ts`. No glob patterns yet.
+1. **No OAuth.** Context7 supports an API key header; that's it for v1.14.x. OAuth lands in v1.15.
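
The read-only guard and the name prefixing from the scope list might be combined roughly as follows. The sketch assumes the guard reads the MCP spec's `readOnlyHint` tool annotation. `McpToolInfo`, `partitionTools`, and the tool names in the usage note are hypothetical, and no SDK types are used.

```typescript
// Minimal local stand-in for the tool metadata returned by `tools/list`.
interface McpToolInfo {
  name: string;
  annotations?: { readOnlyHint?: boolean };
}

// Fail-closed partition: a tool is admitted only when it is explicitly
// annotated read-only. Missing annotations count as "not read-only".
function partitionTools(
  tools: McpToolInfo[],
  prefix: string,
): { admitted: string[]; rejected: string[] } {
  const admitted: string[] = [];
  const rejected: string[] = [];
  for (const tool of tools) {
    if (tool.annotations?.readOnlyHint === true) {
      admitted.push(`${prefix}_${tool.name}`); // e.g. context7_<tool>
    } else {
      rejected.push(tool.name);
    }
  }
  return { admitted, rejected };
}
```

For example, given an illustrative tool `lookup_docs` annotated read-only and an unannotated `write_note`, calling `partitionTools(tools, "context7")` would admit only `context7_lookup_docs`.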
+
+**What this proves:**
+
+- MCP protocol loop works end-to-end against a real server in BooCode's Fastify backend.
+- Tool-discovery → tool-list → tool-call → result-render → context-budget accounting all hold.
+- Read-only enforcement at the client layer is sound.
+- Config schema shape is right before v1.15 commits to the opencode-compatible JSON config.
+
+**What this does NOT do:**
+
+- No SSE transport. (v1.15.)
+- No OAuth flow. (v1.15.)
+- No multiple servers. (v1.15.)
+- No per-agent server allow/deny. (v1.15.)
+
+**Dependencies:** v1.13 merged (parts table for tool-call/tool-result emission).
+
+**Estimated:** ~150 LoC.
+
+**Skip-condition:** if v1.14 finishes and Sam wants to leap straight to v1.15, fold this into the early steps of v1.15.
+
+-----
+
+## v1.14.x-html — HTML artifacts in BooChat (NEW, 2026-05-22)
+
+**Goal:** integrate Thariq Shihipar's "HTML > Markdown for agent output at length" pattern (`claude.com/blog/using-claude-code-the-unreasonable-effectiveness-of-html`, May 20 2026) into BooChat. Bias the model toward HTML for outputs >100 lines: information density, visual clarity, interactive controls (sliders/knobs/SVG diagrams/side-by-side comparisons), shareability. BooChat already renders into a webview, so the surface fit is unusually good.
+
+**Scope:**
+
+1. **Model-side prompting** (no code change yet, just AGENTS.md guidance):
+- Add HTML-bias rule to global `AGENTS.md`: "For outputs >100 lines, default to a self-contained `<!DOCTYPE html>...</html>` artifact unless the user explicitly asks for Markdown. For outputs <100 lines or for short conversational replies, stay in Markdown."
+- Reasoning shown in the rule: HTML carries diagrams, tabs, illustrations, code-with-syntax-highlighting, interactive controls, mobile-responsive layouts. Markdown is restrictive at any length.
+- Cite Thariq's blog post in the rule comment so future audit passes know where it came from.
+1. **Detection at the BooChat backend.** In `apps/chat/services/inference/stream-phase.ts` post-processing: detect any assistant text part starting with `<!DOCTYPE html>` (case-insensitive, whitespace-trimmed) — or wrapped in a fenced ` ```html` block — and tag it as an HTML artifact. Emit a new part kind `html_artifact` into `message_parts` (CHECK constraint update). Payload: `{html_content, char_count, title}`. Title pulled from `<title>` tag or first `<h1>` if available.
+1. **Three render targets (Sam's pick: "3 with a download"):**
+- **Inline preview** in the chat stream: small sandboxed iframe (~400px tall), renders the artifact next to where it was streamed. Default size, click-to-expand.
+- **Open in pane**: button on the inline preview opens the artifact in a full-height pane in BooChat's existing workspace splitter, alongside the file viewer and BooTerm. Pane is dismissible. Pane state persisted via `sessions.workspace_panes jsonb` (the v1.12.1 schema already supports this).
+- **Download**: button writes the artifact to `/opt/<project>/.boocode/artifacts/<slug>-<unix-timestamp>.html` (path-guarded same as native write tools), surfaces an OS download link via the existing file-serving path. Filename slug derived from artifact title.
+1. **Security stance — locked 2026-05-22:** the iframe is sandboxed with `sandbox="allow-scripts allow-clipboard-write allow-downloads"`. **Crucially, omit `allow-same-origin`** so the artifact has its own opaque origin and cannot read BooChat's cookies, Authelia session, or DOM. Backend serves the iframe content via `srcdoc=...` inline (not `src=`) so no separate URL exists to disclose. CSP header on the iframe response: `default-src 'none'; script-src 'unsafe-inline'; style-src 'unsafe-inline'; img-src data: blob:; font-src data:; connect-src 'none'`. The `connect-src 'none'` is the key clause — artifacts can't `fetch()`, can't open WebSockets, can't ping a tracking pixel, can't exfiltrate. JS runs (so Thariq's interactive knobs/sliders/copy-as-prompt buttons work) but nothing else network-touching does. **None of Thariq's blog examples need the relaxed permissions** — they're all client-side.
+1. **Frontend rendering** (`apps/web/src/components/HtmlArtifactPart.tsx`):
+- Inline preview: `<iframe srcdoc={html_content} sandbox="allow-scripts allow-clipboard-write allow-downloads" className="..." />`, using exactly the sandbox attributes from the security stance above (no `allow-same-origin`).
+- "Open in pane" button: dispatches workspace-pane action with `{type: 'html_artifact', message_part_id, html_content}`.
+- "Download" button: POST to new endpoint `/api/chats/:id/artifacts/:part_id/download` which writes to disk (path-guarded) and returns the absolute path or pre-signed URL for the existing static-file serving route.
+1. **No artifact persistence beyond the chat.** Artifacts live in `message_parts.payload->>'html_content'` with the chat. Downloads go to `/opt/<project>/.boocode/artifacts/` and are user-managed from there. No separate artifacts table.
+1. **Token-budget guard.** A single artifact can be at most 1MB of HTML in `message_parts.payload`. A larger artifact triggers a streaming abort with a friendly error: "Artifact exceeded 1MB; consider splitting into multiple files or reducing inline assets."
+1. **No `web-artifacts-builder` skill vendor.** That skill (`anthropics/skills/web-artifacts-builder`) is built for Claude.ai's runtime with Vite + Parcel + tspaths + html-inline toolchain. BooChat has no shell execution surface. The pattern transplants; the toolchain doesn't. Treat the skill's "avoid AI slop" design principles (no excessive centered layouts, no purple gradients, no uniform rounded corners, no Inter font) as conventions inlined in the HTML-bias AGENTS.md rule. The init/bundle scripts are out of scope.
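
The detection step in the scope above could look something like this sketch. `detectHtmlArtifact` and `HtmlArtifact` are hypothetical names, and one reading of the rule is assumed: content inside an `html` fence must still begin with `<!DOCTYPE html>` to qualify. The payload fields match the `{html_content, char_count, title}` shape given above.

```typescript
interface HtmlArtifact {
  html_content: string;
  char_count: number;
  title: string | null;
}

// Built at runtime so this example contains no literal triple backticks.
const FENCE = "`".repeat(3);
const FENCED_HTML_RE = new RegExp(
  "^" + FENCE + "html\\s*\\n([\\s\\S]*?)\\n" + FENCE + "$",
  "i",
);
const DOCTYPE_RE = /^<!doctype html>/i;
const TITLE_RE = /<title[^>]*>([\s\S]*?)<\/title>/i;
const H1_RE = /<h1[^>]*>([\s\S]*?)<\/h1>/i;

function detectHtmlArtifact(text: string): HtmlArtifact | null {
  const trimmed = text.trim();
  // Unwrap a fenced html block if the whole part is one.
  const fenced = FENCED_HTML_RE.exec(trimmed);
  const html = (fenced?.[1] ?? trimmed).trim();
  // Assumption: fenced or not, an artifact must start with the doctype.
  if (!DOCTYPE_RE.test(html)) return null;
  // Prefer <title>, fall back to the first <h1>, and strip nested tags.
  const titleMatch = TITLE_RE.exec(html) ?? H1_RE.exec(html);
  const rawTitle = titleMatch?.[1] ?? "";
  const title = rawTitle.replace(/<[^>]+>/g, "").trim();
  return {
    html_content: html,
    char_count: html.length,
    title: title === "" ? null : title,
  };
}
```

Regex extraction of the title is a shortcut that suits a sketch; a production version may prefer a real HTML parser, and would also apply the 1MB size guard before persisting the part.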
+
+**Lift sources:**
+
+- `claude.com/blog/using-claude-code-the-unreasonable-effectiveness-of-html` (Thariq Shihipar, May 20 2026) — the pattern, the use-case taxonomy (specs/code-review/design/reports/custom editors), the design philosophy.
+- HTML iframe sandbox spec (web platform standard, no license issues).
+- `anthropics/skills/web-artifacts-builder` — design-principle reference only ("avoid AI slop" rules). **Do not vendor the toolchain.**
+
+**Dependencies:** v1.13 merged (`message_parts` table is where artifacts live). Independent of v1.14 (outer loop) and v1.14.x-mcp (MCP PoC). Can ship in any order relative to those.
+
+**Estimated:** ~400 LoC. Roughly half backend (detection + part-kind extension + download endpoint + path-guard integration), half frontend (HtmlArtifactPart component + pane integration + download button wiring).
+
+**Schema addition:**
+
+- `message_parts.kind` CHECK constraint adds `'html_artifact'` to the allowed set.
+
+**Skip-condition:** none — independent batch, ships clean any time after v1.13. Highest user-visible payoff of any v1.13.x/v1.14.x batch (transforms what the model can produce, not just how the backend handles it).
+
+-----
+
+## v1.15 — Phase D: permission ruleset + full MCP client

 **Goal:** wildcard permission ruleset (opencode `evaluate.ts` pattern) and a proper MCP client implementation. Foundation for BooCoder to gate writes; immediate value for codecontext to be re-wired as a real MCP server.

 **Scope:**

 1. Wildcard rule matcher: `{ permission, pattern, action: 'allow' | 'deny' | 'ask' }`. Last-match-wins. Per-agent rulesets layer under per-session rulesets.
-2. MCP client implementation: SSE transport, `tools/list` discovery, `tools/call` invocation. codecontext sidecar gets re-pointed from static wrappers (v1.12) to real MCP. New connectors become a config-only addition.
-3. UI: permission-ask flow when a tool requires `ask` action. Modal or inline card with Allow once / Allow always / Deny.
-4. v1.x stays read-only by default (no `write` tools in the registry yet).
+1. **Full MCP client implementation:** stdio (local subprocess) + SSE (remote HTTP) transports, `tools/list` discovery, `tools/call` invocation, OAuth via Dynamic Client Registration (RFC 7591), per-server enabled flag, **glob patterns for per-agent tool whitelisting** (matching opencode's `tools` config shape).
+1. codecontext sidecar gets re-pointed from static wrappers (v1.12) to real MCP. New connectors become a config-only addition.
+1. UI: permission-ask flow when a tool requires `ask` action. Modal or inline card with Allow once / Allow always / Deny. Reuses v1.9.7 elicitation surface.
+1. BooChat stays read-only by default — the read-only invariant guard from v1.14.x carries forward (defense-in-depth even with the ruleset).
+1. **Config shape: match opencode's JSON schema near-verbatim** so any opencode user can copy `mcp` blocks from `~/.opencode/config.json` into BooCode unchanged. Schema is not copyrightable; matching it is pure interoperability.
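
A minimal sketch of the last-match-wins rule matcher from the first scope item, assuming `*` is the only wildcard and that it matches any run of characters. `evaluate`, `wildcardToRegExp`, and the `"ask"` fallback for unmatched requests are assumptions made here, not a port of opencode's `evaluate.ts`.

```typescript
type Action = "allow" | "deny" | "ask";

interface Rule {
  permission: string;
  pattern: string;
  action: Action;
}

// Convert a `*` wildcard into an anchored RegExp, escaping everything else.
function wildcardToRegExp(pattern: string): RegExp {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

// Last-match-wins: later rules override earlier ones.
function evaluate(
  rules: Rule[],
  permission: string,
  target: string,
  fallback: Action = "ask", // assumed default when nothing matches
): Action {
  let result = fallback;
  for (const rule of rules) {
    const matches =
      wildcardToRegExp(rule.permission).test(permission) &&
      wildcardToRegExp(rule.pattern).test(target);
    if (matches) result = rule.action;
  }
  return result;
}
```

Under last-match-wins, layering per-agent rules under per-session rules can be as simple as `evaluate([...agentRules, ...sessionRules], permission, target)`, since the session rules come later and therefore take precedence.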
+
+**v1 MCP scope limit (security):** local-stdio MCP servers and Context7-style API-key remote servers only. **Remote MCP servers requiring OAuth tokens are deferred** until BooCode has a real secret-storage primitive (sops-encrypted entries, Vault sidecar, or OS keyring). Reason: MCP OAuth tokens are bearer credentials for third-party services; storing them in plaintext PostgreSQL inside the BooCode DB widens the attack surface significantly if Authelia is bypassed. v1.15 ships the OAuth code path but the config schema rejects OAuth servers until secret storage lands.

 **Absorbs:** Original Batch 12 (tool approval + plan/act mode) — same outcome via permission rules instead of mode enum.

+**Lift sources:**
+
+- `anomalyco/opencode` `permission/evaluate.ts` wildcard ruleset
+- `anomalyco/opencode` `mcp/index.ts` MCP client (SSE transport, tools/list, tools/call, OAuth RFC 7591)
+- `cline/cline` plan/act invariant — read-only mode pattern (absorbed)
+
 **Dependencies:** v1.13 merged (parts table for permission events). Independent of v1.14.

+**Estimated:** ~600 LoC.
+
 -----

-## v1.16 — Batch 11b: codesight repo_health
+## v1.16 — codesight repo_health

-Call graph, circular dependency detection, dead code flagging. Port `analyze.mjs` from spirituslab/codesight. New tool `repo_health(project_id)`. In-process Node (not sidecar). Cache results keyed by `(project_id, file_hashes_sig)`.
+Call graph, circular dependency detection, dead code flagging. Port `analyze.mjs` from `spirituslab/codesight`. New tool `repo_health(project_id)`. In-process Node (not sidecar). Cache results keyed by `(project_id, file_hashes_sig)` in new `repo_health_cache` table.
+
+Independent batch — ships clean any time after v1.13. Low leverage unless Sam actually uses the dead-code / circular-dep output.
+
+**Lift source:** `spirituslab/codesight` `analyze.mjs`. Drop VS Code wrapper.

 **Dependencies:** v1.12 merged (can reuse codecontext parse output where overlapping).

-----
-
-## v2.0 — BooCoder pending changes
-
-New container `boocoder` at `100.114.205.53:9502`. Owns write tools (`edit_file`, `create_file`, `delete_file`, `apply_pending`, `rewind`). Edits queue in `pending_changes` table; nothing touches disk until `/apply`. Per-pane diff UI with Approve/Reject. BooCode chat stays read-only (`/opt:/opt:ro`).
-
-**Lift source:** plandex pending-changes data model.
-
-**Dependencies:** v1.13 (parts) + v1.15 (permissions).
+**Estimated:** ~400 LoC.

 -----

-## v2.1 — BooCoder runtime isolation
+## v2.0 — BooCoder: pending changes + dual execution paths + ACP host + MCP server
+
+**Major version bump.** New app `apps/coder/` inside the existing monorepo (not a separate repo). Lands together with the `boocode_db` → `boochat_db` DB rename and the per-app subdomain split (`code.indifferentketchup.com` → BooChat, `coder.indifferentketchup.com` → BooCoder).
+
+**Three protocol roles in one surface:**
+
+1. **MCP client (write-capable allowed).** Inherits the v1.15 client unchanged. BooCoder can enable write-capable MCP servers (`@modelcontextprotocol/server-filesystem` write tools, git commit MCP servers, etc.). All MCP writes route through the same `pending_changes` queue as native writes. Per-task allow/deny means dispatched tasks can have a different MCP roster than the interactive shell.
+1. **MCP server (BooCoder's own primitives).** New `apps/coder/services/mcp_server.ts` exposes `boocoder.create_task`, `boocoder.list_pending_changes`, `boocoder.apply`, `boocoder.reject`, `boocoder.dispatch_external_agent`, `boocoder.list_worktrees` as MCP tools. Stdio transport for local consumers (Sam's `opencode` in Termius), HTTP for remote (deferred until OAuth + secret storage). **This is what makes external opencode-on-the-host BooCoder-aware.**
+1. **ACP client (host).** Replaces the raw-PTY dispatch path for ACP-capable agents. Spawns `opencode acp` and `goose acp` as JSON-RPC stdio subprocesses. Native session lifecycle, mid-session model/mode switching, file-operation events surfaced as diffs in the BooCoder UI, terminal events that route into BooTerm, permission prompts answered via real dialogs. **MCP servers configured in BooCoder are auto-forwarded to the dispatched ACP agent** (per goose docs — `context_servers` is the field name). One MCP config drives every dispatched agent.
+
+**Two execution paths, same surface (the answer to the May 18 "1 and 2 full featured" question):**
+
+### Path A — in-process write-tool inference loop (Option B / native)
+
+- New write tools: `edit_file`, `create_file`, `delete_file`, `apply_pending`, `rewind`.
+- Edits queue in `pending_changes (id, session_id, file_path, diff TEXT, status, created_at)`. Nothing touches disk until `/apply`.
+- Per-pane diff UI with Approve/Reject.
+- Path-guard layer (`apps/coder/services/path_guard.ts`) enforces per-project scoping using the v1.15 permission wildcard ruleset. Blanket `/opt:rw` mount, policy at the tool layer. **Highest-priority test target: fuzz the path-guard against every traversal-attack pattern, including MCP-served filesystem writes.**
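
The core of the per-project scoping check can be sketched as a purely lexical containment test. `guardPath` is a hypothetical name, the sketch assumes POSIX-style paths and an already-normalized absolute project root, and it does not resolve symlinks, which a real guard would also need to handle (for example via `fs.realpath`) before trusting the result.

```typescript
// Resolve `requested` against `projectRoot` and return the normalized
// absolute path, or null when the result escapes the project root.
function guardPath(projectRoot: string, requested: string): string | null {
  const rootSegs = projectRoot.split("/").filter((s) => s !== "" && s !== ".");
  // Absolute requests resolve from "/", relative ones from the project root.
  const stack: string[] = requested.startsWith("/") ? [] : [...rootSegs];
  for (const seg of requested.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") {
      stack.pop(); // popping an empty stack stays at "/"
      continue;
    }
    stack.push(seg);
  }
  // Compare whole segments so "/opt/application" is not treated as inside "/opt/app".
  const inside =
    stack.length >= rootSegs.length &&
    rootSegs.every((seg, i) => stack[i] === seg);
  return inside ? "/" + stack.join("/") : null;
}
```

Segment-wise comparison avoids the classic string-prefix bug, which is one of the traversal patterns a fuzz suite for the path-guard would be expected to cover.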
+
+**Lift source:** `plandex-ai/plandex` pending-changes data model and diff/apply/rewind UX vocabulary.
+
+### Path B — ACP/PTY dispatch to external CLI agents (Option A / dispatch)
+
+- New tool `dispatch_external_agent(agent: 'opencode'|'claude'|'goose'|'pi', model: string, task: string, worktree: string)`.
+- **Primary path: ACP subprocess** for agents that support it (opencode `opencode acp`, goose `goose acp`). JSON-RPC over stdio. Native session/tool/file/terminal events.
+- **Fallback path: raw PTY** for claude/pi/smallcode via `node-pty` with `cwd = /opt/<project>` or a `git worktree add /tmp/booworktrees/<session-id>` worktree per dispatch.
+- Dispatch worker checks `available_agents.supports_acp` at runtime and picks the right transport. Same task table, same project registry, same pending-changes flow.
+- Captures stdout/stderr/exit-code into PostgreSQL stream tables (PTY path) or maps ACP events to the parts taxonomy (ACP path). WebSocket events surface to all three React surfaces.
+- One worktree per active dispatched session.
+- User picks per task via UI dropdown at task creation, or the in-process loop calls `dispatch_external_agent` itself.
+
+**Lift sources:**
+
+- `Dominic789654/agent-hub` (Apache-2.0) — task DAG schema, dispatcher worker, project registry, human inbox. **Primary architectural template.**
+- `getpaseo/paseo` (AGPL-3.0, **design only — no code lift**) — daemon+clients architecture, `--worktree feature-x` flag, `paseo run/ls/attach/send` CLI verb shape, `/handoff` `/loop` `/orchestrator` skills concept.
+- Roo Code Boomerang Tasks pattern — orchestrator capability restriction + down-pass/up-pass context discipline (`new_task` message, `attempt_completion` result, no implicit inheritance) + explicit precedence override clause.
+- `covibes/zeroshot` blind-validation invariant — verify gate runs in separate agent context that only sees the diff and acceptance criteria, not the producing conversation.
+- **ACP spec** (`agentclientprotocol.com`) — local-subprocess ACP via stdio JSON-RPC. Remote ACP (HTTP/WS) is still work-in-progress per the spec maintainers; v2.0 uses stdio only.
+- **Goose ACP docs** (`goose-docs.ai/docs/guides/acp-clients/`) — `context_servers` auto-forward pattern. Critical: one MCP config drives every dispatched agent.
+
+### Shared infrastructure between A and B
+
+- `tasks` table (id, project_id, template_id, parent_task_id, state, input, output_summary, dependencies, agent, model, worktree_path, cost, started_at, ended_at)
+- `task_templates` table (reusable spec → task instantiations)
+- `pipelines` table + `pipeline_runs` (ordered template invocations)
+- `available_agents` table (name, install_path, version, supports_acp, supports_mcp_client, last_probed_at) — populated by startup probe (`which opencode && opencode --version`, etc.)
+- `human_inbox` view (state IN ('blocked', 'failed', 'needs_human'))
+- Worker process `boocoder-dispatcher` (systemd unit alongside Fastify): picks ready tasks, dispatches via A or B (and within B, ACP or PTY), captures output, marks state.
+- New `boocode` CLI as a thin WebSocket/HTTP client against the BooCoder API. Verbs: `boocode run`, `boocode ls`, `boocode attach <id>`, `boocode send <id>`. Mirrors Paseo's UX, license-clean implementation.
+- BooCoder-internal MCP server (see role 2 above) registered on the Fastify server alongside the existing HTTP/WS endpoints. Stdio transport for opencode-in-Termius; HTTP transport gated on OAuth + secret storage.
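
How the dispatcher might pick ready tasks and derive the human inbox from the `tasks` table can be sketched in memory as below. The `pending`, `running`, and `done` state names are assumptions; only `blocked`, `failed`, and `needs_human` appear in the `human_inbox` definition above. `readyTasks` and `humanInbox` are hypothetical helpers.

```typescript
// Assumed state set: the first three names are illustrative.
type TaskState = "pending" | "running" | "done" | "blocked" | "failed" | "needs_human";

interface Task {
  id: string;
  state: TaskState;
  dependencies: string[]; // ids of tasks that must be done first
}

// A task is ready when it is pending and every dependency has completed.
function readyTasks(tasks: Task[]): Task[] {
  const done = new Set(tasks.filter((t) => t.state === "done").map((t) => t.id));
  return tasks.filter(
    (t) => t.state === "pending" && t.dependencies.every((d) => done.has(d)),
  );
}

// In-memory equivalent of the human_inbox view.
function humanInbox(tasks: Task[]): Task[] {
  const inboxStates: TaskState[] = ["blocked", "failed", "needs_human"];
  return tasks.filter((t) => inboxStates.includes(t.state));
}
```

A dependency on an unknown task id leaves the task unready, which fails closed rather than dispatching work whose prerequisites cannot be confirmed.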
+
+**MCP server eval requirement:** run BooCoder's internal MCP server through the **anthropics `mcp-builder` skill's 10-question evaluation framework** before shipping. Ten independent, read-only, complex questions with verifiable answers in XML format. If the eval doesn't pass, the MCP server isn't shippable.
+
+**Dependencies:** v1.13 (parts table) + v1.14 (outer loop + step boundaries for revert snapshots) + v1.14.x (MCP-client PoC) + v1.15 (full MCP client + permissions for path-guard policy).
+
+**Estimated:** ~1500 LoC for Path A + Path B + shared schema, plus ~400 LoC for the MCP-server role, plus ~300 LoC for the ACP-client role. Multiple sub-versions: v2.0.0 native + ACP, v2.0.1 MCP server, v2.0.2 polish.
+
+-----
+
+## v2.1 — BooCoder runtime isolation (optional)

 Per-session Docker sandbox spawned by BooCoder on first write. Only project path mounted, not `/opt`. Idle-timeout 30 min. Standard OpenHands runtime contract: HTTP API inside container, BooCoder calls in.

-**Lift source:** OpenHands V1 runtime pattern.
+**Skip-condition:** if the v2.0 path-guard layer holds up under fuzzing + a few months of production use, runtime isolation becomes optional hardening rather than necessary defense. Track but don't commit.
+
+**Lift source:** `OpenHands/OpenHands` V1 runtime pattern.

 **Dependencies:** v2.0.

+**Estimated:** ~600 LoC.
+
+-----
+
+## v2.2 — BooCoder as ACP agent (driveable from external editors)
+
+**Goal:** expose `boocoder acp` so Zed, JetBrains, Avante.nvim, CodeCompanion.nvim can drive BooCoder as their agent. Outbound exposure of the BooCoder write-tool surface to ACP-compatible editors.
+
+**Scope:**
+
+1. New ACP server entry point: `boocoder acp` reads JSON-RPC over stdio, exposes BooCoder's task primitives as ACP sessions.
+1. BooCoder UI features remain optional: editor drives session via ACP; pending-changes queue still gates writes; user can approve/reject from either BooCoder's web UI or the editor's permission dialog (whichever responds first).
+1. Same auth model as the rest of BooCoder — editor must be reachable on the Tailscale mesh, or BooCoder is invoked with a short-lived token.
+
+**Why this is v2.2, not v2.0:** outbound ACP-agent role is cheap once the inbound ACP-client side is implemented (same protocol library, server side), but it's a *different product surface* — driving BooCoder from external editors. Ship it after BooCoder's own surface stabilizes.
+
+**Lift source:** `zed-industries/codex-acp` (Apache-2.0) as a server-side ACP reference implementation.
+
+**Dependencies:** v2.0 + v2.1 (recommended; ACP-driven sessions inside a sandbox are stronger).
+
+**Estimated:** ~400 LoC.
+
 -----

 ## v2.x — Optional / far future

+- **Verify gate above pending-changes** — `augmentcode/augment-swebench-agent` majority-vote ensembler pattern (K candidate diffs → ranker model picks winner). JSONL schema only, no code lift. Combine with zeroshot blind-validation invariant. v2.0+ optional batch.
+- **PR-resolver tool** — `qodo-ai/qodo-skills` PR-resolver state machine (fetch issues → batch/interactive fix → inline reply). BooCoder v2.0+.
+- **Record/replay LLM harness for tests** — `qodo-ai/qodo-cover` pattern (hashed prompt → fixture YAML). Re-implement in Vitest, don't vendor (AGPL). v1.13+ test infrastructure.
+- **HMAC-chained audit log** — `sipyourdrink-ltd/bernstein` pattern. Small lift, adds tamper-evident session history. v1.13+ optional.
+- **Tiered tool loading** — `eyaltoledano/claude-task-master` pattern (env var: `core` / `standard` / `all`). ~30 LoC in `agents.ts`. Pattern-only lift (claude-task-master is MIT + Commons Clause; reimplement). v1.13.x or v1.14.
+- **Spec directory structure** — `Fission-AI/OpenSpec` `openspec/changes/<name>/{proposal,specs,design,tasks}.md` shape for BooCode's own batch docs. Zero-dep documentation reformat, replaces ad-hoc `boocode_batchN.md` convention. v1.13.x or v1.14.
+- **`view_session_history` MCP tool** — `memovai/memov` `snap`/`mem_history`/`validate_commit` shape. Reference design for v1.13+ session-history feature.
+- **`taste-skill` anti-slop ban list** — vendor `Leonxlnx/taste-skill` SKILL.md after diff against existing `frontend-design` skill. Real value at v2.0+ when BooCoder generates frontend code (DubDrive, BooLab, Fathom).
+- **AgentLint audit pass** — manual review of BooCode's own CLAUDE.md/AGENTS.md/BOOCHAT.md/BOOCODER.md using `0xmariowu/AgentLint`'s 31 evidence-backed checks. Trim emphasis-keyword density, hit 60–120 line sweet spot, SHA-pin Actions, ensure `.env`/`CLAUDE.local.md` are gitignored. One-evening pass, immediate ROI. Optional plugin install at v1.12.x post-merge for ongoing audits.
+- **`budi` install (Sam's host)** — `siropkin/budi` Claude Code 5-hook observer (`SessionStart`/`UserPromptSubmit`/`PostToolUse`/`SubagentStart`/`Stop`). Local SQLite, sub-ms hook latency, dashboard at `localhost:7878`. Not a BooCode lift — install globally for Claude Code session observability.
 - **Multi-provider LLM** (pi-ai pattern): Only if a concrete need for Anthropic / OpenAI / Mistral direct surfaces. llama-swap covers everything today.
 - **Workflow graphs** (microsoft/agent-framework concepts): Multi-agent coordination. Conceptual reference only. Realistically a v3.x topic.
+- **Secret storage primitive (prerequisite for remote OAuth MCP servers).** Pick between: sops-encrypted entries in PostgreSQL, HashiCorp Vault sidecar, or OS-level keyring on `ubuntu-homelab` accessed via a thin service. Unblocks remote OAuth MCP servers in BooCode generally. v2.x or earlier if a remote OAuth server (Sentry, Atlassian, etc.) becomes urgent.
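
For the HMAC-chained audit log item above, a minimal sketch of the general technique (each entry's MAC covers the previous entry's MAC, so editing or removing an earlier entry invalidates everything after it). This is not the `bernstein` code; the `AuditEntry` shape, the genesis value, and the function names are assumptions made for illustration.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

interface AuditEntry {
  payload: string; // serialized session event (format is an assumption)
  mac: string; // hex HMAC-SHA256 over previous mac + payload
}

// Hypothetical starting value for the chain.
const GENESIS = "0".repeat(64);

function macFor(key: string, prevMac: string, payload: string): string {
  return createHmac("sha256", key).update(prevMac).update("\n").update(payload).digest("hex");
}

// Returns a new log with the entry appended; the input log is not mutated.
function appendEntry(key: string, log: AuditEntry[], payload: string): AuditEntry[] {
  const prev = log[log.length - 1]?.mac ?? GENESIS;
  return [...log, { payload, mac: macFor(key, prev, payload) }];
}

// Recomputes the chain from the start; any edited or reordered entry fails.
function verifyLog(key: string, log: AuditEntry[]): boolean {
  let prev = GENESIS;
  for (const entry of log) {
    const expected = Buffer.from(macFor(key, prev, entry.payload), "hex");
    const actual = Buffer.from(entry.mac, "hex");
    if (expected.length !== actual.length || !timingSafeEqual(expected, actual)) return false;
    prev = entry.mac;
  }
  return true;
}
```

Truncating the tail of the log is not detected by the chain alone; that would need the latest MAC stored somewhere the log writer cannot rewrite.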

 -----

 ## Architecture target state

-### Containers
+### Containers (post-v2.0)

-| Container | Port | Mount | Purpose | Status |
-|---|---|---|---|---|
-| `boocode` | `100.114.205.53:9500` | `/opt:/opt:ro` | Chat + read-only tools + SPA | Live |
-| `boocode_db` | `127.0.0.1:5500` | `boocode_pgdata` volume | Postgres 16-alpine | Live |
-| `booterm` | `100.114.205.53:9501` | `/opt/repos:/opt/repos:rw` | Terminals (tmux + node-pty) | Live (v1.10.0) |
-| `codecontext` | `:8765` (internal) | `/opt/projects:/workspace:ro` | MCP server for architect tools | v1.12 |
-| `boocoder` | `100.114.205.53:9502` | per-session sandbox | Write tools | v2.0 |
+|Container                      |Port                 |Mount                        |Purpose                                                                 |Status                |
+|-------------------------------|---------------------|-----------------------------|------------------------------------------------------------------------|----------------------|
+|`boochat` (was `boocode`)      |`100.114.205.53:9500`|`/opt:/opt:ro`               |Read-only chat + SPA host + MCP client                                  |Live (renames at v2.0)|
+|`booterm`                      |`100.114.205.53:9501`|`/opt:/opt`                  |PTY/tmux terminal sessions                                              |**Live (May 2026)**   |
+|`boocoder`                     |`100.114.205.53:9502`|`/opt:/opt:rw` (policy-gated)|Write tools + ACP host + MCP client + MCP server + external-CLI dispatch|v2.0                  |
+|`boochat_db` (was `boocode_db`)|`127.0.0.1:5500`     |`boocode_pgdata` volume      |Postgres 16-alpine (shared by all three)                                |Live (renames at v2.0)|
+|`codecontext`                  |`:8765` (internal)   |`/opt/projects:/workspace:ro`|MCP server for architect tools                                          |**Live (v1.12.0)**    |
+
+### Caddy routing target (post-v2.0)
+
+```
+code.indifferentketchup.com         → boochat   :9500   (SPA + chat API + MCP client)
+coder.indifferentketchup.com        → boocoder  :9502   (SPA + write API + MCP client + MCP server HTTP)
+coder.indifferentketchup.com/mcp    → boocoder  :9502   (BooCoder MCP server endpoint, when remote-MCP unlocked)
+term.indifferentketchup.com         → booterm   :9501   (or routed under code.*/term/)
+```

 ### Schema additions by version

 - **v1.11.0:** `messages.compacted_at`, `messages.summary`, `messages.tail_start_id`, `chats.needs_compaction`
 - **v1.11.7:** none (pathGuard logic, no DB)
- **v1.12:** none (codecontext is stateless on disk; truncation uses in-memory id→path map with TTL cleanup)
- **v1.13:** `message_parts` table; `messages` becomes header-only
+- **v1.12.0:** none (codecontext stateless; truncation in-memory id-map with TTL cleanup)
+- **v1.12.1:** `sessions.workspace_panes jsonb` (workspace sync); drop deprecated `session_panes` table; drop stale `messages_status_check` constraint
+- **v1.13.0:** `message_parts (id, message_id, sequence, kind, payload jsonb, created_at)` + unique `(message_id, sequence)` + `kind` CHECK; `ToolDef.category` field (TS type, not DB)
+- **v1.13.1-B:** `messages_with_parts` view with COALESCE fallbacks
+- **v1.13.3:** `ALTER DATABASE boocode SET statement_timeout = '30s'` (op step, documented in schema.sql; doesn't survive volume reset)
+- **v1.13.4:** `message_parts.hidden_at TIMESTAMPTZ` column + partial index `(message_id) WHERE hidden_at IS NULL`; `messages_with_parts` view filters hidden parts
+- **v1.13.5:** none (tmpfs id-map stored on disk under `BOOCODE_TRUNCATION_DIR`; no schema)
+- **v1.13.6:** none (compaction read-side change; `CompactionMessage` extended in TS, not DB)
+- **v1.13.7:** none (provider config + 4 frontend/payload guards + budget constant, no schema change)
+- **v1.13.8 (planned):** none — verify-and-measure batch, instrumentation only; drops the originally-planned `system_prompt_cache` table since recon proved input-layer mtime caches already achieve prefix stability
+- **v1.13.9 (planned):** none (compaction overflow trigger is a constant change in `services/compaction.ts`, no DB)
+- **v1.13.10 (planned):** `tool_cost_stats (tool_name, prompt_tokens_sum, completion_tokens_sum, n_calls, updated_at)` — rolling 100-call window
+- **v1.13.2 (planned):** drop `messages.tool_calls`, `messages.tool_results`; simplify `messages_with_parts` view
 - **v1.14:** `agents.steps` column (or AGENTS.md parser extension; no DB if file-only)
- **v1.15:** `permissions` table, `agent_permissions` join, `session_permissions` join
+- **v1.14.x-mcp (NEW):** none — single-server MCP-client PoC is config-only at first, no schema change
+- **v1.14.x-html (NEW):** `message_parts.kind` CHECK constraint extended with `'html_artifact'` value
+- **v1.15:** `permissions` table, `agent_permissions` join, `session_permissions` join, `mcp_servers (name, type, transport, url_or_command, enabled, config_hash, last_probed_at)` registry
 - **v1.16:** `repo_health_cache (project_id, file_hashes_sig, payload JSONB, created_at)`
- **v2.0:** `pending_changes (id, session_id, file_path, diff TEXT, status, created_at)`
+- **v2.0:** `pending_changes (id, session_id, file_path, diff TEXT, status, created_at)`; `tasks`, `task_templates`, `pipelines`, `pipeline_runs`; `available_agents (name, install_path, version, supports_acp, supports_mcp_client, last_probed_at)`; `human_inbox` view; DB rename `boocode_db` → `boochat_db`
+- **v2.2:** none (`boocoder acp` is a new entry point, not a schema change)
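
The v1.13.10 entry describes `tool_cost_stats` as a rolling 100-call window. One way to read that, sketched in TypeScript under assumptions: the per-call costs are held in memory per tool, the oldest call is evicted once the window is full, and the sums are recomputed for the row. The class name, the `CallCost` shape, and the in-memory storage are all hypothetical; the roadmap only names the columns.

```typescript
interface CallCost {
  promptTokens: number;
  completionTokens: number;
}

// Column names follow the roadmap entry; updated_at is omitted for brevity.
interface ToolCostRow {
  tool_name: string;
  prompt_tokens_sum: number;
  completion_tokens_sum: number;
  n_calls: number;
}

class RollingToolCost {
  private readonly calls = new Map<string, CallCost[]>();
  private readonly windowSize: number;

  constructor(windowSize: number = 100) {
    this.windowSize = windowSize;
  }

  record(toolName: string, cost: CallCost): void {
    const window = this.calls.get(toolName) ?? [];
    window.push(cost);
    if (window.length > this.windowSize) window.shift(); // evict the oldest call
    this.calls.set(toolName, window);
  }

  // Aggregates the current window into the shape of one table row.
  row(toolName: string): ToolCostRow {
    const window = this.calls.get(toolName) ?? [];
    return {
      tool_name: toolName,
      prompt_tokens_sum: window.reduce((sum, c) => sum + c.promptTokens, 0),
      completion_tokens_sum: window.reduce((sum, c) => sum + c.completionTokens, 0),
      n_calls: window.length,
    };
  }
}
```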

 -----

-## Lift sources (summary)
+## Lift sources (headline table)

-Full inventory in `boocode_code_review.md`. Headline items:
+Full inventory and rationale in `boocode_code_review.md`. Headline items below; `anomalyco/opencode` is canonical (not `sst/opencode` — correction 2026-05-22).

-| Source | Used for | Where |
-|---|---|---|
-| **`sst/opencode`** (MIT, TS) | **Compaction algorithms** | **v1.11.0 (shipped)** |
-| `sst/opencode` (MIT, TS) | Doom-loop guard | v1.11.6 |
-| `sst/opencode` (MIT, TS) | `repairToolCall`, truncate.ts, MCP client, permission evaluate, runLoop | v1.12/v1.13/v1.14/v1.15 |
-| `continuedev/continue` (Apache-2.0) | `DEFAULT_SECURITY_IGNORE_FILETYPES` | v1.11.7 |
-| `nmakod/codecontext` (MIT, Go) | Architect: codebase map sidecar | v1.12 |
-| `spirituslab/codesight` (MIT-ish, TS) | Architect: repo health analyzer | v1.16 |
-| `Aider-AI/aider` (Apache-2.0) | Fallback `.scm` grammars | v1.12 (fallback) |
-| `cline/cline` (Apache-2.0) | Plan/Act pattern (absorbed into v1.15 permissions) | v1.15 |
-| `plandex-ai/plandex` (MIT) | Pending-changes data model | v2.0 |
-| `OpenHands/OpenHands` (MIT) | Sandbox runtime contract | v2.1 |
-| `aimasteracc/tree-sitter-analyzer` (MIT) | Outline-first patterns | v1.12 (alt) |
-| `earendil-works/pi` (MIT) | Multi-provider LLM | v2.x (optional) |
-
-**Original Batch 13 (event log from OpenHands) replaced** by v1.13 (parts table). Same outcome, different framing.
+|Source                                                                          |License                                 |Used for                                                                                                                  |Where                                         |
+|--------------------------------------------------------------------------------|----------------------------------------|--------------------------------------------------------------------------------------------------------------------------|----------------------------------------------|
+|`anomalyco/opencode`                                                            |MIT, TS                                 |Compaction algorithms (`session/compaction.ts` + `session/overflow.ts`)                                                   |v1.11.0 ✅                                     |
+|`anomalyco/opencode`                                                            |MIT, TS                                 |Doom-loop guard (`session/processor.ts` `DOOM_LOOP_THRESHOLD=3`)                                                          |v1.11.6 ✅                                     |
+|`continuedev/continue`                                                          |Apache-2.0                              |`DEFAULT_SECURITY_IGNORE_FILETYPES`                                                                                       |v1.11.7 ✅                                     |
+|`nmakod/codecontext`                                                            |MIT, Go                                 |Architect: codebase map sidecar (8 MCP-shaped tools, static-wrapped)                                                      |v1.12.0 ✅                                     |
+|`anomalyco/opencode`                                                            |MIT, TS                                 |AI SDK v6 adoption + `streamText` swap + ReasoningPart shape                                                              |v1.13.1 ✅                                     |
+|`anomalyco/opencode`                                                            |MIT, TS                                 |Parts-message taxonomy (text/tool_call/tool_result/reasoning/step_start)                                                  |v1.13.0 ✅                                     |
+|`anomalyco/opencode`                                                            |MIT, TS                                 |`experimental_repairToolCall` via AI SDK v6                                                                               |v1.13.3 ✅                                     |
+|`anomalyco/opencode`                                                            |MIT, TS                                 |Two-tier compaction prune (`message_parts.hidden_at` + tier logic)                                                        |v1.13.4 ✅                                     |
+|`anomalyco/opencode`                                                            |MIT, TS                                 |`tool/truncate.ts` truncation + outputPath pattern (adapted: opaque id)                                                   |v1.13.5 ✅                                     |
+|`anomalyco/opencode`                                                            |MIT, TS                                 |0.85×ctx_max overflow trigger formula                                                                                     |v1.13.9 (planned)                             |
+|`anomalyco/opencode`                                                            |MIT, TS                                 |`session/prompt.ts` `runLoop()` outer agent loop + `agent.steps` cap                                                      |v1.14                                         |
+|**Anthropic MCP SDK (TypeScript)**                                              |**MIT**                                 |**MCP client, single-server PoC**                                                                                         |**v1.14.x-mcp**                               |
+|**`claude.com/blog/using-claude-code-the-unreasonable-effectiveness-of-html`**  |**(blog, pattern only)**                |**HTML-output bias rule + use-case taxonomy**                                                                             |**v1.14.x-html**                              |
+|**`anthropics/skills/web-artifacts-builder`**                                   |**MIT (design-principle reference)**    |**"Avoid AI slop" conventions inline in AGENTS.md**                                                                       |**v1.14.x-html**                              |
+|**`mgechev/skills-best-practices`**                                             |**MIT (pattern)**                       |**4-step skill validation protocol with paste-ready prompts**                                                             |**v1.13.12 (skills audit)**                   |
+|**`mgechev/skillgrade`**                                                        |**MIT**                                 |**Agent-agnostic skill eval framework (eval.yaml + smoke/reliable/regression presets)**                                   |**v1.13.12 (skills audit) + ongoing**         |
+|**`blog.codeminer42.com/stop-putting-best-practices-in-skills/`**               |**(blog, pattern only)**                |**Rules→recipes split: skills 6% invoke vs AGENTS.md 100% present**                                                       |**v1.13.12 (skills audit)**                   |
+|**`platform.claude.com/docs/.../agent-skills/best-practices`**                  |**(docs, canonical)**                   |**500-line ceiling, gerund naming, progressive-disclosure patterns, MCP `ServerName:tool_name` format**                   |**v1.13.12 + all future skills**              |
+|`anomalyco/opencode`                                                            |MIT, TS                                 |`permission/evaluate.ts` wildcard ruleset                                                                                 |v1.15                                         |
+|`anomalyco/opencode`                                                            |MIT, TS                                 |`mcp/index.ts` MCP client (stdio + SSE, tools/list, tools/call, OAuth RFC 7591)                                           |v1.15                                         |
+|`Aider-AI/aider`                                                                |Apache-2.0                              |Fallback `aider/queries/tree-sitter-*.scm` grammars                                                                       |v1.12 (fallback)                              |
+|`cline/cline`                                                                   |Apache-2.0                              |Plan/Act invariant (absorbed into v1.15 permissions)                                                                      |v1.15                                         |
+|`spirituslab/codesight`                                                         |MIT-ish                                 |Repo health analyzer (`analyze.mjs`)                                                                                      |v1.16                                         |
+|`plandex-ai/plandex`                                                            |MIT                                     |Pending-changes data model + diff/apply/rewind UX                                                                         |v2.0                                          |
+|`Dominic789654/agent-hub`                                                       |Apache-2.0                              |**Task DAG schema, dispatcher worker, project registry, human inbox** — primary architectural template for v2.0 dispatcher|v2.0                                          |
+|`getpaseo/paseo`                                                                |AGPL-3.0 (**design only, no code lift**)|Daemon+clients arch, CLI verb shape, `--worktree` flag, three skills concept                                              |v2.0 / v2.x                                   |
+|**`agentclientprotocol.com` spec + `@zed-industries/agent-client-protocol` SDK**|**Apache-2.0**                          |**ACP client (host) — replaces raw-PTY dispatch for opencode/goose**                                                      |**v2.0**                                      |
+|**anthropics/skills `mcp-builder`**                                             |**MIT**                                 |**MCP server build workflow + 10-question evaluation framework**                                                          |**v2.0 (BooCoder MCP server)**                |
+|**`zed-industries/codex-acp`**                                                  |**Apache-2.0**                          |**ACP server-side reference for `boocoder acp`**                                                                          |**v2.2**                                      |
+|Roo Code: Boomerang Tasks                                                       |Apache-2.0 (pattern only)               |Orchestrator capability restriction + down-pass/up-pass context discipline                                                |v1.14 (AGENTS.md) → v2.0 (real delegation)    |
+|`covibes/zeroshot`                                                              |MIT (pattern only)                      |Blind-validation invariant + complexity-classification conductor                                                          |v1.14 (AGENTS.md) → v2.0 (verify gate)        |
+|`OpenHands/OpenHands`                                                           |MIT                                     |Sandbox runtime contract                                                                                                  |v2.1                                          |
+|`qodo-ai/agents`                                                                |MIT                                     |`agent.toml` schema (output_schema, exit_expression, execution_strategy)                                                  |v1.14                                         |
+|`qodo-ai/qodo-cover`                                                            |AGPL-3.0 (re-implement, don't vendor)   |Record/replay LLM response harness                                                                                        |v1.13+ tests                                  |
+|`qodo-ai/qodo-skills`                                                           |MIT                                     |PR-resolver state machine + provider-CLI adapter pattern                                                                  |v2.0+                                         |
+|`augmentcode/augment-swebench-agent`                                            |MIT                                     |Majority-vote ensembler (K diffs → ranker → winner) + JSONL schema                                                        |v2.0+ optional                                |
+|`eyaltoledano/claude-task-master`                                               |MIT+Commons Clause (pattern only)       |Tiered tool loading via env var + three model roles                                                                       |v1.13.x / v1.14                               |
+|`Fission-AI/OpenSpec`                                                           |permissive (verify)                     |`openspec/changes/<name>/{proposal,specs,design,tasks}.md` structure for batch docs                                       |v1.13.x / v1.14                               |
+|`0xmariowu/AgentLint`                                                           |MIT                                     |31 evidence-backed checks for CLAUDE.md/AGENTS.md quality                                                                 |Immediate manual pass; v1.12.x optional plugin|
+|`Leonxlnx/taste-skill`                                                          |MIT                                     |Anti-slop ban list + 3-dial parameterization pattern                                                                      |v2.0+ (BooCoder frontend output)              |
+|`RA.Aid` (ai-christianson)                                                      |Apache-2.0 (pattern only)               |Three-stage Research/Planning/Implementation + expert-tool escape hatch                                                   |v1.14 (AGENTS.md)                             |
+|`memovai/memov`                                                                 |MIT (pattern only)                      |`.mem` shadow timeline + `snap`/`validate_commit` MCP tool shape                                                          |v1.13+ history tool design; v2.0+ drift gate  |
+|`sipyourdrink-ltd/bernstein`                                                    |(verify)                                |HMAC-chained audit log primitive                                                                                          |v1.13+ optional                               |
+|`aimasteracc/tree-sitter-analyzer`                                              |MIT                                     |Outline-first patterns (`trace_impact` tool)                                                                              |v1.12 (alt) / unscheduled                     |
+|`earendil-works/pi`                                                             |MIT                                     |Multi-provider LLM (`pi-ai`)                                                                                              |v2.x (optional)                               |
+|`siropkin/budi` (tooling, not lift)                                             |MIT                                     |Claude Code 5-hook observer for Sam's host workflow                                                                       |Immediate (install globally)                  |
+|**`aaif-goose/goose`**                                                          |**Apache-2.0**                          |**ACP agent (`goose acp`) — dispatched alongside opencode in v2.0 Path B**                                                |**v2.0 (host install)**                       |
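
Two of the guards named in the table, the doom-loop threshold and the 0.85×ctx_max overflow trigger, are small enough to sketch. The code below is an illustration under assumptions, not the opencode source: treating a "doom loop" as N identical consecutive tool calls, comparing arguments by their JSON serialization, and using `>=` for the overflow comparison are all guesses; only the constants 3 and 0.85 come from the table.

```typescript
const DOOM_LOOP_THRESHOLD = 3;
const OVERFLOW_RATIO = 0.85;

interface ToolCall {
  name: string;
  args: unknown;
}

// True when the last `threshold` calls have the same name and identical arguments.
function isDoomLoop(history: ToolCall[], threshold: number = DOOM_LOOP_THRESHOLD): boolean {
  if (threshold < 1 || history.length < threshold) return false;
  const tail = history.slice(-threshold);
  const key = (c: ToolCall): string => `${c.name}:${JSON.stringify(c.args)}`;
  const [head] = tail;
  if (!head) return false;
  const first = key(head);
  return tail.every((c) => key(c) === first);
}

// True once used context reaches 85% of the model's maximum context.
function shouldCompact(ctxUsed: number, ctxMax: number): boolean {
  return ctxUsed >= OVERFLOW_RATIO * ctxMax;
}
```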

 -----

 ## Decisions log

+- **v1.13.7 stability bundle (2026-05-22, uncommitted).** Five-fix sweep during the cosmetic-revert investigation surfaced two production-affecting regressions latent since v1.13.1-A. (1) **`@ai-sdk/openai-compatible` `includeUsage` defaults to false** — `provider.ts` never asked llama-swap to emit usage, so `tokens_used`/`ctx_used` had been NULL in every assistant row since v1.13.1-A. The fix is one line at `provider.ts:18`. No backfill for historical rows. (2) **AI SDK v6 streaming emits a stray `\n` text-delta on tool-call-only turns**, which passed `content.length > 0` and rendered an empty bubble + ActionRow between each tool call. Trim in `MessageList.flatten` (`hasText`) and defensively in `MessageBubble` (`hasContent`). (3) **`buildMessagesPayload` did not filter trailing empty or failed assistant rows** — combined with (2), a Continue retry produced `…summary-assistant, empty-assistant, failed-assistant` payloads and the upstream rejected with "Cannot have 2 or more assistant messages at the end of the list." Skip rules added at `payload.ts:64`. (4) **`BUDGET_NO_AGENT` bumped 15→30.** Every tool in `ALL_TOOLS` is read-only today; the cautious 15-cap was forward-looking for write tools that haven't landed. No-agent mode now matches `BUDGET_READ_ONLY`. None of the five changes touch schema or compaction — they're cleanup against a "v1.13.1-A regression that hadn't been caught yet" surface.
+- **Skills taxonomy locked: AGENTS.md = rules, skills = recipes (2026-05-22).** Codeminer42's multi-turn eval showed plain skills invoke 6% in clean runs vs `CLAUDE.md`/`AGENTS.md` 100% present. **General workflow rules (TDD, paraphrase-before-quote, security gotchas, "never git pull/commit/push", alpha-tool-ordering, codecontext-not-RAG) belong in `AGENTS.md`; specific on-demand procedures (`/skill scaffold-component`, `/skill run-release-checklist`) belong in skills.** Hooks are for automation, not instruction delivery. The 7 vendored v1.12 skills get an audit pass in **v1.13.12** to sort each into the 4-way split (move to AGENTS.md / keep as recipe / move bulky context to `references/` / delete). Validation via `mgechev/skills-best-practices` 4-step protocol + `mgechev/skillgrade --smoke` per skill. Anthropic's `agent-skills/best-practices` page becomes the canonical convention reference (500-line ceiling, gerund naming, MCP `ServerName:tool_name` format, progressive disclosure one level deep, etc.). Documented in `BOOCHAT.md` / `BOOCODER.md` to future-proof against re-adding workflow rules as skills.
+- **HTML artifacts in BooChat locked (2026-05-22).** Adopt Thariq Shihipar's "HTML > Markdown for outputs >100 lines" pattern. AGENTS.md gets the HTML-bias rule. Backend detection emits new `html_artifact` part kind. Frontend renders in three places: inline iframe preview in chat stream, "open in pane" workspace splitter integration, and download to `/opt/<project>/.boocode/artifacts/<slug>-<timestamp>.html`. Security: `sandbox="allow-scripts allow-clipboard-write allow-downloads"` with no `allow-same-origin`, CSP `connect-src 'none'`, `srcdoc=` inline (not `src=`). All of Thariq's interactive examples (sliders/knobs/SVG diagrams/copy-as-JSON) work under this sandbox because they're entirely client-side. Don't vendor `anthropics/skills/web-artifacts-builder` — its Vite + Parcel toolchain can't run in BooChat (no shell). Treat the skill's "avoid AI slop" rules as design conventions inlined in AGENTS.md.
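
The sandbox settings in the HTML-artifacts decision can be sketched as a pure string builder. This is a hedged illustration: delivering the CSP as an injected `<meta http-equiv>` tag is an assumption (the decision only states the policy), and `injectCsp`, `escapeAttr`, and `renderArtifactIframe` are hypothetical names. A React frontend would normally pass `sandbox` and `srcDoc` as props instead of building markup by hand.

```typescript
// Sandbox tokens from the decision above; allow-same-origin is deliberately absent.
const SANDBOX = "allow-scripts allow-clipboard-write allow-downloads";
const CSP_META = `<meta http-equiv="Content-Security-Policy" content="connect-src 'none'">`;

// Places the CSP tag right after <head> when present, else at the very start.
function injectCsp(artifactHtml: string): string {
  const headTag = /<head(\s[^>]*)?>/i;
  if (headTag.test(artifactHtml)) {
    return artifactHtml.replace(headTag, (match) => match + CSP_META);
  }
  return CSP_META + artifactHtml;
}

// Escapes the two characters that matter inside a double-quoted attribute value.
function escapeAttr(value: string): string {
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

// Inline srcdoc (not src) keeps the artifact in an opaque origin with no network fetch.
function renderArtifactIframe(artifactHtml: string): string {
  const doc = escapeAttr(injectCsp(artifactHtml));
  return `<iframe sandbox="${SANDBOX}" srcdoc="${doc}"></iframe>`;
}
```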
+
+### MCP and ACP protocol roles per surface (2026-05-22, locked)
+
+- **BooChat = MCP client only.** Read-only tool consumer. Per-server `enabled` flag. **Hard rule: never enable a write-capable MCP server**; the read-only invariant overrides protocol convenience. Defense-in-depth: the client must reject any tool whose `annotations.readOnlyHint` (the MCP tool-annotation field) is false or absent.
+- **BooCoder = MCP client + MCP server + ACP client (host) + ACP agent (driveable).** Full matrix.
+  - **MCP client role:** inherits v1.15 client; write-capable servers allowed but writes route through `pending_changes` queue.
+  - **MCP server role:** BooCoder exposes its own task primitives (`boocoder.create_task` etc.) so external `opencode` sessions in Termius become BooCoder-aware. Stdio for local, HTTP gated on OAuth+secret storage.
+  - **ACP client (host) role:** replaces raw-PTY dispatch for ACP-capable agents (opencode, goose). PTY retained as fallback for claude/pi/smallcode. Critical pattern: ACP clients auto-forward MCP `context_servers` to the dispatched agent (per goose docs) — one MCP config drives every dispatched agent.
+  - **ACP agent role:** `boocoder acp` exposes BooCoder to Zed/JetBrains/Avante.nvim. Deferred to v2.2.
+- **Why BooChat doesn't get ACP:** ACP standardizes the editor→agent direction. BooChat doesn't drive agents; it *is* the chat. Adding ACP-agent to BooChat would convert it into an opencode-equivalent — different product. Skip.
+- **MCP/ACP integration phasing:** v1.14.x (single-server MCP-client PoC against Context7) → v1.15 (full MCP client + permissions) → v2.0 (BooCoder full matrix: write-capable MCP client + MCP server + ACP client) → v2.2 (BooCoder ACP agent for external editor drive).
+- **Reference materials:** anthropics `mcp-builder` skill (4-phase build workflow + 10-question eval framework — required for BooCoder's MCP server before shipping), opencode MCP/ACP docs as JSON-schema interop reference, goose ACP docs for the `context_servers` auto-forward pattern, `agentclientprotocol.com` spec (note: remote ACP via HTTP/WS still WIP, v2.0 uses stdio only).
+- **v1 MCP scope limit (security):** local-stdio MCP servers + Context7-style API-key remote only. Remote OAuth MCP servers (Sentry, Atlassian, etc.) deferred until BooCode has a real secret-storage primitive — token leakage from a PostgreSQL dump or Authelia bypass is a real attack surface that doesn't exist with local-stdio MCP.
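
The BooChat defense-in-depth rule (reject any tool not explicitly marked read-only) is a fail-closed filter over the `tools/list` result. A minimal sketch, assuming the flag is the MCP tool annotation `readOnlyHint`; the `McpTool` type is trimmed to the fields used here and `filterReadOnlyTools` is a hypothetical helper name.

```typescript
// Trimmed tool shape: only the fields this check needs.
interface McpTool {
  name: string;
  annotations?: { readOnlyHint?: boolean };
}

interface FilterResult {
  allowed: McpTool[];
  rejected: string[]; // tool names, kept for logging
}

// Fail-closed: a missing annotation is treated the same as readOnlyHint: false.
function filterReadOnlyTools(tools: McpTool[]): FilterResult {
  const allowed: McpTool[] = [];
  const rejected: string[] = [];
  for (const tool of tools) {
    if (tool.annotations?.readOnlyHint === true) {
      allowed.push(tool);
    } else {
      rejected.push(tool.name);
    }
  }
  return { allowed, rejected };
}
```

Annotations are self-reported by the server, so this check backs up the per-server `enabled` flag and the "never enable a write-capable server" rule; it does not replace them.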
+
+### Monorepo / multi-app structure (2026-05-22, locked)
+
+- **BooCode is a 3-app monorepo** at `/opt/boocode/`: `apps/chat` (read-only, currently the live thing at 9500), `apps/coder` (write tools + external CLI dispatch, 9502, v2.0 planned), `apps/booterm` (PTY terminal, **live since May 2026 at 9501**). Shared `apps/server` (Fastify backend) and `apps/web` (React shell hosting the three surfaces as tabs).
+- **Single shared database, rename `boocode_db` → `boochat_db` when BooCoder lands.** All three surfaces in one Postgres. Cross-surface joins are valuable (coder task → originating chat → term debugging session). Separate databases would break this.
+- **Mount strategy: blanket `/opt:rw`, policy enforcement at the write-tool layer.** Per-project scoping is logic, not mount. Path-guard correctness becomes the highest-priority test target for v2.0 — fuzz it, property-test it, every traversal-attack pattern (including MCP-served filesystem writes).
+- **External CLI agents on the host, not in containers.** BooCoder shells out via local-exec PTY or ACP subprocess (`node-pty`, host shell, or `child_process.spawn('opencode', ['acp'])`). Host install inherits Sam's existing `~/.opencode/`, `~/.claude/`, `~/.config/goose/` configs without re-mounting. Containerize later only if a concrete reason emerges.
+
+### Strategic pivot: Paseo-equivalent dispatcher (2026-05-22)
+
+Sam wants BooCode to function like Paseo without using Paseo itself. **Paseo is AGPL-3.0** — incompatible with BooCode's MIT license and its network-served deployment at `code.indifferentketchup.com`. Solution: **reproduce the architecture in BooCode's existing Fastify + TS + PostgreSQL + React stack, using only license-clean patterns**.
+
+- **Primary architectural template:** `Dominic789654/agent-hub` (Apache-2.0) — three-process model (board server + dispatcher + assistant terminal) and schema (tasks/projects/templates/pipelines/human_inbox).
+- **Critical context-management primitive:** Roo Code Boomerang Tasks pattern — orchestrator with intentional capability restriction, down-pass/up-pass context discipline, no implicit inheritance.
+- **Observation pattern:** Claude Code hooks (siropkin/budi reference) — register BooCode as the hook receiver for `SessionStart`/`UserPromptSubmit`/`PostToolUse`/`SubagentStart`/`Stop`.
+- **Protocol-level Paseo equivalence:** the ACP client + MCP server combination in BooCoder is the protocol-spelled version of Paseo's daemon. ACP gives multi-agent dispatch with structured events instead of free-form PTY output. MCP server gives BooCoder-as-task-board, callable from any MCP client (Termius-based opencode, future editors). One MCP config feeds every dispatched agent (via `context_servers` auto-forward).
+
+This is now the dominant roadmap direction, **ahead of v1.13.x cleanup batches in importance** but **behind them in sequence** (v1.13 finishing now; Paseo-equivalent work is v2.0+).
+
+### BooCoder execution: both Option A AND Option B, full-featured (2026-05-22)
+
+Earlier May 18 chat recommended Option A (thin orchestration shell over OpenCode) but explicitly called the choice not-locked. Sam's call this session: ship **both** paths in the same BooCoder surface. **Option B / in-process loop** handles interactive write work with native tools + pending-changes UI (v2.0 plandex pattern). **Option A / PTY-or-ACP dispatch** handles parallel/batch work where Sam wants to A/B opencode vs claude vs goose vs pi against the same task in separate worktrees. User picks per task. **ACP replaces raw PTY wherever the agent supports it** (opencode, goose); PTY fallback retained for claude/pi/smallcode.
+
+### v1.13.x cleanup line locked (2026-05-22)
+
+After v1.13.1-C shipped clean, the cleanup order is **v1.13.3 ✅ → v1.13.4 ✅ → v1.13.5 ✅ → v1.13.6 ✅ → v1.13.7 ✅ → v1.13.8 (verify) → v1.13.9 (overflow) → v1.13.10 → v1.13.11 → v1.13.12 → v1.13.2** (column drop last as rollback insurance). **Do not fold.** Smoke isolation matters: each batch has a distinct rollback surface, and bisecting a 750-LoC merge across four unrelated changes is worse than four separate dispatches.
+
+### v1.13 retrospective (what shipped)
+
+- **v1.13.0** — `message_parts` table + dual-write at every JSON-write site. Old columns authoritative for reads. Reversible.
+- **v1.13.1-A** — AI SDK v6 (`ai@^6`, `@ai-sdk/openai-compatible@^2`). `streamCompletion` rewritten as `streamText` adapter. Silent-abort bug caught and patched (explicit `if (signal?.aborted) throw`). Known regression: mid-stream tps gone — TODO for delta-cadence interpolation against `result.usage`. **Latent regression discovered v1.13.7:** `includeUsage` defaults false on `@ai-sdk/openai-compatible`, so `result.usage` resolved empty all along; tokens_used/ctx_used NULL in every row since this version. Fixed in v1.13.7.
+- **v1.13.1-B** — `messages_with_parts` view with COALESCE fallbacks. Read sites switched. 1ms for 42-message chat verified.
+- **v1.13.1-C** — `ask_user_input` correlation ported to parts; reasoning end-to-end (361 chars reasoning at seq 0, 429 chars text at seq 1 in smoke). `v1.13.1` tagged on `ac1a71f`. **Latent regression discovered v1.13.6:** reasoning was wired into the inference payload but NOT into compaction's head-assembly payload — summarizer model couldn't see reasoning for tool-bearing turns, degrading qwen3.6 summary quality. Fixed in v1.13.6.
+- **v1.13.3** — bundle: statement_timeout=30s, alpha tool ordering, periodic stuck-row sweeper, repairToolCall wiring. Tagged on `a08d809`.
+- **v1.13.4** — two-tier compaction prune. Tagged on `ec8593c`.
+- **v1.13.5** — opencode truncate.ts port + view_truncated_output tool. Tagged on `f8fc5db`.
+- **v1.13.6** — compaction head-assembly audit + reasoning fix. Closed the Q3 reasoning gap from v1.13.1-C. Tagged on `81d837c`.
+- **v1.13.7** — stability bundle: includeUsage fix + trim guards + payload filter + budget bump. Surfaces tokens (closes a v1.13.1-A latent regression where `result.usage` resolved empty), kills the empty-bubble + ActionRow noise between tool calls on single-tool-call turns, and unblocks Continue after cap-hit on chats that have trailing empty/failed assistants.
+- **v1.13.2 deferred** — at least one week of production traffic on v1.13.1 before dropping legacy columns. Dual-write is rollback insurance.
+
+### Pre-v1.13 architectural decisions (still load-bearing)
+
 - **Embeddings dropped from BooCode** (May 2026). Replaced RAG with file-view tools + sidecar analyzers.
+- **opencode promoted to Tier A** (2026-05-20). Five algorithms identified for lift (compaction, doom-loop, repairToolCall, runLoop, permission evaluate) plus truncate.ts and MCP client.
+- **OpenCode canonical repo: `anomalyco/opencode`, NOT `sst/opencode`** (correction 2026-05-22). Development moved to anomalyco; sst/opencode is the predecessor lineage. All 15 catalog references rewritten.
 - **Original Batch 11 (aider PageRank port) replaced** by codecontext sidecar approach.
- **Original Batch 12 (codebase indexer w/ Harrier) removed.** No embedding infrastructure in BooCode v1.x.
+- **Original Batch 12 (codebase indexer w/ Harrier) removed.** No embedding infrastructure.
+- **Original Batch 13 (OpenHands event log) replaced** by v1.13 parts table (opencode pattern).
+- **Original Batch 12 (cline plan/act mode) absorbed into v1.15** (opencode permission ruleset).
+- **Aider's `repomap.py` port dropped.** Codecontext supersedes it. Aider contribution narrows to the `.scm` query files only.
 - **Globstar parked** — not an architect tool. Future verify-before-commit candidate only.
 - **codeprysm rejected** — embedding-based. Node/edge taxonomy noted as reference if we ever build our own graph.
 - **Batch 9 decoupled from Batch 7 (2026-05-16); shipped in `92bd3b1`.** Builtin defaults: six agents (Code Reviewer, Debugger, Refactorer, Architect, Security Auditor, Prompt Builder) with no `model` field. Session model wins by default.
- **opencode lift opened** (2026-05-20). Started with compaction (v1.11.0). Continuing through v1.15. Five distinct algorithms: compaction, doom-loop guard, repairToolCall, runLoop, permission evaluate. Plus `truncate.ts` and `MCP client`. Each lifts the algorithm, not the Effect-TS plumbing.
- **AI SDK adoption deferred to v1.13.** Hand-roll repairToolCall in v1.12 first. Migrate everything together when parts table lands.
- **`tool_choice='required'` confirmed supported** by llama-swap (qwen3.6-35b-a3b-mxfp4, 2026-05-20). Unblocks repair tool call viability.
- **v1.11.4 cancelled** (2026-05-20). Per-turn budget reset + Continue affordance + CapHitSentinel were already shipped in v1.8.2. Roadmap was 14 versions stale at time of recon.
+- **AI SDK adoption deferred to v1.13** — and shipped as v1.13.1-A. v6 chosen (not v5) for native typed parts model and top-level `experimental_repairToolCall`.
+- **`tool_choice='required'` confirmed supported** by llama-swap (qwen3.6-35b-a3b-mxfp4, 2026-05-20).
+- **v1.12.0 shipped 2026-05-21.** codecontext sidecar Track B + container guidance Track A. v1.12 truncation and repairToolCall deferred into v1.13.
+- **v1.12.1 workspace pane sync** (2026-05-21). Moved pane state from per-device localStorage to `sessions.workspace_panes jsonb` with WS broadcast for cross-device sync. Deprecated `session_panes` table dropped. Legacy localStorage migrates on first load.
+- **v1.12.1 status indicator overhaul** (2026-05-21). ChatStatusFrame expanded from `working|idle|error` to `streaming|tool_running|waiting_for_input|idle|error`. StatusDot rewritten with distinct animations per state.
+- **detectSameNameLoop reverted in v1.12.1.** Added during the 2026-05-21 debugging spike, never fired in any real run. Dead code.
+- **The 2026-05-21 "freeze" debugging spike taught one lesson**: BooCode had no UI signal for the difference between a slow stream and a dead stream. v1.12.2 (live tok/s) and v1.12.3 (stale-stream banner) directly closed that gap. **v1.13's typed parts table made the inference state machine visible by construction** — the structural fix the spike pointed to.
+- **v1.12.4 refactor shipped 2026-05-21/22.** `inference.ts` (1700 LoC) split into `inference/` directory before v1.13 so the AI SDK migration had clean seams. `stream-phase.ts` became the swap target for `streamText`, `tool-phase.ts` got the per-tool `category` tag (added in v1.13.0). Pure structural move, no behavior change.
+- **AI SDK v6 silent-abort patched (v1.13.1-A).** `fullStream` returns normally on abort instead of throwing. Without explicit `if (signal?.aborted) throw` after the stream drain, stop button writes `complete` instead of `cancelled`. One-liner comment at the site so it survives future refactors.
+
+### Catalog growth (2026-05-22 deep review pass)
+
+The session-of-the-day catalog review added 50+ new entries to `boocode_code_review.md`. Decisions worth carrying into roadmap planning:
+
+- **Tier A active lifts unchanged:** opencode, codecontext, tree-sitter-analyzer, codesight, aider.
+- **Tier B / Tier C reviewed and triaged.** Most consequential additions: agent-hub (#48, primary v2.0 architectural template), Roo Boomerang Tasks (#46, v1.14 AGENTS.md pattern), zeroshot (#37, blind-validation invariant), AgentLint (#39, immediate manual audit pass), RA.Aid (#44, three-stage routing), OpenSpec (#36, batch-doc structure), bernstein (#49, HMAC audit log), memov (#42, session-history tool design), siropkin/budi (#51, install for Claude Code observability).
+- **Rejected as code sources:** kilocode, costrict, prompt-tower, mycoder, reviewcerberus (closed Docker), Junie (closed), Cody (parked), VS Code extensions broadly, all Web Builders, LynxPrompt (GPL-3.0), claude-task-master code (Commons Clause), Paseo source (AGPL).
+- **No additional code lifts promoted to a current version.** All catalog adds are either patterns (license-clean), references (for v2.0+), or one-off audit-pass items (AgentLint, budi install).

 -----

@@ -305,13 +636,13 @@ Full inventory in `boocode_code_review.md`. Headline items:
 Each batch:

 1. Verify previous batch merged. `git log --oneline main -5`.
-2. Cut branch from main. Single-branch-per-dispatch convention.
-3. Dispatch via Paseo to Claude Code at `/opt/boocode`.
-4. Claude Code recon → blocking questions → implement → hand back.
-5. Compliance review in separate Claude chat (paste handback).
-6. Build: `docker compose build --no-cache boocode` (no-cache avoids the v1.11.2 stale-bundle trap).
-7. Restart: `docker compose up -d boocode`.
-8. Smoke test in browser (hard refresh).
-9. Sam commits and pushes. **Never** `git pull` / `git push` / `git commit` on his behalf.
+1. Cut branch from main. Single-branch-per-dispatch convention.
+1. Dispatch via Paseo to Claude Code at `/opt/boocode`.
+1. Claude Code recon → blocking questions → implement → hand back.
+1. Compliance review in separate Claude chat (paste handback).
+1. Build: `docker compose build --no-cache <surface>` where surface is `boocode` (chat) / `booterm` / `boocoder` (v2.0+). No-cache avoids the v1.11.2 stale-bundle trap.
+1. Restart: `docker compose up -d <surface>`.
+1. Smoke test in browser (hard refresh).
+1. Sam commits and pushes. **Never** `git pull` / `git push` / `git commit` on his behalf.

-Sam reviews all diffs.
+Sam reviews all diffs. Backups before any destructive step: `cp file file.bak-$(date +%Y%m%d-%H%M%S)`.
--- a/openspec/README.md
+++ b/openspec/README.md
@@ -0,0 +1,38 @@
+# openspec
+
+Per-batch documentation convention adopted in v1.13.15-openspec.
+
+Lift source: Fission-AI/OpenSpec directory layout. **No CLI dependency** — just
+the folder shape. Full OpenSpec lifecycle adoption is a future v1.14+ batch.
+
+## Layout
+
+```
+openspec/
+  changes/
+    <slug>/                          # one folder per shipped or planned batch
+      proposal.md                    # Why + scope summary
+      tasks.md                       # implementation step list
+      design.md                      # architecture / data-model decisions (optional)
+      specs/                         # reserved for future OpenSpec CLI adoption
+    archived/                        # snapshots of pre-v1.13.15 batch docs
+      <original-filename>.md
+  specs/                             # global specs, future v1.14+ use
+```
+
+## Conventions
+
+- Slugs are lowercase-hyphenated derived from the batch title
+  (e.g. `v1-13-10-per-tool-cost`, `file-attachments-v3-5`).
+- Already-shipped pre-v1.13.15 batches live in `changes/archived/` as
+  single-file snapshots. They were not split into proposal/tasks because
+  the work was already complete; archiving preserves git history.
+- New v1.13.15+ batches should land directly in
+  `changes/<slug>/proposal.md` (+ tasks.md, + design.md when applicable).
+- `proposal.md` carries the "Why" and scope. `tasks.md` is the action list
+  (numbered or checkbox). `design.md` is for non-trivial architectural
+  decisions worth recording separately.
+- A canonical dispatch brief (matching the v1.13.9 / v1.13.10 format)
+  is most naturally split as proposal.md (Where we are, Why this matters,
+  rationale sections) + tasks.md (Scope items, Build + smoke) + design.md
+  (Attribution model, Filtering, Canonical mapping).
--- a/openspec/changes/archived/boocode_batch10.md
+++ b/openspec/changes/archived/boocode_batch10.md
--- a/openspec/changes/archived/handoff_v1.13.10_per_tool_cost.md
+++ b/openspec/changes/archived/handoff_v1.13.10_per_tool_cost.md
@@ -0,0 +1,441 @@
+```
+#careful #boocode #nofluff
+
+v1.13.10 — per-tool token cost accounting (rolling 100-call window)
+
+Goal: surface per-tool prompt/completion-token rolling averages in AgentPicker for at-a-glance agent-cost hints. Implementation is a SQL view on top of `messages_with_parts` (no new table, no new write site) + a read endpoint + AgentPicker tooltip extension. Estimated ~240 LoC, mostly UI.
+
+## Where we are
+
+- Last tag: v1.13.9 (compaction overflow trigger — `floor(0.85 × ctx_max)` early-trigger). Branch clean.
+- v1.13.x cleanup line ✅ through v1.13.9. Queued: v1.13.10 (this) → v1.13.11 (WS Zod) → v1.13.12 (skills audit) → v1.13.2 (column drop, last).
+- Dependency (satisfied since v1.13.7 commit `ff29b48`): `includeUsage: true` on `createOpenAICompatible` in `apps/server/src/services/inference/provider.ts`. Without it, `messages.tokens_used`/`ctx_used` were NULL on every row written from v1.13.1-A until the v1.13.7 fix (latent regression). Now populated.
+
+## Why this matters
+
+Today: AgentPicker lists agents by name + description. No cost signal. Users pick the architect agent (full tool whitelist, 21k of tool schema) for one-liner questions a refactorer (3 tools, 4k schema) could answer.
+
+Tomorrow: each agent listing shows its mean prompt + completion cost per tool, derived from the last 100 invocations across all chats. Decision aid, not a hard gate.
+
+Why a SQL view instead of a denormalized stats table:
+- All the source data already lands in `messages` (tool_calls JSON + tokens_used + ctx_used) and `message_parts` (read via the `messages_with_parts` view). Zero new write sites.
+- Rolling 100-call window is a `ROW_NUMBER() OVER (PARTITION BY tool_name ORDER BY created_at DESC) <= 100` — natural fit for a view.
+- View is rollback-safe. If the math is wrong, `DROP VIEW` and re-deploy; no orphan rows, no backfill.
+- At BooCode scale (single user, ~30 tools, ~100 calls/tool), aggregate-on-read is microseconds. Premature to denormalize.
+
+The roadmap schema row (`tool_cost_stats (tool_name, prompt_tokens_sum, completion_tokens_sum, n_calls, updated_at)`) matches both a table and a view. View is the lighter implementation.
+
+## Canonical column mapping (pinned)
+
+The `messages` columns are named non-obviously. Pinned mapping, confirmed across 5 write sites + 1 read site:
+
+| Column        | Semantic meaning           | AI SDK v6 source name |
+|---------------|----------------------------|-----------------------|
+| `ctx_used`    | prompt / input tokens      | `usage.inputTokens`   |
+| `tokens_used` | completion / output tokens | `usage.outputTokens`  |
+
+Write sites confirmed: `tool-phase.ts:94-95`, `error-handler.ts:109-110`, `sentinel-summaries.ts:130-131`, `sentinel-summaries.ts:387-388`, `stream-phase.ts:319-320`. Canonical read at `payload.ts:190-191` reverses: `const promptTokens = updated.ctx_used; const completionTokens = updated.tokens_used`.
+
+`tokens_used` reads like "total" but is completion only. Project convention since the columns predate v1.13.x. Do not "fix" the naming inside this batch — out of scope; downstream consumers depend on the current mapping.
+
+## Attribution model
+
+A single assistant turn can emit N tool calls in parallel. llama-swap returns ONE (prompt_tokens, completion_tokens) per turn, not per tool. Attribution requires a split.
+
+**Chosen approach: equal split.** For an assistant turn that emits N tool calls with prompt P and completion C, each tool is attributed P/N prompt + C/N completion. The 100-call rolling mean smooths split noise. Implementation: `tokens_used::float / jsonb_array_length(tool_calls)` at the unnest site.
+
+**Alternatives rejected:**
+- "Full turn cost to every tool" (no division). Over-states; a 5-tool turn would 5×-count every tool's cost.
+- "Result-size only" (`length(JSON.stringify(output)) / 4`). Loses the LLM's actual usage signal; doesn't capture how expensive a tool's output is to the next prompt.
+- "Consuming-turn delta" (next turn prompt_tokens − this turn prompt_tokens, attribute to the tool that emitted the result). Most accurate but requires bubble-back math through the `executeToolPhase → runAssistantTurn` recursion. Over-engineered for the rolling-average use case.
+
+**If Sam wants a different split, change one line in the view definition (the divisor).**
+
+## Filtering — sentinel, failure, repair-call semantics
+
+The view excludes rows that aren't real tool-cost signal:
+
+- **Failed and cancelled turns** (`status != 'complete'`). The `error-handler.ts` failed/cancelled paths don't write `tokens_used`/`ctx_used`, so the existing `tokens_used IS NOT NULL` clause already filters these. Adding `status='complete'` is defense in depth and makes intent explicit.
+- **Cap-hit and doom-loop sentinel rows** (`metadata->>'kind' IN ('cap_hit', 'doom_loop')`). Sentinels are `role='system'` rows with `tool_calls=NULL`, so the existing `tool_calls IS NOT NULL` clause already filters them. The explicit metadata filter is defense in depth — it survives future schema drift where someone might INSERT a sentinel with a non-null tool_calls.
+- **`experimental_repairToolCall` retries.** No special handling needed. Our impl (per `CLAUDE.md`) is pass-through — malformed calls flow to zod-reject → tool_result error → next normal turn handles. No separate rows; the next turn's tokens count naturally.
+
+## Recon (already done; paste for reference)
+
+```
+cd /opt/boocode
+grep -n "tokens_used\|ctx_used\|inputTokens\|outputTokens" apps/server/src/services/inference/*.ts | head -30
+grep -n "metadata\|cap_hit\|doom_loop" apps/server/src/services/inference/sentinels.ts apps/server/src/schema.sql | head -10
+psql -h localhost -p 5432 -U postgres -d boocode -c "\d messages_with_parts" | head -30
+```
+
+Expected: confirms the canonical mapping in the table above; confirms `messages.metadata jsonb` exists at `schema.sql:259`; confirms `messages_with_parts` exposes `m.metadata` at `schema.sql:92`.
+
+## Scope
+
+### 1. schema.sql — `tool_cost_stats` view (~35 LoC)
+
+Append after the `messages_with_parts` view (after line 120):
+
+```sql
+-- v1.13.10: per-tool token cost rolling window. Derives from
+-- messages_with_parts (the v1.13.1-B view that COALESCEs message_parts over
+-- the legacy JSON column) so this works whether the chat predates v1.13.0
+-- or postdates v1.13.2 (column drop). No new write site — all source data
+-- already lands via the existing tool-phase.ts:94-95 UPDATE.
+--
+-- Attribution model: equal split. A turn emitting N tool calls divides its
+-- prompt/completion tokens by N before attribution. See v1.13.10 dispatch
+-- brief for rationale + rejected alternatives.
+--
+-- Column mapping: messages.ctx_used = prompt (input), messages.tokens_used
+-- = completion (output). Non-obvious naming; pinned via canonical writes at
+-- tool-phase.ts:94-95 et al.
+--
+-- Filtering rationale:
+--   status='complete'                — exclude failed/cancelled (defense in
+--                                      depth; failed-path doesn't write
+--                                      tokens_used so they're also filtered
+--                                      indirectly).
+--   metadata->>'kind' exclusions     — exclude cap_hit / doom_loop sentinels
+--                                      (defense in depth; sentinels are
+--                                      role='system' with tool_calls=NULL
+--                                      so they're filtered indirectly too).
+--   experimental_repairToolCall      — no special handling; retries flow
+--                                      as normal next-turn tool_result
+--                                      errors and count naturally.
+--
+-- Rolling window: last 100 calls per tool_name, ordered by created_at DESC.
+-- Aggregate-on-read is microseconds at BooCode scale (single user, ~30
+-- tools, < 100 calls each). DROP VIEW + recreate to change window size.
+CREATE OR REPLACE VIEW tool_cost_stats AS
+WITH per_call AS (
+  SELECT
+    (tc->>'name')::text AS tool_name,
+    (m.ctx_used::float / NULLIF(jsonb_array_length(m.tool_calls), 0)) AS prompt_tokens,
+    (m.tokens_used::float / NULLIF(jsonb_array_length(m.tool_calls), 0)) AS completion_tokens,
+    m.created_at,
+    ROW_NUMBER() OVER (
+      PARTITION BY (tc->>'name')::text
+      ORDER BY m.created_at DESC
+    ) AS rn
+  FROM messages_with_parts m,
+    LATERAL jsonb_array_elements(m.tool_calls) AS tc
+  WHERE m.tool_calls IS NOT NULL
+    AND jsonb_array_length(m.tool_calls) > 0
+    AND m.tokens_used IS NOT NULL
+    AND m.ctx_used IS NOT NULL
+    AND m.status = 'complete'
+    AND (m.metadata IS NULL
+         OR m.metadata->>'kind' IS NULL
+         OR m.metadata->>'kind' NOT IN ('cap_hit', 'doom_loop'))
+)
+SELECT
+  tool_name,
+  ROUND(SUM(prompt_tokens))::int AS prompt_tokens_sum,
+  ROUND(SUM(completion_tokens))::int AS completion_tokens_sum,
+  COUNT(*)::int AS n_calls,
+  MAX(created_at) AS updated_at
+FROM per_call
+WHERE rn <= 100
+GROUP BY tool_name;
+```
+
+Notes:
+- `NULLIF(..., 0)` guards against div-by-zero on `jsonb_array_length=0` (should never happen given the WHERE clause, but defensive).
+- `ROUND(SUM(...))::int` — frontend doesn't want decimals; sum-then-round is more accurate than per-row round-then-sum.
+- View is read from `messages_with_parts` not `messages`, so legacy pre-v1.13.0 rows and post-v1.13.2 rows both resolve.
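+- No index needed at this scale. The underlying `idx_messages_chat` covers the JOIN. Note that `rn <= 100` filters after the LATERAL unnest and window function run, so the view still scans every qualifying row; only the aggregate is bounded to 100 rows per tool.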
+
+### 2. apps/server/src/routes/tools.ts (NEW, ~40 LoC)
+
+New route file. Register in `apps/server/src/index.ts` next to the other `register*Routes(app, sql, ...)` calls.
+
+```ts
+import type { FastifyInstance } from 'fastify';
+import type { Sql } from '../db.js';
+
+export interface ToolCostStat {
+  tool_name: string;
+  mean_prompt_tokens: number;
+  mean_completion_tokens: number;
+  n_calls: number;
+  updated_at: string;
+}
+
+export function registerToolsRoutes(app: FastifyInstance, sql: Sql) {
+  app.get('/api/tools/cost_stats', async () => {
+    const rows = await sql<{
+      tool_name: string;
+      prompt_tokens_sum: number;
+      completion_tokens_sum: number;
+      n_calls: number;
+      updated_at: string;
+    }[]>`
+      SELECT tool_name, prompt_tokens_sum, completion_tokens_sum, n_calls, updated_at
+      FROM tool_cost_stats
+      ORDER BY tool_name ASC
+    `;
+    const stats: ToolCostStat[] = rows.map(r => ({
+      tool_name: r.tool_name,
+      mean_prompt_tokens: Math.round(r.prompt_tokens_sum / r.n_calls),
+      mean_completion_tokens: Math.round(r.completion_tokens_sum / r.n_calls),
+      n_calls: r.n_calls,
+      updated_at: r.updated_at,
+    }));
+    return { stats };
+  });
+}
+```
+
+Route is bodyless, idempotent, cheap. No pagination (≤30 tools).
+
+### 3. apps/server/src/services/__tests__/tool_cost_stats.test.ts (NEW, ~95 LoC)
+
+Integration test against real Postgres (matches `inference.test.ts` pattern). Fixtures:
+
+```ts
+import { describe, it, expect, beforeEach } from 'vitest';
+import { connect } from '../../db.js';
+
+describe('tool_cost_stats view (v1.13.10)', () => {
+  // ... session + chat + project setup helpers ...
+
+  it('returns empty when no tool calls exist', async () => {
+    // fresh chat, only user/assistant text turns
+    const stats = await sql`SELECT * FROM tool_cost_stats`;
+    expect(stats).toEqual([]);
+  });
+
+  it('attributes single-tool turn fully to that tool', async () => {
+    // insert one assistant message with tool_calls=[{name: 'view_file', ...}],
+    // tokens_used=300, ctx_used=15000, status='complete'
+    const stats = await sql`SELECT * FROM tool_cost_stats WHERE tool_name='view_file'`;
+    expect(stats[0]).toMatchObject({
+      tool_name: 'view_file',
+      prompt_tokens_sum: 15000,
+      completion_tokens_sum: 300,
+      n_calls: 1,
+    });
+  });
+
+  it('splits multi-tool turn equally across tools', async () => {
+    // insert one assistant turn with 3 tool calls (view_file, grep, list_dir),
+    // tokens_used=300, ctx_used=15000 → each tool gets 100 completion, 5000 prompt
+    const stats = await sql`SELECT * FROM tool_cost_stats ORDER BY tool_name`;
+    expect(stats).toHaveLength(3);
+    for (const s of stats) {
+      expect(s.completion_tokens_sum).toBe(100);
+      expect(s.prompt_tokens_sum).toBe(5000);
+      expect(s.n_calls).toBe(1);
+    }
+  });
+
+  it('limits to last 100 calls per tool (FIFO window)', async () => {
+    // insert 150 turns each calling view_file once with monotonically
+    // increasing tokens_used; expect only the most recent 100 to count
+    const stats = await sql`SELECT * FROM tool_cost_stats WHERE tool_name='view_file'`;
+    expect(stats[0]!.n_calls).toBe(100);
+    // mean should reflect the most recent 100 turns (51..150), not 1..150
+  });
+
+  it('excludes turns with NULL tokens_used (pre-v1.13.7 latent regression)', async () => {
+    // insert a turn with tool_calls but tokens_used=NULL → must not appear
+    const stats = await sql`SELECT * FROM tool_cost_stats WHERE tool_name='view_file'`;
+    expect(stats).toEqual([]);
+  });
+
+  it('excludes failed and cancelled turns + sentinel metadata rows', async () => {
+    // insert five rows for tool_name='view_file', all with tokens_used+ctx_used
+    // populated:
+    //   row A: status='failed'                            — excluded
+    //   row B: status='cancelled'                         — excluded
+    //   row C: status='complete', metadata={kind:'cap_hit'}   — excluded
+    //   row D: status='complete', metadata={kind:'doom_loop'} — excluded
+    //   row E: status='complete', metadata=null               — included
+    // Expect n_calls=1, attributable to row E only.
+    const stats = await sql`SELECT * FROM tool_cost_stats WHERE tool_name='view_file'`;
+    expect(stats[0]!.n_calls).toBe(1);
+  });
+
+  it('reads tool_calls via messages_with_parts (parts-authoritative)', async () => {
+    // insert a v1.13.0+ row with messages.tool_calls=NULL but
+    // message_parts rows containing the tool_call → must still aggregate
+    const stats = await sql`SELECT * FROM tool_cost_stats WHERE tool_name='grep'`;
+    expect(stats[0]!.n_calls).toBe(1);
+  });
+});
+```
+
+Pattern: each test resets the messages table for the fixture chat (TRUNCATE not DELETE — Postgres `messages` has FK CASCADE) and inserts hand-crafted rows. The view is recomputed on every SELECT.
+
+### 4. apps/web/src/api/types.ts + client.ts (~10 LoC)
+
+Add to `types.ts`:
+
+```ts
+export interface ToolCostStat {
+  tool_name: string;
+  mean_prompt_tokens: number;
+  mean_completion_tokens: number;
+  n_calls: number;
+  updated_at: string;
+}
+```
+
+Add to `client.ts` under the existing `api.*` namespace structure:
+
+```ts
+tools: {
+  costStats: () => fetch<{ stats: ToolCostStat[] }>('GET', '/api/tools/cost_stats'),
+},
+```
+
+Match the casing convention of the existing namespaces (`api.agents.list`, `api.chats.archive`, etc.).
+
+### 5. apps/web/src/components/AgentPicker.tsx — tooltip extension (~80 LoC delta)
+
+Currently (line 67): `title={selectedAgent?.description}` — native HTML title attribute on the trigger button.
+
+Replacement: dropdown items get a per-agent cost line in muted text below the description. Format:
+
+```
+[Agent name]
+[Agent description]
+~5.2k prompt / 280 completion · 6 tools · last call 3h ago
+```
+
+Implementation steps:
+1. Fetch `api.tools.costStats()` once on mount (alongside the existing `api.agents.list()`). Cache result for the lifetime of the picker open state. Re-fetch only on `useEffect` dep change.
+2. Compute per-agent aggregate: for each agent, sum the means of its whitelisted tools. Sum-of-means, not mean-of-sums — we're combining independent rolling averages.
+3. Render below description (one line, muted, truncated). Show "—" if no calls recorded yet for any of the agent's tools.
+4. Don't break the existing native `title=` for backward compat; layer the cost line additively.
+
+```tsx
+const [costStats, setCostStats] = useState<ToolCostStat[]>([]);
+useEffect(() => {
+  api.tools.costStats().then(r => setCostStats(r.stats)).catch(() => setCostStats([]));
+}, []);
+const costByTool = useMemo(
+  () => Object.fromEntries(costStats.map(s => [s.tool_name, s])),
+  [costStats],
+);
+function agentCost(agent: Agent): { prompt: number; completion: number; nTools: number; nWithData: number; mostRecent: string | null } {
+  let prompt = 0, completion = 0, nWithData = 0;
+  let mostRecent: string | null = null;
+  for (const t of agent.tools) {
+    const s = costByTool[t];
+    if (!s) continue;
+    prompt += s.mean_prompt_tokens;
+    completion += s.mean_completion_tokens;
+    nWithData++;
+    if (!mostRecent || s.updated_at > mostRecent) mostRecent = s.updated_at;
+  }
+  return { prompt, completion, nTools: agent.tools.length, nWithData, mostRecent };
+}
+```
+
+For the line render: `~${formatK(prompt)} prompt / ${Math.round(completion)} completion · ${nWithData}/${nTools} tools · ${formatAgo(mostRecent)}`. The sums of means are fractional, so round `completion` before display. Skip the line entirely when `nWithData === 0` to avoid showing "0k / 0 / 0 tools" in a fresh-from-deploy state.
+
+**`formatK` / `formatAgo`:** colocate at the bottom of `AgentPicker.tsx`. Don't extract to a util file in this batch — single use site.
+
+## What NOT to do
+
+- **Don't add a new write site at `tool-phase.ts` or `finalizeCompletion`.** All source data is already there via existing UPDATEs.
+- **Don't denormalize.** The view is sufficient and rollback-safe at BooCode's single-user scale.
+- **Don't add per-tool cost to the message bubble.** Out of scope. AgentPicker tooltip only.
+- **Don't fold per-call rows into a moving sum via triggers.** Aggregate on read; 100 rows × 30 tools is microseconds in Postgres.
+- **Don't track `result_chars` (the size of `tool_results.output`).** Tempting as a second cost signal but out of scope here. Future batch if Sam wants it.
+- **Don't add a session-scoped or chat-scoped filter to `tool_cost_stats`.** The rolling window is GLOBAL across all chats — the agent picker is a project-level decision aid. Per-chat surfacing is a future v1.14+ design.
+- **Don't change the attribution model post-deployment** without dropping the view first. Mid-flight semantic changes give bogus historical means.
+- **Don't "fix" the `ctx_used`/`tokens_used` naming inside this batch.** Non-obvious but pinned across 5 write sites. Renaming is its own batch.
+- **Don't rely solely on `tool_calls IS NOT NULL` for sentinel exclusion.** It works today (sentinels are role='system' with tool_calls=NULL) but the explicit `status='complete'` + `metadata->>'kind'` filters are defense in depth and survive future schema drift.
+
+## Backup before edits
+
+```
+cd /opt/boocode
+cp apps/server/src/schema.sql{,.bak-$(date +%Y%m%d-%H%M%S)}
+cp apps/web/src/components/AgentPicker.tsx{,.bak-$(date +%Y%m%d-%H%M%S)}
+```
+
+(No backup needed for new files in items 2, 3, 4.)
+
+## Verify
+
+```
+pnpm -C apps/server test
+```
+
+Expected: all existing tests pass + 7 new in `tool_cost_stats.test.ts`. Total moves from 195 → 202.
+
+```
+cd /opt/boocode
+docker compose exec boocode_db psql -U postgres -d boocode -c \
+  "SELECT * FROM tool_cost_stats ORDER BY n_calls DESC LIMIT 10;"
+```
+
+Expected: in any live deployment with v1.13.7+ history, this returns real rows for `view_file`, `grep`, `list_dir`, etc. If empty: `messages.tool_calls` was NULL for the v1.13.1-A → v1.13.7 latent regression window and recovery only begins with v1.13.7+ traffic.
+
+## Build + smoke
+
+```
+cd /opt/boocode
+docker compose up --build -d boocode
+docker compose logs --since=30s boocode | tail -20
+```
+
+Smoke A — view recompiles on schema apply:
+```
+docker compose logs boocode | grep -i "tool_cost_stats\|applySchema"
+```
+Expected: clean schema apply, view registered idempotently.
+
+Smoke B — endpoint returns data:
+```
+curl -s http://localhost:3000/api/tools/cost_stats | jq '.stats | length, .stats[0]'
+```
+Expected: nonzero length if any v1.13.7+ tool calls exist; one stat object with all 5 fields populated.
+
+Smoke C — UI:
+1. Open browser to `boocode.indifferentketchup.com`.
+2. Open AgentPicker dropdown on any session.
+3. Each agent row shows a muted cost line below its description: `~5.2k prompt / 280 completion · 6/8 tools · last call 2h ago`.
+4. Agents with no tool history show just description (no cost line).
+5. Confirm cost line truncates with the existing text-muted-foreground / truncate pattern; doesn't break the layout at mobile widths (open Vivaldi devtools, set iPhone-13 viewport).
+
+## Files expected to touch
+
+- `apps/server/src/schema.sql` — ~35 LoC delta (view definition + filter comments)
+- `apps/server/src/routes/tools.ts` — NEW, ~40 LoC
+- `apps/server/src/index.ts` — 1 line (`registerToolsRoutes(app, sql)`)
+- `apps/server/src/services/__tests__/tool_cost_stats.test.ts` — NEW, ~95 LoC
+- `apps/web/src/api/types.ts` — ~7 LoC (interface)
+- `apps/web/src/api/client.ts` — ~3 LoC (namespace + method)
+- `apps/web/src/components/AgentPicker.tsx` — ~80 LoC delta (cost line + fetch hook + helpers)
+
+Total ~260 LoC. Matches roadmap estimate.
+
+## Workflow conventions
+
+- Backups before destructive edits (above) on the two MODIFIED files. New files don't need backups.
+- Sam reviews diffs. Never `git add` / `git commit` / `git push` / `git pull` on Sam's behalf.
+- Build: `docker compose up --build -d boocode`. No `--no-cache` unless layer-cache trap surfaces.
+- Tests authoritative: `pnpm -C apps/server test`.
+- View definition lives in `schema.sql` (idempotent via `CREATE OR REPLACE VIEW`); no migration shim needed.
+
+## Don't repeat past mistakes
+
+- v1.13.7 stability bundle (`includeUsage:true`, trim guards, payload filter, `BUDGET_NO_AGENT=30`): all live. This batch depends on `includeUsage:true`. If unset, `tool_cost_stats` returns empty rows.
+- v1.13.8 prefix instrumentation: untouched.
+- v1.13.9 ratio-only `usable()`: untouched.
+- v1.13.4 two-tier prune: untouched.
+- v1.13.5 truncate.ts opaque-id pattern: untouched.
+- v1.13.1-B `messages_with_parts` view: this view is the source. Don't reach past it to raw `messages`.
+- v1.13.2 will DROP `messages.tool_calls`/`tool_results` columns. The `tool_cost_stats` view reads from `messages_with_parts` not `messages`, so it survives. Verify after v1.13.2 ships.
+
+## Source files to read in project knowledge
+
+- `boocode_roadmap.md` (v1.13.10 row at line 114; schema row at line 474)
+- `boocode_code_review.md` (cost-tracking design background)
+- `CLAUDE.md` (project conventions; messages_with_parts invariant at L80; v1.13.7 includeUsage invariant)
+```
--- a/openspec/changes/archived/handoff_v1.13.8_prefix_verify.md
+++ b/openspec/changes/archived/handoff_v1.13.8_prefix_verify.md
@@ -0,0 +1,225 @@
+# Handoff: BooCode v1.13.8 — system-prompt prefix stability verify-and-measure
+
+#careful #boocode #nofluff
+
+Recon-only / instrumentation batch. **No cache implementation in this dispatch.** Goal: prove (or disprove) that the assembled system-prompt prefix is byte-stable across turns under steady-state inputs. Result determines whether v1.13.7-as-originally-specced (the prefix cache) is actually needed at all.
+
+## Where we are
+
+- Last tag: `v1.13.7` — stability bundle (`includeUsage:true` + trim guards + payload filter for trailing empty/failed assistants + `BUDGET_NO_AGENT 15→30`). This shipped as a renumber of the original "prefix cache" v1.13.7 slot. The prefix-cache work moved to v1.13.8 with the change-of-shape captured here.
+- Branch clean. `git log --oneline main -5` should show `…v1.13.7 v1.13.6 v1.13.5 v1.13.4 v1.13.3`.
+
+## What v1.13.x has shipped
+
+- v1.13.0 — `message_parts` table + dual-write.
+- v1.13.1-A — AI SDK v6 install (`streamText` adapter, mid-dispatch silent-abort patch).
+- v1.13.1-B — `messages_with_parts` view + read sites flipped.
+- v1.13.1-C — `ask_user_input` correlation ported + reasoning end-to-end.
+- v1.13.3 — bundle: statement_timeout=30s, alpha tool ordering, periodic stuck-row sweeper, `experimental_repairToolCall`.
+- v1.13.4 — two-tier compaction prune.
+- v1.13.5 — opencode `truncate.ts` port (`tr_<12char>` opaque ids on tmpfs).
+- v1.13.6 — compaction head-assembly audit; reasoning_parts added to `buildHeadPayload`.
+- v1.13.7 — stability bundle (the fixes listed above).
+
+## What's queued
+
+- **v1.13.8 (this dispatch)** — prefix stability verify-and-measure
+- v1.13.9 — compaction overflow trigger formula (opencode 0.85 × ctx_max)
+- v1.13.10 — per-tool token cost accounting + AgentPicker UI
+- v1.13.11 — WebSocket frame typing (Zod schemas both ends)
+- v1.13.12 — skills audit pass (rules→recipes split)
+- v1.13.2 — drop legacy columns (last; ≥1 week production traffic on v1.13.1 first)
+
+## Why this is verify-first
+
+The original v1.13.7 roadmap line was "system-prompt prefix cache, keyed by `(agent_id, project_id, skills_version)`, mtime-invalidated." Recon during planning surfaced that:
+
+- `apps/server/src/services/system-prompt.ts:buildSystemPrompt()` already runs over mtime-cached inputs:
+  - BOOCHAT.md / BOOCODER.md — cached in this file (`cachedGuidance`, line 25), keyed by mtime
+  - global + per-project AGENTS.md — cached in `services/agents.ts` (mtime-keyed Map at line 245, `safeStat` pattern at line 255), keyed by mtime
+  - `session.system_prompt` / `project.default_system_prompt` — DB scalars, byte-stable until edited
+  - BASE_SYSTEM_PROMPT — hardcoded template with `${projectPath}` interpolation
+- Skills are NOT in the system prompt today. Discovered via `skill_find` at runtime.
+- Tool schemas are NOT in the system message. They live in the OpenAI request body's `tools` field (already alpha-sorted by v1.13.3).
+- Output assembly is a microsecond string concat with no I/O.
+
+So in theory the prefix is already byte-stable across turns. **Nobody has measured it.** This batch closes that gap with logs + a unit test, no cache implementation. If stable across a real session → close v1.13.8 as no-op, drop the original cache plan, move to v1.13.9. If drift surfaces → next batch designs the fix against the actual failure mode.
+
+## Scope (all three items)
+
+### 1. Per-turn prefix fingerprint log
+
+In `apps/server/src/services/system-prompt.ts`, after `buildSystemPrompt` finishes assembling `out`, before returning:
+
+- Compute `sha256(out)` → hex string. Use `node:crypto`.
+- Emit a single log line at `level=info` via a module-level pino instance (mirror the pattern used elsewhere in the inference services). Shape:
+
+```ts
+{
+  msg: 'prefix-fingerprint',
+  project_id: project.id,
+  agent_id: agent?.id ?? null,
+  agent_name: agent?.name ?? null,
+  session_id: session.id,
+  prefix_hash: <sha256 hex>,
+  prefix_length: out.length,
+  mtime_boochat: <number | null>,           // from cachedGuidance.mtime, or null when guidance is null
+  has_agent_system_prompt: <boolean>,
+  has_session_override: session.system_prompt.trim().length > 0,
+  has_project_override: project.default_system_prompt.trim().length > 0,
+}
+```
+
+The mtime fields surface which inputs changed when drift is observed. The hash itself is what proves equality.
+
+`buildSystemPrompt` already reaches into `cachedGuidance` indirectly via `getContainerGuidance()` — expose `cachedGuidance?.mtime` for the log via a thin getter (`getCachedGuidanceMtime(): number | null`) so the log line carries it without re-statting.
+
+For the AGENTS.md mtimes (global + per-project), `services/agents.ts` holds them in its `cache` Map but exports no public accessor. Either (a) add an exported `getAgentsMtimes(projectPath: string): { global: number | null; project: number | null }` function to agents.ts, or (b) skip those fields in v1.13.8 and only log the BOOCHAT mtime. **Default: do (a).** If recon shows that's invasive, fall back to (b) and note the limitation in the smoke report.
+
+### 2. Per-session drift observer
+
+Module-level `Map<sessionId, lastHash>` in `system-prompt.ts`. On each `buildSystemPrompt` call:
+
+- If `sessionId` is not in the map → set it, emit no extra log.
+- If `sessionId` IS in the map and the hash matches → emit no extra log.
+- If `sessionId` IS in the map and the hash DIFFERS → emit a second `level=warn` log:
+
+```ts
+{
+  msg: 'prefix-drift',
+  session_id: session.id,
+  prev_hash: <previous>,
+  new_hash: <current>,
+  prev_length: <number>,
+  new_length: <number>,
+  changed_inputs: <array of field names where mtime/flags changed since last call>,
+}
+```
+
+`changed_inputs` is a small array like `['mtime_boochat']` or `['has_session_override']` — the field-level diff so we can see exactly what input drifted.
+
+The map grows unboundedly across long-lived processes. Acceptable for v1.13.8 (instrumentation only, 5 min sessions in test). Add a TODO comment: "v1.13.x follow-up if it survives: LRU-bound this map at 1000 sessions." Don't implement the LRU now.
+
+Add a `_resetPrefixObserverForTests()` export mirroring the existing `_resetContainerGuidanceCacheForTests()`.
+
+### 3. Unit test for byte-stability
+
+In `apps/server/src/services/__tests__/system-prompt.test.ts`, add a `describe('buildSystemPrompt stability', () => { ... })` block:
+
+```ts
+it('returns byte-identical output across two consecutive calls with the same inputs', async () => {
+  // set BOOCHAT.md, build (project, session, agent), capture hash
+  const first = await buildSystemPrompt(project, session, agent);
+  const second = await buildSystemPrompt(project, session, agent);
+  expect(first).toBe(second);
+});
+
+it('emits a single prefix-fingerprint log per call', async () => {
+  // capture logs via pino test transport or stub
+  // assert one prefix-fingerprint per buildSystemPrompt call
+});
+
+it('emits a prefix-drift log when the same session sees a different hash', async () => {
+  // build once; mutate BOOCHAT.md or pass a different agent; build again with same sessionId
+  // assert one prefix-drift log with prev_hash and new_hash populated
+});
+```
+
+The first test is the load-bearing one — it locks in the byte-stability invariant going forward, regardless of what the production smoke surfaces.
+
+## What NOT to do in this dispatch
+
+- **Don't add a cache.** Output memoization is v1.13.9+ work IF the smoke proves it's needed. Implementing a cache before measurement is what the v1.13.6 audit was designed to catch — premature optimization disguised as correctness.
+- **Don't change `buildSystemPrompt`'s return signature or async behavior.** The output stays a single string. Signature stays `(project, session, agent) => Promise<string>`.
+- **Don't thread chat_id or anything else into the call.** `session.id` is sufficient as the observer key.
+- **Don't log the full prefix text.** Hash + length only. The prefix can be many KB; logging it 5× per session blows up log size for no benefit. If drift appears and the hash diff is mysterious, `LOG_LEVEL=debug` can be wired in a follow-up.
+- **Don't touch `messages_with_parts` or the CASE-WHEN-EXISTS fallback v1.13.4 added.** This batch is in `system-prompt.ts` only.
+- **Don't modify the AI SDK v6 silent-abort guard.** It lives in `stream-phase.ts` and stays untouched.
+
+## Recon (already done — paste these for the implementer's reference)
+
+```
+cd /opt/boocode
+wc -l apps/server/src/services/system-prompt.ts
+# → 83 lines
+
+grep -nE "^export|^function|^async function|cache|mtime" apps/server/src/services/system-prompt.ts
+# → cachedGuidance at line 25; loadContainerGuidance / getContainerGuidance / _resetContainerGuidanceCacheForTests / buildSystemPrompt are the public surface
+
+grep -rn "buildSystemPrompt" apps/server/src --include="*.ts" | grep -v "tests"
+# → single caller: apps/server/src/services/inference/payload.ts:41
+# → also referenced in routes/sessions.ts (session-create flow may call it for preview; verify during implementation)
+
+grep -n "safeStat\|cache\|mtime" apps/server/src/services/agents.ts
+# → mtime-keyed cache (Map) at line 245, TTL 60_000ms, key = projectPath || '__none__'
+# → safeStat pattern at line 255
+```
+
+## Verification protocol (smoke)
+
+After deploy:
+
+1. Fresh BooChat session, default agent (no agent selected).
+2. Send 5 short messages, wait for each turn to complete.
+3. `docker compose logs --since=10m boocode | grep -E 'prefix-fingerprint|prefix-drift'`
+
+**Success criteria:**
+- 5 `prefix-fingerprint` lines (one per turn — assuming each turn calls `buildSystemPrompt` once via `buildMessagesPayload`).
+- All 5 lines have identical `prefix_hash` and `prefix_length`.
+- Zero `prefix-drift` lines.
+
+**Failure modes to characterize:**
+- Drift WITH a corresponding mtime change in `changed_inputs` → expected if BOOCHAT.md or AGENTS.md was edited mid-session. Note in smoke report; not a bug.
+- Drift WITHOUT any mtime/flag change in `changed_inputs` → assembly nondeterminism somewhere. **This is the bug case.** Report the exact `prev_hash`/`new_hash` pair and full `prefix-fingerprint` log lines from before and after the drift.
+- Multiple `prefix-fingerprint` lines per turn → `buildSystemPrompt` is being called more than once per turn (possibly from compaction or sentinel-summary paths). Note in smoke report; not necessarily a bug but worth understanding.
+- ANY successful turn that emits zero `prefix-fingerprint` lines → log statement isn't reached. Implementation bug.
+
+Repeat the smoke in a second session (different agent if available) to also confirm cross-session prefix differs only where expected (different `project.id`, different `agent_id`).
+
+## Files expected to touch
+
+- `apps/server/src/services/system-prompt.ts` — add hash + log + observer + getter (~50 LoC)
+- `apps/server/src/services/agents.ts` — add `getAgentsMtimes()` accessor (~15 LoC if going with default option)
+- `apps/server/src/services/__tests__/system-prompt.test.ts` — 3 new tests (~30 LoC)
+- `apps/server/package.json` — none expected (pino + node:crypto already available)
+
+Total ~95 LoC.
+
+## Workflow conventions (boocode)
+
+- Backup before destructive: `cp file file.bak-$(date +%Y%m%d-%H%M%S)`. (Files get gitignored via global `*.bak*`.)
+- Build: `docker compose up --build -d boocode`. No `--no-cache` unless layer-cache trap surfaces.
+- Tests: `pnpm -C apps/server test`. Smoke after deploy.
+- Type-check: `npx tsc -p apps/web/tsconfig.app.json --noEmit` is authoritative for web; `pnpm -C apps/server build` is authoritative for server.
+- Sam reviews diffs. Never `git add`/`commit`/`push`/`pull` on Sam's behalf.
+- Tag after commit: `git tag v1.13.8` (lightweight), then push via the Gitea deploy key:
+  `GIT_SSH_COMMAND="ssh -i /opt/boocode/secrets/boocode_gitea -o IdentitiesOnly=yes" git push origin v1.13.8`
+
+## Repo layout pointers
+
+- `apps/server/src/services/system-prompt.ts` — primary target (83 lines)
+- `apps/server/src/services/agents.ts` — for the mtimes accessor
+- `apps/server/src/services/inference/payload.ts:41` — call site
+- `apps/server/src/services/__tests__/system-prompt.test.ts` — extend tests here
+- `apps/server/vitest.config.ts` — test glob is `src/**/__tests__/**/*.test.ts`
+
+## Open questions for Sam during recon
+
+1. **`getAgentsMtimes()` accessor in agents.ts vs BOOCHAT-only log.** Default: add the accessor. If implementation surface is bigger than expected (e.g. the agents.ts cache structure makes it awkward), fall back to BOOCHAT-only and note the gap.
+2. **What counts as a "turn" for the observer's `Map<sessionId, lastHash>`?** Default: every `buildSystemPrompt` call. If recon shows that compaction / sentinel-summary paths also call `buildSystemPrompt` and would generate noise, gate the observer to inference-turn calls only. Cleanest signal vs. cleanest implementation.
+3. **Log severity for `prefix-drift`.** Default: `warn`. If Sam expects routine BOOCHAT.md edits to fire it, downgrade to `info`. The smoke will surface this — adjust during smoke if needed.
+
+## Don't repeat past mistakes
+
+- AI SDK v6 silent-abort guard in `stream-phase.ts`: untouched.
+- v1.13.4 view fix (COALESCE → CASE-WHEN-EXISTS): untouched. This batch is in `system-prompt.ts` only.
+- v1.13.5 truncate.ts: untouched.
+- v1.13.6 reasoning embed in compaction: untouched.
+- v1.13.7 stability bundle (`includeUsage:true`, trim guards, payload filter, budget bump): all live. Don't undo.
+
+## Source files to read in project knowledge
+
+- `boocode_roadmap.md` (last updated 2026-05-22; v1.13.x cleanup line order locked)
+- `boocode_code_review.md` (no lift source for v1.13.8 — in-house instrumentation)
+- `CLAUDE.md` (project conventions, NodeNext imports, vitest include glob, etc.)
+- This handoff (`handoff_v1.13.8_prefix_verify.md`)
--- a/pnpm-lock.yaml
+++ b/pnpm-lock.yaml
@@ -48,12 +48,18 @@ importers:

  apps/server:
    dependencies:
+      '@ai-sdk/openai-compatible':
+        specifier: ^2.0.47
+        version: 2.0.47(zod@3.25.76)
      '@fastify/static':
        specifier: ^7.0.4
        version: 7.0.4
      '@fastify/websocket':
        specifier: ^10.0.1
        version: 10.0.1
+      ai:
+        specifier: ^6.0.190
+        version: 6.0.190(zod@3.25.76)
      fastify:
        specifier: ^4.28.1
        version: 4.29.1
@@ -151,6 +157,9 @@ importers:
      tw-animate-css:
        specifier: ^1.4.0
        version: 1.4.0
+      zod:
+        specifier: ^3.23.8
+        version: 3.25.76
    devDependencies:
      '@tailwindcss/postcss':
        specifier: ^4.3.0
@@ -179,6 +188,28 @@ importers:

 packages:

+  '@ai-sdk/gateway@3.0.119':
+    resolution: {integrity: sha512-VAhfRWC+JexZakkVfmjaJKaTj00x7/UHdE8kMWL3NhuQAlf8oXtg9r4dfvFZrByXxchGRBvYE3biEUyibkg0xg==}
+    engines: {node: '>=18'}
+    peerDependencies:
+      zod: ^3.25.76 || ^4.1.8
+
+  '@ai-sdk/openai-compatible@2.0.47':
+    resolution: {integrity: sha512-Enm5UlL0zUCrW3792opk5h7hRWxZOZzDe6eQYVFqX9LUOGGCe1h8MZWAGim765nwzgnjlpeYOsuzZmLtRsTPlg==}
+    engines: {node: '>=18'}
+    peerDependencies:
+      zod: ^3.25.76 || ^4.1.8
+
+  '@ai-sdk/provider-utils@4.0.27':
+    resolution: {integrity: sha512-ubkAJ+xODouwtmN1tYlvTPphH1hPOBfZaEQe8U7skGvFAnIRs9PPpsq57bC2+Ky/MB4yzhd6YOsxTAx9sGpazw==}
+    engines: {node: '>=18'}
+    peerDependencies:
+      zod: ^3.25.76 || ^4.1.8
+
+  '@ai-sdk/provider@3.0.10':
+    resolution: {integrity: sha512-Q3BZ27qfpYqnCYGvE3vt+Qi6LGOF9R5Nmzn+9JoM1lCRsD9mYaIhfJLkSunN48nfGXJ6n+XNV0J/XVpqGQl7Dw==}
+    engines: {node: '>=18'}
+
  '@alloc/quick-lru@5.2.0':
    resolution: {integrity: sha512-UrcABB+4bUrFABwbluTIBErXwvbsU/V7TZWfmbgJfbkwiBuziS9gxdODUyuiecfdGQ85jglMW6juS3+z5TsKLw==}
    engines: {node: '>=10'}
@@ -789,6 +820,10 @@ packages:
  '@open-draft/until@2.1.0':
    resolution: {integrity: sha512-U69T3ItWHvLwGg5eJ0n3I62nWuE6ilHlmz7zM0npLBRvPRd7e6NYmg54vvRtP5mZG7kZqZCFVdsTWo7BPtBujg==}

+  '@opentelemetry/api@1.9.1':
+    resolution: {integrity: sha512-gLyJlPHPZYdAk1JENA9LeHejZe1Ti77/pTeFm/nMXmQH/HFZlcS/O2XJB+L8fkbrNSqhdtlvjBVjxwUYanNH5Q==}
+    engines: {node: '>=8.0.0'}
+
  '@pinojs/redact@0.4.0':
    resolution: {integrity: sha512-k2ENnmBugE/rzQfEcdWHcCY+/FM3VLzH9cYEsbdsoqrvzAKRhUZeRNhAZvB8OitQJ1TBed3yqWtdjzS6wJKBwg==}

@@ -1646,6 +1681,9 @@ packages:
    resolution: {integrity: sha512-tlqY9xq5ukxTUZBmoOp+m61cqwQD5pHJtFY3Mn8CA8ps6yghLH/Hw8UPdqg4OLmFW3IFlcXnQNmo/dh8HzXYIQ==}
    engines: {node: '>=18'}

+  '@standard-schema/spec@1.1.0':
+    resolution: {integrity: sha512-l2aFy5jALhniG5HgqrD6jXLi/rUWrKvqN/qJx6yoJsgKhblVd+iqqU4RCXavm/jPityDo5TCvKMnpjKnOriy0w==}
+
  '@tailwindcss/node@4.3.0':
    resolution: {integrity: sha512-aFb4gUhFOgdh9AXo4IzBEOzBkkAxm9VigwDJnMIYv3lcfXCJVesNfbEaBl4BNgVRyid92AmdviqwBUBRKSeY3g==}

@@ -1811,6 +1849,10 @@ packages:
  '@ungap/structured-clone@1.3.1':
    resolution: {integrity: sha512-mUFwbeTqrVgDQxFveS+df2yfap6iuP20NAKAsBt5jDEoOTDew+zwLAOilHCeQJOVSvmgCX4ogqIrA0mnyr08yQ==}

+  '@vercel/oidc@3.2.0':
+    resolution: {integrity: sha512-UycprH3T6n3jH0k44NHMa7pnFHGu/N05MjojYr+Mc6I7obkoLIJujSWwin1pCvdy/eOxrI/l3uDLQsmcrOb4ug==}
+    engines: {node: '>= 20'}
+
  '@vitejs/plugin-react@4.7.0':
    resolution: {integrity: sha512-gUu9hwfWvvEDBBmgtAowQCojwZmJ5mcLn3aufeCsitijs3+f2NsrPtlAWIR6OPiqljl96GVCUbLe0HyqIpVaoA==}
    engines: {node: ^14.18.0 || >=16.0.0}
@@ -1878,6 +1920,12 @@ packages:
    resolution: {integrity: sha512-MnA+YT8fwfJPgBx3m60MNqakm30XOkyIoH1y6huTQvC0PwZG7ki8NacLBcrPbNoo8vEZy7Jpuk7+jMO+CUovTQ==}
    engines: {node: '>= 14'}

+  ai@6.0.190:
+    resolution: {integrity: sha512-T+ixHbWZ6jmHRREpVVJTkFyWJeCekCdzLPan7lp1F32jG5OUw4+odlVYjtMRXVzogU+pWzpMmXdRiHUmdL/q0w==}
+    engines: {node: '>=18'}
+    peerDependencies:
+      zod: ^3.25.76 || ^4.1.8
+
  ajv-formats@2.1.1:
    resolution: {integrity: sha512-Wx0Kx52hxE7C18hkMEggYlEifqWZtYaRgouJor+WMdPnQyEK13vgEWyVNup7SoeeoLMsr4kf5h6dOW11I15MUA==}
    peerDependencies:
@@ -2694,6 +2742,9 @@ packages:
  json-schema-typed@8.0.2:
    resolution: {integrity: sha512-fQhoXdcvc3V28x7C7BMs4P5+kNlgUURe2jmUT1T//oBRMDrqy1QPelJimwZGo7Hg9VPV3EQV5Bnq4hbFy2vetA==}

+  json-schema@0.4.0:
+    resolution: {integrity: sha512-es94M3nTIfsEPisRafak+HDLfHXnKBhV3vU5eqPcS3flIWqcxJWgXHXiey3YrpaNsanY5ei1VoYEbOzijuq9BA==}
+
  json5@2.2.3:
    resolution: {integrity: sha512-XmOWe7eyHYH14cLdVPoyg+GOH3rYX++KpzrylJwSW98t3Nk+U8XOl8FWKOgwtzdb8lXGf6zYwDUzeHMWfxasyg==}
    engines: {node: '>=6'}
@@ -3966,6 +4017,30 @@ packages:

 snapshots:

+  '@ai-sdk/gateway@3.0.119(zod@3.25.76)':
+    dependencies:
+      '@ai-sdk/provider': 3.0.10
+      '@ai-sdk/provider-utils': 4.0.27(zod@3.25.76)
+      '@vercel/oidc': 3.2.0
+      zod: 3.25.76
+
+  '@ai-sdk/openai-compatible@2.0.47(zod@3.25.76)':
+    dependencies:
+      '@ai-sdk/provider': 3.0.10
+      '@ai-sdk/provider-utils': 4.0.27(zod@3.25.76)
+      zod: 3.25.76
+
+  '@ai-sdk/provider-utils@4.0.27(zod@3.25.76)':
+    dependencies:
+      '@ai-sdk/provider': 3.0.10
+      '@standard-schema/spec': 1.1.0
+      eventsource-parser: 3.0.8
+      zod: 3.25.76
+
+  '@ai-sdk/provider@3.0.10':
+    dependencies:
+      json-schema: 0.4.0
+
  '@alloc/quick-lru@5.2.0': {}

  '@babel/code-frame@7.29.0':
@@ -4516,6 +4591,8 @@ snapshots:

  '@open-draft/until@2.1.0': {}

+  '@opentelemetry/api@1.9.1': {}
+
  '@pinojs/redact@0.4.0': {}

  '@pkgjs/parseargs@0.11.0':
@@ -5386,6 +5463,8 @@ snapshots:

  '@sindresorhus/merge-streams@4.0.0': {}

+  '@standard-schema/spec@1.1.0': {}
+
  '@tailwindcss/node@4.3.0':
    dependencies:
      '@jridgewell/remapping': 2.3.5
@@ -5548,6 +5627,8 @@ snapshots:

  '@ungap/structured-clone@1.3.1': {}

+  '@vercel/oidc@3.2.0': {}
+
  '@vitejs/plugin-react@4.7.0(vite@5.4.21(@types/node@20.19.41)(lightningcss@1.32.0))':
    dependencies:
      '@babel/core': 7.29.0
@@ -5628,6 +5709,14 @@ snapshots:

  agent-base@7.1.4: {}

+  ai@6.0.190(zod@3.25.76):
+    dependencies:
+      '@ai-sdk/gateway': 3.0.119(zod@3.25.76)
+      '@ai-sdk/provider': 3.0.10
+      '@ai-sdk/provider-utils': 4.0.27(zod@3.25.76)
+      '@opentelemetry/api': 1.9.1
+      zod: 3.25.76
+
  ajv-formats@2.1.1(ajv@8.20.0):
    optionalDependencies:
      ajv: 8.20.0
@@ -6453,6 +6542,8 @@ snapshots:

  json-schema-typed@8.0.2: {}

+  json-schema@0.4.0: {}
+
  json5@2.2.3: {}

  jsonfile@6.2.1: