v1.13.15-openspec: reformat batch docs to OpenSpec directory structure

Adopt Fission-AI/OpenSpec's openspec/changes/<change-name>/{proposal,specs,design,tasks}.md shape for BooCode's own batch docs. Zero-dep documentation reformat; replaces ad-hoc boocode_batchN.md / handoff_vN.N.N.md convention.

Existing batch docs moved into openspec/changes/archived/ via git mv (preserves history):
- boocode_batch10.md
- handoff_v1.13.8_prefix_verify.md
- handoff_v1.13.10_per_tool_cost.md

Pre-v1.13.15 docs were NOT split into proposal/tasks/design files. The work was already shipped; the originals are preserved as archived snapshots.

New v1.13.15+ batches land directly in openspec/changes/<slug>/proposal.md (+ tasks.md, + design.md when applicable) per the convention documented in openspec/README.md.

CLAUDE.md gained a one-line pointer to the convention (workflow section). File grew from 153 → 154 lines, 27,682 → 27,925 chars; both remain well under the AgentLint hard caps.

specs/ directory is reserved for future OpenSpec CLI adoption (v1.14+). No CLI dep added in this batch: directory structure only. If/when the full OpenSpec lifecycle is adopted, that lands as a separate batch.
2026-05-22 14:54:17 +00:00
parent fc11e8dc91
commit 5a3f357ce9
5 changed files with 39 additions and 0 deletions
--- a/openspec/README.md
+++ b/openspec/README.md
@@ -0,0 +1,38 @@
+# openspec
+
+Per-batch documentation convention adopted v1.13.15-openspec.
+
+Lift source: Fission-AI/OpenSpec directory layout. **No CLI dependency** — just
+the folder shape. Full OpenSpec lifecycle adoption is a future v1.14+ batch.
+
+## Layout
+
+```
+openspec/
+  changes/
+    <slug>/                          # one folder per shipped or planned batch
+      proposal.md                    # Why + scope summary
+      tasks.md                       # implementation step list
+      design.md                      # architecture / data-model decisions (optional)
+      specs/                         # reserved for future OpenSpec CLI adoption
+    archived/                        # snapshots of pre-v1.13.15 batch docs
+      <original-filename>.md
+  specs/                             # global specs, future v1.14+ use
+```
+
+## Conventions
+
+- Slugs are lowercase and hyphen-separated, derived from the batch title
+  (e.g. `v1-13-10-per-tool-cost`, `file-attachments-v3-5`).
+- Already-shipped pre-v1.13.15 batches live in `changes/archived/` as
+  single-file snapshots. They were not split into proposal/tasks because
+  the work was already complete; archiving preserves git history.
+- New v1.13.15+ batches should land directly in
+  `changes/<slug>/proposal.md` (+ tasks.md, + design.md when applicable).
+- `proposal.md` carries the "Why" and scope. `tasks.md` is the action list
+  (numbered or checkbox). `design.md` is for non-trivial architectural
+  decisions worth recording separately.
+- A canonical dispatch brief (matching the v1.13.9 / v1.13.10 format)
+  is most naturally split as proposal.md (Where we are, Why this matters,
+  rationale sections) + tasks.md (Scope items, Build + smoke) + design.md
+  (Attribution model, Filtering, Canonical mapping).
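Note on the slug convention in openspec/README.md above: the README gives only two example slugs, so the exact normalization rule is not stated. A minimal sketch of one rule that reproduces both examples, assuming every run of non-alphanumeric characters collapses to a single hyphen (`slugFromTitle` and `changeFiles` are hypothetical helpers, not part of this commit):

```typescript
// Hypothetical helper, not part of the commit: derives a change slug from a
// batch title. The normalization rule is an assumption inferred from the two
// example slugs in the README.
function slugFromTitle(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // each run of non-alphanumerics becomes one hyphen
    .replace(/^-+|-+$/g, ""); // drop leading and trailing hyphens
}

// Hypothetical helper: the files a new v1.13.15+ batch would carry under
// openspec/changes/<slug>/ (design.md only when applicable).
function changeFiles(slug: string, withDesign: boolean = false): string[] {
  const base = `openspec/changes/${slug}`;
  const files = [`${base}/proposal.md`, `${base}/tasks.md`];
  if (withDesign) {
    files.push(`${base}/design.md`);
  }
  return files;
}

console.log(slugFromTitle("v1.13.10 per-tool cost")); // v1-13-10-per-tool-cost
console.log(changeFiles("v1-13-10-per-tool-cost", true));
```

The batch titles passed in here are illustrative; only the resulting slugs appear in the README.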
--- a/openspec/changes/archived/boocode_batch10.md
+++ b/openspec/changes/archived/boocode_batch10.md
@@ -0,0 +1,269 @@
+# BooCode v1.1 — Batch 10
+
+**Theme:** BooTerm. Second container, dedicated to in-browser terminals. Per-session tmux. xterm.js + node-pty in-container. New pane type wires into the BooCode shell.
+**Status:** Planned. Largest batch in v1.1. Depends on Batch 3 (pane system), Batch 7 (settings drawer pattern reused).
+**Repo:** `/opt/boocode/` (shared monorepo). New `apps/booterm/` subdirectory.
+
+## Goals
+
+1. New container `booterm` running Fastify + node-pty + tmux. Per-session tmux session keyed by `(user, session_id)`.
+2. xterm.js terminal pane in the BooCode shell. Multiple terminal panes per session, each attached to a separate tmux window.
+3. PTY traffic over WebSocket. Auth via `Remote-User`.
+4. tmux as session manager so terminals survive WebSocket reconnects, page refreshes, even container restarts.
+5. Read+write capability scoped to project root. No `cd ..` escape.
+6. Path-based routing: `code.indifferentketchup.com/api/term/*` → booterm; `/ws/term/*` → booterm.
+
+## Architecture
+
+```
+browser ──HTTPS──> Caddy (droplet) ──Tailscale──> Authelia
+                                                      │
+                                                      ├── /api/chat/*, /ws/chat/*  → boocode  :9500
+                                                      ├── /api/term/*, /ws/term/*  → booterm  :9501
+                                                      └── /                        → boocode (SPA)
+
+booterm container:
+  - Fastify (Node 20)
+  - node-pty
+  - tmux installed in container (apk add tmux)
+  - same Postgres (boocode_db)
+  - mounts projects rw (scoped)
+```
+
+### Mount strategy
+
+Decided: Option A. Per-project bind mounts in `docker-compose.yml`. Already applied: booterm has `/opt:/opt:rw` to keep parity with the existing boocode mount and avoid enumerating roots. Project root for any given session derives from `projects.root_path` and tmux launches with `cwd` set there.
+
+### tmux session naming
+
+Per-session tmux:
+
+```
+tmux session name: bc-<session_id>     (UUID, sanitized — alphanumeric + hyphen)
+tmux windows:      term-<pane_id>      (one window per terminal pane)
+```
+
+booterm spawns `tmux new-session -d -s bc-<sid> -c <project_root>` lazily on first attach. Subsequent attaches do `tmux new-window -t bc-<sid>` for additional panes, or `tmux attach -t bc-<sid>` and select window.
+
+## Data model
+
+| Column | On | Type | Default | Notes |
+|---|---|---|---|---|
+| (none) | — | — | — | terminals are tmux-managed, no DB rows |
+| `kind = 'terminal'` | `session_panes.kind` CHECK | — | — | Extend CHECK to include `'terminal'` |
+| `state.tmux_window` | `session_panes.state` JSONB | TEXT | NULL | Which tmux window this pane attaches to |
+
+Schema (already applied to live DB + schema.sql):
+
+```sql
+ALTER TABLE session_panes DROP CONSTRAINT IF EXISTS session_panes_kind_check;
+ALTER TABLE session_panes ADD CONSTRAINT session_panes_kind_check
+  CHECK (kind IN ('chat', 'file_browser', 'terminal'));
+```
+
+## Backend (booterm)
+
+New app at `apps/booterm/`:
+
+```
+apps/booterm/
+├── src/
+│   ├── index.ts        # Fastify + WS + auth
+│   ├── auth.ts         # Remote-User middleware (same pattern as boocode)
+│   ├── db.ts           # pg pool (shared boocode_db)
+│   ├── routes/
+│   │   ├── health.ts
+│   │   └── terminals.ts  # POST /api/term/sessions/:sid/panes/:pid/start (creates tmux window)
+│   ├── pty/
+│   │   ├── manager.ts    # tmux process management
+│   │   └── pty.ts        # node-pty wrapper for `tmux attach -t ... -d`
+│   └── ws/
+│       └── attach.ts     # WS /ws/term/sessions/:sid/panes/:pid → PTY bidi pipe
+├── package.json
+└── tsconfig.json
+```
+
+### Endpoints
+
+| Method | Path | Notes |
+|---|---|---|
+| GET | `/api/term/health` | Ping |
+| POST | `/api/term/sessions/:sid/panes/:pid/start` | Idempotent tmux window create. Returns `{tmux_window: "term-<pid>"}` |
+| WS | `/ws/term/sessions/:sid/panes/:pid` | Attach PTY |
+| POST | `/api/term/sessions/:sid/panes/:pid/resize` | `{cols, rows}` |
+| POST | `/api/term/sessions/:sid/panes/:pid/kill` | Kill the tmux window |
+
+WS frames (binary or text):
+
+```
+client → server: pty input (raw bytes, typed by user)
+server → client: pty output (raw bytes from shell)
+server → client: {type: "exit", code} on window close
+```
+
+### Auth + scoping
+
+- `Remote-User` required on WS upgrade.
+- `session_id` validated: lookup in `sessions` table; require row exists.
+- `pane_id` validated: must exist in `session_panes` with `kind = 'terminal'` and matching `session_id`.
+- Project root derived from `sessions.project_id → projects.root_path`. tmux starts with its working directory set to that root. **No chroot.** User can `cd /` and read anything mounted into the container.
+  - Future hardening: namespace/chroot. Out of v1.1 scope.
+
+### tmux config
+
+`apps/booterm/tmux.conf` bundled into image at `/etc/booterm/tmux.conf`; tmux invocations use `-f /etc/booterm/tmux.conf`:
+
+```
+set -g default-terminal "screen-256color"
+set -g history-limit 50000
+set -g mouse on
+setw -g mode-keys vi
+set -g status off
+set -g destroy-unattached off
+```
+
+Boolab pattern (from `services/tmux_session.py`).
+
+## Frontend
+
+| File | Change |
+|---|---|
+| `apps/web/src/components/panes/TerminalPane.tsx` (NEW) | xterm.js mount, WS attach, resize handler |
+| `apps/web/src/api/client.ts` | `api.terminals.start(sessionId, paneId)`, `api.terminals.resize(...)`, `api.terminals.kill(...)` |
+| `apps/web/src/components/Workspace.tsx` | Add 'terminal' to the pane kind enum; spawn button → POST start → render TerminalPane. Tab UI lives in Workspace.tsx — there is no PaneTab.tsx file. |
+| `apps/web/package.json` | `xterm` + `xterm-addon-fit` + `xterm-addon-web-links` |
+
+### TerminalPane
+
+```tsx
+useEffect(() => {
+  const term = new Terminal({ fontFamily: 'JetBrains Mono', fontSize: 14, theme: ... });
+  const fit = new FitAddon();
+  term.loadAddon(fit);
+  term.loadAddon(new WebLinksAddon());
+  term.open(containerRef.current);
+  fit.fit();
+
+  const proto = window.location.protocol === 'https:' ? 'wss:' : 'ws:';
+  const ws = new WebSocket(`${proto}//${window.location.host}/ws/term/sessions/${sid}/panes/${pid}`);
+  ws.binaryType = 'arraybuffer';
+  ws.onmessage = e => term.write(typeof e.data === 'string' ? e.data : new Uint8Array(e.data));
+  term.onData(data => ws.send(data));
+  term.onResize(({ cols, rows }) => api.terminals.resize(sid, pid, cols, rows));
+
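+  // Debounce fit() by 100ms (see Constraints) so a drag-resize does not
+  // trigger a backend resize call on every observer tick.
+  let fitTimer: number | undefined;
+  const ro = new ResizeObserver(() => {
+    window.clearTimeout(fitTimer);
+    fitTimer = window.setTimeout(() => fit.fit(), 100);
+  });
+  ro.observe(containerRef.current);
+
+  return () => { window.clearTimeout(fitTimer); ws.close(); term.dispose(); ro.disconnect(); };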
+}, [sid, pid]);
+```
+
+Dev: vite.config.ts needs `/api/term` and `/ws/term` proxy entries mirroring the existing `/api` and `/ws` ones.
+
+## Send-to-terminal from chat
+
+Boolab pattern: select text in a message → "Send to terminal" button → text becomes terminal input.
+
+- Right-click context menu on selected text in chat → "Send to terminal" submenu lists open terminal panes.
+- Click target → sends `<text>\n` to that pane's WS.
+
+Implementation:
+
+| File | Change |
+|---|---|
+| `apps/web/src/components/MessageBubble.tsx` | Selection handler + context menu |
+| `apps/web/src/lib/events.ts` | New event `send_to_terminal` with payload `{pane_id, text}` |
+| `apps/web/src/components/panes/TerminalPane.tsx` | Subscribe to event for its `pane_id`, write to WS |
+
+## Docker compose (already applied)
+
+booterm service is already in `docker-compose.yml` with:
+- build context `.`, dockerfile `apps/booterm/Dockerfile`
+- port `100.114.205.53:9501:3000`
+- `/opt:/opt:rw` mount
+- `DATABASE_URL` env pointing at `boocode_db`
+- `boocode_net` network
+- depends_on: `boocode_db`
+
+Do not re-edit compose.
+
+## Backend dependencies
+
+`apps/booterm/package.json`:
+- `fastify`
+- `@fastify/websocket`
+- `pg`
+- `zod`
+- `node-pty`
+- `tslib`
+
+`node-pty` requires native build. Dockerfile installs `python3 make g++` in build stage and `tmux` in runtime stage:
+
+```dockerfile
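+FROM node:20-alpine AS build
+# Toolchain for compiling node-pty's native addon; tmux is only needed at runtime.
+RUN apk add --no-cache python3 make g++
+# pnpm is not on PATH in the stock node image; corepack ships with Node 20.
+RUN corepack enable
+WORKDIR /app
+COPY ...
+RUN pnpm install --frozen-lockfile && pnpm build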
+
+FROM node:20-alpine
+RUN apk add --no-cache tmux
+WORKDIR /app
+COPY --from=build /app/apps/booterm/dist ./dist
+COPY --from=build /app/node_modules ./node_modules
+EXPOSE 3000
+CMD ["node", "dist/index.js"]
+```
+
+## Files to touch
+
+**New app:**
+
+- `apps/booterm/` (entire subtree)
+
+**Existing changes:**
+
+- `apps/web/package.json`
+- `apps/web/src/api/client.ts`
+- `apps/web/src/api/types.ts`
+- `apps/web/src/components/Workspace.tsx`
+- `apps/web/src/components/MessageBubble.tsx`
+- `apps/web/src/components/panes/TerminalPane.tsx` (NEW)
+- `apps/web/src/lib/events.ts`
+- `apps/web/vite.config.ts` (proxy entries)
+
+**Already done by user — do not touch:**
+
+- `docker-compose.yml` (booterm service added)
+- `apps/server/src/schema.sql` (terminal CHECK constraint)
+- Live DB constraint applied
+
+## Verification
+
+1. `docker compose up -d --build booterm` → container healthy.
+2. `curl -s http://100.114.205.53:9501/api/term/health -H 'Remote-User: sam'` → 200.
+3. Browser smoke test:
+   - Open a session. Workspace → "+ Terminal" → terminal pane appears with shell prompt in project root.
+   - Type `ls -la` → output.
+   - Type `vim test.txt`, write something, save, `:q` → file exists on host (since rw mount).
+   - Refresh browser → terminal reconnects, history intact (tmux persistence).
+   - Open second terminal pane → same project, separate tmux window. Both work independently.
+   - Select code in chat → right-click → "Send to terminal" → terminal pane receives the text.
+   - Container restart (`docker compose restart booterm`) → on reconnect, tmux session resumes from where it left off.
+   - Close pane via tab context menu → tmux window killed. Reopen pane → fresh shell.
+
+## Constraints
+
+- node-pty is a native dep. Image size grows.
+- tmux history capped at 50k lines per window.
+- WebSocket frames are bidirectional binary; `binaryType = 'arraybuffer'`.
+- Resize debounced 100ms client-side; backend `tmux resize-window` per resize.
+- No chroot/namespace isolation in v1.1. User has full read+write under `/opt/`. Acceptable for single-user homelab.
+- Don't expose 9501 on 0.0.0.0. Tailscale binding only (already configured in compose).
+
+## Open
+
+- Color theme matching for xterm.js. Defer.
+- File-drop into terminal (upload via terminal pane). Out of scope.
+- Multi-user (each user gets own tmux server) — defer until BooCode goes multi-user, which isn't planned.
+- BooCoder container — same skeleton as booterm but with edit_file / create_file tools instead of PTY. Will follow this pattern when built.
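Note on the "tmux session naming" section of boocode_batch10.md above: the doc names sessions `bc-<session_id>` and windows `term-<pane_id>`, and quotes the `new-session` / `new-window` commands. A rough sketch of how those names and argument lists could be built, assuming ids are sanitized to alphanumerics plus hyphen as the doc states. The function names are hypothetical, and passing `-n` to name the window is an assumption; the quoted commands do not show it.

```typescript
// Sketch only: function names and the use of "-n" are assumptions layered on
// the commands quoted in the batch doc.
const TMUX_CONF = "/etc/booterm/tmux.conf"; // path from the "tmux config" section

// Keep only alphanumerics and hyphens, per the naming rule in the doc.
function sanitizeId(id: string): string {
  const clean = id.replace(/[^A-Za-z0-9-]/g, "");
  if (clean.length === 0) {
    throw new Error("id is empty after sanitization");
  }
  return clean;
}

function sessionName(sessionId: string): string {
  return `bc-${sanitizeId(sessionId)}`;
}

function windowName(paneId: string): string {
  return `term-${sanitizeId(paneId)}`;
}

// First attach: tmux new-session -d -s bc-<sid> -c <project_root>
function newSessionArgs(sessionId: string, paneId: string, projectRoot: string): string[] {
  return [
    "-f", TMUX_CONF,
    "new-session", "-d",
    "-s", sessionName(sessionId),
    "-n", windowName(paneId),
    "-c", projectRoot,
  ];
}

// Additional panes: tmux new-window -t bc-<sid>
function newWindowArgs(sessionId: string, paneId: string, projectRoot: string): string[] {
  return [
    "-f", TMUX_CONF,
    "new-window",
    "-t", sessionName(sessionId),
    "-n", windowName(paneId),
    "-c", projectRoot,
  ];
}
```

Returning an argument array rather than a command string means the values can be handed to something like Node's `child_process.execFile("tmux", args)` without passing through a shell, so an odd project path cannot be interpreted as shell syntax.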
--- a/openspec/changes/archived/handoff_v1.13.10_per_tool_cost.md
+++ b/openspec/changes/archived/handoff_v1.13.10_per_tool_cost.md
@@ -0,0 +1,441 @@
+```
+#careful #boocode #nofluff
+
+v1.13.10 — per-tool token cost accounting (rolling 100-call window)
+
+Goal: surface per-tool prompt/completion-token rolling averages in AgentPicker for at-a-glance agent-cost hints. Implementation is a SQL view on top of `messages_with_parts` (no new table, no new write site) + a read endpoint + AgentPicker tooltip extension. Estimated ~240 LoC, mostly UI.
+
+## Where we are
+
+- Last tag: v1.13.9 (compaction overflow trigger — `floor(0.85 × ctx_max)` early-trigger). Branch clean.
+- v1.13.x cleanup line ✅ through v1.13.9. Queued: v1.13.10 (this) → v1.13.11 (WS Zod) → v1.13.12 (skills audit) → v1.13.2 (column drop, last).
+- Dependency (satisfied since v1.13.7 commit `ff29b48`): `includeUsage: true` on `createOpenAICompatible` in `apps/server/src/services/inference/provider.ts`. Without it, `messages.tokens_used`/`ctx_used` were NULL for v1.13.1-A → v1.13.7 (latent regression). Now populated.
+
+## Why this matters
+
+Today: AgentPicker lists agents by name + description. No cost signal. Users pick the architect agent (full tool whitelist, 21k of tool schema) for one-liner questions a refactorer (3 tools, 4k schema) could answer.
+
+Tomorrow: each agent listing shows its mean prompt + completion cost per tool, derived from the last 100 invocations across all chats. Decision aid, not a hard gate.
+
+Why a SQL view instead of a denormalized stats table:
+- All the source data already lands in `messages` (tool_calls JSON + tokens_used + ctx_used) and `message_parts` (read via the `messages_with_parts` view). Zero new write sites.
+- Rolling 100-call window is a `ROW_NUMBER() OVER (PARTITION BY tool_name ORDER BY created_at DESC) <= 100` — natural fit for a view.
+- View is rollback-safe. If the math is wrong, `DROP VIEW` and re-deploy; no orphan rows, no backfill.
+- At BooCode scale (single user, ~30 tools, ~100 calls/tool), aggregate-on-read is microseconds. Premature to denormalize.
+
+The roadmap schema row (`tool_cost_stats (tool_name, prompt_tokens_sum, completion_tokens_sum, n_calls, updated_at)`) matches both a table and a view. View is the lighter implementation.
+
+## Canonical column mapping (pinned)
+
+The `messages` columns are named non-obviously. Pinned mapping, confirmed across 5 write sites + 1 read site:
+
+| Column          | Semantic meaning   | AI SDK v6 source name |
+|-----------------|--------------------|-----------------------|
+| `ctx_used`      | prompt / input tokens   | `usage.inputTokens`   |
+| `tokens_used`   | completion / output tokens | `usage.outputTokens`  |
+
+Write sites confirmed: `tool-phase.ts:94-95`, `error-handler.ts:109-110`, `sentinel-summaries.ts:130-131`, `sentinel-summaries.ts:387-388`, `stream-phase.ts:319-320`. Canonical read at `payload.ts:190-191` reverses: `const promptTokens = updated.ctx_used; const completionTokens = updated.tokens_used`.
+
+`tokens_used` reads like "total" but is completion only. Project convention since the columns predate v1.13.x. Do not "fix" the naming inside this batch — out of scope; downstream consumers depend on the current mapping.
+
+## Attribution model
+
+A single assistant turn can emit N tool calls in parallel. llama-swap returns ONE (prompt_tokens, completion_tokens) per turn, not per tool. Attribution requires a split.
+
+**Chosen approach: equal split.** For an assistant turn that emits N tool calls with prompt P and completion C, each tool is attributed P/N prompt + C/N completion. The 100-call rolling mean smooths split noise. Implementation: `tokens_used::float / jsonb_array_length(tool_calls)` at the unnest site.
+
+**Alternatives rejected:**
+- "Full turn cost to every tool" (no division). Over-states; a 5-tool turn would 5×-count every tool's cost.
+- "Result-size only" (`length(JSON.stringify(output)) / 4`). Loses the LLM's actual usage signal; doesn't capture how expensive a tool's output is to the next prompt.
+- "Consuming-turn delta" (next turn prompt_tokens − this turn prompt_tokens, attribute to the tool that emitted the result). Most accurate but requires bubble-back math through the `executeToolPhase → runAssistantTurn` recursion. Over-engineered for the rolling-average use case.
+
+**If Sam wants a different split, change one line in the view definition (the divisor).**
+
+## Filtering — sentinel, failure, repair-call semantics
+
+The view excludes rows that aren't real tool-cost signal:
+
+- **Failed and cancelled turns** (`status != 'complete'`). The `error-handler.ts` failed/cancelled paths don't write `tokens_used`/`ctx_used`, so the existing `tokens_used IS NOT NULL` clause already filters these. Adding `status='complete'` is defense in depth and makes intent explicit.
+- **Cap-hit and doom-loop sentinel rows** (`metadata->>'kind' IN ('cap_hit', 'doom_loop')`). Sentinels are `role='system'` rows with `tool_calls=NULL`, so the existing `tool_calls IS NOT NULL` clause already filters them. The explicit metadata filter is defense in depth — it survives future schema drift where someone might INSERT a sentinel with a non-null tool_calls.
+- **`experimental_repairToolCall` retries.** No special handling needed. Our impl (per `CLAUDE.md`) is pass-through — malformed calls flow to zod-reject → tool_result error → next normal turn handles. No separate rows; the next turn's tokens count naturally.
+
+## Recon (already done; paste for reference)
+
+```
+cd /opt/boocode
+grep -n "tokens_used\|ctx_used\|inputTokens\|outputTokens" apps/server/src/services/inference/*.ts | head -30
+grep -n "metadata\|cap_hit\|doom_loop" apps/server/src/services/inference/sentinels.ts apps/server/src/schema.sql | head -10
+psql -h localhost -p 5432 -U postgres -d boocode -c "\d messages_with_parts" | head -30
+```
+
+Expected: confirms the canonical mapping in the table above; confirms `messages.metadata jsonb` exists at `schema.sql:259`; confirms `messages_with_parts` exposes `m.metadata` at `schema.sql:92`.
+
+## Scope
+
+### 1. schema.sql — `tool_cost_stats` view (~35 LoC)
+
+Append after the `messages_with_parts` view (after line 120):
+
+```sql
+-- v1.13.10: per-tool token cost rolling window. Derives from
+-- messages_with_parts (the v1.13.1-B view that COALESCEs message_parts over
+-- the legacy JSON column) so this works whether the chat predates v1.13.0
+-- or postdates v1.13.2 (column drop). No new write site — all source data
+-- already lands via the existing tool-phase.ts:94-95 UPDATE.
+--
+-- Attribution model: equal split. A turn emitting N tool calls divides its
+-- prompt/completion tokens by N before attribution. See v1.13.10 dispatch
+-- brief for rationale + rejected alternatives.
+--
+-- Column mapping: messages.ctx_used = prompt (input), messages.tokens_used
+-- = completion (output). Non-obvious naming; pinned via canonical writes at
+-- tool-phase.ts:94-95 et al.
+--
+-- Filtering rationale:
+--   status='complete'                — exclude failed/cancelled (defense in
+--                                      depth; failed-path doesn't write
+--                                      tokens_used so they're also filtered
+--                                      indirectly).
+--   metadata->>'kind' exclusions     — exclude cap_hit / doom_loop sentinels
+--                                      (defense in depth; sentinels are
+--                                      role='system' with tool_calls=NULL
+--                                      so they're filtered indirectly too).
+--   experimental_repairToolCall      — no special handling; retries flow
+--                                      as normal next-turn tool_result
+--                                      errors and count naturally.
+--
+-- Rolling window: last 100 calls per tool_name, ordered by created_at DESC.
+-- Aggregate-on-read is microseconds at BooCode scale (single user, ~30
+-- tools, < 100 calls each). DROP VIEW + recreate to change window size.
+CREATE OR REPLACE VIEW tool_cost_stats AS
+WITH per_call AS (
+  SELECT
+    (tc->>'name')::text AS tool_name,
+    (m.ctx_used::float / NULLIF(jsonb_array_length(m.tool_calls), 0)) AS prompt_tokens,
+    (m.tokens_used::float / NULLIF(jsonb_array_length(m.tool_calls), 0)) AS completion_tokens,
+    m.created_at,
+    ROW_NUMBER() OVER (
+      PARTITION BY (tc->>'name')::text
+      ORDER BY m.created_at DESC
+    ) AS rn
+  FROM messages_with_parts m,
+    LATERAL jsonb_array_elements(m.tool_calls) AS tc
+  WHERE m.tool_calls IS NOT NULL
+    AND jsonb_array_length(m.tool_calls) > 0
+    AND m.tokens_used IS NOT NULL
+    AND m.ctx_used IS NOT NULL
+    AND m.status = 'complete'
+    AND (m.metadata IS NULL
+         OR m.metadata->>'kind' IS NULL
+         OR m.metadata->>'kind' NOT IN ('cap_hit', 'doom_loop'))
+)
+SELECT
+  tool_name,
+  ROUND(SUM(prompt_tokens))::int AS prompt_tokens_sum,
+  ROUND(SUM(completion_tokens))::int AS completion_tokens_sum,
+  COUNT(*)::int AS n_calls,
+  MAX(created_at) AS updated_at
+FROM per_call
+WHERE rn <= 100
+GROUP BY tool_name;
+```
+
+Notes:
+- `NULLIF(..., 0)` guards against div-by-zero on `jsonb_array_length=0` (should never happen given the WHERE clause, but defensive).
+- `ROUND(SUM(...))::int` — frontend doesn't want decimals; sum-then-round is more accurate than per-row round-then-sum.
+- View is read from `messages_with_parts` not `messages`, so legacy pre-v1.13.0 rows and post-v1.13.2 rows both resolve.
+- No index needed; the underlying `idx_messages_chat` covers the JOIN; the LATERAL unnest is bounded by the 100-row partition.
+
+### 2. apps/server/src/routes/tools.ts (NEW, ~40 LoC)
+
+New route file. Register in `apps/server/src/index.ts` next to the other `register*Routes(app, sql, ...)` calls.
+
+```ts
+import type { FastifyInstance } from 'fastify';
+import type { Sql } from '../db.js';
+
+export interface ToolCostStat {
+  tool_name: string;
+  mean_prompt_tokens: number;
+  mean_completion_tokens: number;
+  n_calls: number;
+  updated_at: string;
+}
+
+export function registerToolsRoutes(app: FastifyInstance, sql: Sql) {
+  app.get('/api/tools/cost_stats', async () => {
+    const rows = await sql<{
+      tool_name: string;
+      prompt_tokens_sum: number;
+      completion_tokens_sum: number;
+      n_calls: number;
+      updated_at: string;
+    }[]>`
+      SELECT tool_name, prompt_tokens_sum, completion_tokens_sum, n_calls, updated_at
+      FROM tool_cost_stats
+      ORDER BY tool_name ASC
+    `;
+    const stats: ToolCostStat[] = rows.map(r => ({
+      tool_name: r.tool_name,
+      mean_prompt_tokens: Math.round(r.prompt_tokens_sum / r.n_calls),
+      mean_completion_tokens: Math.round(r.completion_tokens_sum / r.n_calls),
+      n_calls: r.n_calls,
+      updated_at: r.updated_at,
+    }));
+    return { stats };
+  });
+}
+```
+
+Route is bodyless, idempotent, cheap. No pagination (≤30 tools).
+
+### 3. apps/server/src/services/__tests__/tool_cost_stats.test.ts (NEW, ~95 LoC)
+
+Integration test against real Postgres (matches `inference.test.ts` pattern). Fixtures:
+
+```ts
+import { describe, it, expect, beforeEach } from 'vitest';
+import { connect } from '../../db.js';
+
+describe('tool_cost_stats view (v1.13.10)', () => {
+  // ... session + chat + project setup helpers ...
+
+  it('returns empty when no tool calls exist', async () => {
+    // fresh chat, only user/assistant text turns
+    const stats = await sql`SELECT * FROM tool_cost_stats`;
+    expect(stats).toEqual([]);
+  });
+
+  it('attributes single-tool turn fully to that tool', async () => {
+    // insert one assistant message with tool_calls=[{name: 'view_file', ...}],
+    // tokens_used=300, ctx_used=15000, status='complete'
+    const stats = await sql`SELECT * FROM tool_cost_stats WHERE tool_name='view_file'`;
+    expect(stats[0]).toMatchObject({
+      tool_name: 'view_file',
+      prompt_tokens_sum: 15000,
+      completion_tokens_sum: 300,
+      n_calls: 1,
+    });
+  });
+
+  it('splits multi-tool turn equally across tools', async () => {
+    // insert one assistant turn with 3 tool calls (view_file, grep, list_dir),
+    // tokens_used=300, ctx_used=15000 → each tool gets 100 completion, 5000 prompt
+    const stats = await sql`SELECT * FROM tool_cost_stats ORDER BY tool_name`;
+    expect(stats).toHaveLength(3);
+    for (const s of stats) {
+      expect(s.completion_tokens_sum).toBe(100);
+      expect(s.prompt_tokens_sum).toBe(5000);
+      expect(s.n_calls).toBe(1);
+    }
+  });
+
+  it('limits to last 100 calls per tool (FIFO window)', async () => {
+    // insert 150 turns each calling view_file once with monotonically
+    // increasing tokens_used; expect only the most recent 100 to count
+    const stats = await sql`SELECT * FROM tool_cost_stats WHERE tool_name='view_file'`;
+    expect(stats[0]!.n_calls).toBe(100);
+    // mean should reflect the most recent 100 turns (51..150), not 1..150
+  });
+
+  it('excludes turns with NULL tokens_used (pre-v1.13.7 latent regression)', async () => {
+    // insert a turn with tool_calls but tokens_used=NULL → must not appear
+    const stats = await sql`SELECT * FROM tool_cost_stats WHERE tool_name='view_file'`;
+    expect(stats).toEqual([]);
+  });
+
+  it('excludes failed and cancelled turns + sentinel metadata rows', async () => {
+    // insert five rows for tool_name='view_file', all with tokens_used+ctx_used
+    // populated:
+    //   row A: status='failed'                            — excluded
+    //   row B: status='cancelled'                         — excluded
+    //   row C: status='complete', metadata={kind:'cap_hit'}   — excluded
+    //   row D: status='complete', metadata={kind:'doom_loop'} — excluded
+    //   row E: status='complete', metadata=null               — included
+    // Expect n_calls=1, attributable to row E only.
+    const stats = await sql`SELECT * FROM tool_cost_stats WHERE tool_name='view_file'`;
+    expect(stats[0]!.n_calls).toBe(1);
+  });
+
+  it('reads tool_calls via messages_with_parts (parts-authoritative)', async () => {
+    // insert a v1.13.0+ row with messages.tool_calls=NULL but
+    // message_parts rows containing the tool_call → must still aggregate
+    const stats = await sql`SELECT * FROM tool_cost_stats WHERE tool_name='grep'`;
+    expect(stats[0]!.n_calls).toBe(1);
+  });
+});
+```
+
+Pattern: each test resets the messages table for the fixture chat (TRUNCATE not DELETE — Postgres `messages` has FK CASCADE) and inserts hand-crafted rows. The view is recomputed on every SELECT.
+
+### 4. apps/web/src/api/types.ts + client.ts (~10 LoC)
+
+Add to `types.ts`:
+
+```ts
+export interface ToolCostStat {
+  tool_name: string;
+  mean_prompt_tokens: number;
+  mean_completion_tokens: number;
+  n_calls: number;
+  updated_at: string;
+}
+```
+
+Add to `client.ts` under the existing `api.*` namespace structure:
+
+```ts
+tools: {
+  costStats: () => fetch<{ stats: ToolCostStat[] }>('GET', '/api/tools/cost_stats'),
+},
+```
+
+Match the casing convention of the existing namespaces (`api.agents.list`, `api.chats.archive`, etc.).
+
+### 5. apps/web/src/components/AgentPicker.tsx — tooltip extension (~80 LoC delta)
+
+Currently (line 67): `title={selectedAgent?.description}` — native HTML title attribute on the trigger button.
+
+Replacement: dropdown items get a per-agent cost line in muted text below the description. Format:
+
+```
+[Agent name]
+[Agent description]
+~5.2k prompt / 280 completion · 6 tools · last call 3h ago
+```
+
+Implementation steps:
+1. Fetch `api.tools.costStats()` once on mount (alongside the existing `api.agents.list()`). Cache result for the lifetime of the picker open state. Re-fetch only on `useEffect` dep change.
+2. Compute per-agent aggregate: for each agent, sum the means of its whitelisted tools. Sum-of-means, not mean-of-sums — we're combining independent rolling averages.
+3. Render below description (one line, muted, truncated). Show "—" if no calls recorded yet for any of the agent's tools.
+4. Don't break the existing native `title=` for backward compat; layer the cost line additively.
+
+```tsx
+const [costStats, setCostStats] = useState<ToolCostStat[]>([]);
+useEffect(() => {
+  api.tools.costStats().then(r => setCostStats(r.stats)).catch(() => setCostStats([]));
+}, []);
+const costByTool = useMemo(
+  () => Object.fromEntries(costStats.map(s => [s.tool_name, s])),
+  [costStats],
+);
+function agentCost(agent: Agent): { prompt: number; completion: number; nTools: number; nWithData: number; mostRecent: string | null } {
+  let prompt = 0, completion = 0, nWithData = 0;
+  let mostRecent: string | null = null;
+  for (const t of agent.tools) {
+    const s = costByTool[t];
+    if (!s) continue;
+    prompt += s.mean_prompt_tokens;
+    completion += s.mean_completion_tokens;
+    nWithData++;
+    if (!mostRecent || s.updated_at > mostRecent) mostRecent = s.updated_at;
+  }
+  return { prompt, completion, nTools: agent.tools.length, nWithData, mostRecent };
+}
+```
+
+For the line render: `~${formatK(prompt)} prompt / ${completion} completion · ${nWithData}/${nTools} tools · ${formatAgo(mostRecent)}`. Skip entirely when `nWithData === 0` to avoid showing "0k / 0 / 0 tools" for fresh-from-deploy state.
+
+**`formatK` / `formatAgo`:** colocate at the bottom of `AgentPicker.tsx`. Don't extract to a util file in this batch — single use site.
+
+## What NOT to do
+
+- **Don't add a new write site at `tool-phase.ts` or `finalizeCompletion`.** All source data is already there via existing UPDATEs.
+- **Don't denormalize.** The view is sufficient and rollback-safe at BooCode's single-user scale.
+- **Don't add per-tool cost to the message bubble.** Out of scope. AgentPicker tooltip only.
+- **Don't fold per-call rows into a moving sum via triggers.** Aggregate on read; 100 rows × 30 tools is microseconds in Postgres.
+- **Don't track `result_chars` (the size of `tool_results.output`).** Tempting as a second cost signal but out of scope here. Future batch if Sam wants it.
+- **Don't add a session-scoped or chat-scoped filter to `tool_cost_stats`.** The rolling window is GLOBAL across all chats — the agent picker is a project-level decision aid. Per-chat surfacing is a future v1.14+ design.
+- **Don't change the attribution model post-deployment** without dropping the view first. Mid-flight semantic changes give bogus historical means.
+- **Don't "fix" the `ctx_used`/`tokens_used` naming inside this batch.** Non-obvious but pinned across 5 write sites. Renaming is its own batch.
+- **Don't rely solely on `tool_calls IS NOT NULL` for sentinel exclusion.** It works today (sentinels are role='system' with tool_calls=NULL) but the explicit `status='complete'` + `metadata->>'kind'` filters are defense in depth and survive future schema drift.
+
+## Backup before edits
+
+```
+cd /opt/boocode
+cp apps/server/src/schema.sql{,.bak-$(date +%Y%m%d-%H%M%S)}
+cp apps/web/src/components/AgentPicker.tsx{,.bak-$(date +%Y%m%d-%H%M%S)}
+```
+
+(No backup needed for new files in items 2, 3, 4.)
+
+## Verify
+
+```
+pnpm -C apps/server test
+```
+
+Expected: all existing tests pass + 7 new in `tool_cost_stats.test.ts`. Total moves from 195 → 202.
+
+```
+cd /opt/boocode
+docker compose exec boocode_db psql -U postgres -d boocode -c \
+  "SELECT * FROM tool_cost_stats ORDER BY n_calls DESC LIMIT 10;"
+```
+
+Expected: in any live deployment with v1.13.7+ history, this returns real rows for `view_file`, `grep`, `list_dir`, etc. If empty: `messages.tool_calls` was NULL for the v1.13.1-A → v1.13.7 latent regression window and recovery only begins with v1.13.7+ traffic.
+
+## Build + smoke
+
+```
+cd /opt/boocode
+docker compose up --build -d boocode
+docker compose logs --since=30s boocode | tail -20
+```
+
+Smoke A — view recompiles on schema apply:
+```
+docker compose logs boocode | grep -i "tool_cost_stats\|applySchema"
+```
+Expected: clean schema apply, view registered idempotently.
+
+Smoke B — endpoint returns data:
+```
+curl -s http://localhost:3000/api/tools/cost_stats | jq '.stats | length, .stats[0]'
+```
+Expected: nonzero length if any v1.13.7+ tool calls exist; one stat object with all 5 fields populated.
+
+Smoke C — UI:
+1. Open browser to `boocode.indifferentketchup.com`.
+2. Open AgentPicker dropdown on any session.
+3. Each agent row shows a muted cost line below its description: `~5.2k prompt / 280 completion · 6/8 tools · last call 2h ago`.
+4. Agents with no tool history show just description (no cost line).
+5. Confirm cost line truncates with the existing text-muted-foreground / truncate pattern; doesn't break the layout at mobile widths (open Vivaldi devtools, set iPhone-13 viewport).
+
+## Files expected to touch
+
+- `apps/server/src/schema.sql` — ~35 LoC delta (view definition + filter comments)
+- `apps/server/src/routes/tools.ts` — NEW, ~40 LoC
+- `apps/server/src/index.ts` — 1 line (`registerToolsRoutes(app, sql)`)
+- `apps/server/src/services/__tests__/tool_cost_stats.test.ts` — NEW, ~95 LoC
+- `apps/web/src/api/types.ts` — ~7 LoC (interface)
+- `apps/web/src/api/client.ts` — ~3 LoC (namespace + method)
+- `apps/web/src/components/AgentPicker.tsx` — ~80 LoC delta (cost line + fetch hook + helpers)
+
+Total ~260 LoC. Matches roadmap estimate.
+
+## Workflow conventions
+
+- Backups before destructive edits (above) on the two MODIFIED files. New files don't need backups.
+- Sam reviews diffs. Never `git add` / `git commit` / `git push` / `git pull` on Sam's behalf.
+- Build: `docker compose up --build -d boocode`. No `--no-cache` unless layer-cache trap surfaces.
+- Tests authoritative: `pnpm -C apps/server test`.
+- View definition lives in `schema.sql` (idempotent via `CREATE OR REPLACE VIEW`); no migration shim needed.
+
+## Don't repeat past mistakes
+
+- v1.13.7 stability bundle (`includeUsage:true`, trim guards, payload filter, `BUDGET_NO_AGENT=30`): all live. This batch depends on `includeUsage:true`. If unset, `tool_cost_stats` returns empty rows.
+- v1.13.8 prefix instrumentation: untouched.
+- v1.13.9 ratio-only `usable()`: untouched.
+- v1.13.4 two-tier prune: untouched.
+- v1.13.5 truncate.ts opaque-id pattern: untouched.
+- v1.13.1-B `messages_with_parts` view: this is the source for `tool_cost_stats`. Don't reach past it to raw `messages`.
+- v1.13.2 will DROP the `messages.tool_calls`/`tool_results` columns. The `tool_cost_stats` view reads from `messages_with_parts`, not `messages`, so it survives. Verify after v1.13.2 ships.
+
+## Source files to read in project knowledge
+
+- `boocode_roadmap.md` (v1.13.10 row at line 114; schema row at line 474)
+- `boocode_code_review.md` (cost-tracking design background)
+- `CLAUDE.md` (project conventions; messages_with_parts invariant at L80; v1.13.7 includeUsage invariant)
+```
--- a/openspec/changes/archived/handoff_v1.13.8_prefix_verify.md
+++ b/openspec/changes/archived/handoff_v1.13.8_prefix_verify.md
@@ -0,0 +1,225 @@
+# Handoff: BooCode v1.13.8 — system-prompt prefix stability verify-and-measure
+
+#careful #boocode #nofluff
+
+Recon-only / instrumentation batch. **No cache implementation in this dispatch.** Goal: prove (or disprove) that the assembled system-prompt prefix is byte-stable across turns under steady-state inputs. Result determines whether v1.13.7-as-originally-specced (the prefix cache) is actually needed at all.
+
+## Where we are
+
+- Last tag: `v1.13.7` — stability bundle (`includeUsage:true` + trim guards + payload filter for trailing empty/failed assistants + `BUDGET_NO_AGENT 15→30`). This shipped as a renumber of the original "prefix cache" v1.13.7 slot. The prefix-cache work moved to v1.13.8 with the change-of-shape captured here.
+- Branch clean. `git log --oneline main -5` should show `…v1.13.7 v1.13.6 v1.13.5 v1.13.4 v1.13.3`.
+
+## What v1.13.x has shipped
+
+- v1.13.0 — `message_parts` table + dual-write.
+- v1.13.1-A — AI SDK v6 install (`streamText` adapter, mid-dispatch silent-abort patch).
+- v1.13.1-B — `messages_with_parts` view + read sites flipped.
+- v1.13.1-C — `ask_user_input` correlation ported + reasoning end-to-end.
+- v1.13.3 — bundle: statement_timeout=30s, alpha tool ordering, periodic stuck-row sweeper, `experimental_repairToolCall`.
+- v1.13.4 — two-tier compaction prune.
+- v1.13.5 — opencode `truncate.ts` port (`tr_<12char>` opaque ids on tmpfs).
+- v1.13.6 — compaction head-assembly audit; reasoning_parts added to `buildHeadPayload`.
+- v1.13.7 — stability bundle (the five fixes above).
+
+## What's queued
+
+- **v1.13.8 (this dispatch)** — prefix stability verify-and-measure
+- v1.13.9 — compaction overflow trigger formula (opencode 0.85 × ctx_max)
+- v1.13.10 — per-tool token cost accounting + AgentPicker UI
+- v1.13.11 — WebSocket frame typing (Zod schemas both ends)
+- v1.13.12 — skills audit pass (rules→recipes split)
+- v1.13.2 — drop legacy columns (last; ≥1 week production traffic on v1.13.1 first)
+
+## Why this is verify-first
+
+The original v1.13.7 roadmap line was "system-prompt prefix cache, keyed by `(agent_id, project_id, skills_version)`, mtime-invalidated." Recon during planning surfaced that:
+
+- `apps/server/src/services/system-prompt.ts:buildSystemPrompt()` already runs over mtime-cached inputs:
+  - BOOCHAT.md / BOOCODER.md — cached in this file (`cachedGuidance`, line 25), keyed by mtime
+  - global + per-project AGENTS.md — cached in `services/agents.ts` (`safeStat` pattern, line 245), keyed by mtime
+  - `session.system_prompt` / `project.default_system_prompt` — DB scalars, byte-stable until edited
+  - BASE_SYSTEM_PROMPT — hardcoded template with `${projectPath}` interpolation
+- Skills are NOT in the system prompt today. Discovered via `skill_find` at runtime.
+- Tool schemas are NOT in the system message. They live in the OpenAI request body's `tools` field (already alpha-sorted by v1.13.3).
+- Output assembly is a microsecond string concat with no I/O.
+
+So in theory the prefix is already byte-stable across turns. **Nobody has measured it.** This batch closes that gap with logs + a unit test, no cache implementation. If stable across a real session → close v1.13.8 as no-op, drop the original cache plan, move to v1.13.9. If drift surfaces → next batch designs the fix against the actual failure mode.
+
+## Scope (all three items)
+
+### 1. Per-turn prefix fingerprint log
+
+In `apps/server/src/services/system-prompt.ts`, after `buildSystemPrompt` finishes assembling `out`, before returning:
+
+- Compute `sha256(out)` → hex string. Use `node:crypto`.
+- Emit a single log line at `level=info` via a module-level pino instance (mirror the pattern used elsewhere in the inference services). Shape:
+
+```ts
+{
+  msg: 'prefix-fingerprint',
+  project_id: project.id,
+  agent_id: agent?.id ?? null,
+  agent_name: agent?.name ?? null,
+  session_id: session.id,
+  prefix_hash: <sha256 hex>,
+  prefix_length: out.length,
+  mtime_boochat: <number | null>,           // from cachedGuidance.mtime, or null when guidance is null
+  has_agent_system_prompt: <boolean>,
+  has_session_override: session.system_prompt.trim().length > 0,
+  has_project_override: project.default_system_prompt.trim().length > 0,
+}
+```
+
+The mtime fields surface which inputs changed when drift is observed. The hash itself is what proves equality.
+
+`buildSystemPrompt` already reaches into `cachedGuidance` indirectly via `getContainerGuidance()` — expose `cachedGuidance?.mtime` for the log via a thin getter (`getCachedGuidanceMtime(): number | null`) so the log line carries it without re-statting.
+
+For the AGENTS.md mtimes (global + per-project), `services/agents.ts` exposes them via the `cache` Map but no public accessor. Either (a) add a `getAgentsMtimes(projectPath: string): { global: number | null; project: number | null }` exported function to agents.ts, or (b) skip those fields in v1.13.8 and only log the BOOCHAT mtime. **Default: do (a).** If recon shows that's invasive, fall back to (b) and note the limitation in the smoke report.
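+
+The hash-and-record step could be sketched as below. `fingerprintPrefix`, `PrefixInputs`, and `PrefixFingerprint` are hypothetical names, and the id fields (`project_id`, `agent_id`, `agent_name`, `session_id`) are left out for brevity. Only `createHash` from `node:crypto` is a real API here.
+
+```ts
+import { createHash } from 'node:crypto';
+
+// Hypothetical shape: the mtime and flag fields carried by the log line.
+interface PrefixInputs {
+  mtime_boochat: number | null;
+  has_agent_system_prompt: boolean;
+  has_session_override: boolean;
+  has_project_override: boolean;
+}
+
+interface PrefixFingerprint extends PrefixInputs {
+  msg: 'prefix-fingerprint';
+  prefix_hash: string;
+  prefix_length: number;
+}
+
+/** Hash the assembled prefix; the prefix text itself never enters the record. */
+function fingerprintPrefix(out: string, inputs: PrefixInputs): PrefixFingerprint {
+  // Hash the UTF-8 encoding so the digest reflects the bytes that would go on the wire.
+  const prefix_hash = createHash('sha256').update(out, 'utf8').digest('hex');
+  return { msg: 'prefix-fingerprint', prefix_hash, prefix_length: out.length, ...inputs };
+}
+```
+
+Note that `out.length` counts UTF-16 code units, not bytes. If a byte count is wanted, `Buffer.byteLength(out, 'utf8')` gives it. The hash covers the encoded bytes in both cases.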
+
+### 2. Per-session drift observer
+
+Module-level `Map<sessionId, lastHash>` in `system-prompt.ts`. On each `buildSystemPrompt` call:
+
+- If `sessionId` is not in the map → set it, emit no extra log.
+- If `sessionId` IS in the map and the hash matches → emit no extra log.
+- If `sessionId` IS in the map and the hash DIFFERS → emit a second `level=warn` log:
+
+```ts
+{
+  msg: 'prefix-drift',
+  session_id: session.id,
+  prev_hash: <previous>,
+  new_hash: <current>,
+  prev_length: <number>,
+  new_length: <number>,
+  changed_inputs: <array of field names where mtime/flags changed since last call>,
+}
+```
+
+`changed_inputs` is a small array like `['mtime_boochat']` or `['has_session_override']` — the field-level diff so we can see exactly what input drifted.
+
+The map grows unboundedly across long-lived processes. Acceptable for v1.13.8 (instrumentation only, 5 min sessions in test). Add a TODO comment: "v1.13.x follow-up if it survives: LRU-bound this map at 1000 sessions." Don't implement the LRU now.
+
+Add a `_resetPrefixObserverForTests()` export mirroring the existing `_resetContainerGuidanceCacheForTests()`.
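+
+A minimal sketch of the observer, under one assumption worth flagging: the `prefix-drift` record needs `prev_length` and a field-level diff, so the map here stores the previous length and input snapshot alongside the hash rather than the hash alone. `observePrefix`, `Observation`, and `InputSnapshot` are hypothetical names.
+
+```ts
+// Hypothetical snapshot of the mtime/flag fields from the fingerprint log line.
+type InputSnapshot = Record<string, number | boolean | null>;
+
+interface Observation {
+  hash: string;
+  length: number;
+  inputs: InputSnapshot;
+}
+
+interface DriftRecord {
+  msg: 'prefix-drift';
+  session_id: string;
+  prev_hash: string;
+  new_hash: string;
+  prev_length: number;
+  new_length: number;
+  changed_inputs: string[];
+}
+
+// TODO: v1.13.x follow-up if it survives: LRU-bound this map at 1000 sessions.
+const lastBySession = new Map<string, Observation>();
+
+/** Returns a drift record when a known session sees a new hash, otherwise null. */
+function observePrefix(sessionId: string, next: Observation): DriftRecord | null {
+  const prev = lastBySession.get(sessionId);
+  lastBySession.set(sessionId, next);
+  if (!prev || prev.hash === next.hash) return null;
+  // Field-level diff over the union of keys, sorted for stable log output.
+  const keys = Array.from(new Set([...Object.keys(prev.inputs), ...Object.keys(next.inputs)]));
+  const changed_inputs = keys.filter(k => prev.inputs[k] !== next.inputs[k]).sort();
+  return {
+    msg: 'prefix-drift',
+    session_id: sessionId,
+    prev_hash: prev.hash,
+    new_hash: next.hash,
+    prev_length: prev.length,
+    new_length: next.length,
+    changed_inputs,
+  };
+}
+
+/** Test-only reset, mirroring _resetContainerGuidanceCacheForTests(). */
+function _resetPrefixObserverForTests(): void {
+  lastBySession.clear();
+}
+```
+
+An empty `changed_inputs` on a drift record is the case the smoke cares most about: the hash moved but no tracked input did.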
+
+### 3. Unit test for byte-stability
+
+In `apps/server/src/services/__tests__/system-prompt.test.ts`, add a `describe('buildSystemPrompt stability', () => { ... })` block:
+
+```ts
+it('returns byte-identical output across two consecutive calls with the same inputs', async () => {
+  // set BOOCHAT.md, build (project, session, agent), capture hash
+  const first = await buildSystemPrompt(project, session, agent);
+  const second = await buildSystemPrompt(project, session, agent);
+  expect(first).toBe(second);
+});
+
+it('emits a single prefix-fingerprint log per call', async () => {
+  // capture logs via pino test transport or stub
+  // assert one prefix-fingerprint per buildSystemPrompt call
+});
+
+it('emits a prefix-drift log when the same session sees a different hash', async () => {
+  // build once; mutate BOOCHAT.md or pass a different agent; build again with same sessionId
+  // assert one prefix-drift log with prev_hash and new_hash populated
+});
+```
+
+The first test is the load-bearing one — it locks in the byte-stability invariant going forward, regardless of what the production smoke surfaces.
+
+## What NOT to do in this dispatch
+
+- **Don't add a cache.** Output memoization is v1.13.9+ work IF the smoke proves it's needed. Implementing a cache before measurement is what the v1.13.6 audit was designed to catch — premature optimization disguised as correctness.
+- **Don't change `buildSystemPrompt`'s return signature or async behavior.** The output stays a single string. Signature stays `(project, session, agent) => Promise<string>`.
+- **Don't thread chat_id or anything else into the call.** `session.id` is sufficient as the observer key.
+- **Don't log the full prefix text.** Hash + length only. The prefix can be many KB; logging it 5× per session blows up log size for no benefit. If drift appears and the hash diff is mysterious, `LOG_LEVEL=debug` can be wired in a follow-up.
+- **Don't touch `messages_with_parts` or the CASE-WHEN-EXISTS fallback v1.13.4 added.** This batch is in `system-prompt.ts` only.
+- **Don't modify the AI SDK v6 silent-abort guard.** It lives in `stream-phase.ts` and stays untouched.
+
+## Recon (already done — paste these for the implementer's reference)
+
+```
+cd /opt/boocode
+wc -l apps/server/src/services/system-prompt.ts
+# → 83 lines
+
+grep -nE "^export|^function|^async function|cache|mtime" apps/server/src/services/system-prompt.ts
+# → cachedGuidance at line 25; loadContainerGuidance / getContainerGuidance / _resetContainerGuidanceCacheForTests / buildSystemPrompt are the public surface
+
+grep -rn "buildSystemPrompt" apps/server/src --include="*.ts" | grep -v "tests"
+# → single caller: apps/server/src/services/inference/payload.ts:41
+# → also referenced in routes/sessions.ts (session-create flow may call it for preview; verify during implementation)
+
+grep -n "safeStat\|cache\|mtime" apps/server/src/services/agents.ts
+# → mtime-keyed cache (Map) at line 245, TTL 60_000ms, key = projectPath || '__none__'
+# → safeStat pattern at line 255
+```
+
+## Verification protocol (smoke)
+
+After deploy:
+
+1. Fresh BooChat session, default agent (no agent selected).
+2. Send 5 short messages, wait for each turn to complete.
+3. `docker compose logs --since=10m boocode | grep -E 'prefix-fingerprint|prefix-drift'`
+
+**Success criteria:**
+- 5 `prefix-fingerprint` lines (one per turn — assuming each turn calls `buildSystemPrompt` once via `buildMessagesPayload`).
+- All 5 lines have identical `prefix_hash` and `prefix_length`.
+- Zero `prefix-drift` lines.
+
+**Failure modes to characterize:**
+- Drift WITH a corresponding mtime change in `changed_inputs` → expected if BOOCHAT.md or AGENTS.md was edited mid-session. Note in smoke report; not a bug.
+- Drift WITHOUT any mtime/flag change in `changed_inputs` → assembly nondeterminism somewhere. **This is the bug case.** Report the exact `prev_hash`/`new_hash` pair and full `prefix-fingerprint` log lines from before and after the drift.
+- Multiple `prefix-fingerprint` lines per turn → `buildSystemPrompt` is being called more than once per turn (possibly from compaction or sentinel-summary paths). Note in smoke report; not necessarily a bug but worth understanding.
+- ANY successful turn that emits zero `prefix-fingerprint` lines → log statement isn't reached. Implementation bug.
+
+Repeat the smoke in a second session (with a different agent if available) to confirm that the prefix differs across sessions only where expected (different `project.id`, different `agent_id`).
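+
+The success criteria can also be checked mechanically. The sketch below assumes the server logs one JSON object per line and that `docker compose logs` may prepend a `service | ` prefix. `summarizeSmoke` and `SmokeSummary` are hypothetical names, not part of the batch.
+
+```ts
+interface SmokeSummary {
+  fingerprints: number;
+  distinctHashes: number;
+  drifts: number;
+  pass: boolean;
+}
+
+/** Summarize prefix-fingerprint / prefix-drift lines for one session. */
+function summarizeSmoke(raw: string, sessionId: string, expectedTurns = 5): SmokeSummary {
+  const seen: string[] = [];
+  let drifts = 0;
+  for (const line of raw.split('\n')) {
+    // Skip any "service | " prefix by starting at the first brace.
+    const start = line.indexOf('{');
+    if (start < 0) continue;
+    let rec: Record<string, unknown>;
+    try {
+      rec = JSON.parse(line.slice(start));
+    } catch {
+      continue; // not a JSON log line
+    }
+    if (rec.session_id !== sessionId) continue;
+    if (rec.msg === 'prefix-fingerprint') seen.push(`${rec.prefix_hash}:${rec.prefix_length}`);
+    else if (rec.msg === 'prefix-drift') drifts++;
+  }
+  const distinctHashes = new Set(seen).size;
+  const pass = seen.length === expectedTurns && distinctHashes === 1 && drifts === 0;
+  return { fingerprints: seen.length, distinctHashes, drifts, pass };
+}
+```
+
+A `fingerprints` count above `expectedTurns` with `distinctHashes === 1` corresponds to the "multiple lines per turn" case above: worth noting in the smoke report, but not by itself a failure of byte-stability.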
+
+## Files expected to touch
+
+- `apps/server/src/services/system-prompt.ts` — add hash + log + observer + getter (~50 LoC)
+- `apps/server/src/services/agents.ts` — add `getAgentsMtimes()` accessor (~15 LoC if going with default option)
+- `apps/server/src/services/__tests__/system-prompt.test.ts` — 3 new tests (~30 LoC)
+- `apps/server/package.json` — none expected (pino + node:crypto already available)
+
+Total ~95 LoC.
+
+## Workflow conventions (boocode)
+
+- Backup before destructive: `cp file file.bak-$(date +%Y%m%d-%H%M%S)`. (Files get gitignored via global `*.bak*`.)
+- Build: `docker compose up --build -d boocode`. No `--no-cache` unless layer-cache trap surfaces.
+- Tests: `pnpm -C apps/server test`. Smoke after deploy.
+- Type-check: `npx tsc -p apps/web/tsconfig.app.json --noEmit` is authoritative for web; `pnpm -C apps/server build` is authoritative for server.
+- Sam reviews diffs. Never `git add`/`commit`/`push`/`pull` on Sam's behalf.
+- Tag after commit: `git tag v1.13.8` (lightweight), then push via the Gitea deploy key:
+  `GIT_SSH_COMMAND="ssh -i /opt/boocode/secrets/boocode_gitea -o IdentitiesOnly=yes" git push origin v1.13.8`
+
+## Repo layout pointers
+
+- `apps/server/src/services/system-prompt.ts` — primary target (83 lines)
+- `apps/server/src/services/agents.ts` — for the mtimes accessor
+- `apps/server/src/services/inference/payload.ts:41` — call site
+- `apps/server/src/services/__tests__/system-prompt.test.ts` — extend tests here
+- `apps/server/vitest.config.ts` — test glob is `src/**/__tests__/**/*.test.ts`
+
+## Open questions for Sam during recon
+
+1. **`getAgentsMtimes()` accessor in agents.ts vs BOOCHAT-only log.** Default: add the accessor. If implementation surface is bigger than expected (e.g. the agents.ts cache structure makes it awkward), fall back to BOOCHAT-only and note the gap.
+2. **What counts as a "turn" for the observer's `Map<sessionId, lastHash>`?** Default: every `buildSystemPrompt` call. If recon shows that compaction / sentinel-summary paths also call `buildSystemPrompt` and would generate noise, gate the observer to inference-turn calls only. Cleanest signal vs. cleanest implementation.
+3. **Log severity for `prefix-drift`.** Default: `warn`. If Sam expects routine BOOCHAT.md edits to fire it, downgrade to `info`. The smoke will surface this — adjust during smoke if needed.
+
+## Don't repeat past mistakes
+
+- AI SDK v6 silent-abort guard in `stream-phase.ts`: untouched.
+- v1.13.4 view fix (COALESCE → CASE-WHEN-EXISTS): untouched. This batch is in `system-prompt.ts` only.
+- v1.13.5 truncate.ts: untouched.
+- v1.13.6 reasoning embed in compaction: untouched.
+- v1.13.7 stability bundle (`includeUsage:true`, trim guards, payload filter, budget bump): all live. Don't undo.
+
+## Source files to read in project knowledge
+
+- `boocode_roadmap.md` (last updated 2026-05-22; v1.13.x cleanup line order locked)
+- `boocode_code_review.md` (no lift source for v1.13.8 — in-house instrumentation)
+- `CLAUDE.md` (project conventions, NodeNext imports, vitest include glob, etc.)
+- This handoff (`handoff_v1.13.8_prefix_verify.md`)