v1.13.15-openspec: reformat batch docs to OpenSpec directory structure

Adopt Fission-AI/OpenSpec's openspec/changes/<change-name>/{proposal,
specs,design,tasks}.md shape for BooCode's own batch docs. Zero-dep
documentation reformat; replaces ad-hoc boocode_batchN.md /
handoff_vN.N.N.md convention.

Existing batch docs moved into openspec/changes/archived/ via git mv
(preserves history):
- boocode_batch10.md
- handoff_v1.13.8_prefix_verify.md
- handoff_v1.13.10_per_tool_cost.md

Pre-v1.13.15 docs were NOT split into proposal/tasks/design files. The
work was already shipped; the originals are preserved as archived
snapshots. New v1.13.15+ batches land directly in
openspec/changes/<slug>/proposal.md (+ tasks.md, + design.md when
applicable) per the convention documented in openspec/README.md.

CLAUDE.md gained a one-line pointer to the convention (workflow
section). File grew from 153 → 154 lines, 27,682 → 27,925 chars; both
remain well under the AgentLint hard caps.

specs/ directory is reserved for future OpenSpec CLI adoption (v1.14+).
No CLI dep added in this batch — directory structure only. If/when the
full OpenSpec lifecycle is adopted, that lands as a separate batch.
2026-05-22 14:54:17 +00:00
parent fc11e8dc91
commit 5a3f357ce9
5 changed files with 39 additions and 0 deletions

openspec/README.md Normal file

@@ -0,0 +1,38 @@
# openspec
Per-batch documentation convention adopted v1.13.15-openspec.
Lift source: Fission-AI/OpenSpec directory layout. **No CLI dependency** — just
the folder shape. Full OpenSpec lifecycle adoption is a future v1.14+ batch.
## Layout
```
openspec/
  changes/
    <slug>/                   # one folder per shipped or planned batch
      proposal.md             # Why + scope summary
      tasks.md                # implementation step list
      design.md               # architecture / data-model decisions (optional)
      specs/                  # reserved for future OpenSpec CLI adoption
    archived/                 # snapshots of pre-v1.13.15 batch docs
      <original-filename>.md
  specs/                      # global specs, future v1.14+ use
```
## Conventions
- Slugs are lowercase and hyphenated, derived from the batch title
(e.g. `v1-13-10-per-tool-cost`, `file-attachments-v3-5`).
- Already-shipped pre-v1.13.15 batches live in `changes/archived/` as
single-file snapshots. They were not split into proposal/tasks because
the work was already complete; archiving preserves git history.
- New v1.13.15+ batches should land directly in
`changes/<slug>/proposal.md` (+ tasks.md, + design.md when applicable).
- `proposal.md` carries the "Why" and scope. `tasks.md` is the action list
(numbered or checkbox). `design.md` is for non-trivial architectural
decisions worth recording separately.
- A canonical dispatch brief (matching the v1.13.9 / v1.13.10 format)
is most naturally split as proposal.md (Where we are, Why this matters,
rationale sections) + tasks.md (Scope items, Build + smoke) + design.md
(Attribution model, Filtering, Canonical mapping).
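
The slug rule in the first bullet can be sketched as a small helper. This is an illustration only, not something in the repo: `slugFromTitle` is a hypothetical name, and the input titles are assumed examples picked so that they reproduce the two slugs quoted above.

```typescript
// Hypothetical helper: derive an openspec/changes/<slug>/ folder name from a batch title.
function slugFromTitle(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // dots, spaces, and punctuation collapse to one hyphen
    .replace(/^-+|-+$/g, "");    // no leading or trailing hyphen
}

// Assumed example titles; only the resulting slugs come from the convention above.
console.log(slugFromTitle("v1.13.10 per-tool cost")); // v1-13-10-per-tool-cost
console.log(slugFromTitle("File attachments v3.5"));  // file-attachments-v3-5
```

Collapsing every non-alphanumeric run to a single hyphen is one reasonable reading of "lowercase and hyphenated"; a batch title with unusual punctuation may still want a hand-picked slug.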

View File

@@ -0,0 +1,269 @@
# BooCode v1.1 — Batch 10
**Theme:** BooTerm. Second container, dedicated to in-browser terminals. Per-session tmux. xterm.js + node-pty in-container. New pane type wires into the BooCode shell.
**Status:** Planned. Largest batch in v1.1. Depends on Batch 3 (pane system), Batch 7 (settings drawer pattern reused).
**Repo:** `/opt/boocode/` (shared monorepo). New `apps/booterm/` subdirectory.
## Goals
1. New container `booterm` running Fastify + node-pty + tmux. Per-session tmux session keyed by `(user, session_id)`.
2. xterm.js terminal pane in the BooCode shell. Multiple terminal panes per session, each attached to a separate tmux window.
3. PTY traffic over WebSocket. Auth via `Remote-User`.
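4. tmux as session manager so terminals survive WebSocket reconnects and page refreshes. Container restarts are a caveat: the tmux server runs inside the booterm container, so a restart ends its sessions unless state is persisted some other way.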
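5. Read+write capability rooted at the project root. v1.1 does not enforce this boundary: there is no chroot, so nothing technically prevents a `cd ..` escape (see Auth + scoping).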
6. Path-based routing: `code.indifferentketchup.com/api/term/*` → booterm; `/ws/term/*` → booterm.
## Architecture
```
browser ──HTTPS──> Caddy (droplet) ──Tailscale──> Authelia
├── /api/chat/*, /ws/chat/* → boocode :9500
├── /api/term/*, /ws/term/* → booterm :9501
└── / → boocode (SPA)
booterm container:
- Fastify (Node 20)
- node-pty
- tmux installed in container (apk add tmux)
- same Postgres (boocode_db)
- mounts projects rw (scoped)
```
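
The path split in the diagram can be pinned down as a pure function. In the real deployment this matching is Caddy's job; the sketch below only makes the precedence explicit. `upstreamFor` and `TERM_PREFIXES` are hypothetical names, and the assumption is that anything not under a terminal prefix goes to boocode on :9500.

```typescript
type Upstream = { service: "boocode" | "booterm"; port: number };

// Prefixes taken from the routing diagram above. The trailing slash matters:
// it keeps a path like /api/terminology from being routed to booterm.
const TERM_PREFIXES = ["/api/term/", "/ws/term/"];

function upstreamFor(pathname: string): Upstream {
  const isTerm = TERM_PREFIXES.some((prefix) => pathname.startsWith(prefix));
  return isTerm
    ? { service: "booterm", port: 9501 }
    : { service: "boocode", port: 9500 }; // chat API, chat WS, and the SPA
}

console.log(upstreamFor("/ws/term/sessions/abc/panes/p1")); // booterm, 9501
console.log(upstreamFor("/api/chat/messages"));             // boocode, 9500
```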
### Mount strategy
Decided: Option A. Per-project bind mounts in `docker-compose.yml`. Already applied: booterm has `/opt:/opt:rw` to keep parity with the existing boocode mount and avoid enumerating roots. Project root for any given session derives from `projects.root_path` and tmux launches with `cwd` set there.
### tmux session naming
Per-session tmux:
```
tmux session name: bc-<session_id> (UUID, sanitized — alphanumeric + hyphen)
tmux windows: term-<pane_id> (one window per terminal pane)
```
booterm spawns `tmux new-session -d -s bc-<sid> -c <project_root>` lazily on first attach. Subsequent attaches do `tmux new-window -t bc-<sid>` for additional panes, or `tmux attach -t bc-<sid>` and select window.
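
A minimal sketch of the naming and argv construction, assuming ids are sanitized by stripping anything outside alphanumerics and hyphens. All function names are hypothetical. The `-n` and `-c` flags on `new-window` are real tmux flags, but using them here is an assumption about how each window gets its `term-<pane_id>` name and working directory.

```typescript
// Keep only characters that are safe in a tmux target name: alphanumerics and hyphens.
function sanitizeId(id: string): string {
  const clean = id.replace(/[^A-Za-z0-9-]/g, "");
  if (clean.length === 0) throw new Error("id is empty after sanitizing");
  return clean;
}

function tmuxSessionName(sessionId: string): string {
  return `bc-${sanitizeId(sessionId)}`;
}

function tmuxWindowName(paneId: string): string {
  return `term-${sanitizeId(paneId)}`;
}

// argv for the lazy first attach: a detached session rooted at the project directory.
function newSessionArgs(sessionId: string, projectRoot: string): string[] {
  return ["new-session", "-d", "-s", tmuxSessionName(sessionId), "-c", projectRoot];
}

// argv for each additional pane: a named window inside the existing session.
function newWindowArgs(sessionId: string, paneId: string, projectRoot: string): string[] {
  return [
    "new-window", "-t", tmuxSessionName(sessionId),
    "-n", tmuxWindowName(paneId), "-c", projectRoot,
  ];
}

console.log(newSessionArgs("3f2a9c", "/opt/boocode").join(" "));
// new-session -d -s bc-3f2a9c -c /opt/boocode
```

Building an argv array and handing it to a process spawner, rather than interpolating into a shell string, keeps a hostile id or path from being interpreted by a shell.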
## Data model
| Column | On | Type | Default | Notes |
|---|---|---|---|---|
| (none) | — | — | — | terminals are tmux-managed, no DB rows |
| `kind = 'terminal'` | `session_panes.kind` CHECK | — | — | Extend CHECK to include `'terminal'` |
| `state.tmux_window` | `session_panes.state` JSONB | TEXT | NULL | Which tmux window this pane attaches to |
Schema (already applied to live DB + schema.sql):
```sql
ALTER TABLE session_panes DROP CONSTRAINT IF EXISTS session_panes_kind_check;
ALTER TABLE session_panes ADD CONSTRAINT session_panes_kind_check
CHECK (kind IN ('chat', 'file_browser', 'terminal'));
```
## Backend (booterm)
New app at `apps/booterm/`:
```
apps/booterm/
├── src/
│ ├── index.ts # Fastify + WS + auth
│ ├── auth.ts # Remote-User middleware (same pattern as boocode)
│ ├── db.ts # pg pool (shared boocode_db)
│ ├── routes/
│ │ ├── health.ts
│ │ └── terminals.ts # POST /api/term/sessions/:sid/panes/:pid/start (creates tmux window)
│ ├── pty/
│ │ ├── manager.ts # tmux process management
│ │ └── pty.ts # node-pty wrapper for `tmux attach -t ... -d`
│ └── ws/
│ └── attach.ts # WS /ws/term/sessions/:sid/panes/:pid → PTY bidi pipe
├── package.json
└── tsconfig.json
```
### Endpoints
| Method | Path | Notes |
|---|---|---|
| GET | `/api/term/health` | Ping |
| POST | `/api/term/sessions/:sid/panes/:pid/start` | Idempotent tmux window create. Returns `{tmux_window: "term-<pid>"}` |
| WS | `/ws/term/sessions/:sid/panes/:pid` | Attach PTY |
| POST | `/api/term/sessions/:sid/panes/:pid/resize` | `{cols, rows}` |
| POST | `/api/term/sessions/:sid/panes/:pid/kill` | Kill the tmux window |
WS frames (binary or text):
```
client → server: pty input (raw bytes, typed by user)
server → client: pty output (raw bytes from shell)
server → client: {type: "exit", code} on window close
```
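
One way a client could tell the exit control frame apart from PTY output is sketched below. It assumes binary frames are always PTY output and only text frames may carry control JSON, which the frame list above does not state. `parseServerFrame` and `ServerFrame` are hypothetical names.

```typescript
type ServerFrame =
  | { kind: "output"; bytes: Uint8Array }
  | { kind: "exit"; code: number };

const encoder = new TextEncoder();

// Assumption: binary frames are always PTY output; only text frames may be control JSON.
function parseServerFrame(data: string | ArrayBuffer): ServerFrame {
  if (typeof data !== "string") {
    return { kind: "output", bytes: new Uint8Array(data) };
  }
  try {
    const msg: unknown = JSON.parse(data);
    if (
      typeof msg === "object" && msg !== null &&
      (msg as { type?: unknown }).type === "exit" &&
      typeof (msg as { code?: unknown }).code === "number"
    ) {
      return { kind: "exit", code: (msg as { code: number }).code };
    }
  } catch {
    // Not JSON: fall through and treat the text as terminal output.
  }
  return { kind: "output", bytes: encoder.encode(data) };
}
```

The weak spot is a shell that prints exactly `{"type":"exit","code":0}` as a text frame: it would be misread as a control message. Sending PTY output only as binary frames would remove that ambiguity.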
### Auth + scoping
- `Remote-User` required on WS upgrade.
- `session_id` validated: lookup in `sessions` table; require row exists.
- `pane_id` validated: must exist in `session_panes` with `kind = 'terminal'` and matching `session_id`.
- Project root derived from `sessions.project_id → projects.root_path`; tmux starts with that directory as its `cwd`. **No chroot.** The user can `cd /` and read anything mounted into the container.
- Future hardening: namespace/chroot. Out of v1.1 scope.
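
The checks above reduce to a small decision function once the rows have been fetched. This is a sketch under assumptions: the row shapes carry only the columns named in the list, the HTTP status codes are illustrative, and `checkAttach` is a hypothetical name.

```typescript
interface SessionRow { id: string; project_id: string }
interface PaneRow { id: string; session_id: string; kind: string }

type AttachCheck =
  | { ok: true }
  | { ok: false; status: 401 | 404; reason: string };

// Pure decision function; the caller is assumed to have already looked the rows up.
function checkAttach(
  remoteUser: string | undefined,
  session: SessionRow | undefined,
  pane: PaneRow | undefined,
): AttachCheck {
  if (!remoteUser) {
    return { ok: false, status: 401, reason: "Remote-User header missing" };
  }
  if (!session) {
    return { ok: false, status: 404, reason: "unknown session" };
  }
  if (!pane || pane.session_id !== session.id || pane.kind !== "terminal") {
    return { ok: false, status: 404, reason: "no terminal pane with that id in this session" };
  }
  return { ok: true };
}
```

Keeping the decision separate from the database lookups means the WS upgrade handler and the REST routes can share it.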
### tmux config
`apps/booterm/tmux.conf` bundled into image at `/etc/booterm/tmux.conf`; tmux invocations use `-f /etc/booterm/tmux.conf`:
```
set -g default-terminal "screen-256color"
set -g history-limit 50000
set -g mouse on
setw -g mode-keys vi
set -g status off
set -g destroy-unattached off
```
This follows the Boolab pattern (from `services/tmux_session.py`).
## Frontend
| File | Change |
|---|---|
| `apps/web/src/components/panes/TerminalPane.tsx` (NEW) | xterm.js mount, WS attach, resize handler |
| `apps/web/src/api/client.ts` | `api.terminals.start(sessionId, paneId)`, `api.terminals.resize(...)`, `api.terminals.kill(...)` |
| `apps/web/src/components/Workspace.tsx` | Add 'terminal' to the pane kind enum; spawn button → POST start → render TerminalPane. Tab UI lives in Workspace.tsx — there is no PaneTab.tsx file. |
| `apps/web/package.json` | `xterm` + `xterm-addon-fit` + `xterm-addon-web-links` |
### TerminalPane
```tsx
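import { Terminal } from 'xterm';
import { FitAddon } from 'xterm-addon-fit';
import { WebLinksAddon } from 'xterm-addon-web-links';

// Inside the TerminalPane component body:
useEffect(() => {
  const container = containerRef.current;
  if (!container) return; // ref not mounted yet

  const term = new Terminal({ fontFamily: 'JetBrains Mono', fontSize: 14, theme: ... });
  const fit = new FitAddon();
  term.loadAddon(fit);
  term.loadAddon(new WebLinksAddon());
  term.open(container);
  fit.fit();

  // Match the page scheme so the socket works behind HTTPS (wss) and in dev (ws).
  const proto = window.location.protocol === 'https:' ? 'wss:' : 'ws:';
  const ws = new WebSocket(`${proto}//${window.location.host}/ws/term/sessions/${sid}/panes/${pid}`);
  ws.binaryType = 'arraybuffer';

  // PTY output: text frames are written as-is, binary frames as bytes.
  ws.onmessage = e => term.write(typeof e.data === 'string' ? e.data : new Uint8Array(e.data));
  // PTY input: drop keystrokes until the socket is open, since send() throws while CONNECTING.
  term.onData(data => { if (ws.readyState === WebSocket.OPEN) ws.send(data); });
  term.onResize(({ cols, rows }) => api.terminals.resize(sid, pid, cols, rows));

  // Refit whenever the pane's container changes size.
  const ro = new ResizeObserver(() => fit.fit());
  ro.observe(container);

  return () => { ro.disconnect(); ws.close(); term.dispose(); };
}, [sid, pid]);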
```
Dev: vite.config.ts needs `/api/term` and `/ws/term` proxy entries mirroring the existing `/api` and `/ws` ones.
## Send-to-terminal from chat
Boolab pattern: select text in a message → "Send to terminal" button → text becomes terminal input.
- Right-click context menu on selected text in chat → "Send to terminal" submenu lists open terminal panes.
- Click target → sends `<text>\n` to that pane's WS.
Implementation:
| File | Change |
|---|---|
| `apps/web/src/components/MessageBubble.tsx` | Selection handler + context menu |
| `apps/web/src/lib/events.ts` | New event `send_to_terminal` with payload `{pane_id, text}` |
| `apps/web/src/components/panes/TerminalPane.tsx` | Subscribe to event for its `pane_id`, write to WS |
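
The event flow in the table can be sketched with a tiny typed emitter. The event name and payload shape come from the table; everything else (`onSendToTerminal`, `emitSendToTerminal`, `subscribePane`, the module-level listener set) is a hypothetical stand-in for whatever `events.ts` already provides.

```typescript
interface SendToTerminal { pane_id: string; text: string }
type Listener = (payload: SendToTerminal) => void;

const listeners = new Set<Listener>();

// Subscribe; the returned function unsubscribes (suited to a useEffect cleanup).
function onSendToTerminal(fn: Listener): () => void {
  listeners.add(fn);
  return () => { listeners.delete(fn); };
}

// What MessageBubble would call once the user picks a target pane.
function emitSendToTerminal(payload: SendToTerminal): void {
  listeners.forEach((fn) => fn(payload));
}

// What a TerminalPane would do: ignore other panes, append the newline, write to its socket.
function subscribePane(paneId: string, send: (data: string) => void): () => void {
  return onSendToTerminal(({ pane_id, text }) => {
    if (pane_id === paneId) send(`${text}\n`);
  });
}
```

Appending the newline on the receiving side keeps the event payload equal to the selected text, so other subscribers could reuse it unchanged.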
## Docker compose (already applied)
booterm service is already in `docker-compose.yml` with:
- build context `.`, dockerfile `apps/booterm/Dockerfile`
- port `100.114.205.53:9501:3000`
- `/opt:/opt:rw` mount
- `DATABASE_URL` env pointing at `boocode_db`
- `boocode_net` network
- depends_on: `boocode_db`
Do not re-edit compose.
## Backend dependencies
`apps/booterm/package.json`:
- `fastify`
- `@fastify/websocket`
- `pg`
- `zod`
- `node-pty`
- `tslib`
`node-pty` requires a native build. The Dockerfile installs `python3 make g++` in the build stage and `tmux` in the runtime stage:
```dockerfile
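# Build stage: toolchain needed to compile node-pty's native addon.
FROM node:20-alpine AS build
RUN apk add --no-cache python3 make g++
# pnpm is not on PATH in the stock node image; corepack (bundled with Node 20) provides it.
RUN corepack enable
WORKDIR /app
COPY ...
RUN pnpm install --frozen-lockfile && pnpm build

# Runtime stage: only tmux plus the built output.
FROM node:20-alpine
RUN apk add --no-cache tmux
WORKDIR /app
COPY --from=build /app/apps/booterm/dist ./dist
COPY --from=build /app/node_modules ./node_modules
EXPOSE 3000
CMD ["node", "dist/index.js"]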
```
## Files to touch
**New app:**
- `apps/booterm/` (entire subtree)
**Existing changes:**
- `apps/web/package.json`
- `apps/web/src/api/client.ts`
- `apps/web/src/api/types.ts`
- `apps/web/src/components/Workspace.tsx`
- `apps/web/src/components/MessageBubble.tsx`
- `apps/web/src/components/panes/TerminalPane.tsx` (NEW)
- `apps/web/src/lib/events.ts`
- `apps/web/vite.config.ts` (proxy entries)
**Already done by user — do not touch:**
- `docker-compose.yml` (booterm service added)
- `apps/server/src/schema.sql` (terminal CHECK constraint)
- Live DB constraint applied
## Verification
1. `docker compose up -d --build booterm` → container healthy.
2. `curl -s http://100.114.205.53:9501/api/term/health -H 'Remote-User: sam'` → 200.
3. Browser smoke test:
- Open a session. Workspace → "+ Terminal" → terminal pane appears with shell prompt in project root.
- Type `ls -la` → output.
- Type `vim test.txt`, write something, save, `:q` → file exists on host (since rw mount).
- Refresh browser → terminal reconnects, history intact (tmux persistence).
- Open second terminal pane → same project, separate tmux window. Both work independently.
- Select code in chat → right-click → "Send to terminal" → terminal pane receives the text.
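- Container restart (`docker compose restart booterm`) → reconnect and confirm the outcome. The tmux server runs inside the container, so unless session state is persisted elsewhere, expect a fresh shell in the project root rather than a resumed session.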
- Close pane via tab context menu → tmux window killed. Reopen pane → fresh shell.
## Constraints
- node-pty is a native dep. Image size grows.
- tmux history capped at 50k lines per window.
- WebSocket frames are bidirectional binary; `binaryType = 'arraybuffer'`.
- Resize debounced 100ms client-side; backend `tmux resize-window` per resize.
- No chroot/namespace isolation in v1.1. User has full read+write under `/opt/`. Acceptable for single-user homelab.
- Don't expose 9501 on 0.0.0.0. Tailscale binding only (already configured in compose).
## Open
- Color theme matching for xterm.js. Defer.
- File-drop into terminal (upload via terminal pane). Out of scope.
- Multi-user (each user gets own tmux server) — defer until BooCode goes multi-user, which isn't planned.
- BooCoder container — same skeleton as booterm but with edit_file / create_file tools instead of PTY. Will follow this pattern when built.

View File

@@ -0,0 +1,441 @@
#careful #boocode #nofluff
v1.13.10 — per-tool token cost accounting (rolling 100-call window)
Goal: surface per-tool prompt/completion-token rolling averages in AgentPicker for at-a-glance agent-cost hints. Implementation is a SQL view on top of `messages_with_parts` (no new table, no new write site) + a read endpoint + AgentPicker tooltip extension. Estimated ~240 LoC, mostly UI.
## Where we are
- Last tag: v1.13.9 (compaction overflow trigger — `floor(0.85 × ctx_max)` early-trigger). Branch clean.
- v1.13.x cleanup line ✅ through v1.13.9. Queued: v1.13.10 (this) → v1.13.11 (WS Zod) → v1.13.12 (skills audit) → v1.13.2 (column drop, last).
- Dependency (satisfied since v1.13.7 commit `ff29b48`): `includeUsage: true` on `createOpenAICompatible` in `apps/server/src/services/inference/provider.ts`. Without it, `messages.tokens_used`/`ctx_used` were NULL for v1.13.1-A → v1.13.7 (latent regression). Now populated.
## Why this matters
Today: AgentPicker lists agents by name + description. No cost signal. Users pick the architect agent (full tool whitelist, 21k of tool schema) for one-liner questions a refactorer (3 tools, 4k schema) could answer.
Tomorrow: each agent listing shows its mean prompt + completion cost per tool, derived from the last 100 invocations across all chats. Decision aid, not a hard gate.
Why a SQL view instead of a denormalized stats table:
- All the source data already lands in `messages` (tool_calls JSON + tokens_used + ctx_used) and `message_parts` (read via the `messages_with_parts` view). Zero new write sites.
- Rolling 100-call window is a `ROW_NUMBER() OVER (PARTITION BY tool_name ORDER BY created_at DESC) <= 100` — natural fit for a view.
- View is rollback-safe. If the math is wrong, `DROP VIEW` and re-deploy; no orphan rows, no backfill.
- At BooCode scale (single user, ~30 tools, ~100 calls/tool), aggregate-on-read is microseconds. Premature to denormalize.
The roadmap schema row (`tool_cost_stats (tool_name, prompt_tokens_sum, completion_tokens_sum, n_calls, updated_at)`) matches both a table and a view. View is the lighter implementation.
## Canonical column mapping (pinned)
The `messages` columns are named non-obviously. Pinned mapping, confirmed across 5 write sites + 1 read site:
| Column | Semantic meaning | AI SDK v6 source name |
|-----------------|--------------------|-----------------------|
| `ctx_used` | prompt / input tokens | `usage.inputTokens` |
| `tokens_used` | completion / output tokens | `usage.outputTokens` |
Write sites confirmed: `tool-phase.ts:94-95`, `error-handler.ts:109-110`, `sentinel-summaries.ts:130-131`, `sentinel-summaries.ts:387-388`, `stream-phase.ts:319-320`. Canonical read at `payload.ts:190-191` reverses: `const promptTokens = updated.ctx_used; const completionTokens = updated.tokens_used`.
`tokens_used` reads like "total" but is completion only. Project convention since the columns predate v1.13.x. Do not "fix" the naming inside this batch — out of scope; downstream consumers depend on the current mapping.
## Attribution model
A single assistant turn can emit N tool calls in parallel. llama-swap returns ONE (prompt_tokens, completion_tokens) per turn, not per tool. Attribution requires a split.
**Chosen approach: equal split.** For an assistant turn that emits N tool calls with prompt P and completion C, each tool is attributed P/N prompt + C/N completion. The 100-call rolling mean smooths split noise. Implementation: `tokens_used::float / jsonb_array_length(tool_calls)` at the unnest site.
**Alternatives rejected:**
- "Full turn cost to every tool" (no division). Over-states; a 5-tool turn would 5×-count every tool's cost.
- "Result-size only" (`length(JSON.stringify(output)) / 4`). Loses the LLM's actual usage signal; doesn't capture how expensive a tool's output is to the next prompt.
- "Consuming-turn delta" (next turn's prompt_tokens minus this turn's prompt_tokens, attributed to the tool that emitted the result). Most accurate but requires bubble-back math through the `executeToolPhase → runAssistantTurn` recursion. Over-engineered for the rolling-average use case.
**If Sam wants a different split, change one line in the view definition (the divisor).**
## Filtering — sentinel, failure, repair-call semantics
The view excludes rows that aren't real tool-cost signal:
- **Failed and cancelled turns** (`status != 'complete'`). The `error-handler.ts` failed/cancelled paths don't write `tokens_used`/`ctx_used`, so the existing `tokens_used IS NOT NULL` clause already filters these. Adding `status='complete'` is defense in depth and makes intent explicit.
- **Cap-hit and doom-loop sentinel rows** (`metadata->>'kind' IN ('cap_hit', 'doom_loop')`). Sentinels are `role='system'` rows with `tool_calls=NULL`, so the existing `tool_calls IS NOT NULL` clause already filters them. The explicit metadata filter is defense in depth — it survives future schema drift where someone might INSERT a sentinel with a non-null tool_calls.
- **`experimental_repairToolCall` retries.** No special handling needed. Our impl (per `CLAUDE.md`) is pass-through — malformed calls flow to zod-reject → tool_result error → next normal turn handles. No separate rows; the next turn's tokens count naturally.
## Recon (already done; paste for reference)
```
cd /opt/boocode
grep -n "tokens_used\|ctx_used\|inputTokens\|outputTokens" apps/server/src/services/inference/*.ts | head -30
grep -n "metadata\|cap_hit\|doom_loop" apps/server/src/services/inference/sentinels.ts apps/server/src/schema.sql | head -10
psql -h localhost -p 5432 -U postgres -d boocode -c "\d messages_with_parts" | head -30
```
Expected: confirms the canonical mapping in the table above; confirms `messages.metadata jsonb` exists at `schema.sql:259`; confirms `messages_with_parts` exposes `m.metadata` at `schema.sql:92`.
## Scope
### 1. schema.sql — `tool_cost_stats` view (~35 LoC)
Append after the `messages_with_parts` view (after line 120):
```sql
-- v1.13.10: per-tool token cost rolling window. Derives from
-- messages_with_parts (the v1.13.1-B view that COALESCEs message_parts over
-- the legacy JSON column) so this works whether the chat predates v1.13.0
-- or postdates v1.13.2 (column drop). No new write site — all source data
-- already lands via the existing tool-phase.ts:94-95 UPDATE.
--
-- Attribution model: equal split. A turn emitting N tool calls divides its
-- prompt/completion tokens by N before attribution. See v1.13.10 dispatch
-- brief for rationale + rejected alternatives.
--
-- Column mapping: messages.ctx_used = prompt (input), messages.tokens_used
-- = completion (output). Non-obvious naming; pinned via canonical writes at
-- tool-phase.ts:94-95 et al.
--
-- Filtering rationale:
-- status='complete' — exclude failed/cancelled (defense in
-- depth; failed-path doesn't write
-- tokens_used so they're also filtered
-- indirectly).
-- metadata->>'kind' exclusions — exclude cap_hit / doom_loop sentinels
-- (defense in depth; sentinels are
-- role='system' with tool_calls=NULL
-- so they're filtered indirectly too).
-- experimental_repairToolCall — no special handling; retries flow
-- as normal next-turn tool_result
-- errors and count naturally.
--
-- Rolling window: last 100 calls per tool_name, ordered by created_at DESC.
-- Aggregate-on-read is microseconds at BooCode scale (single user, ~30
-- tools, < 100 calls each). DROP VIEW + recreate to change window size.
CREATE OR REPLACE VIEW tool_cost_stats AS
WITH per_call AS (
SELECT
(tc->>'name')::text AS tool_name,
(m.ctx_used::float / NULLIF(jsonb_array_length(m.tool_calls), 0)) AS prompt_tokens,
(m.tokens_used::float / NULLIF(jsonb_array_length(m.tool_calls), 0)) AS completion_tokens,
m.created_at,
ROW_NUMBER() OVER (
PARTITION BY (tc->>'name')::text
ORDER BY m.created_at DESC
) AS rn
FROM messages_with_parts m,
LATERAL jsonb_array_elements(m.tool_calls) AS tc
WHERE m.tool_calls IS NOT NULL
AND jsonb_array_length(m.tool_calls) > 0
AND m.tokens_used IS NOT NULL
AND m.ctx_used IS NOT NULL
AND m.status = 'complete'
AND (m.metadata IS NULL
OR m.metadata->>'kind' IS NULL
OR m.metadata->>'kind' NOT IN ('cap_hit', 'doom_loop'))
)
SELECT
tool_name,
ROUND(SUM(prompt_tokens))::int AS prompt_tokens_sum,
ROUND(SUM(completion_tokens))::int AS completion_tokens_sum,
COUNT(*)::int AS n_calls,
MAX(created_at) AS updated_at
FROM per_call
WHERE rn <= 100
GROUP BY tool_name;
```
Notes:
- `NULLIF(..., 0)` guards against div-by-zero on `jsonb_array_length=0` (should never happen given the WHERE clause, but defensive).
- `ROUND(SUM(...))::int` — frontend doesn't want decimals; sum-then-round is more accurate than per-row round-then-sum.
- View is read from `messages_with_parts` not `messages`, so legacy pre-v1.13.0 rows and post-v1.13.2 rows both resolve.
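- No index needed; the underlying `idx_messages_chat` covers the JOIN. The window function numbers every qualifying row before `rn <= 100` filters, so the view scans all tool-call turns rather than only the newest 100 per tool, which is still cheap at BooCode scale.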
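
For checking the view's arithmetic by hand, the same logic can be mirrored in memory: equal split, the status and sentinel filters, and the 100-call window. This is a sketch, not code from the repo. The `Turn` shape is a simplification (tool names pre-extracted, `kind` standing in for `metadata->>'kind'`), `toolCostStats` is a hypothetical name, and rounding at exact halves may differ from Postgres `ROUND`.

```typescript
interface Turn {
  created_at: string;          // ISO timestamp
  tool_names: string[];        // one entry per tool call in the turn
  ctx_used: number | null;     // prompt tokens
  tokens_used: number | null;  // completion tokens
  status: string;
  kind: string | null;         // stands in for metadata->>'kind'
}

interface ToolCost {
  tool_name: string;
  prompt_tokens_sum: number;
  completion_tokens_sum: number;
  n_calls: number;
}

function toolCostStats(turns: Turn[], windowSize = 100): ToolCost[] {
  const perTool = new Map<string, { at: string; prompt: number; completion: number }[]>();
  for (const t of turns) {
    const excluded = t.status !== "complete" || t.kind === "cap_hit" || t.kind === "doom_loop";
    if (excluded || t.ctx_used === null || t.tokens_used === null || t.tool_names.length === 0) continue;
    const n = t.tool_names.length; // equal split: each call gets 1/n of the turn
    for (const name of t.tool_names) {
      const calls = perTool.get(name) ?? [];
      calls.push({ at: t.created_at, prompt: t.ctx_used / n, completion: t.tokens_used / n });
      perTool.set(name, calls);
    }
  }
  const out: ToolCost[] = [];
  perTool.forEach((calls, tool_name) => {
    // Newest first, then keep the rolling window (the view's rn <= 100).
    const recent = calls
      .sort((a, b) => (a.at < b.at ? 1 : a.at > b.at ? -1 : 0))
      .slice(0, windowSize);
    out.push({
      tool_name,
      prompt_tokens_sum: Math.round(recent.reduce((s, c) => s + c.prompt, 0)),
      completion_tokens_sum: Math.round(recent.reduce((s, c) => s + c.completion, 0)),
      n_calls: recent.length,
    });
  });
  return out.sort((a, b) => (a.tool_name < b.tool_name ? -1 : a.tool_name > b.tool_name ? 1 : 0));
}
```

A three-tool turn with `ctx_used=15000` and `tokens_used=300` yields 5000 prompt and 100 completion per tool.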
### 2. apps/server/src/routes/tools.ts (NEW, ~40 LoC)
New route file. Register in `apps/server/src/index.ts` next to the other `register*Routes(app, sql, ...)` calls.
```ts
import type { FastifyInstance } from 'fastify';
import type { Sql } from '../db.js';
export interface ToolCostStat {
tool_name: string;
mean_prompt_tokens: number;
mean_completion_tokens: number;
n_calls: number;
updated_at: string;
}
export function registerToolsRoutes(app: FastifyInstance, sql: Sql) {
app.get('/api/tools/cost_stats', async () => {
const rows = await sql<{
tool_name: string;
prompt_tokens_sum: number;
completion_tokens_sum: number;
n_calls: number;
updated_at: string;
}[]>`
SELECT tool_name, prompt_tokens_sum, completion_tokens_sum, n_calls, updated_at
FROM tool_cost_stats
ORDER BY tool_name ASC
`;
const stats: ToolCostStat[] = rows.map(r => ({
tool_name: r.tool_name,
mean_prompt_tokens: Math.round(r.prompt_tokens_sum / r.n_calls),
mean_completion_tokens: Math.round(r.completion_tokens_sum / r.n_calls),
n_calls: r.n_calls,
updated_at: r.updated_at,
}));
return { stats };
});
}
```
Route is bodyless, idempotent, cheap. No pagination (≤30 tools).
### 3. apps/server/src/services/__tests__/tool_cost_stats.test.ts (NEW, ~95 LoC)
Integration test against real Postgres (matches `inference.test.ts` pattern). Fixtures:
```ts
import { describe, it, expect, beforeEach } from 'vitest';
import { connect } from '../../db.js';
describe('tool_cost_stats view (v1.13.10)', () => {
// ... session + chat + project setup helpers ...
it('returns empty when no tool calls exist', async () => {
// fresh chat, only user/assistant text turns
const stats = await sql`SELECT * FROM tool_cost_stats`;
expect(stats).toEqual([]);
});
it('attributes single-tool turn fully to that tool', async () => {
// insert one assistant message with tool_calls=[{name: 'view_file', ...}],
// tokens_used=300, ctx_used=15000, status='complete'
const stats = await sql`SELECT * FROM tool_cost_stats WHERE tool_name='view_file'`;
expect(stats[0]).toMatchObject({
tool_name: 'view_file',
prompt_tokens_sum: 15000,
completion_tokens_sum: 300,
n_calls: 1,
});
});
it('splits multi-tool turn equally across tools', async () => {
// insert one assistant turn with 3 tool calls (view_file, grep, list_dir),
// tokens_used=300, ctx_used=15000 → each tool gets 100 completion, 5000 prompt
const stats = await sql`SELECT * FROM tool_cost_stats ORDER BY tool_name`;
expect(stats).toHaveLength(3);
for (const s of stats) {
expect(s.completion_tokens_sum).toBe(100);
expect(s.prompt_tokens_sum).toBe(5000);
expect(s.n_calls).toBe(1);
}
});
it('limits to last 100 calls per tool (FIFO window)', async () => {
// insert 150 turns each calling view_file once with monotonically
// increasing tokens_used; expect only the most recent 100 to count
const stats = await sql`SELECT * FROM tool_cost_stats WHERE tool_name='view_file'`;
expect(stats[0]!.n_calls).toBe(100);
// mean should reflect the latter half (51..150), not 1..150
});
it('excludes turns with NULL tokens_used (pre-v1.13.7 latent regression)', async () => {
// insert a turn with tool_calls but tokens_used=NULL → must not appear
const stats = await sql`SELECT * FROM tool_cost_stats WHERE tool_name='view_file'`;
expect(stats).toEqual([]);
});
it('excludes failed and cancelled turns + sentinel metadata rows', async () => {
// insert five rows for tool_name='view_file', all with tokens_used+ctx_used
// populated:
// row A: status='failed' — excluded
// row B: status='cancelled' — excluded
// row C: status='complete', metadata={kind:'cap_hit'} — excluded
// row D: status='complete', metadata={kind:'doom_loop'} — excluded
// row E: status='complete', metadata=null — included
// Expect n_calls=1, attributable to row E only.
const stats = await sql`SELECT * FROM tool_cost_stats WHERE tool_name='view_file'`;
expect(stats[0]!.n_calls).toBe(1);
});
it('reads tool_calls via messages_with_parts (parts-authoritative)', async () => {
// insert a v1.13.0+ row with messages.tool_calls=NULL but
// message_parts rows containing the tool_call → must still aggregate
const stats = await sql`SELECT * FROM tool_cost_stats WHERE tool_name='grep'`;
expect(stats[0]!.n_calls).toBe(1);
});
});
```
Pattern: each test resets the messages table for the fixture chat (TRUNCATE not DELETE — Postgres `messages` has FK CASCADE) and inserts hand-crafted rows. The view is recomputed on every SELECT.
### 4. apps/web/src/api/types.ts + client.ts (~10 LoC)
Add to `types.ts`:
```ts
export interface ToolCostStat {
tool_name: string;
mean_prompt_tokens: number;
mean_completion_tokens: number;
n_calls: number;
updated_at: string;
}
```
Add to `client.ts` under the existing `api.*` namespace structure:
```ts
tools: {
costStats: () => fetch<{ stats: ToolCostStat[] }>('GET', '/api/tools/cost_stats'),
},
```
Match the casing convention of the existing namespaces (`api.agents.list`, `api.chats.archive`, etc.).
### 5. apps/web/src/components/AgentPicker.tsx — tooltip extension (~80 LoC delta)
Currently (line 67): `title={selectedAgent?.description}` — native HTML title attribute on the trigger button.
Replacement: dropdown items get a per-agent cost line in muted text below the description. Format:
```
[Agent name]
[Agent description]
~5.2k prompt / 280 completion · 6/8 tools · last call 3h ago
```
Implementation steps:
1. Fetch `api.tools.costStats()` once on mount (alongside the existing `api.agents.list()`). Cache result for the lifetime of the picker open state. Re-fetch only on `useEffect` dep change.
2. Compute per-agent aggregate: for each agent, sum the means of its whitelisted tools. Sum-of-means, not mean-of-sums — we're combining independent rolling averages.
3. Render below description (one line, muted, truncated). Omit the line entirely if no calls are recorded yet for any of the agent's tools.
4. Don't break the existing native `title=` for backward compat; layer the cost line additively.
```tsx
const [costStats, setCostStats] = useState<ToolCostStat[]>([]);
useEffect(() => {
api.tools.costStats().then(r => setCostStats(r.stats)).catch(() => setCostStats([]));
}, []);
const costByTool = useMemo(
() => Object.fromEntries(costStats.map(s => [s.tool_name, s])),
[costStats],
);
function agentCost(agent: Agent): { prompt: number; completion: number; nTools: number; nWithData: number; mostRecent: string | null } {
let prompt = 0, completion = 0, nWithData = 0;
let mostRecent: string | null = null;
for (const t of agent.tools) {
const s = costByTool[t];
if (!s) continue;
prompt += s.mean_prompt_tokens;
completion += s.mean_completion_tokens;
nWithData++;
if (!mostRecent || s.updated_at > mostRecent) mostRecent = s.updated_at;
}
return { prompt, completion, nTools: agent.tools.length, nWithData, mostRecent };
}
```
For the line render: `~${formatK(prompt)} prompt / ${completion} completion · ${nWithData}/${nTools} tools · ${formatAgo(mostRecent)}`. Skip entirely when `nWithData === 0` to avoid showing "0k / 0 / 0 tools" for fresh-from-deploy state.
**`formatK` / `formatAgo`:** colocate at the bottom of `AgentPicker.tsx`. Don't extract to a util file in this batch — single use site.
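
A possible shape for the two helpers, matching the sample line `~5.2k prompt / 280 completion · 6/8 tools · last call 3h ago`. The thresholds, the "just now" and day cases, and the injectable `now` parameter are assumptions; only the function names and the sample output come from this brief.

```typescript
// Compact token count: values under 1000 are shown whole, larger ones in thousands.
function formatK(n: number): string {
  if (n < 1000) return String(Math.round(n));
  return `${(n / 1000).toFixed(1).replace(/\.0$/, "")}k`;
}

// Relative age of the most recent call. `now` is injectable so the helper stays testable.
function formatAgo(iso: string | null, now: Date = new Date()): string {
  if (!iso) return "no calls yet";
  const elapsedMs = now.getTime() - new Date(iso).getTime();
  const minutes = Math.max(0, Math.floor(elapsedMs / 60000));
  if (minutes < 1) return "last call just now";
  if (minutes < 60) return `last call ${minutes}m ago`;
  const hours = Math.floor(minutes / 60);
  if (hours < 24) return `last call ${hours}h ago`;
  return `last call ${Math.floor(hours / 24)}d ago`;
}

console.log(formatK(5200)); // 5.2k
console.log(formatK(280));  // 280
```

Since the cost line is skipped when `nWithData === 0`, the `null` branch of `formatAgo` is only a safety net.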
## What NOT to do
- **Don't add a new write site at `tool-phase.ts` or `finalizeCompletion`.** All source data is already there via existing UPDATEs.
- **Don't denormalize.** The view is sufficient and rollback-safe at BooCode's single-user scale.
- **Don't add per-tool cost to the message bubble.** Out of scope. AgentPicker tooltip only.
- **Don't fold per-call rows into a moving sum via triggers.** Aggregate on read; 100 rows × 30 tools is microseconds in Postgres.
- **Don't track `result_chars` (the size of `tool_results.output`).** Tempting as a second cost signal but out of scope here. Future batch if Sam wants it.
- **Don't add a session-scoped or chat-scoped filter to `tool_cost_stats`.** The rolling window is GLOBAL across all chats — the agent picker is a project-level decision aid. Per-chat surfacing is a future v1.14+ design.
- **Don't change the attribution model post-deployment** without dropping the view first. Mid-flight semantic changes give bogus historical means.
- **Don't "fix" the `ctx_used`/`tokens_used` naming inside this batch.** Non-obvious but pinned across 5 write sites. Renaming is its own batch.
- **Don't rely solely on `tool_calls IS NOT NULL` for sentinel exclusion.** It works today (sentinels are role='system' with tool_calls=NULL) but the explicit `status='complete'` + `metadata->>'kind'` filters are defense in depth and survive future schema drift.
## Backup before edits
```
cd /opt/boocode
cp apps/server/src/schema.sql{,.bak-$(date +%Y%m%d-%H%M%S)}
cp apps/web/src/components/AgentPicker.tsx{,.bak-$(date +%Y%m%d-%H%M%S)}
```
(No backup needed for new files in items 2, 3, 4.)
## Verify
```
pnpm -C apps/server test
```
Expected: all existing tests pass + 7 new in `tool_cost_stats.test.ts`. Total moves from 195 → 202.
```
cd /opt/boocode
docker compose exec boocode_db psql -U postgres -d boocode -c \
"SELECT * FROM tool_cost_stats ORDER BY n_calls DESC LIMIT 10;"
```
Expected: in any live deployment with v1.13.7+ history, this returns real rows for `view_file`, `grep`, `list_dir`, etc. If empty: `messages.tokens_used`/`ctx_used` were NULL during the v1.13.1-A → v1.13.7 latent regression window, so rows only start to appear with v1.13.7+ traffic.
## Build + smoke
```
cd /opt/boocode
docker compose up --build -d boocode
docker compose logs --since=30s boocode | tail -20
```
Smoke A — view recompiles on schema apply:
```
docker compose logs boocode | grep -i "tool_cost_stats\|applySchema"
```
Expected: clean schema apply, view registered idempotently.
Smoke B — endpoint returns data:
```
curl -s http://localhost:3000/api/tools/cost_stats | jq '.stats | length, .stats[0]'
```
Expected: nonzero length if any v1.13.7+ tool calls exist; one stat object with all 5 fields populated.
Smoke C — UI:
1. Open browser to `boocode.indifferentketchup.com`.
2. Open AgentPicker dropdown on any session.
3. Each agent row shows a muted cost line below its description: `~5.2k prompt / 280 completion · 6/8 tools · last call 2h ago`.
4. Agents with no tool history show just description (no cost line).
5. Confirm cost line truncates with the existing text-muted-foreground / truncate pattern; doesn't break the layout at mobile widths (open Vivaldi devtools, set iPhone-13 viewport).
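A minimal sketch of how the cost line in step 3 could be assembled, for reference while eyeballing the UI. The `AgentCostSummary` shape, its field names, and the reading of `6/8 tools` as "tools with history / tools available" are assumptions made for illustration. They are not the actual helpers in `AgentPicker.tsx`.

```typescript
// Hypothetical summary shape; the real interface lives in apps/web/src/api/types.ts.
interface AgentCostSummary {
  promptTokens: number;
  completionTokens: number;
  toolsWithHistory: number; // assumed meaning of the "6" in "6/8 tools"
  toolsTotal: number;       // assumed meaning of the "8" in "6/8 tools"
  lastCallAtMs: number;     // epoch milliseconds of the most recent call
}

// 280 -> "280", 5200 -> "5.2k", 3000 -> "3k".
function formatTokens(n: number): string {
  if (n < 1000) return String(Math.round(n));
  return `${(n / 1000).toFixed(1).replace(/\.0$/, "")}k`;
}

// Coarse relative time: minutes, then hours, then days.
function formatAgo(elapsedMs: number): string {
  const minutes = Math.floor(elapsedMs / 60_000);
  if (minutes < 1) return "just now";
  if (minutes < 60) return `${minutes}m ago`;
  const hours = Math.floor(minutes / 60);
  if (hours < 24) return `${hours}h ago`;
  return `${Math.floor(hours / 24)}d ago`;
}

function formatCostLine(s: AgentCostSummary, nowMs: number): string {
  return [
    `~${formatTokens(s.promptTokens)} prompt / ${formatTokens(s.completionTokens)} completion`,
    `${s.toolsWithHistory}/${s.toolsTotal} tools`,
    `last call ${formatAgo(nowMs - s.lastCallAtMs)}`,
  ].join(" · ");
}
```

Passing `nowMs` in rather than calling `Date.now()` inside keeps the helper deterministic, which makes the rendered string easy to assert in a component test.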
## Files expected to touch
- `apps/server/src/schema.sql` — ~35 LoC delta (view definition + filter comments)
- `apps/server/src/routes/tools.ts` — NEW, ~40 LoC
- `apps/server/src/index.ts` — 1 line (`registerToolsRoutes(app, sql)`)
- `apps/server/src/services/__tests__/tool_cost_stats.test.ts` — NEW, ~95 LoC
- `apps/web/src/api/types.ts` — ~7 LoC (interface)
- `apps/web/src/api/client.ts` — ~3 LoC (namespace + method)
- `apps/web/src/components/AgentPicker.tsx` — ~80 LoC delta (cost line + fetch hook + helpers)
Total ~260 LoC. Matches roadmap estimate.
## Workflow conventions
- Backups before destructive edits (above) on the two MODIFIED files. New files don't need backups.
- Sam reviews diffs. Never `git add` / `git commit` / `git push` / `git pull` on Sam's behalf.
- Build: `docker compose up --build -d boocode`. No `--no-cache` unless layer-cache trap surfaces.
- Tests authoritative: `pnpm -C apps/server test`.
- View definition lives in `schema.sql` (idempotent via `CREATE OR REPLACE VIEW`); no migration shim needed.
## Don't repeat past mistakes
- v1.13.7 stability bundle (`includeUsage:true`, trim guards, payload filter, `BUDGET_NO_AGENT=30`): all live. This batch depends on `includeUsage:true`. If unset, `tool_cost_stats` returns empty rows.
- v1.13.8 prefix instrumentation: untouched.
- v1.13.9 ratio-only `usable()`: untouched.
- v1.13.4 two-tier prune: untouched.
- v1.13.5 truncate.ts opaque-id pattern: untouched.
- v1.13.1-B `messages_with_parts` view: this view is the source. Don't reach past it to raw `messages`.
- v1.13.2 will DROP `messages.tool_calls`/`tool_results` columns. The `tool_cost_stats` view reads from `messages_with_parts` not `messages`, so it survives. Verify after v1.13.2 ships.
## Source files to read in project knowledge
- `boocode_roadmap.md` (v1.13.10 row at line 114; schema row at line 474)
- `boocode_code_review.md` (cost-tracking design background)
- `CLAUDE.md` (project conventions; messages_with_parts invariant at L80; v1.13.7 includeUsage invariant)
```


@@ -0,0 +1,225 @@
# Handoff: BooCode v1.13.8 — system-prompt prefix stability verify-and-measure
#careful #boocode #nofluff
Recon-only / instrumentation batch. **No cache implementation in this dispatch.** Goal: prove (or disprove) that the assembled system-prompt prefix is byte-stable across turns under steady-state inputs. Result determines whether v1.13.7-as-originally-specced (the prefix cache) is actually needed at all.
## Where we are
- Last tag: `v1.13.7` — stability bundle (`includeUsage:true` + trim guards + payload filter for trailing empty/failed assistants + `BUDGET_NO_AGENT 15→30`). This shipped as a renumber of the original "prefix cache" v1.13.7 slot. The prefix-cache work moved to v1.13.8 with the change-of-shape captured here.
- Branch clean. `git log --oneline main -5` should show `…v1.13.7 v1.13.6 v1.13.5 v1.13.4 v1.13.3`.
## What v1.13.x has shipped
- v1.13.0 — `message_parts` table + dual-write.
- v1.13.1-A — AI SDK v6 install (`streamText` adapter, mid-dispatch silent-abort patch).
- v1.13.1-B — `messages_with_parts` view + read sites flipped.
- v1.13.1-C — `ask_user_input` correlation ported + reasoning end-to-end.
- v1.13.3 — bundle: statement_timeout=30s, alpha tool ordering, periodic stuck-row sweeper, `experimental_repairToolCall`.
- v1.13.4 — two-tier compaction prune.
- v1.13.5 — opencode `truncate.ts` port (`tr_<12char>` opaque ids on tmpfs).
- v1.13.6 — compaction head-assembly audit; reasoning_parts added to `buildHeadPayload`.
- v1.13.7 — stability bundle (the five fixes above).
## What's queued
- **v1.13.8 (this dispatch)** — prefix stability verify-and-measure
- v1.13.9 — compaction overflow trigger formula (opencode 0.85 × ctx_max)
- v1.13.10 — per-tool token cost accounting + AgentPicker UI
- v1.13.11 — WebSocket frame typing (Zod schemas both ends)
- v1.13.12 — skills audit pass (rules→recipes split)
- v1.13.2 — drop legacy columns (last; ≥1 week production traffic on v1.13.1 first)
## Why this is verify-first
The original v1.13.7 roadmap line was "system-prompt prefix cache, keyed by `(agent_id, project_id, skills_version)`, mtime-invalidated." Recon during planning surfaced that:
- `apps/server/src/services/system-prompt.ts:buildSystemPrompt()` already runs over mtime-cached inputs:
  - BOOCHAT.md / BOOCODER.md: cached in this file (`cachedGuidance`, line 25), keyed by mtime
  - global + per-project AGENTS.md: cached in `services/agents.ts` (`safeStat` pattern, line 245), keyed by mtime
  - `session.system_prompt` / `project.default_system_prompt`: DB scalars, byte-stable until edited
  - BASE_SYSTEM_PROMPT: hardcoded template with `${projectPath}` interpolation
- Skills are NOT in the system prompt today. Discovered via `skill_find` at runtime.
- Tool schemas are NOT in the system message. They live in the OpenAI request body's `tools` field (already alpha-sorted by v1.13.3).
- Output assembly is a microsecond string concat with no I/O.
So in theory the prefix is already byte-stable across turns. **Nobody has measured it.** This batch closes that gap with logs + a unit test, no cache implementation. If stable across a real session → close v1.13.8 as no-op, drop the original cache plan, move to v1.13.9. If drift surfaces → next batch designs the fix against the actual failure mode.
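For readers unfamiliar with the pattern, here is a generic sketch of an mtime-keyed file cache and why it yields byte-stable output between edits. This is not the code in `system-prompt.ts` or `agents.ts`. `FileIo`, `nodeIo`, and `createMtimeCache` are hypothetical names, and the injectable `io` parameter exists only so the sketch can be exercised without touching disk.

```typescript
import { readFileSync, statSync } from "node:fs";

// Hypothetical seam over the filesystem so the cache can be tested with fakes.
interface FileIo {
  mtimeMs(path: string): number | null; // null when the file is missing
  read(path: string): string;
}

const nodeIo: FileIo = {
  mtimeMs: (path) => {
    try {
      return statSync(path).mtimeMs;
    } catch {
      return null;
    }
  },
  read: (path) => readFileSync(path, "utf8"),
};

// Returns a loader that re-reads a file only when its mtime changes.
function createMtimeCache(io: FileIo = nodeIo): (path: string) => string | null {
  const cache = new Map<string, { mtimeMs: number; text: string }>();
  return (path) => {
    const mtimeMs = io.mtimeMs(path);
    if (mtimeMs === null) {
      cache.delete(path);
      return null;
    }
    const hit = cache.get(path);
    // Same mtime: serve the cached bytes, so repeated calls are identical.
    if (hit !== undefined && hit.mtimeMs === mtimeMs) return hit.text;
    const text = io.read(path);
    cache.set(path, { mtimeMs, text });
    return text;
  };
}
```

As long as no input file's mtime moves, every call returns the same string, which is the property the fingerprint log in this batch is meant to confirm in production.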
## Scope (all three items)
### 1. Per-turn prefix fingerprint log
In `apps/server/src/services/system-prompt.ts`, after `buildSystemPrompt` finishes assembling `out`, before returning:
- Compute `sha256(out)` → hex string. Use `node:crypto`.
- Emit a single log line at `level=info` via a module-level pino instance (mirror the pattern used elsewhere in the inference services). Shape:
```ts
{
  msg: 'prefix-fingerprint',
  project_id: project.id,
  agent_id: agent?.id ?? null,
  agent_name: agent?.name ?? null,
  session_id: session.id,
  prefix_hash: <sha256 hex>,
  prefix_length: out.length,
  mtime_boochat: <number | null>, // from cachedGuidance.mtime, or null when guidance is null
  has_agent_system_prompt: <boolean>,
  has_session_override: session.system_prompt.trim().length > 0,
  has_project_override: project.default_system_prompt.trim().length > 0,
}
```
The mtime fields surface which inputs changed when drift is observed. The hash itself is what proves equality.
`buildSystemPrompt` already reaches into `cachedGuidance` indirectly via `getContainerGuidance()` — expose `cachedGuidance?.mtime` for the log via a thin getter (`getCachedGuidanceMtime(): number | null`) so the log line carries it without re-statting.
For the AGENTS.md mtimes (global + per-project), `services/agents.ts` holds them in its `cache` Map but exports no public accessor. Either (a) export a `getAgentsMtimes(projectPath: string): { global: number | null; project: number | null }` function from agents.ts, or (b) skip those fields in v1.13.8 and only log the BOOCHAT mtime. **Default: do (a).** If recon shows that's invasive, fall back to (b) and note the limitation in the smoke report.
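A minimal sketch of the hashing step, assuming the assembled prefix is available as a plain string. `fingerprintPrefix` and `PrefixFingerprint` are hypothetical names. Only `createHash` from `node:crypto` is a real API here.

```typescript
import { createHash } from "node:crypto";

// The two fields of the log line that prove (or disprove) prefix equality.
interface PrefixFingerprint {
  prefix_hash: string;   // sha256 over the UTF-8 bytes of the prefix, hex-encoded
  prefix_length: number; // out.length, i.e. UTF-16 code units, not bytes
}

function fingerprintPrefix(out: string): PrefixFingerprint {
  const prefix_hash = createHash("sha256").update(out, "utf8").digest("hex");
  return { prefix_hash, prefix_length: out.length };
}
```

Note that `prefix_length` follows the spec above and reports `out.length`, which counts UTF-16 code units. The hash is what covers the actual bytes, so two prefixes of equal length but different content still produce different fingerprints.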
### 2. Per-session drift observer
Module-level `Map<sessionId, lastHash>` in `system-prompt.ts`. On each `buildSystemPrompt` call:
- If `sessionId` is not in the map → set it, emit no extra log.
- If `sessionId` IS in the map and the hash matches → emit no extra log.
- If `sessionId` IS in the map and the hash DIFFERS → emit a second `level=warn` log:
```ts
{
  msg: 'prefix-drift',
  session_id: session.id,
  prev_hash: <previous>,
  new_hash: <current>,
  prev_length: <number>,
  new_length: <number>,
  changed_inputs: <array of field names where mtime/flags changed since last call>,
}
```
`changed_inputs` is a small array like `['mtime_boochat']` or `['has_session_override']` — the field-level diff so we can see exactly what input drifted.
The map grows unboundedly across long-lived processes. Acceptable for v1.13.8 (instrumentation only, 5 min sessions in test). Add a TODO comment: "v1.13.x follow-up if it survives: LRU-bound this map at 1000 sessions." Don't implement the LRU now.
Add a `_resetPrefixObserverForTests()` export mirroring the existing `_resetContainerGuidanceCacheForTests()`.
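The observer logic above can be sketched as follows, under the assumption that each call hands over the hash, the length, and the same mtime/flag fields that go into the fingerprint log. `PrefixSnapshot`, `observePrefix`, and `lastSnapshotBySession` are hypothetical names. The sketch stores the whole snapshot rather than just the hash so that `changed_inputs` and `prev_length` can be derived.

```typescript
// Hypothetical record of one buildSystemPrompt call.
interface PrefixSnapshot {
  prefix_hash: string;
  prefix_length: number;
  // mtime/flag fields from the fingerprint log, e.g. mtime_boochat.
  inputs: Record<string, number | boolean | null>;
}

interface PrefixDriftLog {
  msg: "prefix-drift";
  session_id: string;
  prev_hash: string;
  new_hash: string;
  prev_length: number;
  new_length: number;
  changed_inputs: string[];
}

// TODO: v1.13.x follow-up if it survives: LRU-bound this map at 1000 sessions.
const lastSnapshotBySession = new Map<string, PrefixSnapshot>();

// Returns the drift payload to log at warn level, or null when there is nothing to report.
function observePrefix(sessionId: string, current: PrefixSnapshot): PrefixDriftLog | null {
  const prev = lastSnapshotBySession.get(sessionId);
  lastSnapshotBySession.set(sessionId, current);
  // First sighting of the session, or an unchanged hash: no extra log.
  if (prev === undefined || prev.prefix_hash === current.prefix_hash) return null;

  // Field-level diff over the union of input names, sorted for stable output.
  const fieldNames = Object.keys({ ...prev.inputs, ...current.inputs }).sort();
  const changed_inputs = fieldNames.filter((f) => prev.inputs[f] !== current.inputs[f]);
  return {
    msg: "prefix-drift",
    session_id: sessionId,
    prev_hash: prev.prefix_hash,
    new_hash: current.prefix_hash,
    prev_length: prev.prefix_length,
    new_length: current.prefix_length,
    changed_inputs,
  };
}

function _resetPrefixObserverForTests(): void {
  lastSnapshotBySession.clear();
}
```

An empty `changed_inputs` on a non-null result is the interesting case: the hash moved while every tracked input stayed the same.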
### 3. Unit test for byte-stability
In `apps/server/src/services/__tests__/system-prompt.test.ts`, add a `describe('buildSystemPrompt stability', () => { ... })` block:
```ts
it('returns byte-identical output across two consecutive calls with the same inputs', async () => {
  // set BOOCHAT.md, build (project, session, agent), capture hash
  const first = await buildSystemPrompt(project, session, agent);
  const second = await buildSystemPrompt(project, session, agent);
  expect(first).toBe(second);
});

it('emits a single prefix-fingerprint log per call', async () => {
  // capture logs via pino test transport or stub
  // assert one prefix-fingerprint per buildSystemPrompt call
});

it('emits a prefix-drift log when the same session sees a different hash', async () => {
  // build once; mutate BOOCHAT.md or pass a different agent; build again with same sessionId
  // assert one prefix-drift log with prev_hash and new_hash populated
});
```
The first test is the load-bearing one — it locks in the byte-stability invariant going forward, regardless of what the production smoke surfaces.
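For the two log-capture tests, one option is a small recording stub instead of a pino transport. The sketch below assumes the module's logger can be swapped for the stub in tests (for example through a mock). `MinimalLogger`, `createRecordingLogger`, and `countByMsg` are hypothetical names, and the stub only mirrors the object-first call style (`logger.info({ msg: ... })`) used in the log shapes above.

```typescript
type LogLevel = "info" | "warn";

interface LogRecord {
  level: LogLevel;
  fields: Record<string, unknown>;
}

// The subset of the logger surface the instrumentation needs.
interface MinimalLogger {
  info(fields: Record<string, unknown>): void;
  warn(fields: Record<string, unknown>): void;
}

// A logger that records every call so tests can assert on what was emitted.
function createRecordingLogger(): MinimalLogger & { records: LogRecord[] } {
  const records: LogRecord[] = [];
  return {
    records,
    info: (fields) => {
      records.push({ level: "info", fields });
    },
    warn: (fields) => {
      records.push({ level: "warn", fields });
    },
  };
}

// Counts recorded lines whose `msg` field equals the given message.
function countByMsg(records: LogRecord[], msg: string): number {
  return records.filter((r) => r.fields["msg"] === msg).length;
}
```

In the second test this would reduce to asserting `countByMsg(logger.records, 'prefix-fingerprint')` equals the number of `buildSystemPrompt` calls, and in the third to asserting exactly one `prefix-drift` record at `warn` level.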
## What NOT to do in this dispatch
- **Don't add a cache.** Output memoization is v1.13.9+ work IF the smoke proves it's needed. Implementing a cache before measurement is what the v1.13.6 audit was designed to catch — premature optimization disguised as correctness.
- **Don't change `buildSystemPrompt`'s return signature or async behavior.** The output stays a single string. Signature stays `(project, session, agent) => Promise<string>`.
- **Don't thread chat_id or anything else into the call.** `session.id` is sufficient as the observer key.
- **Don't log the full prefix text.** Hash + length only. The prefix can be many KB; logging it 5× per session blows up log size for no benefit. If drift appears and the hash diff is mysterious, `LOG_LEVEL=debug` can be wired in a follow-up.
- **Don't touch `messages_with_parts` or the CASE-WHEN-EXISTS fallback v1.13.4 added.** This batch is in `system-prompt.ts` only.
- **Don't modify the AI SDK v6 silent-abort guard.** It lives in `stream-phase.ts` and stays untouched.
## Recon (already done — paste these for the implementer's reference)
```
cd /opt/boocode
wc -l apps/server/src/services/system-prompt.ts
# → 83 lines
grep -nE "^export|^function|^async function|cache|mtime" apps/server/src/services/system-prompt.ts
# → cachedGuidance at line 25; loadContainerGuidance / getContainerGuidance / _resetContainerGuidanceCacheForTests / buildSystemPrompt are the public surface
grep -rn "buildSystemPrompt" apps/server/src --include="*.ts" | grep -v "tests"
# → single caller: apps/server/src/services/inference/payload.ts:41
# → also referenced in routes/sessions.ts (session-create flow may call it for preview; verify during implementation)
grep -n "safeStat\|cache\|mtime" apps/server/src/services/agents.ts
# → mtime-keyed cache (Map) at line 245, TTL 60_000ms, key = projectPath || '__none__'
# → safeStat pattern at line 255
```
## Verification protocol (smoke)
After deploy:
1. Fresh BooChat session, default agent (no agent selected).
2. Send 5 short messages, wait for each turn to complete.
3. `docker compose logs --since=10m boocode | grep -E 'prefix-fingerprint|prefix-drift'`
**Success criteria:**
- 5 `prefix-fingerprint` lines (one per turn — assuming each turn calls `buildSystemPrompt` once via `buildMessagesPayload`).
- All 5 lines have identical `prefix_hash` and `prefix_length`.
- Zero `prefix-drift` lines.
**Failure modes to characterize:**
- Drift WITH a corresponding mtime change in `changed_inputs` → expected if BOOCHAT.md or AGENTS.md was edited mid-session. Note in smoke report; not a bug.
- Drift WITHOUT any mtime/flag change in `changed_inputs` → assembly nondeterminism somewhere. **This is the bug case.** Report the exact `prev_hash`/`new_hash` pair and full `prefix-fingerprint` log lines from before and after the drift.
- Multiple `prefix-fingerprint` lines per turn → `buildSystemPrompt` is being called more than once per turn (possibly from compaction or sentinel-summary paths). Note in smoke report; not necessarily a bug but worth understanding.
- ANY successful turn that emits zero `prefix-fingerprint` lines → log statement isn't reached. Implementation bug.
Repeat the smoke in a second session (different agent if available) to also confirm cross-session prefix differs only where expected (different `project.id`, different `agent_id`).
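The success criteria can also be checked mechanically. The sketch below assumes the logger emits one JSON object per line and that `docker compose logs` may prepend a service prefix before the JSON. Both are assumptions about the deployment, and `summarizePrefixSmoke` is a hypothetical helper, not part of the codebase.

```typescript
interface SmokeSummary {
  fingerprints: number;
  drifts: number;
  distinctPrefixes: string[]; // "<prefix_hash>:<prefix_length>" pairs seen
  pass: boolean;
}

function summarizePrefixSmoke(logLines: string[], expectedTurns: number): SmokeSummary {
  let fingerprints = 0;
  let drifts = 0;
  const distinctPrefixes: string[] = [];

  for (const line of logLines) {
    // Skip any "service | " style prefix by starting at the first brace.
    const jsonStart = line.indexOf("{");
    if (jsonStart === -1) continue;

    let record: Record<string, unknown>;
    try {
      record = JSON.parse(line.slice(jsonStart));
    } catch {
      continue; // not a JSON log line
    }

    if (record["msg"] === "prefix-drift") {
      drifts += 1;
    } else if (record["msg"] === "prefix-fingerprint") {
      fingerprints += 1;
      const key = `${String(record["prefix_hash"])}:${String(record["prefix_length"])}`;
      if (distinctPrefixes.indexOf(key) === -1) distinctPrefixes.push(key);
    }
  }

  // Pass: one fingerprint per turn, a single hash/length pair, and no drift.
  const pass = fingerprints === expectedTurns && drifts === 0 && distinctPrefixes.length === 1;
  return { fingerprints, drifts, distinctPrefixes, pass };
}
```

Feeding it the output of the `grep -E` command from step 3 with `expectedTurns` set to 5 gives a single pass/fail plus the counts needed for the smoke report. A `fingerprints` count above `expectedTurns` corresponds to the "multiple lines per turn" case listed above.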
## Files expected to touch
- `apps/server/src/services/system-prompt.ts` — add hash + log + observer + getter (~50 LoC)
- `apps/server/src/services/agents.ts` — add `getAgentsMtimes()` accessor (~15 LoC if going with default option)
- `apps/server/src/services/__tests__/system-prompt.test.ts` — 3 new tests (~30 LoC)
- `apps/server/package.json` — none expected (pino + node:crypto already available)
Total ~95 LoC.
## Workflow conventions (boocode)
- Back up before destructive edits: `cp file file.bak-$(date +%Y%m%d-%H%M%S)`. (Backup files are gitignored via the global `*.bak*` pattern.)
- Build: `docker compose up --build -d boocode`. No `--no-cache` unless layer-cache trap surfaces.
- Tests: `pnpm -C apps/server test`. Smoke after deploy.
- Type-check: `npx tsc -p apps/web/tsconfig.app.json --noEmit` is authoritative for web; `pnpm -C apps/server build` is authoritative for server.
- Sam reviews diffs. Never `git add`/`commit`/`push`/`pull` on Sam's behalf.
- Tag after commit: `git tag v1.13.8` (lightweight), then push via the Gitea deploy key:
`GIT_SSH_COMMAND="ssh -i /opt/boocode/secrets/boocode_gitea -o IdentitiesOnly=yes" git push origin v1.13.8`
## Repo layout pointers
- `apps/server/src/services/system-prompt.ts` — primary target (83 lines)
- `apps/server/src/services/agents.ts` — for the mtimes accessor
- `apps/server/src/services/inference/payload.ts:41` — call site
- `apps/server/src/services/__tests__/system-prompt.test.ts` — extend tests here
- `apps/server/vitest.config.ts` — test glob is `src/**/__tests__/**/*.test.ts`
## Open questions for Sam during recon
1. **`getAgentsMtimes()` accessor in agents.ts vs BOOCHAT-only log.** Default: add the accessor. If implementation surface is bigger than expected (e.g. the agents.ts cache structure makes it awkward), fall back to BOOCHAT-only and note the gap.
2. **What counts as a "turn" for the observer's `Map<sessionId, lastHash>`?** Default: every `buildSystemPrompt` call. If recon shows that compaction / sentinel-summary paths also call `buildSystemPrompt` and would generate noise, gate the observer to inference-turn calls only. Cleanest signal vs. cleanest implementation.
3. **Log severity for `prefix-drift`.** Default: `warn`. If Sam expects routine BOOCHAT.md edits to fire it, downgrade to `info`. The smoke will surface this — adjust during smoke if needed.
## Don't repeat past mistakes
- AI SDK v6 silent-abort guard in `stream-phase.ts`: untouched.
- v1.13.4 view fix (COALESCE → CASE-WHEN-EXISTS): untouched. This batch is in `system-prompt.ts` only.
- v1.13.5 truncate.ts: untouched.
- v1.13.6 reasoning embed in compaction: untouched.
- v1.13.7 stability bundle (`includeUsage:true`, trim guards, payload filter, budget bump): all live. Don't undo.
## Source files to read in project knowledge
- `boocode_roadmap.md` (last updated 2026-05-22; v1.13.x cleanup line order locked)
- `boocode_code_review.md` (no lift source for v1.13.8 — in-house instrumentation)
- `CLAUDE.md` (project conventions, NodeNext imports, vitest include glob, etc.)
- This handoff (`handoff_v1.13.8_prefix_verify.md`)