v1.13.15-openspec: reformat batch docs to OpenSpec directory structure

Adopt Fission-AI/OpenSpec's openspec/changes/<change-name>/ shape
(proposal.md, tasks.md, design.md, plus a specs/ directory) for BooCode's
own batch docs. Zero-dep documentation reformat; replaces the ad-hoc
boocode_batchN.md / handoff_vN.N.N.md convention.

Existing batch docs moved into openspec/changes/archived/ via git mv
(preserves history):
- boocode_batch10.md
- handoff_v1.13.8_prefix_verify.md
- handoff_v1.13.10_per_tool_cost.md

Pre-v1.13.15 docs were NOT split into proposal/tasks/design files. The
work was already shipped; the originals are preserved as archived
snapshots. New v1.13.15+ batches land directly in
openspec/changes/<slug>/proposal.md (+ tasks.md, + design.md when
applicable) per the convention documented in openspec/README.md.

CLAUDE.md gained a one-line pointer to the convention (workflow
section). The file grew from 153 → 154 lines and 27,682 → 27,925 chars;
both remain well under the AgentLint hard caps.

The specs/ directory is reserved for future OpenSpec CLI adoption (v1.14+).
No CLI dep is added in this batch, only the directory structure. If/when
the full OpenSpec lifecycle is adopted, that lands as a separate batch.

openspec/README.md (new file, 38 lines)

# openspec

Per-batch documentation convention, adopted in v1.13.15-openspec.

Lift source: Fission-AI/OpenSpec directory layout. **No CLI dependency**, just
the folder shape. Full OpenSpec lifecycle adoption is a future v1.14+ batch.

## Layout

```
openspec/
  changes/
    <slug>/                  # one folder per shipped or planned batch
      proposal.md            # Why + scope summary
      tasks.md               # implementation step list
      design.md              # architecture / data-model decisions (optional)
      specs/                 # reserved for future OpenSpec CLI adoption
    archived/                # snapshots of pre-v1.13.15 batch docs
      <original-filename>.md
  specs/                     # global specs, future v1.14+ use
```

## Conventions

- Slugs are lowercase and hyphenated, derived from the batch title
  (e.g. `v1-13-10-per-tool-cost`, `file-attachments-v3-5`).
- Already-shipped pre-v1.13.15 batches live in `changes/archived/` as
  single-file snapshots. They were not split into proposal/tasks because
  the work was already complete; archiving preserves git history.
- New v1.13.15+ batches should land directly in
  `changes/<slug>/proposal.md` (+ tasks.md, + design.md when applicable).
- `proposal.md` carries the "Why" and scope. `tasks.md` is the action list
  (numbered or checkbox). `design.md` is for non-trivial architectural
  decisions worth recording separately.
- A canonical dispatch brief (matching the v1.13.9 / v1.13.10 format)
  is most naturally split as proposal.md (Where we are, Why this matters,
  rationale sections) + tasks.md (Scope items, Build + smoke) + design.md
  (Attribution model, Filtering, Canonical mapping).
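
"Lowercase-hyphenated" leaves room for two people to derive different slugs from the same title. Below is a minimal sketch of one deterministic reading. It assumes the rule means: lowercase everything, turn every run of non-alphanumeric characters (dots, spaces, existing hyphens) into a single hyphen, and trim the ends. `slugify` is a hypothetical helper, not part of this batch.

```typescript
// Hypothetical helper: derive an openspec change-folder slug from a batch title.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // any run of non-alphanumerics becomes one hyphen
    .replace(/^-+|-+$/g, "");    // trim hyphens left at either end
}

console.log(slugify("v1.13.10 per-tool cost")); // v1-13-10-per-tool-cost
console.log(slugify("File attachments v3.5"));  // file-attachments-v3-5
```

Both README examples fall out of this rule. Note that version dots become hyphens, so `v3.5` and `v35` stay distinct.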

openspec/changes/archived/boocode_batch10.md (new file, 269 lines)

# BooCode v1.1 — Batch 10

**Theme:** BooTerm. Second container, dedicated to in-browser terminals. Per-session tmux. xterm.js + node-pty in-container. New pane type wires into the BooCode shell.

**Status:** Planned. Largest batch in v1.1. Depends on Batch 3 (pane system) and Batch 7 (settings drawer pattern reused).

**Repo:** `/opt/boocode/` (shared monorepo). New `apps/booterm/` subdirectory.

## Goals

1. New container `booterm` running Fastify + node-pty + tmux. Per-session tmux session keyed by `(user, session_id)`.
2. xterm.js terminal pane in the BooCode shell. Multiple terminal panes per session, each attached to a separate tmux window.
3. PTY traffic over WebSocket. Auth via `Remote-User`.
4. tmux as session manager so terminals survive WebSocket reconnects and page refreshes. Container restarts are a different case: the tmux server runs inside the container, so a restart ends its sessions.
5. Read+write capability scoped to project root. No `cd ..` escape.
6. Path-based routing: `code.indifferentketchup.com/api/term/*` → booterm; `/ws/term/*` → booterm.

## Architecture

```
browser ──HTTPS──> Caddy (droplet) ──Tailscale──> Authelia
                                                     │
                                                     ├── /api/chat/*, /ws/chat/*  → boocode :9500
                                                     ├── /api/term/*, /ws/term/*  → booterm :9501
                                                     └── /                        → boocode (SPA)

booterm container:
  - Fastify (Node 20)
  - node-pty
  - tmux installed in container (apk add tmux)
  - same Postgres (boocode_db)
  - mounts projects rw (scoped)
```

### Mount strategy

Decided: Option A. Per-project bind mounts in `docker-compose.yml`. Already applied: booterm has `/opt:/opt:rw` to keep parity with the existing boocode mount and avoid enumerating roots. Project root for any given session derives from `projects.root_path` and tmux launches with `cwd` set there.

### tmux session naming

Per-session tmux:

```
tmux session name:  bc-<session_id>   (UUID, sanitized: alphanumeric + hyphen)
tmux windows:       term-<pane_id>    (one window per terminal pane)
```

booterm spawns `tmux new-session -d -s bc-<sid> -c <project_root>` lazily on first attach. Subsequent attaches do `tmux new-window -t bc-<sid>` for additional panes, or `tmux attach -t bc-<sid>` and select window.
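
The naming and lazy-creation steps above can be sketched as argv builders. This assumes booterm shells out with something like Node's `child_process.execFile("tmux", args)`, so no shell string is involved and a hostile id cannot inject commands. Several pieces are choices made for this sketch rather than taken from the brief:

- the helper names;
- the `-n` flag used to name windows;
- the `=` exact-match prefix on `has-session`.

```typescript
// Sketch only: names follow the brief (bc-<session_id>, term-<pane_id>); helper
// names, -n window naming, and argv arrays for execFile-style calls are assumptions.
const TMUX_CONF = "/etc/booterm/tmux.conf";

// "UUID, sanitized: alphanumeric + hyphen". Reject ids that sanitize to nothing.
function sanitizeId(id: string): string {
  const clean = id.replace(/[^A-Za-z0-9-]/g, "");
  if (clean.length === 0) throw new Error(`unusable id: ${JSON.stringify(id)}`);
  return clean;
}

function sessionName(sessionId: string): string {
  return `bc-${sanitizeId(sessionId)}`;
}

function windowName(paneId: string): string {
  return `term-${sanitizeId(paneId)}`;
}

// First attach: detached session, first window named for the pane, shell in project root.
function newSessionArgs(sessionId: string, paneId: string, projectRoot: string): string[] {
  return [
    "-f", TMUX_CONF, "new-session", "-d",
    "-s", sessionName(sessionId),
    "-n", windowName(paneId),
    "-c", projectRoot,
  ];
}

// Each further terminal pane in the same session gets its own window.
function newWindowArgs(sessionId: string, paneId: string, projectRoot: string): string[] {
  return [
    "-f", TMUX_CONF, "new-window",
    "-t", sessionName(sessionId),
    "-n", windowName(paneId),
    "-c", projectRoot,
  ];
}

// Existence probe for the lazy path; "=" asks tmux for an exact session-name match
// instead of its default prefix matching. Exit status 0 means the session exists.
function hasSessionArgs(sessionId: string): string[] {
  return ["has-session", "-t", `=${sessionName(sessionId)}`];
}
```

The lazy path then reads: run `has-session`; if it exits non-zero, run `new-session`, otherwise `new-window`. Sanitizing before interpolation also keeps ids from smuggling tmux target separators such as `:` or `.` into `-t`.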

## Data model

| Column | On | Type | Default | Notes |
|---|---|---|---|---|
| (none) | — | — | — | No new table; terminal processes are tmux-managed |
| `kind = 'terminal'` | `session_panes.kind` CHECK | — | — | Extend CHECK to include `'terminal'` |
| `state.tmux_window` | `session_panes.state` JSONB | TEXT | NULL | Which tmux window this pane attaches to |

Schema (already applied to live DB + schema.sql):

```sql
ALTER TABLE session_panes DROP CONSTRAINT IF EXISTS session_panes_kind_check;
ALTER TABLE session_panes ADD CONSTRAINT session_panes_kind_check
  CHECK (kind IN ('chat', 'file_browser', 'terminal'));
```

## Backend (booterm)

New app at `apps/booterm/`:

```
apps/booterm/
├── src/
│   ├── index.ts             # Fastify + WS + auth
│   ├── auth.ts              # Remote-User middleware (same pattern as boocode)
│   ├── db.ts                # pg pool (shared boocode_db)
│   ├── routes/
│   │   ├── health.ts
│   │   └── terminals.ts     # POST /api/term/sessions/:sid/panes/:pid/start (creates tmux window)
│   ├── pty/
│   │   ├── manager.ts       # tmux process management
│   │   └── pty.ts           # node-pty wrapper for `tmux attach -t ... -d`
│   └── ws/
│       └── attach.ts        # WS /ws/term/sessions/:sid/panes/:pid → PTY bidi pipe
├── package.json
└── tsconfig.json
```

### Endpoints

| Method | Path | Notes |
|---|---|---|
| GET | `/api/term/health` | Ping |
| POST | `/api/term/sessions/:sid/panes/:pid/start` | Idempotent tmux window create. Returns `{tmux_window: "term-<pid>"}` |
| WS | `/ws/term/sessions/:sid/panes/:pid` | Attach PTY |
| POST | `/api/term/sessions/:sid/panes/:pid/resize` | `{cols, rows}` |
| POST | `/api/term/sessions/:sid/panes/:pid/kill` | Kill the tmux window |

WS frames (binary or text):

```
client → server:  pty input (raw bytes, typed by user)
server → client:  pty output (raw bytes from shell)
server → client:  {type: "exit", code} on window close
```
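
The protocol carries raw PTY bytes and one JSON control message over the same socket. Below is a minimal sketch of client-side classification. It assumes two things this brief does not pin down: output may arrive as binary or text, and the exit message is a text frame serialized without spaces (`{"type":"exit","code":N}`). `classifyFrame` and `TermFrame` are hypothetical names.

```typescript
// Hypothetical client-side classifier for booterm attach frames.
type TermFrame =
  | { kind: "output"; data: string | Uint8Array }
  | { kind: "exit"; code: number };

function classifyFrame(data: string | ArrayBuffer): TermFrame {
  // Binary frames are always PTY output.
  if (typeof data !== "string") {
    return { kind: "output", data: new Uint8Array(data) };
  }
  // Cheap prefix check so ordinary shell output never goes through JSON.parse.
  if (data.startsWith('{"type":"exit"')) {
    try {
      const msg = JSON.parse(data) as { type?: unknown; code?: unknown };
      if (msg.type === "exit" && typeof msg.code === "number") {
        return { kind: "exit", code: msg.code };
      }
    } catch {
      // Not valid JSON: fall through and treat it as output.
    }
  }
  return { kind: "output", data };
}
```

The weak spot is ordinary text output that happens to begin with `{"type":"exit"`, for example from `cat`-ing such a file. Sending all PTY output as binary frames and reserving text frames for control messages would remove that ambiguity.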

### Auth + scoping

- `Remote-User` required on WS upgrade.
- `session_id` validated: looked up in the `sessions` table; the row must exist.
- `pane_id` validated: must exist in `session_panes` with `kind = 'terminal'` and matching `session_id`.
- Project root derived from `sessions.project_id → projects.root_path`. tmux starts the shell in that directory (`-c <root>`). **No chroot.** The user can `cd /` and read anything mounted into the container.
- Future hardening: namespace/chroot. Out of v1.1 scope.
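
Below is a minimal sketch of the upgrade checks above, written as a pure function over already-fetched rows so the order of checks and the failure modes are explicit. The row shapes, the `checkAttach` name, and the 401/404 status choices are assumptions. The real handler would first query `sessions` and `session_panes`.

```typescript
// Hypothetical row shapes; only the fields the checks need.
interface SessionRow { id: string; project_id: string }
interface PaneRow { id: string; session_id: string; kind: string }

type AttachCheck = { ok: true } | { ok: false; status: 401 | 404; reason: string };

function checkAttach(
  remoteUser: string | undefined,
  session: SessionRow | undefined,
  pane: PaneRow | undefined,
): AttachCheck {
  // 1. Identity from the proxy; empty header counts as missing.
  if (!remoteUser) return { ok: false, status: 401, reason: "missing Remote-User" };
  // 2. Session row must exist.
  if (!session) return { ok: false, status: 404, reason: "unknown session" };
  // 3. Pane must be a terminal pane belonging to this session.
  if (!pane || pane.kind !== "terminal" || pane.session_id !== session.id) {
    return { ok: false, status: 404, reason: "no terminal pane in this session" };
  }
  return { ok: true };
}
```

Returning 404 for a pane that exists in another session is a deliberate choice in this sketch: it avoids confirming that the pane id exists at all.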

### tmux config

`apps/booterm/tmux.conf` is bundled into the image at `/etc/booterm/tmux.conf`; tmux invocations use `-f /etc/booterm/tmux.conf`:

```
set -g default-terminal "screen-256color"
set -g history-limit 50000
set -g mouse on
setw -g mode-keys vi
set -g status off
set -g destroy-unattached off
```

Boolab pattern (from `services/tmux_session.py`).

## Frontend

| File | Change |
|---|---|
| `apps/web/src/components/panes/TerminalPane.tsx` (NEW) | xterm.js mount, WS attach, resize handler |
| `apps/web/src/api/client.ts` | `api.terminals.start(sessionId, paneId)`, `api.terminals.resize(...)`, `api.terminals.kill(...)` |
| `apps/web/src/components/Workspace.tsx` | Add `'terminal'` to the pane kind enum; spawn button → POST start → render TerminalPane. Tab UI lives in Workspace.tsx; there is no PaneTab.tsx file. |
| `apps/web/package.json` | `xterm` + `xterm-addon-fit` + `xterm-addon-web-links` |

### TerminalPane

```tsx
import { useEffect } from 'react';
import { Terminal } from 'xterm';
import { FitAddon } from 'xterm-addon-fit';
import { WebLinksAddon } from 'xterm-addon-web-links';
import 'xterm/css/xterm.css';

// Inside TerminalPane({ sid, pid }): containerRef = useRef<HTMLDivElement>(null) on the
// mount div; `api` is the client from apps/web/src/api/client.ts.
useEffect(() => {
  const el = containerRef.current;
  if (!el) return;

  const term = new Terminal({ fontFamily: 'JetBrains Mono', fontSize: 14 /* , theme: ... (deferred, see Open) */ });
  const fit = new FitAddon();
  term.loadAddon(fit);
  term.loadAddon(new WebLinksAddon());
  term.open(el);
  fit.fit();

  const proto = window.location.protocol === 'https:' ? 'wss:' : 'ws:';
  const ws = new WebSocket(`${proto}//${window.location.host}/ws/term/sessions/${sid}/panes/${pid}`);
  ws.binaryType = 'arraybuffer';
  ws.onmessage = e => term.write(typeof e.data === 'string' ? e.data : new Uint8Array(e.data));
  // send() throws while the socket is still CONNECTING; drop early keystrokes instead.
  term.onData(data => { if (ws.readyState === WebSocket.OPEN) ws.send(data); });
  term.onResize(({ cols, rows }) => api.terminals.resize(sid, pid, cols, rows));

  // Refit whenever the pane container changes size.
  const ro = new ResizeObserver(() => fit.fit());
  ro.observe(el);

  return () => { ro.disconnect(); ws.close(); term.dispose(); };
}, [sid, pid]);
```

Dev: `vite.config.ts` needs `/api/term` and `/ws/term` proxy entries mirroring the existing `/api` and `/ws` ones. List them before those broader prefixes, since Vite's dev proxy uses the first key that matches the request path.

## Send-to-terminal from chat

Boolab pattern: select text in a message → "Send to terminal" button → text becomes terminal input.

- Right-click context menu on selected text in chat → "Send to terminal" submenu lists open terminal panes.
- Click target → sends `<text>\n` to that pane's WS.

Implementation:

| File | Change |
|---|---|
| `apps/web/src/components/MessageBubble.tsx` | Selection handler + context menu |
| `apps/web/src/lib/events.ts` | New event `send_to_terminal` with payload `{pane_id, text}` |
| `apps/web/src/components/panes/TerminalPane.tsx` | Subscribe to event for its `pane_id`, write to WS |
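
The `events.ts` API is not shown in this brief, so the following is a minimal sketch of the `send_to_terminal` flow under assumed names (`onSendToTerminal`, `emitSendToTerminal` and `sendSelectionToTerminal` are all hypothetical). The chat side emits `{pane_id, text}` with a trailing newline. Each TerminalPane subscribes for its own `pane_id` and writes the text to its socket.

```typescript
// Hypothetical single-event bus for send_to_terminal.
type SendToTerminal = { pane_id: string; text: string };
type Listener = (payload: SendToTerminal) => void;

const listeners = new Set<Listener>();

// TerminalPane side: only react to payloads addressed to this pane.
// Returns an unsubscribe function suitable for a React effect cleanup.
function onSendToTerminal(paneId: string, write: (data: string) => void): () => void {
  const listener: Listener = (p) => {
    if (p.pane_id === paneId) write(p.text);
  };
  listeners.add(listener);
  return () => {
    listeners.delete(listener);
  };
}

function emitSendToTerminal(payload: SendToTerminal): void {
  listeners.forEach((l) => l(payload));
}

// MessageBubble side: the context-menu click sends `<text>\n` to the chosen pane.
function sendSelectionToTerminal(paneId: string, selection: string): void {
  emitSendToTerminal({ pane_id: paneId, text: selection + "\n" });
}
```

In TerminalPane, the subscription would sit inside the effect that owns `ws` and pass `data => ws.send(data)` as `write`. The effect's cleanup would also call the returned unsubscribe function.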

## Docker compose (already applied)

The booterm service is already in `docker-compose.yml` with:
- build context `.`, dockerfile `apps/booterm/Dockerfile`
- port `100.114.205.53:9501:3000`
- `/opt:/opt:rw` mount
- `DATABASE_URL` env pointing at `boocode_db`
- `boocode_net` network
- depends_on: `boocode_db`

Do not re-edit compose.

## Backend dependencies

`apps/booterm/package.json`:
- `fastify`
- `@fastify/websocket`
- `pg`
- `zod`
- `node-pty`
- `tslib`

`node-pty` requires a native build. The Dockerfile installs `python3 make g++` in the build stage and `tmux` in the runtime stage:

```dockerfile
FROM node:20-alpine AS build
# Toolchain for node-pty's native addon; pnpm is provided via corepack.
RUN apk add --no-cache python3 make g++
RUN corepack enable
WORKDIR /app
COPY ...
RUN pnpm install --frozen-lockfile && pnpm build

FROM node:20-alpine
RUN apk add --no-cache tmux
WORKDIR /app
# Config loaded by every tmux invocation via -f (see "tmux config").
COPY apps/booterm/tmux.conf /etc/booterm/tmux.conf
COPY --from=build /app/apps/booterm/dist ./dist
COPY --from=build /app/node_modules ./node_modules
EXPOSE 3000
CMD ["node", "dist/index.js"]
```

## Files to touch

**New app:**

- `apps/booterm/` (entire subtree)

**Existing changes:**

- `apps/web/package.json`
- `apps/web/src/api/client.ts`
- `apps/web/src/api/types.ts`
- `apps/web/src/components/Workspace.tsx`
- `apps/web/src/components/MessageBubble.tsx`
- `apps/web/src/components/panes/TerminalPane.tsx` (NEW)
- `apps/web/src/lib/events.ts`
- `apps/web/vite.config.ts` (proxy entries)

**Already done by user (do not touch):**

- `docker-compose.yml` (booterm service added)
- `apps/server/src/schema.sql` (terminal CHECK constraint)
- Live DB constraint applied

## Verification

1. `docker compose up -d --build booterm` → container healthy.
2. `curl -s -o /dev/null -w '%{http_code}\n' http://100.114.205.53:9501/api/term/health -H 'Remote-User: sam'` → `200`.
3. Browser smoke test:
   - Open a session. Workspace → "+ Terminal" → terminal pane appears with a shell prompt in the project root.
   - Type `ls -la` → output.
   - Type `vim test.txt`, write something, save, `:q` → file exists on host (rw mount).
   - Refresh browser → terminal reconnects, history intact (tmux persistence).
   - Open a second terminal pane → same project, separate tmux window. Both work independently.
   - Select code in chat → right-click → "Send to terminal" → terminal pane receives the text.
   - Container restart (`docker compose restart booterm`) → on reconnect, expect a fresh shell: the tmux server runs inside the container and does not survive the restart.
   - Close pane via tab context menu → tmux window killed. Reopen pane → fresh shell.

## Constraints

- node-pty is a native dep. Image size grows.
- tmux history capped at 50k lines per window.
- WebSocket frames are bidirectional binary; `binaryType = 'arraybuffer'`.
- Resize debounced 100ms client-side; backend runs `tmux resize-window` per resize.
- No chroot/namespace isolation in v1.1. User has full read+write under `/opt/`. Acceptable for a single-user homelab.
- Don't expose 9501 on 0.0.0.0. Tailscale binding only (already configured in compose).
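
The 100ms client-side resize debounce listed above is not yet in the TerminalPane snippet. Below is a minimal sketch of a trailing-edge debounce whose timers are injectable, so the logic can be checked without real time passing. `debounce`, `Timers` and `realTimers` are hypothetical names.

```typescript
// Timer functions are injectable so tests can run the debounce synchronously.
interface Timers {
  set(fn: () => void, ms: number): unknown;
  clear(handle: unknown): void;
}

const realTimers: Timers = {
  set: (fn, ms) => setTimeout(fn, ms),
  clear: (h) => clearTimeout(h as ReturnType<typeof setTimeout>),
};

// Trailing-edge debounce: only the last call in a burst runs, `ms` after that call.
function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  ms: number,
  timers: Timers = realTimers,
): (...args: A) => void {
  let handle: unknown = undefined;
  let armed = false;
  return (...args: A) => {
    if (armed) timers.clear(handle); // cancel the previous pending call
    armed = true;
    handle = timers.set(() => {
      armed = false;
      fn(...args);
    }, ms);
  };
}

// Possible use in TerminalPane:
// const sendResize = debounce((cols: number, rows: number) => api.terminals.resize(sid, pid, cols, rows), 100);
// term.onResize(({ cols, rows }) => sendResize(cols, rows));
```

Trailing-edge behaviour fits here: during a window drag only the final size matters to tmux.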

## Open

- Color theme matching for xterm.js. Defer.
- File-drop into terminal (upload via terminal pane). Out of scope.
- Multi-user (each user gets own tmux server): defer until BooCode goes multi-user, which isn't planned.
- BooCoder container: same skeleton as booterm but with edit_file / create_file tools instead of PTY. Will follow this pattern when built.

openspec/changes/archived/handoff_v1.13.10_per_tool_cost.md (new file, 441 lines)

```
#careful #boocode #nofluff

v1.13.10 — per-tool token cost accounting (rolling 100-call window)

Goal: surface per-tool prompt/completion-token rolling averages in AgentPicker for at-a-glance agent-cost hints. Implementation is a SQL view on top of `messages_with_parts` (no new table, no new write site) + a read endpoint + AgentPicker tooltip extension. Estimated ~240 LoC, mostly UI.

## Where we are

- Last tag: v1.13.9 (compaction overflow trigger: `floor(0.85 × ctx_max)` early-trigger). Branch clean.
- v1.13.x cleanup line ✅ through v1.13.9. Queued: v1.13.10 (this) → v1.13.11 (WS Zod) → v1.13.12 (skills audit) → v1.13.2 (column drop, last).
- Dependency (satisfied since v1.13.7 commit `ff29b48`): `includeUsage: true` on `createOpenAICompatible` in `apps/server/src/services/inference/provider.ts`. Without it, `messages.tokens_used`/`ctx_used` were NULL for v1.13.1-A → v1.13.7 (latent regression). Now populated.

## Why this matters

Today: AgentPicker lists agents by name + description. No cost signal. Users pick the architect agent (full tool whitelist, 21k of tool schema) for one-liner questions a refactorer (3 tools, 4k schema) could answer.

Tomorrow: each agent listing shows an estimated prompt + completion cost, summed from the rolling per-tool means of its whitelisted tools. Each mean is derived from that tool's last 100 invocations across all chats. Decision aid, not a hard gate.

Why a SQL view instead of a denormalized stats table:
- All the source data already lands in `messages` (tool_calls JSON + tokens_used + ctx_used) and `message_parts` (read via the `messages_with_parts` view). Zero new write sites.
- The rolling 100-call window is a `ROW_NUMBER() OVER (PARTITION BY tool_name ORDER BY created_at DESC) <= 100`, a natural fit for a view.
- The view is rollback-safe. If the math is wrong, `DROP VIEW` and re-deploy; no orphan rows, no backfill.
- At BooCode scale (single user, ~30 tools, ~100 calls/tool), aggregate-on-read is microseconds. Premature to denormalize.

The roadmap schema row (`tool_cost_stats (tool_name, prompt_tokens_sum, completion_tokens_sum, n_calls, updated_at)`) matches both a table and a view. The view is the lighter implementation.

## Canonical column mapping (pinned)

The `messages` columns are named non-obviously. Pinned mapping, confirmed across 5 write sites + 1 read site:

| Column        | Semantic meaning           | AI SDK v6 source name |
|---------------|----------------------------|-----------------------|
| `ctx_used`    | prompt / input tokens      | `usage.inputTokens`   |
| `tokens_used` | completion / output tokens | `usage.outputTokens`  |

Write sites confirmed: `tool-phase.ts:94-95`, `error-handler.ts:109-110`, `sentinel-summaries.ts:130-131`, `sentinel-summaries.ts:387-388`, `stream-phase.ts:319-320`. The canonical read at `payload.ts:190-191` maps them back: `const promptTokens = updated.ctx_used; const completionTokens = updated.tokens_used`.

`tokens_used` reads like "total" but is completion only. This is project convention, since the columns predate v1.13.x. Do not "fix" the naming inside this batch: it is out of scope, and downstream consumers depend on the current mapping.
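
One way to keep the naming trap in a single place is to route every write through one mapping helper. Below is a minimal sketch. `usageToColumns` and `columnsToUsage` are hypothetical, and `Usage` mirrors only the two AI SDK v6 fields named in the table, with either one possibly undefined.

```typescript
// Only the two usage fields named in the pinned mapping.
interface Usage {
  inputTokens: number | undefined;
  outputTokens: number | undefined;
}

interface UsageColumns {
  ctx_used: number | null;    // prompt / input tokens
  tokens_used: number | null; // completion / output tokens (NOT total)
}

// Write direction: AI SDK usage -> messages columns. Missing usage stays NULL.
function usageToColumns(usage: Usage): UsageColumns {
  return {
    ctx_used: usage.inputTokens ?? null,
    tokens_used: usage.outputTokens ?? null,
  };
}

// Read direction, matching the canonical read in payload.ts.
function columnsToUsage(row: UsageColumns): { promptTokens: number | null; completionTokens: number | null } {
  return { promptTokens: row.ctx_used, completionTokens: row.tokens_used };
}
```

Writing NULL rather than 0 for missing usage matters here. The view's `IS NOT NULL` clauses drop only NULL rows, so a 0 written for a missing value would silently drag the averages down.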

## Attribution model

A single assistant turn can emit N tool calls in parallel. llama-swap returns ONE (prompt_tokens, completion_tokens) pair per turn, not per tool. Attribution requires a split.

**Chosen approach: equal split.** For an assistant turn that emits N tool calls with prompt P and completion C, each tool is attributed P/N prompt + C/N completion. The 100-call rolling mean smooths split noise. Implementation: `tokens_used::float / jsonb_array_length(tool_calls)` at the unnest site.

**Alternatives rejected:**
- "Full turn cost to every tool" (no division). Overstates: a 5-tool turn would count the full turn cost five times, once per tool.
- "Result-size only" (`length(JSON.stringify(output)) / 4`). Loses the LLM's actual usage signal; doesn't capture how expensive a tool's output is to the next prompt.
- "Consuming-turn delta" (next turn prompt_tokens − this turn prompt_tokens, attributed to the tool that emitted the result). Most accurate, but requires bubble-back math through the `executeToolPhase → runAssistantTurn` recursion. Over-engineered for the rolling-average use case.

**If Sam wants a different split, change the divisor in the view definition (it appears in both the prompt and completion expressions).**
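
To check numbers by hand, the equal-split rule can be sketched in TypeScript as below. The types and the `attributeEqualSplit` name are hypothetical. Field names follow the pinned mapping (`ctx_used` = prompt, `tokens_used` = completion).

```typescript
interface AssistantTurn {
  tool_names: string[]; // one entry per element of tool_calls, in emission order
  ctx_used: number;     // prompt / input tokens for the whole turn
  tokens_used: number;  // completion / output tokens for the whole turn
}

interface ToolShare { tool_name: string; prompt: number; completion: number }

// Equal split: each of the N calls gets P/N prompt and C/N completion.
function attributeEqualSplit(turn: AssistantTurn): ToolShare[] {
  const n = turn.tool_names.length;
  if (n === 0) return []; // no tool calls, nothing to attribute
  return turn.tool_names.map((tool_name) => ({
    tool_name,
    prompt: turn.ctx_used / n,
    completion: turn.tokens_used / n,
  }));
}

// 3 tools, P = 15000, C = 300 -> each tool: 5000 prompt, 100 completion.
console.log(attributeEqualSplit({ tool_names: ["view_file", "grep", "list_dir"], ctx_used: 15000, tokens_used: 300 }));
```

A turn that calls the same tool twice credits that tool with two shares, matching the view's one-row-per-`tool_calls`-element unnest. The shares always add back up to the turn totals, which is the property the "full turn cost" alternative breaks.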

## Filtering — sentinel, failure, repair-call semantics

The view excludes rows that aren't real tool-cost signal:

- **Failed and cancelled turns** (`status != 'complete'`). The `error-handler.ts` failed/cancelled paths don't write `tokens_used`/`ctx_used`, so the existing `tokens_used IS NOT NULL` clause already filters these. Adding `status='complete'` is defense in depth and makes intent explicit.
- **Cap-hit and doom-loop sentinel rows** (`metadata->>'kind' IN ('cap_hit', 'doom_loop')`). Sentinels are `role='system'` rows with `tool_calls=NULL`, so the existing `tool_calls IS NOT NULL` clause already filters them. The explicit metadata filter is defense in depth: it survives future schema drift where someone might INSERT a sentinel with a non-null tool_calls.
- **`experimental_repairToolCall` retries.** No special handling needed. Our impl (per `CLAUDE.md`) is pass-through: malformed calls flow to zod-reject → tool_result error → the next normal turn handles them. No separate rows; the next turn's tokens count naturally.
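
The same filter can be expressed as a TypeScript predicate for reasoning about which fixture rows should count. Below is a minimal sketch mirroring the view's `WHERE` clause. `MessageRow` is a simplified, assumed shape of a `messages_with_parts` row, and `isCostSignal` is a hypothetical name.

```typescript
// Simplified, assumed row shape; only the columns the filter reads.
interface MessageRow {
  status: string;
  tool_calls: unknown[] | null;
  tokens_used: number | null;
  ctx_used: number | null;
  metadata: { kind?: string | null } | null;
}

const SENTINEL_KINDS = new Set(["cap_hit", "doom_loop"]);

// True when the row contributes to tool_cost_stats.
function isCostSignal(m: MessageRow): boolean {
  return (
    m.tool_calls !== null &&
    m.tool_calls.length > 0 &&          // jsonb_array_length(tool_calls) > 0
    m.tokens_used !== null &&           // also drops failed/cancelled in practice
    m.ctx_used !== null &&
    m.status === "complete" &&          // defense in depth
    (m.metadata === null ||
      m.metadata.kind == null ||
      !SENTINEL_KINDS.has(m.metadata.kind)) // defense in depth against sentinels
  );
}
```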

## Recon (already done; paste for reference)

```
cd /opt/boocode
grep -n "tokens_used\|ctx_used\|inputTokens\|outputTokens" apps/server/src/services/inference/*.ts | head -30
grep -n "metadata\|cap_hit\|doom_loop" apps/server/src/services/inference/sentinels.ts apps/server/src/schema.sql | head -10
psql -h localhost -p 5432 -U postgres -d boocode -c "\d messages_with_parts" | head -30
```

Expected: confirms the canonical mapping in the table above; confirms `messages.metadata jsonb` exists at `schema.sql:259`; confirms `messages_with_parts` exposes `m.metadata` at `schema.sql:92`.

## Scope

### 1. schema.sql — `tool_cost_stats` view (~35 LoC)

Append after the `messages_with_parts` view (after line 120):

```sql
-- v1.13.10: per-tool token cost rolling window. Derives from
-- messages_with_parts (the v1.13.1-B view that COALESCEs message_parts over
-- the legacy JSON column) so this works whether the chat predates v1.13.0
-- or postdates v1.13.2 (column drop). No new write site — all source data
-- already lands via the existing tool-phase.ts:94-95 UPDATE.
--
-- Attribution model: equal split. A turn emitting N tool calls divides its
-- prompt/completion tokens by N before attribution. See v1.13.10 dispatch
-- brief for rationale + rejected alternatives.
--
-- Column mapping: messages.ctx_used = prompt (input), messages.tokens_used
-- = completion (output). Non-obvious naming; pinned via canonical writes at
-- tool-phase.ts:94-95 et al.
--
-- Filtering rationale:
--   status='complete'            — exclude failed/cancelled (defense in
--                                  depth; failed-path doesn't write
--                                  tokens_used so they're also filtered
--                                  indirectly).
--   metadata->>'kind' exclusions — exclude cap_hit / doom_loop sentinels
--                                  (defense in depth; sentinels are
--                                  role='system' with tool_calls=NULL
--                                  so they're filtered indirectly too).
--   experimental_repairToolCall  — no special handling; retries flow
--                                  as normal next-turn tool_result
--                                  errors and count naturally.
--
-- Rolling window: last 100 calls per tool_name, ordered by created_at DESC.
-- Aggregate-on-read is microseconds at BooCode scale (single user, ~30
-- tools, < 100 calls each). DROP VIEW + recreate to change window size.
CREATE OR REPLACE VIEW tool_cost_stats AS
WITH per_call AS (
  SELECT
    (tc->>'name')::text AS tool_name,
    (m.ctx_used::float / NULLIF(jsonb_array_length(m.tool_calls), 0)) AS prompt_tokens,
    (m.tokens_used::float / NULLIF(jsonb_array_length(m.tool_calls), 0)) AS completion_tokens,
    m.created_at,
    ROW_NUMBER() OVER (
      PARTITION BY (tc->>'name')::text
      ORDER BY m.created_at DESC
    ) AS rn
  FROM messages_with_parts m,
       LATERAL jsonb_array_elements(m.tool_calls) AS tc
  WHERE m.tool_calls IS NOT NULL
    AND jsonb_array_length(m.tool_calls) > 0
    AND m.tokens_used IS NOT NULL
    AND m.ctx_used IS NOT NULL
    AND m.status = 'complete'
    AND (m.metadata IS NULL
         OR m.metadata->>'kind' IS NULL
         OR m.metadata->>'kind' NOT IN ('cap_hit', 'doom_loop'))
)
SELECT
  tool_name,
  ROUND(SUM(prompt_tokens))::int     AS prompt_tokens_sum,
  ROUND(SUM(completion_tokens))::int AS completion_tokens_sum,
  COUNT(*)::int                      AS n_calls,
  MAX(created_at)                    AS updated_at
FROM per_call
WHERE rn <= 100
GROUP BY tool_name;
```

Notes:
- `NULLIF(..., 0)` guards against div-by-zero on `jsonb_array_length=0` (should never happen given the WHERE clause, but defensive).
- `ROUND(SUM(...))::int`: the frontend doesn't want decimals, and sum-then-round is more accurate than per-row round-then-sum.
- The view reads from `messages_with_parts`, not `messages`, so legacy pre-v1.13.0 rows and post-v1.13.2 rows both resolve.
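- No index needed; the underlying `idx_messages_chat` covers the JOIN. The LATERAL unnest is not bounded by the window, though: `rn <= 100` is applied after `ROW_NUMBER()` has ranked every qualifying tool call, so query cost grows with total tool-call history rather than with the window size. Fine at BooCode scale; revisit if history grows large.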
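
The window and rounding rules are easy to get subtly wrong when hand-computing expected values for the test fixtures. Below is a minimal in-memory reference for the aggregate half of the view. It assumes its input is already filtered and split into per-call shares. `PerCall`, `ToolCostRow` and `toolCostStats` are hypothetical names, and timestamps are plain numbers for brevity.

```typescript
interface PerCall { tool_name: string; prompt: number; completion: number; created_at: number }
interface ToolCostRow {
  tool_name: string;
  prompt_tokens_sum: number;
  completion_tokens_sum: number;
  n_calls: number;
  updated_at: number;
}

function toolCostStats(calls: PerCall[], windowSize = 100): ToolCostRow[] {
  // PARTITION BY tool_name
  const byTool = new Map<string, PerCall[]>();
  for (const c of calls) {
    const list = byTool.get(c.tool_name);
    if (list) list.push(c);
    else byTool.set(c.tool_name, [c]);
  }
  const rows: ToolCostRow[] = [];
  byTool.forEach((list, tool_name) => {
    // ORDER BY created_at DESC, keep rn <= windowSize
    const recent = list.slice().sort((a, b) => b.created_at - a.created_at).slice(0, windowSize);
    let prompt = 0;
    let completion = 0;
    for (const r of recent) {
      prompt += r.prompt;
      completion += r.completion;
    }
    rows.push({
      tool_name,
      // Sum first, round once. Math.round breaks .5 ties upward, while Postgres
      // round(double precision) is platform-dependent, so exact ties may differ by 1.
      prompt_tokens_sum: Math.round(prompt),
      completion_tokens_sum: Math.round(completion),
      n_calls: recent.length,
      updated_at: recent[0]!.created_at, // newest call = MAX(created_at)
    });
  });
  return rows.sort((a, b) => (a.tool_name < b.tool_name ? -1 : a.tool_name > b.tool_name ? 1 : 0));
}
```

For the 150-call window test, calls 51..150 survive. Each call's `n_calls` cap of 100 and `updated_at` of the newest call follow directly from the code above.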
|
||||
|
||||
### 2. apps/server/src/routes/tools.ts (NEW, ~40 LoC)
|
||||
|
||||
New route file. Register in `apps/server/src/index.ts` next to the other `register*Routes(app, sql, ...)` calls.
|
||||
|
||||
```ts
|
||||
import type { FastifyInstance } from 'fastify';
|
||||
import type { Sql } from '../db.js';
|
||||
|
||||
export interface ToolCostStat {
|
||||
tool_name: string;
|
||||
mean_prompt_tokens: number;
|
||||
mean_completion_tokens: number;
|
||||
n_calls: number;
|
||||
updated_at: string;
|
||||
}
|
||||
|
||||
export function registerToolsRoutes(app: FastifyInstance, sql: Sql) {
|
||||
app.get('/api/tools/cost_stats', async () => {
|
||||
const rows = await sql<{
|
||||
tool_name: string;
|
||||
prompt_tokens_sum: number;
|
||||
completion_tokens_sum: number;
|
||||
n_calls: number;
|
||||
updated_at: string;
|
||||
}[]>`
|
||||
SELECT tool_name, prompt_tokens_sum, completion_tokens_sum, n_calls, updated_at
|
||||
FROM tool_cost_stats
|
||||
ORDER BY tool_name ASC
|
||||
`;
|
||||
const stats: ToolCostStat[] = rows.map(r => ({
|
||||
tool_name: r.tool_name,
|
||||
mean_prompt_tokens: Math.round(r.prompt_tokens_sum / r.n_calls),
|
||||
mean_completion_tokens: Math.round(r.completion_tokens_sum / r.n_calls),
|
||||
n_calls: r.n_calls,
|
||||
updated_at: r.updated_at,
|
||||
}));
|
||||
return { stats };
|
||||
});
|
||||
}
|
||||
```
|
||||
|
||||
Route is bodyless, idempotent, cheap. No pagination (≤30 tools).
|
||||
|
||||
### 3. apps/server/src/services/__tests__/tool_cost_stats.test.ts (NEW, ~95 LoC)
|
||||
|
||||
Integration test against real Postgres (matches `inference.test.ts` pattern). Fixtures:
|
||||
|
||||
```ts
|
||||
import { describe, it, expect, beforeEach } from 'vitest';
|
||||
import { connect } from '../../db.js';
|
||||
|
||||
describe('tool_cost_stats view (v1.13.10)', () => {
|
||||
// ... session + chat + project setup helpers ...
|
||||
|
||||
it('returns empty when no tool calls exist', async () => {
|
||||
// fresh chat, only user/assistant text turns
|
||||
const stats = await sql`SELECT * FROM tool_cost_stats`;
|
||||
expect(stats).toEqual([]);
|
||||
});
|
||||
|
||||
it('attributes single-tool turn fully to that tool', async () => {
|
||||
// insert one assistant message with tool_calls=[{name: 'view_file', ...}],
|
||||
// tokens_used=300, ctx_used=15000, status='complete'
|
||||
const stats = await sql`SELECT * FROM tool_cost_stats WHERE tool_name='view_file'`;
|
||||
expect(stats[0]).toMatchObject({
|
||||
tool_name: 'view_file',
|
||||
prompt_tokens_sum: 15000,
|
||||
completion_tokens_sum: 300,
|
||||
n_calls: 1,
|
||||
});
|
||||
});
|
||||
|
||||
it('splits multi-tool turn equally across tools', async () => {
|
||||
// insert one assistant turn with 3 tool calls (view_file, grep, list_dir),
|
||||
// tokens_used=300, ctx_used=15000 → each tool gets 100 completion, 5000 prompt
|
||||
const stats = await sql`SELECT * FROM tool_cost_stats ORDER BY tool_name`;
|
||||
expect(stats).toHaveLength(3);
|
||||
for (const s of stats) {
|
||||
expect(s.completion_tokens_sum).toBe(100);
|
||||
expect(s.prompt_tokens_sum).toBe(5000);
|
||||
expect(s.n_calls).toBe(1);
|
||||
}
|
||||
});
|
||||
|
||||
it('limits to last 100 calls per tool (FIFO window)', async () => {
|
||||
// insert 150 turns each calling view_file once with monotonically
|
||||
// increasing tokens_used; expect only the most recent 100 to count
|
||||
const stats = await sql`SELECT * FROM tool_cost_stats WHERE tool_name='view_file'`;
|
||||
expect(stats[0]!.n_calls).toBe(100);
|
||||
// mean should reflect the latter half (51..150), not 1..150
|
||||
});
|
||||
|
||||
it('excludes turns with NULL tokens_used (pre-v1.13.7 latent regression)', async () => {
|
||||
// insert a turn with tool_calls but tokens_used=NULL → must not appear
|
||||
const stats = await sql`SELECT * FROM tool_cost_stats WHERE tool_name='view_file'`;
|
||||
expect(stats).toEqual([]);
|
||||
});
|
||||
|
||||
it('excludes failed and cancelled turns + sentinel metadata rows', async () => {
|
||||
// insert four rows for tool_name='view_file', all with tokens_used+ctx_used
|
||||
// populated:
|
||||
// row A: status='failed' — excluded
|
||||
// row B: status='cancelled' — excluded
|
||||
// row C: status='complete', metadata={kind:'cap_hit'} — excluded
|
||||
// row D: status='complete', metadata={kind:'doom_loop'} — excluded
|
||||
// row E: status='complete', metadata=null — included
|
||||
// Expect n_calls=1, attributable to row E only.
|
||||
const stats = await sql`SELECT * FROM tool_cost_stats WHERE tool_name='view_file'`;
|
||||
expect(stats[0]!.n_calls).toBe(1);
|
||||
});
|
||||
|
||||
it('reads tool_calls via messages_with_parts (parts-authoritative)', async () => {
|
||||
// insert a v1.13.0+ row with messages.tool_calls=NULL but
|
||||
// message_parts rows containing the tool_call → must still aggregate
|
||||
const stats = await sql`SELECT * FROM tool_cost_stats WHERE tool_name='grep'`;
|
||||
expect(stats[0]!.n_calls).toBe(1);
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
Pattern: each test resets the messages table for the fixture chat (TRUNCATE not DELETE — Postgres `messages` has FK CASCADE) and inserts hand-crafted rows. The view is recomputed on every SELECT.
|
||||
|
||||
### 4. apps/web/src/api/types.ts + client.ts (~10 LoC)
|
||||
|
||||
Add to `types.ts`:
|
||||
|
||||
```ts
|
||||
export interface ToolCostStat {
|
||||
tool_name: string;
|
||||
mean_prompt_tokens: number;
|
||||
mean_completion_tokens: number;
|
||||
n_calls: number;
|
||||
updated_at: string;
|
||||
}
|
||||
```
|
||||
|
||||
Add to `client.ts` under the existing `api.*` namespace structure:
|
||||
|
||||
```ts
|
||||
tools: {
|
||||
costStats: () => fetch<{ stats: ToolCostStat[] }>('GET', '/api/tools/cost_stats'),
|
||||
},
|
||||
```
|
||||
|
||||
Match the casing convention of the existing namespaces (`api.agents.list`, `api.chats.archive`, etc.).
|
||||
|
||||
### 5. apps/web/src/components/AgentPicker.tsx — tooltip extension (~80 LoC delta)
|
||||
|
||||
Currently (line 67): `title={selectedAgent?.description}` — native HTML title attribute on the trigger button.
|
||||
|
||||
Replacement: dropdown items get a per-agent cost line in muted text below the description. Format:
|
||||
|
||||
```
|
||||
[Agent name]
|
||||
[Agent description]
|
||||
~5.2k prompt / 280 completion · 6 tools · last call 3h ago
|
||||
```

Implementation steps:
1. Fetch `api.tools.costStats()` once on mount (alongside the existing `api.agents.list()`). Cache the result for the lifetime of the picker open state. Re-fetch only on `useEffect` dep change.
2. Compute per-agent aggregate: for each agent, sum the means of its whitelisted tools. Sum-of-means, not mean-of-sums — we're combining independent rolling averages.
3. Render below description (one line, muted, truncated). Omit the line entirely if no calls are recorded yet for any of the agent's tools.
4. Don't break the existing native `title=` for backward compat; layer the cost line additively.

```tsx
const [costStats, setCostStats] = useState<ToolCostStat[]>([]);
useEffect(() => {
  // Best-effort: on fetch failure the picker simply renders without cost lines.
  api.tools.costStats().then(r => setCostStats(r.stats)).catch(() => setCostStats([]));
}, []);
// `as const` keeps each entry a [key, value] tuple, so fromEntries stays typed as Record<string, ToolCostStat>.
const costByTool = useMemo(
  () => Object.fromEntries(costStats.map(s => [s.tool_name, s] as const)),
  [costStats],
);
function agentCost(agent: Agent): { prompt: number; completion: number; nTools: number; nWithData: number; mostRecent: string | null } {
  let prompt = 0, completion = 0, nWithData = 0;
  let mostRecent: string | null = null;
  for (const t of agent.tools) {
    const s = costByTool[t];
    if (!s) continue; // tool has no recorded calls yet
    prompt += s.mean_prompt_tokens;
    completion += s.mean_completion_tokens;
    nWithData++;
    // String comparison is chronological for same-format ISO-8601 UTC timestamps.
    if (!mostRecent || s.updated_at > mostRecent) mostRecent = s.updated_at;
  }
  return { prompt, completion, nTools: agent.tools.length, nWithData, mostRecent };
}
```

For the line render: `~${formatK(prompt)} prompt / ${Math.round(completion)} completion · ${nWithData}/${nTools} tools · last call ${formatAgo(mostRecent)}`. `completion` is a sum of means, so round it before display. Skip the line entirely when `nWithData === 0` to avoid showing "0k / 0 / 0 tools" for fresh-from-deploy state.

**`formatK` / `formatAgo`:** colocate at the bottom of `AgentPicker.tsx`. Don't extract to a util file in this batch — single use site.
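`formatK` and `formatAgo` are named above but not specified. A minimal sketch under assumed formatting rules (one decimal below 10k, whole thousands above; the coarsest whole unit for the age), plus a hypothetical `costLine` helper that applies the skip rule and produces the `n/m tools · last call …` form. All three names and rules are illustrative; `now` is a parameter only so the output is deterministic in tests.

```typescript
// Hypothetical helpers for the bottom of AgentPicker.tsx; formatting rules are assumptions.

// 280 -> "280", 1000 -> "1k", 5200 -> "5.2k", 42300 -> "42k"
function formatK(n: number): string {
  if (n < 1000) return String(Math.round(n));
  const k = n / 1000;
  return `${k < 10 ? k.toFixed(1).replace(/\.0$/, "") : Math.round(k)}k`;
}

// Coarsest whole unit: "45s ago", "12m ago", "3h ago", "2d ago".
function formatAgo(iso: string | null, now: number = Date.now()): string {
  const t = iso ? Date.parse(iso) : NaN;
  if (!Number.isFinite(t)) return "never";
  const secs = Math.max(0, Math.floor((now - t) / 1000));
  if (secs < 60) return `${secs}s ago`;
  const mins = Math.floor(secs / 60);
  if (mins < 60) return `${mins}m ago`;
  const hours = Math.floor(mins / 60);
  if (hours < 24) return `${hours}h ago`;
  return `${Math.floor(hours / 24)}d ago`;
}

// Same shape agentCost() returns.
interface AgentCost {
  prompt: number;
  completion: number;
  nTools: number;
  nWithData: number;
  mostRecent: string | null;
}

// null means "render no cost line" (fresh deploy: none of the agent's tools has data yet).
function costLine(c: AgentCost, now: number = Date.now()): string | null {
  if (c.nWithData === 0) return null;
  return (
    `~${formatK(c.prompt)} prompt / ${Math.round(c.completion)} completion` +
    ` · ${c.nWithData}/${c.nTools} tools · last call ${formatAgo(c.mostRecent, now)}`
  );
}
```

Returning `null` rather than an empty string lets the JSX skip the whole muted element, which is what the "no cost line" smoke expectation asks for.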

## What NOT to do

- **Don't add a new write site at `tool-phase.ts` or `finalizeCompletion`.** All source data is already there via existing UPDATEs.
- **Don't denormalize.** The view is sufficient and rollback-safe at BooCode's single-user scale.
- **Don't add per-tool cost to the message bubble.** Out of scope. AgentPicker tooltip only.
- **Don't fold per-call rows into a moving sum via triggers.** Aggregate on read; 100 rows × 30 tools is microseconds in Postgres.
- **Don't track `result_chars` (the size of `tool_results.output`).** Tempting as a second cost signal but out of scope here. Future batch if Sam wants it.
- **Don't add a session-scoped or chat-scoped filter to `tool_cost_stats`.** The rolling window is GLOBAL across all chats — the agent picker is a project-level decision aid. Per-chat surfacing is a future v1.14+ design.
- **Don't change the attribution model post-deployment** without dropping the view first. Mid-flight semantic changes give bogus historical means.
- **Don't "fix" the `ctx_used`/`tokens_used` naming inside this batch.** Non-obvious but pinned across 5 write sites. Renaming is its own batch.
- **Don't rely solely on `tool_calls IS NOT NULL` for sentinel exclusion.** It works today (sentinels are role='system' with tool_calls=NULL) but the explicit `status='complete'` + `metadata->>'kind'` filters are defense in depth and survive future schema drift.

## Backup before edits

```
cd /opt/boocode
cp apps/server/src/schema.sql{,.bak-$(date +%Y%m%d-%H%M%S)}
cp apps/web/src/components/AgentPicker.tsx{,.bak-$(date +%Y%m%d-%H%M%S)}
```

(No backup needed for new files in items 2, 3, 4.)

## Verify

```
pnpm -C apps/server test
```

Expected: all existing tests pass + 7 new in `tool_cost_stats.test.ts`. Total moves from 195 → 202.

```
cd /opt/boocode
docker compose exec boocode_db psql -U postgres -d boocode -c \
  "SELECT * FROM tool_cost_stats ORDER BY n_calls DESC LIMIT 10;"
```

Expected: in any live deployment with v1.13.7+ history, this returns real rows for `view_file`, `grep`, `list_dir`, etc. If it returns no rows, the deployment's tool-call history likely predates v1.13.7: `messages.tool_calls` was NULL throughout the v1.13.1-A → v1.13.7 latent regression window, so rows only start accumulating with v1.13.7+ traffic.

## Build + smoke

```
cd /opt/boocode
docker compose up --build -d boocode
docker compose logs --since=30s boocode | tail -20
```

Smoke A — view recompiles on schema apply:
```
docker compose logs boocode | grep -i "tool_cost_stats\|applySchema"
```
Expected: clean schema apply, view registered idempotently.

Smoke B — endpoint returns data:
```
curl -s http://localhost:3000/api/tools/cost_stats | jq '(.stats | length), .stats[0]'
```
Expected: nonzero length if any v1.13.7+ tool calls exist; one stat object with all 5 fields populated.
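To make "all 5 fields populated" a mechanical check rather than an eyeballed one, a small shape validator can run over the response body. This is a hypothetical smoke helper (`checkCostStatsBody` and `isToolCostStat` do not exist in the codebase); it redeclares `ToolCostStat` locally so it runs standalone, and "populated" is an assumption: non-empty `tool_name`, finite token means, `n_calls > 0`, and an `updated_at` that `Date.parse` accepts.

```typescript
// Hypothetical smoke helper, not part of the codebase. Mirrors the ToolCostStat
// interface from types.ts so the snippet runs on its own.
interface ToolCostStat {
  tool_name: string;
  mean_prompt_tokens: number;
  mean_completion_tokens: number;
  n_calls: number;
  updated_at: string;
}

// "Populated" is an assumption: non-empty name, finite means, at least one call,
// and an updated_at that Date.parse accepts.
function isToolCostStat(x: unknown): x is ToolCostStat {
  if (typeof x !== "object" || x === null) return false;
  const { tool_name, mean_prompt_tokens, mean_completion_tokens, n_calls, updated_at } =
    x as Record<string, unknown>;
  return (
    typeof tool_name === "string" && tool_name.length > 0 &&
    typeof mean_prompt_tokens === "number" && Number.isFinite(mean_prompt_tokens) &&
    typeof mean_completion_tokens === "number" && Number.isFinite(mean_completion_tokens) &&
    typeof n_calls === "number" && n_calls > 0 &&
    typeof updated_at === "string" && !Number.isNaN(Date.parse(updated_at))
  );
}

// Returns a list of problems; an empty list means Smoke B passes.
function checkCostStatsBody(body: unknown): string[] {
  const stats =
    typeof body === "object" && body !== null ? (body as { stats?: unknown }).stats : undefined;
  if (!Array.isArray(stats)) return ["body.stats is not an array"];
  const problems: string[] = [];
  if (stats.length === 0) problems.push("stats is empty (no v1.13.7+ tool calls yet?)");
  stats.forEach((s, i) => {
    if (!isToolCostStat(s)) problems.push(`stats[${i}] has a missing or mistyped field`);
  });
  return problems;
}
```

On Node 18+ it can be fed straight from the endpoint, e.g. `checkCostStatsBody(await (await fetch("http://localhost:3000/api/tools/cost_stats")).json())`.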

Smoke C — UI:
1. Open browser to `boocode.indifferentketchup.com`.
2. Open AgentPicker dropdown on any session.
3. Each agent row shows a muted cost line below its description: `~5.2k prompt / 280 completion · 6/8 tools · last call 2h ago`.
4. Agents with no tool history show just description (no cost line).
5. Confirm cost line truncates with the existing `text-muted-foreground` / `truncate` pattern; doesn't break the layout at mobile widths (open Vivaldi devtools, set iPhone-13 viewport).

## Files expected to touch

- `apps/server/src/schema.sql` — ~35 LoC delta (view definition + filter comments)
- `apps/server/src/routes/tools.ts` — NEW, ~40 LoC
- `apps/server/src/index.ts` — 1 line (`registerToolsRoutes(app, sql)`)
- `apps/server/src/services/__tests__/tool_cost_stats.test.ts` — NEW, ~95 LoC
- `apps/web/src/api/types.ts` — ~7 LoC (interface)
- `apps/web/src/api/client.ts` — ~3 LoC (namespace + method)
- `apps/web/src/components/AgentPicker.tsx` — ~80 LoC delta (cost line + fetch hook + helpers)

Total ~260 LoC. Matches roadmap estimate.

## Workflow conventions

- Backups before destructive edits (above) on the two MODIFIED files. New files don't need backups.
- Sam reviews diffs. Never `git add` / `git commit` / `git push` / `git pull` on Sam's behalf.
- Build: `docker compose up --build -d boocode`. No `--no-cache` unless layer-cache trap surfaces.
- Tests authoritative: `pnpm -C apps/server test`.
- View definition lives in `schema.sql` (idempotent via `CREATE OR REPLACE VIEW`); no migration shim needed.

## Don't repeat past mistakes

- v1.13.7 stability bundle (`includeUsage:true`, trim guards, payload filter, `BUDGET_NO_AGENT=30`): all live. This batch depends on `includeUsage:true`. If unset, `tool_cost_stats` returns empty rows.
- v1.13.8 prefix instrumentation: untouched.
- v1.13.9 ratio-only `usable()`: untouched.
- v1.13.4 two-tier prune: untouched.
- v1.13.5 truncate.ts opaque-id pattern: untouched.
- v1.13.1-B `messages_with_parts` view: this view is the source. Don't reach past it to raw `messages`.
- v1.13.2 will DROP `messages.tool_calls`/`tool_results` columns. The `tool_cost_stats` view reads from `messages_with_parts` not `messages`, so it survives. Verify after v1.13.2 ships.

## Source files to read in project knowledge

- `boocode_roadmap.md` (v1.13.10 row at line 114; schema row at line 474)
- `boocode_code_review.md` (cost-tracking design background)
- `CLAUDE.md` (project conventions; messages_with_parts invariant at L80; v1.13.7 includeUsage invariant)
```
225
openspec/changes/archived/handoff_v1.13.8_prefix_verify.md
Normal file
@@ -0,0 +1,225 @@
# Handoff: BooCode v1.13.8 — system-prompt prefix stability verify-and-measure

#careful #boocode #nofluff

Recon-only / instrumentation batch. **No cache implementation in this dispatch.** Goal: prove (or disprove) that the assembled system-prompt prefix is byte-stable across turns under steady-state inputs. Result determines whether v1.13.7-as-originally-specced (the prefix cache) is actually needed at all.

## Where we are

- Last tag: `v1.13.7` — stability bundle (`includeUsage:true` + trim guards + payload filter for trailing empty/failed assistants + `BUDGET_NO_AGENT 15→30`). This shipped as a renumber of the original "prefix cache" v1.13.7 slot. The prefix-cache work moved to v1.13.8 with the change-of-shape captured here.
- Branch clean. `git log --oneline main -5` should show `…v1.13.7 v1.13.6 v1.13.5 v1.13.4 v1.13.3`.

## What v1.13.x has shipped

- v1.13.0 — `message_parts` table + dual-write.
- v1.13.1-A — AI SDK v6 install (`streamText` adapter, mid-dispatch silent-abort patch).
- v1.13.1-B — `messages_with_parts` view + read sites flipped.
- v1.13.1-C — `ask_user_input` correlation ported + reasoning end-to-end.
- v1.13.3 — bundle: statement_timeout=30s, alpha tool ordering, periodic stuck-row sweeper, `experimental_repairToolCall`.
- v1.13.4 — two-tier compaction prune.
- v1.13.5 — opencode `truncate.ts` port (`tr_<12char>` opaque ids on tmpfs).
- v1.13.6 — compaction head-assembly audit; reasoning_parts added to `buildHeadPayload`.
- v1.13.7 — stability bundle (the five fixes above).

## What's queued

- **v1.13.8 (this dispatch)** — prefix stability verify-and-measure
- v1.13.9 — compaction overflow trigger formula (opencode 0.85 × ctx_max)
- v1.13.10 — per-tool token cost accounting + AgentPicker UI
- v1.13.11 — WebSocket frame typing (Zod schemas both ends)
- v1.13.12 — skills audit pass (rules→recipes split)
- v1.13.2 — drop legacy columns (last; ≥1 week production traffic on v1.13.1 first)

## Why this is verify-first

The original v1.13.7 roadmap line was "system-prompt prefix cache, keyed by `(agent_id, project_id, skills_version)`, mtime-invalidated." Recon during planning surfaced that:

- `apps/server/src/services/system-prompt.ts:buildSystemPrompt()` already runs over mtime-cached inputs:
  - BOOCHAT.md / BOOCODER.md — cached in this file (`cachedGuidance`, line 25), keyed by mtime
  - global + per-project AGENTS.md — cached in `services/agents.ts` (cache Map at line 245, `safeStat` pattern at line 255), keyed by mtime
  - `session.system_prompt` / `project.default_system_prompt` — DB scalars, byte-stable until edited
  - BASE_SYSTEM_PROMPT — hardcoded template with `${projectPath}` interpolation
- Skills are NOT in the system prompt today. Discovered via `skill_find` at runtime.
- Tool schemas are NOT in the system message. They live in the OpenAI request body's `tools` field (already alpha-sorted by v1.13.3).
- Output assembly is a microsecond string concat with no I/O.

So in theory the prefix is already byte-stable across turns. **Nobody has measured it.** This batch closes that gap with logs + a unit test, no cache implementation. If stable across a real session → close v1.13.8 as no-op, drop the original cache plan, move to v1.13.9. If drift surfaces → next batch designs the fix against the actual failure mode.

## Scope (all three items)

### 1. Per-turn prefix fingerprint log

In `apps/server/src/services/system-prompt.ts`, after `buildSystemPrompt` finishes assembling `out`, before returning:

- Compute `sha256(out)` → hex string. Use `node:crypto`.
- Emit a single log line at `level=info` via a module-level pino instance (mirror the pattern used elsewhere in the inference services). Shape:

```ts
{
  msg: 'prefix-fingerprint',
  project_id: project.id,
  agent_id: agent?.id ?? null,
  agent_name: agent?.name ?? null,
  session_id: session.id,
  prefix_hash: <sha256 hex>,
  prefix_length: out.length,
  mtime_boochat: <number | null>, // from cachedGuidance.mtime, or null when guidance is null
  has_agent_system_prompt: <boolean>,
  has_session_override: session.system_prompt.trim().length > 0,
  has_project_override: project.default_system_prompt.trim().length > 0,
}
```
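A minimal sketch of the hash-and-record step, assuming the caller has already gathered the non-hash fields (the `FingerprintInputs` / `fingerprintPrefix` names and the string ID types are illustrative, not existing code). `createHash("sha256").update(...).digest("hex")` is the standard `node:crypto` call; the returned object is the payload from the shape above, ready to pass to the module-level pino instance.

```typescript
import { createHash } from "node:crypto";

// Non-hash fields of the prefix-fingerprint log line. IDs are assumed to be strings.
interface FingerprintInputs {
  project_id: string;
  agent_id: string | null;
  agent_name: string | null;
  session_id: string;
  mtime_boochat: number | null;
  has_agent_system_prompt: boolean;
  has_session_override: boolean;
  has_project_override: boolean;
}

interface PrefixFingerprint extends FingerprintInputs {
  msg: "prefix-fingerprint";
  prefix_hash: string;
  prefix_length: number;
}

// Hashes the fully assembled prefix; equal hashes mean byte-identical prefixes
// (barring a SHA-256 collision).
function fingerprintPrefix(out: string, inputs: FingerprintInputs): PrefixFingerprint {
  return {
    msg: "prefix-fingerprint",
    ...inputs,
    prefix_hash: createHash("sha256").update(out, "utf8").digest("hex"),
    // String#length counts UTF-16 code units (not bytes), matching `out.length` in the shape.
    prefix_length: out.length,
  };
}
```

Hashing the final string rather than its inputs is the point: it catches nondeterminism anywhere in assembly, including ordering or whitespace differences that no individual input field would reveal.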

The mtime fields surface which inputs changed when drift is observed. The hash itself is what proves equality.

`buildSystemPrompt` already reaches into `cachedGuidance` indirectly via `getContainerGuidance()` — expose `cachedGuidance?.mtime` for the log via a thin getter (`getCachedGuidanceMtime(): number | null`) so the log line carries it without re-statting.

For the AGENTS.md mtimes (global + per-project), `services/agents.ts` exposes them via the `cache` Map but no public accessor. Either (a) add a `getAgentsMtimes(projectPath: string): { global: number | null; project: number | null }` exported function to agents.ts, or (b) skip those fields in v1.13.8 and only log the BOOCHAT mtime. **Default: do (a).** If recon shows that's invasive, fall back to (b) and note the limitation in the smoke report.
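Option (a) can stay strictly read-only. The sketch below assumes a cache entry shape (`AgentsCacheEntry` with `globalMtime` / `projectMtime` is hypothetical; the real Map in `agents.ts` may store something different) and reuses the `projectPath || '__none__'` key noted in the recon. Because it only reads the Map, logging never triggers a stat or refreshes the cache it is observing.

```typescript
// Hypothetical stand-in for the module-private cache in services/agents.ts.
// Field names are assumptions; adapt to whatever the real entry stores.
interface AgentsCacheEntry {
  globalMtime: number | null;
  projectMtime: number | null;
  loadedAt: number;
}

const cache = new Map<string, AgentsCacheEntry>();

// Would be exported from agents.ts. Reads only: no fs.stat, no cache refresh.
function getAgentsMtimes(projectPath: string): { global: number | null; project: number | null } {
  const entry = cache.get(projectPath || "__none__"); // same key derivation as the loader
  return { global: entry?.globalMtime ?? null, project: entry?.projectMtime ?? null };
}
```

A `null` result then means "not loaded yet" rather than "file missing", which is worth stating in the smoke report if it shows up.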

### 2. Per-session drift observer

Module-level `Map<sessionId, lastHash>` in `system-prompt.ts`. The stored value must also carry the previous prefix length and input fields, since the drift log below reports `prev_length` and `changed_inputs`. On each `buildSystemPrompt` call:

- If `sessionId` is not in the map → set it, emit no extra log.
- If `sessionId` IS in the map and the hash matches → emit no extra log.
- If `sessionId` IS in the map and the hash DIFFERS → emit a second `level=warn` log:

```ts
{
  msg: 'prefix-drift',
  session_id: session.id,
  prev_hash: <previous>,
  new_hash: <current>,
  prev_length: <number>,
  new_length: <number>,
  changed_inputs: <array of field names where mtime/flags changed since last call>,
}
```

`changed_inputs` is a small array like `['mtime_boochat']` or `['has_session_override']` — the field-level diff so we can see exactly what input drifted.

The map grows without bound in long-lived processes. Acceptable for v1.13.8 (instrumentation only, 5 min sessions in test). Add a TODO comment: "v1.13.x follow-up if it survives: LRU-bound this map at 1000 sessions." Don't implement the LRU now.

Add a `_resetPrefixObserverForTests()` export mirroring the existing `_resetContainerGuidanceCacheForTests()`.
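A minimal sketch of the observer, assuming the map stores the whole previous fingerprint record rather than just the hash, so the drift line can report `prev_length` and a field-level diff. The type names, `observePrefix`, and `INPUT_FIELDS` are illustrative; only `_resetPrefixObserverForTests` and the TODO text come from the spec. The function returns the drift payload (or `null`) and leaves the actual `warn` call to the caller.

```typescript
// Sketch of the per-session drift observer for system-prompt.ts; names are illustrative.
interface Fingerprint {
  session_id: string;
  prefix_hash: string;
  prefix_length: number;
  mtime_boochat: number | null;
  has_agent_system_prompt: boolean;
  has_session_override: boolean;
  has_project_override: boolean;
}

interface PrefixDrift {
  msg: "prefix-drift";
  session_id: string;
  prev_hash: string;
  new_hash: string;
  prev_length: number;
  new_length: number;
  changed_inputs: string[];
}

// Inputs compared for changed_inputs; hash and length are outputs, so they are excluded.
const INPUT_FIELDS = [
  "mtime_boochat",
  "has_agent_system_prompt",
  "has_session_override",
  "has_project_override",
] as const;

// TODO: v1.13.x follow-up if it survives: LRU-bound this map at 1000 sessions.
const lastBySession = new Map<string, Fingerprint>();

// Returns the drift payload to log at warn, or null (first sighting or unchanged hash).
function observePrefix(fp: Fingerprint): PrefixDrift | null {
  const prev = lastBySession.get(fp.session_id);
  lastBySession.set(fp.session_id, fp); // always keep the latest record for the next diff
  if (!prev || prev.prefix_hash === fp.prefix_hash) return null;
  return {
    msg: "prefix-drift",
    session_id: fp.session_id,
    prev_hash: prev.prefix_hash,
    new_hash: fp.prefix_hash,
    prev_length: prev.prefix_length,
    new_length: fp.prefix_length,
    // Empty array = drift with no input change: the nondeterminism (bug) case.
    changed_inputs: INPUT_FIELDS.filter((k) => prev[k] !== fp[k]),
  };
}

function _resetPrefixObserverForTests(): void {
  lastBySession.clear();
}
```

Storing the latest record even when the hash matches means a later drift is diffed against the most recent inputs, not the first ones seen.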

### 3. Unit test for byte-stability

In `apps/server/src/services/__tests__/system-prompt.test.ts`, add a `describe('buildSystemPrompt stability', () => { ... })` block:

```ts
it('returns byte-identical output across two consecutive calls with the same inputs', async () => {
  // set BOOCHAT.md, build (project, session, agent), capture hash
  const first = await buildSystemPrompt(project, session, agent);
  const second = await buildSystemPrompt(project, session, agent);
  expect(first).toBe(second);
});

it('emits a single prefix-fingerprint log per call', async () => {
  // capture logs via pino test transport or stub
  // assert one prefix-fingerprint per buildSystemPrompt call
});

it('emits a prefix-drift log when the same session sees a different hash', async () => {
  // build once; mutate BOOCHAT.md or pass a different agent; build again with same sessionId
  // assert one prefix-drift log with prev_hash and new_hash populated
});
```

The first test is the load-bearing one — it locks in the byte-stability invariant going forward, regardless of what the production smoke surfaces.

## What NOT to do in this dispatch

- **Don't add a cache.** Output memoization is v1.13.9+ work IF the smoke proves it's needed. Implementing a cache before measurement is what the v1.13.6 audit was designed to catch — premature optimization disguised as correctness.
- **Don't change `buildSystemPrompt`'s return signature or async behavior.** The output stays a single string. Signature stays `(project, session, agent) => Promise<string>`.
- **Don't thread chat_id or anything else into the call.** `session.id` is sufficient as the observer key.
- **Don't log the full prefix text.** Hash + length only. The prefix can be many KB; logging it 5× per session blows up log size for no benefit. If drift appears and the hash diff is mysterious, `LOG_LEVEL=debug` can be wired in a follow-up.
- **Don't touch `messages_with_parts` or the CASE-WHEN-EXISTS fallback v1.13.4 added.** This batch is in `system-prompt.ts` only.
- **Don't change the AI SDK v6 silent-abort guard.** It's in `stream-phase.ts` and stays untouched.

## Recon (already done — paste these for the implementer's reference)

```
cd /opt/boocode
wc -l apps/server/src/services/system-prompt.ts
# → 83 lines

grep -nE "^export|^function|^async function|cache|mtime" apps/server/src/services/system-prompt.ts
# → cachedGuidance at line 25; loadContainerGuidance / getContainerGuidance / _resetContainerGuidanceCacheForTests / buildSystemPrompt are the public surface

grep -rn "buildSystemPrompt" apps/server/src --include="*.ts" | grep -v "tests"
# → single caller: apps/server/src/services/inference/payload.ts:41
# → also referenced in routes/sessions.ts (session-create flow may call it for preview; verify during implementation)

grep -n "safeStat\|cache\|mtime" apps/server/src/services/agents.ts
# → mtime-keyed cache (Map) at line 245, TTL 60_000ms, key = projectPath || '__none__'
# → safeStat pattern at line 255
```

## Verification protocol (smoke)

After deploy:

1. Fresh BooChat session, default agent (no agent selected).
2. Send 5 short messages, wait for each turn to complete.
3. `docker compose logs --since=10m boocode | grep -E 'prefix-fingerprint|prefix-drift'`

**Success criteria:**
- 5 `prefix-fingerprint` lines (one per turn — assuming each turn calls `buildSystemPrompt` once via `buildMessagesPayload`).
- All 5 lines have identical `prefix_hash` and `prefix_length`.
- Zero `prefix-drift` lines.
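To apply the success criteria mechanically, the filtered log output can be parsed and checked in one pass. This is a hypothetical helper, not part of the repo: it assumes each relevant line carries one pino JSON object after the `<service>  | ` prefix that `docker compose logs` adds, and it silently ignores anything that does not parse as a JSON object.

```typescript
// Hypothetical smoke checker; not part of the repo.
interface SmokeResult {
  fingerprints: number;
  distinctHashes: number;
  distinctLengths: number;
  drifts: number;
  ok: boolean;
}

// Extracts the JSON object from a `docker compose logs` line, skipping the service prefix.
function parseLogObject(line: string): Record<string, unknown> | null {
  const start = line.indexOf("{");
  if (start < 0) return null;
  try {
    const v: unknown = JSON.parse(line.slice(start));
    return typeof v === "object" && v !== null ? (v as Record<string, unknown>) : null;
  } catch {
    return null; // non-JSON line
  }
}

function evaluateSmoke(lines: string[], expectedTurns = 5): SmokeResult {
  const records = lines
    .map(parseLogObject)
    .filter((r): r is Record<string, unknown> => r !== null);
  const fps = records.filter((r) => r.msg === "prefix-fingerprint");
  const drifts = records.filter((r) => r.msg === "prefix-drift").length;
  const distinctHashes = new Set(fps.map((r) => r.prefix_hash)).size;
  const distinctLengths = new Set(fps.map((r) => r.prefix_length)).size;
  return {
    fingerprints: fps.length,
    distinctHashes,
    distinctLengths,
    drifts,
    // Exactly one fingerprint per turn, one hash, one length, no drift lines.
    ok: fps.length === expectedTurns && distinctHashes === 1 && distinctLengths === 1 && drifts === 0,
  };
}
```

Fed from the grep pipeline in step 3, it could read stdin with `readFileSync(0, "utf8").split("\n")` from `node:fs`. A failing `ok` still needs the failure-mode triage below; the counts just say which bucket to look in.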

**Failure modes to characterize:**
- Drift WITH a corresponding mtime change in `changed_inputs` → expected if BOOCHAT.md or AGENTS.md was edited mid-session. Note in smoke report; not a bug.
- Drift WITHOUT any mtime/flag change in `changed_inputs` → assembly nondeterminism somewhere. **This is the bug case.** Report the exact `prev_hash`/`new_hash` pair and full `prefix-fingerprint` log lines from before and after the drift.
- Multiple `prefix-fingerprint` lines per turn → `buildSystemPrompt` is being called more than once per turn (possibly from compaction or sentinel-summary paths). Note in smoke report; not necessarily a bug but worth understanding.
- ANY successful turn that emits zero `prefix-fingerprint` lines → log statement isn't reached. Implementation bug.

Repeat the smoke in a second session (different agent if available) to also confirm cross-session prefix differs only where expected (different `project.id`, different `agent_id`).

## Files expected to touch

- `apps/server/src/services/system-prompt.ts` — add hash + log + observer + getter (~50 LoC)
- `apps/server/src/services/agents.ts` — add `getAgentsMtimes()` accessor (~15 LoC if going with default option)
- `apps/server/src/services/__tests__/system-prompt.test.ts` — 3 new tests (~30 LoC)
- `apps/server/package.json` — none expected (pino + node:crypto already available)

Total ~95 LoC.

## Workflow conventions (boocode)

- Backup before destructive: `cp file file.bak-$(date +%Y%m%d-%H%M%S)`. (Files get gitignored via global `*.bak*`.)
- Build: `docker compose up --build -d boocode`. No `--no-cache` unless layer-cache trap surfaces.
- Tests: `pnpm -C apps/server test`. Smoke after deploy.
- Type-check: `npx tsc -p apps/web/tsconfig.app.json --noEmit` is authoritative for web; `pnpm -C apps/server build` is authoritative for server.
- Sam reviews diffs. Never `git add`/`commit`/`push`/`pull` on Sam's behalf.
- Tag after commit: `git tag v1.13.8` (lightweight), then push via the Gitea deploy key:
  `GIT_SSH_COMMAND="ssh -i /opt/boocode/secrets/boocode_gitea -o IdentitiesOnly=yes" git push origin v1.13.8`

## Repo layout pointers

- `apps/server/src/services/system-prompt.ts` — primary target (83 lines)
- `apps/server/src/services/agents.ts` — for the mtimes accessor
- `apps/server/src/services/inference/payload.ts:41` — call site
- `apps/server/src/services/__tests__/system-prompt.test.ts` — extend tests here
- `apps/server/vitest.config.ts` — test glob is `src/**/__tests__/**/*.test.ts`

## Open questions for Sam during recon

1. **`getAgentsMtimes()` accessor in agents.ts vs BOOCHAT-only log.** Default: add the accessor. If implementation surface is bigger than expected (e.g. the agents.ts cache structure makes it awkward), fall back to BOOCHAT-only and note the gap.
2. **What counts as a "turn" for the observer's `Map<sessionId, lastHash>`?** Default: every `buildSystemPrompt` call. If recon shows that compaction / sentinel-summary paths also call `buildSystemPrompt` and would generate noise, gate the observer to inference-turn calls only. Cleanest signal vs. cleanest implementation.
3. **Log severity for `prefix-drift`.** Default: `warn`. If Sam expects routine BOOCHAT.md edits to fire it, downgrade to `info`. The smoke will surface this — adjust during smoke if needed.

## Don't repeat past mistakes

- AI SDK v6 silent-abort guard in `stream-phase.ts`: untouched.
- v1.13.4 view fix (COALESCE → CASE-WHEN-EXISTS): untouched. This batch is in `system-prompt.ts` only.
- v1.13.5 truncate.ts: untouched.
- v1.13.6 reasoning embed in compaction: untouched.
- v1.13.7 stability bundle (`includeUsage:true`, trim guards, payload filter, budget bump): all live. Don't undo.

## Source files to read in project knowledge

- `boocode_roadmap.md` (last updated 2026-05-22; v1.13.x cleanup line order locked)
- `boocode_code_review.md` (no lift source for v1.13.8 — in-house instrumentation)
- `CLAUDE.md` (project conventions, NodeNext imports, vitest include glob, etc.)
- This handoff (`handoff_v1.13.8_prefix_verify.md`)