docs: renumber v1.13.8 to verify-and-measure, drop system_prompt_cache table, add v1.13.8 dispatch brief

2026-05-22 13:24:29 +00:00
parent ff29b48e3a
commit 0ce6115976
3 changed files with 1002 additions and 192 deletions
--- a/handoff_v1.13.8_prefix_verify.md
+++ b/handoff_v1.13.8_prefix_verify.md
@@ -0,0 +1,225 @@
+# Handoff: BooCode v1.13.8 — system-prompt prefix stability verify-and-measure
+
+#careful #boocode #nofluff
+
+Recon-only / instrumentation batch. **No cache implementation in this dispatch.** Goal: prove (or disprove) that the assembled system-prompt prefix is byte-stable across turns under steady-state inputs. Result determines whether v1.13.7-as-originally-specced (the prefix cache) is actually needed at all.
+
+## Where we are
+
+- Last tag: `v1.13.7` — stability bundle (`includeUsage:true` + trim guards + payload filter for trailing empty/failed assistants + `BUDGET_NO_AGENT 15→30`). This shipped as a renumber of the original "prefix cache" v1.13.7 slot. The prefix-cache work moved to v1.13.8 with the change-of-shape captured here.
+- Branch clean. `git log --oneline main -5` should show `…v1.13.7 v1.13.6 v1.13.5 v1.13.4 v1.13.3`.
+
+## What v1.13.x has shipped
+
+- v1.13.0 — `message_parts` table + dual-write.
+- v1.13.1-A — AI SDK v6 install (`streamText` adapter, mid-dispatch silent-abort patch).
+- v1.13.1-B — `messages_with_parts` view + read sites flipped.
+- v1.13.1-C — `ask_user_input` correlation ported + reasoning end-to-end.
+- v1.13.3 — bundle: statement_timeout=30s, alpha tool ordering, periodic stuck-row sweeper, `experimental_repairToolCall`.
+- v1.13.4 — two-tier compaction prune.
+- v1.13.5 — opencode `truncate.ts` port (`tr_<12char>` opaque ids on tmpfs).
+- v1.13.6 — compaction head-assembly audit; reasoning_parts added to `buildHeadPayload`.
+- v1.13.7 — stability bundle (the fixes listed under "Where we are").
+
+## What's queued
+
+- **v1.13.8 (this dispatch)** — prefix stability verify-and-measure
+- v1.13.9 — compaction overflow trigger formula (opencode 0.85 × ctx_max)
+- v1.13.10 — per-tool token cost accounting + AgentPicker UI
+- v1.13.11 — WebSocket frame typing (Zod schemas both ends)
+- v1.13.12 — skills audit pass (rules→recipes split)
+- v1.13.2 — drop legacy columns (last; ≥1 week production traffic on v1.13.1 first)
+
+## Why this is verify-first
+
+The original v1.13.7 roadmap line was "system-prompt prefix cache, keyed by `(agent_id, project_id, skills_version)`, mtime-invalidated." Recon during planning surfaced that:
+
+- `apps/server/src/services/system-prompt.ts:buildSystemPrompt()` already runs over mtime-cached inputs:
+  - BOOCHAT.md / BOOCODER.md — cached in this file (`cachedGuidance`, line 25), keyed by mtime
+  - global + per-project AGENTS.md — cached in `services/agents.ts` (`cache` Map at line 245, `safeStat` pattern at line 255), keyed by mtime with a 60 s TTL
+  - `session.system_prompt` / `project.default_system_prompt` — DB scalars, byte-stable until edited
+  - BASE_SYSTEM_PROMPT — hardcoded template with `${projectPath}` interpolation
+- Skills are NOT in the system prompt today. Discovered via `skill_find` at runtime.
+- Tool schemas are NOT in the system message. They live in the OpenAI request body's `tools` field (already alpha-sorted by v1.13.3).
+- Output assembly is a microsecond string concat with no I/O.
+
+So in theory the prefix is already byte-stable across turns. **Nobody has measured it.** This batch closes that gap with logs + a unit test, no cache implementation. If stable across a real session → close v1.13.8 as no-op, drop the original cache plan, move to v1.13.9. If drift surfaces → next batch designs the fix against the actual failure mode.
+
+## Scope (all three items)
+
+### 1. Per-turn prefix fingerprint log
+
+In `apps/server/src/services/system-prompt.ts`, after `buildSystemPrompt` finishes assembling `out`, before returning:
+
+- Compute `sha256(out)` → hex string. Use `node:crypto`.
+- Emit a single log line at `level=info` via a module-level pino instance (mirror the pattern used elsewhere in the inference services). Shape:
+
+```ts
+{
+  msg: 'prefix-fingerprint',
+  project_id: project.id,
+  agent_id: agent?.id ?? null,
+  agent_name: agent?.name ?? null,
+  session_id: session.id,
+  prefix_hash: <sha256 hex>,
+  prefix_length: out.length,
+  mtime_boochat: <number | null>,           // from cachedGuidance.mtime, or null when guidance is null
+  has_agent_system_prompt: <boolean>,
+  has_session_override: session.system_prompt.trim().length > 0,
+  has_project_override: project.default_system_prompt.trim().length > 0,
+}
+```
+
+The mtime and `has_*` flag fields surface which inputs changed when drift is observed. The hash itself is what proves equality.
+
+`buildSystemPrompt` already reaches into `cachedGuidance` indirectly via `getContainerGuidance()` — expose `cachedGuidance?.mtime` for the log via a thin getter (`getCachedGuidanceMtime(): number | null`) so the log line carries it without re-statting.
+
+For the AGENTS.md mtimes (global + per-project), `services/agents.ts` exposes them via the `cache` Map but no public accessor. Either (a) add a `getAgentsMtimes(projectPath: string): { global: number | null; project: number | null }` exported function to agents.ts, or (b) skip those fields in v1.13.8 and only log the BOOCHAT mtime. **Default: do (a).** If recon shows that's invasive, fall back to (b) and note the limitation in the smoke report.
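+
+A minimal sketch of the fingerprint step, assuming the logged fields are passed in as plain values. `fingerprintPrefix` and `FingerprintInputs` are hypothetical names, not existing BooCode exports; `createHash` from `node:crypto` is the only real API used. The AGENTS.md mtimes are left out because their accessor is still an open question.
+
+```ts
+import { createHash } from 'node:crypto';
+
+// Hypothetical input bundle; the real code would read these from project/session/agent.
+interface FingerprintInputs {
+  projectId: string;
+  agentId: string | null;
+  agentName: string | null;
+  sessionId: string;
+  mtimeBoochat: number | null;
+  hasAgentSystemPrompt: boolean;
+  sessionSystemPrompt: string;
+  projectDefaultSystemPrompt: string;
+}
+
+interface PrefixFingerprint {
+  msg: 'prefix-fingerprint';
+  project_id: string;
+  agent_id: string | null;
+  agent_name: string | null;
+  session_id: string;
+  prefix_hash: string;
+  prefix_length: number;
+  mtime_boochat: number | null;
+  has_agent_system_prompt: boolean;
+  has_session_override: boolean;
+  has_project_override: boolean;
+}
+
+// Builds the log record for one assembled prefix. Pure: no I/O, no re-stat.
+function fingerprintPrefix(out: string, inputs: FingerprintInputs): PrefixFingerprint {
+  return {
+    msg: 'prefix-fingerprint',
+    project_id: inputs.projectId,
+    agent_id: inputs.agentId,
+    agent_name: inputs.agentName,
+    session_id: inputs.sessionId,
+    // Hash the UTF-8 bytes so equal hashes mean byte-identical prefixes.
+    prefix_hash: createHash('sha256').update(out, 'utf8').digest('hex'),
+    prefix_length: out.length,
+    mtime_boochat: inputs.mtimeBoochat,
+    has_agent_system_prompt: inputs.hasAgentSystemPrompt,
+    has_session_override: inputs.sessionSystemPrompt.trim().length > 0,
+    has_project_override: inputs.projectDefaultSystemPrompt.trim().length > 0,
+  };
+}
+```
+
+`prefix_length` here is `out.length`, which counts UTF-16 code units, while the hash covers UTF-8 bytes. The two can disagree on non-ASCII prefixes, so the hash remains the authority for equality.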
+
+### 2. Per-session drift observer
+
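+Module-level `Map<sessionId, lastHash>` in `system-prompt.ts`. Store the previous length and the logged mtime/flag fields alongside the hash, because the drift log needs `prev_length` and `changed_inputs`. On each `buildSystemPrompt` call: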
+
+- If `sessionId` is not in the map → set it, emit no extra log.
+- If `sessionId` IS in the map and the hash matches → emit no extra log.
+- If `sessionId` IS in the map and the hash DIFFERS → emit a second `level=warn` log:
+
+```ts
+{
+  msg: 'prefix-drift',
+  session_id: session.id,
+  prev_hash: <previous>,
+  new_hash: <current>,
+  prev_length: <number>,
+  new_length: <number>,
+  changed_inputs: <array of field names where mtime/flags changed since last call>,
+}
+```
+
+`changed_inputs` is a small array like `['mtime_boochat']` or `['has_session_override']` — the field-level diff so we can see exactly what input drifted.
+
+The map grows unboundedly across long-lived processes. Acceptable for v1.13.8 (instrumentation only, 5 min sessions in test). Add a TODO comment: "v1.13.x follow-up if it survives: LRU-bound this map at 1000 sessions." Don't implement the LRU now.
+
+Add a `_resetPrefixObserverForTests()` export mirroring the existing `_resetContainerGuidanceCacheForTests()`.
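+
+One way the observer could look, assuming each map entry keeps the whole previous snapshot rather than only the hash. `observePrefix`, `Snapshot`, and `lastBySession` are illustrative names; only `_resetPrefixObserverForTests` comes from this brief.
+
+```ts
+// Subset of the fingerprint record that the observer needs to remember.
+interface Snapshot {
+  prefix_hash: string;
+  prefix_length: number;
+  mtime_boochat: number | null;
+  has_agent_system_prompt: boolean;
+  has_session_override: boolean;
+  has_project_override: boolean;
+}
+
+interface PrefixDrift {
+  msg: 'prefix-drift';
+  session_id: string;
+  prev_hash: string;
+  new_hash: string;
+  prev_length: number;
+  new_length: number;
+  changed_inputs: string[];
+}
+
+// Fields compared to build changed_inputs (hash and length are reported separately).
+const INPUT_FIELDS = [
+  'mtime_boochat',
+  'has_agent_system_prompt',
+  'has_session_override',
+  'has_project_override',
+] as const;
+
+// TODO v1.13.x follow-up if it survives: LRU-bound this map at 1000 sessions.
+const lastBySession = new Map<string, Snapshot>();
+
+// Returns a drift record only when a known session sees a different hash.
+function observePrefix(sessionId: string, next: Snapshot): PrefixDrift | null {
+  const prev = lastBySession.get(sessionId);
+  lastBySession.set(sessionId, next);
+  if (prev === undefined || prev.prefix_hash === next.prefix_hash) return null;
+  return {
+    msg: 'prefix-drift',
+    session_id: sessionId,
+    prev_hash: prev.prefix_hash,
+    new_hash: next.prefix_hash,
+    prev_length: prev.prefix_length,
+    new_length: next.prefix_length,
+    changed_inputs: INPUT_FIELDS.filter((field) => prev[field] !== next[field]),
+  };
+}
+
+function _resetPrefixObserverForTests(): void {
+  lastBySession.clear();
+}
+```
+
+An empty `changed_inputs` on a drift record is the interesting case: the hash moved although none of the tracked inputs did.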
+
+### 3. Unit test for byte-stability
+
+In `apps/server/src/services/__tests__/system-prompt.test.ts`, add a `describe('buildSystemPrompt stability', () => { ... })` block:
+
+```ts
+it('returns byte-identical output across two consecutive calls with the same inputs', async () => {
+  // set BOOCHAT.md, build (project, session, agent), capture hash
+  const first = await buildSystemPrompt(project, session, agent);
+  const second = await buildSystemPrompt(project, session, agent);
+  expect(first).toBe(second);
+});
+
+it('emits a single prefix-fingerprint log per call', async () => {
+  // capture logs via pino test transport or stub
+  // assert one prefix-fingerprint per buildSystemPrompt call
+});
+
+it('emits a prefix-drift log when the same session sees a different hash', async () => {
+  // build once; mutate BOOCHAT.md or pass a different agent; build again with same sessionId
+  // assert one prefix-drift log with prev_hash and new_hash populated
+});
+```
+
+The first test is the load-bearing one — it locks in the byte-stability invariant going forward, regardless of what the production smoke surfaces.
+
+## What NOT to do in this dispatch
+
+- **Don't add a cache.** Output memoization is v1.13.9+ work IF the smoke proves it's needed. Implementing a cache before measurement is what the v1.13.6 audit was designed to catch — premature optimization disguised as correctness.
+- **Don't change `buildSystemPrompt`'s return signature or async behavior.** The output stays a single string. Signature stays `(project, session, agent) => Promise<string>`.
+- **Don't thread chat_id or anything else into the call.** `session.id` is sufficient as the observer key.
+- **Don't log the full prefix text.** Hash + length only. The prefix can be many KB; logging it 5× per session blows up log size for no benefit. If drift appears and the hash diff is mysterious, a full-text dump gated behind `LOG_LEVEL=debug` can be wired in a follow-up.
+- **Don't touch `messages_with_parts` or the CASE-WHEN-EXISTS fallback v1.13.4 added.** This batch is in `system-prompt.ts` only.
+- **Don't modify the AI SDK v6 silent-abort guard.** It lives in `stream-phase.ts` and stays untouched.
+
+## Recon (already done — paste these for the implementer's reference)
+
+```
+cd /opt/boocode
+wc -l apps/server/src/services/system-prompt.ts
+# → 83 lines
+
+grep -nE "^export|^function|^async function|cache|mtime" apps/server/src/services/system-prompt.ts
+# → cachedGuidance at line 25; loadContainerGuidance / getContainerGuidance / _resetContainerGuidanceCacheForTests / buildSystemPrompt are the public surface
+
+grep -rn "buildSystemPrompt" apps/server/src --include="*.ts" | grep -v "tests"
+# → inference caller: apps/server/src/services/inference/payload.ts:41
+# → also referenced in routes/sessions.ts (session-create flow may call it for preview; verify during implementation)
+
+grep -n "safeStat\|cache\|mtime" apps/server/src/services/agents.ts
+# → mtime-keyed cache (Map) at line 245, TTL 60_000ms, key = projectPath || '__none__'
+# → safeStat pattern at line 255
+```
+
+## Verification protocol (smoke)
+
+After deploy:
+
+1. Fresh BooChat session, default agent (no agent selected).
+2. Send 5 short messages, wait for each turn to complete.
+3. `docker compose logs --since=10m boocode | grep -E 'prefix-fingerprint|prefix-drift'`
+
+**Success criteria:**
+- 5 `prefix-fingerprint` lines (one per turn — assuming each turn calls `buildSystemPrompt` once via `buildMessagesPayload`).
+- All 5 lines have identical `prefix_hash` and `prefix_length`.
+- Zero `prefix-drift` lines.
+
+**Failure modes to characterize:**
+- Drift WITH a corresponding mtime change in `changed_inputs` → expected if BOOCHAT.md or AGENTS.md was edited mid-session. Note in smoke report; not a bug.
+- Drift WITHOUT any mtime/flag change in `changed_inputs` → assembly nondeterminism somewhere. **This is the bug case.** Report the exact `prev_hash`/`new_hash` pair and full `prefix-fingerprint` log lines from before and after the drift.
+- Multiple `prefix-fingerprint` lines per turn → `buildSystemPrompt` is being called more than once per turn (possibly from compaction or sentinel-summary paths). Note in smoke report; not necessarily a bug but worth understanding.
+- ANY successful turn that emits zero `prefix-fingerprint` lines → log statement isn't reached. Implementation bug.
+
+Repeat the smoke in a second session (different agent if available) to confirm that the prefix differs across sessions only where expected (different `project.id`, different `agent_id`).
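+
+The success criteria can be checked mechanically. The sketch below assumes each log line carries one JSON object, possibly behind a `docker compose` container prefix; `checkSmoke` is a hypothetical helper, not part of the repo.
+
+```ts
+interface SmokeResult {
+  fingerprints: number;
+  distinctPrefixes: number;
+  drifts: number;
+  pass: boolean;
+}
+
+// Counts prefix-fingerprint / prefix-drift records and applies the success criteria.
+// When sessionId is given, records from other sessions are ignored.
+function checkSmoke(logText: string, expectedTurns: number, sessionId?: string): SmokeResult {
+  const prefixes = new Set<string>();
+  let fingerprints = 0;
+  let drifts = 0;
+  for (const line of logText.split('\n')) {
+    // Skip any container prefix: the JSON payload is assumed to start at the first '{'.
+    const start = line.indexOf('{');
+    if (start === -1) continue;
+    let parsed: unknown;
+    try {
+      parsed = JSON.parse(line.slice(start));
+    } catch {
+      continue; // not a JSON log line
+    }
+    if (typeof parsed !== 'object' || parsed === null) continue;
+    const rec = parsed as Record<string, unknown>;
+    if (sessionId !== undefined && rec.session_id !== sessionId) continue;
+    if (rec.msg === 'prefix-fingerprint') {
+      fingerprints += 1;
+      prefixes.add(`${String(rec.prefix_hash)}:${String(rec.prefix_length)}`);
+    } else if (rec.msg === 'prefix-drift') {
+      drifts += 1;
+    }
+  }
+  const pass = fingerprints === expectedTurns && prefixes.size === 1 && drifts === 0;
+  return { fingerprints, distinctPrefixes: prefixes.size, drifts, pass };
+}
+```
+
+For the 5-message smoke, feed it the output of the `docker compose logs` command above with `expectedTurns = 5`. A `fingerprints` count above 5 corresponds to the "multiple lines per turn" case, which is worth noting but is not by itself a failure of byte-stability.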
+
+## Files expected to touch
+
+- `apps/server/src/services/system-prompt.ts` — add hash + log + observer + getter (~50 LoC)
+- `apps/server/src/services/agents.ts` — add `getAgentsMtimes()` accessor (~15 LoC if going with default option)
+- `apps/server/src/services/__tests__/system-prompt.test.ts` — 3 new tests (~30 LoC)
+- `apps/server/package.json` — none expected (pino + node:crypto already available)
+
+Total ~95 LoC.
+
+## Workflow conventions (boocode)
+
+- Back up before destructive changes: `cp file file.bak-$(date +%Y%m%d-%H%M%S)`. (Files get gitignored via global `*.bak*`.)
+- Build: `docker compose up --build -d boocode`. No `--no-cache` unless layer-cache trap surfaces.
+- Tests: `pnpm -C apps/server test`. Smoke after deploy.
+- Type-check: `npx tsc -p apps/web/tsconfig.app.json --noEmit` is authoritative for web; `pnpm -C apps/server build` is authoritative for server.
+- Sam reviews diffs. Never `git add`/`commit`/`push`/`pull` on Sam's behalf.
+- Tag after commit: `git tag v1.13.8` (lightweight), then push via the Gitea deploy key:
+  `GIT_SSH_COMMAND="ssh -i /opt/boocode/secrets/boocode_gitea -o IdentitiesOnly=yes" git push origin v1.13.8`
+
+## Repo layout pointers
+
+- `apps/server/src/services/system-prompt.ts` — primary target (83 lines)
+- `apps/server/src/services/agents.ts` — for the mtimes accessor
+- `apps/server/src/services/inference/payload.ts:41` — call site
+- `apps/server/src/services/__tests__/system-prompt.test.ts` — extend tests here
+- `apps/server/vitest.config.ts` — test glob is `src/**/__tests__/**/*.test.ts`
+
+## Open questions for Sam during recon
+
+1. **`getAgentsMtimes()` accessor in agents.ts vs BOOCHAT-only log.** Default: add the accessor. If implementation surface is bigger than expected (e.g. the agents.ts cache structure makes it awkward), fall back to BOOCHAT-only and note the gap.
+2. **What counts as a "turn" for the observer's `Map<sessionId, lastHash>`?** Default: every `buildSystemPrompt` call. If recon shows that compaction / sentinel-summary paths also call `buildSystemPrompt` and would generate noise, gate the observer to inference-turn calls only. The trade-off is cleanest signal vs. cleanest implementation.
+3. **Log severity for `prefix-drift`.** Default: `warn`. If Sam expects routine BOOCHAT.md edits to fire it, downgrade to `info`. The smoke will surface this — adjust during smoke if needed.
+
+## Don't repeat past mistakes
+
+- AI SDK v6 silent-abort guard in `stream-phase.ts`: untouched.
+- v1.13.4 view fix (COALESCE → CASE-WHEN-EXISTS): untouched. This batch is in `system-prompt.ts` only.
+- v1.13.5 truncate.ts: untouched.
+- v1.13.6 reasoning embed in compaction: untouched.
+- v1.13.7 stability bundle (`includeUsage:true`, trim guards, payload filter, budget bump): all live. Don't undo.
+
+## Source files to read in project knowledge
+
+- `boocode_roadmap.md` (last updated 2026-05-22; v1.13.x cleanup line order locked)
+- `boocode_code_review.md` (no lift source for v1.13.8 — in-house instrumentation)
+- `CLAUDE.md` (project conventions, NodeNext imports, vitest include glob, etc.)
+- This handoff (`handoff_v1.13.8_prefix_verify.md`)