boocode/openspec/changes/archived/handoff_v1.13.8_prefix_verify.md
indifferentketchup 5a3f357ce9 v1.13.15-openspec: reformat batch docs to OpenSpec directory structure
Adopt Fission-AI/OpenSpec's openspec/changes/<change-name>/{proposal,
specs,design,tasks}.md shape for BooCode's own batch docs. Zero-dep
documentation reformat; replaces ad-hoc boocode_batchN.md /
handoff_vN.N.N.md convention.

Existing batch docs moved into openspec/changes/archived/ via git mv
(preserves history):
- boocode_batch10.md
- handoff_v1.13.8_prefix_verify.md
- handoff_v1.13.10_per_tool_cost.md

Pre-v1.13.15 docs were NOT split into proposal/tasks/design files. The
work was already shipped; the originals are preserved as archived
snapshots. New v1.13.15+ batches land directly in
openspec/changes/<slug>/proposal.md (+ tasks.md, + design.md when
applicable) per the convention documented in openspec/README.md.

CLAUDE.md gained a one-line pointer to the convention (workflow
section). File grew from 153 → 154 lines, 27,682 → 27,925 chars; both
remain well under the AgentLint hard caps.

specs/ directory is reserved for future OpenSpec CLI adoption (v1.14+).
No CLI dep added in this batch — directory structure only. If/when the
full OpenSpec lifecycle is adopted, that lands as a separate batch.
2026-05-22 14:54:17 +00:00

# Handoff: BooCode v1.13.8 — system-prompt prefix stability verify-and-measure
#careful #boocode #nofluff
Recon-only / instrumentation batch. **No cache implementation in this dispatch.** Goal: prove (or disprove) that the assembled system-prompt prefix is byte-stable across turns under steady-state inputs. Result determines whether v1.13.7-as-originally-specced (the prefix cache) is actually needed at all.
## Where we are
- Last tag: `v1.13.7` — stability bundle (`includeUsage:true` + trim guards + payload filter for trailing empty/failed assistants + `BUDGET_NO_AGENT 15→30`). This shipped as a renumber of the original "prefix cache" v1.13.7 slot. The prefix-cache work moved to v1.13.8 with the change-of-shape captured here.
- Branch clean. `git log --oneline main -5` should show `…v1.13.7 v1.13.6 v1.13.5 v1.13.4 v1.13.3`.
## What v1.13.x has shipped
- v1.13.0 — `message_parts` table + dual-write.
- v1.13.1-A — AI SDK v6 install (`streamText` adapter, mid-dispatch silent-abort patch).
- v1.13.1-B — `messages_with_parts` view + read sites flipped.
- v1.13.1-C — `ask_user_input` correlation ported + reasoning end-to-end.
- v1.13.3 — bundle: statement_timeout=30s, alpha tool ordering, periodic stuck-row sweeper, `experimental_repairToolCall`.
- v1.13.4 — two-tier compaction prune.
- v1.13.5 — opencode `truncate.ts` port (`tr_<12char>` opaque ids on tmpfs).
- v1.13.6 — compaction head-assembly audit; reasoning_parts added to `buildHeadPayload`.
- v1.13.7 — stability bundle (the five fixes above).
## What's queued
- **v1.13.8 (this dispatch)** — prefix stability verify-and-measure
- v1.13.9 — compaction overflow trigger formula (opencode 0.85 × ctx_max)
- v1.13.10 — per-tool token cost accounting + AgentPicker UI
- v1.13.11 — WebSocket frame typing (Zod schemas both ends)
- v1.13.12 — skills audit pass (rules→recipes split)
- v1.13.2 — drop legacy columns (last; ≥1 week production traffic on v1.13.1 first)
## Why this is verify-first
The original v1.13.7 roadmap line was "system-prompt prefix cache, keyed by `(agent_id, project_id, skills_version)`, mtime-invalidated." Recon during planning surfaced that:
- `apps/server/src/services/system-prompt.ts:buildSystemPrompt()` already runs over mtime-cached or otherwise stable inputs:
  - BOOCHAT.md / BOOCODER.md: cached in this file (`cachedGuidance`, line 25), keyed by mtime
  - global + per-project AGENTS.md: cached in `services/agents.ts` (mtime-keyed `Map` at line 245, `safeStat` pattern at line 255, per the recon below)
  - `session.system_prompt` / `project.default_system_prompt`: DB scalars, byte-stable until edited
  - BASE_SYSTEM_PROMPT: hardcoded template with `${projectPath}` interpolation
- Skills are NOT in the system prompt today. Discovered via `skill_find` at runtime.
- Tool schemas are NOT in the system message. They live in the OpenAI request body's `tools` field (already alpha-sorted by v1.13.3).
- Output assembly is a microsecond string concat with no I/O.
So in theory the prefix is already byte-stable across turns. **Nobody has measured it.** This batch closes that gap with logs + a unit test, no cache implementation. If stable across a real session → close v1.13.8 as no-op, drop the original cache plan, move to v1.13.9. If drift surfaces → next batch designs the fix against the actual failure mode.
## Scope (all three items)
### 1. Per-turn prefix fingerprint log
In `apps/server/src/services/system-prompt.ts`, after `buildSystemPrompt` finishes assembling `out`, before returning:
- Compute `sha256(out)` → hex string. Use `node:crypto`.
- Emit a single log line at `level=info` via a module-level pino instance (mirror the pattern used elsewhere in the inference services). Shape:
```ts
{
  msg: 'prefix-fingerprint',
  project_id: project.id,
  agent_id: agent?.id ?? null,
  agent_name: agent?.name ?? null,
  session_id: session.id,
  prefix_hash: <sha256 hex>,
  prefix_length: out.length,
  mtime_boochat: <number | null>, // from cachedGuidance.mtime, or null when guidance is null
  has_agent_system_prompt: <boolean>,
  has_session_override: session.system_prompt.trim().length > 0,
  has_project_override: project.default_system_prompt.trim().length > 0,
}
```
The mtime fields surface which inputs changed when drift is observed. The hash itself is what proves equality.
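The hash-and-log step above can be sketched as follows. This is a minimal sketch, not the module's actual code: `InfoLogger`, `FingerprintInputs`, `sha256Hex`, and `logPrefixFingerprint` are hypothetical names introduced for illustration, and the logger is reduced to a plain interface so the block runs without pino.

```ts
import { createHash } from "node:crypto";

// Hypothetical minimal logger shape; the real module would use its pino instance.
interface InfoLogger {
  info(record: Record<string, unknown>): void;
}

// Hypothetical input bag standing in for the project/session/agent objects.
interface FingerprintInputs {
  projectId: string;
  sessionId: string;
  agentId: string | null;
  agentName: string | null;
  mtimeBoochat: number | null;
  hasAgentSystemPrompt: boolean;
  sessionSystemPrompt: string;
  projectDefaultSystemPrompt: string;
}

// Hash the assembled prefix as UTF-8 bytes and return lowercase hex.
function sha256Hex(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Emit one prefix-fingerprint record and return the hash for later comparison.
function logPrefixFingerprint(
  log: InfoLogger,
  out: string,
  inputs: FingerprintInputs,
): string {
  const prefixHash = sha256Hex(out);
  log.info({
    msg: "prefix-fingerprint",
    project_id: inputs.projectId,
    agent_id: inputs.agentId,
    agent_name: inputs.agentName,
    session_id: inputs.sessionId,
    prefix_hash: prefixHash,
    prefix_length: out.length,
    mtime_boochat: inputs.mtimeBoochat,
    has_agent_system_prompt: inputs.hasAgentSystemPrompt,
    has_session_override: inputs.sessionSystemPrompt.trim().length > 0,
    has_project_override: inputs.projectDefaultSystemPrompt.trim().length > 0,
  });
  return prefixHash;
}
```

Note that `out.length` counts UTF-16 code units, so it can differ from the UTF-8 byte count that the hash covers; it is a secondary signal only.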
`buildSystemPrompt` already reaches into `cachedGuidance` indirectly via `getContainerGuidance()` — expose `cachedGuidance?.mtime` for the log via a thin getter (`getCachedGuidanceMtime(): number | null`) so the log line carries it without re-statting.
For the AGENTS.md mtimes (global + per-project), `services/agents.ts` exposes them via the `cache` Map but no public accessor. Either (a) add a `getAgentsMtimes(projectPath: string): { global: number | null; project: number | null }` exported function to agents.ts, or (b) skip those fields in v1.13.8 and only log the BOOCHAT mtime. **Default: do (a).** If recon shows that's invasive, fall back to (b) and note the limitation in the smoke report.
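The two accessors could look roughly like this. It is a sketch under stated assumptions: the shapes of `cachedGuidance` and of the agents cache entries (`AgentsCacheEntry`, `globalMtime`, `projectMtime`) are hypothetical, since the real structures in `system-prompt.ts` and `agents.ts` are not shown here. Only the key convention `projectPath || '__none__'` comes from the recon notes. In the real modules both functions would be exported.

```ts
// Hypothetical shape of the guidance cache in system-prompt.ts.
interface CachedGuidance {
  text: string;
  mtime: number;
}

let cachedGuidance: CachedGuidance | null = null;

// Thin getter: read the cached mtime without re-statting the file.
function getCachedGuidanceMtime(): number | null {
  return cachedGuidance?.mtime ?? null;
}

// Hypothetical entry shape for the agents.ts cache; the real one may differ.
interface AgentsCacheEntry {
  globalMtime: number | null;
  projectMtime: number | null;
}

const cache = new Map<string, AgentsCacheEntry>();

// Key convention from the recon notes: projectPath || '__none__'.
function agentsCacheKey(projectPath: string): string {
  return projectPath || "__none__";
}

// Read-only accessor: returns nulls when nothing is cached for the project.
function getAgentsMtimes(
  projectPath: string,
): { global: number | null; project: number | null } {
  const entry = cache.get(agentsCacheKey(projectPath));
  if (!entry) return { global: null, project: null };
  return { global: entry.globalMtime, project: entry.projectMtime };
}
```

Both accessors only read existing cache state, which keeps the logging path free of extra I/O.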
### 2. Per-session drift observer
Module-level `Map<sessionId, lastHash>` in `system-prompt.ts`. On each `buildSystemPrompt` call:
- If `sessionId` is not in the map → set it, emit no extra log.
- If `sessionId` IS in the map and the hash matches → emit no extra log.
- If `sessionId` IS in the map and the hash DIFFERS → emit a second `level=warn` log:
```ts
{
  msg: 'prefix-drift',
  session_id: session.id,
  prev_hash: <previous>,
  new_hash: <current>,
  prev_length: <number>,
  new_length: <number>,
  changed_inputs: <array of field names where mtime/flags changed since last call>,
}
```
`changed_inputs` is a small array like `['mtime_boochat']` or `['has_session_override']` — the field-level diff so we can see exactly what input drifted.
The map grows without bound in long-lived processes. That is acceptable for v1.13.8 (instrumentation only, 5-minute sessions in test). Add a TODO comment: "v1.13.x follow-up if it survives: LRU-bound this map at 1000 sessions." Don't implement the LRU now.
Add a `_resetPrefixObserverForTests()` export mirroring the existing `_resetContainerGuidanceCacheForTests()`.
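A minimal sketch of the observer described in this section, assuming the map stores the previous length and input fields alongside the hash so that `prev_length` and `changed_inputs` can be computed. `WarnLogger`, `ObservedInputs`, `Observation`, `diffInputs`, and `observePrefix` are hypothetical names; only `_resetPrefixObserverForTests` and the log shape come from the spec above.

```ts
// Hypothetical minimal logger shape; the real module would use its pino instance.
interface WarnLogger {
  warn(record: Record<string, unknown>): void;
}

// The mtime/flag fields from the fingerprint record that feed the diff.
type ObservedInputs = Record<string, number | boolean | null>;

interface Observation {
  hash: string;
  length: number;
  inputs: ObservedInputs;
}

// TODO: v1.13.x follow-up if it survives: LRU-bound this map at 1000 sessions.
const lastBySession = new Map<string, Observation>();

// Names of the fields whose value differs between two observations.
function diffInputs(prev: ObservedInputs, next: ObservedInputs): string[] {
  const keys = Array.from(new Set(Object.keys(prev).concat(Object.keys(next))));
  return keys.filter((k) => prev[k] !== next[k]).sort();
}

// Record the current observation; warn only when a known session's hash changes.
function observePrefix(log: WarnLogger, sessionId: string, current: Observation): void {
  const prev = lastBySession.get(sessionId);
  lastBySession.set(sessionId, current);
  if (!prev || prev.hash === current.hash) return;
  log.warn({
    msg: "prefix-drift",
    session_id: sessionId,
    prev_hash: prev.hash,
    new_hash: current.hash,
    prev_length: prev.length,
    new_length: current.length,
    changed_inputs: diffInputs(prev.inputs, current.inputs),
  });
}

function _resetPrefixObserverForTests(): void {
  lastBySession.clear();
}
```

An empty `changed_inputs` on a drift record is the interesting case: the hash moved while every tracked input stayed the same.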
### 3. Unit test for byte-stability
In `apps/server/src/services/__tests__/system-prompt.test.ts`, add a `describe('buildSystemPrompt stability', () => { ... })` block:
```ts
it('returns byte-identical output across two consecutive calls with the same inputs', async () => {
  // set BOOCHAT.md, build (project, session, agent), capture hash
  const first = await buildSystemPrompt(project, session, agent);
  const second = await buildSystemPrompt(project, session, agent);
  expect(first).toBe(second);
});
it('emits a single prefix-fingerprint log per call', async () => {
  // capture logs via pino test transport or stub
  // assert one prefix-fingerprint per buildSystemPrompt call
});
it('emits a prefix-drift log when the same session sees a different hash', async () => {
  // build once; mutate BOOCHAT.md or pass a different agent; build again with same sessionId
  // assert one prefix-drift log with prev_hash and new_hash populated
});
```
The first test is the load-bearing one — it locks in the byte-stability invariant going forward, regardless of what the production smoke surfaces.
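For the two log-assertion tests, one option for the "stub" route is an in-memory capture logger. The sketch below assumes the module's logger can be swapped or injected in tests, which is not confirmed by this handoff; `makeCaptureLogger` and `count` are hypothetical names.

```ts
type LogRecord = Record<string, unknown>;

interface CapturedEntry {
  level: "info" | "warn";
  record: LogRecord;
}

// In-memory stand-in for the module logger: stores every record it receives.
function makeCaptureLogger() {
  const entries: CapturedEntry[] = [];
  return {
    entries,
    info(record: LogRecord): void {
      entries.push({ level: "info", record });
    },
    warn(record: LogRecord): void {
      entries.push({ level: "warn", record });
    },
    // Number of captured records whose msg field equals the given value.
    count(msg: string): number {
      return entries.filter((e) => e.record.msg === msg).length;
    },
  };
}
```

A test would then assert, for example, that `count('prefix-fingerprint')` equals the number of `buildSystemPrompt` calls it made and that `count('prefix-drift')` is 1 after a forced input change.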
## What NOT to do in this dispatch
- **Don't add a cache.** Output memoization is v1.13.9+ work IF the smoke proves it's needed. Implementing a cache before measurement is what the v1.13.6 audit was designed to catch — premature optimization disguised as correctness.
- **Don't change `buildSystemPrompt`'s return signature or async behavior.** The output stays a single string. Signature stays `(project, session, agent) => Promise<string>`.
- **Don't thread chat_id or anything else into the call.** `session.id` is sufficient as the observer key.
- **Don't log the full prefix text.** Hash + length only. The prefix can be many KB; logging it 5× per session blows up log size for no benefit. If drift appears and the hash diff is mysterious, `LOG_LEVEL=debug` can be wired in a follow-up.
- **Don't touch `messages_with_parts` or the CASE-WHEN-EXISTS fallback v1.13.4 added.** This batch is in `system-prompt.ts` only.
- **Don't modify the AI SDK v6 silent-abort guard.** It lives in `stream-phase.ts` and stays untouched.
## Recon (already done — paste these for the implementer's reference)
```
cd /opt/boocode
wc -l apps/server/src/services/system-prompt.ts
# → 83 lines
grep -nE "^export|^function|^async function|cache|mtime" apps/server/src/services/system-prompt.ts
# → cachedGuidance at line 25; loadContainerGuidance / getContainerGuidance / _resetContainerGuidanceCacheForTests / buildSystemPrompt are the public surface
grep -rn "buildSystemPrompt" apps/server/src --include="*.ts" | grep -v "tests"
# → single caller: apps/server/src/services/inference/payload.ts:41
# → also referenced in routes/sessions.ts (session-create flow may call it for preview; verify during implementation)
grep -n "safeStat\|cache\|mtime" apps/server/src/services/agents.ts
# → mtime-keyed cache (Map) at line 245, TTL 60_000ms, key = projectPath || '__none__'
# → safeStat pattern at line 255
```
## Verification protocol (smoke)
After deploy:
1. Fresh BooChat session, default agent (no agent selected).
2. Send 5 short messages, wait for each turn to complete.
3. `docker compose logs --since=10m boocode | grep -E 'prefix-fingerprint|prefix-drift'`
**Success criteria:**
- 5 `prefix-fingerprint` lines (one per turn — assuming each turn calls `buildSystemPrompt` once via `buildMessagesPayload`).
- All 5 lines have identical `prefix_hash` and `prefix_length`.
- Zero `prefix-drift` lines.
**Failure modes to characterize:**
- Drift WITH a corresponding mtime change in `changed_inputs` → expected if BOOCHAT.md or AGENTS.md was edited mid-session. Note in smoke report; not a bug.
- Drift WITHOUT any mtime/flag change in `changed_inputs` → assembly nondeterminism somewhere. **This is the bug case.** Report the exact `prev_hash`/`new_hash` pair and full `prefix-fingerprint` log lines from before and after the drift.
- Multiple `prefix-fingerprint` lines per turn → `buildSystemPrompt` is being called more than once per turn (possibly from compaction or sentinel-summary paths). Note in smoke report; not necessarily a bug but worth understanding.
- ANY successful turn that emits zero `prefix-fingerprint` lines → log statement isn't reached. Implementation bug.
Repeat the smoke in a second session (with a different agent if available) to confirm that the prefix differs across sessions only where expected (different `project.id`, different `agent_id`).
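The success criteria can also be checked mechanically instead of by eye. The sketch below assumes the log lines are JSON objects (pino's default output), possibly preceded by a `docker compose logs` container prefix; `summarizeSmoke` and `SmokeSummary` are hypothetical names, not part of the codebase.

```ts
interface SmokeSummary {
  fingerprintCount: number;
  distinctHashes: string[];
  driftCount: number;
  pass: boolean;
}

// Summarize prefix-fingerprint / prefix-drift lines for one session.
function summarizeSmoke(
  lines: string[],
  sessionId: string,
  expectedTurns: number,
): SmokeSummary {
  const hashes = new Set<string>();
  let fingerprintCount = 0;
  let driftCount = 0;
  for (const line of lines) {
    // Skip any container-name prefix by starting at the first '{'.
    const start = line.indexOf("{");
    if (start < 0) continue;
    let rec: Record<string, unknown>;
    try {
      rec = JSON.parse(line.slice(start));
    } catch {
      continue; // not a JSON log line
    }
    if (rec.session_id !== sessionId) continue;
    if (rec.msg === "prefix-fingerprint") {
      fingerprintCount += 1;
      hashes.add(String(rec.prefix_hash));
    } else if (rec.msg === "prefix-drift") {
      driftCount += 1;
    }
  }
  const distinctHashes = Array.from(hashes).sort();
  return {
    fingerprintCount,
    distinctHashes,
    driftCount,
    // One fingerprint per turn, a single hash, and no drift lines.
    pass:
      fingerprintCount === expectedTurns &&
      distinctHashes.length === 1 &&
      driftCount === 0,
  };
}
```

A `fingerprintCount` above `expectedTurns` with a single hash corresponds to the "multiple lines per turn" case above, which is worth noting but is not a drift failure.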
## Files expected to touch
- `apps/server/src/services/system-prompt.ts` — add hash + log + observer + getter (~50 LoC)
- `apps/server/src/services/agents.ts` — add `getAgentsMtimes()` accessor (~15 LoC if going with default option)
- `apps/server/src/services/__tests__/system-prompt.test.ts` — 3 new tests (~30 LoC)
- `apps/server/package.json` — none expected (pino + node:crypto already available)
Total ~95 LoC.
## Workflow conventions (boocode)
- Backup before destructive: `cp file file.bak-$(date +%Y%m%d-%H%M%S)`. (Files get gitignored via global `*.bak*`.)
- Build: `docker compose up --build -d boocode`. No `--no-cache` unless layer-cache trap surfaces.
- Tests: `pnpm -C apps/server test`. Smoke after deploy.
- Type-check: `npx tsc -p apps/web/tsconfig.app.json --noEmit` is authoritative for web; `pnpm -C apps/server build` is authoritative for server.
- Sam reviews diffs. Never `git add`/`commit`/`push`/`pull` on Sam's behalf.
- Tag after commit: `git tag v1.13.8` (lightweight), then push via the Gitea deploy key:
`GIT_SSH_COMMAND="ssh -i /opt/boocode/secrets/boocode_gitea -o IdentitiesOnly=yes" git push origin v1.13.8`
## Repo layout pointers
- `apps/server/src/services/system-prompt.ts` — primary target (83 lines)
- `apps/server/src/services/agents.ts` — for the mtimes accessor
- `apps/server/src/services/inference/payload.ts:41` — call site
- `apps/server/src/services/__tests__/system-prompt.test.ts` — extend tests here
- `apps/server/vitest.config.ts` — test glob is `src/**/__tests__/**/*.test.ts`
## Open questions for Sam during recon
1. **`getAgentsMtimes()` accessor in agents.ts vs BOOCHAT-only log.** Default: add the accessor. If implementation surface is bigger than expected (e.g. the agents.ts cache structure makes it awkward), fall back to BOOCHAT-only and note the gap.
2. **What counts as a "turn" for the observer's `Map<sessionId, lastHash>`?** Default: every `buildSystemPrompt` call. If recon shows that compaction / sentinel-summary paths also call `buildSystemPrompt` and would generate noise, gate the observer to inference-turn calls only. Cleanest signal vs. cleanest implementation.
3. **Log severity for `prefix-drift`.** Default: `warn`. If Sam expects routine BOOCHAT.md edits to fire it, downgrade to `info`. The smoke will surface this — adjust during smoke if needed.
## Don't repeat past mistakes
- AI SDK v6 silent-abort guard in `stream-phase.ts`: untouched.
- v1.13.4 view fix (COALESCE → CASE-WHEN-EXISTS): untouched. This batch is in `system-prompt.ts` only.
- v1.13.5 truncate.ts: untouched.
- v1.13.6 reasoning embed in compaction: untouched.
- v1.13.7 stability bundle (`includeUsage:true`, trim guards, payload filter, budget bump): all live. Don't undo.
## Source files to read in project knowledge
- `boocode_roadmap.md` (last updated 2026-05-22; v1.13.x cleanup line order locked)
- `boocode_code_review.md` (no lift source for v1.13.8 — in-house instrumentation)
- `CLAUDE.md` (project conventions, NodeNext imports, vitest include glob, etc.)
- This handoff (`handoff_v1.13.8_prefix_verify.md`)