boocode/openspec/changes/archived/handoff_v1.13.8_prefix_verify.md
indifferentketchup 5a3f357ce9 v1.13.15-openspec: reformat batch docs to OpenSpec directory structure
Adopt Fission-AI/OpenSpec's openspec/changes/<change-name>/{proposal,
specs,design,tasks}.md shape for BooCode's own batch docs. Zero-dep
documentation reformat; replaces ad-hoc boocode_batchN.md /
handoff_vN.N.N.md convention.

Existing batch docs moved into openspec/changes/archived/ via git mv
(preserves history):
- boocode_batch10.md
- handoff_v1.13.8_prefix_verify.md
- handoff_v1.13.10_per_tool_cost.md

Pre-v1.13.15 docs were NOT split into proposal/tasks/design files. The
work was already shipped; the originals are preserved as archived
snapshots. New v1.13.15+ batches land directly in
openspec/changes/<slug>/proposal.md (+ tasks.md, + design.md when
applicable) per the convention documented in openspec/README.md.

CLAUDE.md gained a one-line pointer to the convention (workflow
section). File grew from 153 → 154 lines, 27,682 → 27,925 chars; both
remain well under the AgentLint hard caps.

specs/ directory is reserved for future OpenSpec CLI adoption (v1.14+).
No CLI dep added in this batch — directory structure only. If/when the
full OpenSpec lifecycle is adopted, that lands as a separate batch.
2026-05-22 14:54:17 +00:00

Handoff: BooCode v1.13.8 — system-prompt prefix stability verify-and-measure

#careful #boocode #nofluff

Recon-only / instrumentation batch. No cache implementation in this dispatch. Goal: prove (or disprove) that the assembled system-prompt prefix is byte-stable across turns under steady-state inputs. Result determines whether v1.13.7-as-originally-specced (the prefix cache) is actually needed at all.

Where we are

  • Last tag: v1.13.7 — stability bundle (includeUsage:true + trim guards + payload filter for trailing empty/failed assistants + BUDGET_NO_AGENT 15→30). This shipped as a renumber of the original "prefix cache" v1.13.7 slot. The prefix-cache work moved to v1.13.8 with the change-of-shape captured here.
  • Branch clean. git log --oneline main -5 should show …v1.13.7 v1.13.6 v1.13.5 v1.13.4 v1.13.3.

What v1.13.x has shipped

  • v1.13.0 — message_parts table + dual-write.
  • v1.13.1-A — AI SDK v6 install (streamText adapter, mid-dispatch silent-abort patch).
  • v1.13.1-B — messages_with_parts view + read sites flipped.
  • v1.13.1-C — ask_user_input correlation ported + reasoning end-to-end.
  • v1.13.3 — bundle: statement_timeout=30s, alpha tool ordering, periodic stuck-row sweeper, experimental_repairToolCall.
  • v1.13.4 — two-tier compaction prune.
  • v1.13.5 — opencode truncate.ts port (tr_<12char> opaque ids on tmpfs).
  • v1.13.6 — compaction head-assembly audit; reasoning_parts added to buildHeadPayload.
  • v1.13.7 — stability bundle (the fixes listed above).

What's queued

  • v1.13.8 (this dispatch) — prefix stability verify-and-measure
  • v1.13.9 — compaction overflow trigger formula (opencode 0.85 × ctx_max)
  • v1.13.10 — per-tool token cost accounting + AgentPicker UI
  • v1.13.11 — WebSocket frame typing (Zod schemas both ends)
  • v1.13.12 — skills audit pass (rules→recipes split)
  • v1.13.2 — drop legacy columns (last; ≥1 week production traffic on v1.13.1 first)

Why this is verify-first

The original v1.13.7 roadmap line was "system-prompt prefix cache, keyed by (agent_id, project_id, skills_version), mtime-invalidated." Recon during planning surfaced that:

  • apps/server/src/services/system-prompt.ts:buildSystemPrompt() already runs over mtime-cached inputs:
    • BOOCHAT.md / BOOCODER.md — cached in this file (cachedGuidance, line 25), keyed by mtime
    • global + per-project AGENTS.md — cached in services/agents.ts (safeStat pattern, line 245), keyed by mtime
    • session.system_prompt / project.default_system_prompt — DB scalars, byte-stable until edited
    • BASE_SYSTEM_PROMPT — hardcoded template with ${projectPath} interpolation
  • Skills are NOT in the system prompt today. Discovered via skill_find at runtime.
  • Tool schemas are NOT in the system message. They live in the OpenAI request body's tools field (already alpha-sorted by v1.13.3).
  • Output assembly is a microsecond string concat with no I/O.

So in theory the prefix is already byte-stable across turns. Nobody has measured it. This batch closes that gap with logs + a unit test, no cache implementation. If stable across a real session → close v1.13.8 as no-op, drop the original cache plan, move to v1.13.9. If drift surfaces → next batch designs the fix against the actual failure mode.

Scope (all three items)

1. Per-turn prefix fingerprint log

In apps/server/src/services/system-prompt.ts, after buildSystemPrompt finishes assembling out, before returning:

  • Compute sha256(out) → hex string. Use node:crypto.
  • Emit a single log line at level=info via a module-level pino instance (mirror the pattern used elsewhere in the inference services). Shape:
{
  msg: 'prefix-fingerprint',
  project_id: project.id,
  agent_id: agent?.id ?? null,
  agent_name: agent?.name ?? null,
  session_id: session.id,
  prefix_hash: <sha256 hex>,
  prefix_length: out.length,
  mtime_boochat: <number | null>,           // from cachedGuidance.mtime, or null when guidance is null
  has_agent_system_prompt: <boolean>,
  has_session_override: session.system_prompt.trim().length > 0,
  has_project_override: project.default_system_prompt.trim().length > 0,
}

The mtime fields surface which inputs changed when drift is observed. The hash itself is what proves equality.
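A minimal sketch of the hash step, assuming nothing beyond node:crypto. The helper name fingerprintPrefix and the PrefixFingerprint interface are hypothetical, and the sample strings are made up for illustration; only sha256(out) as hex and out.length come from the spec above.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape: the two fields derived directly from the assembled string.
interface PrefixFingerprint {
  prefix_hash: string;   // sha256 over the UTF-8 bytes, hex-encoded (64 chars)
  prefix_length: number; // out.length, i.e. UTF-16 code units, not bytes
}

// Hypothetical helper name; the handoff only asks for sha256(out) via node:crypto.
function fingerprintPrefix(out: string): PrefixFingerprint {
  const prefix_hash = createHash("sha256").update(out, "utf8").digest("hex");
  return { prefix_hash, prefix_length: out.length };
}

// Made-up sample prefixes: identical input must hash identically,
// and a single trailing space must change the hash.
const a = fingerprintPrefix("You are BooCode.\nProject: /opt/boocode");
const b = fingerprintPrefix("You are BooCode.\nProject: /opt/boocode");
const c = fingerprintPrefix("You are BooCode.\nProject: /opt/boocode ");
console.log(a.prefix_hash === b.prefix_hash); // true
console.log(a.prefix_hash === c.prefix_hash); // false
```

Note that prefix_length counts UTF-16 code units while the hash covers UTF-8 bytes. The two can disagree on non-ASCII guidance text, which is fine for a drift signal but worth knowing when reading the logs.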

buildSystemPrompt already reaches into cachedGuidance indirectly via getContainerGuidance() — expose cachedGuidance?.mtime for the log via a thin getter (getCachedGuidanceMtime(): number | null) so the log line carries it without re-statting.

For the AGENTS.md mtimes (global + per-project), services/agents.ts exposes them via the cache Map but no public accessor. Either (a) add a getAgentsMtimes(projectPath: string): { global: number | null; project: number | null } exported function to agents.ts, or (b) skip those fields in v1.13.8 and only log the BOOCHAT mtime. Default: do (a). If recon shows that's invasive, fall back to (b) and note the limitation in the smoke report.
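A rough sketch of the two accessors, under loud assumptions: the cache entry shapes (GuidanceCacheEntry, AgentsCacheEntry) and the variable agentsCache are hypothetical stand-ins, since the real structures in system-prompt.ts and agents.ts were not pasted into this handoff. Only the function names, return types, and the projectPath || '__none__' key come from the notes here. In the real modules both functions would be exported.

```typescript
// Hypothetical cache shapes; the real ones may differ.
interface GuidanceCacheEntry { text: string; mtime: number }
interface AgentsCacheEntry { globalMtime: number | null; projectMtime: number | null }

// Stand-ins for the module-level caches (assumed names and shapes).
let cachedGuidance: GuidanceCacheEntry | null = null;
const agentsCache = new Map<string, AgentsCacheEntry>();

// Reads the mtime captured when the guidance file was loaded; no extra stat call.
function getCachedGuidanceMtime(): number | null {
  return cachedGuidance?.mtime ?? null;
}

// Key convention from the recon notes: projectPath || '__none__'.
// A cache miss reports null for both fields rather than statting the files.
function getAgentsMtimes(projectPath: string): { global: number | null; project: number | null } {
  const entry = agentsCache.get(projectPath || "__none__");
  return {
    global: entry?.globalMtime ?? null,
    project: entry?.projectMtime ?? null,
  };
}

// Illustrative values only.
cachedGuidance = { text: "guidance", mtime: 1716390000000 };
agentsCache.set("/opt/boocode", { globalMtime: 1716380000000, projectMtime: null });
console.log(getCachedGuidanceMtime(), getAgentsMtimes("/opt/boocode"));
```

Both accessors are read-only views over state the caches already hold, so they should not perturb the behavior being measured. If the real agents.ts cache does not keep the two mtimes separately, that is the "invasive" case where option (b) applies.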

2. Per-session drift observer

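Module-level Map<sessionId, lastHash> in system-prompt.ts. In practice each entry also has to keep the previous length and the logged input fields, because the drift log needs prev_length and changed_inputs. On each buildSystemPrompt call: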

  • If sessionId is not in the map → set it, emit no extra log.
  • If sessionId IS in the map and the hash matches → emit no extra log.
  • If sessionId IS in the map and the hash DIFFERS → emit a second level=warn log:
{
  msg: 'prefix-drift',
  session_id: session.id,
  prev_hash: <previous>,
  new_hash: <current>,
  prev_length: <number>,
  new_length: <number>,
  changed_inputs: <array of field names where mtime/flags changed since last call>,
}

changed_inputs is a small array like ['mtime_boochat'] or ['has_session_override'] — the field-level diff so we can see exactly what input drifted.

The map grows unboundedly across long-lived processes. Acceptable for v1.13.8 (instrumentation only, 5 min sessions in test). Add a TODO comment: "v1.13.x follow-up if it survives: LRU-bound this map at 1000 sessions." Don't implement the LRU now.

Add a _resetPrefixObserverForTests() export mirroring the existing _resetContainerGuidanceCacheForTests().
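The observer rules above can be sketched as a pure function that returns the drift payload (or null) and leaves the actual logging to the caller. The names PrefixSnapshot, PrefixDrift, lastBySession, INPUT_FIELDS and observePrefix are hypothetical; the log fields and _resetPrefixObserverForTests come from this handoff. One assumption beyond the spec: the stored snapshot is overwritten on every call, so a single input change produces one warn rather than one per subsequent turn.

```typescript
// Hypothetical snapshot: the hash plus the fields the fingerprint log carries.
interface PrefixSnapshot {
  prefix_hash: string;
  prefix_length: number;
  mtime_boochat: number | null;
  has_agent_system_prompt: boolean;
  has_session_override: boolean;
  has_project_override: boolean;
}

interface PrefixDrift {
  msg: "prefix-drift";
  session_id: string;
  prev_hash: string;
  new_hash: string;
  prev_length: number;
  new_length: number;
  changed_inputs: string[];
}

// TODO: v1.13.x follow-up if it survives: LRU-bound this map at 1000 sessions.
const lastBySession = new Map<string, PrefixSnapshot>();

// Fields compared to build changed_inputs.
const INPUT_FIELDS = [
  "mtime_boochat",
  "has_agent_system_prompt",
  "has_session_override",
  "has_project_override",
] as const;

// Returns the payload to log at warn, or null when there is nothing to report
// (first sighting of the session, or hash unchanged).
function observePrefix(sessionId: string, next: PrefixSnapshot): PrefixDrift | null {
  const prev = lastBySession.get(sessionId);
  lastBySession.set(sessionId, next); // assumption: always track the latest snapshot
  if (!prev || prev.prefix_hash === next.prefix_hash) return null;
  return {
    msg: "prefix-drift",
    session_id: sessionId,
    prev_hash: prev.prefix_hash,
    new_hash: next.prefix_hash,
    prev_length: prev.prefix_length,
    new_length: next.prefix_length,
    changed_inputs: INPUT_FIELDS.filter((f) => prev[f] !== next[f]),
  };
}

function _resetPrefixObserverForTests(): void {
  lastBySession.clear();
}
```

An empty changed_inputs on a drift payload is the interesting case: the hash moved while every tracked input stayed the same.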

3. Unit test for byte-stability

In apps/server/src/services/__tests__/system-prompt.test.ts, add a describe('buildSystemPrompt stability', () => { ... }) block:

it('returns byte-identical output across two consecutive calls with the same inputs', async () => {
  // set BOOCHAT.md, build (project, session, agent), capture hash
  const first = await buildSystemPrompt(project, session, agent);
  const second = await buildSystemPrompt(project, session, agent);
  expect(first).toBe(second);
});

it('emits a single prefix-fingerprint log per call', async () => {
  // capture logs via pino test transport or stub
  // assert one prefix-fingerprint per buildSystemPrompt call
});

it('emits a prefix-drift log when the same session sees a different hash', async () => {
  // build once; mutate BOOCHAT.md or pass a different agent; build again with same sessionId
  // assert one prefix-drift log with prev_hash and new_hash populated
});

The first test is the load-bearing one — it locks in the byte-stability invariant going forward, regardless of what the production smoke surfaces.

What NOT to do in this dispatch

  • Don't add a cache. Output memoization is v1.13.9+ work IF the smoke proves it's needed. Implementing a cache before measurement is what the v1.13.6 audit was designed to catch — premature optimization disguised as correctness.
  • Don't change buildSystemPrompt's return signature or async behavior. The output stays a single string. Signature stays (project, session, agent) => Promise<string>.
  • Don't thread chat_id or anything else into the call. session.id is sufficient as the observer key.
  • Don't log the full prefix text. Hash + length only. The prefix can be many KB; logging it 5× per session blows up log size for no benefit. If drift appears and the hash diff is mysterious, LOG_LEVEL=debug can be wired in a follow-up.
  • Don't touch messages_with_parts or the CASE-WHEN-EXISTS fallback v1.13.4 added. This batch is in system-prompt.ts only.
  • Don't alter the AI SDK v6 silent-abort guard. It lives in stream-phase.ts and stays untouched.

Recon (already done — paste these for the implementer's reference)

cd /opt/boocode
wc -l apps/server/src/services/system-prompt.ts
# → 83 lines

grep -nE "^export|^function|^async function|cache|mtime" apps/server/src/services/system-prompt.ts
# → cachedGuidance at line 25; loadContainerGuidance / getContainerGuidance / _resetContainerGuidanceCacheForTests / buildSystemPrompt are the public surface

grep -rn "buildSystemPrompt" apps/server/src --include="*.ts" | grep -v "tests"
# → single caller: apps/server/src/services/inference/payload.ts:41
# → also referenced in routes/sessions.ts (session-create flow may call it for preview; verify during implementation)

grep -n "safeStat\|cache\|mtime" apps/server/src/services/agents.ts
# → mtime-keyed cache (Map) at line 245, TTL 60_000ms, key = projectPath || '__none__'
# → safeStat pattern at line 255

Verification protocol (smoke)

After deploy:

  1. Fresh BooChat session, default agent (no agent selected).
  2. Send 5 short messages, wait for each turn to complete.
  3. docker compose logs --since=10m boocode | grep -E 'prefix-fingerprint|prefix-drift'

Success criteria:

  • 5 prefix-fingerprint lines (one per turn — assuming each turn calls buildSystemPrompt once via buildMessagesPayload).
  • All 5 lines have identical prefix_hash and prefix_length.
  • Zero prefix-drift lines.

Failure modes to characterize:

  • Drift WITH a corresponding mtime change in changed_inputs → expected if BOOCHAT.md or AGENTS.md was edited mid-session. Note in smoke report; not a bug.
  • Drift WITHOUT any mtime/flag change in changed_inputs → assembly nondeterminism somewhere. This is the bug case. Report the exact prev_hash/new_hash pair and full prefix-fingerprint log lines from before and after the drift.
  • Multiple prefix-fingerprint lines per turn → buildSystemPrompt is being called more than once per turn (possibly from compaction or sentinel-summary paths). Note in smoke report; not necessarily a bug but worth understanding.
  • ANY successful turn that emits zero prefix-fingerprint lines → log statement isn't reached. Implementation bug.
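For the smoke report, the criteria and failure modes above could be checked mechanically instead of by eye. A minimal sketch, assuming the logger emits one JSON object per line (pino's default) and that docker compose prepends a service prefix before the JSON. The names SmokeSummary, parseJsonLine and summarizeSmoke are hypothetical.

```typescript
// Hypothetical summary shape for the smoke report.
interface SmokeSummary {
  fingerprints: number;      // count of prefix-fingerprint lines
  distinctHashes: number;    // should be 1 for a stable session
  drifts: number;            // count of prefix-drift lines
  unexplainedDrifts: number; // drift lines with an empty changed_inputs (the bug case)
}

// Parses the JSON payload of a log line, skipping any "service | " prefix.
// Returns null for lines that carry no parseable JSON object.
function parseJsonLine(line: string): Record<string, unknown> | null {
  const start = line.indexOf("{");
  if (start < 0) return null;
  try {
    return JSON.parse(line.slice(start)) as Record<string, unknown>;
  } catch {
    return null;
  }
}

function summarizeSmoke(lines: string[]): SmokeSummary {
  const hashes = new Set<string>();
  let fingerprints = 0;
  let drifts = 0;
  let unexplainedDrifts = 0;
  for (const line of lines) {
    const rec = parseJsonLine(line);
    if (!rec) continue;
    if (rec.msg === "prefix-fingerprint") {
      fingerprints++;
      hashes.add(String(rec.prefix_hash));
    } else if (rec.msg === "prefix-drift") {
      drifts++;
      const changed = rec.changed_inputs;
      if (!Array.isArray(changed) || changed.length === 0) unexplainedDrifts++;
    }
  }
  return { fingerprints, distinctHashes: hashes.size, drifts, unexplainedDrifts };
}
```

For the five-message smoke, the passing summary is fingerprints === 5, distinctHashes === 1, drifts === 0. A fingerprint count above the number of turns points at the multiple-calls-per-turn case; a nonzero unexplainedDrifts is the nondeterminism case.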

Repeat the smoke in a second session (with a different agent if available) to confirm that the prefix differs across sessions only where expected (different project.id, different agent_id).

Files expected to touch

  • apps/server/src/services/system-prompt.ts — add hash + log + observer + getter (~50 LoC)
  • apps/server/src/services/agents.ts — add getAgentsMtimes() accessor (~15 LoC if going with default option)
  • apps/server/src/services/__tests__/system-prompt.test.ts — 3 new tests (~30 LoC)
  • apps/server/package.json — none expected (pino + node:crypto already available)

Total ~95 LoC.

Workflow conventions (boocode)

  • Backup before destructive: cp file file.bak-$(date +%Y%m%d-%H%M%S). (Files get gitignored via global *.bak*.)
  • Build: docker compose up --build -d boocode. No --no-cache unless layer-cache trap surfaces.
  • Tests: pnpm -C apps/server test. Smoke after deploy.
  • Type-check: npx tsc -p apps/web/tsconfig.app.json --noEmit is authoritative for web; pnpm -C apps/server build is authoritative for server.
  • Sam reviews diffs. Never git add/commit/push/pull on Sam's behalf.
  • Tag after commit: git tag v1.13.8 (lightweight), then push via the Gitea deploy key: GIT_SSH_COMMAND="ssh -i /opt/boocode/secrets/boocode_gitea -o IdentitiesOnly=yes" git push origin v1.13.8

Repo layout pointers

  • apps/server/src/services/system-prompt.ts — primary target (83 lines)
  • apps/server/src/services/agents.ts — for the mtimes accessor
  • apps/server/src/services/inference/payload.ts:41 — call site
  • apps/server/src/services/__tests__/system-prompt.test.ts — extend tests here
  • apps/server/vitest.config.ts — test glob is src/**/__tests__/**/*.test.ts

Open questions for Sam during recon

  1. getAgentsMtimes() accessor in agents.ts vs BOOCHAT-only log. Default: add the accessor. If implementation surface is bigger than expected (e.g. the agents.ts cache structure makes it awkward), fall back to BOOCHAT-only and note the gap.
  2. What counts as a "turn" for the observer's Map<sessionId, lastHash>? Default: every buildSystemPrompt call. If recon shows that compaction / sentinel-summary paths also call buildSystemPrompt and would generate noise, gate the observer to inference-turn calls only. Cleanest signal vs. cleanest implementation.
  3. Log severity for prefix-drift. Default: warn. If Sam expects routine BOOCHAT.md edits to fire it, downgrade to info. The smoke will surface this — adjust during smoke if needed.

Don't repeat past mistakes

  • AI SDK v6 silent-abort guard in stream-phase.ts: untouched.
  • v1.13.4 view fix (COALESCE → CASE-WHEN-EXISTS): untouched. This batch is in system-prompt.ts only.
  • v1.13.5 truncate.ts: untouched.
  • v1.13.6 reasoning embed in compaction: untouched.
  • v1.13.7 stability bundle (includeUsage:true, trim guards, payload filter, budget bump): all live. Don't undo.

Source files to read in project knowledge

  • boocode_roadmap.md (last updated 2026-05-22; v1.13.x cleanup line order locked)
  • boocode_code_review.md (no lift source for v1.13.8 — in-house instrumentation)
  • CLAUDE.md (project conventions, NodeNext imports, vitest include glob, etc.)
  • This handoff (handoff_v1.13.8_prefix_verify.md)