boocode/openspec/changes/mistake-tracker-file-ledger/proposal.md
indifferentketchup bcc89d8adc feat: MistakeTracker + file-provenance ledger (v2.7.4)
Two native-inference hardening features from boocode_code_review_v2 §1 #12.

MistakeTracker: new pure mistake-tracker.ts tracks consecutive heterogeneous
tool failures (kinds surfaced per tool from tool-phase.ts). On 3 in a row the
turn loop soft-nudges (model-facing recovery guidance + mistake_recovery
sentinel + reset), then escalates to stopping the turn (cap-hit-style, Continue
affordance) on a re-trip. Complements doom-loop (identical repeats) + cap-hit.

File-provenance ledger: compaction.ts derives a deterministic ## Files Read list
from the head messages' read-tool calls and injects it into the rolling-summary
prompt so provenance survives compaction (no new table; read-only).

mistake_recovery sentinel: MessageMetadata arm (server + web) + MessageBubble
render branch. Built by 2 parallel agents. Server 545 tests passing (23 new);
build + web tsc clean. Native-inference only. Builds on v2.7.3.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 13:05:03 +00:00


MistakeTracker + file-provenance ledger (#12)

Status: in progress (started 2026-06-01)

Source: boocode_code_review_v2.md §1 #12, §5e (cline; algorithm reimplemented, not vendored).

Two native-inference (apps/server) hardening features. One cohesive backend change (they share TurnArgs + the tool-phase observation point) + a small frontend sentinel render.

Part A — MistakeTracker (heterogeneous-failure recovery)

Complements the doom-loop guard (sentinels.ts:detectDoomLoop, which only catches identical repeats) by catching a run of consecutive tool failures the model isn't recovering from.

  • New pure apps/server/src/services/inference/mistake-tracker.ts (mirrors detectDoomLoop):
    • FailureKind = 'zod_reject' | 'tool_not_found' | 'exec_error' | 'api_error' | 'permission_denied' (all already distinguished in tool-phase.ts:executeToolCall).
    • MISTAKE_THRESHOLD = 3.
    • State { run: FailureKind[]; nudges: number }, where run is the current consecutive-failure streak (reset on ANY successful tool step) and nudges counts recovery injections not yet cleared by a success.
    • recordStep(state, outcome) where outcome is a failure kind or 'success'.
    • detectMistakePattern(state): 'nudge' | 'escalate' | null. When run.length >= 3 it returns 'nudge' the first time (nudges === 0) and 'escalate' if it trips again while nudges >= 1 (no intervening success); below the threshold it returns null.
  • Lives in TurnArgs (loop-local, reset per runInference, like recentToolCalls).
  • Integration in turn.ts loop: after each tool phase, recordStep per tool outcome; then detectMistakePattern:
    • 'nudge' (decision: soft + escalate): append a transient model-facing recovery-guidance system message to the NEXT turn's payload (re-read schemas, verify paths exist before acting, try a different approach — not retry variations), insert a mistake_recovery UI sentinel (escalated:false), bump nudges, reset run. Loop continues.
    • 'escalate': stop the turn (break), insert a mistake_recovery sentinel (escalated:true, can_continue:true, cap-hit-style), finalize. Prevents heterogeneous failures from burning the whole step budget.
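The tracker described in Part A could be sketched roughly as follows. This is a minimal illustration, not the actual mistake-tracker.ts: the names MistakeState, StepOutcome, newMistakeState and applyNudge are hypothetical, and the choice to clear nudges on success is one reading of "not yet cleared by a success".

```typescript
// Sketch only: FailureKind, MISTAKE_THRESHOLD, recordStep and detectMistakePattern
// come from the proposal; every function body here is an assumption.
type FailureKind =
  | 'zod_reject'
  | 'tool_not_found'
  | 'exec_error'
  | 'api_error'
  | 'permission_denied';

type StepOutcome = FailureKind | 'success'; // hypothetical name

interface MistakeState {
  run: FailureKind[]; // current consecutive-failure streak
  nudges: number; // recovery injections not yet cleared by a success
}

const MISTAKE_THRESHOLD = 3;

// Hypothetical constructor for a fresh, loop-local state.
function newMistakeState(): MistakeState {
  return { run: [], nudges: 0 };
}

// Pure: returns a new state instead of mutating the argument.
function recordStep(state: MistakeState, outcome: StepOutcome): MistakeState {
  if (outcome === 'success') {
    // Assumption: any success clears both the streak and pending nudges.
    return { run: [], nudges: 0 };
  }
  return { run: [...state.run, outcome], nudges: state.nudges };
}

function detectMistakePattern(state: MistakeState): 'nudge' | 'escalate' | null {
  if (state.run.length < MISTAKE_THRESHOLD) return null;
  return state.nudges === 0 ? 'nudge' : 'escalate';
}

// Hypothetical helper for the 'nudge' branch: bump nudges, reset run.
function applyNudge(state: MistakeState): MistakeState {
  return { run: [], nudges: state.nudges + 1 };
}
```

Under these assumptions, the turn loop would call recordStep once per tool outcome after each tool phase, then branch on detectMistakePattern: apply the nudge and continue, or break out of the loop on 'escalate'.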

Part B — File-provenance ledger (Read-only)

  • Accumulate file paths read by view_file/grep/find_files/list_dir into TurnArgs.filesRead: Set<string> (recorded at the tool-phase, like the failure outcomes).
  • On compaction (compaction.ts:buildPrompt), inject a deterministic, sorted ## Files Read list into the summary prompt context so the summarizer merges it into the rolling summary — no new table/column; it propagates as summary text across compactions. compaction-prompt.ts's SUMMARY_TEMPLATE already has a ## Relevant Files section to extend/merge with.
  • BooChat is read-only (no write tools on apps/server) → "Files Modified" is N/A here; only "Files Read". (The apps/coder write side can add "Modified" later.)
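One way the deterministic ## Files Read block might be rendered from the accumulated set is sketched below. The function name formatFilesRead and the bullet layout are assumptions for illustration; the proposal specifies only the heading, the sorting, and that the result is injected into the summary prompt.

```typescript
// Hypothetical helper: turns the accumulated read paths into a stable text block.
// Returns an empty string when nothing was read, so the caller can skip injection.
function formatFilesRead(filesRead: ReadonlySet<string>): string {
  if (filesRead.size === 0) return '';
  // Default sort() compares UTF-16 code units, so the order does not depend
  // on locale or on the order in which the paths were inserted.
  const sorted = Array.from(filesRead).sort();
  return ['## Files Read', ...sorted.map((path) => `- ${path}`)].join('\n');
}
```

Because a Set already deduplicates and the sort is locale-independent, the same set of paths should always yield byte-identical text, which keeps the summary prompt stable across repeated compactions.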

Sentinel contract (pinned — backend + frontend must match)

New sentinel kind on MessageMetadata in BOTH apps/server/src/types/api.ts AND apps/web/src/api/types.ts:

{ kind: 'mistake_recovery'; failure_kinds: string[]; count: number; escalated: boolean; can_continue?: boolean }
  • role='system', status='complete', stripped from the LLM payload via isAnySentinel in payload.ts (UI-only) and compaction.ts:buildHeadPayload.
  • Frontend render branch in apps/web/src/components/MessageBubble.tsx: escalated:false → "Hit repeated different errors — recovery guidance injected, continuing." escalated:true → "Repeated errors persisted — stopped the turn." (mirror the doom-loop/cap-hit branches).
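The pinned contract can be written as a TypeScript interface with a runtime guard, sketched here under stated assumptions. The names MistakeRecoveryMetadata, SketchMessage, isMistakeRecovery and stripMistakeRecovery are hypothetical; in the proposal the stripping is done by the existing isAnySentinel check in payload.ts, whose implementation is not shown.

```typescript
// Field names and types are taken from the pinned contract above.
interface MistakeRecoveryMetadata {
  kind: 'mistake_recovery';
  failure_kinds: string[];
  count: number;
  escalated: boolean;
  can_continue?: boolean;
}

// Hypothetical minimal message shape, for illustration only.
interface SketchMessage {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string;
  metadata?: unknown;
}

// Runtime guard: checks the discriminant and the required fields.
function isMistakeRecovery(meta: unknown): meta is MistakeRecoveryMetadata {
  if (typeof meta !== 'object' || meta === null) return false;
  const m = meta as Record<string, unknown>;
  return (
    m.kind === 'mistake_recovery' &&
    Array.isArray(m.failure_kinds) &&
    typeof m.count === 'number' &&
    typeof m.escalated === 'boolean'
  );
}

// Drops UI-only sentinels so they never reach the LLM payload.
function stripMistakeRecovery(messages: SketchMessage[]): SketchMessage[] {
  return messages.filter((msg) => !isMistakeRecovery(msg.metadata));
}
```

Declaring the same interface on both the server and web sides is what keeps the backend writer and the MessageBubble render branch in agreement about the discriminant and the optional can_continue flag.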

Decisions (2026-06-01)

  • MistakeTracker intervention: soft nudge + escalate.
  • UI sentinel for recovery (mistake_recovery).

Files (backend, one agent) / (frontend, one agent)

  • Backend: mistake-tracker.ts (new), turn.ts, tool-phase.ts, sentinels.ts, sentinel-summaries.ts, payload.ts, compaction.ts, compaction-prompt.ts, types/api.ts + tests (mistake-tracker.test.ts, ledger/compaction assertions).
  • Frontend: apps/web/src/api/types.ts (MessageMetadata arm) + MessageBubble.tsx (render branch). MUST NOT touch Sam's WIP web files.

Verify

  • pnpm -C apps/server test; pnpm -C apps/server build; npx tsc -p apps/web/tsconfig.app.json --noEmit