
indifferentketchup bcc89d8adc feat: MistakeTracker + file-provenance ledger (v2.7.4)

Two native-inference hardening features from boocode_code_review_v2 §1 #12.

MistakeTracker: new pure mistake-tracker.ts tracks consecutive heterogeneous
tool failures (kinds surfaced per tool from tool-phase.ts). On 3 in a row the
turn loop soft-nudges (model-facing recovery guidance + mistake_recovery
sentinel + reset), then escalates to stopping the turn (cap-hit-style, Continue
affordance) on a re-trip. Complements doom-loop (identical repeats) + cap-hit.

File-provenance ledger: compaction.ts derives a deterministic ## Files Read list
from the head messages' read-tool calls and injects it into the rolling-summary
prompt so provenance survives compaction (no new table; read-only).

mistake_recovery sentinel: MessageMetadata arm (server + web) + MessageBubble
render branch. Built by 2 parallel agents. Server 545 tests passing (23 new);
build + web tsc clean. Native-inference only. Builds on v2.7.3.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-01 13:05:03 +00:00


MistakeTracker + file-provenance ledger (#12)

Status: in progress (started 2026-06-01)
Source: boocode_code_review_v2.md §1 #12, §5e (cline; algorithm reimplemented, not vendored).

Two native-inference (apps/server) hardening features. One cohesive backend change (they share TurnArgs + the tool-phase observation point) + a small frontend sentinel render.

Part A — MistakeTracker (heterogeneous-failure recovery)

Complements the doom-loop guard (sentinels.ts:detectDoomLoop, which only catches identical repeats) by catching a run of consecutive tool failures the model isn't recovering from.

New pure apps/server/src/services/inference/mistake-tracker.ts (mirrors detectDoomLoop):
- FailureKind = 'zod_reject' | 'tool_not_found' | 'exec_error' | 'api_error' | 'permission_denied' (all already distinguished in tool-phase.ts:executeToolCall).
- MISTAKE_THRESHOLD = 3.
- State { run: FailureKind[]; nudges: number } — run is the current consecutive-failure streak, reset on ANY successful tool step; nudges counts recovery injections not yet cleared by a success.
- recordStep(state, outcome) where outcome is a failure kind or 'success'.
- detectMistakePattern(state): 'nudge' | 'escalate' | null — run.length >= 3 → 'nudge' the first time (nudges === 0), 'escalate' if it trips again while nudges >= 1 (no intervening success).
The tracker state lives in TurnArgs (loop-local, reset per runInference, like recentToolCalls).
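The definitions above can be sketched as a small pure module. This is an illustrative sketch under assumptions, not the shipped mistake-tracker.ts: the names MistakeState, createMistakeState and afterNudge are hypothetical, and returning a fresh state from recordStep (rather than mutating) is one reading of "pure".

```typescript
// Illustrative sketch only; type and helper names beyond those in the plan are assumptions.
type FailureKind =
  | "zod_reject"
  | "tool_not_found"
  | "exec_error"
  | "api_error"
  | "permission_denied";

type StepOutcome = FailureKind | "success";

const MISTAKE_THRESHOLD = 3;

interface MistakeState {
  run: FailureKind[]; // current consecutive-failure streak
  nudges: number; // recovery injections not yet cleared by a success
}

function createMistakeState(): MistakeState {
  return { run: [], nudges: 0 };
}

// Any success clears both the streak and the pending nudges; a failure extends the streak.
function recordStep(state: MistakeState, outcome: StepOutcome): MistakeState {
  if (outcome === "success") {
    return { run: [], nudges: 0 };
  }
  return { run: state.run.concat([outcome]), nudges: state.nudges };
}

// First trip of the threshold is a nudge; a trip with a nudge still pending escalates.
function detectMistakePattern(state: MistakeState): "nudge" | "escalate" | null {
  if (state.run.length < MISTAKE_THRESHOLD) {
    return null;
  }
  return state.nudges === 0 ? "nudge" : "escalate";
}

// Hypothetical helper: what the caller applies after handling a nudge (bump nudges, reset run).
function afterNudge(state: MistakeState): MistakeState {
  return { run: [], nudges: state.nudges + 1 };
}
```

Keeping the functions free of I/O means the threshold and escalation rules can be unit-tested without a running turn loop, in the same way as detectDoomLoop.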
Integration in turn.ts loop: after each tool phase, recordStep per tool outcome; then detectMistakePattern:
- 'nudge' (decision: soft + escalate): append a transient model-facing recovery-guidance system message to the NEXT turn's payload (re-read schemas, verify paths exist before acting, try a different approach — not retry variations), insert a mistake_recovery UI sentinel (escalated:false), bump nudges, reset run. Loop continues.
- 'escalate': stop the turn (break), insert a mistake_recovery sentinel (escalated:true, can_continue:true, cap-hit-style), finalize. Prevents heterogeneous failures from burning the whole step budget.
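The nudge-then-escalate flow can be illustrated with a self-contained simulation. This is a sketch under assumptions, not the real turn.ts loop: simulateTurn and LoopResult are hypothetical, each inner array stands in for the outcomes of one tool phase, and filling failure_kinds with the raw streak (not deduplicated) is a guess.

```typescript
// Hypothetical simulation of the loop integration; the real turn.ts is not shown in the plan.
type FailureKind =
  | "zod_reject"
  | "tool_not_found"
  | "exec_error"
  | "api_error"
  | "permission_denied";
type StepOutcome = FailureKind | "success";

interface MistakeRecoverySentinel {
  kind: "mistake_recovery";
  failure_kinds: string[];
  count: number;
  escalated: boolean;
  can_continue?: boolean;
}

interface LoopResult {
  sentinels: MistakeRecoverySentinel[];
  guidanceInjected: number; // times recovery guidance was queued for the next payload
  stepsRun: number;
  stoppedEarly: boolean;
}

const MISTAKE_THRESHOLD = 3;

function simulateTurn(toolPhases: StepOutcome[][]): LoopResult {
  let run: FailureKind[] = [];
  let nudges = 0;
  const result: LoopResult = {
    sentinels: [],
    guidanceInjected: 0,
    stepsRun: 0,
    stoppedEarly: false,
  };

  for (const outcomes of toolPhases) {
    result.stepsRun++;
    // Record every tool outcome from this phase before checking the pattern.
    for (const outcome of outcomes) {
      if (outcome === "success") {
        run = [];
        nudges = 0;
      } else {
        run.push(outcome);
      }
    }
    if (run.length < MISTAKE_THRESHOLD) {
      continue;
    }
    if (nudges === 0) {
      // Soft nudge: queue guidance, emit a non-escalated sentinel, keep looping.
      result.guidanceInjected++;
      result.sentinels.push({
        kind: "mistake_recovery",
        failure_kinds: run.slice(),
        count: run.length,
        escalated: false,
      });
      nudges++;
      run = [];
    } else {
      // Re-trip with no intervening success: stop the turn, offer Continue.
      result.sentinels.push({
        kind: "mistake_recovery",
        failure_kinds: run.slice(),
        count: run.length,
        escalated: true,
        can_continue: true,
      });
      result.stoppedEarly = true;
      break;
    }
  }
  return result;
}
```

With six single-failure phases in a row, this sketch nudges after the third and stops after the sixth, so in that case at most six steps of the budget are spent before the user is offered Continue.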

Part B — File-provenance ledger (Read-only)

Accumulate file paths read by view_file/grep/find_files/list_dir into TurnArgs.filesRead: Set<string> (recorded during the tool phase, like the failure outcomes).
On compaction (compaction.ts:buildPrompt), inject a deterministic, sorted ## Files Read list into the summary prompt context so the summarizer merges it into the rolling summary — no new table/column; it propagates as summary text across compactions. compaction-prompt.ts's SUMMARY_TEMPLATE already has a ## Relevant Files section to extend/merge with.
BooChat is read-only (no write tools on apps/server) → "Files Modified" is N/A here; only "Files Read". (The apps/coder write side can add "Modified" later.)
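A minimal sketch of the ledger, assuming the Set-based accumulation described above. recordRead and renderFilesRead are hypothetical helpers; how compaction.ts:buildPrompt splices the result into the summary prompt is not specified here and is left out.

```typescript
// Hypothetical helpers; only the tool names and the "## Files Read" heading come from the plan.
const READ_TOOLS = new Set<string>(["view_file", "grep", "find_files", "list_dir"]);

// Record a path only when it came from one of the read tools.
function recordRead(filesRead: Set<string>, toolName: string, path: string): void {
  if (READ_TOOLS.has(toolName)) {
    filesRead.add(path);
  }
}

// Deterministic rendering: the default sort compares UTF-16 code units,
// so the same set always yields the same text regardless of locale.
function renderFilesRead(filesRead: Set<string>): string {
  if (filesRead.size === 0) {
    return "";
  }
  const sorted = Array.from(filesRead).sort();
  const lines = ["## Files Read"];
  for (const p of sorted) {
    lines.push("- " + p);
  }
  return lines.join("\n");
}
```

Sorting before rendering keeps the injected block stable across compactions, which makes the prompt easy to assert on in tests.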

Sentinel contract (pinned — backend + frontend must match)

New sentinel kind on MessageMetadata in BOTH apps/server/src/types/api.ts AND apps/web/src/api/types.ts:

{ kind: 'mistake_recovery'; failure_kinds: string[]; count: number; escalated: boolean; can_continue?: boolean }

The sentinel message has role='system' and status='complete'. It is UI-only: it is stripped from the LLM payload via isAnySentinel in payload.ts and in compaction.ts:buildHeadPayload.
Frontend render branch in apps/web/src/components/MessageBubble.tsx: escalated:false → "Hit repeated different errors — recovery guidance injected, continuing." escalated:true → "Repeated errors persisted — stopped the turn." (mirror the doom-loop/cap-hit branches).
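The contract can be sketched as a typed arm plus the two checks each side needs. This is an assumption-laden illustration: ChatMessage is a hypothetical minimal message shape, isMistakeRecovery is a stand-in for the real isAnySentinel narrowed to this one kind, and stripForLlm and showsContinue are invented names.

```typescript
// Sentinel arm as pinned in the contract above.
interface MistakeRecoveryMetadata {
  kind: "mistake_recovery";
  failure_kinds: string[];
  count: number;
  escalated: boolean;
  can_continue?: boolean;
}

// Hypothetical minimal message shape; the real type has more fields and more metadata arms.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  status: "complete" | "streaming";
  content: string;
  metadata?: MistakeRecoveryMetadata;
}

// Stand-in for isAnySentinel, limited to the mistake_recovery kind.
function isMistakeRecovery(msg: ChatMessage): boolean {
  return msg.metadata !== undefined && msg.metadata.kind === "mistake_recovery";
}

// Server side: sentinels never reach the model.
function stripForLlm(messages: ChatMessage[]): ChatMessage[] {
  return messages.filter((m) => !isMistakeRecovery(m));
}

// Web side: only an escalated sentinel that allows it shows the Continue affordance.
function showsContinue(meta: MistakeRecoveryMetadata): boolean {
  return meta.escalated && meta.can_continue === true;
}
```

Declaring the same arm in both apps/server and apps/web lets the compiler flag a field mismatch on either side, provided both copies are edited together.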

Decisions (2026-06-01)

MistakeTracker intervention: soft nudge + escalate.
UI sentinel for recovery (mistake_recovery).

Files (backend, one agent) / (frontend, one agent)

Backend: mistake-tracker.ts (new), turn.ts, tool-phase.ts, sentinels.ts, sentinel-summaries.ts, payload.ts, compaction.ts, compaction-prompt.ts, types/api.ts + tests (mistake-tracker.test.ts, ledger/compaction assertions).
Frontend: apps/web/src/api/types.ts (MessageMetadata arm) + MessageBubble.tsx (render branch). MUST NOT touch Sam's WIP web files.

Verify

pnpm -C apps/server test; pnpm -C apps/server build; npx tsc -p apps/web/tsconfig.app.json --noEmit
