feat: MistakeTracker + file-provenance ledger (v2.7.4)

Two native-inference hardening features from boocode_code_review_v2 §1 #12.

MistakeTracker: new pure mistake-tracker.ts tracks consecutive heterogeneous tool failures (kinds surfaced per tool from tool-phase.ts). On 3 in a row the turn loop soft-nudges (model-facing recovery guidance + mistake_recovery sentinel + reset), then escalates to stopping the turn (cap-hit-style, Continue affordance) on a re-trip. Complements doom-loop (identical repeats) + cap-hit.

File-provenance ledger: compaction.ts derives a deterministic ## Files Read list from the head messages' read-tool calls and injects it into the rolling-summary prompt so provenance survives compaction (no new table; read-only).

mistake_recovery sentinel: MessageMetadata arm (server + web) + MessageBubble render branch.

Built by 2 parallel agents. Server 545 tests passing (23 new); build + web tsc clean. Native-inference only. Builds on v2.7.3.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 13:05:03 +00:00
parent f53d6a8afd
commit bcc89d8adc
15 changed files with 816 additions and 20 deletions
--- a/apps/server/src/services/inference/sentinel-summaries.ts
+++ b/apps/server/src/services/inference/sentinel-summaries.ts
@@ -717,3 +717,57 @@ async function insertDoomLoopSentinel(
    metadata,
  });
 }
+
+// #12 MistakeTracker: heterogeneous-failure recovery sentinel. Mirrors
+// insertDoomLoopSentinel structurally: inserts a role='system', status='complete'
+// row, then publishes the standard message_started → delta → message_complete
+// frame sequence. Two variants distinguished by `escalated`:
+//   - escalated:false → a nudge fired; recovery guidance was injected into the
+//     model's next step and the loop continued. can_continue is true (the turn
+//     is still live).
+//   - escalated:true  → the nudge didn't break the failure run; the turn was
+//     stopped (cap-hit-style). can_continue is true so the UI can still offer a
+//     Continue affordance — a fresh user turn resets the tracker.
+export async function insertMistakeRecoverySentinel(
+  ctx: InferenceContext,
+  sessionId: string,
+  chatId: string,
+  opts: { failureKinds: string[]; count: number; escalated: boolean; canContinue: boolean },
+): Promise<void> {
+  const metadata: MessageMetadata = {
+    kind: 'mistake_recovery',
+    failure_kinds: opts.failureKinds,
+    count: opts.count,
+    escalated: opts.escalated,
+    can_continue: opts.canContinue,
+  };
+  const content = opts.escalated
+    ? `Repeated different errors persisted after a recovery nudge (${opts.count} in a row). Stopping the tool-call loop.`
+    : `Hit ${opts.count} different errors in a row. Injected recovery guidance and continuing.`;
+
+  const [row] = await ctx.sql<{ id: string }[]>`
+    INSERT INTO messages (session_id, chat_id, role, content, status, created_at, metadata)
+    VALUES (${sessionId}, ${chatId}, 'system', ${content}, 'complete', clock_timestamp(), ${ctx.sql.json(metadata as never)})
+    RETURNING id
+  `;
+
+  // Standard frame sequence — same as cap-hit / doom-loop sentinels.
+  ctx.publish(sessionId, {
+    type: 'message_started',
+    message_id: row!.id,
+    chat_id: chatId,
+    role: 'system',
+  });
+  ctx.publish(sessionId, {
+    type: 'delta',
+    message_id: row!.id,
+    chat_id: chatId,
+    content,
+  });
+  ctx.publish(sessionId, {
+    type: 'message_complete',
+    message_id: row!.id,
+    chat_id: chatId,
+    metadata,
+  });
+}
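
The nudge-then-escalate behavior described in the commit message can be illustrated with a minimal pure tracker. This is only a sketch and is not the contents of mistake-tracker.ts: the names `TrackerState`, `createTracker`, `recordResult`, and the failure-kind strings are hypothetical. It also assumes that "heterogeneous" means at least two distinct kinds in the current run, that a successful tool call clears the run, and that the nudged flag persists until a fresh tracker is created for the next user turn.

```typescript
// Hypothetical sketch of a pure mistake tracker; all names are illustrative.
type TrackerAction = 'none' | 'nudge' | 'escalate';

interface TrackerState {
  failureKinds: string[]; // kinds seen in the current consecutive-failure run
  nudged: boolean;        // true once a recovery nudge has fired this turn
}

const THRESHOLD = 3; // "3 in a row" from the commit message

function createTracker(): TrackerState {
  return { failureKinds: [], nudged: false };
}

// Pass the failure kind for a failed tool call, or null for a success.
// Returns a new state (the input is never mutated) plus the action to take.
function recordResult(
  state: TrackerState,
  failureKind: string | null,
): { state: TrackerState; action: TrackerAction } {
  if (failureKind === null) {
    // Assumption: a success breaks the run but does not forget a prior nudge.
    return { state: { ...state, failureKinds: [] }, action: 'none' };
  }

  const failureKinds = [...state.failureKinds, failureKind];
  // Assumption: identical repeats are left to the doom-loop detector, so the
  // tracker only trips when the run contains at least two distinct kinds.
  const heterogeneous = new Set(failureKinds).size > 1;

  if (failureKinds.length < THRESHOLD || !heterogeneous) {
    return { state: { ...state, failureKinds }, action: 'none' };
  }
  if (!state.nudged) {
    // First trip: nudge and reset the run so the model gets a fresh attempt.
    return { state: { failureKinds: [], nudged: true }, action: 'nudge' };
  }
  // Re-trip after a nudge: escalate, keeping the run for reporting.
  return { state: { failureKinds, nudged: true }, action: 'escalate' };
}

// Usage: six mixed failures in a row trip the tracker twice.
let state = createTracker();
const actions: TrackerAction[] = [];
for (const kind of ['not_found', 'bad_args', 'timeout', 'not_found', 'bad_args', 'timeout']) {
  const step = recordResult(state, kind);
  state = step.state;
  actions.push(step.action);
}
console.log(actions.join(','));
```

Under these assumptions, a caller in the turn loop would map `'nudge'` to a sentinel with `escalated: false` and `'escalate'` to one with `escalated: true`, passing `state.failureKinds` and the run length as `failureKinds` and `count`.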