#3 Fuzzy patch applier: new pure fuzzy-match.ts (locateMatch, exact→trim→unicode-canon→Levenshtein≥0.66, refuse-on-ambiguous) wired into pending_changes applyOne/rewindOne so local-model whitespace/unicode drift in old_string no longer loses the edit.

#4 Worktree checkpoint + conversation-trim: checkpoints table + checkpoints.ts (shadow-commit of tracked+untracked into refs/boocode/checkpoints, hooked into the 3 external-agent dispatcher paths) + POST restore route (reset --hard + clean -fd -> transcript trim -> backend-session reset) + "Restore to here" UI.

Built by 3 parallel agents; DB-integration testing caught a created_at self-deletion bug. Coder suite 234 passing; server+coder build + web tsc clean. Builds on v2.7.0-mit. openspec write-edit-robustness.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Write/edit robustness — fuzzy patch applier + worktree checkpoints
Status: in progress (started 2026-06-01)
Source: boocode_code_review_v2.md §1 #3 + #4, §5b/§5d–5e (cline, Apache-2.0 — algorithm clean-reimplemented, not vendored).
Two independent BooCoder hardening features for local quantized models.
#3 — Fuzzy patch applier
Problem: applyOne's edit case (apps/coder/src/services/pending_changes.ts:124) requires an exact
content.includes(oldStr) hit and throws otherwise, then calls content.replace(oldStr, newStr),
which replaces only the first occurrence. rewindOne (line 206) behaves the same way. Local models
(qwen3.6) let old_string drift in whitespace, indentation, or unicode (curly quotes, en/em-dash,
nbsp), so a valid edit fails at apply time with "old_string not found" and is lost.
Design: new pure module apps/coder/src/services/fuzzy-match.ts:
locateMatch(content: string, needle: string): { kind: 'exact'|'fuzzy'; start: number; end: number } | { kind: 'ambiguous'; count: number } | { kind: 'not_found' }.
Match ladder:
- Exact: indexOf. If exactly one → exact span. If >1 → ambiguous (refuse; decision 2026-06-01: safer than silently editing the first).
- Per-line whitespace-insensitive: compare needle lines to file line-windows ignoring per-line trimEnd / leading-trailing blank lines.
- Unicode canonicalization: normalize curly→straight quotes, en/em-dash→-, nbsp→space on both sides, then retry the whitespace pass.
- Levenshtein similarity ≥ 0.66 over line-windows sized to needle's line count; best window wins.
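The first three rungs of the ladder can be sketched as below. This is a minimal illustration, not the fuzzy-match.ts implementation. The helper names canon and windowMatches are hypothetical. The sketch uses a full per-line trim() so that indentation drift also matches, which is a looser reading of the trimEnd rule above. It also assumes that more than one fuzzy hit is refused as ambiguous, mirroring the exact rung.

```typescript
type MatchResult =
  | { kind: "exact" | "fuzzy"; start: number; end: number }
  | { kind: "ambiguous"; count: number }
  | { kind: "not_found" };

// Map the unicode variants named in the ladder to their ASCII forms.
function canon(s: string): string {
  return s
    .replace(/[\u2018\u2019]/g, "'") // curly single quotes
    .replace(/[\u201C\u201D]/g, '"') // curly double quotes
    .replace(/[\u2013\u2014]/g, "-") // en dash, em dash
    .replace(/\u00A0/g, " "); // non-breaking space
}

// Return the file span of every line-window whose normalized lines equal the needle's.
function windowMatches(
  content: string,
  needle: string,
  norm: (line: string) => string,
): Array<{ start: number; end: number }> {
  const fileLines = content.split("\n");
  const want = needle.split("\n");
  // Ignore leading and trailing blank lines in the needle.
  while (want.length > 0 && want[0].trim() === "") want.shift();
  while (want.length > 0 && want[want.length - 1].trim() === "") want.pop();
  if (want.length === 0) return [];
  const wantNorm = want.map(norm);

  const hits: Array<{ start: number; end: number }> = [];
  let offset = 0; // character offset of fileLines[i] within content
  for (let i = 0; i < fileLines.length; i++) {
    if (
      i + wantNorm.length <= fileLines.length &&
      wantNorm.every((w, k) => norm(fileLines[i + k]) === w)
    ) {
      // The span covers the real file text of the window, without its final newline.
      const text = fileLines.slice(i, i + wantNorm.length).join("\n");
      hits.push({ start: offset, end: offset + text.length });
    }
    offset += fileLines[i].length + 1;
  }
  return hits;
}

function locateMatch(content: string, needle: string): MatchResult {
  if (needle.length === 0) return { kind: "not_found" };

  // Rung 1: exact match; more than one occurrence is refused.
  const first = content.indexOf(needle);
  if (first !== -1) {
    let count = 0;
    for (let i = first; i !== -1; i = content.indexOf(needle, i + needle.length)) count++;
    return count === 1
      ? { kind: "exact", start: first, end: first + needle.length }
      : { kind: "ambiguous", count };
  }

  // Rung 2: per-line whitespace-insensitive. Rung 3: the same after unicode canonicalization.
  const passes = [(l: string) => l.trim(), (l: string) => canon(l).trim()];
  for (const norm of passes) {
    const hits = windowMatches(content, needle, norm);
    if (hits.length > 1) return { kind: "ambiguous", count: hits.length };
    const [hit] = hits;
    if (hit) return { kind: "fuzzy", start: hit.start, end: hit.end };
  }
  return { kind: "not_found" };
}
```

Because both sides go through the same normalizer, a needle with curly quotes and different indentation resolves to the span of the real file lines, which is what the caller then overwrites.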
Non-exact (fuzzy) matches return the actual file span so the caller replaces the real file text with
new_string. pending_changes.ts applyOne/rewindOne use locateMatch; ambiguous/not_found
return success:false with a clear message (no throw escaping the existing catch). Unit-tested
(apps/coder/src/services/__tests__/fuzzy-match.test.ts), per the turn-guard.ts pure-helper pattern.
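The last rung and the caller's replacement step can be sketched together. This is an illustrative sketch under assumptions: similarity is taken as 1 - distance / max(length), which the text above does not spell out, and bestFuzzyWindow and applySpan are hypothetical names rather than exports of fuzzy-match.ts.

```typescript
// Classic two-row dynamic-programming edit distance.
function levenshtein(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const cur = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      cur[j] = Math.min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = cur;
  }
  return prev[b.length];
}

// Assumed similarity measure: 1.0 for identical strings, 0.0 for nothing in common.
function similarity(a: string, b: string): number {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest;
}

// Slide a window of the needle's line count over the file; keep the best score at or above the threshold.
function bestFuzzyWindow(
  content: string,
  needle: string,
  threshold = 0.66,
): { start: number; end: number; score: number } | null {
  const fileLines = content.split("\n");
  const n = needle.split("\n").length;
  let best: { start: number; end: number; score: number } | null = null;
  let offset = 0; // character offset of fileLines[i] within content
  for (let i = 0; i + n <= fileLines.length; i++) {
    const windowText = fileLines.slice(i, i + n).join("\n");
    const score = similarity(windowText, needle);
    if (score >= threshold && (best === null || score > best.score)) {
      best = { start: offset, end: offset + windowText.length, score };
    }
    offset += fileLines[i].length + 1;
  }
  return best;
}

// Replace the located span of real file text with new_string.
function applySpan(
  content: string,
  span: { start: number; end: number },
  newStr: string,
): string {
  return content.slice(0, span.start) + newStr + content.slice(span.end);
}
```

Replacing by span rather than by content.replace(needle, ...) is what makes the fuzzy rungs usable: the drifted needle never appears in the file, so only the located offsets can identify the text to overwrite.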
#4 — Worktree checkpoint + conversation-trim
Problem: rewind only reverses BooCode's own pending_changes (applied to the project root).
External agents (opencode/goose/qwen/claude) write directly into the session worktree
(/tmp/booworktrees/sess-<id>); rewind has zero coverage there.
Schema (apps/coder/src/schema.sql):
CREATE TABLE IF NOT EXISTS checkpoints (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
chat_id UUID NOT NULL REFERENCES chats(id) ON DELETE CASCADE,
session_id UUID,
worktree_id UUID REFERENCES worktrees(id) ON DELETE SET NULL,
message_id UUID, -- anchor: the assistant turn row this checkpoint precedes
commit_sha TEXT NOT NULL, -- shadow-commit capturing the pre-turn worktree tree
label TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT clock_timestamp()
);
CREATE INDEX IF NOT EXISTS checkpoints_chat_created_idx ON checkpoints(chat_id, created_at);
Create (apps/coder/src/services/checkpoints.ts → createCheckpoint): hooked into the three
external-agent dispatch paths in dispatcher.ts (runWarmAcpTask ~821, runOpenCodeServerTask ~513,
runExternalAgent ~255) — after ensureSessionWorktree() and the assistant-message insert (so the
anchor message_id exists), before the backend runs. Snapshot captures tracked + untracked via a
temp-index shadow commit, stored in a private GC-safe ref:
cd <wt> && TMP=$(mktemp) && GIT_INDEX_FILE="$TMP" git read-tree HEAD \
&& GIT_INDEX_FILE="$TMP" git add -A \
&& TREE=$(GIT_INDEX_FILE="$TMP" git write-tree) \
&& SHA=$(git commit-tree "$TREE" -p HEAD -m "boocode checkpoint") \
&& git update-ref refs/boocode/checkpoints/<id> "$SHA" && rm -f "$TMP" && echo "$SHA"
Best-effort: a checkpoint failure logs and never breaks the turn. Native-boocode turns (project-root, rewind-covered) get no checkpoint.
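The shell pipeline above maps onto a small function once the git invocation is injected. This is a minimal sketch with several assumptions. GitRunner is a hypothetical synchronous runner that executes git in the worktree with optional extra environment variables and returns trimmed stdout. createShadowCommit and tryCreateShadowCommit are illustrative names, not the actual checkpoints.ts API. The caller is assumed to supply and clean up the temp index path.

```typescript
// Hypothetical runner: executes `git <args>` in the session worktree and returns trimmed stdout.
type GitRunner = (args: string[], env?: Record<string, string>) => string;

// Snapshot tracked + untracked files into a commit without touching the real index or HEAD.
function createShadowCommit(
  git: GitRunner,
  checkpointId: string,
  tmpIndexPath: string,
): string {
  const env = { GIT_INDEX_FILE: tmpIndexPath };
  git(["read-tree", "HEAD"], env); // seed the temp index from HEAD
  git(["add", "-A"], env); // stage tracked + untracked changes into the temp index only
  const tree = git(["write-tree"], env); // tree object for the staged state
  const sha = git(["commit-tree", tree, "-p", "HEAD", "-m", "boocode checkpoint"]);
  // A ref under refs/ keeps the commit reachable, so gc does not prune it.
  git(["update-ref", `refs/boocode/checkpoints/${checkpointId}`, sha]);
  return sha;
}

// Best-effort wrapper: a checkpoint failure is logged and never breaks the turn.
function tryCreateShadowCommit(
  git: GitRunner,
  checkpointId: string,
  tmpIndexPath: string,
  log: (msg: string) => void,
): string | null {
  try {
    return createShadowCommit(git, checkpointId, tmpIndexPath);
  } catch (err) {
    log(`checkpoint ${checkpointId} failed: ${String(err)}`);
    return null;
  }
}
```

In a real worktree the runner could wrap Node's execFileSync("git", args, { cwd, env, encoding: "utf8" }); that wiring is left out so the sketch stays free of I/O and can be exercised with a fake runner.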
Restore (POST /api/sessions/:sessionId/checkpoints/:checkpointId/restore, proxied /api/coder/*):
- Resolve + validate the checkpoint belongs to the session.
- Reset worktree: git -C <wt> reset --hard <commit_sha> && git -C <wt> clean -fd (hostExec + shellEscape).
- Trim transcript: DELETE FROM messages WHERE chat_id = <cp.chat_id> AND created_at >= (SELECT created_at FROM messages WHERE id = <cp.message_id>) (+ explicit message_parts delete if the FK isn't ON DELETE CASCADE; verify).
- Reset backend (decision 2026-06-01): UPDATE agent_sessions SET status='crashed' WHERE chat_id=<cp.chat_id> and evict the live pool session for (chat, agent) if present, so the next turn re-establishes a fresh backend, leaving transcript, files, and agent context all consistent at the restore point. (Warm backends hold context server-side; no partial rewind exists.)
- Delete now-orphaned later checkpoints: DELETE FROM checkpoints WHERE chat_id=? AND created_at > <cp.created_at>.
- Return { checkpoint_id, messages_deleted, worktree_reset, backend_reset }.
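The two DELETE predicates in the restore steps can be mirrored as a pure helper, which makes the boundary conditions easy to unit-test without a database. This is a sketch under assumptions: planRestore and the row interfaces are hypothetical, and created_at is modeled as epoch milliseconds instead of TIMESTAMPTZ.

```typescript
interface MessageRow {
  id: string;
  chat_id: string;
  created_at: number; // epoch ms, a simplification of TIMESTAMPTZ
}

interface CheckpointRow {
  id: string;
  chat_id: string;
  message_id: string | null;
  created_at: number;
}

interface RestorePlan {
  messageIds: string[]; // rows the transcript trim would delete
  checkpointIds: string[]; // later checkpoints that become orphaned
}

function planRestore(
  cp: CheckpointRow,
  messages: MessageRow[],
  checkpoints: CheckpointRow[],
): RestorePlan {
  const anchor = messages.find((m) => m.id === cp.message_id);
  // Mirrors the SQL: a missing anchor makes the subquery NULL, so no message row matches.
  const messageIds =
    anchor === undefined
      ? []
      : messages
          .filter((m) => m.chat_id === cp.chat_id && m.created_at >= anchor.created_at)
          .map((m) => m.id);
  // Strictly later (>), so the checkpoint being restored is not deleted by its own restore.
  const checkpointIds = checkpoints
    .filter((c) => c.chat_id === cp.chat_id && c.created_at > cp.created_at)
    .map((c) => c.id);
  return { messageIds, checkpointIds };
}
```

The asymmetry is deliberate in the SQL above: the anchor message is removed along with everything after it (>=), while the checkpoint that precedes it is kept (>).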
Frontend: per-message "Restore to here" in CoderMessageList.tsx (via a new optional
onRestoreCheckpoint?(chatId, messageId) on MessageActions in MessageBubble.tsx), wired in
CoderPane.tsx; guarded to status==='complete' and to messages that have a checkpoint. After the call
returns, refetch the chat's messages (existing GET) — no new WS frame required.
Decisions (2026-06-01)
- Multi-exact-match → refuse as ambiguous (#3).
- #4 full scope incl. conversation-trim.
- Restore resets the external-agent backend session (context re-established fresh).
Parallelization
- Unit 1 (#3): fully independent (fuzzy-match.ts + pending_changes.ts + test).
- Unit 2 (#4 backend): schema + checkpoints.ts (create+restore) + 3 dispatcher hooks + restore route + backend reset. One agent owns all #4 coder backend (shared checkpoints.ts).
- Unit 3 (#4 frontend): CoderMessageList/MessageBubble/CoderPane, against the pinned restore contract. Parallel with Unit 2. MUST NOT touch Sam's uncommitted WIP (ChatTabBar, SessionLandingPage, Workspace, useWorkspacePanes, PaneHeaderActions).
Verify
- pnpm -C apps/coder test (incl. new fuzzy-match + any checkpoint pure-helper tests)
- pnpm -C apps/server build then pnpm -C apps/coder build
- npx tsc -p apps/web/tsconfig.app.json --noEmit
- Live smoke (manual, host): external-agent edit → checkpoint row; "Restore to here" → worktree reset + transcript trimmed + next turn fresh.