boocode/openspec/changes/write-edit-robustness/proposal.md
indifferentketchup 59f07e8cb8 feat: write/edit robustness — fuzzy patch applier + worktree checkpoints (v2.7.1)
#3 Fuzzy patch applier: new pure fuzzy-match.ts (locateMatch, exact→trim→
unicode-canon→Levenshtein≥0.66, refuse-on-ambiguous) wired into pending_changes
applyOne/rewindOne so local-model whitespace/unicode drift in old_string no
longer loses the edit.

#4 Worktree checkpoint + conversation-trim: checkpoints table + checkpoints.ts
(shadow-commit of tracked+untracked into refs/boocode/checkpoints, hooked into
the 3 external-agent dispatcher paths) + POST restore route (reset --hard +
clean -fd -> transcript trim -> backend-session reset) + "Restore to here" UI.

Built by 3 parallel agents; DB-integration testing caught a created_at
self-deletion bug. Coder suite 234 passing; server+coder build + web tsc clean.
Builds on v2.7.0-mit. openspec write-edit-robustness.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 12:01:57 +00:00


Write/edit robustness — fuzzy patch applier + worktree checkpoints

Status: in progress (started 2026-06-01).
Source: boocode_code_review_v2.md §1 #3 + #4, §5b/§5d5e (cline, Apache-2.0; algorithm clean-reimplemented, not vendored).

Two independent BooCoder hardening features for local quantized models.

#3 — Fuzzy patch applier

Problem: applyOne's edit case (apps/coder/src/services/pending_changes.ts:124) requires an exact match: it throws unless content.includes(oldStr), then calls content.replace(oldStr, newStr), which replaces only the first occurrence. rewindOne (line 206) has the same logic. Local models (qwen3.6) drift old_string in whitespace, indentation, or unicode (curly quotes, en/em-dash, nbsp), so a valid edit fails at apply time with "old_string not found" and is lost.

Design: new pure module apps/coder/src/services/fuzzy-match.ts: locateMatch(content: string, needle: string): { kind: 'exact'|'fuzzy'; start: number; end: number } | { kind: 'ambiguous'; count: number } | { kind: 'not_found' }. Match ladder:

  1. Exact indexOf. Exactly one occurrence → exact span. More than one → ambiguous (refuse; decision 2026-06-01: safer than silently editing the first).
  2. Per-line whitespace-insensitive: compare the needle's lines against same-length windows of file lines, applying trimEnd to each line and ignoring leading/trailing blank lines in the needle.
  3. Unicode canonicalization — normalize curly→straight quotes, en/em-dash→-, nbsp→space on both sides, then retry the whitespace pass.
  4. Levenshtein similarity ≥ 0.66 over line-windows sized to the needle's line count; the highest-scoring window wins.
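A minimal TypeScript sketch of the ladder, assuming the locateMatch signature above. The helper names (canon, trimBlankEdges, levenshtein, similarity) are hypothetical, canonicalization covers only the characters listed in step 3, and two details are assumptions rather than spec: the fuzzy tiers also refuse on multiple hits, and similarity is taken as 1 - distance / max(length).

```typescript
// Sketch of the locateMatch ladder. Helper names and normalization rules
// are illustrative assumptions, not the real fuzzy-match.ts.
type Located =
  | { kind: 'exact' | 'fuzzy'; start: number; end: number }
  | { kind: 'ambiguous'; count: number }
  | { kind: 'not_found' };

const SIMILARITY_THRESHOLD = 0.66;

// Tier 3: curly quotes -> straight, en/em-dash -> '-', nbsp -> space.
function canon(s: string): string {
  return s
    .replace(/[\u2018\u2019]/g, "'")
    .replace(/[\u201C\u201D]/g, '"')
    .replace(/[\u2013\u2014]/g, '-')
    .replace(/\u00A0/g, ' ');
}

// Drop blank lines at the start and end of the needle.
function trimBlankEdges(lines: string[]): string[] {
  let a = 0;
  let b = lines.length;
  while (a < b && lines[a].trim() === '') a++;
  while (b > a && lines[b - 1].trim() === '') b--;
  return lines.slice(a, b);
}

// Two-row dynamic-programming edit distance.
function levenshtein(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const cur = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      cur[j] = Math.min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = cur;
  }
  return prev[b.length];
}

function similarity(a: string, b: string): number {
  const max = Math.max(a.length, b.length);
  return max === 0 ? 1 : 1 - levenshtein(a, b) / max;
}

function locateMatch(content: string, needle: string): Located {
  if (needle.trim() === '') return { kind: 'not_found' };

  // Tier 1: exact indexOf; more than one hit is ambiguous.
  const first = content.indexOf(needle);
  if (first !== -1) {
    let count = 0;
    for (let i = first; i !== -1; i = content.indexOf(needle, i + 1)) count++;
    return count === 1
      ? { kind: 'exact', start: first, end: first + needle.length }
      : { kind: 'ambiguous', count };
  }

  // Split the file into lines and record each line's starting offset.
  const lines = content.split('\n');
  const starts: number[] = [];
  let offset = 0;
  for (const line of lines) {
    starts.push(offset);
    offset += line.length + 1; // +1 for the '\n' removed by split
  }
  const want = trimBlankEdges(needle.split('\n'));
  const n = want.length;
  // Character span of the real file text covered by window [i, i + n).
  const span = (i: number) => ({ start: starts[i], end: starts[i + n - 1] + lines[i + n - 1].length });

  // Tiers 2 and 3: trimEnd per line, then canonicalize + trimEnd.
  const tiers = [(s: string) => s.trimEnd(), (s: string) => canon(s).trimEnd()];
  for (const norm of tiers) {
    const target = want.map(norm);
    const hits: number[] = [];
    for (let i = 0; i + n <= lines.length; i++) {
      if (target.every((t, k) => norm(lines[i + k]) === t)) hits.push(i);
    }
    if (hits.length === 1) return { kind: 'fuzzy', ...span(hits[0]) };
    if (hits.length > 1) return { kind: 'ambiguous', count: hits.length };
  }

  // Tier 4: best-scoring Levenshtein window at or above the threshold.
  const joined = (ls: string[]) => ls.map((l) => canon(l).trimEnd()).join('\n');
  const target = joined(want);
  let best = -1;
  let bestScore = 0;
  for (let i = 0; i + n <= lines.length; i++) {
    const score = similarity(joined(lines.slice(i, i + n)), target);
    if (score > bestScore) {
      bestScore = score;
      best = i;
    }
  }
  return best !== -1 && bestScore >= SIMILARITY_THRESHOLD
    ? { kind: 'fuzzy', ...span(best) }
    : { kind: 'not_found' };
}
```

In this naive form the Levenshtein tier costs roughly O(file lines × needle length²), which is fine for edit-sized needles but worth bounding on very large files.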

Non-exact (fuzzy) matches return the actual file span so the caller replaces the real file text with new_string. pending_changes.ts applyOne/rewindOne use locateMatch; ambiguous/not_found return success:false with a clear message (no throw escaping the existing catch). Unit-tested (apps/coder/src/services/__tests__/fuzzy-match.test.ts), per the turn-guard.ts pure-helper pattern.
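One way applyOne could consume the result, sketched under the assumption that it splices by offset instead of calling content.replace. EditResult and applyLocated are hypothetical names. Splicing also sidesteps a quirk of the current replace(oldStr, newStr) call: a string replacement argument expands special patterns such as $& and $$, so a new_string containing them would be silently altered.

```typescript
// Mirrors the locateMatch result type described above.
type Located =
  | { kind: 'exact' | 'fuzzy'; start: number; end: number }
  | { kind: 'ambiguous'; count: number }
  | { kind: 'not_found' };

// Hypothetical result shape, modelled on the success:false contract.
type EditResult =
  | { success: true; content: string; fuzzy: boolean }
  | { success: false; message: string };

function applyLocated(content: string, loc: Located, newStr: string): EditResult {
  switch (loc.kind) {
    case 'exact':
    case 'fuzzy':
      // Replace the real file span, so drift in old_string never leaks
      // into the output and '$' patterns in newStr stay literal.
      return {
        success: true,
        content: content.slice(0, loc.start) + newStr + content.slice(loc.end),
        fuzzy: loc.kind === 'fuzzy',
      };
    case 'ambiguous':
      return {
        success: false,
        message: `old_string matches ${loc.count} locations; add context to make it unique`,
      };
    case 'not_found':
      return { success: false, message: 'old_string not found (exact and fuzzy passes failed)' };
  }
}
```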

#4 — Worktree checkpoint + conversation-trim

Problem: rewind only reverses BooCode's own pending_changes (applied to the project root). External agents (opencode/goose/qwen/claude) write directly into the session worktree (/tmp/booworktrees/sess-<id>); rewind has zero coverage there.

Schema (apps/coder/src/schema.sql):

CREATE TABLE IF NOT EXISTS checkpoints (
  id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  chat_id     UUID NOT NULL REFERENCES chats(id) ON DELETE CASCADE,
  session_id  UUID,
  worktree_id UUID REFERENCES worktrees(id) ON DELETE SET NULL,
  message_id  UUID,            -- anchor: the assistant turn row this checkpoint precedes
  commit_sha  TEXT NOT NULL,   -- shadow-commit capturing the pre-turn worktree tree
  label       TEXT,
  created_at  TIMESTAMPTZ NOT NULL DEFAULT clock_timestamp()
);
CREATE INDEX IF NOT EXISTS checkpoints_chat_created_idx ON checkpoints(chat_id, created_at);

Create (apps/coder/src/services/checkpoints.ts, createCheckpoint): hooked into the three external-agent dispatch paths in dispatcher.ts (runWarmAcpTask ~821, runOpenCodeServerTask ~513, runExternalAgent ~255), after ensureSessionWorktree() and the assistant-message insert (so the anchor message_id exists) and before the backend runs. The snapshot captures tracked and untracked files via a shadow commit built from a temporary index, stored under a private ref so that git gc does not prune it:

cd <wt> && TMP=$(mktemp) && trap 'rm -f "$TMP"' EXIT \
  && GIT_INDEX_FILE="$TMP" git read-tree HEAD \
  && GIT_INDEX_FILE="$TMP" git add -A \
  && TREE=$(GIT_INDEX_FILE="$TMP" git write-tree) \
  && SHA=$(git commit-tree "$TREE" -p HEAD -m "boocode checkpoint") \
  && git update-ref refs/boocode/checkpoints/<id> "$SHA" && echo "$SHA"

Best-effort: a checkpoint failure is logged and never breaks the turn. Native BooCode turns (which edit the project root and are already covered by rewind) get no checkpoint.
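A sketch of how createCheckpoint might assemble the snapshot command and stay best-effort. shellEscape here is a local POSIX single-quote escaper standing in for the project's own helper, and Exec is an assumed shape for hostExec (stdout plus exit code); neither reflects the real signatures.

```typescript
// Hypothetical: Exec models hostExec as returning stdout and an exit code.
type Exec = (cmd: string) => Promise<{ stdout: string; code: number }>;

// POSIX single-quote escaping: ' becomes '\''.
function shellEscape(s: string): string {
  return `'${s.replace(/'/g, `'\\''`)}'`;
}

function buildCheckpointCmd(worktree: string, checkpointId: string): string {
  const ref = shellEscape(`refs/boocode/checkpoints/${checkpointId}`);
  return [
    `cd ${shellEscape(worktree)}`,
    'TMP=$(mktemp)',
    `trap 'rm -f "$TMP"' EXIT`, // remove the temp index even on failure
    'GIT_INDEX_FILE="$TMP" git read-tree HEAD',
    'GIT_INDEX_FILE="$TMP" git add -A', // stage tracked + untracked
    'TREE=$(GIT_INDEX_FILE="$TMP" git write-tree)',
    'SHA=$(git commit-tree "$TREE" -p HEAD -m "boocode checkpoint")',
    `git update-ref ${ref} "$SHA"`, // the ref keeps the commit reachable
    'echo "$SHA"',
  ].join(' && ');
}

// Accept a SHA-1 (40 hex) or SHA-256 (64 hex) object name.
function parseSha(stdout: string): string | null {
  const sha = stdout.trim();
  return /^[0-9a-f]{40}([0-9a-f]{24})?$/.test(sha) ? sha : null;
}

// Best-effort: any failure is logged and reported as null, never thrown.
async function createCheckpointBestEffort(exec: Exec, worktree: string, id: string): Promise<string | null> {
  try {
    const { stdout, code } = await exec(buildCheckpointCmd(worktree, id));
    const sha = code === 0 ? parseSha(stdout) : null;
    if (sha === null) console.warn(`checkpoint ${id}: snapshot failed (exit ${code})`);
    return sha;
  } catch (err) {
    console.warn(`checkpoint ${id}: snapshot threw`, err);
    return null;
  }
}
```

In this sketch a dispatcher hook would insert the checkpoints row only when a SHA comes back, so a failed snapshot leaves no dangling row.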

Restore (POST /api/sessions/:sessionId/checkpoints/:checkpointId/restore, proxied via /api/coder/*):

  1. Resolve the checkpoint and validate that it belongs to the session.
  2. Reset worktree: git -C <wt> reset --hard <commit_sha> && git -C <wt> clean -fd (hostExec+shellEscape).
  3. Trim transcript: DELETE FROM messages WHERE chat_id = <cp.chat_id> AND created_at >= (SELECT created_at FROM messages WHERE id = <cp.message_id>), plus an explicit message_parts delete if that FK is not ON DELETE CASCADE (to be verified).
  4. Reset backend (decision 2026-06-01): UPDATE agent_sessions SET status='crashed' WHERE chat_id=<cp.chat_id>, and evict the live pool session for (chat, agent) if present, so the next turn re-establishes a fresh backend. Transcript, files, and agent context are then all consistent at the restore point. (Warm backends hold context server-side and support no partial rewind.)
  5. Delete the later checkpoints, which are now orphaned: DELETE FROM checkpoints WHERE chat_id = <cp.chat_id> AND created_at > <cp.created_at>.
  6. Return { checkpoint_id, messages_deleted, worktree_reset, backend_reset }.
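Steps 1 through 5 can be sketched as a pure planner whose output a route handler executes in order. CheckpointRow, Stmt, and planRestore are hypothetical; the $1/$2 placeholders assume node-postgres-style parameters, ownership is assumed to be checked via checkpoints.session_id, and evicting the live pool session (the in-memory half of step 4) is left to the caller. The orphan delete compares against the stored created_at inside SQL rather than passing a JavaScript Date back in: a Date keeps only millisecond precision while timestamptz keeps microseconds, so a truncated bound would sort before the stored value and the restored checkpoint itself would match created_at > bound.

```typescript
// Hypothetical shapes; columns mirror the checkpoints schema.
interface CheckpointRow {
  id: string;
  chat_id: string;
  session_id: string | null;
  message_id: string | null;
  commit_sha: string;
}

interface Stmt {
  text: string;
  values: unknown[];
}

interface RestorePlan {
  gitCmd: string; // step 2
  statements: Stmt[]; // steps 3-5, in order
}

// Local stand-in for the project's shellEscape (POSIX single quotes).
function shellEscape(s: string): string {
  return `'${s.replace(/'/g, `'\\''`)}'`;
}

function planRestore(cp: CheckpointRow, sessionId: string, worktree: string): RestorePlan {
  // Step 1: ownership check (assumed to go through checkpoints.session_id).
  if (cp.session_id !== sessionId) throw new Error('checkpoint does not belong to this session');
  if (!/^[0-9a-f]{40}([0-9a-f]{24})?$/.test(cp.commit_sha)) throw new Error('malformed commit_sha');

  const wt = shellEscape(worktree);
  const statements: Stmt[] = [];

  // Step 3: a NULL anchor makes the subquery NULL and the DELETE a silent
  // no-op, so skip the trim explicitly instead.
  if (cp.message_id !== null) {
    statements.push({
      text:
        'DELETE FROM messages WHERE chat_id = $1 AND created_at >= ' +
        '(SELECT created_at FROM messages WHERE id = $2)',
      values: [cp.chat_id, cp.message_id],
    });
  }
  // Step 4, database half only.
  statements.push({
    text: "UPDATE agent_sessions SET status = 'crashed' WHERE chat_id = $1",
    values: [cp.chat_id],
  });
  // Step 5: compare against the stored timestamp inside SQL.
  statements.push({
    text:
      'DELETE FROM checkpoints WHERE chat_id = $1 AND created_at > ' +
      '(SELECT created_at FROM checkpoints WHERE id = $2)',
    values: [cp.chat_id, cp.id],
  });

  return {
    gitCmd: `git -C ${wt} reset --hard ${cp.commit_sha} && git -C ${wt} clean -fd`,
    statements,
  };
}
```

One reasonable ordering is to run gitCmd first and apply the statements in a single transaction only if it succeeds, so a failed worktree reset leaves the transcript untouched.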

Frontend: a per-message "Restore to here" action in CoderMessageList.tsx (via a new optional onRestoreCheckpoint?(chatId, messageId) on MessageActions in MessageBubble.tsx), wired in CoderPane.tsx and shown only for messages with status==='complete' that have a checkpoint. After the call returns, the pane refetches the chat's messages through the existing GET, so no new WS frame is required.
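A sketch of the guard and the pane-level handler. It assumes the pane keeps a map from message id to checkpoint id (the restore route is keyed by checkpoint, while onRestoreCheckpoint receives a message id) and that the browser-facing URL is the /api/coder prefix followed by the route path. restoreAction, makeRestoreHandler, post, and refetchMessages are hypothetical names.

```typescript
// Hypothetical shapes; the real props live in MessageBubble.tsx / CoderPane.tsx.
interface MessageLike {
  id: string;
  chat_id: string;
  status: string;
}

type RestoreFn = (chatId: string, messageId: string) => Promise<void>;

// Returns a click handler only for completed messages that have a checkpoint.
function restoreAction(
  msg: MessageLike,
  checkpointByMessage: ReadonlyMap<string, string>,
  onRestoreCheckpoint?: RestoreFn,
): (() => Promise<void>) | undefined {
  if (!onRestoreCheckpoint) return undefined;
  if (msg.status !== 'complete' || !checkpointByMessage.has(msg.id)) return undefined;
  const restore: RestoreFn = onRestoreCheckpoint;
  return () => restore(msg.chat_id, msg.id);
}

// Resolve the checkpoint, POST the restore, then refetch the chat's messages.
function makeRestoreHandler(
  sessionId: string,
  checkpointByMessage: ReadonlyMap<string, string>,
  post: (url: string) => Promise<unknown>,
  refetchMessages: (chatId: string) => Promise<void>,
): RestoreFn {
  return async (chatId, messageId) => {
    const checkpointId = checkpointByMessage.get(messageId);
    if (!checkpointId) return;
    const url =
      `/api/coder/sessions/${encodeURIComponent(sessionId)}` +
      `/checkpoints/${encodeURIComponent(checkpointId)}/restore`;
    await post(url);
    await refetchMessages(chatId);
  };
}
```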

Decisions (2026-06-01)

  • Multi-exact-match → refuse as ambiguous (#3).
  • #4 ships at full scope, including conversation-trim.
  • Restore resets the external-agent backend session (context re-established fresh).

Parallelization

  • Unit 1 (#3) — fully independent (fuzzy-match.ts + pending_changes.ts + test).
  • Unit 2 (#4 backend) — schema + checkpoints.ts (create+restore) + 3 dispatcher hooks + restore route + backend reset. One agent owns the entire #4 coder backend, since checkpoints.ts is shared.
  • Unit 3 (#4 frontend): CoderMessageList/MessageBubble/CoderPane, built against the pinned restore contract and run in parallel with Unit 2. MUST NOT touch Sam's uncommitted WIP (ChatTabBar, SessionLandingPage, Workspace, useWorkspacePanes, PaneHeaderActions).

Verify

  • pnpm -C apps/coder test (incl. new fuzzy-match + any checkpoint pure-helper tests)
  • pnpm -C apps/server build then pnpm -C apps/coder build
  • npx tsc -p apps/web/tsconfig.app.json --noEmit
  • Live smoke (manual, on host): an external-agent edit creates a checkpoint row; "Restore to here" resets the worktree, trims the transcript, and the next turn starts on a fresh backend.