#3 Fuzzy patch applier: new pure fuzzy-match.ts (locateMatch, exact→trim→unicode-canon→Levenshtein≥0.66, refuse-on-ambiguous) wired into pending_changes applyOne/rewindOne so local-model whitespace/unicode drift in old_string no longer loses the edit.

#4 Worktree checkpoint + conversation-trim: checkpoints table + checkpoints.ts (shadow-commit of tracked+untracked into refs/boocode/checkpoints, hooked into the 3 external-agent dispatcher paths) + POST restore route (reset --hard + clean -fd -> transcript trim -> backend-session reset) + "Restore to here" UI.

Built by 3 parallel agents; DB-integration testing caught a created_at self-deletion bug. Coder suite 234 passing; server+coder build + web tsc clean. Builds on v2.7.0-mit. openspec write-edit-robustness.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

# Write/edit robustness — fuzzy patch applier + worktree checkpoints

**Status:** in progress (started 2026-06-01)
**Source:** `boocode_code_review_v2.md` §1 #3 + #4, §5b/§5d–5e (cline, Apache-2.0; algorithm clean-reimplemented, not vendored).

Two independent BooCoder hardening features for local quantized models.

## #3 — Fuzzy patch applier

**Problem:** `applyOne`'s edit case (`apps/coder/src/services/pending_changes.ts:124`) does an exact
`content.includes(oldStr)` check and throws on a miss, then calls `content.replace(oldStr, newStr)`,
which replaces only the first occurrence. `rewindOne` (line 206) behaves the same way. Local models
(qwen3.6) drift `old_string` by whitespace, indentation, or unicode (curly quotes, en/em-dash, nbsp),
so a valid edit fails at apply time with "old_string not found" and is lost.

**Design:** new pure module `apps/coder/src/services/fuzzy-match.ts`:
`locateMatch(content: string, needle: string): { kind: 'exact'|'fuzzy'; start: number; end: number }
| { kind: 'ambiguous'; count: number } | { kind: 'not_found' }`. Match ladder:
1. **Exact** `indexOf`. If there is exactly one occurrence, return the exact span. If there is more
   than one, return **ambiguous** (refuse; decision 2026-06-01: safer than silently editing the first).
2. **Per-line whitespace-insensitive**: compare `needle` lines to file line-windows, ignoring per-line
   trailing whitespace (`trimEnd`) and leading/trailing blank lines.
3. **Unicode canonicalization**: normalize curly→straight quotes, en/em-dash→`-`, and nbsp→space on
   both sides, then retry the whitespace pass.
4. **Levenshtein** similarity ≥ 0.66 over line-windows sized to `needle`'s line count; the best window wins.

Non-exact (fuzzy) matches return the actual file span, so the caller replaces the real file text with
`new_string`. In `pending_changes.ts`, `applyOne` and `rewindOne` call `locateMatch`; `ambiguous` and
`not_found` return `success:false` with a clear message (no throw escapes the existing catch). Unit-tested
in `apps/coder/src/services/__tests__/fuzzy-match.test.ts`, following the `turn-guard.ts` pure-helper pattern.

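The match ladder in this section can be sketched as a single pure function. This is a minimal illustration under stated assumptions, not the contents of `fuzzy-match.ts`: the helper names (`canon`, `levenshtein`, `similarity`) are hypothetical, multiple hits in a non-exact pass are treated as `ambiguous`, the Levenshtein pass also ignores indentation, and fuzzy spans cover whole lines without the final newline.

```typescript
type MatchResult =
  | { kind: 'exact' | 'fuzzy'; start: number; end: number }
  | { kind: 'ambiguous'; count: number }
  | { kind: 'not_found' };

// Step 3 helper: curly quotes -> straight, en/em dash -> '-', nbsp -> space.
function canon(s: string): string {
  return s
    .replace(/[\u2018\u2019]/g, "'")
    .replace(/[\u201C\u201D]/g, '"')
    .replace(/[\u2013\u2014]/g, '-')
    .replace(/\u00A0/g, ' ');
}

// Two-row Levenshtein edit distance.
function levenshtein(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const cur: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      cur[j] = Math.min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = cur;
  }
  return prev[b.length];
}

function similarity(a: string, b: string): number {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest;
}

function locateMatch(content: string, needle: string): MatchResult {
  // Step 1: exact match; more than one occurrence is refused as ambiguous.
  let count = 0;
  let first = -1;
  if (needle.length > 0) {
    for (let i = content.indexOf(needle); i !== -1; i = content.indexOf(needle, i + 1)) {
      if (count === 0) first = i;
      count++;
    }
  }
  if (count === 1) return { kind: 'exact', start: first, end: first + needle.length };
  if (count > 1) return { kind: 'ambiguous', count };

  // Line-window setup shared by steps 2-4: offsets[i] is where line i starts.
  const lines = content.split('\n');
  const offsets: number[] = [];
  let pos = 0;
  for (const line of lines) {
    offsets.push(pos);
    pos += line.length + 1;
  }
  const wanted = needle.split('\n');
  while (wanted.length > 0 && wanted[0].trim() === '') wanted.shift();
  while (wanted.length > 0 && wanted[wanted.length - 1].trim() === '') wanted.pop();
  const n = wanted.length;
  if (n === 0 || n > lines.length) return { kind: 'not_found' };

  const windowText = (i: number, norm: (s: string) => string): string =>
    lines.slice(i, i + n).map(norm).join('\n');
  const span = (i: number): MatchResult => ({
    kind: 'fuzzy',
    start: offsets[i],
    end: offsets[i + n - 1] + lines[i + n - 1].length,
  });

  // Step 2 (trailing whitespace ignored), then step 3 (canonicalize and retry).
  const passes: Array<(s: string) => string> = [(s) => s.trimEnd(), (s) => canon(s).trimEnd()];
  for (const norm of passes) {
    const wantedText = wanted.map(norm).join('\n');
    const hits: number[] = [];
    for (let i = 0; i + n <= lines.length; i++) {
      if (windowText(i, norm) === wantedText) hits.push(i);
    }
    if (hits.length === 1) return span(hits[0]);
    if (hits.length > 1) return { kind: 'ambiguous', count: hits.length }; // assumption
  }

  // Step 4: best Levenshtein window, accepted only at similarity >= 0.66.
  const loose = (s: string): string => canon(s).trim(); // assumption: ignore indentation too
  const looseTarget = wanted.map(loose).join('\n');
  let best = -1;
  let bestScore = 0;
  for (let i = 0; i + n <= lines.length; i++) {
    const score = similarity(windowText(i, loose), looseTarget);
    if (score > bestScore) {
      best = i;
      bestScore = score;
    }
  }
  return best >= 0 && bestScore >= 0.66 ? span(best) : { kind: 'not_found' };
}
```

A caller in the style of `applyOne` would then splice on the returned span, for example `content.slice(0, m.start) + newStr + content.slice(m.end)`, and report `success:false` for the `ambiguous` and `not_found` kinds.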
## #4 — Worktree checkpoint + conversation-trim

**Problem:** `rewind` only reverses BooCode's own `pending_changes` (applied to the project root).
External agents (opencode/goose/qwen/claude) write **directly into the session worktree**
(`/tmp/booworktrees/sess-<id>`); rewind has zero coverage there.

**Schema** (`apps/coder/src/schema.sql`):
```sql
CREATE TABLE IF NOT EXISTS checkpoints (
  id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  chat_id     UUID NOT NULL REFERENCES chats(id) ON DELETE CASCADE,
  session_id  UUID,
  worktree_id UUID REFERENCES worktrees(id) ON DELETE SET NULL,
  message_id  UUID,          -- anchor: the assistant turn row this checkpoint precedes
  commit_sha  TEXT NOT NULL, -- shadow-commit capturing the pre-turn worktree tree
  label       TEXT,
  created_at  TIMESTAMPTZ NOT NULL DEFAULT clock_timestamp()
);
CREATE INDEX IF NOT EXISTS checkpoints_chat_created_idx ON checkpoints(chat_id, created_at);
```

**Create** (`apps/coder/src/services/checkpoints.ts` → `createCheckpoint`): hooked into the three
external-agent dispatch paths in `dispatcher.ts` (`runWarmAcpTask` ~821, `runOpenCodeServerTask` ~513,
`runExternalAgent` ~255). The hook runs after `ensureSessionWorktree()` and the assistant-message insert
(so the anchor `message_id` exists) and before the backend runs. The snapshot captures tracked
**+ untracked** files via a temp-index shadow commit, stored in a private GC-safe ref:
```
cd <wt> && TMP=$(mktemp) && GIT_INDEX_FILE="$TMP" git read-tree HEAD \
  && GIT_INDEX_FILE="$TMP" git add -A \
  && TREE=$(GIT_INDEX_FILE="$TMP" git write-tree) \
  && SHA=$(git commit-tree "$TREE" -p HEAD -m "boocode checkpoint") \
  && git update-ref refs/boocode/checkpoints/<id> "$SHA" && rm -f "$TMP" && echo "$SHA"
```
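The same shadow-commit sequence could be driven from TypeScript roughly as follows. This is a sketch, not the body of `createCheckpoint`: the `GitRunner` type and the `createShadowCommit` name are hypothetical, and the runner is injected so the step order can be exercised without a real repository.

```typescript
// Hypothetical runner: executes `git <args>` inside the worktree and returns stdout.
// `env` carries extra environment variables for that one invocation.
type GitRunner = (args: string[], env?: Record<string, string>) => string;

function createShadowCommit(git: GitRunner, checkpointId: string, tmpIndex: string): string {
  // All index-touching steps use the temporary index, so the worktree's real
  // index (and anything the agent has staged) is left untouched.
  const env = { GIT_INDEX_FILE: tmpIndex };
  git(['read-tree', 'HEAD'], env); // seed the temp index from HEAD
  git(['add', '-A'], env); // stage tracked + untracked changes into it
  const tree = git(['write-tree'], env).trim();

  // Wrap the tree in a commit parented on HEAD, then pin it under a private ref.
  const sha = git(['commit-tree', tree, '-p', 'HEAD', '-m', 'boocode checkpoint']).trim();
  git(['update-ref', `refs/boocode/checkpoints/${checkpointId}`, sha]);
  return sha;
}
```

A real runner might wrap `execFileSync('git', args, { cwd, env, encoding: 'utf8' })` from `node:child_process`, merging `process.env` into `env` because that option replaces the inherited environment rather than extending it. Removing the temporary index afterwards is left to the caller.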
Best-effort: a checkpoint failure is logged and never breaks the turn. Native-boocode turns (project-root,
rewind-covered) get no checkpoint.

**Restore** (`POST /api/sessions/:sessionId/checkpoints/:checkpointId/restore`, proxied via `/api/coder/*`):
1. Resolve the checkpoint and validate that it belongs to the session.
2. Reset worktree: `git -C <wt> reset --hard <commit_sha> && git -C <wt> clean -fd` (hostExec+shellEscape).
3. Trim transcript: `DELETE FROM messages WHERE chat_id = <cp.chat_id> AND created_at >=
   (SELECT created_at FROM messages WHERE id = <cp.message_id>)` (plus an explicit `message_parts` delete
   if the FK isn't ON DELETE CASCADE; verify).
4. Reset backend (decision 2026-06-01): `UPDATE agent_sessions SET status='crashed' WHERE
   chat_id=<cp.chat_id>` and evict the live pool session for `(chat,agent)` if present, so the next turn
   re-establishes a fresh backend, leaving transcript, files, and agent context consistent at the restore
   point. (Warm backends hold context server-side; no partial rewind exists.)
5. Delete now-orphaned later checkpoints: `DELETE FROM checkpoints WHERE chat_id=? AND created_at >
   <cp.created_at>`.
6. Return `{ checkpoint_id, messages_deleted, worktree_reset, backend_reset }`.

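The two `created_at` comparisons in steps 3 and 5 differ on purpose, which is easy to miss when reading the SQL. The in-memory model below is only an illustration of that selection logic; the `Row`, `Checkpoint`, and `planRestoreTrim` names are hypothetical, and timestamps are plain numbers rather than `TIMESTAMPTZ` values.

```typescript
interface Row {
  id: string;
  chatId: string;
  createdAt: number;
}

interface Checkpoint extends Row {
  messageId: string; // anchor: the assistant turn row the checkpoint precedes
}

interface TrimPlan {
  messageIds: string[];
  checkpointIds: string[];
}

function planRestoreTrim(cp: Checkpoint, messages: Row[], checkpoints: Checkpoint[]): TrimPlan {
  const anchor = messages.find((m) => m.id === cp.messageId);

  // Step 3 uses >=, so the anchored assistant turn itself is removed along with
  // everything after it. A missing anchor trims nothing, mirroring the SQL,
  // where the subquery yields NULL and `created_at >= NULL` matches no rows.
  const messageIds = anchor
    ? messages
        .filter((m) => m.chatId === cp.chatId && m.createdAt >= anchor.createdAt)
        .map((m) => m.id)
    : [];

  // Step 5 uses a strict >, so the checkpoint being restored is kept and only
  // later checkpoints in the same chat are dropped.
  const checkpointIds = checkpoints
    .filter((c) => c.chatId === cp.chatId && c.createdAt > cp.createdAt)
    .map((c) => c.id);

  return { messageIds, checkpointIds };
}
```

For a chat with a user message, an anchored assistant turn, and two later messages, restoring the first checkpoint selects the assistant turn and both later messages for deletion, keeps the user message, and keeps the restored checkpoint row.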
**Frontend:** per-message "Restore to here" in `CoderMessageList.tsx` (via a new optional
`onRestoreCheckpoint?(chatId, messageId)` on `MessageActions` in `MessageBubble.tsx`), wired in
`CoderPane.tsx`; guarded to `status==='complete'` and to messages that have a checkpoint. After the call
returns, refetch the chat's messages (existing GET); no new WS frame is required.

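The guard on the "Restore to here" action amounts to a small predicate. The sketch below is an assumption about shape only: `MessageLike`, `canRestore`, and the idea of passing in a set of checkpointed message ids are hypothetical and are not taken from `MessageBubble.tsx`.

```typescript
interface MessageLike {
  id: string;
  status: string;
}

// Show the action only for completed messages that have a checkpoint anchored
// to them. How the set of checkpointed ids is loaded is left open here.
function canRestore(message: MessageLike, checkpointedMessageIds: ReadonlySet<string>): boolean {
  return message.status === 'complete' && checkpointedMessageIds.has(message.id);
}
```

Keeping the check in a pure helper would let it be unit-tested the same way as the other pure helpers in this change.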
## Decisions (2026-06-01)
- Multi-exact-match → **refuse as ambiguous** (#3).
- #4 **full** scope incl. conversation-trim.
- Restore **resets** the external-agent backend session (context re-established fresh).

## Parallelization
- **Unit 1 (#3)**: fully independent (`fuzzy-match.ts` + `pending_changes.ts` + test).
- **Unit 2 (#4 backend)**: schema + `checkpoints.ts` (create+restore) + 3 dispatcher hooks + restore route + backend reset. One agent owns all #4 coder backend (shared `checkpoints.ts`).
- **Unit 3 (#4 frontend)**: `CoderMessageList`/`MessageBubble`/`CoderPane`, against the pinned restore contract. Parallel with Unit 2. MUST NOT touch Sam's uncommitted WIP (`ChatTabBar`, `SessionLandingPage`, `Workspace`, `useWorkspacePanes`, `PaneHeaderActions`).

## Verify
- `pnpm -C apps/coder test` (incl. new `fuzzy-match` + any checkpoint pure-helper tests)
- `pnpm -C apps/server build` then `pnpm -C apps/coder build`
- `npx tsc -p apps/web/tsconfig.app.json --noEmit`
- Live smoke (manual, host): external-agent edit → checkpoint row; "Restore to here" → worktree reset + transcript trimmed + next turn fresh.