docs: archive shipped openspec batches; add feature/plan/research notes

Move 13 shipped openspec change docs under openspec/changes/archived/. Add docs/features/git-diff-panel, docs/plans/post-review-backlog, and docs/research/cross-app-contract-ssot.md (the research behind the @boocode/contracts SSOT work). Update BOOCHAT.md, BOOCODER.md, and boocode_roadmap.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-02 21:20:33 +00:00
parent e5ce01ae72
commit 2a05d2f9fe
27 changed files with 2210 additions and 17 deletions
--- a/docs/plans/post-review-backlog/artifacts/.discovery-notes.md
+++ b/docs/plans/post-review-backlog/artifacts/.discovery-notes.md
@@ -0,0 +1,76 @@
+# Discovery notes — post-review-backlog plan
+
+Single source of truth for project context. Specialists: read this first, do NOT re-grep what is here.
+Search further only for what your domain needs that is not covered.
+
+## Tech stack
+
+- Monorepo, pnpm workspaces: `apps/server` (BooChat — Fastify + postgres, native inference, read-only tools),
+  `apps/web` (React + Vite SPA), `apps/coder` (BooCoder — host systemd service, write tools + external-agent
+  dispatch, port 9502), `apps/booterm` (PTY/tmux). TypeScript strict, NodeNext (`.js` import suffixes) on
+  server + coder.
+- Tests: vitest (pinned ^3). server `pnpm -C apps/server test`; coder `pnpm -C apps/coder test`
+  (`globals:false` — import describe/it/expect). Include glob `src/**/__tests__/**/*.test.ts`. No web test
+  harness, no linters. DB-integration tests opt-in via `DATABASE_URL` + `describe.runIf`.
+- Deploy: apps/coder → `sudo systemctl restart boocoder`; apps/web|server → `docker compose up --build -d boocode`.
+- Postgres 16, DB `boochat`. Two schema files: `apps/server/src/schema.sql` (sessions/chats/messages/
+  message_parts) + `apps/coder/src/schema.sql` (agent_sessions, worktrees, pending_changes, available_agents,
+  checkpoints, claude_session_entries, tasks extension).
+
+## ADRs / coding standards found
+
+- No `docs/adr/` directory. Architectural decisions live in `boocode_roadmap.md` (Decisions log) +
+  per-app `CLAUDE.md` files (auto-loaded when editing that subtree) + `openspec/changes/archived/`.
+- Coding standards: `docs/coding-standards/` (canonical), surfaced via `.claude/rules/coding-standards/`
+  path-scoped index files. Not loaded automatically; open on demand.
+- Cross-cutting conventions in root `CLAUDE.md` "Conventions" section (WS-frame parity, sentinels, JSONB
+  via `sql.json`, event dedup discipline, deploy-by-surface).
+
+## Code touch points (per scope-brief item)
+
+- **F1 task-cancel:** `apps/coder/src/services/dispatcher.ts` (private `ac=new AbortController()` at
+  ~316/655/991/1248; `inflight=Map<sessionId,Promise>` at :56 — no per-task AbortController registry);
+  `apps/coder/src/routes/tasks.ts:110-148` (cancel route — inference.cancel + DB only); `routes/messages.ts:388`
+  (session stop); backends honor `ctx.signal`: `pty-dispatch.ts:159` (child.kill), `backends/warm-acp.ts:318`
+  (session/cancel), `backends/opencode-server.ts:775` (session.abort), `backends/claude-sdk.ts:209` (interrupt).
+  Frontend `apps/web/src/components/panes/CoderPane.tsx:987` handleStop; `api/client.ts:395` cancelTask.
+- **F2 parser prune:** `apps/server/src/services/inference/tool-call-parser.ts` (exports extractToolCallBlocks,
+  stripToolMarkup, parseXmlToolCall, parseInvokeToolCall, isPlaceholderArgValue, XML_/INVOKE_ consts).
+  Live consumers: `stream-phase.ts:263-284` (extractToolCallBlocks, text-delta fallback path), `:285-294`
+  (structured tool-call path — authoritative today), `tool-phase.ts:122` + `error-handler.ts:25,106`
+  (stripToolMarkup). llama-swap native `--jinja` parsing confirmed ON (external host `:8401`).
+- **F3 xml logging:** `tool-call-parser.ts:65` console.debug; one call site in `stream-phase.ts` executeStreamPhase.
+- **F4 notify-hook:** agent config paths `~/.claude/settings.json`, `~/.qwen/settings.json`,
+  `~/.config/goose/`; existing readers `claude-command-discovery.ts:84`, `qwen-settings.ts`,
+  `provider-registry.ts`. Existing normalize helper `apps/coder/src/services/normalize-agent-status.ts`
+  (`normalizeAgentEvent`). Existing status publish wired at dispatcher turn boundaries (v2.7.6);
+  `index.ts:86` references #10. `permission-waiter.ts:47` has a `PermissionHooks` registry.
+- **F5 compaction surfacing:** `apps/coder/src/services/backends/opencode-server.ts` SSE arm handling
+  (~215-311 region); WS frame parity → server `ws-frames.ts` + web `apps/web/src/api/types.ts` (`WsFrame`).
+- **F6 resilience:** `apps/server/src/services/inference/stream-phase.ts:261` (`for await ... result.fullStream`),
+  abort check at :333, usage at :343. Frontend 60s `discard_stale` watchdog is the only stall guard today.
+- **F7 session-history MCP tool:** `apps/coder/src/services/mcp-server.ts` (existing BooCoder MCP tools);
+  read path `messages_with_parts` view.
+- **F8 diff-line UX:** diff UI component NOT located by `git ls-files apps/web | grep -i diff` (returned
+  empty) — UX specialist must locate the actual diff/changes panel component (BOOCODER.md calls it
+  "DiffPanel"; may be named differently or nested). Routes through dispatcher + AgentComposerBar.
+- **F9 retire :9502:** `apps/coder/web/` package + static serve in `apps/coder/src/index.ts` + coder build
+  scripts/Dockerfile. KEEP WS + REST routes.
+
+## Recent activity / precedent
+
+- HEAD `e5ce01a` (v2.7.11). v2.7.x line (relicense, write-edit-robustness, sampling-streamjson-tokens,
+  mistake-tracker-ledger, claude-sdk-sessionstore, agent-status-normalize, UI batches) all shipped 2026-06-01.
+- Pure-helper + TDD precedent for extraction: `backends/turn-guard.ts`, `backends/lifecycle-decisions.ts`,
+  `mistake-tracker.ts` (pure module + unit test, then wire). F1/F2/F6 should follow it.
+- Parallel-disjoint-file agent precedent: v2.7.0/v2.7.1/v2.7.3 each built by 3 parallel agents over disjoint
+  files — relevant to decomposition/sequencing.
+
+## Enumerated gaps (searched, not found)
+
+- No `feature-specification.md` — `scope-brief.md` is the ground-truth stand-in.
+- No `docs/adr/`.
+- Diff UI component filename not found via naive grep (F8) — needs UX specialist location.
+- llama-swap config is NOT in-repo (external host `:8401`); native-jinja state confirmed by live probe only.
+- F8 (diff-line UX) and F4 (notify-hook) are the two items most likely to need their own plan-a-feature; they
+  have no behavioral spec beyond the review-doc pattern description.
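A minimal sketch of the per-task registry that the F1 touch point above reports as missing, assuming the shape named later in the plan (`taskControllers`, `cancelExternalTask`, `ExternalCancelFn`). `createTaskRegistry` and `run` are hypothetical names introduced for illustration; the real dispatcher would register the controller inside its four run-functions rather than through a wrapper like this one.

```typescript
// Narrow cancel capability handed to the route layer instead of the whole dispatcher.
type ExternalCancelFn = (taskId: string) => boolean;

interface TaskRegistry {
  run<T>(taskId: string, work: (signal: AbortSignal) => Promise<T>): Promise<T>;
  cancelExternalTask: ExternalCancelFn;
}

// Hypothetical factory; stands in for the registry living inside createDispatcher.
function createTaskRegistry(): TaskRegistry {
  const taskControllers = new Map<string, AbortController>();

  async function run<T>(
    taskId: string,
    work: (signal: AbortSignal) => Promise<T>,
  ): Promise<T> {
    const ac = new AbortController();
    taskControllers.set(taskId, ac);
    try {
      return await work(ac.signal);
    } finally {
      // Always unregister, whether the work resolved, threw, or was aborted.
      taskControllers.delete(taskId);
    }
  }

  function cancelExternalTask(taskId: string): boolean {
    const ac = taskControllers.get(taskId);
    if (!ac) return false; // unknown or already-finished task
    ac.abort(); // no-op if this controller was already aborted
    return true;
  }

  return { run, cancelExternalTask };
}
```

Returning `false` for an unknown task lets a cancel route treat "already finished" as a non-error, which matches the idempotency the plan asks for on double-Stop and cancel-after-exit.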
--- a/docs/plans/post-review-backlog/artifacts/implementation-decision-log.md
+++ b/docs/plans/post-review-backlog/artifacts/implementation-decision-log.md
@@ -0,0 +1,147 @@
+# Implementation Decision Log: Post-Review Backlog (F1–F9)
+
+<!--
+This file records every implementation decision committed while planning the post-review backlog.
+Behavioral and implementation statements live in [../feature-implementation-plan.md](../feature-implementation-plan.md) —
+this file captures the question, rationale, evidence, and rejected alternatives for each decision.
+Round-by-round history lives in [implementation-iteration-history.md](implementation-iteration-history.md).
+-->
+
+Source of truth: [../scope-brief.md](../scope-brief.md) (ground-truth spec stand-in) and
+[synthesis-input.md](synthesis-input.md) (the consolidated Round-1 specialist aggregation; its file:line
+evidence is treated as verbatim). The D-N counter is shared across the Trivial and Full sections below.
+
+## Trivial decisions
+
+- D-2: F3 logger threading shape — pass an optional `log?: { debug }` param to `extractToolCallBlocks` from its single call site in `stream-phase.ts` `executeStreamPhase`; no interface (one site, one impl). — Referenced in plan: Implementation Approach (F2+F3), Decomposition and Sequencing.
+- D-6: F7 query shape — read `messages_with_parts` `WHERE role != 'system'` (strips sentinels), params `session_id` + optional `chat_id` + `limit` (default 50, max 200), `ORDER BY created_at ASC`, returns `{role,content,...}[]`. — Referenced in plan: Implementation Approach (F7), External Interfaces.
+- D-9: Patch-tag-per-unit — each ready item ships as its own sequential patch tag (one batch per coherent unit), not a minor bump; Sam declined v2.8.0 twice. — Referenced in plan: Decomposition and Sequencing, Operational Readiness.
+- D-10: F1 Stop-button terminal label — render a muted "Stopped" label (not red, not a toast) for `status='cancelled'`. — Referenced in plan: Implementation Approach (F1), Testing Strategy.
+
+## Full decisions
+
+### D-1: F1 cancel registry shape and finalization-fix scope
+
+- **Question:** How does a Stop on an external agent task actually abort the running child, and what message-state corruption does wiring that abort newly expose?
+- **Decision:** Add `taskControllers = new Map<string, AbortController>()` inside `createDispatcher`; `taskControllers.set(taskId, ac)` at each of the four run-functions (`dispatcher.ts` ~316/655/991/1248) and `.delete` in the existing `.finally()` (`dispatcher.ts:117`); export `cancelExternalTask(taskId): boolean` (idempotent — `ac.abort()` no-ops when already aborted, so double-Stop and cancel-after-exit are safe). Pass a narrow `ExternalCancelFn` (not the whole dispatcher) into `registerTaskRoutes`, wired in `index.ts:254`. **In the same batch**, fix the two pre-existing finalization bugs this newly makes reachable: (1) the four catch blocks update only `tasks` state and leave the assistant `messages` row `status='streaming'` (the BooChat 5-min sweep is a different process and cannot recover it); (2) the warm-backend success path writes `messages.status='complete'` unconditionally before checking abort (`dispatcher.ts` ~853/1122/1377). Fix via a shared `cancelAndFinalize` helper across all four paths: after `await backend.prompt(...)`, `if (ac.signal.aborted)` → write `status='cancelled'`, publish the terminal `message_complete` frame, emit idle, return; in each catch finalize the message with `WHERE status='streaming'` (idempotent), mapping `AbortError → cancelled` vs `error → failed`.
+- **Rationale:** 4-way specialist consensus (on-call B1, behavioral B1, architect A1, junior). The frontend and all four backends already honor the abort signal correctly (PTY `child.kill` `pty-dispatch.ts:159`; warm-ACP `session/cancel` `warm-acp.ts:318`; opencode `session.abort` `opencode-server.ts:775`; claude-sdk interrupt `claude-sdk.ts:209`); the only missing link is the registry + export + route wiring. Shipping the abort wiring **without** the finalization fixes would convert a silent no-op into a new bug (cancelled turns recorded `complete`, or messages stuck `streaming`), so the two are inseparable in one batch (on-call OCE-001/OCE-002, behavioral B2/B3).
+- **Evidence:** `routes/tasks.ts:110-148` (cancel route calls `cancelPendingPermission` + `inference.cancel` native-only + DB cancelled; never reaches dispatcher); `dispatcher.ts:316/655/991/1248` (four private `ac` unreachable); `cancelExternalTask` absent anywhere (synthesis-input C1); finalization bugs at `dispatcher.ts` catch blocks + ~853/1122/1377 (synthesis-input C2); backend signal honoring cited above (scope-brief F1).
+- **Rejected alternatives:**
+  - Abort wiring only, defer the finalization fixes — rejected because wiring abort makes the `streaming`/`complete` corruption reachable from the UI for the first time; deferring ships a new bug (synthesis-input C2, on-call/behavioral).
+  - Recover stuck messages via the BooChat 5-min sweep — rejected because the sweep runs in a different process (BooChat) and does not see BooCoder's `agent_sessions`/`tasks` finalization (on-call OCE-001).
+  - Pass the whole dispatcher into `registerTaskRoutes` — rejected for over-coupling; a narrow `ExternalCancelFn` is sufficient (architect A1).
+- **Specialist owner:** on-call-engineer (resilience) with software-architect (registry shape).
+- **Revisit criterion:** a fifth external backend is added whose abort contract differs from signal-based cancellation, or the warm-vs-one-shot worktree-cleanup split changes.
+- **Dissent (if any):** none.
+- **Driven by rounds:** R1.
+- **Dependent decisions:** D-7 (F1 terminal state), D-8 (F1 frame-extension over new frame type), D-10.
+- **Referenced in plan:** Implementation Approach (F1), Runtime Behavior, On-Call Resilience Posture, Decomposition and Sequencing, RAID Log (R1).
+
+### D-3: F2 prune scope — option A (prune-now-minimal), keep the load-bearing guard
+
+- **Question:** How far does the tool-call-parser prune go — unexport the dead symbols only, or the full flag-gated retirement of the text-scrape fallback?
+- **Decision:** Option A only. KEEP `extractToolCallBlocks` + `stripToolMarkup` and their types (`ToolCallExtraction`, `ParsedCall`) — the load-bearing `<invoke>`-as-text guard. REMOVE only the `export` keyword (not the implementations) from the 8 zero-external-caller symbols: `isPlaceholderArgValue`, `parseXmlToolCall`, `parseInvokeToolCall`, `partialXmlOpenerStart`, and the 4 consts `XML_TOOL_OPEN/CLOSE`, `INVOKE_TOOL_OPEN/CLOSE`. Zero runtime effect; public export surface goes 11 → 4.
+- **Rationale:** The TS parser is dormant defense-in-depth but the `<invoke>`-as-text path is the *only* guard for "tool call emitted as plain text" — `experimental_repairToolCall` does not cover that case, and the sidecar `--jinja` state is confirmed only by a live probe, not pinned in-repo, so keeping the guard is correct (architect A2, confirms junior OQ-F2c). Unexporting is pure simplification with no behavior change; the relicense already removed the AGPL-dead exports, so there is no license pressure forcing the larger move.
+- **Evidence:** live consumers `stream-phase.ts:263-284` (extractToolCallBlocks text-delta fallback), `tool-phase.ts:122` + `error-handler.ts:25,106` (stripToolMarkup); structured path `stream-phase.ts:285` does all real work today; llama-swap `--jinja` confirmed ON by live probe of `:8401` only (scope-brief F2, synthesis-input C3).
+- **Rejected alternatives:**
+  - Option B — flag-gate the text-scrape fallback, validate native parsing on live qwen3.6 for one release, then delete — rejected (deferred) because there is no evidence qwen3.6 stopped emitting `<invoke>` text on live and the sidecar `--jinja` state is unconfirmed in-repo; deleting the only plain-text-tool-call guard on that basis is unsafe (architect/test-engineer R1). See Deferred (YAGNI).
+  - Delete the 8 symbols' implementations outright — rejected because three of them (`parseXmlToolCall`, `parseInvokeToolCall`, `isPlaceholderArgValue`) are called internally by the retained `extractToolCallBlocks`; only the `export` keyword is dead.
+- **Specialist owner:** software-architect.
+- **Revisit criterion:** a documented multi-session live probe shows zero text-delta tool calls from qwen3.6 (then Option B reopens).
+- **Dissent (if any):** none.
+- **Driven by rounds:** R1.
+- **Dependent decisions:** D-2 (F3 logger — safe because F2 keeps `extractToolCallBlocks`), D-4 (F2 gate test).
+- **Referenced in plan:** Implementation Approach (F2+F3), Decomposition and Sequencing, Testing Strategy, RAID Log (R2), Deferred (YAGNI).
+
+### D-4: F2 fallback gate test — pin the untested guard before pruning
+
+- **Question:** The `<invoke>`-as-text fallback is currently exercised by no test; how do we prevent the prune from silently removing it?
+- **Decision:** Add a gate test before the prune: stub `streamText` to emit a text-delta containing a complete `<invoke>` block; assert the call lands in `result.toolCalls` and the markup is NOT present in `result.content`. The test must stay green through the prune and fail if `extractToolCallBlocks` is ever removed from the text-delta path.
+- **Rationale:** Pruning around an untested load-bearing path risks a silent regression; the gate test converts D-3's "keep the guard" commitment into an enforced invariant (test-engineer T6).
+- **Evidence:** untested fallback at `stream-phase.ts:263-284` (synthesis-input C4, test-engineer T6).
+- **Rejected alternatives:**
+  - Prune without the gate test, relying on review — rejected because the fallback has no current coverage, so a future removal would pass CI silently (test-engineer T6).
+- **Specialist owner:** test-engineer.
+- **Revisit criterion:** the fallback path is intentionally retired under Option B (then this test is rewritten, not deleted).
+- **Dissent (if any):** none.
+- **Driven by rounds:** R1.
+- **Dependent decisions:** none.
+- **Referenced in plan:** Testing Strategy, Decomposition and Sequencing.
+
+### D-5: F6 stall-timeout via AbortSignal.any; no retry
+
+- **Question:** How does BooChat detect and recover a hung llama-swap stream server-side, and does it retry?
+- **Decision:** Wrap the `stream-phase.ts:261` `fullStream` loop with a per-chunk stall deadline. Create a local `stallAc = new AbortController()`, pass `effectiveSignal = AbortSignal.any([signal, stallAc.signal])` to `streamText`, reset a `setTimeout` of `STALL_TIMEOUT_MS = 90_000` on each chunk, and clear it in the existing `finally`. At the post-loop check (`stream-phase.ts:337`) test `signal?.aborted || stallAc.signal.aborted` and throw `AbortError` (→ `handleAbortOrError` writes `cancelled`). **No retry** at `executeStreamPhase`/`streamCompletion`.
+- **Rationale:** Today a hung stream relies entirely on the frontend 60s `discard_stale` watchdog with zero server-side guard; the 90s server stall-timeout closes that gap and reuses the existing abort/finalize path. Retry is deferred (YAGNI): a retry after a partial stream re-emits already-streamed deltas (`state.accumulated` + live `delta` frames are non-idempotent), which is worse than the current behavior; at single-local-instance scale the user re-sending is the correct recovery (on-call, strong).
+- **Evidence:** `stream-phase.ts:261` fullStream loop, `:333` abort check, `:343` usage; frontend 60s `discard_stale` is the only stall guard today (scope-brief F6, synthesis-input C6).
+- **Rejected alternatives:**
+  - `Promise.race` of the loop against a timeout — rejected in favor of `AbortSignal.any`, which threads cancellation through `streamText` and the existing finalize path cleanly (OQ-F6a, on-call).
+  - Retry/backoff classifier (transient-5xx / stall) — rejected (deferred) because partial-stream re-emit is non-idempotent and llama-swap is a single local instance (synthesis-input YAGNI ledger). See Deferred (YAGNI).
+- **Specialist owner:** on-call-engineer.
+- **Revisit criterion:** llama-swap gains restart-in-place-with-clear-partial, or a second instance is added for failover (then retry reopens).
+- **Dissent (if any):** none.
+- **Driven by rounds:** R1.
+- **Dependent decisions:** none.
+- **Referenced in plan:** Implementation Approach (F6), On-Call Resilience Posture, Testing Strategy, RAID Log (R3), Deferred (YAGNI).
+
+### D-7: F1 terminal state for user Stop — `cancelled`, not `failed`
+
+- **Question:** When a user hits Stop, what terminal `messages.status` does the finalized assistant message land in?
+- **Decision:** `cancelled` for a user-initiated Stop (`AbortError`); `failed` only for a genuine thrown error in the catch path.
+- **Rationale:** A user Stop is a deliberate, non-error outcome; `MessageStatus` already includes `'cancelled'` and the web reducer can map it without a new enum value. Distinguishing `AbortError → cancelled` vs `error → failed` keeps the human-inbox / failure surfaces honest (resolved OQ, on-call/behavioral).
+- **Evidence:** `MessageStatus` includes `'cancelled'`; reducer map point `CoderPane.tsx:299` (synthesis-input F1 UX + resolved OQs).
+- **Rejected alternatives:**
+  - Record user Stop as `failed` — rejected because it pollutes failure surfaces with deliberate user actions (resolved OQ F1 terminal state).
+- **Specialist owner:** behavioral-analyst.
+- **Revisit criterion:** product decides a user Stop should count against a failure/alerting budget.
+- **Dissent (if any):** none.
+- **Driven by rounds:** R1.
+- **Dependent decisions:** D-8, D-10.
+- **Referenced in plan:** Implementation Approach (F1), Runtime Behavior.
+
+### D-8: F1 status surfacing — extend the existing frame, no new frame type
+
+- **Question:** How does the cancelled terminal state reach the web reducer — a new WS frame type, or an extension of the existing one?
+- **Decision:** Extend the coder `message_complete` frame with an optional `status` field (Option A — minimal); map it in the reducer (`CoderPane.tsx:299`). No new frame type, so the cross-app `WsFrame` parity rule does not force a paired strict-union arm beyond the optional field.
+- **Rationale:** Adding a whole new frame type triggers the full cross-app parity dance (server `InferenceFrame`/`ws-frames.ts` + web `WsFrame`) for a single optional value already carried on a terminal frame; an optional field on the existing frame is the smaller change with the same observable result (UX agent).
+- **Evidence:** reducer map point `CoderPane.tsx:299`; cross-app frame parity rule (CLAUDE.md Conventions; scope-brief cross-cutting constraints).
+- **Rejected alternatives:**
+  - New `agent_cancelled` frame type — rejected because it forces a paired strict-union arm in two files for a single optional status value (UX agent, Option A vs B).
+- **Specialist owner:** user-experience-designer.
+- **Revisit criterion:** a second distinct terminal sub-state needs carrying that does not fit `message_complete.status`.
+- **Dissent (if any):** none.
+- **Driven by rounds:** R1.
+- **Dependent decisions:** D-10.
+- **Referenced in plan:** Implementation Approach (F1), External Interfaces.
+
+### D-11: F9 retire :9502 SPA — delete the serve block, keep all API/WS routes
+
+- **Question:** What exactly is removed when retiring the :9502 fallback SPA, and what must stay?
+- **Decision:** Delete the `if (existsSync(webRoot))` block in `index.ts` (~269-289), which already no-ops when the dist is absent; keep the inline 404 handler (`{error:'not found'}`). Remove `apps/coder/web` from `pnpm-workspace.yaml`, the coder build step, and the Dockerfile copy; remove the now-unused `fastifyStatic` import (verify it is only used there). KEEP all `/api/coder/*` REST + WS + `/api/health` + `--mcp` routes — CoderPane depends on them. Optionally add a 2-line `GET /` redirect-to-BooChat (no `fastifyStatic`).
+- **Rationale:** Sam confirmed "I don't use 9502"; primary UI is CoderPane inside BooChat. OQ-F9a resolved: nothing probes `GET /` on :9502 (health is `/api/health`; the compose healthcheck targets the boocode container, not the host-systemd coder), so 404-or-redirect at `/` is safe (architect A5, verified).
+- **Evidence:** serve block `index.ts` ~269-289; `GET /` unprobed (synthesis-input C8, OQ-F9a resolved); scope-brief F9 / DEFERRED #5 removal checklist.
+- **Rejected alternatives:**
+  - Keep the SPA — rejected; Sam greenlit removal and the build step is dead weight (scope-brief F9).
+  - Remove the REST/WS routes too — rejected because CoderPane inside BooChat depends on every `/api/coder/*` route (architect A5).
+- **Specialist owner:** software-architect.
+- **Revisit criterion:** a standalone :9502 UI is ever wanted again (would be a fresh feature, not a revert).
+- **Dissent (if any):** none.
+- **Driven by rounds:** R1.
+- **Dependent decisions:** none.
+- **Referenced in plan:** Implementation Approach (F9), Operational Readiness, Decomposition and Sequencing.
+
+### D-12: F4/F5/F8 disposition under the standing override — document as Blocked, do not halt
+
+- **Question:** The spec-maturity gate tripped on F4/F5/F8; the skill says recommend routing them out. Sam issued a standing override to plan everything. How are they recorded?
+- **Decision:** Proceed with the plan but record F4, F5, F8 in a structurally separate **BLOCKED** tier with their exact blocking open question(s) and recommended resolution path, rather than halting synthesis. F4 → route to `plan-a-feature` (hook-firing-in-unattended-mode premise UNVERIFIED + goose hook mechanism unknown). F5 → SDK capability check (pinned `@opencode-ai/sdk` exposes no compaction event arm); UI treatment (sentinel-row vs ephemeral-frame) stays disputed until the event is confirmed to exist. F8 → route to `plan-a-feature` (no line-selection infra exists, diff source ambiguous). These three do NOT block the ready cluster (F1/F2/F3/F6/F7/F9).
+- **Rationale:** The spec-maturity gate tripped with ≥5 spec-level findings (C9, C11, C12, C13, OQ-F4b, OQ-F8a) concentrated in F4/F5/F8 across junior, behavioral, and UX. Sam pre-acknowledged the WANT items would be planned more shallowly when choosing scope "everything we discussed," so the honest synthesis records them as Blocked with explicit reopen paths rather than fabricating plan-level resolutions or stalling the ready work.
+- **Evidence:** spec-maturity gate TRIPPED (synthesis-input "Spec-maturity gate"); USER OVERRIDE STANDING (same section); blocking OQs OQ-F4a/F4b, OQ-F5a/F5b, OQ-F8a/b/c (synthesis-input Open Questions); claims C9, C11, C12, C13.
+- **Rejected alternatives:**
+  - Halt synthesis and route F4/F5/F8 out before any planning — rejected because Sam's standing override directs the plan to proceed and document (synthesis-input gate section).
+  - Plan F4/F5/F8 at plan-level alongside the ready cluster — rejected because their core premises are unverified (F4) / capability-blocked (F5) / infra-absent (F8); plan-level decisions would rest on unproven assumptions (C9/C11/C13).
+- **Specialist owner:** project-manager.
+- **Revisit criterion:** the named blocking OQ for an item resolves (F4: hooks confirmed to fire unattended + goose format known; F5: SDK compaction event confirmed; F8: diff source chosen + line-selection approach specified) — then that item graduates to its own plan.
+- **Dissent (if any):** none; the gate-trip is acknowledged rather than overridden silently.
+- **Driven by rounds:** R1.
+- **Dependent decisions:** none.
+- **Referenced in plan:** Implementation Approach (Blocked tier), RAID Log (R4, R5, assumptions), Open Items, Deferred (YAGNI).
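The D-5 stall deadline can be sketched as follows, assuming a producer that honors the signal it is given (as `streamText` is expected to). `consumeWithStallTimeout`, `start`, and `onChunk` are hypothetical names, not code from `stream-phase.ts`, and the timeout is a parameter here rather than the 90 s constant. One detail the sketch makes explicit: `AbortSignal.any` accepts only `AbortSignal` entries, so an optional caller signal has to be left out of the array when it is undefined.

```typescript
// Hypothetical helper: consume a stream, aborting it if no chunk arrives in time.
async function consumeWithStallTimeout<T>(
  start: (signal: AbortSignal) => AsyncIterable<T>,
  onChunk: (chunk: T) => void,
  stallTimeoutMs: number,
  userSignal?: AbortSignal,
): Promise<void> {
  const stallAc = new AbortController();
  // Combine the caller's signal (if any) with the local stall signal.
  const effectiveSignal = userSignal
    ? AbortSignal.any([userSignal, stallAc.signal])
    : stallAc.signal;

  let timer: ReturnType<typeof setTimeout> | undefined;
  const resetDeadline = (): void => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => stallAc.abort(), stallTimeoutMs);
  };

  resetDeadline(); // covers a stream that never produces a first chunk
  try {
    for await (const chunk of start(effectiveSignal)) {
      resetDeadline(); // each chunk pushes the deadline forward
      onChunk(chunk);
    }
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }

  // Post-loop check: a producer may end quietly on abort instead of throwing.
  if (effectiveSignal.aborted) {
    const err = new Error("stream aborted or stalled");
    err.name = "AbortError";
    throw err;
  }
}
```

No retry is attempted, in line with the decision above: the helper only surfaces an `AbortError` and leaves finalization to the caller.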
--- a/docs/plans/post-review-backlog/artifacts/implementation-iteration-history.md
+++ b/docs/plans/post-review-backlog/artifacts/implementation-iteration-history.md
@@ -0,0 +1,48 @@
+# Implementation Iteration History: Post-Review Backlog (F1–F9)
+
+<!--
+This file records how the implementation plan for the post-review backlog evolved across discussion rounds.
+Committed decisions live in [implementation-decision-log.md](implementation-decision-log.md) and the primary
+plan lives in [../feature-implementation-plan.md](../feature-implementation-plan.md). It also consolidates
+the project-manager's per-round facilitation output (claim ledger, Open Questions, spec-maturity tags).
+The loop converged in one round; remaining unknowns are spec-level, not resolvable by more specialist rounds.
+-->
+
+## R1: Parallel six-specialist backlog review
+
+- **Specialists engaged:** on-call-engineer, behavioral-analyst, software-architect, test-engineer, user-experience-designer, junior-developer, project-manager (coordinator). Team size: large (cross-subsystem; user chose scope "everything we discussed"). Round cap 3; converged in 1.
+- **New input provided:** Initial inputs — the [scope brief](../scope-brief.md) (ground-truth spec stand-in; two items live-verified 2026-06-02 with file:line evidence) and the [discovery notes](.discovery-notes.md) (per-item code touch points). No prior round; this is the initial sweep.
+- **Claim ledger:** (consolidated, deduped — see [synthesis-input.md](synthesis-input.md) for the full table)
+
+  | # | Claim | State | Spec-maturity |
+  |---|-------|-------|---------------|
+  | C1 | F1 cancel route never aborts external child; no registry/export | Evidenced | plan-level |
+  | C2 | F1 catch blocks leave message `streaming`; success path writes `complete` on abort — fix same batch | Evidenced | plan-level |
+  | C3 | F2 = prune-now-minimal: unexport 8 zero-caller symbols, keep extractToolCallBlocks+stripToolMarkup | Evidenced | plan-level |
+  | C4 | F2 `<invoke>`-text fallback untested → add gate test before prune | Evidenced | plan-level |
+  | C5 | F3 optional logger param, do with F2 (same file) | Evidenced | plan-level |
+  | C6 | F6 stall-timeout via AbortSignal.any, 90s; NO retry (non-idempotent deltas) | Evidenced | plan-level |
+  | C7 | F7 inline MCP tool, messages_with_parts, role!='system', limit 50/200 | Evidenced | plan-level |
+  | C8 | F9 delete SPA block, keep routes; GET / unprobed → safe | Evidenced | plan-level |
+  | C9 | F4 hook-firing in unattended mode UNVERIFIED; goose hook mechanism unknown | Anecdotal (premise) | spec-level |
+  | C10 | F4 dedup rule: confirm running before `blocked`; suppress hook `done` | Evidenced | plan-level |
+  | C11 | F5 pinned @opencode-ai/sdk exposes no compaction arm → blocked on capability check | Evidenced | spec-level |
+  | C12 | F5 UI treatment sentinel-row vs ephemeral-frame | Disputed | spec-level |
+  | C13 | F8 no line-selection infra; diff source ambiguous; needs own spec | Evidenced | spec-level |
+
+- **Open Questions raised:**
+  - F1: terminal state (→ D-7, `cancelled`); registry key (→ D-1, `taskId`); shared finalize helper (→ D-1, yes); warm re-throw on abort (→ D-1, short-circuit on `ac.signal.aborted`).
+  - OQ-F2a (sidecar jinja) → moot under D-3 option A; OQ-F2c (a vs b) → D-3 option A.
+  - OQ-F6a/b/c → D-5 (AbortSignal.any, no retry, 90s).
+  - OQ-F7a (session vs chat id) → D-6 (both, chat_id optional, + limit).
+  - OQ-F9a (GET / probe) → D-11 (unprobed, safe).
+  - OQ-F4a (hooks fire unattended?), OQ-F4b (goose hook format) → UNRESOLVED, spec-level → OI-1, route F4 to plan-a-feature (D-12).
+  - OQ-F5a (SDK compaction event existence/name) → UNRESOLVED, capability check → OI-2, blocks F5 (D-12).
+  - OQ-F5b (sentinel vs ephemeral UI) → UNRESOLVED → OI-3, settle once event confirmed (D-12).
+  - OQ-F8a/b/c (diff source, serialization, new viewer) → UNRESOLVED, spec-level → OI-4, route F8 to plan-a-feature (D-12).
+  - OI-5 (F1 best-effort session-stop leg) → non-blocking, decided at implementation.
+- **Spec-maturity tags:** plan-level — C1-C8, C10 (9 claims). spec-level — C9, C11, C12, C13, plus OQ-F4b and OQ-F8a (≥5 across junior, behavioral, UX). **Spec-maturity gate TRIPPED**, concentrated in the three WANT items F4/F5/F8; F1/F2/F3/F6/F7/F9 are all plan-level and ready. No T#-contradictions.
+- **Resolution source:** evidence (Step 6 specialist findings) for every plan-level OQ (F1, F2, F3, F6, F7, F9); user input for the gate disposition (Sam's standing override → D-12); deferred-to-spec for OQ-F4a/F4b, OQ-F5a/F5b, OQ-F8a/b/c (recorded as OI-1..OI-4, routed out rather than resolved in this loop). The YAGNI gate ran during synthesis: F6 retry, F2 option B, F4 interface, F5 extra compaction arms, F7 reader interface all deferred.
+- **Decisions produced:** D-1, D-2, D-3, D-4, D-5, D-6, D-7, D-8, D-9, D-10, D-11, D-12 (all 12; the loop converged in one round so every decision originates here).
+- **Changed in plan:** all sections (initial authoring) — Source Specification, Outcome, Context, Implementation Approach (TIER 1 READY / TIER 2 BLOCKED), Decomposition and Sequencing, RAID Log, Testing Strategy, Security Posture, Operational Readiness, On-Call Resilience Posture, Definition of Done, Specialist Handoffs, Deferred (YAGNI), Open Items, Summary.
+- **Project-manager next-step recommendation:** Go to synthesis (done — this plan). Build the READY cluster in order F1 → F2+F3 → F6 → F7 → F9 as sequential patch tags; route F4 and F8 to `plan-a-feature` and F5 to an `@opencode-ai/sdk` capability check before any build on those three.
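A minimal sketch of the F7 read shape from claim C7 and D-6, assuming positional `$n` placeholders and the column names `session_id`, `chat_id`, `role`, `content`, and `created_at` on `messages_with_parts`. `buildHistoryQuery` and both interfaces are hypothetical; the lower bound of 1 and the fallback for non-finite limits are assumptions, since the plan states only the default (50) and the maximum (200).

```typescript
interface HistoryParams {
  session_id: string;
  chat_id?: string;
  limit?: number;
}

interface HistoryQuery {
  text: string;
  values: (string | number)[];
}

const DEFAULT_LIMIT = 50;
const MAX_LIMIT = 200;

// Hypothetical builder: returns parameterized SQL plus its bound values.
function buildHistoryQuery(params: HistoryParams): HistoryQuery {
  const requested = params.limit ?? DEFAULT_LIMIT;
  // Assumption: non-finite input falls back to the default; values clamp to [1, 200].
  const limit = Number.isFinite(requested)
    ? Math.min(Math.max(Math.trunc(requested), 1), MAX_LIMIT)
    : DEFAULT_LIMIT;

  const values: (string | number)[] = [params.session_id];
  // role != 'system' strips sentinel rows, as D-6 describes.
  let where = "session_id = $1 AND role != 'system'";
  if (params.chat_id !== undefined) {
    values.push(params.chat_id);
    where += ` AND chat_id = $${values.length}`;
  }
  values.push(limit);

  const text =
    `SELECT role, content FROM messages_with_parts WHERE ${where} ` +
    `ORDER BY created_at ASC LIMIT $${values.length}`;
  return { text, values };
}
```

Keeping the builder pure (no database handle) follows the pure-helper precedent the discovery notes cite, so the clamping rules can be unit-tested without `DATABASE_URL`.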
--- a/docs/plans/post-review-backlog/artifacts/synthesis-input.md
+++ b/docs/plans/post-review-backlog/artifacts/synthesis-input.md
@@ -0,0 +1,181 @@
+# Synthesis input — Round 1 aggregation + dispositions
+
+Deterministic aggregation of the Round-1 specialist review (on-call-engineer, behavioral-analyst,
+software-architect, test-engineer, user-experience-designer, junior-developer). This is the consolidated
+record the project-manager synthesizes into the three plan files. Evidence (file:line) is preserved inline.
+
+Team size: large (cross-subsystem, user chose "everything"). Round cap 3; converged in 1 round (the
+remaining unknowns are spec-level, not resolvable by more specialist rounds).
+
+---
+
+## Per-feature dispositions (the decisions)
+
+### READY TO BUILD
+
+**F1 — external task cancel kills child + finalizes message.** Strong 4-way consensus (on-call B1, behavioral
+B1, architect A1, junior).
+- Root cause CONFIRMED: `routes/tasks.ts:130-138` calls `inference.cancel` (native-only); dispatcher has no
+  `Map<taskId,AbortController>`; the four private `ac` (dispatcher.ts:316/655/991/1248) are unreachable;
+  `cancelExternalTask` does not exist anywhere.
+- Design (architect A1): add `taskControllers = new Map<string,AbortController>()` inside `createDispatcher`;
+  `taskControllers.set(taskId, ac)` at each of the 4 run-functions; delete in the existing `.finally()` at
+  dispatcher.ts:117; export `cancelExternalTask(taskId): boolean` (idempotent — `ac.abort()` is a no-op when
+  already aborted, so double-Stop and cancel-after-exit are safe). Pass a narrow `ExternalCancelFn`
+  (NOT the whole dispatcher) into `registerTaskRoutes`; wire in `index.ts:254`.
+- TWO pre-existing bugs F1 makes reachable, MUST be fixed in the same batch (on-call OCE-001/OCE-002,
+  behavioral B2/B3): (1) the four catch blocks update only `tasks` state, never the `messages` row → an
+  aborted/thrown turn leaves the assistant message `status='streaming'` (BooChat's 5-min sweep can't recover
+  it — different process); (2) the warm-backend success path writes `messages.status='complete'`
+  unconditionally before checking abort (dispatcher.ts ~853/1122/1377) → a cancelled turn is recorded
+  `complete`. Fix: after `await backend.prompt(...)`, `if (ac.signal.aborted)` → write `status='cancelled'`,
+  publish the terminal `message_complete` frame, emit idle, return; and in each catch finalize the message
+  with `WHERE status='streaming'` (idempotent) distinguishing AbortError→cancelled vs error→failed.
+- UX (UX agent): disable the Stop button while the cancel POST is in flight (mobile double-tap); extend the
+  coder `message_complete` frame with an optional `status` field (Option A — minimal, no new frame type) and
+  map it in the reducer (`CoderPane.tsx:299`, `MessageStatus` already includes `'cancelled'`); render a muted
+  "Stopped" label (not red, not a toast).
+- Tests (test-engineer T1-T3): extract a pure `CancelRegistry` (register/cancel/delete/has) — 4 unit cases,
+  no DB/child; one DB-integration test for the route → row lands `'cancelled'`; warm-worktree-preserved held
+  as a code comment, not a spy.
+- Resolved OQs: terminal state = `cancelled` (not `failed`) for user Stop; registry keyed by `taskId`
+  (route receives taskId); shared `cancelAndFinalize` helper across the 4 paths (TDD precedent).
+  `session/stop` route: CoderPane already calls `cancelTask` for external tasks, so the session-stop path
+  "never fires for external from UI" (on-call). Either wire it best-effort via a
+  `SELECT id FROM tasks WHERE session_id=$1 AND state='running'` lookup, or defer that leg (low value).
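+
+The registry and the cancelled-vs-failed rule described for F1 can be sketched as follows. This is a minimal
+illustration, not the dispatcher code: `CancelRegistry` is the name proposed in the test notes, while
+`terminalStatusFor` and the exact method signatures are assumptions made for the example.
+
```typescript
// Hypothetical sketch of the pure registry (register/cancel/delete/has).
// Nothing here touches the DB or a child process.
type TerminalStatus = "cancelled" | "failed";

class CancelRegistry {
  private readonly controllers = new Map<string, AbortController>();

  // Called at the start of each external run-function.
  register(taskId: string, ac: AbortController): void {
    this.controllers.set(taskId, ac);
  }

  // Returns true when a registered controller was found.
  // Safe to call twice: abort() on an already-aborted controller is a no-op.
  cancel(taskId: string): boolean {
    const ac = this.controllers.get(taskId);
    if (!ac) return false;
    ac.abort();
    return true;
  }

  // Called from the run's finally handler so finished tasks do not leak.
  delete(taskId: string): void {
    this.controllers.delete(taskId);
  }

  has(taskId: string): boolean {
    return this.controllers.has(taskId);
  }
}

// Assumed helper: picks the message's terminal status in a catch block
// or after the prompt call returns.
function terminalStatusFor(err: unknown, signal: AbortSignal): TerminalStatus {
  const isAbort = err instanceof Error && err.name === "AbortError";
  return signal.aborted || isAbort ? "cancelled" : "failed";
}
```
+
+In this sketch a second Stop before the run's cleanup still returns `true`, and a cancel after the entry has
+been deleted returns `false`, which matches the "double-Stop and cancel-after-exit are safe" requirement.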
+
+**F2 — tool-call-parser prune (option a: prune-now-minimal).** DECISION (architect A2, confirms junior
+OQ-F2c): do NOT do the flag-gated full retirement (option b). KEEP `extractToolCallBlocks` + `stripToolMarkup`
+and their types (`ToolCallExtraction`, `ParsedCall`): load-bearing `<invoke>`-as-text guard (the only guard for
+that case; `experimental_repairToolCall` doesn't cover it; sidecar `--jinja` unconfirmed so keeping the guard
+is correct). REMOVE the `export` keyword (not the implementations) from the 8 zero-external-caller symbols:
+`isPlaceholderArgValue`, `parseXmlToolCall`, `parseInvokeToolCall`, `partialXmlOpenerStart`, and the 4 consts
+`XML_TOOL_OPEN/CLOSE`, `INVOKE_TOOL_OPEN/CLOSE`. Zero runtime effect; public surface 11→4 exports.
+- Test gap (test-engineer T6): the `<invoke>`-text fallback in `stream-phase.ts:263-284` is currently NOT
+  exercised by any test → add a gate test (stub `streamText` to emit a text-delta containing a complete
+  `<invoke>` block; assert it lands in `result.toolCalls` and the markup is NOT in `result.content`). Must
+  stay green through the prune and fail if `extractToolCallBlocks` is ever removed from the text-delta path.
+
+**F3 — xml-parser structured logging.** Trivial. `tool-call-parser.ts:65` `console.debug` → pass an optional
+`log?: { debug }` param to `extractToolCallBlocks` from its one call site (`stream-phase.ts` executeStreamPhase)
+and use it. No interface (architect: one site, one impl). SEQUENCING: same file as F2; F2 keeps
+`extractToolCallBlocks` (decided), so F3 is safe; do F2+F3 in one batch. Confirm `executeStreamPhase`
+signature/test-stubs tolerate the param (junior).
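+
+A minimal sketch of the optional-logger seam, assuming a logger shape with a single `debug` method. The
+function below is a stand-in written for this example (it only counts `<invoke` openers); it is not the real
+`extractToolCallBlocks`, whose signature and body are not reproduced here.
+
```typescript
// Narrow logger shape: only the one method the parser would need.
type DebugLog = { debug: (msg: string, meta?: Record<string, unknown>) => void };

// Hypothetical stand-in for the parser entry point. The point is the
// optional `log` parameter, not the parsing itself.
function countInvokeOpeners(text: string, log?: DebugLog): number {
  const count = text.split("<invoke").length - 1;
  // Optional chaining makes this a no-op when the caller passes no logger,
  // so existing call sites and test stubs keep working unchanged.
  log?.debug("tool-call-parser: scanned text", { openers: count });
  return count;
}
```
+
+Existing callers can keep calling with one argument; only the single production call site needs to pass a
+logger.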
+
+**F6 — BooChat stall-timeout ONLY (retry deferred).** on-call: wrap the `stream-phase.ts:261` fullStream loop
+with a per-chunk stall deadline: a local `stallAc = new AbortController()`, `effectiveSignal =
+AbortSignal.any([signal, stallAc.signal])` passed to `streamText`; restart a `setTimeout(STALL_TIMEOUT_MS=90_000)`
+on each chunk (its callback aborts `stallAc`); clear it in the existing `finally`; at the post-loop check (stream-phase.ts:337) test
+`signal?.aborted || stallAc.signal.aborted` and throw `AbortError` (→ `handleAbortOrError` writes
+`cancelled`). Tests (test-engineer T8-T10): pure `classifyStreamError(err)` helper (5 cases, no I/O) + a
+`vi.useFakeTimers()` stall test on a fake hanging stream + a regression pin on the existing `signal?.aborted`
+post-loop check.
+- YAGNI DEFER (on-call, strong): NO retry at `executeStreamPhase`/`streamCompletion`. A retry after partial
+  stream re-emits already-streamed deltas (`state.accumulated` + live `delta` frames are non-idempotent) —
+  worse than current. Reopen trigger: llama-swap gains restart-in-place-with-clear-partial, or a second
+  instance for failover. The user re-sending is the correct recovery at single-instance scale.
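+
+The stall-deadline shape described for F6 can be sketched in isolation. This is an illustration under stated
+assumptions: `consumeWithStallTimeout`, `openStream`, and `makeAbortError` are hypothetical names, the stream
+is assumed to honor the abort signal it is given, and the runtime is assumed to provide `AbortSignal.any`.
+
```typescript
const STALL_TIMEOUT_MS = 90_000;

// Builds an Error that downstream code can recognize by name.
function makeAbortError(message: string): Error {
  const err = new Error(message);
  err.name = "AbortError";
  return err;
}

// Consumes a stream, aborting it if no chunk arrives within `stallMs`.
async function consumeWithStallTimeout<T>(
  openStream: (signal: AbortSignal) => AsyncIterable<T>,
  onChunk: (chunk: T) => void,
  opts: { signal?: AbortSignal; stallMs?: number } = {},
): Promise<void> {
  const stallMs = opts.stallMs ?? STALL_TIMEOUT_MS;
  const stallAc = new AbortController();
  // Either the caller's cancel or the local stall deadline ends the stream.
  const effectiveSignal = opts.signal
    ? AbortSignal.any([opts.signal, stallAc.signal])
    : stallAc.signal;

  let timer: ReturnType<typeof setTimeout> | undefined;
  const restartTimer = (): void => {
    clearTimeout(timer);
    timer = setTimeout(() => stallAc.abort(), stallMs);
  };

  try {
    restartTimer();
    for await (const chunk of openStream(effectiveSignal)) {
      restartTimer(); // every chunk pushes the deadline out again
      onChunk(chunk);
    }
  } catch (err) {
    // A rejection caused by our own abort is reported by the check below.
    if (!effectiveSignal.aborted) throw err;
  } finally {
    clearTimeout(timer);
  }

  // Post-loop check: a stream that ended because of an abort is not a success.
  if (opts.signal?.aborted || stallAc.signal.aborted) {
    throw makeAbortError(stallAc.signal.aborted ? "stream stalled" : "cancelled");
  }
}
```
+
+A test can pass a small `stallMs` together with a fake stream that yields once and then waits on the signal,
+which exercises the deadline without waiting 90 seconds.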
+
+**F7 — view_session_history MCP tool.** architect A4: add tool 7 inline in `mcp-server.ts` (follows the
+existing 6-tool inline pattern, `textResult` + direct `sql`). Reads `messages_with_parts`, `WHERE role !=
+'system'` (strips sentinels), params `session_id` + optional `chat_id` + `limit` (default 50, max 200),
+`ORDER BY created_at ASC`. No interface, no pagination beyond limit. Returns `{role,content,...}[]`.
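+
+The parameter handling for the history tool can be sketched as below. This is a hedged example: the helper
+names and the plain text-plus-values query shape are assumptions for illustration (the real tool uses the
+existing `sql` helper directly), and the selected columns beyond `role` and `content` are guesses.
+
```typescript
const DEFAULT_LIMIT = 50;
const MAX_LIMIT = 200;

// Applies the default and the cap; non-finite or missing input falls back to the default.
function clampLimit(limit?: number): number {
  if (limit === undefined || !Number.isFinite(limit)) return DEFAULT_LIMIT;
  return Math.min(MAX_LIMIT, Math.max(1, Math.floor(limit)));
}

// Hypothetical builder: session_id is required, chat_id narrows the result when present.
function buildHistoryQuery(args: {
  session_id: string;
  chat_id?: string;
  limit?: number;
}): { text: string; values: (string | number)[] } {
  const values: (string | number)[] = [args.session_id];
  // role != 'system' strips the sentinel rows.
  let where = "session_id = $1 AND role != 'system'";
  if (args.chat_id !== undefined) {
    values.push(args.chat_id);
    where += ` AND chat_id = $${values.length}`;
  }
  values.push(clampLimit(args.limit));
  const text =
    `SELECT role, content, created_at FROM messages_with_parts ` +
    `WHERE ${where} ORDER BY created_at ASC LIMIT $${values.length}`;
  return { text, values };
}
```
+
+Clamping on the server side keeps a caller from requesting an unbounded history even though there is no
+pagination beyond `limit`.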
+
+**F9 — retire apps/coder/web :9502 SPA.** architect A5: the `if (existsSync(webRoot))` block in `index.ts`
+(~269-289) already no-ops when the dist is absent. Delete that block, keep the inline 404 handler
+(`{error:'not found'}`); remove `apps/coder/web` from `pnpm-workspace.yaml`, the coder build step, and the
+Dockerfile copy; remove the now-unused `fastifyStatic` import (verify it's only used there). KEEP all
+`/api/coder/*` REST + WS + `/api/health` + `--mcp` routes (CoderPane depends on them). OQ-F9a RESOLVED:
+nothing probes `GET /` on :9502 (health is `/api/health`; compose healthcheck is the boocode container, not
+the host-systemd coder) → safe to 404 or add a 2-line `GET /` redirect-to-BooChat (no fastifyStatic).
+
+### BLOCKED — need a spec or a capability check before building (gate-trip items)
+
+**F4 — notify-hook config injection.** SPEC-LEVEL gaps (junior OQ-F4a-e, behavioral B4, UX). The core premise
+is UNVERIFIED: do claude / qwen / goose actually fire their native lifecycle hooks in unattended mode
+(`claude -p` / SDK, `qwen --acp` / `--output-format stream-json`, goose)? goose's hook file/format is unknown
+(not in repo). Idempotent per-agent settings.json merge strategy unspecified. `boocoder.service` run-user /
+`homedir()` resolution unconfirmed. The inbound POST is a new unauthenticated localhost route (acceptable
+single-user, note it). Double-publish dedup with the v2.7.6 turn-boundary publish: behavioral B4 +
+architect A3 agree on the rule — inbound route calls `normalizeAgentEvent` (returns bucket
+`working|blocked|done`), confirms `tasks.state='running'` before publishing `blocked`, and SUPPRESSES `done`
+(the dispatcher already emits `idle`); `done`→drop, never re-publish. UI side already exists (AgentStatusDot,
+all 4 buckets — UX: F4 is server-side only). RECOMMENDATION: own `plan-a-feature` — the dedup rule + module
+shapes are settled, but the hook-firing-in-unattended-mode premise and goose hook mechanism must be verified
+first or the whole feature is built on sand.
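+
+The agreed dedup rule is small enough to state as a pure function. A minimal sketch, assuming the bucket type
+returned by `normalizeAgentEvent` is the three-value union below; `bucketToPublish` is a hypothetical name,
+and passing `working` through unchanged is an assumption, since the rule above only covers `blocked` and
+`done`.
+
```typescript
type Bucket = "working" | "blocked" | "done";

// Decides what the inbound hook route may publish. Returns null to drop the event.
function bucketToPublish(bucket: Bucket, taskState: string): Bucket | null {
  switch (bucket) {
    case "done":
      // The dispatcher already emits idle at the turn boundary; never re-publish.
      return null;
    case "blocked":
      // Only a task that is still running can be blocked.
      return taskState === "running" ? "blocked" : null;
    case "working":
      // Assumption: no extra check is applied to working events.
      return "working";
  }
}
```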
+
+**F5 — opencode compaction surfacing.** BLOCKED on a capability check. The installed `@opencode-ai/sdk`
+exposes NO compaction event arm (current arms confirmed: `session.next.{text,reasoning,tool,step}.*`,
+`message.part.*`, `session.idle/error` at opencode-server.ts:379-491). The review's "consume
+compaction.{started,delta,ended}" assumed events from opencode's CORE `event.ts`, which the pinned SDK may not
+surface. MUST confirm the SDK emits a compaction signal + its exact event name (or an SDK bump is needed)
+before building. DISPUTED UI treatment (behavioral B5 = persistent sentinel row `metadata.kind='compaction'`,
+survives refresh; UX = ephemeral inline divider via a new `agent_compacted` frame, no DB row) — settle once
+the event exists. Only `compaction.ended` is in scope (YAGNI: started/delta/step.failed/tool.progress out).
+If a frame is added, a cross-app WS-frame parity update is certain to be needed.
+
+**F8 — diff-line → agent re-prompt.** SPEC-LEVEL (UX + junior, firm). The "DiffPanel" is inline in
+`CoderPane.tsx:478-619`, rendering `pending_changes` rows as a static `<pre>` (CoderPane.tsx:607-610) — NO
+line-selection infrastructure exists. Diff source ambiguous (`pending_changes.diff` = BooCode write-tools only
+vs the external-agent worktree git diff). "Send to new agent" needs coordinated workspace-pane + chat creation
+and pre-population across 3 surfaces with no existing contract. Selection diverges by modality (desktop
+line-select vs mobile long-press → bottom sheet). RECOMMENDATION: own `plan-a-feature` (the scope-brief already
+hedged this; treat as firm). MVP-if-pushed: "comment to current agent" only, block-level selection,
+pre-populate `ChatInput` — still wants a spec.
+
+---
+
+## Claim ledger (consolidated, deduped)
+
+| # | Claim | State | Spec-maturity | Supporting |
+|---|-------|-------|---------------|-----------|
+| C1 | F1 cancel route never aborts external child; no registry/export | Evidenced | plan-level | on-call,behavioral,architect,junior |
+| C2 | F1 catch blocks leave message `streaming`; success path writes `complete` on abort — fix in same batch | Evidenced | plan-level | on-call,behavioral |
+| C3 | F2 = prune-now-minimal: unexport 8 zero-caller symbols, keep extractToolCallBlocks+stripToolMarkup | Evidenced | plan-level | architect (test-engineer guard) |
+| C4 | F2 `<invoke>`-text fallback is untested → add gate test before prune | Evidenced | plan-level | test-engineer |
+| C5 | F3 optional logger param, do with F2 (same file) | Evidenced | plan-level | architect,junior |
+| C6 | F6 stall-timeout via AbortSignal.any, 90s; NO retry (non-idempotent deltas) | Evidenced | plan-level | on-call,behavioral,test-engineer |
+| C7 | F7 inline MCP tool, messages_with_parts, role!='system', limit 50/200 | Evidenced | plan-level | architect,UX |
+| C8 | F9 delete SPA block, keep routes; GET / unprobed → safe | Evidenced | plan-level | architect (+ verified) |
+| C9 | F4 hook-firing in unattended mode UNVERIFIED; goose hook mechanism unknown | Anecdotal (premise) | spec-level | junior,behavioral,UX |
+| C10 | F4 dedup rule: confirm running before `blocked`; suppress hook `done` | Evidenced | plan-level | behavioral,architect |
+| C11 | F5 pinned @opencode-ai/sdk exposes no compaction arm → blocked on capability check | Evidenced | spec-level | (verified) + junior |
+| C12 | F5 UI treatment sentinel-row vs ephemeral-frame | Disputed | spec-level | behavioral vs UX |
+| C13 | F8 no line-selection infra; diff source ambiguous; needs own spec | Evidenced | spec-level | UX,junior |
+
+## Open Questions — resolutions
+
+- OQ (F1 terminal state) → RESOLVED: `cancelled`. OQ (F1 registry key) → RESOLVED: `taskId`. OQ (F1 shared
+  finalize helper) → RESOLVED: yes, pure helper. OQ (F1 warm re-throw on abort) → RESOLVED: short-circuit on
+  `ac.signal.aborted`.
+- OQ-F2a (sidecar jinja) → RESOLVED moot: option a keeps the guard. OQ-F2c (a vs b) → RESOLVED: option a.
+- OQ-F6a/b/c → RESOLVED: AbortSignal.any (not Promise.race); no retry; 90s.
+- OQ-F7a (session vs chat id) → RESOLVED: both (chat_id optional) + limit.
+- OQ-F9a (GET / probe) → RESOLVED: unprobed, safe.
+- OQ-F4a (hooks fire unattended?), OQ-F4b (goose hook format) → UNRESOLVED, spec-level → route to F4 spec.
+- OQ-F5a (SDK compaction event name/existence) → UNRESOLVED, capability check → blocks F5.
+- OQ-F5b (sentinel vs ephemeral UI) → UNRESOLVED → settle in F5 once event confirmed.
+- OQ-F8a/b/c (diff source, serialization, new viewer) → UNRESOLVED, spec-level → route to F8 spec.
+
+## Spec-maturity gate
+
+TRIPPED (≥5 spec-level findings — C9, C11, C12, C13, plus OQ-F4b/F8a — across ≥3 specialists: junior,
+behavioral, UX). The trip is CONCENTRATED in the three WANT items F4/F5/F8; F1/F2/F3/F6/F7/F9 are all
+plan-level and ready. Per skill: gate-trip → recommend the user route F4/F8 to `plan-a-feature` and F5 to a
+capability check. USER OVERRIDE STANDING: Sam chose scope "everything we discussed" having pre-acknowledged the
+WANT items would be planned more shallowly — so the plan proceeds, documenting F4/F5/F8 as Blocked/own-spec
+rather than halting. Decision deferred to Step 9 user presentation.
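+
+The trip condition used above (at least 5 spec-level findings across at least 3 specialists) can be written
+as a small check. This is an illustrative sketch only; the `Finding` shape and the function name are
+assumptions, not part of the skill's tooling.
+
```typescript
type Finding = {
  id: string;
  maturity: "plan-level" | "spec-level";
  specialists: string[];
};

// True when spec-level findings are both numerous and spread across specialists.
function specMaturityGateTripped(
  findings: Finding[],
  minFindings = 5,
  minSpecialists = 3,
): boolean {
  const specialists = new Set<string>();
  let specLevelCount = 0;
  for (const finding of findings) {
    if (finding.maturity !== "spec-level") continue;
    specLevelCount += 1;
    for (const name of finding.specialists) specialists.add(name);
  }
  return specLevelCount >= minFindings && specialists.size >= minSpecialists;
}
```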
+
+## YAGNI ledger
+
+- F6 retry logic → DEFER (non-idempotent re-emit of streamed deltas). Reopen: llama-swap restart-in-place or
+  second instance. Source: on-call R1.
+- F2 option b (flag-gated full retirement of extractToolCallBlocks/stripToolMarkup) → DEFER (no evidence
+  qwen3.6 stopped emitting `<invoke>` text on live; sidecar jinja unconfirmed). Reopen: documented
+  multi-session live probe shows zero text-delta tool calls. Source: architect/test-engineer R1.
+- F4 `NotifyHookInjection` interface → REPLACE with one concrete function switching on agent name (3 agents,
+  identical read-merge-write). Source: architect R1.
+- F5 handling of compaction.started/delta + step.failed + tool.progress → DEFER, only compaction.ended is
+  user-actionable. Source: behavioral R1.
+- F7 SessionHistoryReader interface / pagination → REPLACE with inline query + limit. Source: architect R1.
+- Provider tier-2 follow-ups (snapshot frame, enabled column, shared types, MCP list_providers) → already
+  DEFER/DROP per scope-brief; not re-planned.