boocode/docs/plans/post-review-backlog/feature-implementation-plan.md

# Feature Implementation Plan: Post-Review Backlog (F1–F9)

This is a multi-item backlog, not a single feature. It commits to shipping six **READY** items as
independent sequential patch tags ([D-9](artifacts/implementation-decision-log.md#trivial-decisions)), one batch per
coherent unit, honoring deploy-by-surface and stage-commits-by-path; and to documenting three **BLOCKED**
WANT items (F4/F5/F8) with their exact blocking open questions and recommended resolution paths
([D-12](artifacts/implementation-decision-log.md#d-12-f4f5f8-disposition-under-the-standing-override--document-as-blocked-do-not-halt))
rather than halting on the tripped spec-maturity gate.

## Source Specification

- **Feature specification:** No formal `feature-specification.md` exists. The ground-truth source is the [scope brief](scope-brief.md), which captures the items, decisions, and live-verified current state from the 2026-06-02 backlog review conversation plus `boocode_code_review_v2.md`, `docs/DEFERRED-WORK.md`, and `boocode_roadmap.md`. Origin is conversational + review-doc, so the WANT items are intentionally planned more shallowly.
- **Consolidated specialist record:** [artifacts/synthesis-input.md](artifacts/synthesis-input.md) — the Round-1 aggregation of all six specialists (on-call-engineer, behavioral-analyst, software-architect, test-engineer, user-experience-designer, junior-developer) with file:line evidence, the claim ledger, OQ resolutions, the spec-maturity gate, and the YAGNI ledger. There is no separate per-specialist file; this digest is authoritative.
- **Discovery notes:** [artifacts/.discovery-notes.md](artifacts/.discovery-notes.md) — project context and per-item code touch points.

## Outcome

When this plan's READY cluster is executed: hitting Stop on an external agent task actually kills the
running child and finalizes the assistant message in a correct terminal state (no more `streaming`-stuck or
falsely-`complete` rows); the tool-call parser's public surface is pruned to its four load-bearing exports
while the plain-text `<invoke>` guard stays intact and is for the first time test-pinned; the parser's
rejection logging flows through pino/`LOG_LEVEL`; a hung BooChat stream is detected and finalized
server-side after 90s instead of relying solely on the frontend watchdog; an MCP client can read a
session's transcript through a new BooCoder MCP tool; and the unused :9502 fallback SPA and its build step
are gone. F4/F5/F8 leave this plan with a documented, evidence-backed route to their own spec/capability
work.

## Context

- **Driving constraint:** A post-review backlog with two items live-verified as active correctness bugs (F1 cancel is a no-op that leaves messages stuck; F6 has zero server-side stall guard). The review surfaced them; Sam chose scope "everything we discussed." No external deadline — value is correctness and simplification debt paydown.
- **Stakeholders:** Sam (sole user/operator — wants Stop to work, wants the parser simplified, wants resilience and the MCP history tool, greenlit the :9502 removal). On-call posture is single-operator: Sam is the post-ship owner for every item.
- **Future-state concern:** F1 touches the dispatcher's message-finalization paths across four backends — the risk to watch is finalization races/regressions there. F6 introduces a server-side timer that must not fire on a healthy-but-slow stream. F5 is a standing capability dependency on the pinned `@opencode-ai/sdk`.
- **Out-of-scope boundary:** F2 Option B (flag-gated full parser retirement), F6 retry/backoff, F4's superset code (clean-room pattern only), and everything in the scope brief's "Deferred (Sam's explicit dispositions)" list (subagent permission demux, tier-2 provider follow-ups, large-file splits, PR-resolver, etc.). These are recorded, not planned.

## Team Composition and Participation

Round-by-round detail lives in [artifacts/implementation-iteration-history.md](artifacts/implementation-iteration-history.md). All six specialists converged in one round; remaining unknowns are spec-level, not resolvable by more specialist rounds.

| Specialist | Status | Key Input |
|------------|--------|-----------|
| `project-manager` | Coordinator | Facilitated R1, applied the spec-maturity gate + YAGNI gate, synthesized this plan. |
| `on-call-engineer` | Active | F1 finalization bugs (OCE-001/002); F6 stall-timeout design + no-retry YAGNI call. |
| `behavioral-analyst` | Active | F1 message-state corruption (B2/B3); F5 sentinel-row UI position (disputed); F4 dedup rule. |
| `software-architect` | Active | F1 registry/`ExternalCancelFn` shape (A1); F2 option-A prune (A2); F7 inline tool (A4); F9 removal (A5). |
| `test-engineer` | Active | F1 `CancelRegistry` + DB test (T1-T3); F2 fallback gate test (T6); F6 `classifyStreamError` + fake-timer test (T8-T10). |
| `user-experience-designer` | Active | F1 disable-Stop-while-in-flight + Option-A frame extension + muted "Stopped" label; F5 ephemeral-divider UI (disputed); F8 no line-selection infra. |
| `junior-developer` | Active | OQ reframing across all items; F2c (option a), F4a/b unverified premise, F8 spec-level. |

## Implementation Approach

ALTITUDE: this section names code artifacts and inlines only decision-bearing values (flag names, the 90s
stall timeout, the 50/200 limits, the export list to unexport). Full implementations belong in the files
themselves; mechanics live in the cited decision-log entries.

### TIER 1 — READY TO BUILD

**F1 — external task cancel kills the child + finalizes the message (apps/coder, standalone, do first).**
The highest-value correctness fix. A `taskControllers = new Map<string,AbortController>()` registry inside
`createDispatcher`, populated at the four run-functions and deleted in the existing `.finally()`, plus an
exported idempotent `cancelExternalTask(taskId): boolean` wired into `POST /api/tasks/:id/cancel` via a
narrow `ExternalCancelFn`
([D-1](artifacts/implementation-decision-log.md#d-1-f1-cancel-registry-shape-and-finalization-fix-scope)).
This batch **must** also fix the two pre-existing finalization bugs the abort wiring newly makes reachable —
the catch blocks that leave messages `streaming` and the warm success path that writes `complete` on an
aborted turn — via a shared `cancelAndFinalize` helper, or it ships a new bug
([D-1](artifacts/implementation-decision-log.md#d-1-f1-cancel-registry-shape-and-finalization-fix-scope)).
User Stop finalizes to `cancelled`, not `failed`
([D-7](artifacts/implementation-decision-log.md#d-7-f1-terminal-state-for-user-stop--cancelled-not-failed));
the terminal state reaches the web reducer via an optional `status` field on the existing coder
`message_complete` frame, not a new frame type
([D-8](artifacts/implementation-decision-log.md#d-8-f1-status-surfacing--extend-the-existing-frame-no-new-frame-type)),
rendered as a muted "Stopped" label ([D-10](artifacts/implementation-decision-log.md#trivial-decisions)). The
`session/stop` leg "never fires for external from UI" (CoderPane already calls `cancelTask`), so it is wired
best-effort via a `WHERE session_id=$ AND state='running'` lookup or deferred as a low-value leg.

**F2 + F3 together (apps/server, same file).** F2 is option-A prune-now-minimal: KEEP `extractToolCallBlocks`
+ `stripToolMarkup` (the only guard for a tool call emitted as plain text), unexport the 8 zero-caller
symbols — `isPlaceholderArgValue`, `parseXmlToolCall`, `parseInvokeToolCall`, `partialXmlOpenerStart`, and
the consts `XML_TOOL_OPEN/CLOSE`, `INVOKE_TOOL_OPEN/CLOSE` — taking the public surface 11 → 4 with zero
runtime effect
([D-3](artifacts/implementation-decision-log.md#d-3-f2-prune-scope--option-a-prune-now-minimal-keep-the-load-bearing-guard)).
A gate test pins the untested `<invoke>`-as-text fallback before the prune
([D-4](artifacts/implementation-decision-log.md#d-4-f2-fallback-gate-test--pin-the-untested-guard-before-pruning)).
Because F2 keeps `extractToolCallBlocks`, F3 is safe to land in the same file/batch: thread an optional
`log?: { debug }` param into `extractToolCallBlocks` from its one call site so rejection logging flows
through pino/`LOG_LEVEL` instead of `console.debug` ([D-2](artifacts/implementation-decision-log.md#trivial-decisions)).

**F6 — BooChat stall-timeout ONLY (apps/server, standalone).** Wrap the `stream-phase.ts:261` `fullStream`
loop with a per-chunk stall deadline using `AbortSignal.any([signal, stallAc.signal])` and a
`setTimeout(STALL_TIMEOUT_MS = 90_000)` bumped on each chunk, cleared in the existing `finally`; the
post-loop check throws `AbortError` so the existing finalize path writes `cancelled`
([D-5](artifacts/implementation-decision-log.md#d-5-f6-stall-timeout-via-abortsignalany-no-retry)). **No
retry** — partial-stream re-emit is non-idempotent and the instance is single+local (see Deferred (YAGNI)).

**F7 — `view_session_history` MCP tool (apps/coder MCP, standalone).** Tool 7 added inline in
`mcp-server.ts` following the existing 6-tool inline pattern (`textResult` + direct `sql`). Reads
`messages_with_parts` `WHERE role != 'system'`, params `session_id` + optional `chat_id` + `limit` (default
50, max 200), `ORDER BY created_at ASC` ([D-6](artifacts/implementation-decision-log.md#trivial-decisions)).
No interface, no pagination beyond `limit`.

**F9 — retire :9502 SPA (apps/coder, standalone cleanup).** Delete the `if (existsSync(webRoot))` serve
block in `index.ts` (~269-289), keep the inline 404; remove `apps/coder/web` from `pnpm-workspace.yaml`, the
coder build step, and the Dockerfile copy; drop the now-unused `fastifyStatic` import. KEEP every
`/api/coder/*` REST + WS + `/api/health` + `--mcp` route
([D-11](artifacts/implementation-decision-log.md#d-11-f9-retire-9502-spa--delete-the-serve-block-keep-all-apiws-routes)).

### TIER 2 — BLOCKED (need their own spec / a capability check)

These three are recorded, not built. They do **not** block the READY cluster
([D-12](artifacts/implementation-decision-log.md#d-12-f4f5f8-disposition-under-the-standing-override--document-as-blocked-do-not-halt)).

**F4 — notify-hook config injection.** The module shapes and the double-publish dedup rule are settled
(inbound route calls `normalizeAgentEvent` → bucket; confirms `tasks.state='running'` before publishing
`blocked`; SUPPRESSES `done` since the dispatcher already emits `idle`). But the **core premise is
unverified**: it is unknown whether claude / qwen / goose actually fire their native lifecycle hooks in
unattended mode (`claude -p`/SDK, `qwen --acp`/`stream-json`, goose), and goose's hook file/format is
unknown (not in repo). **Recommended path:** route to `plan-a-feature`; verify hook-firing-in-unattended-mode
and the goose hook mechanism first, or the feature is built on sand. Blocking OQs: OQ-F4a (hooks fire
unattended?), OQ-F4b (goose hook format?).

**F5 — opencode compaction surfacing.** **Blocked on a capability check**: the pinned `@opencode-ai/sdk`
exposes NO compaction event arm (confirmed arms: `session.next.{text,reasoning,tool,step}.*`,
`message.part.*`, `session.idle/error`). The review assumed events from opencode's core `event.ts` that the
pinned SDK may not surface. **Recommended path:** confirm the SDK emits a compaction signal and its exact
event name (or an SDK bump is needed) before building. The UI treatment is **disputed** (behavioral:
persistent sentinel row `metadata.kind='compaction'` that survives refresh; UX: ephemeral inline divider via
a new `agent_compacted` frame, no DB row) — settle once the event is confirmed to exist. Only
`compaction.ended` is in scope. Blocking OQs: OQ-F5a (SDK event existence/name), OQ-F5b (sentinel vs
ephemeral UI).

**F8 — diff-line → agent re-prompt UX.** **Spec-level.** The "DiffPanel" is inline in
`CoderPane.tsx:478-619`, rendering `pending_changes` rows as a static `<pre>` — no line-selection
infrastructure exists. The diff source is ambiguous (`pending_changes.diff` = BooCode write-tools only vs the
external-agent worktree git diff). "Send to new agent" needs coordinated workspace-pane + chat creation +
pre-population across three surfaces with no existing contract; selection diverges by modality.
**Recommended path:** route to `plan-a-feature` (the scope brief already hedged this; treat as firm).
MVP-if-pushed: "comment to current agent" only, block-level selection, pre-populate `ChatInput` — still
wants a spec. Blocking OQs: OQ-F8a (diff source), OQ-F8b (selection serialization), OQ-F8c (new-viewer
routing).

### Architecture and Integration Points

F1 lives entirely in apps/coder (`dispatcher.ts`, `routes/tasks.ts`, `index.ts:254` wiring) with a
single-field touch on the web reducer (`CoderPane.tsx:299`). F2/F3 are confined to
`apps/server/.../tool-call-parser.ts` and its one call site `stream-phase.ts`. F6 is confined to
`stream-phase.ts`. F7 is one inline tool in `apps/coder/.../mcp-server.ts`. F9 removes a serve block and a
workspace package. No new interface is introduced anywhere in TIER 1 — every item follows the existing inline
/ pure-helper precedent ([D-2](artifacts/implementation-decision-log.md#trivial-decisions),
[D-6](artifacts/implementation-decision-log.md#trivial-decisions)).

### Runtime Behavior

F1's cancel path: Stop → `POST /api/tasks/:id/cancel` → `cancelExternalTask(taskId)` → `ac.abort()` →
backend honors the signal (`child.kill` / `session/cancel` / `session.abort` / interrupt) → the run-function
hits its abort short-circuit or catch → `cancelAndFinalize` writes the terminal `messages.status`
(`cancelled` on abort, `failed` on error) `WHERE status='streaming'` and publishes the terminal frame
([D-1](artifacts/implementation-decision-log.md#d-1-f1-cancel-registry-shape-and-finalization-fix-scope),
[D-7](artifacts/implementation-decision-log.md#d-7-f1-terminal-state-for-user-stop--cancelled-not-failed)).
F6's stall path: each chunk bumps the 90s timer; on stall, `stallAc.abort()` flows through `AbortSignal.any`
into the same finalize-as-`cancelled` path
([D-5](artifacts/implementation-decision-log.md#d-5-f6-stall-timeout-via-abortsignalany-no-retry)).

### Data Model and Persistence

No schema changes in any TIER 1 item. F1 and F6 only change *which* terminal value gets written to existing
`messages.status` / `tasks.state` columns. F7 is a read against the existing `messages_with_parts` view.

### External Interfaces

F1 extends the existing coder `message_complete` WS frame with an optional `status` field — no new frame
type, so no paired strict-union arm is forced in `apps/web/src/api/types.ts` beyond the optional field
([D-8](artifacts/implementation-decision-log.md#d-8-f1-status-surfacing--extend-the-existing-frame-no-new-frame-type)).
F7 exposes one new read-only MCP tool on BooCoder's MCP server
([D-6](artifacts/implementation-decision-log.md#trivial-decisions)). F4 (blocked) would add a new
unauthenticated localhost POST route — see Security Posture. F5 (blocked) would add a new WS frame and thus
trigger the full cross-app parity rule.

## Decomposition and Sequencing

Each READY item is its own patch tag, one batch per coherent unit
([D-9](artifacts/implementation-decision-log.md#trivial-decisions)). F2+F3 are a single unit (same file; F2 keeping
`extractToolCallBlocks` unblocks F3). F1 goes first as the highest-value correctness fix.

| # | Work Unit | Surface / Deploy | Delivers | Depends On | Verification |
|---|-----------|------------------|----------|------------|--------------|
| 1 | **F1** cancel registry + finalization fixes | apps/coder → `systemctl restart boocoder` | Stop kills child; message finalizes `cancelled`/`failed`; no stuck `streaming` ([D-1](artifacts/implementation-decision-log.md#d-1-f1-cancel-registry-shape-and-finalization-fix-scope)) | — | `CancelRegistry` unit (4 cases) + DB-integration route test → row `cancelled` (T1-T3) |
| 2 | **F2+F3** parser prune + logger | apps/server → docker rebuild | Public surface 11→4; rejection logging via pino; guard pinned ([D-3](artifacts/implementation-decision-log.md#d-3-f2-prune-scope--option-a-prune-now-minimal-keep-the-load-bearing-guard), [D-2](artifacts/implementation-decision-log.md#trivial-decisions)) | F2's keep-decision unblocks F3 | Fallback gate test green through prune (T6, [D-4](artifacts/implementation-decision-log.md#d-4-f2-fallback-gate-test--pin-the-untested-guard-before-pruning)) |
| 3 | **F6** stall-timeout | apps/server → docker rebuild | 90s server-side stall detection → finalize `cancelled` ([D-5](artifacts/implementation-decision-log.md#d-5-f6-stall-timeout-via-abortsignalany-no-retry)) | — | `classifyStreamError` unit (5 cases) + `vi.useFakeTimers()` stall test + post-loop abort regression pin (T8-T10) |
| 4 | **F7** session-history MCP tool | apps/coder → `systemctl restart boocoder` | Read-only transcript tool; `/api/health` tool count +1 ([D-6](artifacts/implementation-decision-log.md#trivial-decisions)) | — | Manual MCP call + sentinel-strip assertion (`role != 'system'`) |
| 5 | **F9** retire :9502 SPA | apps/coder → `systemctl restart boocoder` | Serve block + build step + workspace pkg removed; routes kept ([D-11](artifacts/implementation-decision-log.md#d-11-f9-retire-9502-spa--delete-the-serve-block-keep-all-apiws-routes)) | — | `/api/coder/*` + `/api/health` still 200; `GET /` 404-or-redirect |

F4/F5/F8 are NOT in this table — they route out per
[D-12](artifacts/implementation-decision-log.md#d-12-f4f5f8-disposition-under-the-standing-override--document-as-blocked-do-not-halt).

## RAID Log

### Risks

| ID | Risk | Likelihood | Severity | Blast Radius | Reversibility | Owner | Mitigation |
|----|------|------------|----------|--------------|---------------|-------|------------|
| R1 | F1 message-finalization races across the 4 backend paths leave a row in the wrong terminal state | Medium | Medium | One assistant message per affected turn | Reversible (re-run) | on-call-engineer | Shared `cancelAndFinalize` helper, `WHERE status='streaming'` idempotency, abort short-circuit before the unconditional `complete` write ([D-1](artifacts/implementation-decision-log.md#d-1-f1-cancel-registry-shape-and-finalization-fix-scope)) |
| R2 | F2 prune accidentally removes the `<invoke>`-as-text fallback guard (regression: plain-text tool calls silently dropped) | Low | High | Any qwen3.6 turn that emits tool-call markup as text | Reversible (re-add export/call) | software-architect | Option-A keeps the impl; gate test fails if `extractToolCallBlocks` leaves the text-delta path ([D-3](artifacts/implementation-decision-log.md#d-3-f2-prune-scope--option-a-prune-now-minimal-keep-the-load-bearing-guard), [D-4](artifacts/implementation-decision-log.md#d-4-f2-fallback-gate-test--pin-the-untested-guard-before-pruning)) |
| R3 | F6 stall-timeout fires on a healthy-but-slow stream (false abort) | Low | Low | One in-flight chat turn | Reversible (re-send) | on-call-engineer | 90s per-*chunk* (not per-turn) deadline, bumped on every chunk; tune only on observed false-fire ([D-5](artifacts/implementation-decision-log.md#d-5-f6-stall-timeout-via-abortsignalany-no-retry)) |
| R4 | F5 premise fails — the pinned SDK never surfaces a compaction event, so the feature is unbuildable without an SDK bump | High | N/A (blocked) | F5 only | N/A | (capability check) | Confirm event existence/name before any build; F5 stays Blocked until then ([D-12](artifacts/implementation-decision-log.md#d-12-f4f5f8-disposition-under-the-standing-override--document-as-blocked-do-not-halt)) |
| R5 | F4 premise fails — agents do not fire native hooks in unattended mode, so injection yields no signals | High | N/A (blocked) | F4 only | N/A | (plan-a-feature) | Verify hook-firing + goose format in the F4 spec before any build ([D-12](artifacts/implementation-decision-log.md#d-12-f4f5f8-disposition-under-the-standing-override--document-as-blocked-do-not-halt)) |

### Assumptions

| ID | Assumption | What Changes If Wrong | Verifier | Status |
|----|------------|-----------------------|----------|--------|
| A1 | llama-swap native `--jinja` parsing stays ON (structured `tool_calls`), so the text-scrape fallback remains dormant defense-in-depth | If off, the kept guard becomes hot — fine; if a future config silently changes parsing, F2 Option B reopens | live probe of `:8401` (not in-repo) | Unverified-in-repo (probe-only); guard kept regardless ([D-3](artifacts/implementation-decision-log.md#d-3-f2-prune-scope--option-a-prune-now-minimal-keep-the-load-bearing-guard)) |
| A2 | All four external backends honor `ctx.signal` as cited (`child.kill` / `session/cancel` / `session.abort` / interrupt) | If one does not, that backend's Stop stays a no-op even after wiring | F1 DB-integration test per backend path | Evidenced by file:line (scope-brief F1); test confirms |
| A3 | 90s is the right stall deadline for the local llama-swap workload shape | Too low → false aborts on slow models; too high → sluggish recovery | Production observation | Committed value, tunable ([D-5](artifacts/implementation-decision-log.md#d-5-f6-stall-timeout-via-abortsignalany-no-retry)) |
| A4 | F1's `session/stop` leg "never fires for external from UI" (CoderPane calls `cancelTask`) | If a code path does hit session-stop for an external task, that leg's finalization is best-effort only | on-call review of CoderPane wiring | Evidenced (synthesis-input F1 resolved OQs) |

### Dependencies

| ID | Dependency | Owner | Status |
|----|------------|-------|--------|
| Dep1 | F5 requires `@opencode-ai/sdk` to surface a compaction event (or an SDK bump) | (capability check) | BLOCKING F5 — unresolved (OQ-F5a) |
| Dep2 | F4 requires confirmed unattended hook-firing for claude/qwen/goose + the goose hook format | (plan-a-feature) | BLOCKING F4 — unresolved (OQ-F4a, OQ-F4b) |
| Dep3 | F8 requires a chosen diff source + a line-selection approach | (plan-a-feature) | BLOCKING F8 — unresolved (OQ-F8a/b/c) |

## Testing Strategy

Sourced from test-engineer (T1-T3, T6, T8-T10) following the established pure-helper-then-wire precedent
(`turn-guard.ts`, `lifecycle-decisions.ts`, `mistake-tracker.ts`). Coder tests are `globals:false` (import
`describe`/`it`/`expect`); include glob `src/**/__tests__/**/*.test.ts`; DB-integration tests are opt-in via
`DATABASE_URL` + `describe.runIf`.

- **Observable behaviors to test:**
  - F1: a cancelled external task lands `messages.status='cancelled'` (not `streaming`, not `complete`); the registry register/cancel/delete/has behaves idempotently ([D-1](artifacts/implementation-decision-log.md#d-1-f1-cancel-registry-shape-and-finalization-fix-scope), [D-7](artifacts/implementation-decision-log.md#d-7-f1-terminal-state-for-user-stop--cancelled-not-failed)).
  - F2: a text-delta carrying a complete `<invoke>` block lands in `result.toolCalls` with the markup absent from `result.content` — green through the prune, red if the guard is removed ([D-4](artifacts/implementation-decision-log.md#d-4-f2-fallback-gate-test--pin-the-untested-guard-before-pruning)).
  - F6: a fake hanging stream is aborted after the 90s deadline under `vi.useFakeTimers()`; the existing post-loop `signal?.aborted` check still passes (regression pin) ([D-5](artifacts/implementation-decision-log.md#d-5-f6-stall-timeout-via-abortsignalany-no-retry)).
  - F7: results exclude `role='system'` rows (sentinel strip) and respect the `limit` cap.
- **Test doubles posture:** extract pure helpers (`CancelRegistry`, `classifyStreamError`) with no DB/child/I/O for unit coverage; stub `streamText` for the F2 gate and F6 stall tests; one DB-integration test for the F1 route. Warm-worktree-preserved is held as a code comment, not a spy (test-engineer T1-T3).
- **Edge cases requiring coverage:** double-Stop / cancel-after-exit (idempotent `ac.abort()`); abort vs thrown error in the catch (cancelled vs failed mapping); F6 stall vs healthy-slow stream.
- **Test levels:** unit (registry, classifier, gate) → integration (F1 route → DB row) → manual (F7 MCP call, F9 route-still-200 smoke).

## Security Posture

Thin. The single TIER 1 surface is BooCoder's new read-only `view_session_history` MCP tool (F7), which
reads only `messages_with_parts` `WHERE role != 'system'` — no write verbs, no PII beyond what the operator
already authored, single-user, Authelia at the reverse proxy, no app-layer auth by design.

The one genuinely new auth-relevant surface — F4's inbound unauthenticated localhost POST route for hook
callbacks — belongs to a **BLOCKED** item and is not built in this plan. When F4 is planned, it must be
recorded explicitly as a new unauthenticated localhost route (acceptable under the single-user / Authelia
posture, but called out, not silent).

## Operational Readiness

- **Deploy by surface:** apps/coder items (F1, F7, F9) → `sudo systemctl restart boocoder`; apps/server items (F2+F3, F6) → `docker compose up --build -d boocode`. Stage commits explicitly by path; never `git add -A` (Sam has uncommitted web WIP) ([D-9](artifacts/implementation-decision-log.md#trivial-decisions)).
- **Build-step change (F9):** removing `apps/coder/web` deletes a coder build step and a workspace package; the coder Dockerfile copy and `pnpm-workspace.yaml` entry must be removed in the same batch or the build references a missing package ([D-11](artifacts/implementation-decision-log.md#d-11-f9-retire-9502-spa--delete-the-serve-block-keep-all-apiws-routes)).
- **Health check after F7:** `/api/health` reports a tool count; expect it to increment by one.
- **Tagging:** sequential patch tags, one per unit; no v2.8.0 ([D-9](artifacts/implementation-decision-log.md#trivial-decisions)).
- **Rollback:** each unit is an independent patch tag, so rollback is per-item (revert the tag, redeploy by surface). No schema migrations means no expand/contract sequencing to unwind.

## On-Call Resilience Posture

Application-source resilience, sourced from on-call-engineer. Infrastructure concerns live in Operational
Readiness above.

- **Timeouts and deadlines:** F6 adds a 90s per-chunk stall deadline on the BooChat `fullStream` loop, the first server-side guard against a hung llama-swap stream (today only the frontend 60s `discard_stale` watchdog exists) ([D-5](artifacts/implementation-decision-log.md#d-5-f6-stall-timeout-via-abortsignalany-no-retry)).
- **Retry strategy:** none, deliberately. A retry after a partial stream re-emits already-streamed deltas (non-idempotent); the user re-sending is the correct recovery at single-local-instance scale (see Deferred (YAGNI)) ([D-5](artifacts/implementation-decision-log.md#d-5-f6-stall-timeout-via-abortsignalany-no-retry)).
- **Idempotency:** F1's `cancelExternalTask` is idempotent (`ac.abort()` no-ops when already aborted → double-Stop and cancel-after-exit are safe); message finalization is idempotent via `WHERE status='streaming'` ([D-1](artifacts/implementation-decision-log.md#d-1-f1-cancel-registry-shape-and-finalization-fix-scope)).
- **Kill switches:** F1 is itself the kill switch for a runaway external agent — making Stop actually stop is the operator's manual control.
- **Graceful degradation:** F1's `session/stop` leg is best-effort (does not fire for external tasks from the UI); F6 falls back to finalizing as `cancelled` on stall rather than hanging indefinitely.
- **Observability of failure paths:** F1 distinguishes `cancelled` (user/stall) from `failed` (thrown error) so the human-inbox surface stays honest ([D-7](artifacts/implementation-decision-log.md#d-7-f1-terminal-state-for-user-stop--cancelled-not-failed)); F3 routes parser-rejection logging through pino/`LOG_LEVEL` ([D-2](artifacts/implementation-decision-log.md#trivial-decisions)).
- **Data integrity:** no monetary/rate columns touched; F1/F6 only write existing terminal-state enum values into existing columns.
- **Migration safety:** no schema changes in any TIER 1 item — nothing to expand/contract.

## Definition of Done

- [ ] Hitting Stop on an external task kills the child and the assistant message finalizes `cancelled` (never left `streaming`, never falsely `complete`) ([D-1](artifacts/implementation-decision-log.md#d-1-f1-cancel-registry-shape-and-finalization-fix-scope), [D-7](artifacts/implementation-decision-log.md#d-7-f1-terminal-state-for-user-stop--cancelled-not-failed)).
- [ ] Stop button is disabled while the cancel POST is in flight; a muted "Stopped" label renders on cancel ([D-8](artifacts/implementation-decision-log.md#d-8-f1-status-surfacing--extend-the-existing-frame-no-new-frame-type), [D-10](artifacts/implementation-decision-log.md#trivial-decisions)).
- [ ] Parser public surface is 4 exports; the `<invoke>`-as-text gate test is green and fails if the guard is removed ([D-3](artifacts/implementation-decision-log.md#d-3-f2-prune-scope--option-a-prune-now-minimal-keep-the-load-bearing-guard), [D-4](artifacts/implementation-decision-log.md#d-4-f2-fallback-gate-test--pin-the-untested-guard-before-pruning)).
- [ ] Parser rejection logging appears via pino and respects `LOG_LEVEL` ([D-2](artifacts/implementation-decision-log.md#trivial-decisions)).
- [ ] A fake hanging BooChat stream is finalized `cancelled` after 90s under fake timers; the post-loop abort regression pin passes ([D-5](artifacts/implementation-decision-log.md#d-5-f6-stall-timeout-via-abortsignalany-no-retry)).
- [ ] `view_session_history` returns a session transcript excluding `role='system'` rows, capped at `limit` (default 50, max 200); `/api/health` tool count +1 ([D-6](artifacts/implementation-decision-log.md#trivial-decisions)).
- [ ] :9502 serve block + build step + workspace package removed; all `/api/coder/*` + `/api/health` routes still 200 ([D-11](artifacts/implementation-decision-log.md#d-11-f9-retire-9502-spa--delete-the-serve-block-keep-all-apiws-routes)).
- [ ] Each shipped item has its own sequential patch tag + CHANGELOG entry; deployed by surface ([D-9](artifacts/implementation-decision-log.md#trivial-decisions)).

## Specialist Handoffs for Implementation

- **`test-engineer`** — dispatch alongside each READY unit; needs the touch-point file:lines (already in discovery notes) to author T1-T3 (F1), T6 (F2), T8-T10 (F6) before/with the wiring.
- **`software-architect`** — dispatch if F1's `cancelAndFinalize` helper shape needs settling across the four backend paths during implementation; needs the four catch-block + success-path line ranges.
- **`plan-a-feature` (F4)** — dispatch when Sam wants F4; input: the settled dedup rule + module shapes from synthesis-input, and the two blocking OQs (unattended hook-firing, goose format) to resolve first.
- **`plan-a-feature` (F8)** — dispatch when Sam wants F8; input: `CoderPane.tsx:478-619` DiffPanel location, the diff-source ambiguity, and the cross-surface routing gap.
- **Capability check (F5)** — dispatch before any F5 build; input: the pinned `@opencode-ai/sdk` arm list and OQ-F5a (does it emit a compaction event, and what is its exact name?).

## Deferred (YAGNI)

Items considered during planning and deferred under the YAGNI rule. Each carries a concrete reopen trigger.

### F6 retry / backoff classifier
- **Why deferred:** Evidence test — a retry after a partial stream re-emits already-streamed deltas (`state.accumulated` + live `delta` frames are non-idempotent), which is strictly worse than the current behavior; no second instance exists to fail over to. Single-local-instance scale makes operator re-send the correct recovery.
- **Reopen when:** llama-swap gains restart-in-place-with-clear-partial semantics, OR a second llama-swap instance is added for failover.
- **Source:** R1, on-call-engineer.

### F2 Option B — flag-gated full retirement of `extractToolCallBlocks` / `stripToolMarkup`
- **Why deferred:** Evidence test — no evidence qwen3.6 stopped emitting `<invoke>`-as-text on live, and the sidecar `--jinja` state is unconfirmed in-repo; deleting the only plain-text-tool-call guard on that basis is unsafe (named anti-pattern: removing a load-bearing guard without a measured signal).
- **Reopen when:** a documented multi-session live probe shows zero text-delta tool calls from qwen3.6.
- **Source:** R1, software-architect / test-engineer.

### F4 `NotifyHookInjection` interface
- **Why deferred:** Simpler-version test — three agents with an identical read-merge-write settings.json flow do not need an interface; one concrete function switching on agent name satisfies the same need (named anti-pattern: single-implementation interface before three divergent uses exist).
- **Reopen when:** a fourth agent with a genuinely different injection contract appears (and only after F4's premise is verified).
- **Source:** R1, software-architect.

### F5 handling of `compaction.started` / `compaction.delta` + `step.failed` + `tool.progress`
- **Why deferred:** Evidence test — only `compaction.ended` is user-actionable (it is what closes the silent context gap); the other arms add UI noise with no user-described need. Compounded by F5 being capability-blocked anyway.
- **Reopen when:** the SDK compaction event is confirmed AND a user-described need for in-progress compaction feedback emerges.
- **Source:** R1, behavioral-analyst.

### F7 `SessionHistoryReader` interface / pagination beyond `limit`
- **Why deferred:** Simpler-version test — one read tool with an inline query + a `limit` cap satisfies the read need; an interface and cursor pagination are unused machinery (named anti-pattern: abstraction before a second concrete use).
- **Reopen when:** a second history-read consumer needs a different read path, or transcripts routinely exceed the 200-row cap in a way that breaks the use case.
- **Source:** R1, software-architect.

## Open Items

- **OI-1 (F4 premise):** Do claude / qwen / goose fire native lifecycle hooks in unattended mode, and what is goose's hook file/format?
  - **Resolves when:** verified in a `plan-a-feature` spec for F4 (OQ-F4a, OQ-F4b).
  - **Blocks implementation:** Yes for F4 — No for the READY cluster (F4 is independent).
- **OI-2 (F5 capability):** Does the pinned `@opencode-ai/sdk` surface a compaction event, and what is its exact name?
  - **Resolves when:** a capability check confirms the event (or an SDK bump is scoped) — OQ-F5a.
  - **Blocks implementation:** Yes for F5 — No for the READY cluster.
- **OI-3 (F5 UI treatment):** persistent sentinel row vs ephemeral inline divider for "context compacted."
  - **Resolves when:** OI-2 resolves and the team picks the UI shape (OQ-F5b); currently disputed (behavioral vs UX).
  - **Blocks implementation:** Yes for F5 — No for the READY cluster.
- **OI-4 (F8 spec):** diff source (`pending_changes.diff` vs worktree git diff), selection serialization, and new-viewer routing.
  - **Resolves when:** a `plan-a-feature` spec for F8 settles OQ-F8a/b/c.
  - **Blocks implementation:** Yes for F8 — No for the READY cluster.
- **OI-5 (F1 session-stop leg):** whether to wire the best-effort `session/stop` lookup or defer it.
  - **Resolves when:** decided at F1 implementation; it does not fire for external tasks from the UI today, so it is low-value.
  - **Blocks implementation:** No — F1 ships with or without this leg.

## Summary

- **Outcome delivered:** Stop actually stops external agents and finalizes their messages correctly; the parser is pruned to its load-bearing surface with the guard test-pinned; BooChat gains a 90s server-side stall guard; a session-history MCP tool and the :9502 retirement land; F4/F5/F8 have a documented route to their own spec/capability work.
- **Team size:** 7 specialists (incl. project-manager) — see [artifacts/implementation-iteration-history.md](artifacts/implementation-iteration-history.md)
- **Rounds of facilitation:** 1 — see [artifacts/implementation-iteration-history.md](artifacts/implementation-iteration-history.md)
- **Decisions committed:** 12 — see [artifacts/implementation-decision-log.md](artifacts/implementation-decision-log.md)
- **Decisions settled by evidence:** 11 — see [artifacts/implementation-decision-log.md](artifacts/implementation-decision-log.md)
- **Decisions settled by junior-developer reframing:** 0 (junior confirmed several, but each rests on specialist evidence) — see [artifacts/implementation-decision-log.md](artifacts/implementation-decision-log.md)
- **Decisions settled by user input:** 1 (D-12, the standing-override disposition) — see [artifacts/implementation-decision-log.md](artifacts/implementation-decision-log.md)
- **Rejected alternatives recorded:** 13 — see [artifacts/implementation-decision-log.md](artifacts/implementation-decision-log.md)
- **Open items remaining:** 5 (4 blocking only their own BLOCKED item; OI-5 non-blocking)
- **Recommendation:** Ship the READY cluster as planned — build F1 first (apps/coder, highest-value correctness fix, must include the OCE-001/002 finalization fixes), then F2+F3 (apps/server, one batch), F6 (apps/server), F7 (apps/coder MCP), F9 (apps/coder cleanup), each as its own sequential patch tag. Route F4 and F8 to `plan-a-feature` and F5 to an `@opencode-ai/sdk` capability check before any build.