Files
boocode/docs/plans/post-review-backlog/feature-implementation-plan.md
indifferentketchup 2a05d2f9fe docs: archive shipped openspec batches; add feature/plan/research notes
Move 13 shipped openspec change docs under openspec/changes/archived/.
Add docs/features/git-diff-panel, docs/plans/post-review-backlog, and
docs/research/cross-app-contract-ssot.md (the research behind the
@boocode/contracts SSOT work). Update BOOCHAT.md, BOOCODER.md, and
boocode_roadmap.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-02 21:20:33 +00:00

351 lines
36 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Feature Implementation Plan: Post-Review Backlog (F1F9)
This is a multi-item backlog, not a single feature. It commits to shipping six **READY** items as
independent sequential patch tags ([D-9](artifacts/implementation-decision-log.md#trivial-decisions)), one batch per
coherent unit, honoring deploy-by-surface and stage-commits-by-path; and to documenting three **BLOCKED**
WANT items (F4/F5/F8) with their exact blocking open questions and recommended resolution paths
([D-12](artifacts/implementation-decision-log.md#d-12-f4f5f8-disposition-under-the-standing-override--document-as-blocked-do-not-halt))
rather than halting on the tripped spec-maturity gate.
## Source Specification
- **Feature specification:** No formal `feature-specification.md` exists. The ground-truth source is the [scope brief](scope-brief.md), which captures the items, decisions, and live-verified current state from the 2026-06-02 backlog review conversation plus `boocode_code_review_v2.md`, `docs/DEFERRED-WORK.md`, and `boocode_roadmap.md`. Origin is conversational + review-doc, so the WANT items are intentionally planned more shallowly.
- **Consolidated specialist record:** [artifacts/synthesis-input.md](artifacts/synthesis-input.md) — the Round-1 aggregation of all six specialists (on-call-engineer, behavioral-analyst, software-architect, test-engineer, user-experience-designer, junior-developer) with file:line evidence, the claim ledger, OQ resolutions, the spec-maturity gate, and the YAGNI ledger. There is no separate per-specialist file; this digest is authoritative.
- **Discovery notes:** [artifacts/.discovery-notes.md](artifacts/.discovery-notes.md) — project context and per-item code touch points.
## Outcome
When this plan's READY cluster is executed: hitting Stop on an external agent task actually kills the
running child and finalizes the assistant message in a correct terminal state (no more `streaming`-stuck or
falsely-`complete` rows); the tool-call parser's public surface is pruned to its four load-bearing exports
while the plain-text `<invoke>` guard stays intact and is for the first time test-pinned; the parser's
rejection logging flows through pino/`LOG_LEVEL`; a hung BooChat stream is detected and finalized
server-side after 90s instead of relying solely on the frontend watchdog; an MCP client can read a
session's transcript through a new BooCoder MCP tool; and the unused :9502 fallback SPA and its build step
are gone. F4/F5/F8 leave this plan with a documented, evidence-backed route to their own spec/capability
work.
## Context
- **Driving constraint:** A post-review backlog with two items live-verified as active correctness bugs (F1 cancel is a no-op that leaves messages stuck; F6 has zero server-side stall guard). The review surfaced them; Sam chose scope "everything we discussed." No external deadline — value is correctness and simplification debt paydown.
- **Stakeholders:** Sam (sole user/operator — wants Stop to work, wants the parser simplified, wants resilience and the MCP history tool, greenlit the :9502 removal). On-call posture is single-operator: Sam is the post-ship owner for every item.
- **Future-state concern:** F1 touches the dispatcher's message-finalization paths across four backends — the risk to watch is finalization races/regressions there. F6 introduces a server-side timer that must not fire on a healthy-but-slow stream. F5 is a standing capability dependency on the pinned `@opencode-ai/sdk`.
- **Out-of-scope boundary:** F2 Option B (flag-gated full parser retirement), F6 retry/backoff, F4's superset code (clean-room pattern only), and everything in the scope brief's "Deferred (Sam's explicit dispositions)" list (subagent permission demux, tier-2 provider follow-ups, large-file splits, PR-resolver, etc.). These are recorded, not planned.
## Team Composition and Participation
Round-by-round detail lives in [artifacts/implementation-iteration-history.md](artifacts/implementation-iteration-history.md). All six specialists converged in one round; remaining unknowns are spec-level, not resolvable by more specialist rounds.
| Specialist | Status | Key Input |
|------------|--------|-----------|
| `project-manager` | Coordinator | Facilitated R1, applied the spec-maturity gate + YAGNI gate, synthesized this plan. |
| `on-call-engineer` | Active | F1 finalization bugs (OCE-001/002); F6 stall-timeout design + no-retry YAGNI call. |
| `behavioral-analyst` | Active | F1 message-state corruption (B2/B3); F5 sentinel-row UI position (disputed); F4 dedup rule. |
| `software-architect` | Active | F1 registry/`ExternalCancelFn` shape (A1); F2 option-A prune (A2); F7 inline tool (A4); F9 removal (A5). |
| `test-engineer` | Active | F1 `CancelRegistry` + DB test (T1-T3); F2 fallback gate test (T6); F6 `classifyStreamError` + fake-timer test (T8-T10). |
| `user-experience-designer` | Active | F1 disable-Stop-while-in-flight + Option-A frame extension + muted "Stopped" label; F5 ephemeral-divider UI (disputed); F8 no line-selection infra. |
| `junior-developer` | Active | OQ reframing across all items; F2c (option a), F4a/b unverified premise, F8 spec-level. |
## Implementation Approach
ALTITUDE: this section names code artifacts and inlines only decision-bearing values (flag names, the 90s
stall timeout, the 50/200 limits, the export list to unexport). Full implementations belong in the files
themselves; mechanics live in the cited decision-log entries.
### TIER 1 — READY TO BUILD
**F1 — external task cancel kills the child + finalizes the message (apps/coder, standalone, do first).**
The highest-value correctness fix. A `taskControllers = new Map<string,AbortController>()` registry inside
`createDispatcher`, populated at the four run-functions and deleted in the existing `.finally()`, plus an
exported idempotent `cancelExternalTask(taskId): boolean` wired into `POST /api/tasks/:id/cancel` via a
narrow `ExternalCancelFn`
([D-1](artifacts/implementation-decision-log.md#d-1-f1-cancel-registry-shape-and-finalization-fix-scope)).
This batch **must** also fix the two pre-existing finalization bugs the abort wiring newly makes reachable —
the catch blocks that leave messages `streaming` and the warm success path that writes `complete` on an
aborted turn — via a shared `cancelAndFinalize` helper, or it ships a new bug
([D-1](artifacts/implementation-decision-log.md#d-1-f1-cancel-registry-shape-and-finalization-fix-scope)).
User Stop finalizes to `cancelled`, not `failed`
([D-7](artifacts/implementation-decision-log.md#d-7-f1-terminal-state-for-user-stop--cancelled-not-failed));
the terminal state reaches the web reducer via an optional `status` field on the existing coder
`message_complete` frame, not a new frame type
([D-8](artifacts/implementation-decision-log.md#d-8-f1-status-surfacing--extend-the-existing-frame-no-new-frame-type)),
rendered as a muted "Stopped" label ([D-10](artifacts/implementation-decision-log.md#trivial-decisions)). The
`session/stop` leg "never fires for external from UI" (CoderPane already calls `cancelTask`), so it is wired
best-effort via a `WHERE session_id=$ AND state='running'` lookup or deferred as a low-value leg.
**F2 + F3 together (apps/server, same file).** F2 is option-A prune-now-minimal: KEEP `extractToolCallBlocks`
+ `stripToolMarkup` (the only guard for a tool call emitted as plain text), unexport the 8 zero-caller
symbols — `isPlaceholderArgValue`, `parseXmlToolCall`, `parseInvokeToolCall`, `partialXmlOpenerStart`, and
the consts `XML_TOOL_OPEN/CLOSE`, `INVOKE_TOOL_OPEN/CLOSE` — taking the public surface 11 → 4 with zero
runtime effect
([D-3](artifacts/implementation-decision-log.md#d-3-f2-prune-scope--option-a-prune-now-minimal-keep-the-load-bearing-guard)).
A gate test pins the untested `<invoke>`-as-text fallback before the prune
([D-4](artifacts/implementation-decision-log.md#d-4-f2-fallback-gate-test--pin-the-untested-guard-before-pruning)).
Because F2 keeps `extractToolCallBlocks`, F3 is safe to land in the same file/batch: thread an optional
`log?: { debug }` param into `extractToolCallBlocks` from its one call site so rejection logging flows
through pino/`LOG_LEVEL` instead of `console.debug` ([D-2](artifacts/implementation-decision-log.md#trivial-decisions)).
**F6 — BooChat stall-timeout ONLY (apps/server, standalone).** Wrap the `stream-phase.ts:261` `fullStream`
loop with a per-chunk stall deadline using `AbortSignal.any([signal, stallAc.signal])` and a
`setTimeout(STALL_TIMEOUT_MS = 90_000)` bumped on each chunk, cleared in the existing `finally`; the
post-loop check throws `AbortError` so the existing finalize path writes `cancelled`
([D-5](artifacts/implementation-decision-log.md#d-5-f6-stall-timeout-via-abortsignalany-no-retry)). **No
retry** — partial-stream re-emit is non-idempotent and the instance is single+local (see Deferred (YAGNI)).
**F7 — `view_session_history` MCP tool (apps/coder MCP, standalone).** Tool 7 added inline in
`mcp-server.ts` following the existing 6-tool inline pattern (`textResult` + direct `sql`). Reads
`messages_with_parts` `WHERE role != 'system'`, params `session_id` + optional `chat_id` + `limit` (default
50, max 200), `ORDER BY created_at ASC` ([D-6](artifacts/implementation-decision-log.md#trivial-decisions)).
No interface, no pagination beyond `limit`.
**F9 — retire :9502 SPA (apps/coder, standalone cleanup).** Delete the `if (existsSync(webRoot))` serve
block in `index.ts` (~269-289), keep the inline 404; remove `apps/coder/web` from `pnpm-workspace.yaml`, the
coder build step, and the Dockerfile copy; drop the now-unused `fastifyStatic` import. KEEP every
`/api/coder/*` REST + WS + `/api/health` + `--mcp` route
([D-11](artifacts/implementation-decision-log.md#d-11-f9-retire-9502-spa--delete-the-serve-block-keep-all-apiws-routes)).
### TIER 2 — BLOCKED (need their own spec / a capability check)
These three are recorded, not built. They do **not** block the READY cluster
([D-12](artifacts/implementation-decision-log.md#d-12-f4f5f8-disposition-under-the-standing-override--document-as-blocked-do-not-halt)).
**F4 — notify-hook config injection.** The module shapes and the double-publish dedup rule are settled
(inbound route calls `normalizeAgentEvent` → bucket; confirms `tasks.state='running'` before publishing
`blocked`; SUPPRESSES `done` since the dispatcher already emits `idle`). But the **core premise is
unverified**: it is unknown whether claude / qwen / goose actually fire their native lifecycle hooks in
unattended mode (`claude -p`/SDK, `qwen --acp`/`stream-json`, goose), and goose's hook file/format is
unknown (not in repo). **Recommended path:** route to `plan-a-feature`; verify hook-firing-in-unattended-mode
and the goose hook mechanism first, or the feature is built on sand. Blocking OQs: OQ-F4a (hooks fire
unattended?), OQ-F4b (goose hook format?).
**F5 — opencode compaction surfacing.** **Blocked on a capability check**: the pinned `@opencode-ai/sdk`
exposes NO compaction event arm (confirmed arms: `session.next.{text,reasoning,tool,step}.*`,
`message.part.*`, `session.idle/error`). The review assumed events from opencode's core `event.ts` that the
pinned SDK may not surface. **Recommended path:** confirm the SDK emits a compaction signal and its exact
event name (or an SDK bump is needed) before building. The UI treatment is **disputed** (behavioral:
persistent sentinel row `metadata.kind='compaction'` that survives refresh; UX: ephemeral inline divider via
a new `agent_compacted` frame, no DB row) — settle once the event is confirmed to exist. Only
`compaction.ended` is in scope. Blocking OQs: OQ-F5a (SDK event existence/name), OQ-F5b (sentinel vs
ephemeral UI).
**F8 — diff-line → agent re-prompt UX.** **Spec-level.** The "DiffPanel" is inline in
`CoderPane.tsx:478-619`, rendering `pending_changes` rows as a static `<pre>` — no line-selection
infrastructure exists. The diff source is ambiguous (`pending_changes.diff` = BooCode write-tools only vs the
external-agent worktree git diff). "Send to new agent" needs coordinated workspace-pane + chat creation +
pre-population across three surfaces with no existing contract; selection diverges by modality.
**Recommended path:** route to `plan-a-feature` (the scope brief already hedged this; treat as firm).
MVP-if-pushed: "comment to current agent" only, block-level selection, pre-populate `ChatInput` — still
wants a spec. Blocking OQs: OQ-F8a (diff source), OQ-F8b (selection serialization), OQ-F8c (new-viewer
routing).
### Architecture and Integration Points
F1 lives entirely in apps/coder (`dispatcher.ts`, `routes/tasks.ts`, `index.ts:254` wiring) with a
single-field touch on the web reducer (`CoderPane.tsx:299`). F2/F3 are confined to
`apps/server/.../tool-call-parser.ts` and its one call site `stream-phase.ts`. F6 is confined to
`stream-phase.ts`. F7 is one inline tool in `apps/coder/.../mcp-server.ts`. F9 removes a serve block and a
workspace package. No new interface is introduced anywhere in TIER 1 — every item follows the existing inline
/ pure-helper precedent ([D-2](artifacts/implementation-decision-log.md#trivial-decisions),
[D-6](artifacts/implementation-decision-log.md#trivial-decisions)).
### Runtime Behavior
F1's cancel path: Stop → `POST /api/tasks/:id/cancel``cancelExternalTask(taskId)``ac.abort()`
backend honors the signal (`child.kill` / `session/cancel` / `session.abort` / interrupt) → the run-function
hits its abort short-circuit or catch → `cancelAndFinalize` writes the terminal `messages.status`
(`cancelled` on abort, `failed` on error) `WHERE status='streaming'` and publishes the terminal frame
([D-1](artifacts/implementation-decision-log.md#d-1-f1-cancel-registry-shape-and-finalization-fix-scope),
[D-7](artifacts/implementation-decision-log.md#d-7-f1-terminal-state-for-user-stop--cancelled-not-failed)).
F6's stall path: each chunk bumps the 90s timer; on stall, `stallAc.abort()` flows through `AbortSignal.any`
into the same finalize-as-`cancelled` path
([D-5](artifacts/implementation-decision-log.md#d-5-f6-stall-timeout-via-abortsignalany-no-retry)).
### Data Model and Persistence
No schema changes in any TIER 1 item. F1 and F6 only change *which* terminal value gets written to existing
`messages.status` / `tasks.state` columns. F7 is a read against the existing `messages_with_parts` view.
### External Interfaces
F1 extends the existing coder `message_complete` WS frame with an optional `status` field — no new frame
type, so no paired strict-union arm is forced in `apps/web/src/api/types.ts` beyond the optional field
([D-8](artifacts/implementation-decision-log.md#d-8-f1-status-surfacing--extend-the-existing-frame-no-new-frame-type)).
F7 exposes one new read-only MCP tool on BooCoder's MCP server
([D-6](artifacts/implementation-decision-log.md#trivial-decisions)). F4 (blocked) would add a new
unauthenticated localhost POST route — see Security Posture. F5 (blocked) would add a new WS frame and thus
trigger the full cross-app parity rule.
## Decomposition and Sequencing
Each READY item is its own patch tag, one batch per coherent unit
([D-9](artifacts/implementation-decision-log.md#trivial-decisions)). F2+F3 are a single unit (same file; F2 keeping
`extractToolCallBlocks` unblocks F3). F1 goes first as the highest-value correctness fix.
| # | Work Unit | Surface / Deploy | Delivers | Depends On | Verification |
|---|-----------|------------------|----------|------------|--------------|
| 1 | **F1** cancel registry + finalization fixes | apps/coder → `systemctl restart boocoder` | Stop kills child; message finalizes `cancelled`/`failed`; no stuck `streaming` ([D-1](artifacts/implementation-decision-log.md#d-1-f1-cancel-registry-shape-and-finalization-fix-scope)) | — | `CancelRegistry` unit (4 cases) + DB-integration route test → row `cancelled` (T1-T3) |
| 2 | **F2+F3** parser prune + logger | apps/server → docker rebuild | Public surface 11→4; rejection logging via pino; guard pinned ([D-3](artifacts/implementation-decision-log.md#d-3-f2-prune-scope--option-a-prune-now-minimal-keep-the-load-bearing-guard), [D-2](artifacts/implementation-decision-log.md#trivial-decisions)) | F2's keep-decision unblocks F3 | Fallback gate test green through prune (T6, [D-4](artifacts/implementation-decision-log.md#d-4-f2-fallback-gate-test--pin-the-untested-guard-before-pruning)) |
| 3 | **F6** stall-timeout | apps/server → docker rebuild | 90s server-side stall detection → finalize `cancelled` ([D-5](artifacts/implementation-decision-log.md#d-5-f6-stall-timeout-via-abortsignalany-no-retry)) | — | `classifyStreamError` unit (5 cases) + `vi.useFakeTimers()` stall test + post-loop abort regression pin (T8-T10) |
| 4 | **F7** session-history MCP tool | apps/coder → `systemctl restart boocoder` | Read-only transcript tool; `/api/health` tool count +1 ([D-6](artifacts/implementation-decision-log.md#trivial-decisions)) | — | Manual MCP call + sentinel-strip assertion (`role != 'system'`) |
| 5 | **F9** retire :9502 SPA | apps/coder → `systemctl restart boocoder` | Serve block + build step + workspace pkg removed; routes kept ([D-11](artifacts/implementation-decision-log.md#d-11-f9-retire-9502-spa--delete-the-serve-block-keep-all-apiws-routes)) | — | `/api/coder/*` + `/api/health` still 200; `GET /` 404-or-redirect |
F4/F5/F8 are NOT in this table — they route out per
[D-12](artifacts/implementation-decision-log.md#d-12-f4f5f8-disposition-under-the-standing-override--document-as-blocked-do-not-halt).
## RAID Log
### Risks
| ID | Risk | Likelihood | Severity | Blast Radius | Reversibility | Owner | Mitigation |
|----|------|------------|----------|--------------|---------------|-------|------------|
| R1 | F1 message-finalization races across the 4 backend paths leave a row in the wrong terminal state | Medium | Medium | One assistant message per affected turn | Reversible (re-run) | on-call-engineer | Shared `cancelAndFinalize` helper, `WHERE status='streaming'` idempotency, abort short-circuit before the unconditional `complete` write ([D-1](artifacts/implementation-decision-log.md#d-1-f1-cancel-registry-shape-and-finalization-fix-scope)) |
| R2 | F2 prune accidentally removes the `<invoke>`-as-text fallback guard (regression: plain-text tool calls silently dropped) | Low | High | Any qwen3.6 turn that emits tool-call markup as text | Reversible (re-add export/call) | software-architect | Option-A keeps the impl; gate test fails if `extractToolCallBlocks` leaves the text-delta path ([D-3](artifacts/implementation-decision-log.md#d-3-f2-prune-scope--option-a-prune-now-minimal-keep-the-load-bearing-guard), [D-4](artifacts/implementation-decision-log.md#d-4-f2-fallback-gate-test--pin-the-untested-guard-before-pruning)) |
| R3 | F6 stall-timeout fires on a healthy-but-slow stream (false abort) | Low | Low | One in-flight chat turn | Reversible (re-send) | on-call-engineer | 90s per-*chunk* (not per-turn) deadline, bumped on every chunk; tune only on observed false-fire ([D-5](artifacts/implementation-decision-log.md#d-5-f6-stall-timeout-via-abortsignalany-no-retry)) |
| R4 | F5 premise fails — the pinned SDK never surfaces a compaction event, so the feature is unbuildable without an SDK bump | High | N/A (blocked) | F5 only | N/A | (capability check) | Confirm event existence/name before any build; F5 stays Blocked until then ([D-12](artifacts/implementation-decision-log.md#d-12-f4f5f8-disposition-under-the-standing-override--document-as-blocked-do-not-halt)) |
| R5 | F4 premise fails — agents do not fire native hooks in unattended mode, so injection yields no signals | High | N/A (blocked) | F4 only | N/A | (plan-a-feature) | Verify hook-firing + goose format in the F4 spec before any build ([D-12](artifacts/implementation-decision-log.md#d-12-f4f5f8-disposition-under-the-standing-override--document-as-blocked-do-not-halt)) |
### Assumptions
| ID | Assumption | What Changes If Wrong | Verifier | Status |
|----|------------|-----------------------|----------|--------|
| A1 | llama-swap native `--jinja` parsing stays ON (structured `tool_calls`), so the text-scrape fallback remains dormant defense-in-depth | If off, the kept guard becomes hot — fine; if a future config silently changes parsing, F2 Option B reopens | live probe of `:8401` (not in-repo) | Unverified-in-repo (probe-only); guard kept regardless ([D-3](artifacts/implementation-decision-log.md#d-3-f2-prune-scope--option-a-prune-now-minimal-keep-the-load-bearing-guard)) |
| A2 | All four external backends honor `ctx.signal` as cited (`child.kill` / `session/cancel` / `session.abort` / interrupt) | If one does not, that backend's Stop stays a no-op even after wiring | F1 DB-integration test per backend path | Evidenced by file:line (scope-brief F1); test confirms |
| A3 | 90s is the right stall deadline for the local llama-swap workload shape | Too low → false aborts on slow models; too high → sluggish recovery | Production observation | Committed value, tunable ([D-5](artifacts/implementation-decision-log.md#d-5-f6-stall-timeout-via-abortsignalany-no-retry)) |
| A4 | F1's `session/stop` leg "never fires for external from UI" (CoderPane calls `cancelTask`) | If a code path does hit session-stop for an external task, that leg's finalization is best-effort only | on-call review of CoderPane wiring | Evidenced (synthesis-input F1 resolved OQs) |
### Dependencies
| ID | Dependency | Owner | Status |
|----|------------|-------|--------|
| Dep1 | F5 requires `@opencode-ai/sdk` to surface a compaction event (or an SDK bump) | (capability check) | BLOCKING F5 — unresolved (OQ-F5a) |
| Dep2 | F4 requires confirmed unattended hook-firing for claude/qwen/goose + the goose hook format | (plan-a-feature) | BLOCKING F4 — unresolved (OQ-F4a, OQ-F4b) |
| Dep3 | F8 requires a chosen diff source + a line-selection approach | (plan-a-feature) | BLOCKING F8 — unresolved (OQ-F8a/b/c) |
## Testing Strategy
Sourced from test-engineer (T1-T3, T6, T8-T10) following the established pure-helper-then-wire precedent
(`turn-guard.ts`, `lifecycle-decisions.ts`, `mistake-tracker.ts`). Coder tests are `globals:false` (import
`describe`/`it`/`expect`); include glob `src/**/__tests__/**/*.test.ts`; DB-integration tests are opt-in via
`DATABASE_URL` + `describe.runIf`.
- **Observable behaviors to test:**
- F1: a cancelled external task lands `messages.status='cancelled'` (not `streaming`, not `complete`); the registry register/cancel/delete/has behaves idempotently ([D-1](artifacts/implementation-decision-log.md#d-1-f1-cancel-registry-shape-and-finalization-fix-scope), [D-7](artifacts/implementation-decision-log.md#d-7-f1-terminal-state-for-user-stop--cancelled-not-failed)).
- F2: a text-delta carrying a complete `<invoke>` block lands in `result.toolCalls` with the markup absent from `result.content` — green through the prune, red if the guard is removed ([D-4](artifacts/implementation-decision-log.md#d-4-f2-fallback-gate-test--pin-the-untested-guard-before-pruning)).
- F6: a fake hanging stream is aborted after the 90s deadline under `vi.useFakeTimers()`; the existing post-loop `signal?.aborted` check still passes (regression pin) ([D-5](artifacts/implementation-decision-log.md#d-5-f6-stall-timeout-via-abortsignalany-no-retry)).
- F7: results exclude `role='system'` rows (sentinel strip) and respect the `limit` cap.
- **Test doubles posture:** extract pure helpers (`CancelRegistry`, `classifyStreamError`) with no DB/child/I/O for unit coverage; stub `streamText` for the F2 gate and F6 stall tests; one DB-integration test for the F1 route. Warm-worktree-preserved is held as a code comment, not a spy (test-engineer T1-T3).
- **Edge cases requiring coverage:** double-Stop / cancel-after-exit (idempotent `ac.abort()`); abort vs thrown error in the catch (cancelled vs failed mapping); F6 stall vs healthy-slow stream.
- **Test levels:** unit (registry, classifier, gate) → integration (F1 route → DB row) → manual (F7 MCP call, F9 route-still-200 smoke).
## Security Posture
Thin. The single TIER 1 surface is BooCoder's new read-only `view_session_history` MCP tool (F7), which
reads only `messages_with_parts` `WHERE role != 'system'` — no write verbs, no PII beyond what the operator
already authored, single-user, Authelia at the reverse proxy, no app-layer auth by design.
The one genuinely new auth-relevant surface — F4's inbound unauthenticated localhost POST route for hook
callbacks — belongs to a **BLOCKED** item and is not built in this plan. When F4 is planned, it must be
recorded explicitly as a new unauthenticated localhost route (acceptable under the single-user / Authelia
posture, but called out, not silent).
## Operational Readiness
- **Deploy by surface:** apps/coder items (F1, F7, F9) → `sudo systemctl restart boocoder`; apps/server items (F2+F3, F6) → `docker compose up --build -d boocode`. Stage commits explicitly by path; never `git add -A` (Sam has uncommitted web WIP) ([D-9](artifacts/implementation-decision-log.md#trivial-decisions)).
- **Build-step change (F9):** removing `apps/coder/web` deletes a coder build step and a workspace package; the coder Dockerfile copy and `pnpm-workspace.yaml` entry must be removed in the same batch or the build references a missing package ([D-11](artifacts/implementation-decision-log.md#d-11-f9-retire-9502-spa--delete-the-serve-block-keep-all-apiws-routes)).
- **Health check after F7:** `/api/health` reports a tool count; expect it to increment by one.
- **Tagging:** sequential patch tags, one per unit; no v2.8.0 ([D-9](artifacts/implementation-decision-log.md#trivial-decisions)).
- **Rollback:** each unit is an independent patch tag, so rollback is per-item (revert the tag, redeploy by surface). No schema migrations means no expand/contract sequencing to unwind.
## On-Call Resilience Posture
Application-source resilience, sourced from on-call-engineer. Infrastructure concerns live in Operational
Readiness above.
- **Timeouts and deadlines:** F6 adds a 90s per-chunk stall deadline on the BooChat `fullStream` loop, the first server-side guard against a hung llama-swap stream (today only the frontend 60s `discard_stale` watchdog exists) ([D-5](artifacts/implementation-decision-log.md#d-5-f6-stall-timeout-via-abortsignalany-no-retry)).
- **Retry strategy:** none, deliberately. A retry after a partial stream re-emits already-streamed deltas (non-idempotent); the user re-sending is the correct recovery at single-local-instance scale (see Deferred (YAGNI)) ([D-5](artifacts/implementation-decision-log.md#d-5-f6-stall-timeout-via-abortsignalany-no-retry)).
- **Idempotency:** F1's `cancelExternalTask` is idempotent (`ac.abort()` no-ops when already aborted → double-Stop and cancel-after-exit are safe); message finalization is idempotent via `WHERE status='streaming'` ([D-1](artifacts/implementation-decision-log.md#d-1-f1-cancel-registry-shape-and-finalization-fix-scope)).
- **Kill switches:** F1 is itself the kill switch for a runaway external agent — making Stop actually stop is the operator's manual control.
- **Graceful degradation:** F1's `session/stop` leg is best-effort (does not fire for external tasks from the UI); F6 falls back to finalizing as `cancelled` on stall rather than hanging indefinitely.
- **Observability of failure paths:** F1 distinguishes `cancelled` (user/stall) from `failed` (thrown error) so the human-inbox surface stays honest ([D-7](artifacts/implementation-decision-log.md#d-7-f1-terminal-state-for-user-stop--cancelled-not-failed)); F3 routes parser-rejection logging through pino/`LOG_LEVEL` ([D-2](artifacts/implementation-decision-log.md#trivial-decisions)).
- **Data integrity:** no monetary/rate columns touched; F1/F6 only write existing terminal-state enum values into existing columns.
- **Migration safety:** no schema changes in any TIER 1 item — nothing to expand/contract.
## Definition of Done
- [ ] Hitting Stop on an external task kills the child and the assistant message finalizes `cancelled` (never left `streaming`, never falsely `complete`) ([D-1](artifacts/implementation-decision-log.md#d-1-f1-cancel-registry-shape-and-finalization-fix-scope), [D-7](artifacts/implementation-decision-log.md#d-7-f1-terminal-state-for-user-stop--cancelled-not-failed)).
- [ ] Stop button is disabled while the cancel POST is in flight; a muted "Stopped" label renders on cancel ([D-8](artifacts/implementation-decision-log.md#d-8-f1-status-surfacing--extend-the-existing-frame-no-new-frame-type), [D-10](artifacts/implementation-decision-log.md#trivial-decisions)).
- [ ] Parser public surface is 4 exports; the `<invoke>`-as-text gate test is green and fails if the guard is removed ([D-3](artifacts/implementation-decision-log.md#d-3-f2-prune-scope--option-a-prune-now-minimal-keep-the-load-bearing-guard), [D-4](artifacts/implementation-decision-log.md#d-4-f2-fallback-gate-test--pin-the-untested-guard-before-pruning)).
- [ ] Parser rejection logging appears via pino and respects `LOG_LEVEL` ([D-2](artifacts/implementation-decision-log.md#trivial-decisions)).
- [ ] A fake hanging BooChat stream is finalized `cancelled` after 90s under fake timers; the post-loop abort regression pin passes ([D-5](artifacts/implementation-decision-log.md#d-5-f6-stall-timeout-via-abortsignalany-no-retry)).
- [ ] `view_session_history` returns a session transcript excluding `role='system'` rows, capped at `limit` (default 50, max 200); `/api/health` tool count +1 ([D-6](artifacts/implementation-decision-log.md#trivial-decisions)).
- [ ] :9502 serve block + build step + workspace package removed; all `/api/coder/*` + `/api/health` routes still 200 ([D-11](artifacts/implementation-decision-log.md#d-11-f9-retire-9502-spa--delete-the-serve-block-keep-all-apiws-routes)).
- [ ] Each shipped item has its own sequential patch tag + CHANGELOG entry; deployed by surface ([D-9](artifacts/implementation-decision-log.md#trivial-decisions)).
## Specialist Handoffs for Implementation
- **`test-engineer`** — dispatch alongside each READY unit; needs the touch-point file:lines (already in discovery notes) to author T1-T3 (F1), T6 (F2), T8-T10 (F6) before/with the wiring.
- **`software-architect`** — dispatch if F1's `cancelAndFinalize` helper shape needs settling across the four backend paths during implementation; needs the four catch-block + success-path line ranges.
- **`plan-a-feature` (F4)** — dispatch when Sam wants F4; input: the settled dedup rule + module shapes from synthesis-input, and the two blocking OQs (unattended hook-firing, goose format) to resolve first.
- **`plan-a-feature` (F8)** — dispatch when Sam wants F8; input: `CoderPane.tsx:478-619` DiffPanel location, the diff-source ambiguity, and the cross-surface routing gap.
- **Capability check (F5)** — dispatch before any F5 build; input: the pinned `@opencode-ai/sdk` arm list and OQ-F5a (does it emit a compaction event, and what is its exact name?).
## Deferred (YAGNI)
Items considered during planning and deferred under the YAGNI rule. Each carries a concrete reopen trigger.
### F6 retry / backoff classifier
- **Why deferred:** Evidence test — a retry after a partial stream re-emits already-streamed deltas (`state.accumulated` + live `delta` frames are non-idempotent), which is strictly worse than the current behavior; no second instance exists to fail over to. Single-local-instance scale makes operator re-send the correct recovery.
- **Reopen when:** llama-swap gains restart-in-place-with-clear-partial semantics, OR a second llama-swap instance is added for failover.
- **Source:** R1, on-call-engineer.
### F2 Option B — flag-gated full retirement of `extractToolCallBlocks` / `stripToolMarkup`
- **Why deferred:** Evidence test — no evidence qwen3.6 stopped emitting `<invoke>`-as-text on live, and the sidecar `--jinja` state is unconfirmed in-repo; deleting the only plain-text-tool-call guard on that basis is unsafe (named anti-pattern: removing a load-bearing guard without a measured signal).
- **Reopen when:** a documented multi-session live probe shows zero text-delta tool calls from qwen3.6.
- **Source:** R1, software-architect / test-engineer.
### F4 `NotifyHookInjection` interface
- **Why deferred:** Simpler-version test — three agents with an identical read-merge-write settings.json flow do not need an interface; one concrete function switching on agent name satisfies the same need (named anti-pattern: single-implementation interface before three divergent uses exist).
- **Reopen when:** a fourth agent with a genuinely different injection contract appears (and only after F4's premise is verified).
- **Source:** R1, software-architect.
### F5 handling of `compaction.started` / `compaction.delta` + `step.failed` + `tool.progress`
- **Why deferred:** Evidence test — only `compaction.ended` is user-actionable (it is what closes the silent context gap); the other arms add UI noise with no user-described need. Compounded by F5 being capability-blocked anyway.
- **Reopen when:** the SDK compaction event is confirmed AND a user-described need for in-progress compaction feedback emerges.
- **Source:** R1, behavioral-analyst.
### F7 `SessionHistoryReader` interface / pagination beyond `limit`
- **Why deferred:** Simpler-version test — one read tool with an inline query + a `limit` cap satisfies the read need; an interface and cursor pagination are unused machinery (named anti-pattern: abstraction before a second concrete use).
- **Reopen when:** a second history-read consumer needs a different read path, or transcripts routinely exceed the 200-row cap in a way that breaks the use case.
- **Source:** R1, software-architect.
## Open Items
- **OI-1 (F4 premise):** Do claude / qwen / goose fire native lifecycle hooks in unattended mode, and what is goose's hook file/format?
- **Resolves when:** verified in a `plan-a-feature` spec for F4 (OQ-F4a, OQ-F4b).
- **Blocks implementation:** Yes for F4 — No for the READY cluster (F4 is independent).
- **OI-2 (F5 capability):** Does the pinned `@opencode-ai/sdk` surface a compaction event, and what is its exact name?
- **Resolves when:** a capability check confirms the event (or an SDK bump is scoped) — OQ-F5a.
- **Blocks implementation:** Yes for F5 — No for the READY cluster.
- **OI-3 (F5 UI treatment):** persistent sentinel row vs ephemeral inline divider for "context compacted."
- **Resolves when:** OI-2 resolves and the team picks the UI shape (OQ-F5b); currently disputed (behavioral vs UX).
- **Blocks implementation:** Yes for F5 — No for the READY cluster.
- **OI-4 (F8 spec):** diff source (`pending_changes.diff` vs worktree git diff), selection serialization, and new-viewer routing.
- **Resolves when:** a `plan-a-feature` spec for F8 settles OQ-F8a/b/c.
- **Blocks implementation:** Yes for F8 — No for the READY cluster.
- **OI-5 (F1 session-stop leg):** whether to wire the best-effort `session/stop` lookup or defer it.
- **Resolves when:** decided at F1 implementation; it does not fire for external tasks from the UI today, so it is low-value.
- **Blocks implementation:** No — F1 ships with or without this leg.
## Summary
- **Outcome delivered:** Stop actually stops external agents and finalizes their messages correctly; the parser is pruned to its load-bearing surface with the guard test-pinned; BooChat gains a 90s server-side stall guard; a session-history MCP tool and the :9502 retirement land; F4/F5/F8 have a documented route to their own spec/capability work.
- **Team size:** 7 specialists (incl. project-manager) — see [artifacts/implementation-iteration-history.md](artifacts/implementation-iteration-history.md)
- **Rounds of facilitation:** 1 — see [artifacts/implementation-iteration-history.md](artifacts/implementation-iteration-history.md)
- **Decisions committed:** 12 — see [artifacts/implementation-decision-log.md](artifacts/implementation-decision-log.md)
- **Decisions settled by evidence:** 11 — see [artifacts/implementation-decision-log.md](artifacts/implementation-decision-log.md)
- **Decisions settled by junior-developer reframing:** 0 (junior confirmed several, but each rests on specialist evidence) — see [artifacts/implementation-decision-log.md](artifacts/implementation-decision-log.md)
- **Decisions settled by user input:** 1 (D-12, the standing-override disposition) — see [artifacts/implementation-decision-log.md](artifacts/implementation-decision-log.md)
- **Rejected alternatives recorded:** 13 — see [artifacts/implementation-decision-log.md](artifacts/implementation-decision-log.md)
- **Open items remaining:** 5 (4 blocking only their own BLOCKED item; OI-5 non-blocking)
- **Recommendation:** Ship the READY cluster as planned — build F1 first (apps/coder, highest-value correctness fix, must include the OCE-001/002 finalization fixes), then F2+F3 (apps/server, one batch), F6 (apps/server), F7 (apps/coder MCP), F9 (apps/coder cleanup), each as its own sequential patch tag. Route F4 and F8 to `plan-a-feature` and F5 to an `@opencode-ai/sdk` capability check before any build.