docs: archive shipped openspec batches; add feature/plan/research notes

Move 13 shipped openspec change docs under openspec/changes/archived/. Add docs/features/git-diff-panel, docs/plans/post-review-backlog, and docs/research/cross-app-contract-ssot.md (the research behind the @boocode/contracts SSOT work). Update BOOCHAT.md, BOOCODER.md, and boocode_roadmap.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-02 21:20:33 +00:00
parent e5ce01ae72
commit 2a05d2f9fe
27 changed files with 2210 additions and 17 deletions
--- a/docs/plans/post-review-backlog/artifacts/synthesis-input.md
+++ b/docs/plans/post-review-backlog/artifacts/synthesis-input.md
@@ -0,0 +1,181 @@
+# Synthesis input — Round 1 aggregation + dispositions
+
+Deterministic aggregation of the Round-1 specialist review (on-call-engineer, behavioral-analyst,
+software-architect, test-engineer, user-experience-designer, junior-developer). This is the consolidated
+record the project-manager synthesizes into the three plan files. Evidence (file:line) is preserved inline.
+
+Team size: large (cross-subsystem, user chose "everything"). Round cap 3; converged in 1 round (the
+remaining unknowns are spec-level, not resolvable by more specialist rounds).
+
+---
+
+## Per-feature dispositions (the decisions)
+
+### READY TO BUILD
+
+**F1 — external task cancel kills child + finalizes message.** Strong 4-way consensus (on-call B1, behavioral
+B1, architect A1, junior).
+- Root cause CONFIRMED: `routes/tasks.ts:130-138` calls `inference.cancel` (native-only); dispatcher has no
+  `Map<taskId,AbortController>`; the four private `ac` (dispatcher.ts:316/655/991/1248) are unreachable;
+  `cancelExternalTask` does not exist anywhere.
+- Design (architect A1): add `taskControllers = new Map<string,AbortController>()` inside `createDispatcher`;
+  `taskControllers.set(taskId, ac)` at each of the 4 run-functions; delete in the existing `.finally()` at
+  dispatcher.ts:117; export `cancelExternalTask(taskId): boolean` (idempotent — `ac.abort()` is a no-op when
+  already aborted, so double-Stop and cancel-after-exit are safe). Pass a narrow `ExternalCancelFn`
+  (NOT the whole dispatcher) into `registerTaskRoutes`; wire in `index.ts:254`.
+- TWO pre-existing bugs F1 makes reachable, MUST be fixed in the same batch (on-call OCE-001/OCE-002,
+  behavioral B2/B3): (1) the four catch blocks update only `tasks` state, never the `messages` row → an
+  aborted/thrown turn leaves the assistant message `status='streaming'` (BooChat's 5-min sweep can't recover
+  it — different process); (2) the warm-backend success path writes `messages.status='complete'`
+  unconditionally before checking abort (dispatcher.ts ~853/1122/1377) → a cancelled turn is recorded
+  `complete`. Fix: after `await backend.prompt(...)`, `if (ac.signal.aborted)` → write `status='cancelled'`,
+  publish the terminal `message_complete` frame, emit idle, return; and in each catch finalize the message
+  with `WHERE status='streaming'` (idempotent) distinguishing AbortError→cancelled vs error→failed.
+- UX (UX agent): disable the Stop button while the cancel POST is in flight (mobile double-tap); extend the
+  coder `message_complete` frame with an optional `status` field (Option A — minimal, no new frame type) and
+  map it in the reducer (`CoderPane.tsx:299`, `MessageStatus` already includes `'cancelled'`); render a muted
+  "Stopped" label (not red, not a toast).
+- Tests (test-engineer T1-T3): extract a pure `CancelRegistry` (register/cancel/delete/has) — 4 unit cases,
+  no DB/child; one DB-integration test for the route → row lands `'cancelled'`; warm-worktree-preserved held
+  as a code comment, not a spy.
+- Resolved OQs: terminal state = `cancelled` (not `failed`) for user Stop; registry keyed by `taskId`
+  (route receives taskId); `session/stop` route — CoderPane already calls `cancelTask` for external tasks so
+  the session-stop path "never fires for external from UI" (on-call) — wire it best-effort via a
+  `SELECT id FROM tasks WHERE session_id=$ AND state='running'` lookup OR defer that leg (low value); use a
+  shared `cancelAndFinalize` helper across the 4 paths (TDD precedent).
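Editor's sketch (not part of the commit): a minimal version of the pure `CancelRegistry` that T1-T3 call for, plus the AbortError→cancelled vs error→failed split used when finalizing a message. The class shape and the `terminalStatusFor` helper name are assumptions for illustration; only `taskId` keying, the register/cancel/delete/has surface, and the idempotent-abort behaviour come from the notes above.

```typescript
// Hypothetical sketch: the real dispatcher wiring is not shown in this document.
class CancelRegistry {
  private readonly controllers = new Map<string, AbortController>();

  /** Track the controller of a running task, keyed by taskId. */
  register(taskId: string, ac: AbortController): void {
    this.controllers.set(taskId, ac);
  }

  /** Abort a tracked task. Returns false when the task is unknown or already removed. */
  cancel(taskId: string): boolean {
    const ac = this.controllers.get(taskId);
    if (!ac) return false;
    ac.abort(); // no-op when already aborted, so a double Stop is safe
    return true;
  }

  /** Intended to be called from the run function's finally block. */
  delete(taskId: string): void {
    this.controllers.delete(taskId);
  }

  has(taskId: string): boolean {
    return this.controllers.has(taskId);
  }
}

/** Assumed helper: map a caught error to the terminal message status. */
function terminalStatusFor(err: unknown): "cancelled" | "failed" {
  const name =
    typeof err === "object" && err !== null
      ? (err as { name?: unknown }).name
      : undefined;
  return name === "AbortError" ? "cancelled" : "failed";
}
```

Because `cancel` only reports whether a controller was found, cancel-after-exit returns `false` without throwing, which is what lets the route treat a late Stop as harmless.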
+
+**F2 — tool-call-parser prune (option a: prune-now-minimal).** DECISION (architect A2, confirms junior
+OQ-F2c): do NOT do the flag-gated full retirement (option b). KEEP `extractToolCallBlocks`, `stripToolMarkup`,
+and their types (`ToolCallExtraction`, `ParsedCall`): they form the load-bearing `<invoke>`-as-text guard (the only guard for
+that case; `experimental_repairToolCall` doesn't cover it; sidecar `--jinja` unconfirmed so keeping the guard
+is correct). REMOVE the `export` keyword (not the implementations) from the 8 zero-external-caller symbols:
+`isPlaceholderArgValue`, `parseXmlToolCall`, `parseInvokeToolCall`, `partialXmlOpenerStart`, and the 4 consts
+`XML_TOOL_OPEN/CLOSE`, `INVOKE_TOOL_OPEN/CLOSE`. Zero runtime effect; public surface 11→4 exports.
+- Test gap (test-engineer T6): the `<invoke>`-text fallback in `stream-phase.ts:263-284` is currently NOT
+  exercised by any test → add a gate test (stub `streamText` to emit a text-delta containing a complete
+  `<invoke>` block; assert it lands in `result.toolCalls` and the markup is NOT in `result.content`). Must
+  stay green through the prune and fail if `extractToolCallBlocks` is ever removed from the text-delta path.
+
+**F3 — xml-parser structured logging.** Trivial. `tool-call-parser.ts:65` `console.debug` → pass an optional
+`log?: { debug }` param to `extractToolCallBlocks` from its one call site (`stream-phase.ts` executeStreamPhase)
+and use it. No interface (architect: one site, one impl). SEQUENCING: same file as F2; F2 keeps
+`extractToolCallBlocks` (decided), so F3 is safe; do F2+F3 in one batch. Confirm `executeStreamPhase`
+signature/test-stubs tolerate the param (junior).
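Editor's sketch (not part of the commit): the optional-logger pattern F3 describes, shown on a stand-in function. `extractBlocksStandIn`, its regex, and the log message are invented for illustration and do not reflect the real `extractToolCallBlocks` body; only the `log?: { debug }` parameter shape comes from the notes.

```typescript
// Assumed logger shape, taken from the `log?: { debug }` wording above.
type DebugLog = { debug: (msg: string) => void };

// Stand-in extractor: the real parser logic is not shown in this document.
function extractBlocksStandIn(text: string, log?: DebugLog): string[] {
  const blocks: string[] = text.match(/<invoke>[\s\S]*?<\/invoke>/g) ?? [];
  // Optional chaining keeps existing callers and test stubs working unchanged.
  log?.debug(`extracted ${blocks.length} block(s)`);
  return blocks;
}
```

Callers that omit the second argument see no behaviour change, which is the property to confirm for the `executeStreamPhase` test stubs.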
+
+**F6 — BooChat stall-timeout ONLY (retry deferred).** on-call: wrap the `stream-phase.ts:261` fullStream loop
+with a per-chunk stall deadline: a local `stallAc = new AbortController()`, `effectiveSignal =
+AbortSignal.any([signal, stallAc.signal])` passed to `streamText`; bump a `setTimeout(STALL_TIMEOUT_MS=90_000)`
+on each chunk; clear it in the existing `finally`; at the post-loop check (stream-phase.ts:337) test
+`signal?.aborted || stallAc.signal.aborted` and throw `AbortError` (→ `handleAbortOrError` writes
+`cancelled`). Tests (test-engineer T8-T10): pure `classifyStreamError(err)` helper (5 cases, no I/O) + a
+`vi.useFakeTimers()` stall test on a fake hanging stream + a regression pin on the existing `signal?.aborted`
+post-loop check.
+- YAGNI DEFER (on-call, strong): NO retry at `executeStreamPhase`/`streamCompletion`. A retry after partial
+  stream re-emits already-streamed deltas (`state.accumulated` + live `delta` frames are non-idempotent) —
+  worse than current. Reopen trigger: llama-swap gains restart-in-place-with-clear-partial, or a second
+  instance for failover. The user re-sending is the correct recovery at single-instance scale.
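Editor's sketch (not part of the commit): one way the per-chunk stall deadline could look, assuming Node 20.3 or later for `AbortSignal.any`. `createStallWatchdog` and `consumeWithStallTimeout` are hypothetical names, and `openStream` stands in for the `streamText` call, whose real signature is not shown here.

```typescript
const STALL_TIMEOUT_MS = 90_000;

// Hypothetical helper: combines the caller's signal with a local stall signal.
function createStallWatchdog(signal?: AbortSignal, timeoutMs: number = STALL_TIMEOUT_MS) {
  const stallAc = new AbortController();
  const effectiveSignal = signal
    ? AbortSignal.any([signal, stallAc.signal])
    : stallAc.signal;
  let timer: ReturnType<typeof setTimeout> | undefined;

  // Restart the deadline; called once up front and again on every chunk.
  const bump = (): void => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => stallAc.abort(), timeoutMs);
  };
  const clear = (): void => {
    if (timer !== undefined) clearTimeout(timer);
    timer = undefined;
  };
  const stalled = (): boolean => stallAc.signal.aborted;

  bump();
  return { effectiveSignal, bump, clear, stalled };
}

// Assumes the stream ends or throws once the signal it was opened with aborts.
async function consumeWithStallTimeout<T>(
  openStream: (signal: AbortSignal) => AsyncIterable<T>,
  onChunk: (chunk: T) => void,
  signal?: AbortSignal,
  timeoutMs: number = STALL_TIMEOUT_MS,
): Promise<void> {
  const dog = createStallWatchdog(signal, timeoutMs);
  try {
    for await (const chunk of openStream(dog.effectiveSignal)) {
      dog.bump();
      onChunk(chunk);
    }
  } finally {
    dog.clear(); // never leave a live timer behind
  }
  // Post-loop check: a user abort or a stall both surface as AbortError.
  if (signal?.aborted || dog.stalled()) {
    throw Object.assign(new Error("stream aborted or stalled"), { name: "AbortError" });
  }
}
```

There is deliberately no retry in this sketch, matching the YAGNI deferral: a second attempt would re-emit deltas that were already streamed.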
+
+**F7 — view_session_history MCP tool.** architect A4: add tool 7 inline in `mcp-server.ts` (follows the
+existing 6-tool inline pattern, `textResult` + direct `sql`). Reads `messages_with_parts`, `WHERE role !=
+'system'` (strips sentinels), params `session_id` + optional `chat_id` + `limit` (default 50, max 200),
+`ORDER BY created_at ASC`. No interface, no pagination beyond limit. Returns `{role,content,...}[]`.
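Editor's sketch (not part of the commit): the F7 filtering rules applied to in-memory rows, as a way to pin the expected behaviour before writing the SQL. The row shape, the function names, and the lower bound of 1 on `limit` are assumptions; the default of 50, the cap of 200, the `role != 'system'` filter, and ascending `created_at` order come from the notes.

```typescript
const DEFAULT_LIMIT = 50;
const MAX_LIMIT = 200;

// Assumed row shape; the real messages_with_parts columns are not listed here.
interface HistoryRow {
  session_id: string;
  chat_id: string;
  role: string;
  content: string;
  created_at: string; // ISO-8601 timestamp, so string order equals time order
}

/** Default 50, capped at 200. The floor of 1 is an assumption. */
function clampLimit(limit?: number): number {
  if (limit === undefined || !Number.isFinite(limit)) return DEFAULT_LIMIT;
  return Math.min(MAX_LIMIT, Math.max(1, Math.trunc(limit)));
}

function selectHistory(
  rows: HistoryRow[],
  opts: { session_id: string; chat_id?: string; limit?: number },
): HistoryRow[] {
  return rows
    .filter((r) => r.session_id === opts.session_id)
    .filter((r) => opts.chat_id === undefined || r.chat_id === opts.chat_id)
    .filter((r) => r.role !== "system") // strips sentinel rows
    .sort((a, b) => (a.created_at < b.created_at ? -1 : a.created_at > b.created_at ? 1 : 0))
    .slice(0, clampLimit(opts.limit));
}
```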
+
+**F9 — retire apps/coder/web :9502 SPA.** architect A5: the `if (existsSync(webRoot))` block in `index.ts`
+(~269-289) already no-ops when the dist is absent. Delete that block, keep the inline 404 handler
+(`{error:'not found'}`); remove `apps/coder/web` from `pnpm-workspace.yaml`, the coder build step, and the
+Dockerfile copy; remove the now-unused `fastifyStatic` import (verify it's only used there). KEEP all
+`/api/coder/*` REST + WS + `/api/health` + `--mcp` routes (CoderPane depends on them). OQ-F9a RESOLVED:
+nothing probes `GET /` on :9502 (health is `/api/health`; compose healthcheck is the boocode container, not
+the host-systemd coder) → safe to 404 or add a 2-line `GET /` redirect-to-BooChat (no fastifyStatic).
+
+### BLOCKED — need a spec or a capability check before building (gate-trip items)
+
+**F4 — notify-hook config injection.** SPEC-LEVEL gaps (junior OQ-F4a-e, behavioral B4, UX). The core premise
+is UNVERIFIED: do claude / qwen / goose actually fire their native lifecycle hooks in unattended mode
+(`claude -p` / SDK, `qwen --acp` / `--output-format stream-json`, goose)? goose's hook file/format is unknown
+(not in repo). Idempotent per-agent settings.json merge strategy unspecified. `boocoder.service` run-user /
+`homedir()` resolution unconfirmed. The inbound POST is a new unauthenticated localhost route (acceptable
+single-user, note it). Double-publish dedup with the v2.7.6 turn-boundary publish: behavioral B4 +
+architect A3 agree on the rule — inbound route calls `normalizeAgentEvent` (returns bucket
+`working|blocked|done`), confirms `tasks.state='running'` before publishing `blocked`, and SUPPRESSES `done`
+(the dispatcher already emits `idle`); `done`→drop, never re-publish. UI side already exists (AgentStatusDot,
+all 4 buckets — UX: F4 is server-side only). RECOMMENDATION: own `plan-a-feature` — the dedup rule + module
+shapes are settled, but the hook-firing-in-unattended-mode premise and goose hook mechanism must be verified
+first or the whole feature is built on sand.
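Editor's sketch (not part of the commit): the settled F4 dedup rule as a pure decision function. `decideHookPublish` is a hypothetical name, and passing `working` straight through is an assumption, since the notes only specify the `blocked` and `done` cases.

```typescript
type Bucket = "working" | "blocked" | "done";
type Decision = "publish" | "drop";

// taskState is the current tasks.state value for the task the hook refers to.
function decideHookPublish(bucket: Bucket, taskState: string): Decision {
  switch (bucket) {
    case "done":
      // The dispatcher already emits idle at the turn boundary; never re-publish.
      return "drop";
    case "blocked":
      // Only meaningful while the task is still running.
      return taskState === "running" ? "publish" : "drop";
    case "working":
      // Assumption: not covered by the rule above, so it is passed through.
      return "publish";
  }
}
```

Keeping the rule pure means the double-publish case against the v2.7.6 turn-boundary publish can be covered by a table-driven unit test, independent of whether the agents' hooks fire in unattended mode.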
+
+**F5 — opencode compaction surfacing.** BLOCKED on a capability check. The installed `@opencode-ai/sdk`
+exposes NO compaction event arm (current arms confirmed: `session.next.{text,reasoning,tool,step}.*`,
+`message.part.*`, `session.idle/error` at opencode-server.ts:379-491). The review's "consume
+compaction.{started,delta,ended}" assumed events from opencode's CORE `event.ts`, which the pinned SDK may not
+surface. MUST confirm the SDK emits a compaction signal + its exact event name (or an SDK bump is needed)
+before building. DISPUTED UI treatment (behavioral B5 = persistent sentinel row `metadata.kind='compaction'`,
+survives refresh; UX = ephemeral inline divider via a new `agent_compacted` frame, no DB row) — settle once
+the event exists. Only `compaction.ended` is in scope (YAGNI: started/delta/step.failed/tool.progress out).
+Cross-app WS-frame parity is certain if a frame is added.
+
+**F8 — diff-line → agent re-prompt.** SPEC-LEVEL (UX + junior, firm). The "DiffPanel" is inline in
+`CoderPane.tsx:478-619`, rendering `pending_changes` rows as a static `<pre>` (CoderPane.tsx:607-610) — NO
+line-selection infrastructure exists. Diff source ambiguous (`pending_changes.diff` = BooCode write-tools only
+vs the external-agent worktree git diff). "Send to new agent" needs coordinated workspace-pane + chat creation,
+plus pre-population across 3 surfaces with no existing contract. Selection diverges by modality (desktop
+line-select vs mobile long-press → bottom sheet). RECOMMENDATION: own `plan-a-feature` (the scope-brief already
+hedged this; treat as firm). MVP-if-pushed: "comment to current agent" only, block-level selection,
+pre-populate `ChatInput` — still wants a spec.
+
+---
+
+## Claim ledger (consolidated, deduped)
+
+| # | Claim | State | Spec-maturity | Supporting |
+|---|-------|-------|---------------|-----------|
+| C1 | F1 cancel route never aborts external child; no registry/export | Evidenced | plan-level | on-call,behavioral,architect,junior |
+| C2 | F1 catch blocks leave message `streaming`; success path writes `complete` on abort — fix in same batch | Evidenced | plan-level | on-call,behavioral |
+| C3 | F2 = prune-now-minimal: unexport 8 zero-caller symbols, keep extractToolCallBlocks+stripToolMarkup | Evidenced | plan-level | architect (test-engineer guard) |
+| C4 | F2 `<invoke>`-text fallback is untested → add gate test before prune | Evidenced | plan-level | test-engineer |
+| C5 | F3 optional logger param, do with F2 (same file) | Evidenced | plan-level | architect,junior |
+| C6 | F6 stall-timeout via AbortSignal.any, 90s; NO retry (non-idempotent deltas) | Evidenced | plan-level | on-call,behavioral,test-engineer |
+| C7 | F7 inline MCP tool, messages_with_parts, role!='system', limit 50/200 | Evidenced | plan-level | architect,UX |
+| C8 | F9 delete SPA block, keep routes; GET / unprobed → safe | Evidenced | plan-level | architect (+ verified) |
+| C9 | F4 hook-firing in unattended mode UNVERIFIED; goose hook mechanism unknown | Anecdotal (premise) | spec-level | junior,behavioral,UX |
+| C10 | F4 dedup rule: confirm running before `blocked`; suppress hook `done` | Evidenced | plan-level | behavioral,architect |
+| C11 | F5 pinned @opencode-ai/sdk exposes no compaction arm → blocked on capability check | Evidenced | spec-level | (verified) + junior |
+| C12 | F5 UI treatment sentinel-row vs ephemeral-frame | Disputed | spec-level | behavioral vs UX |
+| C13 | F8 no line-selection infra; diff source ambiguous; needs own spec | Evidenced | spec-level | UX,junior |
+
+## Open Questions — resolutions
+
+- OQ (F1 terminal state) → RESOLVED: `cancelled`. OQ (F1 registry key) → RESOLVED: `taskId`. OQ (F1 shared
+  finalize helper) → RESOLVED: yes, pure helper. OQ (F1 warm re-throw on abort) → RESOLVED: short-circuit on
+  `ac.signal.aborted`.
+- OQ-F2a (sidecar jinja) → RESOLVED moot: option a keeps the guard. OQ-F2c (a vs b) → RESOLVED: option a.
+- OQ-F6a/b/c → RESOLVED: AbortSignal.any (not Promise.race); no retry; 90s.
+- OQ-F7a (session vs chat id) → RESOLVED: both (chat_id optional) + limit.
+- OQ-F9a (GET / probe) → RESOLVED: unprobed, safe.
+- OQ-F4a (hooks fire unattended?), OQ-F4b (goose hook format) → UNRESOLVED, spec-level → route to F4 spec.
+- OQ-F5a (SDK compaction event name/existence) → UNRESOLVED, capability check → blocks F5.
+- OQ-F5b (sentinel vs ephemeral UI) → UNRESOLVED → settle in F5 once event confirmed.
+- OQ-F8a/b/c (diff source, serialization, new viewer) → UNRESOLVED, spec-level → route to F8 spec.
+
+## Spec-maturity gate
+
+TRIPPED (≥5 spec-level findings — C9, C11, C12, C13, plus OQ-F4b/F8a — across ≥3 specialists: junior,
+behavioral, UX). The trip is CONCENTRATED in the three WANT items F4/F5/F8; F1/F2/F3/F6/F7/F9 are all
+plan-level and ready. Per skill: gate-trip → recommend the user route F4/F8 to `plan-a-feature` and F5 to a
+capability check. USER OVERRIDE STANDING: Sam chose scope "everything we discussed" having pre-acknowledged the
+WANT items would be planned more shallowly — so the plan proceeds, documenting F4/F5/F8 as Blocked/own-spec
+rather than halting. Decision deferred to Step 9 user presentation.
+
+## YAGNI ledger
+
+- F6 retry logic → DEFER (non-idempotent re-emit of streamed deltas). Reopen: llama-swap restart-in-place or
+  second instance. Source: on-call R1.
+- F2 option b (flag-gated full retirement of extractToolCallBlocks/stripToolMarkup) → DEFER (no evidence
+  qwen3.6 stopped emitting `<invoke>` text on live; sidecar jinja unconfirmed). Reopen: documented
+  multi-session live probe shows zero text-delta tool calls. Source: architect/test-engineer R1.
+- F4 `NotifyHookInjection` interface → REPLACE with one concrete function switching on agent name (3 agents,
+  identical read-merge-write). Source: architect R1.
+- F5 handling of compaction.started/delta + step.failed + tool.progress → DEFER, only compaction.ended is
+  user-actionable. Source: behavioral R1.
+- F7 SessionHistoryReader interface / pagination → REPLACE with inline query + limit. Source: architect R1.
+- Provider tier-2 follow-ups (snapshot frame, enabled column, shared types, MCP list_providers) → already
+  DEFER/DROP per scope-brief; not re-planned.