docs: archive shipped openspec batches; add feature/plan/research notes

Move 13 shipped openspec change docs under openspec/changes/archived/.
Add docs/features/git-diff-panel, docs/plans/post-review-backlog, and
docs/research/cross-app-contract-ssot.md (the research behind the
@boocode/contracts SSOT work). Update BOOCHAT.md, BOOCODER.md, and
boocode_roadmap.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-02 21:20:33 +00:00
parent e5ce01ae72
commit 2a05d2f9fe
27 changed files with 2210 additions and 17 deletions

@@ -0,0 +1,181 @@
# Synthesis input — Round 1 aggregation + dispositions
Deterministic aggregation of the Round-1 specialist review (on-call-engineer, behavioral-analyst,
software-architect, test-engineer, user-experience-designer, junior-developer). This is the consolidated
record the project-manager synthesizes into the three plan files. Evidence (file:line) is preserved inline.
Team size: large (cross-subsystem, user chose "everything"). Round cap 3; converged in 1 round (the
remaining unknowns are spec-level, not resolvable by more specialist rounds).
---
## Per-feature dispositions (the decisions)
### READY TO BUILD
**F1 — external task cancel kills child + finalizes message.** Strong 4-way consensus (on-call B1, behavioral
B1, architect A1, junior).
- Root cause CONFIRMED: `routes/tasks.ts:130-138` calls `inference.cancel` (native-only); dispatcher has no
`Map<taskId,AbortController>`; the four private `ac` (dispatcher.ts:316/655/991/1248) are unreachable;
`cancelExternalTask` does not exist anywhere.
- Design (architect A1): add `taskControllers = new Map<string,AbortController>()` inside `createDispatcher`;
`taskControllers.set(taskId, ac)` at each of the 4 run-functions; delete in the existing `.finally()` at
dispatcher.ts:117; export `cancelExternalTask(taskId): boolean` (idempotent — `ac.abort()` is a no-op when
already aborted, so double-Stop and cancel-after-exit are safe). Pass a narrow `ExternalCancelFn`
(NOT the whole dispatcher) into `registerTaskRoutes`; wire in `index.ts:254`.
- TWO pre-existing bugs F1 makes reachable, MUST be fixed in the same batch (on-call OCE-001/OCE-002,
behavioral B2/B3): (1) the four catch blocks update only `tasks` state, never the `messages` row → an
aborted/thrown turn leaves the assistant message `status='streaming'` (BooChat's 5-min sweep can't recover
it — different process); (2) the warm-backend success path writes `messages.status='complete'`
unconditionally before checking abort (dispatcher.ts ~853/1122/1377) → a cancelled turn is recorded
`complete`. Fix: after `await backend.prompt(...)`, `if (ac.signal.aborted)` → write `status='cancelled'`,
publish the terminal `message_complete` frame, emit idle, return; and in each catch finalize the message
with `WHERE status='streaming'` (idempotent) distinguishing AbortError→cancelled vs error→failed.
- UX (UX agent): disable the Stop button while the cancel POST is in flight (mobile double-tap); extend the
coder `message_complete` frame with an optional `status` field (Option A — minimal, no new frame type) and
map it in the reducer (`CoderPane.tsx:299`, `MessageStatus` already includes `'cancelled'`); render a muted
"Stopped" label (not red, not a toast).
- Tests (test-engineer T1-T3): extract a pure `CancelRegistry` (register/cancel/delete/has) with 4 unit cases
and no DB/child; add one DB-integration test for the route asserting the row lands `'cancelled'`; the
warm-worktree-preserved invariant is held as a code comment, not a spy.
- Resolved OQs: terminal state = `cancelled` (not `failed`) for user Stop; registry keyed by `taskId`
(the route receives taskId); `session/stop` route: CoderPane already calls `cancelTask` for external tasks,
so the session-stop path "never fires for external from UI" (on-call). Either wire it best-effort via a
`SELECT id FROM tasks WHERE session_id=$ AND state='running'` lookup or defer that leg (low value). Use a
shared `cancelAndFinalize` helper across the 4 paths (TDD precedent).
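
A minimal sketch of the pure `CancelRegistry` named in the Tests bullet, assuming it wraps a
`Map<string, AbortController>` as in the architect A1 design. The method signatures and the meaning of the
boolean return values are assumptions; the review only names the four operations.

```typescript
// Hypothetical sketch: a pure registry with no DB and no child process.
class CancelRegistry {
  private readonly controllers = new Map<string, AbortController>();

  // Called once per run-function, right after its AbortController is created.
  register(taskId: string, ac: AbortController): void {
    this.controllers.set(taskId, ac);
  }

  // Returns true when a controller was registered for taskId (assumed meaning).
  // abort() on an already-aborted controller is a no-op, so double-Stop is safe.
  cancel(taskId: string): boolean {
    const ac = this.controllers.get(taskId);
    if (!ac) return false; // cancel-after-exit: nothing left to abort
    ac.abort();
    return true;
  }

  // Called from the run-function's finally path.
  delete(taskId: string): boolean {
    return this.controllers.delete(taskId);
  }

  has(taskId: string): boolean {
    return this.controllers.has(taskId);
  }
}
```

Under these assumptions `cancelExternalTask(taskId)` reduces to `registry.cancel(taskId)`, and the route only
needs that one function rather than the whole dispatcher.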
**F2 — tool-call-parser prune (option a: prune-now-minimal).** DECISION (architect A2, confirms junior
OQ-F2c): do NOT do the flag-gated full retirement (option b). KEEP `extractToolCallBlocks` + `stripToolMarkup`
+ their types (`ToolCallExtraction`, `ParsedCall`) — load-bearing `<invoke>`-as-text guard (the only guard for
that case; `experimental_repairToolCall` doesn't cover it; sidecar `--jinja` unconfirmed so keeping the guard
is correct). REMOVE the `export` keyword (not the implementations) from the 8 zero-external-caller symbols:
`isPlaceholderArgValue`, `parseXmlToolCall`, `parseInvokeToolCall`, `partialXmlOpenerStart`, and the 4 consts
`XML_TOOL_OPEN/CLOSE`, `INVOKE_TOOL_OPEN/CLOSE`. Zero runtime effect; public surface 11→4 exports.
- Test gap (test-engineer T6): the `<invoke>`-text fallback in `stream-phase.ts:263-284` is currently NOT
exercised by any test → add a gate test (stub `streamText` to emit a text-delta containing a complete
`<invoke>` block; assert it lands in `result.toolCalls` and the markup is NOT in `result.content`). Must
stay green through the prune and fail if `extractToolCallBlocks` is ever removed from the text-delta path.
**F3 — xml-parser structured logging.** Trivial. `tool-call-parser.ts:65` `console.debug` → pass an optional
`log?: { debug }` param to `extractToolCallBlocks` from its one call site (`stream-phase.ts` executeStreamPhase)
and use it. No interface (architect: one site, one impl). SEQUENCING: same file as F2; F2 keeps
`extractToolCallBlocks` (decided), so F3 is safe; do F2+F3 in one batch. Confirm `executeStreamPhase`
signature/test-stubs tolerate the param (junior).
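
The optional-logger shape can be sketched as below. `countInvokeOpeners` is a hypothetical stand-in, not the
real `extractToolCallBlocks`, and the `debug(msg: string)` signature is an assumption about the logger type.

```typescript
// Assumed minimal logger shape; the real logger may accept other arguments.
type DebugLogger = { debug: (msg: string) => void };

// Hypothetical stand-in for a parser entry point that used to call console.debug.
function countInvokeOpeners(text: string, log?: DebugLogger): number {
  const matches = text.match(/<invoke\b/g) ?? [];
  // Optional chaining keeps existing callers and test stubs working unchanged.
  log?.debug(`found ${matches.length} <invoke> opener(s)`);
  return matches.length;
}
```

Callers that pass nothing get silent behavior, so the one call site can opt in without touching tests.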
**F6 — BooChat stall-timeout ONLY (retry deferred).** on-call: wrap the `stream-phase.ts:261` fullStream loop
with a per-chunk stall deadline: create a local `stallAc = new AbortController()` and pass `effectiveSignal =
AbortSignal.any([signal, stallAc.signal])` to `streamText`; on each chunk, re-arm a `setTimeout`
(`STALL_TIMEOUT_MS = 90_000`) whose callback aborts `stallAc`; clear it in the existing `finally`; at the
post-loop check (stream-phase.ts:337) test `signal?.aborted || stallAc.signal.aborted` and throw `AbortError`
(→ `handleAbortOrError` writes `cancelled`). Tests (test-engineer T8-T10): pure `classifyStreamError(err)`
helper (5 cases, no I/O) + a `vi.useFakeTimers()` stall test on a fake hanging stream + a regression pin on
the existing `signal?.aborted` post-loop check.
- YAGNI DEFER (on-call, strong): NO retry at `executeStreamPhase`/`streamCompletion`. A retry after partial
stream re-emits already-streamed deltas (`state.accumulated` + live `delta` frames are non-idempotent) —
worse than current. Reopen trigger: llama-swap gains restart-in-place-with-clear-partial, or a second
instance for failover. The user re-sending is the correct recovery at single-instance scale.
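
A minimal sketch of the per-chunk stall deadline, written as a generic wrapper rather than the real
`stream-phase.ts` loop. `withStallTimeout` and `makeStream` are hypothetical names, and the sketch assumes the
underlying stream ends (or throws) once its signal is aborted.

```typescript
const STALL_TIMEOUT_MS = 90_000;

// Hypothetical wrapper: re-arms a timer on every chunk and aborts if none arrives in time.
async function* withStallTimeout<T>(
  makeStream: (signal: AbortSignal) => AsyncIterable<T>,
  signal?: AbortSignal,
  stallMs: number = STALL_TIMEOUT_MS,
): AsyncGenerator<T> {
  const stallAc = new AbortController();
  // Combine the caller's signal (if any) with the local stall signal.
  const effectiveSignal = signal
    ? AbortSignal.any([signal, stallAc.signal])
    : stallAc.signal;

  let timer: ReturnType<typeof setTimeout> | undefined;
  const bump = (): void => {
    clearTimeout(timer);
    timer = setTimeout(() => stallAc.abort(), stallMs);
  };

  try {
    bump(); // also covers a stream that never produces a first chunk
    for await (const chunk of makeStream(effectiveSignal)) {
      bump();
      yield chunk;
    }
  } finally {
    clearTimeout(timer);
  }

  // Post-loop check: a stream that ended quietly because of an abort is not a success.
  if (signal?.aborted || stallAc.signal.aborted) {
    const err = new Error("stream aborted or stalled");
    err.name = "AbortError";
    throw err;
  }
}
```

Chunks already yielded before the stall are not replayed, which matches the no-retry decision: the wrapper only
converts a hang into an abort.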
**F7 — view_session_history MCP tool.** architect A4: add tool 7 inline in `mcp-server.ts` (follows the
existing 6-tool inline pattern, `textResult` + direct `sql`). Reads `messages_with_parts`, `WHERE role !=
'system'` (strips sentinels), params `session_id` + optional `chat_id` + `limit` (default 50, max 200),
`ORDER BY created_at ASC`. No interface, no pagination beyond limit. Returns `{role,content,...}[]`.
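
The parameter handling can be sketched as below. `clampHistoryLimit` and `buildHistoryQuery` are hypothetical
helpers; the real tool is described as using direct `sql` inline, so the text/values pair here is only for
illustration. The lower bound of 1 and the `role, content` column list are assumptions.

```typescript
const DEFAULT_LIMIT = 50;
const MAX_LIMIT = 200;

// Default 50, max 200 per the review; the floor of 1 is an assumption.
function clampHistoryLimit(limit?: number): number {
  if (limit === undefined || !Number.isFinite(limit)) return DEFAULT_LIMIT;
  return Math.min(MAX_LIMIT, Math.max(1, Math.floor(limit)));
}

interface HistoryParams {
  session_id: string;
  chat_id?: string;
  limit?: number;
}

// Builds a parameterized query; positional placeholders follow the order of `values`.
function buildHistoryQuery(p: HistoryParams): { text: string; values: (string | number)[] } {
  const values: (string | number)[] = [p.session_id];
  const where = ["session_id = $1", "role != 'system'"]; // strips sentinel rows
  if (p.chat_id !== undefined) {
    values.push(p.chat_id);
    where.push(`chat_id = $${values.length}`);
  }
  values.push(clampHistoryLimit(p.limit));
  const text =
    `SELECT role, content FROM messages_with_parts ` +
    `WHERE ${where.join(" AND ")} ` +
    `ORDER BY created_at ASC LIMIT $${values.length}`;
  return { text, values };
}
```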
**F9 — retire apps/coder/web :9502 SPA.** architect A5: the `if (existsSync(webRoot))` block in `index.ts`
(~269-289) already no-ops when the dist is absent. Delete that block, keep the inline 404 handler
(`{error:'not found'}`); remove `apps/coder/web` from `pnpm-workspace.yaml`, the coder build step, and the
Dockerfile copy; remove the now-unused `fastifyStatic` import (verify it's only used there). KEEP all
`/api/coder/*` REST + WS + `/api/health` + `--mcp` routes (CoderPane depends on them). OQ-F9a RESOLVED:
nothing probes `GET /` on :9502 (health is `/api/health`; compose healthcheck is the boocode container, not
the host-systemd coder) → safe to 404 or add a 2-line `GET /` redirect-to-BooChat (no fastifyStatic).
### BLOCKED — need a spec or a capability check before building (gate-trip items)
**F4 — notify-hook config injection.** SPEC-LEVEL gaps (junior OQ-F4a-e, behavioral B4, UX). The core premise
is UNVERIFIED: do claude / qwen / goose actually fire their native lifecycle hooks in unattended mode
(`claude -p` / SDK, `qwen --acp` / `--output-format stream-json`, goose)? goose's hook file/format is unknown
(not in repo). Idempotent per-agent settings.json merge strategy unspecified. `boocoder.service` run-user /
`homedir()` resolution unconfirmed. The inbound POST is a new unauthenticated localhost route (acceptable
single-user, note it). Double-publish dedup with the v2.7.6 turn-boundary publish: behavioral B4 +
architect A3 agree on the rule — inbound route calls `normalizeAgentEvent` (returns bucket
`working|blocked|done`), confirms `tasks.state='running'` before publishing `blocked`, and SUPPRESSES `done`
(the dispatcher already emits `idle`); `done`→drop, never re-publish. UI side already exists (AgentStatusDot,
all 4 buckets — UX: F4 is server-side only). RECOMMENDATION: own `plan-a-feature` — the dedup rule + module
shapes are settled, but the hook-firing-in-unattended-mode premise and goose hook mechanism must be verified
first or the whole feature is built on sand.
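
The settled dedup rule can be sketched as a pure decision function. `shouldPublishHookEvent` and its signature
are hypothetical; passing `working` through unchanged is an assumption, since the review only specifies the
handling of `blocked` and `done`.

```typescript
// Buckets returned by normalizeAgentEvent, per the review.
type HookBucket = "working" | "blocked" | "done";

// Hypothetical helper: decides whether the inbound hook route should publish a status.
function shouldPublishHookEvent(bucket: HookBucket, taskState: string | undefined): boolean {
  switch (bucket) {
    case "done":
      // The dispatcher already emits idle at the turn boundary: drop, never re-publish.
      return false;
    case "blocked":
      // Only meaningful while the task is actually running.
      return taskState === "running";
    case "working":
      // Assumption: not covered by the dedup rule, so publish as-is.
      return true;
  }
}
```

Keeping this free of I/O means the rule can be unit-tested without the route or the database.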
**F5 — opencode compaction surfacing.** BLOCKED on a capability check. The installed `@opencode-ai/sdk`
exposes NO compaction event arm (current arms confirmed: `session.next.{text,reasoning,tool,step}.*`,
`message.part.*`, `session.idle/error` at opencode-server.ts:379-491). The review's "consume
compaction.{started,delta,ended}" assumed events from opencode's CORE `event.ts`, which the pinned SDK may not
surface. MUST confirm the SDK emits a compaction signal + its exact event name (or an SDK bump is needed)
before building. DISPUTED UI treatment (behavioral B5 = persistent sentinel row `metadata.kind='compaction'`,
survives refresh; UX = ephemeral inline divider via a new `agent_compacted` frame, no DB row) — settle once
the event exists. Only `compaction.ended` is in scope (YAGNI: started/delta/step.failed/tool.progress out).
Cross-app WS-frame parity is certain if a frame is added.
**F8 — diff-line → agent re-prompt.** SPEC-LEVEL (UX + junior, firm). The "DiffPanel" is inline in
`CoderPane.tsx:478-619`, rendering `pending_changes` rows as a static `<pre>` (CoderPane.tsx:607-610) — NO
line-selection infrastructure exists. Diff source ambiguous (`pending_changes.diff` = BooCode write-tools only
vs the external-agent worktree git diff). "Send to new agent" needs coordinated workspace-pane + chat creation
+ pre-population across 3 surfaces with no existing contract. Selection diverges by modality (desktop line-
select vs mobile long-press → bottom sheet). RECOMMENDATION: own `plan-a-feature` (the scope-brief already
hedged this; treat as firm). MVP-if-pushed: "comment to current agent" only, block-level selection,
pre-populate `ChatInput` — still wants a spec.
---
## Claim ledger (consolidated, deduped)
| # | Claim | State | Spec-maturity | Supporting |
|---|-------|-------|---------------|-----------|
| C1 | F1 cancel route never aborts external child; no registry/export | Evidenced | plan-level | on-call,behavioral,architect,junior |
| C2 | F1 catch blocks leave message `streaming`; success path writes `complete` on abort — fix in same batch | Evidenced | plan-level | on-call,behavioral |
| C3 | F2 = prune-now-minimal: unexport 8 zero-caller symbols, keep extractToolCallBlocks+stripToolMarkup | Evidenced | plan-level | architect (test-engineer guard) |
| C4 | F2 `<invoke>`-text fallback is untested → add gate test before prune | Evidenced | plan-level | test-engineer |
| C5 | F3 optional logger param, do with F2 (same file) | Evidenced | plan-level | architect,junior |
| C6 | F6 stall-timeout via AbortSignal.any, 90s; NO retry (non-idempotent deltas) | Evidenced | plan-level | on-call,behavioral,test-engineer |
| C7 | F7 inline MCP tool, messages_with_parts, role!='system', limit 50/200 | Evidenced | plan-level | architect,UX |
| C8 | F9 delete SPA block, keep routes; GET / unprobed → safe | Evidenced | plan-level | architect (+ verified) |
| C9 | F4 hook-firing in unattended mode UNVERIFIED; goose hook mechanism unknown | Anecdotal (premise) | spec-level | junior,behavioral,UX |
| C10 | F4 dedup rule: confirm running before `blocked`; suppress hook `done` | Evidenced | plan-level | behavioral,architect |
| C11 | F5 pinned @opencode-ai/sdk exposes no compaction arm → blocked on capability check | Evidenced | spec-level | (verified) + junior |
| C12 | F5 UI treatment sentinel-row vs ephemeral-frame | Disputed | spec-level | behavioral vs UX |
| C13 | F8 no line-selection infra; diff source ambiguous; needs own spec | Evidenced | spec-level | UX,junior |
## Open Questions — resolutions
- OQ (F1 terminal state) → RESOLVED: `cancelled`. OQ (F1 registry key) → RESOLVED: `taskId`. OQ (F1 shared
finalize helper) → RESOLVED: yes, pure helper. OQ (F1 warm re-throw on abort) → RESOLVED: short-circuit on
`ac.signal.aborted`.
- OQ-F2a (sidecar jinja) → RESOLVED moot: option a keeps the guard. OQ-F2c (a vs b) → RESOLVED: option a.
- OQ-F6a/b/c → RESOLVED: AbortSignal.any (not Promise.race); no retry; 90s.
- OQ-F7a (session vs chat id) → RESOLVED: both (chat_id optional) + limit.
- OQ-F9a (GET / probe) → RESOLVED: unprobed, safe.
- OQ-F4a (hooks fire unattended?), OQ-F4b (goose hook format) → UNRESOLVED, spec-level → route to F4 spec.
- OQ-F5a (SDK compaction event name/existence) → UNRESOLVED, capability check → blocks F5.
- OQ-F5b (sentinel vs ephemeral UI) → UNRESOLVED → settle in F5 once event confirmed.
- OQ-F8a/b/c (diff source, serialization, new viewer) → UNRESOLVED, spec-level → route to F8 spec.
## Spec-maturity gate
TRIPPED (≥5 spec-level findings — C9, C11, C12, C13, plus OQ-F4b/F8a — across ≥3 specialists: junior,
behavioral, UX). The trip is CONCENTRATED in the three WANT items F4/F5/F8; F1/F2/F3/F6/F7/F9 are all
plan-level and ready. Per skill: gate-trip → recommend the user route F4/F8 to `plan-a-feature` and F5 to a
capability check. USER OVERRIDE STANDING: Sam chose scope "everything we discussed" having pre-acknowledged the
WANT items would be planned more shallowly — so the plan proceeds, documenting F4/F5/F8 as Blocked/own-spec
rather than halting. Decision deferred to Step 9 user presentation.
## YAGNI ledger
- F6 retry logic → DEFER (non-idempotent re-emit of streamed deltas). Reopen: llama-swap restart-in-place or
second instance. Source: on-call R1.
- F2 option b (flag-gated full retirement of extractToolCallBlocks/stripToolMarkup) → DEFER (no evidence
qwen3.6 stopped emitting `<invoke>` text on live; sidecar jinja unconfirmed). Reopen: documented multi-
session live probe shows zero text-delta tool calls. Source: architect/test-engineer R1.
- F4 `NotifyHookInjection` interface → REPLACE with one concrete function switching on agent name (3 agents,
identical read-merge-write). Source: architect R1.
- F5 handling of compaction.started/delta + step.failed + tool.progress → DEFER, only compaction.ended is
user-actionable. Source: behavioral R1.
- F7 SessionHistoryReader interface / pagination → REPLACE with inline query + limit. Source: architect R1.
- Provider tier-2 follow-ups (snapshot frame, enabled column, shared types, MCP list_providers) → already
DEFER/DROP per scope-brief; not re-planned.