
indifferentketchup 2a05d2f9fe docs: archive shipped openspec batches; add feature/plan/research notes

Move 13 shipped openspec change docs under openspec/changes/archived/.
Add docs/features/git-diff-panel, docs/plans/post-review-backlog, and
docs/research/cross-app-contract-ssot.md (the research behind the
@boocode/contracts SSOT work). Update BOOCHAT.md, BOOCODER.md, and
boocode_roadmap.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-02 21:20:33 +00:00


Synthesis input — Round 1 aggregation + dispositions

Deterministic aggregation of the Round-1 specialist review (on-call-engineer, behavioral-analyst, software-architect, test-engineer, user-experience-designer, junior-developer). This is the consolidated record the project-manager synthesizes into the three plan files. Evidence (file:line) is preserved inline.

Team size: large (cross-subsystem, user chose "everything"). Round cap 3; converged in 1 round (the remaining unknowns are spec-level, not resolvable by more specialist rounds).

Per-feature dispositions (the decisions)

READY TO BUILD

F1 — external task cancel kills child + finalizes message. Strong 4-way consensus (on-call B1, behavioral B1, architect A1, junior).

Root cause CONFIRMED: routes/tasks.ts:130-138 calls inference.cancel (native-only); dispatcher has no Map<taskId,AbortController>; the four private ac (dispatcher.ts:316/655/991/1248) are unreachable; cancelExternalTask does not exist anywhere.
Design (architect A1): add taskControllers = new Map<string,AbortController>() inside createDispatcher; taskControllers.set(taskId, ac) at each of the 4 run-functions; delete in the existing .finally() at dispatcher.ts:117; export cancelExternalTask(taskId): boolean (idempotent — ac.abort() is a no-op when already aborted, so double-Stop and cancel-after-exit are safe). Pass a narrow ExternalCancelFn (NOT the whole dispatcher) into registerTaskRoutes; wire in index.ts:254.
TWO pre-existing bugs F1 makes reachable, MUST be fixed in the same batch (on-call OCE-001/OCE-002, behavioral B2/B3): (1) the four catch blocks update only tasks state, never the messages row → an aborted/thrown turn leaves the assistant message status='streaming' (BooChat's 5-min sweep can't recover it — different process); (2) the warm-backend success path writes messages.status='complete' unconditionally before checking abort (dispatcher.ts ~853/1122/1377) → a cancelled turn is recorded complete. Fix: after await backend.prompt(...), if (ac.signal.aborted) → write status='cancelled', publish the terminal message_complete frame, emit idle, return; and in each catch finalize the message with WHERE status='streaming' (idempotent) distinguishing AbortError→cancelled vs error→failed.
UX (UX agent): disable the Stop button while the cancel POST is in flight (mobile double-tap); extend the coder message_complete frame with an optional status field (Option A — minimal, no new frame type) and map it in the reducer (CoderPane.tsx:299, MessageStatus already includes 'cancelled'); render a muted "Stopped" label (not red, not a toast).
Tests (test-engineer T1-T3): extract a pure CancelRegistry (register/cancel/delete/has) — 4 unit cases, no DB/child; one DB-integration test for the route → row lands 'cancelled'; warm-worktree-preserved held as a code comment, not a spy.
Resolved OQs: terminal state = cancelled (not failed) for user Stop; registry keyed by taskId (route receives taskId); session/stop route — CoderPane already calls cancelTask for external tasks so the session-stop path "never fires for external from UI" (on-call) — wire it best-effort via a SELECT id FROM tasks WHERE session_id=$ AND state='running' lookup OR defer that leg (low value); use a shared cancelAndFinalize helper across the 4 paths (TDD precedent).
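A minimal sketch of the pure CancelRegistry and the cancelled-vs-failed distinction described above. The four method names come from the test plan; the class layout and the `terminalStatusFor` helper name are assumptions for illustration, not the dispatcher's actual code.

```typescript
// Hypothetical sketch: a pure registry with no DB and no child process.
class CancelRegistry {
  private readonly controllers = new Map<string, AbortController>();

  /** Track the AbortController of a running external task. */
  register(taskId: string, ac: AbortController): void {
    this.controllers.set(taskId, ac);
  }

  /** Abort a registered task. Returns false when the id is unknown. */
  cancel(taskId: string): boolean {
    const ac = this.controllers.get(taskId);
    if (!ac) return false;
    ac.abort(); // no-op when already aborted, so a double Stop is safe
    return true;
  }

  /** Drop the entry; intended for the run-function's finally handler. */
  delete(taskId: string): void {
    this.controllers.delete(taskId);
  }

  has(taskId: string): boolean {
    return this.controllers.has(taskId);
  }
}

type TerminalStatus = "cancelled" | "failed";

// Hypothetical helper: decide which terminal status a catch block should write.
// A user Stop (aborted signal or AbortError) maps to 'cancelled', anything else to 'failed'.
function terminalStatusFor(err: unknown, signal: AbortSignal): TerminalStatus {
  const isAbortError =
    typeof err === "object" &&
    err !== null &&
    (err as { name?: unknown }).name === "AbortError";
  return signal.aborted || isAbortError ? "cancelled" : "failed";
}
```

Keeping the registry free of I/O is what makes the four unit cases possible without a DB or a child process; the message-row update itself (WHERE status='streaming') stays in the dispatcher.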

F2 — tool-call-parser prune (option a: prune-now-minimal). DECISION (architect A2, confirms junior OQ-F2c): do NOT do the flag-gated full retirement (option b). KEEP extractToolCallBlocks + stripToolMarkup + their types (ToolCallExtraction, ParsedCall) — load-bearing <invoke>-as-text guard (the only guard for that case; experimental_repairToolCall doesn't cover it; sidecar --jinja unconfirmed so keeping the guard is correct). REMOVE the export keyword (not the implementations) from the 8 zero-external-caller symbols: isPlaceholderArgValue, parseXmlToolCall, parseInvokeToolCall, partialXmlOpenerStart, and the 4 consts XML_TOOL_OPEN/CLOSE, INVOKE_TOOL_OPEN/CLOSE. Zero runtime effect; public surface 11→4 exports.

Test gap (test-engineer T6): the <invoke>-text fallback in stream-phase.ts:263-284 is currently NOT exercised by any test → add a gate test (stub streamText to emit a text-delta containing a complete <invoke> block; assert it lands in result.toolCalls and the markup is NOT in result.content). Must stay green through the prune and fail if extractToolCallBlocks is ever removed from the text-delta path.

F3 — xml-parser structured logging. Trivial. tool-call-parser.ts:65 console.debug → pass an optional log?: { debug } param to extractToolCallBlocks from its one call site (stream-phase.ts executeStreamPhase) and use it. No interface (architect: one site, one impl). SEQUENCING: same file as F2; F2 keeps extractToolCallBlocks (decided), so F3 is safe; do F2+F3 in one batch. Confirm executeStreamPhase signature/test-stubs tolerate the param (junior).
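The optional-logger shape could look like the sketch below. `parseArgsJson` is a hypothetical stand-in, not the real extractToolCallBlocks, and the log messages are invented for illustration; only the `log?: { debug }` parameter shape comes from the note above.

```typescript
// Assumed logger shape: only a debug method is required.
type DebugLog = { debug: (msg: string, meta?: Record<string, unknown>) => void };

// Hypothetical stand-in for a parser step that previously called console.debug.
// When no logger is passed, the function stays silent.
function parseArgsJson(raw: string, log?: DebugLog): Record<string, unknown> | null {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)) {
      return parsed as Record<string, unknown>;
    }
    log?.debug("tool-call args are not a JSON object", { raw });
    return null;
  } catch (err) {
    log?.debug("tool-call args failed to parse", { raw, error: String(err) });
    return null;
  }
}
```

Because the parameter is optional, existing callers and test stubs that pass nothing keep compiling, which is the tolerance the junior asked to confirm.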

F6 — BooChat stall-timeout ONLY (retry deferred). on-call: wrap the stream-phase.ts:261 fullStream loop with a per-chunk stall deadline: a local stallAc = new AbortController(), effectiveSignal = AbortSignal.any([signal, stallAc.signal]) passed to streamText; bump a setTimeout(STALL_TIMEOUT_MS=90_000) on each chunk; clear it in the existing finally; at the post-loop check (stream-phase.ts:337) test signal?.aborted || stallAc.signal.aborted and throw AbortError (→ handleAbortOrError writes cancelled). Tests (test-engineer T8-T10): pure classifyStreamError(err) helper (5 cases, no I/O) + a vi.useFakeTimers() stall test on a fake hanging stream + a regression pin on the existing signal?.aborted post-loop check.
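A sketch of the per-chunk stall deadline, under stated assumptions: `consumeWithStallDeadline` and its `makeStream` callback are hypothetical names standing in for the fullStream loop, and the stream is assumed to end or throw once the signal it was given aborts (as a streamText call that receives the signal would). `AbortSignal.any` needs a runtime that ships it (recent Node or browsers).

```typescript
const STALL_TIMEOUT_MS = 90_000;

// Hypothetical wrapper around a chunk loop. Aborts the stream when no chunk
// arrives within stallMs, and surfaces both user abort and stall as AbortError.
async function consumeWithStallDeadline<T>(
  makeStream: (signal: AbortSignal) => AsyncIterable<T>,
  onChunk: (chunk: T) => void,
  signal?: AbortSignal,
  stallMs: number = STALL_TIMEOUT_MS,
): Promise<void> {
  const stallAc = new AbortController();
  const effectiveSignal = signal
    ? AbortSignal.any([signal, stallAc.signal])
    : stallAc.signal;

  let timer: ReturnType<typeof setTimeout> | undefined;
  const bump = (): void => {
    clearTimeout(timer);
    timer = setTimeout(() => stallAc.abort(), stallMs);
  };

  try {
    bump(); // the deadline also covers the wait for the first chunk
    for await (const chunk of makeStream(effectiveSignal)) {
      bump(); // each chunk pushes the deadline out again
      onChunk(chunk);
    }
  } finally {
    clearTimeout(timer);
  }

  // Post-loop check: a stream that ended quietly after an abort is still an abort.
  if (signal?.aborted || stallAc.signal.aborted) {
    const err = new Error("stream aborted or stalled");
    err.name = "AbortError";
    throw err;
  }
}
```

Using a combined signal rather than Promise.race means the underlying request is actually cancelled on stall, instead of being left running behind a rejected promise.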

YAGNI DEFER (on-call, strong): NO retry at executeStreamPhase/streamCompletion. A retry after partial stream re-emits already-streamed deltas (state.accumulated + live delta frames are non-idempotent) — worse than current. Reopen trigger: llama-swap gains restart-in-place-with-clear-partial, or a second instance for failover. The user re-sending is the correct recovery at single-instance scale.

F7 — view_session_history MCP tool. architect A4: add tool 7 inline in mcp-server.ts (follows the existing 6-tool inline pattern, textResult + direct sql). Reads messages_with_parts, WHERE role != 'system' (strips sentinels), params session_id + optional chat_id + limit (default 50, max 200), ORDER BY created_at ASC. No interface, no pagination beyond limit. Returns {role,content,...}[].
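The parameter handling for this tool can be sketched as a pure query builder. The limit bounds, the role filter, the view name, and the ordering come from the note above; the column names `session_id` and `chat_id`, the selected columns, and the `$n` placeholder style are assumptions, as is the `buildHistoryQuery` name.

```typescript
const DEFAULT_LIMIT = 50;
const MAX_LIMIT = 200;

interface HistoryParams {
  session_id: string;
  chat_id?: string;
  limit?: number;
}

// Default to 50 and clamp to the 1..200 range.
function clampLimit(limit?: number): number {
  if (limit === undefined || !Number.isFinite(limit)) return DEFAULT_LIMIT;
  return Math.min(MAX_LIMIT, Math.max(1, Math.trunc(limit)));
}

// Hypothetical builder: returns parameterized SQL text plus its values.
function buildHistoryQuery(p: HistoryParams): { text: string; values: (string | number)[] } {
  const values: (string | number)[] = [p.session_id];
  let where = "session_id = $1 AND role != 'system'"; // strips sentinel rows
  if (p.chat_id !== undefined) {
    values.push(p.chat_id);
    where += ` AND chat_id = $${values.length}`;
  }
  values.push(clampLimit(p.limit));
  const text =
    `SELECT role, content, created_at FROM messages_with_parts ` +
    `WHERE ${where} ORDER BY created_at ASC LIMIT $${values.length}`;
  return { text, values };
}
```

Keeping the clamp and the WHERE construction out of the MCP handler lets them be unit-tested without a database, while the handler itself stays inline as decided.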

F9 — retire apps/coder/web :9502 SPA. architect A5: the if (existsSync(webRoot)) block in index.ts (~269-289) already no-ops when the dist is absent. Delete that block, keep the inline 404 handler ({error:'not found'}); remove apps/coder/web from pnpm-workspace.yaml, the coder build step, and the Dockerfile copy; remove the now-unused fastifyStatic import (verify it's only used there). KEEP all /api/coder/* REST + WS + /api/health + --mcp routes (CoderPane depends on them). OQ-F9a RESOLVED: nothing probes GET / on :9502 (health is /api/health; compose healthcheck is the boocode container, not the host-systemd coder) → safe to 404 or add a 2-line GET / redirect-to-BooChat (no fastifyStatic).

BLOCKED — need a spec or a capability check before building (gate-trip items)

F4 — notify-hook config injection. SPEC-LEVEL gaps (junior OQ-F4a-e, behavioral B4, UX). The core premise is UNVERIFIED: do claude / qwen / goose actually fire their native lifecycle hooks in unattended mode (claude -p / SDK, qwen --acp / --output-format stream-json, goose)? goose's hook file/format is unknown (not in repo). Idempotent per-agent settings.json merge strategy unspecified. boocoder.service run-user / homedir() resolution unconfirmed. The inbound POST is a new unauthenticated localhost route (acceptable single-user, note it). Double-publish dedup with the v2.7.6 turn-boundary publish: behavioral B4 + architect A3 agree on the rule — inbound route calls normalizeAgentEvent (returns bucket working|blocked|done), confirms tasks.state='running' before publishing blocked, and SUPPRESSES done (the dispatcher already emits idle); done→drop, never re-publish. UI side already exists (AgentStatusDot, all 4 buckets — UX: F4 is server-side only). RECOMMENDATION: own plan-a-feature — the dedup rule + module shapes are settled, but the hook-firing-in-unattended-mode premise and goose hook mechanism must be verified first or the whole feature is built on sand.
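The dedup rule agreed by behavioral B4 and architect A3 reduces to a small pure predicate. This is a sketch only: `shouldPublishHookEvent` is a hypothetical name, and passing `working` through ungated is an assumption, since the note only specifies the `blocked` and `done` cases.

```typescript
type Bucket = "working" | "blocked" | "done";

// Hypothetical predicate for the inbound hook route.
// bucket is what normalizeAgentEvent returned; taskState is tasks.state.
function shouldPublishHookEvent(bucket: Bucket, taskState: string): boolean {
  switch (bucket) {
    case "done":
      // The dispatcher already emits idle at the turn boundary: drop, never re-publish.
      return false;
    case "blocked":
      // Only meaningful while the task is actually running.
      return taskState === "running";
    case "working":
      // Assumption: no extra gate is specified for working events.
      return true;
  }
}
```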

F5 — opencode compaction surfacing. BLOCKED on a capability check. The installed @opencode-ai/sdk exposes NO compaction event arm (current arms confirmed: session.next.{text,reasoning,tool,step}.*, message.part.*, session.idle/error at opencode-server.ts:379-491). The review's "consume compaction.{started,delta,ended}" assumed events from opencode's CORE event.ts, which the pinned SDK may not surface. MUST confirm the SDK emits a compaction signal + its exact event name (or an SDK bump is needed) before building. DISPUTED UI treatment (behavioral B5 = persistent sentinel row metadata.kind='compaction', survives refresh; UX = ephemeral inline divider via a new agent_compacted frame, no DB row) — settle once the event exists. Only compaction.ended is in scope (YAGNI: started/delta/step.failed/tool.progress out). Cross-app WS-frame parity is certain if a frame is added.

F8 — diff-line → agent re-prompt. SPEC-LEVEL (UX + junior, firm). The "DiffPanel" is inline in CoderPane.tsx:478-619, rendering pending_changes rows as a static <pre> (CoderPane.tsx:607-610) — NO line-selection infrastructure exists. Diff source ambiguous (pending_changes.diff = BooCode write-tools only vs the external-agent worktree git diff). "Send to new agent" needs coordinated workspace-pane + chat creation + pre-population across 3 surfaces with no existing contract. Selection diverges by modality (desktop line-select vs mobile long-press → bottom sheet). RECOMMENDATION: own plan-a-feature (the scope-brief already hedged this; treat as firm). MVP-if-pushed: "comment to current agent" only, block-level selection, pre-populate ChatInput — still wants a spec.

Claim ledger (consolidated, deduped)

| # | Claim | State | Spec-maturity | Supporting |
|---|---|---|---|---|
| C1 | F1 cancel route never aborts external child; no registry/export | Evidenced | plan-level | on-call,behavioral,architect,junior |
| C2 | F1 catch blocks leave message `streaming`; success path writes `complete` on abort — fix in same batch | Evidenced | plan-level | on-call,behavioral |
| C3 | F2 = prune-now-minimal: unexport 8 zero-caller symbols, keep extractToolCallBlocks+stripToolMarkup | Evidenced | plan-level | architect (test-engineer guard) |
| C4 | F2 `<invoke>`-text fallback is untested → add gate test before prune | Evidenced | plan-level | test-engineer |
| C5 | F3 optional logger param, do with F2 (same file) | Evidenced | plan-level | architect,junior |
| C6 | F6 stall-timeout via AbortSignal.any, 90s; NO retry (non-idempotent deltas) | Evidenced | plan-level | on-call,behavioral,test-engineer |
| C7 | F7 inline MCP tool, messages_with_parts, role!='system', limit 50/200 | Evidenced | plan-level | architect,UX |
| C8 | F9 delete SPA block, keep routes; GET / unprobed → safe | Evidenced | plan-level | architect (+ verified) |
| C9 | F4 hook-firing in unattended mode UNVERIFIED; goose hook mechanism unknown | Anecdotal (premise) | spec-level | junior,behavioral,UX |
| C10 | F4 dedup rule: confirm running before `blocked`; suppress hook `done` | Evidenced | plan-level | behavioral,architect |
| C11 | F5 pinned @opencode-ai/sdk exposes no compaction arm → blocked on capability check | Evidenced | spec-level | (verified) + junior |
| C12 | F5 UI treatment sentinel-row vs ephemeral-frame | Disputed | spec-level | behavioral vs UX |
| C13 | F8 no line-selection infra; diff source ambiguous; needs own spec | Evidenced | spec-level | UX,junior |

Open Questions — resolutions

OQ (F1 terminal state) → RESOLVED: cancelled. OQ (F1 registry key) → RESOLVED: taskId. OQ (F1 shared finalize helper) → RESOLVED: yes, pure helper. OQ (F1 warm re-throw on abort) → RESOLVED: short-circuit on ac.signal.aborted.
OQ-F2a (sidecar jinja) → RESOLVED moot: option a keeps the guard. OQ-F2c (a vs b) → RESOLVED: option a.
OQ-F6a/b/c → RESOLVED: AbortSignal.any (not Promise.race); no retry; 90s.
OQ-F7a (session vs chat id) → RESOLVED: both (chat_id optional) + limit.
OQ-F9a (GET / probe) → RESOLVED: unprobed, safe.
OQ-F4a (hooks fire unattended?), OQ-F4b (goose hook format) → UNRESOLVED, spec-level → route to F4 spec.
OQ-F5a (SDK compaction event name/existence) → UNRESOLVED, capability check → blocks F5.
OQ-F5b (sentinel vs ephemeral UI) → UNRESOLVED → settle in F5 once event confirmed.
OQ-F8a/b/c (diff source, serialization, new viewer) → UNRESOLVED, spec-level → route to F8 spec.

Spec-maturity gate

TRIPPED (≥5 spec-level findings — C9, C11, C12, C13, plus OQ-F4b/F8a — across ≥3 specialists: junior, behavioral, UX). The trip is CONCENTRATED in the three WANT items F4/F5/F8; F1/F2/F3/F6/F7/F9 are all plan-level and ready. Per skill: gate-trip → recommend the user route F4/F8 to plan-a-feature and F5 to a capability check. USER OVERRIDE STANDING: Sam chose scope "everything we discussed" having pre-acknowledged the WANT items would be planned more shallowly — so the plan proceeds, documenting F4/F5/F8 as Blocked/own-spec rather than halting. Decision deferred to Step 9 user presentation.
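The trip condition stated above (at least 5 spec-level findings across at least 3 specialists) can be written as a small check. This is an illustrative sketch; the `Finding` shape and the function name are assumptions, not part of the skill's tooling.

```typescript
type Maturity = "spec-level" | "plan-level";

interface Finding {
  id: string;
  maturity: Maturity;
  specialists: string[];
}

// Hypothetical check: the gate trips when enough spec-level findings
// come from enough distinct specialists.
function specMaturityGateTripped(
  findings: Finding[],
  minFindings = 5,
  minSpecialists = 3,
): boolean {
  let specLevelCount = 0;
  const specialists = new Set<string>();
  for (const f of findings) {
    if (f.maturity !== "spec-level") continue;
    specLevelCount += 1;
    for (const s of f.specialists) specialists.add(s);
  }
  return specLevelCount >= minFindings && specialists.size >= minSpecialists;
}
```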

YAGNI ledger

F6 retry logic → DEFER (non-idempotent re-emit of streamed deltas). Reopen: llama-swap restart-in-place or second instance. Source: on-call R1.
F2 option b (flag-gated full retirement of extractToolCallBlocks/stripToolMarkup) → DEFER (no evidence qwen3.6 stopped emitting <invoke> text on live; sidecar jinja unconfirmed). Reopen: documented multi-session live probe shows zero text-delta tool calls. Source: architect/test-engineer R1.
F4 NotifyHookInjection interface → REPLACE with one concrete function switching on agent name (3 agents, identical read-merge-write). Source: architect R1.
F5 handling of compaction.started/delta + step.failed + tool.progress → DEFER, only compaction.ended is user-actionable. Source: behavioral R1.
F7 SessionHistoryReader interface / pagination → REPLACE with inline query + limit. Source: architect R1.
Provider tier-2 follow-ups (snapshot frame, enabled column, shared types, MCP list_providers) → already DEFER/DROP per scope-brief; not re-planned.
