boocode/docs/plans/post-review-backlog/feature-implementation-plan.md
indifferentketchup 2a05d2f9fe docs: archive shipped openspec batches; add feature/plan/research notes
Move 13 shipped openspec change docs under openspec/changes/archived/.
Add docs/features/git-diff-panel, docs/plans/post-review-backlog, and
docs/research/cross-app-contract-ssot.md (the research behind the
@boocode/contracts SSOT work). Update BOOCHAT.md, BOOCODER.md, and
boocode_roadmap.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-02 21:20:33 +00:00


Feature Implementation Plan: Post-Review Backlog (F1-F9)

This is a multi-item backlog, not a single feature. It commits to shipping six READY items as independent sequential patch tags (D-9), one batch per coherent unit, honoring deploy-by-surface and stage-commits-by-path; and to documenting three BLOCKED WANT items (F4/F5/F8) with their exact blocking open questions and recommended resolution paths (D-12) rather than halting on the tripped spec-maturity gate.

Source Specification

  • Feature specification: No formal feature-specification.md exists. The ground-truth source is the scope brief, which captures the items, decisions, and live-verified current state from the 2026-06-02 backlog review conversation plus boocode_code_review_v2.md, docs/DEFERRED-WORK.md, and boocode_roadmap.md. Origin is conversational + review-doc, so the WANT items are intentionally planned more shallowly.
  • Consolidated specialist record: artifacts/synthesis-input.md — the Round-1 aggregation of all six specialists (on-call-engineer, behavioral-analyst, software-architect, test-engineer, user-experience-designer, junior-developer) with file:line evidence, the claim ledger, OQ resolutions, the spec-maturity gate, and the YAGNI ledger. There is no separate per-specialist file; this digest is authoritative.
  • Discovery notes: artifacts/.discovery-notes.md — project context and per-item code touch points.

Outcome

When this plan's READY cluster is executed: hitting Stop on an external agent task actually kills the running child and finalizes the assistant message in a correct terminal state (no more streaming-stuck or falsely-complete rows); the tool-call parser's public surface is pruned to its four load-bearing exports while the plain-text <invoke> guard stays intact and is for the first time test-pinned; the parser's rejection logging flows through pino/LOG_LEVEL; a hung BooChat stream is detected and finalized server-side after 90s instead of relying solely on the frontend watchdog; an MCP client can read a session's transcript through a new BooCoder MCP tool; and the unused :9502 fallback SPA and its build step are gone. F4/F5/F8 leave this plan with a documented, evidence-backed route to their own spec/capability work.

Context

  • Driving constraint: A post-review backlog with two items live-verified as active correctness bugs (F1 cancel is a no-op that leaves messages stuck; F6 has zero server-side stall guard). The review surfaced them; Sam chose scope "everything we discussed." No external deadline — value is correctness and simplification debt paydown.
  • Stakeholders: Sam (sole user/operator — wants Stop to work, wants the parser simplified, wants resilience and the MCP history tool, greenlit the :9502 removal). On-call posture is single-operator: Sam is the post-ship owner for every item.
  • Future-state concern: F1 touches the dispatcher's message-finalization paths across four backends — the risk to watch is finalization races/regressions there. F6 introduces a server-side timer that must not fire on a healthy-but-slow stream. F5 is a standing capability dependency on the pinned @opencode-ai/sdk.
  • Out-of-scope boundary: F2 Option B (flag-gated full parser retirement), F6 retry/backoff, F4's superset code (clean-room pattern only), and everything in the scope brief's "Deferred (Sam's explicit dispositions)" list (subagent permission demux, tier-2 provider follow-ups, large-file splits, PR-resolver, etc.). These are recorded, not planned.

Team Composition and Participation

Round-by-round detail lives in artifacts/implementation-iteration-history.md. All six specialists converged in one round; remaining unknowns are spec-level, not resolvable by more specialist rounds.

| Specialist | Status | Key Input |
| --- | --- | --- |
| project-manager | Coordinator | Facilitated R1, applied the spec-maturity gate + YAGNI gate, synthesized this plan. |
| on-call-engineer | Active | F1 finalization bugs (OCE-001/002); F6 stall-timeout design + no-retry YAGNI call. |
| behavioral-analyst | Active | F1 message-state corruption (B2/B3); F5 sentinel-row UI position (disputed); F4 dedup rule. |
| software-architect | Active | F1 registry/ExternalCancelFn shape (A1); F2 option-A prune (A2); F7 inline tool (A4); F9 removal (A5). |
| test-engineer | Active | F1 CancelRegistry + DB test (T1-T3); F2 fallback gate test (T6); F6 classifyStreamError + fake-timer test (T8-T10). |
| user-experience-designer | Active | F1 disable-Stop-while-in-flight + Option-A frame extension + muted "Stopped" label; F5 ephemeral-divider UI (disputed); F8 no line-selection infra. |
| junior-developer | Active | OQ reframing across all items; F2c (option a), F4a/b unverified premise, F8 spec-level. |

Implementation Approach

ALTITUDE: this section names code artifacts and inlines only decision-bearing values (flag names, the 90s stall timeout, the 50/200 limits, the export list to unexport). Full implementations belong in the files themselves; mechanics live in the cited decision-log entries.

TIER 1 — READY TO BUILD

F1 — external task cancel kills the child + finalizes the message (apps/coder, standalone, do first). The highest-value correctness fix. A taskControllers = new Map<string,AbortController>() registry inside createDispatcher, populated at the four run-functions and deleted in the existing .finally(), plus an exported idempotent cancelExternalTask(taskId): boolean wired into POST /api/tasks/:id/cancel via a narrow ExternalCancelFn (D-1). This batch must also fix the two pre-existing finalization bugs the abort wiring newly makes reachable — the catch blocks that leave messages streaming and the warm success path that writes complete on an aborted turn — via a shared cancelAndFinalize helper, or it ships a new bug (D-1). User Stop finalizes to cancelled, not failed (D-7); the terminal state reaches the web reducer via an optional status field on the existing coder message_complete frame, not a new frame type (D-8), rendered as a muted "Stopped" label (D-10). The session/stop leg "never fires for external from UI" (CoderPane already calls cancelTask), so it is wired best-effort via a WHERE session_id=$ AND state='running' lookup or deferred as a low-value leg.
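The registry and cancel function described for F1 can be sketched roughly as follows. This is a minimal illustration, not the dispatcher's actual code: it keeps the map at module level for brevity (the plan places it inside createDispatcher), and registerTask / unregisterTask are hypothetical helper names standing in for the population at the run-functions and the delete in .finally().

```typescript
// Sketch only: taskControllers and cancelExternalTask are named in the plan;
// registerTask and unregisterTask are hypothetical helpers for illustration.
const taskControllers = new Map<string, AbortController>();

/** Create and store a controller; the backend run-function honors the returned signal. */
function registerTask(taskId: string): AbortSignal {
  const ac = new AbortController();
  taskControllers.set(taskId, ac);
  return ac.signal;
}

/** Drop the entry; intended to run from the run-function's .finally(). */
function unregisterTask(taskId: string): void {
  taskControllers.delete(taskId);
}

/**
 * Idempotent cancel. Returns true when a registered controller was found,
 * false when the task is unknown or has already been unregistered.
 */
function cancelExternalTask(taskId: string): boolean {
  const ac = taskControllers.get(taskId);
  if (!ac) return false;
  ac.abort(); // abort() on an already-aborted controller is a no-op
  return true;
}
```

Under these assumptions a double Stop is harmless (the second abort() does nothing), and a cancel that arrives after the task has exited simply returns false.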

F2 + F3 together (apps/server, same file). F2 is option-A prune-now-minimal: KEEP extractToolCallBlocks + stripToolMarkup (the only guard for a tool call emitted as plain text), unexport the 8 zero-caller symbols (isPlaceholderArgValue, parseXmlToolCall, parseInvokeToolCall, partialXmlOpenerStart, and the consts XML_TOOL_OPEN/CLOSE, INVOKE_TOOL_OPEN/CLOSE), taking the public surface 11 → 4 with zero runtime effect (D-3). A gate test pins the untested <invoke>-as-text fallback before the prune (D-4). Because F2 keeps extractToolCallBlocks, F3 is safe to land in the same file/batch: thread an optional log?: { debug } param into extractToolCallBlocks from its one call site so rejection logging flows through pino/LOG_LEVEL instead of console.debug (D-2).
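The F3 logger threading can be illustrated with a toy version of the function. Only the function name and the optional log?: { debug } parameter come from the plan; the return shape, the rejection rule, and the regex below are assumptions made purely to show how an optional logger replaces console.debug without changing callers that pass nothing.

```typescript
// Hypothetical shapes: the real signature and grammar are not shown in this plan.
type DebugLog = { debug: (msg: string) => void };

interface ExtractResult {
  content: string;      // text with tool-call markup removed
  toolCalls: string[];  // raw bodies of accepted blocks
}

function extractToolCallBlocks(text: string, log?: DebugLog): ExtractResult {
  const toolCalls: string[] = [];
  // Toy matcher for illustration only; not the parser's real grammar.
  const content = text.replace(/<invoke>([\s\S]*?)<\/invoke>/g, (_m, body: string) => {
    const trimmed = body.trim();
    if (trimmed.length === 0) {
      // Rejection goes through the injected logger, so the caller's log level decides visibility.
      log?.debug("tool-call block rejected: empty body");
      return "";
    }
    toolCalls.push(trimmed);
    return "";
  });
  return { content, toolCalls };
}
```

Any object with a debug method satisfies the parameter, so the single call site could pass its pino logger directly, while tests can pass a small recording stub or omit the argument.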

F6 — BooChat stall-timeout ONLY (apps/server, standalone). Wrap the stream-phase.ts:261 fullStream loop with a per-chunk stall deadline using AbortSignal.any([signal, stallAc.signal]) and a setTimeout(STALL_TIMEOUT_MS = 90_000) bumped on each chunk, cleared in the existing finally; the post-loop check throws AbortError so the existing finalize path writes cancelled (D-5). No retry — partial-stream re-emit is non-idempotent and the instance is single+local (see Deferred (YAGNI)).
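A rough sketch of the per-chunk stall deadline follows. It assumes a runtime that provides AbortSignal.any and that the stream producer honors the combined signal; consumeWithStallGuard, makeStream, and onChunk are hypothetical names, not the stream-phase.ts code, and the stallMs parameter exists only so the deadline can be shortened in tests.

```typescript
const STALL_TIMEOUT_MS = 90_000; // value committed in the plan (D-5)

// Hypothetical wrapper: the real loop lives inline in stream-phase.ts.
async function consumeWithStallGuard<T>(
  makeStream: (signal: AbortSignal) => AsyncIterable<T>,
  onChunk: (chunk: T) => void,
  signal: AbortSignal,
  stallMs: number = STALL_TIMEOUT_MS,
): Promise<void> {
  const stallAc = new AbortController();
  // Either the caller's signal or the stall controller aborts the stream.
  const combined = AbortSignal.any([signal, stallAc.signal]);
  let timer: ReturnType<typeof setTimeout> | undefined;
  const bump = (): void => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => stallAc.abort(), stallMs);
  };
  try {
    bump(); // start the deadline before the first chunk arrives
    for await (const chunk of makeStream(combined)) {
      bump(); // every chunk pushes the deadline out again
      onChunk(chunk);
    }
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
  // Post-loop check: surface a stall or user cancel as AbortError so the
  // caller's existing finalize path can record the turn as cancelled.
  if (combined.aborted) {
    throw new DOMException("stream stalled or was cancelled", "AbortError");
  }
}
```

Because the timer is reset on every chunk, a slow but steadily producing stream never trips the deadline; only a gap longer than stallMs between chunks does.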

F7 — view_session_history MCP tool (apps/coder MCP, standalone). Tool 7 added inline in mcp-server.ts following the existing 6-tool inline pattern (textResult + direct sql). Reads messages_with_parts WHERE role != 'system', params session_id + optional chat_id + limit (default 50, max 200), ORDER BY created_at ASC (D-6). No interface, no pagination beyond limit.
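The parameter handling for F7 can be sketched as a limit clamp plus a query builder. The default of 50, the cap of 200, the view name, the role filter, and the ordering come from the plan; the helper names, the $n placeholder style, and the assumption that the view exposes session_id and chat_id columns are illustrative guesses.

```typescript
const DEFAULT_LIMIT = 50;
const MAX_LIMIT = 200;

/** Clamp a caller-supplied limit into [1, MAX_LIMIT], defaulting when absent or invalid. */
function clampLimit(limit?: number): number {
  if (limit === undefined || !Number.isFinite(limit)) return DEFAULT_LIMIT;
  return Math.min(MAX_LIMIT, Math.max(1, Math.trunc(limit)));
}

interface HistoryParams {
  session_id: string;
  chat_id?: string;
  limit?: number;
}

/** Build a parameterized query; column names are assumed to match the tool params. */
function buildHistoryQuery(p: HistoryParams): { text: string; values: (string | number)[] } {
  const values: (string | number)[] = [p.session_id];
  let where = "session_id = $1 AND role != 'system'";
  if (p.chat_id !== undefined) {
    values.push(p.chat_id);
    where += ` AND chat_id = $${values.length}`;
  }
  values.push(clampLimit(p.limit));
  const text =
    `SELECT * FROM messages_with_parts WHERE ${where} ` +
    `ORDER BY created_at ASC LIMIT $${values.length}`;
  return { text, values };
}
```

Keeping the clamp as a pure function makes the limit cap easy to unit-test without a database.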

F9 — retire :9502 SPA (apps/coder, standalone cleanup). Delete the if (existsSync(webRoot)) serve block in index.ts (~269-289), keep the inline 404; remove apps/coder/web from pnpm-workspace.yaml, the coder build step, and the Dockerfile copy; drop the now-unused fastifyStatic import. KEEP every /api/coder/* REST + WS + /api/health + --mcp route (D-11).

TIER 2 — BLOCKED (need their own spec / a capability check)

These three are recorded, not built. They do not block the READY cluster (D-12).

F4 — notify-hook config injection. The module shapes and the double-publish dedup rule are settled (inbound route calls normalizeAgentEvent → bucket; confirms tasks.state='running' before publishing blocked; SUPPRESSES done since the dispatcher already emits idle). But the core premise is unverified: it is unknown whether claude / qwen / goose actually fire their native lifecycle hooks in unattended mode (claude -p/SDK, qwen --acp/stream-json, goose), and goose's hook file/format is unknown (not in repo). Recommended path: route to plan-a-feature; verify hook-firing-in-unattended-mode and the goose hook mechanism first, or the feature is built on sand. Blocking OQs: OQ-F4a (hooks fire unattended?), OQ-F4b (goose hook format?).

F5 — opencode compaction surfacing. Blocked on a capability check: the pinned @opencode-ai/sdk exposes NO compaction event arm (confirmed arms: session.next.{text,reasoning,tool,step}.*, message.part.*, session.idle/error). The review assumed events from opencode's core event.ts that the pinned SDK may not surface. Recommended path: confirm the SDK emits a compaction signal and its exact event name (or an SDK bump is needed) before building. The UI treatment is disputed (behavioral: persistent sentinel row metadata.kind='compaction' that survives refresh; UX: ephemeral inline divider via a new agent_compacted frame, no DB row) — settle once the event is confirmed to exist. Only compaction.ended is in scope. Blocking OQs: OQ-F5a (SDK event existence/name), OQ-F5b (sentinel vs ephemeral UI).

F8 — diff-line → agent re-prompt UX. Spec-level. The "DiffPanel" is inline in CoderPane.tsx:478-619, rendering pending_changes rows as a static <pre> — no line-selection infrastructure exists. The diff source is ambiguous (pending_changes.diff = BooCode write-tools only vs the external-agent worktree git diff). "Send to new agent" needs coordinated workspace-pane + chat creation + pre-population across three surfaces with no existing contract; selection diverges by modality. Recommended path: route to plan-a-feature (the scope brief already hedged this; treat as firm). MVP-if-pushed: "comment to current agent" only, block-level selection, pre-populate ChatInput — still wants a spec. Blocking OQs: OQ-F8a (diff source), OQ-F8b (selection serialization), OQ-F8c (new-viewer routing).

Architecture and Integration Points

F1 lives entirely in apps/coder (dispatcher.ts, routes/tasks.ts, index.ts:254 wiring) with a single-field touch on the web reducer (CoderPane.tsx:299). F2/F3 are confined to apps/server/.../tool-call-parser.ts and its one call site stream-phase.ts. F6 is confined to stream-phase.ts. F7 is one inline tool in apps/coder/.../mcp-server.ts. F9 removes a serve block and a workspace package. No new interface is introduced anywhere in TIER 1 — every item follows the existing inline / pure-helper precedent (D-2, D-6).

Runtime Behavior

F1's cancel path: Stop → POST /api/tasks/:id/cancel → cancelExternalTask(taskId) → ac.abort() → backend honors the signal (child.kill / session/cancel / session.abort / interrupt) → the run-function hits its abort short-circuit or catch → cancelAndFinalize writes the terminal messages.status (cancelled on abort, failed on error) WHERE status='streaming' and publishes the terminal frame (D-1, D-7). F6's stall path: each chunk bumps the 90s timer; on stall, stallAc.abort() flows through AbortSignal.any into the same finalize-as-cancelled path (D-5).
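The terminal-state decision in this path can be sketched as two small pure functions. The status names and the WHERE status='streaming' guard come from the plan; terminalStatusFor and finalizeIfStreaming are hypothetical names, the Map is an in-memory stand-in for the messages table, and giving abort precedence over a thrown error is an assumption about how the cancelled vs failed mapping resolves when both are present.

```typescript
type MessageStatus = "streaming" | "complete" | "cancelled" | "failed";

/**
 * Decide the terminal status for a turn.
 * Assumption: an aborted signal wins over a thrown error, so a user Stop
 * that makes the backend throw is still recorded as cancelled.
 */
function terminalStatusFor(signal: AbortSignal, error?: unknown): MessageStatus {
  if (signal.aborted) return "cancelled"; // also guards the success path on an aborted turn
  if (error !== undefined) return "failed";
  return "complete";
}

/**
 * In-memory stand-in for an UPDATE guarded by status = 'streaming'.
 * Returns true only for the first finalizer; later calls change nothing.
 */
function finalizeIfStreaming(
  rows: Map<string, MessageStatus>,
  id: string,
  next: MessageStatus,
): boolean {
  if (rows.get(id) !== "streaming") return false;
  rows.set(id, next);
  return true;
}
```

The guard is what makes racing finalizers safe in this sketch: whichever path writes first wins, and the loser becomes a no-op instead of overwriting a terminal state.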

Data Model and Persistence

No schema changes in any TIER 1 item. F1 and F6 only change which terminal value gets written to existing messages.status / tasks.state columns. F7 is a read against the existing messages_with_parts view.

External Interfaces

F1 extends the existing coder message_complete WS frame with an optional status field — no new frame type, so no paired strict-union arm is forced in apps/web/src/api/types.ts beyond the optional field (D-8). F7 exposes one new read-only MCP tool on BooCoder's MCP server (D-6). F4 (blocked) would add a new unauthenticated localhost POST route — see Security Posture. F5 (blocked) would add a new WS frame and thus trigger the full cross-app parity rule.
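The optional-field extension for F1 might look like the sketch below. Only the frame type name message_complete, the optional status field, and the muted "Stopped" label are from the plan; the message_id field, the helper names, and the choice to treat a missing status as complete are assumptions for illustration.

```typescript
type TerminalStatus = "complete" | "cancelled" | "failed";

// Hypothetical frame shape; the real definition lives in the web app's types file.
interface MessageCompleteFrame {
  type: "message_complete";
  message_id: string;      // assumed field name
  status?: TerminalStatus; // optional, so frames without it remain valid
}

/** Assumption: a frame without status keeps the previous meaning of "complete". */
function statusFromFrame(frame: MessageCompleteFrame): TerminalStatus {
  return frame.status ?? "complete";
}

/** Only a user- or stall-cancelled message gets the muted label. */
function labelFor(status: TerminalStatus): string | null {
  return status === "cancelled" ? "Stopped" : null;
}
```

Because the field is optional, no new arm is added to the frame union; existing producers and consumers continue to type-check unchanged.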

Decomposition and Sequencing

Each READY item is its own patch tag, one batch per coherent unit (D-9). F2+F3 are a single unit (same file; F2 keeping extractToolCallBlocks unblocks F3). F1 goes first as the highest-value correctness fix.

| # | Work Unit | Surface / Deploy | Delivers | Depends On | Verification |
| --- | --- | --- | --- | --- | --- |
| 1 | F1 cancel registry + finalization fixes | apps/coder → systemctl restart boocoder | Stop kills child; message finalizes cancelled/failed; no stuck streaming (D-1) | | CancelRegistry unit (4 cases) + DB-integration route test → row cancelled (T1-T3) |
| 2 | F2+F3 parser prune + logger | apps/server → docker rebuild | Public surface 11→4; rejection logging via pino; guard pinned (D-3, D-2) | F2's keep-decision unblocks F3 | Fallback gate test green through prune (T6, D-4) |
| 3 | F6 stall-timeout | apps/server → docker rebuild | 90s server-side stall detection → finalize cancelled (D-5) | | classifyStreamError unit (5 cases) + vi.useFakeTimers() stall test + post-loop abort regression pin (T8-T10) |
| 4 | F7 session-history MCP tool | apps/coder → systemctl restart boocoder | Read-only transcript tool; /api/health tool count +1 (D-6) | | Manual MCP call + sentinel-strip assertion (role != 'system') |
| 5 | F9 retire :9502 SPA | apps/coder → systemctl restart boocoder | Serve block + build step + workspace pkg removed; routes kept (D-11) | | /api/coder/* + /api/health still 200; GET / 404-or-redirect |

F4/F5/F8 are NOT in this table — they route out per D-12.

RAID Log

Risks

| ID | Risk | Likelihood | Severity | Blast Radius | Reversibility | Owner | Mitigation |
| --- | --- | --- | --- | --- | --- | --- | --- |
| R1 | F1 message-finalization races across the 4 backend paths leave a row in the wrong terminal state | Medium | Medium | One assistant message per affected turn | Reversible (re-run) | on-call-engineer | Shared cancelAndFinalize helper, WHERE status='streaming' idempotency, abort short-circuit before the unconditional complete write (D-1) |
| R2 | F2 prune accidentally removes the <invoke>-as-text fallback guard (regression: plain-text tool calls silently dropped) | Low | High | Any qwen3.6 turn that emits tool-call markup as text | Reversible (re-add export/call) | software-architect | Option-A keeps the impl; gate test fails if extractToolCallBlocks leaves the text-delta path (D-3, D-4) |
| R3 | F6 stall-timeout fires on a healthy-but-slow stream (false abort) | Low | Low | One in-flight chat turn | Reversible (re-send) | on-call-engineer | 90s per-chunk (not per-turn) deadline, bumped on every chunk; tune only on observed false-fire (D-5) |
| R4 | F5 premise fails: the pinned SDK never surfaces a compaction event, so the feature is unbuildable without an SDK bump | High | N/A (blocked) | F5 only | N/A | (capability check) | Confirm event existence/name before any build; F5 stays Blocked until then (D-12) |
| R5 | F4 premise fails: agents do not fire native hooks in unattended mode, so injection yields no signals | High | N/A (blocked) | F4 only | N/A | (plan-a-feature) | Verify hook-firing + goose format in the F4 spec before any build (D-12) |

Assumptions

| ID | Assumption | What Changes If Wrong | Verifier | Status |
| --- | --- | --- | --- | --- |
| A1 | llama-swap native --jinja parsing stays ON (structured tool_calls), so the text-scrape fallback remains dormant defense-in-depth | If off, the kept guard becomes hot, which is fine; if a future config silently changes parsing, F2 Option B reopens | live probe of :8401 (not in-repo) | Unverified-in-repo (probe-only); guard kept regardless (D-3) |
| A2 | All four external backends honor ctx.signal as cited (child.kill / session/cancel / session.abort / interrupt) | If one does not, that backend's Stop stays a no-op even after wiring | F1 DB-integration test per backend path | Evidenced by file:line (scope-brief F1); test confirms |
| A3 | 90s is the right stall deadline for the local llama-swap workload shape | Too low → false aborts on slow models; too high → sluggish recovery | Production observation | Committed value, tunable (D-5) |
| A4 | F1's session/stop leg "never fires for external from UI" (CoderPane calls cancelTask) | If a code path does hit session-stop for an external task, that leg's finalization is best-effort only | on-call review of CoderPane wiring | Evidenced (synthesis-input F1 resolved OQs) |

Dependencies

| ID | Dependency | Owner | Status |
| --- | --- | --- | --- |
| Dep1 | F5 requires @opencode-ai/sdk to surface a compaction event (or an SDK bump) | (capability check) | BLOCKING F5, unresolved (OQ-F5a) |
| Dep2 | F4 requires confirmed unattended hook-firing for claude/qwen/goose + the goose hook format | (plan-a-feature) | BLOCKING F4, unresolved (OQ-F4a, OQ-F4b) |
| Dep3 | F8 requires a chosen diff source + a line-selection approach | (plan-a-feature) | BLOCKING F8, unresolved (OQ-F8a/b/c) |

Testing Strategy

Sourced from test-engineer (T1-T3, T6, T8-T10) following the established pure-helper-then-wire precedent (turn-guard.ts, lifecycle-decisions.ts, mistake-tracker.ts). Coder tests are globals:false (import describe/it/expect); include glob src/**/__tests__/**/*.test.ts; DB-integration tests are opt-in via DATABASE_URL + describe.runIf.

  • Observable behaviors to test:
    • F1: a cancelled external task lands messages.status='cancelled' (not streaming, not complete); the registry register/cancel/delete/has behaves idempotently (D-1, D-7).
    • F2: a text-delta carrying a complete <invoke> block lands in result.toolCalls with the markup absent from result.content — green through the prune, red if the guard is removed (D-4).
    • F6: a fake hanging stream is aborted after the 90s deadline under vi.useFakeTimers(); the existing post-loop signal?.aborted check still passes (regression pin) (D-5).
    • F7: results exclude role='system' rows (sentinel strip) and respect the limit cap.
  • Test doubles posture: extract pure helpers (CancelRegistry, classifyStreamError) with no DB/child/I/O for unit coverage; stub streamText for the F2 gate and F6 stall tests; one DB-integration test for the F1 route. Warm-worktree-preserved is held as a code comment, not a spy (test-engineer T1-T3).
  • Edge cases requiring coverage: double-Stop / cancel-after-exit (idempotent ac.abort()); abort vs thrown error in the catch (cancelled vs failed mapping); F6 stall vs healthy-slow stream.
  • Test levels: unit (registry, classifier, gate) → integration (F1 route → DB row) → manual (F7 MCP call, F9 route-still-200 smoke).

Security Posture

Thin. The single TIER 1 surface is BooCoder's new read-only view_session_history MCP tool (F7), which reads only messages_with_parts WHERE role != 'system' — no write verbs, no PII beyond what the operator already authored, single-user, Authelia at the reverse proxy, no app-layer auth by design.

The one genuinely new auth-relevant surface — F4's inbound unauthenticated localhost POST route for hook callbacks — belongs to a BLOCKED item and is not built in this plan. When F4 is planned, it must be recorded explicitly as a new unauthenticated localhost route (acceptable under the single-user / Authelia posture, but called out, not silent).

Operational Readiness

  • Deploy by surface: apps/coder items (F1, F7, F9) → sudo systemctl restart boocoder; apps/server items (F2+F3, F6) → docker compose up --build -d boocode. Stage commits explicitly by path; never git add -A (Sam has uncommitted web WIP) (D-9).
  • Build-step change (F9): removing apps/coder/web deletes a coder build step and a workspace package; the coder Dockerfile copy and pnpm-workspace.yaml entry must be removed in the same batch or the build references a missing package (D-11).
  • Health check after F7: /api/health reports a tool count; expect it to increment by one.
  • Tagging: sequential patch tags, one per unit; no v2.8.0 (D-9).
  • Rollback: each unit is an independent patch tag, so rollback is per-item (revert the tag, redeploy by surface). No schema migrations means no expand/contract sequencing to unwind.

On-Call Resilience Posture

Application-source resilience, sourced from on-call-engineer. Infrastructure concerns live in Operational Readiness above.

  • Timeouts and deadlines: F6 adds a 90s per-chunk stall deadline on the BooChat fullStream loop, the first server-side guard against a hung llama-swap stream (today only the frontend 60s discard_stale watchdog exists) (D-5).
  • Retry strategy: none, deliberately. A retry after a partial stream re-emits already-streamed deltas (non-idempotent); the user re-sending is the correct recovery at single-local-instance scale (see Deferred (YAGNI)) (D-5).
  • Idempotency: F1's cancelExternalTask is idempotent (ac.abort() no-ops when already aborted → double-Stop and cancel-after-exit are safe); message finalization is idempotent via WHERE status='streaming' (D-1).
  • Kill switches: F1 is itself the kill switch for a runaway external agent — making Stop actually stop is the operator's manual control.
  • Graceful degradation: F1's session/stop leg is best-effort (does not fire for external tasks from the UI); F6 falls back to finalizing as cancelled on stall rather than hanging indefinitely.
  • Observability of failure paths: F1 distinguishes cancelled (user/stall) from failed (thrown error) so the human-inbox surface stays honest (D-7); F3 routes parser-rejection logging through pino/LOG_LEVEL (D-2).
  • Data integrity: no monetary/rate columns touched; F1/F6 only write existing terminal-state enum values into existing columns.
  • Migration safety: no schema changes in any TIER 1 item — nothing to expand/contract.

Definition of Done

  • Hitting Stop on an external task kills the child and the assistant message finalizes cancelled (never left streaming, never falsely complete) (D-1, D-7).
  • Stop button is disabled while the cancel POST is in flight; a muted "Stopped" label renders on cancel (D-8, D-10).
  • Parser public surface is 4 exports; the <invoke>-as-text gate test is green and fails if the guard is removed (D-3, D-4).
  • Parser rejection logging appears via pino and respects LOG_LEVEL (D-2).
  • A fake hanging BooChat stream is finalized cancelled after 90s under fake timers; the post-loop abort regression pin passes (D-5).
  • view_session_history returns a session transcript excluding role='system' rows, capped at limit (default 50, max 200); /api/health tool count +1 (D-6).
  • :9502 serve block + build step + workspace package removed; all /api/coder/* + /api/health routes still 200 (D-11).
  • Each shipped item has its own sequential patch tag + CHANGELOG entry; deployed by surface (D-9).

Specialist Handoffs for Implementation

  • test-engineer — dispatch alongside each READY unit; needs the touch-point file:lines (already in discovery notes) to author T1-T3 (F1), T6 (F2), T8-T10 (F6) before/with the wiring.
  • software-architect — dispatch if F1's cancelAndFinalize helper shape needs settling across the four backend paths during implementation; needs the four catch-block + success-path line ranges.
  • plan-a-feature (F4) — dispatch when Sam wants F4; input: the settled dedup rule + module shapes from synthesis-input, and the two blocking OQs (unattended hook-firing, goose format) to resolve first.
  • plan-a-feature (F8) — dispatch when Sam wants F8; input: CoderPane.tsx:478-619 DiffPanel location, the diff-source ambiguity, and the cross-surface routing gap.
  • Capability check (F5) — dispatch before any F5 build; input: the pinned @opencode-ai/sdk arm list and OQ-F5a (does it emit a compaction event, and what is its exact name?).

Deferred (YAGNI)

Items considered during planning and deferred under the YAGNI rule. Each carries a concrete reopen trigger.

F6 retry / backoff classifier

  • Why deferred: Evidence test — a retry after a partial stream re-emits already-streamed deltas (state.accumulated + live delta frames are non-idempotent), which is strictly worse than the current behavior; no second instance exists to fail over to. Single-local-instance scale makes operator re-send the correct recovery.
  • Reopen when: llama-swap gains restart-in-place-with-clear-partial semantics, OR a second llama-swap instance is added for failover.
  • Source: R1, on-call-engineer.

F2 Option B — flag-gated full retirement of extractToolCallBlocks / stripToolMarkup

  • Why deferred: Evidence test — no evidence qwen3.6 stopped emitting <invoke>-as-text on live, and the sidecar --jinja state is unconfirmed in-repo; deleting the only plain-text-tool-call guard on that basis is unsafe (named anti-pattern: removing a load-bearing guard without a measured signal).
  • Reopen when: a documented multi-session live probe shows zero text-delta tool calls from qwen3.6.
  • Source: R1, software-architect / test-engineer.

F4 NotifyHookInjection interface

  • Why deferred: Simpler-version test — three agents with an identical read-merge-write settings.json flow do not need an interface; one concrete function switching on agent name satisfies the same need (named anti-pattern: single-implementation interface before three divergent uses exist).
  • Reopen when: a fourth agent with a genuinely different injection contract appears (and only after F4's premise is verified).
  • Source: R1, software-architect.

F5 handling of compaction.started / compaction.delta + step.failed + tool.progress

  • Why deferred: Evidence test — only compaction.ended is user-actionable (it is what closes the silent context gap); the other arms add UI noise with no user-described need. Compounded by F5 being capability-blocked anyway.
  • Reopen when: the SDK compaction event is confirmed AND a user-described need for in-progress compaction feedback emerges.
  • Source: R1, behavioral-analyst.

F7 SessionHistoryReader interface / pagination beyond limit

  • Why deferred: Simpler-version test — one read tool with an inline query + a limit cap satisfies the read need; an interface and cursor pagination are unused machinery (named anti-pattern: abstraction before a second concrete use).
  • Reopen when: a second history-read consumer needs a different read path, or transcripts routinely exceed the 200-row cap in a way that breaks the use case.
  • Source: R1, software-architect.

Open Items

  • OI-1 (F4 premise): Do claude / qwen / goose fire native lifecycle hooks in unattended mode, and what is goose's hook file/format?
    • Resolves when: verified in a plan-a-feature spec for F4 (OQ-F4a, OQ-F4b).
    • Blocks implementation: Yes for F4 — No for the READY cluster (F4 is independent).
  • OI-2 (F5 capability): Does the pinned @opencode-ai/sdk surface a compaction event, and what is its exact name?
    • Resolves when: a capability check confirms the event (or an SDK bump is scoped) — OQ-F5a.
    • Blocks implementation: Yes for F5 — No for the READY cluster.
  • OI-3 (F5 UI treatment): persistent sentinel row vs ephemeral inline divider for "context compacted."
    • Resolves when: OI-2 resolves and the team picks the UI shape (OQ-F5b); currently disputed (behavioral vs UX).
    • Blocks implementation: Yes for F5 — No for the READY cluster.
  • OI-4 (F8 spec): diff source (pending_changes.diff vs worktree git diff), selection serialization, and new-viewer routing.
    • Resolves when: a plan-a-feature spec for F8 settles OQ-F8a/b/c.
    • Blocks implementation: Yes for F8 — No for the READY cluster.
  • OI-5 (F1 session-stop leg): whether to wire the best-effort session/stop lookup or defer it.
    • Resolves when: decided at F1 implementation; it does not fire for external tasks from the UI today, so it is low-value.
    • Blocks implementation: No — F1 ships with or without this leg.

Summary

  • Outcome delivered: Stop actually stops external agents and finalizes their messages correctly; the parser is pruned to its load-bearing surface with the guard test-pinned; BooChat gains a 90s server-side stall guard; a session-history MCP tool and the :9502 retirement land; F4/F5/F8 have a documented route to their own spec/capability work.
  • Team size: 7 specialists (incl. project-manager) — see artifacts/implementation-iteration-history.md
  • Rounds of facilitation: 1 — see artifacts/implementation-iteration-history.md
  • Decisions committed: 12 — see artifacts/implementation-decision-log.md
  • Decisions settled by evidence: 11 — see artifacts/implementation-decision-log.md
  • Decisions settled by junior-developer reframing: 0 (junior confirmed several, but each rests on specialist evidence) — see artifacts/implementation-decision-log.md
  • Decisions settled by user input: 1 (D-12, the standing-override disposition) — see artifacts/implementation-decision-log.md
  • Rejected alternatives recorded: 13 — see artifacts/implementation-decision-log.md
  • Open items remaining: 5 (4 blocking only their own BLOCKED item; OI-5 non-blocking)
  • Recommendation: Ship the READY cluster as planned — build F1 first (apps/coder, highest-value correctness fix, must include the OCE-001/002 finalization fixes), then F2+F3 (apps/server, one batch), F6 (apps/server), F7 (apps/coder MCP), F9 (apps/coder cleanup), each as its own sequential patch tag. Route F4 and F8 to plan-a-feature and F5 to an @opencode-ai/sdk capability check before any build.