Implementation Decision Log — Orchestrator (Phase 2)
Han synthesis output. Each decision is committed: it cites evidence, records
rejected alternatives, and names an owner and a revisit criterion. Cross-reference
invariant: every D-N here is referenced by design.md and/or
tasks.md, and produced by a round recorded in
implementation-iteration-history.md.
Source: a conversational grill-me design session. The settled behavioral spec
is captured in design-context.md (12 decisions; decision 5
is REVISED by D-3 / D-4 below). Specialist findings are in the claim ledger
C1–C16 of the iteration history.
Trust class of evidence below: codebase (file:line in this repo) unless noted. No single-source web claims underpin any committed decision.
D-1 — Re-home the pure conductor definitions into apps/coder/src/conductor/
Decision. Copy the pure (dispatch-free) conductor definition files —
spine.ts, flows/*, contracts.ts, types.ts, render.ts — into
apps/coder/src/conductor/, plus the 23 Han personas (conductor/agents/*.md).
The Phase-1 standalone CLI (conductor/) stays alive and unchanged. Sever the
flows/code-review.ts → dispatch.ts coupling by adding a DispatchFn to
StepContext, injected by the flow-runner. Parameterize spine.ts's model from
process.env.CONDUCTOR_MODEL to the run's configured model.
Rationale. The flow definitions are pure data + closures; only dispatch.ts
(the opencode run subprocess path) and flow.ts (the in-memory scheduler) are
Phase-1-specific. Copying the pure files avoids a workspace-package extraction
(YAGNI — only two consumers) while keeping the Phase-1 CLI as a regression
oracle. The evidence/yagni contracts are preserved because the flow-runner calls
step.run(ctx) in-process to build each prompt BEFORE inserting the task — the
closures execute in the coder process; prompts are never serialized to DB.
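A minimal sketch of the DispatchFn seam, assuming a simplified StepContext. The field names (input, results, dispatch), the DispatchFn signature, the security-reviewer agent name, and makeRecordingDispatch are hypothetical illustrations rather than the re-homed types.ts; what the sketch shows is that the step closure builds its prompt in-process and calls whatever dispatcher the flow-runner injected, so the flow file carries no import of dispatch.ts.

```typescript
// Sketch only: these shapes are hypothetical; the re-homed types.ts may differ.
type DispatchFn = (agent: string, prompt: string) => Promise<string>;

interface StepContext {
  input: { question: string };
  results: Record<string, string>; // step_id -> full output of earlier steps
  dispatch: DispatchFn; // injected by the flow-runner, never imported by the flow
}

interface AgentStep {
  id: string;
  run: (ctx: StepContext) => Promise<string>;
}

// A per-dimension review step: the closure builds its prompt in-process and
// hands it to whatever dispatcher was injected, so the flow file no longer
// needs to import dispatchAgent from ../dispatch.js.
const securityReview: AgentStep = {
  id: "review-security",
  run: async (ctx) => {
    const prompt = `Review the code for security issues.\nQuestion: ${ctx.input.question}`;
    return ctx.dispatch("security-reviewer", prompt);
  },
};

// Test double (hypothetical): records each dispatch instead of inserting a tasks row.
function makeRecordingDispatch(calls: Array<[string, string]>): DispatchFn {
  return async (agent, prompt) => {
    calls.push([agent, prompt]);
    return `ok from ${agent}`;
  };
}
```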
Evidence. code-review.ts:10 (import { dispatchAgent } from '../dispatch.js')
and :62 (the per-dimension dispatch call) — the only flow→dispatch coupling
(C1). spine.ts:122 renders process.env.CONDUCTOR_MODEL into the report header
(C14). spine.ts:73 — contracts injected via the step closure, in-process (C11).
23 personas confirmed at conductor/agents/*.md.
Rejected alternatives.
- A @boocode/conductor workspace package — rejected: only two consumers (Phase-1 CLI + coder); a shared package is premature abstraction (YAGNI). Deferred with a reopen trigger (a 3rd app needing conductor types). See Deferred (YAGNI).
- Importing conductor/src/* directly from apps/coder across the workspace boundary — rejected: couples the coder build to the standalone CLI tree and its opencode-flavored dispatch import graph.
Specialist owner. software-architect.
Revisit criterion. A third app needs the conductor types (then extract the workspace package).
Driven by rounds: R1.
Referenced in plan: design.md §Re-home & DispatchFn seam; tasks.md group 1.
D-2 — DB-driven flow-runner with an onTaskTerminal dispatcher hook
Decision. Add apps/coder/src/services/flow-runner.ts: a DB-backed scheduler
that owns flow_runs/flow_steps, computes the ready wave from the loaded flow
def, INSERTs each ready agent step as a tasks row, runs code steps inline,
and advances. Fan-out is driven by ONE new hook — an onTaskTerminal(taskId, state) callback on createDispatcher — invoked when any task reaches a terminal
state. No third poll loop; no modification to the dispatcher's internal run
functions.
Rationale. The dispatcher already has the LISTEN/NOTIFY + poll machinery and the terminal-state transitions; a single callback at those transition points lets the flow-runner react without duplicating the dispatch loop. The flow-runner stays a pure scheduler; execution stays in the dispatcher.
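One possible shape for the single-hook seam, as a sketch: createDispatcher accepts an optional onTaskTerminal callback and calls it from the one function that writes a terminal state. The hook name and signature follow the decision; the TaskState union, enqueue/start/finishTask, and the in-memory Map standing in for the tasks table are assumptions, not dispatcher.ts. Wrapping the hook in try/catch is also an assumption, chosen so a flow-runner bug cannot break the generic dispatch loop.

```typescript
// Sketch only: everything except the onTaskTerminal name/signature is hypothetical.
type TaskState = "pending" | "running" | "completed" | "failed";
type TerminalState = "completed" | "failed";
type OnTaskTerminal = (taskId: string, state: TerminalState) => void;

interface DispatcherOptions {
  onTaskTerminal?: OnTaskTerminal; // the single new hook
}

function createDispatcher(opts: DispatcherOptions = {}) {
  const states = new Map<string, TaskState>(); // stand-in for tasks.state

  function enqueue(taskId: string): void {
    states.set(taskId, "pending");
  }

  function start(taskId: string): void {
    states.set(taskId, "running");
  }

  // Both terminal transitions funnel through here, so the hook fires once per
  // task and the dispatcher never needs to know about flows.
  function finishTask(taskId: string, state: TerminalState): void {
    const prev = states.get(taskId);
    if (prev === "completed" || prev === "failed") return; // already terminal
    states.set(taskId, state);
    try {
      opts.onTaskTerminal?.(taskId, state);
    } catch (err) {
      // Assumption: a failing hook is logged, not allowed to wedge the loop.
      console.error("onTaskTerminal hook threw", err);
    }
  }

  return { enqueue, start, finishTask, stateOf: (taskId: string) => states.get(taskId) };
}
```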
Evidence. dispatcher.ts:46-179 (the loop + runTask), :279-286 (the
notify_tasks_new trigger) (C2). Terminal transitions the hook attaches to:
external completed dispatcher.ts:642-646, external failed :659-661. Full step
output must persist in flow_steps.output TEXT because tasks.output_summary is
≤500 char and cannot reconstruct ctx.results for render/resume (schema.sql:26,
flow.ts:49,59) (C3).
Rejected alternatives.
- A standalone third poll loop in the flow-runner — rejected: duplicates the dispatcher's LISTEN/poll and leaves two writers racing on tasks.
- Modifying the dispatcher's runTask internals to know about flows — rejected: couples the generic dispatcher to the orchestrator; the callback seam keeps the dispatcher flow-agnostic.
Specialist owner. software-architect.
Revisit criterion. Step throughput requires batching beyond what one callback per terminal task supports.
Driven by rounds: R1.
Referenced in plan: design.md §Flow-runner & onTaskTerminal; tasks.md group 4.
D-3 — Reuse the existing dispatcher (insert pending task), not a direct-PTY bypass
Decision. The flow-runner INSERTs each ready step as a normal state='pending'
tasks row; the existing dispatcher picks it up via LISTEN 'tasks_new', runs it
through the existing external-agent path (creating a git worktree as a stable HEAD
read-checkout), and streams AgentEvents → WS frames unchanged. The new
onTaskTerminal hook (D-2) notifies the flow-runner on terminal state. No
direct-PTY bypass; the dispatcher is reused with exactly one new hook.
This REVISES design-context decision 5 ("no worktree") to: a worktree IS created, but it is a harmless read snapshot — read-only is enforced by plan mode (D-4), not by the absence of a worktree.
Rationale. Reuse (architect's A2) gets streaming, persistence, resume, cancellation, and AgentEvent→WS mapping for free. The only objection to A2 was that it creates the worktree that decision 5's "no worktree" rule wanted to avoid; once read-only is enforced at the tool level by plan mode (D-4), the worktree is inert (a checkout the agent cannot write to), so the objection dissolves. This was the user's explicit choice over the architect's lean toward the bypass (A4).
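The flow-runner side of the reuse can be sketched as small bookkeeping: insert a pending task per ready step, remember which step owns which task, and update that step when the D-2 onTaskTerminal hook fires. InsertPendingTask and the in-memory maps are hypothetical placeholders for the INSERT INTO tasks and for flow_steps.task_id; terminal events for tasks that are not flow steps are ignored.

```typescript
// Sketch only: all names here are hypothetical stand-ins for DB writes.
type TerminalState = "completed" | "failed";
type StepStatus = "pending" | "running" | "completed" | "failed";

interface StepRef {
  runId: string;
  stepId: string;
}

// Hypothetical: stands in for INSERT INTO tasks (... state = 'pending' ...)
// RETURNING id. The existing dispatcher then picks the row up via LISTEN.
type InsertPendingTask = (ref: StepRef, prompt: string) => string;

function createFlowRunner(insertPendingTask: InsertPendingTask) {
  const ownerOf = new Map<string, StepRef>(); // task_id -> flow step (flow_steps.task_id)
  const status = new Map<string, StepStatus>(); // "runId/stepId" -> step status
  const key = (r: StepRef) => `${r.runId}/${r.stepId}`;

  function dispatchStep(ref: StepRef, prompt: string): string {
    const taskId = insertPendingTask(ref, prompt);
    ownerOf.set(taskId, ref);
    status.set(key(ref), "pending"); // would become running once the dispatcher starts it
    return taskId;
  }

  // Registered via createDispatcher({ onTaskTerminal }).
  function onTaskTerminal(taskId: string, state: TerminalState): void {
    const ref = ownerOf.get(taskId);
    if (!ref) return; // an ordinary coder task, not a flow step
    ownerOf.delete(taskId);
    status.set(key(ref), state);
    // A real runner would persist flow_steps.output here, then compute and
    // dispatch the next ready wave.
  }

  return { dispatchStep, onTaskTerminal, statusOf: (ref: StepRef) => status.get(key(ref)) };
}
```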
Evidence. The external-agent path with worktree creation and AgentEvent→WS
streaming: dispatcher.ts external branch (worktree create → run → terminal at
:642-646/:659-661). Task-as-dispatch precedents the flow-runner copies:
routes/skills.ts:94 (a skill is already dispatched as a task),
routes/arena.ts:49, tools/new_task.ts:54. Dispatch tension recorded as C12
(A2 vs A4, architect self-flagged Disputed); resolved here by user choice.
Rejected alternatives.
- A4 direct dispatchViaPty bypass (insert a running task + call the PTY directly to skip worktree creation) — rejected: duplicates streaming/persistence/resume wiring, and a restart kills the PTY child outside the dispatcher's lifecycle (worsening resume, C15). The worktree it was avoiding is harmless under D-4.
- design-context decision 5's "no worktree, read project dir directly" — rejected: reusing the dispatcher means reusing its worktree creation; under D-4 the worktree is a read snapshot, so avoiding it bought nothing and cost the reuse.
Specialist owner. software-architect (execution path); devops-engineer (operational behavior of the reused dispatcher under flow load).
Revisit criterion. Worktree creation per step becomes a measured throughput or disk-cost problem under real flow concurrency.
Driven by rounds: R1 (C12), R2 (the read-only finding that made the worktree inert).
Referenced in plan: design.md §Execution via dispatcher reuse; tasks.md group 4.
D-4 — Read-only enforced HARD by mode_id='plan' (qwen --approval-mode plan)
Decision. Every orchestrator step task is dispatched with mode_id = 'plan',
which the PTY dispatcher passes to qwen as --approval-mode plan — a built-in
tool-level gate: reads allowed, writes blocked. The flow-runner hardcodes
mode_id='plan' for every step task; it is never user-overridable. This is the
sole read-only enforcement mechanism. BOOCODE_TOOLS and persona prompts are NOT
relied upon (they do not govern external CLI agents).
Rationale. Read-only is a safety-critical invariant of the whole feature
(flows never write the repo). Prompt-level intent and BOOCODE_TOOLS ceilings
govern BooChat's in-process tools, not an external qwen CLI child — so they are
not watertight. qwen's --approval-mode plan is a tool-level gate inside the
agent binary itself, which the adversarial-security-analyst (R2) identified as the
only enforcement that actually binds the external agent. Qwen-only (decision 6)
makes a single hardcoded flag sufficient.
Evidence. The wiring already exists: pty-dispatch.ts:75 —
if (modeId) args.push('--approval-mode', modeId) in the qwen spawn spec. R2
security finding recorded as C13 (the R1 claim that prompt-level + BOOCODE_TOOLS
enforcement was sufficient was Anecdotal/unproven; R2 refuted it and named plan
mode as the binding control).
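A sketch of how the hardcoded mode could travel from the step task to the qwen argv. Only the modeId guard mirrors the quoted line from pty-dispatch.ts:75; StepTask, buildStepTask, and buildQwenArgs are hypothetical, and all other qwen arguments are omitted. Typing mode_id as the literal 'plan' is one way to make "never user-overridable" a compile-time property instead of a convention.

```typescript
// Hypothetical step-task shape. Typing mode_id as the literal "plan" means no
// code path can build a flow step task with a different approval mode.
interface StepTask {
  state: "pending";
  mode_id: "plan";
  prompt: string;
}

function buildStepTask(prompt: string): StepTask {
  return { state: "pending", mode_id: "plan", prompt }; // hardcoded per D-4
}

// Hypothetical stand-in for the qwen spawn spec. Only the guard line mirrors
// pty-dispatch.ts:75; every other qwen argument is omitted here.
function buildQwenArgs(modeId?: string): string[] {
  const args: string[] = [];
  if (modeId) args.push("--approval-mode", modeId);
  return args;
}
```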
Rejected alternatives.
- Prompt-level read-only intent (personas tell the agent not to write) — rejected (C13, R2): an instruction, not a gate; a model can ignore or be steered past it.
- BOOCODE_TOOLS=core as the gate — rejected (C13, R2): governs BooChat's in-process tool registry and does not constrain the external qwen CLI's own tools.
- A read_only boolean flag on tasks — rejected: superseded by mode_id='plan', an existing column already plumbed to the binary. See Deferred (YAGNI).
Specialist owner. adversarial-security-analyst.
Revisit criterion. A non-qwen agent is added to flows (re-verify that agent's
equivalent of --approval-mode plan before allowing it), or qwen changes
--approval-mode plan semantics.
Driven by rounds: R1 (C13 flagged), R2 (resolved).
Referenced in plan: design.md §Read-only via plan mode; tasks.md group 4.
D-5 — flow_runs + flow_steps schema in the coder schema
Decision. Add two tables to apps/coder/src/schema.sql:
- flow_runs(id, project_id [no FK, matches tasks.project_id], flow_name, band [CHECK small|medium|large], model, status [CHECK-named], input JSONB [CHECK (input ? 'question')], report TEXT [nullable], error, timestamps).
- flow_steps(id, run_id [FK → flow_runs ON DELETE CASCADE], step_id, kind [CHECK agent|code], agent, status [CHECK-named], task_id [UUID → tasks(id) ON DELETE SET NULL; nullable, code steps NULL], chat_id [UUID → chats(id) ON DELETE SET NULL], input TEXT, output TEXT [FULL output], error, timestamps, UNIQUE(run_id, step_id)).
No depends_on column (derive from the loaded flow def). Do NOT insert
skipped-step rows (when() is pure on stored input). Indexes:
flow_steps(run_id, status), flow_runs(project_id, created_at DESC). Explicit
CHECK constraint names + the repo's DROP-IF-EXISTS → guarded-ADD migration
discipline.
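Because deps and when() live in the loaded flow def, the ready wave and the skips can be recomputed from the stored rows plus the stored input. A sketch under stated assumptions: the FlowStep shape (a deps array and an optional when) is a hypothetical simplification of the real flow types, a flow_steps row is assumed to exist only once a step has been dispatched, and a skipped dependency is assumed to satisfy its dependents.

```typescript
// Sketch only: FlowStep and readyWave are hypothetical.
type StepStatus = "pending" | "running" | "completed" | "failed" | "skipped";

interface RunInput {
  question: string;
}

interface FlowStep {
  id: string;
  deps: string[]; // hypothetical: how the flow def declares dependencies
  when?: (input: RunInput) => boolean; // pure on the stored run input
}

// stored: flow_steps rows for this run (step_id -> status), assuming a row is
// written only when a step is dispatched. Skips are never stored; they are
// re-derived from when() on every call.
function readyWave(
  flow: FlowStep[],
  input: RunInput,
  stored: Map<string, StepStatus>,
): { ready: string[]; skipped: string[] } {
  const skipped = flow.filter((s) => s.when !== undefined && !s.when(input)).map((s) => s.id);
  const isSkipped = (id: string) => skipped.indexOf(id) !== -1;
  // Assumption: a skipped dependency counts as satisfied for its dependents.
  const settled = (id: string) => stored.get(id) === "completed" || isSkipped(id);
  const ready = flow
    .filter((s) => !stored.has(s.id) && !isSkipped(s.id))
    .filter((s) => s.deps.every(settled))
    .map((s) => s.id);
  return { ready, skipped };
}
```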
Rationale. A run spans multiple tasks; existing tables (tasks,
agent_sessions) model single dispatches, not a DAG. flow_steps.task_id → tasks(id) (not a column on tasks) keeps tasks generic. output TEXT is FULL
because tasks.output_summary is ≤500 char and cannot reconstruct ctx.results.
project_id has no FK to match tasks.project_id's existing convention.
Evidence. tasks shape and output_summary ≤500 char: schema.sql:18-34,
:26 (C3, C4). flow.ts:49,59 (results reconstruction needs full output, C3).
flow.ts:28-41, types.ts:27 (deps + when() derivable from flow def — omit
depends_on and skipped rows, C6). schema.sql:19,32 (project_id no-FK pattern;
CHECK-named discipline, C5). Migration discipline: root CLAUDE.md schema section.
Rejected alternatives.
- A depends_on column on flow_steps — rejected (C6, YAGNI): deps are in the loaded flow def; storing them duplicates the source of truth. Deferred.
- Persisting skipped-step rows — rejected (C6, YAGNI): when() is pure on stored input, so a skip is reconstructable. Deferred.
- A column on tasks (e.g. flow_step_id) — rejected (C4): pollutes the generic tasks table; the FK belongs on flow_steps.
Specialist owner. data-engineer.
Revisit criterion. A stored-run DAG visualization needs deps without loading
the flow def (then add depends_on); the UI must explain a skip without the flow
def (then persist skipped rows).
Driven by rounds: R1.
Referenced in plan: design.md §Schema; tasks.md group 2.
D-6 — Two new WS frames; per-agent stream reuses existing frames by chat_id
Decision. Add two frames to packages/contracts/src/ws-frames.ts:
- flow_run_started: run_id, flow_name, band, steps[] (each: step_id, agent, kind, chat_id, label).
- flow_run_step_updated: run_id, step_id, status, run_status?, report?.
The per-agent content stream REUSES the existing delta / tool_call /
message_complete frames keyed by the step's chat_id. Each agent step gets a
synthetic chats row for stream attribution. Register in all THREE frame
registries: contracts WsFrameSchema, the server InferenceFrame union
(services/inference/turn.ts), and the web strict WsFrame union
(apps/web/src/api/types.ts) — the web type is the wire-format gate.
Rationale. The run-level lifecycle (which agents exist, their status, the final
report) needs new frames; the per-agent token stream is exactly what the existing
delta/tool_call/message_complete pipeline already carries, so keying it by a
synthetic chat_id reuses the whole broker→WS path with no new streaming code.
The report rides on flow_run_step_updated rather than its own frame (one fewer
frame type; revisit only if reports exceed the frame size limit).
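The two payloads written as plain TypeScript types, as a sketch. The contracts WsFrameSchema, the server InferenceFrame union, and the web WsFrame union are not reproduced; the type discriminant field, the nullable chat_id for code steps, and the frameBytes/reportFitsInline helpers are assumptions. The helper measures the serialized frame in UTF-8 bytes, which is the quantity a ~50KB limit would actually constrain (string length undercounts non-ASCII text).

```typescript
// Sketch only: field names follow the decision; everything else is assumed.
type Band = "small" | "medium" | "large";
type StepStatus = "pending" | "running" | "completed" | "failed" | "skipped";

interface FlowRunStarted {
  type: "flow_run_started";
  run_id: string;
  flow_name: string;
  band: Band;
  steps: Array<{
    step_id: string;
    agent: string;
    kind: "agent" | "code";
    chat_id: string | null; // assumption: null for code steps, which get no synthetic chat
    label: string;
  }>;
}

interface FlowRunStepUpdated {
  type: "flow_run_step_updated";
  run_id: string;
  step_id: string;
  status: StepStatus;
  run_status?: string; // the run-level status set is not spelled out here
  report?: string; // rides on this frame instead of a separate report frame
}

type FlowFrame = FlowRunStarted | FlowRunStepUpdated;

// UTF-8 size of the serialized frame.
function frameBytes(frame: FlowFrame): number {
  return new TextEncoder().encode(JSON.stringify(frame)).length;
}

// Hypothetical check for the ~50KB revisit trigger.
function reportFitsInline(frame: FlowRunStepUpdated, limitBytes = 50 * 1024): boolean {
  return frameBytes(frame) <= limitBytes;
}
```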
Evidence. Existing broker→WS frame pipeline and frame list: ws-frames.ts
(snapshot…error). Three-registry rule + web-type-is-wire-gate: root CLAUDE.md
"Adding a new WS frame type" + discovery notes §packages/contracts. Stream-by-chat
reuse precedent: the dispatcher publishes delta/tool_call/message_complete keyed
by chat already (C7).
Rejected alternatives.
- New per-agent stream frames (flow_agent_delta, etc.) — rejected: the existing delta/tool_call/message_complete frames already stream by chat; new frames would duplicate them.
- A separate flow_run_report frame — rejected (YAGNI): the report fits on flow_run_step_updated. Deferred with a reopen trigger (reports exceed ~50KB).
Specialist owner. software-architect.
Revisit criterion. Reports exceed the frame size limit (~50KB) → split the report onto its own frame.
Driven by rounds: R1.
Referenced in plan: design.md §WS frames; tasks.md group 3.
D-7 — orchestrator pane kind + OrchestratorPane
Decision. Add an orchestrator pane kind (following the
markdown_artifact/html_artifact precedent) — touching WorkspacePaneKind,
useWorkspacePanes, Workspace, NewPaneMenu, ChatTabBar,
PaneHeaderActions. OrchestratorPane.tsx: run header; report-at-top on
completion; collapsed agent roster reusing AgentStatusDot; expand-one-at-a-time
detail well reusing CoderPane stream rendering; mobile single-column inline
expand; auto-expand-follows-active. Runs history in NewPaneMenu. Export (copy /
save-file / send-to-chat via the existing sendToChat) in the pane header …,
conditional on a completed report.
Rationale. A fourth pane kind is already a precedented extension point; the
pane reuses AgentStatusDot and the CoderPane stream renderer, so the new surface
is composition, not new streaming UI. Expand-one-at-a-time avoids the crowding the
grill rejected.
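The expand-one-at-a-time rule and auto-expand-follows-active can be sketched as a pure reducer that OrchestratorPane might feed to React's useReducer. The state shape and action names are hypothetical; storing a single expanded step id, instead of a set, makes "at most one expanded" true by construction. When steps run concurrently, this sketch lets the most recently started step win, which is an assumption.

```typescript
// Sketch only: RosterState, RosterAction, and rosterReducer are hypothetical.
interface RosterState {
  expanded: string | null; // a single id, so at most one detail well is ever open
}

type RosterAction =
  | { type: "toggle"; stepId: string } // user taps a roster row
  | { type: "stepStarted"; stepId: string }; // a step's status became running

function rosterReducer(state: RosterState, action: RosterAction): RosterState {
  if (action.type === "toggle") {
    // Tapping the open row collapses it; tapping another row moves the expansion.
    return { expanded: state.expanded === action.stepId ? null : action.stepId };
  }
  // Auto-expand follows the active step (assumption: the latest start wins).
  return { expanded: action.stepId };
}
```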
Evidence. Pane-kind precedent: api/types.ts:386 WorkspacePaneKind (with
markdown_artifact/html_artifact). Roster/status reuse: AgentComposerBar.tsx:204
(AgentStatusDot), CoderPane stream rendering (C8). Launcher categories from the
flow registry: flows/index.ts; runs history host NewPaneMenu.tsx; export via
lib/events.ts sendToChat (C10).
Rejected alternatives.
- Rendering runs inside the existing coder pane — rejected: a run is a parent-with-nested-children view, not a single agent session; conflating them crowds both.
- All-agents-expanded simultaneously — rejected (C8): the crowding the design session explicitly rejected.
Specialist owner. user-experience-designer.
Revisit criterion. Users cannot follow multiple concurrent runs from the roster (then revisit the expand model).
Driven by rounds: R1.
Referenced in plan: design.md §Orchestrator pane; tasks.md groups 7, 10.
D-8 — Workflow toolbar button + slash launch, BooChat/BooCoder parity
Decision. Add a Workflow (lucide) button on ChatInput's controls row,
between the SquareSlash chip and the Globe pill — yielding parity in BooChat
(ChatPane) and BooCoder (CoderPane) for free. Label "Flows" on desktop, icon-only
on mobile (toolbar confirmed to fit one line). Slash launches instantly with
defaults (band small, current pane's project, text-after-command = focus),
opening the pane. The button opens FlowLauncherDialog.tsx first: 5 category tabs
(Analysis/Discovery/Planning/Authoring/Review) → filtered flow list + size + focus
+ fast toggle; defaults Analysis/Small/off.
Rationale. ChatInput is the shared composer rendered by both panes, so a
single button gives both doors with parity at no extra cost. The toolbar fits one
line at ≤5 elements, so adding the button does not force scroll/wrap (a standing
mobile constraint).
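A sketch of the instant slash launch with the stated defaults: band small, the current pane's project, and the text after the command as focus. The /<flow-name> <focus> grammar, the LaunchRequest shape, parseFlowSlash, and fast defaulting to false are hypothetical; the decision fixes the defaults, not the command syntax.

```typescript
// Sketch only: the command grammar and all names below are hypothetical.
type Band = "small" | "medium" | "large";

interface LaunchRequest {
  flowName: string;
  band: Band;
  projectId: string;
  focus: string;
  fast: boolean;
}

// Hypothetical grammar: "/<flow-name> <focus text>". Returns null when the
// command is not a registered flow, so other slash commands fall through.
function parseFlowSlash(
  input: string,
  projectId: string,
  knownFlows: ReadonlySet<string>,
): LaunchRequest | null {
  const m = /^\/([\w-]+)(?:\s+([\s\S]*))?$/.exec(input.trim());
  if (!m || !knownFlows.has(m[1])) return null;
  return {
    flowName: m[1],
    band: "small", // slash default
    projectId, // the current pane's project
    focus: (m[2] ?? "").trim(), // text after the command
    fast: false, // assumption: mirrors the launcher dialog's "off" default
  };
}
```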
Evidence. ChatInput.tsx:648-732, :673 — the controls row is ≤5 elements;
adding the Workflow icon between SquareSlash and Globe keeps it one line; refutes
junior Q13's crowding worry (C9). Launcher categories from flows/index.ts (C10).
Shared-composer fact: discovery notes §apps/web (ChatInput rendered by ChatPane +
CoderPane).
Rejected alternatives.
- Separate buttons in ChatPane and CoderPane — rejected: duplicates wiring; the shared composer already gives parity from one button.
- A launcher search box instead of category tabs — rejected (YAGNI): 22 flows in 5 categories are browsable; a search box is unproven need. Deferred.
Specialist owner. user-experience-designer.
Revisit criterion. Category grouping fails users at the 22-flow catalog size (then add the search box).
Driven by rounds: R1.
Referenced in plan: design.md §Toolbar button & launcher; tasks.md groups 8, 9.
D-9 — Resumable runs via initResume on coder startup
Decision. On coder startup, an initResume re-advances every flow_runs WHERE status='running': a step whose task completed → mark the step done + advance the
run; a step whose task is lost/failed (PTY died on restart) → re-dispatch;
completed steps are kept. (design-context decision 4 commits to "resumable".)
Rationale. A restart can land mid-flight. Because execution goes through the dispatcher with persisted task state (D-3), a step's outcome is recoverable from the DB; the run-level scheduler just has to re-derive the wave and re-dispatch only the steps that did not finish. Reconcile-and-advance (architect A3) beats mark-run-failed (data's conservative option) because decision 4 already committed to resumable and the task state is durable.
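The per-step reconcile that initResume performs can be sketched as a pure function of the stored step status and its task's state. The TaskState names are assumptions, as is treating a task still marked running after a restart as lost (its PTY child died with the old process) and a NULL task_id as lost; the decision itself states only completed → mark done and advance, lost/failed → re-dispatch, completed steps kept.

```typescript
// Sketch only: state names and reconcileStep are hypothetical.
type StepStatus = "pending" | "running" | "completed" | "failed";
type TaskState = "pending" | "running" | "completed" | "failed";
type ResumeAction = "keep" | "mark-done" | "redispatch" | "wait";

// taskState is null when flow_steps.task_id is NULL (task row gone via ON DELETE SET NULL).
function reconcileStep(step: StepStatus, taskState: TaskState | null): ResumeAction {
  if (step === "completed") return "keep"; // finished work is never redone
  if (taskState === "completed") return "mark-done"; // finished while the coder was down
  if (taskState === "pending") return "wait"; // the dispatcher's normal loop will pick it up
  // Remaining cases: running (assumed orphaned by the restart), failed, or no task row.
  return "redispatch";
}
```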
Evidence. No run-level resume exists today (single tasks resume via
agent_sessions; a run spanning tasks does not) — discovery notes §Enumerated
gaps. Resume tension recorded as C15 (architect reconcile-and-advance vs data
mark-failed); resolved toward reconcile-and-advance by decision 4 + durable task
state under D-3.
Rejected alternatives.
- Mark a running run failed on restart — rejected (C15): contradicts decision 4 (resumable) and discards recoverable completed-step work.
- Re-running the whole flow from step 0 — rejected: re-does completed steps, burning the local model on work already persisted.
Specialist owner. software-architect (scheduler); data-engineer (recovery query).
Revisit criterion. A step-level idempotency hazard surfaces where re-dispatch of a "lost" step double-counts side effects (none expected under read-only plan mode).
Driven by rounds: R1.
Referenced in plan: design.md §Resume; tasks.md group 5.
D-10 — Concurrency: multiple runs, no queued status, single model per run
Decision. Multiple runs are allowed; each gets its own pane + flow_runs row,
no shared state. Step statuses are pending / running / completed / failed / skipped
— there is NO separate queued status (the dispatcher's pending covers a step
waiting on the busy model or on deps). Model is a single config value per run,
default qwen3.6-35b-a3b-mxfp4.
Rationale. Each run is independent state, so concurrency needs no coordination
beyond the dispatcher's existing per-session serialization. A queued status is
not observable: with the model busy, a task is simply pending/running and
llama-swap does not expose queue position, so a distinct queued state would be a
label the system cannot honestly populate (revising decision-11's "panes show
queued honestly").
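A sketch of keeping the step status set in one place so the TypeScript type and the named CHECK constraint cannot drift, and so queued cannot quietly return. STEP_STATUSES, statusCheckSql, and the table_column_check naming are assumptions, and the DROP CONSTRAINT IF EXISTS followed by ADD CONSTRAINT pair only approximates the repo's DROP-IF-EXISTS → guarded-ADD discipline; it is not copied from schema.sql.

```typescript
// Sketch only: a hypothetical single source of truth for step statuses.
// There is deliberately no "queued" member (see D-10).
const STEP_STATUSES = ["pending", "running", "completed", "failed", "skipped"] as const;
type StepStatus = (typeof STEP_STATUSES)[number];

function isStepStatus(s: string): s is StepStatus {
  return (STEP_STATUSES as readonly string[]).indexOf(s) !== -1;
}

// Hypothetical migration helper: drop-then-add an explicitly named CHECK constraint.
function statusCheckSql(table: string, column: string, values: readonly string[]): string {
  const name = `${table}_${column}_check`;
  const list = values.map((v) => `'${v.replace(/'/g, "''")}'`).join(", ");
  return [
    `ALTER TABLE ${table} DROP CONSTRAINT IF EXISTS ${name};`,
    `ALTER TABLE ${table} ADD CONSTRAINT ${name} CHECK (${column} IN (${list}));`,
  ].join("\n");
}
```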
Evidence. queued unobservability recorded as C16 (junior Q11, data
DATA-005): llama-swap does not report queue position; the status reduces to
pending (dep/model wait) / running. Single-model-per-run carried from decision 6/11.
Rejected alternatives.
- A distinct queued step status — rejected (C16): nothing can populate it honestly; pending already means "waiting". Deferred (reopen if llama-swap exposes queue position).
- Serializing runs (one at a time) — rejected: runs are independent; serialization adds coordination for no benefit and hurts the multi-pane UX (decision 11).
Specialist owner. data-engineer (status set), devops-engineer (model-busy
behavior under concurrent runs).
Revisit criterion. llama-swap exposes queue position → add an observable
queued status.
Driven by rounds: R1.
Referenced in plan: design.md §Schema (status sets) + §Concurrency; tasks.md
group 2.
Cross-reference index
| Decision | Driven by | Design.md section | Tasks.md group |
|---|---|---|---|
| D-1 Re-home + DispatchFn | R1 (C1, C11, C14) | Re-home & DispatchFn seam | 1 |
| D-2 Flow-runner + onTaskTerminal | R1 (C2, C3) | Flow-runner & onTaskTerminal | 4 |
| D-3 Dispatcher reuse (not bypass) | R1 (C12), R2 | Execution via dispatcher reuse | 4 |
| D-4 Read-only via plan mode | R1 (C13), R2 | Read-only via plan mode | 4 |
| D-5 Schema flow_runs/flow_steps | R1 (C3–C6) | Schema | 2 |
| D-6 WS frames | R1 (C7) | WS frames | 3 |
| D-7 Orchestrator pane | R1 (C8) | Orchestrator pane | 7, 10 |
| D-8 Toolbar button + slash | R1 (C9, C10) | Toolbar button & launcher | 8, 9 |
| D-9 Resume | R1 (C15) | Resume | 5 |
| D-10 Concurrency / no-queued | R1 (C16) | Schema + Concurrency | 2 |
Deferred (YAGNI)
These were considered and deferred under the evidence rule. Each names the trigger that would justify reopening.
@boocode/conductor workspace package
- Why deferred: only two consumers (Phase-1 CLI + coder); copy-in (D-1) avoids premature shared-package abstraction.
- Reopen when: a third app needs the conductor types.
- Source: architect (D-1 rejected alternative).
flow_steps.depends_on column
- Why deferred: deps are derivable from the loaded flow def (flow.ts:28-41, types.ts:27); a column duplicates the source of truth.
- Reopen when: a stored-run DAG visualization must show deps without loading the flow def.
- Source: data-engineer C6 (D-5 rejected alternative).
Persisted skipped-step rows
- Why deferred: when() is pure on stored input, so a skip is reconstructable from the flow def + run input.
- Reopen when: the UI must explain a skip without the flow def.
- Source: data-engineer C6 (D-5 rejected alternative).
read_only flag on tasks
- Why deferred: superseded by mode_id='plan' (D-4), an existing column already plumbed to qwen's --approval-mode.
- Reopen when: a non-qwen agent without a --approval-mode plan equivalent is added to flows.
- Source: D-4 rejected alternative.
Explicit queued step status
- Why deferred: llama-swap does not expose queue position; nothing can populate the status honestly (C16). pending covers waiting.
- Reopen when: llama-swap exposes queue position.
- Source: junior Q11 / data DATA-005 (D-10 rejected alternative).
Launcher search box
- Why deferred: 22 flows in 5 category tabs are browsable; a search box is unproven need.
- Reopen when: category grouping fails users at the catalog size.
- Source: UX C10 (D-8 rejected alternative).
Separate report-stored WS frame
- Why deferred: the report rides on flow_run_step_updated (D-6).
- Reopen when: reports exceed the ~50KB frame size limit.
- Source: architect C7 (D-6 rejected alternative).