# Implementation Decision Log — Orchestrator (Phase 2)

Han synthesis output. Each decision is committed: it cites evidence, records rejected alternatives, names an owner, and sets a revisit criterion.

Cross-reference invariant: every `D-N` here is referenced by [design.md](../design.md) and/or [tasks.md](../tasks.md), and produced by a round recorded in [implementation-iteration-history.md](implementation-iteration-history.md).

Source: a conversational `grill-me` design session. The settled behavioral spec is captured in [design-context.md](design-context.md) (12 decisions; decision 5 is REVISED by D-3 / D-4 below). Specialist findings are in the claim ledger C1–C16 of the iteration history.

Trust class of evidence below: **codebase** (file:line in this repo) unless noted. No single-source web claims underpin any committed decision.

---

## D-1 — Re-home the pure conductor definitions into `apps/coder/src/conductor/`

**Decision.** Copy the pure (dispatch-free) conductor definition files — `spine.ts`, `flows/*`, `contracts.ts`, `types.ts`, `render.ts` — into `apps/coder/src/conductor/`, plus the 23 Han personas (`conductor/agents/*.md`). The Phase-1 standalone CLI (`conductor/`) stays alive and unchanged. Sever the `flows/code-review.ts` → `dispatch.ts` coupling by adding a `DispatchFn` to `StepContext`, injected by the flow-runner. Parameterize `spine.ts`'s model: replace `process.env.CONDUCTOR_MODEL` with the run's configured model.

**Rationale.** The flow definitions are pure data + closures; only `dispatch.ts` (the `opencode run` subprocess path) and `flow.ts` (the in-memory scheduler) are Phase-1-specific. Copying the pure files avoids a workspace-package extraction (YAGNI — only two consumers) while keeping the Phase-1 CLI as a regression oracle. The evidence/yagni contracts are preserved because the flow-runner calls `step.run(ctx)` in-process to build each prompt BEFORE inserting the task — the closures execute in the coder process; prompts are never serialized to DB.
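
A minimal sketch of the `DispatchFn` seam from the Decision above, under stated assumptions: the fields on `StepContext`, the `DispatchFn` signature, the agent name `code-reviewer`, and the helpers `reviewDimensions`, `renderHeader`, and `recordingDispatch` are hypothetical and not taken from the repo. It shows a flow step that reaches an agent only through the injected `ctx.dispatch`, and a report header that reads the model from the run context instead of `process.env.CONDUCTOR_MODEL`.

```typescript
// Hypothetical shapes for illustration; the real definitions live in the
// re-homed types.ts and may differ.
type DispatchFn = (agent: string, prompt: string) => Promise<string>;

interface StepContext {
  question: string;
  model: string; // the run's configured model, not process.env.CONDUCTOR_MODEL
  dispatch: DispatchFn; // injected by the flow-runner
}

// A flow step that fans out one dispatch per review dimension. It never
// imports dispatch.ts; the only route to an agent is ctx.dispatch.
function reviewDimensions(ctx: StepContext, dimensions: string[]): Promise<string[]> {
  return Promise.all(
    dimensions.map((dimension) =>
      ctx.dispatch("code-reviewer", `Review ${dimension}: ${ctx.question}`),
    ),
  );
}

// The report header takes the model from the run context.
function renderHeader(ctx: StepContext): string {
  return `Model: ${ctx.model}`;
}

// Test double: records each call instead of starting an agent.
function recordingDispatch(calls: string[]): DispatchFn {
  return async (agent, prompt) => {
    calls.push(`${agent}: ${prompt}`);
    return "ok";
  };
}
```

Because the step sees only `ctx.dispatch`, a test can pass a recording stub such as `recordingDispatch` and inspect the prompts without running any agent.
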

**Evidence.** `code-review.ts:10` (`import { dispatchAgent } from '../dispatch.js'`) and `:62` (the per-dimension dispatch call) — the only flow→dispatch coupling (C1). `spine.ts:122` renders `process.env.CONDUCTOR_MODEL` into the report header (C14). `spine.ts:73` — contracts injected via the step closure, in-process (C11). 23 personas confirmed at `conductor/agents/*.md`.

**Rejected alternatives.**

- A `@boocode/conductor` workspace package — rejected: only two consumers (Phase-1 CLI + coder); a shared package is premature abstraction (YAGNI). Deferred with a reopen trigger (a 3rd app needing conductor types). See Deferred (YAGNI).
- Importing `conductor/src/*` directly from `apps/coder` across the workspace boundary — rejected: couples the coder build to the standalone CLI tree and its `opencode`-flavored dispatch import graph.

**Specialist owner.** software-architect.

**Revisit criterion.** A third app needs the conductor types (then extract the workspace package).

**Driven by rounds:** R1.

**Referenced in plan:** design.md §Re-home & DispatchFn seam; tasks.md group 1.

---

## D-2 — DB-driven flow-runner with an `onTaskTerminal` dispatcher hook

**Decision.** Add `apps/coder/src/services/flow-runner.ts`: a DB-backed scheduler that owns `flow_runs`/`flow_steps`, computes the ready wave from the loaded flow def, INSERTs each ready `agent` step as a `tasks` row, runs `code` steps inline, and advances. Fan-out is driven by ONE new hook — an `onTaskTerminal(taskId, state)` callback on `createDispatcher` — invoked when any task reaches a terminal state. No third poll loop; no modification to the dispatcher's internal run functions.

**Rationale.** The dispatcher already has the LISTEN/NOTIFY + poll machinery and the terminal-state transitions; a single callback at those transition points lets the flow-runner react without duplicating the dispatch loop. The flow-runner stays a pure scheduler; execution stays in the dispatcher.
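
The scheduling loop described above can be sketched in memory. This is an illustration, not the DB-backed implementation: `StepDef`, `readyWave`, `makeOnTaskTerminal`, and the `taskToStep` map are hypothetical names, the readiness rule (a step is ready once all its deps are `completed`) is an assumption, and `when()` skips and inline `code` steps are left out.

```typescript
type StepStatus = "pending" | "running" | "completed" | "failed" | "skipped";
type TerminalState = "completed" | "failed";

// Hypothetical flow-def step: deps come from the loaded flow def, not the DB.
interface StepDef {
  id: string;
  deps: string[];
}

// Ready wave: steps not yet started whose deps have all completed.
// Assumption: a step with no recorded status has not been started.
function readyWave(steps: StepDef[], status: Map<string, StepStatus>): string[] {
  return steps
    .filter((step) => !status.has(step.id))
    .filter((step) => step.deps.every((dep) => status.get(dep) === "completed"))
    .map((step) => step.id);
}

// Builds the single callback handed to the dispatcher. startStep stands in
// for "INSERT a pending tasks row" and returns the new task id.
function makeOnTaskTerminal(
  steps: StepDef[],
  status: Map<string, StepStatus>,
  taskToStep: Map<string, string>,
  startStep: (stepId: string) => string,
): (taskId: string, state: TerminalState) => void {
  return (taskId, state) => {
    const stepId = taskToStep.get(taskId);
    if (stepId === undefined) return; // not a flow task: nothing to advance
    status.set(stepId, state);
    if (state === "failed") return; // this sketch does not advance past a failure
    for (const next of readyWave(steps, status)) {
      status.set(next, "running");
      taskToStep.set(startStep(next), next);
    }
  };
}
```

The callback ignores task ids it does not know, which is one way to keep ordinary (non-flow) tasks unaffected by the hook.
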

**Evidence.** `dispatcher.ts:46-179` (the loop + `runTask`), `:279-286` (the `notify_tasks_new` trigger) (C2). Terminal transitions the hook attaches to: external completed `dispatcher.ts:642-646`, external failed `:659-661`. Full step output must persist in `flow_steps.output TEXT` because `tasks.output_summary` is ≤500 char and cannot reconstruct `ctx.results` for render/resume (`schema.sql:26`, `flow.ts:49,59`) (C3).

**Rejected alternatives.**

- A standalone third poll loop in the flow-runner — rejected: duplicates the dispatcher's LISTEN/poll, two writers racing on `tasks`.
- Modifying the dispatcher's `runTask` internals to know about flows — rejected: couples the generic dispatcher to the orchestrator; the callback seam keeps the dispatcher flow-agnostic.

**Specialist owner.** software-architect.

**Revisit criterion.** Step throughput requires batching beyond what one callback per terminal task supports.

**Driven by rounds:** R1.

**Referenced in plan:** design.md §Flow-runner & onTaskTerminal; tasks.md group 4.

---

## D-3 — Reuse the existing dispatcher (insert pending task), not a direct-PTY bypass

**Decision.** The flow-runner INSERTs each ready step as a normal `state='pending'` `tasks` row; the existing dispatcher picks it up via `LISTEN 'tasks_new'`, runs it through the existing external-agent path (creating a git worktree as a stable HEAD read-checkout), and streams AgentEvents → WS frames unchanged. The new `onTaskTerminal` hook (D-2) notifies the flow-runner on terminal state. No direct-PTY bypass; the dispatcher is reused with exactly one new hook. This **REVISES design-context decision 5** ("no worktree") to: a worktree IS created, but it is a harmless read snapshot — read-only is enforced by plan mode (D-4), not by the absence of a worktree.

**Rationale.** Reuse (architect's A2) gets streaming, persistence, resume, cancellation, and AgentEvent→WS mapping for free.
The only objection to A2 was that it creates a worktree the "no worktree" decision-5 wanted to avoid; once read-only is enforced at the tool level by plan mode (D-4), the worktree is inert (a checkout the agent cannot write to), so the objection dissolves. This was the user's explicit choice over the architect's leaning-toward-bypass (A4).

**Evidence.** The external-agent path with worktree creation and AgentEvent→WS streaming: `dispatcher.ts` external branch (worktree create → run → terminal at `:642-646`/`:659-661`). Task-as-dispatch precedents the flow-runner copies: `routes/skills.ts:94` (a skill is already dispatched as a task), `routes/arena.ts:49`, `tools/new_task.ts:54`. Dispatch tension recorded as C12 (A2 vs A4, architect self-flagged Disputed); resolved here by user choice.

**Rejected alternatives.**

- A4 direct-`dispatchViaPty` bypass (insert `running` task + call PTY directly to skip worktree creation) — rejected: duplicates streaming/persistence/resume wiring, and a restart kills the PTY child outside the dispatcher's lifecycle (worsening resume, C15). The worktree it was avoiding is harmless under D-4.
- design-context decision 5's "no worktree, read project dir directly" — rejected: reusing the dispatcher means reusing its worktree creation; under D-4 the worktree is a read snapshot, so avoiding it bought nothing and cost the reuse.

**Specialist owner.** software-architect (execution path); devops-engineer (operational behavior of the reused dispatcher under flow load).

**Revisit criterion.** Worktree creation per step becomes a measured throughput or disk-cost problem under real flow concurrency.

**Driven by rounds:** R1 (C12), R2 (read-only finding that made the worktree inert).

**Referenced in plan:** design.md §Execution via dispatcher reuse; tasks.md group 4.

---

## D-4 — Read-only enforced HARD by `mode_id='plan'` (qwen `--approval-mode plan`)

**Decision.** Every orchestrator step task is dispatched with `mode_id = 'plan'`, which the PTY dispatcher passes to qwen as `--approval-mode plan` — a built-in tool-level gate: reads allowed, writes blocked. The flow-runner hardcodes `mode_id='plan'` for every step task; it is never user-overridable. This is the sole read-only enforcement mechanism. `BOOCODE_TOOLS` and persona prompts are NOT relied upon (they do not govern external CLI agents).

**Rationale.** Read-only is a safety-critical invariant of the whole feature (flows never write the repo). Prompt-level intent and `BOOCODE_TOOLS` ceilings govern BooChat's in-process tools, not an external `qwen` CLI child — so they are not watertight. qwen's `--approval-mode plan` is a tool-level gate inside the agent binary itself, which the adversarial-security-analyst (R2) identified as the only enforcement that actually binds the external agent. Qwen-only (decision 6) makes a single hardcoded flag sufficient.

**Evidence.** The wiring already exists: `pty-dispatch.ts:75` — `if (modeId) args.push('--approval-mode', modeId)` in the `qwen` spawn spec. R2 security finding recorded as C13 (the R1 claim that prompt-level + `BOOCODE_TOOLS` enforcement was sufficient was Anecdotal/unproven; R2 refuted it and named plan mode as the binding control).

**Rejected alternatives.**

- Prompt-level read-only intent (personas tell the agent not to write) — rejected (C13, R2): an instruction, not a gate; a model can ignore or be steered past it.
- `BOOCODE_TOOLS=core` as the gate — rejected (C13, R2): governs BooChat's in-process tool registry, does not constrain the external `qwen` CLI's own tools.
- A `read_only` boolean flag on `tasks` — rejected: superseded by `mode_id='plan'`, which is an existing column already plumbed to the binary. See Deferred (YAGNI).

**Specialist owner.** adversarial-security-analyst.
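
A minimal sketch of how D-3 and D-4 combine, under stated assumptions. The `TaskInsert` shape, the `prompt` field, and the helper names are hypothetical; only `state='pending'`, `mode_id='plan'`, `project_id`, and the `--approval-mode` push quoted from `pty-dispatch.ts:75` come from this log. It shows the flow-runner pinning the mode when it builds the row, so no caller-supplied value can reach the agent's argument list.

```typescript
// Hypothetical row shape; the real tasks table has more columns.
interface TaskInsert {
  project_id: string;
  prompt: string; // assumed column name for the step prompt
  state: "pending";
  mode_id: "plan"; // literal type: a flow step cannot carry any other mode
}

// Builds the row the flow-runner inserts for a ready agent step. There is
// deliberately no mode parameter, so a caller cannot override it.
function buildStepTask(projectId: string, prompt: string): TaskInsert {
  return { project_id: projectId, prompt, state: "pending", mode_id: "plan" };
}

// Mirrors the quoted wiring: if (modeId) args.push('--approval-mode', modeId).
// Other spawn arguments are omitted because this log does not list them.
function approvalArgs(modeId: string | null): string[] {
  const args: string[] = [];
  if (modeId) args.push("--approval-mode", modeId);
  return args;
}
```

Typing `mode_id` as the literal `"plan"` moves part of the "never user-overridable" rule into the compiler: code that tries to build a step task with another mode does not type-check.
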

**Revisit criterion.** A non-qwen agent is added to flows (re-verify that agent's equivalent of `--approval-mode plan` before allowing it), or qwen changes `--approval-mode plan` semantics.

**Driven by rounds:** R1 (C13 flagged), R2 (resolved).

**Referenced in plan:** design.md §Read-only via plan mode; tasks.md group 4.

---

## D-5 — `flow_runs` + `flow_steps` schema in the coder schema

**Decision.** Add two tables to `apps/coder/src/schema.sql`:

- `flow_runs(id, project_id [no FK, matches tasks.project_id], flow_name, band [CHECK small|medium|large], model, status [CHECK-named], input JSONB [CHECK (input ? 'question')], report TEXT [nullable], error, timestamps)`.
- `flow_steps(id, run_id [FK → flow_runs ON DELETE CASCADE], step_id, kind [CHECK agent|code], agent, status [CHECK-named], task_id [UUID → tasks(id) ON DELETE SET NULL; nullable, code steps NULL], chat_id [UUID → chats(id) ON DELETE SET NULL], input TEXT, output TEXT [FULL output], error, timestamps, UNIQUE(run_id, step_id))`.

No `depends_on` column (derive from the loaded flow def). Do NOT insert skipped-step rows (`when()` is pure on stored input). Indexes: `flow_steps(run_id, status)`, `flow_runs(project_id, created_at DESC)`. Explicit CHECK constraint names + the repo's DROP-IF-EXISTS → guarded-ADD migration discipline.

**Rationale.** A run spans multiple tasks; existing tables (`tasks`, `agent_sessions`) model single dispatches, not a DAG. `flow_steps.task_id → tasks(id)` (not a column on `tasks`) keeps `tasks` generic. `output TEXT` is FULL because `tasks.output_summary` is ≤500 char and cannot reconstruct `ctx.results`. `project_id` has no FK to match `tasks.project_id`'s existing convention.

**Evidence.** `tasks` shape and `output_summary` ≤500 char: `schema.sql:18-34`, `:26` (C3, C4). `flow.ts:49,59` (results reconstruction needs full output, C3). `flow.ts:28-41`, `types.ts:27` (deps + `when()` derivable from flow def — omit `depends_on` and skipped rows, C6).
`schema.sql:19,32` (project_id no-FK pattern; CHECK-named discipline, C5). Migration discipline: root CLAUDE.md schema section.

**Rejected alternatives.**

- A `depends_on` column on `flow_steps` — rejected (C6, YAGNI): deps are in the loaded flow def; storing them duplicates the source of truth. Deferred.
- Persisting skipped-step rows — rejected (C6, YAGNI): `when()` is pure on stored `input`, so a skip is reconstructable. Deferred.
- A column on `tasks` (e.g. `flow_step_id`) — rejected (C4): pollutes the generic tasks table; the FK belongs on `flow_steps`.

**Specialist owner.** data-engineer.

**Revisit criterion.** A stored-run DAG visualization needs deps without loading the flow def (then add `depends_on`); the UI must explain a skip without the flow def (then persist skipped rows).

**Driven by rounds:** R1.

**Referenced in plan:** design.md §Schema; tasks.md group 2.

---

## D-6 — Two new WS frames; per-agent stream reuses existing frames by `chat_id`

**Decision.** Add two frames to `packages/contracts/src/ws-frames.ts`:

- `flow_run_started`: `run_id, flow_name, band, steps[]` (each `step_id, agent, kind, chat_id, label`).
- `flow_run_step_updated`: `run_id, step_id, status, run_status?, report?`.

The per-agent content stream REUSES the existing `delta` / `tool_call` / `message_complete` frames keyed by the step's `chat_id`. Each agent step gets a synthetic `chats` row for stream attribution. Register in all THREE frame registries: contracts `WsFrameSchema`, the server `InferenceFrame` union (`services/inference/turn.ts`), and the web strict `WsFrame` union (`apps/web/src/api/types.ts`) — the web type is the wire-format gate.

**Rationale.** The run-level lifecycle (which agents exist, their status, the final report) needs new frames; the per-agent token stream is exactly what the existing delta/tool_call/message_complete pipeline already carries, so keying it by a synthetic `chat_id` reuses the whole broker→WS path with no new streaming code.
The report rides on `flow_run_step_updated` rather than its own frame (one fewer frame type; revisit only if reports exceed the frame size limit).

**Evidence.** Existing broker→WS frame pipeline and frame list: `ws-frames.ts` (snapshot…error). Three-registry rule + web-type-is-wire-gate: root CLAUDE.md "Adding a new WS frame type" + discovery notes §packages/contracts. Stream-by-chat reuse precedent: the dispatcher publishes delta/tool_call/message_complete keyed by chat already (C7).

**Rejected alternatives.**

- New per-agent stream frames (`flow_agent_delta`, etc.) — rejected: the existing delta/tool_call/message_complete already stream by chat; new frames duplicate them.
- A separate `flow_run_report` frame — rejected (YAGNI): the report fits on `flow_run_step_updated`. Deferred with a reopen trigger (reports exceed ~50KB).

**Specialist owner.** software-architect.

**Revisit criterion.** Reports exceed the frame size limit (~50KB) → split the report onto its own frame.

**Driven by rounds:** R1.

**Referenced in plan:** design.md §WS frames; tasks.md group 3.

---

## D-7 — `orchestrator` pane kind + OrchestratorPane

**Decision.** Add an `orchestrator` pane kind (following the `markdown_artifact`/`html_artifact` precedent) — touching `WorkspacePaneKind`, `useWorkspacePanes`, `Workspace`, `NewPaneMenu`, `ChatTabBar`, `PaneHeaderActions`. `OrchestratorPane.tsx`: run header; report-at-top on completion; collapsed agent roster reusing `AgentStatusDot`; expand-one-at-a-time detail well reusing CoderPane stream rendering; mobile single-column inline expand; auto-expand-follows-active. Runs history in `NewPaneMenu`. Export (copy / save-file / send-to-chat via the existing `sendToChat`) in the pane header `…`, conditional on a completed report.

**Rationale.** A fourth pane kind is already a precedented extension point; the pane reuses `AgentStatusDot` and the CoderPane stream renderer, so the new surface is composition, not new streaming UI.
Expand-one-at-a-time avoids the crowding the `grill-me` session rejected.

**Evidence.** Pane-kind precedent: `api/types.ts:386` `WorkspacePaneKind` (with `markdown_artifact`/`html_artifact`). Roster/status reuse: `AgentComposerBar.tsx:204` (`AgentStatusDot`), CoderPane stream rendering (C8). Launcher categories from the flow registry: `flows/index.ts`; runs history host `NewPaneMenu.tsx`; export via `lib/events.ts` `sendToChat` (C10).

**Rejected alternatives.**

- Rendering runs inside the existing `coder` pane — rejected: a run is a parent-with-nested-children view, not a single agent session; conflating them crowds both.
- All-agents-expanded simultaneously — rejected (C8): the crowding the design session explicitly rejected.

**Specialist owner.** user-experience-designer.

**Revisit criterion.** Users cannot follow multiple concurrent runs from the roster (then revisit the expand model).

**Driven by rounds:** R1.

**Referenced in plan:** design.md §Orchestrator pane; tasks.md groups 7, 10.

---

## D-8 — Workflow toolbar button + slash launch, BooChat/BooCoder parity

**Decision.** Add a `Workflow` (lucide) button on `ChatInput`'s controls row, between the `SquareSlash` chip and the `Globe` pill — yielding parity in BooChat (ChatPane) and BooCoder (CoderPane) for free. Label "Flows" on desktop, icon-only on mobile (toolbar confirmed to fit one line). Slash launches instantly with defaults (band small, current pane's project, text-after-command = focus), opening the pane. The button opens `FlowLauncherDialog.tsx` first: 5 category tabs (Analysis/Discovery/Planning/Authoring/Review) → filtered flow list + size + focus + fast toggle; defaults Analysis/Small/off.

**Rationale.** `ChatInput` is the shared composer rendered by both panes, so a single button gives both doors with parity at no extra cost. The toolbar fits one line at ≤5 elements, so adding the button does not force scroll/wrap (a standing mobile constraint).
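
The slash-launch defaults from the Decision above can be sketched as a small pure function. The command syntax (`/<flow-name> <focus text>`), the `FlowLaunch` shape, and `parseSlashLaunch` are assumptions for illustration; this log specifies only the defaults (band small, the current pane's project, text after the command as the focus).

```typescript
type Band = "small" | "medium" | "large";

// Hypothetical launch request produced by a slash command.
interface FlowLaunch {
  flowName: string;
  band: Band;
  projectId: string;
  focus: string;
}

// Parses "/<flow-name> <focus text>" (assumed syntax). Returns null for input
// that is not a slash command so the composer can treat it as ordinary text.
function parseSlashLaunch(input: string, currentProjectId: string): FlowLaunch | null {
  const match = /^\/([\w-]+)(?:\s+([\s\S]*))?$/.exec(input.trim());
  if (match === null) return null;
  return {
    flowName: match[1],
    band: "small", // slash launches start at the default band
    projectId: currentProjectId, // the current pane's project
    focus: (match[2] ?? "").trim(), // text after the command
  };
}
```

Keeping the parser free of UI state means the same function could serve both ChatPane and CoderPane, in line with the shared-composer argument.
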

**Evidence.** `ChatInput.tsx:648-732`, `:673` — the controls row is ≤5 elements; adding the `Workflow` icon between SquareSlash and Globe keeps it one line; refutes junior Q13's crowding worry (C9). Launcher categories from `flows/index.ts` (C10). Shared-composer fact: discovery notes §apps/web (ChatInput rendered by ChatPane + CoderPane).

**Rejected alternatives.**

- Separate buttons in ChatPane and CoderPane — rejected: duplicates wiring; the shared composer already gives parity from one button.
- A launcher search box instead of category tabs — rejected (YAGNI): 22 flows in 5 categories are browsable; a search box is unproven need. Deferred.

**Specialist owner.** user-experience-designer.

**Revisit criterion.** Category grouping fails users at the 22-flow catalog size (then add the search box).

**Driven by rounds:** R1.

**Referenced in plan:** design.md §Toolbar button & launcher; tasks.md groups 8, 9.

---

## D-9 — Resumable runs via `initResume` on coder startup

**Decision.** On coder startup, an `initResume` re-advances every `flow_runs WHERE status='running'`: a step whose task completed → mark the step done + advance the run; a step whose task is lost/failed (PTY died on restart) → re-dispatch; completed steps are kept. (design-context decision 4 commits to "resumable".)

**Rationale.** A restart can land mid-flight. Because execution goes through the dispatcher with persisted task state (D-3), a step's outcome is recoverable from the DB; the run-level scheduler just has to re-derive the wave and re-dispatch only the steps that did not finish. Reconcile-and-advance (architect A3) beats mark-run-failed (data's conservative option) because decision 4 already committed to resumable and the task state is durable.

**Evidence.** No run-level resume exists today (single tasks resume via `agent_sessions`; a run spanning tasks does not) — discovery notes §Enumerated gaps.
Resume tension recorded as C15 (architect reconcile-and-advance vs data mark-failed); resolved toward reconcile-and-advance by decision 4 + durable task state under D-3.

**Rejected alternatives.**

- Mark a running run failed on restart — rejected (C15): contradicts decision 4 (resumable) and discards recoverable completed-step work.
- Re-running the whole flow from step 0 — rejected: re-does completed steps, burning the local model on work already persisted.

**Specialist owner.** software-architect (scheduler); data-engineer (recovery query).

**Revisit criterion.** A step-level idempotency hazard surfaces where re-dispatch of a "lost" step double-counts side effects (none expected under read-only plan mode).

**Driven by rounds:** R1.

**Referenced in plan:** design.md §Resume; tasks.md group 5.

---

## D-10 — Concurrency: multiple runs, no `queued` status, single model per run

**Decision.** Multiple runs are allowed; each gets its own pane + `flow_runs` row, no shared state. Step statuses are pending / running / completed / failed / skipped — there is NO separate `queued` status (the dispatcher's `pending` covers a step waiting on the busy model or on deps). Model is a single config value per run, default `qwen3.6-35b-a3b-mxfp4`.

**Rationale.** Each run is independent state, so concurrency needs no coordination beyond the dispatcher's existing per-session serialization. A `queued` status is not observable: with the model busy, a task is simply `pending`/`running` and llama-swap does not expose queue position, so a distinct `queued` state would be a label the system cannot honestly populate (revising decision-11's "panes show queued honestly").

**Evidence.** `queued` unobservability recorded as C16 (junior Q11, data DATA-005): llama-swap does not report queue position; the status reduces to pending(dep/model-wait)/running. Single-model-per-run carried from decision 6/11.
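
The status set above can be pinned down in types. A minimal sketch, assuming a hypothetical `isStepStatus` guard applied to values read back from storage; only the five status names come from this log.

```typescript
// The five step statuses named in D-10. "queued" is intentionally absent.
const STEP_STATUSES = ["pending", "running", "completed", "failed", "skipped"] as const;
type StepStatus = (typeof STEP_STATUSES)[number];

// Narrowing guard for values read back from storage (hypothetical helper).
function isStepStatus(value: string): value is StepStatus {
  return (STEP_STATUSES as readonly string[]).indexOf(value) !== -1;
}
```

With the union derived from one array, adding an observable `queued` status later would be a one-line change that the compiler then propagates to every exhaustive check.
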

**Rejected alternatives.**

- A distinct `queued` step status — rejected (C16): nothing can populate it honestly; `pending` already means "waiting". Deferred (reopen if llama-swap exposes queue position).
- Serializing runs (one at a time) — rejected: runs are independent; serialization adds coordination for no benefit and hurts the multi-pane UX (decision 11).

**Specialist owner.** data-engineer (status set), devops-engineer (model-busy behavior under concurrent runs).

**Revisit criterion.** llama-swap exposes queue position → add an observable `queued` status.

**Driven by rounds:** R1.

**Referenced in plan:** design.md §Schema (status sets) + §Concurrency; tasks.md group 2.

---

## Cross-reference index

| Decision | Driven by | Design.md section | Tasks.md group |
|---|---|---|---|
| D-1 Re-home + DispatchFn | R1 (C1, C11, C14) | Re-home & DispatchFn seam | 1 |
| D-2 Flow-runner + onTaskTerminal | R1 (C2, C3) | Flow-runner & onTaskTerminal | 4 |
| D-3 Dispatcher reuse (not bypass) | R1 (C12), R2 | Execution via dispatcher reuse | 4 |
| D-4 Read-only via plan mode | R1 (C13), R2 | Read-only via plan mode | 4 |
| D-5 Schema flow_runs/flow_steps | R1 (C3–C6) | Schema | 2 |
| D-6 WS frames | R1 (C7) | WS frames | 3 |
| D-7 Orchestrator pane | R1 (C8) | Orchestrator pane | 7, 10 |
| D-8 Toolbar button + slash | R1 (C9, C10) | Toolbar button & launcher | 8, 9 |
| D-9 Resume | R1 (C15) | Resume | 5 |
| D-10 Concurrency / no-queued | R1 (C16) | Schema + Concurrency | 2 |

## Deferred (YAGNI)

These were considered and deferred under the evidence rule. Each names the trigger that would justify reopening.

### `@boocode/conductor` workspace package

- **Why deferred:** only two consumers (Phase-1 CLI + coder); copy-in (D-1) avoids premature shared-package abstraction.
- **Reopen when:** a third app needs the conductor types.
- **Source:** architect (D-1 rejected alternative).

### `flow_steps.depends_on` column

- **Why deferred:** deps are derivable from the loaded flow def (`flow.ts:28-41`, `types.ts:27`); a column duplicates the source of truth.
- **Reopen when:** a stored-run DAG visualization must show deps without loading the flow def.
- **Source:** data-engineer C6 (D-5 rejected alternative).

### Persisted skipped-step rows

- **Why deferred:** `when()` is pure on stored `input`, so a skip is reconstructable from the flow def + run input.
- **Reopen when:** the UI must explain a skip without the flow def.
- **Source:** data-engineer C6 (D-5 rejected alternative).

### `read_only` flag on `tasks`

- **Why deferred:** superseded by `mode_id='plan'` (D-4), an existing column already plumbed to qwen's `--approval-mode`.
- **Reopen when:** a non-qwen agent without a `--approval-mode plan` equivalent is added to flows.
- **Source:** D-4 rejected alternative.

### Explicit `queued` step status

- **Why deferred:** llama-swap does not expose queue position; nothing can populate the status honestly (C16). `pending` covers waiting.
- **Reopen when:** llama-swap exposes queue position.
- **Source:** junior Q11 / data DATA-005 (D-10 rejected alternative).

### Launcher search box

- **Why deferred:** 22 flows in 5 category tabs are browsable; a search box is unproven need.
- **Reopen when:** category grouping fails users at the catalog size.
- **Source:** UX C10 (D-8 rejected alternative).

### Separate report-stored WS frame

- **Why deferred:** the report rides on `flow_run_step_updated` (D-6).
- **Reopen when:** reports exceed the ~50KB frame size limit.
- **Source:** architect C7 (D-6 rejected alternative).