
indifferentketchup 1937af8df9 feat: in-app Orchestrator (Phase 2) — multi-agent conductor

Brings the deterministic Han-flow conductor into BooCode: launch any read-only
flow from BooChat or BooCoder, watch each agent stream live in a Paseo-style
run pane, get an evidence-disciplined report — on local Qwen, persisted and
resumable. Read-only enforced hard via qwen --approval-mode plan (orchestrator
tasks fail closed if qwen is unavailable; never fall to write-capable native).

Backend (apps/coder): re-homed conductor defs, flow_runs/flow_steps schema,
flow-runner + dispatcher onTaskTerminal hook, restart-resume, runs routes
(launch/list/get/cancel), user-channel WS. Contracts: two flow_run_* frames.
Web: orchestrator pane kind + OrchestratorPane, Workflow button + slash flows
(BooChat/BooCoder parity), FlowLauncherDialog, "New Orchestrator" in the + and
split menus, runs history + export. Plan: openspec/changes/orchestrator.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-03 15:22:48 +00:00


Orchestrator (Phase 2) — design (the HOW)

Planning altitude: names files, columns, frames, and decision-bearing values (the plan-mode flag, status sets, frame field names). Every non-obvious choice cites a committed decision in artifacts/implementation-decision-log.md. The behavioral spec is artifacts/design-context.md (decision 5 REVISED here); integration surfaces are in artifacts/.discovery-notes.md.

Architecture at a glance

ChatInput (shared composer)                          apps/web
  ├─ Workflow button → FlowLauncherDialog ─┐
  └─ /flow slash (instant defaults) ───────┤
                                           ▼
                          POST /api/runs  ── apps/coder/routes
                                           ▼
                       flow-runner.ts (DB-driven scheduler)
                         · loads flow def from src/conductor/
                         · step.run(ctx) IN-PROCESS → prompt (contracts injected)
                         · INSERT flow_runs / flow_steps
                         · INSERT each ready agent step as a tasks row
                              (mode_id='plan', synthetic chat_id)
                                           ▼
                          dispatcher.ts (REUSED, unchanged internals)
                         · LISTEN 'tasks_new' → external-agent path
                         · qwen --approval-mode plan  (read-only gate)
                         · worktree = read snapshot; AgentEvents → WS frames
                         · onTaskTerminal(taskId,state)  ← ONE new hook
                                           ▼
                       flow-runner advances: read full output → run code steps
                         inline → INSERT next ready wave  (or finish + report)
                                           ▼
   flow_run_started / flow_run_step_updated  +  reused delta/tool_call/
   message_complete (keyed by step chat_id)  → broker → WS
                                           ▼
                 OrchestratorPane.tsx (run header, report-at-top,
                 collapsed roster, expand-one-at-a-time stream)

Re-home & DispatchFn seam (D-1)

Copy the pure (dispatch-free) conductor files into apps/coder/src/conductor/: spine.ts, flows/*, contracts.ts, types.ts, render.ts. Copy the 23 personas (conductor/agents/*.md). Do NOT copy flow.ts (in-memory scheduler, replaced by the flow-runner) or dispatch.ts (opencode run subprocess, replaced by dispatcher reuse). The Phase-1 CLI under conductor/ stays alive unchanged as a regression oracle.

Two seam edits on the copies:

Sever flow→dispatch coupling. flows/code-review.ts:10 imports dispatchAgent from ../dispatch.js and calls it at :62. Replace that import with a DispatchFn field on StepContext, injected by the flow-runner. Every flow then reaches dispatch through the context, not a module import.
Parameterize the model. spine.ts:122 reads process.env.CONDUCTOR_MODEL into the report header. Make it read the run's configured model (passed through the spine factory / step context) so the header matches the run, not a process env.
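
The two seam edits can be sketched as follows. This is a minimal illustration, not the committed code: the field names on StepContext, the DispatchFn signature, the "code-reviewer" agent name, and the stub dispatcher are all assumptions made for the example; the real shapes live in types.ts.

```typescript
// Hypothetical shapes: the real StepContext and DispatchFn are defined in
// types.ts and may carry more fields than shown here.
type DispatchFn = (agent: string, prompt: string) => Promise<string>;

interface StepContext {
  question: string;
  model: string; // the run's configured model, not process.env.CONDUCTOR_MODEL
  results: Record<string, string>; // outputs of earlier steps, keyed by step id
  dispatch: DispatchFn; // injected by the flow-runner
}

// Seam 1: a flow step reaches dispatch only through the context,
// never through a module-level import.
async function reviewStep(ctx: StepContext): Promise<string> {
  const prompt = `Review the code for: ${ctx.question}`;
  return ctx.dispatch("code-reviewer", prompt); // agent name is illustrative
}

// Seam 2: the report header reads the model from the context.
function reportHeader(ctx: StepContext): string {
  return `Model: ${ctx.model}`;
}

// A stub dispatcher records calls, so the seam can be exercised
// without spawning an agent.
const calls: Array<{ agent: string; prompt: string }> = [];
const stubDispatch: DispatchFn = async (agent, prompt) => {
  calls.push({ agent, prompt });
  return `stub output from ${agent}`;
};
```

Because the dispatcher is a plain field, the flow-runner can inject its task-INSERT implementation while tests inject a stub like the one above.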

The evidence/yagni contracts (contracts.ts) and the adversarial-validator gate are preserved because the flow-runner calls step.run(ctx) in-process to build each prompt before it INSERTs the task — the closures execute in the coder process; prompts are never serialized to DB ([D-1] rationale, C11).

Schema (D-5, D-10)

Two tables in apps/coder/src/schema.sql (coder-owned; applied by the host boocoder service). Explicit CHECK names + the repo's DROP-IF-EXISTS → guarded-ADD discipline (root CLAUDE.md).

flow_runs:

id, project_id (NO FK — matches tasks.project_id, schema.sql:19), flow_name, band CHECK (small|medium|large), model, status CHECK-named (running|completed|failed), input JSONB CHECK (input ? 'question'), report TEXT nullable, error, created_at/updated_at (clock_timestamp()).
Index flow_runs(project_id, created_at DESC) (runs history).

flow_steps:

id, run_id UUID → flow_runs(id) ON DELETE CASCADE, step_id, kind CHECK (agent|code), agent, status CHECK-named (pending|running|completed|failed|skipped) — no queued status ([D-10]; llama-swap can't populate it, C16), task_id UUID → tasks(id) ON DELETE SET NULL (nullable; code steps NULL), chat_id UUID → chats(id) ON DELETE SET NULL, input TEXT, output TEXT (FULL output — tasks.output_summary is ≤500 char, schema.sql:26, and can't reconstruct ctx.results, C3), error, timestamps, UNIQUE (run_id, step_id).
Index flow_steps(run_id, status) (ready-wave + resume scans).

No depends_on column and no skipped-step rows — deps and skips are derivable from the loaded flow def (flow.ts:28-41, types.ts:27, C6). The FK lives on flow_steps.task_id, NOT a new column on tasks ([D-5]; keeps tasks generic, C4). JSONB writes via sql.json(value as never).
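
To illustrate why no depends_on column is needed, here is a small sketch of deriving the ready wave from a loaded flow def plus the persisted step statuses. The StepDef shape and the dependsOn field name are hypothetical stand-ins for whatever types.ts actually declares, and treating a missing status entry as pending is an assumption of this example.

```typescript
type StepStatus = "pending" | "running" | "completed" | "failed" | "skipped";

// Hypothetical flow-def shape; the real one lives in types.ts.
interface StepDef {
  id: string;
  kind: "agent" | "code";
  dependsOn: string[];
}

// A step is ready when it is still pending and every dependency has
// settled as completed or skipped. Steps with no persisted status are
// treated as pending (an assumption of this sketch).
function readyWave(defs: StepDef[], status: Map<string, StepStatus>): string[] {
  const settled = (id: string): boolean => {
    const s = status.get(id);
    return s === "completed" || s === "skipped";
  };
  return defs
    .filter((d) => (status.get(d.id) ?? "pending") === "pending")
    .filter((d) => d.dependsOn.every(settled))
    .map((d) => d.id);
}
```

The dependency edges come only from the in-memory def, so the tables store nothing but per-step state.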

Flow-runner & onTaskTerminal (D-2)

New apps/coder/src/services/flow-runner.ts — a DB-backed scheduler that owns flow_runs/flow_steps. It does NOT run a poll loop; it reacts to ONE new hook.

createDispatcher gains an onTaskTerminal(taskId, state) callback, invoked at the existing external-agent terminal transitions (dispatcher.ts:642-646 completed, :659-661 failed). No change to the dispatcher's internal run functions ([D-2]).

Run lifecycle:

POST /api/runs → flow-runner loads the flow def, derives the first ready wave, INSERTs flow_runs (status='running') and its flow_steps (each status='pending'), and a synthetic chats row per agent step (stream attribution, [D-6]).
For each ready agent step: build the prompt via step.run(ctx) in-process, then INSERT a tasks row (project_id, input=prompt, agent, model, mode_id='plan', chat_id=<synthetic>) with state='pending'. The dispatcher picks it up via LISTEN 'tasks_new' ([D-3]).
code steps run inline in the flow-runner (no task; flow_steps.task_id NULL).
onTaskTerminal fires → flow-runner reads the full task output, writes it to flow_steps.output, marks the step completed/failed, derives the next ready wave, and INSERTs it (or, on the last wave, renders the report into flow_runs.report and sets status='completed').
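
The last lifecycle step can be sketched against in-memory rows. This is an illustration under stated assumptions: the row shapes are simplified, the report rendering is a placeholder for render.ts, and failing the whole run when one step fails is an assumption of this sketch that the lifecycle above does not specify.

```typescript
type StepStatus = "pending" | "running" | "completed" | "failed" | "skipped";
type TerminalState = "completed" | "failed";

interface StepRow {
  stepId: string;
  taskId: string | null; // null for code steps
  status: StepStatus;
  output: string | null;
}

interface RunRow {
  status: "running" | "completed" | "failed";
  report: string | null;
  steps: StepRow[];
}

// Returns true when the task belonged to this run and was handled.
function handleTaskTerminal(
  run: RunRow,
  taskId: string,
  state: TerminalState,
  fullOutput: string,
): boolean {
  const step = run.steps.find((s) => s.taskId === taskId);
  if (step === undefined) return false; // not one of this run's tasks

  step.output = fullOutput; // the full output, not tasks.output_summary
  step.status = state;

  if (state === "failed") {
    run.status = "failed"; // ASSUMPTION: one failed step fails the run
    return true;
  }

  const finished = run.steps.every(
    (s) => s.status === "completed" || s.status === "skipped",
  );
  if (finished) {
    // Placeholder rendering; the real report comes from render.ts.
    run.report = run.steps
      .filter((s) => s.output !== null)
      .map((s) => `## ${s.stepId}\n${s.output}`)
      .join("\n\n");
    run.status = "completed";
  }
  return true;
}
```

In the real flow-runner the same decision point would also INSERT the next ready wave when the run is not yet finished.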

Execution via dispatcher reuse (D-3)

Steps execute through the existing dispatcher external-agent path — not a direct-PTY bypass. The dispatcher creates a git worktree (a stable HEAD read-checkout), runs the agent, and streams AgentEvents → WS frames unchanged. This REVISES design-context decision 5 ("no worktree") to "worktree as a harmless read snapshot" — inert because the agent cannot write under plan mode ([D-4]). Task-as-dispatch precedents the flow-runner mirrors: routes/skills.ts:94, routes/arena.ts:49, tools/new_task.ts:54.

Read-only via plan mode (D-4)

The flow-runner hardcodes mode_id='plan' on every step task; never user-overridable. The PTY dispatcher already passes it to qwen as --approval-mode plan (pty-dispatch.ts:75), a built-in tool-level gate: reads allowed, writes blocked. This is the SOLE read-only enforcement. Persona prompts and BOOCODE_TOOLS are NOT relied upon — they do not govern an external qwen CLI child (R2 security finding, C13). Adding a non-qwen agent to flows requires re-verifying that agent's plan-mode equivalent before allowing it.
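
A sketch of how the flow-runner might build a step's task row so that plan mode cannot be overridden. The parameter names, the requestedMode field, and the PLAN_VERIFIED_AGENTS allowlist are hypothetical; the sketch also assumes the tasks.agent value names the external CLI (qwen).

```typescript
// ASSUMPTION: only qwen's plan mode has been verified as a read-only gate.
const PLAN_VERIFIED_AGENTS: ReadonlySet<string> = new Set(["qwen"]);

interface StepTaskInsert {
  project_id: string;
  input: string;
  agent: string;
  model: string;
  mode_id: "plan"; // the literal type makes any other mode a compile error
  chat_id: string;
  state: "pending";
}

interface StepTaskParams {
  projectId: string;
  prompt: string;
  agent: string;
  model: string;
  chatId: string;
  requestedMode?: string; // hypothetical caller input, deliberately ignored
}

function buildStepTask(p: StepTaskParams): StepTaskInsert {
  // Fail closed: refuse agents whose plan-mode equivalent is unverified.
  if (!PLAN_VERIFIED_AGENTS.has(p.agent)) {
    throw new Error(`agent "${p.agent}" has no verified plan-mode gate`);
  }
  return {
    project_id: p.projectId,
    input: p.prompt,
    agent: p.agent,
    model: p.model,
    mode_id: "plan", // hardcoded; p.requestedMode never reaches the row
    chat_id: p.chatId,
    state: "pending",
  };
}
```

Typing mode_id as the literal "plan" means a future edit that tries to thread a user-supplied mode through would not type-check.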

WS frames (D-6)

Two new frames in packages/contracts/src/ws-frames.ts WsFrameSchema:

flow_run_started: { run_id, flow_name, band, steps: [{ step_id, agent, kind, chat_id, label }] }.
flow_run_step_updated: { run_id, step_id, status, run_status?, report? } (the report rides here — no separate report frame, [D-6]).

The per-agent token stream REUSES the existing delta / tool_call / message_complete frames keyed by the step's synthetic chat_id — no new streaming frames. Register both new frames in ALL THREE registries: contracts WsFrameSchema (rebuild pnpm -C packages/contracts build), the server loose InferenceFrame union (services/inference/turn.ts), and the web strict WsFrame union (apps/web/src/api/types.ts — the wire-format gate; missing it silently drops the frame at JSON-parse).
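
The frame shapes and the strict gate can be sketched in plain TypeScript. The type discriminator field, the nullable chat_id for code steps, and the parseFlowFrame helper are assumptions of this example; the real definitions belong in WsFrameSchema and the web WsFrame union.

```typescript
type StepStatus = "pending" | "running" | "completed" | "failed" | "skipped";
type RunStatus = "running" | "completed" | "failed";

interface FlowRunStarted {
  type: "flow_run_started"; // ASSUMPTION: frames are discriminated by `type`
  run_id: string;
  flow_name: string;
  band: "small" | "medium" | "large";
  steps: Array<{
    step_id: string;
    agent: string;
    kind: "agent" | "code";
    chat_id: string | null; // assumed null for code steps
    label: string;
  }>;
}

interface FlowRunStepUpdated {
  type: "flow_run_step_updated";
  run_id: string;
  step_id: string;
  status: StepStatus;
  run_status?: RunStatus;
  report?: string; // the report rides on this frame
}

type FlowFrame = FlowRunStarted | FlowRunStepUpdated;

const FLOW_FRAME_TYPES: ReadonlySet<string> = new Set([
  "flow_run_started",
  "flow_run_step_updated",
]);

// Strict gate: a frame whose type is not registered returns null, which is
// how an unregistered frame ends up silently dropped.
function parseFlowFrame(raw: string): FlowFrame | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const frame = value as { type?: unknown; run_id?: unknown };
  if (typeof frame.type !== "string" || !FLOW_FRAME_TYPES.has(frame.type)) return null;
  if (typeof frame.run_id !== "string") return null;
  return value as FlowFrame;
}
```

This guard checks only the discriminator and run_id; a schema validator would check every field.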

Resume (D-9)

initResume runs on coder startup over flow_runs WHERE status='running':

step whose task_id task is completed → mark step done, advance the run;
step whose task is lost/failed (PTY died on restart) → re-dispatch (re-INSERT a fresh task, again mode_id='plan');
completed steps are kept (no re-run).

Reconcile-and-advance, not mark-run-failed — decision 4 commits to resumable and task state is durable under [D-3] (C15).
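
The three resume rules can be written as one pure decision function. The ResumeStep shape and the action names are hypothetical, and the sketch covers agent steps only; passing undefined as the task state stands for a task row that no longer exists.

```typescript
type StepStatus = "pending" | "running" | "completed" | "failed" | "skipped";
type ResumeAction = "keep" | "mark_completed" | "redispatch";

interface ResumeStep {
  stepId: string;
  status: StepStatus;
  taskId: string | null;
}

// Decide what initResume does with one agent step of a run that was
// still 'running' when the coder process restarted.
function reconcileStep(step: ResumeStep, taskState: string | undefined): ResumeAction {
  // Completed (and other non-running) steps are kept; nothing is re-run.
  if (step.status !== "running") return "keep";
  // The task finished while the step row still said running.
  if (taskState === "completed") return "mark_completed";
  // Task failed or lost (PTY died on restart): insert a fresh plan-mode task.
  return "redispatch";
}

// Build the full reconcile plan for a run.
function reconcileRun(
  steps: ResumeStep[],
  taskStates: Map<string, string>,
): Array<{ stepId: string; action: ResumeAction }> {
  return steps.map((s) => ({
    stepId: s.stepId,
    action: reconcileStep(s, s.taskId === null ? undefined : taskStates.get(s.taskId)),
  }));
}
```

Keeping the decision pure leaves the database writes (mark done, re-INSERT) to the caller and makes the rules easy to test.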

Orchestrator pane (D-7)

New orchestrator pane kind following the markdown_artifact/html_artifact precedent (api/types.ts:386 WorkspacePaneKind). Touches WorkspacePaneKind, useWorkspacePanes, Workspace, NewPaneMenu, ChatTabBar, PaneHeaderActions.

OrchestratorPane.tsx:

run header (flow + band);
report-at-top on completion;
collapsed agent roster reusing AgentStatusDot (AgentComposerBar.tsx:204);
expand-one-at-a-time detail well reusing the CoderPane stream rendering (keyed by the step's chat_id);
mobile single-column inline expand; auto-expand-follows-active.

The pane subscribes to flow_run_started (to build the roster) and flow_run_step_updated (status + report), and to the reused delta/tool_call/message_complete frames by chat_id for the expanded agent.
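
One way to model the pane's frame handling is a small reducer. This is a sketch under assumptions: the state shape, the toggle event, and the rule that a step entering running becomes the expanded one (auto-expand-follows-active) are illustrative, not taken from OrchestratorPane.tsx.

```typescript
type StepStatus = "pending" | "running" | "completed" | "failed" | "skipped";

interface RosterEntry {
  stepId: string;
  agent: string;
  chatId: string | null;
  status: StepStatus;
}

interface PaneState {
  roster: RosterEntry[];
  expandedStepId: string | null; // at most one expanded step at a time
  report: string | null;
}

type PaneEvent =
  | { type: "flow_run_started"; steps: Array<{ step_id: string; agent: string; chat_id: string | null }> }
  | { type: "flow_run_step_updated"; step_id: string; status: StepStatus; report?: string }
  | { type: "toggle"; step_id: string }; // hypothetical user click on a roster row

function reducePane(state: PaneState, ev: PaneEvent): PaneState {
  if (ev.type === "flow_run_started") {
    // Build the collapsed roster from the started frame.
    const roster = ev.steps.map((s): RosterEntry => ({
      stepId: s.step_id,
      agent: s.agent,
      chatId: s.chat_id,
      status: "pending",
    }));
    return { roster, expandedStepId: null, report: null };
  }
  if (ev.type === "flow_run_step_updated") {
    return {
      roster: state.roster.map((r) =>
        r.stepId === ev.step_id ? { ...r, status: ev.status } : r,
      ),
      // Auto-expand follows the step that just became active.
      expandedStepId: ev.status === "running" ? ev.step_id : state.expandedStepId,
      report: ev.report ?? state.report,
    };
  }
  // Toggle: collapse if already expanded, otherwise expand this one only.
  return {
    ...state,
    expandedStepId: state.expandedStepId === ev.step_id ? null : ev.step_id,
  };
}
```

Holding a single expandedStepId, rather than a set, is what enforces expand-one-at-a-time; the expanded entry's chatId is the key for the reused stream frames.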

A Workflow (lucide) button sits on ChatInput's controls row, between the SquareSlash chip and the Globe pill (ChatInput.tsx:648-732, :673; the row is ≤5 elements and stays one line, C9). Because ChatInput is rendered by both ChatPane and CoderPane, one button gives BooChat + BooCoder parity. It shows a "Flows" label on desktop and is icon-only on mobile.

Slash (/flow <focus>): launches instantly with defaults (band small, current pane's project, text-after-command = focus), opening an Orchestrator pane.
Button → FlowLauncherDialog.tsx: 5 category tabs (Analysis / Discovery / Planning / Authoring / Review) filtering the flow list (flows/index.ts), plus size, focus, and a fast toggle; defaults Analysis / Small / off. Same run pane either way.

Runs history surfaces in NewPaneMenu. Export (copy / save-file / send-to-chat via the existing sendToChat, lib/events.ts) lives in the pane header …, conditional on a completed report.

Concurrency (D-10)

Multiple runs allowed; each its own pane + flow_runs row, no shared state. Step statuses: pending / running / completed / failed / skipped (no queued: the dispatcher's pending covers a step waiting on deps or on the busy model; llama-swap can't report queue position, C16). Single model per run, default qwen3.6-35b-a3b-mxfp4.

Routes

POST /api/runs — { project_id, flow_name, band, input:{question,...}, model? } → creates the run, starts the flow-runner, returns run_id. Publishes flow_run_started.
GET /api/runs?project_id= — runs history (backs NewPaneMenu).
GET /api/runs/:id — reopen a run (run + steps + report).
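
A sketch of validating the POST /api/runs body before a run is created. The result-union shape and the parseLaunchBody name are hypothetical; the non-empty question check mirrors the flow_runs input CHECK, and the default model is the one named under Concurrency.

```typescript
type Band = "small" | "medium" | "large";
const DEFAULT_MODEL = "qwen3.6-35b-a3b-mxfp4";

interface LaunchBody {
  project_id: string;
  flow_name: string;
  band: Band;
  input: Record<string, unknown> & { question: string };
  model: string;
}

type ParseResult =
  | { ok: true; value: LaunchBody }
  | { ok: false; error: string };

function isBand(x: unknown): x is Band {
  return x === "small" || x === "medium" || x === "large";
}

function parseLaunchBody(body: unknown): ParseResult {
  if (typeof body !== "object" || body === null) {
    return { ok: false, error: "body must be an object" };
  }
  const b = body as Record<string, unknown>;
  const projectId = b["project_id"];
  const flowName = b["flow_name"];
  const band = b["band"];
  const input = b["input"];
  const model = b["model"];

  if (typeof projectId !== "string") return { ok: false, error: "project_id is required" };
  if (typeof flowName !== "string") return { ok: false, error: "flow_name is required" };
  if (!isBand(band)) return { ok: false, error: "band must be small, medium, or large" };
  if (typeof input !== "object" || input === null) return { ok: false, error: "input is required" };

  // Mirrors the schema CHECK (input ? 'question').
  const question = (input as Record<string, unknown>)["question"];
  if (typeof question !== "string" || question.trim() === "") {
    return { ok: false, error: "input.question is required" };
  }

  return {
    ok: true,
    value: {
      project_id: projectId,
      flow_name: flowName,
      band,
      input: { ...(input as Record<string, unknown>), question },
      model: typeof model === "string" ? model : DEFAULT_MODEL, // model is optional
    },
  };
}
```

Rejecting a missing question at the route keeps the error readable instead of surfacing it as a failed CHECK constraint on INSERT.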

Deploy surface

apps/coder changes (conductor defs, flow-runner, dispatcher hook, schema, resume, routes) → sudo systemctl restart boocoder.
packages/contracts + apps/web (frames, pane, button, launcher, history) → docker compose up --build -d boocode. Build contracts first (pnpm -C packages/contracts build).

Deferred (YAGNI)

Full list with reopen triggers in artifacts/implementation-decision-log.md: @boocode/conductor workspace package (copy-in instead); flow_steps.depends_on column (derive from flow def); persisted skipped-step rows (when() is pure); a read_only flag on tasks (superseded by mode_id='plan'); an explicit queued status (llama-swap can't populate it); a launcher search box (5 category tabs suffice); a separate report WS frame (report rides on flow_run_step_updated).
