boocode/openspec/changes/orchestrator/design.md
indifferentketchup 1937af8df9 feat: in-app Orchestrator (Phase 2) — multi-agent conductor
Brings the deterministic Han-flow conductor into BooCode: launch any read-only
flow from BooChat or BooCoder, watch each agent stream live in a Paseo-style
run pane, get an evidence-disciplined report — on local Qwen, persisted and
resumable. Read-only enforced hard via qwen --approval-mode plan (orchestrator
tasks fail closed if qwen is unavailable; never fall to write-capable native).

Backend (apps/coder): re-homed conductor defs, flow_runs/flow_steps schema,
flow-runner + dispatcher onTaskTerminal hook, restart-resume, runs routes
(launch/list/get/cancel), user-channel WS. Contracts: two flow_run_* frames.
Web: orchestrator pane kind + OrchestratorPane, Workflow button + slash flows
(BooChat/BooCoder parity), FlowLauncherDialog, "New Orchestrator" in the + and
split menus, runs history + export. Plan: openspec/changes/orchestrator.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 15:22:48 +00:00


# Orchestrator (Phase 2) — design (the HOW)
Planning altitude: names files, columns, frames, and decision-bearing values
(the plan-mode flag, status sets, frame field names). Every non-obvious choice
cites a committed decision in
[artifacts/implementation-decision-log.md](artifacts/implementation-decision-log.md).
The behavioral spec is [artifacts/design-context.md](artifacts/design-context.md)
(decision 5 REVISED here); integration surfaces are in
[artifacts/.discovery-notes.md](artifacts/.discovery-notes.md).
## Architecture at a glance
```
ChatInput (shared composer)                      apps/web
  ├─ Workflow button → FlowLauncherDialog ─┐
  └─ /flow slash (instant defaults) ───────┤
        ↓
POST /api/runs ── apps/coder/routes
        ↓
flow-runner.ts (DB-driven scheduler)
  · loads flow def from src/conductor/
  · step.run(ctx) IN-PROCESS → prompt (contracts injected)
  · INSERT flow_runs / flow_steps
  · INSERT each ready agent step as a tasks row
    (mode_id='plan', synthetic chat_id)
        ↓
dispatcher.ts (REUSED, unchanged internals)
  · LISTEN 'tasks_new' → external-agent path
  · qwen --approval-mode plan (read-only gate)
  · worktree = read snapshot; AgentEvents → WS frames
  · onTaskTerminal(taskId,state) ← ONE new hook
        ↓
flow-runner advances: read full output → run code steps
  inline → INSERT next ready wave (or finish + report)
        ↓
flow_run_started / flow_run_step_updated + reused delta/tool_call/
  message_complete (keyed by step chat_id) → broker → WS
        ↓
OrchestratorPane.tsx (run header, report-at-top,
  collapsed roster, expand-one-at-a-time stream)
```
## Re-home & DispatchFn seam ([D-1](artifacts/implementation-decision-log.md#d-1--re-home-the-pure-conductor-definitions-into-appscodersrcconductor))
Copy the pure (dispatch-free) conductor files into `apps/coder/src/conductor/`:
`spine.ts`, `flows/*`, `contracts.ts`, `types.ts`, `render.ts`. Copy the 23
personas (`conductor/agents/*.md`). Do NOT copy `flow.ts` (in-memory scheduler,
replaced by the flow-runner) or `dispatch.ts` (`opencode run` subprocess,
replaced by dispatcher reuse). The Phase-1 CLI under `conductor/` stays alive
unchanged as a regression oracle.
Two seam edits on the copies:
- **Sever flow→dispatch coupling.** `flows/code-review.ts:10` imports
`dispatchAgent` from `../dispatch.js` and calls it at `:62`. Replace that import
with a `DispatchFn` field on `StepContext`, injected by the flow-runner. Every
flow then reaches dispatch through the context, not a module import.
- **Parameterize the model.** `spine.ts:122` reads `process.env.CONDUCTOR_MODEL`
into the report header. Make it read the run's configured `model` (passed through
the spine factory / step context) so the header matches the run, not a process
env.
The evidence/yagni contracts (`contracts.ts`) and the adversarial-validator gate
are preserved because the flow-runner calls `step.run(ctx)` **in-process** to build
each prompt before it INSERTs the task — the closures execute in the coder process;
prompts are never serialized to DB ([D-1] rationale, C11).
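
A minimal sketch of the seam described above, assuming field names the source does not spell out: `dispatch`, `model`, and `results` on `StepContext`, the `(agent, prompt)` signature of `DispatchFn`, and the `reviewStep` / `reportHeader` helpers are all hypothetical. It only illustrates a flow step reaching dispatch through the context and the report header reading the run's model.

```typescript
// Hypothetical shapes: the design names DispatchFn and StepContext,
// but these fields and signatures are assumptions for illustration.
type DispatchFn = (agent: string, prompt: string) => Promise<string>;

interface StepContext {
  model: string; // the run's configured model, not process.env.CONDUCTOR_MODEL
  results: Record<string, string>; // outputs of earlier steps, keyed by step id
  dispatch: DispatchFn; // injected by the flow-runner
}

// A flow step that reaches dispatch through the context, not a module import.
async function reviewStep(ctx: StepContext): Promise<string> {
  const prompt = `Review the findings:\n${ctx.results["scan"] ?? "(none)"}`;
  return ctx.dispatch("reviewer", prompt);
}

// The report header reads the model from the context it was given.
function reportHeader(ctx: StepContext): string {
  return `model: ${ctx.model}`;
}

// A stub DispatchFn that records calls, standing in for the real injection.
const calls: Array<{ agent: string; prompt: string }> = [];
const stubDispatch: DispatchFn = async (agent, prompt) => {
  calls.push({ agent, prompt });
  return `stub output from ${agent}`;
};
```

Because the flow only sees `ctx.dispatch`, the same flow definition can run against a stub in tests and against the task-inserting implementation in the coder process.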
## Schema ([D-5](artifacts/implementation-decision-log.md#d-5--flow_runs--flow_steps-schema-in-the-coder-schema), [D-10](artifacts/implementation-decision-log.md#d-10--concurrency-multiple-runs-no-queued-status-single-model-per-run))
Two tables in `apps/coder/src/schema.sql` (coder-owned; applied by the host
boocoder service). Explicit CHECK names + the repo's DROP-IF-EXISTS →
guarded-ADD discipline (root CLAUDE.md).
`flow_runs`:
- `id`, `project_id` (NO FK — matches `tasks.project_id`, `schema.sql:19`),
`flow_name`, `band` CHECK `(small|medium|large)`, `model`,
`status` CHECK-named `(running|completed|failed)`,
`input` JSONB CHECK `(input ? 'question')`,
`report` TEXT nullable, `error`, `created_at`/`updated_at` (`clock_timestamp()`).
- Index `flow_runs(project_id, created_at DESC)` (runs history).
`flow_steps`:
- `id`, `run_id` UUID → `flow_runs(id)` ON DELETE CASCADE, `step_id`,
`kind` CHECK `(agent|code)`, `agent`,
`status` CHECK-named `(pending|running|completed|failed|skipped)`
— no `queued` status ([D-10]; llama-swap can't populate it, C16),
`task_id` UUID → `tasks(id)` ON DELETE SET NULL (nullable; code steps NULL),
`chat_id` UUID → `chats(id)` ON DELETE SET NULL,
`input` TEXT, `output` TEXT (FULL output — `tasks.output_summary` is ≤500 char,
`schema.sql:26`, and can't reconstruct `ctx.results`, C3), `error`,
timestamps, UNIQUE `(run_id, step_id)`.
- Index `flow_steps(run_id, status)` (ready-wave + resume scans).
No `depends_on` column and no skipped-step rows — deps and skips are derivable from
the loaded flow def (`flow.ts:28-41`, `types.ts:27`, C6). The FK lives on
`flow_steps.task_id`, NOT a new column on `tasks` ([D-5]; keeps `tasks` generic, C4).
JSONB writes via `sql.json(value as never)`.
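
To illustrate why no `depends_on` column is needed, here is a hedged sketch of deriving the ready wave from a loaded flow def plus the current step statuses. The `StepDef` shape and its `dependsOn` field are hypothetical, and treating a `skipped` dependency as satisfied is an assumption, not something the source states.

```typescript
type StepStatus = "pending" | "running" | "completed" | "failed" | "skipped";

// Hypothetical flow-def step shape; the real one lives in the conductor types.
interface StepDef {
  id: string;
  kind: "agent" | "code";
  dependsOn: string[];
}

// A pending step is ready once every dependency has settled.
// Assumption: "settled" means completed or skipped.
function readyWave(defs: StepDef[], status: Map<string, StepStatus>): string[] {
  const settled = (id: string): boolean => {
    const s = status.get(id);
    return s === "completed" || s === "skipped";
  };
  return defs
    .filter((d) => (status.get(d.id) ?? "pending") === "pending")
    .filter((d) => d.dependsOn.every(settled))
    .map((d) => d.id);
}
```

The statuses map would be filled from `flow_steps` rows for one run, so the dependency graph itself never has to be persisted.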
## Flow-runner & onTaskTerminal ([D-2](artifacts/implementation-decision-log.md#d-2--db-driven-flow-runner-with-an-ontaskterminal-dispatcher-hook))
New `apps/coder/src/services/flow-runner.ts` — a DB-backed scheduler that owns
`flow_runs`/`flow_steps`. It does NOT run a poll loop; it reacts to ONE new hook.
`createDispatcher` gains an `onTaskTerminal(taskId, state)` callback, invoked at
the existing external-agent terminal transitions (`dispatcher.ts:642-646`
completed, `:659-661` failed). No change to the dispatcher's internal run
functions ([D-2]).
Run lifecycle:
1. `POST /api/runs` → flow-runner loads the flow def, derives the first ready wave,
INSERTs `flow_runs` (`status='running'`) and its `flow_steps` (each
`status='pending'`), and a synthetic `chats` row per agent step (stream
attribution, [D-6]).
2. For each ready `agent` step: build the prompt via `step.run(ctx)` in-process,
then INSERT a `tasks` row `(project_id, input=prompt, agent, model,
mode_id='plan', chat_id=<synthetic>)` with `state='pending'`. The dispatcher
picks it up via `LISTEN 'tasks_new'` ([D-3]).
3. `code` steps run inline in the flow-runner (no task; `flow_steps.task_id` NULL).
4. `onTaskTerminal` fires → flow-runner reads the **full** task output, writes it to
`flow_steps.output`, marks the step completed/failed, derives the next ready
wave, and INSERTs it (or, on the last wave, renders the report into
`flow_runs.report` and sets `status='completed'`).
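
Step 4 of the lifecycle can be sketched against in-memory rows. This is an illustration only: `handleTaskTerminal`, the row shapes, and the report rendering are hypothetical stand-ins for the DB-backed flow-runner, and failing the whole run when one step fails is an assumption the source does not confirm.

```typescript
type StepStatus = "pending" | "running" | "completed" | "failed" | "skipped";
type RunStatus = "running" | "completed" | "failed";
type TerminalState = "completed" | "failed";
type Outcome = "ignored" | "advance" | "finished" | "failed";

interface StepRow {
  stepId: string;
  taskId: string | null; // NULL for code steps
  status: StepStatus;
  output: string | null; // FULL output, not a summary
}

interface RunRow {
  status: RunStatus;
  report: string | null;
  steps: StepRow[];
}

// Reacts to one terminal task: store the full output, mark the step,
// then either finish the run or signal that the next wave should be derived.
function handleTaskTerminal(
  run: RunRow,
  taskId: string,
  state: TerminalState,
  fullOutput: string,
): Outcome {
  const step = run.steps.find((s) => s.taskId === taskId);
  if (step === undefined) return "ignored"; // not an orchestrator task
  step.output = fullOutput;
  step.status = state;
  if (state === "failed") {
    run.status = "failed"; // assumption: a failed step fails the run
    return "failed";
  }
  const done = run.steps.every((s) => s.status === "completed" || s.status === "skipped");
  if (!done) return "advance"; // caller derives and INSERTs the next ready wave
  // Placeholder rendering; the real report comes from the conductor's renderer.
  run.report = run.steps.map((s) => `## ${s.stepId}\n${s.output ?? ""}`).join("\n\n");
  run.status = "completed";
  return "finished";
}
```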
## Execution via dispatcher reuse ([D-3](artifacts/implementation-decision-log.md#d-3--reuse-the-existing-dispatcher-insert-pending-task-not-a-direct-pty-bypass))
Steps execute through the **existing** dispatcher external-agent path — not a
direct-PTY bypass. The dispatcher creates a git worktree (a stable HEAD
read-checkout), runs the agent, and streams AgentEvents → WS frames unchanged.
This REVISES design-context decision 5 ("no worktree") to "worktree as a harmless
read snapshot" — inert because the agent cannot write under plan mode ([D-4]).
Task-as-dispatch precedents the flow-runner mirrors: `routes/skills.ts:94`,
`routes/arena.ts:49`, `tools/new_task.ts:54`.
## Read-only via plan mode ([D-4](artifacts/implementation-decision-log.md#d-4--read-only-enforced-hard-by-mode_idplan-qwen---approval-mode-plan))
The flow-runner hardcodes `mode_id='plan'` on every step task; never
user-overridable. The PTY dispatcher already passes it to qwen as
`--approval-mode plan` (`pty-dispatch.ts:75`), a built-in tool-level gate: reads
allowed, writes blocked. This is the SOLE read-only enforcement. Persona prompts
and `BOOCODE_TOOLS` are NOT relied upon — they do not govern an external qwen CLI
child (R2 security finding, C13). Adding a non-qwen agent to flows requires
re-verifying that agent's plan-mode equivalent before allowing it.
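
One way to keep `mode_id='plan'` non-overridable is to make it a literal in the insert shape and to fail closed for any agent whose plan-mode equivalent has not been verified. The sketch below assumes the `agent` column names the external agent CLI; `buildStepTask`, `StepTaskInsert`, and the `PLAN_VERIFIED_AGENTS` allowlist are hypothetical names, not part of the codebase.

```typescript
// Insert shape for an orchestrator step task. mode_id is the literal "plan",
// so callers cannot pass any other mode through this type.
interface StepTaskInsert {
  project_id: string;
  input: string;
  agent: string;
  model: string;
  mode_id: "plan";
  chat_id: string;
  state: "pending";
}

// Hypothetical allowlist: only agents whose read-only mode has been verified.
const PLAN_VERIFIED_AGENTS: ReadonlySet<string> = new Set(["qwen"]);

function buildStepTask(req: {
  projectId: string;
  prompt: string;
  agent: string;
  model: string;
  chatId: string;
}): StepTaskInsert {
  // Fail closed: never fall back to an agent that could write.
  if (!PLAN_VERIFIED_AGENTS.has(req.agent)) {
    throw new Error(`agent "${req.agent}" has no verified plan mode`);
  }
  return {
    project_id: req.projectId,
    input: req.prompt,
    agent: req.agent,
    model: req.model,
    mode_id: "plan",
    chat_id: req.chatId,
    state: "pending",
  };
}
```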
## WS frames ([D-6](artifacts/implementation-decision-log.md#d-6--two-new-ws-frames-per-agent-stream-reuses-existing-frames-by-chat_id))
Two new frames in `packages/contracts/src/ws-frames.ts` `WsFrameSchema`:
- `flow_run_started`: `{ run_id, flow_name, band, steps: [{ step_id, agent, kind,
chat_id, label }] }`.
- `flow_run_step_updated`: `{ run_id, step_id, status, run_status?, report? }`
(the report rides here — no separate report frame, [D-6]).
The per-agent token stream REUSES the existing `delta` / `tool_call` /
`message_complete` frames keyed by the step's synthetic `chat_id` — no new
streaming frames. Register both new frames in ALL THREE registries: contracts
`WsFrameSchema` (rebuild `pnpm -C packages/contracts build`), the server loose
`InferenceFrame` union (`services/inference/turn.ts`), and the web strict
`WsFrame` union (`apps/web/src/api/types.ts` — the wire-format gate; missing it
silently drops the frame at JSON-parse).
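
The two frames can be written as a discriminated union. In this sketch the `type` discriminator field and the hand-written `parseFlowFrame` guard are assumptions (the real contracts live in `WsFrameSchema`); the field names inside each frame follow the list above. The guard shows the failure mode described: a frame that is not in the union is dropped without an error.

```typescript
type Band = "small" | "medium" | "large";
type StepStatus = "pending" | "running" | "completed" | "failed" | "skipped";
type RunStatus = "running" | "completed" | "failed";

interface FlowRunStarted {
  type: "flow_run_started"; // discriminator name is an assumption
  run_id: string;
  flow_name: string;
  band: Band;
  steps: Array<{
    step_id: string;
    agent: string | null;
    kind: "agent" | "code";
    chat_id: string | null;
    label: string;
  }>;
}

interface FlowRunStepUpdated {
  type: "flow_run_step_updated";
  run_id: string;
  step_id: string;
  status: StepStatus;
  run_status?: RunStatus;
  report?: string; // the report rides on this frame
}

type FlowFrame = FlowRunStarted | FlowRunStepUpdated;

// Shallow wire-format gate: unknown or malformed frames come back as null.
function parseFlowFrame(raw: string): FlowFrame | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const f = value as Record<string, unknown>;
  if (typeof f["run_id"] !== "string") return null;
  if (f["type"] === "flow_run_started" && Array.isArray(f["steps"])) {
    return value as FlowRunStarted;
  }
  if (
    f["type"] === "flow_run_step_updated" &&
    typeof f["step_id"] === "string" &&
    typeof f["status"] === "string"
  ) {
    return value as FlowRunStepUpdated;
  }
  return null;
}
```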
## Resume ([D-9](artifacts/implementation-decision-log.md#d-9--resumable-runs-via-initresume-on-coder-startup))
`initResume` runs on coder startup over `flow_runs WHERE status='running'`:
- step whose `task_id` task is `completed` → mark step done, advance the run;
- step whose task is lost/failed (PTY died on restart) → re-dispatch (re-INSERT a
fresh task, again `mode_id='plan'`);
- completed steps are kept (no re-run).
Reconcile-and-advance, not mark-run-failed — decision 4 commits to resumable and
task state is durable under [D-3] (C15).
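
The per-step reconcile rule can be sketched as a pure function for agent steps. The `TaskState` values, the `ResumeAction` names, and the use of `null` to mean "task lost" are assumptions; the source only fixes the three outcomes listed above. The `"none"` result for steps that need no resume action is likewise an assumption.

```typescript
type StepStatus = "pending" | "running" | "completed" | "failed" | "skipped";
// Assumed task states; null stands for a task that was lost on restart.
type TaskState = "pending" | "running" | "completed" | "failed";
type ResumeAction = "keep" | "advance" | "redispatch" | "none";

function reconcileStep(stepStatus: StepStatus, task: TaskState | null): ResumeAction {
  // Completed (or skipped) steps are kept; they are never re-run.
  if (stepStatus === "completed" || stepStatus === "skipped") return "keep";
  // The task finished while the coder was down: mark done and advance the run.
  if (task === "completed") return "advance";
  // Never dispatched yet: the normal ready-wave derivation will pick it up.
  if (stepStatus === "pending") return "none";
  // Lost or failed task: re-INSERT a fresh task, again with mode_id='plan'.
  if (task === null || task === "failed") return "redispatch";
  return "none"; // task still pending or running under the dispatcher
}
```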
## Orchestrator pane ([D-7](artifacts/implementation-decision-log.md#d-7--orchestrator-pane-kind--orchestratorpane))
New `orchestrator` pane kind following the `markdown_artifact`/`html_artifact`
precedent (`api/types.ts:386` `WorkspacePaneKind`). Touches `WorkspacePaneKind`,
`useWorkspacePanes`, `Workspace`, `NewPaneMenu`, `ChatTabBar`,
`PaneHeaderActions`.
`OrchestratorPane.tsx`:
- run header (flow + band);
- report-at-top on completion;
- collapsed agent roster reusing `AgentStatusDot` (`AgentComposerBar.tsx:204`);
- expand-one-at-a-time detail well reusing the CoderPane stream rendering (keyed by
the step's `chat_id`);
- mobile single-column inline expand; auto-expand-follows-active.
The pane subscribes to `flow_run_started` (to build the roster) and
`flow_run_step_updated` (status + report), and to the reused
delta/tool_call/message_complete frames by `chat_id` for the expanded agent.
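
How the pane might fold the two flow frames into its state can be sketched as a reducer. `PaneState`, `reducePane`, the `type` discriminator, and the rule that the expanded agent follows whichever step last turned `running` are hypothetical; the frames are trimmed to the fields the reducer needs.

```typescript
type StepStatus = "pending" | "running" | "completed" | "failed" | "skipped";
type RunStatus = "running" | "completed" | "failed";

type FlowFrame =
  | { type: "flow_run_started"; run_id: string; steps: Array<{ step_id: string }> }
  | {
      type: "flow_run_step_updated";
      run_id: string;
      step_id: string;
      status: StepStatus;
      run_status?: RunStatus;
      report?: string;
    };

interface PaneState {
  runId: string;
  roster: Map<string, StepStatus>; // collapsed roster, one entry per step
  runStatus: RunStatus;
  report: string | null; // shown at the top once present
  expanded: string | null; // step_id of the expanded agent, if any
}

function reducePane(state: PaneState | null, frame: FlowFrame): PaneState | null {
  if (frame.type === "flow_run_started") {
    const roster = new Map<string, StepStatus>();
    for (const step of frame.steps) roster.set(step.step_id, "pending");
    return { runId: frame.run_id, roster, runStatus: "running", report: null, expanded: null };
  }
  // Ignore updates for other runs or before the roster exists.
  if (state === null || state.runId !== frame.run_id) return state;
  const roster = new Map(state.roster);
  roster.set(frame.step_id, frame.status);
  return {
    runId: state.runId,
    roster,
    runStatus: frame.run_status ?? state.runStatus,
    report: frame.report ?? state.report,
    // Auto-expand follows the active step.
    expanded: frame.status === "running" ? frame.step_id : state.expanded,
  };
}
```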
## Toolbar button & launcher ([D-8](artifacts/implementation-decision-log.md#d-8--workflow-toolbar-button--slash-launch-boochatboocoder-parity))
A `Workflow` (lucide) button on `ChatInput`'s controls row, between the
`SquareSlash` chip and the `Globe` pill (`ChatInput.tsx:648-732`, `:673` — row is
≤5 elements, stays one line, C9). Because `ChatInput` is rendered by both ChatPane
and CoderPane, this is BooChat + BooCoder parity from one button. "Flows" label
desktop, icon-only mobile.
- **Slash** (`/flow <focus>`): launches instantly with defaults (band `small`,
current pane's project, text-after-command = focus), opening an Orchestrator
pane.
- **Button** → `FlowLauncherDialog.tsx`: 5 category tabs (Analysis / Discovery /
Planning / Authoring / Review) filtering the flow list (`flows/index.ts`), + size
+ focus + fast toggle; defaults Analysis / Small / off. Same run pane either way.
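
A hedged sketch of the slash path: turning composer text into a launch request with the instant defaults. It assumes the command name is the flow name, that the focus text maps to `input.question`, and that the caller supplies the list of known flows; `parseFlowSlash` and `LaunchRequest` are hypothetical names.

```typescript
type Band = "small" | "medium" | "large";

// Mirrors the POST /api/runs body; the optional model is left to the server default.
interface LaunchRequest {
  project_id: string;
  flow_name: string;
  band: Band;
  input: { question: string };
  model?: string;
}

function parseFlowSlash(
  text: string,
  projectId: string, // the current pane's project
  knownFlows: readonly string[],
): LaunchRequest | null {
  // "/<flow> <focus>": the focus may be empty or span several lines.
  const match = /^\/([a-z0-9-]+)(?:\s+([\s\S]*))?$/.exec(text.trim());
  if (match === null) return null;
  const flowName = match[1];
  if (flowName === undefined || !knownFlows.includes(flowName)) return null;
  const focus = (match[2] ?? "").trim();
  return {
    project_id: projectId,
    flow_name: flowName,
    band: "small", // instant default
    input: { question: focus }, // assumption: focus becomes the question
  };
}
```

The dialog path would build the same request shape from its size, focus, and category controls, which is what lets both entry points open the same run pane.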
Runs history surfaces in `NewPaneMenu`. Export (copy / save-file / send-to-chat via
the existing `sendToChat`, `lib/events.ts`) lives in the pane header,
conditional on a completed report.
## Concurrency ([D-10](artifacts/implementation-decision-log.md#d-10--concurrency-multiple-runs-no-queued-status-single-model-per-run))
Multiple runs are allowed; each gets its own pane and `flow_runs` row, with no
shared state. Step statuses: pending / running / completed / failed / skipped.
There is no `queued` status: the dispatcher's `pending` covers a step waiting on
deps or on the busy model, and llama-swap can't report queue position (C16).
Single model per run, default `qwen3.6-35b-a3b-mxfp4`.
## Routes
- `POST /api/runs` — `{ project_id, flow_name, band, input:{question,...}, model? }`
→ creates the run, starts the flow-runner, returns `run_id`. Publishes
`flow_run_started`.
- `GET /api/runs?project_id=` — runs history (backs `NewPaneMenu`).
- `GET /api/runs/:id` — reopen a run (run + steps + report).
## Deploy surface
- `apps/coder` changes (conductor defs, flow-runner, dispatcher hook, schema,
resume, routes) → `sudo systemctl restart boocoder`.
- `packages/contracts` + `apps/web` (frames, pane, button, launcher, history) →
`docker compose up --build -d boocode`. Build contracts first
(`pnpm -C packages/contracts build`).
## Deferred (YAGNI)
Full list with reopen triggers in
[artifacts/implementation-decision-log.md](artifacts/implementation-decision-log.md#deferred-yagni):
`@boocode/conductor` workspace package (copy-in instead);
`flow_steps.depends_on` column (derive from flow def);
persisted skipped-step rows (`when()` is pure);
a `read_only` flag on `tasks` (superseded by `mode_id='plan'`);
an explicit `queued` status (llama-swap can't populate it);
a launcher search box (5 category tabs suffice);
a separate report WS frame (report rides on `flow_run_step_updated`).