feat: in-app Orchestrator (Phase 2) — multi-agent conductor

Brings the deterministic Han-flow conductor into BooCode: launch any read-only
flow from BooChat or BooCoder, watch each agent stream live in a Paseo-style run
pane, and get an evidence-disciplined report, all on local Qwen, persisted and
resumable.

Read-only is enforced hard via qwen --approval-mode plan: orchestrator tasks
fail closed if qwen is unavailable and never fall back to write-capable native.

Backend (apps/coder): re-homed conductor defs, flow_runs/flow_steps schema,
flow-runner + dispatcher onTaskTerminal hook, restart-resume, runs routes
(launch/list/get/cancel), user-channel WS.
Contracts: two flow_run_* frames.
Web: orchestrator pane kind + OrchestratorPane, Workflow button + slash flows
(BooChat/BooCoder parity), FlowLauncherDialog, "New Orchestrator" in the + and
split menus, runs history + export.
Plan: openspec/changes/orchestrator.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 14:59:07 +00:00
parent 519b1d2ca1
commit 1937af8df9
118 changed files with 15723 additions and 27 deletions
--- a/openspec/changes/orchestrator/proposal.md
+++ b/openspec/changes/orchestrator/proposal.md
@@ -0,0 +1,82 @@
+# Orchestrator (Phase 2) — in-app multi-agent conductor
+
+## Source
+
+Settled via a conversational `grill-me` design session. The captured behavioral
+spec (12 decisions, the *what*) is
+[artifacts/design-context.md](artifacts/design-context.md); the HOW is
+[design.md](design.md), decomposed in [tasks.md](tasks.md) and committed in
+[artifacts/implementation-decision-log.md](artifacts/implementation-decision-log.md)
+(D-1…D-10). Note: design-context **decision 5** ("no worktree") is REVISED by
+D-3/D-4 (worktree kept as a harmless read snapshot; read-only enforced by qwen
+plan mode).
+
+## Why
+
+Phase 1 shipped a deterministic multi-agent **conductor** as a standalone host CLI
+(`/opt/boocode/conductor/`): Han flows that fan out read-only specialist agents,
+apply the evidence/yagni contracts, and emit an evidence-disciplined report. It is
+not reachable from the app — a user in BooChat or BooCoder cannot launch a flow,
+cannot watch agents progress live, and the run leaves no persisted, reopenable
+artifact.
+
+This change brings the conductor in-app: launch any Han flow from the shared
+composer, watch each agent stream live (Paseo-style parent-with-subagents), and
+get the report — all on the **already-loaded local Qwen 35B, free**, with the run
+persisted and resumable across a coder restart. It reuses the existing task
+dispatcher, streaming pipeline, and broker rather than standing up a parallel
+execution path (see [design-context](artifacts/design-context.md) decision 4 and
+D-2/D-3).
+
+## What Changes
+
+- **Two doors, full parity.** A `Workflow` button and a slash command live on the
+  shared `ChatInput` composer, so both appear in BooChat (ChatPane) and BooCoder
+  (CoderPane), on desktop and mobile (icon-only). The slash command launches
+  instantly with defaults; the button opens a flow launcher first. (decisions 2, 8, 9 · D-8)
+- **A new `orchestrator` pane kind.** A run view alongside `chat | coder |
+  terminal`: flow + band header, the report at the top on completion, a collapsed
+  agent roster, expand-one-at-a-time to watch a single agent's live stream.
+  (decision 3 · D-7)
+- **Read-only flows, enforced HARD.** Every step runs as a qwen agent under
+  `--approval-mode plan` (`mode_id='plan'`): reads allowed, writes blocked at the
+  tool level. Flows never write the repo; the report is the only output.
+  (decision 5 revised · D-4)
+- **Execution reuses the dispatcher.** Each flow step is inserted as a normal
+  `tasks` row; the existing dispatcher runs it through the external-agent path and
+  streams AgentEvents → WS frames unchanged. One new `onTaskTerminal` hook advances
+  the flow. (decisions 1, 4 · D-2, D-3)
+- **Persisted + resumable runs.** New `flow_runs` / `flow_steps` tables in the
+  coder schema; a run survives a coder restart (`initResume` reconciles mid-flight
+  steps). Runs are reopenable from a history; the report is exportable on demand
+  (copy / save-file / send-to-chat). (decisions 4, 10 · D-5, D-9)
+- **Qwen-only, one model per run.** Default `qwen3.6-35b-a3b-mxfp4`, held as a
+  single config value so more local models slot in later. Multiple runs allowed,
+  each its own pane. (decisions 6, 11 · D-10)
+- **Conductor definitions re-homed.** The pure flow/spine/contracts/types/render
+  files + 23 personas are copied into `apps/coder/src/conductor/`; the Phase-1 CLI
+  stays alive. The evidence/yagni contracts and adversarial-validator gate are
+  preserved (the flow-runner builds each prompt in-process before dispatch).
+  (decision 1 · D-1)
+
+## Impact
+
+- **`apps/coder` (deploy: `sudo systemctl restart boocoder`):** new
+  `conductor/` defs, `flow-runner.ts`, the `onTaskTerminal` dispatcher hook,
+  `flow_runs`/`flow_steps` in `schema.sql`, `initResume`, `POST /api/runs` +
+  list/reopen routes.
+- **`packages/contracts` + `apps/web` (deploy: `docker compose up --build -d
+  boocode`):** two new WS frames (in all three registries), the `orchestrator`
+  pane kind + `OrchestratorPane.tsx`, the `Workflow` toolbar button + slash wiring,
+  `FlowLauncherDialog.tsx`, runs history + export.
+- **No `apps/server` chat-pipeline change** beyond the contracts frame registry
+  (the web type is the wire gate).
+- **Safety:** read-only is the whole feature's invariant; D-4 makes it a tool-level
+  gate, not a prompt. Reviewed by adversarial-security-analyst in R2.
+
+## Out of scope (carried from design-context)
+
+- A Claude execution path (Claude Code covers it).
+- Folding Arena into the Orchestrator (stays separate).
+- Per-agent model tiering (single model per run for now).
+- Pixel-faithful per-skill Han report templates (spine-level only).
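
The step-advance behavior described under "Execution reuses the dispatcher" and "Read-only flows, enforced HARD" can be sketched roughly as below. This is a minimal illustration and not the code in `flow-runner.ts`: every name here (`FlowRun`, `FlowStep`, `StepTask`, `buildStepTask`, `advanceOnTaskTerminal`) is hypothetical, state is held in memory rather than in the `flow_runs` / `flow_steps` tables, and steps are assumed to run one at a time, whereas the real flows fan out agents.

```typescript
// Hypothetical sketch only: none of these names are taken from apps/coder.

type StepStatus = "pending" | "running" | "done" | "failed";

interface FlowStep {
  id: string;
  persona: string;
  status: StepStatus;
}

interface FlowRun {
  id: string;
  status: "running" | "done" | "failed";
  steps: FlowStep[];
}

// Stand-in for the task row a step would be dispatched as.
interface StepTask {
  runId: string;
  stepId: string;
  agent: "qwen";
  modeId: "plan"; // read-only plan mode, per the proposal
}

// Build the task for a step. Fails closed: if qwen is not available,
// throw instead of falling back to a write-capable agent.
function buildStepTask(run: FlowRun, step: FlowStep, qwenAvailable: boolean): StepTask {
  if (!qwenAvailable) {
    throw new Error(`qwen unavailable; refusing to run step ${step.id}`);
  }
  return { runId: run.id, stepId: step.id, agent: "qwen", modeId: "plan" };
}

// Called when a step's task reaches a terminal state. Records the outcome,
// then returns the next task to dispatch, or null when the run is finished
// or has failed.
function advanceOnTaskTerminal(
  run: FlowRun,
  stepId: string,
  ok: boolean,
  qwenAvailable: boolean,
): StepTask | null {
  const step = run.steps.find((s) => s.id === stepId);
  if (!step) return null; // not a flow step; nothing to advance

  step.status = ok ? "done" : "failed";
  if (!ok) {
    run.status = "failed";
    return null;
  }

  const next = run.steps.find((s) => s.status === "pending");
  if (!next) {
    run.status = "done";
    return null;
  }

  try {
    const task = buildStepTask(run, next, qwenAvailable);
    next.status = "running";
    return task;
  } catch {
    // Fail closed: mark the step and the run failed rather than degrade.
    next.status = "failed";
    run.status = "failed";
    return null;
  }
}

// Usage: a two-step run whose first step has just finished successfully.
const demoRun: FlowRun = {
  id: "run-1",
  status: "running",
  steps: [
    { id: "s1", persona: "reader", status: "running" },
    { id: "s2", persona: "validator", status: "pending" },
  ],
};
console.log(advanceOnTaskTerminal(demoRun, "s1", true, true));
```

The point of the sketch is that the fail-closed check sits in the task builder, so the read-only guarantee does not depend on how a step's prompt is worded.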