feat: in-app Orchestrator (Phase 2) — multi-agent conductor

Brings the deterministic Han-flow conductor into BooCode: launch any read-only flow from BooChat or BooCoder, watch each agent stream live in a Paseo-style run pane, get an evidence-disciplined report — on local Qwen, persisted and resumable.

Read-only enforced hard via qwen --approval-mode plan (orchestrator tasks fail closed if qwen is unavailable; never fall to write-capable native).

Backend (apps/coder): re-homed conductor defs, flow_runs/flow_steps schema, flow-runner + dispatcher onTaskTerminal hook, restart-resume, runs routes (launch/list/get/cancel), user-channel WS.
Contracts: two flow_run_* frames.
Web: orchestrator pane kind + OrchestratorPane, Workflow button + slash flows (BooChat/BooCoder parity), FlowLauncherDialog, "New Orchestrator" in the + and split menus, runs history + export.

Plan: openspec/changes/orchestrator.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
146
conductor/README.md
Normal file
@@ -0,0 +1,146 @@
# BooCode Code Conductor

A deterministic, code-driven orchestrator for multi-agent flows, modelled on
[Han](https://github.com/testdouble/han). **Code owns the sequence; the model
only works.**

## Why this exists

Han's skills are conducted by a *model* — the driving LLM reads a skill and
decides when to dispatch each specialist agent. That works with a strong model
(Claude), but a local 35B model drops orchestration steps when it has to track
methodology *and* sequence agents at once (measured: the loose Han `research`
skill on Qwen 3.6 35B dropped the adversarial-validator step).

This conductor inverts that. A small TypeScript scheduler decides every step,
runs independent steps in parallel, folds their results, and dispatches each Han
agent as a **bounded single-task worker** — one `opencode run` per worker, with
the agent's persona baked in. The model never sees the workflow; it only does
one task and returns. That is the "external deterministic orchestrator drives
bounded workers" pattern, which holds on weak local models where
self-orchestration does not.

## The research flow (Han's spine, as code)

```
research-analyst ─┐
                  ├─▶ synthesis (code fold) ─▶ adversarial-validator ─▶ render
codebase-explorer ┘   (parallel angles)        (attacks the fold)
```

- `research-analyst` and `codebase-explorer` run **in parallel** (explorer only
  when `--repo` is given).
- A **code** step folds their full outputs verbatim — no 500-char summary, no
  model deciding order.
- `adversarial-validator` attacks the fold.
- `render` stitches the Han-shaped report, validation above recommendation.
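
The flow above can be sketched as plain data plus a small wave scheduler. This is a minimal illustration, not the code in `src/flow.ts`: the `SketchStep` shape, the `runWaves` helper, and the stubbed worker outputs are all assumptions made for the example, and real workers would be dispatched processes rather than inline functions.

```typescript
// Hypothetical shapes for illustration; the real Flow / Step types live in src/types.ts.
type StepFn = (inputs: Record<string, string>) => Promise<string>;

interface SketchStep {
  id: string;
  deps: string[];
  run: StepFn;
}

// Run steps in waves: every step whose deps are all done starts together,
// and the next wave waits (barrier) until the whole current wave has finished.
async function runWaves(steps: SketchStep[]): Promise<Record<string, string>> {
  const results: Record<string, string> = {};
  let pending = [...steps];
  while (pending.length > 0) {
    const ready = pending.filter((s) => s.deps.every((d) => d in results));
    if (ready.length === 0) {
      // A cycle or a dependency on an unknown step would otherwise loop forever.
      throw new Error(`unsatisfiable deps: ${pending.map((s) => s.id).join(", ")}`);
    }
    const outputs = await Promise.all(
      ready.map((s) => {
        const inputs: Record<string, string> = {};
        for (const d of s.deps) inputs[d] = results[d];
        return s.run(inputs);
      }),
    );
    ready.forEach((s, i) => {
      results[s.id] = outputs[i];
    });
    pending = pending.filter((s) => !ready.includes(s));
  }
  return results;
}

// The research flow as data. Worker outputs are stubbed strings.
const researchFlow: SketchStep[] = [
  { id: "research-analyst", deps: [], run: async () => "A1: web finding" },
  { id: "codebase-explorer", deps: [], run: async () => "E1: code finding" },
  {
    id: "synthesis",
    deps: ["research-analyst", "codebase-explorer"],
    // Code fold: join the full outputs verbatim, in a fixed order, with no model involved.
    run: async (i) => [i["research-analyst"], i["codebase-explorer"]].join("\n\n"),
  },
  {
    id: "adversarial-validator",
    deps: ["synthesis"],
    run: async (i) => `V1: reviewed ${i["synthesis"].length} chars`,
  },
  {
    id: "render",
    deps: ["synthesis", "adversarial-validator"],
    // Validation is placed above the findings, matching the report order described above.
    run: async (i) => `## Validation\n${i["adversarial-validator"]}\n\n## Findings\n${i["synthesis"]}`,
  },
];
```

Calling `runWaves(researchFlow)` resolves in four waves: the two angles together, then the fold, then the validator, then the render. Because order is fixed by `deps`, no worker needs to know what comes before or after it.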

## Run it

```bash
cd conductor
# (local-only: node_modules is symlinked to ../apps/coder/node_modules — gitignored)
node_modules/.bin/tsx src/run.ts <flow> "<question>" [--size=small|medium|large] [--repo=/abs/path] [--fast]

# examples:
node_modules/.bin/tsx src/run.ts research "polling vs webhooks for a third-party API?" --fast
node_modules/.bin/tsx src/run.ts architectural-analysis "the dispatcher" --repo=/opt/boocode/apps/coder --size=large
node_modules/.bin/tsx src/run.ts security-review "the auth middleware" --repo=/opt/boocode
node_modules/.bin/tsx src/run.ts investigate "tasks occasionally dispatch twice" --repo=/opt/boocode
```

Run with no args to list flows. Each run writes `conductor-report-<flow>-<slug>.md` and prints its path to stdout.

### Flows

22 flows are wired — run with no args to list them. By category:

- **Analysis / review:** `research`, `investigate`, `architectural-analysis`, `security-review`, `gap-analysis`, `data-review`, `devops-review`, `issue-triage`
- **Discovery / docs / tests:** `project-discovery`, `project-documentation`, `test-planning`
- **Planning** (one-pass drafts): `plan-a-feature`, `plan-implementation`, `plan-a-phased-build`, `plan-work-items`, `iterative-plan-review`
- **Authoring / reporting** (one-pass drafts): `adr`, `coding-standard`, `runbook`, `tdd`, `stakeholder-summary`
- **Bespoke pipeline:** `code-review` — per-dimension reviewers → adversarially verify each dimension (drops false positives)

Every spine flow ends with the **adversarial-validator** gate. `--size` selects how many angles fan out; `--fast` caps each worker's depth.

### Modes

- **Band** (`--size`, Han's small/medium/large): selects the roster breadth per flow. Small = the core angle(s); large = every angle.
- **Fast** (`--fast`): appends a speed directive to every worker (cap tool calls, decisive evidence only). Turns a ~12-min web-research worker into ~1 min.
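
As a rough sketch of how the two modes could work, the snippet below gates a roster by band and appends a speed directive to a brief. Everything in it is assumed for illustration: the `minBand` field, the angle names, and the directive wording are made up, and the conductor's real `Angle` / `Band` types and directive text may differ.

```typescript
// Hypothetical shapes for illustration; the real Angle / Band types live in src/types.ts.
type Band = "small" | "medium" | "large";

interface SketchAngle {
  agent: string;
  minBand: Band; // assumed field: smallest band at which this angle joins the roster
}

const BAND_ORDER: Band[] = ["small", "medium", "large"];

// Band gating: keep every angle whose minimum band is at or below the requested band.
function rosterFor(angles: SketchAngle[], band: Band): string[] {
  const limit = BAND_ORDER.indexOf(band);
  return angles
    .filter((a) => BAND_ORDER.indexOf(a.minBand) <= limit)
    .map((a) => a.agent);
}

// Placeholder wording; the conductor's actual speed directive is not shown in this README.
const FAST_DIRECTIVE = "Cap tool calls; gather decisive evidence only.";

// Fast mode: the same task text, with the directive appended when --fast is set.
function buildBrief(task: string, fast: boolean): string {
  return fast ? `${task}\n\n${FAST_DIRECTIVE}` : task;
}

// Made-up angle names, one per band.
const sketchAngles: SketchAngle[] = [
  { agent: "core-angle", minBand: "small" },
  { agent: "second-angle", minBand: "medium" },
  { agent: "third-angle", minBand: "large" },
];
```

With this roster, `rosterFor(sketchAngles, "small")` keeps only `core-angle`, while `"large"` keeps all three. The two modes are independent: band changes how many workers run, fast changes what each worker is told.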

### Config (env)

| var | default | meaning |
|---|---|---|
| `CONDUCTOR_MODEL` | `llama-swap/qwen3.6-35b-a3b-mxfp4` | model each worker runs on |
| `CONDUCTOR_OPENCODE_BIN` | `/home/samkintop/.opencode/bin/opencode` | opencode binary |
| `CONDUCTOR_TIMEOUT_MS` | `1500000` | per-worker timeout (25 min — strict web-research personas on a local 35B routinely run 10+ min) |
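
A minimal sketch of reading these variables with the defaults from the table. The `ConductorConfig` shape and the `loadConfig` helper are assumptions for illustration, not the conductor's actual config code; the validation of the timeout is likewise an added safeguard, not documented behaviour.

```typescript
// Hypothetical config shape; field names are chosen for the example.
interface ConductorConfig {
  model: string;
  opencodeBin: string;
  timeoutMs: number;
}

// Takes the environment as a plain record so the function is easy to test.
// In a Node program you would call it as loadConfig(process.env).
function loadConfig(env: Record<string, string | undefined>): ConductorConfig {
  const timeoutMs = Number(env.CONDUCTOR_TIMEOUT_MS ?? "1500000");
  if (!Number.isFinite(timeoutMs) || timeoutMs <= 0) {
    throw new Error(`CONDUCTOR_TIMEOUT_MS must be a positive number, got: ${env.CONDUCTOR_TIMEOUT_MS}`);
  }
  return {
    model: env.CONDUCTOR_MODEL ?? "llama-swap/qwen3.6-35b-a3b-mxfp4",
    opencodeBin: env.CONDUCTOR_OPENCODE_BIN ?? "/home/samkintop/.opencode/bin/opencode",
    timeoutMs,
  };
}

const defaults = loadConfig({});
const shortTimeout = loadConfig({ CONDUCTOR_TIMEOUT_MS: "60000" });
```

Here `defaults.timeoutMs / 60000` is 25, matching the 25-minute figure in the table, and `shortTimeout` shows a one-minute override.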

## Layout

- `src/types.ts` — `Flow` / `Step` / `StepContext`, plus `Spine` / `Angle` / `Band` (a Han skill as data).
- `src/dispatch.ts` — one bounded worker = one `opencode run` with a baked persona.
- `src/flow.ts` — the conductor: a dependency-aware wave scheduler (parallel fan-out, barrier on deps).
- `src/spine.ts` — the **factory**: compiles a `Spine` into a `Flow` (band gating, fold, optional synthesizer, validator gate, generic render).
- `src/flows/*.ts` — the Han skills as `Spine` configs + the registry (`index.ts`).
- `src/run.ts` — CLI.
- `agents/` — all 23 Han personas (the worker roster).

## Han coverage

All 23 Han **agents** are in `agents/` and dispatchable, and the full **skill** surface is wired (22 flows):

- **Spine-shaped (21):** fan-out → fold → optional synthesizer → adversarial-validator → render. Added as a `Spine` in `flows/`.
- **Bespoke (1):** `code-review` — a per-dimension find → verify-each-dimension pipeline (`flows/code-review.ts`).
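
The spine shape in the first bullet can be sketched as a small compile step that turns a declarative spine into an ordered list of steps with dependencies. This is an illustration only: `SpineShape`, `PlanStep`, `compileSpine`, and the `fold` step id are hypothetical names, and the real factory in `src/spine.ts` also handles band gating and rendering details not shown here.

```typescript
// Hypothetical shapes for illustration; not the types from src/types.ts.
interface SpineShape {
  angles: string[]; // agents that fan out in parallel
  synthesizer?: string; // optional agent that runs on the fold
}

interface PlanStep {
  id: string;
  deps: string[];
}

// fan-out -> fold -> optional synthesizer -> adversarial-validator -> render
function compileSpine(spine: SpineShape): PlanStep[] {
  const steps: PlanStep[] = spine.angles.map((agent) => ({ id: agent, deps: [] }));
  steps.push({ id: "fold", deps: [...spine.angles] });

  // The validator attacks whatever came last: the synthesizer output if present, else the fold.
  let last = "fold";
  if (spine.synthesizer) {
    steps.push({ id: spine.synthesizer, deps: [last] });
    last = spine.synthesizer;
  }
  steps.push({ id: "adversarial-validator", deps: [last] });
  steps.push({ id: "render", deps: [last, "adversarial-validator"] });
  return steps;
}

const researchPlan = compileSpine({ angles: ["research-analyst", "codebase-explorer"] });
```

Adding a new spine-shaped flow then means writing only the data (`angles`, optional `synthesizer`); the step order and the validator gate come from the factory and cannot be skipped by a flow definition.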

**Honesty about the one-pass drafts.** Han's planning/authoring skills (`plan-*`, `iterative-plan-review`, `tdd`, `adr`, `coding-standard`, `runbook`, `stakeholder-summary`) are designed as human-in-the-loop refinement loops. Run unattended here, they produce a first-draft artifact and still pass through the adversarial-validator gate, but they are *not* a substitute for the interactive refinement Han intends. Phase 2 (in-app) is where they get a real human-in-the-loop surface.

## Evidence & reference alignment (Han)

The conductor applies Han's two foundational rules, vendored verbatim in
`references/` and injected as contracts (`src/contracts.ts`):

- **`evidence-rule`** — every evidence-bearing flow brief and the validator carry
  it: trust classes (codebase / web / provided), the web **corroboration gate**
  (a single-source web claim is marked `[single-source]` and can't be the sole
  basis for a conclusion), codebase-as-current-state-anchor, and explicit
  **no-evidence labeling** with a reopen trigger. The validator additionally runs
  the rule's *reviewing* checklist — verified live: it flags single-source
  laundering and unnamed source contradictions.
- **`yagni-rule`** — flows that PRODUCE a committable artifact (`plan-*`, `adr`,
  `coding-standard`, `runbook`, `tdd`, `project-documentation`) carry
  `contracts: ['evidence', 'yagni']`: the inclusion gate (evidence-of-need),
  the simpler-version test, and a `## Deferred (YAGNI)` section for items that
  fail the gate. The validator runs the YAGNI review checklist on them.

Per flow, set `contracts` on the `Spine` (default `['evidence']`). The report
header states which rules were applied; the plain-language **Summary**, the
**Confidence** (High/Med/Low), and any deferrals live in the **Validation**
section.
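
A hedged sketch of the two mechanics described above: resolving a flow's contracts with the `['evidence']` default, and the logic of the corroboration gate. In the conductor the rule is injected into briefs as text and the agents apply it; the `labelClaim` function below only restates the gate's logic in code. `SpineWithContracts`, `WebClaim`, and both helpers are hypothetical names, and the header wording is invented.

```typescript
type Contract = "evidence" | "yagni";

// Hypothetical slice of a Spine; only the field discussed above is modelled.
interface SpineWithContracts {
  name: string;
  contracts?: Contract[];
}

// Flows that set nothing get the evidence rule alone.
function contractsFor(spine: SpineWithContracts): Contract[] {
  return spine.contracts ?? ["evidence"];
}

// Invented header wording, for illustration only.
function reportHeader(spine: SpineWithContracts): string {
  return `Rules applied: ${contractsFor(spine).join(", ")}`;
}

interface WebClaim {
  text: string;
  sources: string[]; // URLs or source names backing the claim
}

// Corroboration gate: a web claim backed by fewer than two distinct sources
// is marked so it cannot pass as the sole basis for a conclusion.
function labelClaim(claim: WebClaim): string {
  const distinct = new Set(claim.sources).size;
  return distinct < 2 ? `[single-source] ${claim.text}` : claim.text;
}

const adrHeader = reportHeader({ name: "adr", contracts: ["evidence", "yagni"] });
const researchHeader = reportHeader({ name: "research" });
```

Note that citing the same source twice still counts as single-source here, which is why the count is taken over a `Set`.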

**Template fidelity — partial, by design.** Han skills render a fixed per-skill
template (e.g. `research-report-template.md`'s Sources registry). This conductor
follows that template *spine* — sourced/numbered evidence (`A#`/`E#`) from the
agents, then Validation with `V#` + Confidence — but does **not** run a model to
re-assemble a pixel-faithful template (that would reintroduce model-driven
rendering the conductor deliberately avoids). The agents emit the sourced
sections; the conductor stitches them in template order. Exact per-skill template
rendering is available as a follow-up if wanted (vendor the template, pass it to
the terminal agent).

## Phase 2 (in-app integration) — not started

This standalone conductor is Phase 1. Phase 2 moves it into `apps/coder`: persist flow/step rows to Postgres, dispatch through the existing `AgentBackend`s (instead of shelling `opencode run`), expose an API route, and surface launch/watch in the CoderPane. See `docs/research/2026-06-03-boocode-orchestration-integration.md`.

## Not done yet (deliberate v1 scope)

- **Tool-permission safety.** Workers run on opencode's default agent, so the
  persona (read-only by charter) is trusted to stay read-only. v2: ship
  `mode: all`, `edit: deny` agent files and dispatch via `--agent` for
  enforced read-only.
- **Conditional / dynamic flows.** Steps are static (band-gated). No
  data-dependent branching or dynamic agent selection yet — the scheduler could
  support it, but no flow needs it so far.
- **Output parsing.** Worker output is the cleaned default `opencode run` text
  (banner + tool-progress lines stripped). A `--format json` parser would be
  more robust.