Files

indifferentketchup 1937af8df9 feat: in-app Orchestrator (Phase 2) — multi-agent conductor

Brings the deterministic Han-flow conductor into BooCode: launch any read-only
flow from BooChat or BooCoder, watch each agent stream live in a Paseo-style
run pane, get an evidence-disciplined report — on local Qwen, persisted and
resumable. Read-only enforced hard via qwen --approval-mode plan (orchestrator
tasks fail closed if qwen is unavailable; never fall to write-capable native).

Backend (apps/coder): re-homed conductor defs, flow_runs/flow_steps schema,
flow-runner + dispatcher onTaskTerminal hook, restart-resume, runs routes
(launch/list/get/cancel), user-channel WS. Contracts: two flow_run_* frames.
Web: orchestrator pane kind + OrchestratorPane, Workflow button + slash flows
(BooChat/BooCoder parity), FlowLauncherDialog, "New Orchestrator" in the + and
split menus, runs history + export. Plan: openspec/changes/orchestrator.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-03 15:22:48 +00:00

agents
references
src
.gitignore
package.json
README.md
tsconfig.json

README.md

BooCode Code Conductor

A deterministic, code-driven orchestrator for multi-agent flows, modelled on Han. Code owns the sequence; the model only works.

Why this exists

Han's skills are conducted by a model — the driving LLM reads a skill and decides when to dispatch each specialist agent. That works with a strong model (Claude), but a local 35B model drops orchestration steps when it has to track methodology and sequence agents at once (measured: the loose Han research skill on Qwen 3.6 35B dropped the adversarial-validator step).

This conductor inverts that. A small TypeScript scheduler decides every step, runs independent steps in parallel, folds their results, and dispatches each Han agent as a bounded single-task worker: one opencode run per worker, with the agent's persona baked in. The model never sees the workflow; it only does one task and returns. That is the "external deterministic orchestrator drives bounded workers" pattern, which holds on weak local models where self-orchestration does not.
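A minimal sketch of what "bounded single-task worker" could look like in code, assuming a hypothetical `RunFn` that stands in for one opencode run. The names `bakePersona` and `dispatchWorker`, and the prompt layout, are illustrative and not taken from src/dispatch.ts.

```typescript
// Hypothetical stand-in for "one opencode run": prompt in, text out.
type RunFn = (prompt: string) => Promise<string>;

// Bake the persona into the prompt so the worker sees only its role and one task.
// The separator and wording are assumptions, not the conductor's real format.
function bakePersona(persona: string, task: string): string {
  return `${persona.trim()}\n\n---\n\nYour single task:\n${task.trim()}`;
}

// Bound the worker in time: whichever settles first wins, the run or the timer.
async function dispatchWorker(
  run: RunFn,
  persona: string,
  task: string,
  timeoutMs: number,
): Promise<string> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`worker timed out after ${timeoutMs} ms`)),
      timeoutMs,
    );
  });
  try {
    return await Promise.race([run(bakePersona(persona, task)), timeout]);
  } finally {
    // Always clear the timer so a finished worker does not keep the process alive.
    clearTimeout(timer);
  }
}
```

The worker receives no information about other steps; sequencing stays entirely in the caller.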

The research flow (Han's spine, as code)

research-analyst ─┐
                  ├─▶ synthesis (code fold) ─▶ adversarial-validator ─▶ render
codebase-explorer ┘   (parallel angles)        (attacks the fold)

research-analyst and codebase-explorer run in parallel (explorer only when --repo is given).
A code step folds their full outputs verbatim — no 500-char summary, no model deciding order.
adversarial-validator attacks the fold.
render stitches the Han-shaped report, validation above recommendation.
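The four steps above can be sketched as plain code. This is an illustration under assumptions, not the conductor's source: `Worker` is a hypothetical function that runs one agent on one brief, and the report layout here is simplified (it does not reproduce the real section order).

```typescript
// Hypothetical worker signature: agent name and brief in, the agent's text out.
type Worker = (agent: string, brief: string) => Promise<string>;

interface ResearchOptions {
  question: string;
  repo?: string; // mirrors --repo; the explorer only runs when this is set
}

async function researchFlow(run: Worker, opts: ResearchOptions): Promise<string> {
  // Parallel angles, chosen by code rather than by a model.
  const angles = ["research-analyst", ...(opts.repo ? ["codebase-explorer"] : [])];
  const outputs = await Promise.all(
    angles.map(async (agent) => ({ agent, text: await run(agent, opts.question) })),
  );

  // Code fold: full outputs, verbatim, in a fixed order. No summarising.
  const fold = outputs.map((o) => `## ${o.agent}\n\n${o.text}`).join("\n\n");

  // The validator attacks the fold, not the individual workers.
  const validation = await run("adversarial-validator", fold);

  // Simplified render: findings followed by validation.
  return [`# Research: ${opts.question}`, fold, "## Validation", validation].join("\n\n");
}
```

Because the order is fixed in code, the validator step cannot be skipped regardless of which model runs the workers.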

Run it

cd conductor
# (local-only: node_modules is symlinked to ../apps/coder/node_modules — gitignored)
node_modules/.bin/tsx src/run.ts <flow> "<question>" [--size=small|medium|large] [--repo=/abs/path] [--fast]

# examples:
node_modules/.bin/tsx src/run.ts research "polling vs webhooks for a third-party API?" --fast
node_modules/.bin/tsx src/run.ts architectural-analysis "the dispatcher" --repo=/opt/boocode/apps/coder --size=large
node_modules/.bin/tsx src/run.ts security-review "the auth middleware" --repo=/opt/boocode
node_modules/.bin/tsx src/run.ts investigate "tasks occasionally dispatch twice" --repo=/opt/boocode

Run with no args to list flows. Writes conductor-report-<flow>-<slug>.md, prints its path on stdout.

Flows

22 flows are wired — run with no args to list them. By category:

Analysis / review: research, investigate, architectural-analysis, security-review, gap-analysis, data-review, devops-review, issue-triage
Discovery / docs / tests: project-discovery, project-documentation, test-planning
Planning (one-pass drafts): plan-a-feature, plan-implementation, plan-a-phased-build, plan-work-items, iterative-plan-review
Authoring / reporting (one-pass drafts): adr, coding-standard, runbook, tdd, stakeholder-summary
Bespoke pipeline: code-review — per-dimension reviewers → adversarially verify each dimension (drops false positives)

Every spine flow ends with the adversarial-validator gate. --size selects how many angles fan out; --fast caps each worker's depth.

Modes

Band (--size, Han's small/medium/large): selects the roster breadth per flow. Small = the core angle(s); large = every angle.
Fast (--fast): appends a speed directive to every worker (cap tool calls, decisive evidence only). Turns a ~12-min web-research worker into ~1 min.
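One way the two modes could be expressed, as a sketch under assumptions: the `minBand` field, the placeholder angle names, and the directive wording are all hypothetical, since the real Angle and Band types in src/types.ts are not shown here.

```typescript
type Band = "small" | "medium" | "large";

// Hypothetical angle shape: each angle names the smallest band that includes it.
interface Angle {
  agent: string;
  minBand: Band;
}

const BAND_RANK: Record<Band, number> = { small: 0, medium: 1, large: 2 };

// Band gating: keep every angle whose minimum band is at or below the requested band.
function selectAngles(angles: Angle[], band: Band): Angle[] {
  return angles.filter((a) => BAND_RANK[a.minBand] <= BAND_RANK[band]);
}

// Assumed wording; the real speed directive is not quoted in this README.
const FAST_DIRECTIVE = "Speed mode: cap tool calls and gather decisive evidence only.";

// Fast mode: append the directive to the worker's brief, otherwise leave it untouched.
function buildBrief(task: string, fast: boolean): string {
  return fast ? `${task}\n\n${FAST_DIRECTIVE}` : task;
}

// Placeholder roster for illustration only.
const roster: Angle[] = [
  { agent: "core-angle", minBand: "small" },
  { agent: "second-angle", minBand: "medium" },
  { agent: "extra-angle", minBand: "large" },
];

console.log(selectAngles(roster, "medium").map((a) => a.agent));
```

With this shape, small selects only the core angle and large selects the whole roster, matching the description above.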

Config (env)

| var | default | meaning |
| --- | --- | --- |
| `CONDUCTOR_MODEL` | `llama-swap/qwen3.6-35b-a3b-mxfp4` | model each worker runs on |
| `CONDUCTOR_OPENCODE_BIN` | `/home/samkintop/.opencode/bin/opencode` | opencode binary |
| `CONDUCTOR_TIMEOUT_MS` | `1500000` | per-worker timeout (25 min; strict web-research personas on a local 35B routinely run 10+ min) |
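A small sketch of reading these variables with the documented defaults. The `ConductorConfig` shape and `loadConfig` name are hypothetical; the environment is passed in as a plain record so the function stays easy to test.

```typescript
interface ConductorConfig {
  model: string;
  opencodeBin: string;
  timeoutMs: number;
}

// Takes the environment as an argument (for example process.env) instead of reading it globally.
function loadConfig(env: Record<string, string | undefined>): ConductorConfig {
  const timeoutMs = Number(env["CONDUCTOR_TIMEOUT_MS"] ?? "1500000");
  // Reject values such as "abc", "0", or "-5" rather than silently running unbounded.
  if (!Number.isFinite(timeoutMs) || timeoutMs <= 0) {
    throw new Error("CONDUCTOR_TIMEOUT_MS must be a positive number of milliseconds");
  }
  return {
    model: env["CONDUCTOR_MODEL"] ?? "llama-swap/qwen3.6-35b-a3b-mxfp4",
    opencodeBin: env["CONDUCTOR_OPENCODE_BIN"] ?? "/home/samkintop/.opencode/bin/opencode",
    timeoutMs,
  };
}

// With nothing set, the defaults from the table apply (1500000 ms = 25 min).
console.log(loadConfig({}).timeoutMs / 60000);
```

In a CLI entry point this would typically be called once as `loadConfig(process.env)`.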

Layout

src/types.ts — Flow / Step / StepContext, plus Spine / Angle / Band (a Han skill as data).
src/dispatch.ts — one bounded worker = one opencode run with a baked persona.
src/flow.ts — the conductor: a dependency-aware wave scheduler (parallel fan-out, barrier on deps).
src/spine.ts — the factory: compiles a Spine into a Flow (band gating, fold, optional synthesizer, validator gate, generic render).
src/flows/*.ts — the Han skills as Spine configs + the registry (index.ts).
src/run.ts — CLI.
agents/ — all 23 Han personas (the worker roster).
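The "dependency-aware wave scheduler" idea behind src/flow.ts can be illustrated as follows. This is a sketch under assumptions: the `Step` shape and `runFlow` name are hypothetical and the real types in src/types.ts may differ.

```typescript
// Hypothetical step shape: an id, the ids it waits on, and the work itself.
interface Step {
  id: string;
  deps: string[];
  run: (results: ReadonlyMap<string, string>) => Promise<string>;
}

async function runFlow(steps: Step[]): Promise<Map<string, string>> {
  const results = new Map<string, string>();
  let pending = [...steps];

  while (pending.length > 0) {
    // A wave is every pending step whose dependencies have all finished.
    const wave = pending.filter((s) => s.deps.every((d) => results.has(d)));
    if (wave.length === 0) {
      // Nothing is runnable: a cycle or a dependency on an unknown step.
      throw new Error(`unsatisfiable deps: ${pending.map((s) => s.id).join(", ")}`);
    }

    // Parallel fan-out; awaiting Promise.all is the barrier before the next wave.
    const finished = await Promise.all(
      wave.map(async (s) => [s.id, await s.run(results)] as const),
    );
    for (const [id, output] of finished) results.set(id, output);

    pending = pending.filter((s) => !results.has(s.id));
  }
  return results;
}
```

Declaration order does not matter here: a step listed first still waits until everything in its `deps` has produced a result.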

Han coverage

All 23 Han agents are in agents/ and dispatchable, and the full skill surface is wired (22 flows):

Spine-shaped (21): fan-out → fold → optional synthesizer → adversarial-validator → render. Added as a Spine in flows/.
Bespoke (1): code-review — a per-dimension find → verify-each-dimension pipeline (flows/code-review.ts).

Honesty about the one-pass drafts. Han's planning/authoring skills (plan-*, iterative-plan-review, tdd, adr, coding-standard, runbook, stakeholder-summary) are designed as human-in-the-loop loops. Run unattended here they produce a first-draft artifact and still take the adversarial-validator gate, but they are not a substitute for the interactive refinement Han intends. Phase 2 (in-app) is where they get a real human-in-the-loop surface.

Evidence & reference alignment (Han)

The conductor applies Han's two foundational rules, vendored verbatim in references/ and injected as contracts (src/contracts.ts):

evidence-rule — every evidence-bearing flow brief and the validator carry it: trust classes (codebase / web / provided), the web corroboration gate (a single-source web claim is marked [single-source] and can't be the sole basis for a conclusion), codebase-as-current-state-anchor, and explicit no-evidence labeling with a reopen trigger. The validator additionally runs the rule's reviewing checklist — verified live: it flags single-source laundering and unnamed source contradictions.
yagni-rule — flows that PRODUCE a committable artifact (plan-*, adr, coding-standard, runbook, tdd, project-documentation) carry contracts: ['evidence', 'yagni']: the inclusion gate (evidence-of-need), the simpler-version test, and a ## Deferred (YAGNI) section for items that fail the gate. The validator runs the YAGNI review checklist on them.

Per flow, set contracts on the Spine (default ['evidence']). The report header states which rules were applied; the plain-language Summary, the Confidence (High/Med/Low), and any deferrals live in the Validation section.
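As a rough illustration of per-flow contracts with the `['evidence']` default, assuming a hypothetical `SpineLike` shape and placeholder rule text (the real rules are vendored in references/ and injected by src/contracts.ts, neither of which is reproduced here):

```typescript
type Contract = "evidence" | "yagni";

// Hypothetical, reduced view of a Spine: only the fields this sketch needs.
interface SpineLike {
  name: string;
  contracts?: Contract[];
}

// Placeholders standing in for the vendored rule text.
const CONTRACT_TEXT: Record<Contract, string> = {
  evidence: "<evidence-rule text>",
  yagni: "<yagni-rule text>",
};

// Default to the evidence rule when a flow does not declare contracts.
function contractsFor(spine: SpineLike): Contract[] {
  return spine.contracts ?? ["evidence"];
}

// Append each applicable rule to a worker or validator brief.
function withContracts(brief: string, spine: SpineLike): string {
  const rules = contractsFor(spine).map((c) => CONTRACT_TEXT[c]);
  return [brief, ...rules].join("\n\n");
}

// The report header states which rules were applied.
function reportHeader(spine: SpineLike): string {
  return `Rules applied: ${contractsFor(spine).join(", ")}`;
}

console.log(reportHeader({ name: "adr", contracts: ["evidence", "yagni"] }));
```

Keeping the contract list as data on the flow means the header and the injected rules cannot drift apart.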

Template fidelity — partial, by design. Han skills render a fixed per-skill template (e.g. research-report-template.md's Sources registry). This conductor follows that template spine — sourced/numbered evidence (A#/E#) from the agents, then Validation with V# + Confidence — but does not run a model to re-assemble a pixel-faithful template (that would reintroduce model-driven rendering the conductor deliberately avoids). The agents emit the sourced sections; the conductor stitches them in template order. Exact per-skill template rendering is available as a follow-up if wanted (vendor the template, pass it to the terminal agent).

Phase 2 (in-app integration) — not started

This standalone conductor is Phase 1. Phase 2 moves it into apps/coder: persist flow/step rows to Postgres, dispatch through the existing AgentBackends (instead of shelling opencode run), expose an API route, and surface launch/watch in the CoderPane. See docs/research/2026-06-03-boocode-orchestration-integration.md.

Not done yet (deliberate v1 scope)

Tool-permission safety. Workers run on opencode's default agent, so the persona (read-only by charter) is trusted to stay read-only. v2: ship agent files with mode: all and edit: deny, and dispatch via --agent for enforced read-only.
Conditional / dynamic flows. Steps are static (band-gated). There is no data-dependent branching or dynamic agent selection yet; the scheduler could support it, but no flow needs it so far.
Output parsing. Worker output is the cleaned default opencode run text (banner + tool-progress lines stripped). A --format json parser would be more robust.