indifferentketchup edc348baf3 docs: changelog for v2.7.17-orchestrator + orchestration research

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-03 15:25:59 +00:00

Research: Can we lift Paseo's orchestration into BooCode so it can orchestrate different agents?

Open-ended question: Can we lift Paseo's multi-agent orchestration into BooCode, and can we get this done? — Evidence mode: strict.

Summary

Yes, this is achievable — but "lifting Paseo's orchestrator" turns out to be the wrong mental model, because Paseo has no orchestration engine to lift. Paseo's daemon is a process supervisor; the actual sequencing is done by the parent model, which calls create_agent / wait_for_agent / archive_agent as MCP tools inside its own reasoning loop. So the real choice is between two different things: (a) copy that model-drives-itself pattern, which is simple and Paseo-proven but fragile when a weak local model is the conductor, or (b) build a small deterministic sequencing layer in BooCode's own code, which is more work but reliable regardless of model.

BooCode already owns roughly the worker half of this: dispatch, four agent backends, parallel fan-out, per-agent git worktrees, and resumable sessions. What it lacks is any notion of "do step B after step A and feed A's output into B." Which path is right hinges on one unanswered question: who conducts, a strong model like Claude or a weak local model like Qwen? With Claude, the simple copy-Paseo path wins and a deterministic engine would be over-engineering. With Qwen, the deterministic-code path is necessary, because a 35B local model self-orchestrating is the setup that dropped a step live in this session (A25, n=1). Until that conductor question is answered, there is no single winner, only a fork with a clear deciding criterion.

Confidence: Medium

Research Results

BooCode already owns the worker layer (A15–A19). Task dispatch runs off a Postgres LISTEN/NOTIFY 'tasks_new' fast path plus a poll fallback (A15). Four execution backends sit behind a common AgentBackend interface — opencode-server, warm-ACP, claude-SDK, and one-shot ACP/PTY (A18). Arena already fans the same prompt out to 2–5 agents in parallel, each with its own task and worktree (A16). Per-task and per-session git worktrees, resumable agent_sessions, and output capture (full text to messages.content, a diff to pending_changes) all exist. This half is real, working code.
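The fast-path-plus-fallback shape described above can be sketched in memory. This is only an illustration under stated assumptions: `makeDispatcher`, `onNotify`, `poll`, and `drain` are hypothetical names, the array stands in for the Postgres tasks table, and nothing here is taken from dispatcher.ts. The point is that both triggers funnel into one claim routine, so a missed notification costs latency rather than correctness.

```typescript
// In-memory stand-in for the tasks table (the real one lives in Postgres).
type TaskState = "pending" | "running";
interface Task {
  id: string;
  state: TaskState;
}

// Hypothetical dispatcher: both triggers funnel into the same drain().
function makeDispatcher(tasks: Task[]) {
  const started: string[] = [];

  // Claim each pending task once, whichever trigger fired.
  function drain(): number {
    let claimed = 0;
    for (const task of tasks) {
      if (task.state !== "pending") continue;
      task.state = "running";
      started.push(task.id);
      claimed++;
    }
    return claimed;
  }

  return {
    started,
    onNotify: drain, // fast path: a 'tasks_new' notification arrived
    poll: drain, // fallback: timer tick, in case a notification was missed
  };
}

const tasks: Task[] = [
  { id: "t1", state: "pending" },
  { id: "t2", state: "pending" },
];
const dispatcher = makeDispatcher(tasks);
const viaNotify = dispatcher.onNotify(); // claims both pending tasks
const viaPoll = dispatcher.poll(); // finds nothing left to claim
```

Because the poll re-checks state rather than trusting the notification, running it after the fast path is harmless.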

But the sequencing substrate genuinely does not exist (A16, A17, A19). The dispatcher's poll selects on state='pending' alone — it never reads parent_task_id, and there is no depends_on, step_index, flows table, or fold/synthesis code anywhere (A17, verified by the validator against dispatcher.ts:105-110). Arena is parallel-only and single-step; its "winner selection" is a [SELECTED] string prefix that no code downstream consumes (A16). The one inter-task channel a parent can cheaply read — tasks.output_summary — is capped at 500 characters in every completion path (A19), far too small to pass a real artifact (a research finding, a diff) from one step into the next. So the existing substrate is a head-start on dispatch but actively wrong-shaped for data flow between steps.
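The gap can be made concrete with a small in-memory comparison. The sketch below is an assumption-laden illustration, not BooCode code: `dependsOn` is a hypothetical column (no `depends_on` exists today), and `pendingOnly` / `readyToRun` are invented names. The first function mirrors the state-only selection described above; the second shows what a dependency-aware selection would have to add.

```typescript
type TaskState = "pending" | "running" | "done" | "failed";
interface TaskRow {
  id: string;
  state: TaskState;
  dependsOn: string[]; // HYPOTHETICAL: no such column exists today
}

// Selection as described above: state alone decides.
function pendingOnly(rows: TaskRow[]): string[] {
  return rows.filter((r) => r.state === "pending").map((r) => r.id);
}

// Dependency-aware variant: pending AND every dependency is done.
function readyToRun(rows: TaskRow[]): string[] {
  const stateById: Record<string, TaskState> = {};
  for (const r of rows) stateById[r.id] = r.state;
  return rows
    .filter(
      (r) =>
        r.state === "pending" &&
        r.dependsOn.every((dep) => stateById[dep] === "done"),
    )
    .map((r) => r.id);
}

const rows: TaskRow[] = [
  { id: "research", state: "done", dependsOn: [] },
  { id: "validate", state: "pending", dependsOn: ["research"] },
  { id: "fold", state: "pending", dependsOn: ["validate"] },
];
const naive = pendingOnly(rows); // would start "fold" too early
const gated = readyToRun(rows); // only "validate" is eligible
```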

Paseo's orchestrator is liftable as plumbing, but it isn't a conductor (A20–A24). AgentManager is a plain TypeScript class with no React-Native/Zustand dependency (A20, confirmed), and the MCP tool server exposing create_agent/wait_for_agent/archive_agent sits cleanly on top (A21). Agent state is JSON files on disk, parent/child is a single label on a flat map (A23, A24) — trivial to replicate. The decisive finding: Paseo has no deterministic DAG or sequencing engine (A22). The parent agent — itself a running ACP process — decides the order by making MCP tool calls in its own loop. Paseo's daemon only spawns, waits, and bookkeeps.
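To show how little the supervisor side involves, here is a minimal sketch of a flat agent map where the parent/child relationship is a single label and archiving cascades to children. It assumes nothing about Paseo's actual signatures: `Supervisor`, `PARENT_LABEL`, and every method name are hypothetical, and storage is an in-memory object rather than JSON files on disk. Note what is absent: there is no ordering logic anywhere in it.

```typescript
type AgentStatus = "running" | "archived";
interface AgentRecord {
  id: string;
  status: AgentStatus;
  labels: Record<string, string>;
}

const PARENT_LABEL = "parent-agent-id"; // hypothetical label key

class Supervisor {
  private agents: Record<string, AgentRecord> = {}; // flat map, no tree
  private nextId = 1;

  createAgent(parentId?: string): string {
    const id = `agent-${this.nextId++}`;
    const labels: Record<string, string> = {};
    if (parentId !== undefined) labels[PARENT_LABEL] = parentId;
    this.agents[id] = { id, status: "running", labels };
    return id;
  }

  statusOf(id: string): AgentStatus | undefined {
    const agent = this.agents[id];
    return agent ? agent.status : undefined;
  }

  // Children are found by scanning labels, not by following pointers.
  childrenOf(parentId: string): string[] {
    return Object.keys(this.agents).filter((id) => {
      const agent = this.agents[id];
      return agent !== undefined && agent.labels[PARENT_LABEL] === parentId;
    });
  }

  // Archiving a parent cascades to its children first.
  archiveAgent(id: string): void {
    for (const child of this.childrenOf(id)) this.archiveAgent(child);
    const agent = this.agents[id];
    if (agent) agent.status = "archived";
  }
}

const supervisor = new Supervisor();
const conductorId = supervisor.createAgent();
const workerA = supervisor.createAgent(conductorId);
const workerB = supervisor.createAgent(conductorId);
const unrelatedId = supervisor.createAgent();
supervisor.archiveAgent(conductorId);
```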

The web evidence argues that model-self-orchestration is fragile for weak models, and recommends deterministic code sequencing (A4, A5, A6, A9) — corroborated live (A25). Disinterested academic work reports a deterministic engine calling the model only for bounded sub-tasks beats self-orchestration by ~10 points with large reductions in turns/tool-calls (A6), and that flat-context steering accuracy falls from ~60% to ~21% as agent count climbs (A9, though this is the 3→10-agent regime, not a 2–3-step bounded flow). In this very session, opencode/Qwen-35B ran the han research skill, dispatched the first specialist, then dropped the adversarial-validator step entirely (A25) — a live, n=1 instance of exactly this failure. The lowest-ops way to host deterministic sequencing on a single-user Postgres stack is in-process Postgres-backed execution; Temporal is too heavy for one user (A1, A12), Restate is lighter but a second stateful service (A2, A3), Inngest isn't truly self-hostable (A13, A14).

Conflict surfaced: the prior-art angle pushes toward a deterministic engine; the Paseo codebase proves the opposite pattern (model self-orchestration) ships and works — with a capable parent model. The two only reconcile once you fix who the conductor is.

Options to Consider

O-A: Copy Paseo's pattern — model self-orchestrates via an MCP toolbox

What it is: Expose create_agent / wait_for_agent / archive_agent (and friends) as MCP tools over BooCode's existing backends; a parent agent sequences sub-agents from inside its own reasoning loop. Optionally lift Paseo's AgentManager + MCP server (~5 files, no RN deps) rather than writing the toolbox fresh.
Trade-offs: Lowest effort and Paseo-proven — with a strong parent model. Fragile with a weak local conductor: non-deterministic, workflow state hidden in the model's context, no crash-replay. This is the pattern that dropped a step on Qwen in this session (A25).
Rests on: (A20, A21, A22, A24) — and the con on (A4, A5, A6, A9, A25)
Evidence status: corroborated
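A rough sketch of O-A's shape, and of why its reliability depends on the conductor. The three tool names come from the text above; everything else (`Backend`, `makeToolbox`, the two conductors, the synchronous signatures) is a hypothetical simplification, with plain functions standing in for a model's reasoning loop and no real MCP transport involved.

```typescript
// Stand-in for one of the existing backends: prompt in, text out.
type Backend = (prompt: string) => string;

interface Toolbox {
  create_agent(prompt: string): string; // returns an agent id
  wait_for_agent(id: string): string; // returns the agent's output
  archive_agent(id: string): void;
}

function makeToolbox(backend: Backend, callLog: string[]): Toolbox {
  const outputs: Record<string, string> = {};
  let counter = 0;
  return {
    create_agent(prompt) {
      const id = `a${++counter}`;
      outputs[id] = backend(prompt);
      callLog.push(`create:${prompt}`);
      return id;
    },
    wait_for_agent(id) {
      const output = outputs[id];
      if (output === undefined) throw new Error(`unknown agent ${id}`);
      return output;
    },
    archive_agent(id) {
      delete outputs[id];
    },
  };
}

// A conductor is whatever decides the call order.
type Conductor = (tools: Toolbox) => string;

const carefulConductor: Conductor = (tools) => {
  const research = tools.wait_for_agent(tools.create_agent("research"));
  return tools.wait_for_agent(tools.create_agent(`validate: ${research}`));
};

// Skips validation; the toolbox cannot notice, because the plan lives
// only inside the conductor.
const forgetfulConductor: Conductor = (tools) =>
  tools.wait_for_agent(tools.create_agent("research"));

const echoBackend: Backend = (prompt) => `done(${prompt})`;
const carefulLog: string[] = [];
const forgetfulLog: string[] = [];
const carefulResult = carefulConductor(makeToolbox(echoBackend, carefulLog));
const forgetfulResult = forgetfulConductor(makeToolbox(echoBackend, forgetfulLog));
```

Both runs complete without error. The toolbox has no record of an intended plan, so a skipped step is invisible to it.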

O-B: Build a deterministic code flow-runner on BooCode's own primitives

What it is: Add step sequencing in code — a depends_on / step-state column on the existing tasks table, dispatched by the existing LISTEN/NOTIFY poll; a real result-passing channel (replacing the 500-char output_summary); and a fold step. Agents stay bounded single-task workers.
Trade-offs: Reliable regardless of conductor model, and crash-resumable. But the missing sequencing, result-passing, and fold are greenfield: the reused worker layer is plumbing, while this new layer is where the schedule risk and the distributed-systems hazards live. The hand-rolled variant is genuinely near-zero new infra; a library (DBOS Transact) is not, given BooCode's host-systemd/Docker split and dual-schema DB (see V6). DBOS is an unvalidated option, not a co-equal one.
Rests on: (A15, A16, A17, A19) for the substrate; (A6) for the architecture; (A1, A7, A8, A10) for the tooling angle — the last two vendor-sourced and caveated.
Evidence status: architecture corroborated; specific tooling (DBOS/pg-workflows) single-source (caveated)
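A minimal in-memory sketch of the hand-rolled O-B idea: code decides the order, each step receives the full output of its dependencies, and a fold is simply a step with several dependencies. All names (`Step`, `FlowState`, `runFlow`) are hypothetical, workers are modeled as synchronous functions to keep it short, and the `saved` argument stands in for step state that a real implementation would persist in Postgres.

```typescript
interface Step {
  id: string;
  dependsOn: string[];
  // A bounded single-task worker: dependency outputs in, one output back.
  run: (inputs: Record<string, string>) => string;
}

// Step id -> full output. Doubles as the result-passing channel.
type FlowState = Record<string, string>;

function runFlow(steps: Step[], saved: FlowState = {}): FlowState {
  const state: FlowState = { ...saved };
  let progressed = true;
  while (progressed) {
    progressed = false;
    for (const step of steps) {
      if (state[step.id] !== undefined) continue; // done earlier (resume)
      const inputs: Record<string, string> = {};
      let ready = true;
      for (const dep of step.dependsOn) {
        const value = state[dep];
        if (value === undefined) {
          ready = false;
          break;
        }
        inputs[dep] = value;
      }
      if (!ready) continue;
      state[step.id] = step.run(inputs);
      progressed = true;
    }
  }
  const stuck = steps.filter((s) => state[s.id] === undefined).map((s) => s.id);
  if (stuck.length > 0) {
    throw new Error(`steps never became ready: ${stuck.join(", ")}`);
  }
  return state;
}

let researchRuns = 0;
const flow: Step[] = [
  // Declared out of order on purpose: dependencies decide the order.
  { id: "fold", dependsOn: ["research", "validate"], run: (i) => `${i["research"]} | ${i["validate"]}` },
  { id: "validate", dependsOn: ["research"], run: (i) => `checked(${i["research"]})` },
  { id: "research", dependsOn: [], run: () => { researchRuns++; return "finding"; } },
];
const fresh = runFlow(flow);
// Resume with research already recorded: it is not re-run.
const resumed = runFlow(flow, { research: "finding" });
```

The validator step cannot be dropped here, because no model is asked whether to run it.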

O-C: Hybrid — lift Paseo's ACP supervisor as the worker layer, code orchestrator on top

What it is: Replace BooCode's backends with Paseo's lifted ACP client + AgentManager, then put an O-B deterministic conductor over it.
Trade-offs: Net-negative. BooCode already has four working backends; importing Paseo's provider tree (heavy intra-package @getpaseo/protocol coupling, per V7) swaps working code for a foreign dependency. Recommended against.
Rests on: (A18, A20) — and V7
Evidence status: corroborated (against)

Recommendation

Recommendation: No single winner until the conductor model is fixed — this is the deciding criterion.
- If the conductor is a strong model (Claude, in-stack today): choose O-A. It is the simpler, Paseo-proven path, and building a deterministic engine would be over-engineering beyond what was asked. Expose the MCP toolbox over BooCode's existing backends; lifting Paseo's AgentManager/MCP server is optional convenience, not necessity.
- If the conductor must be a weak local model (Qwen) — i.e. the goal is free/local multi-agent flows: choose O-B, hand-rolled (depends_on + step-state on tasks, dispatched by the existing LISTEN/NOTIFY, plus a real result channel and a fold step). Determinism is not optional here; the model cannot be trusted to sequence itself (A25). Treat DBOS as an unvalidated alternative, not the default.
- In neither case O-C, and in neither case lift Paseo's conductor — there isn't one to lift (A22).
Evidence basis: The codebase findings (A15–A24) are current-state anchors, all independently verified by the validator against live source — the worker-layer-exists / sequencing-absent / Paseo-has-no-engine conclusions are solid. The "weak models can't self-orchestrate" direction rests on one disinterested source (A6) plus general degradation work (A9) and a single live anecdote (A25) — enough to make O-A fragile on Qwen, not enough to call it impossible. The specific DBOS tooling pick rests only on vendor marketing (A7, A8) and a single-source library (A10), so it is explicitly demoted. The fork itself — which option wins — rests on a constraint (conductor model) that the question never stated and the operator must supply.

Validation

V1: The "reuse 70%, build 30%" split is inverted on effort

Strategy: Challenge the Recommendation
Investigation: Read dispatcher.ts end-to-end; poll query dispatches on state='pending' only, no dependency awareness; no fold/synthesis code exists.
Result: Partially Refuted
Impact: The reused 70% is already-debugged plumbing; the missing piece (crash-resumable step state machine, result-passing, fold) is 100% greenfield and is where the risk lives. Direction survives; the effort framing was misleading and is corrected in O-B.

V2: The 500-char `output_summary` cap makes the substrate hostile to result-passing

Strategy: Challenge the Evidence
Investigation: The 500-char cap is applied in every completion path (dispatcher.ts:249/440/509/855/1120/1375); full output goes to messages.content (50k) but the cheap parent-readable field is 500 chars.
Result: Confirmed
Impact: Result-passing must replace this primitive, not reuse it. Folded into O-B's scope.
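The effect of the cap on a realistic artifact can be illustrated as follows. This is a sketch under assumptions: the cap is modeled as a simple prefix slice (the source says only that the field is capped at 500 characters), and `publishResult` / `readResult` are hypothetical names for a full-size result channel keyed by task id.

```typescript
const SUMMARY_CAP = 500;

// Assumed shape of the cap: keep only the first 500 characters.
function toOutputSummary(fullOutput: string): string {
  return fullOutput.slice(0, SUMMARY_CAP);
}

// Hypothetical result channel: the full artifact, stored per task id.
const resultChannel: Record<string, string> = {};

function publishResult(taskId: string, fullOutput: string): string {
  resultChannel[taskId] = fullOutput;
  return toOutputSummary(fullOutput); // summary survives as a cheap preview
}

function readResult(taskId: string): string | undefined {
  return resultChannel[taskId];
}

// A modest 100-line diff is already several times over the cap.
const diffLines: string[] = [];
for (let n = 0; n < 100; n++) diffLines.push(`+ changed line ${n}`);
const artifact = diffLines.join("\n");

const preview = publishResult("task-a", artifact);
const handedOn = readResult("task-a"); // what the next step should receive
```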

V3: "O-A fails on Qwen" is over-weighted by one live anecdote (A25)

Strategy: Challenge the Evidence-Gathering Integrity
Investigation: A25 is n=1, self-collected this session; A9's degradation figures are for 3→10 agents in flat context, not a 2–3-step bounded flow; no test of Qwen on a tightened skill.
Result: Partially Refuted
Impact: Reworded to "O-A is fragile with a weak local conductor," not "fails." A stronger local model or a step-gated skill could flip it.

V4: The synthesis assumes BooCode wants deterministic flows

Strategy: Challenge the Recommendation
Investigation: The question was "lift Paseo's orchestration," which Paseo proves works via MCP self-orchestration with a strong parent. Nothing in the question mandates a weak local conductor; BooCode already runs Claude-SDK and opencode backends.
Result: Refuted (as an unstated assumption)
Impact: Load-bearing. The recommendation was rewritten from "O-B" into the conditional fork above, with conductor-model as the explicit deciding criterion.

V5: Discounting interested-party web sources (A4, A7, A8, A13, A14)

Strategy: Challenge the Evidence-Gathering Integrity
Investigation: Sensitivity test. Remove A4 (Praetorian): architecture claim still holds on A6 (disinterested arXiv). Remove A7/A8/A10 (DBOS/pg-workflows): the specific tooling recommendation loses its entire basis. Remove A13/A14 (Inngest): irrelevant.
Result: Partially Refuted
Impact: Architectural recommendation survives on A6 alone; the DBOS tooling pick was demoted to an unvalidated option in O-B.
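The sensitivity test amounts to asking whether each claim keeps at least one supporting source once a set of sources is removed. A small sketch, with the claim-to-source mapping taken from the investigation above and the names `Claim` and `survivesWithout` invented for illustration:

```typescript
interface Claim {
  label: string;
  sources: string[];
}

// A claim survives if at least one of its sources was not removed.
function survivesWithout(claim: Claim, removed: string[]): boolean {
  return claim.sources.some((source) => removed.indexOf(source) === -1);
}

const architectureClaim: Claim = {
  label: "deterministic sequencing beats self-orchestration",
  sources: ["A4", "A6"],
};
const toolingClaim: Claim = {
  label: "use DBOS / pg-workflows",
  sources: ["A7", "A8", "A10"],
};

// Drop the interested party (A4): A6 still carries the claim.
const architectureHolds = survivesWithout(architectureClaim, ["A4"]);
// Drop the vendor and single-source entries: nothing is left.
const toolingHolds = survivesWithout(toolingClaim, ["A7", "A8", "A10"]);
```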

V6: "No new infra, just Postgres" ignores the host/container/dual-schema split

Strategy: Challenge the Fix
Investigation: BooCoder runs as a host systemd service; apps/server runs in Docker; the coder schema is applied separately. DBOS keeps its own system tables and owns transaction boundaries — a third schema-owner in the shared DB plus a second durable-execution engine overlapping the existing poll machinery.
Result: Confirmed
Impact: "No new infra" is true only for the hand-rolled variant, false for DBOS. O-B now recommends hand-rolled and separates the two.

V7: The Paseo "5 files, 2–3 days" lift drags in the provider tree

Strategy: Challenge the Evidence
Investigation: agent-manager.ts imports 30+ types from agent-sdk-types, plus @getpaseo/protocol/* and the full provider implementations. The "no React-Native deps" claim is true; the implied cheap isolated lift is not.
Result: Partially Refuted
Impact: O-C (reuse Paseo's worker supervisor) is net-negative since BooCode already has four working backends. O-C recommended against.

V8: Provenance of the two untracked artifact files

Strategy: Challenge the Evidence-Gathering Integrity
Investigation: Untracked docs/features/git-diff-panel/artifacts/*.md are the synthesis's own working files, not fetched external content; low injection risk but unversioned.
Result: Confirmed (low severity)
Impact: Codebase claims (A15–A24) are reproducible and were checked; web claims (A1–A14) are not re-fetchable from here and were taken on the retrieval's word.

Adjustments Made

The recommendation did not survive in its original "O-B as the spine" form. It was rewritten into the conditional fork above (deciding criterion: conductor model), per V4. O-C was dropped (V7); DBOS was demoted from co-equal to unvalidated (V5, V6); the "O-A fails on Qwen" claim was softened to "fragile" (V3); the effort framing and the 500-char result-passing problem were folded into O-B's scope (V1, V2).

Confidence Assessment

Confidence: Medium
Remaining Risks: The web tier (A1–A14) is unverifiable from this environment and the DBOS/pg-workflows specifics are vendor/single-sourced. A25 is n=1. The single assumption that flips the entire recommendation — whether the conductor is Claude or Qwen — is the operator's to confirm; everything downstream depends on it.

Sources

ID	Source	Link / location	Retrieved	Trust class	Summary (one line)	Evidence status
A1	Nango: left Temporal for Postgres orchestration	https://nango.dev/blog/migrating-from-temporal-to-a-postgres-based-task-orchestrator/	2026-06-03	web	Temporal ops overhead drove a move to Postgres-backed orchestration	corroborated by A12
A2	Restate self-hosted overview	https://docs.restate.dev/server/overview	2026-06-03	web	Single binary, embedded RocksDB, no external DB	corroborated by A3
A3	Show HN: Restate	https://news.ycombinator.com/item?id=40659160	2026-06-03	web	Confirms single-binary lightweight deploy	corroborated by A2
A4	Praetorian: Deterministic AI Orchestration	https://www.praetorian.com/blog/deterministic-ai-orchestration-a-platform-architecture-for-autonomous-development/	2026-06-03	web	External orchestration beats self-orchestration; small models do better as bounded workers	corroborated by A6, A9 (interested party)
A5	Hatchworks: Orchestrating AI Agents	https://hatchworks.com/blog/ai-agents/orchestrating-ai-agents/	2026-06-03	web	Self-orchestration failure modes; external control plane as default	corroborated by A4, A6
A6	arXiv 2508.02721 Blueprint-First	https://arxiv.org/abs/2508.02721	2026-06-03	web	Deterministic engine + LLM for bounded sub-tasks; +10.1pp, fewer turns	corroborated by A4, A5
A7	Show HN: DBOS TypeScript	https://news.ycombinator.com/item?id=42727970	2026-06-03	web	In-process Postgres durable execution, decorator steps	corroborated by A8
A8	DBOS Transact	https://www.dbos.dev/dbos-transact	2026-06-03	web	"Just your program and Postgres," no orchestration server	corroborated by A7 (interested party)
A9	arXiv 2604.07911 context scoping	https://arxiv.org/pdf/2604.07911	2026-06-03	web	Flat-context steering accuracy 60%→21% from 3→10 agents	corroborated by A4, A6
A10	pg-workflows	https://sokratisvidros.github.io/pg-workflows/	2026-06-03	web	Pure-Postgres TS workflow library, step exactly-once	single source (caveated)
A11	Hatchet v1 HN	https://news.ycombinator.com/item?id=43572733	2026-06-03	web	Postgres + RabbitMQ; separate server process	corroborated by A1
A12	Temporal self-host guide	https://docs.temporal.io/self-hosted-guide/deployment	2026-06-03	web	Multi-service self-host overhead	corroborated by A1
A13	Inngest vs Trigger vs Restate	https://www.pkgpulse.com/guides/inngest-vs-trigger-dev-v3-vs-restate-2026	2026-06-03	web	Inngest cloud-first; not truly self-hostable	corroborated by A14
A14	Inngest self-hosting docs	https://www.inngest.com/docs/self-hosting	2026-06-03	web	Engine proprietary to Inngest Cloud	corroborated by A13 (interested party)
A15	BooCode dispatcher + LISTEN/NOTIFY	`apps/coder/src/services/dispatcher.ts:46`	n/a	codebase	Task dispatch via `tasks_new` notify + poll; backend routing	corroborated by A18
A16	BooCode Arena	`apps/coder/src/routes/arena.ts:34`	n/a	codebase	Parallel fan-out 2–5 contestants; selection is `[SELECTED]` prefix, no consumer; sessionless	single source (codebase anchor)
A17	BooCode tasks table	`apps/coder/src/schema.sql:18`	n/a	codebase	`parent_task_id` FK written-but-not-dispatched-on; no `depends_on`/step/flows	single source (codebase anchor)
A18	AgentBackend + 4 backends	`apps/coder/src/services/agent-backend.ts:97`	n/a	codebase	Common ensureSession/prompt surface; opencode/warm-acp/claude-sdk/one-shot	corroborated by A15
A19	new_task / output_summary cap	`apps/coder/src/services/tools/new_task.ts:13`	n/a	codebase	Native-only tools; parent reads only 500-char `output_summary`	single source (codebase anchor)
A20	Paseo AgentManager	`/opt/forks/paseo/packages/server/src/server/agent/agent-manager.ts:413`	n/a	codebase	Plain TS class, no RN deps; create/stream/run/wait/cascade-archive	corroborated by A22
A21	Paseo createAgentMcpServer	`/opt/forks/paseo/.../agent/mcp-server.ts:479`	n/a	codebase	create_agent/wait_for_agent/archive_agent as MCP tools; child gets parent MCP URL	corroborated by A20
A22	Paseo has no sequencing engine	`/opt/forks/paseo/.../agent-manager.ts` (verdict)	n/a	codebase	Parent model self-orchestrates via MCP; daemon supervises only	single source (codebase anchor)
A23	Paseo AgentStorage	`/opt/forks/paseo/.../agent/agent-storage.ts:84`	n/a	codebase	JSON files on disk + in-memory timeline, no DB	single source (codebase anchor)
A24	Paseo parent/child label	`/opt/forks/paseo/packages/protocol/src/agent-labels.ts`	n/a	codebase	Relationship is one label on a flat map	corroborated by A22
A25	Live smoke test (this session)	provided: opencode/Qwen-35B han research run	n/a	provided	Qwen dispatched analyst, dropped the validator step, skipped template	single source (live, n=1)

A22: Paseo has no deterministic sequencing engine — recommendation-bearing

Link / location: /opt/forks/paseo/packages/server/src/server/agent/agent-manager.ts:413 (+ explorer verdict)
Retrieved: n/a
Trust class: codebase (current-state anchor)
Summary: Paseo's daemon spawns, waits on, and bookkeeps agents; it contains no DAG or workflow engine. The parent agent — itself an ACP process — does all sequencing by calling create_agent/wait_for_agent/archive_agent as MCP tools in its own reasoning loop. This is why "lift Paseo's orchestrator" is a category error: the conductor is the model, not Paseo. It reframes the entire recommendation into "build a conductor (O-B) vs adopt model-as-conductor (O-A)."
Evidence status: corroborated by A20, A24

A17: BooCode tasks table lacks sequencing columns — recommendation-bearing

Link / location: apps/coder/src/schema.sql:18
Retrieved: n/a
Trust class: codebase (current-state anchor)
Summary: The tasks table has parent_task_id (written by new_task, read only by list_tasks, never by the dispatcher) but no depends_on, step_index, or flows definition. The dispatcher poll selects on state='pending' alone. This is the concrete gap O-B must fill, and it confirms the deterministic sequencing substrate genuinely does not exist today.
Evidence status: single source (codebase anchor), verified by the validator against live source

A6: Blueprint-First deterministic workflow (arXiv) — recommendation-bearing

Link / location: https://arxiv.org/abs/2508.02721
Retrieved: 2026-06-03
Trust class: web (peer-reviewed preprint, disinterested)
Summary: A deterministic engine executes an expert-defined blueprint and calls the LLM only for bounded sub-tasks, never to decide workflow path; reports ~10-point gains (+10.1pp) and large reductions in turns and tool calls versus self-orchestrating agents. This is the one disinterested source carrying the "deterministic code beats model self-orchestration" direction after the interested parties (A4) are discounted.
Evidence status: corroborated by A4, A5, A9
