Research: Can we lift Paseo's orchestration into BooCode so it can orchestrate different agents?
Open-ended question: Can we lift Paseo's multi-agent orchestration into BooCode, and can we get this done? — Evidence mode: strict.
Summary
Yes, this is achievable — but "lifting Paseo's orchestrator" turns out to be the wrong mental model, because Paseo has no orchestration engine to lift. Paseo's daemon is a process supervisor; the actual sequencing is done by the parent model, which calls create_agent / wait_for_agent / archive_agent as MCP tools inside its own reasoning loop. So the real choice is between two different things: (a) copy that model-drives-itself pattern, which is simple and Paseo-proven but fragile when a weak local model is the conductor, or (b) build a small deterministic sequencing layer in BooCode's own code, which is more work but reliable regardless of model.
BooCode already owns roughly the worker half of this — dispatch, four agent backends, parallel fan-out, per-agent git worktrees, resumable sessions. What it lacks is any notion of "do step B after step A and feed A's output into B." Which path is right hinges on one unanswered question: who conducts — a strong model like Claude, or a weak local model like Qwen? With Claude, the simple copy-Paseo path wins and a deterministic engine would be over-engineering. With Qwen, the deterministic-code path is necessary, because a 35B local model self-orchestrating is exactly what failed live in this session. Until that conductor question is answered, there is no single winner — there is a fork with a clear deciding criterion.
- Confidence: Medium
Research Results
BooCode already owns the worker layer (A15–A19). Task dispatch runs off a Postgres LISTEN/NOTIFY 'tasks_new' fast path plus a poll fallback (A15). Four execution backends sit behind a common AgentBackend interface — opencode-server, warm-ACP, claude-SDK, and one-shot ACP/PTY (A18). Arena already fans the same prompt out to 2–5 agents in parallel, each with its own task and worktree (A16). Per-task and per-session git worktrees, resumable agent_sessions, and output capture (full text to messages.content, a diff to pending_changes) all exist. This half is real, working code.
But the sequencing substrate genuinely does not exist (A16, A17, A19). The dispatcher's poll selects on state='pending' alone — it never reads parent_task_id, and there is no depends_on, step_index, flows table, or fold/synthesis code anywhere (A17, verified by the validator against dispatcher.ts:105-110). Arena is parallel-only and single-step; its "winner selection" is a [SELECTED] string prefix that no code downstream consumes (A16). The one inter-task channel a parent can cheaply read — tasks.output_summary — is capped at 500 characters in every completion path (A19), far too small to pass a real artifact (a research finding, a diff) from one step into the next. So the existing substrate is a head-start on dispatch but actively wrong-shaped for data flow between steps.
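The gap described above can be illustrated with a minimal in-memory sketch. It contrasts selecting on state alone with a dependency-aware selection. The `dependsOn` field and both function names are hypothetical; nothing like them exists in BooCode's schema or dispatcher today, and a real version would be a SQL predicate rather than an array filter.

```typescript
// Hypothetical task shape: `dependsOn` is an assumed column, not part of BooCode's schema.
type TaskState = "pending" | "running" | "done" | "failed";

interface Task {
  id: string;
  state: TaskState;
  dependsOn: string[]; // ids of tasks that must be "done" before this one may start
}

// Behaviour as described above: every pending task is dispatchable.
function pendingOnly(tasks: Task[]): string[] {
  return tasks.filter((t) => t.state === "pending").map((t) => t.id);
}

// Dependency-aware variant: pending AND every dependency is already done.
function readyToDispatch(tasks: Task[]): string[] {
  const done = new Set(tasks.filter((t) => t.state === "done").map((t) => t.id));
  return tasks
    .filter((t) => t.state === "pending" && t.dependsOn.every((d) => done.has(d)))
    .map((t) => t.id);
}

const tasks: Task[] = [
  { id: "research", state: "done", dependsOn: [] },
  { id: "validate", state: "pending", dependsOn: ["research"] },
  { id: "synthesize", state: "pending", dependsOn: ["research", "validate"] },
];

console.log(pendingOnly(tasks));     // [ 'validate', 'synthesize' ]
console.log(readyToDispatch(tasks)); // [ 'validate' ]
```

With state-only selection, `synthesize` would start before `validate` has produced anything to synthesize.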
Paseo's orchestrator is liftable as plumbing, but it isn't a conductor (A20–A24). AgentManager is a plain TypeScript class with no React-Native/Zustand dependency (A20, confirmed), and the MCP tool server exposing create_agent/wait_for_agent/archive_agent sits cleanly on top (A21). Agent state is JSON files on disk, parent/child is a single label on a flat map (A23, A24) — trivial to replicate. The decisive finding: Paseo has no deterministic DAG or sequencing engine (A22). The parent agent — itself a running ACP process — decides the order by making MCP tool calls in its own loop. Paseo's daemon only spawns, waits, and bookkeeps.
The web evidence argues that model self-orchestration is fragile for weak models and recommends deterministic code sequencing (A4, A5, A6, A9); this was corroborated live (A25). Disinterested academic work reports that a deterministic engine calling the model only for bounded sub-tasks beats self-orchestration by ~10 points, with large reductions in turns and tool calls (A6), and that flat-context steering accuracy falls from ~60% to ~21% as agent count climbs (A9, though this is the 3→10-agent regime, not a 2–3-step bounded flow). In this very session, opencode/Qwen-35B ran the han research skill, dispatched the first specialist, then dropped the adversarial-validator step entirely (A25): a live, n=1 instance of exactly this failure. The lowest-ops way to host deterministic sequencing on a single-user Postgres stack is in-process Postgres-backed execution: Temporal is too heavy for one user (A1, A12), Restate is lighter but adds a second stateful service (A2, A3), and Inngest isn't truly self-hostable (A13, A14).
Conflict surfaced: the prior-art angle pushes toward a deterministic engine; the Paseo codebase proves the opposite pattern (model self-orchestration) ships and works — with a capable parent model. The two only reconcile once you fix who the conductor is.
Options to Consider
O-A: Copy Paseo's pattern — model self-orchestrates via an MCP toolbox
- What it is: Expose `create_agent`/`wait_for_agent`/`archive_agent` (and friends) as MCP tools over BooCode's existing backends; a parent agent sequences sub-agents from inside its own reasoning loop. Optionally lift Paseo's `AgentManager` + MCP server (~5 files, no RN deps) rather than writing the toolbox fresh.
- Trade-offs: Lowest effort and Paseo-proven, given a strong parent model. Fragile with a weak local conductor: non-deterministic, workflow state hidden in the model's context, no crash-replay. This is the pattern that dropped a step on Qwen in this session (A25).
- Rests on: (A20, A21, A22, A24) — and the con on (A4, A5, A6, A9, A25)
- Evidence status: corroborated
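To make the O-A shape concrete, here is a small sketch under stated assumptions. `Supervisor` is a hypothetical in-memory stand-in for the toolbox (it is neither Paseo's `AgentManager` nor a BooCode API), and `conductor` stands in for the parent model's tool-call loop. The "work" is a plain synchronous function call so the example stays self-contained; real agents would be long-running processes.

```typescript
// Hypothetical stand-in for a supervisor: it spawns, waits, and bookkeeps. Nothing else.
type AgentRecord = { id: string; role: string; output: string; archived: boolean };

class Supervisor {
  private agents: AgentRecord[] = [];
  private nextId = 1;

  // Spawns a worker; here the work is an immediate function call for illustration.
  createAgent(role: string, run: () => string): string {
    const id = `agent-${this.nextId++}`;
    this.agents.push({ id, role, output: run(), archived: false });
    return id;
  }

  // Returns the finished worker's output.
  waitForAgent(id: string): string {
    const agent = this.agents.find((a) => a.id === id);
    if (!agent) throw new Error(`unknown agent ${id}`);
    return agent.output;
  }

  archiveAgent(id: string): void {
    const agent = this.agents.find((a) => a.id === id);
    if (agent) agent.archived = true;
  }

  rolesSeen(): string[] {
    return this.agents.map((a) => a.role);
  }
}

// Stand-in for the parent model's loop: the step order lives here, not in Supervisor.
function conductor(tools: Supervisor, skipValidator: boolean): string {
  const analyst = tools.createAgent("analyst", () => "finding: X");
  const finding = tools.waitForAgent(analyst);
  tools.archiveAgent(analyst);
  if (skipValidator) return finding; // a conductor that drops a step; nothing stops it
  const validator = tools.createAgent("validator", () => `checked(${finding})`);
  const verdict = tools.waitForAgent(validator);
  tools.archiveAgent(validator);
  return verdict;
}

const careful = new Supervisor();
const sloppy = new Supervisor();
console.log(conductor(careful, false), careful.rolesSeen());
console.log(conductor(sloppy, true), sloppy.rolesSeen());
```

The supervisor has no notion of which steps were supposed to run, so a conductor that skips the validator completes without any error. That is the fragility the trade-offs above describe.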
O-B: Build a deterministic code flow-runner on BooCode's own primitives
- What it is: Add step sequencing in code: a `depends_on` / step-state column on the existing `tasks` table, dispatched by the existing `LISTEN/NOTIFY` poll; a real result-passing channel (replacing the 500-char `output_summary`); and a fold step. Agents stay bounded single-task workers.
- Trade-offs: Reliable regardless of conductor model and crash-resumable. But the missing sequencing + result-passing + fold is greenfield: the reused worker layer is plumbing, and this is where the schedule and the distributed-systems hazards live. The hand-rolled variant is genuinely near-zero new infra; a library (DBOS Transact) is not, given BooCode's host-systemd/Docker split and dual-schema DB (see V6). DBOS is an unvalidated option, not a co-equal one.
- Rests on: (A15, A16, A17, A19) for the substrate; (A6) for the architecture; (A1, A7, A8, A10) for the tooling angle — the last two vendor-sourced and caveated.
- Evidence status: architecture corroborated; specific tooling (DBOS/pg-workflows) single-source (caveated)
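A minimal sketch of the O-B idea, assuming hypothetical names throughout (`StepWorker`, `Step`, `runFlow`, and the fold callback are illustrative, not existing BooCode code). It shows only the sequencing, result-passing, and fold; persistence and crash-resume, which are the hard parts, are reduced to a comment.

```typescript
// Hypothetical types: a worker is a bounded single-task agent call.
type StepWorker = (input: string) => Promise<string>;

interface Step {
  name: string;
  worker: StepWorker;
}

interface FlowResult {
  outputs: Record<string, string>; // full output per step, not a truncated summary
  folded: string;
}

// Runs steps strictly in order; code, not a model, decides what runs next.
async function runFlow(
  steps: Step[],
  initialInput: string,
  fold: (outputs: Record<string, string>) => string,
): Promise<FlowResult> {
  const outputs: Record<string, string> = {};
  let input = initialInput;
  for (const step of steps) {
    const output = await step.worker(input);
    outputs[step.name] = output; // a real runner would persist step state here
    input = output;              // feed this step's output into the next step
  }
  return { outputs, folded: fold(outputs) };
}

const demoSteps: Step[] = [
  { name: "research", worker: async (q) => `finding for "${q}"` },
  { name: "validate", worker: async (f) => `validated: ${f}` },
];

runFlow(demoSteps, "can we lift Paseo?", (o) => Object.keys(o).join(" -> ")).then(
  (result) => console.log(result.folded, "|", result.outputs["validate"]),
);
```

Because the loop owns the order, a worker cannot skip the validation step; it can only fail it, and that failure is visible to the runner.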
O-C: Hybrid — lift Paseo's ACP supervisor as the worker layer, code orchestrator on top
- What it is: Replace BooCode's backends with Paseo's lifted ACP client + `AgentManager`, then put an O-B deterministic conductor over it.
- Trade-offs: Net-negative. BooCode already has four working backends; importing Paseo's provider tree (heavy intra-package `@getpaseo/protocol` coupling, per V7) swaps working code for a foreign dependency. Recommended against.
- Rests on: (A18, A20), and V7
- Evidence status: corroborated (against)
Recommendation
- Recommendation: No single winner until the conductor model is fixed — this is the deciding criterion.
  - If the conductor is a strong model (Claude, in-stack today): choose O-A. It is the simpler, Paseo-proven path, and building a deterministic engine would be over-engineering scope that wasn't asked for. Expose the MCP toolbox over BooCode's existing backends; lifting Paseo's `AgentManager`/MCP server is optional convenience, not necessity.
  - If the conductor must be a weak local model (Qwen), i.e. the goal is free/local multi-agent flows: choose O-B, hand-rolled (`depends_on` + step-state on `tasks`, dispatched by the existing `LISTEN/NOTIFY`, plus a real result channel and a fold step). Determinism is not optional here; the model cannot be trusted to sequence itself (A25). Treat DBOS as an unvalidated alternative, not the default.
  - In neither case O-C, and in neither case lift Paseo's conductor: there isn't one to lift (A22).
- Evidence basis: The codebase findings (A15–A24) are current-state anchors, all independently verified by the validator against live source — the worker-layer-exists / sequencing-absent / Paseo-has-no-engine conclusions are solid. The "weak models can't self-orchestrate" direction rests on one disinterested source (A6) plus general degradation work (A9) and a single live anecdote (A25) — enough to make O-A fragile on Qwen, not enough to call it impossible. The specific DBOS tooling pick rests only on vendor marketing (A7, A8) and a single-source library (A10), so it is explicitly demoted. The fork itself — which option wins — rests on a constraint (conductor model) that the question never stated and the operator must supply.
Validation
V1: The "reuse 70%, build 30%" split is inverted on effort
- Strategy: Challenge the Recommendation
- Investigation: Read `dispatcher.ts` end-to-end; poll query dispatches on `state='pending'` only, no dependency awareness; no fold/synthesis code exists.
- Result: Partially Refuted
- Impact: The reused 70% is already-debugged plumbing; the missing piece (crash-resumable step state machine, result-passing, fold) is 100% greenfield and is where the risk lives. Direction survives; the effort framing was misleading and is corrected in O-B.
V2: The 500-char output_summary cap makes the substrate hostile to result-passing
- Strategy: Challenge the Evidence
- Investigation: The 500-char cap is applied in every completion path (`dispatcher.ts:249/440/509/855/1120/1375`); full output goes to `messages.content` (50k) but the cheap parent-readable field is 500 chars.
- Result: Confirmed
- Impact: Result-passing must replace this primitive, not reuse it. Folded into O-B's scope.
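The effect of the cap can be sketched as follows. This is an illustration only: `SUMMARY_CAP` mirrors the 500-character figure above, while `artifacts`, `artifactRef`, and `completeTask` are hypothetical names for one possible result channel, not existing BooCode code.

```typescript
const SUMMARY_CAP = 500; // the cap described above

// Hypothetical artifact store standing in for a real result-passing channel.
const artifacts = new Map<string, string>();

interface Completion {
  outputSummary: string; // what a parent can cheaply read today
  artifactRef: string;   // assumed addition: a pointer to the full output
}

function completeTask(taskId: string, fullOutput: string): Completion {
  const artifactRef = `artifact:${taskId}`;
  artifacts.set(artifactRef, fullOutput); // keep the whole artifact for the next step
  return { outputSummary: fullOutput.slice(0, SUMMARY_CAP), artifactRef };
}

const diff = "+ line\n".repeat(200); // a 1400-character stand-in for a real diff
const completion = completeTask("task-1", diff);

console.log(completion.outputSummary.length);               // 500
console.log(artifacts.get(completion.artifactRef)?.length); // 1400
```

A downstream step reading only `outputSummary` would receive barely a third of this diff; passing a reference lets it fetch the complete artifact.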
V3: "O-A fails on Qwen" is over-weighted by one live anecdote (A25)
- Strategy: Challenge the Evidence-Gathering Integrity
- Investigation: A25 is n=1, self-collected this session; A9's degradation figures are for 3→10 agents in flat context, not a 2–3-step bounded flow; no test of Qwen on a tightened skill.
- Result: Partially Refuted
- Impact: Reworded to "O-A is fragile with a weak local conductor," not "fails." A stronger local model or a step-gated skill could flip it.
V4: The synthesis assumes BooCode wants deterministic flows
- Strategy: Challenge the Recommendation
- Investigation: The question was "lift Paseo's orchestration," which Paseo proves works via MCP self-orchestration with a strong parent. Nothing in the question mandates a weak local conductor; BooCode already runs Claude-SDK and opencode backends.
- Result: Refuted (as an unstated assumption)
- Impact: Load-bearing. The recommendation was rewritten from "O-B" into the conditional fork above, with conductor-model as the explicit deciding criterion.
V5: Discounting interested-party web sources (A4, A7, A8, A13, A14)
- Strategy: Challenge the Evidence-Gathering Integrity
- Investigation: Sensitivity test. Remove A4 (Praetorian): architecture claim still holds on A6 (disinterested arXiv). Remove A7/A8/A10 (DBOS/pg-workflows): the specific tooling recommendation loses its entire basis. Remove A13/A14 (Inngest): irrelevant.
- Result: Partially Refuted
- Impact: Architectural recommendation survives on A6 alone; the DBOS tooling pick was demoted to an unvalidated option in O-B.
V6: "No new infra, just Postgres" ignores the host/container/dual-schema split
- Strategy: Challenge the Fix
- Investigation: BooCoder runs as a host systemd service; `apps/server` runs in Docker; the coder schema is applied separately. DBOS keeps its own system tables and owns transaction boundaries: a third schema-owner in the shared DB plus a second durable-execution engine overlapping the existing poll machinery.
- Result: Confirmed
- Impact: "No new infra" is true only for the hand-rolled variant, false for DBOS. O-B now recommends hand-rolled and separates the two.
V7: The Paseo "5 files, 2–3 days" lift drags in the provider tree
- Strategy: Challenge the Evidence
- Investigation: `agent-manager.ts` imports 30+ types from `agent-sdk-types`, plus `@getpaseo/protocol/*` and the full provider implementations. The "no React-Native deps" claim is true; the implied cheap isolated lift is not.
- Result: Partially Refuted
- Impact: O-C (reuse Paseo's worker supervisor) is net-negative since BooCode already has four working backends. O-C recommended against.
V8: Provenance of the two untracked artifact files
- Strategy: Challenge the Evidence-Gathering Integrity
- Investigation: Untracked `docs/features/git-diff-panel/artifacts/*.md` are the synthesis's own working files, not fetched external content; low injection risk but unversioned.
- Result: Confirmed (low severity)
- Impact: Codebase claims (A15–A24) are reproducible and were checked; web claims (A1–A14) are not re-fetchable from here and were taken on the retrieval's word.
Adjustments Made
The recommendation did not survive in its original "O-B as the spine" form. It was rewritten into the conditional fork above (deciding criterion: conductor model), per V4. O-C was dropped (V7); DBOS was demoted from co-equal to unvalidated (V5, V6); the "O-A fails on Qwen" claim was softened to "fragile" (V3); the effort framing and the 500-char result-passing problem were folded into O-B's scope (V1, V2).
Confidence Assessment
- Confidence: Medium
- Remaining Risks: The web tier (A1–A14) is unverifiable from this environment and the DBOS/pg-workflows specifics are vendor/single-sourced. A25 is n=1. The single assumption that flips the entire recommendation — whether the conductor is Claude or Qwen — is the operator's to confirm; everything downstream depends on it.
Sources
| ID | Source | Link / location | Retrieved | Trust class | Summary (one line) | Evidence status |
|---|---|---|---|---|---|---|
| A1 | Nango: left Temporal for Postgres orchestration | https://nango.dev/blog/migrating-from-temporal-to-a-postgres-based-task-orchestrator/ | 2026-06-03 | web | Temporal ops overhead drove a move to Postgres-backed orchestration | corroborated by A12 |
| A2 | Restate self-hosted overview | https://docs.restate.dev/server/overview | 2026-06-03 | web | Single binary, embedded RocksDB, no external DB | corroborated by A3 |
| A3 | Show HN: Restate | https://news.ycombinator.com/item?id=40659160 | 2026-06-03 | web | Confirms single-binary lightweight deploy | corroborated by A2 |
| A4 | Praetorian: Deterministic AI Orchestration | https://www.praetorian.com/blog/deterministic-ai-orchestration-a-platform-architecture-for-autonomous-development/ | 2026-06-03 | web | External orchestration beats self-orchestration; small models do better as bounded workers | corroborated by A6, A9 (interested party) |
| A5 | Hatchworks: Orchestrating AI Agents | https://hatchworks.com/blog/ai-agents/orchestrating-ai-agents/ | 2026-06-03 | web | Self-orchestration failure modes; external control plane as default | corroborated by A4, A6 |
| A6 | arXiv 2508.02721 Blueprint-First | https://arxiv.org/abs/2508.02721 | 2026-06-03 | web | Deterministic engine + LLM for bounded sub-tasks; +10.1pp, fewer turns | corroborated by A4, A5 |
| A7 | Show HN: DBOS TypeScript | https://news.ycombinator.com/item?id=42727970 | 2026-06-03 | web | In-process Postgres durable execution, decorator steps | corroborated by A8 |
| A8 | DBOS Transact | https://www.dbos.dev/dbos-transact | 2026-06-03 | web | "Just your program and Postgres," no orchestration server | corroborated by A7 (interested party) |
| A9 | arXiv 2604.07911 context scoping | https://arxiv.org/pdf/2604.07911 | 2026-06-03 | web | Flat-context steering accuracy 60%→21% from 3→10 agents | corroborated by A4, A6 |
| A10 | pg-workflows | https://sokratisvidros.github.io/pg-workflows/ | 2026-06-03 | web | Pure-Postgres TS workflow library, step exactly-once | single source (caveated) |
| A11 | Hatchet v1 HN | https://news.ycombinator.com/item?id=43572733 | 2026-06-03 | web | Postgres + RabbitMQ; separate server process | corroborated by A1 |
| A12 | Temporal self-host guide | https://docs.temporal.io/self-hosted-guide/deployment | 2026-06-03 | web | Multi-service self-host overhead | corroborated by A1 |
| A13 | Inngest vs Trigger vs Restate | https://www.pkgpulse.com/guides/inngest-vs-trigger-dev-v3-vs-restate-2026 | 2026-06-03 | web | Inngest cloud-first; not truly self-hostable | corroborated by A14 |
| A14 | Inngest self-hosting docs | https://www.inngest.com/docs/self-hosting | 2026-06-03 | web | Engine proprietary to Inngest Cloud | corroborated by A13 (interested party) |
| A15 | BooCode dispatcher + LISTEN/NOTIFY | `apps/coder/src/services/dispatcher.ts:46` | n/a | codebase | Task dispatch via tasks_new notify + poll; backend routing | corroborated by A18 |
| A16 | BooCode Arena | `apps/coder/src/routes/arena.ts:34` | n/a | codebase | Parallel fan-out 2–5 contestants; selection is [SELECTED] prefix, no consumer; sessionless | single source (codebase anchor) |
| A17 | BooCode tasks table | `apps/coder/src/schema.sql:18` | n/a | codebase | parent_task_id FK written-but-not-dispatched-on; no depends_on/step/flows | single source (codebase anchor) |
| A18 | AgentBackend + 4 backends | `apps/coder/src/services/agent-backend.ts:97` | n/a | codebase | Common ensureSession/prompt surface; opencode/warm-acp/claude-sdk/one-shot | corroborated by A15 |
| A19 | new_task / output_summary cap | `apps/coder/src/services/tools/new_task.ts:13` | n/a | codebase | Native-only tools; parent reads only 500-char output_summary | single source (codebase anchor) |
| A20 | Paseo AgentManager | `/opt/forks/paseo/packages/server/src/server/agent/agent-manager.ts:413` | n/a | codebase | Plain TS class, no RN deps; create/stream/run/wait/cascade-archive | corroborated by A22 |
| A21 | Paseo createAgentMcpServer | `/opt/forks/paseo/.../agent/mcp-server.ts:479` | n/a | codebase | create_agent/wait_for_agent/archive_agent as MCP tools; child gets parent MCP URL | corroborated by A20 |
| A22 | Paseo has no sequencing engine | `/opt/forks/paseo/.../agent-manager.ts` (verdict) | n/a | codebase | Parent model self-orchestrates via MCP; daemon supervises only | single source (codebase anchor) |
| A23 | Paseo AgentStorage | `/opt/forks/paseo/.../agent/agent-storage.ts:84` | n/a | codebase | JSON files on disk + in-memory timeline, no DB | single source (codebase anchor) |
| A24 | Paseo parent/child label | `/opt/forks/paseo/packages/protocol/src/agent-labels.ts` | n/a | codebase | Relationship is one label on a flat map | corroborated by A22 |
| A25 | Live smoke test (this session) | provided: opencode/Qwen-35B han research run | n/a | provided | Qwen dispatched analyst, dropped the validator step, skipped template | single source (live, n=1) |
A22: Paseo has no deterministic sequencing engine — recommendation-bearing
- Link / location: `/opt/forks/paseo/packages/server/src/server/agent/agent-manager.ts:413` (+ explorer verdict)
- Retrieved: n/a
- Trust class: codebase (current-state anchor)
- Summary: Paseo's daemon spawns, waits on, and bookkeeps agents; it contains no DAG or workflow engine. The parent agent, itself an ACP process, does all sequencing by calling `create_agent`/`wait_for_agent`/`archive_agent` as MCP tools in its own reasoning loop. This is why "lift Paseo's orchestrator" is a category error: the conductor is the model, not Paseo. It reframes the entire recommendation into "build a conductor (O-B) vs adopt model-as-conductor (O-A)."
- Evidence status: corroborated by A20, A24
A17: BooCode tasks table lacks sequencing columns — recommendation-bearing
- Link / location: `apps/coder/src/schema.sql:18`
- Retrieved: n/a
- Trust class: codebase (current-state anchor)
- Summary: The `tasks` table has `parent_task_id` (written by `new_task`, read only by `list_tasks`, never by the dispatcher) but no `depends_on`, `step_index`, or `flows` definition. The dispatcher poll selects on `state='pending'` alone. This is the concrete gap O-B must fill, and it confirms the deterministic sequencing substrate genuinely does not exist today.
- Evidence status: single source (codebase anchor), verified by the validator against live source
A6: Blueprint-First deterministic workflow (arXiv) — recommendation-bearing
- Link / location: https://arxiv.org/abs/2508.02721
- Retrieved: 2026-06-03
- Trust class: web (arXiv preprint, disinterested)
- Summary: A deterministic engine executes an expert-defined blueprint and calls the LLM only for bounded sub-tasks, never to decide workflow path; reports ~10-point gains and large reductions in turns and tool calls versus self-orchestrating agents. This is the one disinterested source carrying the "deterministic code beats model self-orchestration" direction after the interested parties (A4) are discounted.
- Evidence status: corroborated by A4, A5, A9