boocode/docs/research/2026-06-03-boocode-orchestration-integration.md

# Research: Can we lift Paseo's orchestration into BooCode so it can orchestrate different agents?

Open-ended question: *Can we lift Paseo's multi-agent orchestration into BooCode, and can we get this done?* — Evidence mode: **strict**.

## Summary

Yes, this is achievable — but "lifting Paseo's orchestrator" turns out to be the wrong mental model, because **Paseo has no orchestration engine to lift**. Paseo's daemon is a process supervisor; the actual sequencing is done by the parent *model*, which calls `create_agent` / `wait_for_agent` / `archive_agent` as MCP tools inside its own reasoning loop. So the real choice is between two different things: (a) copy that model-drives-itself pattern, which is simple and Paseo-proven but fragile when a weak local model is the conductor, or (b) build a small deterministic sequencing layer in BooCode's own code, which is more work but reliable regardless of model.

BooCode already owns roughly the worker half of this — dispatch, four agent backends, parallel fan-out, per-agent git worktrees, resumable sessions. What it lacks is any notion of "do step B after step A and feed A's output into B." Which path is right hinges on one unanswered question: **who conducts — a strong model like Claude, or a weak local model like Qwen?** With Claude, the simple copy-Paseo path wins and a deterministic engine would be over-engineering. With Qwen, the deterministic-code path is necessary, because a 35B local model self-orchestrating is exactly what failed live in this session. Until that conductor question is answered, there is no single winner — there is a fork with a clear deciding criterion.

- **Confidence:** Medium

## Research Results

**BooCode already owns the worker layer (A15–A19).** Task dispatch runs off a Postgres `LISTEN/NOTIFY 'tasks_new'` fast path plus a poll fallback (A15). Four execution backends sit behind a common `AgentBackend` interface — opencode-server, warm-ACP, claude-SDK, and one-shot ACP/PTY (A18). Arena already fans the same prompt out to 2–5 agents in parallel, each with its own task and worktree (A16). Per-task and per-session git worktrees, resumable `agent_sessions`, and output capture (full text to `messages.content`, a diff to `pending_changes`) all exist. This half is real, working code.
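
The fast-path-plus-fallback shape described above can be sketched in a few lines. This is an in-memory illustration only, not BooCode's dispatcher: the `DispatchLoop` class and its method names are hypothetical, and the queue stands in for pending rows in Postgres. The point is that both triggers funnel into one drain routine, so a missed notification only delays a task until the next poll.

```typescript
// Hypothetical skeleton: a notification and a timer tick both call the same drain.
class DispatchLoop {
  private queue: string[] = [];
  readonly dispatched: string[] = [];

  // Stands in for inserting a pending task row.
  enqueue(taskId: string): void {
    this.queue.push(taskId);
  }

  // Fast path: invoked when a notification arrives.
  onNotify(): void {
    this.drain("notify");
  }

  // Fallback: invoked on a timer in case a notification was missed.
  onPollTick(): void {
    this.drain("poll");
  }

  // Dispatches everything currently queued; safe to call when the queue is empty.
  private drain(via: string): void {
    while (this.queue.length > 0) {
      const id = this.queue.shift();
      if (id !== undefined) this.dispatched.push(`${id} via ${via}`);
    }
  }
}

const demoLoop = new DispatchLoop();
demoLoop.enqueue("t1");
demoLoop.onNotify();   // notification delivered
demoLoop.enqueue("t2");
demoLoop.onPollTick(); // notification assumed lost; the poll picks the task up
console.log(demoLoop.dispatched); // ["t1 via notify", "t2 via poll"]
```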

**But the sequencing substrate genuinely does not exist (A16, A17, A19).** The dispatcher's poll selects on `state='pending'` alone — it never reads `parent_task_id`, and there is no `depends_on`, `step_index`, `flows` table, or fold/synthesis code anywhere (A17, verified by the validator against `dispatcher.ts:105-110`). Arena is parallel-only and single-step; its "winner selection" is a `[SELECTED]` string prefix that no code downstream consumes (A16). The one inter-task channel a parent can cheaply read — `tasks.output_summary` — is capped at **500 characters** in every completion path (A19), far too small to pass a real artifact (a research finding, a diff) from one step into the next. So the existing substrate is a head-start on *dispatch* but actively wrong-shaped for *data flow between steps*.
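
To make the gap concrete, the sketch below contrasts a pending-only selection (the behavior described above) with a dependency-aware one. It is an in-memory illustration rather than BooCode's dispatcher code: the `TaskRow` shape, the `dependsOn` field, and both function names are hypothetical.

```typescript
// Hypothetical in-memory model of task rows; the real `tasks` table has no `dependsOn` column.
type TaskState = "pending" | "running" | "done" | "failed";

interface TaskRow {
  id: string;
  state: TaskState;
  dependsOn: string[]; // assumed column, not present today
}

// Mirrors the described behavior: select on state alone.
function selectPendingOnly(rows: TaskRow[]): string[] {
  return rows.filter((r) => r.state === "pending").map((r) => r.id);
}

// What a sequencing layer would need: pending AND every dependency done.
function selectRunnable(rows: TaskRow[]): string[] {
  const doneIds = new Set(rows.filter((r) => r.state === "done").map((r) => r.id));
  return rows
    .filter((r) => r.state === "pending" && r.dependsOn.every((d) => doneIds.has(d)))
    .map((r) => r.id);
}

const demoRows: TaskRow[] = [
  { id: "research", state: "running", dependsOn: [] },
  { id: "validate", state: "pending", dependsOn: ["research"] },
];

console.log(selectPendingOnly(demoRows)); // ["validate"]: would start before research finishes
console.log(selectRunnable(demoRows));    // []: validate waits for research
```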

**Paseo's orchestrator is liftable as plumbing, but it isn't a conductor (A20–A24).** `AgentManager` is a plain TypeScript class with no React-Native/Zustand dependency (A20, confirmed), and the MCP tool server exposing `create_agent`/`wait_for_agent`/`archive_agent` sits cleanly on top (A21). Agent state is JSON files on disk, parent/child is a single label on a flat map (A23, A24) — trivial to replicate. **The decisive finding:** Paseo has *no* deterministic DAG or sequencing engine (A22). The parent agent — itself a running ACP process — decides the order by making MCP tool calls in its own loop. Paseo's daemon only spawns, waits, and bookkeeps.
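
A minimal sketch of the "flat map plus one parent label" bookkeeping described above, assuming nothing about Paseo's real types: `FlatSupervisor`, `AgentRecord`, and the `parent` label key are hypothetical names chosen for illustration. Note what is absent: nothing here decides which agent runs next, which is the point of finding A22.

```typescript
// Hypothetical supervisor: a flat map of agents, parent/child expressed as a single label.
interface AgentRecord {
  id: string;
  labels: Record<string, string>; // e.g. { parent: "agent-1" }; the key name is an assumption
  archived: boolean;
}

class FlatSupervisor {
  private agents = new Map<string, AgentRecord>();
  private nextId = 1;

  createAgent(parentId?: string): string {
    const id = `agent-${this.nextId++}`;
    const labels: Record<string, string> = {};
    if (parentId !== undefined) labels["parent"] = parentId;
    this.agents.set(id, { id, labels, archived: false });
    return id;
  }

  childrenOf(parentId: string): string[] {
    return Array.from(this.agents.values())
      .filter((a) => a.labels["parent"] === parentId)
      .map((a) => a.id);
  }

  // Archives an agent and, recursively, everything labelled as its child.
  archiveAgent(id: string): void {
    const rec = this.agents.get(id);
    if (rec === undefined || rec.archived) return;
    rec.archived = true;
    for (const child of this.childrenOf(id)) this.archiveAgent(child);
  }

  isArchived(id: string): boolean {
    return this.agents.get(id)?.archived ?? false;
  }
}

const demoSupervisor = new FlatSupervisor();
const conductorId = demoSupervisor.createAgent();
const workerA = demoSupervisor.createAgent(conductorId);
const workerB = demoSupervisor.createAgent(conductorId);
demoSupervisor.archiveAgent(conductorId);
console.log(demoSupervisor.isArchived(workerA), demoSupervisor.isArchived(workerB)); // true true
```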

**The web evidence argues that model-self-orchestration is fragile for weak models, and recommends deterministic code sequencing (A4, A5, A6, A9). This was corroborated live (A25).** Disinterested academic work reports that a deterministic engine calling the model only for bounded sub-tasks beats self-orchestration by ~10 points, with large reductions in turns and tool calls (A6), and that flat-context steering accuracy falls from ~60% to ~21% as agent count climbs (A9, though this is the 3→10-agent regime, not a 2–3-step bounded flow). In this very session, opencode/Qwen-35B ran the han research skill, dispatched the first specialist, then **dropped the adversarial-validator step entirely** (A25): a live, n=1 instance of exactly this failure. The lowest-ops way to host deterministic sequencing on a single-user Postgres stack is in-process Postgres-backed execution. The alternatives each carry a cost: Temporal is too heavy for one user (A1, A12), Restate is lighter but adds a second stateful service (A2, A3), and Inngest isn't truly self-hostable (A13, A14).

**Conflict surfaced:** the prior-art angle pushes toward a deterministic engine; the Paseo codebase proves the *opposite* pattern (model self-orchestration) ships and works — with a capable parent model. The two only reconcile once you fix who the conductor is.

## Options to Consider

### O-A: Copy Paseo's pattern — model self-orchestrates via an MCP toolbox

- **What it is:** Expose `create_agent` / `wait_for_agent` / `archive_agent` (and friends) as MCP tools over BooCode's existing backends; a parent agent sequences sub-agents from inside its own reasoning loop. Optionally lift Paseo's `AgentManager` + MCP server (~5 files, no RN deps) rather than writing the toolbox fresh.
- **Trade-offs:** Lowest effort and Paseo-proven — *with a strong parent model*. Fragile with a weak local conductor: non-deterministic, workflow state hidden in the model's context, no crash-replay. This is the pattern that dropped a step on Qwen in this session (A25).
- **Rests on:** (A20, A21, A22, A24) — and the con on (A4, A5, A6, A9, A25)
- **Evidence status:** corroborated
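
The fragility named in the trade-offs can be illustrated with a toy toolbox. In this sketch the three tool names come from the text above, but everything else is hypothetical: `Toolbox`, `Conductor`, and the two scripted conductors are plain functions standing in for a model's reasoning loop. The toolbox only records and executes calls, so a conductor that skips a step produces no error.

```typescript
// The conductor is a plain function standing in for a model's reasoning loop.
type ToolCall = { tool: "create_agent" | "wait_for_agent" | "archive_agent"; arg: string };

class Toolbox {
  readonly log: ToolCall[] = [];

  call(tool: ToolCall["tool"], arg: string): string {
    this.log.push({ tool, arg });
    return `${tool}:${arg}`; // placeholder result, not a real agent response
  }
}

type Conductor = (tools: Toolbox) => void;

// Follows the intended flow: analyst, then validator.
const carefulConductor: Conductor = (tools) => {
  for (const role of ["analyst", "validator"]) {
    tools.call("create_agent", role);
    tools.call("wait_for_agent", role);
    tools.call("archive_agent", role);
  }
};

// Silently skips the validator; the toolbox has no way to object.
const forgetfulConductor: Conductor = (tools) => {
  tools.call("create_agent", "analyst");
  tools.call("wait_for_agent", "analyst");
  tools.call("archive_agent", "analyst");
};

function rolesCreated(conductor: Conductor): string[] {
  const tools = new Toolbox();
  conductor(tools);
  return tools.log.filter((c) => c.tool === "create_agent").map((c) => c.arg);
}

console.log(rolesCreated(carefulConductor));   // ["analyst", "validator"]
console.log(rolesCreated(forgetfulConductor)); // ["analyst"]
```

Both runs complete without an exception, which is why the workflow state is described as hidden in the model's context.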

### O-B: Build a deterministic code flow-runner on BooCode's own primitives

- **What it is:** Add step sequencing in code — a `depends_on` / step-state column on the existing `tasks` table, dispatched by the existing `LISTEN/NOTIFY` poll; a real result-passing channel (replacing the 500-char `output_summary`); and a fold step. Agents stay bounded single-task workers.
- **Trade-offs:** Reliable regardless of conductor model and crash-resumable. But the missing sequencing, result-passing, and fold logic is **greenfield**: the reused worker layer is plumbing, and the new layer is where the schedule risk and the distributed-systems hazards live. The hand-rolled variant is genuinely near-zero new infra; a library (DBOS Transact) is **not**, given BooCode's host-systemd/Docker split and dual-schema DB (see V6). DBOS is an unvalidated option, not a co-equal one.
- **Rests on:** (A15, A16, A17, A19) for the substrate; (A6) for the architecture; (A1, A7, A8, A10) for the tooling angle, of which A7 and A8 are vendor-sourced and A10 is single-source (all three caveated).
- **Evidence status:** architecture corroborated; specific tooling (DBOS/pg-workflows) single-source (caveated)
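
As a rough sketch of what O-B's sequencing, result-passing, and fold could look like, the runner below executes steps in dependency order and hands each step the full outputs of its dependencies. It is in-memory and deliberately omits persistence and crash recovery, which is where the real work lies. `FlowStep`, `runFlow`, and the step ids are hypothetical, and `run` stands in for dispatching one bounded agent task.

```typescript
// Hypothetical step definition; `run` stands in for one bounded agent task.
interface FlowStep {
  id: string;
  dependsOn: string[];
  run: (inputs: Map<string, string>) => string;
}

// Runs steps in dependency order, passing each step the complete outputs it depends on.
function runFlow(steps: FlowStep[]): Map<string, string> {
  const outputs = new Map<string, string>();
  const remaining = [...steps];
  while (remaining.length > 0) {
    // Pick any step whose dependencies have all produced output.
    const step = remaining.find((s) => s.dependsOn.every((d) => outputs.has(d)));
    if (step === undefined) throw new Error("cycle or missing dependency");
    remaining.splice(remaining.indexOf(step), 1);

    const inputs = new Map<string, string>();
    for (const d of step.dependsOn) inputs.set(d, outputs.get(d) ?? "");
    outputs.set(step.id, step.run(inputs));
  }
  return outputs;
}

// Declared out of order on purpose; the runner, not the caller, decides the sequence.
const flowOutputs = runFlow([
  {
    id: "fold",
    dependsOn: ["research", "validate"],
    run: (i) => `${i.get("research")} | ${i.get("validate")}`,
  },
  { id: "validate", dependsOn: ["research"], run: (i) => `checked(${i.get("research")})` },
  { id: "research", dependsOn: [], run: () => "finding" },
]);

console.log(flowOutputs.get("fold")); // "finding | checked(finding)"
```

The conductor here is ordinary code, so a step cannot be skipped by a model's decision; a step either runs or the flow fails loudly.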

### O-C: Hybrid — lift Paseo's ACP supervisor as the worker layer, code orchestrator on top

- **What it is:** Replace BooCode's backends with Paseo's lifted ACP client + `AgentManager`, then put an O-B deterministic conductor over it.
- **Trade-offs:** Net-negative. BooCode already has four working backends; importing Paseo's provider tree (heavy intra-package `@getpaseo/protocol` coupling, per V7) swaps working code for a foreign dependency. Recommended **against**.
- **Rests on:** (A18, A20) — and V7
- **Evidence status:** corroborated (against)

## Recommendation

- **Recommendation:** **No single winner until the conductor model is fixed — this is the deciding criterion.**
  - **If the conductor is a strong model (Claude, in-stack today):** choose **O-A**. It is the simpler, Paseo-proven path, and building a deterministic engine would be over-engineering scope that wasn't asked for. Expose the MCP toolbox over BooCode's existing backends; lifting Paseo's `AgentManager`/MCP server is optional convenience, not necessity.
  - **If the conductor must be a weak local model (Qwen) — i.e. the goal is free/local multi-agent flows:** choose **O-B, hand-rolled** (`depends_on` + step-state on `tasks`, dispatched by the existing `LISTEN/NOTIFY`, plus a real result channel and a fold step). Determinism is not optional here; the model cannot be trusted to sequence itself (A25). Treat DBOS as an unvalidated alternative, not the default.
  - **In neither case O-C**, and in neither case lift Paseo's *conductor* — there isn't one to lift (A22).
- **Evidence basis:** The codebase findings (A15–A24) are current-state anchors, all independently verified by the validator against live source — the worker-layer-exists / sequencing-absent / Paseo-has-no-engine conclusions are solid. The "weak models can't self-orchestrate" direction rests on one disinterested source (A6) plus general degradation work (A9) and a single live anecdote (A25) — enough to make O-A *fragile* on Qwen, not enough to call it impossible. The specific DBOS tooling pick rests only on vendor marketing (A7, A8) and a single-source library (A10), so it is explicitly demoted. The fork itself — which option wins — rests on a constraint (conductor model) that the question never stated and the operator must supply.

## Validation

### V1: The "reuse 70%, build 30%" split is inverted on effort

- **Strategy:** Challenge the Recommendation
- **Investigation:** Read `dispatcher.ts` end-to-end; poll query dispatches on `state='pending'` only, no dependency awareness; no fold/synthesis code exists.
- **Result:** Partially Refuted
- **Impact:** The reused 70% is already-debugged plumbing; the missing piece (crash-resumable step state machine, result-passing, fold) is 100% greenfield and is where the risk lives. Direction survives; the *effort framing* was misleading and is corrected in O-B.

### V2: The 500-char `output_summary` cap makes the substrate hostile to result-passing

- **Strategy:** Challenge the Evidence
- **Investigation:** The 500-char cap is applied in every completion path (`dispatcher.ts:249/440/509/855/1120/1375`); full output goes to `messages.content` (50k) but the cheap parent-readable field is 500 chars.
- **Result:** Confirmed
- **Impact:** Result-passing must *replace* this primitive, not reuse it. Folded into O-B's scope.
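
A small illustration of why the cap is hostile to result-passing, assuming the cap keeps the first 500 characters (the truncation direction is an assumption, as are `toSummary` and `ResultChannel`, which are hypothetical names). An artifact whose conclusion sits past the cap loses it in the summary but survives in a channel that stores the whole text.

```typescript
const SUMMARY_CAP = 500; // the cap described above

// Assumed behavior: keep only the first SUMMARY_CAP characters.
function toSummary(fullOutput: string): string {
  return fullOutput.slice(0, SUMMARY_CAP);
}

// Hypothetical replacement channel: keep the whole artifact, keyed by task id.
class ResultChannel {
  private store = new Map<string, string>();

  put(taskId: string, artifact: string): void {
    this.store.set(taskId, artifact);
  }

  get(taskId: string): string | undefined {
    return this.store.get(taskId);
  }
}

const demoArtifact = "x".repeat(4000) + "CONCLUSION";
const demoChannel = new ResultChannel();
demoChannel.put("task-1", demoArtifact);

console.log(toSummary(demoArtifact).length);                    // 500
console.log(toSummary(demoArtifact).includes("CONCLUSION"));    // false
console.log(demoChannel.get("task-1")?.endsWith("CONCLUSION")); // true
```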

### V3: "O-A fails on Qwen" is over-weighted by one live anecdote (A25)

- **Strategy:** Challenge the Evidence-Gathering Integrity
- **Investigation:** A25 is n=1, self-collected this session; A9's degradation figures are for 3→10 agents in flat context, not a 2–3-step bounded flow; no test of Qwen on a *tightened* skill.
- **Result:** Partially Refuted
- **Impact:** Reworded to "O-A is *fragile* with a weak local conductor," not "fails." A stronger local model or a step-gated skill could flip it.
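
One reading of "step-gated" is a check that refuses to accept completion until every required step has been recorded. The sketch below is an interpretation, not something tested in this research: `missingSteps`, `acceptCompletion`, and the step names are hypothetical.

```typescript
// Hypothetical gate: report which required steps have not been recorded as completed.
function missingSteps(required: string[], completed: string[]): string[] {
  const seen = new Set(completed);
  return required.filter((s) => !seen.has(s));
}

// Completion is accepted only when nothing is missing.
function acceptCompletion(required: string[], completed: string[]): boolean {
  return missingSteps(required, completed).length === 0;
}

const requiredSteps = ["analyst", "validator"];

console.log(missingSteps(requiredSteps, ["analyst"]));                  // ["validator"]
console.log(acceptCompletion(requiredSteps, ["analyst"]));              // false
console.log(acceptCompletion(requiredSteps, ["analyst", "validator"])); // true
```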

### V4: The synthesis assumes BooCode *wants* deterministic flows

- **Strategy:** Challenge the Recommendation
- **Investigation:** The question was "lift Paseo's orchestration," which Paseo proves works via MCP self-orchestration with a strong parent. Nothing in the question mandates a weak local conductor; BooCode already runs Claude-SDK and opencode backends.
- **Result:** Refuted (as an unstated assumption)
- **Impact:** **Load-bearing.** The recommendation was rewritten from "O-B" into the conditional fork above, with conductor-model as the explicit deciding criterion.

### V5: Discounting interested-party web sources (A4, A7, A8, A13, A14)

- **Strategy:** Challenge the Evidence-Gathering Integrity
- **Investigation:** Sensitivity test. Remove A4 (Praetorian): the architecture claim still holds on A6 (disinterested arXiv). Remove A7/A8/A10 (DBOS/pg-workflows): the *specific tooling* recommendation loses its entire basis. Remove A13/A14 (Inngest): no effect on the recommendation.
- **Result:** Partially Refuted
- **Impact:** Architectural recommendation survives on A6 alone; the DBOS tooling pick was demoted to an unvalidated option in O-B.
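
The sensitivity test above amounts to removing sources and checking what support each claim has left. A minimal sketch follows, using the source IDs cited in this report. The `claimSources` mapping is a simplification for illustration, and `survivingSources` is a hypothetical helper.

```typescript
// Which sources each claim rests on, per the citations in this report (simplified).
const claimSources: Record<string, string[]> = {
  architecture: ["A4", "A5", "A6", "A9"],
  tooling: ["A7", "A8", "A10"],
};

// Returns the sources still supporting a claim after the given ones are removed.
function survivingSources(claim: string, removed: string[]): string[] {
  const gone = new Set(removed);
  return (claimSources[claim] ?? []).filter((id) => !gone.has(id));
}

console.log(survivingSources("architecture", ["A4"]));         // ["A5", "A6", "A9"]
console.log(survivingSources("tooling", ["A7", "A8", "A10"])); // []
```

An empty result is the signal that a recommendation has lost its basis, which is what happened to the DBOS tooling pick.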

### V6: "No new infra, just Postgres" ignores the host/container/dual-schema split

- **Strategy:** Challenge the Fix
- **Investigation:** BooCoder runs as a host systemd service; `apps/server` runs in Docker; the coder schema is applied separately. DBOS keeps its own system tables and owns transaction boundaries — a third schema-owner in the shared DB plus a second durable-execution engine overlapping the existing poll machinery.
- **Result:** Confirmed
- **Impact:** "No new infra" is true only for the **hand-rolled** variant, false for DBOS. O-B now recommends hand-rolled and separates the two.

### V7: The Paseo "5 files, 2–3 days" lift drags in the provider tree

- **Strategy:** Challenge the Evidence
- **Investigation:** `agent-manager.ts` imports 30+ types from `agent-sdk-types`, plus `@getpaseo/protocol/*` and the full provider implementations. The "no React-Native deps" claim is true; the *implied* cheap isolated lift is not.
- **Result:** Partially Refuted
- **Impact:** O-C (reuse Paseo's worker supervisor) is net-negative since BooCode already has four working backends. O-C recommended against.

### V8: Provenance of the two untracked artifact files

- **Strategy:** Challenge the Evidence-Gathering Integrity
- **Investigation:** Untracked `docs/features/git-diff-panel/artifacts/*.md` are the synthesis's own working files, not fetched external content; low injection risk but unversioned.
- **Result:** Confirmed (low severity)
- **Impact:** Codebase claims (A15–A24) are reproducible and were checked; web claims (A1–A14) are not re-fetchable from here and were taken on the retrieval's word.

### Adjustments Made

The recommendation did **not** survive in its original "O-B as the spine" form. It was rewritten into the conditional fork above (deciding criterion: conductor model), per V4. O-C was dropped (V7); DBOS was demoted from co-equal to unvalidated (V5, V6); the "O-A fails on Qwen" claim was softened to "fragile" (V3); the effort framing and the 500-char result-passing problem were folded into O-B's scope (V1, V2).

### Confidence Assessment

- **Confidence:** Medium
- **Remaining Risks:** The web tier (A1–A14) is unverifiable from this environment and the DBOS/pg-workflows specifics are vendor/single-sourced. A25 is n=1. The single assumption that flips the entire recommendation — whether the conductor is Claude or Qwen — is the operator's to confirm; everything downstream depends on it.

## Sources

| ID | Source | Link / location | Retrieved | Trust class | Summary (one line) | Evidence status |
|---|---|---|---|---|---|---|
| A1 | Nango: left Temporal for Postgres orchestration | https://nango.dev/blog/migrating-from-temporal-to-a-postgres-based-task-orchestrator/ | 2026-06-03 | web | Temporal ops overhead drove a move to Postgres-backed orchestration | corroborated by A12 |
| A2 | Restate self-hosted overview | https://docs.restate.dev/server/overview | 2026-06-03 | web | Single binary, embedded RocksDB, no external DB | corroborated by A3 |
| A3 | Show HN: Restate | https://news.ycombinator.com/item?id=40659160 | 2026-06-03 | web | Confirms single-binary lightweight deploy | corroborated by A2 |
| A4 | Praetorian: Deterministic AI Orchestration | https://www.praetorian.com/blog/deterministic-ai-orchestration-a-platform-architecture-for-autonomous-development/ | 2026-06-03 | web | External orchestration beats self-orchestration; small models do better as bounded workers | corroborated by A6, A9 (interested party) |
| A5 | Hatchworks: Orchestrating AI Agents | https://hatchworks.com/blog/ai-agents/orchestrating-ai-agents/ | 2026-06-03 | web | Self-orchestration failure modes; external control plane as default | corroborated by A4, A6 |
| A6 | arXiv 2508.02721 Blueprint-First | https://arxiv.org/abs/2508.02721 | 2026-06-03 | web | Deterministic engine + LLM for bounded sub-tasks; +10.1pp, fewer turns | corroborated by A4, A5, A9 |
| A7 | Show HN: DBOS TypeScript | https://news.ycombinator.com/item?id=42727970 | 2026-06-03 | web | In-process Postgres durable execution, decorator steps | corroborated by A8 |
| A8 | DBOS Transact | https://www.dbos.dev/dbos-transact | 2026-06-03 | web | "Just your program and Postgres," no orchestration server | corroborated by A7 (interested party) |
| A9 | arXiv 2604.07911 context scoping | https://arxiv.org/pdf/2604.07911 | 2026-06-03 | web | Flat-context steering accuracy 60%→21% from 3→10 agents | corroborated by A4, A6 |
| A10 | pg-workflows | https://sokratisvidros.github.io/pg-workflows/ | 2026-06-03 | web | Pure-Postgres TS workflow library, step exactly-once | single source (caveated) |
| A11 | Hatchet v1 HN | https://news.ycombinator.com/item?id=43572733 | 2026-06-03 | web | Postgres + RabbitMQ; separate server process | corroborated by A1 |
| A12 | Temporal self-host guide | https://docs.temporal.io/self-hosted-guide/deployment | 2026-06-03 | web | Multi-service self-host overhead | corroborated by A1 |
| A13 | Inngest vs Trigger vs Restate | https://www.pkgpulse.com/guides/inngest-vs-trigger-dev-v3-vs-restate-2026 | 2026-06-03 | web | Inngest cloud-first; not truly self-hostable | corroborated by A14 |
| A14 | Inngest self-hosting docs | https://www.inngest.com/docs/self-hosting | 2026-06-03 | web | Engine proprietary to Inngest Cloud | corroborated by A13 (interested party) |
| A15 | BooCode dispatcher + LISTEN/NOTIFY | `apps/coder/src/services/dispatcher.ts:46` | n/a | codebase | Task dispatch via `tasks_new` notify + poll; backend routing | corroborated by A18 |
| A16 | BooCode Arena | `apps/coder/src/routes/arena.ts:34` | n/a | codebase | Parallel fan-out 2–5 contestants; selection is `[SELECTED]` prefix, no consumer; sessionless | single source (codebase anchor) |
| A17 | BooCode tasks table | `apps/coder/src/schema.sql:18` | n/a | codebase | `parent_task_id` FK written-but-not-dispatched-on; no `depends_on`/step/flows | single source (codebase anchor) |
| A18 | AgentBackend + 4 backends | `apps/coder/src/services/agent-backend.ts:97` | n/a | codebase | Common ensureSession/prompt surface; opencode/warm-acp/claude-sdk/one-shot | corroborated by A15 |
| A19 | new_task / output_summary cap | `apps/coder/src/services/tools/new_task.ts:13` | n/a | codebase | Native-only tools; parent reads only 500-char `output_summary` | single source (codebase anchor) |
| A20 | Paseo AgentManager | `/opt/forks/paseo/packages/server/src/server/agent/agent-manager.ts:413` | n/a | codebase | Plain TS class, no RN deps; create/stream/run/wait/cascade-archive | corroborated by A22 |
| A21 | Paseo createAgentMcpServer | `/opt/forks/paseo/.../agent/mcp-server.ts:479` | n/a | codebase | create_agent/wait_for_agent/archive_agent as MCP tools; child gets parent MCP URL | corroborated by A20 |
| A22 | Paseo has no sequencing engine | `/opt/forks/paseo/.../agent-manager.ts` (verdict) | n/a | codebase | Parent model self-orchestrates via MCP; daemon supervises only | corroborated by A20, A24 |
| A23 | Paseo AgentStorage | `/opt/forks/paseo/.../agent/agent-storage.ts:84` | n/a | codebase | JSON files on disk + in-memory timeline, no DB | single source (codebase anchor) |
| A24 | Paseo parent/child label | `/opt/forks/paseo/packages/protocol/src/agent-labels.ts` | n/a | codebase | Relationship is one label on a flat map | corroborated by A22 |
| A25 | Live smoke test (this session) | provided: opencode/Qwen-35B han research run | n/a | provided | Qwen dispatched analyst, dropped the validator step, skipped template | single source (live, n=1) |

### A22: Paseo has no deterministic sequencing engine — recommendation-bearing

- **Link / location:** `/opt/forks/paseo/packages/server/src/server/agent/agent-manager.ts:413` (+ explorer verdict)
- **Retrieved:** n/a
- **Trust class:** codebase (current-state anchor)
- **Summary:** Paseo's daemon spawns, waits on, and bookkeeps agents; it contains no DAG or workflow engine. The parent agent — itself an ACP process — does all sequencing by calling `create_agent`/`wait_for_agent`/`archive_agent` as MCP tools in its own reasoning loop. This is why "lift Paseo's orchestrator" is a category error: the conductor is the model, not Paseo. It reframes the entire recommendation into "build a conductor (O-B) vs adopt model-as-conductor (O-A)."
- **Evidence status:** corroborated by A20, A24

### A17: BooCode tasks table lacks sequencing columns — recommendation-bearing

- **Link / location:** `apps/coder/src/schema.sql:18`
- **Retrieved:** n/a
- **Trust class:** codebase (current-state anchor)
- **Summary:** The `tasks` table has `parent_task_id` (written by `new_task`, read only by `list_tasks`, never by the dispatcher) but no `depends_on`, `step_index`, or `flows` definition. The dispatcher poll selects on `state='pending'` alone. This is the concrete gap O-B must fill, and it confirms the deterministic sequencing substrate genuinely does not exist today.
- **Evidence status:** single source (codebase anchor), verified by the validator against live source

### A6: Blueprint-First deterministic workflow (arXiv) — recommendation-bearing

- **Link / location:** https://arxiv.org/abs/2508.02721
- **Retrieved:** 2026-06-03
- **Trust class:** web (arXiv preprint, disinterested)
- **Summary:** A deterministic engine executes an expert-defined blueprint and calls the LLM only for bounded sub-tasks, never to decide workflow path; reports ~10-point gains and large reductions in turns and tool calls versus self-orchestrating agents. This is the one disinterested source carrying the "deterministic code beats model self-orchestration" direction after the interested parties (A4) are discounted.
- **Evidence status:** corroborated by A4, A5, A9