boocode/openspec/changes/paseo-orchestrator/proposal.md
indifferentketchup abe9c5a3a8 feat: Paseo-like orchestrator Phase 1-2 — trace system, session persistence, timeline, run_command, auto-fix loop
Phase 1: Trace System + Observability
- tool_traces DB table + insert/update service
- tool_trace_start/tool_trace_finish WS frames (contracts + FE types)
- Instrumented tool-phase.ts with timing around every tool call
- GET /api/chats/:id/traces paginated endpoint
- Trace viewer frontend (collapsible panel with timing bars + token breakdown)

Phase 2: Session Persistence + Resume
- agent_snapshots table (UPSERT per chat, persisted on turn boundaries)
- save/load/delete service functions
- Agent snapshot sent on WS reconnect
- Session timeline view (vertical timeline with scroll-to + restore)

Tooling:
- run_command tool (execFile, 30s timeout, 32KB cap, path-guarded)
- Auto-fix loop: after write tools, runs pnpm build, injects errors into next turn
2026-06-08 02:26:47 +00:00


Paseo-like Orchestrator — Trace Observability, Dynamic Workflows & Agent Runtime

Status: Proposed · Epic: paseo-orchestrator · Depends on: v2.7.17-orchestrator

Why

BooCode's Orchestrator (v2.7.17) runs deterministic Han analysis flows — but it's a fixed pipeline, not a general-purpose agent runtime. Every tool call is opaque: no timing, no cost breakdown, no replay. Sessions evaporate on browser refresh. Workflows are hardcoded. Subagents block until completion. And there's zero visibility into cache efficiency on DeepSeek — despite prompt caching being a major cost lever.

The current architecture treats the LLM as a black box and the agent as a one-shot transaction. To move from "read-only chat" to a Paseo-style thin-client orchestration layer, BooCode needs five capabilities that compound on each other:

  1. Observability — Every tool call timed, logged, and live-streamed. Without it, debugging agent behavior is guesswork.
  2. Persistence — Agent state survives browser refresh. Active sessions resume where they left off.
  3. Dynamic Workflows — User-authored JS scripts using agent(), parallel(), pipeline() instead of hardcoded flows. Hash-based caching skips completed steps on re-run.
  4. Background Subagents — spawn_subagent returns immediately, results collected later. Unlocks parallel research, long-running analyses, and notification-based workflows.
  5. Multi-modal + Cache Shape — Image attachments forwarded to DeepSeek's vision API, plus per-turn cache hit rate visualization to close the cost feedback loop.

Each phase is independently valuable; together they transform BooCode from a chat UI into a durable agent execution platform.

What Changes

Phase 1: Trace System + Observability (3-4 days)

  1. Create tool_traces DB table — id, session_id, chat_id, turn_number, tool_name, input, output, started_at, finished_at, latency_ms, tokens_used, cache_tokens, reasoning_tokens, error, outcome. Applied idempotently via applySchema().

  2. Add tool_trace WS frame — new WsFrame variant in @boocode/contracts published by the server when a tool call starts and completes. Frontend receives live timing deltas via useSessionStream.

  3. Instrument tool-phase.ts — wrap executeToolCall with clock_timestamp() start/end, extract token counts from LLM response metadata, publish tool_trace frames on start (with input) and finish (with output + metrics).

  4. Add GET /api/chats/:id/traces — paginated endpoint returning trace rows ordered by turn_number + started_at. Supports cursor-based pagination for large sessions.

  5. Build trace viewer pane — collapsible tree per turn, timing bars showing latency relative to turn duration, expand/collapse per tool call showing input/output. Integrates into the existing multi-pane workspace alongside chat, coder, and orchestrator panes.
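
As a rough illustration of item 3 above, the sketch below wraps a tool call with start/finish timing and emits one frame on each side. The frame shape, the `publish` callback, and the name `tracedToolCall` are hypothetical and not taken from the codebase. The proposal times calls with clock_timestamp(); this sketch uses an injectable clock instead so it stays self-contained.

```typescript
// Hypothetical frame shape; the real WsFrame variant would live in @boocode/contracts.
type ToolTraceFrame =
  | { type: "tool_trace"; phase: "start"; toolName: string; input: unknown; startedAt: number }
  | {
      type: "tool_trace";
      phase: "finish";
      toolName: string;
      output: unknown;
      error: string | null;
      latencyMs: number;
    };

type Publish = (frame: ToolTraceFrame) => void;

// Wraps any async tool executor and publishes a frame before and after the call.
async function tracedToolCall<T>(
  toolName: string,
  input: unknown,
  execute: () => Promise<T>,
  publish: Publish,
  now: () => number = Date.now,
): Promise<T> {
  const startedAt = now();
  publish({ type: "tool_trace", phase: "start", toolName, input, startedAt });
  try {
    const output = await execute();
    const latencyMs = now() - startedAt;
    publish({ type: "tool_trace", phase: "finish", toolName, output, error: null, latencyMs });
    return output;
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    const latencyMs = now() - startedAt;
    publish({ type: "tool_trace", phase: "finish", toolName, output: null, error: message, latencyMs });
    throw err; // the trace records the failure; the caller still sees it
  }
}
```

Publishing the finish frame from both the success and the failure path means a crashed tool still produces a closed trace, so the viewer would not show it as running forever.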

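The cursor-based pagination mentioned for GET /api/chats/:id/traces could follow a keyset approach, sketched here over an in-memory array. The `TraceRow` subset, the cursor encoding, and `pageTraces` are assumptions for illustration; a real endpoint would express the same comparison as a WHERE clause against tool_traces.

```typescript
// Hypothetical subset of a tool_traces row; only the ordering columns matter here.
interface TraceRow {
  id: number;
  turnNumber: number;
  startedAt: number; // epoch milliseconds
}

interface TracePage {
  rows: TraceRow[];
  nextCursor: string | null; // null when no further rows exist
}

// Total order: turn_number, then started_at, then id as a tie-breaker.
function compareTraces(a: TraceRow, b: TraceRow): number {
  return a.turnNumber - b.turnNumber || a.startedAt - b.startedAt || a.id - b.id;
}

function encodeCursor(row: TraceRow): string {
  return `${row.turnNumber}:${row.startedAt}:${row.id}`;
}

function decodeCursor(cursor: string): TraceRow {
  const parts = cursor.split(":").map(Number);
  if (parts.length !== 3 || parts.some((n) => Number.isNaN(n))) {
    throw new Error("invalid cursor");
  }
  return { turnNumber: parts[0]!, startedAt: parts[1]!, id: parts[2]! };
}

// Returns up to `limit` rows that sort strictly after the cursor position.
function pageTraces(all: TraceRow[], limit: number, cursor?: string): TracePage {
  const sorted = [...all].sort(compareTraces);
  const after = cursor ? decodeCursor(cursor) : null;
  const remaining = after ? sorted.filter((r) => compareTraces(r, after) > 0) : sorted;
  const rows = remaining.slice(0, limit);
  const last = rows[rows.length - 1];
  const nextCursor = remaining.length > limit && last ? encodeCursor(last) : null;
  return { rows, nextCursor };
}
```

Because the cursor encodes the last row's sort key rather than an offset, rows inserted while a client is paging do not shift later pages.
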
Phase 2: Session Persistence + Resume (2-3 days)

  1. Serialize agent state to DB — on each turn boundary (before and after the tool call loop), snapshot the active AgentSession state (provider config, turn history, pending tool calls) to a JSONB column in agent_sessions. Uses clock_timestamp() for ordering.

  2. Restore on WS reconnect — when the snapshot frame arrives on reconnection, check for a persisted AgentSession in the in_progress or awaiting_input state. Rehydrate the coder pane to match the persisted turn, tool call, and pending state.

  3. Agent session timeline view — a timeline component in the coder pane showing the history of all turns in the current agent session. Each turn shows start time, tool count, token usage, cache hit rate. Clicking a turn scrolls to that point in the conversation.
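
The save-on-turn-boundary and restore-on-reconnect flow from items 1 and 2 can be sketched with an in-memory stand-in for the database. The `AgentSnapshot` fields and the `SnapshotStore` class are hypothetical, and the `completed` status is an assumed terminal state; only in_progress and awaiting_input come from the proposal.

```typescript
// "completed" is an assumed terminal state; the proposal names the other two.
type SessionStatus = "in_progress" | "awaiting_input" | "completed";

// Hypothetical snapshot shape; the proposal stores this as JSONB.
interface AgentSnapshot {
  chatId: string;
  status: SessionStatus;
  turnNumber: number;
  pendingToolCalls: string[];
}

// In-memory stand-in for the DB table: one serialized snapshot per chat.
class SnapshotStore {
  private rows = new Map<string, string>(); // chatId -> JSON text

  // Upsert: a later save for the same chat replaces the earlier one.
  save(snapshot: AgentSnapshot): void {
    this.rows.set(snapshot.chatId, JSON.stringify(snapshot));
  }

  // Returns a snapshot only when the session can still be resumed.
  loadResumable(chatId: string): AgentSnapshot | null {
    const raw = this.rows.get(chatId);
    if (raw === undefined) return null;
    const snapshot = JSON.parse(raw) as AgentSnapshot;
    const resumable = snapshot.status === "in_progress" || snapshot.status === "awaiting_input";
    return resumable ? snapshot : null;
  }
}
```

On reconnect, a handler would call `loadResumable(chatId)` and rehydrate the pane only when it returns a snapshot; a null result means there is nothing to resume.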

Phase 3: Dynamic Workflow Engine (5-7 days)

  1. Create isolated-vm sandbox — restricted JS execution environment for workflow scripts. No require, fs, net, child_process. Only the workflow API surface exposed. Token budget enforcement kills runaway scripts.

  2. Implement workflow API primitives — agent(id, { prompt, model, tools, budget }) defines a sub-agent; parallel([agent1, agent2]) runs N agents concurrently with a shared token budget; pipeline([step1, step2]) chains agents sequentially; phase(name, { agents, budget }) groups agents under a named phase; budget(limit) sets token or step limits; log(msg) emits a structured workflow log entry. Compatible with Claude Code workflow script format.

  3. Workflow file discovery — scan .boocode/workflows/*.js (project-local), ~/.boocode/workflows/*.js (global), and a built-in catalog directory. Each file exports a workflow object with {name, description, run}. Discovery runs on server start and on file change (optional watch mode).

  4. Workflow manager + built-in catalog — WorkflowManager class with list(), get(name), run(workflow, args), cancel(runId), status(runId). Concurrency limits (configurable max concurrent runs), token budgets per run. Built-in catalog includes: deep-research (parallel source search → per-source analysis → synthesis), multi-review (code health + security + standards reviews in parallel), plan-verify (generate plan → verify plan → generate tasks), bounty-hunt (parallel vulnerability scanning with different focuses).

  5. Workflow resumability — SHA-256 hash of each agent spec (prompt + options). Before executing an agent, check if a completed result exists with the same hash. Skip cached agents, only execute new/changed ones. In-memory LRU cache for current session, optional DB persistence for cross-session reuse.

  6. Workflow UI integration — extend the existing Orchestrator panel (used for Han flows) to support dynamic workflows. Workflow selector dropdown, live run pane with step-by-step progress, cancel button, log output stream, per-agent timing. Reuses the same run-pane component pattern.
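
To make the agent(), parallel(), and pipeline() primitives from item 2 more concrete, here is a minimal sketch built on a stubbed runner. The `Runner` type and `createWorkflowApi` are hypothetical, and budgets, phases, tools, and sandboxing are omitted, so this shows the composition only and not the proposed engine.

```typescript
// Hypothetical spec shape; the proposal also lists tools and budget options.
interface AgentSpec {
  id: string;
  prompt: string;
  model?: string;
}

// A step takes the previous step's output (if any) and resolves to its own output.
type Step = (input?: string) => Promise<string>;

// Stand-in for an LLM-backed executor so the sketch stays self-contained.
type Runner = (spec: AgentSpec, input?: string) => Promise<string>;

function createWorkflowApi(run: Runner) {
  // Defines a sub-agent as a deferred step; nothing executes until it is called.
  const agent = (id: string, options: { prompt: string; model?: string }): Step =>
    (input) => run({ id, ...options }, input);

  // Runs all steps concurrently; results keep the order of the input array.
  const parallel = (steps: Step[]): Promise<string[]> =>
    Promise.all(steps.map((step) => step()));

  // Runs steps one after another, feeding each output into the next step.
  const pipeline = async (steps: Step[]): Promise<string | undefined> => {
    let previous: string | undefined;
    for (const step of steps) {
      previous = await step(previous);
    }
    return previous;
  };

  return { agent, parallel, pipeline };
}
```

Treating agent() as a deferred step keeps definition separate from execution, which is what would let a manager inspect or skip a step before running it.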

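The hash-based resumability in item 5 of Phase 3 might look roughly like the following: hash a canonical form of the agent spec with SHA-256, then consult an LRU cache before executing. `CacheableSpec`, `canonicalize`, `LruCache`, and `runWithCache` are illustrative names, and execution is synchronous here only to keep the sketch short.

```typescript
import { createHash } from "node:crypto";

// Hypothetical spec shape: the prompt plus arbitrary options.
interface CacheableSpec {
  prompt: string;
  options: Record<string, unknown>;
}

// Serializes with sorted object keys so equal specs always produce equal text.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((item) => canonicalize(item)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const entries = Object.entries(value as Record<string, unknown>)
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
      .map(([key, item]) => `${JSON.stringify(key)}:${canonicalize(item)}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value) ?? "null";
}

function specHash(spec: CacheableSpec): string {
  return createHash("sha256").update(canonicalize(spec)).digest("hex");
}

// Minimal LRU built on Map insertion order: the first key is the least recently used.
class LruCache<V> {
  private entries = new Map<string, V>();
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  get(key: string): V | undefined {
    const value = this.entries.get(key);
    if (value === undefined) return undefined;
    this.entries.delete(key);
    this.entries.set(key, value); // re-insert to mark as most recently used
    return value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }
}

// Executes the agent only when no result is cached for an identical spec.
function runWithCache(
  spec: CacheableSpec,
  cache: LruCache<string>,
  execute: (spec: CacheableSpec) => string,
): { result: string; cached: boolean } {
  const key = specHash(spec);
  const hit = cache.get(key);
  if (hit !== undefined) return { result: hit, cached: true };
  const result = execute(spec);
  cache.set(key, result);
  return { result, cached: false };
}
```

Canonicalizing before hashing matters because plain JSON.stringify depends on key insertion order, so two semantically identical specs could otherwise miss the cache.
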
Phase 4: Background Subagents (2-3 days)

  1. Background task queue — uses the existing tasks table with a new background type. spawn_subagent tool creates a task row and returns immediately. A background worker picks up the task and executes it without blocking the calling agent.

  2. subagent_status + subagent_result tools — subagent_status(task_id) returns running|completed|failed with optional progress info. subagent_result(task_id) returns the full output when completed. Polling-based (no WS push for background tasks initially).

  3. Background agent pane — new pane type showing running/completed background agents. Each entry shows name, status, duration, progress. Completed entries show a "View Result" action. Notifications hook into the existing notification system (toast on completion, badge count for active tasks).
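
The spawn-then-poll contract from items 1 and 2 can be sketched with an in-memory task map in place of the tasks table and the background worker. The `BackgroundSubagents` class and its method names are hypothetical; only the running, completed, and failed statuses come from the proposal.

```typescript
type TaskStatus = "running" | "completed" | "failed";

interface BackgroundTask {
  id: string;
  status: TaskStatus;
  result?: string;
  error?: string;
}

// In-memory stand-in for the tasks table plus the background worker.
class BackgroundSubagents {
  private tasks = new Map<string, BackgroundTask>();
  private nextId = 1;

  // Returns a task id right away; the work continues without blocking the caller.
  spawn(work: () => Promise<string>): string {
    const task: BackgroundTask = { id: `task-${this.nextId++}`, status: "running" };
    this.tasks.set(task.id, task);
    void Promise.resolve()
      .then(work) // deferring the start also captures synchronous throws as failures
      .then(
        (result) => {
          task.status = "completed";
          task.result = result;
        },
        (err: unknown) => {
          task.status = "failed";
          task.error = err instanceof Error ? err.message : String(err);
        },
      );
    return task.id;
  }

  // Polling entry point, mirroring subagent_status(task_id).
  status(taskId: string): TaskStatus | "unknown" {
    return this.tasks.get(taskId)?.status ?? "unknown";
  }

  // Mirrors subagent_result(task_id): output is available only once completed.
  result(taskId: string): string | null {
    const task = this.tasks.get(taskId);
    return task && task.status === "completed" ? task.result ?? null : null;
  }
}
```

A calling agent would spawn, continue with other work, and poll `status` until it leaves running, which matches the polling-only design described for the initial version.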

Phase 5: Multi-modal + Cache Shape (2-3 days)

  1. Image/file attachment pipeline — accept file uploads (drag-drop or file picker), store on tmpfs with a reference in the message row. Forward to DeepSeek's multimodal API as base64-encoded image parts. Size limit enforcement (configurable, default 20MB per attachment).

  2. Image render in message bubble — render attached images inline in the chat message bubble. Lightbox on click for expanded view. Thumbnail generation for large images to keep chat scrolling performant.

  3. Cache shape telemetry — extract prompt_cache_hit_tokens from DeepSeek provider metadata on each turn. Break down by segment: system prompt, tool schemas, conversation history. Store in tool_traces columns and/or a dedicated cache_stats table.

  4. Cache hit rate visualization — per-turn cache hit bar in the trace viewer (showing cached vs non-cached tokens). Cumulative cache hit rate in the session footer. Highlight when a turn achieves high cache reuse (green indicator) or unusually low (yellow/red).
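
The per-turn and cumulative figures in items 3 and 4 reduce to simple ratios, sketched below. This assumes the provider reports a miss count (`prompt_cache_miss_tokens`) next to prompt_cache_hit_tokens. The indicator thresholds are placeholders, since the proposal does not define what counts as high or low reuse.

```typescript
interface TurnUsage {
  prompt_cache_hit_tokens: number;
  prompt_cache_miss_tokens: number; // assumed to be reported next to the hit count
}

// Fraction of prompt tokens served from cache for one turn, in [0, 1].
function cacheHitRate(usage: TurnUsage): number {
  const total = usage.prompt_cache_hit_tokens + usage.prompt_cache_miss_tokens;
  return total === 0 ? 0 : usage.prompt_cache_hit_tokens / total;
}

// Token-weighted rate for the whole session (not an average of per-turn rates).
function cumulativeHitRate(turns: TurnUsage[]): number {
  let hits = 0;
  let total = 0;
  for (const turn of turns) {
    hits += turn.prompt_cache_hit_tokens;
    total += turn.prompt_cache_hit_tokens + turn.prompt_cache_miss_tokens;
  }
  return total === 0 ? 0 : hits / total;
}

type Indicator = "green" | "yellow" | "red";

// Placeholder thresholds: at or above `high` is green, below `low` is red.
function hitRateIndicator(rate: number, high = 0.8, low = 0.3): Indicator {
  if (rate >= high) return "green";
  return rate < low ? "red" : "yellow";
}
```

Weighting by tokens keeps a short, fully cached turn from masking a long turn that missed the cache, which is the case the session footer is meant to surface.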

Non-Goals

  • No changes to the existing Han flow orchestrator (runs alongside dynamic workflows)
  • No removal of existing agent dispatch paths (PTY, ACP, Claude SDK — dynamic workflows are additive)
  • No distributed execution (all orchestration is single-node)
  • No persistent workflow file watching (manual reload or server restart to pick up new workflows)
  • No workflow editing UI (workflows are authored as JS files)

Capabilities

New Capabilities

  • Tool trace viewer — every tool call with timing, token costs, cache breakdown, expandable input/output
  • Agent session resume — browser refresh preserves active agent state
  • Dynamic workflows — user-authored JS scripts with agent()/parallel()/pipeline() API
  • Workflow resumability — hash-based step caching skips completed agents on re-run
  • Built-in workflow catalog — deep-research, multi-review, plan-verify, bounty-hunt
  • Background subagents — non-blocking spawn with deferred result collection
  • Multi-modal support — image attachments forwarded to DeepSeek vision API
  • Cache shape telemetry — per-turn and cumulative cache hit rate visualization

Modified Capabilities

  • Orchestrator panel — extended from fixed Han flows to dynamic workflow selection and streaming run pane
  • tool-phase.ts — instrumented with start/end timing and trace publishing
  • WsFrame contract — new tool_trace frame variant
  • tasks table — extended with background type for async subagent execution

Metrics

  • Tool call observability: 0% → 100% of calls traced with timing
  • Session continuity: lost on refresh → preserved on reconnect
  • Workflow authoring: hardcoded → user-authored JS scripts
  • Workflow re-run efficiency: 0% cache → hash-based step reuse
  • Background execution: blocking only → blocking + non-blocking
  • Cache visibility: 0% → per-turn + cumulative hit rate
  • Multi-modal: text-only → text + image attachments