boocode/openspec/changes/paseo-orchestrator/tasks.md
indifferentketchup abe9c5a3a8 feat: Paseo-like orchestrator Phase 1-2 — trace system, session persistence, timeline, run_command, auto-fix loop
Phase 1: Trace System + Observability
- tool_traces DB table + insert/update service
- tool_trace_start/tool_trace_finish WS frames (contracts + FE types)
- Instrumented tool-phase.ts with timing around every tool call
- GET /api/chats/:id/traces paginated endpoint
- Trace viewer frontend (collapsible panel with timing bars + token breakdown)

Phase 2: Session Persistence + Resume
- agent_snapshots table (UPSERT per chat, persisted on turn boundaries)
- save/load/delete service functions
- Agent snapshot sent on WS reconnect
- Session timeline view (vertical timeline with scroll-to + restore)

Tooling:
- run_command tool (execFile, 30s timeout, 32KB cap, path-guarded)
- Auto-fix loop: after write tools, runs pnpm build, injects errors into next turn
2026-06-08 02:26:47 +00:00

Tasks — Paseo-like Orchestrator

Phase 1: Trace System + Observability (5 tasks)

1. Create tool_traces DB table + migration

Add tool_traces table to apps/server/src/schema.sql:

  • Columns: id (UUID PK), session_id (UUID FK → sessions), chat_id (UUID FK → chats), turn_number (int), tool_name (text), input (jsonb), output (jsonb), started_at (timestamptz), finished_at (timestamptz), latency_ms (int), tokens_used (int), cache_tokens (int), reasoning_tokens (int), error (text), outcome (text)
  • Index on (chat_id, turn_number, started_at) for trace queries
  • Index on (session_id) for session-level aggregation
  • Applied idempotently via applySchema() — wrap in CREATE TABLE IF NOT EXISTS

Verification: psql shows tool_traces table with all columns and indexes. Schema re-run is no-op.
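
As a rough illustration of the table described above, the DDL could be held as a TypeScript string constant and handed to applySchema(). The column names and types come from the task; the constant name, the referenced key columns (sessions.id, chats.id), the nullability, and the index names are assumptions rather than details of the existing schema.

```typescript
// Hypothetical DDL for tool_traces. Columns follow the task description;
// referenced key columns and index names are assumed for illustration.
const TOOL_TRACES_DDL: string = `
CREATE TABLE IF NOT EXISTS tool_traces (
  id               UUID PRIMARY KEY,
  session_id       UUID REFERENCES sessions(id),
  chat_id          UUID REFERENCES chats(id),
  turn_number      INT,
  tool_name        TEXT,
  input            JSONB,
  output           JSONB,
  started_at       TIMESTAMPTZ,
  finished_at      TIMESTAMPTZ,
  latency_ms       INT,
  tokens_used      INT,
  cache_tokens     INT,
  reasoning_tokens INT,
  error            TEXT,
  outcome          TEXT
);
CREATE INDEX IF NOT EXISTS tool_traces_chat_turn_idx
  ON tool_traces (chat_id, turn_number, started_at);
CREATE INDEX IF NOT EXISTS tool_traces_session_idx
  ON tool_traces (session_id);
`;
```

Every statement carries IF NOT EXISTS, so running the schema a second time should change nothing.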

2. Add tool_trace WS frame + contracts schema

Add tool_trace frame to WsFrameSchema in packages/contracts/src/ws-frames.ts:

  • Frame types: tool_trace:start (tool_name, input, started_at) and tool_trace:complete (tool_name, output, latency_ms, tokens_used, cache_tokens, reasoning_tokens, error)
  • Add to InferenceFrame loose union in apps/server/src/services/inference/turn.ts
  • Add to strict WsFrame discriminated union in apps/web/src/api/types.ts
  • Rebuild contracts: pnpm -C packages/contracts build

Verification: tsc --noEmit passes. WS client receives tool_trace:start and tool_trace:complete frames.
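
The two frame shapes listed above can be sketched as a discriminated union. This uses plain TypeScript types and does not assume which schema library WsFrameSchema is built on; the describeFrame helper and the ISO-8601 encoding of started_at are illustrative assumptions.

```typescript
// Plain-type stand-in for the two frames; fields follow the task text.
type ToolTraceStartFrame = {
  type: "tool_trace:start";
  tool_name: string;
  input: unknown;
  started_at: string; // assumed to be an ISO-8601 timestamp
};

type ToolTraceCompleteFrame = {
  type: "tool_trace:complete";
  tool_name: string;
  output: unknown;
  latency_ms: number;
  tokens_used: number;
  cache_tokens: number;
  reasoning_tokens: number;
  error: string | null;
};

type ToolTraceFrame = ToolTraceStartFrame | ToolTraceCompleteFrame;

// Hypothetical helper: switching on `type` narrows to the right frame shape.
function describeFrame(frame: ToolTraceFrame): string {
  switch (frame.type) {
    case "tool_trace:start":
      return `${frame.tool_name} started at ${frame.started_at}`;
    case "tool_trace:complete":
      return frame.error === null
        ? `${frame.tool_name} finished in ${frame.latency_ms} ms`
        : `${frame.tool_name} failed: ${frame.error}`;
  }
}
```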

3. Instrument tool-phase.ts with start/end timing

Update apps/server/src/services/tools/tool-phase.ts:

  • Before executeToolCall: record clock_timestamp() as start, publish tool_trace:start frame with tool_name and input
  • After executeToolCall: record clock_timestamp() as finish, compute latency_ms, extract token counts from response metadata, INSERT into tool_traces table, publish tool_trace:complete frame
  • Handle errors: on thrown error, publish tool_trace:complete with error field set, set outcome='error'; on success, outcome='success'
  • Use sql.json(input as never) for JSONB columns — no double-serialization

Verification: Every tool call produces a tool_traces row with correct latency_ms and outcome. WS client receives both start and complete frames.
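
One possible shape for the timing wrapper is sketched below. The names withTrace, TraceHooks, publish, and persist are hypothetical and not taken from tool-phase.ts. The task text mentions clock_timestamp(), which is a PostgreSQL function; this sketch assumes an injectable in-process clock (for example Date.now) instead.

```typescript
type TraceOutcome = "success" | "error";

interface TraceRecord {
  tool_name: string;
  latency_ms: number;
  outcome: TraceOutcome;
  output: unknown;
  error: string | null;
}

// Hypothetical hooks: `publish` stands in for the WS publisher and
// `persist` for the INSERT into tool_traces.
interface TraceHooks {
  publish: (frameType: "tool_trace:start" | "tool_trace:complete", payload: object) => void;
  persist: (record: TraceRecord) => Promise<void>;
  now: () => number; // milliseconds, e.g. Date.now
}

async function withTrace<T>(
  toolName: string,
  input: unknown,
  execute: () => Promise<T>,
  hooks: TraceHooks,
): Promise<T> {
  const startedAt = hooks.now();
  hooks.publish("tool_trace:start", { tool_name: toolName, input, started_at: startedAt });

  let output: T | undefined;
  let failure: unknown;
  let failed = false;
  try {
    output = await execute();
  } catch (err) {
    failed = true;
    failure = err;
  }

  const record: TraceRecord = {
    tool_name: toolName,
    latency_ms: hooks.now() - startedAt,
    outcome: failed ? "error" : "success",
    output: failed ? null : output,
    error: failed ? (failure instanceof Error ? failure.message : String(failure)) : null,
  };
  await hooks.persist(record);
  hooks.publish("tool_trace:complete", record);

  // Re-throw after recording so callers still see the original failure.
  if (failed) throw failure;
  return output as T;
}
```

Recording happens outside the try block so that a persistence failure is not mistaken for a tool failure.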

4. Add GET /api/chats/:id/traces endpoint

Create apps/server/src/routes/traces.ts:

  • GET /api/chats/:id/traces — paginated, ordered by (turn_number, started_at)
  • Query params: cursor (opaque cursor for keyset pagination), limit (default 50, max 200), turn_number (optional filter to single turn)
  • Returns {traces: Trace[], next_cursor: string | null}
  • Register in Fastify router with chatOwnershipPreHandler guard

Verification: curl /api/chats/:id/traces returns paginated trace rows. Turn filter returns only matching traces.
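
The keyset pagination contract above can be illustrated in memory. This is a sketch only: the real endpoint would push the comparison into SQL. The cursor encoding, the id tiebreaker, and the assumption that started_at values are UTC ISO-8601 strings (so they sort lexicographically) are not from the task.

```typescript
interface TraceKey {
  turn_number: number;
  started_at: string; // assumed UTC ISO-8601, so string order equals time order
  id: string; // assumed tiebreaker so the ordering is total
}

const DEFAULT_LIMIT = 50;
const MAX_LIMIT = 200;

function clampLimit(raw: number | undefined): number {
  if (raw === undefined || !Number.isFinite(raw)) return DEFAULT_LIMIT;
  return Math.min(MAX_LIMIT, Math.max(1, Math.floor(raw)));
}

// Opaque to clients: a URL-safe encoding of the last row's sort key.
function encodeCursor(key: TraceKey): string {
  return encodeURIComponent(JSON.stringify([key.turn_number, key.started_at, key.id]));
}

function decodeCursor(cursor: string): TraceKey {
  const [turn_number, started_at, id] = JSON.parse(decodeURIComponent(cursor)) as [number, string, string];
  return { turn_number, started_at, id };
}

// Mirrors ORDER BY (turn_number, started_at, id).
function compareKeys(a: TraceKey, b: TraceKey): number {
  if (a.turn_number !== b.turn_number) return a.turn_number - b.turn_number;
  if (a.started_at !== b.started_at) return a.started_at < b.started_at ? -1 : 1;
  return a.id === b.id ? 0 : a.id < b.id ? -1 : 1;
}

function pageTraces<T extends TraceKey>(
  sorted: T[],
  cursor: string | null,
  limit?: number,
): { traces: T[]; next_cursor: string | null } {
  const size = clampLimit(limit);
  const after = cursor === null ? null : decodeCursor(cursor);
  const rest = after === null ? sorted : sorted.filter((row) => compareKeys(row, after) > 0);
  const traces = rest.slice(0, size);
  const last = traces[traces.length - 1];
  const next_cursor = rest.length > size && last !== undefined ? encodeCursor(last) : null;
  return { traces, next_cursor };
}
```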

5. Build trace viewer frontend component

Create apps/web/src/components/TraceViewer.tsx (and supporting files):

  • Collapsible tree grouped by turn_number
  • Per tool call row: tool_name badge, latency bar (relative bar width, color-coded: green <1s, yellow <5s, red ≥5s), token count, expand/collapse chevron
  • Expanded view: tool input (JSON formatted), tool output (JSON formatted), error message if any
  • Fetch traces from /api/chats/:id/traces on pane mount, paginate on scroll
  • Integrate as a new pane option in the multi-pane workspace (existing pane registry)

Verification: Trace viewer loads, groups by turn, shows timing bars, expands/collapses tool calls. Pagination works for sessions with 50+ traces.
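
The latency bar thresholds above translate into a small pure helper. The function names are hypothetical, and scaling the bar against the slowest call in the group is one assumed reading of "relative bar width".

```typescript
type LatencyColor = "green" | "yellow" | "red";

// Thresholds from the task: green < 1s, yellow < 5s, red >= 5s.
function latencyColor(latencyMs: number): LatencyColor {
  if (latencyMs < 1000) return "green";
  if (latencyMs < 5000) return "yellow";
  return "red";
}

// Bar width as a percentage of the slowest call in the same group (assumed).
function barWidthPercent(latencyMs: number, maxLatencyMs: number): number {
  if (maxLatencyMs <= 0) return 0;
  return Math.round((Math.min(latencyMs, maxLatencyMs) / maxLatencyMs) * 100);
}
```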

Phase 2: Session Persistence + Resume (3 tasks)

6. Serialize agent state to DB on turn boundaries

Modify apps/coder agent dispatch:

  • On each turn boundary (after LLM response, before next tool call loop), serialize AgentSession state to agent_sessions table
  • Persist: provider config, turn history, pending tool calls, current phase, token budget remaining
  • Use JSONB column for the snapshot state, clock_timestamp() for last_update
  • Guard against rapid consecutive saves (debounce 200ms)

Verification: Agent session state is written to agent_sessions after each LLM turn. JSONB snapshot contains all fields needed for resume.
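
The 200ms debounce could look roughly like the following. SnapshotDebouncer and its save callback are hypothetical names. The sketch keeps only the latest snapshot and exposes flush() so a final state can be written immediately, for example on shutdown.

```typescript
// Trailing-edge debounce: bursts of schedule() calls collapse into one save
// of the most recent snapshot once the quiet period has elapsed.
class SnapshotDebouncer<S> {
  private pending: S | undefined;
  private hasPending = false;
  private timer: ReturnType<typeof setTimeout> | undefined;
  private readonly save: (snapshot: S) => void;
  private readonly delayMs: number;

  constructor(save: (snapshot: S) => void, delayMs: number = 200) {
    this.save = save;
    this.delayMs = delayMs;
  }

  schedule(snapshot: S): void {
    this.pending = snapshot;
    this.hasPending = true;
    if (this.timer !== undefined) clearTimeout(this.timer);
    this.timer = setTimeout(() => this.flush(), this.delayMs);
  }

  // Writes the pending snapshot now, if there is one, and cancels the timer.
  flush(): void {
    if (this.timer !== undefined) {
      clearTimeout(this.timer);
      this.timer = undefined;
    }
    if (!this.hasPending) return;
    const snapshot = this.pending as S;
    this.pending = undefined;
    this.hasPending = false;
    this.save(snapshot);
  }
}
```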

7. Restore state on WS reconnect

Update apps/server/src/services/ws.ts:

  • On snapshot frame from a reconnecting client, check for AgentSession in in_progress or awaiting_input state
  • If found, rehydrate the coder pane: restore provider config, replay pending tool calls, set turn history
  • Publish a session_restored frame with the restored state metadata
  • Client-side: useSessionStream handles session_restored by resetting pane state to match

Verification: Refresh browser mid-agent-session → after reconnect, the coder pane shows the same turn state, pending tool calls, and conversation history.

8. Agent session timeline view

Add timeline component to the coder pane:

  • Horizontal timeline showing all turns in the current agent session
  • Each turn entry: turn number, start time, tool call count, token usage, cache hit rate
  • Active turn highlighted, past turns dimmed
  • Clicking a past turn scrolls the conversation to that turn and collapses later turns
  • Fetch turn metadata from existing session data (no new endpoint needed)

Verification: Timeline shows all turns. Clicking a turn scrolls to it. Active turn is highlighted.

Phase 3: Dynamic Workflow Engine (6 tasks)

9. Create isolated-vm workflow sandbox

Create apps/server/src/services/workflow/sandbox.ts:

  • Use isolated-vm npm package to create a V8 isolate for each workflow run
  • No require, fs, net, child_process accessible in the sandbox
  • Expose only the workflow API surface (agent, parallel, pipeline, phase, budget, log, args)
  • Token budget enforcement: inject a step counter, throw when budget exceeded
  • Timeout: 30s default, configurable per workflow
  • Error boundary: caught exceptions produce structured error results instead of crashing the worker
  • Add isolated-vm to apps/server/package.json dependencies

Verification: Workflow script that calls agent() runs without error. Script trying require('fs') throws a sandbox violation. Run exceeding budget is killed with a clear message.
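
The budget counter and error boundary described above can be sketched independently of the isolate itself. StepBudget, BudgetExceededError, and runGuarded are hypothetical names; the isolated-vm wiring is deliberately left out of this sketch.

```typescript
class BudgetExceededError extends Error {
  readonly limit: number;
  readonly used: number;
  constructor(limit: number, used: number) {
    super(`Workflow budget exceeded: used ${used} of ${limit}`);
    this.name = "BudgetExceededError";
    this.limit = limit;
    this.used = used;
  }
}

// Step counter injected into a workflow run; throws once the limit is passed.
class StepBudget {
  private used = 0;
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  consume(amount: number = 1): void {
    const next = this.used + amount;
    if (next > this.limit) throw new BudgetExceededError(this.limit, next);
    this.used = next;
  }

  get remaining(): number {
    return this.limit - this.used;
  }
}

type GuardedResult<T> = { ok: true; value: T } | { ok: false; error: string };

// Error boundary: exceptions become structured results instead of crashing.
function runGuarded<T>(fn: () => T): GuardedResult<T> {
  try {
    return { ok: true, value: fn() };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}
```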

10. Implement agent/parallel/pipeline primitives

Create apps/server/src/services/workflow/api.ts:

  • agent(id, { prompt, model?, tools?, budget? }) — registers a sub-agent. Returns an object with .run(input) that dispatches the agent through the existing agent dispatch system and returns result.
  • parallel([agents], { budget? }) — runs all agents concurrently. Returns once every agent has settled, whether it succeeded or failed; failures are collected rather than aborting the batch. Shared token budget across parallel agents. Uses Promise.allSettled for resilience.
  • pipeline([steps], { budget? }) — runs steps sequentially. Each step receives the previous step's output. Steps can be agent() results or inline functions.
  • phase(name, { agents, budget }) — groups agents under a named phase. Phases can have their own budget. Results are namespaced by phase name.
  • budget(limit) — sets token or step limits. Returns a budget object consumed by agent/parallel/pipeline.
  • log(msg) — emits a structured log entry tagged with current phase/agent context. Published as WS frame to the Orchestrator pane.
  • args — the input arguments passed to workflow.run(args).

Verification: A test workflow using agent(), parallel(), and pipeline() executes correctly. Logs appear in the output stream. Token budgets are enforced.
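
A stripped-down sketch of the pipeline and parallel semantics follows. It ignores budgets and agent dispatch, and the signatures are simplified assumptions rather than the final API: steps are plain functions and parallel takes thunks.

```typescript
type WorkflowStep = (input: unknown) => Promise<unknown> | unknown;

// Runs steps in order; each step receives the previous step's output.
async function pipeline(steps: WorkflowStep[], input: unknown): Promise<unknown> {
  let current = input;
  for (const step of steps) {
    current = await step(current);
  }
  return current;
}

interface ParallelOutcome<T> {
  results: T[];
  errors: string[];
}

// Waits for every task to settle and collects failures instead of aborting.
async function parallel<T>(tasks: Array<() => Promise<T>>): Promise<ParallelOutcome<T>> {
  const settled = await Promise.allSettled(tasks.map((task) => task()));
  const outcome: ParallelOutcome<T> = { results: [], errors: [] };
  for (const entry of settled) {
    if (entry.status === "fulfilled") {
      outcome.results.push(entry.value);
    } else {
      const reason: unknown = entry.reason;
      outcome.errors.push(reason instanceof Error ? reason.message : String(reason));
    }
  }
  return outcome;
}
```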

11. Workflow file discovery system

Create apps/server/src/services/workflow/discovery.ts:

  • Scan .boocode/workflows/*.js (project root, relative to PROJECT_ROOT_WHITELIST)
  • Scan ~/.boocode/workflows/*.js (global, os.homedir())
  • Scan data/workflows/ (built-in catalog)
  • Each file must export a workflow object: {name, description, run(args) => {...}}
  • Validate the workflow object at discovery time: required fields, run must be a function
  • On server start, run full discovery. Cache results in a Map<name, Workflow>.
  • Log discovered workflows with name + description at info level

Verification: Placing a valid .boocode/workflows/test.js file makes the workflow appear in WorkflowManager.list(). Invalid workflow files are logged as warnings and skipped.
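
The validation step at discovery time might look like this. validateWorkflow and buildRegistry are hypothetical names, and the sketch takes already-loaded module exports as input, leaving the file scanning itself aside.

```typescript
interface WorkflowDefinition {
  name: string;
  description: string;
  run: (args: unknown) => unknown;
}

type WorkflowCheck =
  | { ok: true; workflow: WorkflowDefinition }
  | { ok: false; reason: string };

// Checks the required fields named in the task: name, description, run().
function validateWorkflow(candidate: unknown): WorkflowCheck {
  if (typeof candidate !== "object" || candidate === null) {
    return { ok: false, reason: "export is not an object" };
  }
  const { name, description, run } = candidate as {
    name?: unknown;
    description?: unknown;
    run?: unknown;
  };
  if (typeof name !== "string" || name.length === 0) return { ok: false, reason: "missing name" };
  if (typeof description !== "string") return { ok: false, reason: "missing description" };
  if (typeof run !== "function") return { ok: false, reason: "run must be a function" };
  return { ok: true, workflow: { name, description, run: run as (args: unknown) => unknown } };
}

// Builds the name-to-workflow cache, skipping anything that fails validation.
function buildRegistry(candidates: unknown[]): Map<string, WorkflowDefinition> {
  const registry = new Map<string, WorkflowDefinition>();
  for (const candidate of candidates) {
    const checked = validateWorkflow(candidate);
    if (checked.ok) registry.set(checked.workflow.name, checked.workflow);
    // Invalid entries are skipped here; the task calls for a warning log.
  }
  return registry;
}
```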

12. Workflow manager + built-in catalog

Create apps/server/src/services/workflow/manager.ts:

  • WorkflowManager singleton class:
    • list() — returns all discovered workflows with name, description, and arg schema
    • get(name) — returns a workflow by name
    • run(workflow, args) — creates a sandbox, injects args, executes workflow.run(). Returns a runId (UUID).
    • cancel(runId) — terminates the sandbox, marks run as cancelled
    • status(runId) — returns run status: pending|running|completed|failed|cancelled, with progress info
  • Concurrency limit: configurable via WORKFLOW_MAX_CONCURRENT env var (default 3)
  • Token budget: configurable via WORKFLOW_DEFAULT_BUDGET env var (default 100_000 tokens)
  • Run state tracked in-memory with optional DB persistence

Built-in workflows in data/workflows/:

  • deep-research — parallel source search → per-source analysis → synthesis report
  • multi-review — run code health + security + standards reviews in parallel, merge findings
  • plan-verify — generate implementation plan → verify plan → generate work items
  • bounty-hunt — parallel vulnerability scans with different focus areas (injection, auth, crypto, business logic)

Verification: list() returns built-in workflows. run() executes a workflow and returns runId. status() reflects progress. cancel() stops execution cleanly.

13. Workflow resumability (hash-based cache)

Create apps/server/src/services/workflow/cache.ts:

  • Compute SHA-256 hash of each agent spec: crypto.createHash('sha256').update(JSON.stringify({prompt, options})).digest('hex')
  • Before executing an agent, check in-memory LRU cache for existing result matching the hash
  • Hit: return cached result, emit log('cached', agentId, hash) — no actual dispatch
  • Miss: execute agent, store result in cache keyed by hash
  • LRU eviction: WORKFLOW_CACHE_SIZE env var (default 100 entries)
  • Optional DB persistence: workflow_cache table with hash, result, created_at — cross-session reuse
  • Re-run detection: identical workflow with same args → all agents skipped
  • Partial re-run: changed args → only changed agents re-execute, unchanged ones read from cache

Verification: First run of a workflow executes all agents. Second run with identical args skips all agents (logs show 'cached'). Run with modified args for one agent only re-executes that agent.
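
A sketch of the hash key and the LRU store follows. One deliberate deviation, flagged as an assumption: instead of a bare JSON.stringify, the spec is serialized with sorted object keys, because JSON.stringify output depends on property insertion order and two equivalent option objects could otherwise hash differently. The names canonicalize, agentSpecHash, and ResultLru are hypothetical.

```typescript
import { createHash } from "node:crypto";

// Deterministic serialization: object keys are sorted before stringifying.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((item) => canonicalize(item)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalize(obj[key])}`)
      .join(",");
    return `{${body}}`;
  }
  return JSON.stringify(value) ?? "null";
}

function agentSpecHash(prompt: string, options: unknown): string {
  return createHash("sha256").update(canonicalize({ prompt, options })).digest("hex");
}

// Minimal LRU built on Map insertion order: the first key is the oldest.
class ResultLru<V> {
  private readonly entries = new Map<string, V>();
  private readonly capacity: number;

  constructor(capacity: number = 100) {
    this.capacity = capacity;
  }

  get(key: string): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    this.entries.delete(key); // re-insert to mark as most recently used
    this.entries.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }
}
```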

14. Workflow UI integration with Orchestrator panel

Extend apps/web/src/components/Orchestrator/:

  • Add workflow selector dropdown listing workflows from WorkflowManager.list()
  • Add "Run Workflow" button that opens workflow args editor (JSON or form)
  • Extend existing run pane to show workflow steps with per-agent progress
  • Live log stream from workflow log() calls, displayed in a scrollable log view
  • Cancel button for running workflows
  • Resumability indicator: "3/5 steps cached — skipping" when hash cache hits
  • Fetch workflow list via new API endpoint or WS message (add GET /api/orchestrator/workflows)

Verification: Workflow selector lists built-in workflows. Running a workflow shows step-by-step progress in the run pane. Cancelling a running workflow works. Cached steps show "skipped" indicator.

Phase 4: Background Subagents (3 tasks)

15. Background task queue + spawn_subagent tool

Modify apps/coder/ and apps/server/:

  • Extend tasks table usage with a new task type marker for background subagent tasks
  • Create spawn_subagent tool in apps/server/src/services/tools/:
    • Schema: {prompt, model?, tools?, budget?, metadata?}
    • Creates a tasks row with state=pending, type=background_subagent
    • Returns {task_id, status: 'pending'} immediately — does NOT block
  • Background worker loop: polls tasks table for background_subagent tasks in pending state, picks one up, executes it via existing agent dispatch, writes result back to tasks row on completion
  • Max concurrency: BACKGROUND_MAX_CONCURRENT env var (default 2)
  • Worker poll interval: 1s (configurable)

Verification: Calling spawn_subagent returns immediately with a task_id. The task eventually completes with a result in the tasks table. Multiple background tasks run concurrently up to the concurrency limit.
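
The claim step of the worker loop can be illustrated in memory. This is only a model of the selection logic: against a real tasks table the claim would need to be atomic (for instance a single UPDATE, or row locking), which this sketch does not attempt. claimPending and the BackgroundTask shape are hypothetical.

```typescript
interface BackgroundTask {
  id: string;
  type: string;
  state: "pending" | "running" | "completed" | "failed";
}

// One poll tick: claim as many pending subagent tasks as there are free slots.
function claimPending(tasks: BackgroundTask[], maxConcurrent: number = 2): BackgroundTask[] {
  const isSubagent = (task: BackgroundTask): boolean => task.type === "background_subagent";
  const running = tasks.filter((task) => isSubagent(task) && task.state === "running").length;
  const freeSlots = Math.max(0, maxConcurrent - running);
  const claimed = tasks
    .filter((task) => isSubagent(task) && task.state === "pending")
    .slice(0, freeSlots);
  for (const task of claimed) {
    task.state = "running";
  }
  return claimed;
}
```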

16. subagent_status + subagent_result tools

Create two tools in apps/server/src/services/tools/:

  • subagent_status(task_id):
    • Schema: {task_id}
    • Returns: {task_id, status: 'pending'|'running'|'completed'|'failed', progress?: string, started_at?, finished_at?}
    • Queries tasks table for the status
  • subagent_result(task_id):
    • Schema: {task_id}
    • Returns: {task_id, status, result?: json, error?: string}
    • Only returns result when status='completed'; returns empty result otherwise with a message
    • Updates task state to 'read' on successful result retrieval (optional)

Verification: Calling subagent_status on a running task returns 'running'. Calling subagent_result on a completed task returns the full result. Calling subagent_result on a pending task returns a clear "not ready yet" message.
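
The response rules for subagent_result can be expressed as a pure function over a task row. The message field and the exact wording of the "not ready" text are assumptions; the task only requires that a clear message is returned.

```typescript
type SubagentStatus = "pending" | "running" | "completed" | "failed";

interface SubagentTaskRow {
  task_id: string;
  status: SubagentStatus;
  result?: unknown;
  error?: string;
}

interface SubagentResultResponse {
  task_id: string;
  status: SubagentStatus;
  result?: unknown;
  error?: string;
  message?: string; // assumed field for the "not ready yet" case
}

// Only a completed task yields its result; other states get an explanation.
function subagentResult(task: SubagentTaskRow): SubagentResultResponse {
  const base = { task_id: task.task_id, status: task.status };
  switch (task.status) {
    case "completed":
      return { ...base, result: task.result };
    case "failed":
      return { ...base, error: task.error ?? "unknown error" };
    default:
      return { ...base, message: `Task is ${task.status}; result is not ready yet` };
  }
}
```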

17. Background agent pane

Create apps/web/src/components/BackgroundAgentPane.tsx:

  • New pane type showing running, completed, and failed background subagents
  • Each entry: agent name/description, status badge, duration (elapsed or total), progress indicator
  • Running entries: progress bar (if available), cancel button
  • Completed entries: "View Result" action that opens a modal or inline view with the full output
  • Failed entries: error message, "Retry" action
  • Badge counter on pane tab showing number of running tasks
  • Poll status every 2s for running entries, stop polling on completion
  • Register in pane registry alongside existing pane types

Verification: Background pane shows spawning tasks as "pending", transitioning to "running", then "completed"/"failed". "View Result" shows the full output. Badge counter reflects active running tasks.

Phase 5: Multi-modal + Cache Shape (4 tasks)

18. Multi-modal attachment pipeline

Add file upload support:

  • Accept file uploads via drag-drop or file picker in the message input area
  • Store uploaded files on tmpfs (/tmp/boocode-uploads/ by default, configurable via UPLOAD_DIR)
  • Reference attachments in message row via message_parts with type='image' and a url pointing to the tmpfs path
  • Forward to DeepSeek API: encode image as base64 data URI, send as multimodal content part in the user message
  • Supported formats: png, jpg, jpeg, gif, webp
  • Size limit: 20MB default, configurable via MAX_ATTACHMENT_SIZE_MB env var
  • Server-side cleanup: delete tmpfs files after message is fully processed or on a periodic sweep

Verification: Uploading an image creates a file on tmpfs and a referenced message_parts row. DeepSeek API call includes the image as a base64 content part. Files over the size limit are rejected with an error.
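
The format and size checks can be sketched as a pure function. checkAttachment is a hypothetical name. Checking the filename extension is a simplification (a real upload handler would likely also inspect the content type), and treating 1MB as 1024 × 1024 bytes is an assumption.

```typescript
const ALLOWED_EXTENSIONS: readonly string[] = ["png", "jpg", "jpeg", "gif", "webp"];
const DEFAULT_MAX_ATTACHMENT_MB = 20;

type AttachmentCheck =
  | { ok: true; extension: string }
  | { ok: false; reason: string };

function checkAttachment(
  filename: string,
  sizeBytes: number,
  maxMb: number = DEFAULT_MAX_ATTACHMENT_MB,
): AttachmentCheck {
  const dot = filename.lastIndexOf(".");
  const extension = dot === -1 ? "" : filename.slice(dot + 1).toLowerCase();
  if (!ALLOWED_EXTENSIONS.includes(extension)) {
    return { ok: false, reason: `unsupported format: ${extension || "none"}` };
  }
  if (sizeBytes > maxMb * 1024 * 1024) {
    return { ok: false, reason: `file exceeds ${maxMb}MB limit` };
  }
  return { ok: true, extension };
}
```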

19. Image render in message bubble

Update message rendering in apps/web/src/components/MessageBubble.tsx:

  • Detect message_parts with type='image' in the message content
  • Render attached images inline in the chat bubble, below the text content
  • Thumbnail: max 300px wide, aspect-ratio preserved, rounded corners
  • Lightbox: clicking the thumbnail opens a full-size overlay with close button
  • Loading state: skeleton placeholder while image loads from tmpfs URL
  • Error state: broken image placeholder with retry option
  • Clean layout: images displayed in a grid (1-2 columns depending on count)

Verification: Chat messages with image attachments render inline thumbnails. Clicking opens lightbox. Large images are thumbnailed. Broken images show error state.

20. Cache shape telemetry data pipeline

Extract and store cache metrics:

  • In the DeepSeek provider response handler, extract prompt_cache_hit_tokens and prompt_cache_miss_tokens from the API response metadata
  • Break down cache segments: system prompt tokens, tool schema tokens, conversation history tokens (approximate by measuring each segment length)
  • Store cache metrics in tool_traces.cache_tokens column (already created in Phase 1)
  • Optionally create a cache_stats table for per-segment breakdown: {turn_id, segment_name, hit_tokens, miss_tokens}
  • Expose via existing traces API (cache fields already part of the Trace schema)

Verification: After a DeepSeek call, tool_traces row has cache_tokens populated. Cache segment breakdown is available when querying traces.

21. Cache shape visualization in trace viewer

Update the TraceViewer component with cache metrics:

  • Per-turn cache hit bar: horizontal stacked bar showing cached (green) vs non-cached (gray) tokens
  • Hit rate percentage displayed as a badge next to token count
  • Cumulative cache hit rate in the session footer: "Cache hit rate: 67% (45K/67K tokens)"
  • Color coding: green ≥60%, yellow 30-59%, red <30%
  • Tooltip on hover showing segment breakdown if available
  • Animate transitions when new trace data arrives

Verification: Trace viewer shows cache hit/miss bars per turn. Cumulative rate in footer updates as new traces load. Color coding matches thresholds.
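
The hit-rate arithmetic and color thresholds above reduce to a few pure helpers. The function names are hypothetical, and rounding token counts to whole thousands is an assumption chosen to match the footer example.

```typescript
type RateColor = "green" | "yellow" | "red";

// Hit rate = hit / (hit + miss), as a whole percentage.
function hitRatePercent(hitTokens: number, missTokens: number): number {
  const total = hitTokens + missTokens;
  return total === 0 ? 0 : Math.round((hitTokens / total) * 100);
}

// Thresholds from the task: green >= 60%, yellow 30-59%, red < 30%.
function rateColor(percent: number): RateColor {
  if (percent >= 60) return "green";
  if (percent >= 30) return "yellow";
  return "red";
}

function formatThousands(tokens: number): string {
  return `${Math.round(tokens / 1000)}K`;
}

function footerLabel(hitTokens: number, missTokens: number): string {
  const percent = hitRatePercent(hitTokens, missTokens);
  const total = hitTokens + missTokens;
  return `Cache hit rate: ${percent}% (${formatThousands(hitTokens)}/${formatThousands(total)} tokens)`;
}
```

With 45,000 cached tokens out of 67,000 total, the rate is 45/67 ≈ 0.672, which rounds to the 67% shown in the footer example.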