# Tasks — Paseo-like Orchestrator

## Phase 1: Trace System + Observability (5 tasks)

### 1. Create tool_traces DB table + migration

Add a `tool_traces` table to `apps/server/src/schema.sql`:

- Columns: id (UUID PK), session_id (UUID FK → sessions), chat_id (UUID FK → chats), turn_number (int), tool_name (text), input (jsonb), output (jsonb), started_at (timestamptz), finished_at (timestamptz), latency_ms (int), tokens_used (int), cache_tokens (int), reasoning_tokens (int), error (text), outcome (text)
- Index on (chat_id, turn_number, started_at) for trace queries
- Index on (session_id) for session-level aggregation
- Applied idempotently via `applySchema()`: use `CREATE TABLE IF NOT EXISTS` for the table and `CREATE INDEX IF NOT EXISTS` for both indexes

**Verification**: `psql` shows the `tool_traces` table with all columns and indexes. Re-running the schema is a no-op.

### 2. Add tool_trace WS frame + contracts schema

Add the `tool_trace` frames to `WsFrameSchema` in `packages/contracts/src/ws-frames.ts`:

- Frame types: `tool_trace:start` (tool_name, input, started_at) and `tool_trace:complete` (tool_name, output, latency_ms, tokens_used, cache_tokens, reasoning_tokens, error)
- Add to the loose `InferenceFrame` union in `apps/server/src/services/inference/turn.ts`
- Add to the strict `WsFrame` discriminated union in `apps/web/src/api/types.ts`
- Rebuild contracts: `pnpm -C packages/contracts build`

**Verification**: `tsc --noEmit` passes. The WS client receives `tool_trace:start` and `tool_trace:complete` frames.

### 3. Instrument tool-phase.ts with start/end timing

Update `apps/server/src/services/tools/tool-phase.ts`:

- Before `executeToolCall`: record `clock_timestamp()` as the start time and publish a `tool_trace:start` frame with tool_name and input
- After `executeToolCall`: record `clock_timestamp()` as the finish time, compute latency_ms, extract token counts from the response metadata, INSERT a row into `tool_traces`, and publish a `tool_trace:complete` frame
- Handle errors: on a thrown error, still INSERT the row with the error field set and outcome='error', and publish `tool_trace:complete` with the error field set; on success, outcome='success'
- Use `sql.json(input as never)` for JSONB columns — no double-serialization

**Verification**: Every tool call, successful or failed, produces a `tool_traces` row with correct latency_ms and outcome. The WS client receives both start and complete frames.

### 4. Add GET /api/chats/:id/traces endpoint

Create `apps/server/src/routes/traces.ts`:

- `GET /api/chats/:id/traces` — paginated, ordered by (turn_number, started_at)
- Query params: `cursor` (opaque cursor for keyset pagination), `limit` (default 50, max 200), `turn_number` (optional filter to a single turn)
- Returns `{traces: Trace[], next_cursor: string | null}`
- Register in the Fastify router with the `chatOwnershipPreHandler` guard

**Verification**: `curl /api/chats/:id/traces` returns paginated trace rows. The turn filter returns only matching traces.

### 5. Build trace viewer frontend component

Create `apps/web/src/components/TraceViewer.tsx` (and supporting files):

- Collapsible tree grouped by turn_number
- Per tool call row: tool_name badge, latency bar (relative bar width, color-coded: green <1s, yellow 1s to <5s, red ≥5s), token count, expand/collapse chevron
- Expanded view: tool input (JSON formatted), tool output (JSON formatted), error message if any
- Fetch traces from `/api/chats/:id/traces` on pane mount and paginate on scroll
- Integrate as a new pane option in the multi-pane workspace (existing pane registry)

**Verification**: The trace viewer loads, groups by turn, shows timing bars, and expands/collapses tool calls. Pagination works for sessions with 50+ traces.

## Phase 2: Session Persistence + Resume (3 tasks)

### 6. Serialize agent state to DB on turn boundaries

Modify `apps/coder` agent dispatch:

- On each turn boundary (after the LLM response, before the next tool-call loop), serialize `AgentSession` state to the `agent_sessions` table
- Persist: provider config, turn history, pending tool calls, current phase, remaining token budget
- Use a JSONB column for the snapshot state and `clock_timestamp()` for last_update
- Guard against rapid consecutive saves (debounce 200ms)

**Verification**: Agent session state is written to `agent_sessions` after each LLM turn. The JSONB snapshot contains all fields needed for resume.

### 7. Restore state on WS reconnect

Update `apps/server/src/services/ws.ts`:

- On a `snapshot` frame from a reconnecting client, check for an `AgentSession` in the `in_progress` or `awaiting_input` state
- If found, rehydrate the coder pane: restore provider config, replay pending tool calls, set turn history
- Publish a `session_restored` frame with the restored state metadata
- Client side: `useSessionStream` handles `session_restored` by resetting pane state to match

**Verification**: Refresh the browser mid-agent-session → after reconnect, the coder pane shows the same turn state, pending tool calls, and conversation history.
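Task 4 leaves the cursor format open ("opaque cursor for keyset pagination"). The sketch below assumes the cursor encodes the last returned row's sort key, and adds `id` as a tie-breaker (an assumption: task 4 only orders by `(turn_number, started_at)`, which need not be unique). `page` is a hypothetical in-memory stand-in that mirrors the row-value predicate `(turn_number, started_at, id) > ($1, $2, $3)` a SQL implementation could use; all helper names are illustrative.

```typescript
// Sketch only: cursor helpers for GET /api/chats/:id/traces.
// Using `id` as a tie-breaker is an assumption, added so the sort order is
// total even when two traces share a turn_number and started_at.
interface TraceKey {
  turn_number: number;
  started_at: string; // ISO-8601 UTC timestamp, e.g. "2024-01-01T00:00:00Z"
  id: string;
}

const DEFAULT_LIMIT = 50;
const MAX_LIMIT = 200;

// Parse the `limit` query param: default 50, capped at 200.
function clampLimit(raw: string | undefined): number {
  const n = raw === undefined ? DEFAULT_LIMIT : Number.parseInt(raw, 10);
  if (!Number.isFinite(n) || n < 1) return DEFAULT_LIMIT;
  return Math.min(n, MAX_LIMIT);
}

// Opaque to clients: URL-encoded JSON of the last row's sort key.
function encodeCursor(key: TraceKey): string {
  return encodeURIComponent(JSON.stringify([key.turn_number, key.started_at, key.id]));
}

function decodeCursor(cursor: string): TraceKey {
  const parsed: unknown = JSON.parse(decodeURIComponent(cursor));
  if (
    !Array.isArray(parsed) ||
    parsed.length !== 3 ||
    typeof parsed[0] !== "number" ||
    typeof parsed[1] !== "string" ||
    typeof parsed[2] !== "string"
  ) {
    throw new Error("invalid cursor");
  }
  return { turn_number: parsed[0], started_at: parsed[1], id: parsed[2] };
}

// Lexicographic comparison on (turn_number, started_at, id).
function compareKeys(a: TraceKey, b: TraceKey): number {
  if (a.turn_number !== b.turn_number) return a.turn_number - b.turn_number;
  // Same-format UTC ISO strings sort lexicographically in time order.
  if (a.started_at !== b.started_at) return a.started_at < b.started_at ? -1 : 1;
  if (a.id === b.id) return 0;
  return a.id < b.id ? -1 : 1;
}

// In-memory stand-in for the keyset query, handy for unit-testing paging logic.
function page(
  rows: TraceKey[],
  cursor: string | null,
  limit: number,
): { traces: TraceKey[]; next_cursor: string | null } {
  const after = cursor === null ? null : decodeCursor(cursor);
  const remaining = [...rows]
    .sort(compareKeys)
    .filter((r) => after === null || compareKeys(r, after) > 0);
  const traces = remaining.slice(0, limit);
  const last = traces[traces.length - 1];
  return {
    traces,
    next_cursor: remaining.length > traces.length && last !== undefined ? encodeCursor(last) : null,
  };
}
```

Unlike OFFSET paging, a keyset cursor stays stable when new traces are appended while the viewer is scrolling, which fits the live-updating trace pane.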
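Task 6 asks for a 200ms debounce on snapshot writes. Below is a minimal trailing-edge sketch, assuming each call carries the full latest snapshot (so dropping intermediate ones loses nothing). `AgentSnapshot` and the `persist` callback are hypothetical stand-ins for the real `agent_sessions` write, and the scheduler is injectable so the behaviour can be tested without real timers.

```typescript
// Sketch: debounce agent_sessions snapshot writes (trailing edge, latest wins).
// AgentSnapshot's fields and `persist` are hypothetical placeholders.
interface AgentSnapshot {
  sessionId: string;
  turn: number;
  phase: string;
}

type Schedule = (fn: () => void, ms: number) => unknown;
type Cancel = (handle: unknown) => void;

function createDebouncedSaver(
  persist: (snapshot: AgentSnapshot) => void,
  delayMs = 200,
  schedule: Schedule = (fn, ms) => setTimeout(fn, ms),
  cancel: Cancel = (h) => clearTimeout(h as ReturnType<typeof setTimeout>),
) {
  let pending: AgentSnapshot | null = null;
  let handle: unknown = null;

  // Write whatever snapshot is pending, if any, and clear the timer state.
  const fire = (): void => {
    handle = null;
    const snapshot = pending;
    pending = null;
    if (snapshot !== null) persist(snapshot);
  };

  return {
    // Record the newest snapshot and restart the timer, so only the last
    // snapshot in a burst of saves is written.
    save(snapshot: AgentSnapshot): void {
      pending = snapshot;
      if (handle !== null) cancel(handle);
      handle = schedule(fire, delayMs);
    },
    // Write immediately (e.g. on shutdown), skipping the remaining delay.
    flush(): void {
      if (handle !== null) cancel(handle);
      fire();
    },
  };
}
```

A pure trailing debounce can postpone a write indefinitely if saves keep arriving less than 200ms apart; at turn-boundary granularity that is unlikely, but a max-wait cap is a common addition. Calling `flush()` on shutdown keeps the last snapshot from being lost.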
### 8. Agent session timeline view

Add a timeline component to the coder pane:

- Horizontal timeline showing all turns in the current agent session
- Each turn entry: turn number, start time, tool call count, token usage, cache hit rate
- Active turn highlighted, past turns dimmed
- Clicking a past turn scrolls the conversation to that turn and collapses later turns
- Fetch turn metadata from existing session data (no new endpoint needed)

**Verification**: The timeline shows all turns. Clicking a turn scrolls to it. The active turn is highlighted.

## Phase 3: Dynamic Workflow Engine (6 tasks)

### 9. Create isolated-vm workflow sandbox

Create `apps/server/src/services/workflow/sandbox.ts`:

- Use the `isolated-vm` npm package to create a V8 isolate for each workflow run
- No `require`, `fs`, `net`, or `child_process` accessible in the sandbox
- Expose only the workflow API surface (`agent`, `parallel`, `pipeline`, `phase`, `budget`, `log`, `args`)
- Token budget enforcement: inject a step counter and throw when the budget is exceeded
- Timeout: 30s default, configurable per workflow
- Error boundary: caught exceptions produce structured error results instead of crashing the worker
- Add `isolated-vm` to the `apps/server/package.json` dependencies

**Verification**: A workflow script that calls `agent()` runs without error. A script calling `require('fs')` throws a sandbox violation. A run that exceeds its budget is killed with a clear message.

### 10. Implement agent/parallel/pipeline primitives

Create `apps/server/src/services/workflow/api.ts`:

- `agent(id, { prompt, model?, tools?, budget? })` — registers a sub-agent. Returns an object with `.run(input)` that dispatches the agent through the existing agent dispatch system and returns the result.
- `parallel([agents], { budget? })` — runs all agents concurrently with a shared token budget. Uses `Promise.allSettled` for resilience: it returns once every agent has either completed or failed, so one failure does not discard the other agents' results.
- `pipeline([steps], { budget? })` — runs steps sequentially. Each step receives the previous step's output. Steps can be `agent()` results or inline functions.
- `phase(name, { agents, budget })` — groups agents under a named phase. Phases can have their own budget. Results are namespaced by phase name.
- `budget(limit)` — sets token or step limits. Returns a budget object consumed by agent/parallel/pipeline.
- `log(msg)` — emits a structured log entry tagged with the current phase/agent context. Published as a WS frame to the Orchestrator pane.
- `args` — the input arguments passed to `workflow.run(args)`.

**Verification**: A test workflow using `agent()`, `parallel()`, and `pipeline()` executes correctly. Logs appear in the output stream. Token budgets are enforced.

### 11. Workflow file discovery system

Create `apps/server/src/services/workflow/discovery.ts`:

- Scan `.boocode/workflows/*.js` (project root, relative to `PROJECT_ROOT_WHITELIST`)
- Scan `~/.boocode/workflows/*.js` (global, `os.homedir()`)
- Scan `data/workflows/` (built-in catalog)
- Each file must export a `workflow` object: `{name, description, run(args) => {...}}`
- Validate the workflow object at discovery time: required fields must be present and `run` must be a function
- On server start, run full discovery and cache the results in a `Map`
- Log discovered workflows with name + description at `info` level

**Verification**: Placing a valid `.boocode/workflows/test.js` file makes the workflow appear in `WorkflowManager.list()`. Invalid workflow files are logged as warnings and skipped.

### 12. Workflow manager + built-in catalog

Create `apps/server/src/services/workflow/manager.ts`:

- `WorkflowManager` singleton class:
  - `list()` — returns all discovered workflows with name, description, and arg schema
  - `get(name)` — returns a workflow by name
  - `run(workflow, args)` — creates a sandbox, injects args, and executes `workflow.run()`. Returns a runId (UUID).
  - `cancel(runId)` — terminates the sandbox and marks the run as cancelled
  - `status(runId)` — returns the run status (`pending|running|completed|failed|cancelled`) with progress info
- Concurrency limit: configurable via the `WORKFLOW_MAX_CONCURRENT` env var (default 3)
- Token budget: configurable via the `WORKFLOW_DEFAULT_BUDGET` env var (default 100_000 tokens)
- Run state tracked in memory, with optional DB persistence

Built-in workflows in `data/workflows/`:

- `deep-research` — parallel source search → per-source analysis → synthesis report
- `multi-review` — run code health + security + standards reviews in parallel, merge findings
- `plan-verify` — generate implementation plan → verify plan → generate work items
- `bounty-hunt` — parallel vulnerability scans with different focus areas (injection, auth, crypto, business logic)

**Verification**: `list()` returns the built-in workflows. `run()` executes a workflow and returns a runId. `status()` reflects progress. `cancel()` stops execution cleanly.

### 13. Workflow resumability (hash-based cache)

Create `apps/server/src/services/workflow/cache.ts`:

- Compute a SHA-256 hash of each agent spec: `crypto.createHash('sha256').update(JSON.stringify({prompt, options})).digest('hex')`. `JSON.stringify` preserves key insertion order, so serialize with sorted keys if logically identical specs must produce the same hash.
- Before executing an agent, check the in-memory LRU cache for an existing result matching the hash
- Hit: return the cached result and emit `log('cached', agentId, hash)`; no actual dispatch
- Miss: execute the agent and store the result in the cache, keyed by hash
- LRU eviction: `WORKFLOW_CACHE_SIZE` env var (default 100 entries)
- Optional DB persistence: `workflow_cache` table with `hash`, `result`, `created_at` for cross-session reuse
- Re-run detection: identical workflow with the same args → all agents skipped
- Partial re-run: changed args → only the agents whose specs changed re-execute; unchanged ones read from the cache

**Verification**: The first run of a workflow executes all agents. A second run with identical args skips all agents (logs show 'cached'). A run whose modified args affect only one agent re-executes only that agent.

### 14. Workflow UI integration with Orchestrator panel

Extend `apps/web/src/components/Orchestrator/`:

- Add a workflow selector dropdown listing the workflows returned by `WorkflowManager.list()` (fetched through the endpoint in the last bullet)
- Add a "Run Workflow" button that opens a workflow args editor (JSON or form)
- Extend the existing run pane to show workflow steps with per-agent progress
- Live log stream from workflow `log()` calls, displayed in a scrollable log view
- Cancel button for running workflows
- Resumability indicator: "3/5 steps cached — skipping" when the hash cache hits
- Fetch the workflow list via a new API endpoint or WS message (add `GET /api/orchestrator/workflows`)

**Verification**: The workflow selector lists the built-in workflows. Running a workflow shows step-by-step progress in the run pane. Cancelling a running workflow works. Cached steps show a "skipped" indicator.

## Phase 4: Background Subagents (3 tasks)

### 15. Background task queue + spawn_subagent tool

Modify `apps/coder/` and `apps/server/`:

- Extend `tasks` table usage with a new task type marker for background subagent tasks
- Create a `spawn_subagent` tool in `apps/server/src/services/tools/`:
  - Schema: `{prompt, model?, tools?, budget?, metadata?}`
  - Creates a `tasks` row with state=`pending`, type=`background_subagent`
  - Returns `{task_id, status: 'pending'}` immediately — does NOT block
- Background worker loop: polls the `tasks` table for `background_subagent` tasks in the `pending` state, picks one up, executes it via the existing agent dispatch, and writes the result back to the tasks row on completion
- Max concurrency: `BACKGROUND_MAX_CONCURRENT` env var (default 2)
- Worker poll interval: 1s (configurable)

**Verification**: Calling `spawn_subagent` returns immediately with a task_id. The task eventually completes with a result in the tasks table. Multiple background tasks run concurrently, up to the concurrency limit.

### 16. subagent_status + subagent_result tools

Create two tools in `apps/server/src/services/tools/`:

- `subagent_status(task_id)`:
  - Schema: `{task_id}`
  - Returns: `{task_id, status: 'pending'|'running'|'completed'|'failed', progress?: string, started_at?, finished_at?}`
  - Queries the `tasks` table for the status
- `subagent_result(task_id)`:
  - Schema: `{task_id}`
  - Returns: `{task_id, status, result?: json, error?: string}`
  - Returns the result only when status='completed'; otherwise returns no result, with a message explaining why
  - Optionally updates the task state to `read` after a successful result retrieval

**Verification**: Calling `subagent_status` on a running task returns 'running'. Calling `subagent_result` on a completed task returns the full result; on a pending task it returns a clear "not ready yet" message.

### 17. Background agent pane

Create `apps/web/src/components/BackgroundAgentPane.tsx`:

- New pane type showing running, completed, and failed background subagents
- Each entry: agent name/description, status badge, duration (elapsed or total), progress indicator
- Running entries: progress bar (if available), cancel button
- Completed entries: "View Result" action that opens a modal or inline view with the full output
- Failed entries: error message, "Retry" action
- Badge counter on the pane tab showing the number of running tasks
- Poll status every 2s for running entries; stop polling once an entry completes
- Register in the pane registry alongside the existing pane types

**Verification**: The background pane shows newly spawned tasks as "pending", transitioning to "running", then "completed"/"failed". "View Result" shows the full output. The badge counter reflects the number of running tasks.

## Phase 5: Multi-modal + Cache Shape (4 tasks)

### 18. Multi-modal attachment pipeline

Add file upload support:

- Accept file uploads via drag-drop or a file picker in the message input area
- Store uploaded files on tmpfs (`/tmp/boocode-uploads/` by default, configurable via `UPLOAD_DIR`)
- Reference attachments from the message via `message_parts` rows with `type='image'` and a `url` pointing to the tmpfs path
- Forward to the DeepSeek API: encode the image as a base64 data URI and send it as a multimodal content part in the user message
- Supported formats: png, jpg, jpeg, gif, webp
- Size limit: 20MB default, configurable via the `MAX_ATTACHMENT_SIZE_MB` env var
- Server-side cleanup: delete tmpfs files after the message is fully processed, or on a periodic sweep

**Verification**: Uploading an image creates a file on tmpfs and a referencing `message_parts` row. The DeepSeek API call includes the image as a base64 content part. Files over the size limit are rejected with an error.

### 19. Image render in message bubble

Update message rendering in `apps/web/src/components/MessageBubble.tsx`:

- Detect `message_parts` with `type='image'` in the message content
- Render attached images inline in the chat bubble, below the text content
- Thumbnail: max 300px wide, aspect ratio preserved, rounded corners
- Lightbox: clicking the thumbnail opens a full-size overlay with a close button
- Loading state: skeleton placeholder while the image loads from the tmpfs URL
- Error state: broken-image placeholder with a retry option
- Layout: images displayed in a grid (1-2 columns depending on count)

**Verification**: Chat messages with image attachments render inline thumbnails. Clicking a thumbnail opens the lightbox. Large images are thumbnailed. Broken images show the error state.

### 20. Cache shape telemetry data pipeline

Extract and store cache metrics:

- In the DeepSeek provider response handler, extract `prompt_cache_hit_tokens` and `prompt_cache_miss_tokens` from the API response metadata
- Break down cache segments: system prompt tokens, tool schema tokens, conversation history tokens (approximated by measuring each segment's length)
- Store cache metrics in the `tool_traces.cache_tokens` column (created in Phase 1)
- Optionally create a `cache_stats` table for the per-segment breakdown: `{turn_id, segment_name, hit_tokens, miss_tokens}`
- Expose via the existing traces API (cache fields are already part of the Trace schema)

**Verification**: After a DeepSeek call, the `tool_traces` row has `cache_tokens` populated. The cache segment breakdown is available when querying traces.

### 21. Cache shape visualization in trace viewer

Update the TraceViewer component with cache metrics:

- Per-turn cache hit bar: horizontal stacked bar showing cached (green) vs. non-cached (gray) tokens
- Hit rate percentage displayed as a badge next to the token count
- Cumulative cache hit rate in the session footer: "Cache hit rate: 67% (45K/67K tokens)"
- Color coding: green ≥60%, yellow 30% to <60%, red <30%
- Tooltip on hover showing the segment breakdown, if available
- Animate transitions when new trace data arrives

**Verification**: The trace viewer shows cache hit/miss bars per turn. The cumulative rate in the footer updates as new traces load. Color coding matches the thresholds.
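Task 21's hit-rate badge, color thresholds, and footer reduce to a few pure functions. A minimal sketch, assuming per-turn hit and miss token counts are available (for example from the task 20 telemetry); the interface and function names are hypothetical. The cumulative rate is computed as total hits over total tokens, matching the footer's "45K/67K" form, rather than as the mean of per-turn rates.

```typescript
// Sketch of the task 21 display math. Field and function names are
// assumptions; thresholds and footer format follow the task text.
interface TurnCacheStats {
  hitTokens: number;
  missTokens: number;
}

type RateColor = "green" | "yellow" | "red";

// Per-turn hit rate in [0, 1]; null when the turn reported no prompt tokens.
function hitRate(s: TurnCacheStats): number | null {
  const total = s.hitTokens + s.missTokens;
  return total === 0 ? null : s.hitTokens / total;
}

// green >= 60%, yellow >= 30% and < 60%, red < 30%.
function rateColor(rate: number): RateColor {
  if (rate >= 0.6) return "green";
  if (rate >= 0.3) return "yellow";
  return "red";
}

// Compact token count for the footer, e.g. 45000 -> "45K".
function formatK(tokens: number): string {
  return tokens >= 1000 ? `${Math.round(tokens / 1000)}K` : String(tokens);
}

// Token-weighted cumulative rate: total hits / total prompt tokens across turns.
function footerLabel(turns: TurnCacheStats[]): string {
  let hits = 0;
  let total = 0;
  for (const t of turns) {
    hits += t.hitTokens;
    total += t.hitTokens + t.missTokens;
  }
  if (total === 0) return "Cache hit rate: n/a";
  const pct = Math.round((hits / total) * 100);
  return `Cache hit rate: ${pct}% (${formatK(hits)}/${formatK(total)} tokens)`;
}
```

Weighting by tokens matters: a turn with 1 of 1 tokens cached and a turn with 0 of 99 cached average to 50% per turn, yet only 1% of the tokens were actually served from cache.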
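For task 10's `pipeline()` and `parallel()` primitives, the sketch below shows the intended semantics outside the sandbox: sequential output threading, `Promise.allSettled` fan-out, and one budget shared by concurrent steps. The `Step` signature, the `Budget` class, and charging tokens up front are illustrative assumptions, not the actual API in `apps/server/src/services/workflow/api.ts`.

```typescript
// Sketch of pipeline()/parallel() semantics with a shared token budget.
// Step, Budget, and the up-front token charge are illustrative assumptions.
class BudgetExceededError extends Error {
  name = "BudgetExceededError";
}

class Budget {
  readonly limit: number;
  private used = 0;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Charge tokens against the shared limit; throws if the charge would exceed it.
  consume(tokens: number): void {
    if (this.used + tokens > this.limit) {
      throw new BudgetExceededError(`budget exceeded: ${this.used + tokens}/${this.limit}`);
    }
    this.used += tokens;
  }

  get remaining(): number {
    return this.limit - this.used;
  }
}

type Step = (input: unknown, budget: Budget) => Promise<unknown>;

// Runs steps in order; each step receives the previous step's output.
async function pipeline(steps: Step[], input: unknown, budget: Budget): Promise<unknown> {
  let value = input;
  for (const step of steps) {
    value = await step(value, budget);
  }
  return value;
}

// Runs all steps concurrently against one budget. Promise.allSettled waits for
// every step, so one rejection does not discard the other steps' results.
function parallel(
  steps: Step[],
  input: unknown,
  budget: Budget,
): Promise<PromiseSettledResult<unknown>[]> {
  return Promise.allSettled(steps.map((step) => step(input, budget)));
}
```

Because JavaScript runs each step's synchronous section to completion, `consume` needs no locking even when several parallel steps share one `Budget`.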
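Task 13 hashes `JSON.stringify({prompt, options})`. Since `JSON.stringify` emits keys in insertion order, two logically identical specs built in different orders would hash differently and miss the cache. The sketch below assumes specs are plain JSON values and sorts object keys before hashing; it pairs this with an LRU that relies on `Map` preserving insertion order. `stableStringify`, `specHash`, and `LruCache` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

// Sketch: order-insensitive spec hashing plus a Map-based LRU (task 13).
// Assumes specs are plain JSON values; all names here are hypothetical.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(stableStringify).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    // Sort keys and drop undefined values, as JSON.stringify would omit them.
    const keys = Object.keys(obj).filter((k) => obj[k] !== undefined).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`).join(",")}}`;
  }
  // JSON.stringify(undefined) yields undefined; serialize it as null like arrays do.
  return JSON.stringify(value) ?? "null";
}

function specHash(prompt: string, options: Record<string, unknown>): string {
  return createHash("sha256").update(stableStringify({ prompt, options })).digest("hex");
}

// Map iterates in insertion order, so the first key is the least recently used.
class LruCache<V> {
  private readonly entries = new Map<string, V>();
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  get(key: string): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert to mark the entry as most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    if (this.entries.has(key)) this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}
```

In use, a cache lookup would call `cache.get(specHash(prompt, options))` before dispatching, and `cache.set` the result on a miss, with the capacity taken from `WORKFLOW_CACHE_SIZE`.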
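For task 18's upload checks, a sketch assuming format detection by file extension (a production version might also inspect magic bytes) and reading "20MB" as 20 * 1024 * 1024 bytes. `checkAttachment` and `toDataUri` are hypothetical helper names. `toDataUri` builds a standard `data:<mime>;base64,...` URI; how that URI is embedded in the DeepSeek request body is not shown here.

```typescript
import { Buffer } from "node:buffer";

// Sketch of the task 18 upload checks. The extension map and helper names
// are assumptions; the size limit defaults to 20MB as in MAX_ATTACHMENT_SIZE_MB.
const MIME_BY_EXT = new Map<string, string>([
  ["png", "image/png"],
  ["jpg", "image/jpeg"],
  ["jpeg", "image/jpeg"],
  ["gif", "image/gif"],
  ["webp", "image/webp"],
]);

type AttachmentCheck = { ok: true; mime: string } | { ok: false; reason: string };

function checkAttachment(filename: string, sizeBytes: number, maxMb = 20): AttachmentCheck {
  const dot = filename.lastIndexOf(".");
  const ext = dot === -1 ? "" : filename.slice(dot + 1).toLowerCase();
  // A Map lookup avoids matching inherited keys such as "constructor".
  const mime = MIME_BY_EXT.get(ext);
  if (mime === undefined) return { ok: false, reason: `unsupported format: ${ext || "(none)"}` };
  if (sizeBytes > maxMb * 1024 * 1024) return { ok: false, reason: `file exceeds ${maxMb}MB limit` };
  return { ok: true, mime };
}

// Encode raw image bytes as a base64 data URI.
function toDataUri(bytes: Uint8Array, mime: string): string {
  return `data:${mime};base64,${Buffer.from(bytes).toString("base64")}`;
}
```

Running the check before writing to `UPLOAD_DIR` keeps oversized or unsupported files off tmpfs entirely.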