Tasks — Paseo-like Orchestrator
Phase 1: Trace System + Observability (5 tasks)
1. Create tool_traces DB table + migration
Add tool_traces table to apps/server/src/schema.sql:
- Columns: id (UUID PK), session_id (UUID FK → sessions), chat_id (UUID FK → chats), turn_number (int), tool_name (text), input (jsonb), output (jsonb), started_at (timestamptz), finished_at (timestamptz), latency_ms (int), tokens_used (int), cache_tokens (int), reasoning_tokens (int), error (text), outcome (text)
- Index on (chat_id, turn_number, started_at) for trace queries
- Index on (session_id) for session-level aggregation
- Applied idempotently via applySchema(): wrap in CREATE TABLE IF NOT EXISTS

Verification: psql shows tool_traces table with all columns and indexes. Schema re-run is a no-op.
2. Add tool_trace WS frame + contracts schema
Add tool_trace frame to WsFrameSchema in packages/contracts/src/ws-frames.ts:
- Frame types: tool_trace:start (tool_name, input, started_at) and tool_trace:complete (tool_name, output, latency_ms, tokens_used, cache_tokens, reasoning_tokens, error)
- Add to InferenceFrame loose union in apps/server/src/services/inference/turn.ts
- Add to strict WsFrame discriminated union in apps/web/src/api/types.ts
- Rebuild contracts: pnpm -C packages/contracts build

Verification: tsc --noEmit passes. WS client receives tool_trace:start and tool_trace:complete frames.
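
As a rough illustration of how the two frames could sit in a discriminated union on the frontend side, here is a minimal sketch. The field names follow the list above; the concrete types (ISO timestamp strings, `unknown` payloads, nullable `error`) and the `describeFrame` helper are assumptions, not the project's actual contracts.

```typescript
// Hypothetical frame shapes: field names come from the task list,
// the field types are assumptions for illustration.
type ToolTraceStartFrame = {
  type: "tool_trace:start";
  tool_name: string;
  input: unknown;
  started_at: string; // assumed ISO-8601 timestamp
};

type ToolTraceCompleteFrame = {
  type: "tool_trace:complete";
  tool_name: string;
  output: unknown;
  latency_ms: number;
  tokens_used: number;
  cache_tokens: number;
  reasoning_tokens: number;
  error: string | null; // assumed: null when the call succeeded
};

type ToolTraceFrame = ToolTraceStartFrame | ToolTraceCompleteFrame;

// Switching on the `type` discriminant narrows each branch to its own fields.
function describeFrame(frame: ToolTraceFrame): string {
  switch (frame.type) {
    case "tool_trace:start":
      return `${frame.tool_name} started at ${frame.started_at}`;
    case "tool_trace:complete":
      return frame.error === null
        ? `${frame.tool_name} finished in ${frame.latency_ms} ms`
        : `${frame.tool_name} failed: ${frame.error}`;
  }
}
```

Because the union is strict, adding a third frame type later makes the compiler flag any switch that does not handle it.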
3. Instrument tool-phase.ts with start/end timing
Update apps/server/src/services/tools/tool-phase.ts:
- Before executeToolCall: record clock_timestamp() as start, publish tool_trace:start frame with tool_name and input
- After executeToolCall: record clock_timestamp() as finish, compute latency_ms, extract token counts from response metadata, INSERT into tool_traces table, publish tool_trace:complete frame
- Handle errors: on thrown error, publish tool_trace:complete with error field set, set outcome='error'; on success, outcome='success'
- Use sql.json(input as never) for JSONB columns (no double-serialization)

Verification: Every tool call produces a tool_traces row with correct latency_ms and outcome. WS client receives both start and complete frames.
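
The start/finish/error flow above can be sketched as a wrapper around the tool call. This is a minimal sketch under assumptions: `withToolTrace`, `TraceSinks`, and `TraceRow` are hypothetical names, the sinks stand in for the WS publisher and the tool_traces insert service, timing uses JavaScript `Date` rather than the database clock, and token extraction is left out.

```typescript
// Hypothetical row and sink shapes; the real insert service and WS publisher
// would be injected here.
type TraceRow = {
  tool_name: string;
  input: unknown;
  output: unknown;
  started_at: Date;
  finished_at: Date;
  latency_ms: number;
  error: string | null;
  outcome: "success" | "error";
};

type TraceFrame = {
  type: "tool_trace:start" | "tool_trace:complete";
  [key: string]: unknown;
};

type TraceSinks = {
  publish: (frame: TraceFrame) => void;
  persist: (row: TraceRow) => Promise<void>;
};

async function withToolTrace<T>(
  toolName: string,
  input: unknown,
  execute: () => Promise<T>,
  sinks: TraceSinks,
): Promise<T> {
  const startedAt = new Date();
  sinks.publish({
    type: "tool_trace:start",
    tool_name: toolName,
    input,
    started_at: startedAt.toISOString(),
  });

  let output: unknown = null;
  let error: string | null = null;
  try {
    const result = await execute();
    output = result;
    return result;
  } catch (err) {
    error = err instanceof Error ? err.message : String(err);
    throw err; // the caller still sees the original failure
  } finally {
    // Runs on both paths, so every call yields exactly one row and one
    // complete frame.
    const finishedAt = new Date();
    const latencyMs = finishedAt.getTime() - startedAt.getTime();
    await sinks.persist({
      tool_name: toolName,
      input,
      output,
      started_at: startedAt,
      finished_at: finishedAt,
      latency_ms: latencyMs,
      error,
      outcome: error === null ? "success" : "error",
    });
    sinks.publish({
      type: "tool_trace:complete",
      tool_name: toolName,
      output,
      latency_ms: latencyMs,
      error,
    });
  }
}
```

One design caveat: if `persist` itself throws inside `finally`, that error replaces the tool's result or error, so a production version would probably catch and log persistence failures separately.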
4. Add GET /api/chats/:id/traces endpoint
Create apps/server/src/routes/traces.ts:
- GET /api/chats/:id/traces: paginated, ordered by (turn_number, started_at)
- Query params: cursor (opaque cursor for keyset pagination), limit (default 50, max 200), turn_number (optional filter to single turn)
- Returns {traces: Trace[], next_cursor: string | null}
- Register in Fastify router with chatOwnershipPreHandler guard

Verification: curl /api/chats/:id/traces returns paginated trace rows. Turn filter returns only matching traces.
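
One possible way to handle the opaque cursor and the limit bounds is sketched below. The cursor payload (last row's sort key plus an `id` tie-breaker), the base64-JSON encoding, and the helper names are all assumptions for illustration; the task list only specifies that the cursor is opaque and that limit defaults to 50 with a maximum of 200.

```typescript
// Hypothetical cursor payload: the sort key of the last row on the page.
// The `id` tie-breaker is an assumption, added so rows with identical
// (turn_number, started_at) are not skipped.
type TraceCursor = { turn_number: number; started_at: string; id: string };

const DEFAULT_LIMIT = 50;
const MAX_LIMIT = 200;

// Encode as base64 JSON. Assumes ASCII content (numbers, ISO dates, UUIDs).
function encodeCursor(cursor: TraceCursor): string {
  return btoa(JSON.stringify(cursor));
}

// Returns null for anything that is not a well-formed cursor.
function decodeCursor(raw: string): TraceCursor | null {
  try {
    const parsed: unknown = JSON.parse(atob(raw));
    if (typeof parsed !== "object" || parsed === null) return null;
    const fields = parsed as Record<string, unknown>;
    const turnNumber = fields["turn_number"];
    const startedAt = fields["started_at"];
    const id = fields["id"];
    if (typeof turnNumber !== "number") return null;
    if (typeof startedAt !== "string" || typeof id !== "string") return null;
    return { turn_number: turnNumber, started_at: startedAt, id };
  } catch {
    return null; // malformed base64 or JSON
  }
}

// Missing or invalid values fall back to the default; large values are capped.
function clampLimit(raw: string | undefined): number {
  const n = Number.parseInt(raw ?? "", 10);
  if (!Number.isFinite(n) || n < 1) return DEFAULT_LIMIT;
  return Math.min(n, MAX_LIMIT);
}
```

The decoded cursor would then feed a row-comparison predicate such as `(turn_number, started_at, id) > ($1, $2, $3)`, which keeps pages stable while new traces are being inserted.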
5. Build trace viewer frontend component
Create apps/web/src/components/TraceViewer.tsx (and supporting files):
- Collapsible tree grouped by turn_number
- Per tool call row: tool_name badge, latency bar (relative bar width, color-coded: green <1s, yellow <5s, red ≥5s), token count, expand/collapse chevron
- Expanded view: tool input (JSON formatted), tool output (JSON formatted), error message if any
- Fetch traces from /api/chats/:id/traces on pane mount, paginate on scroll
- Integrate as a new pane option in the multi-pane workspace (existing pane registry)

Verification: Trace viewer loads, groups by turn, shows timing bars, expands/collapses tool calls. Pagination works for sessions with 50+ traces.
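
The latency bar rules can be expressed as two small pure functions, which keeps them easy to unit test apart from the component. The thresholds come from the list above; the function names and the 2% minimum bar width are assumptions.

```typescript
type LatencyColor = "green" | "yellow" | "red";

// green < 1s, yellow < 5s, red >= 5s
function latencyColor(latencyMs: number): LatencyColor {
  if (latencyMs < 1000) return "green";
  if (latencyMs < 5000) return "yellow";
  return "red";
}

// Bar width as a percentage of the slowest call in the group.
// The 2% floor is an assumption so very fast calls remain visible.
function latencyBarWidth(latencyMs: number, maxLatencyMs: number): number {
  if (maxLatencyMs <= 0) return 0;
  const pct = Math.round((latencyMs / maxLatencyMs) * 100);
  return Math.min(100, Math.max(2, pct));
}
```

For example, a 250 ms call in a turn whose slowest call took 1000 ms would render as a green bar at 25% width.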
Phase 2: Session Persistence + Resume (3 tasks)
6. Serialize agent state to DB on turn boundaries
Modify apps/coder agent dispatch:
- On each turn boundary (after LLM response, before next tool call loop), serialize AgentSession state to agent_sessions table
- Persist: provider config, turn history, pending tool calls, current phase, token budget remaining
- Use JSONB column for the snapshot state, clock_timestamp() for last_update
- Guard against rapid consecutive saves (debounce 200ms)

Verification: Agent session state is written to agent_sessions after each LLM turn. JSONB snapshot contains all fields needed for resume.
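
The 200ms debounce guard might look like the following sketch. `createDebouncedSaver` is a hypothetical helper, and the `save` callback stands in for whatever function writes the snapshot row; only the most recent state within the window is persisted.

```typescript
// Trailing-edge debounce: repeated schedule() calls within delayMs collapse
// into a single save of the latest state.
function createDebouncedSaver<S>(
  save: (state: S) => void | Promise<void>,
  delayMs = 200,
) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  let latest: S | undefined;
  let hasPending = false;

  // Write the pending state immediately (useful on shutdown or disconnect).
  const flush = async (): Promise<void> => {
    if (timer !== undefined) {
      clearTimeout(timer);
      timer = undefined;
    }
    if (!hasPending) return;
    hasPending = false;
    await save(latest as S);
  };

  // Remember the newest state and restart the timer.
  const schedule = (state: S): void => {
    latest = state;
    hasPending = true;
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => {
      void flush();
    }, delayMs);
  };

  return { schedule, flush };
}
```

Exposing `flush` alongside `schedule` matters for resume: a snapshot still waiting in the debounce window when the process stops would otherwise be lost.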
7. Restore state on WS reconnect
Update apps/server/src/services/ws.ts:
- On snapshot frame from a reconnecting client, check for AgentSession in in_progress or awaiting_input state
- If found, rehydrate the coder pane: restore provider config, replay pending tool calls, set turn history
- Publish a session_restored frame with the restored state metadata
- Client-side: useSessionStream handles session_restored by resetting pane state to match

Verification: Refresh browser mid-agent-session → after reconnect, the coder pane shows the same turn state, pending tool calls, and conversation history.
8. Agent session timeline view
Add timeline component to the coder pane:
- Horizontal timeline showing all turns in the current agent session
- Each turn entry: turn number, start time, tool call count, token usage, cache hit rate
- Active turn highlighted, past turns dimmed
- Clicking a past turn scrolls the conversation to that turn and collapses later turns
- Fetch turn metadata from existing session data (no new endpoint needed)

Verification: Timeline shows all turns. Clicking a turn scrolls to it. Active turn is highlighted.
Phase 3: Dynamic Workflow Engine (6 tasks)
9. Create isolated-vm workflow sandbox
Create apps/server/src/services/workflow/sandbox.ts:
- Use isolated-vm npm package to create a V8 isolate for each workflow run
- No require, fs, net, child_process accessible in the sandbox
- Expose only the workflow API surface (agent, parallel, pipeline, phase, budget, log, args)
- Token budget enforcement: inject a step counter, throw when budget exceeded
- Timeout: 30s default, configurable per workflow
- Error boundary: caught exceptions produce structured error results instead of crashing the worker
- Add isolated-vm to apps/server/package.json dependencies

Verification: Workflow script that calls agent() runs without error. Script trying require('fs') throws a sandbox violation. Run exceeding budget is killed with a clear message.
10. Implement agent/parallel/pipeline primitives
Create apps/server/src/services/workflow/api.ts:
- agent(id, { prompt, model?, tools?, budget? }): registers a sub-agent. Returns an object with .run(input) that dispatches the agent through the existing agent dispatch system and returns the result.
- parallel([agents], { budget? }): runs all agents concurrently and returns once every agent has settled, whether it succeeded or failed. The token budget is shared across the parallel agents. Uses Promise.allSettled for resilience.
- pipeline([steps], { budget? }): runs steps sequentially. Each step receives the previous step's output. Steps can be agent() results or inline functions.
- phase(name, { agents, budget }): groups agents under a named phase. Phases can have their own budget. Results are namespaced by phase name.
- budget(limit): sets token or step limits. Returns a budget object consumed by agent/parallel/pipeline.
- log(msg): emits a structured log entry tagged with current phase/agent context. Published as WS frame to the Orchestrator pane.
- args: the input arguments passed to workflow.run(args).

Verification: A test workflow using agent(), parallel(), and pipeline() executes correctly. Logs appear in the output stream. Token budgets are enforced.
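
To make the semantics of the sequential and concurrent primitives concrete, here is a minimal sketch with a shared budget. The `Budget` class, the simplified signatures, and the idea of charging a fixed token cost per step are assumptions; the real primitives would dispatch through the existing agent system and run inside the sandbox.

```typescript
class BudgetExceededError extends Error {}

// Shared counter: every step charges tokens against the same object.
class Budget {
  readonly limit: number;
  private used = 0;

  constructor(limit: number) {
    this.limit = limit;
  }

  charge(tokens: number): void {
    if (this.used + tokens > this.limit) {
      throw new BudgetExceededError(`budget of ${this.limit} tokens exceeded`);
    }
    this.used += tokens;
  }

  get remaining(): number {
    return this.limit - this.used;
  }
}

type Step = (input: unknown) => Promise<unknown>;

// pipeline: run steps in order, feeding each step the previous output.
async function pipeline(steps: Step[], input: unknown): Promise<unknown> {
  let value = input;
  for (const step of steps) {
    value = await step(value);
  }
  return value;
}

// parallel: start every task at once and wait for all of them to settle.
// A rejected task shows up as { status: "rejected" } instead of aborting
// the others.
async function parallel<T>(
  tasks: Array<() => Promise<T>>,
): Promise<PromiseSettledResult<T>[]> {
  return Promise.allSettled(tasks.map((task) => task()));
}
```

With `Promise.allSettled`, a budget overrun in one agent surfaces as a rejected entry in the result array, so the caller can still use the outputs of the agents that finished.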
11. Workflow file discovery system
Create apps/server/src/services/workflow/discovery.ts:
- Scan .boocode/workflows/*.js (project root, relative to PROJECT_ROOT_WHITELIST)
- Scan ~/.boocode/workflows/*.js (global, os.homedir())
- Scan data/workflows/ (built-in catalog)
- Each file must export a workflow object: {name, description, run(args) => {...}}
- Validate the workflow object at discovery time: required fields, run must be a function
- On server start, run full discovery. Cache results in a Map<name, Workflow>.
- Log discovered workflows with name + description at info level

Verification: Placing a valid .boocode/workflows/test.js file makes the workflow appear in WorkflowManager.list(). Invalid workflow files are logged as warnings and skipped.
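
The discovery-time validation step could be sketched as a function that returns either the validated workflow or a reason to log. The `Workflow` type, `validateWorkflow`, `registerWorkflow`, and the reason strings are hypothetical; file scanning and module loading are deliberately left out.

```typescript
type Workflow = {
  name: string;
  description: string;
  run: (args: unknown) => unknown;
};

type ValidationResult =
  | { ok: true; workflow: Workflow }
  | { ok: false; reason: string };

// Check the required fields on whatever a workflow file exported.
function validateWorkflow(candidate: unknown): ValidationResult {
  if (typeof candidate !== "object" || candidate === null) {
    return { ok: false, reason: "export is not an object" };
  }
  const fields = candidate as Record<string, unknown>;
  const name = fields["name"];
  const description = fields["description"];
  const run = fields["run"];
  if (typeof name !== "string" || name.length === 0) {
    return { ok: false, reason: "name must be a non-empty string" };
  }
  if (typeof description !== "string") {
    return { ok: false, reason: "description must be a string" };
  }
  if (typeof run !== "function") {
    return { ok: false, reason: "run must be a function" };
  }
  return {
    ok: true,
    workflow: { name, description, run: run as Workflow["run"] },
  };
}

// In-memory registry keyed by workflow name, as in the caching bullet above.
const registry = new Map<string, Workflow>();

// Returns null on success, or the reason the workflow was skipped.
function registerWorkflow(candidate: unknown): string | null {
  const result = validateWorkflow(candidate);
  if (!result.ok) return result.reason;
  registry.set(result.workflow.name, result.workflow);
  return null;
}
```

Returning the reason instead of throwing lets the discovery loop log a warning and continue with the next file, which matches the "logged as warnings and skipped" behavior in the verification step.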
12. Workflow manager + built-in catalog
Create apps/server/src/services/workflow/manager.ts:
- WorkflowManager singleton class:
  - list(): returns all discovered workflows with name, description, and arg schema
  - get(name): returns a workflow by name
  - run(workflow, args): creates a sandbox, injects args, executes workflow.run(). Returns a runId (UUID).
  - cancel(runId): terminates the sandbox, marks run as cancelled
  - status(runId): returns run status (pending | running | completed | failed | cancelled) with progress info
- Concurrency limit: configurable via WORKFLOW_MAX_CONCURRENT env var (default 3)
- Token budget: configurable via WORKFLOW_DEFAULT_BUDGET env var (default 100_000 tokens)
- Run state tracked in-memory with optional DB persistence

Built-in workflows in data/workflows/:
- deep-research: parallel source search → per-source analysis → synthesis report
- multi-review: run code health + security + standards reviews in parallel, merge findings
- plan-verify: generate implementation plan → verify plan → generate work items
- bounty-hunt: parallel vulnerability scans with different focus areas (injection, auth, crypto, business logic)

Verification: list() returns built-in workflows. run() executes a workflow and returns runId. status() reflects progress. cancel() stops execution cleanly.
13. Workflow resumability (hash-based cache)
Create apps/server/src/services/workflow/cache.ts:
- Compute SHA-256 hash of each agent spec: crypto.createHash('sha256').update(JSON.stringify({prompt, options})).digest('hex')
- Before executing an agent, check in-memory LRU cache for existing result matching the hash
- Hit: return cached result, emit log('cached', agentId, hash); no actual dispatch
- Miss: execute agent, store result in cache keyed by hash
- LRU eviction: WORKFLOW_CACHE_SIZE env var (default 100 entries)
- Optional DB persistence: workflow_cache table with hash, result, created_at for cross-session reuse
- Re-run detection: identical workflow with same args → all agents skipped
- Partial re-run: changed args → only changed agents re-execute, unchanged ones read from cache

Verification: First run of a workflow executes all agents. Second run with identical args skips all agents (logs show 'cached'). Run with modified args for one agent only re-executes that agent.
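
A minimal sketch of the hash-plus-LRU lookup follows. The hashing expression is the one given above; the `AgentSpec` type and the `LruCache` class (which relies on `Map` preserving insertion order) are assumptions, and DB persistence is not shown.

```typescript
import { createHash } from "node:crypto";

// Hypothetical spec shape: the prompt plus whatever options affect the result.
type AgentSpec = { prompt: string; options: Record<string, unknown> };

function specHash(spec: AgentSpec): string {
  return createHash("sha256")
    .update(JSON.stringify({ prompt: spec.prompt, options: spec.options }))
    .digest("hex");
}

// Small LRU built on Map insertion order: the first key is the least
// recently used entry.
class LruCache<V> {
  private readonly entries = new Map<string, V>();
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  get(key: string): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}
```

One caveat worth noting: `JSON.stringify` is sensitive to property order, so two option objects with the same keys in a different order hash differently. Sorting keys before hashing would avoid spurious cache misses.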
14. Workflow UI integration with Orchestrator panel
Extend apps/web/src/components/Orchestrator/:
- Add workflow selector dropdown listing workflows from WorkflowManager.list()
- Add "Run Workflow" button that opens workflow args editor (JSON or form)
- Extend existing run pane to show workflow steps with per-agent progress
- Live log stream from workflow log() calls, displayed in a scrollable log view
- Cancel button for running workflows
- Resumability indicator: "3/5 steps cached — skipping" when hash cache hits
- Fetch workflow list via new API endpoint or WS message (add GET /api/orchestrator/workflows)

Verification: Workflow selector lists built-in workflows. Running a workflow shows step-by-step progress in the run pane. Cancelling a running workflow works. Cached steps show "skipped" indicator.
Phase 4: Background Subagents (3 tasks)
15. Background task queue + spawn_subagent tool
Modify apps/coder/ and apps/server/:
- Extend tasks table usage with a new task type marker for background subagent tasks
- Create spawn_subagent tool in apps/server/src/services/tools/:
  - Schema: {prompt, model?, tools?, budget?, metadata?}
  - Creates a tasks row with state=pending, type=background_subagent
  - Returns {task_id, status: 'pending'} immediately; does NOT block
- Background worker loop: polls tasks table for background_subagent tasks in pending state, picks one up, executes it via existing agent dispatch, writes result back to tasks row on completion
- Max concurrency: BACKGROUND_MAX_CONCURRENT env var (default 2)
- Worker poll interval: 1s (configurable)

Verification: Calling spawn_subagent returns immediately with a task_id. The task eventually completes with a result in the tasks table. Multiple background tasks run concurrently up to the concurrency limit.
16. subagent_status + subagent_result tools
Create two tools in apps/server/src/services/tools/:
- subagent_status(task_id):
  - Schema: {task_id}
  - Returns: {task_id, status: 'pending'|'running'|'completed'|'failed', progress?: string, started_at?, finished_at?}
  - Queries tasks table for the status
- subagent_result(task_id):
  - Schema: {task_id}
  - Returns: {task_id, status, result?: json, error?: string}
  - Only returns result when status='completed'; returns empty result otherwise with a message
  - Updates task state to read on successful result retrieval (optional)

Verification: Calling subagent_status on a running task returns 'running'. Calling subagent_result on a completed task returns the full result. Calling subagent_result on a pending task returns a clear "not ready yet" message.
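
The result-gating rule for subagent_result can be captured in a small mapping function from a task row to the tool response. The `TaskRow` shape, the `message` field, and its wording are assumptions; the task list only requires that the result is withheld until the task has completed.

```typescript
type TaskStatus = "pending" | "running" | "completed" | "failed";

// Hypothetical projection of a tasks row.
type TaskRow = {
  id: string;
  status: TaskStatus;
  result: unknown;
  error: string | null;
};

type SubagentResult = {
  task_id: string;
  status: TaskStatus;
  result?: unknown;
  error?: string;
  message?: string; // assumed field carrying the "not ready yet" text
};

function toSubagentResult(row: TaskRow): SubagentResult {
  switch (row.status) {
    case "completed":
      return { task_id: row.id, status: row.status, result: row.result };
    case "failed":
      return {
        task_id: row.id,
        status: row.status,
        error: row.error ?? "unknown error",
      };
    default:
      // pending or running: no result yet, only an explanatory message
      return {
        task_id: row.id,
        status: row.status,
        message: `Task is ${row.status}; result is not ready yet.`,
      };
  }
}
```

Keeping this mapping pure means the tool handler only has to load the row and pass it through, and the "not ready" behavior can be tested without a database.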
17. Background agent pane
Create apps/web/src/components/BackgroundAgentPane.tsx:
- New pane type showing running, completed, and failed background subagents
- Each entry: agent name/description, status badge, duration (elapsed or total), progress indicator
- Running entries: progress bar (if available), cancel button
- Completed entries: "View Result" action that opens a modal or inline view with the full output
- Failed entries: error message, "Retry" action
- Badge counter on pane tab showing number of running tasks
- Poll status every 2s for running entries, stop polling on completion
- Register in pane registry alongside existing pane types

Verification: Background pane shows spawning tasks as "pending", transitioning to "running", then "completed"/"failed". "View Result" shows the full output. Badge counter reflects active running tasks.
Phase 5: Multi-modal + Cache Shape (4 tasks)
18. Multi-modal attachment pipeline
Add file upload support:
- Accept file uploads via drag-drop or file picker in the message input area
- Store uploaded files on tmpfs (/tmp/boocode-uploads/ by default, configurable via UPLOAD_DIR)
- Reference attachments in message row via message_parts with type='image' and a url pointing to the tmpfs path
- Forward to DeepSeek API: encode image as base64 data URI, send as multimodal content part in the user message
- Supported formats: png, jpg, jpeg, gif, webp
- Size limit: 20MB default, configurable via MAX_ATTACHMENT_SIZE_MB env var
- Server-side cleanup: delete tmpfs files after message is fully processed or on a periodic sweep

Verification: Uploading an image creates a file on tmpfs and a referenced message_parts row. DeepSeek API call includes the image as a base64 content part. Error on files over size limit.
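
The format check, size limit, and base64 data URI encoding could be sketched as below. The helper names are hypothetical, 1 MB is assumed to mean 1024 × 1024 bytes, and the shape of the provider's multimodal content part is not shown because the task list does not specify it.

```typescript
// Extension-to-MIME map for the supported formats listed above.
const MIME_BY_EXTENSION: Record<string, string> = {
  png: "image/png",
  jpg: "image/jpeg",
  jpeg: "image/jpeg",
  gif: "image/gif",
  webp: "image/webp",
};

// Read the limit from an env-style string, falling back to 20 MB.
// Assumes 1 MB = 1024 * 1024 bytes.
function maxAttachmentBytes(envValue: string | undefined): number {
  const mb = Number(envValue);
  return (Number.isFinite(mb) && mb > 0 ? mb : 20) * 1024 * 1024;
}

// Validate and encode an upload as a base64 data URI.
function toImageDataUri(
  filename: string,
  bytes: Uint8Array,
  maxBytes: number,
): string {
  const ext = filename.split(".").pop()?.toLowerCase() ?? "";
  const mime = MIME_BY_EXTENSION[ext];
  if (mime === undefined) {
    throw new Error(`unsupported attachment format: ${ext}`);
  }
  if (bytes.byteLength > maxBytes) {
    throw new Error(`attachment exceeds ${maxBytes} bytes`);
  }
  // Build a binary string, then base64-encode it. Fine for a sketch; a
  // server would more likely use Buffer for large files.
  let binary = "";
  bytes.forEach((byte) => {
    binary += String.fromCharCode(byte);
  });
  return `data:${mime};base64,${btoa(binary)}`;
}
```

Checking the extension alone trusts the client's filename; sniffing the file's magic bytes would be a stricter alternative if untrusted uploads are a concern.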
19. Image render in message bubble
Update message rendering in apps/web/src/components/MessageBubble.tsx:
- Detect message_parts with type='image' in the message content
- Render attached images inline in the chat bubble, below the text content
- Thumbnail: max 300px wide, aspect-ratio preserved, rounded corners
- Lightbox: clicking the thumbnail opens a full-size overlay with close button
- Loading state: skeleton placeholder while image loads from tmpfs URL
- Error state: broken image placeholder with retry option
- Clean layout: images displayed in a grid (1-2 columns depending on count)

Verification: Chat messages with image attachments render inline thumbnails. Clicking opens lightbox. Large images are thumbnailed. Broken images show error state.
20. Cache shape telemetry data pipeline
Extract and store cache metrics:
- In the DeepSeek provider response handler, extract prompt_cache_hit_tokens and prompt_cache_miss_tokens from the API response metadata
- Break down cache segments: system prompt tokens, tool schema tokens, conversation history tokens (approximate by measuring each segment length)
- Store cache metrics in tool_traces.cache_tokens column (already created in Phase 1)
- Optionally create a cache_stats table for per-segment breakdown: {turn_id, segment_name, hit_tokens, miss_tokens}
- Expose via existing traces API (cache fields already part of the Trace schema)

Verification: After a DeepSeek call, tool_traces row has cache_tokens populated. Cache segment breakdown is available when querying traces.
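
A sketch of the extraction and the length-based segment approximation is shown below. Only the two usage field names come from the task; `UsageLike`, `extractCacheMetrics`, and `approximateSegmentTokens` are hypothetical, and splitting tokens in proportion to character length is just one simple way to approximate the breakdown.

```typescript
// Subset of the provider's usage object; only the two cache fields are read.
type UsageLike = {
  prompt_cache_hit_tokens?: number;
  prompt_cache_miss_tokens?: number;
};

type CacheMetrics = {
  hitTokens: number;
  missTokens: number;
  hitRate: number | null; // null when no prompt tokens were reported
};

function extractCacheMetrics(usage: UsageLike | undefined): CacheMetrics {
  const hitTokens = usage?.prompt_cache_hit_tokens ?? 0;
  const missTokens = usage?.prompt_cache_miss_tokens ?? 0;
  const total = hitTokens + missTokens;
  return {
    hitTokens,
    missTokens,
    hitRate: total === 0 ? null : hitTokens / total,
  };
}

// Approximate per-segment token counts by splitting the total in proportion
// to each segment's character length (an assumption, not a tokenizer).
function approximateSegmentTokens(
  segmentChars: Record<string, number>,
  totalTokens: number,
): Record<string, number> {
  const totalChars = Object.values(segmentChars).reduce((a, b) => a + b, 0);
  const out: Record<string, number> = {};
  for (const [name, chars] of Object.entries(segmentChars)) {
    out[name] =
      totalChars === 0 ? 0 : Math.round((chars / totalChars) * totalTokens);
  }
  return out;
}
```

Because each segment is rounded independently, the per-segment counts may differ from the total by a token or two, which is acceptable for a visualization but should not be used for billing-style accounting.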
21. Cache shape visualization in trace viewer
Update the TraceViewer component with cache metrics:
- Per-turn cache hit bar: horizontal stacked bar showing cached (green) vs non-cached (gray) tokens
- Hit rate percentage displayed as a badge next to token count
- Cumulative cache hit rate in the session footer: "Cache hit rate: 67% (45K/67K tokens)"
- Color coding: green ≥60%, yellow 30-59%, red <30%
- Tooltip on hover showing segment breakdown if available
- Animate transitions when new trace data arrives

Verification: Trace viewer shows cache hit/miss bars per turn. Cumulative rate in footer updates as new traces load. Color coding matches thresholds.
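
The color thresholds and the footer text can be sketched as pure helpers that the component calls while rendering. The function names and the rounding rules (whole percentages, token counts rounded to the nearest thousand) are assumptions chosen to reproduce the example footer string above.

```typescript
type RateColor = "green" | "yellow" | "red";

// green >= 60%, yellow 30-59%, red < 30% (expects a whole-number percentage)
function hitRateColor(ratePct: number): RateColor {
  if (ratePct >= 60) return "green";
  if (ratePct >= 30) return "yellow";
  return "red";
}

// 45000 -> "45K"; values under 1000 are shown as-is.
function formatTokens(n: number): string {
  return n >= 1000 ? `${Math.round(n / 1000)}K` : String(n);
}

// Cumulative footer line for the whole session.
function cacheFooter(hitTokens: number, totalTokens: number): string {
  const pct =
    totalTokens === 0 ? 0 : Math.round((hitTokens / totalTokens) * 100);
  return `Cache hit rate: ${pct}% (${formatTokens(hitTokens)}/${formatTokens(totalTokens)} tokens)`;
}
```

Rounding the percentage once and passing that same value to `hitRateColor` keeps the badge text and its color consistent at the threshold boundaries.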