indifferentketchup abe9c5a3a8 feat: Paseo-like orchestrator Phase 1-2 — trace system, session persistence, timeline, run_command, auto-fix loop

Phase 1: Trace System + Observability
- tool_traces DB table + insert/update service
- tool_trace_start/tool_trace_finish WS frames (contracts + FE types)
- Instrumented tool-phase.ts with timing around every tool call
- GET /api/chats/:id/traces paginated endpoint
- Trace viewer frontend (collapsible panel with timing bars + token breakdown)

Phase 2: Session Persistence + Resume
- agent_snapshots table (UPSERT per chat, persisted on turn boundaries)
- save/load/delete service functions
- Agent snapshot sent on WS reconnect
- Session timeline view (vertical timeline with scroll-to + restore)

Tooling:
- run_command tool (execFile, 30s timeout, 32KB cap, path-guarded)
- Auto-fix loop: after write tools, runs pnpm build, injects errors into next turn

2026-06-08 02:26:47 +00:00

Tasks — Paseo-like Orchestrator

Phase 1: Trace System + Observability (5 tasks)

1. Create tool_traces DB table + migration

Add tool_traces table to apps/server/src/schema.sql:

Columns: id (UUID PK), session_id (UUID FK → sessions), chat_id (UUID FK → chats), turn_number (int), tool_name (text), input (jsonb), output (jsonb), started_at (timestamptz), finished_at (timestamptz), latency_ms (int), tokens_used (int), cache_tokens (int), reasoning_tokens (int), error (text), outcome (text)
Index on (chat_id, turn_number, started_at) for trace queries
Index on (session_id) for session-level aggregation
Applied idempotently via applySchema() — wrap in CREATE TABLE IF NOT EXISTS

Verification: psql shows tool_traces table with all columns and indexes. Schema re-run is no-op.

2. Add tool_trace WS frame + contracts schema

Add tool_trace frame to WsFrameSchema in packages/contracts/src/ws-frames.ts:

Frame types: tool_trace:start (tool_name, input, started_at) and tool_trace:complete (tool_name, output, latency_ms, tokens_used, cache_tokens, reasoning_tokens, error)
Add to InferenceFrame loose union in apps/server/src/services/inference/turn.ts
Add to strict WsFrame discriminated union in apps/web/src/api/types.ts
Rebuild contracts: pnpm -C packages/contracts build

Verification: tsc --noEmit passes. WS client receives tool_trace:start and tool_trace:complete frames.
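The two frame shapes from task 2 could be modelled as a discriminated union. This is a minimal sketch, assuming the discriminant field is named `type`, that `started_at` travels as an ISO string, and that `error` is nullable; the actual WsFrameSchema in packages/contracts may be defined differently (for example with a schema library). `isToolTraceFrame` and `describeFrame` are hypothetical helpers.

```typescript
// Hypothetical frame shapes. Field names follow the task text; the `type`
// discriminant and the ISO-string timestamp are assumptions.
interface ToolTraceStartFrame {
  type: "tool_trace:start";
  tool_name: string;
  input: unknown;
  started_at: string;
}

interface ToolTraceCompleteFrame {
  type: "tool_trace:complete";
  tool_name: string;
  output: unknown;
  latency_ms: number;
  tokens_used: number;
  cache_tokens: number;
  reasoning_tokens: number;
  error: string | null;
}

type ToolTraceFrame = ToolTraceStartFrame | ToolTraceCompleteFrame;

// Runtime guard for frames that arrive as untyped JSON over the socket.
// It checks only the fields needed to tell the two variants apart.
function isToolTraceFrame(value: unknown): value is ToolTraceFrame {
  if (typeof value !== "object" || value === null) return false;
  const { type, tool_name, started_at, latency_ms } = value as Record<string, unknown>;
  if (typeof tool_name !== "string") return false;
  if (type === "tool_trace:start") return typeof started_at === "string";
  if (type === "tool_trace:complete") return typeof latency_ms === "number";
  return false;
}

// Switching on `type` narrows the union to the variant-specific fields.
function describeFrame(frame: ToolTraceFrame): string {
  switch (frame.type) {
    case "tool_trace:start":
      return `${frame.tool_name} started at ${frame.started_at}`;
    case "tool_trace:complete":
      return frame.error
        ? `${frame.tool_name} failed after ${frame.latency_ms}ms: ${frame.error}`
        : `${frame.tool_name} finished in ${frame.latency_ms}ms`;
  }
}
```

Keeping both variants in one union means the loose server-side union and the strict frontend union can share the same discriminant values.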

3. Instrument tool-phase.ts with start/end timing

Update apps/server/src/services/tools/tool-phase.ts:

Before executeToolCall: record clock_timestamp() as start, publish tool_trace:start frame with tool_name and input
After executeToolCall: record clock_timestamp() as finish, compute latency_ms, extract token counts from response metadata, INSERT into tool_traces table, publish tool_trace:complete frame
Handle errors: on thrown error, publish tool_trace:complete with error field set, set outcome='error'; on success, outcome='success'
Use sql.json(input as never) for JSONB columns — no double-serialization

Verification: Every tool call produces a tool_traces row with correct latency_ms and outcome. WS client receives both start and complete frames.
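The start/finish instrumentation described above can be sketched as a wrapper around the tool call. This is an illustration under stated assumptions, not the code in tool-phase.ts: `traceToolCall`, `TraceSinks`, and `TraceRow` are hypothetical names, timestamps are taken in Node with `Date` rather than the database's clock_timestamp(), and token extraction is left out because the response metadata shape is not given.

```typescript
// Hypothetical shapes: the real executeToolCall, publish, and insert helpers
// are not shown in the task list.
type Outcome = "success" | "error";

interface TraceRow {
  tool_name: string;
  input: unknown;
  output: unknown;
  started_at: Date;
  finished_at: Date;
  latency_ms: number;
  error: string | null;
  outcome: Outcome;
}

interface TraceSinks {
  publish(frame: Record<string, unknown>): void;
  insertTrace(row: TraceRow): Promise<void>;
}

async function traceToolCall<T>(
  toolName: string,
  input: unknown,
  execute: () => Promise<T>,
  sinks: TraceSinks,
): Promise<T> {
  const startedAt = new Date();
  sinks.publish({
    type: "tool_trace:start",
    tool_name: toolName,
    input,
    started_at: startedAt.toISOString(),
  });

  let output: unknown = null;
  let error: string | null = null;
  try {
    const result = await execute();
    output = result;
    return result;
  } catch (err) {
    error = err instanceof Error ? err.message : String(err);
    throw err; // the caller still sees the original failure
  } finally {
    // Runs on both paths, so every call yields one row and one complete frame.
    const finishedAt = new Date();
    const latencyMs = finishedAt.getTime() - startedAt.getTime();
    await sinks.insertTrace({
      tool_name: toolName,
      input,
      output,
      started_at: startedAt,
      finished_at: finishedAt,
      latency_ms: latencyMs,
      error,
      outcome: error === null ? "success" : "error",
    });
    sinks.publish({
      type: "tool_trace:complete",
      tool_name: toolName,
      output,
      latency_ms: latencyMs,
      error,
    });
  }
}
```

One design caveat in this sketch: if `insertTrace` itself throws inside `finally`, that error replaces the tool's own result or error, so a real implementation would probably want to catch and log insert failures separately.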

4. Add GET /api/chats/:id/traces endpoint

Create apps/server/src/routes/traces.ts:

GET /api/chats/:id/traces — paginated, ordered by (turn_number, started_at)
Query params: cursor (opaque cursor for keyset pagination), limit (default 50, max 200), turn_number (optional filter to single turn)
Returns {traces: Trace[], next_cursor: string | null}
Register in Fastify router with chatOwnershipPreHandler guard

Verification: curl /api/chats/:id/traces returns paginated trace rows. Turn filter returns only matching traces.
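The opaque cursor and limit handling for this endpoint might look like the following. It is a sketch only: the cursor contents (`turn_number`, `started_at`, plus an `id` tiebreaker) and the fetch-one-extra-row trick are assumptions, and `clampLimit`, `encodeCursor`, `decodeCursor`, and `buildPage` are hypothetical helper names.

```typescript
import { Buffer } from "node:buffer";

// Assumed cursor contents: the sort key from the task text plus the row id
// as a tiebreaker for traces that share a timestamp.
interface TraceCursor {
  turn_number: number;
  started_at: string;
  id: string;
}

const DEFAULT_LIMIT = 50;
const MAX_LIMIT = 200;

// Parse the `limit` query param, falling back to the default and capping at the max.
function clampLimit(raw: string | undefined): number {
  const parsed = Number.parseInt(raw ?? "", 10);
  if (!Number.isFinite(parsed) || parsed < 1) return DEFAULT_LIMIT;
  return Math.min(parsed, MAX_LIMIT);
}

function encodeCursor(cursor: TraceCursor): string {
  return Buffer.from(JSON.stringify(cursor), "utf8").toString("base64url");
}

// Returns null for anything that does not decode to a well-formed cursor.
function decodeCursor(raw: string): TraceCursor | null {
  try {
    const parsed: unknown = JSON.parse(Buffer.from(raw, "base64url").toString("utf8"));
    if (typeof parsed !== "object" || parsed === null) return null;
    const { turn_number, started_at, id } = parsed as Record<string, unknown>;
    if (typeof turn_number !== "number" || typeof started_at !== "string" || typeof id !== "string") {
      return null;
    }
    return { turn_number, started_at, id };
  } catch {
    return null;
  }
}

// Expects up to limit + 1 rows: the extra row only signals that another page exists.
function buildPage<T extends TraceCursor>(
  rows: T[],
  limit: number,
): { traces: T[]; next_cursor: string | null } {
  const hasMore = rows.length > limit;
  const traces = hasMore ? rows.slice(0, limit) : rows;
  const last = traces[traces.length - 1];
  const next_cursor =
    hasMore && last
      ? encodeCursor({ turn_number: last.turn_number, started_at: last.started_at, id: last.id })
      : null;
  return { traces, next_cursor };
}
```

The query behind this would then select rows strictly after the decoded cursor in (turn_number, started_at, id) order and request `limit + 1` rows.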

5. Build trace viewer frontend component

Create apps/web/src/components/TraceViewer.tsx (and supporting files):

Collapsible tree grouped by turn_number
Per tool call row: tool_name badge, latency bar (relative bar width, color-coded: green <1s, yellow <5s, red ≥5s), token count, expand/collapse chevron
Expanded view: tool input (JSON formatted), tool output (JSON formatted), error message if any
Fetch traces from /api/chats/:id/traces on pane mount, paginate on scroll
Integrate as a new pane option in the multi-pane workspace (existing pane registry)

Verification: Trace viewer loads, groups by turn, shows timing bars, expands/collapses tool calls. Pagination works for sessions with 50+ traces.
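The latency bar thresholds above reduce to two small pure functions. In this sketch `latencyColor` and `barWidthPercent` are hypothetical names, and scaling the bar against the slowest call in the same group is an assumption about what "relative bar width" means.

```typescript
type LatencyColor = "green" | "yellow" | "red";

// Thresholds from the task text: green < 1s, yellow < 5s, red >= 5s.
function latencyColor(latencyMs: number): LatencyColor {
  if (latencyMs < 1000) return "green";
  if (latencyMs < 5000) return "yellow";
  return "red";
}

// Bar width as a whole percentage of the slowest call in the group
// (assumed reference point), clamped to 100.
function barWidthPercent(latencyMs: number, slowestMs: number): number {
  if (slowestMs <= 0) return 0;
  return Math.round(Math.min(latencyMs / slowestMs, 1) * 100);
}
```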

Phase 2: Session Persistence + Resume (3 tasks)

6. Serialize agent state to DB on turn boundaries

Modify apps/coder agent dispatch:

On each turn boundary (after LLM response, before next tool call loop), serialize AgentSession state to agent_sessions table
Persist: provider config, turn history, pending tool calls, current phase, token budget remaining
Use JSONB column for the snapshot state, clock_timestamp() for last_update
Guard against rapid consecutive saves (debounce 200ms)

Verification: Agent session state is written to agent_sessions after each LLM turn. JSONB snapshot contains all fields needed for resume.

7. Restore state on WS reconnect

Update apps/server/src/services/ws.ts:

On snapshot frame from a reconnecting client, check for AgentSession in in_progress or awaiting_input state
If found, rehydrate the coder pane: restore provider config, replay pending tool calls, set turn history
Publish a session_restored frame with the restored state metadata
Client-side: useSessionStream handles session_restored by resetting pane state to match

Verification: Refresh browser mid-agent-session → after reconnect, the coder pane shows the same turn state, pending tool calls, and conversation history.

8. Agent session timeline view

Add timeline component to the coder pane:

Horizontal timeline showing all turns in the current agent session
Each turn entry: turn number, start time, tool call count, token usage, cache hit rate
Active turn highlighted, past turns dimmed
Clicking a past turn scrolls the conversation to that turn and collapses later turns
Fetch turn metadata from existing session data (no new endpoint needed)

Verification: Timeline shows all turns. Clicking a turn scrolls to it. Active turn is highlighted.

Phase 3: Dynamic Workflow Engine (6 tasks)

9. Create isolated-vm workflow sandbox

Create apps/server/src/services/workflow/sandbox.ts:

Use isolated-vm npm package to create a V8 isolate for each workflow run
No require, fs, net, child_process accessible in the sandbox
Expose only the workflow API surface (agent, parallel, pipeline, phase, budget, log, args)
Token budget enforcement: inject a step counter, throw when budget exceeded
Timeout: 30s default, configurable per workflow
Error boundary: caught exceptions produce structured error results instead of crashing the worker
Add isolated-vm to apps/server/package.json dependencies

Verification: Workflow script that calls agent() runs without error. Script trying require('fs') throws a sandbox violation. Run exceeding budget is killed with a clear message.
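The step counter and error boundary from the list above can be illustrated without the isolate itself. The sketch below is plain TypeScript and does not use isolated-vm; `createStepBudget`, `BudgetExceededError`, and `runGuarded` are hypothetical names, and how the counter would be injected into the V8 isolate is left open.

```typescript
class BudgetExceededError extends Error {
  readonly limit: number;
  readonly attempted: number;

  constructor(limit: number, attempted: number) {
    super(`Workflow budget exceeded: ${attempted} steps requested, limit is ${limit}`);
    this.name = "BudgetExceededError";
    this.limit = limit;
    this.attempted = attempted;
  }
}

// Step counter: each workflow API call would consume from it before doing work.
function createStepBudget(limit: number) {
  let used = 0;
  return {
    consume(steps = 1): void {
      if (used + steps > limit) throw new BudgetExceededError(limit, used + steps);
      used += steps;
    },
    remaining(): number {
      return limit - used;
    },
  };
}

type RunResult<T> =
  | { ok: true; value: T }
  | { ok: false; error: { name: string; message: string } };

// Error boundary: a thrown exception becomes a structured result
// instead of propagating into the worker.
function runGuarded<T>(fn: () => T): RunResult<T> {
  try {
    return { ok: true, value: fn() };
  } catch (err) {
    const name = err instanceof Error ? err.name : "Error";
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, error: { name, message } };
  }
}
```

For example, a budget of two steps allows two `consume()` calls; a third, wrapped in `runGuarded`, yields `{ ok: false, error: { name: "BudgetExceededError", ... } }` rather than an uncaught exception.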

10. Implement agent/parallel/pipeline primitives

Create apps/server/src/services/workflow/api.ts:

agent(id, { prompt, model?, tools?, budget? }) — registers a sub-agent. Returns an object with .run(input) that dispatches the agent through the existing agent dispatch system and returns result.
parallel([agents], { budget? }) — runs all agents concurrently. Returns once every agent has settled: it uses Promise.allSettled, so one failure does not abort the others and is reported in that agent's result. The token budget is shared across the parallel agents.
pipeline([steps], { budget? }) — runs steps sequentially. Each step receives the previous step's output. Steps can be agent() results or inline functions.
phase(name, { agents, budget }) — groups agents under a named phase. Phases can have their own budget. Results are namespaced by phase name.
budget(limit) — sets token or step limits. Returns a budget object consumed by agent/parallel/pipeline.
log(msg) — emits a structured log entry tagged with current phase/agent context. Published as WS frame to the Orchestrator pane.
args — the input arguments passed to workflow.run(args).

Verification: A test workflow using agent(), parallel(), and pipeline() executes correctly. Logs appear in the output stream. Token budgets are enforced.

11. Workflow file discovery system

Create apps/server/src/services/workflow/discovery.ts:

Scan .boocode/workflows/*.js (project root, relative to PROJECT_ROOT_WHITELIST)
Scan ~/.boocode/workflows/*.js (global, os.homedir())
Scan data/workflows/ (built-in catalog)
Each file must export a workflow object: {name, description, run(args) => {...}}
Validate the workflow object at discovery time: required fields, run must be a function
On server start, run full discovery. Cache results in a Map<name, Workflow>.
Log discovered workflows with name + description at info level

Verification: Placing a valid .boocode/workflows/test.js file makes the workflow appear in WorkflowManager.list(). Invalid workflow files are logged as warnings and skipped.
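The validation and caching steps of discovery could be sketched as below. File scanning and module loading are omitted; `validateWorkflow` and `buildRegistry` are hypothetical names, and letting a later file overwrite an earlier one with the same name is an assumption, since the task list does not say how name collisions are resolved.

```typescript
interface Workflow {
  name: string;
  description: string;
  run: (args: unknown) => unknown;
}

type ValidationResult =
  | { ok: true; workflow: Workflow }
  | { ok: false; reason: string };

// Checks the required fields named in the task text: name, description, run().
function validateWorkflow(candidate: unknown): ValidationResult {
  if (typeof candidate !== "object" || candidate === null) {
    return { ok: false, reason: "export is not an object" };
  }
  const { name, description, run } = candidate as Record<string, unknown>;
  if (typeof name !== "string" || name.length === 0) {
    return { ok: false, reason: "missing name" };
  }
  if (typeof description !== "string") {
    return { ok: false, reason: "missing description" };
  }
  if (typeof run !== "function") {
    return { ok: false, reason: "run must be a function" };
  }
  return { ok: true, workflow: { name, description, run: run as Workflow["run"] } };
}

// Builds the Map<name, Workflow> cache; invalid files are warned about and skipped.
function buildRegistry(
  candidates: Array<{ file: string; exported: unknown }>,
  warn: (message: string) => void,
): Map<string, Workflow> {
  const registry = new Map<string, Workflow>();
  for (const { file, exported } of candidates) {
    const result = validateWorkflow(exported);
    if (result.ok) {
      registry.set(result.workflow.name, result.workflow);
    } else {
      warn(`Skipping ${file}: ${result.reason}`);
    }
  }
  return registry;
}
```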

12. Workflow manager + built-in catalog

Create apps/server/src/services/workflow/manager.ts:

WorkflowManager singleton class:
- list() — returns all discovered workflows with name, description, and arg schema
- get(name) — returns a workflow by name
- run(workflow, args) — creates a sandbox, injects args, executes workflow.run(). Returns a runId (UUID).
- cancel(runId) — terminates the sandbox, marks run as cancelled
- status(runId) — returns run status: pending|running|completed|failed|cancelled, with progress info
Concurrency limit: configurable via WORKFLOW_MAX_CONCURRENT env var (default 3)
Token budget: configurable via WORKFLOW_DEFAULT_BUDGET env var (default 100_000 tokens)
Run state tracked in-memory with optional DB persistence

Built-in workflows in data/workflows/:

deep-research — parallel source search → per-source analysis → synthesis report
multi-review — run code health + security + standards reviews in parallel, merge findings
plan-verify — generate implementation plan → verify plan → generate work items
bounty-hunt — parallel vulnerability scans with different focus areas (injection, auth, crypto, business logic)

Verification: list() returns built-in workflows. run() executes a workflow and returns runId. status() reflects progress. cancel() stops execution cleanly.

13. Workflow resumability (hash-based cache)

Create apps/server/src/services/workflow/cache.ts:

Compute SHA-256 hash of each agent spec: crypto.createHash('sha256').update(JSON.stringify({prompt, options})).digest('hex')
Before executing an agent, check in-memory LRU cache for existing result matching the hash
Hit: return cached result, emit log('cached', agentId, hash) — no actual dispatch
Miss: execute agent, store result in cache keyed by hash
LRU eviction: WORKFLOW_CACHE_SIZE env var (default 100 entries)
Optional DB persistence: workflow_cache table with hash, result, created_at — cross-session reuse
Re-run detection: identical workflow with same args → all agents skipped
Partial re-run: changed args → only changed agents re-execute, unchanged ones read from cache

Verification: First run of a workflow executes all agents. Second run with identical args skips all agents (logs show 'cached'). Run with modified args for one agent only re-executes that agent.
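The hash-then-lookup flow above can be sketched with Node's crypto module and a Map-backed LRU. `hashAgentSpec`, `LruCache`, and `runAgentCached` are hypothetical names, and `execute` is synchronous here for brevity, whereas a real agent dispatch would be asynchronous.

```typescript
import { createHash } from "node:crypto";

interface AgentSpec {
  prompt: string;
  options: Record<string, unknown>;
}

// SHA-256 over the JSON form of the spec, as in the task text.
function hashAgentSpec(spec: AgentSpec): string {
  return createHash("sha256")
    .update(JSON.stringify({ prompt: spec.prompt, options: spec.options }))
    .digest("hex");
}

// Minimal LRU: a Map iterates in insertion order, so re-inserting on access
// keeps the least recently used key first.
class LruCache<V> {
  private readonly entries = new Map<string, V>();
  private readonly maxEntries: number;

  constructor(maxEntries: number) {
    this.maxEntries = maxEntries;
  }

  get(key: string): V | undefined {
    const value = this.entries.get(key);
    if (value === undefined) return undefined;
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }
}

// Hit: log and return the cached result. Miss: execute, then store by hash.
function runAgentCached<R>(
  agentId: string,
  spec: AgentSpec,
  cache: LruCache<R>,
  execute: () => R,
  log: (event: string, agentId: string, hash: string) => void,
): R {
  const hash = hashAgentSpec(spec);
  const cached = cache.get(hash);
  if (cached !== undefined) {
    log("cached", agentId, hash);
    return cached;
  }
  const result = execute();
  cache.set(hash, result);
  return result;
}
```

One caveat with hashing `JSON.stringify` output: two option objects with the same keys in a different order produce different hashes, so a real implementation may want to serialize keys in a stable order first.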

14. Workflow UI integration with Orchestrator panel

Extend apps/web/src/components/Orchestrator/:

Add workflow selector dropdown listing workflows from WorkflowManager.list()
Add "Run Workflow" button that opens workflow args editor (JSON or form)
Extend existing run pane to show workflow steps with per-agent progress
Live log stream from workflow log() calls, displayed in a scrollable log view
Cancel button for running workflows
Resumability indicator: "3/5 steps cached — skipping" when hash cache hits
Fetch workflow list via new API endpoint or WS message (add GET /api/orchestrator/workflows)

Verification: Workflow selector lists built-in workflows. Running a workflow shows step-by-step progress in the run pane. Cancelling a running workflow works. Cached steps show "skipped" indicator.

Phase 4: Background Subagents (3 tasks)

15. Background task queue + spawn_subagent tool

Modify apps/coder/ and apps/server/:

Extend tasks table usage with a new task type marker for background subagent tasks
Create spawn_subagent tool in apps/server/src/services/tools/:
- Schema: {prompt, model?, tools?, budget?, metadata?}
- Creates a tasks row with state=pending, type=background_subagent
- Returns {task_id, status: 'pending'} immediately — does NOT block
Background worker loop: polls tasks table for background_subagent tasks in pending state, picks one up, executes it via existing agent dispatch, writes result back to tasks row on completion
Max concurrency: BACKGROUND_MAX_CONCURRENT env var (default 2)
Worker poll interval: 1s (configurable)

Verification: Calling spawn_subagent returns immediately with a task_id. The task eventually completes with a result in the tasks table. Multiple background tasks run concurrently up to the concurrency limit.

16. subagent_status + subagent_result tools

Create two tools in apps/server/src/services/tools/:

subagent_status(task_id):
- Schema: {task_id}
- Returns: {task_id, status: 'pending'|'running'|'completed'|'failed', progress?: string, started_at?, finished_at?}
- Queries tasks table for the status
subagent_result(task_id):
- Schema: {task_id}
- Returns: {task_id, status, result?: json, error?: string}
- Only returns result when status='completed'; returns empty result otherwise with a message
- Optionally marks the task as read after a successful result retrieval

Verification: Calling subagent_status on a running task returns 'running'. Calling subagent_result on a completed task returns the full result. Calling subagent_result on a pending task returns a clear "not ready yet" message.
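The result-shaping rule for subagent_result (full result only when completed, a clear message otherwise) can be sketched as a pure mapping from a task row. `TaskRow`, `toSubagentResult`, and the extra `message` field are assumptions; the task list specifies only `{task_id, status, result?, error?}`.

```typescript
type TaskStatus = "pending" | "running" | "completed" | "failed";

// Hypothetical projection of a tasks row; the real table has more columns.
interface TaskRow {
  id: string;
  status: TaskStatus;
  result: unknown;
  error: string | null;
}

interface SubagentResult {
  task_id: string;
  status: TaskStatus;
  result?: unknown;
  error?: string;
  message?: string; // assumed field carrying the "not ready yet" text
}

function toSubagentResult(row: TaskRow): SubagentResult {
  switch (row.status) {
    case "completed":
      return { task_id: row.id, status: row.status, result: row.result };
    case "failed":
      return { task_id: row.id, status: row.status, error: row.error ?? "unknown error" };
    default:
      // pending or running: no result yet, so say so explicitly
      return {
        task_id: row.id,
        status: row.status,
        message: `Task is ${row.status}; result is not ready yet`,
      };
  }
}
```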

17. Background agent pane

Create apps/web/src/components/BackgroundAgentPane.tsx:

New pane type showing running, completed, and failed background subagents
Each entry: agent name/description, status badge, duration (elapsed or total), progress indicator
Running entries: progress bar (if available), cancel button
Completed entries: "View Result" action that opens a modal or inline view with the full output
Failed entries: error message, "Retry" action
Badge counter on pane tab showing number of running tasks
Poll status every 2s for running entries, stop polling on completion
Register in pane registry alongside existing pane types

Verification: Background pane shows spawning tasks as "pending", transitioning to "running", then "completed"/"failed". "View Result" shows the full output. Badge counter reflects active running tasks.

Add file upload support:

Accept file uploads via drag-drop or file picker in the message input area
Store uploaded files on tmpfs (/tmp/boocode-uploads/ by default, configurable via UPLOAD_DIR)
Reference attachments in message row via message_parts with type='image' and a url pointing to the tmpfs path
Forward to DeepSeek API: encode image as base64 data URI, send as multimodal content part in the user message
Supported formats: png, jpg, jpeg, gif, webp
Size limit: 20MB default, configurable via MAX_ATTACHMENT_SIZE_MB env var
Server-side cleanup: delete tmpfs files after message is fully processed or on a periodic sweep

Verification: Uploading an image creates a file on tmpfs and a referenced message_parts row. DeepSeek API call includes the image as a base64 content part. Error on files over size limit.
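The format check, size limit, and base64 encoding steps above could be combined in one helper. This sketch covers only the data URI; the shape of the multimodal content part sent to the provider is not specified in the task list and is deliberately left out. `toImageDataUri` is a hypothetical name, and deriving the MIME type from the file extension is an assumption.

```typescript
import { Buffer } from "node:buffer";

// Supported formats from the task text, mapped to their MIME types.
const MIME_BY_EXTENSION = new Map<string, string>([
  ["png", "image/png"],
  ["jpg", "image/jpeg"],
  ["jpeg", "image/jpeg"],
  ["gif", "image/gif"],
  ["webp", "image/webp"],
]);

// maxSizeMb mirrors the MAX_ATTACHMENT_SIZE_MB default of 20.
function toImageDataUri(filename: string, bytes: Uint8Array, maxSizeMb = 20): string {
  const extension = filename.split(".").pop()?.toLowerCase() ?? "";
  const mime = MIME_BY_EXTENSION.get(extension);
  if (mime === undefined) {
    throw new Error(`Unsupported image format: ${extension}`);
  }
  if (bytes.byteLength > maxSizeMb * 1024 * 1024) {
    throw new Error(`Attachment exceeds ${maxSizeMb}MB limit`);
  }
  return `data:${mime};base64,${Buffer.from(bytes).toString("base64")}`;
}
```

Checking the extension alone does not prove the bytes are a valid image, so a real upload handler would likely also inspect the file signature.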

19. Image render in message bubble

Update message rendering in apps/web/src/components/MessageBubble.tsx:

Detect message_parts with type='image' in the message content
Render attached images inline in the chat bubble, below the text content
Thumbnail: max 300px wide, aspect-ratio preserved, rounded corners
Lightbox: clicking the thumbnail opens a full-size overlay with close button
Loading state: skeleton placeholder while image loads from tmpfs URL
Error state: broken image placeholder with retry option
Clean layout: images displayed in a grid (1-2 columns depending on count)

Verification: Chat messages with image attachments render inline thumbnails. Clicking opens lightbox. Large images are thumbnailed. Broken images show error state.

20. Cache shape telemetry data pipeline

Extract and store cache metrics:

In the DeepSeek provider response handler, extract prompt_cache_hit_tokens and prompt_cache_miss_tokens from the API response metadata
Break down cache segments: system prompt tokens, tool schema tokens, conversation history tokens (approximate by measuring each segment length)
Store cache metrics in tool_traces.cache_tokens column (already created in Phase 1)
Optionally create a cache_stats table for per-segment breakdown: {turn_id, segment_name, hit_tokens, miss_tokens}
Expose via existing traces API (cache fields already part of the Trace schema)

Verification: After a DeepSeek call, tool_traces row has cache_tokens populated. Cache segment breakdown is available when querying traces.

21. Cache shape visualization in trace viewer

Update the TraceViewer component with cache metrics:

Per-turn cache hit bar: horizontal stacked bar showing cached (green) vs non-cached (gray) tokens
Hit rate percentage displayed as a badge next to token count
Cumulative cache hit rate in the session footer: "Cache hit rate: 67% (45K/67K tokens)"
Color coding: green ≥60%, yellow 30-59%, red <30%
Tooltip on hover showing segment breakdown if available
Animate transitions when new trace data arrives

Verification: Trace viewer shows cache hit/miss bars per turn. Cumulative rate in footer updates as new traces load. Color coding matches thresholds.
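The hit-rate arithmetic behind the badge, color coding, and footer can be sketched as pure functions over hit and miss token counts (the prompt_cache_hit_tokens and prompt_cache_miss_tokens values from task 20). The function names are hypothetical, and applying the color thresholds to the rounded percentage is an assumption made so that the color always agrees with the displayed number.

```typescript
type HitRateColor = "green" | "yellow" | "red";

// Whole-number hit percentage; 0 when no tokens were counted.
function hitRatePercent(hitTokens: number, missTokens: number): number {
  const total = hitTokens + missTokens;
  if (total === 0) return 0;
  return Math.round((hitTokens / total) * 100);
}

// Thresholds from the task text: green >= 60%, yellow 30-59%, red < 30%.
function hitRateColor(percent: number): HitRateColor {
  if (percent >= 60) return "green";
  if (percent >= 30) return "yellow";
  return "red";
}

function formatThousands(tokens: number): string {
  return `${Math.round(tokens / 1000)}K`;
}

// Footer text in the format shown in the task list.
function footerLabel(hitTokens: number, missTokens: number): string {
  const percent = hitRatePercent(hitTokens, missTokens);
  const total = hitTokens + missTokens;
  return `Cache hit rate: ${percent}% (${formatThousands(hitTokens)}/${formatThousands(total)} tokens)`;
}
```

With 45,000 hit tokens and 22,000 miss tokens, the rate is 45000 / 67000 ≈ 67.2%, which rounds to the footer example given above and falls in the green band.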
