feat: Paseo-like orchestrator Phase 1-2 — trace system, session persistence, timeline, run_command, auto-fix loop
Phase 1: Trace System + Observability
- tool_traces DB table + insert/update service
- tool_trace_start/tool_trace_finish WS frames (contracts + FE types)
- Instrumented tool-phase.ts with timing around every tool call
- GET /api/chats/:id/traces paginated endpoint
- Trace viewer frontend (collapsible panel with timing bars + token breakdown)

Phase 2: Session Persistence + Resume
- agent_snapshots table (UPSERT per chat, persisted on turn boundaries)
- save/load/delete service functions
- Agent snapshot sent on WS reconnect
- Session timeline view (vertical timeline with scroll-to + restore)

Tooling:
- run_command tool (execFile, 30s timeout, 32KB cap, path-guarded)
- Auto-fix loop: after write tools, runs pnpm build, injects errors into next turn
107
openspec/changes/paseo-orchestrator/proposal.md
Normal file
@@ -0,0 +1,107 @@
# Paseo-like Orchestrator — Trace Observability, Dynamic Workflows & Agent Runtime

**Status:** Proposed
**Epic:** paseo-orchestrator
**Depends on:** v2.7.17-orchestrator

## Why

BooCode's Orchestrator (v2.7.17) runs deterministic Han analysis flows — but it's a fixed pipeline, not a general-purpose agent runtime. Every tool call is opaque: no timing, no cost breakdown, no replay. Sessions evaporate on browser refresh. Workflows are hardcoded. Subagents block until completion. And there's zero visibility into cache efficiency on DeepSeek — despite prompt caching being a major cost lever.

The current architecture treats the LLM as a black box and the agent as a one-shot transaction. To move from "read-only chat" to a **Paseo-style thin-client orchestration layer**, BooCode needs five capabilities that compound on each other:

1. **Observability** — Every tool call timed, logged, and live-streamed. Without it, debugging agent behavior is guesswork.
2. **Persistence** — Agent state survives browser refresh. Active sessions resume where they left off.
3. **Dynamic Workflows** — User-authored JS scripts using `agent()`, `parallel()`, `pipeline()` instead of hardcoded flows. Hash-based caching skips completed steps on re-run.
4. **Background Subagents** — `spawn_subagent` returns immediately, results collected later. Unlocks parallel research, long-running analyses, and notification-based workflows.
5. **Multi-modal + Cache Shape** — Image attachments forwarded to DeepSeek's vision API, plus per-turn cache hit rate visualization to close the cost feedback loop.

Each phase is independently valuable; together they transform BooCode from a chat UI into a durable agent execution platform.

## What Changes

### Phase 1: Trace System + Observability (3-4 days)

1. **Create `tool_traces` DB table** — id, session_id, chat_id, turn_number, tool_name, input, output, started_at, finished_at, latency_ms, tokens_used, cache_tokens, reasoning_tokens, error, outcome. Applied idempotently via `applySchema()`.

2. **Add `tool_trace` WS frame** — new WsFrame variant in `@boocode/contracts` published by the server when a tool call starts and completes. Frontend receives live timing deltas via `useSessionStream`.

3. **Instrument `tool-phase.ts`** — wrap `executeToolCall` with `clock_timestamp()` start/end, extract token counts from LLM response metadata, publish `tool_trace` frames on start (with input) and finish (with output + metrics).

4. **Add GET `/api/chats/:id/traces`** — paginated endpoint returning trace rows ordered by turn_number + started_at. Supports cursor-based pagination for large sessions.

5. **Build trace viewer pane** — collapsible tree per turn, timing bars showing latency relative to turn duration, expand/collapse per tool call showing input/output. Integrates into the existing multi-pane workspace alongside chat, coder, and orchestrator panes.

### Phase 2: Session Persistence + Resume (2-3 days)

6. **Serialize agent state to DB** — on each turn boundary (before and after tool call loop), snapshot the active `AgentSession` state (provider config, turn history, pending tool calls) to a JSONB column in `agent_sessions`. Uses `clock_timestamp()` for ordering.

7. **Restore on WS reconnect** — when `snapshot` frame arrives on reconnection, check for a persisted `AgentSession` in `in_progress` or `awaiting_input` state. Rehydrate the coder pane to match the persisted turn, tool call, and pending state.

8. **Agent session timeline view** — a timeline component in the coder pane showing the history of all turns in the current agent session. Each turn shows start time, tool count, token usage, cache hit rate. Clicking a turn scrolls to that point in the conversation.

### Phase 3: Dynamic Workflow Engine (5-7 days)

9. **Create `isolated-vm` sandbox** — restricted JS execution environment for workflow scripts. No `require`, `fs`, `net`, `child_process`. Only the workflow API surface exposed. Token budget enforcement kills runaway scripts.

10. **Implement workflow API primitives** — `agent(id, { prompt, model, tools, budget })` defines a sub-agent; `parallel([agent1, agent2])` runs N agents concurrently with a shared token budget; `pipeline([step1, step2])` chains agents sequentially; `phase(name, { agents, budget })` groups agents under a named phase; `budget(limit)` sets token or step limits; `log(msg)` emits structured workflow log. Compatible with Claude Code workflow script format.

11. **Workflow file discovery** — scan `.boocode/workflows/*.js` (project-local), `~/.boocode/workflows/*.js` (global), and a built-in catalog directory. Each file exports a `workflow` object with `{name, description, run}`. Discovery runs on server start and on file change (optional watch mode).

12. **Workflow manager + built-in catalog** — `WorkflowManager` class with `list()`, `get(name)`, `run(workflow, args)`, `cancel(runId)`, `status(runId)`. Concurrency limits (configurable max concurrent runs), token budgets per run. Built-in catalog includes: `deep-research` (parallel source search → per-source analysis → synthesis), `multi-review` (code health + security + standards reviews in parallel), `plan-verify` (generate plan → verify plan → generate tasks), `bounty-hunt` (parallel vulnerability scanning with different focuses).

13. **Workflow resumability** — SHA-256 hash of each agent spec (prompt + options). Before executing an agent, check if a completed result exists with the same hash. Skip cached agents, only execute new/changed ones. In-memory LRU cache for current session, optional DB persistence for cross-session reuse.

14. **Workflow UI integration** — extend the existing Orchestrator panel (used for Han flows) to support dynamic workflows. Workflow selector dropdown, live run pane with step-by-step progress, cancel button, log output stream, per-agent timing. Reuses the same run-pane component pattern.

### Phase 4: Background Subagents (2-3 days)

15. **Background task queue** — uses the existing `tasks` table with a new `background` type. `spawn_subagent` tool creates a task row and returns immediately. A background worker picks up the task and executes it without blocking the calling agent.

16. **`subagent_status` + `subagent_result` tools** — `subagent_status(task_id)` returns `running|completed|failed` with optional progress info. `subagent_result(task_id)` returns the full output when completed. Polling-based (no WS push for background tasks initially).

17. **Background agent pane** — new pane type showing running/completed background agents. Each entry shows name, status, duration, progress. Completed entries show a "View Result" action. Notifications hook into the existing notification system (toast on completion, badge count for active tasks).

### Phase 5: Multi-modal + Cache Shape (2-3 days)

18. **Image/file attachment pipeline** — accept file uploads (drag-drop or file picker), store on tmpfs with a reference in the message row. Forward to DeepSeek's multimodal API as base64-encoded image parts. Size limit enforcement (configurable, default 20MB per attachment).

19. **Image render in message bubble** — render attached images inline in the chat message bubble. Lightbox on click for expanded view. Thumbnail generation for large images to keep chat scrolling performant.

20. **Cache shape telemetry** — extract `prompt_cache_hit_tokens` from DeepSeek provider metadata on each turn. Break down by segment: system prompt, tool schemas, conversation history. Store in `tool_traces` columns and/or a dedicated `cache_stats` table.

21. **Cache hit rate visualization** — per-turn cache hit bar in the trace viewer (showing cached vs non-cached tokens). Cumulative cache hit rate in the session footer. Highlight when a turn achieves high cache reuse (green indicator) or unusually low (yellow/red).

## Non-Goals
- No changes to the existing Han flow orchestrator (runs alongside dynamic workflows)
- No removal of existing agent dispatch paths (PTY, ACP, Claude SDK — dynamic workflows are additive)
- No distributed execution (all orchestration is single-node)
- No persistent workflow file watching (manual reload or server restart to pick up new workflows)
- No workflow editing UI (workflows are authored as JS files)

## Capabilities

### New Capabilities
- **Tool trace viewer** — every tool call with timing, token costs, cache breakdown, expandable input/output
- **Agent session resume** — browser refresh preserves active agent state
- **Dynamic workflows** — user-authored JS scripts with `agent()/parallel()/pipeline()` API
- **Workflow resumability** — hash-based step caching skips completed agents on re-run
- **Built-in workflow catalog** — deep-research, multi-review, plan-verify, bounty-hunt
- **Background subagents** — non-blocking spawn with deferred result collection
- **Multi-modal support** — image attachments forwarded to DeepSeek vision API
- **Cache shape telemetry** — per-turn and cumulative cache hit rate visualization

### Modified Capabilities
- **Orchestrator panel** — extended from fixed Han flows to dynamic workflow selection and streaming run pane
- **tool-phase.ts** — instrumented with start/end timing and trace publishing
- **WsFrame contract** — new `tool_trace` frame variant
- **tasks table** — extended with `background` type for async subagent execution

## Metrics
- Tool call observability: 0% → 100% of calls traced with timing
- Session continuity: lost on refresh → preserved on reconnect
- Workflow authoring: hardcoded → user-authored JS scripts
- Workflow re-run efficiency: 0% cache → hash-based step reuse
- Background execution: blocking only → blocking + non-blocking
- Cache visibility: 0% → per-turn + cumulative hit rate
- Multi-modal: text-only → text + image attachments
230
openspec/changes/paseo-orchestrator/tasks.md
Normal file
@@ -0,0 +1,230 @@
# Tasks — Paseo-like Orchestrator

## Phase 1: Trace System + Observability (5 tasks)

### 1. Create tool_traces DB table + migration
Add `tool_traces` table to `apps/server/src/schema.sql`:
- Columns: id (UUID PK), session_id (UUID FK → sessions), chat_id (UUID FK → chats), turn_number (int), tool_name (text), input (jsonb), output (jsonb), started_at (timestamptz), finished_at (timestamptz), latency_ms (int), tokens_used (int), cache_tokens (int), reasoning_tokens (int), error (text), outcome (text)
- Index on (chat_id, turn_number, started_at) for trace queries
- Index on (session_id) for session-level aggregation
- Applied idempotently via `applySchema()` — wrap in `CREATE TABLE IF NOT EXISTS`

**Verification**: `psql` shows `tool_traces` table with all columns and indexes. Schema re-run is a no-op.

### 2. Add tool_trace WS frame + contracts schema
Add `tool_trace` frame to `WsFrameSchema` in `packages/contracts/src/ws-frames.ts`:
- Frame types: `tool_trace:start` (tool_name, input, started_at) and `tool_trace:complete` (tool_name, output, latency_ms, tokens_used, cache_tokens, reasoning_tokens, error)
- Add to `InferenceFrame` loose union in `apps/server/src/services/inference/turn.ts`
- Add to strict `WsFrame` discriminated union in `apps/web/src/api/types.ts`
- Rebuild contracts: `pnpm -C packages/contracts build`

**Verification**: `tsc --noEmit` passes. WS client receives `tool_trace:start` and `tool_trace:complete` frames.
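
The two frame types map naturally onto a discriminated union. The following is a minimal sketch, assuming a `type` field as the discriminator and an ISO-8601 string for `started_at`; the interface names and `describeFrame` are hypothetical, and the real `WsFrameSchema` is not shown here.

```typescript
// Hypothetical frame shapes. Field names follow the task list above;
// the `type` discriminator and timestamp encoding are assumptions.
interface ToolTraceStartFrame {
  type: "tool_trace:start";
  tool_name: string;
  input: unknown;
  started_at: string; // ISO-8601 timestamp (assumed encoding)
}

interface ToolTraceCompleteFrame {
  type: "tool_trace:complete";
  tool_name: string;
  output: unknown;
  latency_ms: number;
  tokens_used: number;
  cache_tokens: number;
  reasoning_tokens: number;
  error: string | null;
}

type ToolTraceFrame = ToolTraceStartFrame | ToolTraceCompleteFrame;

// Switching on `type` narrows the frame, so each branch only sees its own fields.
function describeFrame(frame: ToolTraceFrame): string {
  switch (frame.type) {
    case "tool_trace:start":
      return `${frame.tool_name} started at ${frame.started_at}`;
    case "tool_trace:complete":
      return frame.error === null
        ? `${frame.tool_name} finished in ${frame.latency_ms} ms`
        : `${frame.tool_name} failed: ${frame.error}`;
  }
}
```

A strict union like this lets the frontend handler fail to compile if a new frame variant is added without a matching branch.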

### 3. Instrument tool-phase.ts with start/end timing
Update `apps/server/src/services/tools/tool-phase.ts`:
- Before `executeToolCall`: record `clock_timestamp()` as start, publish `tool_trace:start` frame with tool_name and input
- After `executeToolCall`: record `clock_timestamp()` as finish, compute latency_ms, extract token counts from response metadata, INSERT into `tool_traces` table, publish `tool_trace:complete` frame
- Handle errors: on thrown error, publish `tool_trace:complete` with error field set, set outcome='error'; on success, outcome='success'
- Use `sql.json(input as never)` for JSONB columns — no double-serialization

**Verification**: Every tool call produces a `tool_traces` row with correct latency_ms and outcome. WS client receives both start and complete frames.
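
The start/finish/error flow above can be sketched as a generic wrapper. This is an illustration under stated assumptions, not the actual `tool-phase.ts` code: `traceToolCall`, `TraceSink`, and the injected `now` clock are hypothetical names, `Date.now` stands in for the database-side `clock_timestamp()`, and token extraction is left out.

```typescript
// Sketch of a timing wrapper around a tool call. All names here are
// illustrative; the sink stands in for "INSERT row + publish WS frame".
type Outcome = "success" | "error";

interface TraceRecord {
  tool_name: string;
  input: unknown;
  output: unknown;
  latency_ms: number;
  outcome: Outcome;
  error: string | null;
}

interface TraceSink {
  onStart(toolName: string, input: unknown): void; // e.g. publish tool_trace:start
  onComplete(record: TraceRecord): void; // e.g. insert row, publish tool_trace:complete
}

async function traceToolCall<T>(
  toolName: string,
  input: unknown,
  execute: () => Promise<T>,
  sink: TraceSink,
  now: () => number = Date.now,
): Promise<T> {
  const startedAt = now();
  sink.onStart(toolName, input);
  try {
    const output = await execute();
    sink.onComplete({
      tool_name: toolName, input, output,
      latency_ms: now() - startedAt, outcome: "success", error: null,
    });
    return output;
  } catch (err) {
    // Record the failure, then rethrow so the caller's error handling is unchanged.
    sink.onComplete({
      tool_name: toolName, input, output: null,
      latency_ms: now() - startedAt, outcome: "error",
      error: err instanceof Error ? err.message : String(err),
    });
    throw err;
  }
}
```

Injecting the clock keeps the wrapper deterministic in tests; rethrowing after recording means tracing never swallows a tool failure.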

### 4. Add GET /api/chats/:id/traces endpoint
Create `apps/server/src/routes/traces.ts`:
- `GET /api/chats/:id/traces` — paginated, ordered by (turn_number, started_at)
- Query params: `cursor` (opaque cursor for keyset pagination), `limit` (default 50, max 200), `turn_number` (optional filter to single turn)
- Returns `{traces: Trace[], next_cursor: string | null}`
- Register in Fastify router with `chatOwnershipPreHandler` guard

**Verification**: `curl /api/chats/:id/traces` returns paginated trace rows. Turn filter returns only matching traces.
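
Keyset pagination with an opaque cursor can be illustrated in memory. The sketch below assumes the sort key is extended with `id` as a tiebreaker and that the cursor is simply the JSON-encoded key of the last returned row; both are assumptions, as are the names `pageTraces`, `encodeCursor`, and `decodeCursor`. In SQL the same idea would typically be a row comparison on the sort key followed by `ORDER BY` and `LIMIT`.

```typescript
// In-memory sketch of keyset pagination over (turn_number, started_at, id).
interface TraceRow {
  id: string;
  turn_number: number;
  started_at: string; // ISO-8601 in UTC, so string order matches time order
}

type SortKey = [number, string, string];

const keyOf = (t: TraceRow): SortKey => [t.turn_number, t.started_at, t.id];

function compareKeys(a: SortKey, b: SortKey): number {
  if (a[0] !== b[0]) return a[0] - b[0];
  if (a[1] !== b[1]) return a[1] < b[1] ? -1 : 1;
  if (a[2] !== b[2]) return a[2] < b[2] ? -1 : 1;
  return 0;
}

// A real endpoint would likely encode this further so clients treat it as opaque.
const encodeCursor = (t: TraceRow): string => JSON.stringify(keyOf(t));
const decodeCursor = (c: string): SortKey => JSON.parse(c) as SortKey;

function pageTraces(
  rows: TraceRow[],
  cursor: string | null,
  limit = 50,
): { traces: TraceRow[]; next_cursor: string | null } {
  const capped = Math.min(Math.max(limit, 1), 200); // default 50, max 200
  const sorted = [...rows].sort((a, b) => compareKeys(keyOf(a), keyOf(b)));
  const after = cursor === null ? null : decodeCursor(cursor);
  // Keep only rows strictly after the cursor position.
  const remaining = after === null
    ? sorted
    : sorted.filter((t) => compareKeys(keyOf(t), after) > 0);
  const traces = remaining.slice(0, capped);
  const hasMore = remaining.length > capped;
  return {
    traces,
    next_cursor: hasMore ? encodeCursor(traces[traces.length - 1]) : null,
  };
}
```

Unlike offset pagination, the cursor stays valid when new trace rows are inserted while the client is paging.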

### 5. Build trace viewer frontend component
Create `apps/web/src/components/TraceViewer.tsx` (and supporting files):
- Collapsible tree grouped by turn_number
- Per tool call row: tool_name badge, latency bar (relative bar width, color-coded: green <1s, yellow <5s, red ≥5s), token count, expand/collapse chevron
- Expanded view: tool input (JSON formatted), tool output (JSON formatted), error message if any
- Fetch traces from `/api/chats/:id/traces` on pane mount, paginate on scroll
- Integrate as a new pane option in the multi-pane workspace (existing pane registry)

**Verification**: Trace viewer loads, groups by turn, shows timing bars, expands/collapses tool calls. Pagination works for sessions with 50+ traces.
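
The latency bar rules reduce to two small pure functions. A sketch, assuming bar width is the tool call's share of the turn duration (as the proposal describes); the function names are hypothetical.

```typescript
type LatencyColor = "green" | "yellow" | "red";

// Thresholds from the task: green <1s, yellow <5s, red >=5s.
function latencyColor(latencyMs: number): LatencyColor {
  if (latencyMs < 1000) return "green";
  if (latencyMs < 5000) return "yellow";
  return "red";
}

// Bar width as a percentage of the turn's total duration, clamped to [0, 100].
function latencyBarWidth(latencyMs: number, turnDurationMs: number): number {
  if (turnDurationMs <= 0) return 0; // avoid division by zero for empty turns
  return Math.min(100, Math.max(0, (latencyMs / turnDurationMs) * 100));
}
```

Keeping these outside the component makes the thresholds easy to unit test without rendering.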

## Phase 2: Session Persistence + Resume (3 tasks)

### 6. Serialize agent state to DB on turn boundaries
Modify `apps/coder` agent dispatch:
- On each turn boundary (after LLM response, before next tool call loop), serialize `AgentSession` state to `agent_sessions` table
- Persist: provider config, turn history, pending tool calls, current phase, token budget remaining
- Use JSONB column for the snapshot state, `clock_timestamp()` for last_update
- Guard against rapid consecutive saves (debounce 200ms)

**Verification**: Agent session state is written to `agent_sessions` after each LLM turn. JSONB snapshot contains all fields needed for resume.

### 7. Restore state on WS reconnect
Update `apps/server/src/services/ws.ts`:
- On `snapshot` frame from a reconnecting client, check for `AgentSession` in `in_progress` or `awaiting_input` state
- If found, rehydrate the coder pane: restore provider config, replay pending tool calls, set turn history
- Publish a `session_restored` frame with the restored state metadata
- Client-side: `useSessionStream` handles `session_restored` by resetting pane state to match

**Verification**: Refresh browser mid-agent-session → after reconnect, the coder pane shows the same turn state, pending tool calls, and conversation history.
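
The reconnect check can be sketched as a lookup plus a status filter. This is illustrative only: the snapshot shape, the `completed` and `failed` statuses, and the function name `restoreOnReconnect` are assumptions, and an in-memory `Map` stands in for the persisted table.

```typescript
// Hypothetical snapshot shape; only `in_progress` and `awaiting_input`
// come from the task text, the other statuses are assumed.
type SessionStatus = "in_progress" | "awaiting_input" | "completed" | "failed";

interface AgentSnapshot {
  chat_id: string;
  status: SessionStatus;
  turn_number: number;
  pending_tool_calls: string[];
}

interface SessionRestoredFrame {
  type: "session_restored";
  chat_id: string;
  turn_number: number;
  pending_tool_calls: string[];
}

// Returns a frame to publish, or null when there is nothing to resume.
function restoreOnReconnect(
  chatId: string,
  snapshots: Map<string, AgentSnapshot>,
): SessionRestoredFrame | null {
  const snap = snapshots.get(chatId);
  if (!snap) return null;
  const resumable = snap.status === "in_progress" || snap.status === "awaiting_input";
  if (!resumable) return null;
  return {
    type: "session_restored",
    chat_id: snap.chat_id,
    turn_number: snap.turn_number,
    pending_tool_calls: [...snap.pending_tool_calls], // copy so callers cannot mutate the snapshot
  };
}
```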

### 8. Agent session timeline view
Add timeline component to the coder pane:
- Horizontal timeline showing all turns in the current agent session
- Each turn entry: turn number, start time, tool call count, token usage, cache hit rate
- Active turn highlighted, past turns dimmed
- Clicking a past turn scrolls the conversation to that turn and collapses later turns
- Fetch turn metadata from existing session data (no new endpoint needed)

**Verification**: Timeline shows all turns. Clicking a turn scrolls to it. Active turn is highlighted.

## Phase 3: Dynamic Workflow Engine (6 tasks)

### 9. Create isolated-vm workflow sandbox
Create `apps/server/src/services/workflow/sandbox.ts`:
- Use `isolated-vm` npm package to create a V8 isolate for each workflow run
- No `require`, `fs`, `net`, `child_process` accessible in the sandbox
- Expose only the workflow API surface (`agent`, `parallel`, `pipeline`, `phase`, `budget`, `log`, `args`)
- Token budget enforcement: inject a step counter, throw when budget exceeded
- Timeout: 30s default, configurable per workflow
- Error boundary: caught exceptions produce structured error results instead of crashing the worker
- Add `isolated-vm` to `apps/server/package.json` dependencies

**Verification**: Workflow script that calls `agent()` runs without error. Script trying `require('fs')` throws a sandbox violation. Run exceeding budget is killed with a clear message.
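
The step counter and error boundary can be sketched independently of the isolate. The code below does not use `isolated-vm`; it only shows the budget and error-boundary logic that would be injected into the sandbox. `StepBudget`, `BudgetExceededError`, and `runGuarded` are hypothetical names.

```typescript
class BudgetExceededError extends Error {
  readonly limit: number;
  constructor(limit: number) {
    super(`Workflow step budget exceeded (limit: ${limit})`);
    this.name = "BudgetExceededError";
    this.limit = limit;
  }
}

class StepBudget {
  private used = 0;
  private readonly limit: number;
  constructor(limit: number) {
    this.limit = limit;
  }

  // Call before each unit of work; throws once the limit would be passed.
  consume(steps = 1): void {
    if (this.used + steps > this.limit) {
      throw new BudgetExceededError(this.limit);
    }
    this.used += steps;
  }

  get remaining(): number {
    return this.limit - this.used;
  }
}

// Error boundary: a thrown error becomes a structured result instead of crashing the worker.
type RunResult<T> = { ok: true; value: T } | { ok: false; error: string };

function runGuarded<T>(fn: () => T): RunResult<T> {
  try {
    return { ok: true, value: fn() };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}
```

A counter only stops scripts that call back into the workflow API; a tight loop inside the script still needs the isolate's own timeout.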

### 10. Implement agent/parallel/pipeline primitives
Create `apps/server/src/services/workflow/api.ts`:
- `agent(id, { prompt, model?, tools?, budget? })` — registers a sub-agent. Returns an object with `.run(input)` that dispatches the agent through the existing agent dispatch system and returns result.
- `parallel([agents], { budget? })` — runs all agents concurrently. Returns when all complete (or any fails). Shared token budget across parallel agents. Uses `Promise.allSettled` for resilience.
- `pipeline([steps], { budget? })` — runs steps sequentially. Each step receives the previous step's output. Steps can be `agent()` results or inline functions.
- `phase(name, { agents, budget })` — groups agents under a named phase. Phases can have their own budget. Results are namespaced by phase name.
- `budget(limit)` — sets token or step limits. Returns a budget object consumed by agent/parallel/pipeline.
- `log(msg)` — emits a structured log entry tagged with current phase/agent context. Published as WS frame to the Orchestrator pane.
- `args` — the input arguments passed to `workflow.run(args)`.

**Verification**: A test workflow using `agent()`, `parallel()`, and `pipeline()` executes correctly. Logs appear in the output stream. Token budgets are enforced.
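
The concurrency semantics of `parallel` and `pipeline` can be sketched over plain async functions. This is a simplified illustration: `Step` stands in for whatever `agent(...).run` returns, budgets and phases are omitted, and the result shape of `parallel` is an assumption.

```typescript
// A step takes the previous output (or the shared input) and resolves to a new value.
type Step = (input: unknown) => Promise<unknown>;

interface ParallelResult {
  results: unknown[]; // fulfilled values in input order, undefined where a step failed
  errors: string[]; // messages from rejected steps
}

// Runs every step concurrently on the same input. Promise.allSettled waits for
// all steps, so one failure does not discard the results of the others.
async function parallel(steps: Step[], input: unknown): Promise<ParallelResult> {
  const settled = await Promise.allSettled(steps.map((s) => s(input)));
  const results: unknown[] = [];
  const errors: string[] = [];
  for (const outcome of settled) {
    if (outcome.status === "fulfilled") {
      results.push(outcome.value);
    } else {
      results.push(undefined);
      const reason: unknown = outcome.reason;
      errors.push(reason instanceof Error ? reason.message : String(reason));
    }
  }
  return { results, errors };
}

// Runs steps one after another, feeding each output into the next step.
async function pipeline(steps: Step[], input: unknown): Promise<unknown> {
  let current = input;
  for (const step of steps) {
    current = await step(current);
  }
  return current;
}
```

In this sketch a failing pipeline step rejects the whole pipeline, while a failing parallel step is reported alongside the successful ones.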

### 11. Workflow file discovery system
Create `apps/server/src/services/workflow/discovery.ts`:
- Scan `.boocode/workflows/*.js` (project root, relative to `PROJECT_ROOT_WHITELIST`)
- Scan `~/.boocode/workflows/*.js` (global, `os.homedir()`)
- Scan `data/workflows/` (built-in catalog)
- Each file must export a `workflow` object: `{name, description, run: (args) => {...}}`
- Validate the workflow object at discovery time: required fields, run must be a function
- On server start, run full discovery. Cache results in a `Map<name, Workflow>`.
- Log discovered workflows with name + description at `info` level

**Verification**: Placing a valid `.boocode/workflows/test.js` file makes the workflow appear in `WorkflowManager.list()`. Invalid workflow files are logged as warnings and skipped.
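
Discovery-time validation can be sketched as a type guard over the loaded export. File scanning and module loading are omitted; `validateWorkflow`, `buildRegistry`, and the error strings are hypothetical.

```typescript
interface Workflow {
  name: string;
  description: string;
  run: (args: unknown) => unknown;
}

type ValidationResult =
  | { ok: true; workflow: Workflow }
  | { ok: false; reason: string };

// Checks the required fields on an untrusted module export.
function validateWorkflow(candidate: unknown): ValidationResult {
  if (typeof candidate !== "object" || candidate === null) {
    return { ok: false, reason: "export `workflow` must be an object" };
  }
  const fields = candidate as Record<string, unknown>;
  const name = fields["name"];
  const description = fields["description"];
  const run = fields["run"];
  if (typeof name !== "string" || name.length === 0) {
    return { ok: false, reason: "`name` must be a non-empty string" };
  }
  if (typeof description !== "string") {
    return { ok: false, reason: "`description` must be a string" };
  }
  if (typeof run !== "function") {
    return { ok: false, reason: "`run` must be a function" };
  }
  return { ok: true, workflow: { name, description, run: run as Workflow["run"] } };
}

// Keeps valid workflows and collects skip reasons, which a caller could log as warnings.
function buildRegistry(candidates: unknown[]): { registry: Map<string, Workflow>; skipped: string[] } {
  const registry = new Map<string, Workflow>();
  const skipped: string[] = [];
  for (const candidate of candidates) {
    const result = validateWorkflow(candidate);
    if (result.ok) registry.set(result.workflow.name, result.workflow);
    else skipped.push(result.reason);
  }
  return { registry, skipped };
}
```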

### 12. Workflow manager + built-in catalog
Create `apps/server/src/services/workflow/manager.ts`:
- `WorkflowManager` singleton class:
  - `list()` — returns all discovered workflows with name, description, and arg schema
  - `get(name)` — returns a workflow by name
  - `run(workflow, args)` — creates a sandbox, injects args, executes `workflow.run()`. Returns a runId (UUID).
  - `cancel(runId)` — terminates the sandbox, marks run as cancelled
  - `status(runId)` — returns run status: `pending|running|completed|failed|cancelled`, with progress info
- Concurrency limit: configurable via `WORKFLOW_MAX_CONCURRENT` env var (default 3)
- Token budget: configurable via `WORKFLOW_DEFAULT_BUDGET` env var (default 100_000 tokens)
- Run state tracked in-memory with optional DB persistence

Built-in workflows in `data/workflows/`:
- `deep-research` — parallel source search → per-source analysis → synthesis report
- `multi-review` — run code health + security + standards reviews in parallel, merge findings
- `plan-verify` — generate implementation plan → verify plan → generate work items
- `bounty-hunt` — parallel vulnerability scans with different focus areas (injection, auth, crypto, business logic)

**Verification**: `list()` returns built-in workflows. `run()` executes a workflow and returns runId. `status()` reflects progress. `cancel()` stops execution cleanly.

### 13. Workflow resumability (hash-based cache)
Create `apps/server/src/services/workflow/cache.ts`:
- Compute SHA-256 hash of each agent spec: `crypto.createHash('sha256').update(JSON.stringify({prompt, options})).digest('hex')`
- Before executing an agent, check in-memory LRU cache for existing result matching the hash
- Hit: return cached result, emit `log('cached', agentId, hash)` — no actual dispatch
- Miss: execute agent, store result in cache keyed by hash
- LRU eviction: `WORKFLOW_CACHE_SIZE` env var (default 100 entries)
- Optional DB persistence: `workflow_cache` table with `hash`, `result`, `created_at` — cross-session reuse
- Re-run detection: identical workflow with same args → all agents skipped
- Partial re-run: changed args → only changed agents re-execute, unchanged ones read from cache

**Verification**: First run of a workflow executes all agents. Second run with identical args skips all agents (logs show 'cached'). Run with modified args for one agent only re-executes that agent.
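
The hash-then-lookup flow can be sketched with Node's `crypto` module and a `Map`-based LRU. This is a synchronous simplification (real agent dispatch would be async), and `LruCache`, `hashSpec`, and `runCached` are hypothetical names. One caveat worth noting: `JSON.stringify` is sensitive to key order, so two specs with the same options in a different order hash differently unless keys are normalized first.

```typescript
import { createHash } from "node:crypto";

interface AgentSpec {
  prompt: string;
  options: Record<string, unknown>;
}

// Same hashing expression as in the task list above.
function hashSpec(spec: AgentSpec): string {
  return createHash("sha256")
    .update(JSON.stringify({ prompt: spec.prompt, options: spec.options }))
    .digest("hex");
}

// Minimal LRU built on Map's insertion order; illustrative, not a library API.
class LruCache<V> {
  private readonly entries = new Map<string, V>();
  private readonly capacity: number;
  constructor(capacity: number) {
    this.capacity = capacity;
  }

  get(key: string): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    this.entries.delete(key);
    this.entries.set(key, value); // re-insert to mark as most recently used
    return value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest);
    }
  }
}

// Returns the cached result on a hit; executes and stores on a miss.
function runCached<V>(
  spec: AgentSpec,
  cache: LruCache<V>,
  execute: () => V,
): { value: V; cached: boolean } {
  const key = hashSpec(spec);
  const hit = cache.get(key);
  if (hit !== undefined) return { value: hit, cached: true };
  const value = execute();
  cache.set(key, value);
  return { value, cached: false };
}
```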

### 14. Workflow UI integration with Orchestrator panel
Extend `apps/web/src/components/Orchestrator/`:
- Add workflow selector dropdown listing workflows from `WorkflowManager.list()`
- Add "Run Workflow" button that opens workflow args editor (JSON or form)
- Extend existing run pane to show workflow steps with per-agent progress
- Live log stream from workflow `log()` calls, displayed in a scrollable log view
- Cancel button for running workflows
- Resumability indicator: "3/5 steps cached — skipping" when hash cache hits
- Fetch workflow list via new API endpoint or WS message (add `GET /api/orchestrator/workflows`)

**Verification**: Workflow selector lists built-in workflows. Running a workflow shows step-by-step progress in the run pane. Cancelling a running workflow works. Cached steps show "skipped" indicator.

## Phase 4: Background Subagents (3 tasks)

### 15. Background task queue + spawn_subagent tool
Modify `apps/coder/` and `apps/server/`:
- Extend `tasks` table usage with a new task type marker for background subagent tasks
- Create `spawn_subagent` tool in `apps/server/src/services/tools/`:
  - Schema: `{prompt, model?, tools?, budget?, metadata?}`
  - Creates a `tasks` row with state=`pending`, type=`background_subagent`
  - Returns `{task_id, status: 'pending'}` immediately — does NOT block
- Background worker loop: polls `tasks` table for `background_subagent` tasks in `pending` state, picks one up, executes it via existing agent dispatch, writes result back to tasks row on completion
- Max concurrency: `BACKGROUND_MAX_CONCURRENT` env var (default 2)
- Worker poll interval: 1s (configurable)

**Verification**: Calling `spawn_subagent` returns immediately with a task_id. The task eventually completes with a result in the tasks table. Multiple background tasks run concurrently up to the concurrency limit.

### 16. subagent_status + subagent_result tools
Create two tools in `apps/server/src/services/tools/`:
- `subagent_status(task_id)`:
  - Schema: `{task_id}`
  - Returns: `{task_id, status: 'pending'|'running'|'completed'|'failed', progress?: string, started_at?, finished_at?}`
  - Queries `tasks` table for the status
- `subagent_result(task_id)`:
  - Schema: `{task_id}`
  - Returns: `{task_id, status, result?: json, error?: string}`
  - Only returns result when status='completed'; returns empty result otherwise with a message
  - Updates task state to `read` on successful result retrieval (optional)

**Verification**: Calling `subagent_status` on a running task returns 'running'. Calling `subagent_result` on a completed task returns the full result. Calling `subagent_result` on a pending task returns a clear "not ready yet" message.
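
The spawn, claim, status, and result lifecycle from tasks 15 and 16 can be sketched with an in-memory stand-in for the `tasks` table. `BackgroundTaskStore` and its method names are hypothetical, the counter-based IDs replace whatever ID the real table generates, and the `unknown` status for missing tasks is an assumption.

```typescript
type TaskStatus = "pending" | "running" | "completed" | "failed";

interface BackgroundTask {
  task_id: string;
  status: TaskStatus;
  prompt: string;
  result?: unknown;
  error?: string;
}

interface ResultReply {
  task_id: string;
  status: TaskStatus | "unknown";
  result?: unknown;
  message?: string;
}

// In-memory stand-in for the `tasks` table.
class BackgroundTaskStore {
  private readonly tasks = new Map<string, BackgroundTask>();
  private nextId = 1;

  // spawn_subagent: create a pending row and return immediately.
  spawn(prompt: string): { task_id: string; status: "pending" } {
    const task_id = `task-${this.nextId++}`;
    this.tasks.set(task_id, { task_id, status: "pending", prompt });
    return { task_id, status: "pending" };
  }

  // Worker side: claim the oldest pending task, if any, and mark it running.
  claimNext(): BackgroundTask | undefined {
    for (const task of this.tasks.values()) {
      if (task.status === "pending") {
        task.status = "running";
        return task;
      }
    }
    return undefined;
  }

  complete(taskId: string, result: unknown): void {
    const task = this.tasks.get(taskId);
    if (task) {
      task.status = "completed";
      task.result = result;
    }
  }

  // subagent_status
  status(taskId: string): TaskStatus | undefined {
    return this.tasks.get(taskId)?.status;
  }

  // subagent_result: the payload is only returned once the task has completed.
  result(taskId: string): ResultReply {
    const task = this.tasks.get(taskId);
    if (!task) return { task_id: taskId, status: "unknown", message: "No such task" };
    if (task.status !== "completed") {
      return { task_id: taskId, status: task.status, message: "Result not ready yet" };
    }
    return { task_id: taskId, status: "completed", result: task.result };
  }
}
```

With a real database, the claim step would need to be atomic so that two workers cannot pick up the same pending row.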

### 17. Background agent pane
Create `apps/web/src/components/BackgroundAgentPane.tsx`:
- New pane type showing running, completed, and failed background subagents
- Each entry: agent name/description, status badge, duration (elapsed or total), progress indicator
- Running entries: progress bar (if available), cancel button
- Completed entries: "View Result" action that opens a modal or inline view with the full output
- Failed entries: error message, "Retry" action
- Badge counter on pane tab showing number of running tasks
- Poll status every 2s for running entries, stop polling on completion
- Register in pane registry alongside existing pane types

**Verification**: Background pane shows newly spawned tasks as "pending", transitioning to "running", then "completed"/"failed". "View Result" shows the full output. Badge counter reflects active running tasks.

## Phase 5: Multi-modal + Cache Shape (4 tasks)

### 18. Multi-modal attachment pipeline
Add file upload support:
- Accept file uploads via drag-drop or file picker in the message input area
- Store uploaded files on tmpfs (`/tmp/boocode-uploads/` by default, configurable via `UPLOAD_DIR`)
- Reference attachments in message row via `message_parts` with `type='image'` and a `url` pointing to the tmpfs path
- Forward to DeepSeek API: encode image as base64 data URI, send as multimodal content part in the user message
- Supported formats: png, jpg, jpeg, gif, webp
- Size limit: 20MB default, configurable via `MAX_ATTACHMENT_SIZE_MB` env var
- Server-side cleanup: delete tmpfs files after message is fully processed or on a periodic sweep

**Verification**: Uploading an image creates a file on tmpfs and a referenced `message_parts` row. DeepSeek API call includes the image as a base64 content part. Files over the size limit are rejected with an error.
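
The format and size checks can be sketched as a pure function that runs before anything is written to tmpfs. Assumptions in this sketch: the format is determined from the file extension alone (a real pipeline might also inspect the file contents), 1MB is taken as 1024 × 1024 bytes, and `checkAttachment` and `toDataUri` are hypothetical names.

```typescript
const SUPPORTED_EXTENSIONS = ["png", "jpg", "jpeg", "gif", "webp"] as const;
type ImageExtension = (typeof SUPPORTED_EXTENSIONS)[number];

const MIME_TYPES: Record<ImageExtension, string> = {
  png: "image/png",
  jpg: "image/jpeg",
  jpeg: "image/jpeg",
  gif: "image/gif",
  webp: "image/webp",
};

type AttachmentCheck = { ok: true; mimeType: string } | { ok: false; error: string };

// Validates extension and size; maxSizeMb mirrors MAX_ATTACHMENT_SIZE_MB (default 20).
function checkAttachment(fileName: string, sizeBytes: number, maxSizeMb = 20): AttachmentCheck {
  const dot = fileName.lastIndexOf(".");
  const ext = dot === -1 ? "" : fileName.slice(dot + 1).toLowerCase();
  const supported = (SUPPORTED_EXTENSIONS as readonly string[]).includes(ext);
  if (!supported) {
    return { ok: false, error: `Unsupported format: ${ext || "(none)"}` };
  }
  if (sizeBytes > maxSizeMb * 1024 * 1024) {
    return { ok: false, error: `File exceeds ${maxSizeMb}MB limit` };
  }
  return { ok: true, mimeType: MIME_TYPES[ext as ImageExtension] };
}

// Builds a data URI from bytes that are already base64-encoded.
function toDataUri(mimeType: string, base64: string): string {
  return `data:${mimeType};base64,${base64}`;
}
```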

### 19. Image render in message bubble
Update message rendering in `apps/web/src/components/MessageBubble.tsx`:
- Detect `message_parts` with `type='image'` in the message content
- Render attached images inline in the chat bubble, below the text content
- Thumbnail: max 300px wide, aspect-ratio preserved, rounded corners
- Lightbox: clicking the thumbnail opens a full-size overlay with close button
- Loading state: skeleton placeholder while image loads from tmpfs URL
- Error state: broken image placeholder with retry option
- Clean layout: images displayed in a grid (1-2 columns depending on count)

**Verification**: Chat messages with image attachments render inline thumbnails. Clicking opens lightbox. Large images are thumbnailed. Broken images show error state.

### 20. Cache shape telemetry data pipeline
Extract and store cache metrics:
- In the DeepSeek provider response handler, extract `prompt_cache_hit_tokens` and `prompt_cache_miss_tokens` from the API response metadata
- Break down cache segments: system prompt tokens, tool schema tokens, conversation history tokens (approximate by measuring each segment length)
- Store cache metrics in `tool_traces.cache_tokens` column (already created in Phase 1)
- Optionally create a `cache_stats` table for per-segment breakdown: `{turn_id, segment_name, hit_tokens, miss_tokens}`
- Expose via existing traces API (cache fields already part of the Trace schema)

**Verification**: After a DeepSeek call, `tool_traces` row has `cache_tokens` populated. Cache segment breakdown is available when querying traces.

### 21. Cache shape visualization in trace viewer
Update the TraceViewer component with cache metrics:
- Per-turn cache hit bar: horizontal stacked bar showing cached (green) vs non-cached (gray) tokens
- Hit rate percentage displayed as a badge next to token count
- Cumulative cache hit rate in the session footer: "Cache hit rate: 67% (45K/67K tokens)"
- Color coding: green ≥60%, yellow 30-59%, red <30%
- Tooltip on hover showing segment breakdown if available
- Animate transitions when new trace data arrives

**Verification**: Trace viewer shows cache hit/miss bars per turn. Cumulative rate in footer updates as new traces load. Color coding matches thresholds.
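
The hit rate, color thresholds, and footer label follow directly from the hit and miss token counts. A sketch, assuming total prompt tokens are the sum of hit and miss tokens and that the footer rounds to whole percentages and whole thousands; the function names are hypothetical.

```typescript
type RateColor = "green" | "yellow" | "red";

// Hit rate as a percentage of all prompt tokens (hit + miss).
function cacheHitRate(hitTokens: number, missTokens: number): number {
  const total = hitTokens + missTokens;
  return total === 0 ? 0 : (hitTokens / total) * 100;
}

// Thresholds from the task: green >=60%, yellow 30-59%, red <30%.
function rateColor(ratePercent: number): RateColor {
  if (ratePercent >= 60) return "green";
  if (ratePercent >= 30) return "yellow";
  return "red";
}

const toK = (tokens: number): string => `${Math.round(tokens / 1000)}K`;

// Cumulative footer label across all loaded turns.
function footerLabel(turns: { hit: number; miss: number }[]): string {
  const hit = turns.reduce((sum, t) => sum + t.hit, 0);
  const miss = turns.reduce((sum, t) => sum + t.miss, 0);
  const rate = Math.round(cacheHitRate(hit, miss));
  return `Cache hit rate: ${rate}% (${toK(hit)}/${toK(hit + miss)} tokens)`;
}
```

For example, two turns with 30,000/10,000 and 15,000/12,000 hit/miss tokens give 45,000 cached out of 67,000 total, which produces the footer text quoted in the task.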