diff --git a/openspec/changes/v2.0-boocoder/implementation-plan.md b/openspec/changes/v2.0-boocoder/implementation-plan.md
new file mode 100644
index 0000000..27cd243
--- /dev/null
+++ b/openspec/changes/v2.0-boocoder/implementation-plan.md
@@ -0,0 +1,413 @@
+# v2.0 BooCoder — Implementation Plan
+
+Ordered execution plan across all 4 sub-versions. Each phase is dispatchable as a single batch. Phases 1-4 are sequential (each builds on the prior); phases within a sub-version can sometimes be parallelized.
+
+---
+
+## Phase 1 — Foundation (v2.0.0-alpha)
+
+**Goal:** Standalone BooCoder container boots, connects to DB, serves a health endpoint. No inference yet.
+
+**Estimated:** ~200 LoC
+
+### Steps
+
+1. **Clone lift sources** (prep, no code)
+   - `cd /opt/forks && git clone agent-hub, plandex, opencode, qodo-ai/agents`
+   - Read agent-hub schema, plandex pending-changes, opencode permission/evaluate.ts
+   - Read RA.Aid README for three-stage pattern
+
+2. **Create `apps/coder/` skeleton**
+   - `apps/coder/package.json` (Fastify, postgres, zod — same deps as `apps/server`)
+   - `apps/coder/tsconfig.json` (extends base, NodeNext)
+   - `apps/coder/src/index.ts` (Fastify boot, health endpoint, DB connect)
+   - `apps/coder/src/config.ts` (Zod config schema — DATABASE_URL, PORT, HOST, LLAMA_SWAP_URL, CONTAINER_GUIDANCE_FILE)
+   - `apps/coder/src/db.ts` (postgres connection, schema apply — shared with `apps/server` or fresh)
+
+3. **Create Dockerfile**
+   - `apps/coder/Dockerfile` — Node 20 bookworm-slim (matches booterm for glibc compat with node-pty later)
+   - Mount: `/opt:/opt:rw`
+   - COPY built server + BOOCODER.md
+
+4. **docker-compose.yml** — add `boocoder` service
+   - Port `100.114.205.53:9502:3000`
+   - Environment: `DATABASE_URL`, `LLAMA_SWAP_URL`, `CONTAINER_GUIDANCE_FILE=/app/BOOCODER.md`
+   - Network: `boocode_net`
+   - Depends on: `boocode_db`
+
+5. **DB rename** — `boocode_db` → `boochat_db`
+   - `ALTER DATABASE boocode RENAME TO boochat;` (one-time, run manually)
+   - Update `DATABASE_URL` in all docker-compose services
+   - Update volume name mapping
+   - Verify all 3 services boot against renamed DB
+
+6. **Schema migration** — new tables in `apps/coder/src/schema.sql`
+   - `pending_changes` table
+   - `tasks` table
+   - `available_agents` table
+   - `human_inbox` view
+   - Applied idempotently on boot (same pattern as BooChat's `applySchema()`)
+
+7. **BOOCODER.md** — container guidance file
+   - Write tools enabled (unlike BOOCHAT.md which declares read-only)
+   - Pending-changes queue discipline
+   - Path-guard rules
+
+### Verification
+- `docker compose up --build -d` — boocoder container starts
+- `curl http://100.114.205.53:9502/api/health` — 200 OK
+- `psql` confirms new tables exist
+- BooChat + BooTerm unaffected (still boot, still serve)
+
+---
+
+## Phase 2 — Write Tools + Pending Changes (v2.0.0-beta)
+
+**Goal:** BooCoder can chat with the LLM, the LLM can call write tools, changes queue in `pending_changes`, user can apply/reject.
+
+**Estimated:** ~400 LoC
+
+### Steps
+
+1. **Write-path guard** (`apps/coder/src/services/write_guard.ts`)
+   - `resolveWritePath(projectRoot, filePath): string` — `resolve()` + prefix check (no realpath — file may not exist for creates)
+   - Deny list: inherit from BooChat's `secret_guard.ts` (`.env`, `*.pem`, `id_rsa*`, etc.)
+   - Fuzz tests: `../` escape, symlink outside root, null bytes, non-existent parent dirs
+
+2. **Pending-changes service** (`apps/coder/src/services/pending_changes.ts`)
+   - `queueEdit(session_id, task_id, file_path, old_string, new_string): PendingChange` — computes unified diff, validates write path, INSERTs
+   - `queueCreate(session_id, task_id, file_path, content): PendingChange`
+   - `queueDelete(session_id, task_id, file_path): PendingChange`
+   - `applyAll(session_id): ApplyResult[]` — re-validates each path, writes to disk, marks `status='applied'`
+   - `applyOne(change_id): ApplyResult`
+   - `rejectOne(change_id): void` — marks `status='rejected'`
+   - `rejectAll(session_id): void`
+   - `rewindOne(change_id): void` — inverse-diff, writes to disk, marks `status='reverted'`
+   - `listPending(session_id): PendingChange[]`
+
+3. **Write tools** (`apps/coder/src/services/tools/`)
+   - `edit_file.ts` — input: `{file_path, old_string, new_string}`, calls `queueEdit`
+   - `create_file.ts` — input: `{file_path, content}`, calls `queueCreate`
+   - `delete_file.ts` — input: `{file_path}`, calls `queueDelete`
+   - `apply_pending.ts` — calls `applyAll` for current session
+   - `rewind.ts` — input: `{change_id}` or `{all: true}`, calls `rewindOne`/`rewindAll`
+
+4. **Tool registry** — register write tools alongside ALL read tools from BooChat
+   - Import BooChat's read tools (view_file, grep, etc.) + codecontext tools
+   - Add the 5 write tools
+   - Alpha-sort the combined list
+
+5. **Inference loop** — port from BooChat or share via workspace package
+   - Copy `apps/server/src/services/inference/` into `apps/coder/src/services/inference/` (or symlink via pnpm workspace)
+   - The outer loop (v1.14) runs unchanged — write tools are just ToolDefs with `execute()` functions
+   - Compaction, doom-loop, step cap all carry forward
+
+6. **API routes**
+   - `POST /api/sessions/:id/messages` — same as BooChat (creates user + assistant rows, enqueues inference)
+   - `GET /api/sessions/:id/pending` — returns pending changes for the session
+   - `POST /api/sessions/:id/pending/apply` — applies all pending
+   - `POST /api/pending/:id/apply` — applies one
+   - `POST /api/pending/:id/reject` — rejects one
+   - `POST /api/pending/:id/rewind` — reverts one
+   - WebSocket streaming (same protocol as BooChat)
+
+### Verification
+- Send a chat asking BooCoder to edit a file
+- LLM calls `edit_file` → change queued in `pending_changes`
+- `GET /api/sessions/:id/pending` shows the queued change with diff
+- `POST /api/pending/:id/apply` writes to disk
+- `POST /api/pending/:id/rewind` reverts it
+- Fuzz test: attempt traversal via `edit_file("../../etc/passwd", ...)` → rejected by write_guard
+
+---
+
+## Phase 3 — Frontend: Diff Pane + Chat (v2.0.0)
+
+**Goal:** Browser UI at `coder.indifferentketchup.com` with chat pane + diff pane side by side.
+
+**Estimated:** ~200 LoC
+
+### Steps
+
+1. **Create `apps/coder/web/`** — React + Vite SPA (same stack as BooChat's `apps/web/`)
+   - Copy BooChat's Vite config, Tailwind v4 setup, font pipeline
+   - Shared components: `MarkdownRenderer`, `CodeBlock`, `Button`, `Input`
+   - New app shell: sidebar (sessions) + workspace (panes)
+
+2. **Chat pane** — reuse BooChat's ChatPane/MessageBubble pattern
+   - Same WS streaming, same `useSessionStream` hook, same message rendering
+   - ActionRow includes tool-call rendering for write tools
+
+3. **Diff pane** — NEW (`apps/coder/web/src/components/DiffPane.tsx`)
+   - Fetches `GET /api/sessions/:id/pending`
+   - Lists pending changes: file path + operation badge (create/edit/delete)
+   - Per-change: syntax-highlighted unified diff view (use Shiki or a diff-specific highlighter)
+   - Buttons: Approve / Reject per change, Approve All / Reject All
+   - Real-time updates via WS frame (`pending_change_added`, `pending_change_applied`, etc.)
+
+4. **Workspace splitter** — chat left, diff right (or configurable)
+
+5. **Caddy route** — `coder.indifferentketchup.com` → boocoder:9502
+   - Authelia gating (same as BooChat)
+
+### Verification
+- Open `coder.indifferentketchup.com` in browser
+- Send a message asking for a code change
+- See the change appear in the diff pane in real time
+- Click Approve → file written, change marked applied
+- Click Reject → change discarded
+
+---
+
+## Phase 4 — Dispatcher + Tasks (v2.0.0 final)
+
+**Goal:** Task queue works. User can create tasks, dispatcher picks them up and runs them through Path A.
+
+**Estimated:** ~150 LoC
+
+### Steps
+
+1. **Dispatcher** (`apps/coder/src/services/dispatcher.ts`)
+   - In-process `setInterval(5000)` polling `tasks` WHERE `state='pending'` ORDER BY `created_at`
+   - For each ready task: mark `state='running'`, run inference with the task's `input` as the user message
+   - On completion: mark `state='completed'`
+   - On error: mark `state='failed'`
+   - On abort: mark `state='cancelled'`
+   - Respects `app.addHook('onClose')` — stops polling, waits for in-flight task
+
+2. **Task API routes**
+   - `POST /api/tasks` — create a task `{project_id, input, agent?, model?}`
+   - `GET /api/tasks` — list tasks (filterable by state, project)
+   - `GET /api/tasks/:id` — get task details + output_summary
+   - `POST /api/tasks/:id/cancel` — cancel a running task
+
+3. **Task → session linkage**
+   - Each task creates its own session + chat for isolation
+   - Task's pending_changes reference the task_id
+   - When task completes, its pending_changes are visible in the UI for approval
+
+4. **Agent probing** (`apps/coder/src/services/agent-probe.ts`)
+   - On startup: `which opencode`, `which goose`, `which claude`, `which pi`
+   - Parse version from ` --version`
+   - Check ACP support: `opencode acp --help` exits 0 → supports_acp = true
+   - Populate `available_agents` table
+
+### Verification
+- `POST /api/tasks {input: "add a /api/version endpoint"}` → task created
+- Dispatcher picks it up → inference runs → `edit_file` queued → task completes
+- `GET /api/tasks/:id` shows `state='completed'` + output_summary
+- Pending changes visible in diff pane for approval
+
+---
+
+## Phase 5 — ACP Dispatch (v2.0.1)
+
+**Goal:** Tasks can be dispatched to external agents via ACP. opencode and goose run as subprocesses, their events flow back into BooCode.
+
+**Estimated:** ~350 LoC
+
+### Steps
+
+1. **ACP client** (`apps/coder/src/services/acp-client.ts`)
+   - Install: `pnpm -C apps/coder add @zed-industries/agent-client-protocol`
+   - `spawnAcpAgent(agent: string, task: string, worktree: string, mcpServers: McpConfig[]): AcpSession`
+   - Uses SDK's `StdioTransport` — spawn `opencode acp` or `goose acp` as child
+   - Pass `context_servers` for MCP auto-forward
+   - Event listener: maps ACP events to BooCode's parts taxonomy
+
+2. **ACP event mapping**
+   - `file_operation` → queue into `pending_changes` (same as Path A native writes)
+   - `tool_call` / `tool_result` → insert as `message_parts` in the task's session
+   - `terminal_output` → publish as WS frame for BooTerm routing
+   - `permission_request` → pause (same mechanism as `ask_user_input`)
+   - `session_end` → task state → `completed` or `failed`
+
+3. **Worktree management** (`apps/coder/src/services/worktrees.ts`)
+   - `createWorktree(projectPath, taskId): string` — `git worktree add /tmp/booworktrees/ -b task- HEAD`
+   - `diffWorktree(worktreePath, projectPath): UnifiedDiff[]` — `git diff HEAD...`
+   - `cleanupWorktree(worktreePath): void` — `git worktree remove`
+   - On ACP session end: diff the worktree, queue diffs into `pending_changes`, cleanup
+
+4. **PTY fallback** (`apps/coder/src/services/pty-dispatch.ts`)
+   - For agents without ACP (claude, pi, smallcode)
+   - `spawnPtyAgent(agent: string, task: string, worktree: string): PtySession`
+   - Uses `node-pty` — spawn `claude` or `pi` with cwd = worktree
+   - Capture stdout/stderr into `message_parts` (kind='text', less structured than ACP)
+   - On exit: diff worktree → queue pending_changes → cleanup
+
+5. **Dispatcher update** — transport selection
+   - Check `available_agents[agent].supports_acp` at dispatch time
+   - ACP-capable → `spawnAcpAgent`
+   - PTY fallback → `spawnPtyAgent`
+   - Native (no agent specified) → Path A inference loop (Phase 4)
+
+6. **AGENTS.md extensions**
+   - Add `execution_strategy: plan | act | research` field
+   - Add `expert_model` field for cost-routing
+   - Add `output_schema` field (optional JSON Schema for structured final output)
+
+### Verification
+- Create task with `agent: 'opencode'` → ACP subprocess spawns
+- opencode edits files in worktree → events stream into UI
+- On completion: worktree diff queued in `pending_changes`
+- Approve → changes applied to main project
+- Fallback: create task with `agent: 'claude'` → PTY captures output → worktree diff queued
+
+---
+
+## Phase 6 — MCP Server (v2.0.2)
+
+**Goal:** BooCoder exposes its own primitives as MCP tools. External opencode sessions in Termius can drive the task queue.
+
+**Estimated:** ~250 LoC
+
+### Steps
+
+1. **MCP server** (`apps/coder/src/services/mcp-server.ts`)
+   - Use `@modelcontextprotocol/sdk` server-side (`Server` class)
+   - Stdio transport (read from stdin, write to stdout)
+   - Entry point: `boocoder --mcp` CLI flag starts the MCP server instead of the HTTP server
+
+2. **Tool handlers** (6 tools)
+   - `boocoder.create_task` → INSERT into tasks table, return task_id
+   - `boocoder.list_pending_changes` → SELECT from pending_changes WHERE session matches
+   - `boocoder.apply` → call `applyOne(change_id)`
+   - `boocoder.reject` → call `rejectOne(change_id)`
+   - `boocoder.dispatch_external_agent` → create task with agent specified, return task_id
+   - `boocoder.list_worktrees` → list active worktrees from tasks WHERE worktree_path IS NOT NULL AND state='running'
+
+3. **10-question eval** (per `anthropics/skills/mcp-builder` framework)
+   - Write 10 independent, read-only, verifiable questions about the BooCoder state
+   - Run eval: `echo '{"jsonrpc":"2.0","method":"tools/call","params":{"name":"boocoder.list_pending_changes","arguments":{}},"id":1}' | boocoder --mcp`
+   - All 10 must return correct answers
+
+4. **opencode integration test**
+   - Add BooCoder as an MCP server in `~/.opencode/config.json`:
+     ```json
+     {"mcpServers": {"boocoder": {"type": "stdio", "command": "boocoder", "args": ["--mcp"]}}}
+     ```
+   - From opencode: call `boocoder.create_task` → verify task appears in BooCoder UI
+
+### Verification
+- `echo '...' | boocoder --mcp` returns valid MCP responses
+- 10-question eval passes
+- opencode can drive BooCoder's task queue via MCP
+
+---
+
+## Phase 7 — CLI + Polish (v2.0.3)
+
+**Goal:** `boocode` CLI client, human inbox UI, cost tracking, observation hooks.
+
+**Estimated:** ~400 LoC
+
+### Steps
+
+1. **CLI client** (`apps/coder/src/cli.ts`)
+   - Thin HTTP/WS client against BooCoder API
+   - `boocode run "task description"` → POST /api/tasks → stream output via WS
+   - `boocode ls` → GET /api/tasks → formatted table
+   - `boocode attach ` → WS subscribe to task's session → stream live
+   - `boocode send "message"` → POST message to task's session chat
+   - Build as a standalone binary via `pkg` or `esbuild --bundle`
+
+2. **Human inbox UI** (frontend)
+   - New route: `/inbox` → shows tasks WHERE `state IN ('blocked', 'failed')`
+   - Per-task: view output, retry (reset state to pending), cancel, reassign agent
+   - Badge on sidebar showing count of inbox items
+
+3. **Cost tracking**
+   - `tasks.cost_tokens` populated from inference `usage` callback (same as BooChat's `tokens_used`)
+   - Summary API: `GET /api/stats/costs?group_by=project|agent|day` → aggregated token spend
+   - Simple UI: cost badge on each task, totals in settings
+
+4. **Observation hooks** (budi taxonomy)
+   - Emit 5 event types on the BooCoder WS protocol for dispatched agents:
+     - `session_start` — agent spawned
+     - `user_prompt_submit` — task spec delivered
+     - `post_tool_use` — each tool call completed
+     - `subagent_start` — nested dispatch (Boomerang)
+     - `stop` — agent finished
+   - Consumed by frontend for real-time status indicators
+
+5. **Boomerang `new_task` tool** (subagent isolation)
+   - When an agent's toolset includes `new_task`:
+     - Creates a child task (fresh session, fresh context)
+     - Child runs to completion
+     - Parent gets only `attempt_completion` summary
+   - Orchestrator agent profile: tools = `[new_task, list_tasks, check_task_status]` ONLY
+
+### Verification
+- `boocode run "add health endpoint"` from terminal → task runs → output streams → diff queued
+- `boocode ls` shows task list with states + cost
+- Inbox shows failed tasks, retry works
+- Boomerang: orchestrator creates subtask → subtask runs isolated → parent gets summary only
+
+---
+
+## Phase 8 — Hardening + Ship (v2.0.x)
+
+**Goal:** Security hardening, integration tests, documentation, production deploy.
+
+**Estimated:** ~100 LoC (mostly tests + docs)
+
+### Steps
+
+1. **Path-guard fuzz suite** — property tests for every traversal pattern:
+   - `../` sequences (all depths)
+   - Symlink outside project root
+   - Null bytes in path
+   - Unicode normalization attacks
+   - Race conditions (TOCTOU between validate + write)
+   - MCP-served filesystem writes routed through pending_changes
+
+2. **Integration tests**
+   - End-to-end: create task → inference → edit_file → apply → file written → verify content
+   - ACP dispatch: mock opencode → events flow → pending_changes queued
+   - MCP server: 10-question eval automated in CI
+
+3. **Documentation**
+   - `BOOCODER.md` finalized (container guidance)
+   - `CLAUDE.md` updated with BooCoder architecture section
+   - `boocode_roadmap.md` v2.0 retrospective
+   - `CHANGELOG.md` entries for each sub-version
+
+4. **Production deploy**
+   - Caddy config: `coder.indifferentketchup.com`
+   - Authelia: same SSO group as BooChat
+   - Smoke: full workflow (chat → edit → approve → verify)
+
+5. **Tag** — `v2.0.0` (or `v2.0.0-rc1` if Sam wants a bake period)
+
+---
+
+## Execution order summary
+
+```
+Phase 1 (foundation)    → v2.0.0-alpha   ~200 LoC   container boots
+Phase 2 (write tools)   → v2.0.0-beta    ~400 LoC   inference + pending_changes
+Phase 3 (frontend)      → v2.0.0         ~200 LoC   chat + diff panes
+Phase 4 (dispatcher)    → v2.0.0-final   ~150 LoC   task queue + native dispatch
+Phase 5 (ACP dispatch)  → v2.0.1         ~350 LoC   external agents + worktrees
+Phase 6 (MCP server)    → v2.0.2         ~250 LoC   boocoder.* tools + eval
+Phase 7 (CLI + polish)  → v2.0.3         ~400 LoC   CLI + inbox + hooks + Boomerang
+Phase 8 (hardening)     → v2.0.x         ~100 LoC   fuzz + integration tests + docs
+                                         --------
+                                         ~2050 LoC total
+```
+
+Each phase is independently dispatchable. Phases 1-4 are sequential (each needs the prior). Phases 5-7 are parallelizable after Phase 4 ships (they're independent protocol surfaces). Phase 8 gates the production tag.
+
+---
+
+## Risk register
+
+| Risk | Mitigation |
+|---|---|
+| Path-guard bypass → arbitrary writes | Pending-changes double-validates (at queue time + apply time). Fuzz suite in Phase 8. OpenHands sandbox (v2.1) as fallback. |
+| ACP spec instability (remote transport WIP) | Use stdio only. No remote ACP in v2.0. |
+| node-pty native compilation breaks in Docker | bookworm-slim + glibc matches booterm's working config. Pin node-pty version. |
+| Worktree cleanup failure → disk bloat | 30-min idle timeout sweeper. `git worktree prune` on startup. |
+| DB rename breaks existing sessions | One-time migration with explicit backup. BooChat/BooTerm URLs unchanged. |
+| MCP server eval failure | Ship stdio MCP server only after 10/10 eval passes. |
+| Boomerang context leak (child leaks state to parent) | Architectural enforcement: child's session_id ≠ parent's. Summary field is the ONLY bridge. |
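
The write-path guard named in Phase 2, step 1 and in the first row of the risk register could look roughly like the sketch below: `resolve()` plus a prefix check, with no `realpath` call. This is an illustration under assumptions, not the planned implementation. The `DENY_PATTERNS` list and the error messages are hypothetical; the plan only says the deny list is inherited from BooChat's `secret_guard.ts`, whose contents are not shown here.

```typescript
import { basename, resolve, sep } from "node:path";

// Hypothetical deny list modelled on the examples in the plan
// (`.env`, `*.pem`, `id_rsa*`); the real list lives in secret_guard.ts.
const DENY_PATTERNS: RegExp[] = [/^\.env(\..*)?$/, /\.pem$/, /^id_rsa/];

export function resolveWritePath(projectRoot: string, filePath: string): string {
  // Null bytes can truncate paths in lower layers, so reject them outright.
  if (filePath.includes("\0")) {
    throw new Error("write_guard: null byte in path");
  }

  const root = resolve(projectRoot);
  // resolve() collapses "../" segments; an absolute filePath replaces root entirely.
  const target = resolve(root, filePath);

  // Compare against root plus a separator so "/opt/app-evil" does not pass for "/opt/app".
  const prefix = root.endsWith(sep) ? root : root + sep;
  if (!target.startsWith(prefix)) {
    throw new Error(`write_guard: ${filePath} resolves outside the project root`);
  }

  // Deny secrets by file name, even when they sit inside the root.
  const name = basename(target);
  if (DENY_PATTERNS.some((pattern) => pattern.test(name))) {
    throw new Error(`write_guard: ${name} is on the deny list`);
  }

  return target;
}
```

Because the check is purely lexical, a symlink inside the root that points outside it is not caught here; that case is listed in the Phase 8 fuzz suite and would need a separate check on the nearest existing parent directory. Calling the same function at queue time and again at apply time gives the double validation the risk register describes.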