Compare commits

...

9 Commits

Author SHA1 Message Date
ce31577d1e v2.0.0-beta: write tools, pending-changes queue, inference loop, API routes
Phase 2 of v2.0. BooCoder is now a functional write-capable chatbot.

Write-path guard: resolveWritePath() uses resolve() (no realpath — files may
not exist for creates) + prefix-check + secret-file deny list (.env, *.pem,
id_rsa*, etc.). 23 unit tests cover traversal attacks.

Pending-changes service: queueEdit/Create/Delete → applyOne/All →
rejectOne/All → rewindOne. Edit diffs stored as JSON {old, new}. All writes
queue before touching disk; apply re-validates the path guard.

5 write tools: edit_file, create_file, delete_file, apply_pending, rewind.
Registered alongside 25 read-only tools from BooChat (30 total, alpha-sorted).
Write tools use a module-level inference context for sql+sessionId injection.

Inference loop via workspace dependency: apps/coder imports
createInferenceRunner, createBroker, ALL_TOOLS from @boocode/server (dist/).
apps/server gains declaration: true + exports map with typed subpath entries.
No code duplication — one inference engine shared by both apps.

API routes: POST /api/sessions/:id/messages (user msg → inference), POST stop,
GET/POST pending-changes CRUD (5 endpoints), WebSocket session streaming.

Dockerfile updated to build apps/server first (coder depends on its .d.ts).
Health endpoint reports tool count: {"ok":true,"db":true,"tools":30}.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 01:53:38 +00:00
006226cce5 v2.0.0-alpha: BooCoder foundation — container, schema, DB rename
Phase 1 of v2.0. BooCoder is live at port 9502 with a health endpoint.

- Database renamed: ALTER DATABASE boocode RENAME TO boochat (one-time).
  All services updated to connect to /boochat. Docker service name stays
  boocode_db (rename is internal to Postgres, not Docker).

- New apps/coder/ app skeleton: Fastify server with health endpoint,
  postgres connection, schema apply on boot. Mirrors apps/server pattern
  but minimal (no inference loop yet — Phase 2).

- Schema: pending_changes (operation queue before /apply), tasks (dispatch
  DAG with state machine), available_agents (startup-probed agent registry),
  human_inbox view (tasks WHERE state IN blocked/failed). All IF NOT EXISTS,
  idempotent on re-run. Same boochat database, different tables.

- Dockerfile: Node 20 bookworm-slim (glibc for future node-pty in Phase 5).
  Multi-stage build matching the existing boocode image pattern.

- docker-compose.yml: boocoder service on 100.114.205.53:9502, /opt:/opt:rw
  mount (write-capable, policy-gated at tool layer), depends on boocode_db.

- BOOCODER.md: container guidance declaring write-tool capability +
  pending-changes discipline.

All 4 services boot and pass health checks. 9 tables in the shared DB.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 01:20:29 +00:00
62d818af23 v2.0 implementation plan: 8 phases from foundation to production
Detailed execution plan for all v2.0 sub-versions:

Phase 1 (v2.0.0-alpha): container skeleton, DB rename, schema migration
Phase 2 (v2.0.0-beta): write tools + pending-changes service + fuzz tests
Phase 3 (v2.0.0): frontend diff pane + chat pane + Caddy routing
Phase 4 (v2.0.0-final): dispatcher worker + task queue + agent probing
Phase 5 (v2.0.1): ACP client + PTY fallback + worktree management
Phase 6 (v2.0.2): MCP server (6 tools, stdio, 10-question eval)
Phase 7 (v2.0.3): CLI + human inbox + cost tracking + observation hooks + Boomerang
Phase 8 (v2.0.x): path-guard fuzz, integration tests, docs, production deploy

~2050 LoC total. Phases 1-4 sequential, 5-7 parallelizable after 4.
Risk register covers path-guard bypass, ACP instability, worktree cleanup,
DB rename, MCP eval, Boomerang context leak.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 01:09:05 +00:00
531d39ace9 v2.0 proposal update: add AGENTS.md extensions, Boomerang pattern, observation hooks, follow-up batches
Additions from second pass of boocode_code_review.md:

- AGENTS.md extensions: output_schema, exit_expression, execution_strategy
  (qodo-ai/agents MIT), expert_model escape hatch (RA.Aid Apache-2.0)
- Subagent isolation via Boomerang Tasks pattern: orchestrator-only-dispatches,
  down-pass/up-pass context discipline, fresh session per subtask
- Observation hooks: 5-event taxonomy from budi (SessionStart, UserPromptSubmit,
  PostToolUse, SubagentStart, Stop) mapped to WS frames
- Follow-up batches table: PR-resolver, HMAC audit log, blind-validation gate,
  majority-vote ensembler, drift detection, anti-slop, globstar gate, Docker
  sandbox, multi-provider LLM
- Additional repo to clone: qodo-ai/agents for agent.toml schema reference

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 23:22:57 +00:00
f2974d6887 v2.0 proposal: BooCoder — write tools, pending changes, ACP dispatch, MCP server
Comprehensive roadmap for the v2.0 major version bump. Covers:
- Schema: pending_changes, tasks, available_agents tables + human_inbox view
- Path A: native write tools (edit_file, create_file, delete_file) queuing
  through pending_changes before /apply flushes to disk
- Path B: external agent dispatch via ACP (opencode, goose) or PTY fallback
  (claude, pi) with per-task git worktrees and automatic diff-on-completion
- BooCoder MCP server: 6 tools exposing task primitives over stdio
- Code lifts: agent-hub (Apache-2.0, task DAG), plandex (MIT, diff UX),
  ACP SDK (Apache-2.0, subprocess protocol), Paseo (AGPL, design-only)
- Sub-versions: v2.0.0 (Path A), v2.0.1 (Path B), v2.0.2 (MCP server),
  v2.0.3 (CLI + polish)
- Estimate: ~2200 LoC total

All v1.x dependencies shipped (v1.13 parts, v1.14 outer loop, v1.15 MCP
client, v1.16 codesight). v2.0 is unblocked.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 15:11:16 +00:00
29c7d051b6 v1.16.0-codesight-merge: 4 new codecontext tools — blast radius, hot files, routes, middleware
BooCode wrapper tools for the 4 new MCP tools added to the codecontext
sidecar (Go side committed separately at /opt/forks/codecontext).

- get_blast_radius: reverse-edge BFS — "what breaks if I change this?"
- get_hot_files: most-imported files by incoming edge count
- get_routes: Fastify/Express route extraction via tree-sitter AST
- get_middleware: middleware detection via import + registration patterns

Wrappers follow the existing codecontext pattern: Zod input → callCodecontext
→ ToolDef export. Registered in ALL_TOOLS (alpha-sorted). All 4 are read-only.

codecontext sidecar rebuilt from commit b19e646 with the 4 new Go handlers
(2130 lines, 29 tests). Reviewer fixes applied: defer RUnlock on Tier 2
handlers, extractObjectProperty delegates to extractStringValue for
template-literal route paths.

363/363 server tests passing. No schema changes, no frontend changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 05:19:52 +00:00
d27a977d59 v1.15.0-mcp-multi: multi-server MCP client + stdio transport + config file + tool globs
Generalizes the v1.14.1 single-server Context7 PoC into a multi-server MCP
client registry with per-server graceful degradation. JSON config at
/data/mcp.json (bind-mounted alongside AGENTS.md) matches opencode's
mcpServers schema shape. Config file missing = no MCP (opt-in by presence).

Two transports: Streamable HTTP (remote servers like Context7) and stdio
(local subprocess servers like codecontext). Stdio spawns a persistent child
via the SDK's StdioClientTransport; shutdown hook closes all transports.

Tool prefix generalized from context7_<name> to <serverName>_<toolName> with
a toolToServer reverse map for dispatch routing. AGENTS.md tools: field now
supports glob patterns (context7_*, !web_*) via matchToolGlob — last-match-
wins with ! deny prefix. Replaces exact-match .includes() in stream-phase.ts.

refreshToolNames() in agents.ts rebuilds the DEFAULT_TOOLS snapshot after
appendMcpTools so agents without explicit tools: lists see MCP tools —
reviewer caught that the module-load-time snapshot would permanently exclude
late-registered tools.

Read-only invariant: readOnlyHint === false rejected at discovery. Result
size capped at 5MB. v1.14.1 env vars removed — superseded by config file.
Default data/mcp.json ships with Context7 disabled.

363/363 server tests passing. No schema changes, no frontend changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 04:08:42 +00:00
5692e99a5d v1.14.1-mcp-poc: single-server MCP client against Context7
Validates the MCP-client loop end-to-end against one real MCP server before
the full v1.15 port. New services/mcp-client.ts wraps @modelcontextprotocol/sdk
v1.29.0 with Streamable HTTP transport. On startup (when MCP_CONTEXT7_URL is
set), connects to Context7, discovers tools via tools/list, wraps each as a
ToolDef prefixed context7_<name>, and appends to ALL_TOOLS via appendMcpTools.

Read-only invariant guard rejects any tool with readOnlyHint: false. Tool
dispatch is transparent — executeToolCall routes MCP calls through the ToolDef
execute wrapper, which strips the prefix before calling the MCP server. Result
size capped at 5MB with truncation. Graceful degradation: server down at
startup → zero tools; server down mid-session → error result, model
self-corrects.

Adversarial review caught that a Zod .default() on the URL config made MCP
always-on instead of opt-in — fixed by removing the default. MCP_CONTEXT7_URL
must be explicitly set to enable.

ALL_TOOLS changed from ReadonlyArray to mutable to support late-registration.
appendMcpTools re-sorts and rebuilds TOOLS_BY_NAME after append.

348/348 server tests passing (16 new mcp-client tests). No schema changes,
no frontend changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 21:58:09 +00:00
f4a97808ad v1.14.0-outer-loop: explicit while loop replaces inference recursion
Converts the ad-hoc executeToolPhase → runAssistantTurn recursion into an
explicit while (stepNumber < effectiveCap) loop. A step is one stream-and-
tool-execute iteration; the loop terminates on non-tool finish, step-cap hit,
doom-loop, budget exhaustion, abort, or synthesis success.

MAX_STEPS = 200 hard ceiling (4x old effective limit from budget). Per-agent
steps: field in AGENTS.md frontmatter sets tighter caps (Refactorer: 5,
Architect: 20, others: unset = bounded only by MAX_STEPS). Resolution:
effectiveCap = Math.min(agent.steps ?? Infinity, MAX_STEPS).

executeToolPhase no longer recurses — returns ToolPhaseResult struct
(action: 'continue' | 'paused' | 'synthesis_done') so the caller decides
whether to continue or break. steps: 0 handled as "no tool calls allowed"
via runTextOnlyTurn (one text-only stream phase, tool calls ignored with
warn log).

Step-cap hits produce a sentinel summary (reuses cap_hit kind so
CapHitSentinel.tsx renders without frontend changes; text distinguishes
"Step limit reached" from "Tool budget exhausted"). Doom-loop check migrated
to top of loop body — same predicate, same threshold (3), break instead of
return.

step_start parts are in the schema CHECK but not emitted as message_parts —
writing before the stream phase creates a sequence-0 collision with
partsFromAssistantMessage. Structured log line emitted instead. Adversarial
review caught the collision pre-deploy.

332/332 server tests passing. No frontend changes. No schema changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 20:29:21 +00:00
66 changed files with 4646 additions and 240 deletions

1
.gitignore vendored
View File

@@ -10,3 +10,4 @@ secrets/
data/*
!data/AGENTS.md
!data/skills/
!data/mcp.json

View File

@@ -1,27 +1,32 @@
# BooCoder
# BooCoder — Container Guidance
> (Stub. v2.0 implementation pending. This file documents the intended contract.)
You are BooCoder, a write-capable coding agent. You can read AND modify files within the project scope.
## Capabilities
## You can
- Everything in `BOOCHAT.md`
- Write tools (pending): `write_file`, `edit_file`, `delete_file` (all gated through pending-changes sandbox)
- Shell (pending): `run_command` (Docker-isolated per-session)
- Read files (view_file, list_dir, grep, find_files)
- Edit files (edit_file, create_file, delete_file) — all changes queue in pending_changes
- Apply pending changes to disk (apply_pending)
- Revert applied changes (rewind)
- Dispatch tasks to external agents (dispatch_external_agent)
- Use MCP tools from configured servers
## Constraints
## You cannot
- All writes land in a pending-changes virtual layer; nothing touches the real filesystem until `/apply`
- `run_command` executes inside the session sandbox, not the host
- No git commits, pushes, or pulls — Sam owns those
- Stop and ask before destructive operations (delete, overwrite, recreate)
- Write outside the project root (path-guard enforced)
- Write to secret files (.env, *.pem, id_rsa*, credentials.json)
- Apply changes without explicit user approval (unless auto-apply is enabled per task)
- Push to git remotes
- Access the internet except via configured MCP servers
## Pending changes discipline
Every file modification queues in `pending_changes` before touching disk. The user sees a diff preview and approves/rejects each change. Never bypass this queue — it is the safety boundary between inference and the filesystem.
## Behavior
- Show a diff preview before any write
- Group related edits into a single `/apply` batch
- If a tool fails, surface the error verbatim — don't paper over it
- Show diffs clearly. Explain what you're changing and why.
- For multi-file changes, organize as a logical unit (one task = one coherent change set).
- If uncertain about scope, use smaller edits and verify between steps.
- Cite file paths + line numbers for context.
- Verify before reporting work complete: run the relevant test/build/smoke and confirm output matches the claim. Evidence first, assertion second.
## Convention: rules vs recipes
Always-true rules live here, in `BOOCHAT.md`, and in `CLAUDE.md` (100% present each turn). On-demand recipes live in `/data/skills/` (roughly 6% invoke rate in multi-turn per Codeminer42, 2026). Don't file workflow rules as skills — they misfire. See Anthropic agent-skills best-practices (platform.claude.com/docs/en/agents-and-tools/agent-skills/best-practices).

View File

@@ -2,6 +2,22 @@
All notable changes per release tag. Most recent on top, ordered by tag creation date (which matches the git history). Tag names follow `vMAJOR.MINOR.PATCH-slug` — the slug describes what shipped, so the tag name alone is enough to recall the batch.
## v1.16.0-codesight-merge — 2026-05-24
Ports codesight's highest-value analysis capabilities into the codecontext sidecar as 4 new MCP tools. Tier 1 (graph queries on existing edges, no re-parsing): `get_blast_radius` (BFS reverse-edge traversal — "what breaks if I change this file?", with depth tracking) and `get_hot_files` (most-imported files ranked by incoming edge count — change-risk indicators). Tier 2 (tree-sitter AST re-parsing on demand): `get_routes` (Fastify/Express HTTP route extraction with method, path, file, line, inferred tags for db/auth/cache) and `get_middleware` (middleware registration detection via import-name heuristics and app.register/addHook/setErrorHandler patterns, classifying as auth/cors/rate-limit/security/error-handler/logging/validation). All 4 tools use `defer s.graphMu.RUnlock()` for consistent mutex discipline (reviewer caught that the initial implementation released the lock early on the Tier 2 tools). Route object-property extraction delegates to `extractStringValue` for template-literal handling (reviewer catch). codecontext sidecar rebuilt from `/opt/forks/codecontext` commit `b19e646`, tagged `v1.16.0-codesight-merge`. BooCode wrapper tools follow the existing codecontext pattern — 4 new files in `apps/server/src/services/tools/codecontext/`, registered in ALL_TOOLS. 29 new Go tests + 363/363 BooCode server tests passing. No schema changes, no frontend changes.
## v1.15.0-mcp-multi — 2026-05-24
Multi-server MCP client with stdio + Streamable HTTP transports, JSON config file, and per-agent tool glob patterns. Generalizes the v1.14.1 single-server Context7 PoC into a registry of named MCP servers with per-server graceful degradation. JSON config at `/data/mcp.json` (bind-mounted alongside `AGENTS.md`) matches opencode's `mcpServers` schema shape so server entries are copy-pasteable. Config file missing = no MCP (opt-in by file presence). Stdio transport spawns a persistent subprocess via the SDK's `StdioClientTransport` with NDJSON framing; Streamable HTTP reuses the v1.14.1 pattern via `StreamableHTTPClientTransport`. Tool prefix generalized from `context7_<name>` to `<serverName>_<toolName>` with a reverse `toolToServer` map for dispatch routing. Per-agent AGENTS.md `tools:` field now supports glob patterns (`context7_*`, `!web_*`) via `matchToolGlob` (last-match-wins, `!` prefix denies); replaces the exact-match `.includes()` in `stream-phase.ts`. Glob patterns bypass `ALL_TOOL_NAMES` validation in the parser since MCP tool names aren't known at parse time. `refreshToolNames()` in `agents.ts` rebuilds the `DEFAULT_TOOLS` snapshot after `appendMcpTools` so agents without explicit `tools:` lists see MCP tools — reviewer caught that the module-load-time snapshot would permanently exclude late-registered tools. Read-only invariant preserved: all MCP tools with `readOnlyHint: false` rejected at discovery. Result size capped at 5MB. Shutdown hook closes all transports. v1.14.1 env vars (`MCP_CONTEXT7_URL`, `MCP_CONTEXT7_API_KEY`) removed — superseded by the config file. Default `data/mcp.json` ships with Context7 disabled; flip `"enabled": true` to activate. 363/363 server tests passing (27 new: multi-server wrapping, glob matching, routing, degradation). No schema changes, no frontend changes.
## v1.14.1-mcp-poc — 2026-05-23
Single-server MCP client PoC against Context7. New `apps/server/src/services/mcp-client.ts` (~200 lines) wraps `@modelcontextprotocol/sdk` v1.29.0 with Streamable HTTP transport. On startup (when `MCP_CONTEXT7_URL` is set), connects to Context7, discovers tools via `tools/list`, wraps each as a `ToolDef` prefixed `context7_<name>`, and appends to `ALL_TOOLS` (alpha-sorted for prompt-cache stability). `appendMcpTools()` in `tools.ts` handles the late-registration; `ALL_TOOLS` changed from `ReadonlyArray` to mutable to support it. Read-only invariant guard rejects any MCP tool with `readOnlyHint: false` (MCP SDK v1.29.0 uses `readOnlyHint`, not `readOnly`). Tool dispatch is transparent — `executeToolCall` routes MCP tool calls through the `ToolDef.execute` wrapper, which strips the `context7_` prefix before calling the MCP server. Graceful degradation: MCP server down at startup → zero tools, warn log; MCP server down mid-session → error-shaped result, model self-corrects. Result size capped at 5MB with truncation (matches native `view_file`'s `MAX_FILE_BYTES`). Adversarial review caught that the Zod `.default('https://...')` on the URL config made MCP effectively always-on instead of opt-in — fixed by removing the default. 348/348 server tests passing (16 new mcp-client tests covering tool wrapping, read-only guard, name prefixing, content extraction). No schema changes, no frontend changes. Proves the MCP tool-discovery → tool-call → result-render loop end-to-end before the full v1.15 port.
## v1.14.0-outer-loop — 2026-05-23
Converts the inference engine's ad-hoc `executeToolPhase → runAssistantTurn` recursion into an explicit `while` loop with a configurable step cap. A step is one stream-and-tool-execute iteration; the loop terminates on non-tool finish, step-cap hit, doom-loop, budget exhaustion, abort, or synthesis success. `MAX_STEPS = 200` is the hard ceiling (4x the old effective limit from budget); per-agent `steps:` field in AGENTS.md frontmatter sets tighter caps (Refactorer: 5, Architect: 20, others: unset = bounded only by MAX_STEPS). `executeToolPhase` no longer recurses — returns a `ToolPhaseResult` struct (`action: 'continue' | 'paused' | 'synthesis_done'`) so the caller (the while loop) decides whether to continue or break. `steps: 0` is handled as "no tool calls allowed" — one text-only stream phase, tool calls ignored with a warn log. Step-cap hits produce a sentinel summary (reuses `cap_hit` kind so `CapHitSentinel.tsx` renders it without frontend changes; text distinguishes "Step limit reached" from "Tool budget exhausted"). Doom-loop check migrated from pre-recursion position to top of loop body — same predicate (`detectDoomLoop`), same threshold (3 identical calls), `break` instead of `return`. `step_start` parts are in the schema CHECK but not emitted as message_parts in v1.14 — writing to the assistant message before the stream phase creates a sequence-0 collision with `partsFromAssistantMessage`; a structured log line is emitted instead. Adversarial review caught the collision pre-deploy. 332/332 server tests passing; no frontend changes. Pairs with `v1.13.20-drop-legacy-cols` (parts is now the sole source of truth, and this batch's loop operates entirely through parts).
## v1.13.20-drop-legacy-cols — 2026-05-23
Final phase of the v1.13.0 strangler-fig migration. Removes the dual-write into `messages.tool_calls` / `messages.tool_results` JSON columns and drops the columns themselves; `message_parts` is now the only source of truth for tool-call and tool-result data. 10 dual-write sites stripped (5 in `tool-phase.ts`, 2 in `routes/skills.ts`, 2 in `routes/messages.ts`, 1 in `routes/chats.ts` fork-clone) — recon's grep-driven inventory caught 2 sites beyond the original v1.13.2 roadmap count. `messages_with_parts` view simplified to parts-only subselects (COALESCE fallbacks gone) and rewritten via `CREATE OR REPLACE VIEW` BEFORE the column DROP since Postgres rejects column-drop on view-referenced cols. Adversarial review caught a runtime bug the green test suite missed: `chats.ts:/api/chats/:id/discard_stale` had a `RETURNING ... tool_calls, tool_results, ...` clause referencing the dropped columns; would have crashed on every 60s-no-token-activity recovery in production. Fixed by switching to two-step UPDATE-then-SELECT-from-view so the response keeps the parts-synthesized fields. `Message` API type retains `tool_calls?` / `tool_results?` fields (override on the original v1.13.2 plan) — the view continues to populate them from parts, so the wire shape is unchanged and the frontend needs no updates. v1.12.1 cleanup block (`DROP CONSTRAINT messages_status_check`/`messages_role_check`) removed — those one-shots have done their work. `tool_cost_stats.test.ts` had a direct `INSERT INTO messages` touching the legacy columns that wasn't in the roadmap's inventory; rewritten to parts-table inserts and confirmed semantically faithful. 339/339 server tests passing including the 7 DB-integration tests (live-DB applied the schema migration and ran the parts-only view end-to-end). Pairs with `v1.13.0-ai-sdk-v6` (which introduced the dual-write) and `v1.13.1-B` (which moved the read path to `messages_with_parts`); umbrella `v1.13` tag ships on the same commit.

View File

@@ -46,7 +46,7 @@ Tests: `pnpm -C apps/server test` runs the vitest suite. No test harness on `app
- **Zod** for request validation and config parsing.
Key services:
- **`services/inference/`** — Public surface re-exported via `inference/index.ts`; callers import from `./services/inference/index.js` explicitly (NodeNext doesn't honor directory-index resolution). Layout: `turn.ts` (runAssistantTurn / runInference / createInferenceRunner; exports `InferenceFrame`, `InferenceContext`, `TurnArgs`, `StreamResult`), `stream-phase.ts` (streamCompletion as a v1.13.1-A AI SDK adapter + executeStreamPhase), `provider.ts` (`upstreamModel(baseURL, modelId)` wrapping `createOpenAICompatible` against llama-swap), `tool-phase.ts` (executeToolPhase; value back-edges into turn.ts for the runAssistantTurn recursion — cycle safe because deref at call time, not module top-level), `sentinel-summaries.ts` (runCapHitSummary + runDoomLoopSummary + their sentinel inserters), `error-handler.ts` (handleAbortOrError, finalizeCompletion), `payload.ts` (buildMessagesPayload, loadContext, maybeFlagForCompaction, `OpenAiMessage`), `sentinels.ts` (`detectDoomLoop`, `DOOM_LOOP_THRESHOLD`, sentinel predicates), `budget.ts` (resolveToolBudget), `xml-parser.ts` (qwen3.6 XML tool-call fallback — KEEP, AI SDK doesn't handle inline-XML tool calls), `parts.ts` (parts-table write helpers: `partsFromAssistantMessage`, `partsFromToolMessage`, `insertParts` — v1.13.20 made parts the sole source of truth), `prune.ts` (v1.13.4 two-tier compaction; `selectPruneTargets` is the pure decision helper), `types.ts` (`StreamPhaseState`, `DB_FLUSH_INTERVAL_MS`). **`TurnArgs`** is the per-turn state envelope threaded through the `executeToolPhase → runAssistantTurn` recursion; reset in `runInference` at user-message boundary. Add new per-turn state to `TurnArgs`, not module-level closures.
- **`services/inference/`** — Public surface re-exported via `inference/index.ts`; callers import from `./services/inference/index.js` explicitly (NodeNext doesn't honor directory-index resolution). Layout: `turn.ts` (runAssistantTurn / runInference / createInferenceRunner; exports `InferenceFrame`, `InferenceContext`, `TurnArgs`, `StreamResult`, `MAX_STEPS`), `stream-phase.ts` (streamCompletion as a v1.13.1-A AI SDK adapter + executeStreamPhase), `provider.ts` (`upstreamModel(baseURL, modelId)` wrapping `createOpenAICompatible` against llama-swap), `tool-phase.ts` (executeToolPhase → returns `ToolPhaseResult`; no longer recurses into runAssistantTurn — v1.14.0 converted the recursion to an explicit while loop in turn.ts), `sentinel-summaries.ts` (runCapHitSummary + runDoomLoopSummary + runStepCapSummary + their sentinel inserters), `error-handler.ts` (handleAbortOrError, finalizeCompletion), `payload.ts` (buildMessagesPayload, loadContext, maybeFlagForCompaction, `OpenAiMessage`), `sentinels.ts` (`detectDoomLoop`, `DOOM_LOOP_THRESHOLD`, sentinel predicates), `budget.ts` (resolveToolBudget), `xml-parser.ts` (qwen3.6 XML tool-call fallback — KEEP, AI SDK doesn't handle inline-XML tool calls), `parts.ts` (parts-table write helpers: `partsFromAssistantMessage`, `partsFromToolMessage`, `insertParts` — v1.13.20 made parts the sole source of truth), `prune.ts` (v1.13.4 two-tier compaction; `selectPruneTargets` is the pure decision helper), `types.ts` (`StreamPhaseState`, `DB_FLUSH_INTERVAL_MS`). **`TurnArgs`** is the per-turn state envelope populated from loop locals each iteration; reset in `runInference` at user-message boundary. The outer loop in `runAssistantTurn` (v1.14.0) runs `while (stepNumber < effectiveCap)` where `effectiveCap = Math.min(agent.steps ?? Infinity, MAX_STEPS=200)`. Per-agent `steps:` field in AGENTS.md frontmatter. `steps: 0` means text-only (no tool execution). Step-cap hit writes a `cap_hit` sentinel so `CapHitSentinel.tsx` renders it.
- **AI SDK v6 streamCompletion adapter** (v1.13.1-A; `services/inference/stream-phase.ts`). `streamText` is the underlying call; the BooCode layer above (executeStreamPhase, finalize, dual-write) is shape-preserved via an adapter. Five gotchas the LSP/test suite won't catch:
- **Abort signals are swallowed.** `streamText`'s `fullStream` iterator exits cleanly when `abortSignal` fires — no throw. Post-iteration `if (signal?.aborted) throw <AbortError>` is required; without it the row finalizes as `complete` instead of `cancelled`. Comment in stream-phase.ts pins this; don't refactor it away.
- **Usage lands only at stream end** via `await result.usage` (`inputTokens` / `outputTokens` v6 names → mapped to `promptTokens` / `completionTokens` for the existing onUsage callback). Mid-stream live tok/s is gone vs v1.12.2; ChatThroughput shows a single value at stream end.

32
apps/coder/Dockerfile Normal file
View File

@@ -0,0 +1,32 @@
# syntax=docker/dockerfile:1.7
FROM node:20-alpine AS builder
RUN corepack enable
WORKDIR /build
COPY package.json pnpm-workspace.yaml pnpm-lock.yaml tsconfig.base.json ./
COPY apps/server/package.json ./apps/server/
COPY apps/coder/package.json ./apps/coder/
RUN pnpm install --frozen-lockfile
# Build server first (coder depends on it via workspace dep for types + inference)
COPY apps/server ./apps/server
RUN pnpm -C apps/server build
COPY apps/coder ./apps/coder
RUN pnpm -C apps/coder build
RUN pnpm deploy --filter=@boocode/coder --prod --legacy /out/coder
FROM node:20-bookworm-slim AS runtime
RUN apt-get update && apt-get install -y --no-install-recommends ripgrep git && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY --from=builder /out/coder ./
ENV NODE_ENV=production
EXPOSE 3000
CMD ["node", "dist/index.js"]

28
apps/coder/package.json Normal file
View File

@@ -0,0 +1,28 @@
{
"name": "@boocode/coder",
"version": "2.0.0",
"private": true,
"type": "module",
"main": "dist/index.js",
"scripts": {
"dev": "tsx watch src/index.ts",
"build": "tsc && node -e \"import('node:fs').then(fs=>fs.copyFileSync('src/schema.sql','dist/schema.sql'))\"",
"start": "node dist/index.js",
"typecheck": "tsc --noEmit",
"test": "vitest run"
},
"dependencies": {
"@boocode/server": "workspace:*",
"@fastify/static": "^7.0.4",
"@fastify/websocket": "^10.0.1",
"fastify": "^4.28.1",
"postgres": "^3.4.4",
"zod": "^3.23.8"
},
"devDependencies": {
"@types/node": "^20.14.10",
"tsx": "^4.16.2",
"typescript": "^5.5.0",
"vitest": "^3.0.0"
}
}

42
apps/coder/src/config.ts Normal file
View File

@@ -0,0 +1,42 @@
import { z } from 'zod';
// BooCoder's config is a superset of the server's Config type so it can be
// passed directly into the inference runner's InferenceContext. Fields the
// inference loop reads: LLAMA_SWAP_URL, PROJECT_ROOT_WHITELIST. The rest
// default to values that satisfy the server's Zod schema without BooCoder
// needing to supply them in its environment.
const ConfigSchema = z.object({
NODE_ENV: z.enum(['development', 'production', 'test']).default('development'),
PORT: z.coerce.number().int().positive().default(3000),
HOST: z.string().default('0.0.0.0'),
DATABASE_URL: z.string().url(),
LLAMA_SWAP_URL: z.string().url(),
PROJECT_ROOT_WHITELIST: z.string().default('/opt'),
BOOTSTRAP_ROOT: z.string().default('/opt/projects'),
DEFAULT_MODEL: z.string().default('qwen3.6-35b-a3b-mxfp4'),
LOG_LEVEL: z.string().default('info'),
CONTAINER_GUIDANCE_FILE: z.string().optional(),
// Fields needed to satisfy the server's Config type but unused by BooCoder:
SEARXNG_URL: z.string().url().default('http://100.114.205.53:8888'),
GITEA_BASE_URL: z.string().url().default('https://git.indifferentketchup.com'),
GITEA_USER: z.string().default('indifferentketchup'),
GITEA_TOKEN: z.string().optional(),
GITEA_SSH_HOST: z.string().default('100.114.205.53:2222'),
MCP_CONFIG_PATH: z.string().optional(),
});
export type Config = z.infer<typeof ConfigSchema>;
let cached: Config | null = null;
export function loadConfig(): Config {
if (cached) return cached;
const parsed = ConfigSchema.safeParse(process.env);
if (!parsed.success) {
console.error('Invalid environment configuration:');
console.error(parsed.error.flatten().fieldErrors);
process.exit(1);
}
cached = parsed.data;
return cached;
}

45
apps/coder/src/db.ts Normal file
View File

@@ -0,0 +1,45 @@
import postgres from 'postgres';
import { readFile } from 'node:fs/promises';
import { fileURLToPath } from 'node:url';
import { dirname, resolve } from 'node:path';
import type { Config } from './config.js';
const __filename = fileURLToPath(import.meta.url);
const __dirname = dirname(__filename);
export type Sql = ReturnType<typeof postgres>;
let sqlInstance: Sql | null = null;
export function getSql(config: Config): Sql {
if (sqlInstance) return sqlInstance;
sqlInstance = postgres(config.DATABASE_URL, {
max: 10,
idle_timeout: 30,
connect_timeout: 10,
onnotice: () => {},
});
return sqlInstance;
}
export async function applySchema(sql: Sql): Promise<void> {
const schemaPath = resolve(__dirname, 'schema.sql');
const ddl = await readFile(schemaPath, 'utf8');
await sql.unsafe(ddl);
}
export async function pingDb(sql: Sql): Promise<boolean> {
try {
await sql`SELECT 1`;
return true;
} catch {
return false;
}
}
export async function closeDb(): Promise<void> {
if (sqlInstance) {
await sqlInstance.end({ timeout: 5 });
sqlInstance = null;
}
}

131
apps/coder/src/index.ts Normal file
View File

@@ -0,0 +1,131 @@
import Fastify from 'fastify';
import fastifyWebsocket from '@fastify/websocket';
import { loadConfig } from './config.js';
import { getSql, applySchema, pingDb, closeDb } from './db.js';
// v2.0.0 Phase 2B: workspace dependency on @boocode/server — reuse the
// inference loop, broker, and tool registry without duplication.
import { createInferenceRunner } from '@boocode/server/inference';
import { createBroker } from '@boocode/server/broker';
import { appendMcpTools, ALL_TOOLS } from '@boocode/server/tools';
import type { Config as ServerConfig } from '@boocode/server/config';
import type { WsFrame } from '@boocode/server/ws-frames';
// v2.0.0 Phase 2C: write tools + adapter for BooChat ToolDef compatibility.
import { WRITE_TOOLS } from './services/tools/index.js';
import { adaptWriteTool } from './services/tools/adapter.js';
import { setInferenceContext, clearInferenceContext } from './services/tools/inference_context.js';
// Routes
import { registerMessageRoutes } from './routes/messages.js';
import { registerPendingRoutes } from './routes/pending.js';
import { registerWebSocket } from './routes/ws.js';
async function main() {
const config = loadConfig();
const app = Fastify({
logger: { level: config.LOG_LEVEL },
});
// Allow empty JSON bodies (same pattern as apps/server).
app.removeContentTypeParser(['application/json']);
app.addContentTypeParser('application/json', { parseAs: 'string' }, (_req, body, done) => {
const str = (body as string) ?? '';
if (str.trim().length === 0) {
done(null, {});
return;
}
try {
done(null, JSON.parse(str));
} catch (err) {
done(err as Error, undefined);
}
});
const sql = getSql(config);
await applySchema(sql);
app.log.info('database schema applied');
// Broker: in-memory pub/sub for session + user channel streaming.
const broker = createBroker(app.log);
// --- Tool registry extension ---
// Append BooCoder write tools (adapted to BooChat's ToolDef interface) to
// the shared ALL_TOOLS registry. appendMcpTools re-sorts and rebuilds
// TOOLS_BY_NAME so tool-phase.ts dispatch sees the full set.
const adaptedWriteTools = WRITE_TOOLS.map((t) => adaptWriteTool(t));
appendMcpTools(adaptedWriteTools);
app.log.info(`tool registry: ${ALL_TOOLS.length} tools loaded (${WRITE_TOOLS.length} write tools)`);
// Inference runner: same engine as BooChat, uses ALL_TOOLS (which includes
// the appended write tools) for tool dispatch.
const inference = createInferenceRunner(
{
sql,
config: config as unknown as ServerConfig,
log: app.log,
publish: (sessionId, frame) => {
broker.publishFrame(sessionId, frame as unknown as WsFrame);
},
broker,
},
(user, frame) => {
broker.publishUserFrame(user, frame as unknown as WsFrame);
}
);
// Wrap the inference runner to set/clear the write-tool context around each run.
// The inference runner calls enqueue() which fires asynchronously — we hook
// into the enqueue to set context before the run starts.
const inferenceApi = {
enqueue: (sessionId: string, chatId: string, assistantId: string, user: string) => {
// Set the inference context so write tools can access sql + sessionId.
// The context persists for the duration of the inference run. Since
// BooCoder is single-user and runs one inference at a time per session,
// this module-level state is safe.
setInferenceContext({ sql, sessionId, taskId: null });
inference.enqueue(sessionId, chatId, assistantId, user);
},
cancel: async (sessionId: string, chatId: string) => {
const result = await inference.cancel(sessionId, chatId);
clearInferenceContext();
return result;
},
hasActive: (chatId: string) => inference.hasActive(chatId),
};
// Register WebSocket support
await app.register(fastifyWebsocket);
// Health endpoint
app.get('/api/health', async (_req, reply) => {
const dbOk = await pingDb(sql);
const status = dbOk ? 200 : 503;
return reply.status(status).send({
ok: dbOk,
db: dbOk,
tools: ALL_TOOLS.length,
});
});
// Register routes
registerMessageRoutes(app, sql, broker, inferenceApi);
registerPendingRoutes(app, sql);
registerWebSocket(app, sql, broker);
// Graceful shutdown
const shutdown = async () => {
app.log.info('shutting down');
await app.close();
await closeDb();
process.exit(0);
};
process.on('SIGTERM', shutdown);
process.on('SIGINT', shutdown);
await app.listen({ port: config.PORT, host: config.HOST });
app.log.info(`BooCoder listening on ${config.HOST}:${config.PORT}`);
}
main().catch((err) => {
console.error('fatal:', err);
process.exit(1);
});

View File

@@ -0,0 +1,126 @@
import type { FastifyInstance } from 'fastify';
import { z } from 'zod';
import type { Sql } from '../db.js';
import type { Broker } from '@boocode/server/broker';
import type { WsFrame } from '@boocode/server/ws-frames';
const SendBody = z.object({
content: z.string().min(1).max(64_000),
chat_id: z.string().uuid(),
});
interface InferenceApi {
enqueue: (sessionId: string, chatId: string, assistantId: string, user: string) => void;
cancel: (sessionId: string, chatId: string) => Promise<boolean>;
hasActive: (chatId: string) => boolean;
}
export function registerMessageRoutes(
app: FastifyInstance,
sql: Sql,
broker: Broker,
inference: InferenceApi,
): void {
// POST /api/sessions/:sessionId/messages — send a user message + kick off inference
app.post<{ Params: { sessionId: string } }>(
'/api/sessions/:sessionId/messages',
async (req, reply) => {
const parsed = SendBody.safeParse(req.body);
if (!parsed.success) {
reply.code(400);
return { error: 'invalid body', details: parsed.error.flatten() };
}
const sessionId = req.params.sessionId;
const { content, chat_id: chatId } = parsed.data;
// Validate session exists
const sessionRows = await sql<{ id: string }[]>`
SELECT id FROM sessions WHERE id = ${sessionId}
`;
if (sessionRows.length === 0) {
reply.code(404);
return { error: 'session not found' };
}
// Validate chat belongs to session and is open
const chatRows = await sql<{ id: string; session_id: string }[]>`
SELECT id, session_id FROM chats WHERE id = ${chatId} AND session_id = ${sessionId} AND status = 'open'
`;
if (chatRows.length === 0) {
reply.code(404);
return { error: 'chat not found or not open in this session' };
}
// Reject if inference is already running on this chat
if (inference.hasActive(chatId)) {
reply.code(409);
return { error: 'inference already running on this chat' };
}
// Create user message + streaming assistant row in a transaction
const result = await sql.begin(async (tx) => {
const [userMsg] = await tx<{ id: string }[]>`
INSERT INTO messages (session_id, chat_id, role, content, status, created_at)
VALUES (${sessionId}, ${chatId}, 'user', ${content}, 'complete', clock_timestamp())
RETURNING id
`;
const [assistantMsg] = await tx<{ id: string }[]>`
INSERT INTO messages (session_id, chat_id, role, content, status, created_at)
VALUES (${sessionId}, ${chatId}, 'assistant', '', 'streaming', clock_timestamp())
RETURNING id
`;
await tx`UPDATE sessions SET updated_at = clock_timestamp() WHERE id = ${sessionId}`;
await tx`UPDATE chats SET updated_at = clock_timestamp() WHERE id = ${chatId}`;
return { user_message_id: userMsg!.id, assistant_message_id: assistantMsg!.id };
});
// Publish user message frames so WS subscribers see it immediately
broker.publishFrame(sessionId, {
type: 'message_started',
message_id: result.user_message_id,
chat_id: chatId,
role: 'user',
} as unknown as WsFrame);
broker.publishFrame(sessionId, {
type: 'delta',
message_id: result.user_message_id,
chat_id: chatId,
content,
} as unknown as WsFrame);
broker.publishFrame(sessionId, {
type: 'message_complete',
message_id: result.user_message_id,
chat_id: chatId,
} as unknown as WsFrame);
// Enqueue inference — the runner will stream assistant deltas via broker
inference.enqueue(sessionId, chatId, result.assistant_message_id, 'default');
reply.code(202);
return result;
},
);
// POST /api/sessions/:sessionId/stop — cancel active inference
app.post<{ Params: { sessionId: string } }>(
'/api/sessions/:sessionId/stop',
async (req, reply) => {
const sessionId = req.params.sessionId;
// Find active chats in this session
const chats = await sql<{ id: string }[]>`
SELECT id FROM chats WHERE session_id = ${sessionId} AND status = 'open'
`;
let cancelled = false;
for (const chat of chats) {
if (inference.hasActive(chat.id)) {
cancelled = await inference.cancel(sessionId, chat.id);
break;
}
}
return { cancelled };
},
);
}

View File

@@ -0,0 +1,121 @@
import type { FastifyInstance } from 'fastify';
import type { Sql } from '../db.js';
import {
listPending,
applyOne,
applyAll,
rejectOne,
rewindOne,
} from '../services/pending_changes.js';
/**
* Resolve project root from a session's project path.
*/
async function resolveProjectRoot(sql: Sql, sessionId: string): Promise<string | null> {
const rows = await sql<{ path: string }[]>`
SELECT p.path FROM sessions s
JOIN projects p ON s.project_id = p.id
WHERE s.id = ${sessionId}
`;
return rows.length > 0 ? rows[0]!.path : null;
}
/**
* Resolve project root from a pending change's session.
*/
async function resolveProjectRootForChange(sql: Sql, changeId: string): Promise<string | null> {
const rows = await sql<{ path: string }[]>`
SELECT p.path FROM pending_changes pc
JOIN sessions s ON pc.session_id = s.id
JOIN projects p ON s.project_id = p.id
WHERE pc.id = ${changeId}
`;
return rows.length > 0 ? rows[0]!.path : null;
}
export function registerPendingRoutes(app: FastifyInstance, sql: Sql): void {
// GET /api/sessions/:sessionId/pending — list pending changes for a session
app.get<{ Params: { sessionId: string } }>(
'/api/sessions/:sessionId/pending',
async (req, reply) => {
const sessionId = req.params.sessionId;
const session = await sql<{ id: string }[]>`SELECT id FROM sessions WHERE id = ${sessionId}`;
if (session.length === 0) {
reply.code(404);
return { error: 'session not found' };
}
const pending = await listPending(sql, sessionId);
return pending;
},
);
// POST /api/sessions/:sessionId/pending/apply — apply all pending changes
app.post<{ Params: { sessionId: string } }>(
'/api/sessions/:sessionId/pending/apply',
async (req, reply) => {
const sessionId = req.params.sessionId;
const projectRoot = await resolveProjectRoot(sql, sessionId);
if (!projectRoot) {
reply.code(404);
return { error: 'session or project not found' };
}
const results = await applyAll(sql, sessionId, projectRoot);
return { results };
},
);
// POST /api/pending/:id/apply — apply a single pending change
app.post<{ Params: { id: string } }>(
'/api/pending/:id/apply',
async (req, reply) => {
const changeId = req.params.id;
const projectRoot = await resolveProjectRootForChange(sql, changeId);
if (!projectRoot) {
reply.code(404);
return { error: 'pending change or project not found' };
}
const result = await applyOne(sql, changeId, projectRoot);
if (!result.success) {
reply.code(422);
}
return result;
},
);
// POST /api/pending/:id/reject — reject a single pending change
app.post<{ Params: { id: string } }>(
'/api/pending/:id/reject',
async (req, reply) => {
const changeId = req.params.id;
await rejectOne(sql, changeId);
return { ok: true };
},
);
// POST /api/pending/:id/rewind — rewind (undo) an applied change
app.post<{ Params: { id: string } }>(
'/api/pending/:id/rewind',
async (req, reply) => {
const changeId = req.params.id;
const projectRoot = await resolveProjectRootForChange(sql, changeId);
if (!projectRoot) {
reply.code(404);
return { error: 'pending change or project not found' };
}
const result = await rewindOne(sql, changeId, projectRoot);
if (!result.success) {
reply.code(422);
}
return result;
},
);
}

View File

@@ -0,0 +1,51 @@
import type { FastifyInstance } from 'fastify';
import type { Sql } from '../db.js';
import type { Broker } from '@boocode/server/broker';
export function registerWebSocket(
app: FastifyInstance,
sql: Sql,
broker: Broker,
): void {
// Per-session streaming WebSocket. Clients connect here to receive live
// inference frames (deltas, tool_calls, tool_results, message_complete).
app.get<{ Params: { sessionId: string } }>(
'/api/ws/sessions/:sessionId',
{ websocket: true },
async (socket, req) => {
const sessionId = req.params.sessionId;
// Validate session exists
const session = await sql<{ id: string }[]>`SELECT id FROM sessions WHERE id = ${sessionId}`;
if (session.length === 0) {
socket.send(JSON.stringify({ type: 'error', error: 'session not found' }));
socket.close(1008, 'session not found');
return;
}
// Send snapshot of existing messages so client can hydrate
const messages = await sql<Record<string, unknown>[]>`
SELECT id, session_id, chat_id, role, content, kind, tool_calls, tool_results, status, last_seq,
tokens_used, ctx_used, ctx_max, started_at, finished_at, created_at, metadata,
summary, tail_start_id, compacted_at
FROM messages_with_parts
WHERE session_id = ${sessionId}
ORDER BY created_at ASC, id ASC
`;
socket.send(JSON.stringify({ type: 'snapshot', messages }));
// Subscribe to broker for live frames
const unsubscribe = broker.subscribe(sessionId, (frame) => {
if (socket.readyState !== socket.OPEN) return;
try {
socket.send(JSON.stringify(frame));
} catch (err) {
app.log.warn({ err, sessionId }, 'ws send failed');
}
});
socket.on('close', () => unsubscribe());
socket.on('error', () => unsubscribe());
},
);
}

48
apps/coder/src/schema.sql Normal file
View File

@@ -0,0 +1,48 @@
-- v2.0.0: BooCoder schema — pending changes, tasks, agent registry.
-- Applied on startup by apps/coder/src/db.ts:applySchema().
-- Lives in the same 'boochat' database as BooChat's tables.
CREATE TABLE IF NOT EXISTS pending_changes (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
session_id UUID NOT NULL,
task_id UUID,
file_path TEXT NOT NULL,
operation TEXT NOT NULL,
diff TEXT NOT NULL,
status TEXT NOT NULL DEFAULT 'pending',
created_at TIMESTAMPTZ NOT NULL DEFAULT clock_timestamp(),
CONSTRAINT pending_changes_operation_chk CHECK (operation IN ('create', 'edit', 'delete')),
CONSTRAINT pending_changes_status_chk CHECK (status IN ('pending', 'applied', 'rejected', 'reverted'))
);
CREATE TABLE IF NOT EXISTS tasks (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
project_id UUID NOT NULL,
parent_task_id UUID REFERENCES tasks(id),
state TEXT NOT NULL DEFAULT 'pending',
input TEXT NOT NULL,
output_summary TEXT,
agent TEXT,
model TEXT,
execution_path TEXT,
worktree_path TEXT,
cost_tokens INTEGER,
started_at TIMESTAMPTZ,
ended_at TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT clock_timestamp(),
CONSTRAINT tasks_state_chk CHECK (state IN ('pending', 'running', 'completed', 'failed', 'blocked', 'cancelled')),
CONSTRAINT tasks_execution_path_chk CHECK (execution_path IS NULL OR execution_path IN ('native', 'acp', 'pty'))
);
CREATE TABLE IF NOT EXISTS available_agents (
name TEXT PRIMARY KEY,
install_path TEXT,
version TEXT,
supports_acp BOOLEAN NOT NULL DEFAULT false,
supports_mcp_client BOOLEAN NOT NULL DEFAULT false,
last_probed_at TIMESTAMPTZ
);
-- Human inbox: tasks needing attention
CREATE OR REPLACE VIEW human_inbox AS
SELECT * FROM tasks WHERE state IN ('blocked', 'failed');

View File

@@ -0,0 +1,115 @@
import { describe, it, expect } from 'vitest';
import { resolveWritePath, isSecretPath, WriteGuardError } from '../write_guard.js';
const PROJECT_ROOT = '/opt/projects/my-app';
describe('resolveWritePath', () => {
it('resolves a relative path correctly', () => {
const result = resolveWritePath(PROJECT_ROOT, 'src/index.ts');
expect(result).toBe('/opt/projects/my-app/src/index.ts');
});
it('resolves nested relative path', () => {
const result = resolveWritePath(PROJECT_ROOT, 'src/lib/utils.ts');
expect(result).toBe('/opt/projects/my-app/src/lib/utils.ts');
});
it('throws on ../ escape', () => {
expect(() => resolveWritePath(PROJECT_ROOT, '../../../etc/passwd')).toThrow(WriteGuardError);
expect(() => resolveWritePath(PROJECT_ROOT, '../../../etc/passwd')).toThrow('path escapes project root');
});
it('throws on absolute path outside project root', () => {
expect(() => resolveWritePath(PROJECT_ROOT, '/etc/shadow')).toThrow(WriteGuardError);
expect(() => resolveWritePath(PROJECT_ROOT, '/tmp/exploit')).toThrow('path escapes project root');
});
it('allows absolute path inside project root', () => {
const result = resolveWritePath(PROJECT_ROOT, '/opt/projects/my-app/src/new.ts');
expect(result).toBe('/opt/projects/my-app/src/new.ts');
});
it('denies .env files', () => {
expect(() => resolveWritePath(PROJECT_ROOT, '.env')).toThrow(WriteGuardError);
expect(() => resolveWritePath(PROJECT_ROOT, '.env')).toThrow('cannot write to secret file');
});
it('denies .env.local', () => {
expect(() => resolveWritePath(PROJECT_ROOT, '.env.local')).toThrow(WriteGuardError);
});
it('denies .env.production', () => {
expect(() => resolveWritePath(PROJECT_ROOT, '.env.production')).toThrow(WriteGuardError);
});
it('denies *.pem files', () => {
expect(() => resolveWritePath(PROJECT_ROOT, 'certs/server.pem')).toThrow(WriteGuardError);
expect(() => resolveWritePath(PROJECT_ROOT, 'certs/server.pem')).toThrow('cannot write to secret file');
});
it('denies *.key files', () => {
expect(() => resolveWritePath(PROJECT_ROOT, 'ssl/private.key')).toThrow(WriteGuardError);
});
it('denies id_rsa', () => {
expect(() => resolveWritePath(PROJECT_ROOT, '.ssh/id_rsa')).toThrow(WriteGuardError);
});
it('denies id_ed25519', () => {
expect(() => resolveWritePath(PROJECT_ROOT, '.ssh/id_ed25519')).toThrow(WriteGuardError);
});
it('denies credentials.json', () => {
expect(() => resolveWritePath(PROJECT_ROOT, 'credentials.json')).toThrow(WriteGuardError);
});
it('passes a normal file inside project', () => {
const result = resolveWritePath(PROJECT_ROOT, 'src/components/Button.tsx');
expect(result).toBe('/opt/projects/my-app/src/components/Button.tsx');
});
it('passes a non-existent nested file (no realpath)', () => {
// This is the key difference from BooChat's pathGuard: no realpath means
// files that don't exist yet still pass validation
const result = resolveWritePath(PROJECT_ROOT, 'src/new-dir/new-file.ts');
expect(result).toBe('/opt/projects/my-app/src/new-dir/new-file.ts');
});
it('throws on null/empty path', () => {
expect(() => resolveWritePath(PROJECT_ROOT, '')).toThrow(WriteGuardError);
expect(() => resolveWritePath(PROJECT_ROOT, '')).toThrow('file path is required');
});
it('normalizes ../ within project root and still allows', () => {
const result = resolveWritePath(PROJECT_ROOT, 'src/../lib/utils.ts');
expect(result).toBe('/opt/projects/my-app/lib/utils.ts');
});
it('rejects path that looks inside root but normalizes outside', () => {
expect(() => resolveWritePath(PROJECT_ROOT, 'src/../../other-project/hack.ts')).toThrow(WriteGuardError);
});
});
describe('isSecretPath', () => {
it('detects .env', () => {
expect(isSecretPath('.env')).toBe(true);
});
it('detects nested .env', () => {
expect(isSecretPath('config/.env')).toBe(true);
});
it('detects *.pfx', () => {
expect(isSecretPath('certs/client.pfx')).toBe(true);
});
it('does not flag normal source files', () => {
expect(isSecretPath('src/index.ts')).toBe(false);
expect(isSecretPath('README.md')).toBe(false);
expect(isSecretPath('package.json')).toBe(false);
});
it('returns false for empty string', () => {
expect(isSecretPath('')).toBe(false);
});
});

View File

@@ -0,0 +1,224 @@
import { readFile, writeFile, unlink, mkdir } from 'node:fs/promises';
import { dirname } from 'node:path';
import type { Sql } from '../db.js';
import { resolveWritePath } from './write_guard.js';
// --- Types -------------------------------------------------------------------
export interface PendingChange {
id: string;
session_id: string;
task_id: string | null;
file_path: string;
operation: 'create' | 'edit' | 'delete';
diff: string;
status: 'pending' | 'applied' | 'rejected' | 'reverted';
created_at: string;
}
export interface ApplyResult {
id: string;
file_path: string;
operation: string;
success: boolean;
error?: string;
}
// --- Queue functions ---------------------------------------------------------
export async function queueEdit(
sql: Sql,
sessionId: string,
taskId: string | null,
filePath: string,
oldString: string,
newString: string,
projectRoot: string,
): Promise<PendingChange> {
const resolved = resolveWritePath(projectRoot, filePath);
const diff = JSON.stringify({ old: oldString, new: newString });
const [row] = await sql<PendingChange[]>`
INSERT INTO pending_changes (session_id, task_id, file_path, operation, diff)
VALUES (${sessionId}, ${taskId}, ${resolved}, 'edit', ${diff})
RETURNING *
`;
return row!;
}
export async function queueCreate(
sql: Sql,
sessionId: string,
taskId: string | null,
filePath: string,
content: string,
projectRoot: string,
): Promise<PendingChange> {
const resolved = resolveWritePath(projectRoot, filePath);
const [row] = await sql<PendingChange[]>`
INSERT INTO pending_changes (session_id, task_id, file_path, operation, diff)
VALUES (${sessionId}, ${taskId}, ${resolved}, 'create', ${content})
RETURNING *
`;
return row!;
}
export async function queueDelete(
sql: Sql,
sessionId: string,
taskId: string | null,
filePath: string,
projectRoot: string,
): Promise<PendingChange> {
const resolved = resolveWritePath(projectRoot, filePath);
const [row] = await sql<PendingChange[]>`
INSERT INTO pending_changes (session_id, task_id, file_path, operation, diff)
VALUES (${sessionId}, ${taskId}, ${resolved}, 'delete', '')
RETURNING *
`;
return row!;
}
// --- Apply functions ---------------------------------------------------------
export async function applyOne(
sql: Sql,
changeId: string,
projectRoot: string,
): Promise<ApplyResult> {
const [change] = await sql<PendingChange[]>`
SELECT * FROM pending_changes WHERE id = ${changeId} AND status = 'pending'
`;
if (!change) {
return { id: changeId, file_path: '', operation: '', success: false, error: 'change not found or not pending' };
}
try {
// Re-validate path in case projectRoot has shifted
resolveWritePath(projectRoot, change.file_path);
switch (change.operation) {
case 'create': {
await mkdir(dirname(change.file_path), { recursive: true });
await writeFile(change.file_path, change.diff, 'utf8');
break;
}
case 'edit': {
const { old: oldStr, new: newStr } = JSON.parse(change.diff) as { old: string; new: string };
const content = await readFile(change.file_path, 'utf8');
if (!content.includes(oldStr)) {
throw new Error('old_string not found in file — file may have changed since the edit was queued');
}
const updated = content.replace(oldStr, newStr);
await writeFile(change.file_path, updated, 'utf8');
break;
}
case 'delete': {
// Stash current content in diff for potential rewind
try {
const existing = await readFile(change.file_path, 'utf8');
await sql`UPDATE pending_changes SET diff = ${existing} WHERE id = ${changeId}`;
} catch {
// File may already be gone — proceed with status update
}
await unlink(change.file_path);
break;
}
}
await sql`UPDATE pending_changes SET status = 'applied' WHERE id = ${changeId}`;
return { id: change.id, file_path: change.file_path, operation: change.operation, success: true };
} catch (err) {
const message = err instanceof Error ? err.message : String(err);
return { id: change.id, file_path: change.file_path, operation: change.operation, success: false, error: message };
}
}
export async function applyAll(
sql: Sql,
sessionId: string,
projectRoot: string,
): Promise<ApplyResult[]> {
const pending = await sql<PendingChange[]>`
SELECT * FROM pending_changes
WHERE session_id = ${sessionId} AND status = 'pending'
ORDER BY created_at ASC
`;
const results: ApplyResult[] = [];
for (const change of pending) {
results.push(await applyOne(sql, change.id, projectRoot));
}
return results;
}
// --- Reject functions --------------------------------------------------------
export async function rejectOne(sql: Sql, changeId: string): Promise<void> {
await sql`UPDATE pending_changes SET status = 'rejected' WHERE id = ${changeId} AND status = 'pending'`;
}
export async function rejectAll(sql: Sql, sessionId: string): Promise<void> {
await sql`UPDATE pending_changes SET status = 'rejected' WHERE session_id = ${sessionId} AND status = 'pending'`;
}
// --- Rewind functions --------------------------------------------------------
export async function rewindOne(
sql: Sql,
changeId: string,
projectRoot: string,
): Promise<ApplyResult> {
const [change] = await sql<PendingChange[]>`
SELECT * FROM pending_changes WHERE id = ${changeId} AND status = 'applied'
`;
if (!change) {
return { id: changeId, file_path: '', operation: '', success: false, error: 'change not found or not applied' };
}
try {
resolveWritePath(projectRoot, change.file_path);
switch (change.operation) {
case 'create': {
// Reverse a create: delete the file
await unlink(change.file_path);
break;
}
case 'edit': {
// Reverse an edit: swap old and new
const { old: oldStr, new: newStr } = JSON.parse(change.diff) as { old: string; new: string };
const content = await readFile(change.file_path, 'utf8');
if (!content.includes(newStr)) {
throw new Error('new_string not found in file — cannot rewind; file may have been modified since apply');
}
const reverted = content.replace(newStr, oldStr);
await writeFile(change.file_path, reverted, 'utf8');
break;
}
case 'delete': {
// Reverse a delete: recreate the file (diff holds the original content stashed at apply time)
await mkdir(dirname(change.file_path), { recursive: true });
await writeFile(change.file_path, change.diff, 'utf8');
break;
}
}
await sql`UPDATE pending_changes SET status = 'reverted' WHERE id = ${changeId}`;
return { id: change.id, file_path: change.file_path, operation: change.operation, success: true };
} catch (err) {
const message = err instanceof Error ? err.message : String(err);
return { id: change.id, file_path: change.file_path, operation: change.operation, success: false, error: message };
}
}
// --- Query functions ---------------------------------------------------------
export async function listPending(sql: Sql, sessionId: string): Promise<PendingChange[]> {
return sql<PendingChange[]>`
SELECT * FROM pending_changes
WHERE session_id = ${sessionId} AND status = 'pending'
ORDER BY created_at ASC
`;
}

View File

@@ -0,0 +1,30 @@
/**
* Adapts BooCoder write tools (which take ToolContext) into BooChat's ToolDef
* interface (which takes `projectRoot, extraRoots?`).
*
* The adapter reads the module-level inference context at execute time, so the
* wrapping happens at boot (static) — no per-inference re-wrap needed.
*/
import type { ToolDef as ServerToolDef } from '@boocode/server/tools';
import type { ToolDef, ToolContext } from './types.js';
import { getInferenceContext } from './inference_context.js';
/**
* Wrap a BooCoder write tool (execute takes ToolContext) into a BooChat
* ToolDef (execute takes projectRoot + optional extraRoots). The adapter
* builds the ToolContext from the module-level inference context at call time.
*/
// eslint-disable-next-line @typescript-eslint/no-explicit-any
export function adaptWriteTool(tool: ToolDef<any>): ServerToolDef<any> {
return {
name: tool.name,
description: tool.description,
inputSchema: tool.inputSchema,
jsonSchema: tool.jsonSchema,
async execute(input: unknown, projectRoot: string, _extraRoots?: readonly string[]): Promise<unknown> {
const ctx: ToolContext = getInferenceContext();
return tool.execute(input, projectRoot, ctx);
},
};
}

View File

@@ -0,0 +1,44 @@
import { z } from 'zod';
import type { ToolDef, ToolContext } from './types.js';
import { applyAll } from '../pending_changes.js';
const ApplyPendingInput = z.object({});
type ApplyPendingInputT = z.infer<typeof ApplyPendingInput>;
export const applyPendingTool: ToolDef<ApplyPendingInputT> = {
name: 'apply_pending',
description:
'Apply all pending changes for the current session to disk. ' +
'Each queued create/edit/delete is executed in order.',
inputSchema: ApplyPendingInput,
jsonSchema: {
type: 'function',
function: {
name: 'apply_pending',
description:
'Apply all pending changes for the current session to disk. ' +
'Each queued create/edit/delete is executed in order.',
parameters: {
type: 'object',
properties: {},
required: [],
},
},
},
async execute(_input: ApplyPendingInputT, projectRoot: string, context: ToolContext): Promise<unknown> {
const results = await applyAll(context.sql, context.sessionId, projectRoot);
const succeeded = results.filter((r) => r.success).length;
const failed = results.filter((r) => !r.success).length;
return {
total: results.length,
succeeded,
failed,
results,
message:
results.length === 0
? 'No pending changes to apply.'
: `Applied ${succeeded}/${results.length} changes.${failed > 0 ? ` ${failed} failed.` : ''}`,
};
},
};

View File

@@ -0,0 +1,51 @@
import { z } from 'zod';
import type { ToolDef, ToolContext } from './types.js';
import { queueCreate } from '../pending_changes.js';
const CreateFileInput = z.object({
file_path: z.string().min(1),
content: z.string(),
});
type CreateFileInputT = z.infer<typeof CreateFileInput>;
export const createFileTool: ToolDef<CreateFileInputT> = {
name: 'create_file',
description:
'Queue creation of a new file with the given content. ' +
'The change is staged in pending_changes and must be applied explicitly.',
inputSchema: CreateFileInput,
jsonSchema: {
type: 'function',
function: {
name: 'create_file',
description:
'Queue creation of a new file with the given content. ' +
'The change is staged in pending_changes and must be applied explicitly.',
parameters: {
type: 'object',
properties: {
file_path: { type: 'string', description: 'Path for the new file (relative to project root or absolute)' },
content: { type: 'string', description: 'Full content of the file to create' },
},
required: ['file_path', 'content'],
},
},
},
async execute(input: CreateFileInputT, projectRoot: string, context: ToolContext): Promise<unknown> {
const change = await queueCreate(
context.sql,
context.sessionId,
context.taskId,
input.file_path,
input.content,
projectRoot,
);
return {
status: 'queued',
change_id: change.id,
file_path: change.file_path,
operation: 'create',
message: `File creation queued: ${change.file_path}. Use apply_pending to write changes to disk.`,
};
},
};

View File

@@ -0,0 +1,48 @@
import { z } from 'zod';
import type { ToolDef, ToolContext } from './types.js';
import { queueDelete } from '../pending_changes.js';
const DeleteFileInput = z.object({
file_path: z.string().min(1),
});
type DeleteFileInputT = z.infer<typeof DeleteFileInput>;
export const deleteFileTool: ToolDef<DeleteFileInputT> = {
name: 'delete_file',
description:
'Queue deletion of a file. ' +
'The change is staged in pending_changes and must be applied explicitly.',
inputSchema: DeleteFileInput,
jsonSchema: {
type: 'function',
function: {
name: 'delete_file',
description:
'Queue deletion of a file. ' +
'The change is staged in pending_changes and must be applied explicitly.',
parameters: {
type: 'object',
properties: {
file_path: { type: 'string', description: 'Path to the file to delete (relative to project root or absolute)' },
},
required: ['file_path'],
},
},
},
async execute(input: DeleteFileInputT, projectRoot: string, context: ToolContext): Promise<unknown> {
const change = await queueDelete(
context.sql,
context.sessionId,
context.taskId,
input.file_path,
projectRoot,
);
return {
status: 'queued',
change_id: change.id,
file_path: change.file_path,
operation: 'delete',
message: `File deletion queued: ${change.file_path}. Use apply_pending to write changes to disk.`,
};
},
};

View File

@@ -0,0 +1,54 @@
import { z } from 'zod';
import type { ToolDef, ToolContext } from './types.js';
import { queueEdit } from '../pending_changes.js';
const EditFileInput = z.object({
file_path: z.string().min(1),
old_string: z.string().min(1),
new_string: z.string(),
});
type EditFileInputT = z.infer<typeof EditFileInput>;
export const editFileTool: ToolDef<EditFileInputT> = {
name: 'edit_file',
description:
'Queue an edit to a file. The edit replaces old_string with new_string. ' +
'The change is staged in pending_changes and must be applied explicitly.',
inputSchema: EditFileInput,
jsonSchema: {
type: 'function',
function: {
name: 'edit_file',
description:
'Queue an edit to a file. The edit replaces old_string with new_string. ' +
'The change is staged in pending_changes and must be applied explicitly.',
parameters: {
type: 'object',
properties: {
file_path: { type: 'string', description: 'Path to the file to edit (relative to project root or absolute)' },
old_string: { type: 'string', description: 'The exact string to find and replace (must appear in the file)' },
new_string: { type: 'string', description: 'The replacement string' },
},
required: ['file_path', 'old_string', 'new_string'],
},
},
},
async execute(input: EditFileInputT, projectRoot: string, context: ToolContext): Promise<unknown> {
const change = await queueEdit(
context.sql,
context.sessionId,
context.taskId,
input.file_path,
input.old_string,
input.new_string,
projectRoot,
);
return {
status: 'queued',
change_id: change.id,
file_path: change.file_path,
operation: 'edit',
message: `Edit queued for ${change.file_path}. Use apply_pending to write changes to disk.`,
};
},
};

View File

@@ -0,0 +1,26 @@
import type { ToolDef } from './types.js';
import { editFileTool } from './edit_file.js';
import { createFileTool } from './create_file.js';
import { deleteFileTool } from './delete_file.js';
import { applyPendingTool } from './apply_pending.js';
import { rewindTool } from './rewind.js';
export type { ToolDef, ToolContext, ToolJsonSchema } from './types.js';
// All BooCoder write tools. The inference loop (Phase 2B) will combine these
// with BooChat's read-only tools to form the full tool set available to agents.
// eslint-disable-next-line @typescript-eslint/no-explicit-any
export const WRITE_TOOLS: readonly ToolDef<any>[] = [
applyPendingTool,
createFileTool,
deleteFileTool,
editFileTool,
rewindTool,
];
// eslint-disable-next-line @typescript-eslint/no-explicit-any
export const WRITE_TOOLS_BY_NAME: ReadonlyMap<string, ToolDef<any>> = new Map(
WRITE_TOOLS.map((t) => [t.name, t]),
);
export { editFileTool, createFileTool, deleteFileTool, applyPendingTool, rewindTool };

View File

@@ -0,0 +1,36 @@
import type { Sql } from '../../db.js';
/**
* Module-level inference context for write tools.
*
* Set via `setInferenceContext()` before each inference run starts.
* Write tools read it via `getInferenceContext()` during execute.
* Same pattern as BooChat's `loadConfig()` singleton — tools need
* ambient state that can't be threaded through the tool-phase execute
* signature (which is `execute(input, projectRoot, extraRoots?)`).
*/
export interface InferenceContext {
sql: Sql;
sessionId: string;
taskId: string | null;
}
let current: InferenceContext | null = null;
export function setInferenceContext(ctx: InferenceContext): void {
current = ctx;
}
export function clearInferenceContext(): void {
current = null;
}
export function getInferenceContext(): InferenceContext {
if (!current) {
throw new Error(
'Write tool called outside inference context — setInferenceContext() was not called before this run',
);
}
return current;
}

View File

@@ -0,0 +1,71 @@
import { z } from 'zod';
import type { ToolDef, ToolContext } from './types.js';
import { rewindOne } from '../pending_changes.js';
const RewindInput = z.object({
change_id: z.string().uuid().optional(),
all: z.boolean().optional(),
});
type RewindInputT = z.infer<typeof RewindInput>;
export const rewindTool: ToolDef<RewindInputT> = {
name: 'rewind',
description:
'Revert applied changes. Provide change_id to revert a specific change, ' +
'or set all=true to revert all applied changes for the session (in reverse order).',
inputSchema: RewindInput,
jsonSchema: {
type: 'function',
function: {
name: 'rewind',
description:
'Revert applied changes. Provide change_id to revert a specific change, ' +
'or set all=true to revert all applied changes for the session (in reverse order).',
parameters: {
type: 'object',
properties: {
change_id: { type: 'string', format: 'uuid', description: 'ID of a specific change to revert' },
all: { type: 'boolean', description: 'If true, revert all applied changes for this session' },
},
required: [],
},
},
},
async execute(input: RewindInputT, projectRoot: string, context: ToolContext): Promise<unknown> {
if (input.change_id) {
const result = await rewindOne(context.sql, input.change_id, projectRoot);
return {
results: [result],
message: result.success
? `Reverted change ${input.change_id} (${result.operation} on ${result.file_path}).`
: `Failed to revert: ${result.error}`,
};
}
if (input.all) {
// Rewind all applied changes for this session in reverse order
const applied = await context.sql<{ id: string }[]>`
SELECT id FROM pending_changes
WHERE session_id = ${context.sessionId} AND status = 'applied'
ORDER BY created_at DESC
`;
const results = [];
for (const row of applied) {
results.push(await rewindOne(context.sql, row.id, projectRoot));
}
const succeeded = results.filter((r) => r.success).length;
return {
total: results.length,
succeeded,
failed: results.length - succeeded,
results,
message:
results.length === 0
? 'No applied changes to revert.'
: `Reverted ${succeeded}/${results.length} changes.`,
};
}
return { error: 'Provide either change_id or all=true.' };
},
};

View File

@@ -0,0 +1,32 @@
import type { z } from 'zod';
import type { Sql } from '../../db.js';
export interface ToolJsonSchema {
type: 'function';
function: {
name: string;
description: string;
parameters: Record<string, unknown>;
};
}
/**
* Context passed to BooCoder tool execute functions.
*
* Unlike BooChat's tools (which only need projectRoot), BooCoder's write tools
* interact with the database (pending_changes table) and need session/task
* context for proper attribution.
*/
export interface ToolContext {
sql: Sql;
sessionId: string;
taskId: string | null;
}
export interface ToolDef<TInput> {
name: string;
description: string;
inputSchema: z.ZodType<TInput>;
jsonSchema: ToolJsonSchema;
execute(input: TInput, projectRoot: string, context: ToolContext): Promise<unknown>;
}

View File

@@ -0,0 +1,73 @@
import { resolve, sep } from 'node:path';
export class WriteGuardError extends Error {
constructor(message: string) {
super(message);
this.name = 'WriteGuardError';
}
}
// Deny list: files that should never be written regardless of path-guard.
// Subset of BooChat's secret_guard.ts — covers the most dangerous patterns.
// Full parity with BooChat's deny list is not needed for write-guard because
// the write tools are intentional (model chose to create/edit); we block only
// files that are unambiguously secrets.
const SECRET_PATTERNS: readonly string[] = [
'.env',
'.env.local',
'.env.production',
'.env.development',
'.env.staging',
'id_rsa',
'id_dsa',
'id_ecdsa',
'id_ed25519',
'*.pem',
'*.key',
'*.p12',
'*.pfx',
'*.crt',
'credentials.json',
'*.kdbx',
'.netrc',
];
export function isSecretPath(filePath: string): boolean {
const normalized = filePath.replace(/\\/g, '/');
const segments = normalized.split('/').filter((s) => s.length > 0);
if (segments.length === 0) return false;
const basename = segments[segments.length - 1]!;
return SECRET_PATTERNS.some((pattern) => {
if (pattern.startsWith('*')) {
return basename.endsWith(pattern.slice(1));
}
return basename === pattern;
});
}
/**
* Resolve and validate a write target path.
*
* Key difference from BooChat's pathGuard: no realpath() — the file may not
* exist yet (creates). Uses resolve() to normalize ../ segments and then
* checks the result stays within projectRoot.
*/
export function resolveWritePath(projectRoot: string, filePath: string): string {
if (!filePath || filePath.length === 0) {
throw new WriteGuardError('file path is required');
}
const candidate = filePath.startsWith('/') ? filePath : resolve(projectRoot, filePath);
const normalized = resolve(candidate); // normalizes ../ segments
if (!normalized.startsWith(projectRoot + sep) && normalized !== projectRoot) {
throw new WriteGuardError(`path escapes project root: ${filePath}`);
}
if (isSecretPath(normalized)) {
throw new WriteGuardError(`cannot write to secret file: ${filePath}`);
}
return normalized;
}

15
apps/coder/tsconfig.json Normal file
View File

@@ -0,0 +1,15 @@
{
"extends": "../../tsconfig.base.json",
"compilerOptions": {
"module": "NodeNext",
"moduleResolution": "NodeNext",
"outDir": "dist",
"rootDir": "src",
"lib": ["ES2022"],
"types": ["node"],
"declaration": false,
"sourceMap": true
},
"include": ["src/**/*"],
"exclude": ["src/**/__tests__/**", "**/*.test.ts"]
}

View File

@@ -0,0 +1,9 @@
import { defineConfig } from 'vitest/config';
export default defineConfig({
test: {
environment: 'node',
globals: false,
include: ['src/**/__tests__/**/*.test.ts'],
},
});

View File

@@ -4,6 +4,23 @@
"private": true,
"type": "module",
"main": "dist/index.js",
"exports": {
".": { "types": "./dist/index.d.ts", "default": "./dist/index.js" },
"./inference": { "types": "./dist/services/inference/index.d.ts", "default": "./dist/services/inference/index.js" },
"./tools": { "types": "./dist/services/tools.d.ts", "default": "./dist/services/tools.js" },
"./broker": { "types": "./dist/services/broker.d.ts", "default": "./dist/services/broker.js" },
"./compaction": { "types": "./dist/services/compaction.d.ts", "default": "./dist/services/compaction.js" },
"./model-context": { "types": "./dist/services/model-context.d.ts", "default": "./dist/services/model-context.js" },
"./system-prompt": { "types": "./dist/services/system-prompt.d.ts", "default": "./dist/services/system-prompt.js" },
"./agents": { "types": "./dist/services/agents.d.ts", "default": "./dist/services/agents.js" },
"./truncate": { "types": "./dist/services/truncate.d.ts", "default": "./dist/services/truncate.js" },
"./path-guard": { "types": "./dist/services/path_guard.d.ts", "default": "./dist/services/path_guard.js" },
"./file-ops": { "types": "./dist/services/file_ops.d.ts", "default": "./dist/services/file_ops.js" },
"./types": { "types": "./dist/types/api.d.ts", "default": "./dist/types/api.js" },
"./ws-frames": { "types": "./dist/types/ws-frames.d.ts", "default": "./dist/types/ws-frames.js" },
"./db": { "types": "./dist/db.d.ts", "default": "./dist/db.js" },
"./config": { "types": "./dist/config.d.ts", "default": "./dist/config.js" }
},
"scripts": {
"dev": "tsx watch src/index.ts",
"build": "tsc && node -e \"import('node:fs').then(fs=>fs.copyFileSync('src/schema.sql','dist/schema.sql'))\"",
@@ -14,6 +31,7 @@
"@ai-sdk/openai-compatible": "^2.0.47",
"@fastify/static": "^7.0.4",
"@fastify/websocket": "^10.0.1",
"@modelcontextprotocol/sdk": "^1.29.0",
"ai": "^6.0.190",
"fastify": "^4.28.1",
"postgres": "^3.4.4",

View File

@@ -19,6 +19,9 @@ const ConfigSchema = z.object({
GITEA_USER: z.string().default('indifferentketchup'),
GITEA_TOKEN: z.string().optional(),
GITEA_SSH_HOST: z.string().default('100.114.205.53:2222'),
// v1.15.0-mcp-multi: path to the MCP config JSON file. Default /data/mcp.json
// (bind-mounted alongside AGENTS.md). File missing = no MCP (opt-in).
MCP_CONFIG_PATH: z.string().optional(),
});
export type Config = z.infer<typeof ConfigSchema>;

View File

@@ -24,6 +24,10 @@ import { listSkills } from './services/skills.js';
import * as compaction from './services/compaction.js';
import { configureModelContext } from './services/model-context.js';
import { cleanupTruncations } from './services/truncate.js';
import { loadMcpConfig } from './services/mcp-config.js';
import { initialize as initMcp, getTools as getMcpTools, shutdown as shutdownMcp } from './services/mcp-client.js';
import { appendMcpTools } from './services/tools.js';
import { refreshToolNames } from './services/agents.js';
async function main() {
const config = loadConfig();
@@ -69,6 +73,23 @@ async function main() {
// default_generation_settings.n_ctx — the value persisted as messages.ctx_max.
configureModelContext({ llamaSwapUrl: config.LLAMA_SWAP_URL });
// v1.15.0-mcp-multi: read MCP config file and connect to all enabled servers.
// Runs before route registration so the tool list is complete when the first
// inference request arrives. Per-server graceful degradation: one failing
// server doesn't block others.
const mcpConfigPath = config.MCP_CONFIG_PATH ?? '/data/mcp.json';
const mcpServers = loadMcpConfig(mcpConfigPath, app.log);
if (mcpServers.length > 0) {
await initMcp(mcpServers, app.log);
const mcpTools = getMcpTools();
if (mcpTools.length > 0) {
appendMcpTools(mcpTools);
refreshToolNames();
app.log.info({ servers: mcpServers.length, tools: mcpTools.length }, 'mcp: registered');
}
}
app.addHook('onClose', async () => { await shutdownMcp(); });
await app.register(fastifyWebsocket);
app.get('/api/health', async () => {

View File

@@ -0,0 +1,169 @@
/**
* v1.15.0-mcp-multi: unit tests for the multi-server MCP client.
* Pure unit tests — no live MCP server needed. Tests tool-wrapping,
* read-only guard, name prefixing, content extraction, and error handling.
* Multi-server routing tested via wrapMcpTool's server-name prefix.
*/
import { describe, it, expect } from 'vitest';
import { wrapMcpTool, extractContent, isToolReadOnly } from '../mcp-client.js';
describe('mcp-client', () => {
describe('wrapMcpTool — multi-server prefixing', () => {
it('produces a ToolDef with <serverName>_ prefix', () => {
const mcpTool = {
name: 'resolve-library-id',
description: 'Resolve a library identifier',
inputSchema: {
type: 'object' as const,
properties: { query: { type: 'string' } },
required: ['query'],
},
};
const wrapped = wrapMcpTool('context7', mcpTool);
expect(wrapped.name).toBe('context7_resolve-library-id');
expect(wrapped.description).toBe('Resolve a library identifier');
expect(wrapped.jsonSchema.type).toBe('function');
expect(wrapped.jsonSchema.function.name).toBe('context7_resolve-library-id');
expect(wrapped.jsonSchema.function.parameters).toEqual(mcpTool.inputSchema);
expect(typeof wrapped.execute).toBe('function');
});
it('prefixes tools from different servers correctly', () => {
const toolA = {
name: 'query-docs',
description: 'Query docs',
inputSchema: { type: 'object' as const, properties: {} },
};
const toolB = {
name: 'overview',
description: 'Get overview',
inputSchema: { type: 'object' as const, properties: {} },
};
const wrappedA = wrapMcpTool('context7', toolA);
const wrappedB = wrapMcpTool('codecontext', toolB);
expect(wrappedA.name).toBe('context7_query-docs');
expect(wrappedB.name).toBe('codecontext_overview');
});
it('multi-server: two servers with 2 tools each produce 4 prefixed tools', () => {
const serverATools = [
{ name: 'query-docs', inputSchema: { type: 'object' as const, properties: {} } },
{ name: 'resolve-library-id', inputSchema: { type: 'object' as const, properties: {} } },
];
const serverBTools = [
{ name: 'overview', inputSchema: { type: 'object' as const, properties: {} } },
{ name: 'search', inputSchema: { type: 'object' as const, properties: {} } },
];
const allWrapped = [
...serverATools.map((t) => wrapMcpTool('context7', t)),
...serverBTools.map((t) => wrapMcpTool('codecontext', t)),
];
expect(allWrapped).toHaveLength(4);
expect(allWrapped.map((t) => t.name)).toEqual([
'context7_query-docs',
'context7_resolve-library-id',
'codecontext_overview',
'codecontext_search',
]);
});
it('defaults description to empty string when absent', () => {
const mcpTool = {
name: 'no-desc',
inputSchema: { type: 'object' as const, properties: {} },
};
const wrapped = wrapMcpTool('myserver', mcpTool);
expect(wrapped.description).toBe('');
expect(wrapped.jsonSchema.function.description).toBe('');
});
it('uses passthrough Zod schema (z.record)', () => {
const mcpTool = {
name: 'test',
inputSchema: { type: 'object' as const, properties: {} },
};
const wrapped = wrapMcpTool('s', mcpTool);
const result = wrapped.inputSchema.safeParse({ foo: 'bar', baz: 123 });
expect(result.success).toBe(true);
});
});
describe('isToolReadOnly', () => {
it('accepts tools with readOnlyHint: true', () => {
expect(isToolReadOnly({ readOnlyHint: true })).toBe(true);
});
it('accepts tools with no annotations', () => {
expect(isToolReadOnly(undefined)).toBe(true);
});
it('accepts tools with empty annotations', () => {
expect(isToolReadOnly({})).toBe(true);
});
it('rejects tools with readOnlyHint: false', () => {
expect(isToolReadOnly({ readOnlyHint: false })).toBe(false);
});
it('accepts tools with only destructiveHint set', () => {
expect(isToolReadOnly({ destructiveHint: true })).toBe(true);
});
});
describe('extractContent', () => {
it('extracts single text block', () => {
const content = [{ type: 'text', text: 'hello world' }];
expect(extractContent(content)).toBe('hello world');
});
it('joins multiple text blocks with newline', () => {
const content = [
{ type: 'text', text: 'line 1' },
{ type: 'text', text: 'line 2' },
];
expect(extractContent(content)).toBe('line 1\nline 2');
});
it('returns "(no output)" for empty content', () => {
expect(extractContent([])).toBe('(no output)');
});
it('returns "(no output)" for undefined content', () => {
expect(extractContent(undefined)).toBe('(no output)');
});
it('serializes non-text blocks as JSON', () => {
const content = [
{ type: 'resource', uri: 'file:///foo', mimeType: 'text/plain' },
];
const result = extractContent(content);
expect(result).toContain('"type":"resource"');
expect(result).toContain('"uri":"file:///foo"');
});
it('returns error shape when isError is true', () => {
const content = [{ type: 'text', text: 'something failed' }];
const result = extractContent(content, true);
expect(result).toEqual({ error: true, output: 'something failed' });
});
it('returns error shape with joined content on isError', () => {
const content = [
{ type: 'text', text: 'error 1' },
{ type: 'text', text: 'error 2' },
];
const result = extractContent(content, true);
expect(result).toEqual({ error: true, output: 'error 1\nerror 2' });
});
});
});

View File

@@ -0,0 +1,82 @@
/**
* v1.15.0-mcp-multi: unit tests for matchToolGlob.
*/
import { describe, it, expect } from 'vitest';
import { matchToolGlob } from '../agents.js';
describe('matchToolGlob', () => {
it('exact match: "grep" matches "grep"', () => {
expect(matchToolGlob('grep', ['grep'])).toBe(true);
});
it('exact match: "grep" does not match "grep2"', () => {
expect(matchToolGlob('grep2', ['grep'])).toBe(false);
});
it('exact match: multiple tools', () => {
expect(matchToolGlob('grep', ['grep', 'view_file'])).toBe(true);
expect(matchToolGlob('view_file', ['grep', 'view_file'])).toBe(true);
expect(matchToolGlob('find_files', ['grep', 'view_file'])).toBe(false);
});
it('wildcard: "context7_*" matches "context7_query-docs"', () => {
expect(matchToolGlob('context7_query-docs', ['context7_*'])).toBe(true);
});
it('wildcard: "context7_*" matches "context7_resolve-library-id"', () => {
expect(matchToolGlob('context7_resolve-library-id', ['context7_*'])).toBe(true);
});
it('wildcard: "context7_*" does not match "codecontext_overview"', () => {
expect(matchToolGlob('codecontext_overview', ['context7_*'])).toBe(false);
});
it('wildcard: "view_*" matches "view_file" and "view_truncated_output"', () => {
expect(matchToolGlob('view_file', ['view_*'])).toBe(true);
expect(matchToolGlob('view_truncated_output', ['view_*'])).toBe(true);
});
it('wildcard: "*" matches everything', () => {
expect(matchToolGlob('anything', ['*'])).toBe(true);
expect(matchToolGlob('context7_query-docs', ['*'])).toBe(true);
});
it('deny: "!web_*" excludes "web_search"', () => {
// With only a deny rule and no prior match, the tool is not matched
expect(matchToolGlob('web_search', ['!web_*'])).toBe(false);
});
it('last-match-wins: ["*", "!web_*"] excludes web tools, includes others', () => {
expect(matchToolGlob('web_search', ['*', '!web_*'])).toBe(false);
expect(matchToolGlob('web_fetch', ['*', '!web_*'])).toBe(false);
expect(matchToolGlob('grep', ['*', '!web_*'])).toBe(true);
expect(matchToolGlob('context7_query-docs', ['*', '!web_*'])).toBe(true);
});
it('last-match-wins: deny then re-allow', () => {
// ["!web_*", "web_search"] — deny all web, then re-allow web_search
expect(matchToolGlob('web_search', ['!web_*', 'web_search'])).toBe(true);
expect(matchToolGlob('web_fetch', ['!web_*', 'web_fetch'])).toBe(true);
});
it('empty patterns: nothing matches', () => {
expect(matchToolGlob('grep', [])).toBe(false);
expect(matchToolGlob('anything', [])).toBe(false);
});
it('no-glob fallback: exact-match only, same as pre-v1.15', () => {
const patterns = ['grep', 'view_file'];
expect(matchToolGlob('grep', patterns)).toBe(true);
expect(matchToolGlob('view_file', patterns)).toBe(true);
expect(matchToolGlob('find_files', patterns)).toBe(false);
expect(matchToolGlob('web_search', patterns)).toBe(false);
});
it('mixed glob and exact patterns', () => {
const patterns = ['grep', 'context7_*', '!context7_dangerous'];
expect(matchToolGlob('grep', patterns)).toBe(true);
expect(matchToolGlob('context7_query-docs', patterns)).toBe(true);
expect(matchToolGlob('context7_dangerous', patterns)).toBe(false);
expect(matchToolGlob('view_file', patterns)).toBe(false);
});
});

View File

@@ -16,10 +16,62 @@ const CACHE_TTL_MS = 60_000;
// hand-maintained list drifted (web_search/web_fetch from v1.11.8 + the 8
// codecontext tools were missing), silently filtering valid tool names out
// of agents that opted in. Single source of truth is tools.ts now.
const ALL_TOOL_NAMES: readonly string[] = ALL_TOOLS.map((t) => t.name);
const DEFAULT_TOOLS: string[] = [...ALL_TOOL_NAMES];
let ALL_TOOL_NAMES: readonly string[] = ALL_TOOLS.map((t) => t.name);
let DEFAULT_TOOLS: string[] = [...ALL_TOOL_NAMES];
export function refreshToolNames(): void {
ALL_TOOL_NAMES = ALL_TOOLS.map((t) => t.name);
DEFAULT_TOOLS = [...ALL_TOOL_NAMES];
}
const DEFAULT_TEMPERATURE = 0.7;
// ---- Tool glob matching (v1.15.0-mcp-multi) --------------------------------
/**
* Simple glob match for tool names. Supports `*` as a wildcard for any
* characters. No `?` or `**` — tool names are flat (no path separators).
*/
function simpleGlobMatch(str: string, pattern: string): boolean {
if (pattern === '*') return true;
if (!pattern.includes('*')) return str === pattern;
// Escape regex metacharacters, then replace escaped \* with .*
const regex = new RegExp(
'^' + pattern.replace(/[.+?^${}()|[\]\\]/g, '\\$&').replace(/\*/g, '.*') + '$',
);
return regex.test(str);
}
/**
* Check if a tool name matches a set of glob patterns. Last-match-wins.
* Patterns starting with `!` are deny rules.
*
* Examples:
* - `["grep", "view_file"]` — exact-match whitelist (same as pre-v1.15)
* - `["context7_*"]` — all tools from the context7 MCP server
* - `["*", "!web_*"]` — all tools except web tools
* - `[]` — nothing matches (agent gets no tools)
*/
export function matchToolGlob(toolName: string, patterns: string[]): boolean {
let matched = false;
for (const pattern of patterns) {
const deny = pattern.startsWith('!');
const glob = deny ? pattern.slice(1) : pattern;
if (simpleGlobMatch(toolName, glob)) {
matched = !deny;
}
}
return matched;
}
/**
* Returns true if a tools: entry is a glob pattern (contains * or starts
* with !). Glob patterns can't be validated against the current tool list
* since MCP tools are discovered at runtime.
*/
function isGlobPattern(entry: string): boolean {
return entry.includes('*') || entry.startsWith('!');
}
export function slugify(name: string): string {
return name
.toLowerCase()
@@ -37,6 +89,10 @@ interface ParsedFrontmatter {
// v1.8.2: optional per-agent tool-loop budget. Absent → inference resolves
// from the agent's toolset at runtime.
max_tool_calls?: number;
// v1.14.0: optional per-agent step cap. Absent → bounded only by MAX_STEPS
// (200) in the outer loop. Integer ≥ 0; steps: 0 means "no tool calls
// allowed" — the model responds text-only.
steps?: number;
}
function stripQuotes(s: string): string {
@@ -112,6 +168,21 @@ function parseFrontmatter(yaml: string): { data: ParsedFrontmatter; errors: stri
} else {
errors.push(`max_tool_calls must be an integer 1-100 (got "${valueRaw}")`);
}
} else if (key === 'steps') {
// v1.14.0: per-agent step cap for the outer inference loop. Integer ≥ 0.
// steps: 0 means "no tool calls allowed" — model responds text-only.
// Non-integer or negative values are warned and ignored (falls back to
// MAX_STEPS ceiling), matching the max_tool_calls pattern above.
const n = Number(valueRaw);
if (Number.isInteger(n) && n >= 0) {
data.steps = n;
} else if (Number.isInteger(n)) {
console.warn(
`agents: steps ${n} is negative, ignoring (falling back to default)`,
);
} else {
errors.push(`steps must be a non-negative integer (got "${valueRaw}")`);
}
}
// Unknown keys silently ignored — forward-compat.
}
@@ -188,10 +259,14 @@ function parseAgentSection(section: RawSection): Omit<Agent, 'source'> {
// v1.13.15-tools: intersect with BOOCODE_TOOLS tier (ceiling, not expansion).
// Unset → resolveToolTier returns ALL tool names → no narrowing.
// v1.15.0-mcp-multi: glob patterns (entries containing * or starting with !)
// pass through unvalidated — MCP tools are discovered at runtime and can't
// be checked against ALL_TOOL_NAMES at parse time.
const tierAllowed = new Set(resolveToolTier(process.env.BOOCODE_TOOLS));
const filteredTools = Array.isArray(fm.tools)
? fm.tools.filter((t): t is string =>
(ALL_TOOL_NAMES as readonly string[]).includes(t) && tierAllowed.has(t),
isGlobPattern(t) ||
((ALL_TOOL_NAMES as readonly string[]).includes(t) && tierAllowed.has(t)),
)
: DEFAULT_TOOLS.filter((t) => tierAllowed.has(t));
@@ -204,6 +279,7 @@ function parseAgentSection(section: RawSection): Omit<Agent, 'source'> {
tools: filteredTools,
model: typeof fm.model === 'string' && fm.model.length > 0 ? fm.model : null,
max_tool_calls: typeof fm.max_tool_calls === 'number' ? fm.max_tool_calls : null,
steps: typeof fm.steps === 'number' ? fm.steps : null,
};
}

View File

@@ -6,6 +6,7 @@
export {
createInferenceRunner,
MAX_STEPS,
runAssistantTurn,
runInference,
} from './turn.js';
@@ -16,5 +17,6 @@ export type {
StreamResult,
TurnArgs,
} from './turn.js';
export type { ToolPhaseResult } from './tool-phase.js';
export { detectDoomLoop, DOOM_LOOP_THRESHOLD } from './sentinels.js';
export { buildMessagesPayload } from './payload.js';

View File

@@ -476,6 +476,202 @@ export async function runDoomLoopSummary(
);
}
// v1.14.0: step-cap wrap-up. Mirrors runCapHitSummary structurally — same
// in-flight-slot reuse, same tools-disabled streaming-summary call, same
// post-finalize sentinel insert + chat_status drop. Difference: the note
// text names the step limit rather than the tool budget. Sentinel reuses
// metadata.kind = 'cap_hit' so the frontend CapHitSentinel component
// renders it without changes.
const STEP_CAP_NOTE = (steps: number, cap: number) =>
`You've reached the step limit (${steps}/${cap} steps). Produce the best answer you can with what you have. Do not call more tools.`;
export async function runStepCapSummary(
ctx: InferenceContext,
args: TurnArgs,
session: Session,
project: Project,
history: Message[],
agent: Agent | null,
steps: number,
cap: number,
): Promise<void> {
const { sessionId, chatId, assistantMessageId, signal } = args;
const messages = await buildMessagesPayload(session, project, history, agent, ctx.log);
messages.push({ role: 'system', content: STEP_CAP_NOTE(steps, cap) });
const startedRow = await ctx.sql<{ started_at: string }[]>`
UPDATE messages
SET started_at = clock_timestamp()
WHERE id = ${assistantMessageId}
RETURNING started_at
`;
const startedAt = startedRow[0]?.started_at ?? null;
ctx.publish(sessionId, {
type: 'message_started',
message_id: assistantMessageId,
chat_id: chatId,
role: 'assistant',
});
let accumulated = '';
let pendingFlushTimer: NodeJS.Timeout | null = null;
let flushPromise: Promise<unknown> = Promise.resolve();
const flushNow = () => {
if (pendingFlushTimer) {
clearTimeout(pendingFlushTimer);
pendingFlushTimer = null;
}
const snapshot = accumulated;
flushPromise = flushPromise.then(() =>
ctx.sql`UPDATE messages SET content = ${snapshot} WHERE id = ${assistantMessageId}`
);
};
const scheduleFlush = () => {
if (pendingFlushTimer) return;
pendingFlushTimer = setTimeout(() => {
pendingFlushTimer = null;
flushNow();
}, DB_FLUSH_INTERVAL_MS);
};
let summaryOk = false;
let summarySoftCancelled = false;
let summaryError: string | null = null;
let result: StreamResult | null = null;
try {
result = await streamCompletion(
ctx,
session.model,
messages,
{ tools: null, temperature: agent?.temperature },
(delta) => {
accumulated += delta;
ctx.publish(sessionId, {
type: 'delta',
message_id: assistantMessageId,
chat_id: chatId,
content: delta,
});
scheduleFlush();
},
undefined,
signal,
);
summaryOk = true;
} catch (err) {
if (err instanceof Error && err.name === 'AbortError') {
summarySoftCancelled = true;
} else {
summaryError = err instanceof Error ? err.message : String(err);
}
} finally {
if (pendingFlushTimer) {
clearTimeout(pendingFlushTimer);
pendingFlushTimer = null;
}
await flushPromise;
}
if (summaryOk && result) {
const mctx = await modelContext.getModelContext(session.model);
const nCtx = mctx?.n_ctx ?? null;
const [updated] = await ctx.sql<
{ tokens_used: number | null; ctx_used: number | null; ctx_max: number | null; finished_at: string | null }[]
>`
UPDATE messages
SET content = ${result.content},
status = 'complete',
tokens_used = ${result.completionTokens},
ctx_used = ${result.promptTokens},
ctx_max = ${nCtx},
finished_at = clock_timestamp()
WHERE id = ${assistantMessageId}
RETURNING tokens_used, ctx_used, ctx_max, finished_at
`;
ctx.publish(sessionId, {
type: 'message_complete',
message_id: assistantMessageId,
chat_id: chatId,
tokens_used: updated?.tokens_used ?? null,
ctx_used: updated?.ctx_used ?? null,
ctx_max: updated?.ctx_max ?? null,
started_at: startedAt,
finished_at: updated?.finished_at ?? null,
model: session.model,
});
} else if (summarySoftCancelled) {
await ctx.sql`
UPDATE messages
SET content = ${accumulated},
status = 'cancelled',
finished_at = clock_timestamp()
WHERE id = ${assistantMessageId}
`;
ctx.publish(sessionId, {
type: 'message_complete',
message_id: assistantMessageId,
chat_id: chatId,
});
} else {
const errMeta: MessageMetadata = {
kind: 'error',
error_reason: 'summary_after_cap_failed',
error_text: summaryError ?? 'step-cap summary failed',
};
await ctx.sql`
UPDATE messages
SET content = ${accumulated},
status = 'failed',
finished_at = clock_timestamp(),
metadata = ${ctx.sql.json(errMeta as never)}
WHERE id = ${assistantMessageId}
`;
ctx.publish(sessionId, {
type: 'error',
message_id: assistantMessageId,
chat_id: chatId,
error: summaryError ?? 'step-cap summary failed',
reason: 'summary_after_cap_failed',
});
}
const [sessRow] = await ctx.sql<{ project_id: string; name: string; updated_at: string }[]>`
UPDATE sessions SET updated_at = clock_timestamp()
WHERE id = ${sessionId}
RETURNING project_id, name, updated_at
`;
ctx.publishUser({
type: 'session_updated',
session_id: sessionId,
project_id: sessRow!.project_id,
name: sessRow!.name,
updated_at: sessRow!.updated_at,
});
// Reuse cap_hit sentinel so the frontend CapHitSentinel component renders
// it without changes. The content text distinguishes step cap from budget.
await insertCapHitSentinel(ctx, sessionId, chatId, agent, cap);
if (summaryOk || summarySoftCancelled) {
ctx.publishUser({ type: 'chat_status', chat_id: chatId, status: 'idle', at: new Date().toISOString() });
} else {
ctx.publishUser({
type: 'chat_status',
chat_id: chatId,
status: 'error',
at: new Date().toISOString(),
reason: 'summary_after_cap_failed',
});
}
ctx.log.info(
{ sessionId, chatId, assistantMessageId, steps, cap, summaryOk, summaryCancelled: summarySoftCancelled },
'inference step-cap summary finished',
);
}
async function insertDoomLoopSentinel(
ctx: InferenceContext,
sessionId: string,

View File

@@ -5,6 +5,7 @@ import type {
} from '../../types/api.js';
import * as modelContext from '../model-context.js';
import { toolJsonSchemas, type ToolJsonSchema } from '../tools.js';
import { matchToolGlob } from '../agents.js';
import type { OpenAiMessage } from './payload.js';
// v1.13.16: extractToolCallBlocks replaces the inline opener-search loop and
// recognizes both Qwen <tool_call> and Anthropic <invoke> markup in one pass.
@@ -376,14 +377,14 @@ export async function executeStreamPhase(
};
// Tool whitelist: if an agent is set, filter the global tool list to only the
// tool names it allows. Unknown names in agent.tools are dropped silently
// (handled here by intersection). When no agent: send all tools.
// tool names it allows. v1.15.0-mcp-multi: uses matchToolGlob for glob
// pattern support (e.g. `context7_*`, `!web_*`). When no agent: send all tools.
// v1.11.8: a second filter strips web_search + web_fetch unless the chat
// has them explicitly enabled. Counts as an opt-in security boundary: the
// model can't summon a tool that wasn't offered to it.
const WEB_TOOL_NAMES: ReadonlySet<string> = new Set(['web_search', 'web_fetch']);
const effectiveTools: ToolJsonSchema[] = (agent
? toolJsonSchemas().filter((t) => agent.tools.includes(t.function.name))
? toolJsonSchemas().filter((t) => matchToolGlob(t.function.name, agent.tools))
: toolJsonSchemas()
).filter((t) => webToolsEnabled || !WEB_TOOL_NAMES.has(t.function.name));
const effectiveTemperature = agent?.temperature;

View File

@@ -19,11 +19,6 @@ import type {
StreamResult,
TurnArgs,
} from './turn.js';
// v1.12.4: ESM value-import cycle. executeToolPhase recurses into
// runAssistantTurn which lives in inference.ts. The cycle is safe because
// the reference is read at call time (inside an async function body), not
// at module top-level. Node + tsc resolve this cleanly.
import { runAssistantTurn } from './turn.js';
// v1.13.13: synthesis pipeline — replaces the immediate recursive turn when
// any of this batch's tool calls is in SYNTHESIS_TOOLS. Falls through to
// recursion on synthesis failure (timeout / model error). See module header
@@ -86,6 +81,16 @@ async function executeToolCall(
}
}
// v1.14.0: return struct from executeToolPhase so the caller (the outer
// while loop in turn.ts) can decide whether to continue, break, or handle
// synthesis. Replaces the recursive call into runAssistantTurn.
export interface ToolPhaseResult {
action: 'continue' | 'paused' | 'synthesis_done';
toolCallCount: number;
toolCalls: ToolCall[];
nextAssistantId: string | null;
}
export async function executeToolPhase(
ctx: InferenceContext,
args: TurnArgs,
@@ -93,8 +98,8 @@ export async function executeToolPhase(
startedAt: string | null,
session: Session,
projectRoot: string
): Promise<void> {
const { sessionId, chatId, assistantMessageId, toolsUsed, signal } = args;
): Promise<ToolPhaseResult> {
const { sessionId, chatId, assistantMessageId } = args;
const { content, toolCalls, promptTokens, completionTokens } = result;
// v1.11.3: ctx_max comes from llama-swap /upstream/<model>/props, not the
@@ -296,7 +301,12 @@ export async function executeToolPhase(
{ sessionId, chatId, assistantMessageId },
'inference paused awaiting user input',
);
return;
return {
action: 'paused' as const,
toolCallCount: toolCalls.length,
toolCalls,
nextAssistantId: null,
};
}
// v1.13.13: synthesis-pipeline branch. When any of this batch's tool calls
@@ -328,30 +338,30 @@ export async function executeToolPhase(
...(typeof out?.truncated === 'boolean' ? { truncated: out.truncated } : {}),
...(typeof out?.outputPath === 'string' ? { outputPath: out.outputPath } : {}),
});
if (ran) return;
if (ran) {
return {
action: 'synthesis_done' as const,
toolCallCount: toolCalls.length,
toolCalls,
nextAssistantId: null,
};
}
// ran === false → synthesis failed (timeout / model error) → fall through
// to the standard recursive turn below. The synth message (if created)
// to the standard continue path below. The synth message (if created)
// was already marked status='failed' inside runSynthesisPass.
}
// v1.14.0: create the next assistant row and return a continue result.
// The caller (outer while loop in turn.ts) handles the iteration.
const [nextAssistant] = await ctx.sql<{ id: string }[]>`
INSERT INTO messages (session_id, chat_id, role, content, status, created_at)
VALUES (${sessionId}, ${chatId}, 'assistant', '', 'streaming', clock_timestamp())
RETURNING id
`;
await runAssistantTurn(ctx, {
sessionId,
chatId,
assistantMessageId: nextAssistant!.id,
// v1.8.2: charge this turn's actual tool invocations against the budget.
// One assistant message can emit multiple tool_calls, so we add the run
// count, not 1. The next turn's budget check sees the cumulative total.
toolsUsed: toolsUsed + result.toolCalls.length,
// v1.11.6: append the just-executed tool calls to the per-turn history
// so the next runAssistantTurn's doom-loop check can see them. We don't
// cap the array length here — per-turn budgets keep it bounded
// (typically <30 entries), and slicing happens inside detectDoomLoop.
recentToolCalls: [...args.recentToolCalls, ...result.toolCalls],
signal,
});
return {
action: 'continue' as const,
toolCallCount: toolCalls.length,
toolCalls,
nextAssistantId: nextAssistant!.id,
};
}

View File

@@ -16,11 +16,9 @@ import { resolveProjectRoot } from '../path_guard.js';
import { maybeAutoNameChat } from '../auto_name.js';
import { getAgentById } from '../agents.js';
import * as compaction from '../compaction.js';
import * as modelContext from '../model-context.js';
import type { Broker } from '../broker.js';
import { resolveToolBudget } from './budget.js';
import {
DOOM_LOOP_THRESHOLD,
detectDoomLoop,
} from './sentinels.js';
import {
@@ -33,15 +31,23 @@ import {
} from './error-handler.js';
import {
executeStreamPhase,
streamCompletion,
} from './stream-phase.js';
import { executeToolPhase } from './tool-phase.js';
import { DB_FLUSH_INTERVAL_MS, type StreamPhaseState } from './types.js';
import { executeToolPhase, type ToolPhaseResult } from './tool-phase.js';
import type { StreamPhaseState } from './types.js';
import {
runCapHitSummary,
runDoomLoopSummary,
runStepCapSummary,
} from './sentinel-summaries.js';
// v1.14.0: hard ceiling on the number of stream-and-tool iterations per
// user-message turn. Per-agent cap via agent.steps is the primary knob;
// MAX_STEPS is the safety ceiling. 200 is 4x the effective budget ceiling
// (50 tool calls) — in practice budget fires first unless the model makes
// many 0-tool-call iterations (which exit the loop via the non-tool finish
// path anyway).
export const MAX_STEPS = 200;
// v1.12.4: re-exported so external callers (tests, future consumers) keep
// importing from services/inference.js as the public surface.
export { detectDoomLoop, DOOM_LOOP_THRESHOLD } from './sentinels.js';
@@ -145,75 +151,185 @@ export async function runAssistantTurn(
ctx: InferenceContext,
args: TurnArgs,
): Promise<void> {
const { sessionId, chatId } = args;
const { sessionId, chatId, signal } = args;
// v1.11: if the prior turn flagged this chat for compaction, run it first
// so loadContext below reads the post-compaction history. We swallow
// compaction failures (clearing the flag so we don't loop) and proceed
// with the un-compacted history — a slow turn that hits the model's
// hard limit is recoverable; a dead session is not.
const chatFlag = await ctx.sql<{ needs_compaction: boolean }[]>`
SELECT needs_compaction FROM chats WHERE id = ${chatId}
`;
if (chatFlag[0]?.needs_compaction) {
try {
await compaction.process({
sql: ctx.sql,
config: ctx.config,
log: ctx.log,
broker: ctx.broker,
chatId,
});
} catch (err) {
ctx.log.warn({ err, chatId }, 'auto-compaction failed; clearing flag and proceeding');
await ctx.sql`UPDATE chats SET needs_compaction = false WHERE id = ${chatId}`;
}
}
const loaded = await loadContext(ctx.sql, sessionId, chatId);
if (!loaded) {
// v1.14.0: resolve agent once at the top. The agent stays fixed for the
// duration of this user-message turn — PATCH agent_id mid-conversation
// takes effect on the next runInference, not mid-loop.
const initialLoaded = await loadContext(ctx.sql, sessionId, chatId);
if (!initialLoaded) {
ctx.log.warn({ sessionId }, 'inference: session or project missing');
return;
}
const { session, project, history } = loaded;
const projectRoot = await resolveProjectRoot(project.path);
// Agent resolution is per-turn so PATCH agent_id mid-conversation takes
// effect on the next message. Unknown agent_id returns null silently —
// session falls back to base prompt + all tools + default temperature.
const { session, project } = initialLoaded;
const agent = session.agent_id
? await getAgentById(project.path, session.agent_id)
: null;
// v1.8.2: cap-hit replaces the older "tool loop depth exceeded" failure.
// When we've already burned the budget *before* this turn even runs, we
// skip straight to the summary flow — the in-flight assistant message slot
// gets reused for the wrap-up reply instead of being marked failed.
const budget = resolveToolBudget(agent);
if (args.toolsUsed >= budget) {
await runCapHitSummary(ctx, args, session, project, history, agent, budget);
// v1.14.0: effectiveCap = min(agent.steps ?? Infinity, MAX_STEPS).
// steps: 0 means "no tool calls allowed" — the first stream phase runs
// but if it emits tool calls they are not executed (finalize as text-only).
const effectiveCap = Math.min(agent?.steps ?? Infinity, MAX_STEPS);
// steps: 0 special case — model responds text-only. The while loop would
// never enter (effectiveCap === 0), so we handle it explicitly before the
// loop. The model always gets at least one chance to respond with text.
if (effectiveCap === 0) {
const loaded = await loadContext(ctx.sql, sessionId, chatId);
if (loaded) {
await runTextOnlyTurn(ctx, args, loaded.session, loaded.project, loaded.history, agent);
}
return;
}
// v1.11.6: doom-loop guard. Detected BEFORE the budget cap (the model can
// burn through 3 identical calls long before the 15-call budget fires).
// Same in-flight-slot-reuse pattern as runCapHitSummary — wrap-up reply
// lands in args.assistantMessageId, then a doom_loop sentinel is inserted
// to make the abort visible in the chat history.
const loop = detectDoomLoop(args.recentToolCalls);
if (loop) {
await runDoomLoopSummary(ctx, args, session, project, history, agent, loop);
return;
let stepNumber = 0;
let toolsUsed = args.toolsUsed;
let recentToolCalls = args.recentToolCalls;
let assistantMessageId = args.assistantMessageId;
while (stepNumber < effectiveCap) {
// ---- doom-loop check (moved from top-of-function) ----
const loop = detectDoomLoop(recentToolCalls);
if (loop) {
// Need fresh history for the summary.
const loaded = await loadContext(ctx.sql, sessionId, chatId);
if (loaded) {
const iterArgs: TurnArgs = { sessionId, chatId, assistantMessageId, toolsUsed, recentToolCalls, signal };
await runDoomLoopSummary(ctx, iterArgs, loaded.session, loaded.project, loaded.history, agent, loop);
}
break;
}
// ---- budget check (moved from top-of-function) ----
if (toolsUsed >= budget) {
const loaded = await loadContext(ctx.sql, sessionId, chatId);
if (loaded) {
const iterArgs: TurnArgs = { sessionId, chatId, assistantMessageId, toolsUsed, recentToolCalls, signal };
await runCapHitSummary(ctx, iterArgs, loaded.session, loaded.project, loaded.history, agent, budget);
}
break;
}
// ---- compaction check ----
// v1.11: if the prior turn flagged this chat for compaction, run it
// before loadContext so we read post-compaction history. Swallow
// failures and proceed with un-compacted history.
const chatFlag = await ctx.sql<{ needs_compaction: boolean }[]>`
SELECT needs_compaction FROM chats WHERE id = ${chatId}
`;
if (chatFlag[0]?.needs_compaction) {
try {
await compaction.process({
sql: ctx.sql,
config: ctx.config,
log: ctx.log,
broker: ctx.broker,
chatId,
});
} catch (err) {
ctx.log.warn({ err, chatId }, 'auto-compaction failed; clearing flag and proceeding');
await ctx.sql`UPDATE chats SET needs_compaction = false WHERE id = ${chatId}`;
}
}
// ---- load context (must re-load each iteration — new messages since last step) ----
const loaded = await loadContext(ctx.sql, sessionId, chatId);
if (!loaded) {
ctx.log.warn({ sessionId }, 'inference: session or project missing mid-loop');
break;
}
const { session: iterSession, project: iterProject, history } = loaded;
const projectRoot = await resolveProjectRoot(iterProject.path);
// v1.14.0: log step boundary for instrumentation. step_start parts are in
// the schema CHECK but not emitted here — writing to the assistant message
// before the stream phase creates a sequence-0 collision with
// partsFromAssistantMessage. A WS frame or structured log is sufficient
// since the frontend doesn't render step boundaries in v1.14.
ctx.log.info({ sessionId, chatId, step: stepNumber, assistantMessageId }, 'step_start');
// ---- build messages + stream phase ----
const messages = await buildMessagesPayload(iterSession, iterProject, history, agent, ctx.log);
const webToolsEnabled =
iterSession.web_search_enabled ?? iterProject.default_web_search_enabled ?? false;
const iterArgs: TurnArgs = { sessionId, chatId, assistantMessageId, toolsUsed, recentToolCalls, signal };
const state: StreamPhaseState = { accumulated: '', startedAt: null };
let result: StreamResult;
try {
result = await executeStreamPhase(ctx, iterArgs, iterSession, messages, state, agent, webToolsEnabled);
} catch (err) {
await handleAbortOrError(ctx, iterArgs, state.accumulated, err);
break;
}
// ---- non-tool finish → finalize and exit ----
if (result.toolCalls.length === 0) {
await finalizeCompletion(ctx, iterArgs, result, state.startedAt, iterSession);
break;
}
// ---- steps: 0 edge case ----
// effectiveCap check above guarantees we're inside the loop, but this
// guard handles the theoretical case where the model emits tool calls
// on step 0 when effectiveCap would have been 0 (impossible since the
// while condition prevents entry, but kept for safety). If effectiveCap
// is 1 and we're on step 0, tool calls ARE executed — steps counts
// iterations, not post-first-stream.
// ---- tool phase ----
let toolPhaseResult: ToolPhaseResult;
try {
toolPhaseResult = await executeToolPhase(ctx, iterArgs, result, state.startedAt, iterSession, projectRoot);
} catch (err) {
// Tool phase errors are unexpected (individual tool failures are
// caught inside executeToolPhase). Log and break.
ctx.log.error({ err, sessionId, chatId, step: stepNumber }, 'tool phase threw unexpectedly');
break;
}
// ---- update loop locals ----
toolsUsed += toolPhaseResult.toolCallCount;
recentToolCalls = [...recentToolCalls, ...toolPhaseResult.toolCalls];
stepNumber++;
if (toolPhaseResult.action !== 'continue') {
// 'paused' (user input) or 'synthesis_done' — stop the loop.
break;
}
// 'continue' — advance to next assistant message.
assistantMessageId = toolPhaseResult.nextAssistantId!;
}
// ---- post-loop: step-cap sentinel ----
// When the loop exits because stepNumber reached effectiveCap, the last
// iteration's tool phase returned 'continue' with a nextAssistantId that
// is still in 'streaming' status (unfilled). Use it for the wrap-up.
if (stepNumber >= effectiveCap && effectiveCap < Infinity) {
const loaded = await loadContext(ctx.sql, sessionId, chatId);
if (loaded) {
const capArgs: TurnArgs = { sessionId, chatId, assistantMessageId, toolsUsed, recentToolCalls, signal };
await runStepCapSummary(ctx, capArgs, loaded.session, loaded.project, loaded.history, agent, stepNumber, effectiveCap);
}
}
}
// v1.14.0: special handling for steps: 0 — the model responds text-only.
// The while loop never enters (effectiveCap === 0). We stream once with
// no tools, finalize, and return. If the model emits tool calls despite
// not being offered tools, they're ignored (finalize as text-only).
async function runTextOnlyTurn(
ctx: InferenceContext,
args: TurnArgs,
session: Session,
project: Project,
history: Message[],
agent: Agent | null,
): Promise<void> {
const messages = await buildMessagesPayload(session, project, history, agent, ctx.log);
// v1.11.8: resolve per-chat web-tools opt-in. Tri-state on the wire:
// - session.web_search_enabled = null → inherit project default
// - session.web_search_enabled = true/false → explicit
// Both web_search and web_fetch are gated by this single flag (the UI
// label is "Enable web search and fetch" — same store, both tools).
// Default is false unless explicitly opted in, matching the v1.9
// plumbing intent ("inert until Batch 8 ships the actual tools").
// Web tools are irrelevant when steps: 0 (no tool execution), but we
// still need to resolve the flag for executeStreamPhase's signature.
const webToolsEnabled =
session.web_search_enabled ?? project.default_web_search_enabled ?? false;
@@ -227,8 +343,12 @@ export async function runAssistantTurn(
}
if (result.toolCalls.length > 0) {
await executeToolPhase(ctx, args, result, state.startedAt, session, projectRoot);
return;
ctx.log.warn(
{ chatId: args.chatId, toolCallCount: result.toolCalls.length },
'steps: 0 agent emitted tool calls; ignoring and finalizing as text-only',
);
// Override: strip tool calls so finalizeCompletion treats it as text-only.
result = { ...result, toolCalls: [] };
}
await finalizeCompletion(ctx, args, result, state.startedAt, session);

View File

@@ -0,0 +1,288 @@
/**
* v1.15.0-mcp-multi: multi-server MCP client registry.
*
* Connects to multiple MCP servers (Streamable HTTP or stdio transport),
* discovers tools from each, wraps them as BooCode ToolDefs with a
* `<serverName>_<toolName>` name prefix, and routes callTool by prefix.
*
* Graceful degradation: one failing server doesn't block others.
* Read-only invariant: tools with readOnlyHint === false are rejected.
*/
import { Client } from '@modelcontextprotocol/sdk/client';
import { StreamableHTTPClientTransport } from '@modelcontextprotocol/sdk/client/streamableHttp.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';
import { z } from 'zod';
import type { FastifyBaseLogger } from 'fastify';
import type { McpServerEntry, McpServerConfig } from './mcp-config.js';
import type { ToolDef } from './tools.js';
// ---- Types ----
interface McpToolAnnotations {
readOnlyHint?: boolean;
destructiveHint?: boolean;
[key: string]: unknown;
}
interface McpToolDef {
name: string;
description?: string;
inputSchema: Record<string, unknown>;
annotations?: McpToolAnnotations;
}
interface ServerState {
client: Client;
transport: StreamableHTTPClientTransport | StdioClientTransport;
tools: ToolDef<Record<string, unknown>>[];
type: 'streamableHttp' | 'stdio';
}
// ---- Module-level state ----
const servers = new Map<string, ServerState>();
// Reverse map: prefixed tool name → server name (built during discovery)
const toolToServer = new Map<string, string>();
let log: FastifyBaseLogger | null = null;
const MAX_RESULT_BYTES = 5 * 1024 * 1024;
// ---- Public API ----
/**
* Connect to all configured MCP servers, discover tools, and wrap them.
* Per-server graceful degradation: a failing server is logged and skipped.
*/
export async function initialize(
entries: McpServerEntry[],
logger: FastifyBaseLogger,
): Promise<void> {
log = logger;
// Connect servers in parallel — each wrapped in try/catch for isolation
await Promise.all(
entries.map(async (entry) => {
try {
await connectServer(entry);
} catch (err) {
log!.warn(
{ err, server: entry.name },
`mcp: failed to initialize server "${entry.name}" — its tools will be unavailable`,
);
}
}),
);
if (servers.size > 0) {
const totalTools = Array.from(servers.values()).reduce((n, s) => n + s.tools.length, 0);
log.info(
{ servers: servers.size, tools: totalTools },
'mcp: multi-server initialization complete',
);
}
}
/**
* Call an MCP tool by its prefixed name. Routes to the correct server
* using the toolToServer reverse map.
*/
export async function callTool(
prefixedName: string,
args: Record<string, unknown>,
): Promise<unknown> {
const serverName = toolToServer.get(prefixedName);
if (!serverName) {
return { error: true, output: `MCP tool "${prefixedName}" not found in any server` };
}
const state = servers.get(serverName);
if (!state) {
return { error: true, output: `MCP server "${serverName}" not available` };
}
// Strip the "<serverName>_" prefix to get the original tool name
const originalName = prefixedName.slice(serverName.length + 1);
try {
const result = await state.client.callTool({ name: originalName, arguments: args });
const content = result.content as Array<{ type: string; text?: string; [key: string]: unknown }>;
if (!content || content.length === 0) {
return '(no output)';
}
if (result.isError) {
const joined = content
.map((block) => (block.type === 'text' ? block.text ?? '' : JSON.stringify(block)))
.join('\n');
return { error: true, output: joined || '(MCP error with no details)' };
}
const parts = content.map((block) => {
if (block.type === 'text') return block.text ?? '';
return JSON.stringify(block);
});
const joined = parts.join('\n');
if (joined.length > MAX_RESULT_BYTES) {
log?.warn({ tool: originalName, server: serverName, bytes: joined.length, cap: MAX_RESULT_BYTES }, 'mcp: result truncated');
return joined.slice(0, MAX_RESULT_BYTES) + '\n\n[truncated — MCP result exceeded size limit]';
}
return joined;
} catch (err) {
log?.warn({ err, tool: originalName, server: serverName }, 'mcp: callTool failed');
return {
error: true,
output: err instanceof Error ? err.message : 'MCP server unreachable',
};
}
}
/** Return all wrapped ToolDefs from all connected servers, flattened. */
export function getTools(): ToolDef<Record<string, unknown>>[] {
const all: ToolDef<Record<string, unknown>>[] = [];
for (const state of servers.values()) {
all.push(...state.tools);
}
return all;
}
/** Return status of each server (for debug/status endpoints). */
export function getMcpServers(): Array<{
name: string;
type: 'streamableHttp' | 'stdio';
toolCount: number;
connected: boolean;
}> {
return Array.from(servers.entries()).map(([name, state]) => ({
name,
type: state.type,
toolCount: state.tools.length,
connected: true,
}));
}
/**
* Graceful shutdown. For stdio servers, the SDK's transport.close() handles
* SIGTERM + timeout. For HTTP servers, close the transport.
*/
export async function shutdown(): Promise<void> {
const closePromises: Promise<void>[] = [];
for (const [name, state] of servers) {
closePromises.push(
(async () => {
try {
await state.transport.close();
log?.info({ server: name }, 'mcp: server transport closed');
} catch (err) {
log?.warn({ err, server: name }, 'mcp: error closing server transport');
}
})(),
);
}
await Promise.all(closePromises);
servers.clear();
toolToServer.clear();
}
// ---- Internal helpers ----
async function connectServer(entry: McpServerEntry): Promise<void> {
const { name, config } = entry;
const client = new Client({ name: 'boocode', version: '1.15.0' });
let transport: StreamableHTTPClientTransport | StdioClientTransport;
if (config.type === 'streamableHttp') {
transport = createHttpTransport(config);
} else {
transport = createStdioTransport(config);
}
await client.connect(transport);
const result = await client.listTools();
const mcpTools = (result.tools ?? []) as McpToolDef[];
const tools: ToolDef<Record<string, unknown>>[] = [];
for (const t of mcpTools) {
if (t.annotations?.readOnlyHint === false) {
log!.info({ tool: t.name, server: name }, 'mcp: skipping non-read-only tool');
continue;
}
const wrapped = wrapMcpTool(name, t);
tools.push(wrapped);
toolToServer.set(wrapped.name, name);
}
servers.set(name, { client, transport, tools, type: config.type });
log!.info(
{ server: name, type: config.type, count: tools.length, names: tools.map((t) => t.name) },
'mcp: server initialized',
);
}
function createHttpTransport(config: Extract<McpServerConfig, { type: 'streamableHttp' }>): StreamableHTTPClientTransport {
const requestInit: RequestInit = {};
if (config.headers && Object.keys(config.headers).length > 0) {
requestInit.headers = config.headers;
}
return new StreamableHTTPClientTransport(new URL(config.url), { requestInit });
}
function createStdioTransport(config: Extract<McpServerConfig, { type: 'stdio' }>): StdioClientTransport {
return new StdioClientTransport({
command: config.command,
args: config.args,
env: config.env,
stderr: 'pipe',
});
}
/** Wrap an MCP tool as a BooCode ToolDef with a server-name prefix. */
export function wrapMcpTool(
serverName: string,
mcpTool: McpToolDef,
): ToolDef<Record<string, unknown>> {
const prefixedName = `${serverName}_${mcpTool.name}`;
return {
name: prefixedName,
description: mcpTool.description ?? '',
inputSchema: z.record(z.unknown()),
jsonSchema: {
type: 'function' as const,
function: {
name: prefixedName,
description: mcpTool.description ?? '',
parameters: mcpTool.inputSchema ?? { type: 'object', properties: {} },
},
},
execute: async (input) => {
return callTool(prefixedName, input);
},
};
}
/** Exposed for unit tests — extract content from an MCP result. */
export function extractContent(
content: Array<{ type: string; text?: string; [key: string]: unknown }> | undefined,
isError?: boolean,
): unknown {
if (!content || content.length === 0) return '(no output)';
const parts = content.map((block) => {
if (block.type === 'text') return block.text ?? '';
return JSON.stringify(block);
});
const joined = parts.join('\n');
if (isError) {
return { error: true, output: joined || '(MCP error with no details)' };
}
return joined;
}
/** Exposed for unit tests — the read-only guard predicate. */
export function isToolReadOnly(annotations?: McpToolAnnotations): boolean {
return annotations?.readOnlyHint !== false;
}

View File

@@ -0,0 +1,78 @@
/**
* v1.15.0-mcp-multi: MCP config file schema + loader.
*
* Reads a JSON config file (default `/data/mcp.json`) that declares MCP
* servers — their transport type, connection parameters, and enabled state.
* Schema shape matches opencode's `mcpServers` key for copy-paste compat.
*/
import { readFileSync } from 'node:fs';
import { z } from 'zod';
import type { FastifyBaseLogger } from 'fastify';
// ---- Zod schema ----
const McpServerConfigSchema = z.discriminatedUnion('type', [
z.object({
type: z.literal('streamableHttp'),
url: z.string().url(),
headers: z.record(z.string()).optional(),
enabled: z.boolean().default(true),
}),
z.object({
type: z.literal('stdio'),
command: z.string().min(1),
args: z.array(z.string()).default([]),
env: z.record(z.string()).optional(),
enabled: z.boolean().default(true),
}),
]);
const McpConfigSchema = z.object({
mcpServers: z.record(z.string(), McpServerConfigSchema).default({}),
});
export type McpServerConfig = z.infer<typeof McpServerConfigSchema>;
export interface McpServerEntry {
name: string;
config: McpServerConfig;
}
// ---- Loader ----
/**
* Read and validate the MCP config file. Returns enabled servers only.
* File missing → log info, return []. Parse/validation error → log warn, return [].
*/
export function loadMcpConfig(configPath: string, log: FastifyBaseLogger): McpServerEntry[] {
let raw: string;
try {
raw = readFileSync(configPath, 'utf8');
} catch {
log.info(`mcp: config not found at ${configPath}, skipping`);
return [];
}
let json: unknown;
try {
json = JSON.parse(raw);
} catch (err) {
log.warn({ err }, `mcp: failed to parse ${configPath} as JSON`);
return [];
}
const result = McpConfigSchema.safeParse(json);
if (!result.success) {
log.warn({ errors: result.error.flatten().fieldErrors }, `mcp: invalid config at ${configPath}`);
return [];
}
const entries: McpServerEntry[] = [];
for (const [name, config] of Object.entries(result.data.mcpServers)) {
if (config.enabled) {
entries.push({ name, config });
}
}
return entries;
}

View File

@@ -21,6 +21,10 @@ import {
watchChanges,
getSemanticNeighborhoods,
getFrameworkAnalysis,
getBlastRadius,
getHotFiles,
getRoutes,
getMiddleware,
} from './tools/codecontext/index.js';
// v1.13.17-cross-repo-reads: cross-repo read grant request tool. Paired
// with the pause-on-pending-grant branch in inference/tool-phase.ts and the
@@ -651,7 +655,9 @@ export const askUserInput: ToolDef<AskUserInputInputT> = {
// of the system prompt, so any order drift would invalidate every cached
// turn. Single source of truth for ordering lives here — toolJsonSchemas()
// and TOOLS_BY_NAME inherit it.
export const ALL_TOOLS: ReadonlyArray<ToolDef<unknown>> = [
// v1.14.1-mcp-poc: changed from ReadonlyArray to let-bound mutable array
// so appendMcpTools() can push MCP-discovered tools at startup.
export let ALL_TOOLS: ToolDef<unknown>[] = [
viewFile as ToolDef<unknown>,
viewTruncatedOutput as ToolDef<unknown>,
listDir as ToolDef<unknown>,
@@ -678,6 +684,11 @@ export const ALL_TOOLS: ReadonlyArray<ToolDef<unknown>> = [
watchChanges as ToolDef<unknown>,
getSemanticNeighborhoods as ToolDef<unknown>,
getFrameworkAnalysis as ToolDef<unknown>,
// v1.16: codesight-merge tools. Backed by the same codecontext sidecar.
getBlastRadius as ToolDef<unknown>,
getHotFiles as ToolDef<unknown>,
getRoutes as ToolDef<unknown>,
getMiddleware as ToolDef<unknown>,
// v1.13.17-cross-repo-reads: paired with the pause-on-pending-grant
// branch in tool-phase.ts. Read-only — only ever READS files; the only
// state change is appending to sessions.allowed_read_paths via the
@@ -725,10 +736,23 @@ export const READ_ONLY_TOOL_NAMES = [
'request_read_access',
] as const;
export const TOOLS_BY_NAME: Record<string, ToolDef<unknown>> = Object.fromEntries(
export let TOOLS_BY_NAME: Record<string, ToolDef<unknown>> = Object.fromEntries(
ALL_TOOLS.map((t) => [t.name, t])
);
// v1.14.1-mcp-poc: append MCP-discovered tools at startup. Called once
// from index.ts after mcpClient.initialize(). Re-sorts ALL_TOOLS and
// rebuilds TOOLS_BY_NAME. READ_ONLY_TOOL_NAMES is not rebuilt because
// it's a const tuple used only for budget-tier checks; MCP tools are
// individually checked via their category at budget resolution time —
// they are all read_only by construction (the read-only guard in
// mcp-client.ts rejects any tool with readOnlyHint: false).
export function appendMcpTools(mcpTools: ToolDef<unknown>[]): void {
if (mcpTools.length === 0) return;
ALL_TOOLS = [...ALL_TOOLS, ...mcpTools].sort((a, b) => a.name.localeCompare(b.name));
TOOLS_BY_NAME = Object.fromEntries(ALL_TOOLS.map((t) => [t.name, t]));
}
// v1.13.15-tools: tiered tool loading. BOOCODE_TOOLS env var (`core` |
// `standard` | `all`) filters the agent's tool whitelist before LLM dispatch.
// Daily-driver token win on qwen3.6-35b-a3b — the 35B-A3B MoE benefits from

View File

@@ -0,0 +1,51 @@
import { z } from 'zod';
import type { ToolDef } from '../../tools.js';
import { callCodecontext, type CodecontextResponse } from '../../codecontext_client.js';
export const GetBlastRadiusInput = z.object({
file_path: z.string().trim().min(1),
});
export type GetBlastRadiusInputT = z.infer<typeof GetBlastRadiusInput>;
const DESCRIPTION =
'Returns all files that depend (transitively) on the given file, with depth tracking. ' +
'Use to assess the impact of changing a file — "what breaks if I modify this?" ' +
'Traverses the import graph in reverse via BFS. Results sorted by distance (closest dependents first).';
export async function executeGetBlastRadius(
input: GetBlastRadiusInputT,
projectPath: string,
fetcher: typeof fetch = fetch,
): Promise<CodecontextResponse> {
return callCodecontext(
{ toolName: 'get_blast_radius', args: { file_path: input.file_path }, projectPath },
fetcher,
);
}
export const getBlastRadius: ToolDef<GetBlastRadiusInputT> = {
name: 'get_blast_radius',
description: DESCRIPTION,
inputSchema: GetBlastRadiusInput,
jsonSchema: {
type: 'function',
function: {
name: 'get_blast_radius',
description: DESCRIPTION,
parameters: {
type: 'object',
properties: {
file_path: {
type: 'string',
description: 'Absolute or project-relative path to the file to analyze.',
},
},
required: ['file_path'],
additionalProperties: false,
},
},
},
async execute(input, projectRoot) {
return await executeGetBlastRadius(input, projectRoot);
},
};

View File

@@ -0,0 +1,50 @@
import { z } from 'zod';
import type { ToolDef } from '../../tools.js';
import { callCodecontext, type CodecontextResponse } from '../../codecontext_client.js';
export const GetHotFilesInput = z.object({
limit: z.number().int().min(1).max(100).optional(),
});
export type GetHotFilesInputT = z.infer<typeof GetHotFilesInput>;
const DESCRIPTION =
'Returns the most-imported files in the project, ranked by incoming import count. ' +
'Hot files are high-risk change targets — many other files depend on them. ' +
'Use to identify core modules and assess refactoring risk.';
export async function executeGetHotFiles(
input: GetHotFilesInputT,
projectPath: string,
fetcher: typeof fetch = fetch,
): Promise<CodecontextResponse> {
return callCodecontext(
{ toolName: 'get_hot_files', args: input.limit != null ? { limit: input.limit } : {}, projectPath },
fetcher,
);
}
export const getHotFiles: ToolDef<GetHotFilesInputT> = {
name: 'get_hot_files',
description: DESCRIPTION,
inputSchema: GetHotFilesInput,
jsonSchema: {
type: 'function',
function: {
name: 'get_hot_files',
description: DESCRIPTION,
parameters: {
type: 'object',
properties: {
limit: {
type: 'number',
description: 'Maximum number of files to return (default 20, max 100).',
},
},
additionalProperties: false,
},
},
},
async execute(input, projectRoot) {
return await executeGetHotFiles(input, projectRoot);
},
};

View File

@@ -0,0 +1,41 @@
import { z } from 'zod';
import type { ToolDef } from '../../tools.js';
import { callCodecontext, type CodecontextResponse } from '../../codecontext_client.js';
export const GetMiddlewareInput = z.object({});
export type GetMiddlewareInputT = z.infer<typeof GetMiddlewareInput>;
const DESCRIPTION =
'Detects middleware registrations in the project. Identifies auth, CORS, rate-limit, ' +
'security-headers, error-handler, logging, and validation middleware by analyzing ' +
'import names (@fastify/cors, helmet, etc.) and registration patterns ' +
'(app.register, app.addHook, app.setErrorHandler).';
export async function executeGetMiddleware(
_input: GetMiddlewareInputT,
projectPath: string,
fetcher: typeof fetch = fetch,
): Promise<CodecontextResponse> {
return callCodecontext({ toolName: 'get_middleware', args: {}, projectPath }, fetcher);
}
export const getMiddleware: ToolDef<GetMiddlewareInputT> = {
name: 'get_middleware',
description: DESCRIPTION,
inputSchema: GetMiddlewareInput,
jsonSchema: {
type: 'function',
function: {
name: 'get_middleware',
description: DESCRIPTION,
parameters: {
type: 'object',
properties: {},
additionalProperties: false,
},
},
},
async execute(input, projectRoot) {
return await executeGetMiddleware(input, projectRoot);
},
};

View File

@@ -0,0 +1,50 @@
import { z } from 'zod';
import type { ToolDef } from '../../tools.js';
import { callCodecontext, type CodecontextResponse } from '../../codecontext_client.js';
export const GetRoutesInput = z.object({
framework: z.string().trim().optional(),
});
export type GetRoutesInputT = z.infer<typeof GetRoutesInput>;
const DESCRIPTION =
'Extracts HTTP routes from the project via tree-sitter AST analysis. ' +
'Detects Fastify and Express route registrations (app.get, app.post, app.route, router.use, etc.) ' +
'with method, path, file, line number, and inferred tags (db, auth, cache). ' +
'Optional framework filter narrows to "fastify" or "express".';
export async function executeGetRoutes(
input: GetRoutesInputT,
projectPath: string,
fetcher: typeof fetch = fetch,
): Promise<CodecontextResponse> {
const args: Record<string, unknown> = {};
if (input.framework) args.framework = input.framework;
return callCodecontext({ toolName: 'get_routes', args, projectPath }, fetcher);
}
export const getRoutes: ToolDef<GetRoutesInputT> = {
name: 'get_routes',
description: DESCRIPTION,
inputSchema: GetRoutesInput,
jsonSchema: {
type: 'function',
function: {
name: 'get_routes',
description: DESCRIPTION,
parameters: {
type: 'object',
properties: {
framework: {
type: 'string',
description: 'Filter to a specific framework: "fastify" or "express". Omit for all.',
},
},
additionalProperties: false,
},
},
},
async execute(input, projectRoot) {
return await executeGetRoutes(input, projectRoot);
},
};

View File

@@ -1,5 +1,5 @@
// v1.12 Track B.2: codecontext tool registry. Re-exports the 8 ToolDefs so
// tools.ts can pull them in one line.
// codecontext tool registry. Re-exports ToolDefs so tools.ts can pull them
// in one line. v1.12: 8 original tools. v1.16: +4 codesight-merge tools.
export { getCodebaseOverview } from './get_codebase_overview.js';
export { getFileAnalysis } from './get_file_analysis.js';
@@ -9,3 +9,7 @@ export { getDependencies } from './get_dependencies.js';
export { watchChanges } from './watch_changes.js';
export { getSemanticNeighborhoods } from './get_semantic_neighborhoods.js';
export { getFrameworkAnalysis } from './get_framework_analysis.js';
export { getBlastRadius } from './get_blast_radius.js';
export { getHotFiles } from './get_hot_files.js';
export { getRoutes } from './get_routes.js';
export { getMiddleware } from './get_middleware.js';

View File

@@ -106,6 +106,9 @@ export interface Agent {
// agent's toolset (30 if all tools are read-only, 10 otherwise) or 15 for
// raw chat with no agent.
max_tool_calls: number | null;
// v1.14.0: per-agent step cap for the outer inference loop. null means
// bounded only by MAX_STEPS (200). 0 means "no tool calls allowed."
steps: number | null;
}
// One entry per malformed `## Name` block. Per-block errors don't fail the

View File

@@ -7,7 +7,7 @@
"rootDir": "src",
"lib": ["ES2022"],
"types": ["node"],
"declaration": false,
"declaration": true,
"sourceMap": true
},
"include": ["src/**/*"],

View File

@@ -73,6 +73,9 @@ export interface Agent {
// the agent's toolset (30 for all read-only, 10 otherwise) or 15 for raw
// chat with no agent.
max_tool_calls: number | null;
// v1.14.0: per-agent step cap for the outer inference loop. null means
// bounded only by MAX_STEPS (200). 0 means "no tool calls allowed."
steps: number | null;
}
export interface AgentParseError {

View File

@@ -1,6 +1,6 @@
# BooCode v1.x — Roadmap
Last updated: 2026-05-22
Last updated: 2026-05-23
> **Companion doc:** `boocode_code_review.md` holds the full external-repo inventory, lift rationale, and license analysis. This document is the canonical source for shipping state, version ordering, and what's planned vs. shipped.
@@ -27,7 +27,7 @@ External code lifted from / referenced in: see `boocode_code_review.md` for full
-----
## Shipped (status as of 2026-05-22)
## Shipped (status as of 2026-05-23)
|Version |Theme |Tag |
|-----------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------|
@@ -72,9 +72,9 @@ External code lifted from / referenced in: see `boocode_code_review.md` for full
-----
### Shipped (v1.13.x — written 2026-05-22, retagged same day)
### Shipped (v1.13.x — strangler-fig closed 2026-05-23)
All v1.13.x batches were retagged to the `vMAJOR.MINOR.PATCH-slug` scheme on 2026-05-22. `CHANGELOG.md` is the canonical per-tag record (slug describes what shipped; tag name alone recalls the batch). Tip is `v1.13.14-skills-audit` (`0fa46cd`); the next batch is `v1.13.15-codecontext-synth` (this batch, tag pending). Tags in chronological order:
All v1.13.x batches use the `vMAJOR.MINOR.PATCH-slug` tag scheme adopted 2026-05-22. `CHANGELOG.md` is the canonical per-tag record (slug describes what shipped; tag name alone recalls the batch). The v1.13.x line ran 21 batches over a single intense window; the umbrella `v1.13` tag sits on `211e903` (same commit as `v1.13.20-drop-legacy-cols`), marking the strangler-fig closed. Tags in chronological order:
- `v1.13.0-ai-sdk-v6` — AI SDK v6 migration; `streamCompletion` adapter; `messages_with_parts` view; reasoning_parts end-to-end
- `v1.13.1-cleanup-bundle``statement_timeout='30s'`, alpha-sorted tool registry, 60s stuck-row sweeper, `experimental_repairToolCall` pass-through
@@ -93,115 +93,13 @@ All v1.13.x batches were retagged to the `vMAJOR.MINOR.PATCH-slug` scheme on 202
- `v1.13.14-skills-audit` — 26 skills vendored + audited via 5 parallel agent teams; 14 kept, 11 dropped, 1 migrated to BOOCHAT.md/BOOCODER.md
- `v1.13.15-codecontext-synth` — forced second-inference synthesis pass for codecontext overview tools (truncation-aware extraction; auto-fetched top-N files + project docs; 32k payload-budget contract preserved)
- `v1.13.16-xml-parser` — Anthropic `<invoke>` parser support + Levenshtein-based unknown-tool recovery hints (qwen3.6 drift to Claude Code-style tool names like `read_file`); xml-parser test coverage
- `v1.13.17-cross-repo-reads``request_read_access` tool + per-session `allowed_read_paths` grants; `pathGuard` extended with `extraRoots`; pause/resume reuses the `ask_user_input` mechanism
- `v1.13.18-codecontext-file-path``resolveProjectPath` in `codecontext_client.ts` realpath-resolves `file_path` arg the same way `target_dir` was; closes the silent-fail path the sidecar exhibited on relative paths
- `v1.13.19-html-artifact-panes` — pane-based artifact viewer with on-request HTML; `<!DOCTYPE html>` detection adds `message_parts.kind='html_artifact'` row; Markdown + HTML panes both open via "Open in pane" affordance; iframe sandbox `allow-scripts allow-clipboard-write allow-downloads` (no `allow-same-origin`, `srcDoc`); CSP `connect-src 'none'`. Scope-revised mid-design from auto-bias-to-HTML to Markdown-default / HTML-on-request
- `v1.13.20-drop-legacy-cols` — final strangler-fig step. Drops `messages.tool_calls` + `tool_results` columns; 10 dual-write sites removed (recon caught 2 beyond the original roadmap inventory); `messages_with_parts` view simplified to parts-only subselects via `CREATE OR REPLACE` before the column DROPs (Postgres ordering constraint). Adversarial-review catch: `discard_stale` had a `RETURNING tool_calls, tool_results` clause; fixed via two-step UPDATE-then-SELECT-from-view. `Message` API type retains the fields — view synthesizes them from parts so the wire shape is unchanged
- `v1.13`**umbrella tag on the same commit as v1.13.20.** Marks the AI SDK v6 + parts-table migration complete
The remaining strangler-fig final step (drop `messages.tool_calls` + `tool_results` columns) is still pending under its old `v1.13.2` working name; will get a new tag slug when scoped.
## In flight / next (v1.13.x cleanup line)
Five more single-dispatch batches before the strangler-fig closes. Each ships independently with its own smoke and rollback surface. **Do not fold.** Order is locked:
### v1.13.8 — system-prompt prefix stability verify-and-measure (REFRAMED, 2026-05-22)
**Original plan:** add a `system_prompt_cache` DB table keyed by `(agent_id, project_id, skills_version)`, mtime-invalidated.
**Why reframed:** recon disproved the premise. `apps/server/src/services/system-prompt.ts:buildSystemPrompt` already runs over mtime-cached inputs at the file layer:
- BOOCHAT.md / BOOCODER.md cached in `system-prompt.ts:25` (`cachedGuidance`, keyed by mtime)
- global + per-project AGENTS.md cached in `agents.ts:245` (`safeStat` pattern, 60s TTL)
- `session.system_prompt` / `project.default_system_prompt` are DB scalars (byte-stable until edited)
- BASE_SYSTEM_PROMPT is a hardcoded template with `${projectPath}` interpolation
Output assembly is a microsecond pure-string concat with no I/O. Skills aren't in the prefix (runtime discovery via `skill_find`). Tools live in a separate request body field, alpha-sorted by v1.13.3. **In theory the prefix is already byte-stable across turns; nothing has measured it.**
**New scope — instrumentation only, no cache:**
1. SHA-256 fingerprint of `buildSystemPrompt`'s output logged per turn at `level=info`, msg `prefix-fingerprint`, with project_id / agent_id / session_id / prefix_hash / prefix_length / mtime fields.
2. Module-level `Map<sessionId, lastHash>` observer. On hash change for a known session → emit `prefix-drift` at `level=warn` with `prev_hash`, `new_hash`, and a field-level `changed_inputs` diff.
3. Unit-level byte-stability assertion in `system-prompt.test.ts`: two consecutive `buildSystemPrompt` calls with the same inputs return byte-identical strings.
**Decision criterion:** smoke 5 turns in a fresh session. 5 identical hashes + zero drift logs → close v1.13.8 as no-op, **drop the DB cache plan permanently**, move to v1.13.9. If drift surfaces → characterize the failure mode in a follow-up batch (the answer may not be a cache at all).
**Doctrine:** matches the v1.13.6 audit pattern. Don't add infrastructure without a proven cache miss. The v1.12.0 mtime caches at the input layer plus alpha tool ordering at the request body layer already address the load-bearing cache-stability surfaces.
**Dispatch brief:** `handoff_v1.13.8_prefix_verify.md`.
**Estimated:** ~95 LoC (system-prompt.ts + small `getAgentsMtimes` accessor in agents.ts + 3 new tests).
### v1.13.9 — compaction overflow trigger formula
opencode pattern: `0.85 * ctx_max` early trigger (not at 100% saturation). Reduces tail-loss risk and gives compaction a safer window. Tiny change but tied to v1.13.4's tier logic — sequence matters.
**Lift source:** `anomalyco/opencode` `session/overflow.ts`.
**Estimated:** ~30 LoC.
### v1.13.10 — per-tool token cost accounting
Rolling average per tool, surfaced in AgentPicker tooltip + agent-pick decisions. Backend tracks `(tool_name, prompt_tokens_in, completion_tokens_out)` per call; surfaces a 100-call rolling mean. Frontend reads it for tool-cost hints. **Depends on v1.13.7's `includeUsage` fix** — without real token numbers in DB rows, the rolling average is empty.
**Estimated:** ~250 LoC.
### v1.13.11 — WebSocket frame typing
Zod schemas validated both ends. Catches the recurring class of bug that drove the 2026-05-21 debugging spike (silent protocol drift). Upfront work that pays back every time the protocol changes. `chat_status`, `usage`, `parts_appended`, `session_workspace_updated`, `tool_running` — every frame gets a Zod schema, every send/receive site validates.
**Estimated:** ~300 LoC.
### v1.13.12 — skills audit pass (NEW, 2026-05-22)
**Goal:** apply the rules→recipes split (per Codeminer42 activation-gap data: plain skills invoke 6% in clean multi-turn, `CLAUDE.md`/`AGENTS.md` is 100% present) to BooCode's 7 vendored v1.12 skills. Sort each into: (a) move to `AGENTS.md` as always-true rule, (b) keep as recipe invoked via `/skill <name>`, (c) move bulky context into `references/` flat subdirectory inside the skill, (d) delete (Claude already does it reliably).
**Scope:**
1. **Audit each of the 7 vendored skills against the 4-way split.** Most workflow-rule content ("always do X before Y", "never do Z") moves to `AGENTS.md` since it should be 100% present. Recipe content ("here's how to scaffold a component", "here's the release checklist") stays as skill, gets `context: fork` if heavy.
1. **Adopt Anthropic best-practices conventions** for any skills that remain after audit: gerund names (`scaffolding-components`, not `component-helper`), SKILL.md ≤500 lines, references one level deep, third-person imperative voice, MCP tool references in `ServerName:tool_name` format, no Windows-style paths, no time-sensitive info, consistent terminology, no "voodoo constants."
1. **Run each remaining skill through the 4-step validation protocol** from `mgechev/skills-best-practices` (Discovery → Logic → Edge Case → Architecture Refinement) using a fresh Claude chat per step. Prompts are paste-ready; ~10 minutes per skill.
1. **Install `skillgrade` on Sam's host** (`npm i -g skillgrade`). For each remaining skill, write a minimal `eval.yaml` with 23 tasks and run `skillgrade --smoke` (5 trials, ~5 min) to confirm the skill triggers when expected and produces correct output. **Likely outcome: some skills show 020% trigger rate — confirms they belong in AGENTS.md, not as skills.**
1. **Document the rules→recipes split as a BooCode convention** in `BOOCODER.md` / `BOOCHAT.md`. Future-proofs against re-adding workflow rules as skills.
**Lift sources:**
- `blog.codeminer42.com/stop-putting-best-practices-in-skills/` — empirical 6%/33%/66%/100% invocation-rate data with Vercel-style multi-turn methodology. The activation-gap framing.
- `mgechev/skills-best-practices` (25 stars, MIT) — 4-step validation protocol with paste-ready prompts. Directory structure conventions.
- `mgechev/skillgrade` (132 stars, MIT) — agent-agnostic skill eval framework. `eval.yaml` task+grader schema. Smoke/reliable/regression presets.
- `platform.claude.com/docs/en/agents-and-tools/agent-skills/best-practices` — canonical Anthropic standard. 500-line ceiling, gerund naming, progressive disclosure patterns, MCP tool reference format, verification checklist.
**Dependencies:** none (the 7 v1.12 skills already exist; this is an audit pass on shipped material). Can ship at any point in the v1.13.x line.
**Estimated:** zero code changes, ~one evening of audit work, plus skillgrade install. Per-skill eval.yaml authoring is ~30 min per skill including the 4-step validation. Total roughly 56 hours of focused work for all 7 skills.
### v1.13.2 — drop legacy columns (final phase of strangler-fig)
**Wait at least one week of production traffic on v1.13.1 before shipping.** The dual-write is rollback insurance. Drop the columns and that rollback is gone.
**Verification query before shipping:**
```sql
SELECT
COUNT(*) FILTER (WHERE m.tool_calls IS NOT NULL AND NOT EXISTS (
SELECT 1 FROM message_parts p WHERE p.message_id = m.id AND p.kind = 'tool_call'
)) AS missing_tool_call_parts,
COUNT(*) FILTER (WHERE m.tool_results IS NOT NULL AND NOT EXISTS (
SELECT 1 FROM message_parts p WHERE p.message_id = m.id AND p.kind = 'tool_result'
)) AS missing_tool_result_parts
FROM messages m
WHERE m.created_at > '2026-05-22'::timestamptz;
```
Both columns must read 0.
**Scope (~150 LoC, mostly deletions):**
1. Remove dual-write from every v1.13.0 site: `tool-phase.ts` (3 sites), `finalizeCompletion`, `skills.ts` (2 sites), `messages.ts` answer flow, `chats.ts` (fork). Keep only the parts write.
1. Simplify `messages_with_parts` view — drop COALESCE fallbacks since legacy columns are about to disappear.
1. `ALTER TABLE messages DROP COLUMN tool_calls, DROP COLUMN tool_results`.
1. Remove `tool_calls`/`tool_results` fields from `Message` API type. API boundary unchanged (frontend already reads parts-derived values).
1. Drop the stale `messages_status_check` cleanup DO block from v1.12.1 schema if still present.
1. Update test fixtures in `inference.test.ts` and `compaction.test.ts` to construct parts instead of inline `tool_calls: null, tool_results: null` literals. ~30 fixture rewrites.
After v1.13.2 ships, tag the umbrella `v1.13` on the same commit (or on -C — Sam's call).
**Shipped as `v1.13.20-drop-legacy-cols` on 2026-05-23 with umbrella `v1.13` tagged on the same commit.** Slug renamed at ship time (the "v1.13.2" planning name predated the patch-monotonic-per-minor convention). Calendar wait dropped — single-user self-hosted, no production rollback constraint. Recon caught 2 additional dual-write sites beyond the roadmap's 8 (chats.ts fork-clone + extras in tool-phase.ts) and an additional fixture file (`tool_cost_stats.test.ts`) with a direct legacy-column INSERT. Adversarial review caught a `RETURNING tool_calls, tool_results` clause in the `discard_stale` endpoint that the green test suite missed — fixed by two-step UPDATE-then-SELECT-from-view so the parts-synthesized fields keep flowing on the response. Type-pruning step on `Message.tool_calls` / `Message.tool_results` skipped (the view still populates them from parts; preserving the API contract was simpler than ripping it).
The v1.13.x line is closed. Three batches still sit in the **In flight** column conceptually but none of them are v1.13.x scope: **live-smoke of v1.13.19** (manual browser exercise of the artifact panes — five minutes, independent), and the two v1.14 branches below. Independent siblings (`v1.14.x-mcp`, `v1.14.x-html`, `v1.16`) can ship in any order relative to v1.14 itself.
-----
@@ -510,8 +408,12 @@ term.indifferentketchup.com → booterm :9501 (or routed under code.
- **v1.13.12-ws-schemas:** none (Zod schemas + wrappers in TS, no DB)
- **v1.13.13-ws-publish:** none (publish-site conversion + protocol-drift fix in `compaction.ts`, no DB)
- **v1.13.14-skills-audit:** none (skills + AGENTS.md migration into git via `.gitignore` negation patterns; no DB)
- **v1.13.15-codecontext-synth (this batch, tag pending):** `message_parts.kind` CHECK constraint extended with `'synthesis'` value (DROP + DO $$ pg_constraint idempotency-guarded re-add)
- **(column drop, pending — old working name v1.13.2):** drop `messages.tool_calls`, `messages.tool_results`; simplify `messages_with_parts` view
- **v1.13.15-codecontext-synth:** `message_parts.kind` CHECK constraint extended with `'synthesis'` value (DROP + DO $$ pg_constraint idempotency-guarded re-add)
- **v1.13.16-xml-parser:** none (parser change + new `tool-suggestions.ts` helper in TS, no DB)
- **v1.13.17-cross-repo-reads:** `sessions.allowed_read_paths text[] NOT NULL DEFAULT ARRAY[]::text[]` (per-session cross-repo read grants)
- **v1.13.18-codecontext-file-path:** none (path resolver in `codecontext_client.ts`, no DB)
- **v1.13.19-html-artifact-panes:** `message_parts.kind` CHECK constraint extended with `'html_artifact'` value (same v1.13.15 pattern)
- **v1.13.20-drop-legacy-cols:** `ALTER TABLE messages DROP COLUMN tool_calls, DROP COLUMN tool_results` (the strangler-fig's final phase). `messages_with_parts` view rewritten to parts-only subselects via `CREATE OR REPLACE VIEW` BEFORE the drops (Postgres ordering constraint). v1.12.1 `messages_status_check`/`messages_role_check` cleanup block removed (one-shot effective long ago)
- **v1.14:** `agents.steps` column (or AGENTS.md parser extension; no DB if file-only)
- **v1.14.x-mcp (NEW):** none — single-server MCP-client PoC is config-only at first, no schema change
- **v1.14.x-html (NEW):** `message_parts.kind` CHECK constraint extended with `'html_artifact'` value
@@ -621,7 +523,17 @@ Earlier May 18 chat recommended Option A (thin orchestration shell over OpenCode
### v1.13.x cleanup line locked (2026-05-22)
After the 2026-05-22 retag, the v1.13.x cleanup line in `vMAJOR.MINOR.PATCH-slug` form is **v1.13.0-ai-sdk-v6 ✅ → v1.13.1-cleanup-bundle ✅ → v1.13.2-compaction-prune ✅ → v1.13.3-truncate ✅ → v1.13.4-reasoning-fix ✅ → v1.13.5-stability-bundle ✅ → v1.13.6-prefix-stability ✅ → v1.13.7-compaction-trigger ✅ → v1.13.8-tool-cost ✅ → v1.13.9-agentlint ✅ → v1.13.10-openspec ✅ → v1.13.11-tools ✅ → v1.13.12-ws-schemas ✅ → v1.13.13-ws-publish ✅ → v1.13.14-skills-audit ✅ → v1.13.15-codecontext-synth ✅ → v1.13.16-xml-parser ✅ → column drop (final, pending — old working name v1.13.2)**. **Do not fold.** Smoke isolation matters: each batch has a distinct rollback surface, and bisecting a 750-LoC merge across four unrelated changes is worse than four separate dispatches.
The v1.13.x cleanup line shipped 21 batches over a single intense window in `vMAJOR.MINOR.PATCH-slug` form: **v1.13.0-ai-sdk-v6 ✅ → v1.13.1-cleanup-bundle ✅ → v1.13.2-compaction-prune ✅ → v1.13.3-truncate ✅ → v1.13.4-reasoning-fix ✅ → v1.13.5-stability-bundle ✅ → v1.13.6-prefix-stability ✅ → v1.13.7-compaction-trigger ✅ → v1.13.8-tool-cost ✅ → v1.13.9-agentlint ✅ → v1.13.10-openspec ✅ → v1.13.11-tools ✅ → v1.13.12-ws-schemas ✅ → v1.13.13-ws-publish ✅ → v1.13.14-skills-audit ✅ → v1.13.15-codecontext-synth ✅ → v1.13.16-xml-parser ✅ → v1.13.17-cross-repo-reads ✅ → v1.13.18-codecontext-file-path ✅ → v1.13.19-html-artifact-panes ✅ → v1.13.20-drop-legacy-cols ✅** → umbrella `v1.13`. **Do not fold** was the discipline — each batch has a distinct rollback surface, and bisecting a 750-LoC merge across four unrelated changes is worse than four separate dispatches. Held throughout; CHANGELOG.md is the per-tag canonical record.
### Numbering and scope-revision discipline during v1.13.x (2026-05-23)
The v1.13.x line ran 21 batches; planned-vs-shipped numbering diverged for half of them, and three batches had material scope revisions mid-design. Pattern that emerged and is worth carrying forward:
- **Patch numbers are assigned at ship time, not in planning.** The proposal/openspec folder uses a planning slug (e.g. `v1.14.x-html-artifact-panes`); the final tag uses a concrete patch monotonic-per-minor (e.g. `v1.13.19-html-artifact-panes`). Avoids the "we said v1.13.8 but actually shipped seventh" confusion that ate two retrospective passes on the roadmap.
- **Scope-revise the proposal before dispatching.** v1.13.19-html-artifact-panes flipped mid-design from "auto-bias to HTML for >100 lines" to "Markdown default, HTML on request" — the proposal got rewritten before recon. Far cheaper than discovering the wrong approach in implementation. The "brainstorm before code" discipline.
- **Recon-first dispatch finds 2530% more sites than the roadmap inventory.** v1.13.20 recon caught 2 extra dual-write sites (chats.ts fork-clone + 2 in tool-phase.ts) and an extra fixture file. v1.13.19 recon corrected which `Pane` type to extend. Skipping recon to save a step doesn't.
- **Adversarial reviews catch what test suites miss.** v1.13.19 reviewer caught silent error-promotion in `openInPane`; v1.13.20 reviewer caught a `RETURNING tool_calls, tool_results` clause that crashes in production but slips past green tests. Both are routine code-reviewer dispatches; both saved a same-day hotfix. **Two-stage review (spec then quality) is non-negotiable when shipping fast.**
- **Calendar-gated waits are production-safety hedges that don't apply here.** v1.13.20 originally said "wait one week of production traffic on v1.13.1 before dropping columns." Sam called it out: single-user self-hosted, no rollback constraint, code-level audit + DB COUNT query is the actual safety check. Dropped the wait. Don't ritualize production-grade hedges in a single-user codebase.
### v1.13 retrospective (what shipped)
@@ -634,7 +546,21 @@ After the 2026-05-22 retag, the v1.13.x cleanup line in `vMAJOR.MINOR.PATCH-slug
- **v1.13.5** — opencode truncate.ts port + view_truncated_output tool. Tagged on `f8fc5db`.
- **v1.13.6** — compaction head-assembly audit + reasoning fix. Closed the Q3 reasoning gap from v1.13.1-C. Tagged on `81d837c`.
- **v1.13.7** — stability bundle: includeUsage fix + trim guards + payload filter + budget bump. Surfaces tokens (closes a v1.13.1-A latent regression where `result.usage` resolved empty), kills the empty-bubble + ActionRow noise between tool calls on single-tool-call turns, and unblocks Continue after cap-hit on chats that have trailing empty/failed assistants.
- **v1.13.2 deferred** — at least one week of production traffic on v1.13.1 before dropping legacy columns. Dual-write is rollback insurance.
- **v1.13.6 (numbering re-aligned)** — system-prompt prefix verify-and-measure batch (originally numbered v1.13.8 in the planning doc). Reframed mid-design from "add a `system_prompt_cache` table" to "instrument-and-prove" after recon showed input-layer mtime caches already achieve byte-stable prefixes. Smoke confirmed zero drift across 5 turns; dropped the planned DB table.
- **v1.13.7-compaction-trigger** — 0.85×ctx_max early trigger (planned as v1.13.8 / v1.13.9).
- **v1.13.8-tool-cost** — `tool_cost_stats` SQL view + AgentPicker tooltip surfacing (planned as v1.13.9 / v1.13.10).
- **v1.13.9-agentlint** — instruction-file AgentLint pass (planned as part of v1.13.11 skills audit; split into its own batch when it grew larger than fitting).
- **v1.13.10-openspec** — `openspec/changes/<slug>/{proposal,tasks,design}.md` batch-doc structure adoption.
- **v1.13.11-tools** — tiered tool loading via `BOOCODE_TOOLS=core|standard|all` env (~30 LoC; was a far-future optional item, slotted in).
- **v1.13.12-ws-schemas** + **v1.13.13-ws-publish** — Zod schemas for all 27 wire-format frames, `publishFrame`/`publishUserFrame` wrappers, ~80 publish sites converted (planned as v1.13.10 / v1.13.11).
- **v1.13.14-skills-audit** — 26 skills vendored + audited via 5 parallel agent teams; 14 kept, 11 dropped, 1 migrated to BOOCHAT.md/BOOCODER.md. Codeminer42 rules-vs-recipes framing applied.
- **v1.13.15-codecontext-synth** — forced second-inference synthesis pass for codecontext overview tools (truncation-aware extraction; auto-fetched top-N files + project docs under 32k payload budget).
- **v1.13.16-xml-parser** — Anthropic `<invoke>` parser support + Levenshtein unknown-tool recovery hints (qwen3.6 drift to Claude Code-style tool names).
- **v1.13.17-cross-repo-reads** — `request_read_access` tool + per-session `allowed_read_paths` grants; `pathGuard` extraRoots; reuses the `ask_user_input` pause/resume mechanism.
- **v1.13.18-codecontext-file-path** — `resolveProjectPath` in `codecontext_client.ts` realpath-resolves `file_path` the same way `target_dir` was already resolved.
- **v1.13.19-html-artifact-panes** — pane-based artifact viewer (Markdown default + HTML on request). Scope-revised mid-design from auto-bias-HTML to Markdown-default. `<!DOCTYPE html>` detection adds `message_parts.kind='html_artifact'` row; iframe sandbox `allow-scripts allow-clipboard-write allow-downloads` (no `allow-same-origin`); CSP `connect-src 'none'` + `X-Content-Type-Options: nosniff` + `Content-Security-Policy: sandbox` defense-in-depth. Pane state is reference-only — content fetched on mount to keep jsonb small.
- **v1.13.20-drop-legacy-cols** — final strangler-fig step. 10 dual-write sites stripped (recon caught 2 beyond the original v1.13.2 inventory). `messages_with_parts` simplified to parts-only via `CREATE OR REPLACE` before column DROPs (Postgres ordering constraint). Adversarial-review catch: `discard_stale` had `RETURNING tool_calls, tool_results` — fixed via two-step UPDATE-then-SELECT-from-view. `Message` type retains the fields, populated by the view. v1.12.1 cleanup DO block removed.
- **`v1.13` umbrella** — tagged on the same commit as v1.13.20 (`211e903`). AI SDK v6 + parts-table migration complete.
### Pre-v1.13 architectural decisions (still load-bearing)

View File

@@ -59,6 +59,7 @@ Rules:
## Refactorer
---
temperature: 0.3
steps: 5
tools: [find_files, get_codebase_overview, get_dependencies, get_file_analysis, get_framework_analysis, get_semantic_neighborhoods, get_symbol_info, grep, list_dir, search_symbols, view_file, watch_changes]
description: Proposes refactors for clarity, deduplication, or decoupling. Read-only — outputs plans, not edits.
---
@@ -97,6 +98,7 @@ Codecontext usage:
## Architect
---
temperature: 0.5
steps: 20
tools: [find_files, get_codebase_overview, get_dependencies, get_file_analysis, get_framework_analysis, get_semantic_neighborhoods, get_symbol_info, grep, list_dir, search_symbols, view_file, watch_changes]
description: Designs new features, modules, or architectural changes. Outputs a build plan.
---

9
data/mcp.json Normal file
View File

@@ -0,0 +1,9 @@
{
"mcpServers": {
"context7": {
"type": "streamableHttp",
"url": "https://mcp.context7.com/mcp",
"enabled": false
}
}
}

View File

@@ -9,7 +9,7 @@ services:
environment:
CODECONTEXT_URL: http://codecontext:8080
CONTAINER_GUIDANCE_FILE: /app/BOOCHAT.md
DATABASE_URL: postgres://boocode:${POSTGRES_PASSWORD}@boocode_db:5432/boocode
DATABASE_URL: postgres://boocode:${POSTGRES_PASSWORD}@boocode_db:5432/boochat
volumes:
- /opt:/opt
- /opt/projects:/opt/projects:rw
@@ -41,7 +41,7 @@ services:
environment:
NODE_ENV: production
PORT: 3000
DATABASE_URL: postgres://boocode:${POSTGRES_PASSWORD}@boocode_db:5432/boocode
DATABASE_URL: postgres://boocode:${POSTGRES_PASSWORD}@boocode_db:5432/boochat
volumes:
- /opt:/opt:rw
- /home/samkintop:/home/samkintop:rw
@@ -50,6 +50,28 @@ services:
networks:
- boocode_net
boocoder:
build:
context: .
dockerfile: apps/coder/Dockerfile
container_name: boocoder
restart: unless-stopped
ports:
- "100.114.205.53:9502:3000"
env_file: .env
environment:
CONTAINER_GUIDANCE_FILE: /app/BOOCODER.md
DATABASE_URL: postgres://boocode:${POSTGRES_PASSWORD}@boocode_db:5432/boochat
volumes:
- /opt:/opt:rw
- /opt/projects:/opt/projects:rw
- ./data:/data
- /opt/boocode/BOOCODER.md:/app/BOOCODER.md:ro
depends_on:
- boocode_db
networks:
- boocode_net
boocode_db:
image: postgres:16-alpine
container_name: boocode_db
@@ -57,7 +79,7 @@ services:
environment:
POSTGRES_USER: boocode
POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
POSTGRES_DB: boocode
POSTGRES_DB: boochat
ports:
- "127.0.0.1:5500:5432"
volumes:

View File

@@ -0,0 +1,72 @@
# v1.14.0-outer-loop — design decisions
Answers to the dispatch's blocking questions, resolved 2026-05-23.
## D1. Step cap — what replaces MAX_TOOL_LOOP_DEPTH?
`MAX_TOOL_LOOP_DEPTH` never existed — no hard recursion depth guard was ever in the codebase. Safety came from budget (50 tool calls) + doom-loop (3 identical calls).
**Decision:** introduce `MAX_STEPS = 200` as a hard ceiling. Per-agent cap via `agent.steps` is the primary knob. Resolution: `effectiveCap = Math.min(agent.steps ?? Infinity, MAX_STEPS)`.
**Rationale:** Sam reports BooChat gets stuck at 50 tool calls (the budget) too often. The step cap should be generous — 200 is 4x the current de-facto ceiling. Budget (50 tool calls total across all steps) remains a separate concern and is not changed in this batch.
Note: "step" ≠ "tool call." One step = one stream iteration that may produce multiple parallel tool calls. Budget counts individual tool calls; step cap counts iterations. At 200 steps with average 1-2 tool calls per step, the budget (50) will fire well before the step cap in most scenarios. The step cap is a safety ceiling for cases where the model makes many 1-tool-call iterations.
## D2. step_finish — emit or not?
**Decision:** No `step_finish` part. The next `step_start` (or assistant message completion) implicitly ends the previous step.
**Rationale:** opencode only emits `step_start`. Less noise in parts, simpler code. If UI ever needs step durations, compute from the timestamps of consecutive `step_start` parts.
## D3. Step-cap hit — sentinel or quiet?
**Decision:** Write a sentinel summary on step-cap hit. Visible to the user in chat, same as budget-exhaustion's `runCapHitSummary`.
**Implementation:** Extend `runCapHitSummary` to accept a `reason: 'budget' | 'step_cap'` parameter (or add a parallel `runStepCapSummary`). The sentinel metadata kind stays `cap_hit` — frontend `CapHitSentinel` component already renders it. The sentinel's text distinguishes the two cases ("Tool budget exhausted" vs "Step limit reached").
## D4. agent.steps = 0
**Decision:** `steps: 0` means "no tool calls allowed." The loop body never executes. The assistant can only respond with text.
**Implementation:** When `effectiveCap === 0`, skip the loop entirely. Stream the first assistant turn (text-only), finalize, return. The model receives no tools in the request payload when `steps: 0` (or equivalently, tools are passed but the loop never enters the tool-execution branch).
Actually, cleaner: `steps: 0` means the loop cap is 0. The while condition `stepNumber < effectiveCap` is false on the first check. The stream phase still runs (the model produces a text response), but if it emits tool calls they're ignored and the turn finalizes as text-only. This may produce a confusing response if the model's text references tool results it never got — but `steps: 0` is an explicit constraint the agent author chose. Document in AGENTS.md parser validation.
## D5. Synthesis success terminates the loop?
**Decision:** Yes. `break` out of the loop after synthesis success. Preserves current behavior (synthesis replaces the recursive call; no further iterations).
**Rationale:** The synthesis pass produces a self-contained summary turn. Continuing the loop after synthesis would let the model issue more tool calls on top of a synthesis summary, which is semantically wrong — the synthesis IS the final answer for that tool call batch.
## D6. executeToolPhase return struct
The recursive call at `tool-phase.ts:342` is currently the last thing `executeToolPhase` does (after creating the next assistant row). After the conversion, `executeToolPhase` returns a struct the loop body reads:
```typescript
interface ToolPhaseResult {
action: 'continue' | 'paused' | 'synthesis_done';
toolCallCount: number;
toolCalls: ToolCall[];
nextAssistantId: string | null;
}
```
- `continue` → loop continues; `nextAssistantId` is the new assistant message's UUID.
- `paused` → user-input or grant pause; loop breaks. `nextAssistantId` is null.
- `synthesis_done` → synthesis succeeded; loop breaks. `nextAssistantId` is null (synthesis wrote its own parts).
The loop body then:
1. Updates `toolsUsed += result.toolCallCount`
2. Appends `result.toolCalls` to `recentToolCalls`
3. Sets `assistantMessageId = result.nextAssistantId` for the next iteration
4. Increments `stepNumber`
5. Checks `result.action` — if not `continue`, breaks.
## D7. Budget vs steps interaction
Budget counts **individual tool calls** across the entire turn. Steps counts **loop iterations**. They are orthogonal:
- Budget fires when `toolsUsed >= resolveToolBudget(agent)` (currently 50 for read-only). Checked at the top of each iteration.
- Step cap fires when `stepNumber >= effectiveCap`. Checked by the loop condition.
Both produce a sentinel summary. A turn can be terminated by whichever fires first. In practice, budget (50 tool calls) fires before step cap (200 steps) unless the model produces many 0-tool-call iterations (which shouldn't happen — 0 tool calls means non-tool finish, which exits the loop via the `break` path).

View File

@@ -0,0 +1,112 @@
# v1.14.0-outer-loop — explicit outer agent loop
Replace the ad-hoc `executeToolPhase → runAssistantTurn` recursion with an explicit `while` loop. A **step** is one stream-and-tool-execute iteration; a step can contain multiple parallel tool calls. The loop terminates on non-tool finish OR step-cap hit OR doom-loop OR budget exhaustion OR abort OR synthesis success.
## Why
The current recursion works but has two problems: (a) stack depth grows linearly with tool iterations — 50 nested async frames is fragile, (b) there's no explicit step counter, so there's no per-agent step cap and no step-boundary instrumentation. BooChat also gets stuck at 50 tool calls (the budget ceiling) more often than it should — the new `MAX_STEPS = 200` hard ceiling lets the loop run much longer before the step cap fires, while the existing budget (50 tool calls) remains a separate concern.
## Recon findings (verified 2026-05-23)
- `runAssistantTurn` at `turn.ts:144-147` is the recursive entry. Returns `Promise<void>`.
- `executeToolPhase` at `tool-phase.ts:89-96` calls back into `runAssistantTurn` at `tool-phase.ts:342`.
- Recursion terminates on: non-tool finish, budget exhaustion (`args.toolsUsed >= budget`), doom-loop (3 identical calls via `detectDoomLoop`), user-input pause (ask_user_input / request_read_access), synthesis success, stream error, abort.
- **No existing hard recursion depth limit** — `MAX_TOOL_LOOP_DEPTH` does not exist. Safety comes from budget (50) + doom-loop (3 identical).
- `TurnArgs` defined in `turn.ts:127-141`, not `types.ts`. Fields: `sessionId`, `chatId`, `assistantMessageId`, `toolsUsed`, `recentToolCalls`, `signal`. All mutable fields are threaded through the recursive call.
- Synthesis pipeline (`synthesisPipeline.ts`) is a branch in `executeToolPhase` — if synthesis succeeds, recursion is skipped.
- `step_start` already in the `message_parts.kind` CHECK constraint. No schema change needed.
- `agents.ts` does NOT currently parse a `steps` field. Needs adding to `ParsedFrontmatter`.
## Scope
### S1. Outer loop in `turn.ts`
Convert the recursive chain to a `while (stepNumber < effectiveCap)` loop:
```
let stepNumber = 0
while (stepNumber < effectiveCap) {
// doom-loop check
// budget check
// emit step_start part
// stream phase (executeStreamPhase)
// if no tool calls → finalize, break
// tool phase (executeToolPhase — now returns, doesn't recurse)
// if paused (user input / grant) → break
// if synthesis succeeded → break
// create next assistant message row
// increment stepNumber, update toolsUsed, append recentToolCalls
}
// if stepNumber >= effectiveCap → sentinel summary
```
`effectiveCap = Math.min(agent.steps ?? Infinity, MAX_STEPS)` where `MAX_STEPS = 200`.
### S2. `executeToolPhase` becomes non-recursive
Remove the `runAssistantTurn` call at `tool-phase.ts:342`. Instead, return a result indicating what happened: `{action: 'continue' | 'paused' | 'synthesis_done', toolsUsed, recentToolCalls, nextAssistantId}`. The caller (the while loop) uses the action to decide whether to continue or break.
### S3. `agent.steps` field
`agents.ts:ParsedFrontmatter` gains `steps?: number`. Parser extracts it from YAML frontmatter (integer ≥ 0). `steps: 0` means "no tool calls allowed" — loop body never executes; assistant responds text-only.
### S4. Step-boundary events
At the top of each loop iteration, emit a `step_start` part with payload `{step_number, started_at}`. Uses `insertParts` into the current assistant message. No `step_finish` — the next `step_start` (or message completion) implicitly ends the previous step.
### S5. Doom-loop migration
`detectDoomLoop` check moves from `runAssistantTurn` (top of function, pre-stream) to the top of the while-loop body (same logical position). Same predicate, same threshold (3). Same `runDoomLoopSummary` call. Control flow changes from `return` (unwinding recursion) to `break` (exiting loop).
### S6. Step-cap sentinel
When `stepNumber >= effectiveCap`, write a sentinel summary like the existing `runCapHitSummary`. Reuse `runCapHitSummary` with a reason parameter distinguishing "budget exhaustion" from "step cap hit", or create a parallel `runStepCapSummary`. The sentinel makes the cap visible in chat.
### S7. AGENTS.md updates
Add `steps:` to each agent in `data/AGENTS.md`:
- Refactorer: `steps: 5`
- Architect: `steps: 20`
- All others: unset (infinity — bounded only by `MAX_STEPS = 200`)
### S8. Tests
New test file `apps/server/src/services/__tests__/outer-loop.test.ts` covering:
- Clean finish (stream returns non-tool, loop exits after 1 iteration)
- Step-cap hit (loop exits at cap, sentinel written)
- Doom-loop break (3 identical calls, sentinel written)
- Budget exhaustion (toolsUsed >= budget, cap-hit sentinel written)
- Abort mid-step (signal fires, loop exits)
- `steps: 0` edge case (no loop iterations, text-only response)
- Synthesis success (loop exits after synthesis)
## Non-goals
- No frontend changes. `step_start` parts surface via `messages_with_parts` automatically; UI doesn't render them in v1.14.
- No `output_schema` / `exit_expression` / `execution_strategy` AGENTS.md fields.
- No per-step snapshot for revert (v2.0 BooCoder concern).
- No changes to budget constants (50 / 10 / 50). That's a separate concern.
- No `repairToolCall` changes.
- No compaction changes.
## Hard rules
- No git commit, push. Sam commits.
- Backup before editing.
- TS strict, no `any`.
- Doom-loop threshold stays at 3.
- 332+ existing tests still pass + new outer-loop tests.
## Files expected to touch
- `apps/server/src/services/inference/turn.ts` — recursion → loop
- `apps/server/src/services/inference/tool-phase.ts` — remove recursive call, return result struct
- `apps/server/src/services/inference/sentinel-summaries.ts` — step-cap sentinel (or extend cap-hit)
- `apps/server/src/services/agents.ts` — parse `steps` field
- `data/AGENTS.md` — add `steps:` to Refactorer + Architect
- `apps/server/src/services/__tests__/outer-loop.test.ts` — NEW
- `apps/server/src/services/inference/index.ts` — re-export if new types needed
## Estimate
~300 LoC net (turn.ts refactor + tool-phase return struct + agents parser + tests). The conversion is structural, not behavioral — every exit path is preserved, just expressed as loop control flow instead of recursion unwinding.

View File

@@ -0,0 +1,82 @@
# v1.14.0-outer-loop tasks
## B1 — Backups
- [ ] `turn.ts`, `tool-phase.ts`, `sentinel-summaries.ts`, `agents.ts`, `data/AGENTS.md`
## B2 — agents.ts: parse `steps` field
- [ ] Add `steps?: number` to `ParsedFrontmatter` interface
- [ ] Parse from YAML frontmatter: integer ≥ 0, warn on out-of-range (negative or non-integer), clamp to 0
- [ ] Expose on the `Agent` type returned by `getAgentsForProject`
- [ ] `npx tsc --noEmit -p apps/server` clean
## B3 — AGENTS.md: add `steps:` to Refactorer + Architect
- [ ] `data/AGENTS.md` — Refactorer: `steps: 5`
- [ ] `data/AGENTS.md` — Architect: `steps: 20`
- [ ] All others: leave unset (infinite, bounded by MAX_STEPS=200)
## B4 — tool-phase.ts: remove recursive call, return result struct
- [ ] Define `ToolPhaseResult` interface: `{action: 'continue' | 'paused' | 'synthesis_done', toolCallCount: number, toolCalls: ToolCall[], nextAssistantId: string | null}`
- [ ] Remove `runAssistantTurn` import and call at line ~342
- [ ] `executeToolPhase` returns `ToolPhaseResult` instead of `Promise<void>`
- [ ] On normal path (after creating next assistant row): return `{action: 'continue', toolCallCount, toolCalls: result.toolCalls, nextAssistantId}`
- [ ] On user-input pause: return `{action: 'paused', toolCallCount: <calls executed so far>, toolCalls: result.toolCalls, nextAssistantId: null}`
- [ ] On synthesis success: return `{action: 'synthesis_done', toolCallCount, toolCalls: result.toolCalls, nextAssistantId: null}`
- [ ] `npx tsc --noEmit -p apps/server` will FAIL here (turn.ts still expects void) — expected, fixed in B5
## B5 — turn.ts: recursion → while loop
- [ ] Add `MAX_STEPS = 200` constant
- [ ] Resolve `effectiveCap = Math.min(agent?.steps ?? Infinity, MAX_STEPS)` at the top of `runAssistantTurn`
- [ ] Convert `runAssistantTurn` body into a `while (stepNumber < effectiveCap)` loop:
- Top of loop: doom-loop check (move from current position; `break` instead of `return`)
- Top of loop: budget check (move from current position; `break` instead of `return`, but still call `runCapHitSummary` before break)
- Emit `step_start` part via `insertParts` with payload `{step_number: stepNumber, started_at: new Date().toISOString()}`
- Call `executeStreamPhase`
- If no tool calls → `finalizeCompletion`, `break`
- Call `executeToolPhase` (now returns `ToolPhaseResult`)
- If `result.action !== 'continue'``break`
- Update `toolsUsed += result.toolCallCount`
- Update `recentToolCalls = [...recentToolCalls, ...result.toolCalls]`
- Update `assistantMessageId = result.nextAssistantId!`
- Increment `stepNumber`
- [ ] After loop: if `stepNumber >= effectiveCap` → call step-cap sentinel (B6)
- [ ] `effectiveCap === 0` edge case: the while condition is immediately false; stream the first turn text-only (the stream phase at the top of the function runs once before the loop — OR handle this by structuring the loop as do-while, OR handle by pre-checking and skipping tools from the request). Pick the cleanest approach.
- [ ] Remove `TurnArgs` from the module export if it's no longer threaded through recursion — OR keep it and populate from loop locals. (Design note: `TurnArgs` is still used by `executeStreamPhase`, `executeToolPhase`, `sentinel-summaries.ts`, `error-handler.ts`. Keep the interface; populate from loop locals each iteration.)
- [ ] `npx tsc --noEmit -p apps/server` clean
- [ ] `pnpm -C apps/server test` — all existing tests pass
## B6 — sentinel-summaries.ts: step-cap sentinel
- [ ] Add `runStepCapSummary` (or extend `runCapHitSummary` with a `reason` param)
- [ ] Write a sentinel with `metadata.kind = 'cap_hit'` (same as budget) so `CapHitSentinel` UI renders it
- [ ] Sentinel text distinguishes "Step limit reached (N steps)" from "Tool budget exhausted (N calls)"
- [ ] Called from the post-loop check in turn.ts (B5)
## B7 — Tests
- [ ] NEW `apps/server/src/services/__tests__/outer-loop.test.ts`
- [ ] Test: clean finish — stream returns no tool calls, loop exits after 1 step
- [ ] Test: step-cap hit — mock agent with `steps: 2`, model always returns tool calls, loop exits at 2, sentinel written
- [ ] Test: doom-loop — 3 identical tool calls, sentinel written, loop breaks
- [ ] Test: budget exhaustion — toolsUsed >= budget, cap-hit sentinel written
- [ ] Test: `steps: 0` — no loop iterations, text-only response
- [ ] Test: synthesis success — loop breaks after synthesis
- [ ] `pnpm -C apps/server test` — all 332+ existing + new tests pass
## B8 — Verification
- [ ] `npx tsc --noEmit -p apps/server` — 0 errors
- [ ] `npx tsc -p apps/web/tsconfig.app.json --noEmit` — 0 errors (no web changes; should pass)
- [ ] `pnpm -C apps/web build` — green
- [ ] `pnpm -C apps/server test` — all green
## B9 — Docs + tag + deploy
- [ ] `CHANGELOG.md` entry for v1.14.0-outer-loop
- [ ] `boocode_roadmap.md` retrospective bullet on the v1.14 section
- [ ] `CLAUDE.md` updates: mention the outer loop, MAX_STEPS, agent.steps in the inference/ section
- [ ] Commit, tag `v1.14.0-outer-loop`, push, rebuild

View File

@@ -0,0 +1,39 @@
# v1.14.1-mcp-poc — design decisions
## D1. Transport: Streamable HTTP (not stdio)
Context7 is a remote service at `https://mcp.context7.com/mcp`. Uses the MCP Streamable HTTP transport. The `@modelcontextprotocol/sdk` TypeScript client supports this via `StreamableHTTPClientTransport`. No stdio needed.
## D2. Tool name prefixing
MCP tools get a `context7_` prefix to avoid collisions with BooCode's native tools. Context7's tools are `resolve-library-id` and `query-docs` — these become `context7_resolve-library-id` and `context7_query-docs`. The prefix is stripped before calling the MCP server's `tools/call`.
## D3. Read-only invariant guard
BooChat is read-only through v1.x. The MCP client rejects any tool whose `annotations?.readOnly === false`. Tools with `readOnly: true` or no annotations are accepted. Context7's tools are all read-only (they query documentation — no write side effects). Fail-open on missing annotations is a deliberate choice: most MCP servers don't set annotations yet, and rejecting all un-annotated tools would make the feature useless. The guard catches explicitly-declared write tools.
## D4. Zod inputSchema for MCP tools
MCP tools come with a JSON Schema `inputSchema`. BooCode's `ToolDef` has both a Zod `inputSchema` (for server-side validation) and a `jsonSchema` (for the LLM's tool schema). For MCP tools:
- `jsonSchema` is built directly from the MCP tool's `inputSchema` (it's already JSON Schema).
- `inputSchema` uses `z.record(z.unknown())` as a pass-through — the MCP server does its own validation. Double-validating with a generated Zod schema from JSON Schema adds complexity with no value for a PoC.
## D5. Tool registration: append + re-sort (not lazy-init)
The simplest approach: keep `ALL_TOOLS` as the native tool array. Add an `appendMcpTools(tools: ToolDef[])` function that pushes MCP tools, re-sorts alphabetically, and rebuilds `TOOLS_BY_NAME` and `READ_ONLY_TOOL_NAMES`. Called once at startup after MCP init. More invasive approaches (lazy-init, factory function) change the import shape for every consumer. Mutation-at-startup is ugly but contained to one call site and matches the existing alpha-sort-at-module-level pattern.
## D6. No per-session toggle
Web tools have `session.web_search_enabled`. MCP tools do NOT get a session toggle in v1.14.1. If configured via env var, MCP tools are always available. Per-session MCP control is a v1.15 concern (when multiple MCP servers and the permission ruleset land together).
## D7. Graceful degradation
MCP server down at startup → log warning, expose zero MCP tools, BooCode functions normally. MCP server down mid-session (tool call fails) → the `execute` wrapper catches the error and returns `{error: true, output: "MCP server unreachable"}` — the model sees the error and can self-correct (use native tools instead).
## D8. Result content extraction
MCP `tools/call` returns `{content: ContentBlock[]}` where each block is `{type: 'text', text: string}` or `{type: 'resource', ...}`. For the PoC:
- Text blocks: join with `\n`.
- Resource blocks: serialize as JSON (the model can read structured data).
- Empty content: return `"(no output)"`.
- `isError: true` in the response: return `{error: true, output: joinedContent}`.

View File

@@ -0,0 +1,96 @@
# v1.14.1-mcp-poc — single-server MCP client proof-of-concept
Validate the MCP-client loop end-to-end against one real MCP server (Context7) before committing to the full opencode `mcp/index.ts` port at v1.15. Small, throwaway-if-needed.
## Why
BooCode's tool registry (`ALL_TOOLS` in `tools.ts`) is static — tools are hardcoded TypeScript modules. MCP is the protocol for dynamic tool discovery. Wiring one real MCP server end-to-end proves: tool-discovery → tool-list → tool-call → result-render → context-budget accounting all hold. If Context7 works, any MCP server will work via the same plumbing.
## Scope
### S1. Install `@modelcontextprotocol/sdk`
New dependency in `apps/server/package.json`. The official TypeScript MCP client SDK (MIT). Provides `Client`, `StreamableHTTPClientTransport`, tool-call/result types.
### S2. New service: `apps/server/src/services/mcp-client.ts`
Singleton MCP client that:
1. Connects to Context7 at `MCP_CONTEXT7_URL` (default `https://mcp.context7.com/mcp`) via Streamable HTTP transport.
2. Optional `MCP_CONTEXT7_API_KEY` env var passed as a header.
3. On `initialize()`: calls `tools/list`, wraps each MCP tool as a `ToolDef`, prefixes names with `context7_` to avoid collisions with BooCode's native tools.
4. **Read-only invariant guard:** rejects any tool whose `annotations?.readOnly` is explicitly `false`. Tools with `readOnly: true` or no `annotations` field are accepted (fail-open on read-only, since most MCP tools don't set annotations yet — Context7's tools don't).
5. `callTool(name, args)` → calls the MCP server's `tools/call` endpoint and returns the result content.
6. `getTools(): ToolDef[]` → returns the discovered tools wrapped as BooCode `ToolDef` objects.
7. Graceful degradation: if the MCP server is unreachable at startup, log a warning and expose zero MCP tools. BooCode functions normally with its native tools.
### S3. Config extension
`apps/server/src/config.ts` gains two optional env vars:
- `MCP_CONTEXT7_URL` (string, default `https://mcp.context7.com/mcp`)
- `MCP_CONTEXT7_API_KEY` (string, optional)
### S4. Tool registration
`apps/server/src/services/tools.ts` — after building `ALL_TOOLS` from native tools, append MCP-discovered tools from `mcpClient.getTools()`. The alpha-sort at the end of `ALL_TOOLS` construction covers both native and MCP tools. `TOOLS_BY_NAME` map includes MCP tools.
MCP tools are registered with `category: 'read_only'` (per the read-only invariant guard in S2).
### S5. Tool dispatch
`apps/server/src/services/inference/tool-phase.ts` `executeToolCall` already dispatches via `TOOLS_BY_NAME[toolName].execute(...)`. MCP tools' `execute` function calls `mcpClient.callTool(name, args)` — the dispatch is transparent to the rest of the inference loop. No changes to `executeToolCall` needed.
### S6. MCP tool result → BooCode format
MCP `tools/call` returns `{ content: [{type: 'text', text: string}, ...] }`. BooCode's `executeToolCall` expects a string or JSON-serializable output. The `execute` wrapper in the ToolDef extracts `content[0].text` (or joins multiple content blocks with `\n`). If the MCP server returns an error, the wrapper returns `{error: true, output: errorMessage}` matching BooCode's existing error-result shape.
### S7. Startup initialization
`apps/server/src/index.ts` — after `applySchema()` and before route registration, call `mcpClient.initialize()`. If `MCP_CONTEXT7_URL` is not set (or empty), skip initialization entirely (MCP is opt-in). Log the number of discovered tools on success.
Tool registration (S4) must happen AFTER MCP initialization, since `getTools()` returns the discovered tools. Current flow: `ALL_TOOLS` is a module-level constant. This needs to change to a lazy-init pattern — either a function that returns the tool list (called once at startup after MCP init), or a mutable array that MCP tools get appended to during startup.
### S8. Agent tool whitelist interaction
MCP tools are prefixed `context7_*`. Existing agents' `tools:` whitelists don't include MCP tool names — so MCP tools are only available to the default agent (no agent selected, which gets ALL_TOOLS). To make MCP tools available to specific agents, their AGENTS.md `tools:` list would need to include `context7_*` names. For the PoC, this is fine — the default agent (most common) gets MCP tools.
## Non-goals
- No stdio transport. Context7 is HTTP-only.
- No OAuth. Context7 uses an API key header.
- No multiple servers. One hardcoded server (Context7).
- No per-agent MCP server allow/deny. All agents that don't have a `tools:` whitelist get MCP tools.
- No per-session MCP toggle. If configured, MCP tools are always available.
- No UI changes. MCP tools surface in the tool list the model sees; results render as normal tool-result parts.
- No schema changes. MCP state is in-memory only.
## Hard rules
- No git commit/push. Sam commits.
- Read-only invariant: reject any MCP tool with `readOnly: false`.
- Graceful degradation: MCP server down → zero MCP tools, BooCode works normally.
- One new dep only: `@modelcontextprotocol/sdk`.
- Alpha-sort of ALL_TOOLS preserved (v1.13.3 prompt-cache invariant).
## Files expected to touch
- `apps/server/package.json` — add `@modelcontextprotocol/sdk`
- `pnpm-lock.yaml` — auto-updated
- `apps/server/src/config.ts``MCP_CONTEXT7_URL`, `MCP_CONTEXT7_API_KEY`
- `apps/server/src/services/mcp-client.ts` — NEW, ~100 lines
- `apps/server/src/services/tools.ts` — lazy-init or append MCP tools to ALL_TOOLS
- `apps/server/src/index.ts` — call `mcpClient.initialize()` at startup
- `apps/server/src/services/__tests__/mcp-client.test.ts` — NEW, unit tests for tool wrapping + read-only guard
## Estimate
~150 LoC. The MCP SDK handles the protocol; BooCode's job is wrapping discovered tools as ToolDefs and routing calls through the SDK client.
## Smoke plan
1. Set `MCP_CONTEXT7_URL=https://mcp.context7.com/mcp` in `.env` (or docker-compose env).
2. Restart boocode container.
3. Check logs: should see "mcp: initialized Context7, discovered N tools" (or similar).
4. Open a chat with no agent selected. Send "What does the `streamText` function do in the AI SDK? Use context7 to look it up."
5. Confirm: model calls `context7_resolve-library-id` then `context7_query-docs` (or whatever Context7's tool names are after prefixing).
6. Confirm: tool results render normally in the chat.
7. Without `MCP_CONTEXT7_URL` set: restart, confirm BooCode starts normally with zero MCP tools.

View File

@@ -0,0 +1,80 @@
# v1.14.1-mcp-poc tasks
## B1 — Backups
- [ ] `apps/server/src/services/tools.ts`
- [ ] `apps/server/src/config.ts`
- [ ] `apps/server/src/index.ts`
## B2 — Install `@modelcontextprotocol/sdk`
- [ ] `pnpm -C apps/server add @modelcontextprotocol/sdk`
- [ ] Verify `pnpm -C apps/server build` still works after install
- [ ] Note the installed version
## B3 — Config extension
- [ ] `apps/server/src/config.ts` — add `MCP_CONTEXT7_URL` (string, optional, default `https://mcp.context7.com/mcp`)
- [ ] `apps/server/src/config.ts` — add `MCP_CONTEXT7_API_KEY` (string, optional)
- [ ] Both via Zod `.optional()` with `.default()` for the URL
## B4 — MCP client service
- [ ] NEW `apps/server/src/services/mcp-client.ts`
- [ ] Import `Client`, `StreamableHTTPClientTransport` from `@modelcontextprotocol/sdk/client`
- [ ] `initialize(config, log)` — connect to Context7, call `tools/list`, wrap each as ToolDef, apply read-only guard
- [ ] `callTool(name, args)` — call MCP server `tools/call`, extract text content, return as string
- [ ] `getTools()` — return wrapped ToolDef[]
- [ ] `isInitialized()` — boolean
- [ ] Read-only guard: skip tools with `annotations?.readOnly === false`; accept all others
- [ ] Graceful degradation: catch connection errors, log warning, expose zero tools
- [ ] Tool name prefixing: `context7_<original_name>`
- [ ] ToolDef wrapping: map MCP inputSchema (JSONSchema) to ToolJsonSchema `function.parameters`; use `z.any()` for Zod inputSchema (MCP already validated on the server side)
- [ ] Execute wrapper: strip `context7_` prefix before calling MCP, join result content blocks with `\n`
## B5 — Tool registration (lazy-init)
- [ ] `apps/server/src/services/tools.ts` — convert `ALL_TOOLS` from a module-level constant to a lazy-initialized array
- [ ] Add `initializeTools(mcpTools: ToolDef[])` function that builds the final sorted list
- [ ] `TOOLS_BY_NAME`, `READ_ONLY_TOOL_NAMES` derived from the initialized list
- [ ] Ensure all existing callers of `ALL_TOOLS` / `TOOLS_BY_NAME` still work (they import from tools.ts — verify the export shape)
- [ ] OR simpler: keep ALL_TOOLS as-is (native tools), add `appendMcpTools(tools)` that mutates + re-sorts + rebuilds TOOLS_BY_NAME. Less clean but less invasive.
## B6 — Startup wiring
- [ ] `apps/server/src/index.ts` — after `applySchema()`, before route registration:
- If `config.MCP_CONTEXT7_URL` is set: `await mcpClient.initialize(config, app.log)`
- `appendMcpTools(mcpClient.getTools())` (or equivalent)
- Log tool count
- [ ] If URL not set: skip, log "mcp: Context7 not configured, skipping"
## B7 — Verification
- [ ] `npx tsc --noEmit -p apps/server` — 0 errors
- [ ] `pnpm -C apps/server test` — all existing tests pass (MCP client is startup-only; tests don't initialize it)
- [ ] `pnpm -C apps/web build` — green (no web changes)
## B8 — Unit tests
- [ ] NEW `apps/server/src/services/__tests__/mcp-client.test.ts`
- [ ] Test: tool wrapping produces correct ToolDef shape (name, description, jsonSchema, execute fn)
- [ ] Test: read-only guard rejects tools with `readOnly: false`
- [ ] Test: read-only guard accepts tools with `readOnly: true` or no annotations
- [ ] Test: name prefixing — `resolve-library-id``context7_resolve-library-id`
- [ ] Test: result extraction — single text content block → string; multiple → joined with `\n`
- [ ] Test: error result — MCP error → `{error: true, output: ...}` shape
## B9 — Deploy + smoke
- [ ] Add `MCP_CONTEXT7_URL=https://mcp.context7.com/mcp` to docker-compose env (or .env)
- [ ] `docker compose up --build -d`
- [ ] Check logs for MCP initialization message
- [ ] Live-smoke: send a chat asking about AI SDK docs via Context7
- [ ] Verify tool calls + results render normally
## B10 — Docs + tag
- [ ] `CHANGELOG.md` entry
- [ ] `boocode_roadmap.md` retrospective bullet
- [ ] `CLAUDE.md` — mention MCP client in the tools/services section
- [ ] Commit, tag `v1.14.1-mcp-poc`, push, rebuild

View File

@@ -0,0 +1,59 @@
# v1.15.0-mcp-multi — design decisions
## D1. Config file path
`/data/mcp.json` (alongside `AGENTS.md` at `/data/AGENTS.md`). Both are bind-mounted from the host's `data/` directory. Override via `MCP_CONFIG_PATH` env var.
File missing = no MCP (opt-in by file presence, not by env var). Simpler than the v1.14.1 approach of always-defaulting a URL.
## D2. Config schema matches opencode's `mcpServers` shape
opencode uses `~/.opencode/config.json` with a `mcpServers` key. BooCode uses `mcp.json` with the same `mcpServers` key so server entries are copy-pasteable. Property names match: `type`, `url`, `command`, `args`, `env`, `headers`. BooCode adds `enabled` (boolean toggle per server, default true) which opencode doesn't have — harmless extra key.
## D3. Transport types: streamableHttp + stdio only
- **streamableHttp**: For remote servers (Context7, future cloud MCP services). Uses `@modelcontextprotocol/sdk`'s `StreamableHTTPClientTransport`.
- **stdio**: For local subprocess servers (codecontext, future local tools). Uses `@modelcontextprotocol/sdk`'s `StdioClientTransport` (spawns child process, NDJSON framing over stdin/stdout).
- **SSE**: Skipped. Streamable HTTP supersedes SSE per the MCP spec (May 2025 protocol update). If a legacy server requires SSE, it can be added later.
## D4. Tool name prefixing: `<serverName>_<toolName>`
Generalizes v1.14.1's `context7_<name>` pattern. Server name comes from the config key (e.g. `"context7"`, `"codecontext"`). Collisions between servers with the same name are impossible (config keys are unique). Collisions between an MCP tool and a native tool are possible if someone names a server entry the same as a native tool prefix — but that's a user-configuration error, not a code bug.
## D5. Per-agent glob patterns: last-match-wins
AGENTS.md `tools:` field already supports exact-match arrays. Globs extend the same field:
```yaml
tools: [view_file, grep, context7_*]
```
Evaluation: for each tool in `ALL_TOOLS`, scan the pattern list left-to-right. A `!` prefix denies. Last matching pattern wins. This matches the roadmap's "wildcard rule matcher" language.
Examples:
- `[*]` — all tools (same as omitting `tools:` entirely)
- `[*, !web_*]` — all tools except web
- `[view_file, grep, context7_*]` — only view_file, grep, and all Context7 tools
- `[*]` on Architect + `[view_file]` on Prompt Builder — each agent gets its intended scope
Globs use a simple `minimatch`-style check: `*` matches any characters. No `?` or `**` — tool names are flat (no path separators).
## D6. No DB tables in v1.15
The roadmap listed `permissions`, `agent_permissions`, `session_permissions`, `mcp_servers` tables. All deferred to v2.0:
- **Permission tables**: Enterprise multi-user pattern. BooChat is single-user behind Authelia. The read-only invariant guard is the BooChat-era defense. Formal permission rulesets land when BooCoder adds write tools.
- **`mcp_servers` table**: In-memory registry is sufficient. No need to persist server state to DB when the config file is the source of truth and tools are re-discovered on every boot.
## D7. Stdio child lifecycle
- Spawn on `initialize()`. Persistent connection for server lifetime (not per-call).
- On child exit (unexpected): mark server unavailable, log error. Do NOT auto-restart. BooCode continues with remaining servers.
- On BooCode shutdown (`app.addHook('onClose')`): send SIGTERM to all stdio children. Wait up to 5s, then SIGKILL.
- On ENOENT (command not found): skip server with a warning. Matches the graceful-degradation pattern from v1.14.1.
## D8. v1.14.1 env vars removed
`MCP_CONTEXT7_URL` and `MCP_CONTEXT7_API_KEY` are deleted from `config.ts`. They're superseded by the JSON config file's `context7` entry. The PoC was explicitly designed as throwaway.
Migration path for anyone who had the env vars set: add a `data/mcp.json` with the Context7 entry. The CHANGELOG entry will note this.

View File

@@ -0,0 +1,130 @@
# v1.15.0-mcp-multi — multi-server MCP client + stdio transport + config file
Generalize the v1.14.1 single-server Context7 PoC into a multi-server MCP client. Add stdio transport (for local subprocess MCP servers like codecontext). JSON config file matching opencode's schema shape. Per-agent tool glob patterns in AGENTS.md frontmatter.
## Why
v1.14.1 proved the MCP loop works end-to-end but is hardcoded to one server (Context7) via env vars. Real value comes from multiple servers: Context7 for docs, codecontext re-wired as a proper MCP server (stdio), future local tools. The config shape should match opencode's so Sam can copy `mcp` blocks between the two without translation.
## Scope
### S1. JSON config file for MCP servers
New file at `/data/mcp.json` (bind-mounted like `AGENTS.md`). Env var `MCP_CONFIG_PATH` points to it (default `/data/mcp.json`).
Schema (matching opencode's shape):
```json
{
"mcpServers": {
"context7": {
"type": "streamableHttp",
"url": "https://mcp.context7.com/mcp",
"headers": { "X-API-Key": "optional-key" },
"enabled": true
},
"codecontext": {
"type": "stdio",
"command": "/usr/local/bin/codecontext",
"args": ["--mcp"],
"env": { "WORKSPACE": "/opt" },
"enabled": false
}
}
}
```
Zod-validated at startup. Unknown keys silently ignored (forward-compat). Each server entry has:
- `type`: `"streamableHttp"` | `"stdio"` (SSE deferred — Streamable HTTP supersedes it per the MCP spec)
- `url` (HTTP) or `command` + `args` + `env` (stdio)
- `headers` (HTTP, optional) — for API keys
- `enabled` (boolean, default true)
### S2. Multi-server MCP client
Refactor `mcp-client.ts` from a singleton to a registry of named MCP clients. On startup:
1. Read `/data/mcp.json` (or path from `MCP_CONFIG_PATH`)
2. For each enabled server: create a Client + transport, connect, discover tools via `tools/list`
3. Wrap tools with `<server-name>_<tool-name>` prefix (generalizes the `context7_` pattern)
4. Apply read-only invariant guard per-tool (reject `readOnlyHint: false`)
5. Append all MCP tools to `ALL_TOOLS` in a single `appendMcpTools()` call
6. Per-server graceful degradation: one server failing doesn't block others
Expose: `getMcpServers(): McpServerStatus[]` for debug/status endpoint, `callTool(prefixedName, args)` routed to the correct server by prefix.
### S3. Stdio transport
For `type: "stdio"` servers: spawn a subprocess via `child_process.spawn(command, args, {env, stdio: 'pipe'})`. Use `@modelcontextprotocol/sdk`'s `StdioClientTransport` (or implement the NDJSON framing ourselves — the SDK should have it). The subprocess runs for the lifetime of the BooCode server (persistent connection, not per-call spawn).
Child lifecycle:
- Spawn on initialize. If spawn fails, log warn, skip server (graceful degradation).
- On child exit: log error, mark server as unavailable. Do NOT restart automatically (v1.15 keeps it simple; auto-restart is a v2.0 concern).
- On BooCode shutdown (`app.addHook('onClose')`): kill child processes.
### S4. Per-agent tool glob patterns in AGENTS.md
Currently `tools:` in AGENTS.md frontmatter is an exact-match whitelist (array of tool names). Extend to support glob patterns via a lightweight matcher:
- `context7_*` — all tools from the context7 server
- `view_*` — all tools starting with `view_`
- `!web_*` — exclude web tools (deny pattern)
- Plain names (`grep`, `view_file`) work as before (exact match)
Evaluation order: for each tool in `ALL_TOOLS`, check if it matches any pattern in the agent's `tools:` list. A `!` prefix means exclude. Last-match-wins.
Parser change in `agents.ts`: when validating `tools:`, don't reject unknown names if they contain `*` (glob patterns can't be validated against the current tool list since MCP tools are discovered at runtime). Exact names are still validated.
### S5. Remove v1.14.1 env-var config
Delete `MCP_CONTEXT7_URL` and `MCP_CONTEXT7_API_KEY` from `config.ts`. They're superseded by the JSON config file. The v1.14.1 PoC is throwaway-by-design (proposal said "throwaway-if-needed").
### S6. Read-only invariant preserved
BooChat's read-only guarantee stays: every MCP tool with `readOnlyHint: false` is rejected at discovery. This applies globally, not per-server. Config has no `allowWriteTools` flag — that's a v2.0 BooCoder concern.
## Deferred to v2.0
- **Permission ruleset tables** (`permissions`, `agent_permissions`, `session_permissions`). Enterprise pattern that doesn't serve until BooCoder adds write tools. The read-only invariant guard is the BooChat-era defense-in-depth.
- **OAuth / Dynamic Client Registration.** Needs secret storage primitive first.
- **SSE transport.** Streamable HTTP supersedes it per the MCP spec. SSE is a legacy fallback.
- **Per-session MCP toggle.** No `session.mcp_enabled` column in v1.15. MCP servers are globally configured; agent tool globs are the scoping mechanism.
- **`mcp_servers` DB table.** In-memory registry is sufficient for single-user. DB tracking deferred to v2.0.
- **codecontext re-wiring to MCP.** Separate batch after v1.15 proves stdio transport works.
## Non-goals
- No frontend changes. MCP tools surface via the existing tool registry; results render as normal tool-result parts.
- No schema changes. No new DB tables or columns.
- No changes to the inference loop (v1.14.0 outer loop unchanged).
- No changes to `executeToolCall` dispatch (transparent via ToolDef.execute).
## Hard rules
- No git commit/push. Sam commits.
- Read-only invariant: reject any MCP tool with `readOnlyHint: false`.
- Graceful degradation: any server down → that server's tools unavailable, rest unaffected.
- Alpha-sort of ALL_TOOLS preserved.
- One new dep only: none (MCP SDK already installed from v1.14.1).
- 348+ existing tests still pass.
## Files expected to touch
- `apps/server/src/services/mcp-client.ts` — refactor from singleton to multi-server registry (~200→300 lines)
- `apps/server/src/services/tools.ts` — no changes expected (appendMcpTools already works for multiple tools)
- `apps/server/src/config.ts` — replace MCP env vars with `MCP_CONFIG_PATH`
- `apps/server/src/index.ts` — startup reads config file, iterates servers
- `apps/server/src/services/agents.ts` — glob pattern support in `tools:` whitelist
- `data/mcp.json` — NEW, example config with Context7 (disabled by default, enabled via edit)
- `apps/server/src/services/__tests__/mcp-client.test.ts` — update for multi-server, add stdio transport tests
- `apps/server/src/services/__tests__/agents-glob.test.ts` — NEW, glob pattern matching tests
## Estimate
~350 LoC. The MCP SDK handles both transports; BooCode's job is config parsing, multi-server lifecycle, and glob matching.
## Smoke plan
1. Create `/data/mcp.json` with Context7 enabled. Restart. Confirm tools discovered + logged.
2. Send a chat asking about library docs. Confirm `context7_*` tools called + results rendered.
3. Disable Context7 in config (`"enabled": false`). Restart. Confirm zero MCP tools.
4. Add a dummy stdio server entry pointing to `/bin/cat` (will fail). Confirm graceful degradation: Context7 works, dummy fails with a logged warning.
5. Add `tools: [context7_*]` to the Architect agent in AGENTS.md. Confirm Architect sees only Context7 tools (via AgentPicker or by chatting with Architect selected).
6. Stop boocode, confirm child processes are killed (no orphans).

View File

@@ -0,0 +1,87 @@
# v1.15.0-mcp-multi tasks
## B1 — Backups
- [ ] `mcp-client.ts`, `config.ts`, `index.ts`, `agents.ts`, `mcp-client.test.ts`
## B2 — MCP config file schema + loader
- [ ] NEW `apps/server/src/services/mcp-config.ts` (~50 lines)
- [ ] Zod schema for `mcp.json`: `McpServerConfig` with `type`, `url/command/args/env`, `headers`, `enabled`
- [ ] `loadMcpConfig(configPath: string, log): McpServerConfig[]` — reads JSON, validates, returns enabled servers
- [ ] Graceful: file missing → log info, return empty array (no MCP)
- [ ] Graceful: parse error → log warn with details, return empty array
## B3 — Config.ts: replace MCP env vars
- [ ] Remove `MCP_CONTEXT7_URL` and `MCP_CONTEXT7_API_KEY` from Zod schema
- [ ] Add `MCP_CONFIG_PATH: z.string().optional()` (no default — opt-in)
## B4 — Refactor mcp-client.ts to multi-server registry
- [ ] Replace module-level singleton with `Map<serverName, {client, transport, tools}>`
- [ ] `initialize(servers: McpServerConfig[], log)` — iterate servers, connect each, discover tools, wrap with `<serverName>_<toolName>` prefix, apply read-only guard
- [ ] Streamable HTTP transport: reuse existing pattern from v1.14.1
- [ ] Stdio transport: use `@modelcontextprotocol/sdk`'s `StdioClientTransport` (check SDK exports; fallback to `child_process.spawn` + NDJSON if SDK doesn't expose it)
- [ ] `callTool(prefixedName, args)` — extract server name from prefix, route to correct client
- [ ] `getTools()` — return all tools from all servers, flattened
- [ ] `getMcpServers()` — return status of each server (name, type, toolCount, connected)
- [ ] Per-server graceful degradation: catch per-server errors, log, skip; continue with others
- [ ] `shutdown()` — kill stdio child processes, close HTTP clients
- [ ] `app.addHook('onClose')` calls shutdown
## B5 — Startup wiring (index.ts)
- [ ] Read config: `const mcpConfigPath = config.MCP_CONFIG_PATH ?? '/data/mcp.json'`
- [ ] `const mcpServers = loadMcpConfig(mcpConfigPath, app.log)`
- [ ] `await mcpClient.initialize(mcpServers, app.log)`
- [ ] `appendMcpTools(mcpClient.getTools())`
- [ ] Log summary: "mcp: N servers connected, M tools registered"
- [ ] `app.addHook('onClose', () => mcpClient.shutdown())`
## B6 — AGENTS.md glob patterns
- [ ] `apps/server/src/services/agents.ts` — in tool whitelist validation, skip validation for entries containing `*` (can't validate against runtime-discovered tools)
- [ ] NEW helper `matchToolGlob(toolName: string, patterns: string[]): boolean` — supports `*` wildcard and `!` deny prefix, last-match-wins
- [ ] Wire into `executeStreamPhase` (stream-phase.ts) where agent tools are filtered: replace exact-match `.includes()` with `matchToolGlob()`
- [ ] Export `matchToolGlob` for test access
## B7 — Example config file
- [ ] NEW `data/mcp.json` with Context7 entry (enabled: true, with URL, no API key)
- [ ] Comment in the file noting it's bind-mounted at `/data/mcp.json` inside the container
## B8 — Tests
- [ ] Update `mcp-client.test.ts` for multi-server wrapping (tools from two servers, prefix routing)
- [ ] Test: server A fails, server B succeeds — only B's tools registered
- [ ] Test: callTool routes to correct server by prefix
- [ ] Test: shutdown kills stdio transports
- [ ] NEW `apps/server/src/services/__tests__/mcp-glob.test.ts`
- [ ] Test: exact match ("grep" matches "grep")
- [ ] Test: wildcard ("context7_*" matches "context7_query-docs")
- [ ] Test: deny ("!web_*" excludes "web_search")
- [ ] Test: last-match-wins ("*" then "!web_*" → web tools excluded)
- [ ] Test: empty pattern list → nothing matches (agent gets no tools — same as current behavior for explicit whitelists)
## B9 — Verification
- [ ] `npx tsc --noEmit -p apps/server` — 0 errors
- [ ] `pnpm -C apps/server test` — all passing
- [ ] `pnpm -C apps/web build` — green (no web changes)
## B10 — Deploy + smoke
- [ ] Create `/data/mcp.json` on the host with Context7 enabled
- [ ] Update docker-compose bind mount if needed (data/ already mounted)
- [ ] `docker compose up --build -d`
- [ ] Check logs for multi-server init
- [ ] Live-smoke: Context7 tool call from chat
- [ ] Disable Context7 in config, restart, confirm zero MCP tools
## B11 — Docs + tag
- [ ] `CHANGELOG.md` entry
- [ ] `boocode_roadmap.md` retrospective bullet on v1.15 section
- [ ] `CLAUDE.md` — update MCP references
- [ ] Commit, tag `v1.15.0-mcp-multi`, push, rebuild

View File

@@ -0,0 +1,413 @@
# v2.0 BooCoder — Implementation Plan
Ordered execution plan across all 4 sub-versions. Each phase is dispatchable as a single batch. Phases 1-4 are sequential (each builds on the prior); phases within a sub-version can sometimes be parallelized.
---
## Phase 1 — Foundation (v2.0.0-alpha)
**Goal:** Standalone BooCoder container boots, connects to DB, serves a health endpoint. No inference yet.
**Estimated:** ~200 LoC
### Steps
1. **Clone lift sources** (prep, no code)
- `cd /opt/forks && git clone agent-hub, plandex, opencode, qodo-ai/agents`
- Read agent-hub schema, plandex pending-changes, opencode permission/evaluate.ts
- Read RA.Aid README for three-stage pattern
2. **Create `apps/coder/` skeleton**
- `apps/coder/package.json` (Fastify, postgres, zod — same deps as `apps/server`)
- `apps/coder/tsconfig.json` (extends base, NodeNext)
- `apps/coder/src/index.ts` (Fastify boot, health endpoint, DB connect)
- `apps/coder/src/config.ts` (Zod config schema — DATABASE_URL, PORT, HOST, LLAMA_SWAP_URL, CONTAINER_GUIDANCE_FILE)
- `apps/coder/src/db.ts` (postgres connection, schema apply — shared with `apps/server` or fresh)
3. **Create Dockerfile**
- `apps/coder/Dockerfile` — Node 20 bookworm-slim (matches booterm for glibc compat with node-pty later)
- Mount: `/opt:/opt:rw`
- COPY built server + BOOCODER.md
4. **docker-compose.yml** — add `boocoder` service
- Port `100.114.205.53:9502:3000`
- Environment: `DATABASE_URL`, `LLAMA_SWAP_URL`, `CONTAINER_GUIDANCE_FILE=/app/BOOCODER.md`
- Network: `boocode_net`
- Depends on: `boocode_db`
5. **DB rename**`boocode_db``boochat_db`
- `ALTER DATABASE boocode RENAME TO boochat;` (one-time, run manually)
- Update `DATABASE_URL` in all docker-compose services
- Update volume name mapping
- Verify all 3 services boot against renamed DB
6. **Schema migration** — new tables in `apps/coder/src/schema.sql`
- `pending_changes` table
- `tasks` table
- `available_agents` table
- `human_inbox` view
- Applied idempotently on boot (same pattern as BooChat's `applySchema()`)
7. **BOOCODER.md** — container guidance file
- Write tools enabled (unlike BOOCHAT.md which declares read-only)
- Pending-changes queue discipline
- Path-guard rules
### Verification
- `docker compose up --build -d` — boocoder container starts
- `curl http://100.114.205.53:9502/api/health` — 200 OK
- `psql` confirms new tables exist
- BooChat + BooTerm unaffected (still boot, still serve)
---
## Phase 2 — Write Tools + Pending Changes (v2.0.0-beta)
**Goal:** BooCoder can chat with the LLM, the LLM can call write tools, changes queue in `pending_changes`, user can apply/reject.
**Estimated:** ~400 LoC
### Steps
1. **Write-path guard** (`apps/coder/src/services/write_guard.ts`)
- `resolveWritePath(projectRoot, filePath): string``resolve()` + prefix check (no realpath — file may not exist for creates)
- Deny list: inherit from BooChat's `secret_guard.ts` (`.env`, `*.pem`, `id_rsa*`, etc.)
- Fuzz tests: `../` escape, symlink outside root, null bytes, non-existent parent dirs
2. **Pending-changes service** (`apps/coder/src/services/pending_changes.ts`)
- `queueEdit(session_id, task_id, file_path, old_string, new_string): PendingChange` — computes unified diff, validates write path, INSERTs
- `queueCreate(session_id, task_id, file_path, content): PendingChange`
- `queueDelete(session_id, task_id, file_path): PendingChange`
- `applyAll(session_id): ApplyResult[]` — re-validates each path, writes to disk, marks `status='applied'`
- `applyOne(change_id): ApplyResult`
- `rejectOne(change_id): void` — marks `status='rejected'`
- `rejectAll(session_id): void`
- `rewindOne(change_id): void` — inverse-diff, writes to disk, marks `status='reverted'`
- `listPending(session_id): PendingChange[]`
3. **Write tools** (`apps/coder/src/services/tools/`)
- `edit_file.ts` — input: `{file_path, old_string, new_string}`, calls `queueEdit`
- `create_file.ts` — input: `{file_path, content}`, calls `queueCreate`
- `delete_file.ts` — input: `{file_path}`, calls `queueDelete`
- `apply_pending.ts` — calls `applyAll` for current session
- `rewind.ts` — input: `{change_id}` or `{all: true}`, calls `rewindOne`/`rewindAll`
4. **Tool registry** — register write tools alongside ALL read tools from BooChat
- Import BooChat's read tools (view_file, grep, etc.) + codecontext tools
- Add the 5 write tools
- Alpha-sort the combined list
5. **Inference loop** — port from BooChat or share via workspace package
- Copy `apps/server/src/services/inference/` into `apps/coder/src/services/inference/` (or symlink via pnpm workspace)
- The outer loop (v1.14) runs unchanged — write tools are just ToolDefs with `execute()` functions
- Compaction, doom-loop, step cap all carry forward
6. **API routes**
- `POST /api/sessions/:id/messages` — same as BooChat (creates user + assistant rows, enqueues inference)
- `GET /api/sessions/:id/pending` — returns pending changes for the session
- `POST /api/sessions/:id/pending/apply` — applies all pending
- `POST /api/pending/:id/apply` — applies one
- `POST /api/pending/:id/reject` — rejects one
- `POST /api/pending/:id/rewind` — reverts one
- WebSocket streaming (same protocol as BooChat)
### Verification
- Send a chat asking BooCoder to edit a file
- LLM calls `edit_file` → change queued in `pending_changes`
- `GET /api/sessions/:id/pending` shows the queued change with diff
- `POST /api/pending/:id/apply` writes to disk
- `POST /api/pending/:id/rewind` reverts it
- Fuzz test: attempt traversal via `edit_file("../../etc/passwd", ...)` → rejected by write_guard
---
## Phase 3 — Frontend: Diff Pane + Chat (v2.0.0)
**Goal:** Browser UI at `coder.indifferentketchup.com` with chat pane + diff pane side by side.
**Estimated:** ~200 LoC
### Steps
1. **Create `apps/coder/web/`** — React + Vite SPA (same stack as BooChat's `apps/web/`)
- Copy BooChat's Vite config, Tailwind v4 setup, font pipeline
- Shared components: `MarkdownRenderer`, `CodeBlock`, `Button`, `Input`
- New app shell: sidebar (sessions) + workspace (panes)
2. **Chat pane** — reuse BooChat's ChatPane/MessageBubble pattern
- Same WS streaming, same `useSessionStream` hook, same message rendering
- ActionRow includes tool-call rendering for write tools
3. **Diff pane** — NEW (`apps/coder/web/src/components/DiffPane.tsx`)
- Fetches `GET /api/sessions/:id/pending`
- Lists pending changes: file path + operation badge (create/edit/delete)
- Per-change: syntax-highlighted unified diff view (use Shiki or a diff-specific highlighter)
- Buttons: Approve / Reject per change, Approve All / Reject All
- Real-time updates via WS frame (`pending_change_added`, `pending_change_applied`, etc.)
4. **Workspace splitter** — chat left, diff right (or configurable)
5. **Caddy route**`coder.indifferentketchup.com` → boocoder:9502
- Authelia gating (same as BooChat)
### Verification
- Open `coder.indifferentketchup.com` in browser
- Send a message asking for a code change
- See the change appear in the diff pane in real time
- Click Approve → file written, change marked applied
- Click Reject → change discarded
---
## Phase 4 — Dispatcher + Tasks (v2.0.0 final)
**Goal:** Task queue works. User can create tasks, dispatcher picks them up and runs them through Path A.
**Estimated:** ~150 LoC
### Steps
1. **Dispatcher** (`apps/coder/src/services/dispatcher.ts`)
- In-process `setInterval(5000)` polling `tasks` WHERE `state='pending'` ORDER BY `created_at`
- For each ready task: mark `state='running'`, run inference with the task's `input` as the user message
- On completion: mark `state='completed'`
- On error: mark `state='failed'`
- On abort: mark `state='cancelled'`
- Respects `app.addHook('onClose')` — stops polling, waits for in-flight task
2. **Task API routes**
- `POST /api/tasks` — create a task `{project_id, input, agent?, model?}`
- `GET /api/tasks` — list tasks (filterable by state, project)
- `GET /api/tasks/:id` — get task details + output_summary
- `POST /api/tasks/:id/cancel` — cancel a running task
3. **Task → session linkage**
- Each task creates its own session + chat for isolation
- Task's pending_changes reference the task_id
- When task completes, its pending_changes are visible in the UI for approval
4. **Agent probing** (`apps/coder/src/services/agent-probe.ts`)
- On startup: `which opencode`, `which goose`, `which claude`, `which pi`
- Parse version from `<agent> --version`
- Check ACP support: `opencode acp --help` exits 0 → supports_acp = true
- Populate `available_agents` table
### Verification
- `POST /api/tasks {input: "add a /api/version endpoint"}` → task created
- Dispatcher picks it up → inference runs → `edit_file` queued → task completes
- `GET /api/tasks/:id` shows `state='completed'` + output_summary
- Pending changes visible in diff pane for approval
---
## Phase 5 — ACP Dispatch (v2.0.1)
**Goal:** Tasks can be dispatched to external agents via ACP. opencode and goose run as subprocesses, their events flow back into BooCode.
**Estimated:** ~350 LoC
### Steps
1. **ACP client** (`apps/coder/src/services/acp-client.ts`)
- Install: `pnpm -C apps/coder add @zed-industries/agent-client-protocol`
- `spawnAcpAgent(agent: string, task: string, worktree: string, mcpServers: McpConfig[]): AcpSession`
- Uses SDK's `StdioTransport` — spawn `opencode acp` or `goose acp` as child
- Pass `context_servers` for MCP auto-forward
- Event listener: maps ACP events to BooCode's parts taxonomy
2. **ACP event mapping**
- `file_operation` → queue into `pending_changes` (same as Path A native writes)
- `tool_call` / `tool_result` → insert as `message_parts` in the task's session
- `terminal_output` → publish as WS frame for BooTerm routing
- `permission_request` → pause (same mechanism as `ask_user_input`)
- `session_end` → task state → `completed` or `failed`
3. **Worktree management** (`apps/coder/src/services/worktrees.ts`)
- `createWorktree(projectPath, taskId): string``git worktree add /tmp/booworktrees/<taskId> -b task-<taskId> HEAD`
- `diffWorktree(worktreePath, projectPath): UnifiedDiff[]``git diff HEAD...<worktree-branch>`
- `cleanupWorktree(worktreePath): void``git worktree remove`
- On ACP session end: diff the worktree, queue diffs into `pending_changes`, cleanup
4. **PTY fallback** (`apps/coder/src/services/pty-dispatch.ts`)
- For agents without ACP (claude, pi, smallcode)
- `spawnPtyAgent(agent: string, task: string, worktree: string): PtySession`
- Uses `node-pty` — spawn `claude` or `pi` with cwd = worktree
- Capture stdout/stderr into `message_parts` (kind='text', less structured than ACP)
- On exit: diff worktree → queue pending_changes → cleanup
5. **Dispatcher update** — transport selection
- Check `available_agents[agent].supports_acp` at dispatch time
- ACP-capable → `spawnAcpAgent`
- PTY fallback → `spawnPtyAgent`
- Native (no agent specified) → Path A inference loop (Phase 4)
6. **AGENTS.md extensions**
- Add `execution_strategy: plan | act | research` field
- Add `expert_model` field for cost-routing
- Add `output_schema` field (optional JSON Schema for structured final output)
### Verification
- Create task with `agent: 'opencode'` → ACP subprocess spawns
- opencode edits files in worktree → events stream into UI
- On completion: worktree diff queued in `pending_changes`
- Approve → changes applied to main project
- Fallback: create task with `agent: 'claude'` → PTY captures output → worktree diff queued
---
## Phase 6 — MCP Server (v2.0.2)
**Goal:** BooCoder exposes its own primitives as MCP tools. External opencode sessions in Termius can drive the task queue.
**Estimated:** ~250 LoC
### Steps
1. **MCP server** (`apps/coder/src/services/mcp-server.ts`)
- Use `@modelcontextprotocol/sdk` server-side (`Server` class)
- Stdio transport (read from stdin, write to stdout)
- Entry point: `boocoder --mcp` CLI flag starts the MCP server instead of the HTTP server
2. **Tool handlers** (6 tools)
- `boocoder.create_task` → INSERT into tasks table, return task_id
- `boocoder.list_pending_changes` → SELECT from pending_changes WHERE session matches
- `boocoder.apply` → call `applyOne(change_id)`
- `boocoder.reject` → call `rejectOne(change_id)`
- `boocoder.dispatch_external_agent` → create task with agent specified, return task_id
- `boocoder.list_worktrees` → list active worktrees from tasks WHERE worktree_path IS NOT NULL AND state='running'
3. **10-question eval** (per `anthropics/skills/mcp-builder` framework)
- Write 10 independent, read-only, verifiable questions about the BooCoder state
- Run eval: `echo '{"jsonrpc":"2.0","method":"tools/call","params":{"name":"boocoder.list_pending_changes","arguments":{}},"id":1}' | boocoder --mcp`
- All 10 must return correct answers
4. **opencode integration test**
- Add BooCoder as an MCP server in `~/.opencode/config.json`:
```json
{"mcpServers": {"boocoder": {"type": "stdio", "command": "boocoder", "args": ["--mcp"]}}}
```
- From opencode: call `boocoder.create_task` → verify task appears in BooCoder UI
### Verification
- `echo '...' | boocoder --mcp` returns valid MCP responses
- 10-question eval passes
- opencode can drive BooCoder's task queue via MCP
---
## Phase 7 — CLI + Polish (v2.0.3)
**Goal:** `boocode` CLI client, human inbox UI, cost tracking, observation hooks.
**Estimated:** ~400 LoC
### Steps
1. **CLI client** (`apps/coder/src/cli.ts`)
- Thin HTTP/WS client against BooCoder API
- `boocode run "task description"` → POST /api/tasks → stream output via WS
- `boocode ls` → GET /api/tasks → formatted table
- `boocode attach <id>` → WS subscribe to task's session → stream live
- `boocode send <id> "message"` → POST message to task's session chat
- Build as a standalone binary via `pkg` or `esbuild --bundle`
2. **Human inbox UI** (frontend)
- New route: `/inbox` → shows tasks WHERE `state IN ('blocked', 'failed')`
- Per-task: view output, retry (reset state to pending), cancel, reassign agent
- Badge on sidebar showing count of inbox items
3. **Cost tracking**
- `tasks.cost_tokens` populated from inference `usage` callback (same as BooChat's `tokens_used`)
- Summary API: `GET /api/stats/costs?group_by=project|agent|day` → aggregated token spend
- Simple UI: cost badge on each task, totals in settings
4. **Observation hooks** (budi taxonomy)
- Emit 5 event types on the BooCoder WS protocol for dispatched agents:
- `session_start` — agent spawned
- `user_prompt_submit` — task spec delivered
- `post_tool_use` — each tool call completed
- `subagent_start` — nested dispatch (Boomerang)
- `stop` — agent finished
- Consumed by frontend for real-time status indicators
5. **Boomerang `new_task` tool** (subagent isolation)
- When an agent's toolset includes `new_task`:
- Creates a child task (fresh session, fresh context)
- Child runs to completion
- Parent gets only `attempt_completion` summary
- Orchestrator agent profile: tools = `[new_task, list_tasks, check_task_status]` ONLY
### Verification
- `boocode run "add health endpoint"` from terminal → task runs → output streams → diff queued
- `boocode ls` shows task list with states + cost
- Inbox shows failed tasks, retry works
- Boomerang: orchestrator creates subtask → subtask runs isolated → parent gets summary only
---
## Phase 8 — Hardening + Ship (v2.0.x)
**Goal:** Security hardening, integration tests, documentation, production deploy.
**Estimated:** ~100 LoC (mostly tests + docs)
### Steps
1. **Path-guard fuzz suite** — property tests for every traversal pattern:
- `../` sequences (all depths)
- Symlink outside project root
- Null bytes in path
- Unicode normalization attacks
- Race conditions (TOCTOU between validate + write)
- MCP-served filesystem writes routed through pending_changes
2. **Integration tests**
- End-to-end: create task → inference → edit_file → apply → file written → verify content
- ACP dispatch: mock opencode → events flow → pending_changes queued
- MCP server: 10-question eval automated in CI
3. **Documentation**
- `BOOCODER.md` finalized (container guidance)
- `CLAUDE.md` updated with BooCoder architecture section
- `boocode_roadmap.md` v2.0 retrospective
- `CHANGELOG.md` entries for each sub-version
4. **Production deploy**
- Caddy config: `coder.indifferentketchup.com`
- Authelia: same SSO group as BooChat
- Smoke: full workflow (chat → edit → approve → verify)
5. **Tag** — `v2.0.0` (or `v2.0.0-rc1` if Sam wants a bake period)
---
## Execution order summary
```
Phase 1 (foundation) → v2.0.0-alpha ~200 LoC container boots
Phase 2 (write tools) → v2.0.0-beta ~400 LoC inference + pending_changes
Phase 3 (frontend) → v2.0.0 ~200 LoC chat + diff panes
Phase 4 (dispatcher) → v2.0.0-final ~150 LoC task queue + native dispatch
Phase 5 (ACP dispatch) → v2.0.1 ~350 LoC external agents + worktrees
Phase 6 (MCP server) → v2.0.2 ~250 LoC boocoder.* tools + eval
Phase 7 (CLI + polish) → v2.0.3 ~400 LoC CLI + inbox + hooks + Boomerang
Phase 8 (hardening) → v2.0.x ~100 LoC fuzz + integration tests + docs
--------
~2050 LoC total
```
Each phase is independently dispatchable. Phases 1-4 are sequential (each needs the prior). Phases 5-7 are parallelizable after Phase 4 ships (they're independent protocol surfaces). Phase 8 gates the production tag.
---
## Risk register
| Risk | Mitigation |
|---|---|
| Path-guard bypass → arbitrary writes | Pending-changes double-validates (at queue time + apply time). Fuzz suite in Phase 8. OpenHands sandbox (v2.1) as fallback. |
| ACP spec instability (remote transport WIP) | Use stdio only. No remote ACP in v2.0. |
| node-pty native compilation breaks in Docker | bookworm-slim + glibc matches booterm's working config. Pin node-pty version. |
| Worktree cleanup failure → disk bloat | 30-min idle timeout sweeper. `git worktree prune` on startup. |
| DB rename breaks existing sessions | One-time migration with explicit backup. BooChat/BooTerm URLs unchanged. |
| MCP server eval failure | Ship stdio MCP server only after 10/10 eval passes. |
| Boomerang context leak (child leaks state to parent) | Architectural enforcement: child's session_id ≠ parent's. Summary field is the ONLY bridge. |

View File

@@ -0,0 +1,346 @@
# v2.0 — BooCoder
Major version bump. New app `apps/coder/` inside the existing monorepo. Lands together with the `boocode_db``boochat_db` DB rename and the per-app subdomain split (`code.indifferentketchup.com` → BooChat, `coder.indifferentketchup.com` → BooCoder).
## What BooCoder is
A write-capable coding agent surface. Two execution paths, same UI:
- **Path A (native):** BooCode's own inference loop with write tools (`edit_file`, `create_file`, `delete_file`). Edits queue in `pending_changes` — nothing touches disk until user approves via `/apply`.
- **Path B (dispatch):** Shells out to external CLI agents (`opencode`, `goose`, `claude`, `pi`) via ACP (preferred) or raw PTY (fallback). One git worktree per dispatch. Captures events into the same parts taxonomy.
Both paths feed the same task DAG, same project registry, same pending-changes queue, same UI.
## Why now
v1.x proved the read-only loop works end-to-end: inference, tool dispatch, streaming, compaction, MCP client, outer loop, step caps, artifact rendering. The infrastructure is stable. The jump from "read-only chat" to "write-capable agent orchestrator" is the remaining gap between BooCode and having a real development environment.
## Architecture
### Three protocol roles (locked 2026-05-22)
1. **MCP client (write-capable allowed).** Inherits v1.15 client. Write-capable MCP servers (e.g. `@modelcontextprotocol/server-filesystem`) route writes through `pending_changes`. Per-task allow/deny means dispatched tasks can have a different MCP roster.
2. **MCP server (BooCoder's own primitives).** Exposes `boocoder.create_task`, `boocoder.list_pending_changes`, `boocoder.apply`, `boocoder.reject`, `boocoder.dispatch_external_agent`, `boocoder.list_worktrees` as MCP tools. Stdio transport for local consumers (Sam's `opencode` in Termius); HTTP deferred until OAuth + secret storage.
3. **ACP client (host).** Spawns `opencode acp` and `goose acp` as JSON-RPC stdio subprocesses. Maps ACP events (file operations, tool calls, terminal output) to BooCode's parts taxonomy. MCP servers configured in BooCoder are auto-forwarded to the dispatched agent (per goose docs — `context_servers` is the field).
### Container layout (post-v2.0)
| Container | Port | Mount | Purpose |
|---|---|---|---|
| `boochat` (was `boocode`) | `100.114.205.53:9500` | `/opt:/opt:ro` | Read-only chat + MCP client |
| `booterm` | `100.114.205.53:9501` | `/opt:/opt:rw` | PTY/tmux terminal |
| `boocoder` | `100.114.205.53:9502` | `/opt:/opt:rw` (policy-gated) | Write tools + ACP host + MCP client + MCP server |
| `boochat_db` (was `boocode_db`) | `127.0.0.1:5500` | `boocode_pgdata` | Shared Postgres 16 |
| `codecontext` | internal `:8080` | `/opt:/opt:ro` | Analysis sidecar (shared) |
### Caddy routing
```
code.indifferentketchup.com → boochat:9500
coder.indifferentketchup.com → boocoder:9502
term.indifferentketchup.com → booterm:9501 (or routed under code.*/term/)
```
## Schema (new tables)
```sql
-- Pending changes: queued writes before /apply
CREATE TABLE IF NOT EXISTS pending_changes (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
session_id UUID NOT NULL REFERENCES sessions(id),
task_id UUID REFERENCES tasks(id),
file_path TEXT NOT NULL,
operation TEXT NOT NULL CHECK (operation IN ('create', 'edit', 'delete')),
diff TEXT NOT NULL,
status TEXT NOT NULL DEFAULT 'pending' CHECK (status IN ('pending', 'applied', 'rejected', 'reverted')),
created_at TIMESTAMPTZ NOT NULL DEFAULT clock_timestamp()
);
-- Tasks: the dispatch DAG
CREATE TABLE IF NOT EXISTS tasks (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
project_id UUID NOT NULL REFERENCES projects(id),
parent_task_id UUID REFERENCES tasks(id),
state TEXT NOT NULL DEFAULT 'pending'
CHECK (state IN ('pending', 'running', 'completed', 'failed', 'blocked', 'cancelled')),
input TEXT NOT NULL,
output_summary TEXT,
agent TEXT,
model TEXT,
execution_path TEXT CHECK (execution_path IN ('native', 'acp', 'pty')),
worktree_path TEXT,
cost_tokens INTEGER,
started_at TIMESTAMPTZ,
ended_at TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT clock_timestamp()
);
-- Available agents: probed at startup
CREATE TABLE IF NOT EXISTS available_agents (
name TEXT PRIMARY KEY,
install_path TEXT,
version TEXT,
supports_acp BOOLEAN NOT NULL DEFAULT false,
supports_mcp_client BOOLEAN NOT NULL DEFAULT false,
last_probed_at TIMESTAMPTZ
);
-- Human inbox: tasks needing attention
CREATE VIEW human_inbox AS
SELECT * FROM tasks WHERE state IN ('blocked', 'failed');
```
`task_templates` and `pipelines` deferred to v2.1 — overhead for single-user. The core is `tasks` + `pending_changes` + `available_agents`.
## Path A — Native write tools
### Tools
| Tool | Description |
|---|---|
| `edit_file` | Apply a diff to an existing file. Input: `{file_path, old_string, new_string}`. Queues in `pending_changes` with `operation='edit'`. |
| `create_file` | Create a new file. Input: `{file_path, content}`. Queues as `operation='create'`. |
| `delete_file` | Delete a file. Input: `{file_path}`. Queues as `operation='delete'`. |
| `apply_pending` | Flush all pending changes for the current session to disk. Path-guarded. |
| `rewind` | Revert a specific applied change or all changes since a checkpoint. |
### Path guard for writes
Same `pathGuard()` function from BooChat, but with a write-path variant:
- `resolveWritePath(projectRoot, requested)` — uses `resolve()` (not `realpath()`, since the file may not exist yet for creates), then verifies the result starts with `projectRoot + sep`.
- Deny list: everything in `secret_guard.ts` (`.env`, `*.pem`, etc.) — can't write to those either.
- Defense-in-depth: the `pending_changes` queue means even a path-guard bypass only queues; it doesn't hit disk until `/apply` (which re-validates).
### Diff format
Standard unified diff (what `git diff` produces). The `edit_file` tool takes `old_string` / `new_string` (same as Claude Code's edit tool — the model is trained on this shape). Server computes the unified diff for storage in `pending_changes.diff`.
### UI: per-pane diff viewer
Frontend pane type `pending_changes` in BooCoder's workspace. Shows:
- List of queued changes with file path + operation
- Per-change diff view (syntax-highlighted, side-by-side or unified)
- Approve / Reject per change, or Approve All / Reject All
## Path B — External agent dispatch
### dispatch_external_agent tool
```typescript
{
agent: 'opencode' | 'claude' | 'goose' | 'pi',
model: string, // e.g. 'claude-opus-4-7'
task: string, // natural-language task description
worktree?: string, // optional — auto-creates if not specified
}
```
### Transport selection
Dispatcher checks `available_agents.supports_acp` at runtime:
- **ACP** (preferred): `opencode acp`, `goose acp` — JSON-RPC stdio. Native session lifecycle, file-operation events, terminal events, permission prompts.
- **PTY** (fallback): `claude`, `pi`, `smallcode` — raw terminal capture via `node-pty`. Captures stdout/stderr/exit-code into PostgreSQL. Less structured than ACP.
### Worktree management
Each dispatched task gets its own git worktree:
```bash
git worktree add /tmp/booworktrees/<task-id> -b task-<task-id> HEAD
```
On completion: diff the worktree against HEAD, queue the diff into `pending_changes` for the same task, clean up the worktree. User approves/rejects the diff the same way as Path A.
### ACP event mapping
ACP events → BooCode parts taxonomy:
- `file_operation``tool_call` part (name: `acp_edit_file`) + `tool_result` part
- `tool_call``tool_call` part (preserves name)
- `terminal_output` → routes into BooTerm pane
- `permission_request` → pause inference (same mechanism as `ask_user_input`)
- `session_end` → task state → `completed` or `failed`
### MCP server auto-forward
Per goose docs, `context_servers` field in the ACP session config auto-forwards BooCoder's configured MCP servers to the dispatched agent. One MCP config drives every agent.
## Dispatcher worker
Background process (or in-process `setInterval` for v2.0 simplicity) that:
1. Queries `tasks` WHERE `state = 'pending'` ORDER BY `created_at`
2. For each ready task (no unmet dependencies):
- Mark `state = 'running'`
- Resolve execution path (Path A if no agent specified, Path B if agent specified)
- Path A: run the inference loop with write tools enabled
- Path B: spawn ACP/PTY subprocess, stream events into parts
- On completion: mark `state = 'completed'` or `'failed'`
- Queue output diff into `pending_changes`
3. On failure: mark `state = 'failed'`, surface in `human_inbox` view
## BooCoder MCP server
Exposes BooCoder's primitives as MCP tools so external agents (Sam's opencode in Termius) can drive the task queue:
| MCP Tool | Description |
|---|---|
| `boocoder.create_task` | Create a new task in the queue |
| `boocoder.list_pending_changes` | List queued changes awaiting approval |
| `boocoder.apply` | Apply a specific pending change |
| `boocoder.reject` | Reject a pending change |
| `boocoder.dispatch_external_agent` | Dispatch a task to an external agent |
| `boocoder.list_worktrees` | List active git worktrees |
Stdio transport for local consumers. HTTP transport deferred until OAuth + secret storage.
**Eval requirement:** run through `anthropics/skills mcp-builder` 10-question evaluation framework before shipping.
## Code lifts
### Primary architectural template
**`Dominic789654/agent-hub`** (Apache-2.0) — task DAG schema, dispatcher worker, project registry, human inbox. Three-process model (board server + dispatcher + assistant terminal). BooCode adapts this into a single-process Fastify app (v2.0.0) with the dispatcher as an in-process worker.
### Pending-changes UX
**`plandex-ai/plandex`** (MIT) — diff/apply/rewind vocabulary. The `pending_changes` queue concept, per-file diff view, approve/reject UI pattern. No code lifted — schema and UX design only.
### ACP client
**`agentclientprotocol.com` spec + `@zed-industries/agent-client-protocol` SDK** (Apache-2.0) — local-subprocess ACP via stdio JSON-RPC. The SDK handles framing; BooCode maps events to its parts taxonomy.
**`goose` docs** (`goose-docs.ai/docs/guides/acp-clients/`) — `context_servers` auto-forward pattern. Critical: one MCP config drives every dispatched agent.
### MCP server
**`anthropics/skills/mcp-builder`** (MIT) — 4-phase build workflow + 10-question evaluation framework for validating the MCP server before shipping.
### Dispatcher pattern
**Paseo (`getpaseo/paseo`)** — AGPL-3.0, **design only, no code lift**. Daemon+clients architecture, `--worktree` flag, CLI verb shape (`run/ls/attach/send`). BooCode reproduces the architecture using only license-clean patterns.
**Roo Code Boomerang Tasks** — orchestrator with intentional capability restriction. Down-pass/up-pass context discipline (`new_task` message, `attempt_completion` result, no implicit inheritance). Explicit precedence override clause.
### Write-tool security
**opencode `permission/evaluate.ts`** — wildcard permission ruleset (already lifted in v1.15). Extended in v2.0 to gate write tools.
**`covibes/zeroshot`** — blind-validation invariant. Verify gate runs in a separate agent context that only sees the diff and acceptance criteria, not the producing conversation. v2.0+ optional batch.
## Sub-versions
| Version | Scope |
|---|---|
| **v2.0.0** | Schema + Path A (native write tools + pending-changes queue + diff UI) + basic dispatcher |
| **v2.0.1** | Path B (ACP client for opencode/goose + PTY fallback for claude/pi + worktree management) |
| **v2.0.2** | BooCoder MCP server (stdio transport, `boocoder.*` tools, eval framework) |
| **v2.0.3** | Polish: `boocode` CLI (`run/ls/attach/send`), human_inbox UI, cost tracking |
## Dependencies
- v1.13 ✅ (parts table — the event taxonomy for everything)
- v1.14 ✅ (outer loop + step boundaries for future revert snapshots)
- v1.14.x-mcp ✅ (MCP client PoC — proves the protocol)
- v1.15 ✅ (full MCP client + tool globs — write-capable MCP servers route through pending_changes)
- v1.16 ✅ (codesight merge — codecontext now has blast-radius for impact analysis)
All dependencies shipped. v2.0 is unblocked.
## Estimate
- v2.0.0: ~800 LoC (schema + write tools + pending-changes service + diff pane + dispatcher skeleton)
- v2.0.1: ~600 LoC (ACP client + PTY dispatch + worktree management + event mapping)
- v2.0.2: ~400 LoC (MCP server + 6 tool handlers + stdio transport + eval)
- v2.0.3: ~400 LoC (CLI client + inbox UI + cost aggregation)
- **Total: ~2200 LoC** across 4 sub-versions
## Hard rules
- BooChat stays read-only. BooCoder is the only surface with write tools.
- Path-guard correctness is the #1 test target. Fuzz against every traversal pattern.
- Pending-changes queue gates ALL writes (native + MCP). Nothing touches disk without user approval (or explicit auto-apply flag per task).
- One shared database. Cross-surface joins are valuable (task → chat → terminal debugging session).
- External CLI agents on the host, not in containers. BooCoder shells out via local-exec.
- No OAuth in v2.0. MCP server is stdio-only until secret storage lands.
- DB rename `boocode_db``boochat_db` lands with v2.0.0 (one-time migration).
## AGENTS.md extensions (v2.0.0)
Port from `qodo-ai/agents` (MIT) `agent.toml` schema and `ai-christianson/RA.Aid` (Apache-2.0) three-stage pattern:
| Field | Type | Purpose | Source |
|---|---|---|---|
| `steps` | number | Per-agent step cap (already shipped v1.14.0) | opencode |
| `output_schema` | JSON Schema | Structured output constraint for the agent's final response | qodo-ai/agents |
| `exit_expression` | string | Regex/predicate — when the agent considers itself done | qodo-ai/agents |
| `execution_strategy` | `plan` \| `act` \| `research` | Which phase of the RA.Aid three-stage pattern this agent operates in | qodo-ai/agents + RA.Aid |
| `model` | string | Per-agent model override (already shipped v1.8) | — |
| `expert_model` | string | Escalation model for hard reasoning (RA.Aid "expert tool" escape hatch) | RA.Aid |
The three-stage pattern maps to BooCoder's use case:
- **Research agent** (cheap model) → understand the task, find relevant files
- **Planning agent** (standard model) → decide which files to edit, what the changes look like
- **Implementation agent** (full model) → produce the actual diffs
`expert_model` is the escape hatch: a routine model handles most subtasks, but can call the expert model (e.g. qwopus27b) when stuck. Matches Sam's existing cost-routing discipline.
## Subagent isolation (Boomerang pattern, v2.0.1)
From Roo Code Boomerang Tasks (Apache-2.0 pattern):
When an orchestrator agent calls a `new_task` tool, BooCoder:
1. Creates a fresh `tasks` row with `parent_task_id` pointing to the orchestrator's task
2. Spawns a fresh inference session (Path A) or dispatch (Path B) with ONLY the task spec as context — no inherited conversation
3. Child runs to `attempt_completion`, writes a summary to `tasks.output_summary`
4. Parent resumes reading ONLY the summary (not the child's full conversation)
**Three principles:**
- Orchestrator capability restriction: the orchestrator agent's tool list includes ONLY `new_task`, `list_tasks`, `check_task_status` — it cannot read files or call MCP tools directly
- Down-pass: parent sends task spec via `new_task(input)`, nothing else inherited
- Up-pass: child sends result via `attempt_completion(summary)`, nothing else surfaces to parent
This is the **single most important context-management primitive** — it prevents long-running orchestrators from poisoning their context with implementation detail.
## Observation hooks (v2.0.3)
From `siropkin/budi` (MIT) Claude Code 5-hook taxonomy:
Register BooCoder as a hook receiver for dispatched agents. Five events:
- `SessionStart` — agent spawned
- `UserPromptSubmit` — task spec delivered
- `PostToolUse` — each tool call completed
- `SubagentStart` — nested dispatch
- `Stop` — agent finished
These map directly to BooCode's existing WS frame protocol. The hook receiver is the BooCoder Fastify server; events flow into the `message_parts` taxonomy as `step_start`-style instrumentation parts.
## Follow-up batches (v2.0+ optional, ordered by value)
| Batch | Source | What | When |
|---|---|---|---|
| **PR-resolver tool** | `qodo-ai/qodo-skills` (MIT) | Fetch GitHub issues → batch/interactive fix → inline PR reply. BooCoder tool that replaces Sam's manual PR workflow. | v2.0.3+ |
| **HMAC audit log** | `sipyourdrink-ltd/bernstein` (verify license) | One new `audit_log` table with `prev_hmac` field. Tamper-evident history of every edit BooCoder makes. Small lift (~50 LoC). | v2.0.1+ |
| **Blind-validation gate** | `covibes/zeroshot` (MIT) | Verify gate runs in a separate agent context that sees ONLY the diff + acceptance criteria, not the producing conversation. Complements Boomerang (isolation) + bernstein (lineage). | v2.0.2+ |
| **Majority-vote ensembler** | `augmentcode/augment-swebench-agent` (MIT) | K candidate diffs from K agents → ranker model picks the best one. Optional layer above `pending_changes`. | v2.1+ |
| **Drift detection** | `memovai/memov` (MIT) | `validate_commit` concept — detects when actual changes diverge from what was requested. Shadow timeline comparison. | v2.0.3+ |
| **Anti-slop for frontend** | `Leonxlnx/taste-skill` (MIT) | 100+ specific font/color/layout ban list + 3-dial parameterization. Vendor into skills/ when BooCoder generates frontend code. | v2.0+ |
| **Verify-before-commit gate** | `DeepSourceCorp/globstar` (MIT) | Rule-based AST linter as a pre-apply quality gate. YAML checkers in `.globstar/`. | v2.1+ (parked) |
| **Docker sandbox** | `OpenHands/OpenHands` (MIT) | Per-session Docker container for write tools. Closes the `/opt:rw` mount risk if path-guard ever proves insufficient. | v2.1 (optional) |
| **Multi-provider LLM** | `earendil-works/pi` (MIT) | Provider abstraction if a need for Anthropic/OpenAI/Mistral direct surfaces beyond llama-swap. | v2.x (optional) |
## Repos to clone before starting
```bash
cd /opt/forks
git clone https://github.com/Dominic789654/agent-hub.git # Apache-2.0, task DAG + dispatcher
git clone https://github.com/plandex-ai/plandex.git # MIT, pending-changes UX
git clone https://github.com/anomalyco/opencode.git # MIT, permission evaluate.ts reference
git clone https://github.com/qodo-ai/agents.git # MIT, agent.toml schema (output_schema, exit_expression, execution_strategy)
```
Also read (no clone needed):
- `ai-christianson/RA.Aid` README — three-stage pattern + expert-tool escape hatch
- `getpaseo/paseo` README + `skills/` directory — daemon architecture + CLI verbs (AGPL, design-only)
- `agentclientprotocol.com` spec — ACP stdio protocol
- `goose-docs.ai/docs/guides/acp-clients/``context_servers` auto-forward pattern
- `siropkin/budi` README — 5-hook Claude Code taxonomy for observation
ACP SDK and MCP SDK are npm packages installed at implementation time.

View File

@@ -0,0 +1,130 @@
# v2.0 — BooCoder task breakdown
## Phase 0 — Prep (before any code)
- [ ] Clone lift sources: `agent-hub`, `plandex`, `opencode` to `/opt/forks/`
- [ ] Read agent-hub's schema + dispatcher pattern (Apache-2.0)
- [ ] Read plandex's pending-changes + diff/apply/rewind flow (MIT)
- [ ] Read opencode's `permission/evaluate.ts` for write-gate patterns (MIT)
- [ ] Install ACP SDK: `pnpm add @zed-industries/agent-client-protocol`
- [ ] Verify `opencode acp` and `goose acp` are available on the host
- [ ] Write `openspec/changes/v2.0-boocoder/design.md` with finalized decisions
## v2.0.0 — Schema + Path A (native write tools + pending-changes + diff UI)
### Infra
- [ ] Create `apps/coder/` directory skeleton (Fastify server, mirroring `apps/server/` structure)
- [ ] Create `apps/coder/Dockerfile` (Node 20 bookworm-slim, `/opt:/opt:rw` mount)
- [ ] Add `boocoder` service to `docker-compose.yml` (port 9502, boocode_net)
- [ ] Add Caddy route: `coder.indifferentketchup.com` → boocoder:9502
- [ ] DB rename: `boocode_db``boochat_db` (one-time ALTER DATABASE + docker-compose volume rename)
- [ ] Schema migration: CREATE TABLE `pending_changes`, `tasks`, `available_agents`; CREATE VIEW `human_inbox`
- [ ] Container guidance: `BOOCODER.md` (bind-mounted at `/app/BOOCODER.md`)
### Write tools
- [ ] `apps/coder/src/services/write_guard.ts``resolveWritePath(projectRoot, filePath)` (resolve + prefix-check, no realpath since file may not exist)
- [ ] `apps/coder/src/services/pending_changes.ts` — queue, apply, reject, revert operations
- [ ] Tool: `edit_file` — takes `{file_path, old_string, new_string}`, computes unified diff, queues in `pending_changes`
- [ ] Tool: `create_file` — takes `{file_path, content}`, queues as `operation='create'`
- [ ] Tool: `delete_file` — takes `{file_path}`, queues as `operation='delete'`
- [ ] Tool: `apply_pending` — flushes pending changes to disk (re-validates write_guard before each write)
- [ ] Tool: `rewind` — reverts applied changes by inverse-diff
### Inference loop
- [ ] Port the v1.14 outer loop from `apps/server/` into `apps/coder/` (or share via workspace package)
- [ ] Register write tools in the coder's tool registry (alongside all read tools from BooChat)
- [ ] Permission gate: write tools require `pending_changes` queue (can't bypass to direct disk write)
### Frontend (diff pane)
- [ ] Create `apps/coder/web/` SPA (React + Vite, same stack as BooChat's `apps/web/`)
- [ ] Diff pane component: shows pending changes with syntax-highlighted diffs
- [ ] Approve / Reject per change, Approve All / Reject All buttons
- [ ] Workspace splitter integration (chat pane + diff pane side by side)
### Verification
- [ ] `pnpm -C apps/coder build` clean
- [ ] Write path-guard fuzz tests (traversal patterns, symlinks, non-existent paths, `.env` deny)
- [ ] `docker compose up --build -d` — boocoder container starts, healthcheck passes
- [ ] Smoke: send a chat requesting a file edit → see it queued in diff pane → approve → file written
## v2.0.1 — Path B (ACP dispatch + PTY fallback + worktrees)
### ACP client
- [ ] `apps/coder/src/services/acp-client.ts` — spawn `opencode acp` / `goose acp` via `@zed-industries/agent-client-protocol` StdioTransport
- [ ] Event mapping: ACP `file_operation``tool_call` part, `terminal_output` → BooTerm route, `permission_request` → pause
- [ ] Session lifecycle: start, mid-session model switch, end
- [ ] MCP auto-forward: pass BooCoder's `context_servers` config to the ACP session
### PTY fallback
- [ ] `apps/coder/src/services/pty-dispatch.ts` — spawn `claude` / `pi` / `smallcode` via `node-pty`
- [ ] Capture stdout/stderr/exit-code into parts (less structured than ACP)
- [ ] Worktree setup: `git worktree add /tmp/booworktrees/<task-id> -b task-<task-id> HEAD`
- [ ] On completion: diff worktree vs HEAD → queue into `pending_changes`
### Dispatcher
- [ ] `apps/coder/src/services/dispatcher.ts` — polls `tasks` WHERE `state='pending'`, picks by priority + creation order
- [ ] Transport selection: check `available_agents.supports_acp` at dispatch time
- [ ] On failure: mark `state='failed'`, surface in `human_inbox`
- [ ] On completion: mark `state='completed'`, queue diff if Path B
### Agent probing
- [ ] Startup probe: `which opencode && opencode --version`, `which goose`, `which claude`, `which pi`
- [ ] Populate `available_agents` table with version + ACP support
### Verification
- [ ] Smoke: dispatch a task to `opencode` via ACP → task completes → diff queued
- [ ] Smoke: dispatch to `claude` via PTY fallback → captures output → diff from worktree
- [ ] Worktree cleanup after task completion
## v2.0.2 — BooCoder MCP server
### Implementation
- [ ] `apps/coder/src/services/mcp-server.ts` — register 6 tools as MCP tool handlers
- [ ] Stdio transport (use `@modelcontextprotocol/sdk` server-side, same SDK as client)
- [ ] Tools: `boocoder.create_task`, `boocoder.list_pending_changes`, `boocoder.apply`, `boocoder.reject`, `boocoder.dispatch_external_agent`, `boocoder.list_worktrees`
- [ ] Each tool maps to a DB operation or service call
### Eval
- [ ] Write 10-question eval per `anthropics/skills/mcp-builder` framework
- [ ] Run eval against the MCP server — all 10 must pass before shipping
- [ ] Document eval results in openspec
### Verification
- [ ] From a terminal: `echo '{"jsonrpc":"2.0","method":"tools/list","id":1}' | boocoder --mcp` → returns 6 tools
- [ ] From opencode: configure BooCoder as an MCP server in `~/.opencode/config.json`, verify tool calls work
## v2.0.3 — Polish
### CLI client
- [ ] `apps/coder/src/cli.ts` — thin WebSocket/HTTP client against BooCoder API
- [ ] Verbs: `boocode run <task>`, `boocode ls`, `boocode attach <id>`, `boocode send <id> <message>`
- [ ] Mirrors Paseo's UX, license-clean implementation
### Human inbox UI
- [ ] Frontend route showing tasks in `blocked`/`failed` state
- [ ] Per-task: view output, retry, cancel, reassign to different agent
### Cost tracking
- [ ] `tasks.cost_tokens` populated from inference usage
- [ ] Summary view: per-project, per-agent, per-day token spend
### Verification
- [ ] `boocode run "add a health endpoint"` from terminal → task appears in UI → completes → diff in pane
- [ ] `boocode ls` shows running/completed/failed tasks

45
pnpm-lock.yaml generated
View File

@@ -46,6 +46,40 @@ importers:
specifier: ^5.5.0
version: 5.9.3
apps/coder:
dependencies:
'@boocode/server':
specifier: workspace:*
version: link:../server
'@fastify/static':
specifier: ^7.0.4
version: 7.0.4
'@fastify/websocket':
specifier: ^10.0.1
version: 10.0.1
fastify:
specifier: ^4.28.1
version: 4.29.1
postgres:
specifier: ^3.4.4
version: 3.4.9
zod:
specifier: ^3.23.8
version: 3.25.76
devDependencies:
'@types/node':
specifier: ^20.14.10
version: 20.19.41
tsx:
specifier: ^4.16.2
version: 4.22.0
typescript:
specifier: ^5.5.0
version: 5.9.3
vitest:
specifier: ^3.0.0
version: 3.2.4(@types/debug@4.1.13)(@types/node@20.19.41)(lightningcss@1.32.0)(msw@2.14.6(@types/node@20.19.41)(typescript@5.9.3))
apps/server:
dependencies:
'@ai-sdk/openai-compatible':
@@ -57,6 +91,9 @@ importers:
'@fastify/websocket':
specifier: ^10.0.1
version: 10.0.1
'@modelcontextprotocol/sdk':
specifier: ^1.29.0
version: 1.29.0(zod@3.25.76)
ai:
specifier: ^6.0.190
version: 6.0.190(zod@3.25.76)
@@ -5774,7 +5811,7 @@ snapshots:
bytes: 3.1.2
content-type: 1.0.5
debug: 4.4.3
http-errors: 2.0.0
http-errors: 2.0.1
iconv-lite: 0.7.2
on-finished: 2.4.1
qs: 6.15.1
@@ -6151,7 +6188,7 @@ snapshots:
etag: 1.8.1
finalhandler: 2.1.1
fresh: 2.0.0
http-errors: 2.0.0
http-errors: 2.0.1
merge-descriptors: 2.0.0
mime-types: 3.0.2
on-finished: 2.4.1
@@ -6163,7 +6200,7 @@ snapshots:
router: 2.2.0
send: 1.2.1
serve-static: 2.2.1
statuses: 2.0.1
statuses: 2.0.2
type-is: 2.1.0
vary: 1.1.2
transitivePeerDependencies:
@@ -6262,7 +6299,7 @@ snapshots:
escape-html: 1.0.3
on-finished: 2.4.1
parseurl: 1.3.3
statuses: 2.0.1
statuses: 2.0.2
transitivePeerDependencies:
- supports-color