boocode/apps/server/CLAUDE.md
indifferentketchup 381b97f78a feat(server): inference state-graph + supervisor, memory tools, MCP client, schema, routes
- Add state-graph.ts: typed state machine for inference lifecycle
- Add supervisor.ts: agent supervisor pattern for multi-agent coordination
- Add export-formatter.ts: structured export formatting
- Add manage_memory.ts: memory CRUD tool for agent persistence
- Add get_wiki_article.ts: codecontext wiki article retrieval
- Extend memory/index.ts: 3-tier memory (context/daily/core)
- Extend MCP client: mcp-config.ts env-var substitution
- Update schema.sql: agent_sessions, tasks, pending_changes extensions
- Update API types: MessageMetadata, ErrorReason, AgentSessionConfig
- Update routes: chats, messages, sessions — column renames and agent_session_id
- Update inference: error handler, payload builder, stream phase, turn orchestrator
2026-06-08 03:48:47 +00:00

# apps/server — BooChat backend (deep reference) — v2.7.x (last meaningful update: 2026-06)
> Per-app engineering notes for `apps/server/src/`. Cross-cutting commands, database, environment, workflow, and cross-app contracts (WS-frame / provider-type parity, sentinels) live in the **root `CLAUDE.md`**. This file auto-loads when you read/edit files under `apps/server/`.
## These gotchas are load-bearing — do not remove or refactor without understanding why
- Do NOT remove the abort-signal pinning comment in `stream-phase.ts`: `fullStream` exits cleanly on abort without throwing; the post-iteration `if (signal?.aborted)` check is the only thing that distinguishes cancelled from complete.
- Do NOT remove `includeUsage: true` from `provider.ts` — the adapter defaults it false; without it, token counts are always NULL.
- Do NOT add raw `broker.publish()`/`publishUser()` calls — always use `publishFrame`/`publishUserFrame` which Zod-validate against `WsFrameSchema`.
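
The abort gotcha above can be illustrated with a minimal sketch. Everything named here is hypothetical: `SignalLike` stands in for `AbortSignal`, and `fakeFullStream` stands in for the AI SDK's `fullStream` (it is not the real API, and it is synchronous only to keep the example short). The point is that the loop ends the same way whether the stream finished or was aborted, so the check after the loop is what separates the two outcomes.

```typescript
// Hypothetical stand-ins: SignalLike mimics AbortSignal's `aborted` flag, and
// fakeFullStream mimics a stream that ends cleanly (no throw) once aborted.
type SignalLike = { aborted: boolean };

function* fakeFullStream(chunks: string[], signal?: SignalLike): Generator<string> {
  for (const chunk of chunks) {
    if (signal?.aborted) return; // clean exit, nothing is thrown
    yield chunk;
  }
}

type Outcome = { status: "complete" | "cancelled"; text: string };

function consume(
  chunks: string[],
  signal?: SignalLike,
  onChunk?: (chunk: string) => void,
): Outcome {
  let text = "";
  try {
    for (const chunk of fakeFullStream(chunks, signal)) {
      text += chunk;
      onChunk?.(chunk);
    }
    // The loop ends identically for "finished" and "aborted".
    // Only this post-iteration check tells them apart.
    if (signal?.aborted) {
      const err = new Error("aborted");
      err.name = "AbortError";
      throw err;
    }
    return { status: "complete", text };
  } catch (err) {
    if (err instanceof Error && err.name === "AbortError") {
      return { status: "cancelled", text };
    }
    throw err;
  }
}
```

Deleting the `if (signal?.aborted)` block in this sketch makes an aborted run return `complete`, which is the failure mode the pinning comment guards against.
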
## Stack
- **Fastify** with `@fastify/websocket` and `@fastify/static` (serves the built frontend).
- **postgres** (porsager/postgres) with tagged-template SQL — no ORM. Schema in `schema.sql`, applied on startup. LSP may false-positive on `sql<Type[]>\`...\`` generics; CLI `tsc` / `pnpm build` is authoritative.
- **Zod** for request validation and config parsing.
## Key services
- **`services/inference/`** — Public surface re-exported via `inference/index.ts`; callers import from `./services/inference/index.js` explicitly (NodeNext doesn't honor directory-index resolution). Layout: `turn.ts` (runAssistantTurn/runInference/createInferenceRunner; exports `InferenceFrame`, `InferenceContext`, `TurnArgs`, `StreamResult`, `MAX_STEPS`); `stream-phase.ts` (streamCompletion AI SDK adapter + executeStreamPhase); `provider.ts` (`upstreamModel(baseURL, modelId)` wrapping `createOpenAICompatible` against llama-swap); `tool-phase.ts` (executeToolPhase → `ToolPhaseResult`; the turn loop lives in turn.ts, not recursion); `sentinel-summaries.ts` (cap-hit/doom-loop/step-cap summaries + inserters); `error-handler.ts` (handleAbortOrError, finalizeCompletion); `payload.ts` (buildMessagesPayload, loadContext, maybeFlagForCompaction, `OpenAiMessage`); `sentinels.ts` (`detectDoomLoop`, `DOOM_LOOP_THRESHOLD`); `budget.ts` (resolveToolBudget); `xml-parser.ts` (qwen3.6 XML tool-call fallback — KEEP, AI SDK doesn't handle inline-XML tool calls); `parts.ts` (`partsFromAssistantMessage`/`partsFromToolMessage`/`insertParts` — parts are the sole source of truth); `prune.ts` (two-tier compaction; `selectPruneTargets` is the pure helper); `types.ts` (`StreamPhaseState`, `DB_FLUSH_INTERVAL_MS`). **`TurnArgs`** is the per-turn state envelope, reset in `runInference` at the user-message boundary. Outer loop: `while (stepNumber < effectiveCap)`, `effectiveCap = Math.min(agent.steps ?? Infinity, MAX_STEPS=200)`. Per-agent `steps:` in AGENTS.md frontmatter; `steps: 0` = text-only. Step-cap hit writes a `cap_hit` sentinel (`CapHitSentinel.tsx` renders it).
- **AI SDK v6 streamCompletion adapter** (`services/inference/stream-phase.ts`). `streamText` is the underlying call; the BooCode layer (executeStreamPhase, finalize, dual-write) is shape-preserved via an adapter. Five gotchas the LSP/tests won't catch:
  - **Abort signals are swallowed.** `streamText`'s `fullStream` exits cleanly when `abortSignal` fires, without throwing. A post-iteration `if (signal?.aborted) throw <AbortError>` is required, else the row finalizes `complete` instead of `cancelled`. Don't refactor away the pinning comment.
  - **Usage lands only at stream end** via `await result.usage` (v6 `inputTokens`/`outputTokens` → mapped to `promptTokens`/`completionTokens`). No mid-stream tok/s; ChatThroughput shows one value at stream end.
  - **Tools have NO `execute` field.** BooCode dispatches tools in `tool-phase.ts`, not the AI SDK loop, so each tool definition carries only `description` + `inputSchema: jsonSchema(parameters)`.
  - **`includeUsage: true` MUST be set on `createOpenAICompatible`** in `provider.ts`. The adapter defaults it to false → no `stream_options.include_usage` → llama-swap emits no usage block → `result.usage` resolves `undefined` (NULL token counts). Don't remove during refactor.
  - **Tool-call-only turns may emit a leading `\n` text-delta.** `MessageList.flatten`'s `hasText` and `MessageBubble`'s `hasContent` both `.trim()` before the length check, else whitespace-only content renders an empty bubble + ActionRow between tool calls. `buildMessagesPayload` also skips `status='failed'` and complete-but-empty assistant rows (avoids "Cannot have 2 or more assistant messages at the end of the list" upstream rejection after cap-hit + Continue).
- **`services/inference/tool-shim.ts`** — Recovers structured tool calls from plain-text model output. Some models (notably Qwen) emit `<tool_call><name>...</name><arguments>...</arguments></tool_call>` inline text instead of structured JSON. `extractToolCalls(text)` parses both XML and JSON inline formats. `hasToolCallMarkup(text)` is a fast pre-check. Used as a fallback in the stream phase when structured `tool_calls` parse fails. Does NOT require `FAST_MODEL` — operates on the existing turn's output text.
- **`services/inference/loop-detectors.ts`** — Six detectors that catch repetitive model behavior. Two are named here: `detectContentRepeat` (same content N times) and `detectToolLoop` (same tool called consecutively); `detectDoomLoop` combines those two. These are additive to the existing `sentinels.ts` doom-loop detection.
- **AI SDK ModelMessage conversion** (`toModelMessages` in stream-phase.ts). Tool messages need a `toolName` for `ToolResultPart`; BooCode's OpenAI-shape history lacks it, so a forward-scan builds a `tool_call_id → toolName` map from prior assistant `tool_calls`. Tool outputs wrapped as `{ type: 'json' | 'text', value }` (v6 `ToolResultOutput`). Reasoning emits a `ReasoningPart` first in the content array.
- **`experimental_repairToolCall`** wired into `streamText` to keep the stream alive when qwen3.6 emits malformed tool args. Pass-through: logs the bad call, returns it unmodified; `executeToolPhase`'s zod-reject path routes it back to the model next turn.
- **`chat_status` frame** (via `broker.publishUser`) — `status: 'streaming' | 'tool_running' | 'waiting_for_input' | 'idle' | 'error'`. Frontend `useChatStatus` derives `idle_warm` (<30s since idle) vs `idle_cold`. `ChatThroughput` renders beside `StatusDot` only when streaming/tool_running, fed by 500ms-throttled `'usage'` frames (`completion_tokens` + `ctx_used` + `ctx_max`). `POST /api/chats/:id/discard_stale` marks a stuck-streaming row `failed` when the frontend's 60s no-token timer gives up.
- **Stale-streaming sweeps** (`apps/server/src/index.ts`): a boot-time pass after `applySchema()` and a periodic 60s `setInterval` both flip `messages.status='streaming'` older than 5 min to `failed` (publishing `chat_status='idle'`); the interval also runs `cleanupTruncations` (TTL + orphan reap of tmpfs truncation files). `onClose` hook clears the timer. Recovers from a container restart mid-stream.
- **`services/broker.ts`** — In-memory pub/sub, two channel types: per-session (message streaming) and per-user (sidebar). No persistence; clients reconnect on restart. Every WS publish goes through `broker.publishFrame(sessionId, frame)` / `publishUserFrame(user, frame)` — both Zod-validate against `WsFrameSchema` (`types/ws-frames.ts`) and fail-closed (log + drop). Schema single-sourced in `@boocode/contracts` (`packages/contracts/src/ws-frames.ts`); the package's `ws-frames.test.ts` enforces schema correctness. Don't add raw `broker.publish()`/`publishUser()` calls.
- **`services/tools.ts`** — Tool registry (`ALL_TOOLS`, `READ_ONLY_TOOL_NAMES`, `TOOLS_BY_NAME`). Filesystem tools (view_file/list_dir/grep/find_files) pass three guards: `path_guard.ts` (workspace scope), `secret_guard.ts` (filename deny list), `url_guard.ts` (SSRF/private-IP block for web_fetch). Web tools (`web_search`, `web_fetch`) are opt-in per chat via `session.web_search_enabled` (falls back to `project.default_web_search_enabled`) and filtered out of the LLM tool schema when false. Truncation: when a tool slice cuts content, `services/truncate.ts` stashes the full text on tmpfs (`BOOCODE_TRUNCATION_DIR`, default `/tmp/boocode-truncations`, 0o700) keyed by `tr_<12 base32>`; `view_truncated_output(id)` retrieves it. 5MB cap, 7-day TTL, reaped by the sweeper. Container restart loses retrieval — acceptable.
- **`services/compaction.ts`** + **`services/model-context.ts`** — Anchored rolling summary (single `summary=true` assistant row per chat, supersedes itself each compaction). Triggered when `chats.needs_compaction` is set after a turn exceeds `usable(ctx_max) = floor(0.85 × ctx_max)`. **`ctx_max` comes from `model-context.getModelContext()` fetching `${LLAMA_SWAP_URL}/upstream/<model>/props`** — NOT from `parsed.timings.n_ctx`. First inferences after boot may have `ctx_max=NULL` if llama-swap hasn't loaded the model; negative cache TTL 60s, recovers next turn. `buildHeadPayload` embeds `reasoning_parts` as a `<reasoning>...</reasoning>` prose prefix on assistant `content` (OpenAI wire shape has no structured reasoning field); standalone tag when content is empty. `buildHeadPayload` + `OpenAiMessage` exported for tests — keep them exported.
- **`services/system-prompt.ts`** — `buildSystemPrompt` is the string shim; `buildSystemPromptWithFingerprint` is the canonical impl returning `{prompt, fingerprint, drift}`. SHA-256 of the assembled prefix is logged per `buildMessagesPayload` (`prefix-fingerprint`, info); a `Map<sessionId, lastHash>` fires `prefix-drift` (warn) on change with a `changed_inputs` diff. The prefix is byte-stable in steady-state, so prefix caching is left to the input-layer mtime caches (BOOCHAT.md + AGENTS.md global/per-project in `agents.ts:safeStat`).
- **`services/inference/budget.ts`** — tool-call budgets: `BUDGET_READ_ONLY = 30`, `BUDGET_NON_READ_ONLY = 10` (forward-looking; no write tools yet), `BUDGET_NO_AGENT = 30` (every `ALL_TOOLS` tool is read-only today, so no-agent shares the read-only cap). Per-agent `max_tool_calls` from AGENTS.md overrides.
- **`messages_with_parts` view** (`schema.sql`). Read sites needing `tool_calls` / `tool_results` / `reasoning_parts` SELECT from this view, NOT `messages` — the legacy `messages.tool_calls`/`tool_results` JSON columns were dropped; the view reads parts-only subselects. Writes target `message_parts` via `insertParts` (or `partsFromAssistantMessage`/`partsFromToolMessage`). The `Message` wire type still carries `tool_calls?`/`tool_results?` because the view synthesizes them. Shapes: `tool_calls jsonb[]`, `tool_results jsonb` (single object), `reasoning_parts jsonb[]` of `{text}`. To UPDATE a message and return its full shape, do a two-step UPDATE returning `id` then SELECT from the view — RETURNING off bare `messages` no longer carries the tool fields. **`messages.model`** (attribution chip) stamps the model per assistant turn — at `finalizeCompletion` (BooChat + native coder) + the dispatcher's assistant-row INSERT (external coder); read via the view + the `message_complete` frame, rendered by `shortenModelName`.
- **`services/file_ops.ts`** — Shared file operation implementations used by both inference tools and HTTP routes.
- **`services/auto_name.ts`** — Non-streaming LLM call to generate 4-word session titles after the first assistant reply.
- **Provider picker dispatch**: when `provider !== 'boocode'`, the message route creates a `tasks` row (with `session_id` set) instead of calling `inference.enqueue`. The dispatcher (in `apps/coder`) picks it up and dispatches via ACP or PTY using the agent's `install_path`.
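
The forward scan described under "AI SDK ModelMessage conversion" above can be sketched as follows. The types are simplified, hypothetical versions of an OpenAI-shape history (not BooCode's actual `OpenAiMessage`), `collectToolResults` and `wrapOutput` are illustrative names, and both the `"unknown"` fallback and the "try `JSON.parse`, else text" rule for choosing the output type are assumptions made for the example.

```typescript
// Hypothetical, simplified OpenAI-shape history types.
type ToolCall = { id: string; function: { name: string; arguments: string } };
type HistoryMessage =
  | { role: "user" | "system"; content: string }
  | { role: "assistant"; content: string; tool_calls?: ToolCall[] }
  | { role: "tool"; tool_call_id: string; content: string };

type ToolOutput = { type: "json" | "text"; value: unknown };
type ToolResult = { toolCallId: string; toolName: string; output: ToolOutput };

// Assumption: valid JSON becomes a 'json' output, anything else stays 'text'.
function wrapOutput(content: string): ToolOutput {
  try {
    return { type: "json", value: JSON.parse(content) };
  } catch {
    return { type: "text", value: content };
  }
}

function collectToolResults(history: HistoryMessage[]): ToolResult[] {
  const nameById = new Map<string, string>();
  const results: ToolResult[] = [];
  for (const msg of history) {
    if (msg.role === "assistant") {
      // Record tool_call_id -> toolName as assistant calls are encountered.
      for (const call of msg.tool_calls ?? []) {
        nameById.set(call.id, call.function.name);
      }
    } else if (msg.role === "tool") {
      // Tool rows carry only the id; the name comes from the earlier scan.
      const toolName = nameById.get(msg.tool_call_id) ?? "unknown"; // assumed fallback
      results.push({
        toolCallId: msg.tool_call_id,
        toolName,
        output: wrapOutput(msg.content),
      });
    }
  }
  return results;
}
```

A single pass suffices because an assistant `tool_calls` entry always precedes the tool message that answers it.
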
Route registration: all routes registered in `index.ts` via `register*Routes(app, sql, ...)`. Routes live in `routes/*.ts`.
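
Two pieces of arithmetic from the Key services notes above, the step cap and the compaction threshold, can be written out as a small sketch. The function names and the `null` handling are assumptions for illustration; only `MAX_STEPS = 200`, the `Math.min(agent.steps ?? Infinity, MAX_STEPS)` expression, and `floor(0.85 × ctx_max)` come from the notes.

```typescript
const MAX_STEPS = 200; // value stated in the notes above
const USABLE_FRACTION = 0.85; // usable(ctx_max) = floor(0.85 * ctx_max)

// `??` (not `||`) matters here: `steps: 0` must stay 0, not fall through.
function effectiveCap(agentSteps?: number): number {
  return Math.min(agentSteps ?? Infinity, MAX_STEPS);
}

function usable(ctxMax: number): number {
  return Math.floor(USABLE_FRACTION * ctxMax);
}

// Assumption: while ctx_max is still NULL (model not loaded yet),
// no compaction decision is made.
function exceedsUsable(ctxUsed: number, ctxMax: number | null): boolean {
  if (ctxMax === null) return false;
  return ctxUsed > usable(ctxMax);
}
```

For a 32768-token context, `usable` gives 27852, so a turn that ends at 28000 used tokens would be over the threshold while one at 27000 would not.
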
## Server conventions
- **New tools** live in their own `services/<name>.ts` (see `web_search.ts`, `web_fetch.ts`) — a pure `executeFoo(input, ...deps)` for direct test access plus a `ToolDef` wrapper that `loadConfig()`s its real deps. Register the ToolDef in `tools.ts` `ALL_TOOLS` (and `READ_ONLY_TOOL_NAMES` if applicable). Inject `fetcher: typeof fetch = fetch` rather than `vi.spyOn(globalThis, 'fetch')`.
- **DB/session-aware tools** take an optional 4th `ToolExecCtx { sql, sessionId }` arg on `ToolDef.execute`, plumbed `executeToolPhase` → `executeToolCall` → `execute`. Optional so filesystem tools and the `apps/coder` `ALL_TOOLS` consumer stay compatible; filesystem tools ignore it. `read_tab_by_number` is the reference.
- **ReadableStream test stubs** use `pull()` (not `start()`) so chunks are produced lazily — `start()` enqueues everything and closes before the consumer reads, so a later `reader.cancel()` finds the stream closed and the `cancel()` callback never fires. Provide MORE chunks than the test consumes so the source stays 'readable' when cancel runs.
- Tool-name whitelists must derive from `ALL_TOOLS` in `services/tools.ts`, never hardcoded (this drift class hit `services/agents.ts` `ALL_TOOL_NAMES` before).
- Agent registry lives at `data/AGENTS.md` (global, bind-mounted at `/data/AGENTS.md`). No per-project `AGENTS.md` in this repo (removed to eliminate two-files-must-stay-in-sync drift); the `getAgentsForProject` per-project override mechanism remains for *other* projects.
- `data/AGENTS.md` is PARSED (`agents.ts` `splitSections`/`parseAgentSection`): each `## <Name>` is one agent and must be followed by a `---` frontmatter fence or the block throws; content before the first `## ` is discarded. Do NOT add free-form `## ` rule sections — they break the registry. Cross-cutting agent rules go in CLAUDE.md or a parser-ignored preamble.
- MCP stdio transport uses newline-delimited JSON (NDJSON), NOT LSP-style `Content-Length` headers. `codecontext/shim.go` is the reference (per the MCP spec, modelcontextprotocol.io/specification/server/transports).
- **`payload.ts:loadContext` SELECT** must include every `Session` field downstream code reads. The tool phase reads `session.allowed_read_paths`; if the SELECT omits it, cross-repo read grants silently fail. `sql<Session[]>` doesn't enforce column coverage, so the type doesn't catch it.
- **Sidecar routing** (`services/inference/provider.ts`): `upstreamModel(config, modelId, agent)` routes to `LLAMA_SIDECAR_URL` when the agent has `llama_extra_args`, else `LLAMA_SWAP_URL`. `resolveRoute(agent)` returns `{route, flags}`. Sidecar provider created fresh per call (not cached) because `X-Agent-Flags` varies per agent. Boot-time guard in `index.ts` refuses to start if any agent has `llama_extra_args` but `LLAMA_SIDECAR_URL` is unset.
- **Secret guard safe patterns** (`services/secret_guard.ts`): `.env.example`, `.env.sample`, `.env.template`, `.env.defaults` are allowlisted via `SAFE_PATTERNS`. Do NOT add `.env.production`/`.env.development`/`.env.test` — those can hold real secrets.
- **llama-sidecar** (`/opt/forks/llama-sidecar/`): Go daemon for a per-agent llama-server process pool (routed to via "Sidecar routing" above). Cross-compile: `GOOS=windows GOARCH=amd64 /snap/go/current/bin/go build -o bin/llama-sidecar.exe ./cmd/llama-sidecar`. Gitea: `indifferentketchup/llama-sidecar`. Windows child-process gotchas: `context.Background()` for child lifetime (not request ctx), `os.Open(os.DevNull)` for stdin, `os.Pipe()` for stdout with a drain goroutine, `DETACHED_PROCESS | CREATE_NEW_PROCESS_GROUP` flags. SSH to sam-desktop: `ssh samki@100.101.41.16`; use `schtasks` for persistent spawning (SSH `start /B` doesn't survive session close).
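
The NDJSON framing mentioned in the MCP stdio convention above can be sketched in a few lines: one JSON document per line, with no `Content-Length` header. `NdjsonDecoder` and `encodeMessage` are hypothetical names for this example, not the project's MCP client code, and the sketch assumes chunks arrive as already-decoded strings.

```typescript
// Minimal NDJSON decoder: buffers partial lines across chunks and
// returns every complete message seen so far.
class NdjsonDecoder {
  private buffer = "";

  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last piece is an incomplete line (or "" if the chunk ended on "\n").
    this.buffer = lines.pop() ?? "";
    return lines
      .filter((line) => line.trim() !== "")
      .map((line) => JSON.parse(line) as unknown);
  }
}

// JSON.stringify without an indent argument never emits a raw newline
// (newlines inside strings are escaped), so "\n" is a safe delimiter.
function encodeMessage(message: unknown): string {
  return JSON.stringify(message) + "\n";
}
```

A message split across two reads is held in the buffer until its terminating newline arrives, which is why the decoder keeps state between `push` calls.
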