
indifferentketchup 381b97f78a feat(server): inference state-graph + supervisor, memory tools, MCP client, schema, routes

- Add state-graph.ts: typed state machine for inference lifecycle
- Add supervisor.ts: agent supervisor pattern for multi-agent coordination
- Add export-formatter.ts: structured export formatting
- Add manage_memory.ts: memory CRUD tool for agent persistence
- Add get_wiki_article.ts: codecontext wiki article retrieval
- Extend memory/index.ts: 3-tier memory (context/daily/core)
- Extend MCP client: mcp-config.ts env-var substitution
- Update schema.sql: agent_sessions, tasks, pending_changes extensions
- Update API types: MessageMetadata, ErrorReason, AgentSessionConfig
- Update routes: chats, messages, sessions — column renames and agent_session_id
- Update inference: error handler, payload builder, stream phase, turn orchestrator

2026-06-08 03:48:47 +00:00


apps/server — BooChat backend (deep reference) — v2.7.x (last meaningful update: 2026-06)

Per-app engineering notes for apps/server/src/. Cross-cutting commands, database, environment, workflow, and cross-app contracts (WS-frame / provider-type parity, sentinels) live in the root CLAUDE.md. This file auto-loads when you read/edit files under apps/server/.

These gotchas are load-bearing — do not remove or refactor without understanding why

Do NOT remove the abort-signal pinning comment in stream-phase.ts — fullStream exits cleanly on abort without throwing; the post-iteration if (signal?.aborted) check is the only thing that distinguishes cancelled from complete.
Do NOT remove includeUsage: true from provider.ts — the adapter defaults it false; without it, token counts are always NULL.
Do NOT add raw broker.publish()/publishUser() calls — always use publishFrame/publishUserFrame which Zod-validate against WsFrameSchema.
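A minimal sketch of why the post-iteration abort check matters. The hypothetical fakeFullStream generator stands in for streamText's fullStream: as described above, it stops yielding on abort and returns normally instead of throwing. runStream and the Outcome shape are illustrative only. The real stream-phase.ts throws into handleAbortOrError rather than catching locally.

```typescript
// Hypothetical stand-in for fullStream: on abort it returns, it does not throw.
async function* fakeFullStream(chunks: string[], signal: AbortSignal): AsyncGenerator<string> {
  for (const chunk of chunks) {
    if (signal.aborted) return;
    yield chunk;
  }
}

type Outcome = { status: "complete" | "cancelled"; text: string };

// Aborts after `abortAfter` chunks to simulate a user cancel mid-stream.
async function runStream(chunks: string[], abortAfter: number): Promise<Outcome> {
  const controller = new AbortController();
  const signal = controller.signal;
  let text = "";
  let seen = 0;
  try {
    for await (const chunk of fakeFullStream(chunks, signal)) {
      text += chunk;
      seen += 1;
      if (seen === abortAfter) controller.abort();
    }
    // The pinned check: the loop above ends normally on abort, so without
    // this block an aborted stream would be finalized as complete.
    if (signal.aborted) {
      const err = new Error("stream aborted");
      err.name = "AbortError";
      throw err;
    }
    return { status: "complete", text };
  } catch (err) {
    if (err instanceof Error && err.name === "AbortError") return { status: "cancelled", text };
    throw err;
  }
}
```

If the `if (signal.aborted)` block is deleted, `runStream(["a", "b", "c"], 2)` resolves to `{ status: "complete", text: "ab" }`. That is exactly the misclassification the pinning comment guards against.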

Stack

Fastify with @fastify/websocket and @fastify/static (serves the built frontend).
postgres (porsager/postgres) with tagged-template SQL, no ORM. Schema in schema.sql, applied on startup. The LSP may false-positive on sql<Type[]>`...` generics; the CLI (tsc / pnpm build) is authoritative.
Zod for request validation and config parsing.

Key services

services/inference/: public surface re-exported via inference/index.ts. Callers import from ./services/inference/index.js explicitly, because NodeNext doesn't honor directory-index resolution. Layout:
- turn.ts: runAssistantTurn / runInference / createInferenceRunner; exports InferenceFrame, InferenceContext, TurnArgs, StreamResult, MAX_STEPS.
- stream-phase.ts: streamCompletion AI SDK adapter + executeStreamPhase.
- provider.ts: upstreamModel(baseURL, modelId) wrapping createOpenAICompatible against llama-swap.
- tool-phase.ts: executeToolPhase → ToolPhaseResult. The turn loop lives in turn.ts, not in recursion.
- sentinel-summaries.ts: cap-hit / doom-loop / step-cap summaries + inserters.
- error-handler.ts: handleAbortOrError, finalizeCompletion.
- payload.ts: buildMessagesPayload, loadContext, maybeFlagForCompaction, OpenAiMessage.
- sentinels.ts: detectDoomLoop, DOOM_LOOP_THRESHOLD.
- budget.ts: resolveToolBudget.
- xml-parser.ts: qwen3.6 XML tool-call fallback. KEEP it, because the AI SDK doesn't handle inline-XML tool calls.
- parts.ts: partsFromAssistantMessage / partsFromToolMessage / insertParts (parts are the sole source of truth).
- prune.ts: two-tier compaction; selectPruneTargets is the pure helper.
- types.ts: StreamPhaseState, DB_FLUSH_INTERVAL_MS.
TurnArgs is the per-turn state envelope, reset in runInference at the user-message boundary. Outer loop: while (stepNumber < effectiveCap), where effectiveCap = Math.min(agent.steps ?? Infinity, MAX_STEPS) and MAX_STEPS = 200. Per-agent steps are set in AGENTS.md frontmatter; steps: 0 means text-only. Hitting the step cap writes a cap_hit sentinel, which CapHitSentinel.tsx renders.
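The cap arithmetic is easy to get wrong in one direction. `??` only falls back on null/undefined, so steps: 0 stays 0; `||` would have turned it into Infinity and then 200. A minimal sketch follows. AgentDef is a hypothetical subset of the parsed agent, not the real type.

```typescript
const MAX_STEPS = 200;

// Hypothetical subset of a parsed AGENTS.md entry.
interface AgentDef {
  name: string;
  steps?: number | null;
}

function effectiveCap(agent: AgentDef): number {
  // `??` keeps an explicit 0; `||` would turn steps: 0 into Infinity (then 200).
  return Math.min(agent.steps ?? Infinity, MAX_STEPS);
}

// Worked values:
//   steps: 0         -> 0   (text-only)
//   steps: 50        -> 50
//   steps: 500       -> 200 (clamped to MAX_STEPS)
//   steps unset/null -> 200
```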
AI SDK v6 streamCompletion adapter (services/inference/stream-phase.ts). streamText is the underlying call; the BooCode layer (executeStreamPhase, finalize, dual-write) is shape-preserved via an adapter. Five gotchas the LSP/tests won't catch:
- Abort signals are swallowed. streamText's fullStream exits cleanly when abortSignal fires — no throw. Post-iteration if (signal?.aborted) throw <AbortError> is required, else the row finalizes complete instead of cancelled. Don't refactor away the pinning comment.
- Usage lands only at stream end via await result.usage (v6 inputTokens/outputTokens → mapped to promptTokens/completionTokens). No mid-stream tok/s; ChatThroughput shows one value at stream end.
- Tools have NO execute field. BooCode dispatches tools in tool-phase.ts, not in the AI SDK loop, so each tool definition passes only description + inputSchema: jsonSchema(parameters).
- includeUsage: true MUST be set on createOpenAICompatible in provider.ts. The adapter defaults it false → no stream_options.include_usage → llama-swap emits no usage block → result.usage resolves undefined (NULL token counts). Don't remove during refactor.
- Tool-call-only turns may emit a leading \n text-delta. MessageList.flatten's hasText and MessageBubble's hasContent both .trim() before the length check, else whitespace-only content renders an empty bubble + ActionRow between tool calls. buildMessagesPayload also skips status='failed' and complete-but-empty assistant rows (avoids "Cannot have 2 or more assistant messages at the end of the list" upstream rejection after cap-hit + Continue).
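A sketch of both whitespace rules. Row is a simplified, hypothetical shape. "Complete-but-empty" is read here as an assistant row with no non-whitespace text and no tool calls; that reading is an assumption, so check buildMessagesPayload for the exact predicate.

```typescript
// Hypothetical row shape; the real Message type has many more fields.
interface Row {
  role: "user" | "assistant" | "tool";
  status: "streaming" | "complete" | "failed";
  content: string | null;
  tool_calls?: unknown[];
}

// Whitespace-only text (e.g. a lone "\n" delta on a tool-call-only turn)
// does not count as content.
function hasText(content: string | null | undefined): boolean {
  return (content ?? "").trim().length > 0;
}

// Drop failed rows and complete assistant rows with neither text nor tool calls,
// so a cap-hit + Continue cannot leave two trailing assistant messages.
function payloadRows(rows: Row[]): Row[] {
  return rows.filter((r) => {
    if (r.status === "failed") return false;
    const emptyAssistant =
      r.role === "assistant" && r.status === "complete" && !hasText(r.content) && !r.tool_calls?.length;
    return !emptyAssistant;
  });
}
```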
services/inference/tool-shim.ts: recovers structured tool calls from plain-text model output. Some models (notably Qwen) emit <tool_call><name>...</name><arguments>...</arguments></tool_call> as inline text instead of structured JSON. extractToolCalls(text) parses both the XML and JSON inline formats; hasToolCallMarkup(text) is a fast pre-check. The stream phase uses it as a fallback when parsing structured tool_calls fails. It does NOT require FAST_MODEL, because it operates on the existing turn's output text.
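A regex-based sketch of the XML form only. It assumes the <arguments> body is a JSON object. extractXmlToolCallsSketch and hasToolCallMarkupSketch are hypothetical names, not the tool-shim.ts exports, and the real parser also handles the JSON inline format.

```typescript
interface ExtractedCall {
  name: string;
  arguments: Record<string, unknown>;
}

const TOOL_CALL_RE =
  /<tool_call>\s*<name>([\s\S]*?)<\/name>\s*<arguments>([\s\S]*?)<\/arguments>\s*<\/tool_call>/g;

// Cheap pre-check so ordinary prose skips the regex pass entirely.
function hasToolCallMarkupSketch(text: string): boolean {
  return text.includes("<tool_call>");
}

function extractXmlToolCallsSketch(text: string): ExtractedCall[] {
  if (!hasToolCallMarkupSketch(text)) return [];
  const calls: ExtractedCall[] = [];
  for (const m of text.matchAll(TOOL_CALL_RE)) {
    try {
      calls.push({ name: m[1].trim(), arguments: JSON.parse(m[2]) });
    } catch {
      // Malformed arguments JSON: skip this call rather than fail the turn.
    }
  }
  return calls;
}
```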
services/inference/loop-detectors.ts: six detectors that catch repetitive model behavior, including detectContentRepeat (same content N times) and detectToolLoop (same tool called consecutively); detectDoomLoop combines both. They are additive to the existing doom-loop detection in sentinels.ts.
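A sketch of the two named detectors and their combination. It assumes a trailing window of the last N entries and a hypothetical threshold of 3. The real constants and window semantics live in loop-detectors.ts and sentinels.ts, and all function names here are hypothetical.

```typescript
// Hypothetical threshold; the real values live in loop-detectors.ts / sentinels.ts.
const REPEAT_THRESHOLD = 3;

// Same tool name on each of the last `threshold` calls.
function detectToolLoopSketch(toolNames: string[], threshold = REPEAT_THRESHOLD): boolean {
  if (toolNames.length < threshold) return false;
  const tail = toolNames.slice(-threshold);
  return tail.every((name) => name === tail[0]);
}

// Identical non-empty (trimmed) content on each of the last `threshold` turns.
function detectContentRepeatSketch(contents: string[], threshold = REPEAT_THRESHOLD): boolean {
  if (contents.length < threshold) return false;
  const tail = contents.slice(-threshold).map((c) => c.trim());
  return tail[0].length > 0 && tail.every((c) => c === tail[0]);
}

function detectDoomLoopSketch(toolNames: string[], contents: string[]): boolean {
  return detectToolLoopSketch(toolNames) || detectContentRepeatSketch(contents);
}
```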
AI SDK ModelMessage conversion (toModelMessages in stream-phase.ts). Tool messages need a toolName for ToolResultPart; BooCode's OpenAI-shape history lacks it, so a forward-scan builds a tool_call_id → toolName map from prior assistant tool_calls. Tool outputs wrapped as { type: 'json' | 'text', value } (v6 ToolResultOutput). Reasoning emits a ReasoningPart first in the content array.
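A sketch of the forward scan. WireMessage and ToolResultPartSketch are local, hypothetical types: the field names mirror the AI SDK's tool-result part, but nothing is imported from the SDK. Two choices here are assumptions: "json" vs "text" is decided by whether JSON.parse succeeds, and an unmatched id falls back to "unknown".

```typescript
type WireToolCall = { id: string; type: "function"; function: { name: string; arguments: string } };
type WireMessage =
  | { role: "system" | "user"; content: string }
  | { role: "assistant"; content: string | null; tool_calls?: WireToolCall[] }
  | { role: "tool"; tool_call_id: string; content: string };

interface ToolResultPartSketch {
  type: "tool-result";
  toolCallId: string;
  toolName: string;
  output: { type: "json"; value: unknown } | { type: "text"; value: string };
}

function toOutput(content: string): ToolResultPartSketch["output"] {
  try {
    return { type: "json", value: JSON.parse(content) };
  } catch {
    return { type: "text", value: content };
  }
}

// Forward scan: assistant tool_calls register id -> name before the tool rows
// that answer them, so each tool row can be given its toolName.
function toolResultParts(history: WireMessage[]): ToolResultPartSketch[] {
  const nameById = new Map<string, string>();
  const parts: ToolResultPartSketch[] = [];
  for (const msg of history) {
    if (msg.role === "assistant") {
      for (const call of msg.tool_calls ?? []) nameById.set(call.id, call.function.name);
    } else if (msg.role === "tool") {
      parts.push({
        type: "tool-result",
        toolCallId: msg.tool_call_id,
        toolName: nameById.get(msg.tool_call_id) ?? "unknown",
        output: toOutput(msg.content),
      });
    }
  }
  return parts;
}
```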
experimental_repairToolCall is wired into streamText to keep the stream alive when qwen3.6 emits malformed tool args. It is a pass-through: it logs the bad call and returns it unmodified, and executeToolPhase's zod-reject path routes it back to the model on the next turn.
chat_status frame (via broker.publishUserFrame): status is 'streaming' | 'tool_running' | 'waiting_for_input' | 'idle' | 'error'. The frontend's useChatStatus derives idle_warm (<30s since idle) vs idle_cold. ChatThroughput renders beside StatusDot only while streaming/tool_running, fed by 500ms-throttled 'usage' frames (completion_tokens + ctx_used + ctx_max). POST /api/chats/:id/discard_stale marks a stuck-streaming row failed when the frontend's 60s no-token timer gives up.
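A pure-logic sketch of the status derivation and the usage-frame throttle. All names are hypothetical; the real useChatStatus is a React hook. This also assumes the frontend records an idleSinceMs timestamp when the idle frame arrives.

```typescript
type ChatStatus = "streaming" | "tool_running" | "waiting_for_input" | "idle" | "error";
type DisplayStatus = Exclude<ChatStatus, "idle"> | "idle_warm" | "idle_cold";

const WARM_WINDOW_MS = 30_000;

// idle splits into warm (<30s since going idle) and cold; other states pass through.
function displayStatus(status: ChatStatus, idleSinceMs: number | null, nowMs: number): DisplayStatus {
  if (status !== "idle") return status;
  return idleSinceMs !== null && nowMs - idleSinceMs < WARM_WINDOW_MS ? "idle_warm" : "idle_cold";
}

function showThroughput(status: ChatStatus): boolean {
  return status === "streaming" || status === "tool_running";
}

// Emitter-side gate for 'usage' frames: at most one per interval.
function makeThrottle(intervalMs = 500): (nowMs: number) => boolean {
  let last = -Infinity;
  return (nowMs) => {
    if (nowMs - last < intervalMs) return false;
    last = nowMs;
    return true;
  };
}
```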
Stale-streaming sweeps (apps/server/src/index.ts): a boot-time pass after applySchema() and a periodic 60s setInterval both flip messages.status='streaming' older than 5 min to failed (publishing chat_status='idle'); the interval also runs cleanupTruncations (TTL + orphan reap of tmpfs truncation files). onClose hook clears the timer. Recovers from a container restart mid-stream.
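A sketch of the sweep predicate and timer lifecycle. Using updatedAtMs as the age column is an assumption; the real sweep runs as SQL in index.ts. staleStreamingIds and startSweeper are hypothetical helpers.

```typescript
const STALE_AFTER_MS = 5 * 60_000;
const SWEEP_INTERVAL_MS = 60_000;

interface MessageRow {
  id: string;
  status: "streaming" | "complete" | "failed";
  updatedAtMs: number; // assumed age column
}

// Rows that should flip to failed: still streaming and older than 5 minutes.
function staleStreamingIds(rows: MessageRow[], nowMs: number): string[] {
  return rows.filter((r) => r.status === "streaming" && nowMs - r.updatedAtMs > STALE_AFTER_MS).map((r) => r.id);
}

// One boot-time pass, then every 60s; the returned stop() belongs in the onClose hook.
function startSweeper(sweep: () => Promise<void>, intervalMs = SWEEP_INTERVAL_MS): () => void {
  const run = () => {
    sweep().catch((err) => console.error("stale-streaming sweep failed", err));
  };
  run();
  const timer = setInterval(run, intervalMs);
  return () => clearInterval(timer);
}
```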
services/broker.ts — In-memory pub/sub, two channel types: per-session (message streaming) and per-user (sidebar). No persistence; clients reconnect on restart. Every WS publish goes through broker.publishFrame(sessionId, frame) / publishUserFrame(user, frame) — both Zod-validate against WsFrameSchema (types/ws-frames.ts) and fail-closed (log + drop). Schema single-sourced in @boocode/contracts (packages/contracts/src/ws-frames.ts); the package's ws-frames.test.ts enforces schema correctness. Don't add raw broker.publish()/publishUser() calls.
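A fail-closed publish sketch with only the per-session channel. SafeParse matches the result shape of Zod's safeParse, so a real schema could plug in as `(x) => WsFrameSchema.safeParse(x)`. Here, the hypothetical hand-written parseChatStatus stands in for WsFrameSchema, and BrokerSketch is not the real broker.ts API.

```typescript
type SafeParse<T> = (input: unknown) => { success: true; data: T } | { success: false; error: unknown };

type ChatStatusFrame = { type: "chat_status"; status: string };

// Stand-in validator for illustration only.
const parseChatStatus: SafeParse<ChatStatusFrame> = (input) => {
  const f = (input ?? {}) as { type?: unknown; status?: unknown };
  if (f.type === "chat_status" && typeof f.status === "string") {
    return { success: true, data: { type: "chat_status", status: f.status } };
  }
  return { success: false, error: "not a chat_status frame" };
};

class BrokerSketch<Frame> {
  private readonly channels = new Map<string, Set<(frame: Frame) => void>>();

  constructor(
    private readonly parse: SafeParse<Frame>,
    private readonly log: Pick<Console, "warn"> = console,
  ) {}

  subscribe(sessionId: string, fn: (frame: Frame) => void): () => void {
    const set = this.channels.get(sessionId) ?? new Set<(frame: Frame) => void>();
    this.channels.set(sessionId, set);
    set.add(fn);
    return () => {
      set.delete(fn);
    };
  }

  // Fail-closed: invalid frames are logged and dropped, never delivered.
  publishFrame(sessionId: string, frame: unknown): boolean {
    const result = this.parse(frame);
    if (!result.success) {
      this.log.warn("ws frame rejected", { sessionId, error: result.error });
      return false;
    }
    for (const fn of this.channels.get(sessionId) ?? []) fn(result.data);
    return true;
  }
}
```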
services/tools.ts: tool registry (ALL_TOOLS, READ_ONLY_TOOL_NAMES, TOOLS_BY_NAME). Three guards apply. Filesystem tools (view_file/list_dir/grep/find_files) pass path_guard.ts (workspace scope) and secret_guard.ts (filename deny list); web_fetch passes url_guard.ts (SSRF/private-IP block). Web tools (web_search, web_fetch) are opt-in per chat via session.web_search_enabled (falling back to project.default_web_search_enabled) and are filtered out of the LLM tool schema when disabled. Truncation: when a tool slice cuts content, services/truncate.ts stashes the full text on tmpfs (BOOCODE_TRUNCATION_DIR, default /tmp/boocode-truncations, mode 0o700), keyed by an id of the form tr_<12 base32 chars>; view_truncated_output(id) retrieves it. 5MB cap, 7-day TTL, reaped by the sweeper. A container restart loses retrieval, which is acceptable.
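A sketch of the id format, path construction, and TTL check. The lowercase RFC 4648 base32 alphabet and the id-validation step are assumptions, not confirmed truncate.ts behavior. The 5MB cap is left out because its exact scope isn't stated above.

```typescript
import { randomBytes } from "node:crypto";
import { join } from "node:path";

// Assumption: lowercase RFC 4648 base32 alphabet; the real id format may differ.
const BASE32 = "abcdefghijklmnopqrstuvwxyz234567";
const TTL_MS = 7 * 24 * 60 * 60 * 1000;
const ID_RE = /^tr_[a-z2-7]{12}$/;

function newTruncationId(): string {
  let id = "tr_";
  // 256 is a multiple of 32, so the low 5 bits of a random byte are uniform.
  for (const byte of randomBytes(12)) id += BASE32[byte & 31];
  return id;
}

// Validating the id before joining keeps "../" out of the tmpfs directory.
function truncationPath(id: string, dir = process.env.BOOCODE_TRUNCATION_DIR ?? "/tmp/boocode-truncations"): string {
  if (!ID_RE.test(id)) throw new Error(`invalid truncation id: ${id}`);
  return join(dir, id);
}

function isExpired(createdAtMs: number, nowMs: number): boolean {
  return nowMs - createdAtMs > TTL_MS;
}
```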
services/compaction.ts + services/model-context.ts — Anchored rolling summary (single summary=true assistant row per chat, supersedes itself each compaction). Triggered when chats.needs_compaction is set after a turn exceeds usable(ctx_max) = floor(0.85 × ctx_max). ctx_max comes from model-context.getModelContext() fetching ${LLAMA_SWAP_URL}/upstream/<model>/props — NOT from parsed.timings.n_ctx. First inferences after boot may have ctx_max=NULL if llama-swap hasn't loaded the model; negative cache TTL 60s, recovers next turn. buildHeadPayload embeds reasoning_parts as a <reasoning>...</reasoning> prose prefix on assistant content (OpenAI wire shape has no structured reasoning field); standalone tag when content is empty. buildHeadPayload + OpenAiMessage exported for tests — keep them exported.
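A worked sketch of the threshold and the reasoning prefix. For ctx_max = 32768, usable = floor(0.85 × 32768) = floor(27852.8) = 27852, so a turn that uses 27853 tokens sets the flag. The helper names are hypothetical. Joining multiple reasoning parts with "\n" and separating the tag from the content with "\n" are also assumptions about buildHeadPayload.

```typescript
function usable(ctxMax: number): number {
  return Math.floor(0.85 * ctxMax);
}

// ctxMax can be null right after boot (model not loaded yet); skip the check then.
function shouldFlagForCompaction(ctxUsed: number, ctxMax: number | null): boolean {
  return ctxMax !== null && ctxUsed > usable(ctxMax);
}

// OpenAI wire shape has no reasoning field, so reasoning rides as a prose prefix.
function withReasoningPrefix(content: string, reasoningParts: { text: string }[]): string {
  if (reasoningParts.length === 0) return content;
  const block = `<reasoning>${reasoningParts.map((p) => p.text).join("\n")}</reasoning>`;
  return content.length > 0 ? `${block}\n${content}` : block; // standalone tag when empty
}
```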
services/system-prompt.ts — buildSystemPrompt is the string shim; buildSystemPromptWithFingerprint is the canonical impl returning {prompt, fingerprint, drift}. SHA-256 of the assembled prefix is logged per buildMessagesPayload (prefix-fingerprint, info); a Map<sessionId, lastHash> fires prefix-drift (warn) on change with a changed_inputs diff. The prefix is byte-stable in steady-state, so prefix caching is left to the input-layer mtime caches (BOOCHAT.md + AGENTS.md global/per-project in agents.ts:safeStat).
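A sketch of the fingerprint-and-drift bookkeeping using Node's crypto module. fingerprintPrefix is hypothetical. The real buildSystemPromptWithFingerprint also returns the prompt and logs a changed_inputs diff, which this omits.

```typescript
import { createHash } from "node:crypto";

// Last seen prefix hash per session.
const lastHash = new Map<string, string>();

function fingerprintPrefix(sessionId: string, prefix: string): { fingerprint: string; drift: boolean } {
  const fingerprint = createHash("sha256").update(prefix, "utf8").digest("hex");
  const previous = lastHash.get(sessionId);
  lastHash.set(sessionId, fingerprint);
  // The first sighting of a session is not drift; only a change from a known hash is.
  return { fingerprint, drift: previous !== undefined && previous !== fingerprint };
}
```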
services/inference/budget.ts: tool-call budgets are BUDGET_READ_ONLY = 30, BUDGET_NON_READ_ONLY = 10 (forward-looking; no write tools yet), and BUDGET_NO_AGENT = 30 (every ALL_TOOLS tool is read-only today, so no-agent shares the read-only cap). A per-agent max_tool_calls in AGENTS.md overrides these defaults.
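A sketch of the resolution order. resolveToolBudgetSketch and AgentBudgetConfig are hypothetical, and so is the rule that an agent counts as read-only when every tool it may use is in the read-only set. write_file in the usage check is a made-up name, since no write tools exist yet.

```typescript
const BUDGET_READ_ONLY = 30;
const BUDGET_NON_READ_ONLY = 10;
const BUDGET_NO_AGENT = 30;

interface AgentBudgetConfig {
  max_tool_calls?: number | null; // from AGENTS.md frontmatter
  tools: string[];
}

function resolveToolBudgetSketch(agent: AgentBudgetConfig | null, readOnlyToolNames: ReadonlySet<string>): number {
  if (!agent) return BUDGET_NO_AGENT;
  // `!= null` keeps an explicit max_tool_calls: 0.
  if (agent.max_tool_calls != null) return agent.max_tool_calls;
  const readOnly = agent.tools.every((t) => readOnlyToolNames.has(t));
  return readOnly ? BUDGET_READ_ONLY : BUDGET_NON_READ_ONLY;
}
```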
messages_with_parts view (schema.sql). Read sites needing tool_calls / tool_results / reasoning_parts SELECT from this view, NOT messages — the legacy messages.tool_calls/tool_results JSON columns were dropped; the view reads parts-only subselects. Writes target message_parts via insertParts (or partsFromAssistantMessage/partsFromToolMessage). The Message wire type still carries tool_calls?/tool_results? because the view synthesizes them. Shapes: tool_calls jsonb[], tool_results jsonb (single object), reasoning_parts jsonb[] of {text}. To UPDATE a message and return its full shape, do a two-step UPDATE returning id then SELECT from the view — RETURNING off bare messages no longer carries the tool fields. messages.model (attribution chip) stamps the model per assistant turn — at finalizeCompletion (BooChat + native coder) + the dispatcher's assistant-row INSERT (external coder); read via the view + the message_complete frame, rendered by shortenModelName.
services/file_ops.ts — Shared file operation implementations used by both inference tools and HTTP routes.
services/auto_name.ts — Non-streaming LLM call to generate 4-word session titles after the first assistant reply.
Provider picker dispatch: when provider !== 'boocode', the message route creates a tasks row (with session_id set) instead of calling inference.enqueue. The dispatcher (in apps/coder) picks it up and dispatches via ACP or PTY using the agent's install_path.

Route registration: all routes registered in index.ts via register*Routes(app, sql, ...). Routes live in routes/*.ts.

Server conventions

New tools live in their own services/<name>.ts (see web_search.ts, web_fetch.ts) — a pure executeFoo(input, ...deps) for direct test access plus a ToolDef wrapper that loadConfig()s its real deps. Register the ToolDef in tools.ts ALL_TOOLS (and READ_ONLY_TOOL_NAMES if applicable). Inject fetcher: typeof fetch = fetch rather than vi.spyOn(globalThis, 'fetch').
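A sketch of the pure-executor half of the convention. executeFetchTitle is a hypothetical tool; the ToolDef wrapper that would call it with loadConfig()'d deps is not shown. Because the fetcher is a parameter, a test passes a fake instead of spying on globalThis.fetch.

```typescript
interface FetchTitleInput {
  url: string;
}

// Pure executor: every side-effecting dependency is a parameter with a real default.
async function executeFetchTitle(input: FetchTitleInput, fetcher: typeof fetch = fetch): Promise<string> {
  const res = await fetcher(input.url);
  if (!res.ok) throw new Error(`fetch failed: ${res.status}`);
  const html = await res.text();
  const match = /<title>([\s\S]*?)<\/title>/i.exec(html);
  return match ? match[1].trim() : "";
}
```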
DB/session-aware tools take an optional 4th ToolExecCtx { sql, sessionId } arg on ToolDef.execute, plumbed executeToolPhase→executeToolCall→execute. Optional so filesystem tools and the apps/coder ALL_TOOLS consumer stay compatible; filesystem tools ignore it. read_tab_by_number is the reference.
ReadableStream test stubs use pull() (not start()) so chunks are produced lazily — start() enqueues everything and closes before the consumer reads, so a later reader.cancel() finds the stream closed and the cancel() callback never fires. Provide MORE chunks than the test consumes so the source stays 'readable' when cancel runs.
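A sketch of the pull()-based stub pattern using the WHATWG ReadableStream global (Node 18+). It supplies five chunks, reads two, then cancels. Because the source never reached close(), the cancel() callback fires. lazyStream and readTwoThenCancel are illustrative names.

```typescript
// Chunks are produced lazily in pull(), so the source is still readable
// (never closed) when the consumer cancels part-way through.
function lazyStream(chunks: string[]) {
  let next = 0;
  let cancelled = false;
  const stream = new ReadableStream<string>({
    pull(controller) {
      if (next < chunks.length) controller.enqueue(chunks[next++]);
      else controller.close();
    },
    cancel() {
      cancelled = true; // only fires if the stream has not already closed
    },
  });
  return { stream, wasCancelled: () => cancelled };
}

async function readTwoThenCancel(): Promise<{ got: string[]; cancelled: boolean }> {
  // More chunks than the consumer reads, so close() is never reached.
  const { stream, wasCancelled } = lazyStream(["a", "b", "c", "d", "e"]);
  const reader = stream.getReader();
  const got: string[] = [];
  for (let n = 0; n < 2; n++) {
    const result = await reader.read();
    if (!result.done) got.push(result.value);
  }
  await reader.cancel();
  return { got, cancelled: wasCancelled() };
}
```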
Tool-name whitelists must derive from ALL_TOOLS in services/tools.ts, never hardcoded (this drift class hit services/agents.ts ALL_TOOL_NAMES before).
Agent registry lives at data/AGENTS.md (global, bind-mounted at /data/AGENTS.md). No per-project AGENTS.md in this repo (removed to eliminate two-files-must-stay-in-sync drift); the getAgentsForProject per-project override mechanism remains for other projects.
data/AGENTS.md is PARSED (agents.ts splitSections/parseAgentSection): each ## <Name> is one agent and must be followed by a --- frontmatter fence or the block throws; content before the first ## is discarded. Do NOT add free-form ## rule sections — they break the registry. Cross-cutting agent rules go in CLAUDE.md or a parser-ignored preamble.
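A simplified re-implementation of the section rules above, for illustration only; it is not agents.ts. It assumes a fence that immediately follows the heading, exact `---` lines, and flat `key: value` frontmatter, whereas the real parser may accept YAML. splitSectionsSketch and parseSectionSketch are hypothetical names.

```typescript
interface AgentSection {
  name: string;
  frontmatter: Record<string, string>;
  body: string;
}

// Everything before the first "## " heading is discarded.
function splitSectionsSketch(markdown: string): AgentSection[] {
  return markdown.split(/^## /m).slice(1).map(parseSectionSketch);
}

function parseSectionSketch(chunk: string): AgentSection {
  const lines = chunk.split("\n");
  const name = lines[0].trim();
  // A free-form "## Rules" section has no fence, so it throws here.
  if (lines[1]?.trim() !== "---") throw new Error(`agent "${name}": missing --- frontmatter fence`);
  const end = lines.indexOf("---", 2);
  if (end === -1) throw new Error(`agent "${name}": unterminated frontmatter`);
  const frontmatter: Record<string, string> = {};
  for (const line of lines.slice(2, end)) {
    const idx = line.indexOf(":");
    if (idx > 0) frontmatter[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return { name, frontmatter, body: lines.slice(end + 1).join("\n").trim() };
}
```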
MCP stdio transport uses newline-delimited JSON (NDJSON), NOT LSP-style Content-Length headers. codecontext/shim.go is the reference (per the MCP spec, modelcontextprotocol.io/specification/server/transports).
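A sketch of NDJSON framing for the stdio transport. It works on strings and assumes the caller set the stream encoding to utf8, so multi-byte characters are not split across chunks. encodeMessage and NdjsonDecoder are illustrative names. Note that there is no header parsing: one line is one JSON-RPC message.

```typescript
// JSON.stringify without indentation escapes newlines inside strings,
// so each encoded message is exactly one line.
function encodeMessage(msg: object): string {
  return JSON.stringify(msg) + "\n";
}

// Buffers partial lines across chunks and emits every complete message.
class NdjsonDecoder {
  private buffer = "";

  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const out: unknown[] = [];
    let nl: number;
    while ((nl = this.buffer.indexOf("\n")) !== -1) {
      const line = this.buffer.slice(0, nl).trim();
      this.buffer = this.buffer.slice(nl + 1);
      if (line) out.push(JSON.parse(line));
    }
    return out;
  }
}
```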
payload.ts:loadContext SELECT must include every Session field downstream code reads. The tool phase reads session.allowed_read_paths; if the SELECT omits it, cross-repo read grants silently fail. sql<Session[]> doesn't enforce column coverage, so the type doesn't catch it.
Sidecar routing (services/inference/provider.ts): upstreamModel(config, modelId, agent) routes to LLAMA_SIDECAR_URL when the agent has llama_extra_args, else LLAMA_SWAP_URL. resolveRoute(agent) returns {route, flags}. Sidecar provider created fresh per call (not cached) because X-Agent-Flags varies per agent. Boot-time guard in index.ts refuses to start if any agent has llama_extra_args but LLAMA_SIDECAR_URL is unset.
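A sketch of the routing decision and boot-time guard. The config parameter, the string[] type for llama_extra_args, and the space-joined X-Agent-Flags value are all assumptions. resolveRouteSketch, sidecarHeaders, and assertSidecarConfigured are hypothetical names, not the provider.ts exports.

```typescript
// Hypothetical shapes; llama_extra_args is assumed to be a string[].
interface RoutableAgent {
  name: string;
  llama_extra_args?: string[] | null;
}
interface RouteConfig {
  LLAMA_SWAP_URL: string;
  LLAMA_SIDECAR_URL?: string;
}
type Route = { route: "sidecar" | "swap"; baseURL: string; flags: string[] };

function resolveRouteSketch(agent: RoutableAgent | null, config: RouteConfig): Route {
  const flags = agent?.llama_extra_args ?? [];
  if (flags.length === 0) return { route: "swap", baseURL: config.LLAMA_SWAP_URL, flags };
  if (!config.LLAMA_SIDECAR_URL) throw new Error(`agent ${agent?.name} needs LLAMA_SIDECAR_URL`);
  return { route: "sidecar", baseURL: config.LLAMA_SIDECAR_URL, flags };
}

// Assumed header encoding: space-joined flags.
function sidecarHeaders(flags: string[]): Record<string, string> {
  return { "X-Agent-Flags": flags.join(" ") };
}

// Boot-time guard: fail fast instead of failing on the first routed turn.
function assertSidecarConfigured(agents: RoutableAgent[], config: RouteConfig): void {
  const needy = agents.filter((a) => (a.llama_extra_args?.length ?? 0) > 0).map((a) => a.name);
  if (needy.length > 0 && !config.LLAMA_SIDECAR_URL) {
    throw new Error(`LLAMA_SIDECAR_URL is unset but required by: ${needy.join(", ")}`);
  }
}
```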
Secret guard safe patterns (services/secret_guard.ts): .env.example, .env.sample, .env.template, .env.defaults are allowlisted via SAFE_PATTERNS. Do NOT add .env.production/.env.development/.env.test — those can hold real secrets.
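A sketch of the allowlist-before-deny ordering for .env files. It assumes exact basename matches; the real SAFE_PATTERNS may be globs, and the real secret_guard.ts deny list covers more than .env files. isSecretEnvFile is a hypothetical name.

```typescript
import { basename } from "node:path";

// Allowlist from the note above; any other .env or .env.* file is treated as secret.
const SAFE_ENV_FILES = new Set([".env.example", ".env.sample", ".env.template", ".env.defaults"]);

function isSecretEnvFile(filePath: string): boolean {
  const name = basename(filePath);
  if (SAFE_ENV_FILES.has(name)) return false;
  return name === ".env" || name.startsWith(".env.");
}
```

Note that .env.production, .env.development, and .env.test fall through to the deny branch, which is the intended behavior.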
llama-sidecar (/opt/forks/llama-sidecar/): Go daemon for a per-agent llama-server process pool (routed to via "Sidecar routing" above). Cross-compile: GOOS=windows GOARCH=amd64 /snap/go/current/bin/go build -o bin/llama-sidecar.exe ./cmd/llama-sidecar. Gitea: indifferentketchup/llama-sidecar. Windows child-process gotchas: context.Background() for child lifetime (not request ctx), os.Open(os.DevNull) for stdin, os.Pipe() for stdout with a drain goroutine, DETACHED_PROCESS | CREATE_NEW_PROCESS_GROUP flags. SSH to sam-desktop: ssh samki@100.101.41.16; use schtasks for persistent spawning (SSH start /B doesn't survive session close).
