apps/server — BooChat backend (deep reference) — v2.7.x (last meaningful update: 2026-06)
Per-app engineering notes for `apps/server/src/`. Cross-cutting commands, database, environment, workflow, and cross-app contracts (WS-frame / provider-type parity, sentinels) live in the root `CLAUDE.md`. This file auto-loads when you read or edit files under `apps/server/`.
These gotchas are load-bearing — do not remove or refactor without understanding why
- Do NOT remove the abort-signal pinning comment in `stream-phase.ts`. `fullStream` exits cleanly on abort without throwing, so the post-iteration `if (signal?.aborted)` check is the only thing that distinguishes cancelled from complete.
- Do NOT remove `includeUsage: true` from `provider.ts`. The adapter defaults it to false, and without it token counts are always NULL.
- Do NOT add raw `broker.publish()` / `publishUser()` calls. Always use `publishFrame` / `publishUserFrame`, which Zod-validate against `WsFrameSchema`.
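A minimal sketch of the abort-pinning pattern follows. It assumes a hypothetical `fakeFullStream` that mimics the behavior described above: the stream ends quietly on abort. `consume` and `AbortError` are illustrative names, not the code in `stream-phase.ts`.

```typescript
// Hypothetical stand-in for streamText's fullStream: it stops yielding on
// abort instead of throwing, so the consumer's loop just ends.
async function* fakeFullStream(
  chunks: string[],
  signal: AbortSignal,
): AsyncGenerator<string> {
  for (const chunk of chunks) {
    if (signal.aborted) return; // clean exit, indistinguishable from "done"
    yield chunk;
  }
}

class AbortError extends Error {
  constructor() {
    super("stream aborted");
    this.name = "AbortError";
  }
}

async function consume(
  stream: AsyncIterable<string>,
  signal: AbortSignal,
): Promise<{ status: "complete"; text: string }> {
  let text = "";
  for await (const chunk of stream) {
    text += chunk;
  }
  // Pinning check: the loop ends the same way on abort and on completion,
  // so only this post-iteration test can route the row to "cancelled".
  if (signal.aborted) throw new AbortError();
  return { status: "complete", text };
}
```

Without the final `if`, an aborted stream would return `{ status: "complete" }` with partial text.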
Stack
- Fastify with `@fastify/websocket` and `@fastify/static` (serves the built frontend).
- postgres (porsager/postgres) with tagged-template SQL, no ORM. The schema lives in `schema.sql` and is applied on startup. The LSP may false-positive on `` sql<Type[]>`...` `` generics; CLI `tsc` / `pnpm build` is authoritative.
- Zod for request validation and config parsing.
Key services
- `services/inference/`: the public surface is re-exported via `inference/index.ts`. Callers import from `./services/inference/index.js` explicitly, because NodeNext doesn't honor directory-index resolution. Layout: `turn.ts` (`runAssistantTurn` / `runInference` / `createInferenceRunner`; exports `InferenceFrame`, `InferenceContext`, `TurnArgs`, `StreamResult`, `MAX_STEPS`); `stream-phase.ts` (`streamCompletion` AI SDK adapter + `executeStreamPhase`); `provider.ts` (`upstreamModel(baseURL, modelId)` wrapping `createOpenAICompatible` against llama-swap); `tool-phase.ts` (`executeToolPhase` → `ToolPhaseResult`; the turn loop lives in `turn.ts`, not in recursion); `sentinel-summaries.ts` (cap-hit / doom-loop / step-cap summaries + inserters); `error-handler.ts` (`handleAbortOrError`, `finalizeCompletion`); `payload.ts` (`buildMessagesPayload`, `loadContext`, `maybeFlagForCompaction`, `OpenAiMessage`); `sentinels.ts` (`detectDoomLoop`, `DOOM_LOOP_THRESHOLD`); `budget.ts` (`resolveToolBudget`); `xml-parser.ts` (qwen3.6 XML tool-call fallback; KEEP it, because the AI SDK doesn't handle inline-XML tool calls); `parts.ts` (`partsFromAssistantMessage` / `partsFromToolMessage` / `insertParts`; parts are the sole source of truth); `prune.ts` (two-tier compaction; `selectPruneTargets` is the pure helper); `types.ts` (`StreamPhaseState`, `DB_FLUSH_INTERVAL_MS`). `TurnArgs` is the per-turn state envelope and is reset in `runInference` at the user-message boundary. Outer loop: `while (stepNumber < effectiveCap)`, with `effectiveCap = Math.min(agent.steps ?? Infinity, MAX_STEPS)` and `MAX_STEPS = 200`. Per-agent `steps:` is set in the AGENTS.md frontmatter; `steps: 0` means text-only. Hitting the step cap writes a `cap_hit` sentinel, which `CapHitSentinel.tsx` renders.
- AI SDK v6 `streamCompletion` adapter (`services/inference/stream-phase.ts`). `streamText` is the underlying call; the BooCode layer (`executeStreamPhase`, finalize, dual-write) is shape-preserved via an adapter. The LSP and tests won't catch these five gotchas:
  - Abort signals are swallowed. `streamText`'s `fullStream` exits cleanly when `abortSignal` fires, with no throw. A post-iteration `if (signal?.aborted) throw <AbortError>` is required; otherwise the row finalizes as `complete` instead of `cancelled`. Don't refactor away the pinning comment.
  - Usage lands only at stream end, via `await result.usage` (v6 `inputTokens` / `outputTokens` are mapped to `promptTokens` / `completionTokens`). There is no mid-stream tok/s; ChatThroughput shows one value at stream end.
  - Tools have NO `execute` field. BooCode dispatches tools in `tool-phase.ts`, not in the AI SDK loop, so each tool passes only `description` + `inputSchema: jsonSchema(parameters)`.
  - `includeUsage: true` MUST be set on `createOpenAICompatible` in `provider.ts`. The chain of consequences: the adapter defaults it to false → no `stream_options.include_usage` → llama-swap emits no usage block → `result.usage` resolves `undefined` (NULL token counts). Don't remove it during refactors.
  - Tool-call-only turns may emit a leading `\n` text-delta. `MessageList.flatten`'s `hasText` and `MessageBubble`'s `hasContent` both `.trim()` before the length check. Without the trim, whitespace-only content renders an empty bubble + ActionRow between tool calls. `buildMessagesPayload` also skips `status='failed'` and complete-but-empty assistant rows. This avoids the upstream rejection "Cannot have 2 or more assistant messages at the end of the list" after cap-hit + Continue.
- `services/inference/tool-shim.ts`: recovers structured tool calls from plain-text model output. Some models (notably Qwen) emit `<tool_call><name>...</name><arguments>...</arguments></tool_call>` as inline text instead of structured JSON. `extractToolCalls(text)` parses both the XML and the JSON inline formats, and `hasToolCallMarkup(text)` is a fast pre-check. The stream phase uses it as a fallback when parsing structured `tool_calls` fails. It does NOT require `FAST_MODEL`; it operates on the existing turn's output text.
- `services/inference/loop-detectors.ts`: six detectors that catch repetitive model behavior. They include `detectContentRepeat` (same content N times) and `detectToolLoop` (same tool called consecutively); `detectDoomLoop` combines both. These detectors are additive to the existing doom-loop detection in `sentinels.ts`.
- AI SDK ModelMessage conversion (`toModelMessages` in `stream-phase.ts`). Tool messages need a `toolName` for `ToolResultPart`, but BooCode's OpenAI-shape history lacks it. A forward scan therefore builds a `tool_call_id → toolName` map from prior assistant `tool_calls`. Tool outputs are wrapped as `{ type: 'json' | 'text', value }` (v6 `ToolResultOutput`). Reasoning emits a `ReasoningPart` first in the content array.
- `experimental_repairToolCall` is wired into `streamText` to keep the stream alive when qwen3.6 emits malformed tool args. It is a pass-through: it logs the bad call and returns it unmodified. `executeToolPhase`'s zod-reject path then routes it back to the model on the next turn.
- `chat_status` frame (via `broker.publishUser`): `status: 'streaming' | 'tool_running' | 'waiting_for_input' | 'idle' | 'error'`. The frontend's `useChatStatus` derives `idle_warm` (<30s since idle) vs `idle_cold`. `ChatThroughput` renders beside `StatusDot` only while streaming or tool_running. It is fed by 500ms-throttled `'usage'` frames (`completion_tokens` + `ctx_used` + `ctx_max`). `POST /api/chats/:id/discard_stale` marks a stuck-streaming row `failed` when the frontend's 60s no-token timer gives up.
- Stale-streaming sweeps (`apps/server/src/index.ts`): a boot-time pass after `applySchema()` and a periodic 60s `setInterval` both flip `messages.status='streaming'` rows older than 5 min to `failed`, publishing `chat_status='idle'`. The interval also runs `cleanupTruncations` (TTL + orphan reap of tmpfs truncation files). An `onClose` hook clears the timer. The sweeps recover from a container restart mid-stream.
- `services/broker.ts`: in-memory pub/sub with two channel types, per-session (message streaming) and per-user (sidebar). There is no persistence; clients reconnect on restart. Every WS publish goes through `broker.publishFrame(sessionId, frame)` / `publishUserFrame(user, frame)`. Both Zod-validate against `WsFrameSchema` (`types/ws-frames.ts`) and fail closed (log + drop). The schema is single-sourced in `@boocode/contracts` (`packages/contracts/src/ws-frames.ts`), and the package's `ws-frames.test.ts` enforces schema correctness. Don't add raw `broker.publish()` / `publishUser()` calls.
- `services/tools.ts`: tool registry (`ALL_TOOLS`, `READ_ONLY_TOOL_NAMES`, `TOOLS_BY_NAME`). Tool inputs pass three guards. The filesystem tools (`view_file` / `list_dir` / `grep` / `find_files`) go through `path_guard.ts` (workspace scope) and `secret_guard.ts` (filename deny list); `web_fetch` goes through `url_guard.ts` (SSRF / private-IP block). Web tools (`web_search`, `web_fetch`) are opt-in per chat via `session.web_search_enabled`, which falls back to `project.default_web_search_enabled`. When the flag is false, they are filtered out of the LLM tool schema. Truncation: when a tool slice cuts content, `services/truncate.ts` stashes the full text on tmpfs (`BOOCODE_TRUNCATION_DIR`, default `/tmp/boocode-truncations`, mode 0o700) keyed by `tr_<12 base32>`. `view_truncated_output(id)` retrieves it. Limits: 5MB cap and 7-day TTL, with files reaped by the sweeper. A container restart loses retrieval, which is acceptable.
- `services/compaction.ts` + `services/model-context.ts`: an anchored rolling summary, stored as a single `summary=true` assistant row per chat that supersedes itself on each compaction.
Compaction is triggered when `chats.needs_compaction` is set after a turn exceeds `usable(ctx_max) = floor(0.85 × ctx_max)`. `ctx_max` comes from `model-context.getModelContext()`, which fetches `${LLAMA_SWAP_URL}/upstream/<model>/props`; it does NOT come from `parsed.timings.n_ctx`. The first inferences after boot may have `ctx_max=NULL` if llama-swap hasn't loaded the model yet. The negative-cache TTL is 60s, so this recovers on the next turn. `buildHeadPayload` embeds `reasoning_parts` as a `<reasoning>...</reasoning>` prose prefix on assistant `content`, because the OpenAI wire shape has no structured reasoning field. When content is empty, it emits a standalone tag. `buildHeadPayload` and `OpenAiMessage` are exported for tests; keep them exported.
- `services/system-prompt.ts`: `buildSystemPrompt` is the string shim, and `buildSystemPromptWithFingerprint` is the canonical implementation, returning `{prompt, fingerprint, drift}`. The SHA-256 of the assembled prefix is logged on each `buildMessagesPayload` call (`prefix-fingerprint`, info). A `Map<sessionId, lastHash>` fires `prefix-drift` (warn) on change, with a `changed_inputs` diff. The prefix is byte-stable in steady state, so prefix caching is left to the input-layer mtime caches (BOOCHAT.md + AGENTS.md global / per-project, in `agents.ts:safeStat`).
- `services/inference/budget.ts`: tool-call budgets.
  - `BUDGET_READ_ONLY = 30`.
  - `BUDGET_NON_READ_ONLY = 10`. This is forward-looking; there are no write tools yet.
  - `BUDGET_NO_AGENT = 30`. Every `ALL_TOOLS` tool is read-only today, so no-agent shares the read-only cap.
  - A per-agent `max_tool_calls` from AGENTS.md overrides these.
- `messages_with_parts` view (`schema.sql`). Read sites that need `tool_calls` / `tool_results` / `reasoning_parts` SELECT from this view, NOT from `messages`. The legacy `messages.tool_calls` / `tool_results` JSON columns were dropped, and the view reads parts-only subselects. Writes target `message_parts` via `insertParts` (or `partsFromAssistantMessage` / `partsFromToolMessage`). The `Message` wire type still carries `tool_calls?` / `tool_results?` because the view synthesizes them. Shapes: `tool_calls jsonb[]`, `tool_results jsonb` (a single object), and `reasoning_parts jsonb[]` of `{text}`.
To UPDATE a message and return its full shape, run a two-step UPDATE returning `id`, then SELECT from the view. RETURNING off bare `messages` no longer carries the tool fields.
- `messages.model` (attribution chip) stamps the model per assistant turn. It is written in two places: at `finalizeCompletion` (BooChat + native coder) and in the dispatcher's assistant-row INSERT (external coder). It is read via the view and the `message_complete` frame, and rendered by `shortenModelName`.
- `services/file_ops.ts`: shared file-operation implementations used by both inference tools and HTTP routes.
- `services/auto_name.ts`: a non-streaming LLM call that generates 4-word session titles after the first assistant reply.
- Provider picker dispatch: when `provider !== 'boocode'`, the message route creates a `tasks` row (with `session_id` set) instead of calling `inference.enqueue`. The dispatcher (in `apps/coder`) picks the row up and dispatches it via ACP or PTY, using the agent's `install_path`.
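The outer turn loop and its step cap are sketched below. `runTurnLoop` and the `step` callback are hypothetical; the callback stands in for one stream phase plus its tool phase. The sketch does not model the `steps: 0` text-only mode, and only marks where the `cap_hit` sentinel would be written.

```typescript
const MAX_STEPS = 200;

interface AgentSteps {
  steps?: number; // per-agent `steps:` from AGENTS.md frontmatter
}

type StepOutcome = "tool_calls" | "final_text";

function effectiveCap(agent: AgentSteps): number {
  // An unset `steps` means "no per-agent limit", so MAX_STEPS applies.
  return Math.min(agent.steps ?? Infinity, MAX_STEPS);
}

// `step` reports whether the model asked for more tools or finished.
function runTurnLoop(
  agent: AgentSteps,
  step: (stepNumber: number) => StepOutcome,
): { steps: number; capHit: boolean } {
  const cap = effectiveCap(agent);
  let stepNumber = 0;
  while (stepNumber < cap) {
    const outcome = step(stepNumber);
    stepNumber++;
    if (outcome === "final_text") return { steps: stepNumber, capHit: false };
  }
  // The loop ran out of steps: a cap_hit sentinel would be written here.
  return { steps: stepNumber, capHit: true };
}
```

Because this is a flat `while` loop rather than recursion, the cap is a single comparison and the call stack stays shallow.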
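Here is a regex-based sketch of recovering the inline XML tool-call format quoted above. `extractXmlToolCalls` and `hasToolCallMarkupSketch` are hypothetical names. Unlike the real `extractToolCalls`, the sketch handles only the XML form, not the inline JSON form. It assumes `<arguments>` holds a JSON object.

```typescript
interface ExtractedCall {
  name: string;
  args: unknown; // parsed JSON, or the raw string when parsing fails
}

const TOOL_CALL_RE =
  /<tool_call>\s*<name>([\s\S]*?)<\/name>\s*<arguments>([\s\S]*?)<\/arguments>\s*<\/tool_call>/g;

// Cheap pre-check so plain answers skip the regex entirely.
function hasToolCallMarkupSketch(text: string): boolean {
  return text.includes("<tool_call>");
}

function extractXmlToolCalls(text: string): ExtractedCall[] {
  const calls: ExtractedCall[] = [];
  // matchAll clones the regex, so the shared /g lastIndex is never mutated.
  for (const m of text.matchAll(TOOL_CALL_RE)) {
    const name = m[1].trim();
    const raw = m[2].trim();
    let args: unknown;
    try {
      args = JSON.parse(raw);
    } catch {
      args = raw; // keep malformed args so validation can reject them later
    }
    calls.push({ name, args });
  }
  return calls;
}
```

Malformed arguments are passed through unchanged rather than dropped. This matches the pass-through stance of `experimental_repairToolCall`: a later validation step rejects them, and the error is routed back to the model.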
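A consecutive-call loop detector in the spirit of `detectToolLoop` can be sketched as follows. Two things here are assumptions, not the repo's definitions: "same tool" is treated as identical name and arguments, and the threshold value of 3 is invented for the example. `detectToolLoopSketch` is a hypothetical name.

```typescript
interface ToolCallRecord {
  name: string;
  args: Record<string, unknown>;
}

// Hypothetical threshold; the real DOOM_LOOP_THRESHOLD lives in sentinels.ts.
const LOOP_THRESHOLD = 3;

// JSON.stringify is key-order sensitive; fine when args come from one model.
function callKey(c: ToolCallRecord): string {
  return `${c.name}:${JSON.stringify(c.args)}`;
}

// True when the last `threshold` calls are identical.
function detectToolLoopSketch(
  history: ToolCallRecord[],
  threshold = LOOP_THRESHOLD,
): boolean {
  if (history.length < threshold) return false;
  const tail = history.slice(-threshold).map(callKey);
  return tail.every((k) => k === tail[0]);
}
```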
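The forward scan that recovers `toolName` for tool results can be sketched over a simplified OpenAI-shape history. The `HistoryMessage` type and the `"unknown_tool"` fallback for orphaned results are hypothetical; this is not the actual `toModelMessages` code.

```typescript
interface OpenAiToolCall {
  id: string;
  function: { name: string; arguments: string };
}

type HistoryMessage =
  | { role: "user" | "system"; content: string }
  | { role: "assistant"; content: string; tool_calls?: OpenAiToolCall[] }
  | { role: "tool"; content: string; tool_call_id: string };

// Single forward pass: every assistant tool call is recorded before any
// tool message that answers it, so a Map lookup is always "backwards".
function toolNamesForResults(history: HistoryMessage[]): string[] {
  const nameById = new Map<string, string>();
  const names: string[] = [];
  for (const msg of history) {
    if (msg.role === "assistant") {
      for (const call of msg.tool_calls ?? []) {
        nameById.set(call.id, call.function.name);
      }
    } else if (msg.role === "tool") {
      names.push(nameById.get(msg.tool_call_id) ?? "unknown_tool");
    }
  }
  return names;
}
```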
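Truncation ids of the form `tr_<12 base32>` could be minted and validated as below. The lowercase RFC 4648 alphabet is an assumption, as are the helper names `newTruncationId`, `isTruncationId`, and `isExpired`. Only the id shape and the 7-day TTL come from the notes above.

```typescript
import { randomBytes } from "node:crypto";

// Assumed alphabet: lowercase RFC 4648 base32.
const BASE32 = "abcdefghijklmnopqrstuvwxyz234567";
const TTL_MS = 7 * 24 * 60 * 60 * 1000; // 7-day TTL

function newTruncationId(): string {
  let suffix = "";
  // 256 is a multiple of 32, so `b & 31` picks each symbol without bias.
  for (const b of randomBytes(12)) suffix += BASE32[b & 31];
  return `tr_${suffix}`;
}

const TRUNCATION_ID_RE = /^tr_[a-z2-7]{12}$/;

// Validate before joining with the tmpfs dir so an id can't smuggle a path.
function isTruncationId(id: string): boolean {
  return TRUNCATION_ID_RE.test(id);
}

function isExpired(createdAtMs: number, nowMs: number): boolean {
  return nowMs - createdAtMs > TTL_MS;
}
```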
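The compaction threshold `usable(ctx_max) = floor(0.85 × ctx_max)` works out as follows. For example, with `ctx_max = 32768`, `usable` is `floor(27852.8) = 27852`. Two parts of this sketch are assumptions: the quantity compared (named `tokensUsed` here) and the choice to never flag while `ctx_max` is still NULL.

```typescript
function usable(ctxMax: number): number {
  return Math.floor(0.85 * ctxMax);
}

// Assumption: with ctx_max unknown (model not yet loaded by llama-swap),
// skip flagging; a later turn sees the real ctx_max and can flag then.
function shouldFlagForCompaction(tokensUsed: number, ctxMax: number | null): boolean {
  if (ctxMax === null) return false;
  return tokensUsed > usable(ctxMax);
}
```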
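Prefix fingerprinting with per-session drift detection can be sketched with Node's `crypto`. `fingerprintPrefix` is a hypothetical name. Two details are assumptions: assembling inputs in sorted-name order, and keeping per-input hashes to produce the `changed_inputs` diff.

```typescript
import { createHash } from "node:crypto";

function sha256(text: string): string {
  return createHash("sha256").update(text).digest("hex");
}

// sessionId -> last prefix hash plus per-input hashes (for the diff).
const lastBySession = new Map<string, { hash: string; inputs: Record<string, string> }>();

function fingerprintPrefix(
  sessionId: string,
  inputs: Record<string, string>, // e.g. { boochat: "...", agents: "..." }
): { fingerprint: string; drift: boolean; changedInputs: string[] } {
  const names = Object.keys(inputs).sort(); // assumed fixed assembly order
  const prefix = names.map((n) => inputs[n]).join("\n\n");
  const fingerprint = sha256(prefix);
  const inputHashes: Record<string, string> = {};
  for (const n of names) inputHashes[n] = sha256(inputs[n]);

  const prev = lastBySession.get(sessionId);
  const drift = prev !== undefined && prev.hash !== fingerprint;
  const changedInputs: string[] = [];
  if (prev && drift) {
    const all = new Set([...Object.keys(prev.inputs), ...names]);
    for (const n of all) if (prev.inputs[n] !== inputHashes[n]) changedInputs.push(n);
    changedInputs.sort();
  }
  lastBySession.set(sessionId, { hash: fingerprint, inputs: inputHashes });
  return { fingerprint, drift, changedInputs };
}
```

Storing hashes rather than full input text keeps the per-session map small while still naming which input moved.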
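Budget resolution with the constants listed above might look like this. `resolveToolBudgetSketch` is a hypothetical name. The rule "an agent is read-only when every tool it lists is read-only" is an assumption, not the confirmed logic of `resolveToolBudget`.

```typescript
const BUDGET_READ_ONLY = 30;
const BUDGET_NON_READ_ONLY = 10;
const BUDGET_NO_AGENT = 30;

interface AgentBudgetInfo {
  max_tool_calls?: number; // per-agent override from AGENTS.md
  tools: string[];
}

function resolveToolBudgetSketch(
  agent: AgentBudgetInfo | null,
  readOnlyTools: ReadonlySet<string>,
): number {
  if (agent === null) return BUDGET_NO_AGENT;
  if (agent.max_tool_calls !== undefined) return agent.max_tool_calls;
  // Assumed classification: any non-read-only tool drops to the tighter cap.
  const allReadOnly = agent.tools.every((t) => readOnlyTools.has(t));
  return allReadOnly ? BUDGET_READ_ONLY : BUDGET_NON_READ_ONLY;
}
```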
Route registration: all routes are registered in `index.ts` via `register*Routes(app, sql, ...)`. Routes live in `routes/*.ts`.
Server conventions
- New tools live in their own `services/<name>.ts` (see `web_search.ts`, `web_fetch.ts`). Each file has a pure `executeFoo(input, ...deps)` for direct test access, plus a `ToolDef` wrapper that `loadConfig()`s its real deps. Register the ToolDef in `tools.ts` `ALL_TOOLS`, and in `READ_ONLY_TOOL_NAMES` if applicable. Inject `fetcher: typeof fetch = fetch` rather than using `vi.spyOn(globalThis, 'fetch')`.
- DB/session-aware tools take an optional 4th `ToolExecCtx { sql, sessionId }` argument on `ToolDef.execute`, plumbed `executeToolPhase` → `executeToolCall` → `execute`. It is optional so filesystem tools and the `apps/coder` `ALL_TOOLS` consumer stay compatible; filesystem tools ignore it. `read_tab_by_number` is the reference.
- ReadableStream test stubs use `pull()`, not `start()`, so chunks are produced lazily. `start()` enqueues everything and closes before the consumer reads. A later `reader.cancel()` then finds the stream closed, and the `cancel()` callback never fires. Provide MORE chunks than the test consumes, so the source is still 'readable' when cancel runs.
- Tool-name whitelists must derive from `ALL_TOOLS` in `services/tools.ts` and must never be hardcoded. This drift class has already hit `services/agents.ts` `ALL_TOOL_NAMES`.
- The agent registry lives at `data/AGENTS.md` (global, bind-mounted at `/data/AGENTS.md`). There is no per-project `AGENTS.md` in this repo; it was removed to eliminate two-files-must-stay-in-sync drift. The `getAgentsForProject` per-project override mechanism remains for other projects.
- `data/AGENTS.md` is PARSED, by `agents.ts` `splitSections` / `parseAgentSection`. Each `## <Name>` is one agent and must be followed by a `---` frontmatter fence, or the block throws. Content before the first `##` is discarded. Do NOT add free-form `##` rule sections, because they break the registry. Cross-cutting agent rules go in CLAUDE.md or a parser-ignored preamble.
- MCP stdio transport uses newline-delimited JSON (NDJSON), NOT LSP-style `Content-Length` headers. `codecontext/shim.go` is the reference (per the MCP spec, modelcontextprotocol.io/specification/server/transports).
- The SELECT in `payload.ts:loadContext` must include every `Session` field that downstream code reads. For example, the tool phase reads `session.allowed_read_paths`; if the SELECT omits it, cross-repo read grants silently fail. `sql<Session[]>` doesn't enforce column coverage, so the type doesn't catch the omission.
- Sidecar routing (`services/inference/provider.ts`): `upstreamModel(config, modelId, agent)` routes to `LLAMA_SIDECAR_URL` when the agent has `llama_extra_args`, and to `LLAMA_SWAP_URL` otherwise. `resolveRoute(agent)` returns `{route, flags}`. The sidecar provider is created fresh per call rather than cached, because `X-Agent-Flags` varies per agent. A boot-time guard in `index.ts` refuses to start if any agent has `llama_extra_args` but `LLAMA_SIDECAR_URL` is unset.
- Secret guard safe patterns (`services/secret_guard.ts`): `.env.example`, `.env.sample`, `.env.template`, and `.env.defaults` are allowlisted via `SAFE_PATTERNS`. Do NOT add `.env.production` / `.env.development` / `.env.test`; those can hold real secrets.
- llama-sidecar (`/opt/forks/llama-sidecar/`): a Go daemon that runs a per-agent llama-server process pool, the target of the sidecar routing described above.
  - Cross-compile: `GOOS=windows GOARCH=amd64 /snap/go/current/bin/go build -o bin/llama-sidecar.exe ./cmd/llama-sidecar`.
  - Gitea: `indifferentketchup/llama-sidecar`.
  - Windows child-process gotchas: use `context.Background()` for child lifetime (not the request ctx), `os.Open(os.DevNull)` for stdin, `os.Pipe()` for stdout with a drain goroutine, and the `DETACHED_PROCESS | CREATE_NEW_PROCESS_GROUP` flags.
  - SSH to sam-desktop: `ssh samki@100.101.41.16`. Use `schtasks` for persistent spawning, because SSH `start /B` doesn't survive session close.
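The tool-file convention can be sketched as follows: a pure executor with an injected `fetcher`, a thin `ToolDef`-style wrapper, and a name whitelist derived from the registry rather than hardcoded. `fetch_page`, `executeFetchPage`, `ToolDefSketch`, and `ALL_TOOLS_SKETCH` are all hypothetical. The real wrapper would `loadConfig()` its deps instead of using defaults.

```typescript
interface FetchPageInput {
  url: string;
}

interface FetchPageResult {
  status: number;
  body: string;
}

// Pure core: dependencies are parameters, so a test passes a fake fetcher
// instead of spying on globalThis.fetch.
async function executeFetchPage(
  input: FetchPageInput,
  fetcher: typeof fetch = fetch,
  maxChars = 10_000,
): Promise<FetchPageResult> {
  const res = await fetcher(input.url);
  const body = (await res.text()).slice(0, maxChars);
  return { status: res.status, body };
}

// Simplified, hypothetical ToolDef shape.
interface ToolDefSketch {
  name: string;
  description: string;
  execute(input: FetchPageInput): Promise<unknown>;
}

const fetchPageTool: ToolDefSketch = {
  name: "fetch_page",
  description: "Fetch a URL and return its (truncated) body.",
  // A real wrapper would resolve deps from config; this one uses defaults.
  execute: (input) => executeFetchPage(input),
};

const ALL_TOOLS_SKETCH: ToolDefSketch[] = [fetchPageTool];

// Whitelists derive from the registry, so adding a tool can't cause drift.
const TOOL_NAMES: ReadonlySet<string> = new Set(ALL_TOOLS_SKETCH.map((t) => t.name));
```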
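A lazy `pull()`-based stub whose `cancel()` callback does fire could look like this. It relies on the global `ReadableStream` available in Node 18+. `lazyStream` and `readThenCancel` are hypothetical helper names.

```typescript
// Chunks are produced one at a time in pull(), so after a partial read the
// stream is still "readable" and cancel() reaches the underlying source.
function lazyStream(chunks: string[], onCancel: () => void): ReadableStream<string> {
  let i = 0;
  return new ReadableStream<string>({
    pull(controller) {
      if (i < chunks.length) controller.enqueue(chunks[i++]);
      else controller.close();
    },
    cancel() {
      onCancel();
    },
  });
}

async function readThenCancel(stream: ReadableStream<string>, n: number): Promise<string[]> {
  const reader = stream.getReader();
  const got: string[] = [];
  for (let k = 0; k < n; k++) {
    const r = await reader.read();
    if (r.done) break;
    got.push(r.value);
  }
  await reader.cancel();
  return got;
}
```

In the test, reading 2 of 4 chunks leaves the source open. If the stub held exactly 2 chunks, the second read would drain it and trigger `close()`, and `cancel()` would then never run.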
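The parsing rule for `data/AGENTS.md` can be sketched as below. `splitSectionsSketch` and `parseSectionSketch` are hypothetical stand-ins for `splitSections` / `parseAgentSection`, and their behavior may differ. The frontmatter is treated as flat `key: value` lines, and Windows line endings are not handled.

```typescript
interface AgentSection {
  name: string;
  frontmatter: Record<string, string>;
  body: string;
}

function splitSectionsSketch(markdown: string): AgentSection[] {
  const parts = markdown.split(/^## /m);
  parts.shift(); // content before the first "## " heading is discarded
  return parts.map(parseSectionSketch);
}

function parseSectionSketch(part: string): AgentSection {
  const newline = part.indexOf("\n");
  const name = (newline === -1 ? part : part.slice(0, newline)).trim();
  const rest = newline === -1 ? "" : part.slice(newline + 1);
  // The heading must be followed by a --- fenced frontmatter block.
  const m = /^\s*---\n([\s\S]*?)\n---\n?([\s\S]*)$/.exec(rest);
  if (!m) throw new Error(`agent "${name}": missing --- frontmatter fence`);
  const frontmatter: Record<string, string> = {};
  for (const line of m[1].split("\n")) {
    const idx = line.indexOf(":");
    if (idx > 0) frontmatter[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return { name, frontmatter, body: m[2].trim() };
}
```

A free-form `## Rules` section fails the fence check and throws. That is why rule text belongs in the preamble, which the parser ignores.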
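NDJSON framing for the MCP stdio transport amounts to "one JSON message per line", as the sketch below shows. `encodeMessage` and `NdjsonDecoder` are hypothetical names. The decoder takes strings; with real stdio `Buffer`s, set the stream encoding to UTF-8 first, so multibyte characters split across chunks are reassembled.

```typescript
// One JSON-RPC message per line, with no Content-Length headers.
function encodeMessage(msg: unknown): string {
  // Compact JSON.stringify escapes newlines inside strings and adds no
  // whitespace between tokens, so the only raw "\n" is the delimiter.
  return JSON.stringify(msg) + "\n";
}

class NdjsonDecoder {
  private buffer = "";

  // Accepts a chunk holding partial and/or multiple lines; returns every
  // message completed by this chunk and keeps the unfinished tail.
  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const out: unknown[] = [];
    let nl: number;
    while ((nl = this.buffer.indexOf("\n")) !== -1) {
      const line = this.buffer.slice(0, nl).trim(); // also drops a stray "\r"
      this.buffer = this.buffer.slice(nl + 1);
      if (line.length > 0) out.push(JSON.parse(line));
    }
    return out;
  }
}
```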
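Sidecar route selection and the boot-time guard might be shaped as follows. Only the `{route, flags}` result, the two URL settings, and the `X-Agent-Flags` header name come from the notes above. The `"sidecar"` / `"swap"` values, the string-typed `llama_extra_args`, and passing the flags verbatim as the header value are assumptions. `resolveRouteSketch`, `baseUrlFor`, `headersFor`, and `assertSidecarConfigured` are hypothetical names.

```typescript
interface RouteConfig {
  LLAMA_SWAP_URL: string;
  LLAMA_SIDECAR_URL?: string;
}

interface AgentRouteInfo {
  name: string;
  llama_extra_args?: string; // assumed: raw flag string from AGENTS.md
}

type Route =
  | { route: "sidecar"; flags: string }
  | { route: "swap"; flags: null };

function resolveRouteSketch(agent: AgentRouteInfo | null): Route {
  if (agent?.llama_extra_args) return { route: "sidecar", flags: agent.llama_extra_args };
  return { route: "swap", flags: null };
}

function baseUrlFor(config: RouteConfig, r: Route): string {
  if (r.route === "swap") return config.LLAMA_SWAP_URL;
  if (!config.LLAMA_SIDECAR_URL) throw new Error("LLAMA_SIDECAR_URL is unset");
  return config.LLAMA_SIDECAR_URL;
}

// Flags differ per agent, which is why a provider can't be cached globally.
function headersFor(r: Route): Record<string, string> {
  return r.route === "sidecar" ? { "X-Agent-Flags": r.flags } : {};
}

// Boot-time guard: fail fast instead of failing on the first sidecar turn.
function assertSidecarConfigured(config: RouteConfig, agents: AgentRouteInfo[]): void {
  const needy = agents.filter((a) => a.llama_extra_args).map((a) => a.name);
  if (needy.length > 0 && !config.LLAMA_SIDECAR_URL) {
    throw new Error(`agents need LLAMA_SIDECAR_URL: ${needy.join(", ")}`);
  }
}
```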
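The `.env` allowlist rule is sketched below. `isSecretPath` and `SAFE_PATTERNS_SKETCH` are hypothetical. The real `secret_guard.ts` deny list covers more than `.env` files; this sketch only shows how the safe patterns carve exceptions out of the `.env*` family.

```typescript
// Template-style env files are allowlisted; every other .env* is denied.
const SAFE_PATTERNS_SKETCH: ReadonlySet<string> = new Set([
  ".env.example",
  ".env.sample",
  ".env.template",
  ".env.defaults",
]);

function isSecretPath(filePath: string): boolean {
  const base = filePath.split(/[\\/]/).pop() ?? "";
  if (SAFE_PATTERNS_SKETCH.has(base)) return false;
  return base === ".env" || base.startsWith(".env.");
}
```

Because `.env.production` and friends fall through to the `startsWith(".env.")` branch, they stay blocked unless someone explicitly adds them to the allowlist, which the note above forbids.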