From 3992a9fcb7fd0ed7fa28ba8f1124f853db59f5ad Mon Sep 17 00:00:00 2001
From: indifferentketchup
Date: Fri, 22 May 2026 20:08:47 +0000
Subject: [PATCH] v1.13.15-codecontext-synth: forced second-inference synthesis for codecontext overview tools
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

After a codecontext overview-class tool call lands (get_codebase_overview, get_framework_analysis, get_semantic_neighborhoods), the pipeline runs a second inference pass that replaces the recursive runAssistantTurn. The synth pass auto-fetches the top-N source files referenced in the codecontext output plus project docs (BOOCHAT.md, AGENTS.md, *roadmap*.md, CONTEXT.md), applies a 32k-token budget with explicit drop-priority, and streams a structured response that grounds the model in real load-bearing code rather than relying on the codecontext summary alone.

Smoke #1 (default) and #2 (Architect) both cite the correct inference/turn.ts + tool-phase.ts + stream-phase.ts files; smoke #6 (fault injection) verifies that the fall-through path marks the synth message status='failed' and yields cleanly to the recursive turn.

## Truncation-aware extraction

codecontext's wrapper inline-truncates results at 32k chars. Without the expansion step, the top-N file selection only saw the alphabetical head of the codebase (apps/booterm/dist/*) and auto-fetched the wrong sources. The pipeline now calls in-process readTruncation(outputPath) before extracting referenced files, so top-N selection sees the full 80k+ char output. The 32k truncated head still ships to the synth model; the expansion is reference-extraction-only, preserving the token-budget contract. Graceful degradation on readTruncation null/throw: log warn, fall back to the truncated head.

## Schema deviation from dispatch

The dispatch claimed no schema migration was needed for the new 'synthesis' part kind. Reality: message_parts.kind has an explicit CHECK constraint (schema.sql:54) that would reject the new value. Added a DROP CONSTRAINT IF EXISTS + DO $$ pg_constraint idempotency-guarded re-add matching the CLAUDE.md migration pattern. The inline CREATE TABLE constraint is also updated so fresh installs land with the extended enum.

## User-abort marks synth-message failed

Deviation from review-time spec ("user-abort path does NOT mark the message failed"). The outer abort handler in error-handler.ts operates on the parent turn's assistantMessageId, not the new synth row that runSynthesisPass created. Without explicit marking, the synth row would sit in status='streaming' until the 5-min stale-streaming sweeper (v1.13.1-cleanup-bundle) runs, tripping the frontend's 60s no-token-activity banner in the meantime, which is exactly the UX bug class the v1.13.1 sweeper was added to handle. Marking failed on every catch path (including user-abort) closes the gap. Cost: one extra DB write + one publish on the rare user-abort-during-synth path.

## Race-safe synth-tool capture

tool-phase.ts uses synthEntries: Array<{tc, output, error?}> with per-callback push under Promise.all. find() picks the first non-error entry by call-order (toolCalls array index). Multiple synth-tools in one batch are uncommon but handled deterministically.

## Roadmap rebase

Updated the boocode_roadmap.md retrospective section, cleanup-order tracker, and schema-changes summary to use the new vMAJOR.MINOR.PATCH-slug tag names per the 2026-05-22 retag (CHANGELOG.md is the canonical record). v1.13.15 is listed as "this batch, tag pending"; a one-line follow-up commit will remove that qualifier after the tag lands.
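
For reviewers, the 32k-token budget with drop-priority mentioned in the summary at the top of this message can be pictured roughly as follows. This is a minimal sketch, not the code in synthesisPipeline.ts: the BudgetItem shape, the keepPriority values, and the fitToBudget helper are hypothetical, and the actual drop order is not stated in this message. Only the 32_000 budget and the chars/4 estimate come from the patch.

```typescript
// Illustrative only: BudgetItem, keepPriority, and fitToBudget are hypothetical
// names. TOKEN_BUDGET and CHARS_PER_TOKEN mirror the constants in this patch.
interface BudgetItem {
  label: string;
  content: string;
  keepPriority: number; // lower values are dropped first (assumed ordering)
}

const TOKEN_BUDGET = 32_000;
const CHARS_PER_TOKEN = 4;

// Rough token estimate: characters divided by four, rounded up.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Drop whole items, lowest keepPriority first, until the estimate fits.
// Surviving items keep their original relative order.
function fitToBudget(items: BudgetItem[], budget: number = TOKEN_BUDGET): BudgetItem[] {
  const dropOrder = [...items].sort((a, b) => a.keepPriority - b.keepPriority);
  const dropped = new Set<BudgetItem>();
  let total = items.reduce((sum, item) => sum + estimateTokens(item.content), 0);
  for (const candidate of dropOrder) {
    if (total <= budget) break;
    dropped.add(candidate);
    total -= estimateTokens(candidate.content);
  }
  return items.filter((item) => !dropped.has(item));
}
```

Dropping whole items keeps each surviving file or doc intact for the model; a variant that trims the lowest-priority item instead would fit more content under the same budget at the cost of partial files.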
Co-Authored-By: Claude Opus 4.7 (1M context)
---
 CHANGELOG.md                                  | 163 ++++++
 apps/server/src/schema.sql                    |  19 +-
 apps/server/src/services/inference/parts.ts   |  12 +-
 .../src/services/inference/tool-phase.ts      |  49 ++
 apps/server/src/services/synthesisPipeline.ts | 493 ++++++++++++++++++
 apps/server/src/services/synthesisPrompt.ts   |  20 +
 boocode_roadmap.md                            |  53 +-
 .../v1.13.15-codecontext-synth/proposal.md    | 145 ++++++
 8 files changed, 940 insertions(+), 14 deletions(-)
 create mode 100644 CHANGELOG.md
 create mode 100644 apps/server/src/services/synthesisPipeline.ts
 create mode 100644 apps/server/src/services/synthesisPrompt.ts
 create mode 100644 openspec/changes/v1.13.15-codecontext-synth/proposal.md

diff --git a/CHANGELOG.md b/CHANGELOG.md
new file mode 100644
index 0000000..45bc48b
--- /dev/null
+++ b/CHANGELOG.md
@@ -0,0 +1,163 @@
+# Changelog
+
+All notable changes per release tag. Most recent on top, ordered by tag creation date (which matches the git history). Tag names follow `vMAJOR.MINOR.PATCH-slug` — the slug describes what shipped, so the tag name alone is enough to recall the batch.
+
+## v1.13.15-codecontext-synth — 2026-05-22
+
+Forced second-inference synthesis pass for codecontext overview-class tools (`get_codebase_overview`, `get_framework_analysis`, `get_semantic_neighborhoods`). After the tool result lands, the pipeline expands the truncated head via in-process `readTruncation`, extracts referenced file paths from the full content, auto-fetches top-N files + project docs (BOOCHAT.md, AGENTS.md, *roadmap*.md, CONTEXT.md) under a 32k-token budget with explicit drop-priority order, then streams a synthesis turn that replaces the recursive `runAssistantTurn`. The 32k truncated head still ships to the synth model (token-budget contract preserved); the expansion is reference-extraction-only.
Falls through to recursion on timeout (90s), model error, or non-2xx; user-abort marks the synth message `status='failed'` and re-throws (the outer abort handler operates on the parent turn's message, not the new synth row — without explicit marking, the row would sit `streaming` until the 5-min sweeper, tripping the 60s stale-stream banner). Adds `'synthesis'` to `message_parts.kind` CHECK constraint via `DROP CONSTRAINT IF EXISTS` + `DO $$ pg_constraint` idempotency-guarded re-add. Smokes #1, #2, #6 all clean; smokes #3–#5 are content-quality checks for UI review. + +## v1.13.14-skills-audit — 2026-05-22 + +Multi-topic batch. **Skills audit (headline):** vendored all 26 skills from `/home/samkintop/opt/skills/` into repo-local `data/skills/` (the `/opt/skills:/data/skills` override mount removed from `docker-compose.yml` so skills are auditable per-batch in git). Audited via 5 parallel Claude Code agent-teams running mgechev's 4-step protocol per skill — 14 survive with gerund-form names + refined triggers; 11 dropped (duplicates, BooCode-irrelevant patterns, Claude-already-does-natively); 1 (`verification-before-completion`) migrated to `BOOCHAT.md`/`BOOCODER.md` as an always-true rule. The Codeminer42 "rules vs recipes" split codified in those files. **Token tracking + stale-stream banner fix:** same root cause — `IsoTimestamp = z.string()` in `ws-frames.ts` was failing on postgres `Date` objects, silently dropping every `message_complete` / `session_updated` / `chat_updated` frame through the `v1.13.13-ws-publish` Zod gate; `z.preprocess(v => v instanceof Date ? v.toISOString() : v, ...)` applied to the primitive on both server + web (parity test still passes). **Codecontext ignore:** `codecontext_client.ts` auto-installs `.codecontextignore.template` into any project's root on first call (stops the upstream empty-source-file parser crash on foreign projects' `node_modules`). 
**Budget bump:** `BUDGET_READ_ONLY` + `BUDGET_NO_AGENT` 30 → 50 (real recon need ~27 + headroom for codecontext failure-retry turns; doom-loop guard catches the loop class anyway). **UI:** queued-message dropdown → edit / force-send / cancel buttons in `ChatPane.tsx`; `ChatThroughput` removed from desktop tab strip (mobile tab switcher keeps it). Audit decisions in `openspec/changes/v1.13.12-skills-audit/audit-notes.md`. + +## v1.13.13-ws-publish — 2026-05-22 + +Second half of the WebSocket-frame-typing batch. Converts the existing ~50 inference + auto_name publish sites (via the `index.ts` adapter) plus ~30 direct `broker.publish*` call sites in routes + compaction, so every server-emitted frame now goes through Zod validation at the broker boundary. Pairs with `v1.13.12-ws-schemas`. + +## v1.13.12-ws-schemas — 2026-05-22 + +First half of the WebSocket-frame-typing batch. Adds `apps/server/src/types/ws-frames.ts` with Zod schemas for all 27 wire-format frame types (discriminated union `WsFrameSchema` + `KNOWN_FRAME_TYPES` diagnostic lookup), duplicated byte-identical at `apps/web/src/api/ws-frames.ts` with a parity test. Introduces the `publishFrame` / `publishUserFrame` wrappers that fail-closed on schema mismatch. + +## v1.13.11-tools — 2026-05-22 + +Tiered tool loading via `BOOCODE_TOOLS` env var (`core` | `standard` | `all`). Core = 4 read-only fs tools (~2k token schema cost). Standard = +web + git + codecontext (~10k). All (default) = every tool in `ALL_TOOLS` (~21k). The var is a ceiling — narrows agent whitelists, never expands. Pattern lifted from `eyaltoledano/claude-task-master`. + +## v1.13.10-openspec — 2026-05-22 + +Adopt `Fission-AI/OpenSpec`'s `openspec/changes//{proposal,tasks,design}.md` shape for BooCode's own batch docs. Existing batch docs (`boocode_batch10.md`, `handoff_v1.13.8_prefix_verify.md`, `handoff_v1.13.10_per_tool_cost.md`) moved into `openspec/changes/archived/` via `git mv` to preserve history. Zero-dep documentation reformat. 
+ +## v1.13.9-agentlint — 2026-05-22 + +Manual audit of instruction files against `0xmariowu/AgentLint`'s 31-check standard. Removed identity-opener sections from `BOOCHAT.md` and `BOOCODER.md` (emphatic decoration the model doesn't need). Added `CLAUDE.local.md` to `.gitignore` — Claude Code's Glob ignores `.gitignore` by default, so local overrides were otherwise readable by any agent walking the workspace. `CLAUDE.md` passed all 10 checks unchanged. + +## v1.13.8-tool-cost — 2026-05-22 + +Per-tool prompt/completion-token rolling averages surfaced in AgentPicker as at-a-glance cost hints. Implementation is the `tool_cost_stats` SQL view over `messages_with_parts` (`LATERAL jsonb_array_elements` on `tool_calls`), plus a read endpoint and a tooltip extension. Equal-split attribution — multi-tool turn divides tokens N-ways; the 100-call rolling mean absorbs split noise. Filters out `cap_hit` / `doom_loop` sentinels. Source data already lands via existing UPDATEs that `v1.13.5-stability-bundle`'s `includeUsage: true` fix made non-NULL. + +## v1.13.7-compaction-trigger — 2026-05-22 + +Compaction overflow trigger lowered to `floor(0.85 × ctx_max)`, replacing the v1.11.0-era `ctx_max − 20_000` formula. Old formula gave only 7.6% headroom at 262k context and 0 budget for ≤20k contexts (never fired). New formula gives consistent 15% summarizer headroom across all model sizes. Opencode pattern lift from `session/overflow.ts`. + +## v1.13.6-prefix-stability — 2026-05-22 + +System-prompt prefix stability verify-and-measure. Recon during planning disproved the original DB-cache premise: `buildSystemPrompt` already runs over inputs mtime-cached at the file layer (BOOCHAT.md, AGENTS.md global+per-project), and DB scalars are byte-stable until edited. 
This batch closes the verification gap with instrumentation, not implementation — `buildSystemPromptWithFingerprint` computes SHA-256 over the assembled prefix and a per-session `Map` observer fires `prefix-drift` (warn) on hash change with field-level `changed_inputs` diff. + +## v1.13.5-stability-bundle — 2026-05-22 + +Five fixes for latent regressions surfaced during the cosmetic-revert investigation. (1) `provider.ts` — `includeUsage: true` on `createOpenAICompatible` (default false omitted `stream_options.include_usage`; llama-swap never emitted usage; tokens_used / ctx_used were NULL on every assistant row since `v1.13.0-ai-sdk-v6`). (2) `MessageList.tsx` — `hasText = m.content.trim().length > 0` to skip whitespace-only tool-call-only turns rendering empty bubbles. (3) `BUDGET_NO_AGENT` raised 15 → 30 to match read-only agent cap. (4) `payload.ts` skips status='failed' + complete-but-empty assistant rows so cap-hit + Continue doesn't upstream-reject. (5) Misc UI sanitization. + +## v1.13.4-reasoning-fix — 2026-05-22 + +Compaction head-assembly audit caught one fix: reasoning was omitted from the summarizer's view of tool-bearing turns, silently degrading summary quality for reasoning-channel models (qwen3.6). `v1.13.0-ai-sdk-v6` had wired reasoning end-to-end into inference but missed this one read site. `CompactionMessage` extended with `reasoning_parts`; `buildHeadPayload` embeds it as a `...` prose prefix on the assistant content (OpenAI wire shape has no structured reasoning field). + +## v1.13.3-truncate — 2026-05-22 + +Port of opencode's `truncate.ts`. Full tool output retrievable via opaque `tr_<12 base32 chars>` id (~60 bits entropy) and a new `view_truncated_output(id)` tool. Tmpfs storage at `/tmp/boocode-truncations/` (overridable via `BOOCODE_TRUNCATION_DIR`), 5MB cap, 7-day TTL, orphan-reap on the periodic 60s sweeper. Wired through four tools: `view_file`, `list_dir`, `web_fetch`, `codecontext_client`. 
Each returns the existing sliced view plus an `outputPath` field when truncation fires. + +## v1.13.2-compaction-prune — 2026-05-22 + +Two-tier compaction prune — opencode pattern that was half-shipped in v1.11.0. New `message_parts.hidden_at` column with partial index on `WHERE hidden_at IS NULL`. `messages_with_parts` view changed from `COALESCE(parts, legacy)` to a CASE that distinguishes "no parts at all → fall back to legacy column for pre-v1.13.0 history" from "all parts hidden → drop the row from the model payload" (smoke caught the `COALESCE` leaking hidden parts back via legacy fallback). `prune.ts` scans `tool_result` parts newest-first, protects the last 40k tokens, marks older candidates hidden once the combined estimate clears 20k. + +## v1.13.1-cleanup-bundle — 2026-05-22 + +Four independent items owed from prior dispatches. (1) `statement_timeout = '30s'` at the database level (documented in `schema.sql` but applied operationally — `ALTER DATABASE` can't run inside a `DO` block). (2) Tool registry alpha-sorted at module load — llama.cpp's prompt cache hits on byte-identical prefixes; reordering tools near the top of the system prompt would invalidate every cached turn. (3) Periodic 60s stuck-row sweeper. (4) `experimental_repairToolCall` to keep streams alive on malformed qwen3.6 tool args (pass-through implementation — logs and forwards unmodified; existing zod-reject path routes back to the model). + +## v1.13.0-ai-sdk-v6 — 2026-05-22 + +Major migration to AI SDK v6. Introduces the `streamCompletion` adapter (`services/inference/stream-phase.ts`) over `streamText`, with five known gotchas the LSP can't catch — abort signals swallowed by `fullStream` (post-iteration throw required), usage lands only at stream end via `await result.usage`, tools have no `execute` field (BooCode dispatches in `tool-phase.ts`), and tool-call-only turns may emit a leading `\n` text-delta. 
Also ships the `messages_with_parts` view (parts-merge read path) and wires `reasoning_parts` end-to-end via a `ReasoningPart` in the v6 ModelMessage. Ports `ask_user_input` correlation queries from JSON columns to `message_parts` JOINs. + +## v1.12.4-inference-split — 2026-05-21 + +Complete `inference.ts` split into `services/inference/`. Pieces: `turn.ts` (orchestration — `runAssistantTurn` / `runInference` / `createInferenceRunner`), `sentinel-summaries.ts` (`runCapHitSummary`, `runDoomLoopSummary`), `stream-phase.ts`, `tool-phase.ts`, `provider.ts`, `payload.ts`, `prune.ts`, `budget.ts`, `xml-parser.ts`, `error-handler.ts`, `sentinels.ts`, `parts.ts`, `types.ts`. Public surface re-exported via `inference/index.ts`; callers import from `./services/inference/index.js` explicitly (NodeNext doesn't honor directory-index resolution). + +## v1.12.3-stale-banner — 2026-05-21 + +Stale-stream banner with Retry/Discard. When an assistant message sits `status='streaming'` with no token activity for 60+ seconds, the chat shows a banner above the input. Both actions clear the stale row via new `POST /api/chats/:id/discard_stale` (updates `status='failed'`, publishes `chat_status='idle'`). Closes the UX gap from the 2026-05-21 debugging spiral — slow streams and dead streams now look different. + +## v1.12.2-live-toks — 2026-05-21 + +Live tok/s + ctx display next to the status indicator. `ChatThroughput` renders inline beside `StatusDot` while streaming or tool_running. Subscribes to existing `'usage'` WS frames (500ms-throttled, carrying `completion_tokens` + `ctx_used` + `ctx_max`) via `sessionEvents`. Hides when status drops to idle/error or data is older than 10s. Addresses the same UX gap as `v1.12.3-stale-banner` — gives users a live token velocity readout that immediately distinguishes slow from dead. + +## v1.12.1-stop-handler — 2026-05-21 + +`handleAbortOrError` now writes `status='cancelled'` on user stop; rows no longer stuck `streaming` forever. 
Drops stale `messages_status_check` constraint (only `messages_status_chk` remains, allowing 'cancelled' via TS `MESSAGE_STATUSES`). Removes `detectSameNameLoop` and `DOOM_LOOP_SAME_NAME_THRESHOLD` (added during the 2026-05-21 debugging spike, never fired in any real run) plus 12 verbose `ctx.log.info` diagnostic markers from the same spike. Bundles workspace pane sync + status indicator overhaul + startup hung-row sweep that landed earlier in v1.12.1 work. + +## v1.12.0-codecontext — 2026-05-21 + +Adds the `codecontext` sidecar (Go-based code-graph indexer at `codecontext:8080/v1/` over `boocode_net`) plus container guidance and skills runtime updates. Introduces the `chat_status` WS frame (`streaming | tool_running | waiting_for_input | idle | error`, widened from `working|idle|error`). Drops the deprecated `session_panes` table — workspace pane state moves to `sessions.workspace_panes jsonb` for cross-device sync via `PATCH /api/sessions/:id/workspace`. + +## v1.11.1-consolidation — 2026-05-21 + +Rollup of v1.11.0–v1.11.10 work that was shipped piecemeal. Covers anchored rolling compaction (single `summary=true` row per chat that supersedes itself), doom-loop guard via `detectDoomLoop`, `path_guard` secret-filename deny list, web tools (`web_search` against SearXNG + `web_fetch` with SSRF/private-IP block), and the 5MB stream-cap on response bodies with abort-on-overflow. + +## v1.11.0-context-bar — 2026-05-20 + +Persistent context-window tracker in `ChatPane` + `ctx_max` capture via `${LLAMA_SWAP_URL}/upstream//props`. First inferences after a boocode boot may have `ctx_max=NULL` if llama-swap hasn't loaded the model yet — 60s negative cache TTL recovers on next turn. Replaced an earlier dead read of `parsed.timings.n_ctx` which never carried n_ctx. + +## v1.10.1-booterm-user — 2026-05-19 + +Per-user shell privilege drop in the booterm container via `gosu` in `tmux.conf` default-command. 
Shells launched in browser terminal panes drop privs to `samkintop` rather than running as root inside the container. + +## v1.10.0-booterm — 2026-05-18 + +Second container (`apps/booterm`, port 9501, bookworm-slim+glibc). Fastify + node-pty + tmux. Browser terminal panes connect via WS to `/ws/term/sessions/:sid/panes/:pid`; per-session tmux session `bc-`, per-pane window `term-`. xterm-addon-webgl with `document.fonts.load(...)`-gated init (Canvas2D doesn't honor `font-display: block`) and iOS-friendly visibility-change context recreation. + +## v1.9.2-ask-user-input — 2026-05-18 + +`ask_user_input` elicitation tool. Pauses the inference loop and surfaces a prompt to the user; their response routes back as the tool result. Correlation initially via `messages.tool_calls` / `tool_results` JSON columns (later ported to `message_parts` in `v1.13.0-ai-sdk-v6`). + +## v1.9.1-skills — 2026-05-18 + +Skills runtime + `/skill` slash command with autocomplete. Server-side parser, tools, `/api/skills`, and mount. Hardens `.dockerignore` to exclude `secrets/` and `data/`. Drops the type-to-confirm gate on chat delete (plain Cancel/Confirm only — per workspace convention). + +## v1.9.0-themes-settings — 2026-05-17 + +Settings pane + per-project defaults + bulk archive + themes lift. `themes-v1` (18 preset palettes) ships in the same batch with a Settings picker for live theme switching. + +## v1.8.2-cap-hit — 2026-05-17 + +Tool-loop cap-hit summary — when an assistant exceeds the per-turn tool budget, a sentinel `role='system'` row with `metadata.kind='cap_hit'` is inserted and a summary turn runs to give the user a coherent endpoint. Also compacts the tool-call UI rendering. + +## v1.8.1-agents-global — 2026-05-16 + +Global agents (`data/AGENTS.md` bind-mounted at `/data/AGENTS.md`) + parser robustness + WS reconnect toast. 
Per-project `AGENTS.md` mechanism (`getAgentsForProject`) remains for *other* projects; the BooCode repo itself uses global-only to eliminate two-files-must-stay-in-sync drift. + +## v1.8.0-agents — 2026-05-16 + +Tier 2 agents — `AGENTS.md` registry + per-session agent picker. Also lands mobile tab switcher, branch indicator, and the `git_status` tool. + +## v1.7.0-drag-drop — 2026-05-16 + +Drag-drop + paste-as-attachment for long text in the chat input. + +## v1.6.0-mobile — 2026-05-16 + +Full mobile suite. Adds `useViewport` (matchMedia breakpoints mobile <768 / tablet 768–1023 / desktop ≥1024), `useSidebarDrawer` / `useRightRailDrawer` (Context + auto-close on `useLocation().pathname` change), `useLongPress` (500ms timer, synthetic `contextmenu`), `usePullToRefresh` (80px threshold, 600ms hold), `SwipeablePaneTab` (60px close, 30px vertical bail). Mobile headers with safe-area padding, hamburger left, FolderTree right. Tap targets at `max-md:min-h-[44px] max-md:min-w-[44px]`. Raises `MAX_TOOL_LOOP_DEPTH` 5 → 15. Right-rail becomes a drawer on mobile. + +## v1.5.1-bootstrap — 2026-05-16 + +Bootstrap fixes — git + ssh installed in the boocode container, Tailscale host rewrite, `/opt/projects` label correction for the create-new-project bootstrap flow. + +## v1.5.0-refactor-tests — 2026-05-16 + +Refactor split (FileBrowserPane / Workspace / `runAssistantTurn`) + vitest harness + unit tests for security-critical pure functions. Scopes the `/opt` mount to `/opt/projects` (writable) plus `PROJECT_ROOT_WHITELIST=/opt` (read-only resolution for add-existing). Surfaces swallowed errors and removes dead `session_renamed` paths. + +## v1.4.0-fork-header — 2026-05-16 + +Fork from message + delete message + header polish + general housekeeping. + +## v1.3.0-chats-projects — 2026-05-16 + +Chats-in-sessions era. 
Adds force-send, `/compact`, right-rail file browser, archive/rename/Open-in-Gitea sidebar context menu, archived projects landing page, create-project bootstrap with Gitea remote setup, landing-card buttons, 1000px content cap. Dedup audit and chat archive/delete from the sidebar. + +## v1.2.0-multi-pane — 2026-05-15 + +Multi-pane workspace (batch 3, T1–T8). `session_panes` schema (later replaced by `sessions.workspace_panes jsonb` in v1.12.0), `Pane` discriminated union, broker user channel + `/api/ws/user`, `file_ops` + `file_index` services, `PaneShell` / `ChatPane` / `FileBrowserPane` / `PaneTab` / `Workspace` components, `usePanes` hook, Shiki integration in `CodeBlock`. Up to 5 panes per session; default chat pane created on `POST /api/sessions`. + +## v1.1.0-markdown-sidebar — 2026-05-15 + +Markdown rendering, message actions, tok/s + ctx display, AI session naming. Sidebar restructure — chats nested under projects (max 5 + view-all), live updates via WS. + +## v1.0.0-initial — 2026-05-14 + +Initial commit. Skeleton of the monorepo: `apps/server` (Fastify + postgres), `apps/web` (React + Vite), basic chat loop against llama-swap. 
diff --git a/apps/server/src/schema.sql b/apps/server/src/schema.sql
index 6c6bb0e..c5597e8 100644
--- a/apps/server/src/schema.sql
+++ b/apps/server/src/schema.sql
@@ -51,7 +51,7 @@ CREATE TABLE IF NOT EXISTS message_parts (
   kind text NOT NULL,
   payload jsonb NOT NULL,
   created_at timestamptz NOT NULL DEFAULT clock_timestamp(),
-  CONSTRAINT message_parts_kind_chk CHECK (kind IN ('text', 'tool_call', 'tool_result', 'reasoning', 'step_start')),
+  CONSTRAINT message_parts_kind_chk CHECK (kind IN ('text', 'tool_call', 'tool_result', 'reasoning', 'step_start', 'synthesis')),
   CONSTRAINT message_parts_seq_uniq UNIQUE (message_id, sequence)
 );
 CREATE INDEX IF NOT EXISTS message_parts_msg_seq_idx ON message_parts (message_id, sequence);
@@ -74,6 +74,23 @@ END $$;
 CREATE INDEX IF NOT EXISTS message_parts_hidden_idx
   ON message_parts (message_id) WHERE hidden_at IS NULL;
 
+-- v1.13.13: extend message_parts.kind to allow 'synthesis'. Existing DBs were
+-- created with the pre-v1.13.13 CHECK constraint that did NOT include
+-- 'synthesis'; drop + re-add the constraint with the extended enum. Fresh
+-- installs hit the inline constraint above (already updated) and skip this
+-- block via the pg_constraint guard.
+ALTER TABLE message_parts DROP CONSTRAINT IF EXISTS message_parts_kind_chk;
+DO $$
+BEGIN
+  IF NOT EXISTS (
+    SELECT 1 FROM pg_constraint WHERE conname = 'message_parts_kind_chk'
+  ) THEN
+    ALTER TABLE message_parts
+      ADD CONSTRAINT message_parts_kind_chk
+      CHECK (kind IN ('text', 'tool_call', 'tool_result', 'reasoning', 'step_start', 'synthesis'));
+  END IF;
+END $$;
+
 -- v1.13.1-B: read-path view. Read sites SELECT FROM messages_with_parts
 -- instead of messages so tool_calls / tool_results / reasoning_parts come
 -- from the granular message_parts table. The COALESCE means pre-v1.13.0
diff --git a/apps/server/src/services/inference/parts.ts b/apps/server/src/services/inference/parts.ts
index 6d23a10..2f2b474 100644
--- a/apps/server/src/services/inference/parts.ts
+++ b/apps/server/src/services/inference/parts.ts
@@ -7,7 +7,17 @@ import type { ToolCall, ToolResult } from '../../types/api.js';
 // JSON columns; the swap to parts-as-source-of-truth happens in a later
 // v1.13 dispatch alongside the AI SDK streamText migration.
 
-export type PartKind = 'text' | 'tool_call' | 'tool_result' | 'reasoning' | 'step_start';
+// v1.13.13: 'synthesis' added. Schema CHECK constraint is updated in lockstep
+// (schema.sql adds 'synthesis' to message_parts_kind_chk on startup). The
+// dispatch's claim that no schema migration was needed assumed kind was a
+// bare text column — it isn't; the constraint enumerates allowed values.
+export type PartKind =
+  | 'text'
+  | 'tool_call'
+  | 'tool_result'
+  | 'reasoning'
+  | 'step_start'
+  | 'synthesis';
 
 export interface PartInsert {
   message_id: string;
diff --git a/apps/server/src/services/inference/tool-phase.ts b/apps/server/src/services/inference/tool-phase.ts
index 1f0a3fa..b9b59b8 100644
--- a/apps/server/src/services/inference/tool-phase.ts
+++ b/apps/server/src/services/inference/tool-phase.ts
@@ -14,6 +14,11 @@ import type {
 // the reference is read at call time (inside an async function body), not
 // at module top-level. Node + tsc resolve this cleanly.
 import { runAssistantTurn } from './turn.js';
+// v1.13.13: synthesis pipeline — replaces the immediate recursive turn when
+// any of this batch's tool calls is in SYNTHESIS_TOOLS. Falls through to
+// recursion on synthesis failure (timeout / model error). See module header
+// in synthesisPipeline.ts for the auto-fetch + token-budget rules.
+import { SYNTHESIS_TOOLS, runSynthesisPass } from '../synthesisPipeline.js';
 
 async function executeToolCall(
   projectRoot: string,
@@ -155,6 +160,12 @@ export async function executeToolPhase(
   // batches still execute the other tools normally.
   ctx.publishUser({ type: 'chat_status', chat_id: chatId, status: 'tool_running', at: new Date().toISOString() });
   let pausingForUserInput = false;
+  // v1.13.13: capture synth-tool result text so the synthesis pipeline below
+  // doesn't have to re-fetch from DB. Array (not single) because a batch
+  // could theoretically include multiple synthesis tools — we take the first
+  // for the synthesis input. Race-free under Promise.all because each
+  // callback pushes its own captured value.
+  const synthEntries: Array<{ tc: ToolCall; output: unknown; error?: string }> = [];
   await Promise.all(
     toolCalls.map(async (tc) => {
       const [toolRow] = await ctx.sql<{ id: string }[]>`
@@ -186,6 +197,9 @@ export async function executeToolPhase(
         return;
       }
       const tres = await executeToolCall(projectRoot, tc);
+      if (SYNTHESIS_TOOLS.has(tc.name)) {
+        synthEntries.push({ tc, output: tres.output, ...(tres.error ? { error: tres.error } : {}) });
+      }
       const stored = {
         tool_call_id: tc.id,
         output: tres.output,
@@ -233,6 +247,41 @@ export async function executeToolPhase(
     return;
   }
 
+  // v1.13.13: synthesis-pipeline branch. When any of this batch's tool calls
+  // is a codecontext overview/analysis tool that produced a non-error result,
+  // run a forced second-inference synthesis pass with auto-fetched files +
+  // project docs instead of the normal recursive runAssistantTurn. Falls
+  // through to the recursive call on synthesis failure (timeout, model
+  // error). User-abort re-throws so the outer handler runs.
+  const synthEntry = synthEntries.find((e) => !e.error && e.output != null);
+  if (synthEntry) {
+    // codecontext wrappers return { result: string, truncated: boolean, ... }.
+ // Defensive: stringify the output if it isn't the expected shape so the + // synthesis still has something to chew on rather than crashing on + // missing `.result`. + const out = synthEntry.output as { result?: unknown; truncated?: boolean; outputPath?: string }; + const toolResultText = + typeof out?.result === 'string' + ? out.result + : JSON.stringify(synthEntry.output); + // v1.13.15-b: forward the wrapper's truncation flag + opaque tmpfs id so + // synthesisPipeline can re-read the full content for reference extraction. + const ran = await runSynthesisPass({ + ctx, + args, + session, + projectRoot, + toolName: synthEntry.tc.name, + toolResultText, + ...(typeof out?.truncated === 'boolean' ? { truncated: out.truncated } : {}), + ...(typeof out?.outputPath === 'string' ? { outputPath: out.outputPath } : {}), + }); + if (ran) return; + // ran === false → synthesis failed (timeout / model error) → fall through + // to the standard recursive turn below. The synth message (if created) + // was already marked status='failed' inside runSynthesisPass. + } + const [nextAssistant] = await ctx.sql<{ id: string }[]>` INSERT INTO messages (session_id, chat_id, role, content, status, created_at) VALUES (${sessionId}, ${chatId}, 'assistant', '', 'streaming', clock_timestamp()) diff --git a/apps/server/src/services/synthesisPipeline.ts b/apps/server/src/services/synthesisPipeline.ts new file mode 100644 index 0000000..98901f0 --- /dev/null +++ b/apps/server/src/services/synthesisPipeline.ts @@ -0,0 +1,493 @@ +// v1.13.13: forced second-inference synthesis pass for codecontext +// overview/analysis tools. Triggered from tool-phase.ts after a codecontext +// tool call lands and BEFORE the normal recursive runAssistantTurn fires. +// +// Inputs to the synthesis stream: +// 1. The codecontext tool's result text. +// 2. Top-N source files referenced in that text, fetched via view_file. +// 3. Project documentation auto-fetched from the repo root. +// 4. 
The original user message that triggered the turn. +// +// Output: a NEW assistant message whose sole part is kind='synthesis'. +// Streams to the client as deltas exactly like a normal assistant turn. +// +// Failure modes (all fall through to recursive runAssistantTurn): +// - SYNTHESIS_TOOLS membership check fails -> return false immediately. +// - File-fetch / doc-fetch errors -> silent skip, continue with what we have. +// - Stream error / timeout -> mark synth message status='failed', return false. +// - User-abort -> mark cancelled and re-throw so the outer abort handler runs. + +import { promises as fs } from 'node:fs'; +import { join } from 'node:path'; + +import { TOOLS_BY_NAME } from './tools.js'; +import { streamCompletion } from './inference/stream-phase.js'; +import { SYNTHESIS_SYSTEM_PROMPT } from './synthesisPrompt.js'; +import { insertParts } from './inference/parts.js'; +import * as modelContext from './model-context.js'; +import { readTruncation } from './truncate.js'; + +import type { Session } from '../types/api.js'; +import type { OpenAiMessage } from './inference/payload.js'; +import type { InferenceContext, TurnArgs } from './inference/turn.js'; + +export const SYNTHESIS_TOOLS: ReadonlySet = new Set([ + 'get_codebase_overview', + 'get_framework_analysis', + 'get_semantic_neighborhoods', +]); + +const TOP_N_FILES = 5; +const FILE_LINE_CAP = 200; +const DOC_LINE_CAP = 500; +// Token budget for the auto-fetched content (files + docs combined). Estimated +// via chars/4 — a rough but stable proxy that doesn't require a tokenizer dep. +const TOKEN_BUDGET = 32_000; +const CHARS_PER_TOKEN = 4; +// 90s per synthesis call. Long enough for a thoughtful overview against a +// large auto-fetched payload; short enough that a hung upstream falls through +// to the normal recursive turn within a typical user attention window. +const SYNTH_TIMEOUT_MS = 90_000; + +// File-extension regex for referenced-file extraction. 
Limited to source- +// language extensions so we don't pull in lockfiles, images, etc. +const FILE_PATH_RE = + /(?:^|[`'"<\s\(\[])([A-Za-z0-9_./@-]+\.(?:ts|tsx|js|jsx|py|go|rs|java|kt|c|cpp|h|hpp|md|json|yaml|yml|sql|sh|html|css))(?=[`'"<\)\]\s,;:]|$)/gm; + +export interface SynthesisParams { + ctx: InferenceContext; + args: TurnArgs; + session: Session; + projectRoot: string; + toolName: string; + toolResultText: string; + // v1.13.15-b: when codecontext's wrapper hit its 32k inline-truncation + // limit, we expand the full content via readTruncation for reference-file + // extraction only. toolResultText (the truncated head) still ships to the + // synth model — preserves the 32k payload-budget contract. + truncated?: boolean; + // opaque id (tr_<…>), not a filesystem path — see truncate.ts naming note + outputPath?: string; +} + +interface FetchedFile { + path: string; + content: string; +} + +interface DocsCollection { + boochat?: string; + agents?: string; + context?: string; + roadmap?: string; +} + +export async function runSynthesisPass(p: SynthesisParams): Promise<boolean> { + if (!SYNTHESIS_TOOLS.has(p.toolName)) return false; + + let synthMessageId: string | null = null; + let accumulated = ''; + let timedOut = false; + const synthCtrl = new AbortController(); + const timer = setTimeout(() => { + timedOut = true; + synthCtrl.abort(); + }, SYNTH_TIMEOUT_MS); + + try { + const userMessage = await fetchOriginalUserMessage(p.ctx, p.args.chatId); + if (!userMessage) { + p.ctx.log.warn({ chatId: p.args.chatId }, 'synthesis: no user message found; falling through'); + return false; + } + + // v1.13.15-b: when the tool result was inline-truncated by the wrapper + // (32k cap, see codecontext_client.ts:114), expand the full content from + // tmpfs for reference-file extraction. The synth payload still ships the + // truncated head (see buildPayload call below) so the token-budget + // contract holds.
Graceful degradation: if readTruncation returns null + // (missing id, ENOENT) or throws, fall back to the truncated head. + let extractionSource = p.toolResultText; + if (p.truncated && p.outputPath) { + try { + const full = await readTruncation(p.outputPath); + if (full !== null) { + extractionSource = full; + p.ctx.log.info( + { + chatId: p.args.chatId, + toolName: p.toolName, + originalChars: p.toolResultText.length, + fullChars: full.length, + }, + 'synthesis: expanded truncated tool output', + ); + } + } catch (err) { + p.ctx.log.warn( + { chatId: p.args.chatId, toolName: p.toolName, err: String(err) }, + 'synthesis: readTruncation failed, using truncated output', + ); + } + } + + const refFiles = extractReferencedFiles(extractionSource); + const files = await fetchTopFiles(refFiles, p.projectRoot); + const docs = await fetchProjectDocs(p.projectRoot); + const { files: budgetedFiles, docs: budgetedDocs } = applyTokenBudget(files, docs); + const synthMessages = buildPayload( + p.toolName, + // Truncated head only — full content was used for reference extraction above + p.toolResultText, + budgetedFiles, + budgetedDocs, + userMessage, + ); + + // Insert + announce the synthesis assistant message. From here on, any + // exception must clean up via the catch block so the row doesn't linger + // in 'streaming' status (the 5min stale-streaming sweeper catches it + // eventually, but explicit cleanup is better). 
+ const [synthRow] = await p.ctx.sql< + { id: string; started_at: string }[] + >` + INSERT INTO messages (session_id, chat_id, role, content, status, started_at, created_at) + VALUES (${p.args.sessionId}, ${p.args.chatId}, 'assistant', '', 'streaming', clock_timestamp(), clock_timestamp()) + RETURNING id, started_at + `; + synthMessageId = synthRow!.id; + const startedAt = synthRow!.started_at; + + p.ctx.publish(p.args.sessionId, { + type: 'message_started', + message_id: synthMessageId, + chat_id: p.args.chatId, + role: 'assistant', + }); + + // Combine the user-abort signal with our synthesis-specific timeout so + // either fires correctly. The `timedOut` flag in scope tells us which one + // tripped after streamCompletion throws. + const combinedSignal: AbortSignal | undefined = p.args.signal + ? AbortSignal.any([p.args.signal, synthCtrl.signal]) + : synthCtrl.signal; + + const onDelta = (delta: string): void => { + accumulated += delta; + p.ctx.publish(p.args.sessionId, { + type: 'delta', + message_id: synthMessageId!, + chat_id: p.args.chatId, + content: delta, + }); + }; + + const streamResult = await streamCompletion( + p.ctx, + p.session.model, + synthMessages, + { tools: null }, + onDelta, + undefined, + combinedSignal, + ); + + const mctx = await modelContext.getModelContext(p.session.model); + const nCtx = mctx?.n_ctx ?? 
null; + const [updated] = await p.ctx.sql< + { + tokens_used: number | null; + ctx_used: number | null; + ctx_max: number | null; + finished_at: string | null; + }[] + >` + UPDATE messages + SET content = ${streamResult.content}, + status = 'complete', + tokens_used = ${streamResult.completionTokens}, + ctx_used = ${streamResult.promptTokens}, + ctx_max = ${nCtx}, + finished_at = clock_timestamp() + WHERE id = ${synthMessageId} + RETURNING tokens_used, ctx_used, ctx_max, finished_at + `; + await insertParts(p.ctx.sql, [ + { + message_id: synthMessageId, + sequence: 0, + kind: 'synthesis', + payload: { text: streamResult.content }, + }, + ]); + p.ctx.publish(p.args.sessionId, { + type: 'message_complete', + message_id: synthMessageId, + chat_id: p.args.chatId, + tokens_used: updated?.tokens_used ?? null, + ctx_used: updated?.ctx_used ?? null, + ctx_max: updated?.ctx_max ?? null, + started_at: startedAt, + finished_at: updated?.finished_at ?? null, + model: p.session.model, + }); + p.ctx.publishUser({ + type: 'chat_status', + chat_id: p.args.chatId, + status: 'idle', + at: new Date().toISOString(), + }); + p.ctx.log.info( + { + chatId: p.args.chatId, + synthMessageId, + toolName: p.toolName, + chars: streamResult.content.length, + files: budgetedFiles.length, + }, + 'synthesis pass complete', + ); + return true; + } catch (err) { + await markSynthFailed(p, synthMessageId, accumulated).catch((cleanupErr) => { + p.ctx.log.warn({ cleanupErr: String(cleanupErr) }, 'synthesis cleanup UPDATE failed'); + }); + if (err instanceof Error && err.name === 'AbortError') { + if (timedOut) { + p.ctx.log.warn( + { toolName: p.toolName, chatId: p.args.chatId }, + 'synthesis pass timed out; falling through to recursive turn', + ); + return false; + } + // User-initiated abort: propagate so the outer error handler marks the + // parent turn cancelled. The synth message is already marked failed by + // markSynthFailed above. 
+ throw err; + } + p.ctx.log.warn( + { err: String(err), toolName: p.toolName, chatId: p.args.chatId }, + 'synthesis pass failed; falling through to recursive turn', + ); + return false; + } finally { + clearTimeout(timer); + } +} + +async function markSynthFailed( + p: SynthesisParams, + synthMessageId: string | null, + accumulated: string, +): Promise<void> { + if (synthMessageId === null) return; + await p.ctx.sql` + UPDATE messages + SET content = ${accumulated}, + status = 'failed', + finished_at = clock_timestamp() + WHERE id = ${synthMessageId} + `; + // Republish so the frontend's live state flips from 'streaming' to + // terminal. message_complete carries no error reason — the row's status + // column is the truth. The 5-state chat_status dot has 'error' but we + // don't fire that here because the broader inference is about to retry + // via recursion; flipping the user-channel status to 'error' would race + // the recursive turn's 'streaming' announcement. + p.ctx.publish(p.args.sessionId, { + type: 'message_complete', + message_id: synthMessageId, + chat_id: p.args.chatId, + model: p.session.model, + }); +} + +async function fetchOriginalUserMessage( + ctx: InferenceContext, + chatId: string, +): Promise<string | null> { + const rows = await ctx.sql<{ content: string }[]>` + SELECT content FROM messages + WHERE chat_id = ${chatId} AND role = 'user' + ORDER BY created_at DESC + LIMIT 1 + `; + return rows[0]?.content ??
null; +} + +function extractReferencedFiles(text: string): string[] { + const seen = new Set<string>(); + const order: string[] = []; + let m: RegExpExecArray | null; + while ((m = FILE_PATH_RE.exec(text)) !== null) { + const candidate = m[1]!; + if (seen.has(candidate)) continue; + if ( + candidate.includes('node_modules') || + candidate.includes('/dist/') || + candidate.includes('/test/') || + candidate.includes('/tests/') || + /\.(test|spec)\.[a-z]+$/.test(candidate) + ) { + continue; + } + seen.add(candidate); + order.push(candidate); + } + return order; +} + +async function fetchTopFiles(refs: string[], projectRoot: string): Promise<FetchedFile[]> { + const tool = TOOLS_BY_NAME['view_file']; + if (!tool) return []; + const out: FetchedFile[] = []; + for (const p of refs.slice(0, TOP_N_FILES)) { + const absPath = p.startsWith('/') ? p : join(projectRoot, p); + try { + const r = await tool.execute({ path: absPath, end_line: FILE_LINE_CAP }, projectRoot); + const content = (r as { content?: string }).content ?? ''; + if (content) out.push({ path: p, content }); + } catch { + // path-scope blocked, secret-filtered, file too large, or missing — + // skip silently. The remaining files (or none) still produce a + // meaningful synthesis input. + } + } + return out; +} + +async function fetchProjectDocs(projectRoot: string): Promise<DocsCollection> { + const tool = TOOLS_BY_NAME['view_file']; + if (!tool) return {}; + const docs: DocsCollection = {}; + for (const [filename, key] of [ + ['BOOCHAT.md', 'boochat'], + ['AGENTS.md', 'agents'], + ['CONTEXT.md', 'context'], + ] as const) { + try { + const r = await tool.execute( + { path: join(projectRoot, filename), end_line: DOC_LINE_CAP }, + projectRoot, + ); + const content = (r as { content?: string }).content; + if (content) docs[key] = content; + } catch { + // missing doc — skip + } + } + // Case-insensitive *roadmap*.md glob. Picks the first match in readdir() + // order, which is not guaranteed to be alphabetical; typical projects have at most one roadmap doc.
+ try { + const entries = await fs.readdir(projectRoot); + const roadmap = entries.find( + (e) => /roadmap/i.test(e) && e.toLowerCase().endsWith('.md'), + ); + if (roadmap) { + const r = await tool.execute( + { path: join(projectRoot, roadmap), end_line: DOC_LINE_CAP }, + projectRoot, + ); + const content = (r as { content?: string }).content; + if (content) docs.roadmap = content; + } + } catch { + // unreadable project root — skip + } + return docs; +} + +function estTokens(s: string | undefined): number { + return s ? Math.ceil(s.length / CHARS_PER_TOKEN) : 0; +} + +function applyTokenBudget( + files: FetchedFile[], + docs: DocsCollection, +): { files: FetchedFile[]; docs: DocsCollection } { + let total = 0; + for (const f of files) total += estTokens(f.content); + total += estTokens(docs.boochat) + estTokens(docs.agents) + estTokens(docs.context) + estTokens(docs.roadmap); + if (total <= TOKEN_BUDGET) return { files, docs }; + + // Drop priority (lowest priority dropped first): + // 1. top-2..N files (keep top-1) + // 2. top-1 file + // 3. roadmap (+ CONTEXT.md grouped here — dispatch listed roadmap above + // AGENTS.md, CONTEXT.md was not in the priority list) + // 4. AGENTS.md + // 5. 
BOOCHAT.md (never dropped — truncate to budget if alone exceeds) + let outFiles = files.slice(); + const outDocs: DocsCollection = { ...docs }; + + while (total > TOKEN_BUDGET && outFiles.length > 1) { + const last = outFiles.pop()!; + total -= estTokens(last.content); + } + if (total <= TOKEN_BUDGET) return { files: outFiles, docs: outDocs }; + + if (outFiles[0]) { + total -= estTokens(outFiles[0].content); + outFiles = []; + } + if (total <= TOKEN_BUDGET) return { files: outFiles, docs: outDocs }; + + if (outDocs.roadmap) { + total -= estTokens(outDocs.roadmap); + delete outDocs.roadmap; + } + if (outDocs.context) { + total -= estTokens(outDocs.context); + delete outDocs.context; + } + if (total <= TOKEN_BUDGET) return { files: outFiles, docs: outDocs }; + + if (outDocs.agents) { + total -= estTokens(outDocs.agents); + delete outDocs.agents; + } + if (total <= TOKEN_BUDGET) return { files: outFiles, docs: outDocs }; + + if (outDocs.boochat) { + const maxChars = TOKEN_BUDGET * CHARS_PER_TOKEN; + if (outDocs.boochat.length > maxChars) { + outDocs.boochat = outDocs.boochat.slice(0, maxChars); + } + } + return { files: outFiles, docs: outDocs }; +} + +function buildPayload( + toolName: string, + toolResultText: string, + files: FetchedFile[], + docs: DocsCollection, + userMessage: string, +): OpenAiMessage[] { + const sections: string[] = []; + sections.push(`## Codecontext tool output (${toolName})\n\n${toolResultText}`); + if (files.length > 0) { + sections.push(`---\n\n## Auto-fetched source files`); + for (const f of files) { + sections.push(`### ${f.path}\n\n\`\`\`\n${f.content}\n\`\`\``); + } + } + const docEntries: Array<[string, string | undefined]> = [ + ['BOOCHAT.md', docs.boochat], + ['AGENTS.md', docs.agents], + ['CONTEXT.md', docs.context], + ['roadmap', docs.roadmap], + ]; + const presentDocs = docEntries.filter(([, v]) => Boolean(v)); + if (presentDocs.length > 0) { + sections.push(`---\n\n## Project documentation`); + for (const [name, v] of 
presentDocs) { + sections.push(`### ${name}\n\n${v!}`); + } + } + sections.push(`---\n\n## Original user question\n\n${userMessage}`); + return [ + { role: 'system', content: SYNTHESIS_SYSTEM_PROMPT }, + { role: 'user', content: sections.join('\n\n') }, + ]; +} diff --git a/apps/server/src/services/synthesisPrompt.ts b/apps/server/src/services/synthesisPrompt.ts new file mode 100644 index 0000000..af426e8 --- /dev/null +++ b/apps/server/src/services/synthesisPrompt.ts @@ -0,0 +1,20 @@ +// v1.13.13: synthesis pipeline system prompt. Verbatim from the v1.13.13 +// dispatch — do not paraphrase. The synthesis pass loads this as its sole +// system message, followed by a user message that concatenates the +// codecontext tool result, auto-fetched top files, auto-fetched project +// docs, and the original user message. +export const SYNTHESIS_SYSTEM_PROMPT = `You are synthesizing structural data into an accurate, detailed answer about the user's codebase. + +Inputs you have been given: +1. The output of a codecontext analysis tool (raw structural data — file counts, symbols, dependencies, frameworks). +2. The contents of the top files referenced in that output. +3. Any project documentation found in the repo root (BOOCHAT.md, AGENTS.md, roadmap docs, CONTEXT.md). + +Rules: +- Cite specific files and line numbers when making claims about code. +- If project docs contradict the code, docs win for questions about state, version, status, or roadmap. Code wins for questions about runtime behavior or implementation. +- If the codecontext output looks sparse (low symbol count for a TypeScript project, missing dependency edges, empty framework list), explicitly say so — codecontext falls back to the JavaScript grammar for TypeScript and loses interfaces, generics, decorators, and type aliases. +- Do not invent symbols, files, or relationships that are not present in the inputs. +- Do not respond with a generic "this looks like a [framework] project" summary. 
The user has the framework analysis already. Add specifics: what is actually in this codebase, what is shipped, what is planned, what is load-bearing. +- Length: match the depth the user asked for. Overview questions get structured multi-section answers. Specific questions get focused answers. +`; diff --git a/boocode_roadmap.md b/boocode_roadmap.md index 752fbb4..21c2b63 100644 --- a/boocode_roadmap.md +++ b/boocode_roadmap.md @@ -72,6 +72,29 @@ External code lifted from / referenced in: see `boocode_code_review.md` for full ----- +### Shipped (v1.13.x — written 2026-05-22, retagged same day) + +All v1.13.x batches were retagged to the `vMAJOR.MINOR.PATCH-slug` scheme on 2026-05-22. `CHANGELOG.md` is the canonical per-tag record (slug describes what shipped; tag name alone recalls the batch). Tip is `v1.13.14-skills-audit` (`0fa46cd`); the next batch is `v1.13.15-codecontext-synth` (this batch, tag pending). Tags in chronological order: + +- `v1.13.0-ai-sdk-v6` — AI SDK v6 migration; `streamCompletion` adapter; `messages_with_parts` view; reasoning_parts end-to-end +- `v1.13.1-cleanup-bundle` — `statement_timeout='30s'`, alpha-sorted tool registry, 60s stuck-row sweeper, `experimental_repairToolCall` pass-through +- `v1.13.2-compaction-prune` — two-tier prune; `message_parts.hidden_at` column + partial index; `messages_with_parts` view CASE refinement +- `v1.13.3-truncate` — opencode `truncate.ts` port; opaque `tr_<…>` id, `view_truncated_output(id)` tool, tmpfs storage +- `v1.13.4-reasoning-fix` — `` prose-prefix in compaction head-assembly for tool-bearing turns +- `v1.13.5-stability-bundle` — `includeUsage: true` on provider, `hasText` trim guard, `BUDGET_NO_AGENT` 15→30, trailing-empty-assistant filter +- `v1.13.6-prefix-stability` — `buildSystemPromptWithFingerprint` SHA-256 + per-session drift observer +- `v1.13.7-compaction-trigger` — overflow trigger lowered to `floor(0.85 × ctx_max)` +- `v1.13.8-tool-cost` — `tool_cost_stats` SQL view + per-tool rolling 
100-call mean in AgentPicker +- `v1.13.9-agentlint` — instruction-file AgentLint pass; identity-openers removed; `CLAUDE.local.md` to .gitignore +- `v1.13.10-openspec` — `openspec/changes//{proposal,tasks,design}.md` shape; archived batch docs preserved via `git mv` +- `v1.13.11-tools` — tiered tool loading via `BOOCODE_TOOLS` env (`core | standard | all`) +- `v1.13.12-ws-schemas` — Zod schemas for all 27 wire-format frames; `publishFrame` / `publishUserFrame` wrappers; parity test +- `v1.13.13-ws-publish` — all ~80 publish sites converted to the typed wrappers; every WS frame now Zod-validated at boundary +- `v1.13.14-skills-audit` — 26 skills vendored + audited via 5 parallel agent teams; 14 kept, 11 dropped, 1 migrated to BOOCHAT.md/BOOCODER.md +- `v1.13.15-codecontext-synth` — **this batch, tag pending** — forced second-inference synthesis pass for codecontext overview tools + +The remaining strangler-fig final step (drop `messages.tool_calls` + `tool_results` columns) is still pending under its old `v1.13.2` working name; will get a new tag slug when scoped. + ## In flight / next (v1.13.x cleanup line) Five more single-dispatch batches before the strangler-fig closes. Each ships independently with its own smoke and rollback surface. **Do not fold.** Order is locked: @@ -462,17 +485,23 @@ term.indifferentketchup.com → booterm :9501 (or routed under code. 
- **v1.11.7:** none (pathGuard logic, no DB) - **v1.12.0:** none (codecontext stateless; truncation in-memory id-map with TTL cleanup) - **v1.12.1:** `sessions.workspace_panes jsonb` (workspace sync); drop deprecated `session_panes` table; drop stale `messages_status_check` constraint -- **v1.13.0:** `message_parts (id, message_id, sequence, kind, payload jsonb, created_at)` + unique `(message_id, sequence)` + `kind` CHECK; `ToolDef.category` field (TS type, not DB) -- **v1.13.1-B:** `messages_with_parts` view with COALESCE fallbacks -- **v1.13.3:** `ALTER DATABASE boocode SET statement_timeout = '30s'` (op step, documented in schema.sql; doesn't survive volume reset) -- **v1.13.4:** `message_parts.hidden_at TIMESTAMPTZ` column + partial index `(message_id) WHERE hidden_at IS NULL`; `messages_with_parts` view filters hidden parts -- **v1.13.5:** none (tmpfs id-map stored on disk under `BOOCODE_TRUNCATION_DIR`; no schema) -- **v1.13.6:** none (compaction read-side change; `CompactionMessage` extended in TS, not DB) -- **v1.13.7:** none (provider config + 4 frontend/payload guards + budget constant, no schema change) -- **v1.13.8 (planned):** none — verify-and-measure batch, instrumentation only; drops the originally-planned `system_prompt_cache` table since recon proved input-layer mtime caches already achieve prefix stability -- **v1.13.9 (planned):** none (compaction overflow trigger is a constant change in `services/compaction.ts`, no DB) -- **v1.13.10 (planned):** `tool_cost_stats (tool_name, prompt_tokens_sum, completion_tokens_sum, n_calls, updated_at)` — rolling 100-call window -- **v1.13.2 (planned):** drop `messages.tool_calls`, `messages.tool_results`; simplify `messages_with_parts` view +- **v1.13.0-ai-sdk-v6:** `message_parts (id, message_id, sequence, kind, payload jsonb, created_at)` + unique `(message_id, sequence)` + `kind` CHECK; `messages_with_parts` view with COALESCE fallbacks; `ToolDef.category` field (TS type, not DB) +- 
**v1.13.1-cleanup-bundle:** `ALTER DATABASE boocode SET statement_timeout = '30s'` (op step, documented in schema.sql; doesn't survive volume reset) +- **v1.13.2-compaction-prune:** `message_parts.hidden_at TIMESTAMPTZ` column + partial index `(message_id) WHERE hidden_at IS NULL`; `messages_with_parts` view filters hidden parts +- **v1.13.3-truncate:** none (tmpfs id-map stored on disk under `BOOCODE_TRUNCATION_DIR`; no schema) +- **v1.13.4-reasoning-fix:** none (compaction read-side change; `CompactionMessage` extended in TS, not DB) +- **v1.13.5-stability-bundle:** none (provider config + 4 frontend/payload guards + budget constant, no schema change) +- **v1.13.6-prefix-stability:** none — verify-and-measure batch, instrumentation only; drops the originally-planned `system_prompt_cache` table since recon proved input-layer mtime caches already achieve prefix stability +- **v1.13.7-compaction-trigger:** none (compaction overflow trigger is a constant change in `services/compaction.ts`, no DB) +- **v1.13.8-tool-cost:** `tool_cost_stats` SQL view over `messages_with_parts` (no new table — view + LATERAL `jsonb_array_elements` on `tool_calls`); rolling 100-call window +- **v1.13.9-agentlint:** none (instruction-file audit + `.gitignore` add of `CLAUDE.local.md`, no DB) +- **v1.13.10-openspec:** none (docs reorganization, `git mv` only) +- **v1.13.11-tools:** none (env-var tier filter at request time, no DB) +- **v1.13.12-ws-schemas:** none (Zod schemas + wrappers in TS, no DB) +- **v1.13.13-ws-publish:** none (publish-site conversion + protocol-drift fix in `compaction.ts`, no DB) +- **v1.13.14-skills-audit:** none (skills + AGENTS.md migration into git via `.gitignore` negation patterns; no DB) +- **v1.13.15-codecontext-synth (this batch, tag pending):** `message_parts.kind` CHECK constraint extended with `'synthesis'` value (DROP + DO $$ pg_constraint idempotency-guarded re-add) +- **(column drop, pending — old working name v1.13.2):** drop `messages.tool_calls`, 
`messages.tool_results`; simplify `messages_with_parts` view - **v1.14:** `agents.steps` column (or AGENTS.md parser extension; no DB if file-only) - **v1.14.x-mcp (NEW):** none — single-server MCP-client PoC is config-only at first, no schema change - **v1.14.x-html (NEW):** `message_parts.kind` CHECK constraint extended with `'html_artifact'` value @@ -582,7 +611,7 @@ Earlier May 18 chat recommended Option A (thin orchestration shell over OpenCode ### v1.13.x cleanup line locked (2026-05-22) -After v1.13.1-C shipped clean, the cleanup order is **v1.13.3 ✅ → v1.13.4 ✅ → v1.13.5 ✅ → v1.13.6 ✅ → v1.13.7 ✅ → v1.13.8 (verify) → v1.13.9 (overflow) → v1.13.10 → v1.13.11 → v1.13.12 → v1.13.2** (column drop last as rollback insurance). **Do not fold.** Smoke isolation matters: each batch has a distinct rollback surface, and bisecting a 750-LoC merge across four unrelated changes is worse than four separate dispatches. +After the 2026-05-22 retag, the v1.13.x cleanup line in `vMAJOR.MINOR.PATCH-slug` form is **v1.13.0-ai-sdk-v6 ✅ → v1.13.1-cleanup-bundle ✅ → v1.13.2-compaction-prune ✅ → v1.13.3-truncate ✅ → v1.13.4-reasoning-fix ✅ → v1.13.5-stability-bundle ✅ → v1.13.6-prefix-stability ✅ → v1.13.7-compaction-trigger ✅ → v1.13.8-tool-cost ✅ → v1.13.9-agentlint ✅ → v1.13.10-openspec ✅ → v1.13.11-tools ✅ → v1.13.12-ws-schemas ✅ → v1.13.13-ws-publish ✅ → v1.13.14-skills-audit ✅ → v1.13.15-codecontext-synth (this batch, tag pending) → column drop (final, pending — old working name v1.13.2)**. **Do not fold.** Smoke isolation matters: each batch has a distinct rollback surface, and bisecting a 750-LoC merge across four unrelated changes is worse than four separate dispatches. 
### v1.13 retrospective (what shipped) diff --git a/openspec/changes/v1.13.15-codecontext-synth/proposal.md b/openspec/changes/v1.13.15-codecontext-synth/proposal.md new file mode 100644 index 0000000..9adb2c1 --- /dev/null +++ b/openspec/changes/v1.13.15-codecontext-synth/proposal.md @@ -0,0 +1,145 @@ +# v1.13.13 — codecontext synthesis pipeline + +Slots between v1.13.12 (skills audit) and v1.14 (Phase C outer agent loop). Adds a forced second-inference synthesis pass for codecontext overview/analysis tools so the model stops returning shallow first-touch summaries. + +Does NOT change the recursion structure, depth cap, or budget — those are v1.14 concerns. The cap-50 patch from v1.13.12 stays; v1.14 supersedes it via per-agent `agent.steps`. + +## What ships + +- `apps/server/src/services/synthesisPrompt.ts` (NEW, 20 lines) — verbatim system prompt as a const. +- `apps/server/src/services/synthesisPipeline.ts` (NEW, ~490 lines) — `SYNTHESIS_TOOLS` set + `runSynthesisPass(params) → Promise<boolean>`. Auto-fetches top-N referenced files + project docs (BOOCHAT.md, AGENTS.md, *roadmap*.md, CONTEXT.md), applies a 32k-token budget with priority drop order, streams a synthesis turn via `streamCompletion`, dual-writes a `kind='synthesis'` part. +- `apps/server/src/services/inference/parts.ts` — `PartKind` union extended with `'synthesis'`. +- `apps/server/src/services/inference/tool-phase.ts` — synth-tool result capture during `Promise.all`; post-pause synth check before the recursive `runAssistantTurn`. +- `apps/server/src/schema.sql` — inline CHECK constraint updated + `DROP CONSTRAINT IF EXISTS` + `DO $$ pg_constraint` migration block. Idempotent (drops + re-adds on every startup; per-boot cost is trivial). + +SYNTHESIS_TOOLS = `{get_codebase_overview, get_framework_analysis, get_semantic_neighborhoods}`.
The other 5 codecontext tools (search_symbols, get_dependencies, get_file_analysis, get_symbol_info, watch_changes) return targeted data the model uses directly — no synthesis pass. + +## Decisions + +### Schema migration was required (dispatch was wrong) + +The original dispatch said "kind is text column, no schema migration needed." Reality: `schema.sql:54` has an explicit `message_parts_kind_chk` CHECK constraint enumerating allowed kinds (`'text', 'tool_call', 'tool_result', 'reasoning', 'step_start'`). Adding `'synthesis'` requires updating the constraint. + +Resolution: added a `DROP CONSTRAINT IF EXISTS` + `DO $$ ... pg_constraint` idempotency-guarded migration block in `schema.sql` matching the CLAUDE.md migration pattern, plus updated the inline CREATE TABLE constraint so fresh installs include the new value. + +### `view_file` input shape uses `start_line`/`end_line`, not `line_count` + +The dispatch's auto-fetch sketch implied a `line_count` parameter. The real `viewFile` tool's input schema (`tools.ts:51-55`) takes `start_line`/`end_line` (1-indexed inclusive) with a 200-line default if both are omitted. The pipeline uses `end_line: FILE_LINE_CAP` for files (200) and `end_line: DOC_LINE_CAP` for docs (500), which gives the first N lines — same effective truncation. + +### User-abort during synthesis marks the synth message failed (deviates from review req) + +**Decision: option A — mark synth message `status='failed'` on every catch path including user-abort, then re-throw on user-abort.** + +Sam's stated review requirement: "User-abort path does NOT mark the message failed (re-throw to outer handler is correct)." + +Why this deviation: the outer abort handler (`error-handler.ts:handleAbortOrError`) operates on `args.assistantMessageId` — the *parent* assistant message that triggered the tool call. It does not know about the *new* synth assistant message that `runSynthesisPass` created. 
If the synth row isn't explicitly marked failed on user-abort, it sits in `status='streaming'` until the 5-min stale-streaming sweeper (`apps/server/src/index.ts`) picks it up — meanwhile the frontend's 60s no-token-activity timer trips the stale-stream banner on the orphan. Same UX bug class the v1.13.3 stuck-row sweeper was added to handle. + +Cost: one extra DB write + one `message_complete` republish on the rare user-abort-during-synth path. Worth it to avoid the zombie message + ghost banner. + +**Note for v1.14 outer-loop port**: when Phase C migrates the depth cap into `agent.steps` and reworks the recursion, the synth message is a sibling to the parent assistant message — both belong to the same chat. The new outer loop should either (a) preserve this pattern (mark all chat-scoped streaming messages failed on abort) or (b) extend `handleAbortOrError` to sweep chat-scoped streaming rows. Option (b) is a wider blast radius and was rejected here; option (a) is one targeted call site. + +### Token budget priority list + +Drop order when the 32k cap is exceeded (lowest priority first): +1. top-2..N files (keep top-1) +2. top-1 file +3. `*roadmap*.md` + `CONTEXT.md` (mid-priority — both describe state/intent) +4. `AGENTS.md` +5. `BOOCHAT.md` — **never dropped**; truncated to 32k if it alone exceeds + +CONTEXT.md wasn't in the original dispatch's priority list; grouped with roadmap as mid-priority (same semantic — both are state/intent docs). + +### 90s timeout via `AbortSignal.any` + +Synthesis call has its own `AbortController` with a 90s `setTimeout`. Combined with `p.args.signal` (the user-abort signal) via `AbortSignal.any([user, synth])` — either fires correctly. Node 20.3+. A `timedOut` flag in scope disambiguates which signal tripped after `streamCompletion` throws (`AbortError`): timeout → return false (fall through to recursion); user-abort → re-throw (after `markSynthFailed`). 
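A minimal sketch of the signal combination and disambiguation described in the timeout section above, assuming Node 20.3+ for `AbortSignal.any`. `classifyAbort` and `Outcome` are hypothetical names introduced only for illustration; the patch inlines the equivalent logic in the `catch` block of `runSynthesisPass`. The 90s timer is simulated here by setting the flag and aborting directly.

```typescript
// Hypothetical helper names; not part of the patch.
type Outcome = 'fall-through' | 'rethrow';

// Decide what the catch block should do once the synth message is marked failed.
function classifyAbort(err: unknown, timedOut: boolean): Outcome {
  const isAbort = err instanceof Error && err.name === 'AbortError';
  // Only a user-initiated abort propagates to the outer handler.
  // A timeout or any other error yields to the recursive turn.
  if (isAbort && !timedOut) return 'rethrow';
  return 'fall-through';
}

// Two independent abort sources, combined into one signal for the stream call.
const user = new AbortController();
const synth = new AbortController();
const combined = AbortSignal.any([user.signal, synth.signal]);

// Stand-in for the timeout callback firing: set the flag first, then abort.
let timedOut = false;
timedOut = true;
synth.abort();

// Stand-in for the error a streaming call would throw on abort.
const abortErr = new Error('aborted');
abortErr.name = 'AbortError';

console.log(combined.aborted, user.signal.aborted, classifyAbort(abortErr, timedOut));
```

Setting the flag before calling `abort()` matters: the flag is the only way to tell the two sources apart after the stream call rejects, since both surface as the same `AbortError`.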
+ +### Race-safe synth-tool capture under `Promise.all` + +`synthEntries: Array<{tc, output, error?}>` populated by each parallel callback pushing its own result. After `Promise.all` resolves, `synthEntries.find((e) => !e.error && e.output != null)` picks the first non-error synth entry by call-order (i.e. by `toolCalls` array index in the original LLM emit order). Not result-quality scoring — explicitly call-order, documented inline. + +### Known interaction: qwen3.6 `include_stats: "True"` retry loop compounds synth-pass cost + +Smoke #1 surfaced a pre-existing qwen3.6 quirk: the model emits `"True"` (string) instead of `true` (bool) for boolean tool args. The `experimental_repairToolCall` + zod-reject retry path (v1.13.3) handles this — the model retries on the next turn with corrected args, then succeeds. + +**Synth pass cost interaction:** when the first tool-call fails zod validation, the recursive `runAssistantTurn` fires *before* the successful synth-tool call lands. The user effectively pays: (1) failed tool-call turn → (2) error tool-result → (3) retry tool-call turn → (4) successful tool-result → (5) synth pass. + +Per-fire cost for an overview question in this case: a 5-step sequence with 3 model calls (steps 1, 3, and 5; step 5 is the synth pass, which adds ~5k tokens of auto-fetched context). Not a blocker — the synth content is dramatically better than the without-synth case (4920 tokens of cited analysis vs. a 70-token tool-call-only turn). Worth tracking if usage stats start showing it. + +### v1.14 outer-loop port — preserve these patterns + +Two patterns from this batch the Phase C outer-loop port must preserve: + +1. **Chat-scoped abort cleanup**: the synth message is a sibling to the parent assistant message, both belong to the same chat. The new outer loop should either (a) keep `markSynthFailed` (or its equivalent) firing on every catch path including user-abort, or (b) extend `handleAbortOrError` to sweep all chat-scoped streaming rows.
This batch chose (a); (b) was rejected as having a wider blast radius. +2. **Race-safe `Promise.all` capture**: `synthEntries: Array<...>` instead of a single shared variable. Per-callback push avoids the last-write-wins race when a batch has multiple synth tools. + +## Test plan + +5-prompt smoke + 1 failure injection (6 runs total). Sequence: + +1. **Default agent** — "What's in this codebase?" → expect `get_codebase_overview` + synthesis pass, response cites BOOCHAT.md + actual files + roadmap state. +2. **Architect agent** — "Give me a system overview of how BooCode handles tool calls" → expect synthesis with refs to inference/turn.ts, tool-phase.ts, stream-phase.ts. +3. **Architect agent** — "What's the current state of v1.13?" → synthesis must read `boocode_roadmap.md` and report shipped vs planned correctly. Must NOT infer "v1.13.2 shipped" from code presence — roadmap explicitly defers it. +4. **Code Reviewer** — "Find all callers of buildSystemPrompt" → `search_symbols` fires, NO synthesis pass (not in SYNTHESIS_TOOLS). +5. **Debugger** — "Where is detectDoomLoop defined and called from?" → `search_symbols` + `get_dependencies`, NO synthesis pass. +6. **Failure injection** — temporarily make `streamCompletion` throw inside `runSynthesisPass`; verify fall-through to recursion + log entry visible + non-empty answer. + +## Backups in place + +``` +apps/server/src/schema.sql.bak-v1.13.13-20260522 +apps/server/src/services/inference/parts.ts.bak-v1.13.13-20260522 +apps/server/src/services/inference/tool-phase.ts.bak-v1.13.13-20260522 +``` + +To be deleted after merge. + +## Smoke results + +### Smoke #1 — default agent, "What is in this codebase?" + +Synthesis fired on `get_codebase_overview`. Log line: +``` +{"chatId":"7bb05e54-…","synthMessageId":"44480541-…","toolName":"get_codebase_overview","chars":6727,"files":5,"msg":"synthesis pass complete"} +``` + +Token accounting: synth turn = 4920 tokens (vs. 63 + 70 on the preceding tool-call-only turns).
+The model is using the auto-fetched context, not parroting codecontext output. The synth message has the expected `kind='synthesis'` part dual-write.
+
+Side note: qwen3.6 needed one retry due to the `include_stats: "True"` quirk (see Decisions). `repairToolCall` handled it; synth fired on the successful call.
+
+### Smoke #6: fault injection
+
+Env-gated throw inserted between the synth-message INSERT and the `streamCompletion` call. Container rebuilt with `V1_13_13_FAULT_INJECT=1`. Sent the same prompt to a new smoke chat.
+
+All 6 expected outcomes confirmed:
+
+| # | Outcome | Evidence |
+|---|---|---|
+| 1 | `runSynthesisPass` throws | log: `err: "Error: v1.13.13 smoke #6 fault injection"` |
+| 2 | Synth message marked `status='failed'` with empty content | msg `7ac9c685-…` role=assistant status=failed content_len=0 |
+| 3 | `message_complete` frame published for the synth message | implicit via `markSynthFailed`; frontend never tripped the 60s timer |
+| 4 | Fall-through to recursive `runAssistantTurn` | log: `synthesis pass failed; falling through to recursive turn` |
+| 5 | User sees normal (non-synthesized) assistant response | final msg `924076a3-…` 453 tokens: `"This is **boocode** — a self-hosted, single-user developer chat app."` |
+| 6 | Stale-stream banner does NOT fire on failed synth | confirmed: terminal `status='failed'` is what `applyFrame` writes |
+
+Fault injection reverted post-test:
+- `grep FAULT_INJECT apps/server/src/services/synthesisPipeline.ts docker-compose.yml` → empty
+- `grep FAULT_INJECT apps/server/dist/services/synthesisPipeline.js` → empty
+- `docker compose exec boocode printenv V1_13_13_FAULT_INJECT` → exit 1 (unset)
+- Boot log clean, `skills loaded: 14`
+
+### Smokes #2–#5
+
+Sam is doing the qualitative reads from the UI in parallel; those verifications cover synthesis content quality (cites correct files, reads the roadmap accurately, no synthesis pass on `search_symbols`).
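
For illustration only, here is a minimal TypeScript sketch of the fall-through behaviour that Smoke #6 verifies. The `SynthDeps` interface, `synthOrFallThrough`, `insertSynthMessage`, and all callback signatures are hypothetical stand-ins. Only the names `streamCompletion`, `markSynthFailed`, and `runAssistantTurn` come from this document, and their real signatures are not shown here. The sketch assumes that any throw after the synth-message INSERT marks that row failed before the recursive turn runs. It does not model user-abort, which may need to skip the recursive turn.

```typescript
// Hypothetical dependency bundle; the real BooCode signatures are not shown in this document.
interface SynthDeps {
  insertSynthMessage(): Promise<string>; // assumed to return the new synth message id
  streamCompletion(synthMessageId: string): Promise<string>;
  markSynthFailed(synthMessageId: string): Promise<void>; // assumed: sets status='failed' and publishes message_complete
  runAssistantTurn(): Promise<string>; // the recursive (non-synthesized) turn
}

type TurnResult = { path: "synth" | "recursive"; text: string };

async function synthOrFallThrough(deps: SynthDeps): Promise<TurnResult> {
  let synthMessageId: string | undefined;
  try {
    synthMessageId = await deps.insertSynthMessage();
    const text = await deps.streamCompletion(synthMessageId);
    return { path: "synth", text };
  } catch {
    // Only mark a row failed if the INSERT actually created one,
    // so no row is left sitting in status='streaming'.
    if (synthMessageId !== undefined) {
      await deps.markSynthFailed(synthMessageId);
    }
    // Yield to the normal recursive turn so the user still gets an answer.
    const text = await deps.runAssistantTurn();
    return { path: "recursive", text };
  }
}
```

With a `streamCompletion` stub that throws, the sketch calls `markSynthFailed` once and then returns the recursive turn's text, which mirrors outcomes 2, 4, and 5 in the Smoke #6 table.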
+
+## Done when
+
+- ✅ `synthesisPrompt.ts` + `synthesisPipeline.ts` created
+- ✅ `parts.ts` PartKind union extended
+- ✅ `tool-phase.ts` insertion point edited
+- ✅ Schema migration block added (deviation from dispatch acknowledged)
+- ✅ Type-clean (`pnpm -C apps/server build`)
+- ✅ Container rebuilt + migration confirmed via pg_constraint and logs
+- ✅ Smoke #1 (positive synth path) verified
+- ✅ Smoke #6 (fault injection + fall-through) verified, injection reverted
+- ⏳ Smokes #2–#5 (Sam's UI reads)
+- ⏳ Sam commit