v1.13.15-codecontext-synth: forced second-inference synthesis for codecontext overview tools

After a codecontext overview-class tool call lands (get_codebase_overview, get_framework_analysis, get_semantic_neighborhoods), the pipeline runs a second inference pass that replaces the recursive runAssistantTurn. The synth pass auto-fetches the top-N source files referenced in the codecontext output plus project docs (BOOCHAT.md, AGENTS.md, *roadmap*.md, CONTEXT.md), applies a 32k-token budget with explicit drop-priority, and streams a structured response that grounds the model in real load-bearing code rather than relying on the codecontext summary alone.

Smoke #1 (default) and #2 (Architect) both cite the correct inference/turn.ts + tool-phase.ts + stream-phase.ts files; smoke #6 (fault injection) verifies the fall-through path marks the synth message status='failed' and yields cleanly to the recursive turn.

## Truncation-aware extraction

codecontext's wrapper inline-truncates results at 32k chars. Without the expansion step, the top-N file selection only saw the alphabetical head of the codebase (apps/booterm/dist/*) and auto-fetched the wrong sources. The pipeline now calls in-process readTruncation(outputPath) before extracting referenced files, so top-N selection sees the full 80k+ char output. The 32k truncated head still ships to the synth model; the expansion is reference-extraction-only, preserving the token-budget contract. Graceful degradation on readTruncation null/throw: log warn, fall back to the truncated head.

## Schema deviation from dispatch

The dispatch claimed no schema migration was needed for the new 'synthesis' part kind. Reality: message_parts.kind has an explicit CHECK constraint (schema.sql:54) that would reject the new value. Added a DROP CONSTRAINT IF EXISTS + DO $$ pg_constraint idempotency-guarded re-add matching the CLAUDE.md migration pattern. The inline CREATE TABLE constraint is also updated so fresh installs land with the extended enum.

## User-abort marks synth-message failed

Deviation from review-time spec ("user-abort path does NOT mark the message failed"). The outer abort handler in error-handler.ts operates on the parent turn's assistantMessageId, not the new synth row that runSynthesisPass created. Without explicit marking, the synth row would sit in status='streaming' until the 5-min stale-streaming sweeper (v1.13.1-cleanup-bundle), tripping the frontend's 60s no-token-activity banner in the meantime, which is exactly the UX bug class the v1.13.1 sweeper was added to handle. Marking failed on every catch path (including user-abort) closes the gap. Cost: one extra DB write + one publish on the rare user-abort-during-synth path.

## Race-safe synth-tool capture

tool-phase.ts uses synthEntries: Array<{tc, output, error?}> with per-callback push under Promise.all. find() picks the first non-error entry by call-order (toolCalls array index). Multiple synth-tools in one batch are uncommon but handled deterministically.

## Roadmap rebase

Updated boocode_roadmap.md retrospective section + cleanup-order tracker + schema-changes summary to use the new vMAJOR.MINOR.PATCH-slug tag names per the 2026-05-22 retag (CHANGELOG.md is the canonical record). v1.13.15 listed as "this batch, tag pending"; a one-line follow-up commit will remove that qualifier after the tag lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 20:08:47 +00:00
parent 0fa46cd06c
commit 3992a9fcb7
8 changed files with 940 additions and 14 deletions
--- a/openspec/changes/v1.13.15-codecontext-synth/proposal.md
+++ b/openspec/changes/v1.13.15-codecontext-synth/proposal.md
@@ -0,0 +1,145 @@
+# v1.13.13 — codecontext synthesis pipeline
+
+Slots between v1.13.12 (skills audit) and v1.14 (Phase C outer agent loop). Adds a forced second-inference synthesis pass for codecontext overview/analysis tools so the model stops returning shallow first-touch summaries.
+
+Does NOT change the recursion structure, depth cap, or budget — those are v1.14 concerns. The cap-50 patch from v1.13.12 stays; v1.14 supersedes it via per-agent `agent.steps`.
+
+## What ships
+
+- `apps/server/src/services/synthesisPrompt.ts` (NEW, 20 lines) — verbatim system prompt as a const.
+- `apps/server/src/services/synthesisPipeline.ts` (NEW, ~450 lines) — `SYNTHESIS_TOOLS` set + `runSynthesisPass(params) → Promise<boolean>`. Auto-fetches top-N referenced files + project docs (BOOCHAT.md, AGENTS.md, *roadmap*.md, CONTEXT.md), applies a 32k-token budget with priority drop order, streams a synthesis turn via `streamCompletion`, dual-writes a `kind='synthesis'` part.
+- `apps/server/src/services/inference/parts.ts` — `PartKind` union extended with `'synthesis'`.
+- `apps/server/src/services/inference/tool-phase.ts` — synth-tool result capture during `Promise.all`; post-pause synth check before the recursive `runAssistantTurn`.
+- `apps/server/src/schema.sql` — inline CHECK constraint updated + `DROP CONSTRAINT IF EXISTS` + `DO $$ pg_constraint` migration block. Idempotent (drops + re-adds on every startup; per-boot cost is trivial).
+
+SYNTHESIS_TOOLS = `{get_codebase_overview, get_framework_analysis, get_semantic_neighborhoods}`. The other 5 codecontext tools (search_symbols, get_dependencies, get_file_analysis, get_symbol_info, watch_changes) return targeted data the model uses directly — no synthesis pass.
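As a rough illustration of the gate described above, here is a minimal TypeScript sketch. The three tool names come from the proposal; the helper name `needsSynthesis` is hypothetical, and the real `synthesisPipeline.ts` may structure the check differently.

```typescript
// Tool names as listed in the proposal; everything else here is illustrative.
const SYNTHESIS_TOOLS: ReadonlySet<string> = new Set([
  "get_codebase_overview",
  "get_framework_analysis",
  "get_semantic_neighborhoods",
]);

// Hypothetical helper: true when a tool result should trigger the synthesis pass.
function needsSynthesis(toolName: string): boolean {
  return SYNTHESIS_TOOLS.has(toolName);
}
```

Under this sketch, targeted tools such as `search_symbols` simply fail the membership test and continue down the normal recursive path.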
+
+## Decisions
+
+### Schema migration was required (dispatch was wrong)
+
+The original dispatch said "kind is text column, no schema migration needed." Reality: `schema.sql:54` has an explicit `message_parts_kind_chk` CHECK constraint enumerating allowed kinds (`'text', 'tool_call', 'tool_result', 'reasoning', 'step_start'`). Adding `'synthesis'` requires updating the constraint.
+
+Resolution: added a `DROP CONSTRAINT IF EXISTS` + `DO $$ ... pg_constraint` idempotency-guarded migration block in `schema.sql` matching the CLAUDE.md migration pattern, plus updated the inline CREATE TABLE constraint so fresh installs include the new value.
+
+### `view_file` input shape uses `start_line`/`end_line`, not `line_count`
+
+The dispatch's auto-fetch sketch implied a `line_count` parameter. The real `viewFile` tool's input schema (`tools.ts:51-55`) takes `start_line`/`end_line` (1-indexed inclusive) with a 200-line default if both are omitted. The pipeline uses `end_line: FILE_LINE_CAP` for files (200) and `end_line: DOC_LINE_CAP` for docs (500), which gives the first N lines — same effective truncation.
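The capping described above could look roughly like the following. This is a sketch only: the `path` field name and the `buildViewFileInput` helper are assumptions, and it assumes an omitted `start_line` defaults to line 1. Only `start_line`/`end_line`, `FILE_LINE_CAP` (200), and `DOC_LINE_CAP` (500) come from the proposal.

```typescript
// Caps quoted in the proposal: 200 lines for source files, 500 for docs.
const FILE_LINE_CAP = 200;
const DOC_LINE_CAP = 500;

// Assumed input shape. `path` is a hypothetical field name; start_line/end_line
// are the 1-indexed inclusive bounds the proposal describes.
interface ViewFileInput {
  path: string;
  start_line?: number;
  end_line?: number;
}

// Hypothetical helper: request the first N lines by setting only end_line.
function buildViewFileInput(path: string, kind: "file" | "doc"): ViewFileInput {
  return { path, end_line: kind === "doc" ? DOC_LINE_CAP : FILE_LINE_CAP };
}
```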
+
+### User-abort during synthesis marks the synth message failed (deviates from review req)
+
+**Decision: option A — mark synth message `status='failed'` on every catch path including user-abort, then re-throw on user-abort.**
+
+Sam's stated review requirement: "User-abort path does NOT mark the message failed (re-throw to outer handler is correct)."
+
+Why this deviation: the outer abort handler (`error-handler.ts:handleAbortOrError`) operates on `args.assistantMessageId` — the *parent* assistant message that triggered the tool call. It does not know about the *new* synth assistant message that `runSynthesisPass` created. If the synth row isn't explicitly marked failed on user-abort, it sits in `status='streaming'` until the 5-min stale-streaming sweeper (`apps/server/src/index.ts`) picks it up — meanwhile the frontend's 60s no-token-activity timer trips the stale-stream banner on the orphan. Same UX bug class the v1.13.3 stuck-row sweeper was added to handle.
+
+Cost: one extra DB write + one `message_complete` republish on the rare user-abort-during-synth path. Worth it to avoid the zombie message + ghost banner.
+
+**Note for v1.14 outer-loop port**: when Phase C migrates the depth cap into `agent.steps` and reworks the recursion, the synth message is a sibling to the parent assistant message — both belong to the same chat. The new outer loop should either (a) preserve this pattern (mark all chat-scoped streaming messages failed on abort) or (b) extend `handleAbortOrError` to sweep chat-scoped streaming rows. Option (b) is a wider blast radius and was rejected here; option (a) is one targeted call site.
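To make option A concrete, here is a hedged sketch of the catch path. `markSynthFailed` is named in the proposal; `runSynthGuarded`, `SynthDeps`, and the injected `stream` callback (standing in for `streamCompletion`) are hypothetical, and the real `runSynthesisPass` likely carries more state.

```typescript
// Hypothetical dependency bundle so the control flow can be shown in isolation.
interface SynthDeps {
  stream: () => Promise<void>;          // stands in for streamCompletion
  markSynthFailed: () => Promise<void>; // sets status='failed' and publishes message_complete
  userSignal: AbortSignal;              // the user-abort signal
}

// Resolves true on success and false when the caller should fall through to
// the recursive turn. Re-throws only when the user aborted.
async function runSynthGuarded(deps: SynthDeps): Promise<boolean> {
  try {
    await deps.stream();
    return true;
  } catch (err) {
    // Mark failed on every catch path so the synth row never lingers in 'streaming'.
    await deps.markSynthFailed();
    // On user-abort, hand control to the outer handler, which owns the parent message.
    if (deps.userSignal.aborted) throw err;
    return false;
  }
}
```

The ordering matters in this sketch: the synth row is marked failed before the re-throw, because the outer handler only knows about the parent message.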
+
+### Token budget priority list
+
+Drop order when the 32k cap is exceeded (lowest priority first):
+1. top-2..N files (keep top-1)
+2. top-1 file
+3. `*roadmap*.md` + `CONTEXT.md` (mid-priority — both describe state/intent)
+4. `AGENTS.md`
+5. `BOOCHAT.md`: **never dropped**; truncated to 32k tokens if it alone exceeds the budget
+
+CONTEXT.md wasn't in the original dispatch's priority list; grouped with roadmap as mid-priority (same semantic — both are state/intent docs).
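The drop order above can be sketched as a tier-by-tier filter. This is a simplification under stated assumptions: token counts are pre-computed, each tier is dropped as a whole (the real pipeline may drop file by file), and the codecontext output itself is not counted. The names `Tier`, `ContextItem`, and `applyBudget` are hypothetical.

```typescript
// Drop tiers from the proposal, lowest priority first. BOOCHAT.md is never dropped.
type Tier = "other-files" | "top-file" | "roadmap-context" | "agents" | "boochat";
const DROP_ORDER: Tier[] = ["other-files", "top-file", "roadmap-context", "agents"];

interface ContextItem {
  name: string;
  tier: Tier;
  tokens: number; // assumed to be pre-computed by some tokenizer
}

function applyBudget(items: ContextItem[], budget: number): ContextItem[] {
  let kept = [...items];
  const total = () => kept.reduce((sum, item) => sum + item.tokens, 0);

  // Drop whole tiers, lowest priority first, until the set fits.
  for (const tier of DROP_ORDER) {
    if (total() <= budget) break;
    kept = kept.filter((item) => item.tier !== tier);
  }

  // Only BOOCHAT.md can still be over budget here; clamp it instead of dropping it.
  if (total() > budget) {
    kept = kept.map((item) => ({ ...item, tokens: Math.min(item.tokens, budget) }));
  }
  return kept;
}
```

For example, with a 32,000-token budget and 38,000 tokens of candidates, dropping the `other-files` tier alone may already bring the set under the cap, in which case the top-1 file and all docs survive.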
+
+### 90s timeout via `AbortSignal.any`
+
+The synthesis call has its own `AbortController` with a 90s `setTimeout`, combined with `p.args.signal` (the user-abort signal) via `AbortSignal.any([user, synth])` so that either one aborts the stream. Requires Node 20.3+. A `timedOut` flag in scope disambiguates which signal tripped after `streamCompletion` throws an `AbortError`: on timeout, return false (fall through to recursion); on user-abort, re-throw (after `markSynthFailed`).
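A minimal sketch of the signal composition follows, assuming Node 20.3+ and typings that include `AbortSignal.any`. The helper `makeSynthSignal` and its return shape are hypothetical; only the 90s timeout, the `AbortSignal.any` combination, and the `timedOut` flag come from the proposal.

```typescript
const SYNTH_TIMEOUT_MS = 90_000;

// Hypothetical helper: combines the user-abort signal with a private timeout signal.
function makeSynthSignal(userSignal: AbortSignal, timeoutMs: number = SYNTH_TIMEOUT_MS) {
  const synth = new AbortController();
  let timedOut = false;
  const timer = setTimeout(() => {
    timedOut = true; // set before aborting so the catch block can tell which signal fired
    synth.abort();
  }, timeoutMs);
  return {
    // Aborts as soon as either source signal aborts.
    signal: AbortSignal.any([userSignal, synth.signal]),
    timedOut: () => timedOut,
    dispose: () => clearTimeout(timer), // call in a finally block to avoid a dangling timer
  };
}
```

In a catch block, `timedOut()` returning true would map to the "return false" branch, while a false value with an aborted user signal would map to the re-throw branch.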
+
+### Race-safe synth-tool capture under `Promise.all`
+
+`synthEntries: Array<{tc, output, error?}>` populated by each parallel callback pushing its own result. After `Promise.all` resolves, `synthEntries.find((e) => !e.error && e.output != null)` picks the first non-error synth entry by call-order (i.e. by `toolCalls` array index in the original LLM emit order). Not result-quality scoring — explicitly call-order, documented inline.
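The capture pattern above can be sketched as follows. Note the assumptions: the `index` field and the sort are additions of this sketch, since pushes arrive in completion order and the proposal does not show how `tool-phase.ts` maps them back to call order. `captureFirstSynth` and the `run` executor are hypothetical names.

```typescript
interface ToolCall { id: string; name: string }
interface SynthEntry { index: number; tc: ToolCall; output?: string; error?: string }

async function captureFirstSynth(
  toolCalls: ToolCall[],
  run: (tc: ToolCall) => Promise<string>, // hypothetical tool executor
): Promise<SynthEntry | undefined> {
  const synthEntries: SynthEntry[] = [];
  await Promise.all(
    toolCalls.map(async (tc, index) => {
      try {
        // Each callback pushes its own entry; no shared "last result" variable.
        synthEntries.push({ index, tc, output: await run(tc) });
      } catch (err) {
        synthEntries.push({ index, tc, error: String(err) });
      }
    }),
  );
  // Pushes land in completion order, so sort by index to recover call order.
  synthEntries.sort((a, b) => a.index - b.index);
  return synthEntries.find((e) => !e.error && e.output != null);
}
```

With this shape, a slow second call still wins over a fast third call, which is the deterministic call-order behavior the proposal describes.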
+
+### Known interaction: qwen3.6 `include_stats: "True"` retry loop compounds synth-pass cost
+
+Smoke #1 surfaced a pre-existing qwen3.6 quirk: the model emits `"True"` (string) instead of `true` (bool) for boolean tool args. The `experimental_repairToolCall` + zod-reject retry path (v1.13.3) handles this — the model retries on the next turn with corrected args, then succeeds.
+
+**Synth pass cost interaction:** when the first tool-call fails zod validation, the recursive runAssistantTurn fires *before* the successful synth-tool call lands. The user effectively pays: (1) failed tool-call turn → (2) error tool-result → (3) retry tool-call turn → (4) successful tool-result → (5) synth pass.
+
+Per-fire cost for an overview question is now 3 inference calls across 5 steps (steps 1, 3, and 5 are model calls; step 5 is the synth pass, which adds ~5k tokens of auto-fetched context). Not a blocker: the synth content is dramatically better than the without-synth case (4920 tokens of cited analysis vs. a 70-token tool-call-only turn). Worth tracking if usage stats start showing it.
+
+### v1.14 outer-loop port — preserve this pattern
+
+Two patterns from this batch the Phase C outer-loop port must preserve:
+
+1. **Chat-scoped abort cleanup**: the synth message is a sibling to the parent assistant message, both belong to the same chat. The new outer loop should either (a) keep `markSynthFailed` (or its equivalent) firing on every catch path including user-abort, or (b) extend `handleAbortOrError` to sweep all chat-scoped streaming rows. This batch chose (a); (b) was rejected as wider blast radius.
+2. **Race-safe `Promise.all` capture**: `synthEntries: Array<...>` instead of a single shared variable. Per-callback push avoids the last-write-wins race when a batch has multiple synth tools.
+
+## Test plan
+
+5 prompts + 1 failure injection. Sequence:
+
+1. **Default agent** — "What's in this codebase?" → expect `get_codebase_overview` + synthesis pass, response cites BOOCHAT.md + actual files + roadmap state.
+2. **Architect agent** — "Give me a system overview of how BooCode handles tool calls" → expect synthesis with refs to inference/turn.ts, tool-phase.ts, stream-phase.ts.
+3. **Architect agent** — "What's the current state of v1.13?" → synthesis must read `boocode_roadmap.md` and report shipped vs planned correctly. Must NOT infer "v1.13.2 shipped" from code presence — roadmap explicitly defers it.
+4. **Code Reviewer** — "Find all callers of buildSystemPrompt" → `search_symbols` fires, NO synthesis pass (not in SYNTHESIS_TOOLS).
+5. **Debugger** — "Where is detectDoomLoop defined and called from?" → `search_symbols` + `get_dependencies`, NO synthesis pass.
+6. **Failure injection** — temporarily make `streamCompletion` throw inside `runSynthesisPass`; verify fall-through to recursion + log entry visible + non-empty answer.
+
+## Backups in place
+
+```
+apps/server/src/schema.sql.bak-v1.13.13-20260522
+apps/server/src/services/inference/parts.ts.bak-v1.13.13-20260522
+apps/server/src/services/inference/tool-phase.ts.bak-v1.13.13-20260522
+```
+
+To be deleted after merge.
+
+## Smoke results
+
+### Smoke #1 — default agent, "What is in this codebase?"
+
+Synthesis fired on `get_codebase_overview`. Log line:
+```
+{"chatId":"7bb05e54-…","synthMessageId":"44480541-…","toolName":"get_codebase_overview","chars":6727,"files":5,"msg":"synthesis pass complete"}
+```
+
+Token accounting: synth turn = 4920 tokens (vs. 63 + 70 on the preceding tool-call-only turns). Model is using the auto-fetched context, not parroting codecontext output. Synth message has the expected `kind='synthesis'` part dual-write.
+
+Side note: qwen3.6 needed one retry due to the `include_stats: "True"` quirk (see Decisions). `repairToolCall` handled it; synth fired on the successful call.
+
+### Smoke #6 — fault injection
+
+Env-gated throw inserted between the synth-message INSERT and the `streamCompletion` call. Container rebuilt with `V1_13_13_FAULT_INJECT=1`. Sent the same prompt to a new smoke chat.
+
+All 6 expected outcomes confirmed:
+
+| # | Outcome | Evidence |
+|---|---|---|
+| 1 | `runSynthesisPass` throws | log: `err: "Error: v1.13.13 smoke #6 fault injection"` |
+| 2 | Synth message marked `status='failed'` with empty content | msg `7ac9c685-…` role=assistant status=failed content_len=0 |
+| 3 | `message_complete` frame published for the synth message | implicit via `markSynthFailed`; frontend never tripped the 60s timer |
+| 4 | Fall-through to recursive `runAssistantTurn` | log: `synthesis pass failed; falling through to recursive turn` |
+| 5 | User sees normal (non-synthesized) assistant response | final msg `924076a3-…` 453 tokens: `"This is **boocode** — a self-hosted, single-user developer chat app."` |
+| 6 | Stale-stream banner does NOT fire on failed synth | confirmed — terminal `status='failed'` is what `applyFrame` writes |
+
+Fault injection reverted post-test:
+- `grep FAULT_INJECT apps/server/src/services/synthesisPipeline.ts docker-compose.yml` → empty
+- `grep FAULT_INJECT apps/server/dist/services/synthesisPipeline.js` → empty
+- `docker compose exec boocode printenv V1_13_13_FAULT_INJECT` → exit 1 (unset)
+- Boot log clean, `skills loaded: 14`
+
+### Smokes #2–#5
+
+Sam is doing the qualitative reads from the UI in parallel — those verifications are about synthesis content quality (cites correct files, reads roadmap accurately, no-synthesis on `search_symbols`).
+
+## Done when
+
+- ✅ `synthesisPrompt.ts` + `synthesisPipeline.ts` created
+- ✅ `parts.ts` PartKind union extended
+- ✅ `tool-phase.ts` insertion point edited
+- ✅ Schema migration block added (deviation from dispatch acknowledged)
+- ✅ Type-clean (`pnpm -C apps/server build`)
+- ✅ Container rebuilt + migration confirmed via pg_constraint and logs
+- ✅ Smoke #1 (positive synth path) verified
+- ✅ Smoke #6 (fault injection + fall-through) verified, injection reverted
+- ⏳ Smokes #2–#5 (Sam's UI reads)
+- ⏳ Sam commit