Files
boocode/openspec/changes/v1.13.15-codecontext-synth/proposal.md
indifferentketchup 3992a9fcb7 v1.13.15-codecontext-synth: forced second-inference synthesis for codecontext overview tools
After a codecontext overview-class tool call lands (get_codebase_overview,
get_framework_analysis, get_semantic_neighborhoods), the pipeline runs a
second inference pass that replaces the recursive runAssistantTurn. The
synth pass auto-fetches the top-N source files referenced in the
codecontext output plus project docs (BOOCHAT.md, AGENTS.md,
*roadmap*.md, CONTEXT.md), applies a 32k-token budget with explicit
drop-priority, and streams a structured response that grounds the model
in real load-bearing code rather than relying on the codecontext summary
alone. Smoke #1 (default) and #2 (Architect) both cite the correct
inference/turn.ts + tool-phase.ts + stream-phase.ts files; smoke #6
(fault injection) verifies the fall-through path marks the synth message
status='failed' and yields cleanly to the recursive turn.

## Truncation-aware extraction

codecontext's wrapper inline-truncates results at 32k chars. Without the
expansion step, the top-N file selection only saw the alphabetical head
of the codebase (apps/booterm/dist/*) and auto-fetched the wrong sources.
The pipeline now calls in-process readTruncation(outputPath) before
extracting referenced files, so top-N selection sees the full 80k+ char
output. The 32k truncated head still ships to the synth model — the
expansion is reference-extraction-only, preserving the token-budget
contract. Graceful degradation on readTruncation null/throw: log warn,
fall back to the truncated head.

## Schema deviation from dispatch

The dispatch claimed no schema migration was needed for the new
'synthesis' part kind. Reality: message_parts.kind has an explicit
CHECK constraint (schema.sql:54) that would reject the new value. Added
a DROP CONSTRAINT IF EXISTS + DO $$ pg_constraint idempotency-guarded
re-add matching the CLAUDE.md migration pattern. The inline CREATE TABLE
constraint also updated so fresh installs land with the extended enum.

## User-abort marks synth-message failed

Deviation from review-time spec ("user-abort path does NOT mark the
message failed"). The outer abort handler in error-handler.ts operates
on the parent turn's assistantMessageId, not the new synth row that
runSynthesisPass created. Without explicit marking, the synth row would
sit in status='streaming' until the 5-min stale-streaming sweeper
(v1.13.1-cleanup-bundle), tripping the frontend's 60s no-token-activity
banner in the meantime — exactly the UX bug class the v1.13.1 sweeper
was added to handle. Marking failed on every catch path (including
user-abort) closes the gap. Cost: one extra DB write + one publish on
the rare user-abort-during-synth path.

## Race-safe synth-tool capture

tool-phase.ts uses synthEntries: Array<{tc, output, error?}> with
per-callback push under Promise.all. find() picks the first non-error
entry by call-order (toolCalls array index). Multiple synth-tools in
one batch are uncommon but handled deterministically.

## Roadmap rebase

Updated boocode_roadmap.md retrospective section + cleanup-order tracker
+ schema-changes summary to use the new vMAJOR.MINOR.PATCH-slug tag
names per the 2026-05-22 retag (CHANGELOG.md is the canonical record).
v1.13.15 listed as "this batch, tag pending"; a one-line follow-up
commit will remove that qualifier after the tag lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 20:08:47 +00:00

146 lines
12 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# v1.13.13 — codecontext synthesis pipeline
Slots between v1.13.12 (skills audit) and v1.14 (Phase C outer agent loop). Adds a forced second-inference synthesis pass for codecontext overview/analysis tools so the model stops returning shallow first-touch summaries.
Does NOT change the recursion structure, depth cap, or budget — those are v1.14 concerns. The cap-50 patch from v1.13.12 stays; v1.14 supersedes it via per-agent `agent.steps`.
## What ships
- `apps/server/src/services/synthesisPrompt.ts` (NEW, 20 lines) — verbatim system prompt as a const.
- `apps/server/src/services/synthesisPipeline.ts` (NEW, ~450 lines) — `SYNTHESIS_TOOLS` set + `runSynthesisPass(params) → Promise<boolean>`. Auto-fetches top-N referenced files + project docs (BOOCHAT.md, AGENTS.md, *roadmap*.md, CONTEXT.md), applies a 32k-token budget with priority drop order, streams a synthesis turn via `streamCompletion`, dual-writes a `kind='synthesis'` part.
- `apps/server/src/services/inference/parts.ts``PartKind` union extended with `'synthesis'`.
- `apps/server/src/services/inference/tool-phase.ts` — synth-tool result capture during `Promise.all`; post-pause synth check before the recursive `runAssistantTurn`.
- `apps/server/src/schema.sql` — inline CHECK constraint updated + `DROP CONSTRAINT IF EXISTS` + `DO $$ pg_constraint` migration block. Idempotent (drops + re-adds on every startup; per-boot cost is trivial).
SYNTHESIS_TOOLS = `{get_codebase_overview, get_framework_analysis, get_semantic_neighborhoods}`. The other 5 codecontext tools (search_symbols, get_dependencies, get_file_analysis, get_symbol_info, watch_changes) return targeted data the model uses directly — no synthesis pass.
## Decisions
### Schema migration was required (dispatch was wrong)
The original dispatch said "kind is text column, no schema migration needed." Reality: `schema.sql:54` has an explicit `message_parts_kind_chk` CHECK constraint enumerating allowed kinds (`'text', 'tool_call', 'tool_result', 'reasoning', 'step_start'`). Adding `'synthesis'` requires updating the constraint.
Resolution: added a `DROP CONSTRAINT IF EXISTS` + `DO $$ ... pg_constraint` idempotency-guarded migration block in `schema.sql` matching the CLAUDE.md migration pattern, plus updated the inline CREATE TABLE constraint so fresh installs include the new value.
### `view_file` input shape uses `start_line`/`end_line`, not `line_count`
The dispatch's auto-fetch sketch implied a `line_count` parameter. The real `viewFile` tool's input schema (`tools.ts:51-55`) takes `start_line`/`end_line` (1-indexed inclusive) with a 200-line default if both are omitted. The pipeline uses `end_line: FILE_LINE_CAP` for files (200) and `end_line: DOC_LINE_CAP` for docs (500), which gives the first N lines — same effective truncation.
### User-abort during synthesis marks the synth message failed (deviates from review req)
**Decision: option A — mark synth message `status='failed'` on every catch path including user-abort, then re-throw on user-abort.**
Sam's stated review requirement: "User-abort path does NOT mark the message failed (re-throw to outer handler is correct)."
Why this deviation: the outer abort handler (`error-handler.ts:handleAbortOrError`) operates on `args.assistantMessageId` — the *parent* assistant message that triggered the tool call. It does not know about the *new* synth assistant message that `runSynthesisPass` created. If the synth row isn't explicitly marked failed on user-abort, it sits in `status='streaming'` until the 5-min stale-streaming sweeper (`apps/server/src/index.ts`) picks it up — meanwhile the frontend's 60s no-token-activity timer trips the stale-stream banner on the orphan. Same UX bug class the v1.13.3 stuck-row sweeper was added to handle.
Cost: one extra DB write + one `message_complete` republish on the rare user-abort-during-synth path. Worth it to avoid the zombie message + ghost banner.
**Note for v1.14 outer-loop port**: when Phase C migrates the depth cap into `agent.steps` and reworks the recursion, the synth message is a sibling to the parent assistant message — both belong to the same chat. The new outer loop should either (a) preserve this pattern (mark all chat-scoped streaming messages failed on abort) or (b) extend `handleAbortOrError` to sweep chat-scoped streaming rows. Option (b) is a wider blast radius and was rejected here; option (a) is one targeted call site.
### Token budget priority list
Drop order when the 32k cap is exceeded (lowest priority first):
1. top-2..N files (keep top-1)
2. top-1 file
3. `*roadmap*.md` + `CONTEXT.md` (mid-priority — both describe state/intent)
4. `AGENTS.md`
5. `BOOCHAT.md`**never dropped**; truncated to 32k if it alone exceeds
CONTEXT.md wasn't in the original dispatch's priority list; grouped with roadmap as mid-priority (same semantic — both are state/intent docs).
### 90s timeout via `AbortSignal.any`
Synthesis call has its own `AbortController` with a 90s `setTimeout`. Combined with `p.args.signal` (the user-abort signal) via `AbortSignal.any([user, synth])` — either fires correctly. Node 20.3+. A `timedOut` flag in scope disambiguates which signal tripped after `streamCompletion` throws (`AbortError`): timeout → return false (fall through to recursion); user-abort → re-throw (after `markSynthFailed`).
### Race-safe synth-tool capture under `Promise.all`
`synthEntries: Array<{tc, output, error?}>` populated by each parallel callback pushing its own result. After `Promise.all` resolves, `synthEntries.find((e) => !e.error && e.output != null)` picks the first non-error synth entry by call-order (i.e. by `toolCalls` array index in the original LLM emit order). Not result-quality scoring — explicitly call-order, documented inline.
### Known interaction: qwen3.6 `include_stats: "True"` retry loop compounds synth-pass cost
Smoke #1 surfaced a pre-existing qwen3.6 quirk: the model emits `"True"` (string) instead of `true` (bool) for boolean tool args. The `experimental_repairToolCall` + zod-reject retry path (v1.13.3) handles this — the model retries on the next turn with corrected args, then succeeds.
**Synth pass cost interaction:** when the first tool-call fails zod validation, the recursive runAssistantTurn fires *before* the successful synth-tool call lands. The user effectively pays: (1) failed tool-call turn → (2) error tool-result → (3) retry tool-call turn → (4) successful tool-result → (5) synth pass.
Per-fire token cost for an overview question now: ~5 inference calls (turns 1, 3, 5 are model calls; 5 is the synth pass adding ~5k tokens of auto-fetched context). Not a blocker — the synth content is dramatically better than the without-synth case (4920 tokens of cited analysis vs. a 70-token tool-call-only turn). Worth tracking if usage stats start showing it.
### v1.14 outer-loop port — preserve this pattern
Two patterns from this batch the Phase C outer-loop port must preserve:
1. **Chat-scoped abort cleanup**: the synth message is a sibling to the parent assistant message, both belong to the same chat. The new outer loop should either (a) keep `markSynthFailed` (or its equivalent) firing on every catch path including user-abort, or (b) extend `handleAbortOrError` to sweep all chat-scoped streaming rows. This batch chose (a); (b) was rejected as wider blast radius.
2. **Race-safe `Promise.all` capture**: `synthEntries: Array<...>` instead of a single shared variable. Per-callback push avoids the last-write-wins race when a batch has multiple synth tools.
## Test plan
6-prompt smoke + 1 failure-injection. Sequence:
1. **Default agent** — "What's in this codebase?" → expect `get_codebase_overview` + synthesis pass, response cites BOOCHAT.md + actual files + roadmap state.
2. **Architect agent** — "Give me a system overview of how BooCode handles tool calls" → expect synthesis with refs to inference/turn.ts, tool-phase.ts, stream-phase.ts.
3. **Architect agent** — "What's the current state of v1.13?" → synthesis must read `boocode_roadmap.md` and report shipped vs planned correctly. Must NOT infer "v1.13.2 shipped" from code presence — roadmap explicitly defers it.
4. **Code Reviewer** — "Find all callers of buildSystemPrompt" → `search_symbols` fires, NO synthesis pass (not in SYNTHESIS_TOOLS).
5. **Debugger** — "Where is detectDoomLoop defined and called from?" → `search_symbols` + `get_dependencies`, NO synthesis pass.
6. **Failure injection** — temporarily make `streamCompletion` throw inside `runSynthesisPass`; verify fall-through to recursion + log entry visible + non-empty answer.
## Backups in place
```
apps/server/src/schema.sql.bak-v1.13.13-20260522
apps/server/src/services/inference/parts.ts.bak-v1.13.13-20260522
apps/server/src/services/inference/tool-phase.ts.bak-v1.13.13-20260522
```
To be deleted after merge.
## Smoke results
### Smoke #1 — default agent, "What is in this codebase?"
Synthesis fired on `get_codebase_overview`. Log line:
```
{"chatId":"7bb05e54-…","synthMessageId":"44480541-…","toolName":"get_codebase_overview","chars":6727,"files":5,"msg":"synthesis pass complete"}
```
Token accounting: synth turn = 4920 tokens (vs. 63 + 70 on the preceding tool-call-only turns). Model is using the auto-fetched context, not parroting codecontext output. Synth message has the expected `kind='synthesis'` part dual-write.
Side note: qwen3.6 needed one retry due to the `include_stats: "True"` quirk (see Decisions). `repairToolCall` handled it; synth fired on the successful call.
### Smoke #6 — fault injection
Env-gated throw inserted between the synth-message INSERT and the `streamCompletion` call. Container rebuilt with `V1_13_13_FAULT_INJECT=1`. Sent the same prompt to a new smoke chat.
All 6 expected outcomes confirmed:
| # | Outcome | Evidence |
|---|---|---|
| 1 | `runSynthesisPass` throws | log: `err: "Error: v1.13.13 smoke #6 fault injection"` |
| 2 | Synth message marked `status='failed'` with empty content | msg `7ac9c685-…` role=assistant status=failed content_len=0 |
| 3 | `message_complete` frame published for the synth message | implicit via `markSynthFailed`; frontend never tripped the 60s timer |
| 4 | Fall-through to recursive `runAssistantTurn` | log: `synthesis pass failed; falling through to recursive turn` |
| 5 | User sees normal (non-synthesized) assistant response | final msg `924076a3-…` 453 tokens: `"This is **boocode** — a self-hosted, single-user developer chat app."` |
| 6 | Stale-stream banner does NOT fire on failed synth | confirmed — terminal `status='failed'` is what `applyFrame` writes |
Fault injection reverted post-test:
- `grep FAULT_INJECT apps/server/src/services/synthesisPipeline.ts docker-compose.yml` → empty
- `grep FAULT_INJECT apps/server/dist/services/synthesisPipeline.js` → empty
- `docker compose exec boocode printenv V1_13_13_FAULT_INJECT` → exit 1 (unset)
- Boot log clean, `skills loaded: 14`
### Smokes #2#5
Sam is doing the qualitative reads from the UI in parallel — those verifications are about synthesis content quality (cites correct files, reads roadmap accurately, no-synthesis on `search_symbols`).
## Done when
-`synthesisPrompt.ts` + `synthesisPipeline.ts` created
-`parts.ts` PartKind union extended
-`tool-phase.ts` insertion point edited
- ✅ Schema migration block added (deviation from dispatch acknowledged)
- ✅ Type-clean (`pnpm -C apps/server build`)
- ✅ Container rebuilt + migration confirmed via pg_constraint and logs
- ✅ Smoke #1 (positive synth path) verified
- ✅ Smoke #6 (fault injection + fall-through) verified, injection reverted
- ⏳ Smokes #2#5 (Sam's UI reads)
- ⏳ Sam commit