- message_parts.hidden_at timestamptz column (NULL by default) with a
partial index on (message_id) WHERE hidden_at IS NULL for the common
visible-parts filter.
- messages_with_parts view changed from COALESCE(parts, legacy) to
CASE WHEN EXISTS(any parts of kind) THEN visible-parts ELSE legacy.
COALESCE would have leaked hidden parts back via the legacy fallback
when every part was pruned (the smoke test caught this pre-commit). The
CASE cleanly separates "no parts at all → fall back to the legacy column
for pre-v1.13.0 history" from "all parts hidden → return null/empty so
the row drops out of the model payload".
- prune.ts: scans tool_result parts newest-first, protects the last 40k
tokens (PROTECTED_TOKENS), marks older candidates hidden when their
combined estimate reaches 20k (PRUNE_TRIGGER_TOKENS, equal to
COMPACTION_BUFFER from v1.11.0, so a successful prune frees at least the
budget the summary path would have freed). Stops at chats.tail_start_id
so it doesn't double-erase across the last summary boundary. Pure
decision helper selectPruneTargets exported separately for unit tests.
- Wired into maybeFlagForCompaction: prune runs synchronously when
overflow is detected; if it freed >= PRUNE_TRIGGER_TOKENS, the
needs_compaction flag is NOT set and the (expensive) summary inference
call is skipped this turn. The next turn's overflow check re-evaluates
from scratch.
- 6 new unit tests in prune.test.ts cover: empty input, protection-only
(no candidates), candidates below trigger, candidates above trigger,
candidates straddling a summary boundary, exactly-protection-tokens.
179 tests total (was 173).
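
For illustration, a minimal sketch of the prune decision and the compaction gate described above. This is not the code in prune.ts: the part shape (numeric messageId, tokenEstimate), the PruneDecision type, and the shouldFlagForCompaction helper are hypothetical. It also assumes that the part crossing the 40k protection line stays protected and that the message at tail_start_id belongs to the live tail.

```typescript
// Hypothetical shapes; the real prune.ts types are not shown in this log.
interface ToolResultPart {
  messageId: number; // assumed to increase with message age order
  tokenEstimate: number; // estimated token cost of this tool_result part
}

interface PruneDecision {
  targets: ToolResultPart[]; // parts to mark hidden (empty when below trigger)
  freedTokens: number; // combined estimate of the targets
}

const PROTECTED_TOKENS = 40_000;
const PRUNE_TRIGGER_TOKENS = 20_000;

// Pure decision helper: input must be ordered newest-first.
function selectPruneTargets(
  partsNewestFirst: ToolResultPart[],
  tailStartId: number | null,
): PruneDecision {
  let protectedSoFar = 0;
  let candidateTokens = 0;
  const candidates: ToolResultPart[] = [];

  for (const part of partsNewestFirst) {
    // Stop at the summary boundary: older parts were already summarized.
    if (tailStartId !== null && part.messageId < tailStartId) break;

    // Assumption: a part is protected while the running total is below 40k.
    if (protectedSoFar < PROTECTED_TOKENS) {
      protectedSoFar += part.tokenEstimate;
      continue;
    }
    candidates.push(part);
    candidateTokens += part.tokenEstimate;
  }

  // Below the trigger nothing is hidden; the summary path handles overflow.
  if (candidateTokens < PRUNE_TRIGGER_TOKENS) {
    return { targets: [], freedTokens: 0 };
  }
  return { targets: candidates, freedTokens: candidateTokens };
}

// Hypothetical gate: needs_compaction is only set when the prune fell short.
function shouldFlagForCompaction(decision: PruneDecision): boolean {
  return decision.freedTokens < PRUNE_TRIGGER_TOKENS;
}
```

Keeping the selection pure, with the UPDATE of hidden_at done by the caller, is what lets the six unit tests run without a database.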
Smoke verified post-rebuild:
- \d message_parts shows hidden_at + partial index.
- View definition shows AND p.hidden_at IS NULL filters on all three
subselects.
- Synthetic hide-then-restore confirmed the view drops the tool_result
jsonb to null when its only part is hidden, and restores when un-hidden.
- EXPLAIN ANALYZE on the 42-message stress chat: 0.325ms (faster than
v1.13.1-B's 1.018ms — EXISTS short-circuits cleanly for the common
no-parts case).
- Normal turn (plain text prompt) completes unaffected.
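
The COALESCE-versus-CASE difference in the view can be modeled in memory. The sketch below is an illustration only, not the view definition: PartRow and both helper functions are hypothetical, and it treats every kind as an array, whereas the real tool_results column picks a single part.

```typescript
// Hypothetical row shape; the real view operates on message_parts in SQL.
interface PartRow<T> {
  kind: string;
  hiddenAt: Date | null; // null means the part is visible
  payload: T;
}

// Visible parts of one kind, or null when none are visible.
function visibleOfKind<T>(parts: PartRow<T>[], kind: string): T[] | null {
  const visible = parts.filter((p) => p.kind === kind && p.hiddenAt === null);
  return visible.length > 0 ? visible.map((p) => p.payload) : null;
}

// Old shape: COALESCE(parts, legacy). Falls back whenever nothing is
// visible, so a fully hidden message leaks its legacy column back in.
function coalesceStyle<T>(
  parts: PartRow<T>[],
  kind: string,
  legacy: T[] | null,
): T[] | null {
  return visibleOfKind(parts, kind) ?? legacy;
}

// New shape: CASE WHEN EXISTS(any parts of kind). Falls back only when the
// message has no parts of that kind at all, hidden or not.
function caseStyle<T>(
  parts: PartRow<T>[],
  kind: string,
  legacy: T[] | null,
): T[] | null {
  const anyOfKind = parts.some((p) => p.kind === kind);
  return anyOfKind ? visibleOfKind(parts, kind) : legacy;
}
```

With a single hidden tool_result part and a populated legacy column, coalesceStyle returns the legacy value while caseStyle returns null, which matches the hide-then-restore smoke result.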
Closes a v1.11.0 design item that was scoped but never implemented. With
v1.13's parts table the prune is dramatically cheaper to write — pre-parts
it would have meant editing JSON blobs in-place; now it's a hidden_at
flag and a view subselect.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pass 1 — ask_user_input correlation port (messages.ts:478, :549):
- The two correlation queries that backed the elicitation flow used to scan
messages.tool_calls and messages.tool_results JSON columns directly. They
now JOIN message_parts on payload->>'id' (for the caller assistant) and
payload->>'tool_call_id' (for the pending tool row). Semantics preserved:
ORDER BY m.created_at DESC LIMIT 1 still picks the latest issuance, the
already-answered 409 guard now reads payload.output, and the UPDATE +
parts replace inside sql.begin is unchanged from v1.13.0.
- Pre-v1.13.0 history has no parts rows, so this lookup path cannot reach
it and returns 404. Acceptable per dispatch decision: no elicitation
issued before v1.13.0 will still be pending. A JSON-column fallback can
land as a hotfix if the case ever surfaces.
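
The lookup semantics from Pass 1 can be sketched over in-memory rows. This is a hypothetical model, not the SQL in messages.ts: PartRow, Correlation, and correlateAskUserInput are illustrative names, and createdAt stands in for m.created_at on the joined message.

```typescript
// Hypothetical flattened row: one message_parts row joined to its message.
interface PartRow {
  messageId: number;
  createdAt: number; // stand-in for m.created_at (epoch ms)
  kind: "tool_call" | "tool_result";
  payload: { id?: string; tool_call_id?: string; output?: unknown };
}

type Correlation =
  | { status: 200; callerMessageId: number; pendingMessageId: number }
  | { status: 404 }
  | { status: 409 };

// Mirrors ORDER BY m.created_at DESC LIMIT 1.
function latest(rows: PartRow[]): PartRow | undefined {
  return [...rows].sort((a, b) => b.createdAt - a.createdAt)[0];
}

function correlateAskUserInput(parts: PartRow[], toolCallId: string): Correlation {
  // Caller assistant message: matched on payload->>'id'.
  const caller = latest(
    parts.filter((p) => p.kind === "tool_call" && p.payload.id === toolCallId),
  );
  // Pending tool row: matched on payload->>'tool_call_id'.
  const pending = latest(
    parts.filter(
      (p) => p.kind === "tool_result" && p.payload.tool_call_id === toolCallId,
    ),
  );

  // No parts rows (for example pre-v1.13.0 history): not found.
  if (!caller || !pending) return { status: 404 };
  // Already answered: payload.output is populated.
  if (pending.payload.output != null) return { status: 409 };

  return {
    status: 200,
    callerMessageId: caller.messageId,
    pendingMessageId: pending.messageId,
  };
}
```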
Pass 2 — reasoning_parts wired end-to-end:
- types.ts/StreamResult gains `reasoning: string`. stream-phase.ts accumulates
reasoning-delta text per stream (replacing the v1.13.1-A counter-only
diagnostic) and returns it on the result.
- parts.ts/partsFromAssistantMessage gains an optional `reasoning` param.
When present it emits a kind='reasoning' part at sequence 0, ahead of
the text and tool_call parts.
- error-handler.ts/finalizeCompletion and tool-phase.ts/executeToolPhase
both thread result.reasoning into the dual-write call so reasoning-channel
models (qwen3.6) get persistent reasoning rows.
- payload.ts: loadContext SELECT pulls reasoning_parts from the v1.13.1-B
view; OpenAiMessage gains an optional `reasoning` field; buildMessagesPayload
collapses reasoning_parts into a single string per assistant message.
- stream-phase.ts/toModelMessages converts assistant messages with reasoning
into an AI SDK ModelMessage content array starting with a ReasoningPart,
matching the @ai-sdk/provider-utils AssistantContent union. Reasoning
models can now replay prior reasoning context across tool-call boundaries.
- types/api.ts and apps/web/src/api/types.ts Message interface gain
reasoning_parts (optional, nullable). Frontend doesn't render this yet —
field reserved for a v1.14 UI surface.
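
A minimal sketch of the sequencing rule from Pass 2, assuming a simplified signature. The real partsFromAssistantMessage in parts.ts may take different arguments; the Part and ToolCall types and the { text } payload shape are hypothetical. Only the {id, name, args} tool_call payload and the reasoning-first ordering come from this log.

```typescript
// Hypothetical types for illustration.
interface Part {
  sequence: number;
  kind: "reasoning" | "text" | "tool_call";
  payload: Record<string, unknown>;
}

interface ToolCall {
  id: string;
  name: string;
  args: unknown;
}

function partsFromAssistantMessage(
  content: string | null,
  toolCalls: ToolCall[],
  reasoning?: string,
): Part[] {
  const parts: Part[] = [];
  // Sequence numbers follow insertion order, so whatever is pushed first
  // lands at sequence 0.
  const push = (kind: Part["kind"], payload: Record<string, unknown>): void => {
    parts.push({ sequence: parts.length, kind, payload });
  };

  // Reasoning, when present, goes ahead of text and tool_call parts.
  if (reasoning) push("reasoning", { text: reasoning });
  if (content) push("text", { text: content });
  for (const tc of toolCalls) {
    push("tool_call", { id: tc.id, name: tc.name, args: tc.args });
  }
  return parts;
}
```

Omitting the reasoning argument leaves the existing text and tool_call sequence numbers unchanged, so non-reasoning models produce the same rows as before.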
Tests: 2 new in parts.test.ts cover reasoning-at-sequence-0 with and
without text content. 172 tests pass (170 prior + 2 new).
Smoke verified against the live container:
- A reasoning-prompt ("walk through 17 × 23 step by step") produced one
message with kind='reasoning' (361 chars) at sequence 0 and kind='text'
(429 chars) at sequence 1. Adapter log confirmed reasoning capture.
- The new correlation SQL was validated against existing tool_call /
tool_result parts: returns the expected message_id + payload shape with
pending state correctly identified via payload.output IS NULL.
- ask_user_input end-to-end through the UI is left to Sam's smoke test: the
Prompt Builder agent does not always trigger ask_user_input for these
prompts, so synthetic verification via SQL substituted for traffic-driven
coverage.
Annotation: the v1.13.1-A abort-throw site in stream-phase.ts got a
one-liner comment ("AI SDK v6 fullStream returns normally on abort; check
signal explicitly.") to prevent a future refactor removing it.
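
The pattern that comment protects can be sketched without the AI SDK. Here fakeFullStream is a hypothetical stand-in for a stream that ends normally on abort (the behavior the comment attributes to fullStream), and drainStream is an illustrative name, not the function in stream-phase.ts.

```typescript
// Hypothetical stand-in for a stream that ends quietly when aborted.
async function* fakeFullStream(signal: AbortSignal): AsyncGenerator<string> {
  for (const chunk of ["a", "b", "c"]) {
    if (signal.aborted) return; // normal completion, nothing thrown
    yield chunk;
  }
}

async function drainStream(
  stream: AsyncIterable<string>,
  signal: AbortSignal,
): Promise<string> {
  let text = "";
  for await (const delta of stream) {
    text += delta;
  }
  // The loop exits the same way for "finished" and "aborted", so the signal
  // has to be checked explicitly before the result is treated as complete.
  if (signal.aborted) {
    const err = new Error("stream aborted");
    err.name = "AbortError";
    throw err;
  }
  return text;
}
```

Without the explicit check, an aborted turn would be indistinguishable from a short completed one and could be persisted as if it had finished.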
v1.13.2 drops the dual-write and the JSON columns, and collapses the view.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- schema.sql: new messages_with_parts view. tool_calls aggregates parts
with kind='tool_call' as a jsonb array of {id, name, args}; tool_results
picks the single sequence=0 part with kind='tool_result' as a jsonb
{tool_call_id, output, truncated, error?}. COALESCE against the legacy
jsonb columns means pre-v1.13.0 history (no parts rows) still reads
correctly via the fallback, and fresh inserts (where parts dual-write
follows the row INSERT) hit the legacy columns until the parts land.
- reasoning_parts column added to the view but not selected by any caller
yet — v1.13.1-C extends the Message type and pulls it into the model
payload alongside the type extension.
- Read sites switched to FROM messages_with_parts:
- routes/chats.ts:427 (chat history GET)
- routes/messages.ts:95 (session history GET)
- routes/ws.ts:27 (WS snapshot on session connect, resume path)
- services/inference/payload.ts (loadContext for model assembly)
- services/compaction.ts (compaction's payload assembly)
- chats.ts:394 (discard_stale UPDATE RETURNING) unchanged — UPDATEs target
messages directly and the returned shape is for a freshly-modified row
where the legacy column is dual-written and correct.
- messages.ts:478/549 (ask_user_input correlation) intentionally not
migrated — those query a different shape, ported in v1.13.1-C.
- Writes still target `messages` directly; the view is read-only.
Smoke verified against the live container:
- Equivalence: 5/5 messages with both legacy column and parts row return
identical tool_calls jsonb between FROM messages and FROM messages_with_parts.
- Perf: EXPLAIN ANALYZE on the 42-message stress chat returns in ~1ms
(50ms threshold). Bitmap Index Scan on message_parts_msg_seq_idx
carries the parts lookups.
- API contract: GET /api/chats/:id/messages returns identical
{id, name, args} tool_calls and {tool_call_id, output, truncated, error}
tool_results shapes to frontend consumers — no UI changes needed.
- Inference path: sent a view_file prompt; assistant turn 1 emitted the
tool_call, tool message captured the result, follow-up assistant turn
read the result back via loadContext (now view-backed) and answered
correctly. End-to-end loop intact.
v1.13.2 drops the dual-write and the JSON columns, and simplifies the view
to just SELECT FROM message_parts.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- sentinel-summaries.ts: runCapHitSummary, insertCapHitSentinel,
runDoomLoopSummary, insertDoomLoopSentinel
- inference.ts → inference/turn.ts: residue is runAssistantTurn,
runInference, createInferenceRunner orchestration only
- inference/index.ts: re-export shim preserves the public surface
(createInferenceRunner, runInference, runAssistantTurn,
detectDoomLoop, DOOM_LOOP_THRESHOLD, buildMessagesPayload, plus
type-side InferenceContext/InferenceFrame/StreamResult/TurnArgs/
FramePublisher)
- src/index.ts + auto_name.ts + the two vitest test files updated to
import from ./services/inference/index.js explicitly (NodeNext ESM
doesn't honor directory-index resolution)
Final tally: 11 files under services/inference/, the largest being
sentinel-summaries.ts at 523 LoC (two near-clone summary paths kept
side-by-side until a third sentinel justifies factoring out a shared
runWrapUpSummary). The next-largest is stream-phase.ts at 380 LoC, and
turn.ts is now 326. Public import surface unchanged.
tool-phase.ts → turn.ts back-edge for runAssistantTurn remains
(cycle is safe; resolved at call time).
Prepares the file structure for v1.13 AI SDK migration — streamText
swap targets stream-phase.ts only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- payload.ts: buildMessagesPayload (re-exported), loadContext,
maybeFlagForCompaction
- error-handler.ts: handleAbortOrError, finalizeCompletion
Both new files type-import InferenceContext/StreamResult/TurnArgs from
inference.ts; ESM elides type imports so there's no runtime cycle.
handleAbortOrError turned out not to call the summary functions, so
no back-edge needed.
inference.ts shrinks from ~1676 to ~1401 LoC.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>