v1.13.4: two-tier compaction prune — opencode pattern half-shipped in v1.11.0

- message_parts.hidden_at timestamptz column (NULL by default) with a
  partial index on (message_id) WHERE hidden_at IS NULL for the common
  visible-parts filter.
- messages_with_parts view changed from COALESCE(parts, legacy) to
  CASE WHEN EXISTS(any parts of kind) THEN visible-parts ELSE legacy.
  COALESCE would have leaked hidden parts back via the legacy fallback
  when every part was pruned (smoke caught it pre-commit). The CASE
  distinguishes "no parts at all → fall back to legacy column for
  pre-v1.13.0 history" from "all parts hidden → return null/empty so
  the row drops out of the model payload" exactly.
- prune.ts: scans tool_result parts newest-first, protects the last 40k
  tokens (PROTECTED_TOKENS), marks older candidates hidden when their
  combined estimate clears 20k (PRUNE_TRIGGER_TOKENS — equal to
  COMPACTION_BUFFER from v1.11.0, so a successful prune is exactly the
  budget the summary path would have freed). Stops at chats.tail_start_id
  so it doesn't double-erase across the last summary boundary. Pure
  decision helper selectPruneTargets exported separately for unit tests.
- Wired into maybeFlagForCompaction: prune runs synchronously when
  overflow is detected; if it freed >= PRUNE_TRIGGER_TOKENS, the
  needs_compaction flag is NOT set and the (expensive) summary inference
  call is skipped this turn. The next turn's overflow check re-evaluates
  from scratch.
- 6 new unit tests in prune.test.ts cover: empty input, protection-only
  (no candidates), candidates below trigger, candidates above trigger,
  candidates straddling a summary boundary, exactly-protection-tokens.
  179 tests total (was 173).

Smoke verified post-rebuild:
- \\d message_parts shows hidden_at + partial index.
- View definition shows AND p.hidden_at IS NULL filters on all three
  subselects.
- Synthetic hide-then-restore confirmed the view drops the tool_result
  jsonb to null when its only part is hidden, and restores when un-hidden.
- EXPLAIN ANALYZE on the 42-message stress chat: 0.325ms (faster than
  v1.13.1-B's 1.018ms — EXISTS short-circuits cleanly for the common
  no-parts case).
- Normal turn (plain text prompt) completes unaffected.

Closes a v1.11.0 design item that was scoped but never implemented. With
v1.13's parts table the prune is dramatically cheaper to write — pre-parts
it would have meant editing JSON blobs in-place; now it's a hidden_at
flag and a view subselect.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-22 07:02:17 +00:00
parent a08d809b73
commit ec8593cf77
4 changed files with 286 additions and 15 deletions

View File

@@ -8,6 +8,7 @@ import type {
import * as compaction from '../compaction.js';
import { buildSystemPrompt } from '../system-prompt.js';
import { isAnySentinel } from './sentinels.js';
import { PRUNE_TRIGGER_TOKENS, prune } from './prune.js';
import type { InferenceContext } from './turn.js';
export interface OpenAiMessage {
@@ -166,6 +167,26 @@ export async function maybeFlagForCompaction(
contextLimit,
);
if (!overflow) return;
// v1.13.4: try the cheap prune first. If it freed at least the buffer
// worth of tokens (PRUNE_TRIGGER_TOKENS, identical to COMPACTION_BUFFER),
// we're below the threshold again — skip flagging summarize for the next
// turn. The next turn's overflow check will re-evaluate from scratch.
// Prune failures (DB errors etc.) propagate so the surrounding inference
// path sees them; the catch in finalizeCompletion / executeToolPhase
// doesn't shield this — by design, we want to know if prune is broken.
const pruned = await prune({ sql: ctx.sql, chatId });
if (pruned.hidden > 0) {
ctx.log.info(
{ chatId, hidden: pruned.hidden, freedTokens: pruned.freedTokens },
'inference: prune freed context budget',
);
}
if (pruned.freedTokens >= PRUNE_TRIGGER_TOKENS) {
// Prune handled it; skip the (expensive) summarize path.
return;
}
await ctx.sql`UPDATE chats SET needs_compaction = true WHERE id = ${chatId}`;
ctx.log.info({ chatId, promptTokens, completionTokens, contextLimit }, 'inference: flagged for compaction');
}