v1.13.4: two-tier compaction prune — opencode pattern half-shipped in v1.11.0

- message_parts.hidden_at timestamptz column (NULL by default) with a
  partial index on (message_id) WHERE hidden_at IS NULL for the common
  visible-parts filter.
- messages_with_parts view changed from COALESCE(parts, legacy) to
  CASE WHEN EXISTS(any parts of kind) THEN visible-parts ELSE legacy.
  COALESCE would have leaked hidden parts back via the legacy fallback
  when every part was pruned (smoke caught it pre-commit). The CASE
  distinguishes "no parts at all → fall back to legacy column for
  pre-v1.13.0 history" from "all parts hidden → return null/empty so the
  row drops out of the model payload" exactly.
- prune.ts: scans tool_result parts newest-first, protects the last 40k
  tokens (PROTECTED_TOKENS), and marks older candidates hidden when their
  combined estimate clears 20k (PRUNE_TRIGGER_TOKENS, equal to
  COMPACTION_BUFFER from v1.11.0, so a successful prune frees at least
  the budget the summary path would have freed). Stops at
  chats.tail_start_id so it doesn't double-erase across the last summary
  boundary. Pure decision helper selectPruneTargets exported separately
  for unit tests.
- Wired into maybeFlagForCompaction: prune runs synchronously when
  overflow is detected; if it freed >= PRUNE_TRIGGER_TOKENS, the
  needs_compaction flag is NOT set and the (expensive) summary inference
  call is skipped this turn. The next turn's overflow check re-evaluates
  from scratch.
- 6 new unit tests in prune.test.ts cover: empty input, protection-only
  (no candidates), candidates below trigger, candidates above trigger,
  candidates straddling a summary boundary, exactly-protection-tokens.
  179 tests total (was 173).

Smoke verified post-rebuild:
- \d message_parts shows hidden_at + partial index.
- View definition shows AND p.hidden_at IS NULL filters on all three
  subselects.
- Synthetic hide-then-restore confirmed the view drops the tool_result
  jsonb to null when its only part is hidden, and restores when
  un-hidden.
- EXPLAIN ANALYZE on the 42-message stress chat: 0.325ms (faster than
  v1.13.1-B's 1.018ms; EXISTS short-circuits cleanly for the common
  no-parts case).
- Normal turn (plain text prompt) completes unaffected.

Closes a v1.11.0 design item that was scoped but never implemented. With
v1.13's parts table the prune is dramatically cheaper to write: pre-parts
it would have meant editing JSON blobs in-place; now it's a hidden_at
flag and a view subselect.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
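
The COALESCE-versus-CASE distinction in the view can be modelled outside SQL. The sketch below is a TypeScript analogue, not the view definition itself: `Part`, `visibleParts`, `resolveWithCoalesce`, and `resolveWithCase` are hypothetical names, and it assumes the visible-parts subselect yields NULL once no visible part remains and that the legacy column still holds a copy of the content.

```typescript
// Hypothetical in-memory model of one message's parts; not the real schema.
interface Part {
  content: string;
  hiddenAt: Date | null; // mirrors message_parts.hidden_at (null = visible)
}

// Stand-in for the visible-parts subselect: null when nothing is visible.
function visibleParts(parts: Part[]): string | null {
  const visible = parts.filter((p) => p.hiddenAt === null).map((p) => p.content);
  return visible.length > 0 ? visible.join('\n') : null;
}

// COALESCE(parts, legacy): falls back to legacy whenever the subselect is
// null, which also happens when every part has been pruned.
function resolveWithCoalesce(parts: Part[], legacy: string | null): string | null {
  return visibleParts(parts) ?? legacy;
}

// CASE WHEN EXISTS(any parts) THEN visible-parts ELSE legacy: the legacy
// value is consulted only when the message has no parts rows at all.
function resolveWithCase(parts: Part[], legacy: string | null): string | null {
  return parts.length > 0 ? visibleParts(parts) : legacy;
}
```

Under these assumptions, a message whose only part is hidden resolves to the legacy text with the COALESCE form and to null with the CASE form, which is the leak the smoke test caught.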
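
The selection rule described for prune.ts (newest-first scan, protected recent window, trigger threshold, stop at the summary boundary) might look roughly like the following. This is a minimal sketch rather than the shipped selectPruneTargets: the `ToolResultPart` shape, numeric and ordered message ids, the `>=` comparison against the trigger, and the treatment of a part that straddles the protection boundary are all assumptions.

```typescript
// Values quoted in the commit message; everything else here is assumed.
const PROTECTED_TOKENS = 40_000;
const PRUNE_TRIGGER_TOKENS = 20_000;

// Hypothetical shape of one tool_result part; the real row type is not shown.
interface ToolResultPart {
  id: number;
  messageId: number; // assumed numeric and increasing over time
  tokens: number; // estimated token count
}

interface PruneDecision {
  targetIds: number[];
  freedTokens: number;
}

// Assumes `parts` is ordered newest-first and that tailStartId, when set,
// is the first message after the last summary (like chats.tail_start_id).
function selectPruneTargetsSketch(
  parts: ToolResultPart[],
  tailStartId: number | null,
): PruneDecision {
  let seenTokens = 0;
  let freedTokens = 0;
  const targetIds: number[] = [];

  for (const part of parts) {
    // Stop at the summary boundary: older parts are already summarized.
    if (tailStartId !== null && part.messageId < tailStartId) break;

    if (seenTokens < PROTECTED_TOKENS) {
      // Still inside the protected recent window; keep this part visible.
      seenTokens += part.tokens;
      continue;
    }

    // Past the protected window: this part is a prune candidate.
    targetIds.push(part.id);
    freedTokens += part.tokens;
  }

  // Below the trigger, hiding would not free enough; select nothing.
  if (freedTokens < PRUNE_TRIGGER_TOKENS) return { targetIds: [], freedTokens: 0 };
  return { targetIds, freedTokens };
}
```

With parts of 30k, 15k, 12k, and 12k tokens (newest first), the first two are protected because the window is still under 40k when each is reached, and the last two are selected, freeing 24k. If the summary boundary cuts off the oldest part, only 12k is available, which is below the trigger, so nothing is selected.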
@@ -8,6 +8,7 @@ import type {
 import * as compaction from '../compaction.js';
 import { buildSystemPrompt } from '../system-prompt.js';
 import { isAnySentinel } from './sentinels.js';
+import { PRUNE_TRIGGER_TOKENS, prune } from './prune.js';
 import type { InferenceContext } from './turn.js';
 
 export interface OpenAiMessage {
@@ -166,6 +167,26 @@ export async function maybeFlagForCompaction(
     contextLimit,
   );
   if (!overflow) return;
+
+  // v1.13.4: try the cheap prune first. If it freed at least the buffer
+  // worth of tokens (PRUNE_TRIGGER_TOKENS, identical to COMPACTION_BUFFER),
+  // we're below the threshold again, so skip flagging summarize for the
+  // next turn. The next turn's overflow check will re-evaluate from scratch.
+  // Prune failures (DB errors etc.) propagate so the surrounding inference
+  // path sees them; the catch in finalizeCompletion / executeToolPhase
+  // doesn't shield this; by design, we want to know if prune is broken.
+  const pruned = await prune({ sql: ctx.sql, chatId });
+  if (pruned.hidden > 0) {
+    ctx.log.info(
+      { chatId, hidden: pruned.hidden, freedTokens: pruned.freedTokens },
+      'inference: prune freed context budget',
+    );
+  }
+  if (pruned.freedTokens >= PRUNE_TRIGGER_TOKENS) {
+    // Prune handled it; skip the (expensive) summarize path.
+    return;
+  }
+
   await ctx.sql`UPDATE chats SET needs_compaction = true WHERE id = ${chatId}`;
   ctx.log.info({ chatId, promptTokens, completionTokens, contextLimit }, 'inference: flagged for compaction');
 }
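
In the same spirit as the separately exported selectPruneTargets helper, the branch added to maybeFlagForCompaction can be isolated as a pure function for testing. The sketch below is illustrative only: `decideAfterPrune`, `Action`, and `PruneResult` are hypothetical names, and only the 20k threshold and the `>=` comparison come from the commit.

```typescript
const PRUNE_TRIGGER_TOKENS = 20_000; // value from the commit message

// Hypothetical mirror of what prune() reports back to the caller.
interface PruneResult {
  hidden: number;
  freedTokens: number;
}

type Action = 'none' | 'skip-summary' | 'flag-for-compaction';

// Two-tier decision: no overflow means nothing to do; a prune that freed at
// least the trigger budget skips the summary; otherwise the chat is flagged.
function decideAfterPrune(overflow: boolean, pruned: PruneResult): Action {
  if (!overflow) return 'none';
  if (pruned.freedTokens >= PRUNE_TRIGGER_TOKENS) return 'skip-summary';
  return 'flag-for-compaction';
}
```

A prune that freed 24k tokens maps to 'skip-summary', while one that freed 12k maps to 'flag-for-compaction', so the summary inference call is deferred only when the cheap tier recovered the full buffer.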