v1.13.9: compaction overflow trigger — 0.85 × ctx_max early trigger

Opencode pattern (session/overflow.ts): fire compaction at 85% of ctx_max, replacing the v1.11.0-era `ctx_max - 20_000` formula.

Old formula: usable = ctx_max - 20_000
- ctx=262144 → trigger at 242144 (92.4%), only 7.6% headroom
- ctx=100000 → trigger at 80000 (80.0%)
- ctx= 32000 → trigger at 12000 (37.5%), over-eager
- ctx<=20000 → trigger at 0, never fires

New formula: usable = floor(0.85 * ctx_max)
- ctx=262144 → trigger at 222822 (85.0%), 15% headroom for summarizer
- ctx=100000 → trigger at 85000 (85.0%)
- ctx= 32000 → trigger at 27200 (85.0%)
- ctx=  8192 → trigger at 6963 (85.0%)

Ratio gives consistent headroom at any context scale. The qwen3.6 daily driver gets ~19k tokens more breathing room before overflow; small-ctx models no longer degenerate to never-triggering.

usable() is the only consumer of COMPACTION_BUFFER → constant deleted. New EARLY_TRIGGER_RATIO constant takes its place. isOverflow() and the maybeFlagForCompaction() call site at payload.ts:184 are unchanged; the formula swap is internal to compaction.ts. payload.ts comment touched only to drop the stale COMPACTION_BUFFER reference (PRUNE_TRIGGER_TOKENS stays at 20k as the prune-freed threshold; independent of the overflow formula).

Tests: 4 new usable() corner cases (262k/100k/8k/zero+negative), plus 5 isOverflow() numbers shifted to match the 85k budget at ctx=100k. 195/195 server tests pass (was 194).

Smoke: ratio math verified by unit tests at all four corners. Live cap-hit verification deferred: requires accumulating >222k tokens in a session under qwen3.6-35b-a3b-mxfp4 (was >242k pre-fix); will surface organically in extended use.
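
The old and new formulas described above can be compared side by side with a minimal standalone sketch. It is not part of the commit: `usableOld`, `usableNew`, and `compareTriggers` are hypothetical names used only for illustration, and the constants are copied from the message rather than imported from compaction.ts.

```typescript
// Illustrative only: re-states the two formulas from the commit message.
// Helper names below are hypothetical and do not exist in the repository.
const COMPACTION_BUFFER = 20_000; // old fixed reserve (deleted by this commit)
const EARLY_TRIGGER_RATIO = 0.85; // new ratio-based trigger

// Old behavior: subtract a fixed buffer, clamped at 0.
function usableOld(contextLimit: number): number {
  if (!contextLimit || contextLimit <= 0) return 0;
  return Math.max(0, contextLimit - COMPACTION_BUFFER);
}

// New behavior: fire at 85% of the context limit, rounded down.
function usableNew(contextLimit: number): number {
  if (!contextLimit || contextLimit <= 0) return 0;
  return Math.floor(EARLY_TRIGGER_RATIO * contextLimit);
}

interface TriggerRow {
  ctx: number;
  oldTrigger: number;
  newTrigger: number;
}

// Build one comparison row per context size.
function compareTriggers(limits: number[]): TriggerRow[] {
  return limits.map((ctx) => ({
    ctx,
    oldTrigger: usableOld(ctx),
    newTrigger: usableNew(ctx),
  }));
}

for (const row of compareTriggers([262_144, 100_000, 32_000, 8_192])) {
  console.log(`ctx=${row.ctx} old=${row.oldTrigger} new=${row.newTrigger}`);
}
```

At ctx=8192 the old formula yields 0, which the caller treats as "do not trigger overflow", while the ratio yields 6963; this is the small-context case the commit is fixing.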
2026-05-22 13:59:14 +00:00
parent a0c8d212cb
commit b06a4a8e55
3 changed files with 53 additions and 34 deletions
--- a/apps/server/src/services/compaction.ts
+++ b/apps/server/src/services/compaction.ts
@@ -23,7 +23,13 @@ import type { Broker } from './broker.js';
 import { SUMMARY_TEMPLATE } from './compaction-prompt.js';
 import * as modelContextLookup from './model-context.js';

-const COMPACTION_BUFFER = 20_000;
+// v1.13.9: ratio-only overflow trigger. Fires compaction at 85% of ctx_max
+// (opencode session/overflow.ts pattern). Replaces the v1.11.0-era
+// `ctx_max - 20_000` formula which degenerated to 0 for contexts ≤20k and
+// gave only 7-8% headroom to the summarizer at 262k. Ratio gives consistent
+// 15% headroom at any scale, and small-ctx models no longer get an
+// effectively-disabled trigger.
+const EARLY_TRIGGER_RATIO = 0.85;
 const MIN_PRESERVE_RECENT_TOKENS = 2_000;
 const MAX_PRESERVE_RECENT_TOKENS = 8_000;
 const DEFAULT_TAIL_TURNS = 2;
@@ -50,13 +56,13 @@ export interface CompactionMessage {

 // === overflow ===

-// Tokens we hold in reserve for the model's response so a near-full context
-// can still produce a useful turn. Mirrors opencode's COMPACTION_BUFFER.
-// Returns 0 when the context limit is unknown (caller treats 0 as "do not
-// trigger overflow"); avoids dividing-by-zero downstream.
+// Returns the token budget at which overflow fires. Triggers compaction at
+// 85% of contextLimit (opencode session/overflow.ts pattern). Returns 0 when
+// the context limit is unknown — caller treats 0 as "do not trigger overflow",
+// keeping inference flowing rather than compacting a turn we can't size.
 export function usable(contextLimit: number): number {
  if (!contextLimit || contextLimit <= 0) return 0;
-  return Math.max(0, contextLimit - COMPACTION_BUFFER);
+  return Math.floor(EARLY_TRIGGER_RATIO * contextLimit);
 }

 export interface Usage {