v1.13.9: compaction overflow trigger — 0.85 × ctx_max early trigger

Opencode pattern (session/overflow.ts): fire compaction at 85% of
ctx_max, replacing the v1.11.0-era `ctx_max - 20_000` formula.

Old formula: usable = ctx_max - 20_000
  - ctx=262144 → trigger at 242144 (92.4%) — only 7.6% headroom
  - ctx=100000 → trigger at  80000 (80.0%)
  - ctx= 32000 → trigger at  12000 (37.5%) — over-eager
  - ctx<=20000 → trigger at      0 — never fires

New formula: usable = floor(0.85 * ctx_max)
  - ctx=262144 → trigger at 222822 (85.0%) — 15% headroom for summarizer
  - ctx=100000 → trigger at  85000 (85.0%)
  - ctx= 32000 → trigger at  27200 (85.0%)
  - ctx=  8192 → trigger at   6963 (85.0%)

Ratio gives consistent headroom at any context scale. The qwen3.6
daily driver gets ~19k tokens more breathing room before overflow;
small-ctx models no longer degenerate to never-triggering.

usable() is the only consumer of COMPACTION_BUFFER → constant deleted.
New EARLY_TRIGGER_RATIO constant takes its place.

isOverflow() and the maybeFlagForCompaction() call site at
payload.ts:184 are unchanged — formula swap is internal to compaction.ts.
payload.ts comment touched only to drop the stale COMPACTION_BUFFER
reference (PRUNE_TRIGGER_TOKENS stays at 20k as the prune-freed
threshold; independent of the overflow formula).

Tests: 4 new usable() corner cases (262k/100k/8k/zero+negative), plus
5 isOverflow() numbers shifted to match the 85k budget at ctx=100k.
195/195 server tests pass (was 194).

Smoke: ratio math verified by unit tests at all four corners. Live
cap-hit verification deferred — requires accumulating >222k tokens
in a session under qwen3.6-35b-a3b-mxfp4 (was >242k pre-fix); will
surface organically in extended use.
This commit is contained in:
2026-05-22 13:59:14 +00:00
parent a0c8d212cb
commit b06a4a8e55
3 changed files with 53 additions and 34 deletions

View File

@@ -23,7 +23,13 @@ import type { Broker } from './broker.js';
import { SUMMARY_TEMPLATE } from './compaction-prompt.js';
import * as modelContextLookup from './model-context.js';
const COMPACTION_BUFFER = 20_000;
// v1.13.9: ratio-only overflow trigger. Fires compaction at 85% of ctx_max
// (opencode session/overflow.ts pattern). Replaces the v1.11.0-era
// `ctx_max - 20_000` formula which degenerated to 0 for contexts ≤20k and
// gave only 7-8% headroom to the summarizer at 262k. Ratio gives consistent
// 15% headroom at any scale, and small-ctx models no longer get an
// effectively-disabled trigger.
const EARLY_TRIGGER_RATIO = 0.85;
const MIN_PRESERVE_RECENT_TOKENS = 2_000;
const MAX_PRESERVE_RECENT_TOKENS = 8_000;
const DEFAULT_TAIL_TURNS = 2;
@@ -50,13 +56,13 @@ export interface CompactionMessage {
// === overflow ===
// Tokens we hold in reserve for the model's response so a near-full context
// can still produce a useful turn. Mirrors opencode's COMPACTION_BUFFER.
// Returns 0 when the context limit is unknown (caller treats 0 as "do not
// trigger overflow"); avoids dividing-by-zero downstream.
// Returns the token budget at which overflow fires. Triggers compaction at
// 85% of contextLimit (opencode session/overflow.ts pattern). Returns 0 when
// the context limit is unknown caller treats 0 as "do not trigger overflow",
// keeping inference flowing rather than compacting a turn we can't size.
export function usable(contextLimit: number): number {
if (!contextLimit || contextLimit <= 0) return 0;
return Math.max(0, contextLimit - COMPACTION_BUFFER);
return Math.floor(EARLY_TRIGGER_RATIO * contextLimit);
}
export interface Usage {