Second half of the WebSocket-frame-typing batch. Phase A (8b568b3)
landed the schemas + frontend receive validation + publishFrame /
publishUserFrame wrappers. This commit converts the existing publish
call sites so every server-emitted WS frame now goes through Zod
validation at the broker boundary.
Conversion strategy: make one change in the inference / skills adapters in
index.ts (so ctx.publish / ctx.publishUser delegate to publishFrame /
publishUserFrame for ALL ~50 inference + auto_name call sites in a single
move), then bulk-replace the ~30 direct broker.publish* call sites in
the routes + compaction.
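The adapter move can be sketched roughly as follows. This is a minimal illustration, not the code in index.ts: the `Broker` class, `makeFrameWrappers`, `makeInferenceCtx`, the `isFrame` stand-in validator, and the choice to log and drop (rather than throw on) an invalid frame are all assumptions. The real wrappers validate with the Phase A Zod schemas.

```typescript
// Hypothetical minimal frame shape; the real schemas are Zod objects from Phase A.
type Frame = { type: string } & Record<string, unknown>;

// Hypothetical broker stand-in that just records what it would send.
class Broker {
  readonly sent: Array<{ target: string; frame: Frame }> = [];
  publish(channel: string, frame: Frame): void {
    this.sent.push({ target: `channel:${channel}`, frame });
  }
  publishUser(userId: string, frame: Frame): void {
    this.sent.push({ target: `user:${userId}`, frame });
  }
}

// Stand-in validator; in the real code this role is played by schema.safeParse().
function isFrame(value: unknown): value is Frame {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { type?: unknown }).type === "string"
  );
}

// Validating wrappers: every frame is checked before it reaches the broker.
// Invalid frames are logged and dropped (an assumption made for this sketch).
function makeFrameWrappers(broker: Broker, log: (msg: string) => void) {
  const publishFrame = (channel: string, frame: unknown): boolean => {
    if (!isFrame(frame)) {
      log("ws-frame-validation-failed");
      return false;
    }
    broker.publish(channel, frame);
    return true;
  };
  const publishUserFrame = (userId: string, frame: unknown): boolean => {
    if (!isFrame(frame)) {
      log("ws-frame-validation-failed");
      return false;
    }
    broker.publishUser(userId, frame);
    return true;
  };
  return { publishFrame, publishUserFrame };
}

// The adapter: inference / skills code only ever sees ctx.publish / ctx.publishUser,
// so rerouting these two closures covers every downstream call site at once.
function makeInferenceCtx(wrappers: ReturnType<typeof makeFrameWrappers>) {
  return {
    publish: (channel: string, frame: unknown) => wrappers.publishFrame(channel, frame),
    publishUser: (userId: string, frame: unknown) => wrappers.publishUserFrame(userId, frame),
  };
}
```

Because call sites depend only on the two ctx closures, the ~50 downstream sites need no edits of their own; only the direct broker.publish* calls have to be rewritten by hand.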
Files touched:
- index.ts: inference + skills route adapters now call publishFrame /
publishUserFrame internally; raw broker.publishUser('default', ...)
call in the stale-row sweeper also converted.
- routes/projects.ts (7 sites), routes/chats.ts (9 sites),
routes/sessions.ts (8 sites): all broker.publishUser(...) →
broker.publishUserFrame(...).
- services/compaction.ts (3 sites): 2 publishUser, 1 publish.
Real protocol drift surfaced by Zod, fixed in the same commit:
services/compaction.ts:442 was publishing chat_status with status:
'working' — the v1.12.1 chat_status widening (CLAUDE.md:55) dropped
this enum value in favor of streaming|tool_running|waiting_for_input|
idle|error. The compaction.ts site was missed during v1.12.1; the
frame had been published with an unknown enum value ever since (the
frontend useChatStatus quietly ignored it). Corrected to 'streaming'
— compaction's LLM call has the same dot-state semantic as an
inference turn. This is exactly the class of bug v1.13.11 exists to
catch.
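Here the check that caught the stale value is a closed-set membership test. Below is a dependency-free sketch of it. The `isChatStatus` helper is hypothetical; the real schema presumably expresses the same set with Zod's `z.enum`, whose `safeParse("working")` would likewise report failure.

```typescript
// The five chat_status values left after the v1.12.1 widening.
const CHAT_STATUSES = [
  "streaming",
  "tool_running",
  "waiting_for_input",
  "idle",
  "error",
] as const;

type ChatStatus = (typeof CHAT_STATUSES)[number];

// Closed-set membership check: anything outside the enum (e.g. the stale
// 'working') is rejected instead of being passed through to the frontend.
// At compile time, `const s: ChatStatus = "working"` would also be an error.
function isChatStatus(value: unknown): value is ChatStatus {
  return typeof value === "string" && (CHAT_STATUSES as readonly string[]).includes(value);
}
```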
Schema relaxation: OpaqueObject (the bag type for nested entities like
Project / Chat / Session / WorkspacePane embedded in WS frames) was
z.object({}).passthrough(), whose inferred type is {} & {[k: string]:
unknown}. The strictly typed entities have no index signatures, so
TypeScript rejected them at publishFrame call sites. Relaxed to
z.unknown() — runtime validation still accepts the value, dev-time
narrowing happens via the existing hand-maintained types. Trade-off:
frame-level drift detection stays sharp; nested-payload validation
goes to follow-up work as the brief intended.
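A hand-rolled check can stand in for the Zod schema to illustrate the trade-off. The `chat_updated` frame type, its field names, and the `Chat` interface are hypothetical. Top-level fields are checked strictly, while the embedded entity is typed `unknown` (mirroring `z.unknown()`), so any nested value passes. Typing the field as `{ [k: string]: unknown }` instead would reproduce the original compile error for interface-declared entities, because interfaces do not get an implicit index signature.

```typescript
// Hypothetical strictly typed entity, declared as an interface (no index signature).
interface Chat {
  id: string;
  title: string;
}

// Hypothetical frame: the discriminant and id are checked, the entity is opaque.
interface ChatUpdatedFrame {
  type: "chat_updated";
  chat_id: string;
  chat: unknown; // OpaqueObject after the relaxation
}

// Frame-level validation only; v.chat is deliberately left unchecked,
// just as z.unknown() accepts any value (including undefined).
function isChatUpdatedFrame(value: unknown): value is ChatUpdatedFrame {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return v.type === "chat_updated" && typeof v.chat_id === "string";
}

// A strictly typed Chat is accepted where `unknown` is expected. With the field
// typed { [k: string]: unknown }, passing `chat` here would fail to compile.
function buildChatUpdated(chat: Chat): ChatUpdatedFrame {
  return { type: "chat_updated", chat_id: chat.id, chat };
}
```

A misspelled frame type is still rejected, while a malformed nested payload slips through; that is the gap left for follow-up work.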
Schema audit:
grep -rn "broker\.publish(\|broker\.publishUser(" apps/server/src \
--include="*.ts" | grep -v "broker.ts\|__tests__\|.bak"
→ 0 results. Every server publish goes through publishFrame /
publishUserFrame. The remaining ctx.publish / ctx.publishUser sites
in services/inference/* + services/auto_name.ts route through the
index.ts adapter, which calls publishFrame internally.
Tests: 219/219 pass (unchanged from v1.13.11-a; the Phase B conversion
is mechanical and doesn't add test cases).
Smoke: clean container boot, no ws-frame-validation-failed entries
under normal traffic. Sidebar list refresh + agent picker open both
pass through useUserEvents without drops.
~70 LoC across 7 files. v1.13.11 closed.
Opencode pattern (session/overflow.ts): fire compaction at 85% of
ctx_max, replacing the v1.11.0-era `ctx_max - 20_000` formula.
Old formula: usable = ctx_max - 20_000
- ctx=262144 → trigger at 242144 (92.4%) — only 7.6% headroom
- ctx=100000 → trigger at 80000 (80.0%)
- ctx= 32000 → trigger at 12000 (37.5%) — over-eager
- ctx<=20000 → trigger at 0 — never fires
New formula: usable = floor(0.85 * ctx_max)
- ctx=262144 → trigger at 222822 (85.0%) — 15% headroom for summarizer
- ctx=100000 → trigger at 85000 (85.0%)
- ctx= 32000 → trigger at 27200 (85.0%)
- ctx= 8192 → trigger at 6963 (85.0%)
The ratio keeps headroom proportional at every context scale. The qwen3.6
daily driver (ctx=262144) now triggers 19,322 tokens earlier
(242144 → 222822), leaving that much more room for the summarizer before
overflow; small-ctx models no longer degenerate to never triggering.
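Both formulas can be written side by side to reproduce the trigger points listed above. This is a sketch, not compaction.ts itself: the clamp to 0 in `usableOld` and the zero/negative guard in `usable` are assumptions about how the corner cases are handled.

```typescript
// Old (v1.11.0-era) trigger: a fixed 20k-token buffer below ctx_max.
// The clamp to 0 is assumed; the commit only says ctx<=20000 never fires.
const COMPACTION_BUFFER = 20_000;
function usableOld(ctxMax: number): number {
  return Math.max(0, ctxMax - COMPACTION_BUFFER);
}

// New trigger: a fixed fraction of ctx_max.
const EARLY_TRIGGER_RATIO = 0.85;
function usable(ctxMax: number): number {
  // Assumed guard for the zero / negative (and NaN) corner case.
  if (!(ctxMax > 0)) return 0;
  return Math.floor(EARLY_TRIGGER_RATIO * ctxMax);
}

// Print old vs new trigger points (and % of ctx_max) for the listed context sizes.
for (const ctx of [262_144, 100_000, 32_000, 8_192]) {
  const oldT = usableOld(ctx);
  const newT = usable(ctx);
  console.log(ctx, oldT, ((100 * oldT) / ctx).toFixed(1), newT, ((100 * newT) / ctx).toFixed(1));
}
```

The fixed buffer is a large fraction of small windows and a small fraction of large ones; the ratio removes that scale dependence.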
usable() is the only consumer of COMPACTION_BUFFER → constant deleted.
New EARLY_TRIGGER_RATIO constant takes its place.
isOverflow() and the maybeFlagForCompaction() call site at
payload.ts:184 are unchanged — formula swap is internal to compaction.ts.
payload.ts comment touched only to drop the stale COMPACTION_BUFFER
reference (PRUNE_TRIGGER_TOKENS stays at 20k as the prune-freed
threshold; independent of the overflow formula).
Tests: 4 new usable() corner cases (262k/100k/8k/zero+negative), plus
5 isOverflow() numbers shifted to match the 85k budget at ctx=100k.
195/195 server tests pass (was 194).
Smoke: ratio math verified by unit tests at all four corners. Live
cap-hit verification deferred — requires accumulating >222k tokens
in a session under qwen3.6-35b-a3b-mxfp4 (was >242k pre-fix); will
surface organically in extended use.
Audit traced compaction's summary path post-v1.13.1-B read flip:
- Q1: reads from messages_with_parts (view) — clean
- Q2: parts shape correctly threaded through buildHeadPayload — clean
- Q3: reasoning omitted from summary input — FIX NEEDED
v1.13.1-C wired reasoning end-to-end into inference/payload.ts but
missed this read site. Summarizer model couldn't see the reasoning
trail for tool-bearing turns, quietly degrading summary quality for
reasoning-channel models (qwen3.6).
Fix:
- CompactionMessage extended with reasoning_parts field
- SELECT pulls reasoning_parts from messages_with_parts
- buildHeadPayload (now exported for tests) prefixes assistant content
with <reasoning>...</reasoning>\n\n<content>... when reasoning is
present; standalone <reasoning>...</reasoning> for tool-call-only
turns; omits the tag when reasoning is null or empty
4 new render branch tests (190 total).
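The three render branches can be sketched as a pure function. This is not the real buildHeadPayload: the `renderAssistantForSummary` name, the `Msg` shape, and the assumption that `reasoning_parts` arrives as an array of text strings joined without a separator are illustrative only.

```typescript
// Hypothetical message shape; the real CompactionMessage carries more fields.
interface Msg {
  role: "user" | "assistant" | "tool";
  content: string | null;
  reasoning_parts: string[] | null;
}

// Renders one message for the summarizer input:
//  - assistant, reasoning + content   -> "<reasoning>R</reasoning>\n\nC"
//  - assistant, reasoning, no content -> "<reasoning>R</reasoning>" (tool-call-only turn)
//  - reasoning null or empty          -> content unchanged (tag omitted)
function renderAssistantForSummary(m: Msg): string {
  const content = m.content ?? "";
  const reasoning = (m.reasoning_parts ?? []).join("");
  if (m.role !== "assistant" || reasoning === "") return content;
  const tagged = `<reasoning>${reasoning}</reasoning>`;
  return content === "" ? tagged : `${tagged}\n\n${content}`;
}
```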
Smoke deferred: forcing real compaction requires either threshold
pollution or building up a >40k-token chat with reasoning_parts.
Render branches are unit-covered; integration would only re-prove
structural correctness.
- schema.sql: new messages_with_parts view. tool_calls aggregates parts
with kind='tool_call' as a jsonb array of {id, name, args}; tool_results
picks the single sequence=0 part with kind='tool_result' as a jsonb
{tool_call_id, output, truncated, error?}. COALESCE against the legacy
jsonb columns means pre-v1.13.0 history (no parts rows) still reads
correctly via the fallback, and fresh inserts (where parts dual-write
follows the row INSERT) hit the legacy columns until the parts land.
- reasoning_parts column added to the view but not yet selected by any
caller; v1.13.1-C extends the Message type and pulls reasoning into the
model payload.
- Read sites switched to FROM messages_with_parts:
  - routes/chats.ts:427 (chat history GET)
  - routes/messages.ts:95 (session history GET)
  - routes/ws.ts:27 (WS snapshot on session connect, resume path)
  - services/inference/payload.ts (loadContext for model assembly)
  - services/compaction.ts (compaction's payload assembly)
- chats.ts:394 (discard_stale UPDATE RETURNING) unchanged — UPDATEs target
messages directly and the returned shape is for a freshly-modified row
where the legacy column is dual-written and correct.
- messages.ts:478/549 (ask_user_input correlation) intentionally not
migrated — those query a different shape, ported in v1.13.1-C.
- Writes still target `messages` directly; the view is read-only.
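The view's COALESCE fallback for tool_calls and tool_results can be modelled in TypeScript to make the read-side semantics concrete. This is a behavioural sketch, not the SQL in schema.sql: the `PartRow` shape (`kind`, `sequence`, `payload`), the `readToolColumns` helper, and ordering tool_call parts by `sequence` are assumptions.

```typescript
// Hypothetical message_parts row; real column names may differ.
interface PartRow {
  kind: "text" | "reasoning" | "tool_call" | "tool_result";
  sequence: number;
  payload: Record<string, unknown>;
}

// The legacy jsonb columns on the messages row.
interface LegacyColumns {
  tool_calls: unknown[] | null;
  tool_results: Record<string, unknown> | null;
}

// Mirrors COALESCE(<aggregate over parts>, <legacy jsonb column>):
// the parts-derived value wins when it exists, otherwise the legacy column is used.
function readToolColumns(parts: PartRow[], legacy: LegacyColumns) {
  // Aggregate tool_call parts into [{id, name, args}]. Like a SQL aggregate
  // over zero rows, an empty set yields null so the fallback kicks in.
  const callParts = parts
    .filter((p) => p.kind === "tool_call")
    .sort((a, b) => a.sequence - b.sequence);
  const aggregatedCalls =
    callParts.length > 0
      ? callParts.map((p) => ({ id: p.payload.id, name: p.payload.name, args: p.payload.args }))
      : null;

  // Pick the single sequence=0 tool_result part, if one exists.
  const resultPart = parts.find((p) => p.kind === "tool_result" && p.sequence === 0);
  const aggregatedResult = resultPart ? { ...resultPart.payload } : null;

  return {
    tool_calls: aggregatedCalls ?? legacy.tool_calls,
    tool_results: aggregatedResult ?? legacy.tool_results,
  };
}
```

A pre-v1.13.0 row (no parts) falls through to the legacy columns, and a fresh row read before its parts land does the same, which is why the dual-write window stays consistent.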
Smoke verified against the live container:
- Equivalence: 5/5 messages with both legacy column and parts row return
identical tool_calls jsonb between FROM messages and FROM messages_with_parts.
- Perf: EXPLAIN ANALYZE on the 42-message stress chat returns in ~1ms
(50ms threshold). Bitmap Index Scan on message_parts_msg_seq_idx
carries the parts lookups.
- API contract: GET /api/chats/:id/messages returns identical
{id, name, args} tool_calls and {tool_call_id, output, truncated, error}
tool_results shapes to frontend consumers — no UI changes needed.
- Inference path: sent a view_file prompt; assistant turn 1 emitted the
tool_call, tool message captured the result, follow-up assistant turn
read the result back via loadContext (now view-backed) and answered
correctly. End-to-end loop intact.
v1.13.2 drops the dual-write + the JSON columns + simplifies the view
to just SELECT FROM message_parts.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- llama-server does not emit n_ctx in timings (confirmed empirically), so
the n_ctx reads at inference.ts:479 and compaction.ts:300 were dead code
that never fired
- New model-context.ts: cached fetch of /upstream/<model>/props
with positive-cache (no TTL) and 60s negative-cache
- Wired into all 4 ctx_max write sites: 3 in inference.ts
(executeToolPhase, finalizeCompletion, runCapHitSummary) and
1 in compaction.ts (summary row INSERT)
- AbortController 3s timeout, lenient parsing with sensible defaults
- 12 new vitest cases for the cache module (59 total)
- 7 historical assistant rows backfilled manually (see notes)
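The caching behaviour described above (positive cache with no TTL, 60s negative cache, 3s abort) can be sketched like this. It is not model-context.ts: `createModelContextCache`, `getCtxMax`, `parseNCtx`, and `FetchLike` are hypothetical names, and the accepted response shapes (top-level `n_ctx` or `default_generation_settings.n_ctx`, the latter being where llama-server's /props reports it) are assumptions. Where the real module falls back to defaults, this sketch returns null. The fetch function is injected so the cache can be exercised without a network.

```typescript
// Minimal fetch-like signature so the cache can be tested without a network.
type FetchLike = (
  url: string,
  init: { signal: AbortSignal },
) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

const FETCH_TIMEOUT_MS = 3_000;
const NEGATIVE_TTL_MS = 60_000;

// Lenient extraction: accept a positive integer n_ctx from either assumed location.
function parseNCtx(body: unknown): number | null {
  if (typeof body !== "object" || body === null) return null;
  const b = body as { n_ctx?: unknown; default_generation_settings?: { n_ctx?: unknown } };
  const n = b.n_ctx ?? b.default_generation_settings?.n_ctx;
  return typeof n === "number" && Number.isInteger(n) && n > 0 ? n : null;
}

function createModelContextCache(
  fetchImpl: FetchLike,
  baseUrl: string,
  now: () => number = Date.now,
) {
  const positive = new Map<string, number>(); // model -> n_ctx, never expires
  const negative = new Map<string, number>(); // model -> earliest retry time (ms)

  return async function getCtxMax(model: string): Promise<number | null> {
    const cached = positive.get(model);
    if (cached !== undefined) return cached;
    const retryAt = negative.get(model);
    if (retryAt !== undefined && now() < retryAt) return null;

    // Abort the request if the upstream takes longer than 3s.
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), FETCH_TIMEOUT_MS);
    try {
      const res = await fetchImpl(`${baseUrl}/upstream/${model}/props`, { signal: controller.signal });
      const nCtx = res.ok ? parseNCtx(await res.json()) : null;
      if (nCtx === null) {
        negative.set(model, now() + NEGATIVE_TTL_MS);
        return null;
      }
      positive.set(model, nCtx);
      negative.delete(model);
      return nCtx;
    } catch {
      // Network error, timeout abort, or bad JSON: cache the miss briefly.
      negative.set(model, now() + NEGATIVE_TTL_MS);
      return null;
    } finally {
      clearTimeout(timer);
    }
  };
}
```

Caching hits forever is reasonable because a loaded model's context size does not change without a reload; the short negative TTL keeps a cold or restarting upstream from being hammered on every ctx_max write.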