v1.13.3: cleanup bundle — statement timeout + alpha ordering + stuck-row sweeper + repairToolCall

Four independent items, all owed from prior dispatches.

- statement_timeout at the database level via:
    ALTER DATABASE boocode SET statement_timeout = '30s';
  Applied operationally; documented as a comment at the top of schema.sql
  (ALTER DATABASE can't run inside a DO block, so applySchema can't
  apply it idempotently). Re-apply after a volume reset.

- Tool registry alpha-sorted at module load. llama.cpp's prompt cache
  hits on byte-identical prefixes; any reordering of the tool list near
  the top of the system prompt would invalidate every cached turn.
  Single-source sort at the ALL_TOOLS export so toolJsonSchemas() and
  TOOLS_BY_NAME inherit the order automatically. New tools.test.ts
  asserts the invariant; total tests 173 (was 172).

- Periodic in-process stuck-row sweeper. Runs every 60s, marks
  'streaming' rows older than 5 minutes as 'failed', and publishes
  chat_status='idle' on the user channel so the UI dot drops without a
  refresh. Closes the mid-session crash UX gap; the v1.12.1 boot sweep
  only fires once at startup, so sessions used to stay stuck until next
  reboot. setInterval cleaned up via app.addHook('onClose'). Mirrors
  handleAbortOrError's publish pattern.

- experimental_repairToolCall wired through AI SDK v6 streamText. Pass-
  through implementation: log + return the original toolCall so the
  stream keeps going. executeToolPhase's existing error paths (unknown
  tool name → 'unknown tool: X' result; zod-reject → 'tool X rejected
  — field: required') already surface bad calls to the model; the value
  here is preventing the AI SDK from THROWING on parse errors and
  killing the whole stream. Owed since v1.13.1-A.
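
Illustrative sketch of the single-source sort from the tool registry item
above. This is not the registry code from this commit. ALL_TOOLS,
TOOLS_BY_NAME and toolJsonSchemas() are names taken from the message; the
ToolDef shape, DECLARED_TOOLS, byName, isAlphaSorted, and the read_file and
grep entries are hypothetical and exist only to make the example runnable.

```typescript
// Hypothetical tool shape; the real registry's fields are not shown here.
interface ToolDef {
  name: string;
  description: string;
}

// Declaration order as tools might accumulate over time (unsorted).
const DECLARED_TOOLS: ToolDef[] = [
  { name: "read_file", description: "Read a file" },
  { name: "list_dir", description: "List a directory" },
  { name: "grep", description: "Search file contents" },
];

// Plain code-unit comparison, so the order does not depend on host locale.
const byName = (a: ToolDef, b: ToolDef): number =>
  a.name < b.name ? -1 : a.name > b.name ? 1 : 0;

// Sort once, on a copy; everything derived below inherits this order.
const ALL_TOOLS: readonly ToolDef[] = [...DECLARED_TOOLS].sort(byName);

const TOOLS_BY_NAME: ReadonlyMap<string, ToolDef> = new Map(
  ALL_TOOLS.map((t) => [t.name, t] as const),
);

function toolJsonSchemas(): { name: string; description: string }[] {
  return ALL_TOOLS.map((t) => ({ name: t.name, description: t.description }));
}

// The invariant a test could assert: names are in non-decreasing order.
function isAlphaSorted(tools: readonly ToolDef[]): boolean {
  for (let i = 1; i < tools.length; i++) {
    const prev = tools[i - 1];
    const cur = tools[i];
    if (prev === undefined || cur === undefined || prev.name > cur.name) {
      return false;
    }
  }
  return true;
}
```

Comparing with < and > instead of localeCompare is a choice made for this
sketch: the stated goal is a byte-stable prompt prefix, and a locale-aware
sort could order names differently on another host.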

Smoke verified:
- statement_timeout = '30s' confirmed via SHOW.
- Tool path normal flow intact (list_dir prompt → tool_call → result
  → final assistant). No malformed tool calls in the test run; repair
  log will surface them when qwen3.6 actually emits one.
- Alpha order verified at runtime via the dist bundle: match: true.
- Sweeper logic not traffic-tested (no stuck rows to find), but the
  SQL UPDATE + broker.publishUser pattern is identical to
  handleAbortOrError and the boot sweep, so it is verified by code
  inspection only.
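
Since the sweeper was not traffic-tested, one way to cover its publish
step without a database would be to pull the per-chat dedupe out into a
pure function. A minimal sketch, assuming the row and event shapes seen in
the diff; idleEventsFor, SweptRow, ChatStatusEvent and the recording
publishUser stand-in are hypothetical names and not part of this commit.

```typescript
// Row shape returned by the sweeper's UPDATE ... RETURNING id, chat_id.
interface SweptRow {
  id: string;
  chat_id: string;
}

// Event shape mirrored from the publish call in the diff.
interface ChatStatusEvent {
  type: "chat_status";
  chat_id: string;
  status: "idle";
  at: string;
}

// Build one idle event per distinct chat, preserving first-seen order.
function idleEventsFor(rows: readonly SweptRow[], at: string): ChatStatusEvent[] {
  const seenChats = new Set<string>();
  const events: ChatStatusEvent[] = [];
  for (const row of rows) {
    if (seenChats.has(row.chat_id)) continue;
    seenChats.add(row.chat_id);
    events.push({ type: "chat_status", chat_id: row.chat_id, status: "idle", at });
  }
  return events;
}

// Stand-in for broker.publishUser that only records what it was given.
const published: ChatStatusEvent[] = [];
const publishUser = (_user: string, event: ChatStatusEvent): void => {
  published.push(event);
};

// Two swept messages in chat c1 and one in c2 yield two events, not three.
const sampleRows: SweptRow[] = [
  { id: "m1", chat_id: "c1" },
  { id: "m2", chat_id: "c1" },
  { id: "m3", chat_id: "c2" },
];
for (const event of idleEventsFor(sampleRows, new Date().toISOString())) {
  publishUser("default", event);
}
```

This only exercises the dedupe and event construction; the SQL UPDATE
itself would still need a stuck row in a real database to be tested.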

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 06:46:03 +00:00
parent ac1a71f583
commit a08d809b73
5 changed files with 102 additions and 3 deletions

@@ -201,6 +201,46 @@ async function main() {
    app.log.info(`serving static frontend from ${webDist}`);
  }
  // v1.13.3: periodic in-process sweeper for streaming rows orphaned by a
  // mid-session crash. The boot sweep (above) only fires once at startup;
  // this loop catches the in-flight case. 60s cadence + 5-min threshold
  // matches the boot sweep so behavior is consistent. Publishes
  // chat_status='idle' on the user channel so the UI dot drops without a
  // refresh; same pattern as handleAbortOrError.
  const SWEEP_INTERVAL_MS = 60_000;
  const sweepStaleStreaming = async (): Promise<void> => {
    try {
      const rows = await sql<{ id: string; chat_id: string }[]>`
        UPDATE messages
           SET status = 'failed', finished_at = clock_timestamp()
         WHERE status = 'streaming'
           AND created_at < NOW() - INTERVAL '5 minutes'
        RETURNING id, chat_id
      `;
      if (rows.length === 0) return;
      app.log.warn(
        { swept: rows.length, ids: rows.map((r) => r.id) },
        'swept stale streaming rows',
      );
      const seenChats = new Set<string>();
      const now = new Date().toISOString();
      for (const row of rows) {
        if (seenChats.has(row.chat_id)) continue;
        seenChats.add(row.chat_id);
        broker.publishUser('default', {
          type: 'chat_status',
          chat_id: row.chat_id,
          status: 'idle',
          at: now,
        });
      }
    } catch (err) {
      app.log.error({ err }, 'stuck-row sweeper failed');
    }
  };
  const sweepTimer = setInterval(() => { void sweepStaleStreaming(); }, SWEEP_INTERVAL_MS);
  app.addHook('onClose', async () => { clearInterval(sweepTimer); });
  const shutdown = async (signal: string) => {
    app.log.info(`received ${signal}, shutting down`);
    try {