Files
boocode/apps/server/src/services/inference/provider.ts
indifferentketchup ff29b48e3a v1.13.7: stability bundle — usage capture + payload/UI sanitization
Five fixes for latent regressions surfaced during the v1.13.x.cosmetic
revert investigation. None alter schema or compaction; all cleanup
against the v1.13.1-A AI SDK migration's hidden surface.

(1) provider.ts — includeUsage: true on createOpenAICompatible.
@ai-sdk/openai-compatible defaults this false, omitting
stream_options.include_usage from the request body; llama-swap never
emitted the usage block, so result.usage.inputTokens/outputTokens
resolved undefined and tokens_used / ctx_used landed NULL in every
assistant row since v1.13.1-A. No historical backfill.

(2) MessageList.tsx — hasText = m.content.trim().length > 0.
AI SDK v6 streaming occasionally emits a leading "\n" text-delta on
tool-call-only turns; the literal newline passed length > 0 and
rendered an empty bubble + ActionRow between every tool call. Trim
catches it without changing semantics for genuine content.

(3) MessageBubble.tsx — same trim on hasContent for the no-tool-calls
path. Defensive symmetry with MessageList.flatten.

(4) payload.ts — buildMessagesPayload skips assistant rows with
status='failed' AND assistant rows with status='complete' + empty
content + no tool_calls. Without this, a trailing empty/failed
assistant + the next attempt's placeholder produced "Cannot have 2
or more assistant messages at the end of the list" rejections from
the OpenAI-compatible upstream after cap-hit + Continue.

(5) budget.ts — BUDGET_NO_AGENT 15 → 30. Every tool in ALL_TOOLS is
read-only today; the 15-cap was forward-looking for write tools that
haven't landed. No-agent mode now matches BUDGET_READ_ONLY.

47 LoC across 5 files. 190/190 server tests pass.

Verified live: new assistant turns populate StatsLine token data;
single-tool-call turns no longer render the stray empty-bubble +
ActionRow between tool calls; Continue after cap-hit no longer hits
the trailing-assistant API rejection.
2026-05-22 13:24:19 +00:00

35 lines
1.5 KiB
TypeScript

import { createOpenAICompatible } from '@ai-sdk/openai-compatible';
import type { LanguageModel } from 'ai';
// v1.13.1-A: AI SDK provider against llama-swap. baseURL is threaded from
// config.LLAMA_SWAP_URL at call time (not module-load) so tests can stub the
// upstream without touching env vars. No apiKey — llama-swap is unauth in our
// Tailscale topology and exposing it over the public internet is gated by
// Authelia at the Caddy layer, not by API keys.
const cache = new Map<string, ReturnType<typeof createOpenAICompatible>>();
function getProvider(baseURL: string): ReturnType<typeof createOpenAICompatible> {
let provider = cache.get(baseURL);
if (!provider) {
provider = createOpenAICompatible({
name: 'llama-swap',
baseURL: baseURL.endsWith('/v1') ? baseURL : `${baseURL}/v1`,
// v1.13.7: @ai-sdk/openai-compatible defaults includeUsage=false, which
// omits `stream_options.include_usage` from the request body. Without
// it, llama.cpp / llama-swap never emits the trailing usage block, so
// `result.usage` resolves with inputTokens=outputTokens=undefined and
// tokens_used / ctx_used land as NULL in every messages row. Setting
// true here re-enables the per-stream usage payload across all models
// served via the llama-swap provider.
includeUsage: true,
});
cache.set(baseURL, provider);
}
return provider;
}
export function upstreamModel(baseURL: string, modelId: string): LanguageModel {
return getProvider(baseURL).chatModel(modelId);
}