Five fixes for latent regressions surfaced during the v1.13.x.cosmetic revert investigation. None alter schema or compaction; all are cleanup against the hidden surface of the v1.13.1-A AI SDK migration.

(1) provider.ts: set includeUsage: true on createOpenAICompatible. @ai-sdk/openai-compatible defaults this to false, which omits stream_options.include_usage from the request body. llama-swap therefore never emitted the usage block, so result.usage.inputTokens/outputTokens resolved undefined and tokens_used / ctx_used landed NULL in every assistant row since v1.13.1-A. No historical backfill.

(2) MessageList.tsx: hasText = m.content.trim().length > 0. AI SDK v6 streaming occasionally emits a leading "\n" text-delta on tool-call-only turns. The literal newline passed the length > 0 check and rendered an empty bubble + ActionRow between every tool call. Trimming catches it without changing semantics for genuine content.

(3) MessageBubble.tsx: same trim on hasContent for the no-tool-calls path. Defensive symmetry with MessageList.flatten.

(4) payload.ts: buildMessagesPayload skips assistant rows with status='failed' AND assistant rows with status='complete' + empty content + no tool_calls. Without this, a trailing empty/failed assistant row plus the next attempt's placeholder produced "Cannot have 2 or more assistant messages at the end of the list" rejections from the OpenAI-compatible upstream after cap-hit + Continue.

(5) budget.ts: BUDGET_NO_AGENT 15 → 30. Every tool in ALL_TOOLS is read-only today; the 15-cap was forward-looking for write tools that haven't landed. No-agent mode now matches BUDGET_READ_ONLY.

47 LoC across 5 files. 190/190 server tests pass. Verified live: new assistant turns populate StatsLine token data; single-tool-call turns no longer render the stray empty bubble + ActionRow between tool calls; Continue after cap-hit no longer hits the trailing-assistant API rejection.
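The row filter described in fix (4) can be sketched roughly as follows. This is a minimal illustration, not the actual payload.ts code: the `MessageRow` shape, the `toolCalls` field, and the helper names `isDeadAssistantRow` and `filterPayloadRows` are all hypothetical, and treating whitespace-only content as empty is an assumption borrowed from fixes (2) and (3).

```typescript
// Hypothetical row shape; the real messages schema is not shown in this commit.
type Role = 'system' | 'user' | 'assistant' | 'tool';
type Status = 'complete' | 'failed' | 'streaming';

interface MessageRow {
  role: Role;
  status: Status;
  content: string;
  toolCalls: unknown[]; // stands in for the tool_calls column
}

// True when an assistant row carries nothing the upstream can use:
// either it failed, or it completed with no text and no tool calls.
function isDeadAssistantRow(row: MessageRow): boolean {
  if (row.role !== 'assistant') return false;
  if (row.status === 'failed') return true;
  return (
    row.status === 'complete' &&
    row.content.trim().length === 0 && // assumption: whitespace-only counts as empty
    row.toolCalls.length === 0
  );
}

// Drop dead assistant rows so the payload never ends with two
// consecutive assistant messages after a failed or empty attempt.
function filterPayloadRows(rows: MessageRow[]): MessageRow[] {
  return rows.filter((row) => !isDeadAssistantRow(row));
}

const history: MessageRow[] = [
  { role: 'user', status: 'complete', content: 'list the files', toolCalls: [] },
  { role: 'assistant', status: 'failed', content: 'partial', toolCalls: [] },
  { role: 'assistant', status: 'complete', content: '', toolCalls: [] },
  { role: 'assistant', status: 'complete', content: '', toolCalls: [{ name: 'ls' }] },
];

const kept = filterPayloadRows(history);
```

In this sketch the failed row and the empty completed row are dropped, while the assistant row that has a tool call but no text is kept, since tool-call-only turns are legitimate history.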
35 lines
1.5 KiB
TypeScript
import { createOpenAICompatible } from '@ai-sdk/openai-compatible';
import type { LanguageModel } from 'ai';

// v1.13.1-A: AI SDK provider against llama-swap. baseURL is threaded from
// config.LLAMA_SWAP_URL at call time (not module-load) so tests can stub the
// upstream without touching env vars. No apiKey: llama-swap is unauth in our
// Tailscale topology and exposing it over the public internet is gated by
// Authelia at the Caddy layer, not by API keys.

const cache = new Map<string, ReturnType<typeof createOpenAICompatible>>();

function getProvider(baseURL: string): ReturnType<typeof createOpenAICompatible> {
  let provider = cache.get(baseURL);
  if (!provider) {
    provider = createOpenAICompatible({
      name: 'llama-swap',
      baseURL: baseURL.endsWith('/v1') ? baseURL : `${baseURL}/v1`,
      // v1.13.7: @ai-sdk/openai-compatible defaults includeUsage=false, which
      // omits `stream_options.include_usage` from the request body. Without
      // it, llama.cpp / llama-swap never emits the trailing usage block, so
      // `result.usage` resolves with inputTokens=outputTokens=undefined and
      // tokens_used / ctx_used land as NULL in every messages row. Setting
      // true here re-enables the per-stream usage payload across all models
      // served via the llama-swap provider.
      includeUsage: true,
    });
    cache.set(baseURL, provider);
  }
  return provider;
}

export function upstreamModel(baseURL: string, modelId: string): LanguageModel {
  return getProvider(baseURL).chatModel(modelId);
}
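
The per-baseURL caching and `/v1` suffix handling in getProvider can be exercised in isolation with a stand-in object. This is only a sketch: `FakeProvider`, `normalizeBaseURL`, `fakeCache`, and `getFakeProvider` are hypothetical names introduced here so the behavior can be checked without the SDK installed, and the example URLs are placeholders.

```typescript
// Hypothetical stand-in for the object returned by createOpenAICompatible.
interface FakeProvider {
  baseURL: string;
}

// Same suffix rule as in getProvider: append /v1 unless already present.
function normalizeBaseURL(baseURL: string): string {
  return baseURL.endsWith('/v1') ? baseURL : `${baseURL}/v1`;
}

// Keyed by the raw (un-normalized) baseURL, as in the file above.
const fakeCache = new Map<string, FakeProvider>();

function getFakeProvider(baseURL: string): FakeProvider {
  let provider = fakeCache.get(baseURL);
  if (!provider) {
    provider = { baseURL: normalizeBaseURL(baseURL) };
    fakeCache.set(baseURL, provider);
  }
  return provider;
}

const first = getFakeProvider('http://llama-swap.example:8080');
const second = getFakeProvider('http://llama-swap.example:8080');
const withSuffix = getFakeProvider('http://llama-swap.example:8080/v1');
```

Two things follow from this shape. Repeated calls with the same string return the same instance, which is what lets tests stub the upstream per URL. Because the cache key is the raw string, `http://…:8080` and `http://…:8080/v1` yield two separate instances that point at the same normalized URL, and a trailing slash after `/v1` would not match the `endsWith` check. Whether either case matters depends on how config.LLAMA_SWAP_URL is set in practice.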