- Add ai@^6 and @ai-sdk/openai-compatible@^2 to apps/server.
- New services/inference/provider.ts: createOpenAICompatible against
llama-swap (baseURL threaded from config.LLAMA_SWAP_URL, cached per
baseURL). No apiKey — Authelia + Tailscale gate llama-swap, not keys.
- streamCompletion rewritten as an adapter over streamText. AI SDK
fullStream parts (text-delta, tool-call, finish, error) map back to
the legacy {content?, tool_calls?, finishReason} StreamResult shape
that executeStreamPhase already consumes. No layer above
streamCompletion changes.
- toModelMessages converts BooCode's OpenAI-shaped history to AI SDK
ModelMessage[]; tool messages need a toolName, which we look up by
scanning earlier assistant tool_calls for the matching id.
- buildAiTools wraps BooCode's JSON-schema tool defs via
tool({ inputSchema: jsonSchema(parameters) }) with NO execute —
BooCode dispatches tools in tool-phase.ts, not the AI SDK loop.
- XML fallback parser preserved as-is — qwen3.6 still emits XML tool
calls in text content that the structured tool-call layer misses.
- reasoning-delta parts are dropped and tallied in a debug-level
counter; they will be captured properly in v1.13.1-C.
- Abort path: streamText({ abortSignal }) wires ctx.signal through, but
AI SDK v6 swallows the abort (the fullStream iterator exits cleanly
rather than throwing). Added a post-iteration check,
`if (signal?.aborted) throw`, so handleAbortOrError owns the row and
writes status='cancelled'. Caught by smoke D; without it, a stop would
have shipped as status='complete'.
- Usage frame reads result.usage (v6 field names inputTokens /
outputTokens) AFTER the stream drains, then does a single trailing
publish through the existing 500ms throttle. Known regression:
ChatThroughput's live mid-stream tick (v1.12.2) is gone; it now shows a
single value at stream end.
TODO(v1.13.1-followup): interpolate outputTokens during streaming
via a delta-cadence counter (e.g. part.text.length/4 token proxy)
and publish every 500ms; reconcile against result.usage at finish.
- Write-path dual-write from v1.13.0 unaffected.
Read path stays on JSON columns. v1.13.1-B flips reads to message_parts.
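The adapter bullet describes folding AI SDK fullStream parts back into the legacy {content?, tool_calls?, finishReason} shape. A minimal sketch of that mapping as a pure reducer follows, assuming simplified local types that mirror the v6 part shapes (`text-delta` with `text`, `tool-call` with `toolCallId`/`toolName`/`input`, `finish` with `finishReason`, `error` with `error`). `StreamPart`, `StreamResultSketch`, `foldPart` and the `reasoningDeltasDropped` field (standing in for the debug-level counter) are hypothetical names, not BooCode's code, and any renaming of finish reasons between AI SDK and OpenAI conventions is left out.

```typescript
// Hypothetical, simplified mirror of the fullStream parts this adapter cares about.
type StreamPart =
  | { type: 'text-delta'; text: string }
  | { type: 'reasoning-delta'; text: string }
  | { type: 'tool-call'; toolCallId: string; toolName: string; input: unknown }
  | { type: 'finish'; finishReason: string }
  | { type: 'error'; error: unknown };

// Assumed OpenAI-shaped tool call, as the legacy consumer expects it.
interface LegacyToolCall {
  id: string;
  type: 'function';
  function: { name: string; arguments: string };
}

interface StreamResultSketch {
  content?: string;
  tool_calls?: LegacyToolCall[];
  finishReason: string | null;
  reasoningDeltasDropped: number; // stands in for the debug-level counter
}

const emptyResult = (): StreamResultSketch => ({ finishReason: null, reasoningDeltasDropped: 0 });

function foldPart(acc: StreamResultSketch, part: StreamPart): StreamResultSketch {
  switch (part.type) {
    case 'text-delta':
      return { ...acc, content: (acc.content ?? '') + part.text };
    case 'reasoning-delta':
      // Reasoning text is discarded; only the count is kept.
      return { ...acc, reasoningDeltasDropped: acc.reasoningDeltasDropped + 1 };
    case 'tool-call':
      return {
        ...acc,
        tool_calls: [
          ...(acc.tool_calls ?? []),
          {
            id: part.toolCallId,
            type: 'function',
            // OpenAI-style tool calls carry their arguments as a JSON string.
            function: { name: part.toolName, arguments: JSON.stringify(part.input) },
          },
        ],
      };
    case 'finish':
      return { ...acc, finishReason: part.finishReason };
    case 'error':
      // Rethrow so the caller's existing error handling sees the failure.
      throw part.error instanceof Error ? part.error : new Error(String(part.error));
    default:
      // Other part types in the real stream are ignored in this sketch.
      return acc;
  }
}
```

Keeping the fold pure means the async loop over `result.fullStream` only has to thread the accumulator through, and the mapping can be unit-tested without a live model.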
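The toModelMessages bullet notes that AI SDK tool messages need a toolName that OpenAI-shaped tool messages lack. A minimal sketch of recovering it, assuming simplified local types modeled on OpenAI chat messages and on AI SDK ModelMessage parts (`tool-call` with `input`, `tool-result` with `toolName` and an `output` of `{ type: 'text', value }`); check these shapes against the installed `ai` types. `toModelMessagesSketch` is a hypothetical name, and throwing on an orphaned tool result is this sketch's choice, not necessarily BooCode's.

```typescript
// Assumed OpenAI-shaped history.
interface OpenAIToolCall {
  id: string;
  type: 'function';
  function: { name: string; arguments: string };
}

type OpenAIMessage =
  | { role: 'system'; content: string }
  | { role: 'user'; content: string }
  | { role: 'assistant'; content: string | null; tool_calls?: OpenAIToolCall[] }
  | { role: 'tool'; tool_call_id: string; content: string };

// Simplified mirror of AI SDK ModelMessage parts (assumed shapes).
type ModelPart =
  | { type: 'text'; text: string }
  | { type: 'tool-call'; toolCallId: string; toolName: string; input: unknown }
  | { type: 'tool-result'; toolCallId: string; toolName: string; output: { type: 'text'; value: string } };

type ModelMessageSketch =
  | { role: 'system'; content: string }
  | { role: 'user'; content: string }
  | { role: 'assistant'; content: ModelPart[] }
  | { role: 'tool'; content: ModelPart[] };

function safeParse(json: string): unknown {
  try {
    return JSON.parse(json);
  } catch {
    return json; // keep malformed arguments as a raw string
  }
}

function toModelMessagesSketch(history: OpenAIMessage[]): ModelMessageSketch[] {
  // tool_call_id -> tool name, filled while walking forward, so every tool
  // message can look up the assistant tool_call that preceded it.
  const toolNames = new Map<string, string>();
  const out: ModelMessageSketch[] = [];
  for (const msg of history) {
    switch (msg.role) {
      case 'system':
      case 'user':
        out.push(msg);
        break;
      case 'assistant': {
        const parts: ModelPart[] = [];
        if (msg.content) parts.push({ type: 'text', text: msg.content });
        for (const call of msg.tool_calls ?? []) {
          toolNames.set(call.id, call.function.name);
          parts.push({
            type: 'tool-call',
            toolCallId: call.id,
            toolName: call.function.name,
            input: safeParse(call.function.arguments),
          });
        }
        out.push({ role: 'assistant', content: parts });
        break;
      }
      case 'tool': {
        const toolName = toolNames.get(msg.tool_call_id);
        if (!toolName) throw new Error(`no earlier tool_call with id ${msg.tool_call_id}`);
        out.push({
          role: 'tool',
          content: [{ type: 'tool-result', toolCallId: msg.tool_call_id, toolName, output: { type: 'text', value: msg.content } }],
        });
        break;
      }
    }
  }
  return out;
}
```

A forward-filled map gives the same answer as scanning earlier assistant messages per tool row, in a single pass.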
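The format the preserved XML fallback parser targets is not stated here. As a minimal sketch, assuming Hermes-style `<tool_call>{"name": ..., "arguments": {...}}</tool_call>` blocks (the convention Qwen chat templates commonly use), recovering tool calls from text content and stripping them from the visible text could look like this; `extractXmlToolCalls` and `RecoveredToolCall` are hypothetical names.

```typescript
// Hypothetical shape of a tool call recovered from text content.
interface RecoveredToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

// Assumed format: <tool_call>{"name": "...", "arguments": {...}}</tool_call>
const TOOL_CALL_BLOCK = /<tool_call>\s*([\s\S]*?)\s*<\/tool_call>/g;

function extractXmlToolCalls(content: string): { text: string; calls: RecoveredToolCall[] } {
  const calls: RecoveredToolCall[] = [];
  const text = content.replace(TOOL_CALL_BLOCK, (block: string, body: string) => {
    try {
      const parsed = JSON.parse(body) as { name?: unknown; arguments?: unknown };
      if (typeof parsed.name === 'string') {
        const args = parsed.arguments;
        calls.push({
          name: parsed.name,
          arguments: typeof args === 'object' && args !== null ? (args as Record<string, unknown>) : {},
        });
        return ''; // recovered: remove the block from the visible text
      }
    } catch {
      // Malformed JSON: leave the block in place rather than guess.
    }
    return block;
  });
  return { text: text.trim(), calls };
}
```

Leaving unparseable blocks in the text keeps the failure visible to the user instead of silently dropping part of the model's output.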
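The TODO(v1.13.1-followup) proposes interpolating outputTokens from delta cadence and reconciling at finish. A minimal sketch of that idea, assuming the ~4 characters per token proxy from the TODO and an injected clock; `LiveTokenEstimator` and its methods are hypothetical, and it throttles internally only to stay self-contained (routing through the existing 500ms throttle would work equally well).

```typescript
// Hypothetical sketch for the follow-up TODO, not shipped code.
class LiveTokenEstimator {
  private chars = 0;
  private lastPublishAt = Number.NEGATIVE_INFINITY;
  private readonly publish: (outputTokens: number, final: boolean) => void;
  private readonly intervalMs: number;
  private readonly charsPerToken: number;

  constructor(
    publish: (outputTokens: number, final: boolean) => void,
    intervalMs = 500,
    charsPerToken = 4,
  ) {
    this.publish = publish;
    this.intervalMs = intervalMs;
    this.charsPerToken = charsPerToken;
  }

  private estimate(): number {
    return Math.round(this.chars / this.charsPerToken);
  }

  // Call on every text-delta; `now` is injected (e.g. Date.now()) for testability.
  // Publishes at most once per interval, on the first delta after it elapses.
  onDelta(text: string, now: number): void {
    this.chars += text.length;
    if (now - this.lastPublishAt >= this.intervalMs) {
      this.lastPublishAt = now;
      this.publish(this.estimate(), false);
    }
  }

  // Call once after the stream drains with result.usage's outputTokens;
  // falls back to the estimate if the backend reported no usage.
  reconcile(outputTokens: number | undefined): void {
    this.publish(outputTokens ?? this.estimate(), true);
  }
}
```

For example, deltas of 8, 4 and 8 characters at t=0, 100 and 600 ms publish estimates of 2 and 5 tokens (the 100 ms delta is throttled), and `reconcile(7)` then publishes the authoritative 7.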
Smoke verified end-to-end against the running container:
- A. Plain text: status='complete', 1 text part.
- B. Single tool prompt → multi-tool chain (4 calls): every assistant
with tool_calls has 2 parts (text+tool_call), every tool row has
1 part (tool_result).
- C. Multi-step covered by B's chain.
- D. Stop mid-stream: status='cancelled' written via handleAbortOrError
after the post-iteration abort throw.
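The post-iteration abort check that smoke D exercises can be sketched generically: drain the iterator, then throw if the signal fired, so an iterator that ends cleanly on abort still routes into the abort/error handler. `drainThenCheckAbort` is a hypothetical helper, not the actual streamCompletion code.

```typescript
// Drain an async part stream, then surface an abort the iterator may have swallowed.
async function drainThenCheckAbort<T>(
  parts: AsyncIterable<T>,
  signal: AbortSignal | undefined,
  onPart: (part: T) => void,
): Promise<void> {
  for await (const part of parts) {
    onPart(part);
  }
  // The iterator can finish without throwing after an abort, so check the
  // signal explicitly; the rejection lets the caller's abort handler run.
  if (signal?.aborted) {
    throw signal.reason instanceof Error ? signal.reason : new Error('stream aborted');
  }
}
```

Without the final check, a stopped stream looks identical to a completed one, which is exactly the status='complete' bug described above.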
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
27 lines
1020 B
TypeScript
import { createOpenAICompatible } from '@ai-sdk/openai-compatible';
import type { LanguageModel } from 'ai';

// v1.13.1-A: AI SDK provider against llama-swap. baseURL is threaded from
// config.LLAMA_SWAP_URL at call time (not at module load) so tests can stub
// the upstream without touching env vars. No apiKey: llama-swap is
// unauthenticated inside our Tailscale topology, and public-internet
// exposure is gated by Authelia at the Caddy layer, not by API keys.

const cache = new Map<string, ReturnType<typeof createOpenAICompatible>>();

function getProvider(baseURL: string): ReturnType<typeof createOpenAICompatible> {
  let provider = cache.get(baseURL);
  if (!provider) {
    // Drop trailing slashes before appending /v1 so "http://host:8080/"
    // does not become "http://host:8080//v1".
    const root = baseURL.replace(/\/+$/, '');
    provider = createOpenAICompatible({
      name: 'llama-swap',
      baseURL: root.endsWith('/v1') ? root : `${root}/v1`,
    });
    cache.set(baseURL, provider);
  }
  return provider;
}

export function upstreamModel(baseURL: string, modelId: string): LanguageModel {
  return getProvider(baseURL).chatModel(modelId);
}