- Add ai@^6 and @ai-sdk/openai-compatible@^2 to apps/server.
- New services/inference/provider.ts: createOpenAICompatible against
llama-swap (baseURL threaded from config.LLAMA_SWAP_URL, cached per
baseURL). No apiKey — Authelia + Tailscale gate llama-swap, not keys.
- streamCompletion rewritten as an adapter over streamText. AI SDK
fullStream parts (text-delta, tool-call, finish, error) map back to
the legacy {content?, tool_calls?, finishReason} StreamResult shape
that executeStreamPhase already consumes. No layer above
streamCompletion changes.
- toModelMessages converts BooCode's OpenAI-shaped history to AI SDK
ModelMessage[]; tool messages need toolName which we look up by
scanning earlier assistant tool_calls for the matching id.
- buildAiTools wraps BooCode's JSON-schema tool defs via
tool({ inputSchema: jsonSchema(parameters) }) with NO execute —
BooCode dispatches tools in tool-phase.ts, not the AI SDK loop.
- XML fallback parser preserved as-is — qwen3.6 still emits XML tool
calls in text content that the structured tool-call layer misses.
- reasoning-delta parts dropped with a debug-level counter — captured
properly in v1.13.1-C.
- Abort path: streamText({ abortSignal }) wires ctx.signal through, but
AI SDK v6 swallows the abort (fullStream iterator exits cleanly
rather than throwing). Post-iteration `if (signal?.aborted) throw` so
handleAbortOrError owns the row and writes status='cancelled'. Caught
by smoke D; would have shipped as status='complete' on stop otherwise.
- Usage frame reads result.usage (inputTokens / outputTokens v6 names)
AFTER stream drain. Single trailing publish through the existing 500ms
throttle. Known regression: ChatThroughput's live mid-stream tick
(v1.12.2) is gone — it now shows a single value at stream end.
TODO(v1.13.1-followup): interpolate outputTokens during streaming
via a delta-cadence counter (e.g. part.text.length/4 token proxy)
and publish every 500ms; reconcile against result.usage at finish.
- Write-path dual-write from v1.13.0 unaffected.
Read path stays on JSON columns. v1.13.1-B flips reads to message_parts.
Smoke verified end-to-end against running container:
- A. Plain text: status='complete', 1 text part.
- B. Single tool prompt → multi-tool chain (4 calls): every assistant
with tool_calls has 2 parts (text+tool_call), every tool row has
1 part (tool_result).
- C. Multi-step covered by B's chain.
- D. Stop mid-stream: status='cancelled' written via handleAbortOrError
after the post-iteration abort throw.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
31 lines
766 B
JSON
31 lines
766 B
JSON
{
|
|
"name": "@boocode/server",
|
|
"version": "0.0.0",
|
|
"private": true,
|
|
"type": "module",
|
|
"main": "dist/index.js",
|
|
"scripts": {
|
|
"dev": "tsx watch src/index.ts",
|
|
"build": "tsc && node -e \"import('node:fs').then(fs=>fs.copyFileSync('src/schema.sql','dist/schema.sql'))\"",
|
|
"typecheck": "tsc --noEmit",
|
|
"test": "vitest run"
|
|
},
|
|
"dependencies": {
|
|
"@ai-sdk/openai-compatible": "^2.0.47",
|
|
"@fastify/static": "^7.0.4",
|
|
"@fastify/websocket": "^10.0.1",
|
|
"ai": "^6.0.190",
|
|
"fastify": "^4.28.1",
|
|
"postgres": "^3.4.4",
|
|
"ws": "^8.18.0",
|
|
"zod": "^3.23.8"
|
|
},
|
|
"devDependencies": {
|
|
"@types/node": "^20.14.10",
|
|
"@types/ws": "^8.5.10",
|
|
"tsx": "^4.16.2",
|
|
"typescript": "^5.5.0",
|
|
"vitest": "^3.2.4"
|
|
}
|
|
}
|