- Add ai@^6 and @ai-sdk/openai-compatible@^2 to apps/server.
- New services/inference/provider.ts: createOpenAICompatible against
llama-swap (baseURL threaded from config.LLAMA_SWAP_URL, cached per
baseURL). No apiKey — Authelia + Tailscale gate llama-swap, not keys.
- streamCompletion rewritten as an adapter over streamText. AI SDK
fullStream parts (text-delta, tool-call, finish, error) map back to
the legacy {content?, tool_calls?, finishReason} StreamResult shape
that executeStreamPhase already consumes. No layer above
streamCompletion changes.
- toModelMessages converts BooCode's OpenAI-shaped history to AI SDK
ModelMessage[]; tool messages need toolName which we look up by
scanning earlier assistant tool_calls for the matching id.
- buildAiTools wraps BooCode's JSON-schema tool defs via
tool({ inputSchema: jsonSchema(parameters) }) with NO execute —
BooCode dispatches tools in tool-phase.ts, not the AI SDK loop.
- XML fallback parser preserved as-is — qwen3.6 still emits XML tool
calls in text content that the structured tool-call layer misses.
- reasoning-delta parts dropped with a debug-level counter — captured
properly in v1.13.1-C.
- Abort path: streamText({ abortSignal }) wires ctx.signal through, but
AI SDK v6 swallows the abort (fullStream iterator exits cleanly
rather than throwing). Post-iteration `if (signal?.aborted) throw` so
handleAbortOrError owns the row and writes status='cancelled'. Caught
by smoke D; would have shipped as status='complete' on stop otherwise.
- Usage frame reads result.usage (inputTokens / outputTokens v6 names)
AFTER stream drain. Single trailing publish through the existing 500ms
throttle. Known regression: ChatThroughput's live mid-stream tick
(v1.12.2) is gone — it now shows a single value at stream end.
TODO(v1.13.1-followup): interpolate outputTokens during streaming
via a delta-cadence counter (e.g. part.text.length/4 token proxy)
and publish every 500ms; reconcile against result.usage at finish.
- Write-path dual-write from v1.13.0 unaffected.
Read path stays on JSON columns. v1.13.1-B flips reads to message_parts.
Smoke verified end-to-end against running container:
- A. Plain text: status='complete', 1 text part.
- B. Single tool prompt → multi-tool chain (4 calls): every assistant
with tool_calls has 2 parts (text+tool_call), every tool row has
1 part (tool_result).
- C. Multi-step covered by B's chain.
- D. Stop mid-stream: status='cancelled' written via handleAbortOrError
after the post-iteration abort throw.
- sentinel-summaries.ts: runCapHitSummary, insertCapHitSentinel,
runDoomLoopSummary, insertDoomLoopSentinel
- inference.ts → inference/turn.ts: residue is runAssistantTurn,
runInference, createInferenceRunner orchestration only
- inference/index.ts: re-export shim preserves the public surface
(createInferenceRunner, runInference, runAssistantTurn,
detectDoomLoop, DOOM_LOOP_THRESHOLD, buildMessagesPayload, plus
type-side InferenceContext/InferenceFrame/StreamResult/TurnArgs/
FramePublisher)
- src/index.ts + auto_name.ts + the two vitest test files updated to
import from ./services/inference/index.js explicitly (NodeNext ESM
doesn't honor directory-index resolution)
Final tally: 11 files under services/inference/, the largest being
sentinel-summaries.ts at 523 LoC (two near-clone summary paths kept
side-by-side until a third sentinel justifies factoring out a shared
runWrapUpSummary). turn.ts is now 326 LoC, the next-largest is
stream-phase.ts at 380. Public import surface unchanged.
tool-phase.ts → turn.ts back-edge for runAssistantTurn remains
(cycle is safe; resolved at call time).
Prepares the file structure for v1.13 AI SDK migration — streamText
swap targets stream-phase.ts only.
- stream-phase.ts: streamCompletion, executeStreamPhase (plus sseLines,
StreamOptions, ChatCompletionDelta/Chunk as private helpers)
- tool-phase.ts: executeToolPhase + private executeToolCall
- types.ts: shared StreamPhaseState + DB_FLUSH_INTERVAL_MS so the
summary functions still in inference.ts can reference them without
pulling from a phase file
Cycle: executeToolPhase recurses into runAssistantTurn, which stays in
inference.ts. Resolved by direct value back-edge — tool-phase.ts does
`import { runAssistantTurn } from '../inference.js'` and runAssistantTurn
is now exported. Safe because the dereference happens inside an async
function body, after both modules have fully evaluated. No
callback-through-args fallback needed.
inference.ts shrinks from ~1401 to ~828 LoC. Final Dispatch D moves the
sentinel summaries out and renames the residue to inference/turn.ts.