boocode/docs/multi-provider-local-models.md
indifferentketchup b18de2a331 chore: snapshot working tree - pty_exited notifications + in-flight inference WIP
2026-06-14 12:48:47 +00:00


Multi-Provider Local Models — Operator Guide

How BooCode routes local inference across multiple llama-swap machines, how to add another machine, and the smoke matrix to run after any provider change. Implementation plan: plans/multi-provider-local-models/feature-implementation-plan.md.

Runtime contract

  • Config authority: /data/llama-providers.json (bind-mounted; gitignored), read by both apps/server and apps/coder via LLAMA_PROVIDERS_PATH. Tracked template: data/llama-providers.example.json.
  • Legacy fallback: when the file is absent, both apps synthesize a single provider from LLAMA_SWAP_URL. Startup never breaks on a missing file.
  • Model identity: persisted and cached ids are composite provider/model (e.g. sam-desktop/qwen3.6-35b-a3b). Wire calls to upstreams always send the bare model id. Legacy bare ids resolve to defaultProvider indefinitely.
  • Resolver: resolveModelProvider() in apps/server/src/services/inference/provider.ts is the single routing authority for streaming, non-streaming, context lookup, compaction, and task-model fallback. The coder mirrors this via its registry loader (apps/coder/src/services/llama-providers.ts) for arena and the local gateway.
  • opencode bridge: the BooCoder-hosted OpenAI-compatible gateway (apps/coder/src/services/local-gateway.ts) exposes all local providers to opencode under the single namespace boocode-local. The inner modelID is the composite id, so opencode addresses a model as boocode-local/sam-desktop/qwen3.6-35b. No code path rewrites a composite id down to llama-swap/<model>.
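The routing rules above can be sketched in TypeScript. This is a minimal illustration under stated assumptions, not the code in provider.ts. The names loadProviders and resolveModel, the { defaultProvider, providers } file shape, and the synthesized provider id "legacy" are all hypothetical. The sketch shows the legacy fallback, composite-id parsing, routing of bare ids to defaultProvider, and the bare wire id that is sent upstream.

```typescript
// Minimal sketch: names and the file shape are assumptions, not BooCode's real API.
interface LlamaProvider {
  id: string; // machine slug, e.g. "sam-desktop"
  label: string;
  baseUrl: string;
  kind: "llama-swap";
}

interface ProvidersFile {
  defaultProvider: string; // assumed top-level field
  providers: LlamaProvider[];
}

interface ResolvedModel {
  provider: LlamaProvider;
  wireModel: string; // bare id sent to the upstream
  compositeId: string; // provider/model id used for persistence and caches
}

// Hypothetical id for the provider synthesized from LLAMA_SWAP_URL.
const LEGACY_ID = "legacy";

// fileText is undefined when /data/llama-providers.json is missing.
function loadProviders(fileText: string | undefined, llamaSwapUrl: string): ProvidersFile {
  if (fileText === undefined) {
    return {
      defaultProvider: LEGACY_ID,
      providers: [{ id: LEGACY_ID, label: "llama-swap", baseUrl: llamaSwapUrl, kind: "llama-swap" }],
    };
  }
  return JSON.parse(fileText) as ProvidersFile;
}

function resolveModel(ref: string, file: ProvidersFile): ResolvedModel {
  const slash = ref.indexOf("/");
  if (slash > 0) {
    const prefix = ref.slice(0, slash);
    const provider = file.providers.find((p) => p.id === prefix);
    if (provider) {
      // Composite id: strip the provider prefix for the wire call.
      return { provider, wireModel: ref.slice(slash + 1), compositeId: ref };
    }
  }
  // Bare legacy id (or an unregistered prefix): route to the default provider.
  const fallback = file.providers.find((p) => p.id === file.defaultProvider);
  if (!fallback) {
    throw new Error(`defaultProvider "${file.defaultProvider}" is not in the registry`);
  }
  return { provider: fallback, wireModel: ref, compositeId: `${fallback.id}/${ref}` };
}
```

Matching the prefix against registered provider ids, rather than splitting blindly on the first /, keeps a bare or unknown id on the default provider. Assuming the gateway applies the same rule, an opencode request for boocode-local/sam-desktop/qwen3.6-35b would arrive with model sam-desktop/qwen3.6-35b and be forwarded upstream as qwen3.6-35b.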

Add a machine

  1. Start llama-swap on the new machine, reachable over Tailscale (e.g. http://100.x.y.z:84NN).
  2. Edit /data/llama-providers.json: append a provider entry { "id": "<machine-slug>", "label": "<Display>", "baseUrl": "http://100.x.y.z:84NN", "kind": "llama-swap" }.
  3. Restart consumers: docker compose restart boocode (server reads the file at startup) and sudo systemctl restart boocoder.
  4. Verify: GET /api/models shows a new provider group; the new machine's models appear as <machine-slug>/<model> in the BooChat picker and the native BooCoder composer.
  5. Run the smoke matrix below.
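Step 2 is a hand edit, but the checks worth making before a restart can be sketched as a small helper. appendProvider, the slug and URL patterns, and the { defaultProvider, providers } file shape are hypothetical; the authoritative schema is LlamaProvidersFileSchema in @boocode/contracts/llama-providers.

```typescript
// Hypothetical helper; the real schema is LlamaProvidersFileSchema in @boocode/contracts.
interface LlamaProvider {
  id: string;
  label: string;
  baseUrl: string;
  kind: "llama-swap";
}

interface ProvidersFile {
  defaultProvider: string; // assumed top-level shape
  providers: LlamaProvider[];
}

// Assumed slug rule: lowercase letters, digits, hyphens. "/" is excluded
// because it separates provider and model in composite ids.
const SLUG = /^[a-z0-9][a-z0-9-]*$/;
// Loose, assumed check that baseUrl is a bare http(s) origin such as http://100.x.y.z:84NN.
const BASE_URL = /^https?:\/\/[^\s/]+\/?$/;

function appendProvider(fileText: string, entry: LlamaProvider): string {
  const file = JSON.parse(fileText) as ProvidersFile;
  if (!SLUG.test(entry.id)) throw new Error(`invalid provider id: ${entry.id}`);
  if (file.providers.some((p) => p.id === entry.id)) {
    throw new Error(`provider id already registered: ${entry.id}`);
  }
  if (!BASE_URL.test(entry.baseUrl)) throw new Error(`invalid baseUrl: ${entry.baseUrl}`);
  file.providers.push(entry);
  // Pretty-print so the bind-mounted file stays hand-editable.
  return JSON.stringify(file, null, 2) + "\n";
}
```

Rejecting a slash in the slug matters because / is the separator in composite provider/model ids; a slug containing one would make every id from that machine ambiguous.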

That is the whole flow: no code changes and no rebuild, since the config lives in the bind-mounted data/ directory.

Smoke matrix

Run after adding/removing a provider or changing provider config:

| Case | Steps | Expect |
| --- | --- | --- |
| Legacy fallback | Remove or rename llama-providers.json, restart server | Boot OK; single provider synthesized from LLAMA_SWAP_URL; bare-id sessions still stream |
| Two local providers | File with sam-desktop + embedding; chat once on a model from each | Both stream; GET /api/models shows both groups with composite ids |
| Duplicate model names | Same wire model name on two providers; chat on each composite id | Each request hits its own machine (check llama-swap logs); context limits are not cross-shared |
| DeepSeek enabled | Set DEEPSEEK_API_KEY; pick deepseek/<model>; also pick embedding/deepseek-r1-qwen3-8b | First routes to DeepSeek cloud; second routes to local embedding (collision case) |
| Favorites | Star models from two providers, refresh, unplug one provider, refresh | Favorites persist; offline provider's favorites hidden, not deleted from settings |
| opencode parity | Dispatch an opencode task on boocode-local/<provider>/<model> for two providers sharing a wire name | Each lands on the correct machine; no llama-swap/ collapse in opencode config or logs |
| Arena | Battle with contestants from two local providers | Local lane stays serial (ADR-0001); each contestant calls its own provider |
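Parts of the "Two local providers" and "Duplicate model names" rows reduce to inspecting GET /api/models. A minimal sketch of that inspection follows, assuming a hypothetical response shape { groups: [{ provider, models }] }; the real payload may differ, and checkInventory is illustrative only.

```typescript
// Hypothetical shape for GET /api/models; the real payload may differ.
interface ModelGroup {
  provider: string;
  models: string[]; // composite provider/model ids
}

interface ModelsResponse {
  groups: ModelGroup[];
}

// Returns human-readable problems; an empty array means the checks pass.
function checkInventory(expectedProviders: string[], res: ModelsResponse): string[] {
  const problems: string[] = [];
  for (const id of expectedProviders) {
    const group = res.groups.find((g) => g.provider === id);
    if (!group) {
      problems.push(`missing provider group: ${id}`);
      continue;
    }
    for (const model of group.models) {
      if (!model.startsWith(`${id}/`)) problems.push(`non-composite id under ${id}: ${model}`);
    }
  }
  // Composite ids must stay unique even when two machines serve the same wire name.
  const seen: Record<string, number> = {};
  for (const group of res.groups) {
    for (const model of group.models) seen[model] = (seen[model] ?? 0) + 1;
  }
  for (const model of Object.keys(seen)) {
    if (seen[model] > 1) problems.push(`duplicate composite id: ${model}`);
  }
  return problems;
}
```

This only checks inventory. Which machine actually served a request still has to be confirmed in the llama-swap logs, as the table says.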

Interface for BooControl (follow-on)

BooControl must consume, not reinvent:

  • the provider registry file /data/llama-providers.json (schema: @boocode/contracts/llama-providers, LlamaProvidersFileSchema) as the single source of provider identity;
  • composite provider/model ids everywhere it stores or displays model identity (parseModelRef/formatModelRef from the same contracts subpath);
  • GET /api/models for live inventory and favorite_models in GET/PATCH /api/settings for user preference — never raw host env vars.

Adding a fleet UI therefore means writing this file and restarting consumers; nothing else owns provider identity.
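A consumer following these rules might derive the favorites it displays as below. visibleFavorites and toComposite are hypothetical helpers, not exports of @boocode/contracts; the inputs stand in for favorite_models from GET /api/settings and the composite ids listed by GET /api/models. Filtering at read time, instead of rewriting settings, matches the Favorites smoke row: an offline provider's favorites are hidden, not deleted.

```typescript
// Hypothetical helpers; not part of @boocode/contracts.
// providerIds: every provider prefix GET /api/models reports (cloud ones included).
function toComposite(id: string, providerIds: readonly string[], defaultProvider: string): string {
  const slash = id.indexOf("/");
  if (slash > 0 && providerIds.indexOf(id.slice(0, slash)) !== -1) return id;
  // Legacy bare ids resolve to the default provider.
  return `${defaultProvider}/${id}`;
}

// favorites: favorite_models from GET /api/settings (never modified here).
// liveIds: composite ids currently listed by GET /api/models.
function visibleFavorites(
  favorites: readonly string[],
  liveIds: readonly string[],
  providerIds: readonly string[],
  defaultProvider: string,
): string[] {
  const live = new Set(liveIds);
  return favorites.filter((f) => live.has(toComposite(f, providerIds, defaultProvider)));
}
```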

External agents

Both of Sam's coder agents get the local fleet through the gateway at coder startup, under the single provider namespace boocode-local:

  • opencode: opencode-config-sync.ts writes the provider (with @ai-sdk/openai-compatible + gateway baseURL + model map) into ~/.config/opencode/opencode.json.
  • Pi: pi-config-sync.ts writes the provider into ~/.pi/agent/models.json (other providers untouched; hand-tuned per-model contextWindow/maxTokens overrides on boocode-local entries survive re-sync).

After adding a machine, sudo systemctl restart boocoder re-syncs both.
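The two sync targets can be sketched as pure functions. The opencode entry follows opencode's custom-provider config shape (npm, name, options.baseURL, models). Everything else is an assumption: the Pi models.json shape, the display name, the merge rule (values from the previous sync always win), and the function names. None of it describes opencode-config-sync.ts or pi-config-sync.ts as written.

```typescript
// Sketch only; see the lead-in for which parts are assumptions.
const NAMESPACE = "boocode-local";

// opencode custom-provider entry.
interface OpencodeProvider {
  npm: string;
  name: string;
  options: { baseURL: string };
  models: Record<string, { name: string }>;
}

function buildOpencodeProvider(gatewayBaseUrl: string, compositeIds: string[]): OpencodeProvider {
  const models: Record<string, { name: string }> = {};
  for (const id of compositeIds) models[id] = { name: id }; // inner modelID is the composite id
  return {
    npm: "@ai-sdk/openai-compatible",
    name: "BooCode local", // hypothetical display name
    options: { baseURL: gatewayBaseUrl },
    models,
  };
}

// Assumed Pi models.json shape.
interface PiModel {
  id: string;
  contextWindow?: number;
  maxTokens?: number;
}

interface PiModelsFile {
  providers: Record<string, { baseUrl: string; models: PiModel[] }>;
}

// Replace only the boocode-local provider; other providers pass through unchanged.
function syncPi(existing: PiModelsFile, gatewayBaseUrl: string, fresh: PiModel[]): PiModelsFile {
  const previous: Record<string, PiModel> = {};
  const old = existing.providers[NAMESPACE];
  if (old) for (const m of old.models) previous[m.id] = m;
  const models = fresh.map((m): PiModel => {
    const prior = previous[m.id];
    return {
      ...m,
      // Values already on disk (assumed hand-tuned) win over freshly discovered ones.
      contextWindow: prior?.contextWindow ?? m.contextWindow,
      maxTokens: prior?.maxTokens ?? m.maxTokens,
    };
  });
  return {
    ...existing,
    providers: { ...existing.providers, [NAMESPACE]: { baseUrl: gatewayBaseUrl, models } },
  };
}
```

Keying the merge on the composite id means an override stays attached to one machine's model even when another machine serves the same wire name.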

Resilience notes

  • Arena's local-model set self-refreshes every 5 min (arena-local-models.ts): a provider that was down at coder startup is reclassified as local once it recovers; an unreachable provider keeps its last-known models (stale-but-local beats a wrong cloud-lane dispatch). Bare ids are contributed only by the default provider.
  • The gateway forwards the client's Authorization header to upstreams when present; its /v1/* routes remain unauthenticated on :9502 (repo convention: the reverse proxy owns auth).
  • Gateway GET /v1/models serves the live composite model list fetched from every registry provider.
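The arena refresh rule in the first note can be sketched as follows. refreshLocalSet, localIds, and the FetchResult type are hypothetical stand-ins for arena-local-models.ts. The sketch encodes only the stated rules: a reachable provider's live list replaces the old one, an unreachable provider keeps its last-known models, and bare ids come only from the default provider.

```typescript
// Hypothetical stand-in for arena-local-models.ts; only the documented rules are encoded.
type FetchResult = { ok: true; models: string[] } | { ok: false };

// results: one entry per provider currently in the registry.
function refreshLocalSet(
  previous: Record<string, string[]>,
  results: Record<string, FetchResult>,
): Record<string, string[]> {
  const next: Record<string, string[]> = {};
  for (const id of Object.keys(results)) {
    const r = results[id];
    if (r.ok) {
      next[id] = r.models; // reachable: take the live list (covers late-recovering providers)
    } else if (previous[id]) {
      next[id] = previous[id]; // unreachable: stale-but-local beats a cloud-lane dispatch
    }
    // Unreachable and never seen: contributes nothing until it answers.
  }
  return next;
}

// Ids the arena should treat as local.
function localIds(set: Record<string, string[]>, defaultProvider: string): Set<string> {
  const ids = new Set<string>();
  for (const id of Object.keys(set)) {
    for (const model of set[id]) {
      ids.add(`${id}/${model}`);
      if (id === defaultProvider) ids.add(model); // bare ids only from the default provider
    }
  }
  return ids;
}
```

A caller would run this on a timer (for example setInterval with 5 * 60 * 1000 ms), fetching each registry provider's model list and swapping in the returned set.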