# Multi-Provider Local Models: Operator Guide

How BooCode routes local inference across multiple llama-swap machines, how to add another machine, and the smoke matrix to run after any provider change. Implementation plan: [plans/multi-provider-local-models/feature-implementation-plan.md](plans/multi-provider-local-models/feature-implementation-plan.md).

## Runtime contract

- **Config authority:** `/data/llama-providers.json` (bind-mounted; gitignored), read by both `apps/server` and `apps/coder` via `LLAMA_PROVIDERS_PATH`. Tracked template: `data/llama-providers.example.json`.
- **Legacy fallback:** when the file is absent, both apps synthesize a single provider from `LLAMA_SWAP_URL`. Startup never breaks on a missing file.
- **Model identity:** persisted and cached ids are composite `provider/model` (e.g. `sam-desktop/qwen3.6-35b-a3b`). Wire calls to upstreams always send the bare model id. Legacy bare ids resolve to `defaultProvider` indefinitely.
- **Resolver:** `resolveModelProvider()` in `apps/server/src/services/inference/provider.ts` is the single routing authority for streaming, non-streaming, context lookup, compaction, and task-model fallback. The coder mirrors this via its registry loader (`apps/coder/src/services/llama-providers.ts`) for arena and the local gateway.
- **opencode bridge:** the BooCoder-hosted OpenAI-compatible gateway (`apps/coder/src/services/local-gateway.ts`) exposes all local providers to opencode under the single namespace `boocode-local`; the inner modelID is the composite id (`boocode-local/sam-desktop/qwen3.6-35b`). No path rewrites a composite id down to `llama-swap/`.

## Add a machine

1. Start llama-swap on the new machine, reachable over Tailscale (e.g. `http://100.x.y.z:84NN`).
2. Edit `/data/llama-providers.json`: append a provider entry `{ "id": "", "label": "", "baseUrl": "http://100.x.y.z:84NN", "kind": "llama-swap" }`.
3. Restart consumers: `docker compose restart boocode` (server reads the file at startup) and `sudo systemctl restart boocoder`.
4. Verify: `GET /api/models` shows a new provider group; the new machine's models appear as `/` in the BooChat picker and the native BooCoder composer.
5. Run the smoke matrix below.

That is the whole flow: no code changes, no rebuild (config lives in the bind-mounted `data/`).

## Smoke matrix

Run after adding/removing a provider or changing provider config:

| Case | Steps | Expect |
|---|---|---|
| Legacy fallback | Remove/rename `llama-providers.json`, restart server | Boot OK; single provider synthesized from `LLAMA_SWAP_URL`; bare-id sessions still stream |
| Two local providers | File with `sam-desktop` + `embedding`; chat once on a model from each | Both stream; `GET /api/models` shows both groups with composite ids |
| Duplicate model names | Same wire model name on two providers; chat on each composite id | Each request hits its own machine (check llama-swap logs); context limits are not cross-shared |
| DeepSeek enabled | Set `DEEPSEEK_API_KEY`; pick `deepseek/`; also pick `embedding/deepseek-r1-qwen3-8b` | First routes to DeepSeek cloud; second routes to local `embedding` (collision case) |
| Favorites | Star models from two providers, refresh, unplug one provider, refresh | Favorites persist; offline provider's favorites hidden, not deleted from settings |
| opencode parity | Dispatch an opencode task on `boocode-local//` for two providers sharing a wire name | Each lands on the correct machine; no `llama-swap/` collapse in opencode config or logs |
| Arena | Battle with contestants from two local providers | Local lane stays serial (ADR-0001); each contestant calls its own provider |

## Interface for BooControl (follow-on)

BooControl must consume, not reinvent:

- the provider registry file `/data/llama-providers.json` (schema: `@boocode/contracts/llama-providers`, `LlamaProvidersFileSchema`) as the single source of provider identity;
- composite `provider/model` ids everywhere it stores or displays model identity (`parseModelRef`/`formatModelRef` from the same contracts subpath);
- `GET /api/models` for live inventory and `favorite_models` in `GET/PATCH /api/settings` for user preference, never raw host env vars.

Adding fleet UI = writing this file + restarting consumers; nothing else owns provider identity.

## External agents

Both of Sam's coder agents get the local fleet through the gateway at coder startup, under the single provider namespace `boocode-local`:

- **opencode**: `opencode-config-sync.ts` writes the provider (with `@ai-sdk/openai-compatible` + gateway `baseURL` + model map) into `~/.config/opencode/opencode.json`.
- **Pi**: `pi-config-sync.ts` writes the provider into `~/.pi/agent/models.json` (other providers untouched; hand-tuned per-model `contextWindow`/`maxTokens` overrides on boocode-local entries survive re-sync).

After adding a machine, `sudo systemctl restart boocoder` re-syncs both.

## Resilience notes

- **Arena's local-model set self-refreshes every 5 min** (`arena-local-models.ts`): a provider that was down at coder startup is reclassified as local once it recovers; an unreachable provider keeps its last-known models (stale-but-local beats a wrong cloud-lane dispatch). Bare ids are contributed only by the default provider.
- The gateway forwards the client's `Authorization` header to upstreams when present; its `/v1/*` routes remain unauthenticated on :9502 (repo convention: the reverse proxy owns auth).
- Gateway `GET /v1/models` serves the live composite model list fetched from every registry provider.
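
The model-identity and legacy-fallback rules from the runtime contract can be illustrated with a minimal TypeScript sketch. It is not the actual `resolveModelProvider()` or `parseModelRef` code. The top-level file shape (`defaultProvider` plus a `providers` array), the helper names `sketchResolve`, `synthesizeLegacy`, and `toGatewayModelId`, the synthesized provider id `legacy-default`, and the rule of splitting at the first `/` only when the prefix is a known provider id are all assumptions made for illustration.

```typescript
// Illustrative sketch only: the shapes and helper names below are assumptions,
// not the real BooCode contracts or resolver.
interface ProviderEntry {
  id: string;
  label: string;
  baseUrl: string;
  kind: "llama-swap";
}

interface ProvidersFile {
  defaultProvider: string; // provider that legacy bare ids resolve to
  providers: ProviderEntry[];
}

interface ResolvedRoute {
  provider: ProviderEntry;
  wireModel: string; // bare model id actually sent to the upstream
}

// Legacy fallback: a single provider synthesized from LLAMA_SWAP_URL.
// The id "legacy-default" is hypothetical.
function synthesizeLegacy(llamaSwapUrl: string): ProvidersFile {
  return {
    defaultProvider: "legacy-default",
    providers: [
      { id: "legacy-default", label: "llama-swap", baseUrl: llamaSwapUrl, kind: "llama-swap" },
    ],
  };
}

// Resolve a persisted id: composite ids route to the named provider,
// anything else is treated as a bare id and routed to defaultProvider.
function sketchResolve(ref: string, file: ProvidersFile): ResolvedRoute {
  const slash = ref.indexOf("/");
  if (slash > 0) {
    const prefix = ref.slice(0, slash);
    const named = file.providers.find((p) => p.id === prefix);
    if (named) {
      return { provider: named, wireModel: ref.slice(slash + 1) };
    }
  }
  const fallback = file.providers.find((p) => p.id === file.defaultProvider);
  if (!fallback) {
    throw new Error(`defaultProvider "${file.defaultProvider}" is not in the registry`);
  }
  return { provider: fallback, wireModel: ref };
}

// opencode sees every local model under the single boocode-local namespace.
function toGatewayModelId(compositeId: string): string {
  return `boocode-local/${compositeId}`;
}

// Example registry with the two provider ids used in the smoke matrix.
// The base URLs are placeholders, not real addresses.
const registry: ProvidersFile = {
  defaultProvider: "sam-desktop",
  providers: [
    { id: "sam-desktop", label: "Sam desktop", baseUrl: "http://100.x.y.z:84NN", kind: "llama-swap" },
    { id: "embedding", label: "Embedding box", baseUrl: "http://100.x.y.z:84NN", kind: "llama-swap" },
  ],
};

console.log(sketchResolve("embedding/deepseek-r1-qwen3-8b", registry).provider.id);
console.log(sketchResolve("qwen3.6-35b-a3b", registry).provider.id);
console.log(toGatewayModelId("sam-desktop/qwen3.6-35b"));
```

In this sketch the composite id `embedding/deepseek-r1-qwen3-8b` routes to `embedding` with the bare wire model `deepseek-r1-qwen3-8b`, while the legacy bare id `qwen3.6-35b-a3b` falls through to `sam-desktop`. Checking the prefix against known provider ids before splitting is one way to keep bare ids that happen to contain a slash working; whether the real resolver does the same is not stated above.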