# Multi-Provider Local Models — Operator Guide

How BooCode routes local inference across multiple llama-swap machines, how to
add another machine, and the smoke matrix to run after any provider change.
Implementation plan: [plans/multi-provider-local-models/feature-implementation-plan.md](plans/multi-provider-local-models/feature-implementation-plan.md).

## Runtime contract

- **Config authority:** `/data/llama-providers.json` (bind-mounted; gitignored),
  read by both `apps/server` and `apps/coder` via `LLAMA_PROVIDERS_PATH`.
  Tracked template: `data/llama-providers.example.json`.
- **Legacy fallback:** when the file is absent, both apps synthesize a single
  provider from `LLAMA_SWAP_URL`. Startup never breaks on a missing file.
- **Model identity:** persisted and cached ids are composite `provider/model`
  (e.g. `sam-desktop/qwen3.6-35b-a3b`). Wire calls to upstreams always send the
  bare model id. Legacy bare ids resolve to `defaultProvider` indefinitely.
- **Resolver:** `resolveModelProvider()` in
  `apps/server/src/services/inference/provider.ts` is the single routing
  authority for streaming, non-streaming, context lookup, compaction, and
  task-model fallback. The coder mirrors this via its registry loader
  (`apps/coder/src/services/llama-providers.ts`) for arena and the local gateway.
- **opencode bridge:** the BooCoder-hosted OpenAI-compatible gateway
  (`apps/coder/src/services/local-gateway.ts`) exposes all local providers to
  opencode under the single namespace `boocode-local`; the inner modelID is the
  composite id (e.g. `boocode-local/sam-desktop/qwen3.6-35b-a3b`). No code path
  collapses a composite id to `llama-swap/<model>`.
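
The model-identity and resolver rules above can be illustrated with a minimal sketch. This is not the code in `provider.ts`: the `ProviderEntry` and `ProviderRegistry` shapes, the `providers` key, and the `resolveModelSketch` name are assumptions made for illustration, as is the choice to treat an unrecognized prefix as part of a bare id.

```typescript
// Hypothetical shapes for illustration only; the real schema lives in
// @boocode/contracts/llama-providers and may differ.
interface ProviderEntry {
  id: string;
  label: string;
  baseUrl: string;
  kind: string;
}

interface ProviderRegistry {
  defaultProvider: string;
  providers: ProviderEntry[];
}

interface ResolvedModel {
  provider: ProviderEntry;
  wireModel: string; // bare id that would be sent to the upstream
}

// Assumption: the provider slug is everything before the first "/". A prefix
// that matches no registered provider is treated as part of a bare id.
function resolveModelSketch(registry: ProviderRegistry, modelRef: string): ResolvedModel {
  const byId = new Map<string, ProviderEntry>();
  for (const p of registry.providers) byId.set(p.id, p);

  const slash = modelRef.indexOf("/");
  if (slash > 0) {
    const candidate = byId.get(modelRef.slice(0, slash));
    if (candidate) {
      return { provider: candidate, wireModel: modelRef.slice(slash + 1) };
    }
  }

  // Legacy bare id: route to the default provider and send the id unchanged.
  const fallback = byId.get(registry.defaultProvider);
  if (!fallback) {
    throw new Error(`defaultProvider "${registry.defaultProvider}" is not in the registry`);
  }
  return { provider: fallback, wireModel: modelRef };
}
```

Under these assumptions, `sam-desktop/qwen3.6-35b-a3b` resolves to the `sam-desktop` entry with wire model `qwen3.6-35b-a3b`, while a bare `qwen3.6-35b-a3b` resolves to whichever provider `defaultProvider` names.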

## Add a machine

1. Start llama-swap on the new machine, reachable over Tailscale
   (e.g. `http://100.x.y.z:84NN`).
2. Edit `/data/llama-providers.json`: append a provider entry
   `{ "id": "<machine-slug>", "label": "<Display>", "baseUrl": "http://100.x.y.z:84NN", "kind": "llama-swap" }`.
3. Restart consumers: `docker compose restart boocode` (server reads the file at
   startup) and `sudo systemctl restart boocoder`.
4. Verify: `GET /api/models` shows a new provider group; the new machine's
   models appear as `<machine-slug>/<model>` in the BooChat picker and the
   native BooCoder composer.
5. Run the smoke matrix below.

That is the whole flow — no code changes, no rebuild (config lives in the
bind-mounted `data/`).
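
Before editing the file by hand in step 2, it can help to check the new entry for mistakes. The sketch below is one way to do that; it is not part of BooCode. The `RegistryFile` shape (including the `providers` key), the slug rule, and the `appendProvider` name are assumptions. The slug rule forbids `/` because composite ids use it as the separator.

```typescript
// Hypothetical shapes; validate against LlamaProvidersFileSchema in real use.
interface NewProvider {
  id: string;
  label: string;
  baseUrl: string;
  kind: string;
}

interface RegistryFile {
  defaultProvider: string;
  providers: NewProvider[];
}

// Returns a new registry object with the entry appended, or throws with the
// reason the entry was rejected. The input registry is not mutated.
function appendProvider(file: RegistryFile, entry: NewProvider): RegistryFile {
  // Assumed slug rule: lowercase letters, digits, and hyphens only.
  if (!/^[a-z0-9][a-z0-9-]*$/.test(entry.id)) {
    throw new Error(`invalid provider id: ${entry.id}`);
  }
  if (file.providers.some((p) => p.id === entry.id)) {
    throw new Error(`duplicate provider id: ${entry.id}`);
  }
  // Loose check only: scheme plus a non-empty host part.
  if (!/^https?:\/\/[^\s\/]+/.test(entry.baseUrl)) {
    throw new Error(`baseUrl must be an http(s) URL: ${entry.baseUrl}`);
  }
  return { ...file, providers: [...file.providers, entry] };
}
```

The function returns a copy so a failed check never leaves a half-edited registry in memory; writing the result back to `/data/llama-providers.json` and restarting the consumers remain manual steps.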

## Smoke matrix

Run after adding/removing a provider or changing provider config:

| Case | Steps | Expect |
|---|---|---|
| Legacy fallback | Remove/rename `llama-providers.json`, restart server | Boot OK; single provider synthesized from `LLAMA_SWAP_URL`; bare-id sessions still stream |
| Two local providers | File with `sam-desktop` + `embedding`; chat once on a model from each | Both stream; `GET /api/models` shows both groups with composite ids |
| Duplicate model names | Same wire model name on two providers; chat on each composite id | Each request hits its own machine (check llama-swap logs); context limits are not cross-shared |
| DeepSeek enabled | Set `DEEPSEEK_API_KEY`; pick `deepseek/<model>`; also pick `embedding/deepseek-r1-qwen3-8b` | First routes to DeepSeek cloud; second routes to local `embedding` (collision case) |
| Favorites | Star models from two providers, refresh, unplug one provider, refresh | Favorites persist; offline provider's favorites hidden, not deleted from settings |
| opencode parity | Dispatch an opencode task on `boocode-local/<provider>/<model>` for two providers sharing a wire name | Each lands on the correct machine; no `llama-swap/` collapse in opencode config or logs |
| Arena | Battle with contestants from two local providers | Local lane stays serial (ADR-0001); each contestant calls its own provider |
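
The Favorites row expects offline favorites to be hidden without being deleted. A minimal sketch of that behavior, assuming favorites are stored as composite ids and the live inventory is available as a set (the `visibleFavorites` name and both parameter shapes are hypothetical):

```typescript
// Returns the favorites to display right now. The stored list is only read,
// so favorites of an offline provider reappear once the provider is back.
function visibleFavorites(
  stored: readonly string[],
  liveModelIds: ReadonlySet<string>,
): string[] {
  return stored.filter((id) => liveModelIds.has(id));
}
```

Filtering at display time, rather than pruning the saved setting, is what lets the "unplug one provider, refresh" step pass without losing user preferences.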

## Interface for BooControl (follow-on)

BooControl must consume, not reinvent:

- the provider registry file `/data/llama-providers.json` (schema:
  `@boocode/contracts/llama-providers`, `LlamaProvidersFileSchema`) as the
  single source of provider identity;
- composite `provider/model` ids everywhere it stores or displays model
  identity (`parseModelRef`/`formatModelRef` from the same contracts subpath);
- `GET /api/models` for live inventory and `favorite_models` in
  `GET/PATCH /api/settings` for user preference — never raw host env vars.

A fleet UI only has to write this file and restart the consumers; nothing else
owns provider identity.

## External agents

Both of Sam's coder agents get the local fleet through the gateway at coder
startup, under the single provider namespace `boocode-local`:

- **opencode** — `opencode-config-sync.ts` writes the provider (with
  `@ai-sdk/openai-compatible` + gateway `baseURL` + model map) into
  `~/.config/opencode/opencode.json`.
- **Pi** — `pi-config-sync.ts` writes the provider into
  `~/.pi/agent/models.json` (other providers untouched; hand-tuned per-model
  `contextWindow`/`maxTokens` overrides on boocode-local entries survive
  re-sync).

After adding a machine, `sudo systemctl restart boocoder` re-syncs both.
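
The Pi re-sync rule (other providers untouched, hand-tuned overrides kept) can be sketched as a pure merge. This is not the code in `pi-config-sync.ts`; the `PiModelsFile` shape and the `syncBoocodeLocal` name are assumptions about how `models.json` might be laid out.

```typescript
// Assumed layout of ~/.pi/agent/models.json, for illustration only.
interface PiModel {
  id: string;
  contextWindow?: number;
  maxTokens?: number;
}

interface PiProvider {
  baseUrl: string;
  models: PiModel[];
}

interface PiModelsFile {
  providers: Record<string, PiProvider>;
}

// Rebuilds only the boocode-local provider from the live model list. Fields
// already present on a surviving model entry are carried over unchanged.
function syncBoocodeLocal(
  existing: PiModelsFile,
  gatewayBaseUrl: string,
  liveModelIds: string[],
): PiModelsFile {
  const previous = existing.providers["boocode-local"];
  const overrides = new Map<string, PiModel>();
  for (const m of previous?.models ?? []) overrides.set(m.id, m);

  const models = liveModelIds.map((id): PiModel => {
    const old = overrides.get(id);
    return { ...old, id }; // spreading undefined adds nothing
  });

  return {
    ...existing,
    providers: {
      ...existing.providers,
      "boocode-local": { baseUrl: gatewayBaseUrl, models },
    },
  };
}
```

In this sketch a model that disappears from the live list is dropped along with its overrides; whether the real sync keeps such entries is not stated above.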

## Resilience notes

- **Arena's local-model set self-refreshes every 5 min**
  (`arena-local-models.ts`): a provider that was down at coder startup is
  reclassified as local once it recovers; an unreachable provider keeps its
  last-known models (stale-but-local beats a wrong cloud-lane dispatch). Bare
  ids are contributed only by the default provider.
- The gateway forwards the client's `Authorization` header to upstreams when
  present; its `/v1/*` routes remain unauthenticated on :9502 (repo
  convention: the reverse proxy owns auth).
- Gateway `GET /v1/models` serves the live composite model list fetched from
  every registry provider.
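
The arena refresh rule (keep last-known models for an unreachable provider, contribute bare ids only from the default provider) can be sketched as one refresh cycle. This is not the code in `arena-local-models.ts`; the `refreshLocalModels` name and the `probe` map, where `null` marks a provider that did not answer this cycle, are assumptions.

```typescript
interface RefreshResult {
  lastKnown: Map<string, string[]>; // per-provider model lists to carry forward
  localIds: Set<string>;            // ids the arena would treat as local
}

// One refresh cycle. probe maps provider id to its model list, or to null
// when the provider was unreachable during this cycle.
function refreshLocalModels(
  providerIds: string[],
  defaultProvider: string,
  lastKnown: Map<string, string[]>,
  probe: Map<string, string[] | null>,
): RefreshResult {
  const next = new Map<string, string[]>();
  const localIds = new Set<string>();

  for (const id of providerIds) {
    const fresh = probe.get(id) ?? null;
    // Stale-but-local: fall back to the previous list when the probe failed.
    const models = fresh ?? lastKnown.get(id) ?? [];
    next.set(id, models);

    for (const model of models) {
      localIds.add(`${id}/${model}`);
      // Bare ids are contributed only by the default provider.
      if (id === defaultProvider) localIds.add(model);
    }
  }
  return { lastKnown: next, localIds };
}
```

A caller would run this on a 5-minute timer and feed each cycle's `lastKnown` into the next one, so a provider that was down at startup joins the local set on the first cycle after it recovers.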