boocode/docs/multi-provider-local-models.md
indifferentketchup b18de2a331 chore: snapshot working tree - pty_exited notifications + in-flight inference WIP
feat(booterm): structured pty_exited WS notifications. Plan-validated, impl-validated, code-reviewed green (contracts build clean, contracts test 29/29, booterm + web typecheck clean).

wip: in-progress inference/provider refactor (agents.ts, provider.ts, new llama-providers.ts, removed llama-args-validator), plus arena, dispatcher, compaction, schema changes.

openspec: pty-exit-notifications complete; x-agent-flags planned (not yet implemented).
2026-06-14 12:48:47 +00:00


# Multi-Provider Local Models — Operator Guide
How BooCode routes local inference across multiple llama-swap machines, how to
add another machine, and the smoke matrix to run after any provider change.
Implementation plan: [plans/multi-provider-local-models/feature-implementation-plan.md](plans/multi-provider-local-models/feature-implementation-plan.md).
## Runtime contract
- **Config authority:** `/data/llama-providers.json` (bind-mounted; gitignored),
read by both `apps/server` and `apps/coder` via `LLAMA_PROVIDERS_PATH`.
Tracked template: `data/llama-providers.example.json`.
- **Legacy fallback:** when the file is absent, both apps synthesize a single
provider from `LLAMA_SWAP_URL`. Startup never breaks on a missing file.
- **Model identity:** persisted and cached ids are composite `provider/model`
(e.g. `sam-desktop/qwen3.6-35b-a3b`). Wire calls to upstreams always send the
bare model id. Legacy bare ids resolve to `defaultProvider` indefinitely.
- **Resolver:** `resolveModelProvider()` in
`apps/server/src/services/inference/provider.ts` is the single routing
authority for streaming, non-streaming, context lookup, compaction, and
task-model fallback. The coder mirrors this via its registry loader
(`apps/coder/src/services/llama-providers.ts`) for arena and the local gateway.
- **opencode bridge:** the BooCoder-hosted OpenAI-compatible gateway
(`apps/coder/src/services/local-gateway.ts`) exposes all local providers to
opencode under the single namespace `boocode-local`. The inner modelID is the
composite id, so a full opencode reference reads
`boocode-local/sam-desktop/qwen3.6-35b`. No path rewrites a composite id down
to `llama-swap/<model>`.
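
The identity rules above (composite ids, bare wire ids, legacy fallback) can be sketched as follows. This is a minimal illustration, not the code in `provider.ts`: the interfaces, `registryFromLegacyUrl`, `resolveModelSketch`, and the placeholder provider id `default` are all assumptions made for the example, and it covers local providers only (cloud providers such as DeepSeek are out of scope).

```typescript
// Hypothetical shapes for illustration only; the real schema is
// LlamaProvidersFileSchema and may carry more fields.
interface ProviderEntry {
  id: string;
  baseUrl: string;
}

interface ProviderRegistry {
  defaultProvider: string;
  providers: ProviderEntry[];
}

interface ResolvedModel {
  provider: ProviderEntry;
  wireModel: string; // bare id, as sent to the upstream
}

// Legacy fallback: with no registry file, build a one-provider registry
// from LLAMA_SWAP_URL. The id "default" is an assumed placeholder.
function registryFromLegacyUrl(llamaSwapUrl: string): ProviderRegistry {
  return {
    defaultProvider: "default",
    providers: [{ id: "default", baseUrl: llamaSwapUrl }],
  };
}

// Split on the first "/" and accept the prefix only if it names a known
// provider; anything else is treated as a legacy bare id.
function resolveModelSketch(registry: ProviderRegistry, modelId: string): ResolvedModel {
  const slash = modelId.indexOf("/");
  if (slash > 0) {
    const prefix = modelId.slice(0, slash);
    const known = registry.providers.find((p) => p.id === prefix);
    if (known) {
      return { provider: known, wireModel: modelId.slice(slash + 1) };
    }
  }
  const fallback = registry.providers.find((p) => p.id === registry.defaultProvider);
  if (!fallback) {
    throw new Error(`defaultProvider "${registry.defaultProvider}" is not in the registry`);
  }
  return { provider: fallback, wireModel: modelId };
}
```

With `sam-desktop` as the default provider, `embedding/deepseek-r1-qwen3-8b` resolves to the `embedding` provider with wire model `deepseek-r1-qwen3-8b`, while a bare `qwen3.6-35b-a3b` resolves to `sam-desktop` unchanged.
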
## Add a machine
1. Start llama-swap on the new machine, reachable over Tailscale
(e.g. `http://100.x.y.z:84NN`).
2. Edit `/data/llama-providers.json`: append a provider entry
`{ "id": "<machine-slug>", "label": "<Display>", "baseUrl": "http://100.x.y.z:84NN", "kind": "llama-swap" }`.
3. Restart consumers: `docker compose restart boocode` (server reads the file at
startup) and `sudo systemctl restart boocoder`.
4. Verify: `GET /api/models` shows a new provider group; the new machine's
models appear as `<machine-slug>/<model>` in the BooChat picker and the
native BooCoder composer.
5. Run the smoke matrix below.
That is the whole flow: no code changes and no rebuild, because the config
lives in the bind-mounted `data/` directory.
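
Before editing the file by hand, a pre-flight check along these lines can catch the most likely mistakes in step 2. It is a sketch under assumptions: the `providers` array wrapper, the `appendProvider` helper, and the lowercase-slug rule are hypothetical (only the four entry fields come from the guide). The slug rule at least keeps `/` out of ids, which matters because `/` separates provider from model in composite ids.

```typescript
// Entry fields as shown in step 2; the wrapping file shape is assumed.
interface ProviderEntry {
  id: string;
  label: string;
  baseUrl: string;
  kind: string;
}

interface ProvidersFile {
  providers: ProviderEntry[];
}

// Returns a new file object with the entry appended, or throws with a
// message that explains what is wrong with the entry.
function appendProvider(file: ProvidersFile, entry: ProviderEntry): ProvidersFile {
  // Assumed rule: lowercase slug, so the id can never contain "/".
  if (!/^[a-z0-9][a-z0-9-]*$/.test(entry.id)) {
    throw new Error(`provider id "${entry.id}" is not a lowercase slug`);
  }
  if (file.providers.some((p) => p.id === entry.id)) {
    throw new Error(`provider id "${entry.id}" already exists`);
  }
  // `new URL` throws a TypeError on a malformed URL.
  const url = new URL(entry.baseUrl);
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error(`baseUrl must be http(s), got "${url.protocol}"`);
  }
  // Copy instead of mutating, so a failed edit never touches the input.
  return { ...file, providers: [...file.providers, entry] };
}
```
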
## Smoke matrix
Run after adding/removing a provider or changing provider config:
| Case | Steps | Expect |
|---|---|---|
| Legacy fallback | Remove/rename `llama-providers.json`, restart server | Boot OK; single provider synthesized from `LLAMA_SWAP_URL`; bare-id sessions still stream |
| Two local providers | File with `sam-desktop` + `embedding`; chat once on a model from each | Both stream; `GET /api/models` shows both groups with composite ids |
| Duplicate model names | Same wire model name on two providers; chat on each composite id | Each request hits its own machine (check llama-swap logs); context limits are not cross-shared |
| DeepSeek enabled | Set `DEEPSEEK_API_KEY`; pick `deepseek/<model>`; also pick `embedding/deepseek-r1-qwen3-8b` | First routes to DeepSeek cloud; second routes to local `embedding` (collision case) |
| Favorites | Star models from two providers, refresh, unplug one provider, refresh | Favorites persist; offline provider's favorites hidden, not deleted from settings |
| opencode parity | Dispatch an opencode task on `boocode-local/<provider>/<model>` for two providers sharing a wire name | Each lands on the correct machine; no `llama-swap/` collapse in opencode config or logs |
| Arena | Battle with contestants from two local providers | Local lane stays serial (ADR-0001); each contestant calls its own provider |
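
The "Duplicate model names" and "opencode parity" cases need a wire name that two providers share. A small helper like the one below can find such names in a list of composite ids. It is a hypothetical sketch: `sharedWireNames` is not part of the codebase, and it assumes the ids have already been extracted from the `GET /api/models` response, whose exact shape is not described here.

```typescript
// Group composite `provider/model` ids by wire (bare) model name and keep
// only the names that more than one provider serves.
function sharedWireNames(compositeIds: string[]): Map<string, string[]> {
  const providersByModel = new Map<string, string[]>();
  for (const id of compositeIds) {
    const slash = id.indexOf("/");
    if (slash <= 0) continue; // skip bare ids: they carry no provider
    const provider = id.slice(0, slash);
    const model = id.slice(slash + 1);
    const providers = providersByModel.get(model) ?? [];
    if (providers.indexOf(provider) === -1) providers.push(provider);
    providersByModel.set(model, providers);
  }

  const shared = new Map<string, string[]>();
  providersByModel.forEach((providers, model) => {
    if (providers.length > 1) shared.set(model, providers);
  });
  return shared;
}
```

An empty result means the fleet currently has no collision to test, so the duplicate-name case would need a model loaded on a second machine first.
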
## Interface for BooControl (follow-on)
BooControl must consume, not reinvent:
- the provider registry file `/data/llama-providers.json` (schema:
`@boocode/contracts/llama-providers`, `LlamaProvidersFileSchema`) as the
single source of provider identity;
- composite `provider/model` ids everywhere it stores or displays model
identity (`parseModelRef`/`formatModelRef` from the same contracts subpath);
- `GET /api/models` for live inventory and `favorite_models` in
`GET/PATCH /api/settings` for user preference — never raw host env vars.
A fleet UI therefore only needs to write this file and restart the consumers;
nothing else owns provider identity.
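
For a consumer such as BooControl, the Favorites behaviour from the smoke matrix (hidden while offline, never deleted) amounts to filtering at display time rather than editing the stored list. A minimal sketch follows; the `visibleFavorites` helper is hypothetical, and it assumes both inputs are plain arrays of composite ids.

```typescript
// Show only favorites that are in the live inventory. The stored list is
// never modified, so favorites reappear when their provider comes back.
function visibleFavorites(
  storedFavorites: readonly string[],
  liveModelIds: readonly string[],
): string[] {
  const live = new Set(liveModelIds);
  return storedFavorites.filter((id) => live.has(id));
}
```
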
## External agents
Both of Sam's coder agents get the local fleet through the gateway at coder
startup, under the single provider namespace `boocode-local`:
- **opencode** — `opencode-config-sync.ts` writes the provider (with
`@ai-sdk/openai-compatible` + gateway `baseURL` + model map) into
`~/.config/opencode/opencode.json`.
- **Pi** — `pi-config-sync.ts` writes the provider into
`~/.pi/agent/models.json` (other providers untouched; hand-tuned per-model
`contextWindow`/`maxTokens` overrides on boocode-local entries survive
re-sync).
After adding a machine, `sudo systemctl restart boocoder` re-syncs both.
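
The Pi re-sync rule (replace the `boocode-local` provider, leave other providers alone, keep hand-tuned overrides) can be illustrated with a pure merge function. This is a sketch, not the code in `pi-config-sync.ts`: the `PiModelsFile` shape, the `baseUrl` and `models` field names, and `syncBoocodeLocal` are assumptions, and the real `models.json` layout may differ.

```typescript
// Assumed file shape for illustration; only `contextWindow` and
// `maxTokens` are named in the guide.
interface PiModel {
  id: string;
  contextWindow?: number;
  maxTokens?: number;
}

interface PiProvider {
  baseUrl: string;
  models: PiModel[];
}

interface PiModelsFile {
  providers: Record<string, PiProvider>;
}

const NAMESPACE = "boocode-local";

function syncBoocodeLocal(
  existing: PiModelsFile,
  gatewayBaseUrl: string,
  liveModelIds: string[],
): PiModelsFile {
  // Index the previous boocode-local entries so overrides can be carried over.
  const previous: PiProvider | undefined = existing.providers[NAMESPACE];
  const tuned = new Map<string, PiModel>();
  for (const model of previous ? previous.models : []) tuned.set(model.id, model);

  const models = liveModelIds.map((id) => {
    const next: PiModel = { id };
    const old = tuned.get(id);
    if (old && old.contextWindow !== undefined) next.contextWindow = old.contextWindow;
    if (old && old.maxTokens !== undefined) next.maxTokens = old.maxTokens;
    return next;
  });

  // Other providers are copied by reference, i.e. left untouched.
  return {
    providers: {
      ...existing.providers,
      [NAMESPACE]: { baseUrl: gatewayBaseUrl, models },
    },
  };
}
```

In this sketch, models that have disappeared from the live list are dropped from the provider entry, and their overrides go with them. Whether the real sync keeps overrides for temporarily missing models is not stated in this guide.
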
## Resilience notes
- **Arena's local-model set self-refreshes every 5 min**
(`arena-local-models.ts`): a provider that was down at coder startup is
reclassified as local once it recovers; an unreachable provider keeps its
last-known models (stale-but-local beats a wrong cloud-lane dispatch). Bare
ids are contributed only by the default provider.
- The gateway forwards the client's `Authorization` header to upstreams when
present; its `/v1/*` routes remain unauthenticated on :9502 (repo
convention: the reverse proxy owns auth).
- Gateway `GET /v1/models` serves the live composite model list fetched from
every registry provider.
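
The stale-but-local rule for the arena's model set can be sketched as a pure refresh step. The code below is an illustration under assumptions, not the contents of `arena-local-models.ts`: `ProviderFetch`, `LocalModelState`, and `refreshLocalModels` are hypothetical names, and the 5-minute timer that would call it is left out.

```typescript
// One fetch result per provider currently in the registry.
// `models: null` marks a provider that could not be reached this round.
interface ProviderFetch {
  providerId: string;
  models: string[] | null;
}

interface LocalModelState {
  lastKnown: Map<string, string[]>; // providerId -> bare model ids
  localIds: Set<string>; // ids the arena treats as local
}

function refreshLocalModels(
  previous: Map<string, string[]>,
  fetched: ProviderFetch[],
  defaultProvider: string,
): LocalModelState {
  const lastKnown = new Map<string, string[]>();
  for (const result of fetched) {
    // Reachable: take the fresh list. Unreachable: keep the stale one.
    const models = result.models ?? previous.get(result.providerId) ?? [];
    lastKnown.set(result.providerId, models);
  }

  const localIds = new Set<string>();
  lastKnown.forEach((models, providerId) => {
    for (const model of models) {
      localIds.add(`${providerId}/${model}`);
      // Bare ids are contributed only by the default provider.
      if (providerId === defaultProvider) localIds.add(model);
    }
  });
  return { lastKnown, localIds };
}
```

A provider that was down at startup contributes nothing until its first successful fetch, and from then on an outage leaves its last-known models in the local set.
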