Multi-Provider Local Models — Operator Guide
How BooCode routes local inference across multiple llama-swap machines, how to add another machine, and the smoke matrix to run after any provider change. Implementation plan: plans/multi-provider-local-models/feature-implementation-plan.md.
Runtime contract
- Config authority: /data/llama-providers.json (bind-mounted; gitignored), read by both apps/server and apps/coder via LLAMA_PROVIDERS_PATH. Tracked template: data/llama-providers.example.json.
- Legacy fallback: when the file is absent, both apps synthesize a single provider from LLAMA_SWAP_URL. Startup never breaks on a missing file.
- Model identity: persisted and cached ids are composite provider/model (e.g. sam-desktop/qwen3.6-35b-a3b). Wire calls to upstreams always send the bare model id. Legacy bare ids resolve to defaultProvider indefinitely.
- Resolver: resolveModelProvider() in apps/server/src/services/inference/provider.ts is the single routing authority for streaming, non-streaming, context lookup, compaction, and task-model fallback. The coder mirrors this via its registry loader (apps/coder/src/services/llama-providers.ts) for arena and the local gateway.
- opencode bridge: the BooCoder-hosted OpenAI-compatible gateway (apps/coder/src/services/local-gateway.ts) exposes all local providers to opencode under the single namespace boocode-local; the inner modelID is the composite id (boocode-local/sam-desktop/qwen3.6-35b). No path rewrites a composite id down to llama-swap/<model>.
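
The model-identity rule above can be sketched as a pair of helpers. This is a minimal illustration, not the code in @boocode/contracts: the names parseCompositeId and formatCompositeId, the ModelRef shape, and the choice to treat an unknown prefix as part of a bare id are all assumptions made for the example.

```typescript
// Hypothetical sketch of composite-id handling; the real parseModelRef /
// formatModelRef in @boocode/contracts/llama-providers may behave differently.
interface ModelRef {
  provider: string; // registry provider id, e.g. "sam-desktop"
  model: string; // bare wire id sent to the upstream
}

function parseCompositeId(
  id: string,
  knownProviders: ReadonlySet<string>,
  defaultProvider: string,
): ModelRef {
  // Split on the first slash only, so a wire id may itself contain slashes.
  const slash = id.indexOf("/");
  const prefix = slash > 0 ? id.slice(0, slash) : "";
  if (prefix !== "" && knownProviders.has(prefix)) {
    return { provider: prefix, model: id.slice(slash + 1) };
  }
  // Legacy bare id (or unknown prefix): route to the default provider.
  return { provider: defaultProvider, model: id };
}

function formatCompositeId(ref: ModelRef): string {
  return `${ref.provider}/${ref.model}`;
}
```

Under these assumptions, only ref.model would go on the wire, while the formatted composite id is what gets persisted or cached.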
Add a machine
- Start llama-swap on the new machine, reachable over Tailscale (e.g. http://100.x.y.z:84NN).
- Edit /data/llama-providers.json: append a provider entry { "id": "<machine-slug>", "label": "<Display>", "baseUrl": "http://100.x.y.z:84NN", "kind": "llama-swap" }.
- Restart consumers: docker compose restart boocode (server reads the file at startup) and sudo systemctl restart boocoder.
- Verify: GET /api/models shows a new provider group; the new machine's models appear as <machine-slug>/<model> in the BooChat picker and the native BooCoder composer.
- Run the smoke matrix below.
That is the whole flow — no code changes, no rebuild (config lives in the
bind-mounted data/).
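
For a sense of what the edit step amounts to, here is a small sketch of appending a provider entry with a few sanity checks. The entry fields (id, label, baseUrl, kind) come from this guide; the top-level providers array, the appendProvider helper, and the specific checks are assumptions, and the authoritative shape is LlamaProvidersFileSchema.

```typescript
// Assumed file shape; only the entry fields and defaultProvider are named
// in this guide. Check LlamaProvidersFileSchema for the real structure.
interface ProviderEntry {
  id: string;
  label: string;
  baseUrl: string;
  kind: "llama-swap";
}

interface ProvidersFile {
  defaultProvider: string;
  providers: ProviderEntry[]; // hypothetical key name
}

// Returns a new file object with the entry appended; the input is not mutated.
function appendProvider(file: ProvidersFile, entry: ProviderEntry): ProvidersFile {
  if (file.providers.some((p) => p.id === entry.id)) {
    throw new Error(`provider id already registered: ${entry.id}`);
  }
  // A slash in the id would make composite provider/model ids ambiguous.
  if (entry.id.indexOf("/") !== -1) {
    throw new Error(`provider id must not contain "/": ${entry.id}`);
  }
  if (!/^https?:\/\//.test(entry.baseUrl)) {
    throw new Error(`baseUrl must be an http(s) URL: ${entry.baseUrl}`);
  }
  return { ...file, providers: [...file.providers, entry] };
}
```

Rejecting duplicate ids and slashes up front is a design choice for the sketch: both would otherwise surface later as misrouted composite ids rather than as a clear config error.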
Smoke matrix
Run after adding/removing a provider or changing provider config:
| Case | Steps | Expect |
|---|---|---|
| Legacy fallback | Remove/rename llama-providers.json, restart server | Boot OK; single provider synthesized from LLAMA_SWAP_URL; bare-id sessions still stream |
| Two local providers | File with sam-desktop + embedding; chat once on a model from each | Both stream; GET /api/models shows both groups with composite ids |
| Duplicate model names | Same wire model name on two providers; chat on each composite id | Each request hits its own machine (check llama-swap logs); context limits are not cross-shared |
| DeepSeek enabled | Set DEEPSEEK_API_KEY; pick deepseek/<model>; also pick embedding/deepseek-r1-qwen3-8b | First routes to DeepSeek cloud; second routes to local embedding (collision case) |
| Favorites | Star models from two providers, refresh, unplug one provider, refresh | Favorites persist; offline provider's favorites hidden, not deleted from settings |
| opencode parity | Dispatch an opencode task on boocode-local/<provider>/<model> for two providers sharing a wire name | Each lands on the correct machine; no llama-swap/ collapse in opencode config or logs |
| Arena | Battle with contestants from two local providers | Local lane stays serial (ADR-0001); each contestant calls its own provider |
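
To pick candidates for the "Duplicate model names" and "opencode parity" rows, it can help to list which wire names appear under more than one provider. The helper below is a hypothetical sketch: it takes a plain array of composite ids (however you extract them from GET /api/models, whose response shape is not specified here) and returns the shared wire names.

```typescript
// Hypothetical helper: maps each wire model name that appears under more
// than one provider to the list of providers exposing it.
function findSharedWireNames(compositeIds: string[]): Map<string, string[]> {
  const providersByModel = new Map<string, string[]>();
  for (const id of compositeIds) {
    const slash = id.indexOf("/");
    if (slash <= 0) continue; // bare legacy ids carry no provider prefix
    const provider = id.slice(0, slash);
    const model = id.slice(slash + 1);
    const providers = providersByModel.get(model) ?? [];
    providers.push(provider);
    providersByModel.set(model, providers);
  }
  const shared = new Map<string, string[]>();
  providersByModel.forEach((providers, model) => {
    if (providers.length > 1) shared.set(model, providers);
  });
  return shared;
}
```

Each key in the result is a wire name worth chatting on once per provider while watching the llama-swap logs on both machines.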
Interface for BooControl (follow-on)
BooControl must consume, not reinvent:
- the provider registry file /data/llama-providers.json (schema: @boocode/contracts/llama-providers, LlamaProvidersFileSchema) as the single source of provider identity;
- composite provider/model ids everywhere it stores or displays model identity (parseModelRef / formatModelRef from the same contracts subpath);
- GET /api/models for live inventory and favorite_models in GET/PATCH /api/settings for user preference, never raw host env vars.
Adding fleet UI = writing this file + restarting consumers; nothing else owns provider identity.
External agents
Both of Sam's coder agents get the local fleet through the gateway at coder
startup, under the single provider namespace boocode-local:
- opencode: opencode-config-sync.ts writes the provider (with @ai-sdk/openai-compatible + gateway baseURL + model map) into ~/.config/opencode/opencode.json.
- Pi: pi-config-sync.ts writes the provider into ~/.pi/agent/models.json (other providers untouched; hand-tuned per-model contextWindow / maxTokens overrides on boocode-local entries survive re-sync).
After adding a machine, sudo systemctl restart boocoder re-syncs both.
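
The "other providers untouched, overrides survive" behaviour described for the Pi sync can be sketched as a merge. Everything about the file layout here is an assumption (a record keyed by provider namespace, each with a models array); the real ~/.pi/agent/models.json and pi-config-sync.ts may be structured differently.

```typescript
// Hypothetical config shapes for illustration only.
interface ModelEntry {
  id: string;
  contextWindow?: number;
  maxTokens?: number;
}

interface ProviderConfig {
  baseUrl: string;
  models: ModelEntry[];
}

type ModelsFile = Record<string, ProviderConfig>;

// Replaces one namespace with a freshly generated provider while keeping
// hand-tuned contextWindow / maxTokens values for models that still exist.
function syncProvider(existing: ModelsFile, namespace: string, fresh: ProviderConfig): ModelsFile {
  const previous = new Map<string, ModelEntry>();
  for (const m of existing[namespace]?.models ?? []) previous.set(m.id, m);

  const models = fresh.models.map((m) => {
    const old = previous.get(m.id);
    const kept: { contextWindow?: number; maxTokens?: number } = {};
    if (old?.contextWindow !== undefined) kept.contextWindow = old.contextWindow;
    if (old?.maxTokens !== undefined) kept.maxTokens = old.maxTokens;
    return { ...m, ...kept }; // existing overrides win over generated defaults
  });

  // Spread first so every other provider namespace is carried over as-is.
  return { ...existing, [namespace]: { baseUrl: fresh.baseUrl, models } };
}
```

In this sketch, models that disappear from the fleet are dropped from the namespace, and their overrides go with them; whether the real sync keeps them is not stated in this guide.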
Resilience notes
- Arena's local-model set self-refreshes every 5 min (arena-local-models.ts): a provider that was down at coder startup is reclassified as local once it recovers; an unreachable provider keeps its last-known models (stale-but-local beats a wrong cloud-lane dispatch). Bare ids are contributed only by the default provider.
- The gateway forwards the client's Authorization header to upstreams when present; its /v1/* routes remain unauthenticated on :9502 (repo convention: the reverse proxy owns auth).
- Gateway GET /v1/models serves the live composite model list fetched from every registry provider.
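
The stale-but-local rule in the first note can be illustrated with a pure merge step. This is a sketch under assumptions, not the logic in arena-local-models.ts: the ProbeResult shape and the mergeRefresh name are invented here, and the probing itself (one model-list request per provider, run every 5 minutes) is left to the caller.

```typescript
// Hypothetical result of probing one provider during a refresh cycle.
interface ProbeResult {
  providerId: string;
  models: string[] | null; // null = provider unreachable this cycle
}

function mergeRefresh(
  lastKnown: Map<string, string[]>,
  probes: ProbeResult[],
  defaultProvider: string,
): { byProvider: Map<string, string[]>; localIds: Set<string> } {
  const byProvider = new Map<string, string[]>();
  for (const probe of probes) {
    // Unreachable providers keep their last-known list (stale-but-local).
    const models = probe.models ?? lastKnown.get(probe.providerId);
    if (models !== undefined) byProvider.set(probe.providerId, models);
  }

  const localIds = new Set<string>();
  byProvider.forEach((models, providerId) => {
    for (const model of models) {
      localIds.add(`${providerId}/${model}`);
      // Only the default provider contributes legacy bare ids.
      if (providerId === defaultProvider) localIds.add(model);
    }
  });
  return { byProvider, localIds };
}
```

Feeding the returned byProvider map back in as lastKnown on the next cycle gives the described behaviour: a provider that was never reachable contributes nothing until its first successful probe, and one that drops out later stays classified as local.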