# Multi-Provider Local Models: Operator Guide

How BooCode routes local inference across multiple llama-swap machines, how to
add another machine, and the smoke matrix to run after any provider change.
Implementation plan: [plans/multi-provider-local-models/feature-implementation-plan.md](plans/multi-provider-local-models/feature-implementation-plan.md).

## Runtime contract

- **Config authority:** `/data/llama-providers.json` (bind-mounted; gitignored),
  read by both `apps/server` and `apps/coder` via `LLAMA_PROVIDERS_PATH`.
  Tracked template: `data/llama-providers.example.json`.
- **Legacy fallback:** when the file is absent, both apps synthesize a single
  provider from `LLAMA_SWAP_URL`. Startup never breaks on a missing file.
- **Model identity:** persisted and cached ids are composite `provider/model`
  (e.g. `sam-desktop/qwen3.6-35b-a3b`). Wire calls to upstreams always send the
  bare model id. Legacy bare ids resolve to `defaultProvider` indefinitely.
- **Resolver:** `resolveModelProvider()` in
  `apps/server/src/services/inference/provider.ts` is the single routing
  authority for streaming, non-streaming, context lookup, compaction, and
  task-model fallback. The coder mirrors this via its registry loader
  (`apps/coder/src/services/llama-providers.ts`) for arena and the local gateway.
- **opencode bridge:** the BooCoder-hosted OpenAI-compatible gateway
  (`apps/coder/src/services/local-gateway.ts`) exposes all local providers to
  opencode under the single namespace `boocode-local`; the inner modelID is the
  composite id (`boocode-local/sam-desktop/qwen3.6-35b`). No path rewrites a
  composite id down to `llama-swap/<model>`.

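As a rough illustration of the routing rules above, the sketch below resolves a stored model id to a provider plus a bare wire id, and synthesizes a single provider when no registry file exists. It is not the code in `provider.ts`: the `Registry` shape, the names `loadRegistry` and `resolveRef`, the `default` id given to the synthesized provider, and the rule that an unknown prefix means "legacy bare id" are all assumptions made for the example.

```typescript
// Hypothetical shapes: the real schema lives in @boocode/contracts/llama-providers
// and may differ from this sketch.
interface Provider {
  id: string;
  label: string;
  baseUrl: string;
  kind: "llama-swap";
}

interface Registry {
  defaultProvider: string;
  providers: Provider[];
}

interface Resolved {
  provider: Provider;
  wireModel: string; // bare id that would be sent to the upstream
}

// Build the registry from file contents, or synthesize a single provider
// from the legacy LLAMA_SWAP_URL value when the file is absent.
function loadRegistry(fileText: string | undefined, legacyUrl: string): Registry {
  if (fileText !== undefined) {
    return JSON.parse(fileText) as Registry;
  }
  return {
    defaultProvider: "default", // assumed id for the synthesized provider
    providers: [{ id: "default", label: "Default", baseUrl: legacyUrl, kind: "llama-swap" }],
  };
}

// Resolve a composite `provider/model` id, or a legacy bare id, to a provider.
function resolveRef(ref: string, registry: Registry): Resolved {
  const slash = ref.indexOf("/");
  const prefix = slash === -1 ? undefined : ref.slice(0, slash);
  const named = registry.providers.find((p) => p.id === prefix);
  if (named !== undefined) {
    return { provider: named, wireModel: ref.slice(slash + 1) };
  }
  // No known provider prefix: treat the whole string as a legacy bare id.
  const fallback = registry.providers.find((p) => p.id === registry.defaultProvider);
  if (fallback === undefined) {
    throw new Error(`defaultProvider "${registry.defaultProvider}" is not in the registry`);
  }
  return { provider: fallback, wireModel: ref };
}
```

Matching the prefix against known provider ids, rather than splitting on the first `/` unconditionally, keeps a bare id that happens to contain a slash routed to the default provider.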
## Add a machine

1. Start llama-swap on the new machine, reachable over Tailscale
   (e.g. `http://100.x.y.z:84NN`).
2. Edit `/data/llama-providers.json`: append a provider entry
   `{ "id": "<machine-slug>", "label": "<Display>", "baseUrl": "http://100.x.y.z:84NN", "kind": "llama-swap" }`.
3. Restart consumers: `docker compose restart boocode` (server reads the file at
   startup) and `sudo systemctl restart boocoder`.
4. Verify: `GET /api/models` shows a new provider group; the new machine's
   models appear as `<machine-slug>/<model>` in the BooChat picker and the
   native BooCoder composer.
5. Run the smoke matrix below.

That is the whole flow: no code changes, no rebuild (config lives in the
bind-mounted `data/`).

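Step 2 is a hand edit, but the checks worth making before saving can be sketched in code. The example below appends an entry to an in-memory copy of the registry and rejects entries that would break composite ids. The top-level `ProvidersFile` shape and the `appendProvider` helper are hypothetical; the authoritative schema is `LlamaProvidersFileSchema`, and the rule that a provider id must not contain `/` is an assumption that follows from the `provider/model` id format.

```typescript
// Hypothetical registry shape; only the four entry fields come from the guide.
interface ProviderEntry {
  id: string;
  label: string;
  baseUrl: string;
  kind: "llama-swap";
}

interface ProvidersFile {
  defaultProvider: string;
  providers: ProviderEntry[];
}

// Return a copy of the file with `entry` appended, or throw if the entry
// would make composite ids ambiguous or unroutable.
function appendProvider(file: ProvidersFile, entry: ProviderEntry): ProvidersFile {
  if (entry.id.length === 0 || entry.id.includes("/")) {
    throw new Error(`provider id "${entry.id}" must be non-empty and contain no "/"`);
  }
  if (file.providers.some((p) => p.id === entry.id)) {
    throw new Error(`provider id "${entry.id}" already exists`);
  }
  if (!/^https?:\/\/\S+$/.test(entry.baseUrl)) {
    throw new Error(`baseUrl "${entry.baseUrl}" is not an http(s) URL`);
  }
  // The input object is left untouched so a failed edit never half-applies.
  return { ...file, providers: [...file.providers, entry] };
}
```

The existing providers and `defaultProvider` are carried over unchanged, which matches the "append, then restart" flow above.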
## Smoke matrix

Run after adding/removing a provider or changing provider config:

| Case | Steps | Expect |
|---|---|---|
| Legacy fallback | Remove/rename `llama-providers.json`, restart server | Boot OK; single provider synthesized from `LLAMA_SWAP_URL`; bare-id sessions still stream |
| Two local providers | File with `sam-desktop` + `embedding`; chat once on a model from each | Both stream; `GET /api/models` shows both groups with composite ids |
| Duplicate model names | Same wire model name on two providers; chat on each composite id | Each request hits its own machine (check llama-swap logs); context limits are not cross-shared |
| DeepSeek enabled | Set `DEEPSEEK_API_KEY`; pick `deepseek/<model>`; also pick `embedding/deepseek-r1-qwen3-8b` | First routes to DeepSeek cloud; second routes to local `embedding` (collision case) |
| Favorites | Star models from two providers, refresh, unplug one provider, refresh | Favorites persist; offline provider's favorites hidden, not deleted from settings |
| opencode parity | Dispatch an opencode task on `boocode-local/<provider>/<model>` for two providers sharing a wire name | Each lands on the correct machine; no `llama-swap/` collapse in opencode config or logs |
| Arena | Battle with contestants from two local providers | Local lane stays serial (ADR-0001); each contestant calls its own provider |

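Two of the expectations in the matrix, "context limits are not cross-shared" and "hidden, not deleted", come down to simple data-structure choices. The sketch below shows one way they could hold; `ContextLimitCache` and `visibleFavorites` are hypothetical names for illustration, not identifiers from the codebase.

```typescript
// Context limits cached per composite id, so two machines serving the same
// wire model name never read each other's limit.
class ContextLimitCache {
  private readonly limits = new Map<string, number>();

  set(providerId: string, wireModel: string, limit: number): void {
    this.limits.set(`${providerId}/${wireModel}`, limit);
  }

  get(providerId: string, wireModel: string): number | undefined {
    return this.limits.get(`${providerId}/${wireModel}`);
  }
}

// Favorites are filtered for display only; the stored list is never mutated,
// so an offline provider's favorites reappear once it is reachable again.
function visibleFavorites(stored: readonly string[], liveIds: ReadonlySet<string>): string[] {
  return stored.filter((id) => liveIds.has(id));
}
```

Keying the cache on the bare wire name instead would make the "Duplicate model names" case fail, since the second provider's limit would overwrite the first.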
## Interface for BooControl (follow-on)

BooControl must consume, not reinvent:

- the provider registry file `/data/llama-providers.json` (schema:
  `@boocode/contracts/llama-providers`, `LlamaProvidersFileSchema`) as the
  single source of provider identity;
- composite `provider/model` ids everywhere it stores or displays model
  identity (`parseModelRef`/`formatModelRef` from the same contracts subpath);
- `GET /api/models` for live inventory and `favorite_models` in
  `GET/PATCH /api/settings` for user preference, never raw host env vars.

Adding fleet UI = writing this file + restarting consumers; nothing else owns
provider identity.

## External agents

Both of Sam's coder agents get the local fleet through the gateway at coder
startup, under the single provider namespace `boocode-local`:

- **opencode:** `opencode-config-sync.ts` writes the provider (with
  `@ai-sdk/openai-compatible` + gateway `baseURL` + model map) into
  `~/.config/opencode/opencode.json`.
- **Pi:** `pi-config-sync.ts` writes the provider into
  `~/.pi/agent/models.json` (other providers untouched; hand-tuned per-model
  `contextWindow`/`maxTokens` overrides on boocode-local entries survive
  re-sync).

After adding a machine, `sudo systemctl restart boocoder` re-syncs both.

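The Pi re-sync behaviour described above (other providers untouched, per-model overrides preserved) can be pictured as a merge. The sketch below is a guess at the idea, not a copy of `pi-config-sync.ts`: the `ModelsFile` layout is a hypothetical stand-in for whatever `~/.pi/agent/models.json` actually contains, and `syncProvider` is an invented name. As a simplification, it treats any `contextWindow` or `maxTokens` already on disk as the value to keep.

```typescript
// Hypothetical file layout: one block per provider namespace.
interface ModelEntry {
  id: string;
  contextWindow?: number;
  maxTokens?: number;
}

interface ProviderBlock {
  baseUrl: string;
  models: ModelEntry[];
}

type ModelsFile = Record<string, ProviderBlock>;

// Replace one provider's block with freshly generated data, keeping any
// contextWindow/maxTokens values already present for the same model id.
function syncProvider(existing: ModelsFile, name: string, generated: ProviderBlock): ModelsFile {
  const previous = new Map((existing[name]?.models ?? []).map((m) => [m.id, m] as const));
  const models = generated.models.map((fresh) => {
    const old = previous.get(fresh.id);
    return {
      ...fresh,
      contextWindow: old?.contextWindow ?? fresh.contextWindow,
      maxTokens: old?.maxTokens ?? fresh.maxTokens,
    };
  });
  // Spreading `existing` first leaves every other provider block as it was.
  return { ...existing, [name]: { baseUrl: generated.baseUrl, models } };
}
```

Models that disappeared from the generated list are dropped along with their overrides in this version; whether the real sync keeps them is not stated in this guide.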
## Resilience notes

- **Arena's local-model set self-refreshes every 5 min**
  (`arena-local-models.ts`): a provider that was down at coder startup is
  reclassified as local once it recovers; an unreachable provider keeps its
  last-known models (stale-but-local beats a wrong cloud-lane dispatch). Bare
  ids are contributed only by the default provider.
- The gateway forwards the client's `Authorization` header to upstreams when
  present; its `/v1/*` routes remain unauthenticated on :9502 (repo
  convention: the reverse proxy owns auth).
- Gateway `GET /v1/models` serves the live composite model list fetched from
  every registry provider.
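The arena refresh rule in the first note can be sketched as a pure function over probe results. This is an illustration under assumptions, not the contents of `arena-local-models.ts`: the `Probe` shape (with `models: null` meaning "unreachable") and the `refreshLocalSet` name are hypothetical.

```typescript
// One probe per registry provider; `models: null` means the provider
// could not be reached on this refresh.
interface Probe {
  providerId: string;
  models: readonly string[] | null;
}

interface LocalSet {
  byProvider: Map<string, readonly string[]>;
  localIds: Set<string>;
}

function refreshLocalSet(
  lastKnown: ReadonlyMap<string, readonly string[]>,
  probes: readonly Probe[],
  defaultProvider: string,
): LocalSet {
  // Start from the previous snapshot so unreachable providers keep their models.
  const byProvider = new Map<string, readonly string[]>();
  lastKnown.forEach((models, providerId) => byProvider.set(providerId, models));
  for (const probe of probes) {
    if (probe.models !== null) {
      byProvider.set(probe.providerId, probe.models); // reachable: take the fresh list
    }
  }
  const localIds = new Set<string>();
  byProvider.forEach((models, providerId) => {
    for (const model of models) {
      localIds.add(`${providerId}/${model}`);
      // Only the default provider contributes bare ids.
      if (providerId === defaultProvider) localIds.add(model);
    }
  });
  return { byProvider, localIds };
}
```

A caller would run this on a timer (every 5 minutes, per the note above) and feed each result's `byProvider` back in as the next `lastKnown`, which is what lets a provider that was down at startup join the local set once it answers.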