
indifferentketchup b18de2a331 chore: snapshot working tree - pty_exited notifications + in-flight inference WIP

feat(booterm): structured pty_exited WS notifications. Plan-validated, impl-validated, code-reviewed green (contracts build clean, contracts test 29/29, booterm + web typecheck clean).

wip: in-progress inference/provider refactor (agents.ts, provider.ts, new llama-providers.ts, removed llama-args-validator), plus arena, dispatcher, compaction, schema changes.

openspec: pty-exit-notifications complete; x-agent-flags planned (not yet implemented).

2026-06-14 12:48:47 +00:00

5.6 KiB


Multi-Provider Local Models — Operator Guide

How BooCode routes local inference across multiple llama-swap machines, how to add another machine, and the smoke matrix to run after any provider change. Implementation plan: plans/multi-provider-local-models/feature-implementation-plan.md.

Runtime contract

Config authority: /data/llama-providers.json (bind-mounted; gitignored), read by both apps/server and apps/coder via LLAMA_PROVIDERS_PATH. Tracked template: data/llama-providers.example.json.
Legacy fallback: when the file is absent, both apps synthesize a single provider from LLAMA_SWAP_URL. Startup never breaks on a missing file.
Model identity: persisted and cached ids are composite provider/model (e.g. sam-desktop/qwen3.6-35b-a3b). Wire calls to upstreams always send the bare model id. Legacy bare ids resolve to defaultProvider indefinitely.
Resolver: resolveModelProvider() in apps/server/src/services/inference/provider.ts is the single routing authority for streaming, non-streaming, context lookup, compaction, and task-model fallback. The coder mirrors this via its registry loader (apps/coder/src/services/llama-providers.ts) for arena and the local gateway.
opencode bridge: the BooCoder-hosted OpenAI-compatible gateway (apps/coder/src/services/local-gateway.ts) exposes all local providers to opencode under the single namespace boocode-local. The inner modelID is the composite id, so a full opencode reference reads boocode-local/sam-desktop/qwen3.6-35b. No path rewrites a composite id down to llama-swap/<model>.
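The identity rules above can be sketched as a small resolver. This is a minimal illustration, not the code in provider.ts: the `Registry` shape, the `resolveModelRef` name, the placeholder base URLs, and the choice to treat an unknown prefix as part of a bare id are all assumptions.

```typescript
// Illustrative only: names and shapes here are assumptions, not BooCode's API.
interface Provider {
  id: string;
  baseUrl: string;
}

interface Registry {
  defaultProvider: string;
  providers: Provider[];
}

interface ResolvedModel {
  provider: Provider;
  wireModel: string; // bare id sent to the upstream
  compositeId: string; // provider/model form used for persistence and caches
}

function resolveModelRef(registry: Registry, ref: string): ResolvedModel {
  const slash = ref.indexOf("/");
  const prefix = slash > 0 ? ref.slice(0, slash) : undefined;
  const named =
    prefix === undefined ? undefined : registry.providers.find((p) => p.id === prefix);
  // Legacy bare ids (and, by assumption, ids whose prefix is not a known
  // provider) fall back to the default provider.
  const provider =
    named ?? registry.providers.find((p) => p.id === registry.defaultProvider);
  if (!provider) {
    throw new Error(`default provider "${registry.defaultProvider}" is not in the registry`);
  }
  const wireModel = named ? ref.slice(slash + 1) : ref;
  return { provider, wireModel, compositeId: `${provider.id}/${wireModel}` };
}

// Placeholder hosts; real entries come from the registry file.
const registry: Registry = {
  defaultProvider: "sam-desktop",
  providers: [
    { id: "sam-desktop", baseUrl: "http://sam-desktop.example:8401" },
    { id: "embedding", baseUrl: "http://embedding.example:8402" },
  ],
};

const composite = resolveModelRef(registry, "embedding/deepseek-r1-qwen3-8b");
const legacy = resolveModelRef(registry, "qwen3.6-35b-a3b");
console.log(composite.provider.id, composite.wireModel);
console.log(legacy.compositeId);
```

Splitting on the first slash only means a model name that itself contains a slash still resolves, as long as its leading segment is a registered provider id.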

Add a machine

1. Start llama-swap on the new machine, reachable over Tailscale (e.g. http://100.x.y.z:84NN).
2. Edit /data/llama-providers.json: append a provider entry { "id": "<machine-slug>", "label": "<Display>", "baseUrl": "http://100.x.y.z:84NN", "kind": "llama-swap" }.
3. Restart consumers: docker compose restart boocode (server reads the file at startup) and sudo systemctl restart boocoder.
4. Verify: GET /api/models shows a new provider group; the new machine's models appear as <machine-slug>/<model> in the BooChat picker and the native BooCoder composer.
5. Run the smoke matrix below.

That is the whole flow — no code changes, no rebuild (config lives in the bind-mounted data/).
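As a rough picture of what consumers do with the file at startup, the sketch below parses registry text and falls back to a single synthesized provider when the file is absent. The entry fields (id, label, baseUrl, kind) come from the step above; the top-level `providers` and `defaultProvider` keys, the synthesized id `default`, the function names, and the placeholder URLs are assumptions. The real schema is LlamaProvidersFileSchema in @boocode/contracts.

```typescript
// Hypothetical file shape: only the entry fields are taken from this guide.
interface ProviderEntry {
  id: string;
  label: string;
  baseUrl: string;
  kind: string;
}

interface ProvidersFile {
  defaultProvider: string;
  providers: ProviderEntry[];
}

function isProviderEntry(value: unknown): value is ProviderEntry {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return ["id", "label", "baseUrl", "kind"].every((key) => typeof v[key] === "string");
}

// fileText is the raw file content, or undefined when the file is absent.
function loadProviders(fileText: string | undefined, legacyUrl: string): ProvidersFile {
  if (fileText === undefined) {
    // Legacy fallback: synthesize one provider so startup survives a missing file.
    const entry: ProviderEntry = {
      id: "default",
      label: "Default",
      baseUrl: legacyUrl,
      kind: "llama-swap",
    };
    return { defaultProvider: entry.id, providers: [entry] };
  }
  const parsed: unknown = JSON.parse(fileText);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("registry file must contain a JSON object");
  }
  const raw = parsed as { defaultProvider?: unknown; providers?: unknown };
  if (!Array.isArray(raw.providers) || !raw.providers.every(isProviderEntry)) {
    throw new Error("providers must be an array of { id, label, baseUrl, kind }");
  }
  const providers: ProviderEntry[] = raw.providers;
  const ids = new Set(providers.map((p) => p.id));
  if (ids.size !== providers.length) throw new Error("provider ids must be unique");
  const first = providers[0];
  const defaultProvider =
    typeof raw.defaultProvider === "string" ? raw.defaultProvider : first?.id;
  if (defaultProvider === undefined || !ids.has(defaultProvider)) {
    throw new Error("defaultProvider must name a listed provider");
  }
  return { defaultProvider, providers };
}

// Placeholder URLs stand in for LLAMA_SWAP_URL and the Tailscale addresses.
const fallback = loadProviders(undefined, "http://llama-swap.example:8080");
const twoMachines = loadProviders(
  JSON.stringify({
    defaultProvider: "sam-desktop",
    providers: [
      { id: "sam-desktop", label: "Sam Desktop", baseUrl: "http://sam-desktop.example:8401", kind: "llama-swap" },
      { id: "new-box", label: "New Box", baseUrl: "http://new-box.example:8402", kind: "llama-swap" },
    ],
  }),
  "http://llama-swap.example:8080",
);
console.log(fallback.providers.length, twoMachines.providers.map((p) => p.id));
```

Rejecting duplicate ids up front matters because the provider id is the routing key in every composite model id.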

Smoke matrix

Run after adding/removing a provider or changing provider config:

| Case | Steps | Expect |
| --- | --- | --- |
| Legacy fallback | Remove/rename `llama-providers.json`, restart server | Boot OK; single provider synthesized from `LLAMA_SWAP_URL`; bare-id sessions still stream |
| Two local providers | File with `sam-desktop` + `embedding`; chat once on a model from each | Both stream; `GET /api/models` shows both groups with composite ids |
| Duplicate model names | Same wire model name on two providers; chat on each composite id | Each request hits its own machine (check llama-swap logs); context limits are not cross-shared |
| DeepSeek enabled | Set `DEEPSEEK_API_KEY`; pick `deepseek/<model>`; also pick `embedding/deepseek-r1-qwen3-8b` | First routes to DeepSeek cloud; second routes to local `embedding` (collision case) |
| Favorites | Star models from two providers, refresh, unplug one provider, refresh | Favorites persist; offline provider's favorites hidden, not deleted from settings |
| opencode parity | Dispatch an opencode task on `boocode-local/<provider>/<model>` for two providers sharing a wire name | Each lands on the correct machine; no `llama-swap/` collapse in opencode config or logs |
| Arena | Battle with contestants from two local providers | Local lane stays serial (ADR-0001); each contestant calls its own provider |
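For the duplicate-model-names and opencode-parity cases it helps to know which wire names actually collide across providers. The helper below is a hypothetical sketch that takes a flat list of composite ids (for example, collected by hand from GET /api/models, whose response shape is not specified here) and reports the wire names served by more than one provider.

```typescript
// Hypothetical helper: groups composite ids by wire name and keeps the
// names that appear under more than one provider.
function findSharedWireNames(compositeIds: string[]): Map<string, string[]> {
  const byWire = new Map<string, string[]>();
  for (const id of compositeIds) {
    const slash = id.indexOf("/");
    if (slash <= 0) continue; // skip bare ids, they carry no provider
    const provider = id.slice(0, slash);
    const wire = id.slice(slash + 1);
    const providers = byWire.get(wire) ?? [];
    if (!providers.includes(provider)) providers.push(provider);
    byWire.set(wire, providers);
  }
  const shared = new Map<string, string[]>();
  byWire.forEach((providers, wire) => {
    if (providers.length > 1) shared.set(wire, providers);
  });
  return shared;
}

const shared = findSharedWireNames([
  "sam-desktop/qwen3.6-35b-a3b",
  "embedding/qwen3.6-35b-a3b",
  "embedding/deepseek-r1-qwen3-8b",
]);
shared.forEach((providers, wire) => {
  console.log(`${wire} is served by: ${providers.join(", ")}`);
});
```

Each reported wire name is a candidate for the "chat on each composite id, then check llama-swap logs" step.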

Interface for BooControl (follow-on)

BooControl must consume, not reinvent:

the provider registry file /data/llama-providers.json (schema: @boocode/contracts/llama-providers, LlamaProvidersFileSchema) as the single source of provider identity;
composite provider/model ids everywhere it stores or displays model identity (parseModelRef/formatModelRef from the same contracts subpath);
GET /api/models for live inventory and favorite_models in GET/PATCH /api/settings for user preference — never raw host env vars.

Adding fleet UI means writing this file and restarting consumers; nothing else owns provider identity.
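One consequence for any consumer of these two endpoints: the displayed favorites are the stored favorites filtered by live inventory, and the stored list is never rewritten when a provider goes offline. A minimal sketch, assuming both values are available as flat arrays of composite ids (the actual response shapes are not described in this guide):

```typescript
// Hypothetical helper: returns the favorites to display without touching
// the stored preference list.
function visibleFavorites(
  favoriteModels: readonly string[],
  liveModelIds: readonly string[],
): string[] {
  const live = new Set(liveModelIds);
  return favoriteModels.filter((id) => live.has(id));
}

const stored = ["sam-desktop/qwen3.6-35b-a3b", "embedding/deepseek-r1-qwen3-8b"];
// The embedding provider is offline, so only sam-desktop models are live.
const liveNow = ["sam-desktop/qwen3.6-35b-a3b"];
const shown = visibleFavorites(stored, liveNow);
console.log(shown, stored.length);
```

When the offline provider returns, the same stored list yields its favorites again with no settings write in between.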

External agents

Both of Sam's coder agents get the local fleet through the gateway at coder startup, under the single provider namespace boocode-local:

opencode — opencode-config-sync.ts writes the provider (with @ai-sdk/openai-compatible + gateway baseURL + model map) into ~/.config/opencode/opencode.json.
Pi — pi-config-sync.ts writes the provider into ~/.pi/agent/models.json (other providers untouched; hand-tuned per-model contextWindow/maxTokens overrides on boocode-local entries survive re-sync).

After adding a machine, sudo systemctl restart boocoder re-syncs both.
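The re-sync behavior described for Pi (other providers untouched, hand-tuned overrides preserved) can be illustrated with a pure merge function. The config shape, field layout, and function name below are assumptions made for the sketch; the real files written by opencode-config-sync.ts and pi-config-sync.ts may be structured differently.

```typescript
// Hypothetical config shape, used only to illustrate the merge rule.
interface ModelEntry {
  id: string;
  contextWindow?: number;
  maxTokens?: number;
}

interface ProviderConfig {
  baseUrl: string;
  models: ModelEntry[];
}

type AgentConfig = Record<string, ProviderConfig>;

function syncProvider(
  existing: AgentConfig,
  namespace: string,
  baseUrl: string,
  liveModelIds: string[],
): AgentConfig {
  const previous = new Map<string, ModelEntry>();
  for (const model of existing[namespace]?.models ?? []) {
    previous.set(model.id, model);
  }
  // Rebuild the model list from live inventory, carrying over any
  // per-model overrides that were already present for the same id.
  const models = liveModelIds.map((id) => ({ ...previous.get(id), id }));
  // Every other provider key is copied through unchanged.
  return { ...existing, [namespace]: { baseUrl, models } };
}

const before: AgentConfig = {
  "some-cloud": { baseUrl: "https://cloud.example/v1", models: [{ id: "big-model" }] },
  "boocode-local": {
    baseUrl: "http://gateway.example:9502/v1",
    models: [{ id: "sam-desktop/qwen3.6-35b-a3b", contextWindow: 32768 }],
  },
};

const after = syncProvider(before, "boocode-local", "http://gateway.example:9502/v1", [
  "sam-desktop/qwen3.6-35b-a3b",
  "new-box/qwen3.6-35b-a3b",
]);
console.log(after["boocode-local"]?.models);
```

Models that disappear from live inventory are dropped along with their overrides in this sketch; whether the real sync keeps them is not stated in this guide.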

Resilience notes

Arena's local-model set self-refreshes every 5 min (arena-local-models.ts): a provider that was down at coder startup is reclassified as local once it recovers; an unreachable provider keeps its last-known models (stale-but-local beats a wrong cloud-lane dispatch). Bare ids are contributed only by the default provider.
The gateway forwards the client's Authorization header to upstreams when present; its /v1/* routes remain unauthenticated on :9502 (repo convention: the reverse proxy owns auth).
Gateway GET /v1/models serves the live composite model list fetched from every registry provider.
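The arena refresh rule in the first note can be sketched as a pure function over one round of probe results. This is an illustration under assumptions, not the contents of arena-local-models.ts: the function name, the `null`-means-unreachable convention, and the data shapes are hypothetical.

```typescript
// null means the provider could not be reached during this refresh round.
type ProbeResult = string[] | null;

interface LocalModelState {
  byProvider: Map<string, string[]>;
  localIds: Set<string>;
}

function refreshLocalModels(
  lastKnown: Map<string, string[]>,
  probes: Map<string, ProbeResult>,
  defaultProvider: string,
): LocalModelState {
  const byProvider = new Map<string, string[]>();
  const localIds = new Set<string>();
  probes.forEach((result, providerId) => {
    // An unreachable provider keeps its last-known models (stale but local).
    const models = result ?? lastKnown.get(providerId) ?? [];
    byProvider.set(providerId, models);
    for (const model of models) {
      localIds.add(`${providerId}/${model}`);
      // Only the default provider contributes bare ids.
      if (providerId === defaultProvider) localIds.add(model);
    }
  });
  return { byProvider, localIds };
}

// Round 1: embedding is down at startup, so it contributes nothing yet.
const round1 = refreshLocalModels(
  new Map<string, string[]>(),
  new Map<string, ProbeResult>([
    ["sam-desktop", ["qwen3.6-35b-a3b"]],
    ["embedding", null],
  ]),
  "sam-desktop",
);

// Round 2: embedding recovers while sam-desktop is briefly unreachable.
const round2 = refreshLocalModels(
  round1.byProvider,
  new Map<string, ProbeResult>([
    ["sam-desktop", null],
    ["embedding", ["deepseek-r1-qwen3-8b"]],
  ]),
  "sam-desktop",
);
console.log(Array.from(round2.localIds));
```

Feeding each round's `byProvider` into the next call is what lets a provider drop out temporarily without its models being reclassified as cloud.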
