Feature Implementation Plan: Multi-Provider Local Models
This plan turns the multi-provider local-model design into a strict implementation sequence that can be executed with Orchestration. It assumes the goal is not just to “fix the picker” but to make local inference work as a small fleet with stable provider identity, shared favorites, correct routing, and an honest parity story for BooCoder.
Source Specification
- Primary rollout outline: build-phase-outline.md
- Behavioral design: ../../../openspec/changes/multi-llama-swap-providers-model-favorites/design.md
- Task inventory: ../../../openspec/changes/multi-llama-swap-providers-model-favorites/tasks.md
- Architecture analysis: ../../../openspec/changes/multi-llama-swap-providers-model-favorites/artifacts/implementation-analysis.md
- Research note: ../../research/2026-06-10-multi-llama-swap-providers-model-favorites.md
- Discovery notes: artifacts/.discovery-notes.md
Outcome
When this plan is complete:
- BooChat can route local models by named provider, not by one global `LLAMA_SWAP_URL`.
- Favorites are shared across BooChat and native BooCoder, derived from settings instead of being baked into the server catalog (D-4).
- Duplicate model names on different local machines are safe because persisted and cached identity is `provider/model` (D-2).
- Native BooCoder and Arena use the same provider-aware resolver as BooChat (D-3).
- External-agent parity is real rather than implied: `opencode` only gets multi-provider local models after a provider-preserving bridge exists (D-5, D-6).
- Adding another local machine is a config change plus a smoke pass, not another architecture pass (D-7).
Working Assumptions
- The shared local-provider source of truth is `/data/llama-providers.json`, exposed to both apps through `LLAMA_PROVIDERS_PATH`, with legacy env fallback while the file is absent (D-1).
- `packages/contracts` owns schemas and pure helpers; app-local loader modules own file I/O and env fallback, following the existing `provider-config`/`provider-config-registry` split in BooCoder (D-1).
- The work ends at a completed multi-provider substrate. BooControl is a follow-on consumer, not part of this implementation batch.
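A minimal TypeScript sketch of that loader split, assuming the shared file holds `defaultProvider` plus `providers[]` as W1 describes. The function name `loadLocalProviders`, the injected `readFile` callback, the synthesized provider id `legacy`, and its `kind` value are hypothetical. A real app-local loader would pass in `fs` and `process.env` and validate the parsed JSON with the contracts schema instead of casting.

```typescript
// Shape of the shared registry file; field names follow the plan's W1 description.
interface LocalProvider {
  id: string;
  label: string;
  baseUrl: string;
  sidecarUrl?: string;
  kind: string; // the plan names `kind` but not its values, so it stays a plain string
}

interface LocalProviderRegistry {
  defaultProvider: string;
  providers: LocalProvider[];
}

// Hypothetical id for the single provider synthesized from legacy env vars.
const LEGACY_PROVIDER_ID = "legacy";

// Reads the registry from LLAMA_PROVIDERS_PATH when that file exists; otherwise
// synthesizes one provider from LLAMA_SWAP_URL and optional LLAMA_SIDECAR_URL.
// `readFile` is injected and returns undefined when the file is absent.
function loadLocalProviders(
  env: Record<string, string | undefined>,
  readFile: (path: string) => string | undefined,
): LocalProviderRegistry {
  const path = env.LLAMA_PROVIDERS_PATH;
  const raw = path ? readFile(path) : undefined;
  if (raw !== undefined) {
    // A real loader would validate against the contracts schema instead of casting.
    return JSON.parse(raw) as LocalProviderRegistry;
  }

  const baseUrl = env.LLAMA_SWAP_URL;
  if (!baseUrl) {
    throw new Error("Neither LLAMA_PROVIDERS_PATH nor LLAMA_SWAP_URL is configured");
  }
  const legacy: LocalProvider = {
    id: LEGACY_PROVIDER_ID,
    label: "Local (legacy)",
    baseUrl,
    kind: "llama-swap", // hypothetical kind value
  };
  if (env.LLAMA_SIDECAR_URL) legacy.sidecarUrl = env.LLAMA_SIDECAR_URL;
  return { defaultProvider: LEGACY_PROVIDER_ID, providers: [legacy] };
}
```

Injecting `readFile` and `env` keeps the loader testable while file I/O stays in each app, which is the boundary the assumption above draws.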
Orchestration Rules
- Treat each work unit below as one mergeable branch. Do not overlap branches that touch the same shared contract files.
- Never run more than one agent at a time on `packages/contracts/src/*`, `apps/server/src/services/inference/provider.ts`, `apps/web/src/api/types.ts`, or `apps/coder/src/services/provider-snapshot.ts`.
- Inside a work unit, parallelize only disjoint file groups: contract changes first, consumers second, tests last.
- Close each work unit with its own verification before starting the next one. Do not stack W1-W4 and debug later.
Work Unit Index
| # | Work Unit | Surface | Delivers | Depends On | Verification |
|---|---|---|---|---|---|
| 1 | Provider Registry Foundation | contracts + server + coder | Shared config schema, model-ref helpers, app-local registry loaders | none | Contracts build, server build, coder build |
| 2 | Server Catalog and Routing | server | Provider-aware `/api/models` and unified resolver | W1 | server tests for routing + collision cases |
| 3 | Server Downstream Consumers | server | Context, compaction, and task-model stop assuming one endpoint | W2 | server tests for cache isolation + bare-id fallback |
| 4 | BooChat Favorites and Grouped Picker | server + web | Shared favorites and provider-grouped chat model selection | W2 | server tests + web smoke |
| 5 | Native BooCoder Parity | coder + web | Native `boocode` local models use composite IDs and grouped selection | W1, W4 | coder tests + BooCoder smoke |
| 6 | Arena Parity | coder | Arena local calls and local-model classification become provider-aware | W5 | coder tests + arena smoke |
| 7 | External-Agent Parity | coder | `opencode` gets multi-provider local models through a real bridge | W5 | coder tests + `opencode` smoke |
| 8 | Operations and Final Verification | docs + configs + smoke | Add-a-machine runbook, final matrix, ready handoff to BooControl | W7 | end-to-end smoke matrix |
Work Units
W1. Provider Registry Foundation
Goal. Make provider identity real before any routing or UI changes.
Files and seams.
- `packages/contracts/src/` for the new local-provider schema and pure model-ref helpers
- `packages/contracts/package.json` exports
- `apps/server/src/config.ts`
- `apps/coder/src/config.ts`
- new app-local registry loaders under `apps/server/src/services/` and `apps/coder/src/services/`
- `data/llama-providers.example.json`
Implement.
- Add a new contracts subpath for local provider config, separate from the existing coder ACP provider config.
- Define the shared file shape: `defaultProvider` plus `providers[]` with `id`, `label`, `baseUrl`, optional `sidecarUrl`, and `kind`.
- Add pure helpers for `parseModelRef`, `formatModelRef`, and legacy bare-id resolution.
- Add `LLAMA_PROVIDERS_PATH` to both server and coder config.
- Implement server and coder registry loaders that read the shared file and synthesize one legacy provider from `LLAMA_SWAP_URL` and optional `LLAMA_SIDECAR_URL` when the file is absent.
- Add a checked-in example config with `sam-desktop` and `embedding` providers.
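The model-ref helpers can be sketched as pure functions. The names `parseModelRef` and `formatModelRef` come from the plan, but their signatures here are assumptions: `parseModelRef` only treats the text before the first `/` as a provider when that id is registered, so a bare wire id that happens to contain `/` is not misread as composite. `resolveModelRef` is a hypothetical name for the legacy bare-id fallback.

```typescript
// A parsed local model reference; `provider` is undefined for legacy bare ids.
interface ModelRef {
  provider?: string;
  model: string;
}

// Splits "provider/model" at the first "/", but only when the prefix is a
// registered provider id. Everything after that slash is the wire model id.
function parseModelRef(id: string, providerIds: ReadonlySet<string>): ModelRef {
  const slash = id.indexOf("/");
  if (slash > 0) {
    const prefix = id.slice(0, slash);
    if (providerIds.has(prefix)) {
      return { provider: prefix, model: id.slice(slash + 1) };
    }
  }
  return { model: id };
}

function formatModelRef(provider: string, model: string): string {
  return `${provider}/${model}`;
}

// Legacy bare-id resolution: a bare id attaches to the registry's default provider.
function resolveModelRef(
  id: string,
  providerIds: ReadonlySet<string>,
  defaultProvider: string,
): Required<ModelRef> {
  const ref = parseModelRef(id, providerIds);
  return { provider: ref.provider ?? defaultProvider, model: ref.model };
}
```

Splitting at the first slash keeps the round trip lossless for wire ids that themselves contain `/`: `formatModelRef("sam-desktop", "org/model")` parses back to the same pair.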
Parallel-safe split.
- Agent A: contracts schema + helpers + exports
- Agent B: server config + server loader after A merges
- Agent C: coder config + coder loader after A merges
Exit criteria.
- Both apps can start with only legacy env vars.
- Both apps can also start with a real `llama-providers.json`.
- Pure helper tests cover `provider/model` and bare fallback.
W2. Server Catalog and Routing
Goal. Replace server-side routing heuristics with one provider-aware resolver.
Files and seams.
- `apps/server/src/routes/models.ts`
- `apps/server/src/services/inference/provider.ts`
- `apps/server/src/types/api.ts`
- `apps/web/src/api/types.ts`
- `apps/web/src/api/client.ts`
- relevant provider tests
Implement.
- Refactor `/api/models` to return provider-grouped inventory only, with every `ModelInfo.id` already composite (D-4).
- Build one server resolver that answers:
  - provider identity
  - upstream base URL
  - sidecar eligibility
  - final wire model id
  - DeepSeek special handling
- Make both `upstreamModel()` and `resolveModelEndpoint()` call that same resolver.
- Remove the current “prefix means provider” logic as the authority; keep compatibility only at the bare-id fallback layer.
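A sketch of the single resolver, assuming a registry shaped like W1's file. `resolveModel`, the `Resolution` type, and the `deepseekCloudIds` set are hypothetical; the set stands in for whatever DeepSeek detection the server uses today. The ordering is the important part: a registered provider prefix is checked before any DeepSeek logic, which is what keeps `embedding/deepseek-r1-qwen3-8b` local.

```typescript
interface LocalProvider {
  id: string;
  baseUrl: string;
  sidecarUrl?: string;
}

interface Registry {
  defaultProvider: string;
  providers: LocalProvider[];
}

// Everything a call site needs: identity, endpoint, sidecar eligibility, wire id.
type Resolution =
  | { kind: "local"; provider: string; baseUrl: string; sidecarUrl?: string; wireModel: string }
  | { kind: "deepseek"; wireModel: string };

function localResolution(p: LocalProvider, wireModel: string): Resolution {
  // Sidecar eligibility comes from the provider entry, never from the model name,
  // so a provider registered without a sidecarUrl (e.g. `embedding`) never gets one.
  return p.sidecarUrl
    ? { kind: "local", provider: p.id, baseUrl: p.baseUrl, sidecarUrl: p.sidecarUrl, wireModel }
    : { kind: "local", provider: p.id, baseUrl: p.baseUrl, wireModel };
}

function resolveModel(
  id: string,
  registry: Registry,
  deepseekCloudIds: ReadonlySet<string>, // hypothetical stand-in for current DeepSeek detection
): Resolution {
  const byId = new Map(registry.providers.map((p): [string, LocalProvider] => [p.id, p]));
  const slash = id.indexOf("/");
  const explicit = slash > 0 ? byId.get(id.slice(0, slash)) : undefined;
  if (explicit) {
    // A registered provider prefix wins before any DeepSeek check, so
    // "embedding/deepseek-r1-qwen3-8b" stays on the local embedding provider.
    return localResolution(explicit, id.slice(slash + 1));
  }
  // Only bare ids reach the DeepSeek branch or the default-provider fallback.
  if (deepseekCloudIds.has(id)) return { kind: "deepseek", wireModel: id };
  const fallback = byId.get(registry.defaultProvider);
  if (!fallback) {
    throw new Error(`Default provider "${registry.defaultProvider}" is not registered`);
  }
  return localResolution(fallback, id);
}
```

Under this shape, `upstreamModel()` and `resolveModelEndpoint()` would become thin projections of one `Resolution` (the wire model id and the endpoint, respectively) rather than two separate heuristics.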
Parallel-safe split.
- First branch: resolver and tests
- Second branch: `/api/models` contract change plus client type updates
Exit criteria.
- `embedding/deepseek-r1-qwen3-8b` routes as local `embedding`, not as DeepSeek cloud.
- `embedding/*` never uses a sidecar.
- Legacy bare models still resolve through the configured default provider.
W3. Server Downstream Consumers
Goal. Remove the remaining single-endpoint assumptions in server call sites.
Files and seams.
- `apps/server/src/services/model-context.ts`
- `apps/server/src/index.ts`
- `apps/server/src/services/compaction.ts`
- `apps/server/src/services/task-model.ts`
- `apps/server/src/services/inference/error-handler.ts`
- `apps/server/src/services/__tests__/model-context.test.ts`
Implement.
- Change `model-context` to key caches by composite model id, not bare wire id.
- Move context lookup from one process-wide `LLAMA_SWAP_URL` assumption to the provider-aware resolver.
- Update compaction to resolve the right upstream before summary calls.
- Update task-model fallback resolution to use the same parsed model ref path as inference.
- Audit remaining server `LLAMA_SWAP_URL` call sites and either migrate them or explicitly mark them legacy-only.
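The `model-context` cache change amounts to keying on the composite id. This sketch assumes an async per-provider fetcher; `ModelContextCache`, `ContextInfo`, and `FetchContext` are hypothetical names, not the existing module's API.

```typescript
interface ContextInfo {
  contextLength: number;
}

// Hypothetical fetcher: asks one provider's upstream for a wire model's context size.
type FetchContext = (provider: string, wireModel: string) => Promise<ContextInfo>;

class ModelContextCache {
  // Keyed by composite "provider/model", never by the bare wire id.
  private readonly entries = new Map<string, ContextInfo>();
  private readonly fetchContext: FetchContext;

  constructor(fetchContext: FetchContext) {
    this.fetchContext = fetchContext;
  }

  async get(provider: string, wireModel: string): Promise<ContextInfo> {
    const key = `${provider}/${wireModel}`;
    const cached = this.entries.get(key);
    if (cached) return cached;
    const info = await this.fetchContext(provider, wireModel);
    this.entries.set(key, info);
    return info;
  }
}
```

Because callers pass a provider explicitly, a legacy bare id has to go through the shared resolver first; it then lands on the default provider's entry rather than on a provider-less key.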
Parallel-safe split.
- Agent A: `model-context.ts` + tests
- Agent B: `compaction.ts` and `task-model.ts` after A lands, because both depend on the new resolver contract
Exit criteria.
- Two providers serving the same wire model name do not share context cache entries.
- Existing sessions with bare models still load context and complete turns.
- No server path doing local inference bypasses the shared resolver.
W4. BooChat Favorites and Grouped Picker
Goal. Stabilize the end-user selection model on BooChat before deeper coding surfaces adopt it.
Files and seams.
- `apps/server/src/routes/settings.ts`
- `apps/server/src/services/settings.ts` or equivalent settings helper path
- `apps/web/src/components/ModelPicker.tsx`
- `apps/web/src/lib/model-label.ts`
- `apps/web/src/api/client.ts`
- `apps/web/src/api/types.ts`
- `apps/web/src/pages/Session.tsx`
Implement.
- Add `favorite_models: string[]` handling in settings.
- Normalize malformed and duplicate entries on write.
- In the client, derive:
  - Favorites section first
  - then one section per provider
  - hide unavailable favorites without deleting them
- Keep a favorited model visible in both Favorites and its provider section.
- Make new model selections write composite ids.
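A sketch of both halves of the favorites flow, assuming inventory entries carry a composite `id` plus a `provider` field and that favorites are stored as composite ids only (treating bare ids as malformed is an assumption of this sketch). `normalizeFavorites` and `buildPickerSections` are hypothetical names, and section titles use provider ids where the real picker would more likely show labels.

```typescript
interface ModelInfo {
  id: string; // composite "provider/model"
  provider: string;
  label: string;
}

interface PickerSection {
  title: string;
  models: ModelInfo[];
}

// Server side, on write: keep trimmed composite ids, drop malformed entries
// and duplicates, and preserve the user's order.
function normalizeFavorites(input: unknown): string[] {
  if (!Array.isArray(input)) return [];
  const seen = new Set<string>();
  const out: string[] = [];
  for (const item of input) {
    if (typeof item !== "string") continue;
    const id = item.trim();
    const slash = id.indexOf("/");
    // Malformed: no provider prefix, or nothing after the slash.
    if (slash <= 0 || slash === id.length - 1) continue;
    if (seen.has(id)) continue;
    seen.add(id);
    out.push(id);
  }
  return out;
}

// Client side: Favorites first, then one section per provider.
function buildPickerSections(inventory: ModelInfo[], favoriteIds: readonly string[]): PickerSection[] {
  const byId = new Map(inventory.map((m): [string, ModelInfo] => [m.id, m]));
  // Favorites missing from live inventory are skipped for display only;
  // favoriteIds itself is never modified, so nothing is deleted.
  const favorites = favoriteIds
    .map((id) => byId.get(id))
    .filter((m): m is ModelInfo => m !== undefined);

  const sections: PickerSection[] = [];
  if (favorites.length > 0) sections.push({ title: "Favorites", models: favorites });

  const byProvider = new Map<string, ModelInfo[]>();
  for (const m of inventory) {
    const list = byProvider.get(m.provider) ?? [];
    list.push(m);
    byProvider.set(m.provider, list);
  }
  // Favorited models still appear in their provider section as well.
  byProvider.forEach((models, provider) => {
    sections.push({ title: provider, models });
  });
  return sections;
}
```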
Parallel-safe split.
- Server settings branch first
- Web picker branch second against the new contract
Exit criteria.
- Favorites persist across refresh.
- Removing a provider from live inventory hides its favorites without deleting the stored ids.
- A new chat selection stores `provider/model`.
W5. Native BooCoder Parity
Goal. Move native boocode local model usage onto the shared provider model before touching opencode.
Files and seams.
- `apps/coder/src/services/provider-snapshot.ts`
- `apps/coder/src/services/dispatcher.ts`
- `apps/web/src/components/AgentComposerBar.tsx`
- `apps/web/src/lib/model-label.ts`
- `packages/contracts/src/provider-snapshot.ts`, only if the snapshot contract truly needs new metadata
Implement.
- Make the native `boocode` provider expose composite local model ids from the shared registry.
- Update native dispatch to resolve composite local ids through the shared registry.
- Render grouped local models for the native `boocode` path in `AgentComposerBar`.
- If the current `opencode` snapshot path would falsely advertise multi-provider local models before W7, hide that advertising now rather than leave the UI misleading (D-5).
Parallel-safe split.
- Coder backend first
- AgentComposerBar UI second
Exit criteria.
- Native BooCoder tasks can run against at least two distinct local providers.
- The native picker behavior matches BooChat’s grouped/favorites mental model closely enough that a user is not learning a second local-model identity system.
- `opencode` is not yet claiming parity it does not have.
W6. Arena Parity
Goal. Make Arena consume the same local-provider substrate instead of one live llama-swap list.
Files and seams.
- `apps/coder/src/services/arena-model-call.ts`
- `apps/coder/src/services/arena-analyzer.ts`
- `apps/coder/src/services/arena-runner.ts`
- `apps/coder/src/index.ts`
- arena tests
Implement.
- Replace direct `LLAMA_SWAP_URL` local calls with the provider-aware resolver.
- Build Arena’s local-model set from the shared provider registry, not one fetched list.
- Preserve ADR-0001’s two-lane scheduling rule; provider awareness changes local identity, not lane semantics.
- Keep bare-id compatibility only where old data needs it.
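A sketch of Arena's local-model set and local-vs-cloud classification, assuming each provider's wire model list has already been fetched. `buildLocalModelSet`, `isLocalModel`, and `ProviderModels` are hypothetical names. The sketch only changes how a model is recognized as local; it leaves lane scheduling untouched, in line with the ADR-0001 rule above.

```typescript
interface ProviderModels {
  provider: string;
  wireModels: string[];
}

// Builds Arena's local-model set as composite ids across every registered provider.
function buildLocalModelSet(perProvider: ProviderModels[]): Set<string> {
  const ids = new Set<string>();
  for (const { provider, wireModels } of perProvider) {
    for (const model of wireModels) ids.add(`${provider}/${model}`);
  }
  return ids;
}

// Local-vs-cloud classification. Composite ids are looked up directly; legacy
// bare ids from old arena data are first mapped onto the default provider.
function isLocalModel(
  id: string,
  localSet: ReadonlySet<string>,
  providerIds: ReadonlySet<string>,
  defaultProvider: string,
): boolean {
  const slash = id.indexOf("/");
  const isComposite = slash > 0 && providerIds.has(id.slice(0, slash));
  return localSet.has(isComposite ? id : `${defaultProvider}/${id}`);
}
```

Mapping bare ids onto the default provider, rather than matching them against any provider, keeps old data compatible without reintroducing the ambiguity that composite ids remove.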
Parallel-safe split.
- Agent A: `arena-model-call.ts` + analyzer updates
- Agent B: local-model set construction in `index.ts` + runner adjustments after A settles the model identity contract
Exit criteria.
- Arena can run local contestants from more than one machine.
- Local-vs-cloud classification still works.
- ADR-0001 behavior remains intact.
W7. External-Agent Parity
Goal. Give `opencode` a real multi-provider local-model story instead of collapsing everything back to `llama-swap/<model>`.
Files and seams.
- `apps/coder/src/services/backends/opencode-server.ts`
- `apps/coder/src/services/provider-snapshot.ts`
- `apps/coder/src/services/agent-probe.ts`
- new BooCoder-hosted gateway route or service module under `apps/coder/src/services/`
- host config generation or sync for opencode local models
Implement.
- Add a BooCoder-hosted OpenAI-compatible local gateway that accepts provider-preserving model ids and routes them to the correct local provider (D-6).
- Use one opencode-facing provider namespace such as `boocode-local`, where the opencode `providerID` is stable and the `modelID` is the inner composite id like `sam-desktop/qwen3.6-35b`.
- Update provider snapshot merging so `opencode` advertises `boocode-local/<provider/model>` rather than `llama-swap/<model>`.
- Update the opencode bridge parser and config sync so duplicate model names remain distinguishable end to end.
- Add smoke coverage for two providers serving the same wire model name.
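A sketch of the gateway's routing step, assuming the request body's `model` field carries the inner composite id (opencode's `modelID`, with the `boocode-local` provider namespace already separated out) and that each provider's `baseUrl` is a llama-swap root serving `/v1/chat/completions`. `routeGatewayRequest` and its types are hypothetical. The provider prefix is stripped only after the target machine has been chosen, so nothing collapses to a plain `llama-swap/<model>` id.

```typescript
interface LocalProvider {
  id: string;
  baseUrl: string;
}

// OpenAI-compatible request body; only `model` is interpreted here.
interface ChatRequest {
  model: string;
  [key: string]: unknown;
}

interface UpstreamCall {
  url: string;
  body: ChatRequest;
}

// Picks the machine from the composite id, then forwards the bare wire id upstream.
function routeGatewayRequest(body: ChatRequest, providers: readonly LocalProvider[]): UpstreamCall {
  const slash = body.model.indexOf("/");
  const provider = slash > 0 ? providers.find((p) => p.id === body.model.slice(0, slash)) : undefined;
  if (!provider) {
    // Refuse rather than fall back: a silent default would reintroduce wrong-machine routing.
    throw new Error(`Unknown local provider in model id "${body.model}"`);
  }
  const wireModel = body.model.slice(slash + 1);
  return {
    // Assumption: baseUrl is a llama-swap root exposing the OpenAI-compatible path.
    url: `${provider.baseUrl.replace(/\/+$/, "")}/v1/chat/completions`,
    body: { ...body, model: wireModel },
  };
}
```

Rejecting unknown or bare ids, rather than falling back to the default provider, is a deliberate choice in this sketch: the gateway exists to preserve provider identity, so it should fail loudly when that identity is missing.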
Parallel-safe split.
- Gateway branch first
- Snapshot/config-sync branch second
- Final opencode backend/parser adjustments last
Exit criteria.
- `opencode` can target two local providers with overlapping wire model names and hit the correct machine both times.
- No path rewrites `provider/model` down to plain `llama-swap/model`.
W8. Operations and Final Verification
Goal. End with a repeatable operator workflow, not just a working dev branch.
Files and seams.
- `data/llama-providers.example.json`
- operator docs under `docs/`
- OpenSpec tasks/status notes as needed
Implement.
- Document the add-a-machine flow for config-managed local providers.
- Document the smoke matrix for:
  - single legacy provider fallback
  - two local providers
  - duplicate model names across two providers
  - DeepSeek enabled
  - `opencode` local parity
- Record the final interface BooControl should consume: provider registry plus composite ids, not raw host env vars.
Exit criteria.
- A third machine can be added by editing config and running the smoke matrix.
- The implementation docs name the exact runtime contract BooControl should build on.
Verification Plan
- `pnpm -C packages/contracts build`
- `pnpm -C apps/server test`
- `pnpm -C apps/server build`
- `pnpm -C apps/coder test`
- `pnpm -C apps/coder build`
- `npx tsc -p apps/web/tsconfig.app.json --noEmit`
Add targeted tests as the work lands:
- model-ref parse/format and bare-id fallback
- provider-aware routing and DeepSeek collision cases
- context-cache isolation for duplicate model names
- favorites hide-not-delete behavior
- provider snapshot and opencode bridge behavior
- arena local-model classification across multiple providers
Main Risks
- The W2 contract change to `/api/models` and the W5 snapshot changes can drift across apps if contract parity is edited piecemeal. Follow the cross-app contract standard in artifacts/.discovery-notes.md and land contract-first branches.
- W7 is the hardest seam. If the gateway is skipped and the old string rewrite is kept, the feature will look complete in the UI while still routing requests to the wrong machine.
- `model-context.ts` is a hidden correctness seam. If cache keys stay bare, duplicate model names will mis-share context limits and compaction behavior even after routing is fixed.
Deferred
- BooControl itself
- picker search and richer filtering
- manual favorite reordering
- host health badges in pickers
Definition of Done
- BooChat, native BooCoder, Arena, and `opencode` all support provider-aware local models end to end.
- Legacy bare ids remain readable.
- Two providers can expose the same wire model name without ambiguity.
- Adding another local machine is documented and smoke-tested.
- BooControl can start later without inventing a second provider registry.