chore: snapshot working tree - pty_exited notifications + in-flight inference WIP

feat(booterm): structured pty_exited WS notifications. Plan-validated, impl-validated, code-reviewed green (contracts build clean, contracts test 29/29, booterm + web typecheck clean).

wip: in-progress inference/provider refactor (agents.ts, provider.ts, new llama-providers.ts, removed llama-args-validator), plus arena, dispatcher, compaction, schema changes.

openspec: pty-exit-notifications complete; x-agent-flags planned (not yet implemented).
2026-06-14 12:48:47 +00:00
parent 0ed506f1da
commit b18de2a331
204 changed files with 25344 additions and 867 deletions


@@ -0,0 +1,126 @@
# Discovery Notes: Multi-Provider Local Models
Single source of truth for implementation context. Read this before touching the plan or code.
## Tech stack
- Monorepo with pnpm workspaces.
- `apps/server`: Fastify + Postgres, native inference, local-model routing, BooChat APIs.
- `apps/web`: React + Vite SPA, shared chat and coder UI.
- `apps/coder`: host-side BooCoder service, provider probing, native and external-agent dispatch, Arena, MCP.
- `packages/contracts`: shared cross-app schemas and types, built before consumers.
- TypeScript strict mode. Server and coder use NodeNext and `.js` import suffixes.
- Tests: `pnpm -C apps/server test`, `pnpm -C apps/coder test`. No dedicated web test harness.
## ADRs found
- `docs/adr/0001-arena-two-lane-scheduling.md`
Summary: local llama-backed contestants run serially in one lane, cloud contestants run in parallel in another lane; multi-provider work must preserve this lane model.
- `docs/adr/0002-arena-dedicated-tables-not-flow-runner.md`
Summary: Arena owns its own storage and runtime shape; reuse dispatcher machinery but do not fold Arena back into flow-runner abstractions.
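The lane model from ADR 0001 can be sketched as follows. This is a minimal illustration, not Arena's actual scheduler: the `Contestant` type and `runTwoLanes` function are hypothetical names, and the sketch assumes the two lanes run concurrently with each other, which the summary above does not state.

```typescript
// Hypothetical contestant shape; Arena's real types are not shown in these notes.
type Lane = "local" | "cloud";

interface Contestant {
  id: string;
  lane: Lane;
  run: () => Promise<string>;
}

async function runTwoLanes(
  contestants: readonly Contestant[],
): Promise<Map<string, string>> {
  const results = new Map<string, string>();
  const local = contestants.filter((c) => c.lane === "local");
  const cloud = contestants.filter((c) => c.lane === "cloud");

  // Local lane: strictly one contestant at a time.
  const localLane = (async () => {
    for (const c of local) {
      results.set(c.id, await c.run());
    }
  })();

  // Cloud lane: every contestant starts immediately.
  const cloudLane = Promise.all(
    cloud.map(async (c) => {
      results.set(c.id, await c.run());
    }),
  );

  // Assumption: the lanes themselves overlap in time.
  await Promise.all([localLane, cloudLane]);
  return results;
}
```

With more than one local provider, the point to preserve is that `lane` is decided by local-versus-cloud, not by which machine serves the model, so adding machines does not add parallelism to the local lane.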
## Coding standards found
- `docs/coding-standards/cross-app-contract-parity.md`
Summary: when a cross-app contract changes, update the canonical package source plus app-side secondary representations in the same batch; missing one side silently drops behavior at runtime.
- `CLAUDE.md`
Summary: `packages/contracts` is the single source for provider-snapshot and message-metadata contracts, deploy-by-surface rules matter, and contract changes must respect app-local secondary unions and renderers where they still exist.
## Relevant architecture notes
- `apps/server/CLAUDE.md`
Summary: `services/inference/provider.ts` is the current llama-swap provider seam; `model-context.ts` and `compaction.ts` currently assume one upstream.
- `apps/coder/CLAUDE.md`
Summary: provider snapshot and `opencode` integration are the main local-model seams; `llama-swap/*` is currently the local namespace assumption.
- `apps/web/CLAUDE.md`
Summary: `ModelPicker` and `AgentComposerBar` are separate UI surfaces with different constraints; any provider snapshot loading-state change can make providers disappear from the coder UI.
## Code touch points
### Shared contracts and config patterns
- `packages/contracts/src/provider-config.ts`
Existing coder ACP provider config schema; useful precedent, but not the right place to overload with local host inventory semantics.
- `apps/coder/src/services/provider-config-registry.ts`
Existing pattern for schema-in-package plus app-local load/build cache.
- `packages/contracts/src/provider-snapshot.ts`
Shared snapshot contract used by coder and web.
### Server: catalog, routing, and downstream local-model consumers
- `apps/server/src/config.ts`
Current env config includes `LLAMA_SWAP_URL`, `LLAMA_SIDECAR_URL`, and `DEFAULT_MODEL`; multi-provider config must enter here.
- `apps/server/src/routes/models.ts`
Current `/api/models` route fetches one llama-swap and optionally DeepSeek.
- `apps/server/src/services/inference/provider.ts`
Current route selection and AI SDK provider seam; central place to remove heuristic provider detection.
- `apps/server/src/services/model-context.ts`
Current context cache keys by bare model string and assumes one `LLAMA_SWAP_URL`.
- `apps/server/src/services/compaction.ts`
Uses `resolveModelEndpoint()` today, but still contains one-provider assumptions and a DeepSeek prefix special case.
- `apps/server/src/services/task-model.ts`
Returns one resolved `{url, model}` pair today.
- `apps/server/src/index.ts`
Calls `configureModelContext({ llamaSwapUrl })`; this wiring must change when context lookup becomes provider-aware.
- `apps/server/src/routes/settings.ts`
Existing shared settings persistence surface; right place for `favorite_models`.
### Web: BooChat and coder selection UI
- `apps/web/src/components/ModelPicker.tsx`
Shared BooChat model picker component; currently assumes a flat `/api/models` list.
- `apps/web/src/components/AgentComposerBar.tsx`
Native BooCoder provider/mode/model picker surface.
- `apps/web/src/lib/model-label.ts`
Display-only model prettifier used by both pickers.
- `apps/web/src/api/client.ts`
`models()` currently expects `ModelInfo[]`.
- `apps/web/src/api/types.ts`
Holds the web-side API contract for `/api/models` and other cross-app payloads.
### Coder: native, snapshot, arena, and external-agent bridge
- `apps/coder/src/config.ts`
Current coder config still exposes `LLAMA_SWAP_URL`; multi-provider config must enter here too.
- `apps/coder/src/services/provider-snapshot.ts`
Current snapshot fetches one `LLAMA_SWAP_URL`, prefixes local models as `llama-swap/*`, and merges them into `opencode`.
- `apps/coder/src/services/dispatcher.ts`
Current native and external-agent dispatch logic still assumes local bare ids or `llama-swap/*` for local routing.
- `apps/coder/src/services/backends/opencode-server.ts`
`parseModel()` splits only once at `/`; this is good news because a stable outer provider namespace can carry an inner composite model id.
- `apps/coder/src/services/arena-model-call.ts`
Direct one-shot local model call against `LLAMA_SWAP_URL`.
- `apps/coder/src/services/arena-analyzer.ts`
Local-vs-cloud checks rely on one local model set and one upstream.
- `apps/coder/src/index.ts`
Builds the local-model set for Arena from one fetched llama-swap list.
## Recent activity and churn
High-churn files in the last 90 days:
- `apps/web/src/api/types.ts`
- `apps/web/src/api/client.ts`
- `apps/server/src/index.ts`
- `apps/server/src/types/api.ts`
- `apps/coder/src/services/dispatcher.ts`
- `apps/coder/src/index.ts`
- `apps/coder/src/services/provider-snapshot.ts`
- `apps/web/src/components/AgentComposerBar.tsx`
- `apps/server/src/services/compaction.ts`
Implication: keep work units narrow and avoid combining unrelated refactors in these files.
## Constraints and load-bearing facts
- `packages/contracts` already owns provider-snapshot types; if the snapshot contract changes, rebuild the package before touching consumers.
- `apps/web` has no dedicated test harness, so web verification will rely on typecheck plus smoke testing.
- Arena's local-lane semantics are intentional; multi-provider support must not collapse local models into parallel execution.
- `opencode` local parity is not a small rename. The current host config and snapshot behavior collapse identity to one `llama-swap` namespace.
## Gaps and unknowns
- No shared local-provider config file or schema exists in-repo yet.
- `/api/models` shape change is not yet specified in app-local types; W2 must settle the contract before W4 starts.
- The final `opencode` gateway path is not implemented anywhere yet; W7 is net-new code, not just adaptation.
- No dedicated docs for “add a machine” exist yet; W8 must create them.


@@ -0,0 +1,109 @@
# Implementation Decision Log: Multi-Provider Local Models
This file records the implementation decisions committed while planning the multi-provider local-model rollout.
Behavioral intent lives in [../feature-implementation-plan.md](../feature-implementation-plan.md) and the source
artifacts it cites. Round history lives in [implementation-iteration-history.md](implementation-iteration-history.md).
Source artifacts:
- [../build-phase-outline.md](../build-phase-outline.md)
- [../../../openspec/changes/multi-llama-swap-providers-model-favorites/design.md](../../../openspec/changes/multi-llama-swap-providers-model-favorites/design.md)
- [../../../openspec/changes/multi-llama-swap-providers-model-favorites/artifacts/implementation-analysis.md](../../../openspec/changes/multi-llama-swap-providers-model-favorites/artifacts/implementation-analysis.md)
- [../../research/2026-06-10-multi-llama-swap-providers-model-favorites.md](../../research/2026-06-10-multi-llama-swap-providers-model-favorites.md)
- [./.discovery-notes.md](./.discovery-notes.md)
### D-1: Shared local-provider config authority
- **Question:** Where does the source of truth for named local providers live, and what belongs in the shared package versus app-local loaders?
- **Decision:** Use `/data/llama-providers.json`, wired through `LLAMA_PROVIDERS_PATH`, as the shared authority for local providers. Put the schema and pure model-ref helpers in `packages/contracts`; keep file I/O and legacy env fallback in app-local registry loaders for server and coder.
- **Rationale:** This matches the existing BooCoder pattern of package-owned schemas plus app-local load/build caches, avoids duplicating config semantics, and avoids forcing Node-specific loader code into every consumer of the contracts package.
- **Evidence:** `packages/contracts/src/provider-config.ts` and `apps/coder/src/services/provider-config-registry.ts` already follow this split; the current local-provider gap is that server and coder do not share any equivalent registry.
- **Rejected alternatives:**
- Keep local providers env-only forever. Rejected because server and coder already drift and more machines would multiply the drift.
- Put file reading only in one app and make the other app consume it indirectly. Rejected because both server and coder need startup-time local-provider awareness.
- **Driven by rounds:** R1.
- **Referenced in plan:** Outcome, Working Assumptions, W1.
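A minimal sketch of the package-side half of this split, assuming a JSON shape that this log does not specify. The field names `providers`, `id`, and `url`, both function names, and the `llama-swap` fallback id are hypothetical; only `defaultProvider`, `/data/llama-providers.json`, and the legacy `LLAMA_SWAP_URL` variable come from the notes. File reading is deliberately left out, since that belongs in the app-local loaders.

```typescript
// Hypothetical shape for the contents of /data/llama-providers.json.
interface LocalProvider {
  id: string; // e.g. "sam-desktop"; must not contain "/"
  url: string; // base URL of that machine's upstream
}

interface LocalProviderConfig {
  defaultProvider: string;
  providers: LocalProvider[];
}

// Pure validation with no file I/O, so it could live in the shared package.
function parseLocalProviderConfig(raw: unknown): LocalProviderConfig {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("config must be an object");
  }
  const { defaultProvider, providers } = raw as Record<string, unknown>;
  if (typeof defaultProvider !== "string" || !Array.isArray(providers)) {
    throw new Error("config needs defaultProvider and providers[]");
  }
  const parsed: LocalProvider[] = providers.map((p) => {
    const { id, url } = (p ?? {}) as Record<string, unknown>;
    if (typeof id !== "string" || id.length === 0 || id.includes("/")) {
      throw new Error("provider id must be a non-empty string without '/'");
    }
    if (typeof url !== "string" || url.length === 0) {
      throw new Error(`provider "${id}" needs a url`);
    }
    return { id, url };
  });
  if (!parsed.some((p) => p.id === defaultProvider)) {
    throw new Error("defaultProvider must name a configured provider");
  }
  return { defaultProvider, providers: parsed };
}

// App-local fallback (assumed behavior): build a one-provider config from
// the legacy LLAMA_SWAP_URL value when no providers file is present.
function configFromLegacyEnv(llamaSwapUrl: string): LocalProviderConfig {
  return {
    defaultProvider: "llama-swap",
    providers: [{ id: "llama-swap", url: llamaSwapUrl }],
  };
}
```

Rejecting `/` inside a provider id keeps the composite `provider/model` format unambiguous when it is split at the first slash.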
### D-2: Persist and cache composite `provider/model` ids; keep wire ids bare
- **Question:** What is the canonical identity format for local model selections and caches?
- **Decision:** Persist and cache `provider/model`. Strip the provider prefix only at the final upstream call boundary. Support legacy bare ids indefinitely by resolving them to `defaultProvider`.
- **Rationale:** Duplicate wire model names across machines are otherwise impossible to represent safely. This also keeps DB migrations small because the existing columns are already free-form text.
- **Evidence:** `sessions.model` and `chats.model` are stringly typed; `apps/server/src/services/model-context.ts` currently keys by bare model and would otherwise cross-poison duplicate names.
- **Rejected alternatives:**
- Keep persisted ids bare and use side metadata for provider. Rejected because many call sites already pass the model string around alone.
- Prefix wire calls too. Rejected because upstream llama-swap and DeepSeek calls want the actual provider-native model id.
- **Driven by rounds:** R1.
- **Referenced in plan:** Outcome, W1, W2, W3.
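The identity rule can be sketched as a pair of pure helpers. The names `ModelRef`, `parseModelRef`, and `formatModelRef` are hypothetical, and one detail is an assumption: a stored id whose prefix is not a known provider is treated as a legacy bare id, so a wire id that happens to contain a slash falls back to `defaultProvider` intact.

```typescript
interface ModelRef {
  provider: string; // registry id
  model: string; // bare wire id, exactly as the upstream expects it
}

function parseModelRef(
  stored: string,
  knownProviders: ReadonlySet<string>,
  defaultProvider: string,
): ModelRef {
  const slash = stored.indexOf("/");
  if (slash > 0) {
    const provider = stored.slice(0, slash);
    if (knownProviders.has(provider)) {
      // Split only at the first slash; the rest is the wire id.
      return { provider, model: stored.slice(slash + 1) };
    }
  }
  // Legacy bare id: resolve to the default provider, leave the id untouched.
  return { provider: defaultProvider, model: stored };
}

// The composite form is what gets persisted and used as a cache key.
function formatModelRef(ref: ModelRef): string {
  return `${ref.provider}/${ref.model}`;
}
```

Two machines serving the same wire id then produce different cache keys (`sam-desktop/qwen3.6-35b` versus another provider's `qwen3.6-35b`), which is what prevents the cross-poisoning described in the evidence above.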
### D-3: One provider-aware resolver shared across streaming, non-streaming, context, and Arena
- **Question:** Should each consumer keep its own endpoint logic once multiple local providers exist?
- **Decision:** No. Build one provider-aware resolver contract and make streaming inference, non-streaming calls, context lookup, compaction, task-model resolution, and Arena all go through it.
- **Rationale:** The current failure mode is duplicated routing logic with slightly different heuristics. Fixing only one path would leave subtle misroutes in the others.
- **Evidence:** `apps/server/src/services/inference/provider.ts`, `apps/server/src/services/model-context.ts`, `apps/server/src/services/compaction.ts`, `apps/server/src/services/task-model.ts`, and `apps/coder/src/services/arena-model-call.ts` all handle local-model identity separately today.
- **Rejected alternatives:**
- Only unify server inference and leave context/arena separate. Rejected because that would preserve hidden correctness bugs in context limits and Arena calls.
- **Driven by rounds:** R1.
- **Referenced in plan:** Outcome, W2, W3, W6.
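As an illustration of what "one resolver, many consumers" could look like, the sketch below resolves an already-parsed reference to an endpoint once and lets two consumers derive what they need from the result. Every name here is hypothetical, and the `/v1/chat/completions` path is an assumption based on the upstreams being OpenAI-compatible.

```typescript
interface ModelRef {
  provider: string;
  model: string;
}

interface ResolvedEndpoint {
  provider: string; // registry id, for cache keys and logging
  url: string; // base URL of the chosen upstream
  model: string; // bare wire id sent upstream
}

function resolveEndpoint(
  ref: ModelRef,
  providerUrls: ReadonlyMap<string, string>,
): ResolvedEndpoint {
  const url = providerUrls.get(ref.provider);
  if (url === undefined) {
    throw new Error(`no URL configured for provider "${ref.provider}"`);
  }
  return { provider: ref.provider, url, model: ref.model };
}

// Consumer 1: an inference call sends the bare wire id to the resolved URL.
function chatRequestTarget(ep: ResolvedEndpoint): {
  url: string;
  body: { model: string };
} {
  return { url: `${ep.url}/v1/chat/completions`, body: { model: ep.model } };
}

// Consumer 2: context lookup keys its cache by provider and model together.
function contextCacheKey(ep: ResolvedEndpoint): string {
  return `${ep.provider}/${ep.model}`;
}
```

Neither consumer inspects the model string itself, so there is no second place for a routing heuristic to drift.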
### D-4: Favorites are a settings-backed user view, not a server catalog section
- **Question:** Where should the Favorites concept live?
- **Decision:** Store `favorite_models: string[]` in settings and derive the Favorites section client-side from settings plus provider inventory. The server catalog returns providers and models only.
- **Rationale:** Inventory answers “what exists now.” Favorites answer “what this user prefers.” Keeping them separate avoids overloading the server catalog with user-specific UI state.
- **Evidence:** `settings` already exists server-side; the OpenSpec analysis already identified favorites as a user-level concern rather than an inventory concern.
- **Rejected alternatives:**
- Return a synthetic Favorites section from `/api/models`. Rejected because it entangles inventory with user preference and complicates offline/unavailable favorite behavior.
- **Driven by rounds:** R1.
- **Referenced in plan:** Outcome, W2, W4.
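A sketch of the client-side derivation, under the assumption that the catalog response can be reduced to a list of providers with their wire model ids. `ProviderInventory` and `deriveVisibleFavorites` are hypothetical names; `favorite_models` is the settings key named in the decision.

```typescript
// Assumed reduction of the server catalog: providers and models only.
interface ProviderInventory {
  id: string;
  models: string[]; // bare wire ids
}

// Returns the favorites that can be shown right now, in the user's order.
// The stored favorite_models list is never modified here.
function deriveVisibleFavorites(
  favoriteModels: readonly string[],
  inventory: readonly ProviderInventory[],
): string[] {
  const live = new Set<string>();
  for (const provider of inventory) {
    for (const model of provider.models) {
      live.add(`${provider.id}/${model}`);
    }
  }
  return favoriteModels.filter((ref) => live.has(ref));
}
```

Because the filter runs against live inventory on every render, a favorite on an unavailable provider drops out of view and comes back on its own once the provider reappears, with no write to settings in either direction.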
### D-5: Native `boocode` parity ships before `opencode` parity
- **Question:** Should native and external-agent BooCoder paths move together?
- **Decision:** No. Native `boocode` parity is W5. `opencode` parity is W7 and does not begin until the native path is correct and the UI stops falsely advertising multi-provider local models under the old bridge.
- **Rationale:** Native `boocode` can use the shared resolver directly. `opencode` still assumes one local-provider namespace and is the riskier seam.
- **Evidence:** `apps/coder/src/services/provider-snapshot.ts` prefixes local models as `llama-swap/*`; `apps/coder/src/services/backends/opencode-server.ts` still assumes the outer provider namespace identifies the target upstream.
- **Rejected alternatives:**
- Rename everything to `provider/model` in one pass. Rejected because the external-agent bridge would still collapse identity at the last moment.
- **Driven by rounds:** R1.
- **Referenced in plan:** Outcome, W5, W7.
### D-6: `opencode` parity uses a `boocode-local` gateway, not a string rewrite
- **Question:** What is the safe path to external-agent parity?
- **Decision:** Add a BooCoder-hosted OpenAI-compatible local gateway and present it to `opencode` as one stable provider namespace such as `boocode-local`. The inner `modelID` carries the composite local identity like `sam-desktop/qwen3.6-35b`.
- **Rationale:** `parseModel()` in the opencode backend already splits only once at `/`, which means a stable outer provider id can safely carry the inner composite local id. That preserves provider identity without teaching opencode about every machine directly.
- **Evidence:** `apps/coder/src/services/backends/opencode-server.ts` `parseModel()` returns `{ providerID, modelID }` where `modelID` may contain additional slashes; current `llama-swap/<model>` mapping is the ambiguity seam.
- **Rejected alternatives:**
- Keep rewriting `provider/model` back to `llama-swap/model`. Rejected because duplicate local model names would still route incorrectly.
- Add one direct opencode provider per local machine. Rejected because it duplicates the registry and leaks fleet structure into opencode config.
- **Driven by rounds:** R1.
- **Referenced in plan:** Outcome, W7.
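The split-once property this decision relies on can be shown in a few lines. This is not the backend's `parseModel()` source, only a hypothetical `splitOnce` with the same described behavior; the no-slash case returning an empty `providerID` is an assumption.

```typescript
function splitOnce(model: string): { providerID: string; modelID: string } {
  const slash = model.indexOf("/");
  if (slash === -1) {
    // Assumed behavior for ids without a provider namespace.
    return { providerID: "", modelID: model };
  }
  // Everything after the first slash stays together, further slashes included.
  return {
    providerID: model.slice(0, slash),
    modelID: model.slice(slash + 1),
  };
}

const parts = splitOnce("boocode-local/sam-desktop/qwen3.6-35b");
// parts.providerID is "boocode-local"
// parts.modelID is "sam-desktop/qwen3.6-35b"
```

The gateway would then receive `sam-desktop/qwen3.6-35b` as the model id and can route on the inner provider, while `opencode` only ever sees the single `boocode-local` namespace.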
### D-7: Add-a-machine stays config-driven in this initiative
- **Question:** Does this rollout include a control-plane UI for adding local machines?
- **Decision:** No. Adding a machine stays a config-driven operation in this initiative, documented in W8. BooControl is the later UI/control-plane consumer.
- **Rationale:** The user goal is multi-provider support now, not a new admin product before the substrate exists.
- **Evidence:** BooControl's own tasks call this registry work a prerequisite; current repo state has no stable local-provider substrate yet.
- **Rejected alternatives:**
- Build BooControl first. Rejected because it would either duplicate registry logic or bind to today's broken single-provider assumptions.
- **Driven by rounds:** R1.
- **Referenced in plan:** Outcome, W8, Deferred.
### D-8: Work unit sequencing is contract-first, consumer-second, verification-third
- **Question:** How should this be broken down for Orchestration so branches do not constantly collide?
- **Decision:** Sequence every work unit as:
1. contracts and config
2. primary backend seam
3. downstream consumers
4. tests and smoke
and forbid parallel editing of the shared contract and resolver files.
- **Rationale:** The churniest files in this repo are exactly the shared contract and coordinator files. Letting multiple branches edit them in parallel is the fastest path to merge thrash and subtle drift.
- **Evidence:** Recent churn is highest in `apps/web/src/api/types.ts`, `apps/web/src/api/client.ts`, `apps/server/src/index.ts`, `apps/coder/src/services/dispatcher.ts`, and `apps/coder/src/services/provider-snapshot.ts`.
- **Rejected alternatives:**
- Split by app only. Rejected because this feature crosses contracts, server, web, and coder in nearly every phase.
- **Driven by rounds:** R1.
- **Referenced in plan:** Orchestration Rules, Work Unit Index, all work units.


@@ -0,0 +1,38 @@
# Implementation Iteration History: Multi-Provider Local Models
This file records how the implementation plan was assembled from the existing research, OpenSpec docs, and codebase review.
Committed decisions live in [implementation-decision-log.md](implementation-decision-log.md). The primary plan lives in
[../feature-implementation-plan.md](../feature-implementation-plan.md).
## R1: Coordinator pass grounded in source docs and local code review
- **Specialists engaged:** coordinator-only pass using the existing research note, OpenSpec design/tasks, implementation analysis, root and app `CLAUDE.md` files, ADRs, coding standard, and targeted code search. No separate specialist tool round was run in this repo pass.
- **New input provided:** [../build-phase-outline.md](../build-phase-outline.md), [./.discovery-notes.md](./.discovery-notes.md), the OpenSpec batch, and the current code seams in server, web, and coder.
- **Claim ledger:**
| # | Claim | State | Spec-maturity |
|---|---|---|---|
| C1 | There is no single source of truth for local providers shared by server and coder | Evidenced | plan-level |
| C2 | Composite `provider/model` ids are required for duplicate model names across hosts | Evidenced | plan-level |
| C3 | Routing logic is duplicated across streaming, non-streaming, context, compaction, task-model, and Arena | Evidenced | plan-level |
| C4 | Favorites belong in settings plus client derivation, not in the server catalog | Evidenced | plan-level |
| C5 | Native BooCoder can adopt the shared resolver before `opencode` can | Evidenced | plan-level |
| C6 | The current `opencode` bridge collapses local identity and needs a provider-preserving gateway | Evidenced | plan-level |
| C7 | Arena is a separate local-model consumer and must be planned explicitly | Evidenced | plan-level |
| C8 | BooControl depends on this substrate and should not be built first | Evidenced | plan-level |
- **Open Questions raised:**
- OQ-1: shared local-provider authority format and location
Resolution: D-1, `/data/llama-providers.json` plus `LLAMA_PROVIDERS_PATH`
- OQ-2: canonical local model identity format
Resolution: D-2, composite `provider/model`
- OQ-3: how to achieve external-agent parity honestly
Resolution: D-6, `boocode-local` gateway
- OQ-4: whether add-a-machine is UI-driven in this batch
Resolution: D-7, no, keep config-driven
- **Spec-maturity tags:** all findings were plan-level. No spec-stage reopening was required because the earlier research and OpenSpec docs already settled the behavior.
- **Resolution source:** evidence from source docs plus current code inspection.
- **Decisions produced:** D-1, D-2, D-3, D-4, D-5, D-6, D-7, D-8.
- **Changed in plan:** initial authoring of `feature-implementation-plan.md` and its three supporting artifacts.
- **Next-step recommendation:** go to synthesis. The work is ready to execute as W1 through W8 in order, with W7 as the main hard seam and W8 as the operational closeout.


@@ -0,0 +1,390 @@
---
title: "Multi-Provider Local Models — Build Phase Outline"
source_artifact: "Multiple sources: docs/research/2026-06-10-multi-llama-swap-providers-model-favorites.md; openspec/changes/multi-llama-swap-providers-model-favorites/design.md; openspec/changes/multi-llama-swap-providers-model-favorites/artifacts/implementation-analysis.md"
audience: "mixed"
generated: "2026-06-10"
generated_by: "han.core:plan-a-phased-build"
---
# Multi-Provider Local Models — Build Phase Outline
This document describes the order in which multi-provider local model support will be built. The work is broken into a sequence of phases, where each phase is a thin end-to-end deliverable that can be demonstrated to a real person, and each phase builds on the one before it. The goal is to let BooCode work cleanly with more than one local model machine today and make it straightforward to add more local machines later.
This outline is built from three sources taken together: the research note that identified the routing and identity problems, the OpenSpec batch that defines the intended behavior, and the implementation analysis that tightened the architecture around the harder integration seams. The source material describes what exists today, what the target behavior is, and where the hidden risks are. This document describes the order in which the work should be built so the system reaches that target in a controlled way.
## Table of Contents
- [Executive Summary](#executive-summary)
- [Build Phase Index](#build-phase-index)
- [How This Rollout Differs from the First Draft](#departures)
- [Phase Kinds](#phase-kinds)
- [Build Phases](#build-phases)
- [Phase 1: Named Provider Inventory](#phase-1)
- [Phase 2: Multi-Provider BooChat](#phase-2)
- [Phase 3: Shared Favorites and Grouped Selection](#phase-3)
- [Phase 4: Native BooCoder Parity](#phase-4)
- [Phase 5: Multi-Provider Arena](#phase-5)
- [Phase 6: External-Agent Parity](#phase-6)
- [Phase 7: Add-a-Machine Operations](#phase-7)
- [Phase 8 (Deferred): BooControl Fleet Layer](#phase-8)
- [Open Questions](#open-questions)
---
## Executive Summary {#executive-summary}
**The goal:** BooCode should treat local inference as a small fleet instead of a single machine. A user should be able to choose models from multiple local providers, keep favorites across BooChat and BooCoder, run coding and arena workflows against the intended provider, and add another local machine later without reopening the core design.
**The shape of the build:**
- The rollout starts by making provider identity real and visible before any routing changes are hidden behind it.
- BooChat gets multi-provider conversations before the broader coding surfaces, so the first live slice proves the model identity and routing rules end to end.
- Shared favorites and grouped pickers land before the coding parity work so the selection experience stabilizes once and is then reused.
- Native BooCoder and Arena adopt the same provider rules before the harder external-agent bridge is attempted.
- The final live phase turns “two machines supported” into “more machines are routine,” so the work ends in an operationally repeatable state instead of a one-off fix.
**Sequencing rationale, in plain language:**
The order starts with the smallest user-visible slice that proves the new mental model: named providers and distinct model identities. Once that exists, BooChat can safely route real conversations across providers and expose any mistakes early. Only after model identity, routing, and favorites are stable does it make sense to move the deeper coding surfaces over, because those surfaces are less forgiving and have more hidden assumptions. The external-agent bridge comes late because it is the one place where a simple rename would look correct but still route to the wrong machine.
**Departures from the source artifact:**
- Favorites are treated as a user-level view derived from shared settings, not as a built-in section of the server's model inventory.
- Native BooCoder parity comes before external-agent parity, because the external-agent path needs its own provider-preserving bridge.
**Phases deliberately deferred:**
BooControl is listed as a deferred final phase because it depends on this registry and identity work but does not need to exist for the multi-provider rollout itself to be complete. Search, richer filtering, and other picker refinements are also intentionally left out of the live phase sequence unless real usage proves they are needed.
**Where to look next:** The [Build Phase Index](#build-phase-index) lists every phase in order. The [departures section](#departures) names the two decisions that shape the rest of the plan. Detailed write-ups follow under [Build Phases](#build-phases). Decisions the team must resolve before Phase 1 can start are at [Open Questions](#open-questions).
---
## Build Phase Index {#build-phase-index}
| # | Phase | Kind | Outcome (one sentence) |
|---|---|---|---|
| 1 | [Named Provider Inventory](#phase-1) | Foundation | BooCode can see distinct local providers and distinct model identities. |
| 2 | [Multi-Provider BooChat](#phase-2) | Feature slice | A chat can run on the intended local provider without misrouting. |
| 3 | [Shared Favorites and Grouped Selection](#phase-3) | Feature slice | Favorites persist once and appear consistently across both chat surfaces. |
| 4 | [Native BooCoder Parity](#phase-4) | Feature slice | Native coding tasks can use the same multi-provider local model pool. |
| 5 | [Multi-Provider Arena](#phase-5) | Feature slice | Arena can compare local models from more than one machine correctly. |
| 6 | [External-Agent Parity](#phase-6) | Feature slice | External coding providers can target local machines without losing provider identity. |
| 7 | [Add-a-Machine Operations](#phase-7) | Polish | Adding another local machine becomes a routine configuration change. |
| 8 | [BooControl Fleet Layer (deferred)](#phase-8) | Deferred | A fleet cockpit can build on the finished provider registry later. |
> Numbers are assigned in build order and are stable for the life of this outline. Cite them as `Phase N` in tickets, comments, and follow-up reports.
---
## How This Rollout Differs from the First Draft {#departures}
The rollout deliberately departs from the first pass of the design in the ways named below. Each departure is summarized once here so the phase write-ups can refer to it by name.
### 1. Favorites are a shared user preference, not part of the provider inventory
The first draft treated favorites as if they belonged inside the model catalog itself. The rollout instead treats them as a shared user preference layered on top of provider inventory. This matters because provider inventory answers “what exists right now,” while favorites answer “what this user prefers across devices and surfaces.”
### 2. External-agent support is a late seam, not part of the first local-model cut
The first draft grouped native and external-agent parity together too early. The rollout separates them because native surfaces can use the new provider resolver directly, while the external-agent path still assumes one local provider behind the scenes. That path needs a real bridge, not a string rewrite.
---
## Phase Kinds {#phase-kinds}
- **Foundation** — A capability that does not yet deliver the full user outcome, but is required for later phases. It must still be demonstrable on its own.
- **Feature slice** — A thin end-to-end strip of new behavior that a real user can experience.
- **Polish** — Refinement, resilience, or operational quality-of-life work that enriches a working core.
- **Deferred** — Listed for traceability; not built in the current plan.
---
## Build Phases {#build-phases}
### Phase 1: Named Provider Inventory {#phase-1}
**Kind.** Foundation.
**Builds on.** Nothing — this is the starting phase.
**What we build.** BooCode learns that “local models” are not one undifferentiated pool. The system gains a shared named-provider list, a stable way to name a selected model as “provider plus model,” a default-provider fallback for old data, and a provider-aware inventory view that can show which models belong to which machine.
**Why this is Phase 1.** No later phase is safe until provider identity exists as a first-class concept. This phase is still demonstrable on its own because a person can see two named local providers with their own model groups and confirm that existing sessions still resolve instead of breaking.
**Outcome to demonstrate.**
1. Start BooCode with two named local providers configured.
2. Open the model selection view and see separate groups for each provider.
3. Open an older session that still stores a legacy bare model value.
4. Confirm the older session still resolves to a usable default instead of failing.
**Source citations.**
- [Research — Recommendation](../../research/2026-06-10-multi-llama-swap-providers-model-favorites.md#recommendation)
- [Research — What exists today](../../research/2026-06-10-multi-llama-swap-providers-model-favorites.md#what-exists-today-codebase--current-state-anchor)
- [Implementation analysis — Shared local-provider registry](../../../openspec/changes/multi-llama-swap-providers-model-favorites/artifacts/implementation-analysis.md#1-shared-local-provider-registry)
**Connects to.**
- Creates the identity rules used by [Phase 2](#phase-2), [Phase 4](#phase-4), and [Phase 5](#phase-5).
- Establishes the provider list that [Phase 7](#phase-7) will operationalize for future machines.
**Preconditions to verify before starting.**
- Confirm the shared provider list lives in one new shared location rather than being split between separate app-specific settings.
- Confirm which provider is the long-term default when legacy bare model values are encountered.
---
### Phase 2: Multi-Provider BooChat {#phase-2}
**Kind.** Feature slice.
**Builds on.** Phase 1, where provider identity and fallback rules are established.
**What we build.** BooChat becomes the first live end-to-end consumer of multiple local providers. A person can choose a model from any configured provider, send a message, and trust that the response came from the intended machine. The same phase also fixes the two current routing hazards: models that happen to share a cloud-provider prefix in their name, and models that should never be sent through the sidecar path.
**Why this is Phase 2.** BooChat is the fastest way to prove the provider resolver against real behavior. It surfaces routing mistakes immediately, but it is still simpler and easier to inspect than the coding surfaces that layer more state and backend behavior on top.
**Outcome to demonstrate.**
1. Open a chat and choose a model from the first local provider.
2. Send a prompt and get a response.
3. Switch to a model from the second local provider and send the same prompt.
4. Confirm both responses arrive successfully and the request to the second provider is not routed through the wrong path.
5. Run a model whose name resembles a cloud model name and confirm it still uses the intended local provider.
**Source citations.**
- [Research — Recommendation constraints](../../research/2026-06-10-multi-llama-swap-providers-model-favorites.md#recommendation)
- [Research — Does embedding need a llama-sidecar? No.](../../research/2026-06-10-multi-llama-swap-providers-model-favorites.md#does-embedding-need-a-llama-sidecar-no)
- [OpenSpec design — Server changes](../../../openspec/changes/multi-llama-swap-providers-model-favorites/design.md#5-server-changes)
**Connects to.**
- Supplies the stable routing behavior reused in [Phase 3](#phase-3), [Phase 4](#phase-4), and [Phase 5](#phase-5).
- Proves the provider resolver before the coding flows depend on it.
**Preconditions to verify before starting.**
- Confirm the desired provider order for the user-facing list.
- Confirm the cloud-backed model group stays visibly separate from local machine groups.
---
### Phase 3: Shared Favorites and Grouped Selection {#phase-3}
**Kind.** Feature slice.
**Builds on.** Phase 1 for provider identity and Phase 2 for live multi-provider chat behavior.
**What we build.** Model selection becomes a stable, shared experience instead of a one-off list. A person can favorite models, see favorites first, still browse by provider below, and have the same favorite set follow them across chat surfaces. If a provider is temporarily unavailable, its favorites disappear from the visible list without being lost.
**Why this is Phase 3.** Once the routing rules are real, the next highest-value step is to make selection usable. Doing this before the deeper coding surfaces avoids building two different model-selection experiences and then reconciling them later.
**Outcome to demonstrate.**
1. Favorite one model from each local provider.
2. Refresh and confirm both favorites appear at the top while still remaining in their provider groups.
3. Open the other chat surface and confirm the same favorites appear there too.
4. Temporarily remove one provider from the live inventory.
5. Confirm its favorite disappears from view without being deleted, then returns when the provider comes back.
**Source citations.**
- [Research — Dropdown + favorites prior art](../../research/2026-06-10-multi-llama-swap-providers-model-favorites.md#dropdown--favorites-prior-art-web)
- [Research — Favorites persistence](../../research/2026-06-10-multi-llama-swap-providers-model-favorites.md#sub-decision--favorites-persistence)
- [Implementation analysis — Provider-aware catalog, client-derived favorites](../../../openspec/changes/multi-llama-swap-providers-model-favorites/artifacts/implementation-analysis.md#3-provider-aware-catalog-client-derived-favorites)
**Connects to.**
- Provides the selection behavior reused by [Phase 4](#phase-4).
- Stabilizes the shared user preference model before the broader fleet tooling in [Phase 7](#phase-7).
**Preconditions to verify before starting.**
- Confirm favorites are shared for the single user across devices rather than stored per browser.
- Confirm insertion order is enough for the first favorite list and manual reordering can wait.
---
### Phase 4: Native BooCoder Parity {#phase-4}
**Kind.** Feature slice.
**Builds on.** Phase 1 for provider identity, Phase 2 for routing behavior, and Phase 3 for the grouped selection experience.
**What we build.** The native coding path in BooCoder gains the same local model pool as BooChat. A person can choose a local model from any configured provider for native coding work and trust that the coding session is using the selected provider instead of collapsing everything back to one machine.
**Why this is Phase 4.** The native coding path can use the shared provider resolver directly, so it is the safest BooCoder slice to move next. Shipping it before the external-agent bridge delivers real user value while avoiding the hardest integration seam for one more phase.
**Outcome to demonstrate.**
1. Open the native coding experience.
2. Choose a local model from the first provider and run a coding task.
3. Start a second coding task using a model from the second provider.
4. Confirm both tasks run successfully using the intended provider-specific model choice.
**Source citations.**
- [Research — Recommendation constraints](../../research/2026-06-10-multi-llama-swap-providers-model-favorites.md#recommendation)
- [Implementation analysis — Treat native and external-agent paths differently](../../../openspec/changes/multi-llama-swap-providers-model-favorites/artifacts/implementation-analysis.md#4-treat-boocoder-native-and-boocoder-external-agent-paths-differently)
- [OpenSpec design — BooCoder integration](../../../openspec/changes/multi-llama-swap-providers-model-favorites/design.md#7-boocoder-integration)
**Connects to.**
- Establishes the stable native coding baseline before [Phase 6](#phase-6) tackles external-agent parity.
- Shares its provider list and identity rules with [Phase 5](#phase-5).
**Preconditions to verify before starting.**
- Confirm the native coding path is the required BooCoder target for the first live parity slice.
- Confirm the same grouped-selection experience should be preserved in the coding surface without new selection concepts.
---
### Phase 5: Multi-Provider Arena {#phase-5}
**Kind.** Feature slice.
**Builds on.** Phase 1 for provider identity and Phase 2 for provider-aware local routing.
**What we build.** Arena stops treating “local” as one machine and instead treats it as a set of named providers. A person can run local comparisons across models from different machines and get correct routing and fair local classification instead of silent misclassification.
**Why this is Phase 5.** Arena benefits from the same resolver as chat and coding, but it is a separate consumer with its own local-versus-cloud logic. It belongs after the shared routing behavior is proven, but before the harder external-agent bridge so the local evaluation surface is complete early.
**Outcome to demonstrate.**
1. Start an arena comparison using one local model from the first machine and one from the second.
2. Run the comparison to completion.
3. Confirm both contenders are treated as local candidates rather than being collapsed into one generic local lane.
4. Confirm the results still make sense when one contender uses a provider-specific route such as the sidecar-backed machine.
**Source citations.**
- [Research — Recommendation constraints](../../research/2026-06-10-multi-llama-swap-providers-model-favorites.md#recommendation)
- [Implementation analysis — Arena is a separate local-model consumer](../../../openspec/changes/multi-llama-swap-providers-model-favorites/artifacts/implementation-analysis.md#f-006--arena-is-a-separate-local-model-consumer-not-just-another-caller)
**Connects to.**
- Reuses the same provider resolver established earlier.
- Supplies the local evaluation surface that [Phase 7](#phase-7) will harden for future machines.
**Preconditions to verify before starting.**
- Confirm that the intended outcome is correct provider-aware behavior, not yet a richer benchmarking or reporting layer.
- Confirm that local fairness rules should still treat all named local providers as part of the local class rather than introducing provider-specific scheduling policy in this phase.
---
### Phase 6: External-Agent Parity {#phase-6}
**Kind.** Feature slice.
**Builds on.** Phases 1 through 5, because this phase depends on the final provider model being stable before it is bridged outward.
**What we build.** External coding providers gain access to the same multi-provider local fleet without losing provider identity. The user-visible outcome is simple: a local model chosen for an external coding workflow still hits the intended machine even when another machine serves a model with the same name.
**Why this is Phase 6.** This is the most failure-prone seam in the entire rollout. Shipping it earlier would make the system look complete while still hiding ambiguous routing behind the scenes. By the time this phase starts, the provider model, picker behavior, and native local routing rules are already stable.
**Outcome to demonstrate.**
1. Open an external coding workflow that can use a local model.
2. Choose a model name that also exists on another local machine.
3. Run the task and confirm the request still reaches the intended provider instead of whichever machine happens to share the name.
4. Repeat with a different local provider and confirm the same behavior.
**Source citations.**
- [Research — Validation V1 and V9](../../research/2026-06-10-multi-llama-swap-providers-model-favorites.md#validation)
- [Implementation analysis — No safe path for opencode local-model parity](../../../openspec/changes/multi-llama-swap-providers-model-favorites/artifacts/implementation-analysis.md#g-005--no-safe-path-for-opencode-local-model-parity)
- [Implementation analysis — Preferred parity path for opencode](../../../openspec/changes/multi-llama-swap-providers-model-favorites/artifacts/implementation-analysis.md#5-preferred-parity-path-for-opencode-a-boocoder-hosted-local-model-gateway)
**Connects to.**
- Completes the coding-side multi-provider story started in [Phase 4](#phase-4).
- Creates the provider bridge that keeps future machines safe in [Phase 7](#phase-7).
**Preconditions to verify before starting.**
- Confirm whether this phase will include a provider-preserving gateway or be split into a follow-up initiative.
- Confirm external-agent parity is required for the same milestone as native parity rather than being a later enhancement.
---
### Phase 7: Add-a-Machine Operations {#phase-7}
**Kind.** Polish.
**Builds on.** Phases 1 through 6, where the provider model and all major consumers are already in place.
**What we build.** The rollout stops being “support two machines” and becomes “support a growing local fleet.” A person can add another local machine by following a repeatable operational path, see it appear in inventory, and trust that chat, coding, and arena all treat it as just another named provider instead of a custom exception.
**Why this is Phase 7.** The architecture can claim success only when adding another machine is routine rather than bespoke. This phase comes late because it is about making the completed system repeatable and low-friction, not about proving the original two-machine behavior.
**Outcome to demonstrate.**
1. Add a third local provider using the documented provider path.
2. Restart or refresh the system.
3. See the new machine appear in the provider inventory with its own model group.
4. Use one model from the new machine in chat, one in coding, and one in arena.
5. Confirm all three surfaces recognize the new machine without custom code changes.
**Source citations.**
- [Research — Recommendation](../../research/2026-06-10-multi-llama-swap-providers-model-favorites.md#recommendation)
- [Implementation analysis — Recommended sequence](../../../openspec/changes/multi-llama-swap-providers-model-favorites/artifacts/implementation-analysis.md#recommended-sequence)
- [Implementation analysis — Shared local-provider registry](../../../openspec/changes/multi-llama-swap-providers-model-favorites/artifacts/implementation-analysis.md#1-shared-local-provider-registry)
**Connects to.**
- Turns the whole earlier rollout into an operationally repeatable capability.
- Provides the stable registry that the deferred fleet layer in [Phase 8](#phase-8) can consume later.
**Preconditions to verify before starting.**
- Confirm configuration-based provider management is acceptable for the first operational pass and a full management interface is not required yet.
- Confirm the success bar is “no code changes required to add the machine,” not “all provider administration happens inside the product.”
---
### Phase 8 (Deferred): BooControl Fleet Layer {#phase-8}
**Kind.** Deferred.
**Builds on.** Phases 1 through 7, because it consumes the finished provider registry and the settled provider names.
**What we build.** A dedicated fleet-control and observability layer that can show the state of multiple local model providers, collect live information across them, and eventually make routing and benchmarking easier to understand.
**Why this is deferred.** BooControl depends on the provider registry, but the registry does not depend on BooControl. Building the control layer earlier would either duplicate the provider model or force BooControl to sit on top of assumptions that this rollout is specifically trying to remove.
**Reopen when.** Reopen this phase once multi-provider chat, coding, arena, and add-a-machine operations are already stable and there is enough day-to-day fleet activity to justify a dedicated control surface.
**Outcome to demonstrate (when or if built).**
1. Open the fleet view.
2. See every named local provider in one place.
3. Inspect live state or history without having to visit each machine separately.
**Source citations.**
- [BooControl tasks — prerequisite note](../../../openspec/changes/boocontrol/tasks.md#p0--prerequisite-separate-batch-multi-llama-swap-provider-registry)
- [BooControl proposal — prerequisite note](../../../openspec/changes/boocontrol/proposal.md#why)
---
## Open Questions {#open-questions}
### OQ-1. Where should the shared provider list live, and who owns it? {#oq-1}
**Blocks phase(s).** Phase 1.
The first phase cannot start until there is one agreed source of truth for named local providers. If that decision stays split, every later phase inherits the split.
- **Option A — a new shared provider list used by both apps.** One place defines provider names, addresses, and any provider-specific routing attributes. This keeps the local fleet model unified.
- **Option B — keep the existing separate settings and derive one view from the other.** This lowers the immediate change but keeps the long-term drift risk alive.
- **Recommendation: Option A.** The whole point of the rollout is to make provider identity shared and durable. Keeping two authorities would repeat the same problem in a new shape.
### OQ-2. Does this initiative include external-agent parity, or does it stop after native parity? {#oq-2}
**Blocks phase(s).** Phase 6.
The rollout can reach a useful and honest midpoint after native parity, but it cannot claim full multi-provider coding parity until the external-agent path is solved too.
- **Option A — include external-agent parity in this initiative.** This produces a complete end state, but it requires a dedicated provider-preserving bridge.
- **Option B — stop after native parity and split the external-agent work into a follow-up.** This shortens the first initiative, but the end state remains intentionally incomplete.
- **Recommendation: Option A if the bridge is accepted; otherwise Option B.** If the team is willing to build the bridge properly, finishing the job now avoids a misleading halfway state. If not, native parity should ship honestly as a bounded milestone and the rest should be split explicitly.
### OQ-3. Is a product-based provider management screen required now, or is configuration-based rollout enough? {#oq-3}
**Blocks phase(s).** Phase 7.
The final live phase is about making more machines routine to add. The open question is whether “routine” means “edit the provider list and restart” or whether it already means “manage providers inside the product.”
- **Option A — configuration-based rollout first.** A trusted operator adds machines through the shared provider list and validates them using the product.
- **Option B — product-based management in the same initiative.** Provider administration becomes part of the product immediately.
- **Recommendation: Option A.** The current initiative is about correct provider identity and repeatable multi-provider behavior. A full management screen adds another feature layer before the provider model has had time to prove itself.
### Carry-over notes
- Search, tag filtering, and richer picker controls are intentionally not blockers for the main rollout.
- Full fleet control, reporting, and advanced routing policy stay deferred until the provider model is already stable in daily use.

@@ -0,0 +1,345 @@
# Feature Implementation Plan: Multi-Provider Local Models
This plan turns the multi-provider local-model design into a strict implementation sequence that can be executed with Orchestration. It assumes the target is not just to “fix the picker” but to make local inference work as a small fleet with stable provider identity, shared favorites, correct routing, and an honest parity story for BooCoder.
## Source Specification
- Primary rollout outline: [build-phase-outline.md](build-phase-outline.md)
- Behavioral design: [../../../openspec/changes/multi-llama-swap-providers-model-favorites/design.md](../../../openspec/changes/multi-llama-swap-providers-model-favorites/design.md)
- Task inventory: [../../../openspec/changes/multi-llama-swap-providers-model-favorites/tasks.md](../../../openspec/changes/multi-llama-swap-providers-model-favorites/tasks.md)
- Architecture analysis: [../../../openspec/changes/multi-llama-swap-providers-model-favorites/artifacts/implementation-analysis.md](../../../openspec/changes/multi-llama-swap-providers-model-favorites/artifacts/implementation-analysis.md)
- Research note: [../../research/2026-06-10-multi-llama-swap-providers-model-favorites.md](../../research/2026-06-10-multi-llama-swap-providers-model-favorites.md)
- Discovery notes: [artifacts/.discovery-notes.md](artifacts/.discovery-notes.md)
## Outcome
When this plan is complete:
- BooChat can route local models by named provider, not by one global `LLAMA_SWAP_URL`.
- Favorites are shared across BooChat and native BooCoder, derived from settings instead of being baked into the server catalog ([D-4](artifacts/implementation-decision-log.md#d-4-favorites-are-a-settings-backed-user-view-not-a-server-catalog-section)).
- Duplicate model names on different local machines are safe because persisted and cached identity is `provider/model` ([D-2](artifacts/implementation-decision-log.md#d-2-persist-and-cache-composite-providermodel-ids-keep-wire-ids-bare)).
- Native BooCoder and Arena use the same provider-aware resolver as BooChat ([D-3](artifacts/implementation-decision-log.md#d-3-one-provider-aware-resolver-shared-across-streaming-non-streaming-context-and-arena)).
- External-agent parity is real rather than implied: `opencode` only gets multi-provider local models after a provider-preserving bridge exists ([D-5](artifacts/implementation-decision-log.md#d-5-native-boocode-parity-ships-before-opencode-parity), [D-6](artifacts/implementation-decision-log.md#d-6-opencode-parity-uses-a-boocode-local-gateway-not-a-string-rewrite)).
- Adding another local machine is a config change plus a smoke pass, not another architecture pass ([D-7](artifacts/implementation-decision-log.md#d-7-add-a-machine-stays-config-driven-in-this-initiative)).
## Working Assumptions
- The shared local-provider source of truth is `/data/llama-providers.json`, exposed to both apps through `LLAMA_PROVIDERS_PATH`, with legacy env fallback while the file is absent ([D-1](artifacts/implementation-decision-log.md#d-1-shared-local-provider-config-authority)).
- `packages/contracts` owns schemas and pure helpers; app-local loader modules own file I/O and env fallback, following the existing `provider-config` / `provider-config-registry` split in BooCoder ([D-1](artifacts/implementation-decision-log.md#d-1-shared-local-provider-config-authority)).
- The work ends at a completed multi-provider substrate. BooControl is a follow-on consumer, not part of this implementation batch.
## Orchestration Rules
- Treat each work unit below as one mergeable branch. Do not overlap branches that touch the same shared contract files.
- Never run more than one agent at a time on `packages/contracts/src/*`, `apps/server/src/services/inference/provider.ts`, `apps/web/src/api/types.ts`, or `apps/coder/src/services/provider-snapshot.ts`.
- Inside a work unit, parallelize only disjoint file groups. Contract changes first, consumers second, tests last.
- Close each work unit with its own verification before starting the next one. Do not stack W1-W4 and debug later.
## Work Unit Index
| # | Work Unit | Surface | Delivers | Depends On | Verification |
|---|---|---|---|---|---|
| 1 | Provider Registry Foundation | contracts + server + coder | Shared config schema, model-ref helpers, app-local registry loaders | — | Contracts build, server build, coder build |
| 2 | Server Catalog and Routing | server | Provider-aware `/api/models` and unified resolver | W1 | server tests for routing + collision cases |
| 3 | Server Downstream Consumers | server | Context, compaction, and task-model stop assuming one endpoint | W2 | server tests for cache isolation + bare-id fallback |
| 4 | BooChat Favorites and Grouped Picker | server + web | Shared favorites and provider-grouped chat model selection | W2 | server tests + web smoke |
| 5 | Native BooCoder Parity | coder + web | Native `boocode` local models use composite IDs and grouped selection | W1, W4 | coder tests + BooCoder smoke |
| 6 | Arena Parity | coder | Arena local calls and local-model classification become provider-aware | W5 | coder tests + arena smoke |
| 7 | External-Agent Parity | coder | `opencode` gets multi-provider local models through a real bridge | W5 | coder tests + opencode smoke |
| 8 | Operations and Final Verification | docs + configs + smoke | Add-a-machine runbook, final matrix, ready handoff to BooControl | W7 | end-to-end smoke matrix |
## Work Units
### W1. Provider Registry Foundation
**Goal.** Make provider identity real before any routing or UI changes.
**Files and seams.**
- `packages/contracts/src/` for the new local-provider schema and pure model-ref helpers
- `packages/contracts/package.json` exports
- `apps/server/src/config.ts`
- `apps/coder/src/config.ts`
- new app-local registry loaders under `apps/server/src/services/` and `apps/coder/src/services/`
- `data/llama-providers.example.json`
**Implement.**
1. Add a new contracts subpath for local provider config, separate from the existing coder ACP provider config.
2. Define the shared file shape: `defaultProvider` plus `providers[]` with `id`, `label`, `baseUrl`, optional `sidecarUrl`, and `kind`.
3. Add pure helpers for `parseModelRef`, `formatModelRef`, and legacy bare-id resolution.
4. Add `LLAMA_PROVIDERS_PATH` to both server and coder config.
5. Implement server and coder registry loaders that read the shared file and synthesize one legacy provider from `LLAMA_SWAP_URL` and optional `LLAMA_SIDECAR_URL` when the file is absent.
6. Add a checked-in example config with `sam-desktop` and `embedding`.
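
The helpers in step 3 could look roughly like the sketch below. It is illustrative only: the interface names, the `kind: string` typing, and the rule that an unknown prefix is treated as a legacy bare id are assumptions, not the shipped contract in `packages/contracts`.

```typescript
// Hypothetical shapes for illustration; the real schema belongs in packages/contracts.
interface LocalProvider {
  id: string;
  label: string;
  baseUrl: string;
  sidecarUrl?: string;
  kind: string; // allowed values are not specified in this plan
}

interface LocalProviderConfig {
  defaultProvider: string;
  providers: LocalProvider[];
}

interface ModelRef {
  providerId: string;
  model: string; // bare wire id sent upstream
}

// Split on the FIRST slash only, so wire ids that contain slashes stay intact.
function parseModelRef(value: string, config: LocalProviderConfig): ModelRef {
  const slash = value.indexOf("/");
  if (slash > 0) {
    const providerId = value.slice(0, slash);
    if (config.providers.some((p) => p.id === providerId)) {
      return { providerId, model: value.slice(slash + 1) };
    }
  }
  // Assumption: a bare id (or an unknown prefix) resolves to the default provider.
  return { providerId: config.defaultProvider, model: value };
}

// Inverse of parseModelRef for ids that are persisted or cached.
function formatModelRef(ref: ModelRef): string {
  return `${ref.providerId}/${ref.model}`;
}
```

Keeping both functions pure (config passed in, no file I/O) matches the split described in Working Assumptions, where app-local loaders own I/O.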
**Parallel-safe split.**
- Agent A: contracts schema + helpers + exports
- Agent B: server config + server loader after A merges
- Agent C: coder config + coder loader after A merges
**Exit criteria.**
- Both apps can start with only legacy env vars.
- Both apps can also start with a real `llama-providers.json`.
- Pure helper tests cover `provider/model` and bare fallback.
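
The first exit criterion depends on the legacy fallback from step 5. A minimal sketch of that synthesis follows, assuming a hypothetical provider id of `legacy`, a placeholder label, and a placeholder `kind` value; the environment is passed in as a plain record so the function stays pure and testable.

```typescript
// Hypothetical shapes for illustration; names mirror the file shape in step 2.
interface LocalProvider {
  id: string;
  label: string;
  baseUrl: string;
  sidecarUrl?: string;
  kind: string;
}

interface LocalProviderConfig {
  defaultProvider: string;
  providers: LocalProvider[];
}

// Build a one-provider config from legacy env vars when the shared file is absent.
// Returns null when LLAMA_SWAP_URL is not set, so the caller can decide how to fail.
function synthesizeLegacyConfig(
  env: Record<string, string | undefined>,
): LocalProviderConfig | null {
  const baseUrl = env["LLAMA_SWAP_URL"];
  if (!baseUrl) return null;

  const provider: LocalProvider = {
    id: "legacy", // assumed id; the plan does not name the synthesized provider
    label: "Local",
    baseUrl,
    kind: "llama-swap", // assumed value
  };
  const sidecarUrl = env["LLAMA_SIDECAR_URL"];
  if (sidecarUrl) provider.sidecarUrl = sidecarUrl;

  return { defaultProvider: provider.id, providers: [provider] };
}
```

An app-local loader would try the file at `LLAMA_PROVIDERS_PATH` first and call a function like this only when that file is missing.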
### W2. Server Catalog and Routing
**Goal.** Replace server-side routing heuristics with one provider-aware resolver.
**Files and seams.**
- `apps/server/src/routes/models.ts`
- `apps/server/src/services/inference/provider.ts`
- `apps/server/src/types/api.ts`
- `apps/web/src/api/types.ts`
- `apps/web/src/api/client.ts`
- relevant provider tests
**Implement.**
1. Refactor `/api/models` to return provider-grouped inventory only, with every `ModelInfo.id` already composite ([D-4](artifacts/implementation-decision-log.md#d-4-favorites-are-a-settings-backed-user-view-not-a-server-catalog-section)).
2. Build one server resolver that answers:
- provider identity
- upstream base URL
- sidecar eligibility
- final wire model id
- DeepSeek special handling
3. Make both `upstreamModel()` and `resolveModelEndpoint()` call that same resolver.
4. Remove the current “prefix means provider” logic as the authority; keep compatibility only at the bare-id fallback layer.
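
One possible shape for the resolver in step 2 is sketched below. The type and function names are hypothetical, DeepSeek special handling is deliberately left out, and sidecar eligibility is modeled simply as "the provider declares a `sidecarUrl`", which is an assumption about how the registry expresses it.

```typescript
// Hypothetical shapes for illustration only.
interface LocalProvider {
  id: string;
  label: string;
  baseUrl: string;
  sidecarUrl?: string;
  kind: string;
}

interface ResolvedModel {
  providerId: string;
  baseUrl: string;
  sidecarUrl: string | null; // null means "never use a sidecar for this provider"
  wireModel: string; // bare id sent on the wire
}

function resolveLocalModel(
  modelId: string,
  providers: LocalProvider[],
  defaultProviderId: string,
): ResolvedModel {
  const slash = modelId.indexOf("/");
  const prefix = slash > 0 ? modelId.slice(0, slash) : "";

  // The registry, not the model name, decides which provider owns the id.
  const explicit = providers.find((p) => p.id === prefix);
  const provider = explicit ?? providers.find((p) => p.id === defaultProviderId);
  if (!provider) {
    throw new Error(`No local provider available for model "${modelId}"`);
  }

  return {
    providerId: provider.id,
    baseUrl: provider.baseUrl,
    sidecarUrl: provider.sidecarUrl ?? null,
    // Strip the provider prefix only when it matched a registered provider.
    wireModel: explicit ? modelId.slice(slash + 1) : modelId,
  };
}
```

Because the lookup goes through registered provider ids, a local model whose name merely contains `deepseek` is never mistaken for a cloud model in this sketch.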
**Parallel-safe split.**
- First branch: resolver and tests
- Second branch: `/api/models` contract change plus client type updates
**Exit criteria.**
- `embedding/deepseek-r1-qwen3-8b` routes as local `embedding`, not as DeepSeek cloud.
- `embedding/*` never uses a sidecar.
- Legacy bare models still resolve through the configured default provider.
### W3. Server Downstream Consumers
**Goal.** Remove the remaining single-endpoint assumptions in server call sites.
**Files and seams.**
- `apps/server/src/services/model-context.ts`
- `apps/server/src/index.ts`
- `apps/server/src/services/compaction.ts`
- `apps/server/src/services/task-model.ts`
- `apps/server/src/services/inference/error-handler.ts`
- `apps/server/src/services/__tests__/model-context.test.ts`
**Implement.**
1. Change `model-context` to key caches by composite model id, not bare wire id.
2. Move context lookup from one process-wide `LLAMA_SWAP_URL` assumption to the provider-aware resolver.
3. Update compaction to resolve the right upstream before summary calls.
4. Update task-model fallback resolution to use the same parsed model ref path as inference.
5. Audit remaining server `LLAMA_SWAP_URL` call sites and either migrate them or explicitly mark them legacy-only.
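
To illustrate step 1, here is a minimal sketch of a context-limit cache keyed by composite id. The class name and method signatures are hypothetical and do not reflect the current `model-context.ts` API.

```typescript
// Hypothetical cache for illustration; keys are "provider/model", never the bare wire id.
class ContextLimitCache {
  private readonly limits = new Map<string, number>();

  private static key(providerId: string, wireModel: string): string {
    return `${providerId}/${wireModel}`;
  }

  set(providerId: string, wireModel: string, contextTokens: number): void {
    this.limits.set(ContextLimitCache.key(providerId, wireModel), contextTokens);
  }

  get(providerId: string, wireModel: string): number | undefined {
    return this.limits.get(ContextLimitCache.key(providerId, wireModel));
  }
}
```

With a bare-id key, two machines serving the same wire model name would overwrite each other's entry; the composite key keeps them separate.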
**Parallel-safe split.**
- Agent A: `model-context.ts` + tests
- Agent B: `compaction.ts` and `task-model.ts` after A lands, because both depend on the new resolver contract
**Exit criteria.**
- Two providers serving the same wire model name do not share context cache entries.
- Existing sessions with bare models still load context and complete turns.
- No server path doing local inference bypasses the shared resolver.
### W4. BooChat Favorites and Grouped Picker
**Goal.** Stabilize the end-user selection model on BooChat before deeper coding surfaces adopt it.
**Files and seams.**
- `apps/server/src/routes/settings.ts`
- `apps/server/src/services/settings.ts` or equivalent settings helper path
- `apps/web/src/components/ModelPicker.tsx`
- `apps/web/src/lib/model-label.ts`
- `apps/web/src/api/client.ts`
- `apps/web/src/api/types.ts`
- `apps/web/src/pages/Session.tsx`
**Implement.**
1. Add `favorite_models: string[]` handling in settings.
2. Normalize malformed and duplicate entries on write.
3. In the client, derive:
- Favorites section first
- then one section per provider
- hide unavailable favorites without deleting them
4. Keep a favorited model visible in both Favorites and its provider section.
5. Make new model selections write composite ids.
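
Steps 2 through 4 can be sketched as two pure functions. All names here are hypothetical, and treating any entry that is not a non-empty `provider/model` string as malformed is an assumption about what normalization should reject.

```typescript
// Server side: normalize favorite_models on write (drop malformed, dedupe, keep order).
function normalizeFavorites(input: unknown): string[] {
  if (!Array.isArray(input)) return [];
  const seen = new Set<string>();
  const out: string[] = [];
  for (const entry of input) {
    if (typeof entry !== "string") continue;
    const id = entry.trim();
    const slash = id.indexOf("/");
    // Assumption: a favorite must be a composite id with both parts non-empty.
    if (slash <= 0 || slash === id.length - 1) continue;
    if (seen.has(id)) continue;
    seen.add(id);
    out.push(id);
  }
  return out;
}

interface ProviderGroup {
  providerId: string;
  modelIds: string[]; // composite ids from the live inventory
}

interface PickerSection {
  title: string;
  modelIds: string[];
}

// Client side: derive the visible sections without mutating the stored favorites.
function buildPickerSections(
  favorites: string[],
  groups: ProviderGroup[],
): PickerSection[] {
  const live = new Set<string>();
  for (const group of groups) {
    for (const id of group.modelIds) live.add(id);
  }

  const sections: PickerSection[] = [];
  // Unavailable favorites are hidden here but remain in the stored list.
  const visibleFavorites = favorites.filter((id) => live.has(id));
  if (visibleFavorites.length > 0) {
    sections.push({ title: "Favorites", modelIds: visibleFavorites });
  }
  // A favorited model still appears in its own provider section.
  for (const group of groups) {
    sections.push({ title: group.providerId, modelIds: [...group.modelIds] });
  }
  return sections;
}
```

Because hiding happens only in the derived view, a provider that comes back online brings its favorites back without any write to settings.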
**Parallel-safe split.**
- Server settings branch first
- Web picker branch second against the new contract
**Exit criteria.**
- Favorites persist across refresh.
- Removing a provider from live inventory hides its favorites without deleting the stored ids.
- A new chat selection stores `provider/model`.
### W5. Native BooCoder Parity
**Goal.** Move native `boocode` local model usage onto the shared provider model before touching `opencode`.
**Files and seams.**
- `apps/coder/src/services/provider-snapshot.ts`
- `apps/coder/src/services/dispatcher.ts`
- `apps/web/src/components/AgentComposerBar.tsx`
- `apps/web/src/lib/model-label.ts`
- `packages/contracts/src/provider-snapshot.ts` only if the snapshot contract truly needs new metadata
**Implement.**
1. Make the native `boocode` provider expose composite local model ids from the shared registry.
2. Update native dispatch to resolve composite local ids through the shared registry.
3. Render grouped local models for the native `boocode` path in `AgentComposerBar`.
4. If the current `opencode` snapshot path would falsely advertise multi-provider local models before W7, hide that advertising now rather than leave the UI misleading ([D-5](artifacts/implementation-decision-log.md#d-5-native-boocode-parity-ships-before-opencode-parity)).
**Parallel-safe split.**
- Coder backend first
- AgentComposerBar UI second
**Exit criteria.**
- Native BooCoder tasks can run against at least two distinct local providers.
- The native picker behavior matches BooChat's grouped/favorites mental model closely enough that a user is not learning a second local-model identity system.
- `opencode` is not yet claiming parity it does not have.
### W6. Arena Parity
**Goal.** Make Arena consume the same local-provider substrate instead of one live llama-swap list.
**Files and seams.**
- `apps/coder/src/services/arena-model-call.ts`
- `apps/coder/src/services/arena-analyzer.ts`
- `apps/coder/src/services/arena-runner.ts`
- `apps/coder/src/index.ts`
- arena tests
**Implement.**
1. Replace direct `LLAMA_SWAP_URL` local calls with the provider-aware resolver.
2. Build Arena's local-model set from the shared provider registry, not one fetched list.
3. Preserve ADR-0001's two-lane scheduling rule; provider awareness changes local identity, not lane semantics.
4. Keep bare-id compatibility only where old data needs it.
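
The relationship between steps 2 and 3 can be sketched as follows: the local-model set is built from every registered provider, while lane assignment still has only two outcomes. The type names, lane labels, and inventory shape are hypothetical, and bare-id compatibility is omitted for brevity.

```typescript
// Hypothetical inventory shape: one entry per registered local provider.
interface ProviderInventory {
  providerId: string;
  wireModels: string[];
}

// Two lanes, as in ADR-0001: local contestants run serially, cloud contestants in parallel.
type Lane = "local-serial" | "cloud-parallel";

// Build the set of composite ids that count as local, across ALL providers.
function buildLocalModelSet(inventories: ProviderInventory[]): Set<string> {
  const local = new Set<string>();
  for (const inv of inventories) {
    for (const wireModel of inv.wireModels) {
      local.add(`${inv.providerId}/${wireModel}`);
    }
  }
  return local;
}

// Provider awareness changes which ids are local, not how many lanes exist.
function laneFor(modelId: string, localModels: Set<string>): Lane {
  return localModels.has(modelId) ? "local-serial" : "cloud-parallel";
}
```

In this sketch two contestants from different machines both land in the single local lane, which keeps the scheduling rule intact while still preserving which machine each one targets.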
**Parallel-safe split.**
- Agent A: `arena-model-call.ts` + analyzer updates
- Agent B: local-model set construction in `index.ts` + runner adjustments after A settles the model identity contract
**Exit criteria.**
- Arena can run local contestants from more than one machine.
- Local-vs-cloud classification still works.
- ADR-0001 behavior remains intact.
### W7. External-Agent Parity
**Goal.** Give `opencode` a real multi-provider local-model story instead of collapsing everything back to `llama-swap/<model>`.
**Files and seams.**
- `apps/coder/src/services/backends/opencode-server.ts`
- `apps/coder/src/services/provider-snapshot.ts`
- `apps/coder/src/services/agent-probe.ts`
- new BooCoder-hosted gateway route or service module under `apps/coder/src/services/`
- host config generation or sync for opencode local models
**Implement.**
1. Add a BooCoder-hosted OpenAI-compatible local gateway that accepts provider-preserving model ids and routes them to the correct local provider ([D-6](artifacts/implementation-decision-log.md#d-6-opencode-parity-uses-a-boocode-local-gateway-not-a-string-rewrite)).
2. Use one opencode-facing provider namespace such as `boocode-local`, where the opencode `providerID` is stable and the `modelID` is the inner composite id like `sam-desktop/qwen3.6-35b`.
3. Update provider snapshot merging so `opencode` advertises `boocode-local/<provider/model>` rather than `llama-swap/<model>`.
4. Update the opencode bridge parser and config sync so duplicate model names remain distinguishable end to end.
5. Add smoke coverage for two providers serving the same wire model name.
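
Steps 2 through 4 hinge on a model string that survives the round trip through `opencode` without losing the provider. A minimal sketch of that encoding, assuming the `boocode-local` namespace from step 2 and hypothetical function names:

```typescript
// The opencode-facing provider namespace suggested in step 2.
const GATEWAY_PROVIDER_ID = "boocode-local";

interface GatewayTarget {
  providerId: string; // local provider, e.g. "sam-desktop"
  wireModel: string; // bare id sent to that provider
}

// Advertised form: boocode-local/<provider>/<model>.
function toOpencodeModel(target: GatewayTarget): string {
  return `${GATEWAY_PROVIDER_ID}/${target.providerId}/${target.wireModel}`;
}

// Returns null for anything outside the gateway namespace or missing a part,
// so callers cannot silently fall back to a provider-less id.
function parseOpencodeModel(value: string): GatewayTarget | null {
  const parts = value.split("/");
  if (parts.length < 3 || parts[0] !== GATEWAY_PROVIDER_ID) return null;
  const providerId = parts[1];
  // Re-join the remainder so wire ids containing slashes stay intact.
  const wireModel = parts.slice(2).join("/");
  if (!providerId || !wireModel) return null;
  return { providerId, wireModel };
}
```

Returning `null` instead of guessing is the point of the design: a legacy `llama-swap/<model>` string is rejected rather than routed to whichever machine happens to serve that name.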
**Parallel-safe split.**
- Gateway branch first
- Snapshot/config-sync branch second
- Final opencode backend/parser adjustments last
**Exit criteria.**
- `opencode` can target two local providers with overlapping wire model names and hit the correct machine both times.
- No path rewrites `provider/model` down to plain `llama-swap/model`.
### W8. Operations and Final Verification
**Goal.** End with a repeatable operator workflow, not just a working dev branch.
**Files and seams.**
- `data/llama-providers.example.json`
- operator docs under `docs/`
- OpenSpec tasks/status notes as needed
**Implement.**
1. Document the add-a-machine flow for config-managed local providers.
2. Document the smoke matrix for:
- single legacy provider fallback
- two local providers
- duplicate model names across two providers
- DeepSeek enabled
- `opencode` local parity
3. Record the final interface BooControl should consume: provider registry plus composite ids, not raw host env vars.
**Exit criteria.**
- A third machine can be added by editing config and running the smoke matrix.
- The implementation docs name the exact runtime contract BooControl should build on.
## Verification Plan
- `pnpm -C packages/contracts build`
- `pnpm -C apps/server test`
- `pnpm -C apps/server build`
- `pnpm -C apps/coder test`
- `pnpm -C apps/coder build`
- `npx tsc -p apps/web/tsconfig.app.json --noEmit`
Add targeted tests as the work lands:
- model-ref parse/format and bare-id fallback
- provider-aware routing and DeepSeek collision cases
- context-cache isolation for duplicate model names
- favorites hide-not-delete behavior
- provider snapshot and opencode bridge behavior
- arena local-model classification across multiple providers
## Main Risks
- The W2 contract change to `/api/models` and W5 snapshot changes can drift across apps if contract parity is edited piecemeal. Follow the cross-app contract standard in [artifacts/.discovery-notes.md](artifacts/.discovery-notes.md) and land contract-first branches.
- W7 is the hardest seam. If the gateway is skipped and the old string rewrite is kept, the feature will look complete in UI while still routing the wrong machine.
- `model-context.ts` is a hidden correctness seam. If cache keys stay bare, duplicate model names will mis-share context limits and compaction behavior even after routing is fixed.
## Deferred
- BooControl itself
- picker search and richer filtering
- manual favorite reordering
- host health badges in pickers
## Definition of Done
- BooChat, native BooCoder, Arena, and `opencode` all support provider-aware local models end to end.
- Legacy bare ids remain readable.
- Two providers can expose the same wire model name without ambiguity.
- Adding another local machine is documented and smoke-tested.
- BooControl can start later without inventing a second provider registry.