chore: snapshot working tree - pty_exited notifications + in-flight inference WIP

feat(booterm): structured pty_exited WS notifications. Plan-validated, impl-validated, code-reviewed green (contracts build clean, contracts test 29/29, booterm + web typecheck clean).

wip: in-progress inference/provider refactor (agents.ts, provider.ts, new llama-providers.ts, removed llama-args-validator), plus arena, dispatcher, compaction, schema changes.

openspec: pty-exit-notifications complete; x-agent-flags planned (not yet implemented).
2026-06-14 12:48:47 +00:00
parent 0ed506f1da
commit b18de2a331
204 changed files with 25344 additions and 867 deletions


@@ -0,0 +1,99 @@
# Multi-Provider Local Models — Operator Guide
How BooCode routes local inference across multiple llama-swap machines, how to
add another machine, and the smoke matrix to run after any provider change.
Implementation plan: [plans/multi-provider-local-models/feature-implementation-plan.md](plans/multi-provider-local-models/feature-implementation-plan.md).
## Runtime contract
- **Config authority:** `/data/llama-providers.json` (bind-mounted; gitignored),
read by both `apps/server` and `apps/coder` via `LLAMA_PROVIDERS_PATH`.
Tracked template: `data/llama-providers.example.json`.
- **Legacy fallback:** when the file is absent, both apps synthesize a single
provider from `LLAMA_SWAP_URL`. Startup never breaks on a missing file.
- **Model identity:** persisted and cached ids are composite `provider/model`
(e.g. `sam-desktop/qwen3.6-35b-a3b`). Wire calls to upstreams always send the
bare model id. Legacy bare ids resolve to `defaultProvider` indefinitely.
- **Resolver:** `resolveModelProvider()` in
`apps/server/src/services/inference/provider.ts` is the single routing
authority for streaming, non-streaming, context lookup, compaction, and
task-model fallback. The coder mirrors this via its registry loader
(`apps/coder/src/services/llama-providers.ts`) for arena and the local gateway.
- **opencode bridge:** the BooCoder-hosted OpenAI-compatible gateway
(`apps/coder/src/services/local-gateway.ts`) exposes all local providers to
opencode under the single namespace `boocode-local`; the inner modelID is the
composite id (`boocode-local/sam-desktop/qwen3.6-35b`). No path rewrites a
composite id down to `llama-swap/<model>`.
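The identity rule above can be sketched as a small pure function. This is an illustration only, not the actual `parseModelRef` from `@boocode/contracts`: the name `splitModelRef`, its signature, and the choice to treat an unknown prefix as a legacy bare id are all assumptions.

```typescript
// Hypothetical sketch of composite-id resolution; not the real contracts helper.
interface ModelRef {
  provider: string; // registry provider id, e.g. "sam-desktop"
  model: string;    // bare wire id, the only part sent upstream
}

function splitModelRef(
  id: string,
  knownProviders: ReadonlySet<string>,
  defaultProvider: string,
): ModelRef {
  const slash = id.indexOf("/");
  if (slash > 0) {
    const head = id.slice(0, slash);
    // Assumption: only a prefix that names a registered provider counts as composite.
    if (knownProviders.has(head)) {
      return { provider: head, model: id.slice(slash + 1) };
    }
  }
  // Legacy bare id: resolve to the default provider and keep the id untouched.
  return { provider: defaultProvider, model: id };
}
```

Under these assumptions, `sam-desktop/qwen3.6-35b-a3b` resolves to provider `sam-desktop` with wire model `qwen3.6-35b-a3b`, while a bare `qwen3.6-35b-a3b` resolves to whatever `defaultProvider` is.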
## Add a machine
1. Start llama-swap on the new machine, reachable over Tailscale
(e.g. `http://100.x.y.z:84NN`).
2. Edit `/data/llama-providers.json`: append a provider entry
`{ "id": "<machine-slug>", "label": "<Display>", "baseUrl": "http://100.x.y.z:84NN", "kind": "llama-swap" }`.
3. Restart consumers: `docker compose restart boocode` (server reads the file at
startup) and `sudo systemctl restart boocoder`.
4. Verify: `GET /api/models` shows a new provider group; the new machine's
models appear as `<machine-slug>/<model>` in the BooChat picker and the
native BooCoder composer.
5. Run the smoke matrix below.
That is the whole flow — no code changes, no rebuild (config lives in the
bind-mounted `data/`).
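For anyone scripting step 2 instead of hand-editing, a minimal sketch of the append follows. The entry fields come from the example in step 2; the top-level file shape (`defaultProvider` plus a `providers` array), the helper name `addProvider`, and the specific checks are assumptions, and the authoritative schema remains `LlamaProvidersFileSchema`.

```typescript
// Hypothetical shapes for illustration; the real schema lives in @boocode/contracts.
interface ProviderEntry {
  id: string;      // machine slug, becomes the prefix in "<machine-slug>/<model>"
  label: string;   // display name
  baseUrl: string; // llama-swap base URL reachable over Tailscale
  kind: string;    // "llama-swap" in the documented example
}

interface ProvidersFile {
  defaultProvider: string;    // assumed field name
  providers: ProviderEntry[]; // assumed field name
}

function addProvider(file: ProvidersFile, entry: ProviderEntry): ProvidersFile {
  // A slash in the slug would make composite ids ambiguous.
  if (entry.id.length === 0 || entry.id.includes("/")) {
    throw new Error(`invalid provider id: "${entry.id}"`);
  }
  if (file.providers.some((p) => p.id === entry.id)) {
    throw new Error(`provider "${entry.id}" already exists`);
  }
  new URL(entry.baseUrl); // throws on a malformed URL
  // Return a new object so the caller decides when to write it back to disk.
  return { ...file, providers: [...file.providers, entry] };
}
```

The function is pure on purpose: reading and writing `/data/llama-providers.json` and restarting the consumers stay separate, explicit steps.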
## Smoke matrix
Run after adding/removing a provider or changing provider config:
| Case | Steps | Expect |
|---|---|---|
| Legacy fallback | Remove/rename `llama-providers.json`, restart server | Boot OK; single provider synthesized from `LLAMA_SWAP_URL`; bare-id sessions still stream |
| Two local providers | File with `sam-desktop` + `embedding`; chat once on a model from each | Both stream; `GET /api/models` shows both groups with composite ids |
| Duplicate model names | Same wire model name on two providers; chat on each composite id | Each request hits its own machine (check llama-swap logs); context limits are not cross-shared |
| DeepSeek enabled | Set `DEEPSEEK_API_KEY`; pick `deepseek/<model>`; also pick `embedding/deepseek-r1-qwen3-8b` | First routes to DeepSeek cloud; second routes to local `embedding` (collision case) |
| Favorites | Star models from two providers, refresh, unplug one provider, refresh | Favorites persist; offline provider's favorites hidden, not deleted from settings |
| opencode parity | Dispatch an opencode task on `boocode-local/<provider>/<model>` for two providers sharing a wire name | Each lands on the correct machine; no `llama-swap/` collapse in opencode config or logs |
| Arena | Battle with contestants from two local providers | Local lane stays serial (ADR-0001); each contestant calls its own provider |
## Interface for BooControl (follow-on)
BooControl must consume, not reinvent:
- the provider registry file `/data/llama-providers.json` (schema:
`@boocode/contracts/llama-providers`, `LlamaProvidersFileSchema`) as the
single source of provider identity;
- composite `provider/model` ids everywhere it stores or displays model
identity (`parseModelRef`/`formatModelRef` from the same contracts subpath);
- `GET /api/models` for live inventory and `favorite_models` in
`GET/PATCH /api/settings` for user preference — never raw host env vars.
Adding fleet UI = writing this file + restarting consumers; nothing else owns
provider identity.
## External agents
Both of Sam's coder agents get the local fleet through the gateway at coder
startup, under the single provider namespace `boocode-local`:
- **opencode** — `opencode-config-sync.ts` writes the provider (with
`@ai-sdk/openai-compatible` + gateway `baseURL` + model map) into
`~/.config/opencode/opencode.json`.
- **Pi** — `pi-config-sync.ts` writes the provider into
`~/.pi/agent/models.json` (other providers untouched; hand-tuned per-model
`contextWindow`/`maxTokens` overrides on boocode-local entries survive
re-sync).
After adding a machine, `sudo systemctl restart boocoder` re-syncs both.
## Resilience notes
- **Arena's local-model set self-refreshes every 5 min**
(`arena-local-models.ts`): a provider that was down at coder startup is
reclassified as local once it recovers; an unreachable provider keeps its
last-known models (stale-but-local beats a wrong cloud-lane dispatch). Bare
ids are contributed only by the default provider.
- The gateway forwards the client's `Authorization` header to upstreams when
present; its `/v1/*` routes remain unauthenticated on :9502 (repo
convention: the reverse proxy owns auth).
- Gateway `GET /v1/models` serves the live composite model list fetched from
every registry provider.
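The stale-but-local rule and the bare-id rule can be sketched as two pure functions. This is not the code in `arena-local-models.ts`; the names `mergeProbe` and `localModelIds` and the convention that `null` marks an unreachable provider are assumptions made for illustration.

```typescript
// Hypothetical sketch: one probe result per provider, null when unreachable.
type ProbeResult = ReadonlyMap<string, string[] | null>;

function mergeProbe(
  lastKnown: ReadonlyMap<string, string[]>,
  probe: ProbeResult,
): Map<string, string[]> {
  const next = new Map<string, string[]>();
  probe.forEach((models, providerId) => {
    // Unreachable provider: keep its last-known models rather than dropping them.
    next.set(providerId, models ?? lastKnown.get(providerId) ?? []);
  });
  return next;
}

function localModelIds(
  models: ReadonlyMap<string, string[]>,
  defaultProvider: string,
): Set<string> {
  const ids = new Set<string>();
  models.forEach((list, providerId) => {
    for (const model of list) {
      ids.add(`${providerId}/${model}`);
      // Bare ids are contributed only by the default provider.
      if (providerId === defaultProvider) ids.add(model);
    }
  });
  return ids;
}
```

A periodic timer would call `mergeProbe` with fresh probe results and rebuild the set with `localModelIds`, so a provider that recovers is picked up on the next cycle.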


@@ -0,0 +1,126 @@
# Discovery Notes: Multi-Provider Local Models
Single source of truth for implementation context. Read this first before touching the plan or code.
## Tech stack
- Monorepo with pnpm workspaces.
- `apps/server`: Fastify + Postgres, native inference, local-model routing, BooChat APIs.
- `apps/web`: React + Vite SPA, shared chat and coder UI.
- `apps/coder`: host-side BooCoder service, provider probing, native and external-agent dispatch, Arena, MCP.
- `packages/contracts`: shared cross-app schemas and types, built before consumers.
- TypeScript strict mode. Server and coder use NodeNext and `.js` import suffixes.
- Tests: `pnpm -C apps/server test`, `pnpm -C apps/coder test`. No dedicated web test harness.
## ADRs found
- `docs/adr/0001-arena-two-lane-scheduling.md`
Summary: local llama-backed contestants run serially in one lane, cloud contestants run in parallel in another lane; multi-provider work must preserve this lane model.
- `docs/adr/0002-arena-dedicated-tables-not-flow-runner.md`
Summary: Arena owns its own storage and runtime shape; reuse dispatcher machinery but do not fold Arena back into flow-runner abstractions.
## Coding standards found
- `docs/coding-standards/cross-app-contract-parity.md`
Summary: when a cross-app contract changes, update the canonical package source plus app-side secondary representations in the same batch; missing one side silently drops behavior at runtime.
- `CLAUDE.md`
Summary: `packages/contracts` is the single source for provider-snapshot and message-metadata contracts, deploy-by-surface rules matter, and contract changes must respect app-local secondary unions and renderers where they still exist.
## Relevant architecture notes
- `apps/server/CLAUDE.md`
Summary: `services/inference/provider.ts` is the current llama-swap provider seam; `model-context.ts` and `compaction.ts` currently assume one upstream.
- `apps/coder/CLAUDE.md`
Summary: provider snapshot and `opencode` integration are the main local-model seams; `llama-swap/*` is currently the local namespace assumption.
- `apps/web/CLAUDE.md`
Summary: `ModelPicker` and `AgentComposerBar` are separate UI surfaces with different constraints; any provider snapshot loading-state change can make providers disappear from the coder UI.
## Code touch points
### Shared contracts and config patterns
- `packages/contracts/src/provider-config.ts`
Existing coder ACP provider config schema; useful precedent, but not the right place to overload with local host inventory semantics.
- `apps/coder/src/services/provider-config-registry.ts`
Existing pattern for schema-in-package plus app-local load/build cache.
- `packages/contracts/src/provider-snapshot.ts`
Shared snapshot contract used by coder and web.
### Server: catalog, routing, and downstream local-model consumers
- `apps/server/src/config.ts`
Current env config includes `LLAMA_SWAP_URL`, `LLAMA_SIDECAR_URL`, and `DEFAULT_MODEL`; multi-provider config must enter here.
- `apps/server/src/routes/models.ts`
Current `/api/models` route fetches one llama-swap and optionally DeepSeek.
- `apps/server/src/services/inference/provider.ts`
Current route selection and AI SDK provider seam; central place to remove heuristic provider detection.
- `apps/server/src/services/model-context.ts`
Current context cache keys by bare model string and assumes one `LLAMA_SWAP_URL`.
- `apps/server/src/services/compaction.ts`
Uses `resolveModelEndpoint()` today, but still contains one-provider assumptions and a DeepSeek prefix special case.
- `apps/server/src/services/task-model.ts`
Returns one resolved `{url, model}` pair today.
- `apps/server/src/index.ts`
Calls `configureModelContext({ llamaSwapUrl })`; this wiring must change when context lookup becomes provider-aware.
- `apps/server/src/routes/settings.ts`
Existing shared settings persistence surface; right place for `favorite_models`.
### Web: BooChat and coder selection UI
- `apps/web/src/components/ModelPicker.tsx`
Shared BooChat model picker component; currently assumes a flat `/api/models` list.
- `apps/web/src/components/AgentComposerBar.tsx`
Native BooCoder provider/mode/model picker surface.
- `apps/web/src/lib/model-label.ts`
Display-only model prettifier used by both pickers.
- `apps/web/src/api/client.ts`
`models()` currently expects `ModelInfo[]`.
- `apps/web/src/api/types.ts`
Holds the web-side API contract for `/api/models` and other cross-app payloads.
### Coder: native, snapshot, arena, and external-agent bridge
- `apps/coder/src/config.ts`
Current coder config still exposes `LLAMA_SWAP_URL`; multi-provider config must enter here too.
- `apps/coder/src/services/provider-snapshot.ts`
Current snapshot fetches one `LLAMA_SWAP_URL`, prefixes local models as `llama-swap/*`, and merges them into `opencode`.
- `apps/coder/src/services/dispatcher.ts`
Current native and external-agent dispatch logic still assumes local bare ids or `llama-swap/*` for local routing.
- `apps/coder/src/services/backends/opencode-server.ts`
`parseModel()` splits only once at `/`; this is good news because a stable outer provider namespace can carry an inner composite model id.
- `apps/coder/src/services/arena-model-call.ts`
Direct one-shot local model call against `LLAMA_SWAP_URL`.
- `apps/coder/src/services/arena-analyzer.ts`
Local-vs-cloud checks rely on one local model set and one upstream.
- `apps/coder/src/index.ts`
Builds the local-model set for Arena from one fetched llama-swap list.
## Recent activity and churn
High-churn files in the last 90 days:
- `apps/web/src/api/types.ts`
- `apps/web/src/api/client.ts`
- `apps/server/src/index.ts`
- `apps/server/src/types/api.ts`
- `apps/coder/src/services/dispatcher.ts`
- `apps/coder/src/index.ts`
- `apps/coder/src/services/provider-snapshot.ts`
- `apps/web/src/components/AgentComposerBar.tsx`
- `apps/server/src/services/compaction.ts`
Implication: keep work units narrow and avoid combining unrelated refactors in these files.
## Constraints and load-bearing facts
- `packages/contracts` already owns provider-snapshot types; if the snapshot contract changes, rebuild the package before touching consumers.
- `apps/web` has no dedicated test harness, so web verification will rely on typecheck plus smoke testing.
- Arena's local lane semantics are intentional; multi-provider support must not collapse local models into parallel execution.
- `opencode` local parity is not a small rename. The current host config and snapshot behavior collapse identity to one `llama-swap` namespace.
## Gaps and unknowns
- No existing shared local-provider config file or schema exists in-repo yet.
- `/api/models` shape change is not yet specified in app-local types; W2 must settle the contract before W4 starts.
- The final `opencode` gateway path is not implemented anywhere yet; W7 is net-new code, not just adaptation.
- No dedicated docs for “add a machine” exist yet; W8 must create them.


@@ -0,0 +1,109 @@
# Implementation Decision Log: Multi-Provider Local Models
This file records the implementation decisions committed while planning the multi-provider local-model rollout.
Behavioral intent lives in [../feature-implementation-plan.md](../feature-implementation-plan.md) and the source
artifacts it cites. Round history lives in [implementation-iteration-history.md](implementation-iteration-history.md).
Source artifacts:
- [../build-phase-outline.md](../build-phase-outline.md)
- [../../../openspec/changes/multi-llama-swap-providers-model-favorites/design.md](../../../openspec/changes/multi-llama-swap-providers-model-favorites/design.md)
- [../../../openspec/changes/multi-llama-swap-providers-model-favorites/artifacts/implementation-analysis.md](../../../openspec/changes/multi-llama-swap-providers-model-favorites/artifacts/implementation-analysis.md)
- [../../research/2026-06-10-multi-llama-swap-providers-model-favorites.md](../../research/2026-06-10-multi-llama-swap-providers-model-favorites.md)
- [./.discovery-notes.md](./.discovery-notes.md)
### D-1: Shared local-provider config authority
- **Question:** Where does the source of truth for named local providers live, and what belongs in the shared package versus app-local loaders?
- **Decision:** Use `/data/llama-providers.json`, wired through `LLAMA_PROVIDERS_PATH`, as the shared authority for local providers. Put the schema and pure model-ref helpers in `packages/contracts`; keep file I/O and legacy env fallback in app-local registry loaders for server and coder.
- **Rationale:** This matches the existing BooCoder pattern of package-owned schemas plus app-local load/build caches, avoids duplicating config semantics, and avoids forcing Node-specific loader code into every consumer of the contracts package.
- **Evidence:** `packages/contracts/src/provider-config.ts` and `apps/coder/src/services/provider-config-registry.ts` already follow this split; the current local-provider gap is that server and coder do not share any equivalent registry.
- **Rejected alternatives:**
- Keep local providers env-only forever. Rejected because server and coder already drift and more machines would multiply the drift.
- Put file reading only in one app and make the other app consume it indirectly. Rejected because both server and coder need startup-time local-provider awareness.
- **Driven by rounds:** R1.
- **Referenced in plan:** Outcome, Working Assumptions, W1.
### D-2: Persist and cache composite `provider/model` ids; keep wire ids bare
- **Question:** What is the canonical identity format for local model selections and caches?
- **Decision:** Persist and cache `provider/model`. Strip the provider prefix only at the final upstream call boundary. Keep indefinite support for legacy bare ids by resolving them to `defaultProvider`.
- **Rationale:** Duplicate wire model names across machines are otherwise impossible to represent safely. This also keeps DB migrations small because the existing columns are already free-form text.
- **Evidence:** `sessions.model` and `chats.model` are stringly typed; `apps/server/src/services/model-context.ts` currently keys by bare model and would otherwise cross-poison duplicate names.
- **Rejected alternatives:**
- Keep persisted ids bare and use side metadata for provider. Rejected because many call sites already pass the model string around alone.
- Prefix wire calls too. Rejected because upstream llama-swap and DeepSeek calls want the actual provider-native model id.
- **Driven by rounds:** R1.
- **Referenced in plan:** Outcome, W1, W2, W3.
### D-3: One provider-aware resolver shared across streaming, non-streaming, context, and Arena
- **Question:** Should each consumer keep its own endpoint logic once multiple local providers exist?
- **Decision:** No. Build one provider-aware resolver contract and make streaming inference, non-streaming calls, context lookup, compaction, task-model resolution, and Arena all go through it.
- **Rationale:** The current failure mode is duplicated routing logic with slightly different heuristics. Fixing only one path would leave subtle misroutes in the others.
- **Evidence:** `apps/server/src/services/inference/provider.ts`, `apps/server/src/services/model-context.ts`, `apps/server/src/services/compaction.ts`, `apps/server/src/services/task-model.ts`, and `apps/coder/src/services/arena-model-call.ts` all handle local-model identity separately today.
- **Rejected alternatives:**
- Only unify server inference and leave context/arena separate. Rejected because that would preserve hidden correctness bugs in context limits and Arena calls.
- **Driven by rounds:** R1.
- **Referenced in plan:** Outcome, W2, W3, W6.
### D-4: Favorites are a settings-backed user view, not a server catalog section
- **Question:** Where should the Favorites concept live?
- **Decision:** Store `favorite_models: string[]` in settings and derive the Favorites section client-side from settings plus provider inventory. The server catalog returns providers and models only.
- **Rationale:** Inventory answers “what exists now.” Favorites answer “what this user prefers.” Keeping them separate avoids overloading the server catalog with user-specific UI state.
- **Evidence:** `settings` already exists server-side; the OpenSpec analysis already identified favorites as a user-level concern rather than an inventory concern.
- **Rejected alternatives:**
- Return a synthetic Favorites section from `/api/models`. Rejected because it entangles inventory with user preference and complicates offline/unavailable favorite behavior.
- **Driven by rounds:** R1.
- **Referenced in plan:** Outcome, W2, W4.
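A minimal sketch of the client-side derivation in D-4, assuming the catalog exposes each provider with its bare model ids. The `ProviderGroup` shape and the helper name `visibleFavorites` are hypothetical and do not reflect the actual `/api/models` contract.

```typescript
// Hypothetical inventory shape for illustration only.
interface ProviderGroup {
  id: string;       // provider id
  models: string[]; // bare model ids currently served by that provider
}

function visibleFavorites(
  favorites: readonly string[],        // composite ids stored in settings
  inventory: readonly ProviderGroup[], // live provider inventory
): string[] {
  const live = new Set<string>();
  for (const group of inventory) {
    for (const model of group.models) {
      live.add(`${group.id}/${model}`);
    }
  }
  // Filter for display only; the stored favorites list is never modified here.
  return favorites.filter((id) => live.has(id));
}
```

Because the function only filters, a favorite on an offline provider disappears from the picker and reappears once the provider is back in the inventory.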
### D-5: Native `boocode` parity ships before `opencode` parity
- **Question:** Should native and external-agent BooCoder paths move together?
- **Decision:** No. Native `boocode` parity is W5. `opencode` parity is W7 and does not begin until the native path is correct and the UI stops falsely advertising multi-provider local models under the old bridge.
- **Rationale:** Native `boocode` can use the shared resolver directly. `opencode` still assumes one local-provider namespace and is the riskier seam.
- **Evidence:** `apps/coder/src/services/provider-snapshot.ts` prefixes local models as `llama-swap/*`; `apps/coder/src/services/backends/opencode-server.ts` still assumes the outer provider namespace identifies the target upstream.
- **Rejected alternatives:**
- Rename everything to `provider/model` in one pass. Rejected because the external-agent bridge would still collapse identity at the last moment.
- **Driven by rounds:** R1.
- **Referenced in plan:** Outcome, W5, W7.
### D-6: `opencode` parity uses a `boocode-local` gateway, not a string rewrite
- **Question:** What is the safe path to external-agent parity?
- **Decision:** Add a BooCoder-hosted OpenAI-compatible local gateway and present it to `opencode` as one stable provider namespace such as `boocode-local`. The inner `modelID` carries the composite local identity like `sam-desktop/qwen3.6-35b`.
- **Rationale:** `parseModel()` in the opencode backend already splits only once at `/`, which means a stable outer provider id can safely carry the inner composite local id. That preserves provider identity without teaching opencode about every machine directly.
- **Evidence:** `apps/coder/src/services/backends/opencode-server.ts` `parseModel()` returns `{ providerID, modelID }` where `modelID` may contain additional slashes; current `llama-swap/<model>` mapping is the ambiguity seam.
- **Rejected alternatives:**
- Keep rewriting `provider/model` back to `llama-swap/model`. Rejected because duplicate local model names would still route incorrectly.
- Add one direct opencode provider per local machine. Rejected because it duplicates the registry and leaks fleet structure into opencode config.
- **Driven by rounds:** R1.
- **Referenced in plan:** Outcome, W7.
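The split-once behavior that D-6 relies on can be illustrated as follows. This is a sketch of the technique, not the actual `parseModel()` in `opencode-server.ts`; the function name `splitOnce` and its error handling are assumptions.

```typescript
// Hypothetical sketch: split at the first "/" only, so the remainder may contain slashes.
function splitOnce(ref: string): { providerID: string; modelID: string } {
  const i = ref.indexOf("/");
  if (i < 0) {
    throw new Error(`expected "<provider>/<model>", got "${ref}"`);
  }
  return { providerID: ref.slice(0, i), modelID: ref.slice(i + 1) };
}

// Outer split: opencode sees one stable namespace.
const outer = splitOnce("boocode-local/sam-desktop/qwen3.6-35b");
// Inner split: the gateway recovers the machine and the bare wire id.
const inner = splitOnce(outer.modelID);
```

Here `outer.providerID` is `boocode-local` and `outer.modelID` is the composite `sam-desktop/qwen3.6-35b`; the second split yields the provider `sam-desktop` and the wire model `qwen3.6-35b`.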
### D-7: Add-a-machine stays config-driven in this initiative
- **Question:** Does this rollout include a control-plane UI for adding local machines?
- **Decision:** No. Adding a machine stays a config-driven operation in this initiative, documented in W8. BooControl is the later UI/control-plane consumer.
- **Rationale:** The user goal is multi-provider support now, not a new admin product before the substrate exists.
- **Evidence:** BooControl's own tasks call this registry work a prerequisite; current repo state has no stable local-provider substrate yet.
- **Rejected alternatives:**
- Build BooControl first. Rejected because it would either duplicate registry logic or bind to today's broken single-provider assumptions.
- **Driven by rounds:** R1.
- **Referenced in plan:** Outcome, W8, Deferred.
### D-8: Work unit sequencing is contract-first, consumer-second, verification-third
- **Question:** How should this be broken down for Orchestration so branches do not constantly collide?
- **Decision:** Sequence every work unit as:
1. contracts and config
2. primary backend seam
3. downstream consumers
4. tests and smoke
and forbid parallel editing of the shared contract and resolver files.
- **Rationale:** The churniest files in this repo are exactly the shared contract and coordinator files. Letting multiple branches edit them in parallel is the fastest path to merge thrash and subtle drift.
- **Evidence:** Recent churn is highest in `apps/web/src/api/types.ts`, `apps/web/src/api/client.ts`, `apps/server/src/index.ts`, `apps/coder/src/services/dispatcher.ts`, and `apps/coder/src/services/provider-snapshot.ts`.
- **Rejected alternatives:**
- Split by app only. Rejected because this feature crosses contracts, server, web, and coder in nearly every phase.
- **Driven by rounds:** R1.
- **Referenced in plan:** Orchestration Rules, Work Unit Index, all work units.


@@ -0,0 +1,38 @@
# Implementation Iteration History: Multi-Provider Local Models
This file records how the implementation plan was assembled from the existing research, OpenSpec docs, and codebase review.
Committed decisions live in [implementation-decision-log.md](implementation-decision-log.md). The primary plan lives in
[../feature-implementation-plan.md](../feature-implementation-plan.md).
## R1: Coordinator pass grounded in source docs and local code review
- **Specialists engaged:** coordinator-only pass using the existing research note, OpenSpec design/tasks, implementation analysis, root and app `CLAUDE.md` files, ADRs, coding standard, and targeted code search. No separate specialist tool round was run in this repo pass.
- **New input provided:** [../build-phase-outline.md](../build-phase-outline.md), [./.discovery-notes.md](./.discovery-notes.md), the OpenSpec batch, and the current code seams in server, web, and coder.
- **Claim ledger:**
| # | Claim | State | Spec-maturity |
|---|---|---|---|
| C1 | There is no single source of truth for local providers shared by server and coder | Evidenced | plan-level |
| C2 | Composite `provider/model` ids are required for duplicate model names across hosts | Evidenced | plan-level |
| C3 | Routing logic is duplicated across streaming, non-streaming, context, compaction, task-model, and Arena | Evidenced | plan-level |
| C4 | Favorites belong in settings plus client derivation, not in the server catalog | Evidenced | plan-level |
| C5 | Native BooCoder can adopt the shared resolver before `opencode` can | Evidenced | plan-level |
| C6 | The current `opencode` bridge collapses local identity and needs a provider-preserving gateway | Evidenced | plan-level |
| C7 | Arena is a separate local-model consumer and must be planned explicitly | Evidenced | plan-level |
| C8 | BooControl depends on this substrate and should not be built first | Evidenced | plan-level |
- **Open Questions raised:**
- OQ-1: shared local-provider authority format and location
Resolution: D-1, `/data/llama-providers.json` plus `LLAMA_PROVIDERS_PATH`
- OQ-2: canonical local model identity format
Resolution: D-2, composite `provider/model`
- OQ-3: how to achieve external-agent parity honestly
Resolution: D-6, `boocode-local` gateway
- OQ-4: whether add-a-machine is UI-driven in this batch
Resolution: D-7, no, keep config-driven
- **Spec-maturity tags:** all findings were plan-level. No spec-stage reopening was required because the earlier research and OpenSpec docs already settled the behavior.
- **Resolution source:** evidence from source docs plus current code inspection.
- **Decisions produced:** D-1, D-2, D-3, D-4, D-5, D-6, D-7, D-8.
- **Changed in plan:** initial authoring of `feature-implementation-plan.md` and its three supporting artifacts.
- **Next-step recommendation:** go to synthesis. The work is ready to execute as W1 through W8 in order, with W7 as the main hard seam and W8 as the operational closeout.


@@ -0,0 +1,390 @@
---
title: "Multi-Provider Local Models — Build Phase Outline"
source_artifact: "Multiple sources: docs/research/2026-06-10-multi-llama-swap-providers-model-favorites.md; openspec/changes/multi-llama-swap-providers-model-favorites/design.md; openspec/changes/multi-llama-swap-providers-model-favorites/artifacts/implementation-analysis.md"
audience: "mixed"
generated: "2026-06-10"
generated_by: "han.core:plan-a-phased-build"
---
# Multi-Provider Local Models — Build Phase Outline
This document describes the order in which multi-provider local model support will be built. The work is broken into a sequence of phases, where each phase is a thin end-to-end deliverable that can be demonstrated to a real person, and each phase builds on the one before it. The goal is to let BooCode work cleanly with more than one local model machine today and make it straightforward to add more local machines later.
This outline is built from three sources taken together: the research note that identified the routing and identity problems, the OpenSpec batch that defines the intended behavior, and the implementation analysis that tightened the architecture around the harder integration seams. The source material describes what exists today, what the target behavior is, and where the hidden risks are. This document describes the order in which the work should be built so the system reaches that target in a controlled way.
## Table of Contents
- [Executive Summary](#executive-summary)
- [Build Phase Index](#build-phase-index)
- [How This Rollout Differs from the First Draft](#departures)
- [Phase Kinds](#phase-kinds)
- [Build Phases](#build-phases)
- [Phase 1: Named Provider Inventory](#phase-1)
- [Phase 2: Multi-Provider BooChat](#phase-2)
- [Phase 3: Shared Favorites and Grouped Selection](#phase-3)
- [Phase 4: Native BooCoder Parity](#phase-4)
- [Phase 5: Multi-Provider Arena](#phase-5)
- [Phase 6: External-Agent Parity](#phase-6)
- [Phase 7: Add-a-Machine Operations](#phase-7)
- [Phase 8 (Deferred): BooControl Fleet Layer](#phase-8)
- [Open Questions](#open-questions)
---
## Executive Summary {#executive-summary}
**The goal:** BooCode should treat local inference as a small fleet instead of a single machine. A user should be able to choose models from multiple local providers, keep favorites across BooChat and BooCoder, run coding and arena workflows against the intended provider, and add another local machine later without reopening the core design.
**The shape of the build:**
- The rollout starts by making provider identity real and visible before any routing changes are hidden behind it.
- BooChat gets multi-provider conversations before the broader coding surfaces, so the first live slice proves the model identity and routing rules end to end.
- Shared favorites and grouped pickers land before the coding parity work so the selection experience stabilizes once and is then reused.
- Native BooCoder and Arena adopt the same provider rules before the harder external-agent bridge is attempted.
- The final live phase turns “two machines supported” into “more machines are routine,” so the work ends in an operationally repeatable state instead of a one-off fix.
**Sequencing rationale, in plain language:**
The order starts with the smallest user-visible slice that proves the new mental model: named providers and distinct model identities. Once that exists, BooChat can safely route real conversations across providers and expose any mistakes early. Only after model identity, routing, and favorites are stable does it make sense to move deeper coding surfaces over, because those surfaces are less forgiving and have more hidden assumptions. The external-agent bridge comes late because it is the one place where a simple rename would look correct but still route the wrong machine.
**Departures from the source artifact:**
- Favorites are treated as a user-level view derived from shared settings, not as a built-in section of the server's model inventory.
- Native BooCoder parity comes before external-agent parity, because the external-agent path needs its own provider-preserving bridge.
**Phases deliberately deferred:**
BooControl is listed as a deferred final phase because it depends on this registry and identity work but does not need to exist for the multi-provider rollout itself to be complete. Search, richer filtering, and other picker refinements are also intentionally left out of the live phase sequence unless real usage proves they are needed.
**Where to look next:** The [Build Phase Index](#build-phase-index) lists every phase in order. The [departures section](#departures) names the two decisions that shape the rest of the plan. Detailed write-ups follow under [Build Phases](#build-phases). Decisions the team must resolve before phase 1 can start are at [Open Questions](#open-questions).
---
## Build Phase Index {#build-phase-index}
| # | Phase | Kind | Outcome (one sentence) |
|---|---|---|---|
| 1 | [Named Provider Inventory](#phase-1) | Foundation | BooCode can see distinct local providers and distinct model identities. |
| 2 | [Multi-Provider BooChat](#phase-2) | Feature slice | A chat can run on the intended local provider without misrouting. |
| 3 | [Shared Favorites and Grouped Selection](#phase-3) | Feature slice | Favorites persist once and appear consistently across both chat surfaces. |
| 4 | [Native BooCoder Parity](#phase-4) | Feature slice | Native coding tasks can use the same multi-provider local model pool. |
| 5 | [Multi-Provider Arena](#phase-5) | Feature slice | Arena can compare local models from more than one machine correctly. |
| 6 | [External-Agent Parity](#phase-6) | Feature slice | External coding providers can target local machines without losing provider identity. |
| 7 | [Add-a-Machine Operations](#phase-7) | Polish | Adding another local machine becomes a routine configuration change. |
| 8 | [BooControl Fleet Layer (deferred)](#phase-8) | Deferred | A fleet cockpit can build on the finished provider registry later. |
> Numbers are assigned in build order and are stable for the life of this outline. Cite them as `Phase N` in tickets, comments, and follow-up reports.
---
## How This Rollout Differs from the First Draft {#departures}
The rollout deliberately departs from the first pass of the design in the two ways named below. Each departure is summarized once here so the phase write-ups can refer to it by name.
### 1. Favorites are a shared user preference, not part of the provider inventory
The first draft treated favorites as if they belonged inside the model catalog itself. The rollout instead treats them as a shared user preference layered on top of provider inventory. This matters because provider inventory answers “what exists right now,” while favorites answer “what this user prefers across devices and surfaces.”
### 2. External-agent support is a late seam, not part of the first local-model cut
The first draft grouped native and external-agent parity together too early. The rollout separates them because native surfaces can use the new provider resolver directly, while the external-agent path still assumes one local provider behind the scenes. That path needs a real bridge, not a string rewrite.
---
## Phase Kinds {#phase-kinds}
- **Foundation** — A capability that does not yet deliver the full user outcome, but is required for later phases. It must still be demonstrable on its own.
- **Feature slice** — A thin end-to-end strip of new behavior that a real user can experience.
- **Polish** — Refinement, resilience, or operational quality-of-life work that enriches a working core.
- **Deferred** — Listed for traceability; not built in the current plan.
---
## Build Phases {#build-phases}
### Phase 1: Named Provider Inventory {#phase-1}
**Kind.** Foundation.
**Builds on.** Nothing — this is the starting phase.
**What we build.** BooCode learns that “local models” are not one undifferentiated pool. The system gains a shared named-provider list, a stable way to name a selected model as “provider plus model,” a default-provider fallback for old data, and a provider-aware inventory view that can show which models belong to which machine.
**Why this is Phase 1.** No later phase is safe until provider identity exists as a first-class concept. This phase is still demonstrable on its own because a person can see two named local providers with their own model groups and confirm that existing sessions still resolve instead of breaking.
**Outcome to demonstrate.**
1. Start BooCode with two named local providers configured.
2. Open the model selection view and see separate groups for each provider.
3. Open an older session that still stores a legacy bare model value.
4. Confirm the older session still resolves to a usable default instead of failing.
**Source citations.**
- [Research — Recommendation](../../research/2026-06-10-multi-llama-swap-providers-model-favorites.md#recommendation)
- [Research — What exists today](../../research/2026-06-10-multi-llama-swap-providers-model-favorites.md#what-exists-today-codebase--current-state-anchor)
- [Implementation analysis — Shared local-provider registry](../../../openspec/changes/multi-llama-swap-providers-model-favorites/artifacts/implementation-analysis.md#1-shared-local-provider-registry)
**Connects to.**
- Creates the identity rules used by [Phase 2](#phase-2), [Phase 4](#phase-4), and [Phase 5](#phase-5).
- Establishes the provider list that [Phase 7](#phase-7) will operationalize for future machines.
**Preconditions to verify before starting.**
- Confirm the shared provider list lives in one new shared location rather than being split between separate app-specific settings.
- Confirm which provider is the long-term default when legacy bare model values are encountered.
---
### Phase 2: Multi-Provider BooChat {#phase-2}
**Kind.** Feature slice.
**Builds on.** Phase 1, where provider identity and fallback rules are established.
**What we build.** BooChat becomes the first live end-to-end consumer of multiple local providers. A person can choose a model from any configured provider, send a message, and trust that the response came from the intended machine. The same phase also fixes the two current routing hazards: models that happen to share a cloud-provider prefix in their name, and models that should never be sent through the sidecar path.
**Why this is Phase 2.** BooChat is the fastest way to prove the provider resolver against real behavior. It surfaces routing mistakes immediately, but it is still simpler and easier to inspect than the coding surfaces that layer more state and backend behavior on top.
**Outcome to demonstrate.**
1. Open a chat and choose a model from the first local provider.
2. Send a prompt and get a response.
3. Switch to a model from the second local provider and send the same prompt.
4. Confirm both responses arrive successfully and the second provider does not get routed through the wrong path.
5. Run a model whose name resembles a cloud model name and confirm it still uses the intended local provider.
**Source citations.**
- [Research — Recommendation constraints](../../research/2026-06-10-multi-llama-swap-providers-model-favorites.md#recommendation)
- [Research — Does embedding need a llama-sidecar? No.](../../research/2026-06-10-multi-llama-swap-providers-model-favorites.md#does-embedding-need-a-llama-sidecar-no)
- [OpenSpec design — Server changes](../../../openspec/changes/multi-llama-swap-providers-model-favorites/design.md#5-server-changes)
**Connects to.**
- Supplies the stable routing behavior reused in [Phase 3](#phase-3), [Phase 4](#phase-4), and [Phase 5](#phase-5).
- Proves the provider resolver before the coding flows depend on it.
**Preconditions to verify before starting.**
- Confirm the desired provider order for the user-facing list.
- Confirm the cloud-backed model group stays visibly separate from local machine groups.
---
### Phase 3: Shared Favorites and Grouped Selection {#phase-3}
**Kind.** Feature slice.
**Builds on.** Phase 1 for provider identity and Phase 2 for live multi-provider chat behavior.
**What we build.** Model selection becomes a stable, shared experience instead of a one-off list. A person can favorite models, see favorites first, still browse by provider below, and have the same favorite set follow them across chat surfaces. If a provider is temporarily unavailable, its favorites disappear from the visible list without being lost.
**Why this is Phase 3.** Once the routing rules are real, the next highest-value step is to make selection usable. Doing this before the deeper coding surfaces avoids building two different model-selection experiences and then reconciling them later.
**Outcome to demonstrate.**
1. Favorite one model from each local provider.
2. Refresh and confirm both favorites appear at the top while still remaining in their provider groups.
3. Open the other chat surface and confirm the same favorites appear there too.
4. Temporarily remove one provider from the live inventory.
5. Confirm its favorite disappears from view without being deleted, then returns when the provider comes back.
**Source citations.**
- [Research — Dropdown + favorites prior art](../../research/2026-06-10-multi-llama-swap-providers-model-favorites.md#dropdown--favorites-prior-art-web)
- [Research — Favorites persistence](../../research/2026-06-10-multi-llama-swap-providers-model-favorites.md#sub-decision--favorites-persistence)
- [Implementation analysis — Provider-aware catalog, client-derived favorites](../../../openspec/changes/multi-llama-swap-providers-model-favorites/artifacts/implementation-analysis.md#3-provider-aware-catalog-client-derived-favorites)
**Connects to.**
- Provides the selection behavior reused by [Phase 4](#phase-4).
- Stabilizes the shared user preference model before the broader fleet tooling in [Phase 7](#phase-7).
**Preconditions to verify before starting.**
- Confirm favorites are shared for the single user across devices rather than stored per browser.
- Confirm insertion order is enough for the first favorite list and manual reordering can wait.
---
### Phase 4: Native BooCoder Parity {#phase-4}
**Kind.** Feature slice.
**Builds on.** Phase 1 for provider identity, Phase 2 for routing behavior, and Phase 3 for the grouped selection experience.
**What we build.** The native coding path in BooCoder gains the same local model pool as BooChat. A person can choose a local model from any configured provider for native coding work and trust that the coding session is using the selected provider instead of collapsing everything back to one machine.
**Why this is Phase 4.** The native coding path can use the shared provider resolver directly, so it is the safest BooCoder slice to move next. Shipping it before the external-agent bridge delivers real user value while avoiding the hardest integration seam for one more phase.
**Outcome to demonstrate.**
1. Open the native coding experience.
2. Choose a local model from the first provider and run a coding task.
3. Start a second coding task using a model from the second provider.
4. Confirm both tasks run successfully using the intended provider-specific model choice.
**Source citations.**
- [Research — Recommendation constraints](../../research/2026-06-10-multi-llama-swap-providers-model-favorites.md#recommendation)
- [Implementation analysis — Treat native and external-agent paths differently](../../../openspec/changes/multi-llama-swap-providers-model-favorites/artifacts/implementation-analysis.md#4-treat-boocoder-native-and-boocoder-external-agent-paths-differently)
- [OpenSpec design — BooCoder integration](../../../openspec/changes/multi-llama-swap-providers-model-favorites/design.md#7-boocoder-integration)
**Connects to.**
- Establishes the stable native coding baseline before [Phase 6](#phase-6) tackles external-agent parity.
- Shares its provider list and identity rules with [Phase 5](#phase-5).
**Preconditions to verify before starting.**
- Confirm the native coding path is the required BooCoder target for the first live parity slice.
- Confirm the same grouped-selection experience should be preserved in the coding surface without new selection concepts.
---
### Phase 5: Multi-Provider Arena {#phase-5}
**Kind.** Feature slice.
**Builds on.** Phase 1 for provider identity and Phase 2 for provider-aware local routing.
**What we build.** Arena stops treating “local” as one machine and instead treats it as a set of named providers. A person can run local comparisons across models from different machines and get correct routing and fair local classification instead of silent misclassification.
**Why this is Phase 5.** Arena benefits from the same resolver as chat and coding, but it is a separate consumer with its own local-versus-cloud logic. It belongs after the shared routing behavior is proven, but before the harder external-agent bridge so the local evaluation surface is complete early.
**Outcome to demonstrate.**
1. Start an arena comparison using one local model from the first machine and one from the second.
2. Run the comparison to completion.
3. Confirm both contenders are treated as local candidates rather than being collapsed into one generic local lane.
4. Confirm the results still make sense when one contender uses a provider-specific route such as the sidecar-backed machine.
**Source citations.**
- [Research — Recommendation constraints](../../research/2026-06-10-multi-llama-swap-providers-model-favorites.md#recommendation)
- [Implementation analysis — Arena is a separate local-model consumer](../../../openspec/changes/multi-llama-swap-providers-model-favorites/artifacts/implementation-analysis.md#f-006--arena-is-a-separate-local-model-consumer-not-just-another-caller)
**Connects to.**
- Reuses the same provider resolver established earlier.
- Supplies the local evaluation surface that [Phase 7](#phase-7) will harden for future machines.
**Preconditions to verify before starting.**
- Confirm that the intended outcome is correct provider-aware behavior, not yet a richer benchmarking or reporting layer.
- Confirm that local fairness rules should still treat all named local providers as part of the local class rather than introducing provider-specific scheduling policy in this phase.
---
### Phase 6: External-Agent Parity {#phase-6}
**Kind.** Feature slice.
**Builds on.** Phases 1 through 5, because this phase depends on the final provider model being stable before it is bridged outward.
**What we build.** External coding providers gain access to the same multi-provider local fleet without losing provider identity. The user-visible outcome is simple: a local model chosen for an external coding workflow still hits the intended machine even when another machine serves a model with the same name.
**Why this is Phase 6.** This is the most failure-prone seam in the entire rollout. Shipping it earlier would make the system look complete while still hiding ambiguous routing behind the scenes. By the time this phase starts, the provider model, picker behavior, and native local routing rules are already stable.
**Outcome to demonstrate.**
1. Open an external coding workflow that can use a local model.
2. Choose a model name that also exists on another local machine.
3. Run the task and confirm the request still reaches the intended provider instead of whichever machine happens to share the name.
4. Repeat with a different local provider and confirm the same behavior.
**Source citations.**
- [Research — Validation V1 and V9](../../research/2026-06-10-multi-llama-swap-providers-model-favorites.md#validation)
- [Implementation analysis — No safe path for opencode local-model parity](../../../openspec/changes/multi-llama-swap-providers-model-favorites/artifacts/implementation-analysis.md#g-005--no-safe-path-for-opencode-local-model-parity)
- [Implementation analysis — Preferred parity path for opencode](../../../openspec/changes/multi-llama-swap-providers-model-favorites/artifacts/implementation-analysis.md#5-preferred-parity-path-for-opencode-a-boocoder-hosted-local-model-gateway)
**Connects to.**
- Completes the coding-side multi-provider story started in [Phase 4](#phase-4).
- Creates the provider bridge that keeps future machines safe in [Phase 7](#phase-7).
**Preconditions to verify before starting.**
- Confirm whether this phase will include a provider-preserving gateway or be split into a follow-up initiative.
- Confirm external-agent parity is required for the same milestone as native parity rather than being a later enhancement.
---
### Phase 7: Add-a-Machine Operations {#phase-7}
**Kind.** Polish.
**Builds on.** Phases 1 through 6, where the provider model and all major consumers are already in place.
**What we build.** The rollout stops being “support two machines” and becomes “support a growing local fleet.” A person can add another local machine by following a repeatable operational path, see it appear in inventory, and trust that chat, coding, and arena all treat it as just another named provider instead of a custom exception.
**Why this is Phase 7.** The architecture can claim success only when adding another machine is routine rather than bespoke. This phase comes late because it is about making the completed system repeatable and low-friction, not about proving the original two-machine behavior.
**Outcome to demonstrate.**
1. Add a third local provider using the documented provider path.
2. Restart or refresh the system.
3. See the new machine appear in the provider inventory with its own model group.
4. Use one model from the new machine in chat, one in coding, and one in arena.
5. Confirm all three surfaces recognize the new machine without custom code changes.
**Source citations.**
- [Research — Recommendation](../../research/2026-06-10-multi-llama-swap-providers-model-favorites.md#recommendation)
- [Implementation analysis — Recommended sequence](../../../openspec/changes/multi-llama-swap-providers-model-favorites/artifacts/implementation-analysis.md#recommended-sequence)
- [Implementation analysis — Shared local-provider registry](../../../openspec/changes/multi-llama-swap-providers-model-favorites/artifacts/implementation-analysis.md#1-shared-local-provider-registry)
**Connects to.**
- Turns the whole earlier rollout into an operationally repeatable capability.
- Provides the stable registry that the deferred fleet layer in [Phase 8](#phase-8) can consume later.
**Preconditions to verify before starting.**
- Confirm configuration-based provider management is acceptable for the first operational pass and a full management interface is not required yet.
- Confirm the success bar is “no code changes required to add the machine,” not “all provider administration happens inside the product.”
---
### Phase 8 (Deferred): BooControl Fleet Layer {#phase-8}
**Kind.** Deferred.
**Builds on.** Phases 1 through 7, because it consumes the finished provider registry and the settled provider names.
**What we build.** A dedicated fleet-control and observability layer that can show the state of multiple local model providers, collect live information across them, and eventually make routing and benchmarking easier to understand.
**Why this is deferred.** BooControl depends on the provider registry, but the registry does not depend on BooControl. Building the control layer earlier would either duplicate the provider model or force BooControl to sit on top of assumptions that this rollout is specifically trying to remove.
**Reopen when.** Reopen this phase once multi-provider chat, coding, arena, and add-a-machine operations are already stable and there is enough day-to-day fleet activity to justify a dedicated control surface.
**Outcome to demonstrate (when or if built).**
1. Open the fleet view.
2. See every named local provider in one place.
3. Inspect live state or history without having to visit each machine separately.
**Source citations.**
- [BooControl tasks — prerequisite note](../../../openspec/changes/boocontrol/tasks.md#p0--prerequisite-separate-batch-multi-llama-swap-provider-registry)
- [BooControl proposal — prerequisite note](../../../openspec/changes/boocontrol/proposal.md#why)
---
## Open Questions {#open-questions}
### OQ-1. Where should the shared provider list live, and who owns it? {#oq-1}
**Blocks phase(s).** Phase 1.
The first phase cannot start until there is one agreed source of truth for named local providers. If that decision stays split, every later phase inherits the split.
- **Option A — a new shared provider list used by both apps.** One place defines provider names, addresses, and any provider-specific routing attributes. This keeps the local fleet model unified.
- **Option B — keep the existing separate settings and derive one view from the other.** This lowers the immediate change but keeps the long-term drift risk alive.
- **Recommendation: Option A.** The whole point of the rollout is to make provider identity shared and durable. Keeping two authorities would repeat the same problem in a new shape.
### OQ-2. Does this initiative include external-agent parity, or does it stop after native parity? {#oq-2}
**Blocks phase(s).** Phase 6.
The rollout can reach a useful and honest midpoint after native parity, but it cannot claim full multi-provider coding parity until the external-agent path is solved too.
- **Option A — include external-agent parity in this initiative.** This produces a complete end state, but it requires a dedicated provider-preserving bridge.
- **Option B — stop after native parity and split the external-agent work into a follow-up.** This shortens the first initiative, but the end state remains intentionally incomplete.
- **Recommendation: Option A if the bridge is accepted; otherwise Option B.** If the team is willing to build the bridge properly, finishing the job now avoids a misleading halfway state. If not, native parity should ship honestly as a bounded milestone and the rest should be split explicitly.
### OQ-3. Is a product-based provider management screen required now, or is configuration-based rollout enough? {#oq-3}
**Blocks phase(s).** Phase 7.
The final live phase is about making more machines routine to add. The open question is whether “routine” means “edit the provider list and restart” or whether it already means “manage providers inside the product.”
- **Option A — configuration-based rollout first.** A trusted operator adds machines through the shared provider list and validates them using the product.
- **Option B — product-based management in the same initiative.** Provider administration becomes part of the product immediately.
- **Recommendation: Option A.** The current initiative is about correct provider identity and repeatable multi-provider behavior. A full management screen adds another feature layer before the provider model has had time to prove itself.
### Carry-over notes
- Search, tag filtering, and richer picker controls are intentionally not blockers for the main rollout.
- Full fleet control, reporting, and advanced routing policy stay deferred until the provider model is already stable in daily use.


@@ -0,0 +1,345 @@
# Feature Implementation Plan: Multi-Provider Local Models
This plan turns the multi-provider local-model design into a strict implementation sequence that can be executed with Orchestration. It assumes the target is not just to “fix the picker,” but to make local inference work as a small fleet with stable provider identity, shared favorites, correct routing, and an honest parity story for BooCoder.
## Source Specification
- Primary rollout outline: [build-phase-outline.md](build-phase-outline.md)
- Behavioral design: [../../../openspec/changes/multi-llama-swap-providers-model-favorites/design.md](../../../openspec/changes/multi-llama-swap-providers-model-favorites/design.md)
- Task inventory: [../../../openspec/changes/multi-llama-swap-providers-model-favorites/tasks.md](../../../openspec/changes/multi-llama-swap-providers-model-favorites/tasks.md)
- Architecture analysis: [../../../openspec/changes/multi-llama-swap-providers-model-favorites/artifacts/implementation-analysis.md](../../../openspec/changes/multi-llama-swap-providers-model-favorites/artifacts/implementation-analysis.md)
- Research note: [../../research/2026-06-10-multi-llama-swap-providers-model-favorites.md](../../research/2026-06-10-multi-llama-swap-providers-model-favorites.md)
- Discovery notes: [artifacts/.discovery-notes.md](artifacts/.discovery-notes.md)
## Outcome
When this plan is complete:
- BooChat can route local models by named provider, not by one global `LLAMA_SWAP_URL`.
- Favorites are shared across BooChat and native BooCoder, derived from settings instead of being baked into the server catalog ([D-4](artifacts/implementation-decision-log.md#d-4-favorites-are-a-settings-backed-user-view-not-a-server-catalog-section)).
- Duplicate model names on different local machines are safe because persisted and cached identity is `provider/model` ([D-2](artifacts/implementation-decision-log.md#d-2-persist-and-cache-composite-providermodel-ids-keep-wire-ids-bare)).
- Native BooCoder and Arena use the same provider-aware resolver as BooChat ([D-3](artifacts/implementation-decision-log.md#d-3-one-provider-aware-resolver-shared-across-streaming-non-streaming-context-and-arena)).
- External-agent parity is real rather than implied: `opencode` only gets multi-provider local models after a provider-preserving bridge exists ([D-5](artifacts/implementation-decision-log.md#d-5-native-boocode-parity-ships-before-opencode-parity), [D-6](artifacts/implementation-decision-log.md#d-6-opencode-parity-uses-a-boocode-local-gateway-not-a-string-rewrite)).
- Adding another local machine is a config change plus a smoke pass, not another architecture pass ([D-7](artifacts/implementation-decision-log.md#d-7-add-a-machine-stays-config-driven-in-this-initiative)).
## Working Assumptions
- The shared local-provider source of truth is `/data/llama-providers.json`, exposed to both apps through `LLAMA_PROVIDERS_PATH`, with legacy env fallback while the file is absent ([D-1](artifacts/implementation-decision-log.md#d-1-shared-local-provider-config-authority)).
- `packages/contracts` owns schemas and pure helpers; app-local loader modules own file I/O and env fallback, following the existing `provider-config` / `provider-config-registry` split in BooCoder ([D-1](artifacts/implementation-decision-log.md#d-1-shared-local-provider-config-authority)).
- The work ends at a completed multi-provider substrate. BooControl is a follow-on consumer, not part of this implementation batch.
## Orchestration Rules
- Treat each work unit below as one mergeable branch. Do not overlap branches that touch the same shared contract files.
- Never run more than one agent at a time on `packages/contracts/src/*`, `apps/server/src/services/inference/provider.ts`, `apps/web/src/api/types.ts`, or `apps/coder/src/services/provider-snapshot.ts`.
- Inside a work unit, parallelize only disjoint file groups. Contract changes first, consumers second, tests last.
- Close each work unit with its own verification before starting the next one. Do not stack W1-W4 and debug later.
## Work Unit Index
| # | Work Unit | Surface | Delivers | Depends On | Verification |
|---|---|---|---|---|---|
| 1 | Provider Registry Foundation | contracts + server + coder | Shared config schema, model-ref helpers, app-local registry loaders | — | Contracts build, server build, coder build |
| 2 | Server Catalog and Routing | server | Provider-aware `/api/models` and unified resolver | W1 | server tests for routing + collision cases |
| 3 | Server Downstream Consumers | server | Context, compaction, and task-model stop assuming one endpoint | W2 | server tests for cache isolation + bare-id fallback |
| 4 | BooChat Favorites and Grouped Picker | server + web | Shared favorites and provider-grouped chat model selection | W2 | server tests + web smoke |
| 5 | Native BooCoder Parity | coder + web | Native `boocode` local models use composite IDs and grouped selection | W1, W4 | coder tests + BooCoder smoke |
| 6 | Arena Parity | coder | Arena local calls and local-model classification become provider-aware | W5 | coder tests + arena smoke |
| 7 | External-Agent Parity | coder | `opencode` gets multi-provider local models through a real bridge | W5 | coder tests + opencode smoke |
| 8 | Operations and Final Verification | docs + configs + smoke | Add-a-machine runbook, final matrix, ready handoff to BooControl | W7 | end-to-end smoke matrix |
## Work Units
### W1. Provider Registry Foundation
**Goal.** Make provider identity real before any routing or UI changes.
**Files and seams.**
- `packages/contracts/src/` for the new local-provider schema and pure model-ref helpers
- `packages/contracts/package.json` exports
- `apps/server/src/config.ts`
- `apps/coder/src/config.ts`
- new app-local registry loaders under `apps/server/src/services/` and `apps/coder/src/services/`
- `data/llama-providers.example.json`
**Implement.**
1. Add a new contracts subpath for local provider config, separate from the existing coder ACP provider config.
2. Define the shared file shape: `defaultProvider` plus `providers[]` with `id`, `label`, `baseUrl`, optional `sidecarUrl`, and `kind`.
3. Add pure helpers for `parseModelRef`, `formatModelRef`, and legacy bare-id resolution.
4. Add `LLAMA_PROVIDERS_PATH` to both server and coder config.
5. Implement server and coder registry loaders that read the shared file and synthesize one legacy provider from `LLAMA_SWAP_URL` and optional `LLAMA_SIDECAR_URL` when the file is absent.
6. Add a checked-in example config with `sam-desktop` and `embedding`.
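A minimal sketch of the pure helpers named in step 3 may make the identity rule concrete. The `ModelRef` shape, the function signatures, and the choice to treat an unknown prefix as part of a legacy bare id are assumptions for illustration, not the contract that ships in `packages/contracts`.

```typescript
// Hypothetical shape: the plan names parseModelRef and formatModelRef but
// does not fix their signatures. Everything below is an illustrative guess.
interface ModelRef {
  provider: string; // registry id, e.g. "sam-desktop"
  model: string; // bare wire id sent to the upstream
}

/**
 * Split a stored id into provider and model.
 * Only the first "/" is a separator. If the prefix is not a configured
 * provider id, the whole string is treated as a legacy bare id and resolved
 * to the default provider (assumed behaviour).
 */
function parseModelRef(
  id: string,
  knownProviders: ReadonlySet<string>,
  defaultProvider: string,
): ModelRef {
  const slash = id.indexOf("/");
  if (slash > 0) {
    const prefix = id.slice(0, slash);
    const rest = id.slice(slash + 1);
    if (rest.length > 0 && knownProviders.has(prefix)) {
      return { provider: prefix, model: rest };
    }
  }
  return { provider: defaultProvider, model: id };
}

/** Build the composite id that is persisted and cached. */
function formatModelRef(ref: ModelRef): string {
  return `${ref.provider}/${ref.model}`;
}
```

Checking the prefix against the configured provider ids, rather than splitting on every slash, is one way to keep bare model names that happen to contain a slash on the default provider.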
**Parallel-safe split.**
- Agent A: contracts schema + helpers + exports
- Agent B: server config + server loader after A merges
- Agent C: coder config + coder loader after A merges
**Exit criteria.**
- Both apps can start with only legacy env vars.
- Both apps can also start with a real `llama-providers.json`.
- Pure helper tests cover `provider/model` and bare fallback.
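The first two exit criteria can be pictured as one pure function that accepts the file contents (or `undefined` when the file is absent) plus the environment. This is a sketch under assumptions: the interface names, the `"default"` legacy provider id, the `"llama-swap"` kind value, and the decision to throw when neither source is available are all hypothetical, and a real loader would validate against the contracts schema rather than cast.

```typescript
// Field names follow the file shape in step 2; the type names are assumed.
interface LocalProviderConfig {
  id: string;
  label: string;
  baseUrl: string;
  sidecarUrl?: string;
  kind: string;
}

interface ProviderRegistryConfig {
  defaultProvider: string;
  providers: LocalProviderConfig[];
}

type EnvLike = Record<string, string | undefined>;

function buildRegistry(fileText: string | undefined, env: EnvLike): ProviderRegistryConfig {
  if (fileText === undefined) {
    // Legacy path: synthesize a single provider from the old env vars.
    const baseUrl = env["LLAMA_SWAP_URL"];
    if (!baseUrl) {
      // Assumed behaviour; the plan does not say what happens in this case.
      throw new Error("no providers file and LLAMA_SWAP_URL is unset");
    }
    const legacy: LocalProviderConfig = {
      id: "default", // hypothetical id for the synthesized provider
      label: "Local",
      baseUrl,
      kind: "llama-swap", // hypothetical kind value
    };
    const sidecarUrl = env["LLAMA_SIDECAR_URL"];
    if (sidecarUrl) legacy.sidecarUrl = sidecarUrl;
    return { defaultProvider: legacy.id, providers: [legacy] };
  }

  // File path: a cast stands in for real schema validation here.
  const parsed = JSON.parse(fileText) as ProviderRegistryConfig;
  if (!Array.isArray(parsed.providers)) throw new Error("providers must be an array");
  const ids = new Set<string>();
  for (const provider of parsed.providers) {
    if (provider.id.indexOf("/") !== -1) throw new Error(`provider id "${provider.id}" must not contain "/"`);
    if (ids.has(provider.id)) throw new Error(`duplicate provider id "${provider.id}"`);
    ids.add(provider.id);
  }
  if (!ids.has(parsed.defaultProvider)) {
    throw new Error(`defaultProvider "${parsed.defaultProvider}" is not in providers`);
  }
  return parsed;
}
```

Keeping file I/O outside the function means both app-local loaders could share the same checks and be tested without touching the disk.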
### W2. Server Catalog and Routing
**Goal.** Replace server-side routing heuristics with one provider-aware resolver.
**Files and seams.**
- `apps/server/src/routes/models.ts`
- `apps/server/src/services/inference/provider.ts`
- `apps/server/src/types/api.ts`
- `apps/web/src/api/types.ts`
- `apps/web/src/api/client.ts`
- relevant provider tests
**Implement.**
1. Refactor `/api/models` to return provider-grouped inventory only, with every `ModelInfo.id` already composite ([D-4](artifacts/implementation-decision-log.md#d-4-favorites-are-a-settings-backed-user-view-not-a-server-catalog-section)).
2. Build one server resolver that answers:
- provider identity
- upstream base URL
- sidecar eligibility
- final wire model id
- DeepSeek special handling
3. Make both `upstreamModel()` and `resolveModelEndpoint()` call that same resolver.
4. Remove the current “prefix means provider” logic as the authority; keep compatibility only at the bare-id fallback layer.
**Parallel-safe split.**
- First branch: resolver and tests
- Second branch: `/api/models` contract change plus client type updates
**Exit criteria.**
- `embedding/deepseek-r1-qwen3-8b` routes as local `embedding`, not as DeepSeek cloud.
- `embedding/*` never uses a sidecar.
- Legacy bare models still resolve through the configured default provider.
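The exit criteria above can be read as properties of a single lookup that consults the registry before any name-based heuristic. The sketch below assumes a hypothetical `resolveLocalModel` function and `Resolution` shape, models sidecar eligibility simply as "the provider has a `sidecarUrl`", and leaves out cloud routing and DeepSeek special handling entirely.

```typescript
// Illustrative types; the real resolver in provider.ts may look different.
interface RoutedProvider {
  id: string;
  baseUrl: string;
  sidecarUrl?: string;
}

interface Resolution {
  providerId: string;
  baseUrl: string;
  sidecarEligible: boolean;
  wireModel: string; // bare id sent on the wire
}

function resolveLocalModel(
  id: string,
  providers: RoutedProvider[],
  defaultProvider: string,
): Resolution {
  const byId = new Map<string, RoutedProvider>();
  for (const provider of providers) byId.set(provider.id, provider);

  // A prefix counts only when it names a configured provider and a model follows it.
  const slash = id.indexOf("/");
  const explicit =
    slash > 0 && slash < id.length - 1 ? byId.get(id.slice(0, slash)) : undefined;

  // Anything else is a legacy bare id and goes to the default provider.
  const provider = explicit ?? byId.get(defaultProvider);
  if (!provider) throw new Error(`default provider "${defaultProvider}" is not configured`);

  return {
    providerId: provider.id,
    baseUrl: provider.baseUrl,
    sidecarEligible: provider.sidecarUrl !== undefined,
    wireModel: explicit ? id.slice(slash + 1) : id,
  };
}
```

Because the provider prefix is matched against the registry first, a model such as `embedding/deepseek-r1-qwen3-8b` resolves to the `embedding` provider regardless of what its wire name resembles.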
### W3. Server Downstream Consumers
**Goal.** Remove the remaining single-endpoint assumptions in server call sites.
**Files and seams.**
- `apps/server/src/services/model-context.ts`
- `apps/server/src/index.ts`
- `apps/server/src/services/compaction.ts`
- `apps/server/src/services/task-model.ts`
- `apps/server/src/services/inference/error-handler.ts`
- `apps/server/src/services/__tests__/model-context.test.ts`
**Implement.**
1. Change `model-context` to key caches by composite model id, not bare wire id.
2. Move context lookup from one process-wide `LLAMA_SWAP_URL` assumption to the provider-aware resolver.
3. Update compaction to resolve the right upstream before summary calls.
4. Update task-model fallback resolution to use the same parsed model ref path as inference.
5. Audit remaining server `LLAMA_SWAP_URL` call sites and either migrate them or explicitly mark them legacy-only.
**Parallel-safe split.**
- Agent A: `model-context.ts` + tests
- Agent B: `compaction.ts` and `task-model.ts` after A lands, because both depend on the new resolver contract
**Exit criteria.**
- Two providers serving the same wire model name do not share context cache entries.
- Existing sessions with bare models still load context and complete turns.
- No server path doing local inference bypasses the shared resolver.
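The cache-isolation criterion comes down to the cache key. A minimal sketch, assuming a hypothetical `ModelContextCache` class and an injected `fetchContext` callback (neither name comes from `model-context.ts`), keys entries by provider plus wire model so two machines serving the same model name never share an entry.

```typescript
// Illustrative names only; the real module's API is not shown in this plan.
interface ContextInfo {
  contextLength: number;
}

type FetchContext = (providerId: string, wireModel: string) => Promise<ContextInfo>;

class ModelContextCache {
  private readonly entries = new Map<string, Promise<ContextInfo>>();
  private readonly fetchContext: FetchContext;

  constructor(fetchContext: FetchContext) {
    this.fetchContext = fetchContext;
  }

  get(providerId: string, wireModel: string): Promise<ContextInfo> {
    // Composite key: the bare wire id alone would collide across providers.
    const key = `${providerId}/${wireModel}`;
    let entry = this.entries.get(key);
    if (!entry) {
      entry = this.fetchContext(providerId, wireModel).catch((err: unknown) => {
        // Drop failed lookups so a later call can retry.
        this.entries.delete(key);
        throw err;
      });
      this.entries.set(key, entry);
    }
    return entry;
  }
}
```

Storing the promise rather than the resolved value also lets concurrent callers for the same composite id share one upstream lookup.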
### W4. BooChat Favorites and Grouped Picker
**Goal.** Stabilize the end-user selection model on BooChat before deeper coding surfaces adopt it.
**Files and seams.**
- `apps/server/src/routes/settings.ts`
- `apps/server/src/services/settings.ts` or equivalent settings helper path
- `apps/web/src/components/ModelPicker.tsx`
- `apps/web/src/lib/model-label.ts`
- `apps/web/src/api/client.ts`
- `apps/web/src/api/types.ts`
- `apps/web/src/pages/Session.tsx`
**Implement.**
1. Add `favorite_models: string[]` handling in settings.
2. Normalize malformed and duplicate entries on write.
3. In the client, derive:
- Favorites section first
- then one section per provider
- hide unavailable favorites without deleting them
4. Keep a favorited model visible in both Favorites and its provider section.
5. Make new model selections write composite ids.
**Parallel-safe split.**
- Server settings branch first
- Web picker branch second against the new contract
**Exit criteria.**
- Favorites persist across refresh.
- Removing a provider from live inventory hides its favorites without deleting the stored ids.
- A new chat selection stores `provider/model`.
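Steps 2 through 4 can be sketched as two pure functions: one that normalizes the stored list on write, and one that derives the picker sections on the client. The function names, the `PickerSection` shape, and the `"Favorites"` title are assumptions; only the `favorite_models: string[]` setting and the hide-without-deleting rule come from the plan.

```typescript
// Hypothetical client-side shapes for illustration.
interface PickerModel {
  id: string; // composite provider/model id
  label: string;
}

interface ProviderGroup {
  providerId: string;
  label: string;
  models: PickerModel[];
}

interface PickerSection {
  title: string;
  models: PickerModel[];
}

/** Drop non-strings, blanks, and duplicates while keeping insertion order. */
function normalizeFavorites(input: unknown): string[] {
  if (!Array.isArray(input)) return [];
  const seen = new Set<string>();
  const out: string[] = [];
  for (const entry of input) {
    if (typeof entry !== "string") continue;
    const id = entry.trim();
    if (id.length === 0 || seen.has(id)) continue;
    seen.add(id);
    out.push(id);
  }
  return out;
}

/** Favorites first, then one section per provider. Stored ids are never mutated. */
function buildPickerSections(favorites: string[], inventory: ProviderGroup[]): PickerSection[] {
  const live = new Map<string, PickerModel>();
  for (const group of inventory) {
    for (const model of group.models) live.set(model.id, model);
  }
  // Favorites whose provider is not in the live inventory are skipped, not deleted.
  const visibleFavorites: PickerModel[] = [];
  for (const id of favorites) {
    const model = live.get(id);
    if (model) visibleFavorites.push(model);
  }
  const sections: PickerSection[] = [];
  if (visibleFavorites.length > 0) sections.push({ title: "Favorites", models: visibleFavorites });
  // A favorited model still appears in its own provider section.
  for (const group of inventory) sections.push({ title: group.label, models: group.models });
  return sections;
}
```

Deriving the Favorites section at render time, instead of storing it in the catalog, is what lets a favorite reappear when its provider returns.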
### W5. Native BooCoder Parity
**Goal.** Move native `boocode` local model usage onto the shared provider model before touching `opencode`.
**Files and seams.**
- `apps/coder/src/services/provider-snapshot.ts`
- `apps/coder/src/services/dispatcher.ts`
- `apps/web/src/components/AgentComposerBar.tsx`
- `apps/web/src/lib/model-label.ts`
- `packages/contracts/src/provider-snapshot.ts` only if the snapshot contract truly needs new metadata
**Implement.**
1. Make the native `boocode` provider expose composite local model ids from the shared registry.
2. Update native dispatch to resolve composite local ids through the shared registry.
3. Render grouped local models for the native `boocode` path in `AgentComposerBar`.
4. If the current `opencode` snapshot path would falsely advertise multi-provider local models before W7, hide that advertising now rather than leave the UI misleading ([D-5](artifacts/implementation-decision-log.md#d-5-native-boocode-parity-ships-before-opencode-parity)).
**Parallel-safe split.**
- Coder backend first
- AgentComposerBar UI second
**Exit criteria.**
- Native BooCoder tasks can run against at least two distinct local providers.
- The native picker behavior matches BooChat's grouped/favorites mental model closely enough that a user is not learning a second local-model identity system.
- `opencode` is not yet claiming parity it does not have.
### W6. Arena Parity
**Goal.** Make Arena consume the same local-provider substrate instead of one live llama-swap list.
**Files and seams.**
- `apps/coder/src/services/arena-model-call.ts`
- `apps/coder/src/services/arena-analyzer.ts`
- `apps/coder/src/services/arena-runner.ts`
- `apps/coder/src/index.ts`
- arena tests
**Implement.**
1. Replace direct `LLAMA_SWAP_URL` local calls with the provider-aware resolver.
2. Build Arena's local-model set from the shared provider registry, not one fetched list.
3. Preserve ADR-0001's two-lane scheduling rule; provider awareness changes local identity, not lane semantics.
4. Keep bare-id compatibility only where old data needs it.
**Parallel-safe split.**
- Agent A: `arena-model-call.ts` + analyzer updates
- Agent B: local-model set construction in `index.ts` + runner adjustments after A settles the model identity contract
**Exit criteria.**
- Arena can run local contestants from more than one machine.
- Local-vs-cloud classification still works.
- ADR-0001 behavior remains intact.
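As a rough illustration of steps 2 and 4, the local-model set can be built from per-provider inventories and then used for local-vs-cloud classification. `buildLocalModelSet` and `isLocalContestant` are hypothetical names, and the inventory shape (provider name mapped to bare model ids) is an assumption about what the registry loader returns.

```typescript
// Sketch only: hypothetical names and inventory shape.
type ProviderInventory = Record<string, readonly string[]>;

// Every local model is recorded under its composite id.
function buildLocalModelSet(inventory: ProviderInventory): Set<string> {
  const local = new Set<string>();
  for (const [provider, models] of Object.entries(inventory)) {
    for (const model of models) local.add(`${provider}/${model}`);
  }
  return local;
}

// Bare ids from old data are read as belonging to the default provider.
function isLocalContestant(
  modelId: string,
  local: ReadonlySet<string>,
  defaultProvider: string,
): boolean {
  const composite = modelId.includes("/") ? modelId : `${defaultProvider}/${modelId}`;
  return local.has(composite);
}

const arenaLocal = buildLocalModelSet({
  "sam-desktop": ["qwen3.5-9b", "qwen3.6-35b-a3b"],
  embedding: ["qwen3.5-9b"],
});
```

Lane assignment is deliberately absent from the sketch: it only answers "is this contestant local?", leaving the two-lane scheduling rule untouched.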
### W7. External-Agent Parity
**Goal.** Give `opencode` a real multi-provider local-model story instead of collapsing everything back to `llama-swap/<model>`.
**Files and seams.**
- `apps/coder/src/services/backends/opencode-server.ts`
- `apps/coder/src/services/provider-snapshot.ts`
- `apps/coder/src/services/agent-probe.ts`
- new BooCoder-hosted gateway route or service module under `apps/coder/src/services/`
- host config generation or sync for opencode local models
**Implement.**
1. Add a BooCoder-hosted OpenAI-compatible local gateway that accepts provider-preserving model ids and routes them to the correct local provider ([D-6](artifacts/implementation-decision-log.md#d-6-opencode-parity-uses-a-boocode-local-gateway-not-a-string-rewrite)).
2. Use one opencode-facing provider namespace such as `boocode-local`, where the opencode `providerID` is stable and the `modelID` is the inner composite id like `sam-desktop/qwen3.6-35b`.
3. Update provider snapshot merging so `opencode` advertises `boocode-local/<provider/model>` rather than `llama-swap/<model>`.
4. Update the opencode bridge parser and config sync so duplicate model names remain distinguishable end to end.
5. Add smoke coverage for two providers serving the same wire model name.
**Parallel-safe split.**
- Gateway branch first
- Snapshot/config-sync branch second
- Final opencode backend/parser adjustments last
**Exit criteria.**
- `opencode` can target two local providers with overlapping wire model names and hit the correct machine both times.
- No path rewrites `provider/model` down to plain `llama-swap/model`.
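The id shape in step 2 implies a two-level parse on the gateway side: strip the opencode-facing namespace, then split the inner composite id at its first slash. A minimal sketch follows; `parseOpencodeModel` and `GatewayTarget` are hypothetical names, and `boocode-local` is the example namespace from the plan rather than a settled value.

```typescript
// Sketch only: hypothetical names; "boocode-local" is the plan's example namespace.
interface GatewayTarget {
  opencodeProvider: string; // stable opencode providerID
  provider: string; // local provider name from the registry
  wireModel: string; // bare id sent to the upstream machine
}

function parseOpencodeModel(id: string, namespace = "boocode-local"): GatewayTarget | null {
  const prefix = `${namespace}/`;
  if (!id.startsWith(prefix)) return null; // not a gateway-routed id
  const inner = id.slice(prefix.length); // e.g. "sam-desktop/qwen3.6-35b"
  const slash = inner.indexOf("/");
  // Reject ids that lost their provider part instead of guessing one.
  if (slash <= 0 || slash === inner.length - 1) return null;
  return {
    opencodeProvider: namespace,
    provider: inner.slice(0, slash),
    wireModel: inner.slice(slash + 1),
  };
}
```

Returning `null` for `boocode-local/<model>` without a provider part is a deliberate choice in this sketch: silently falling back to a default machine would reintroduce the wrong-machine routing that W7 exists to remove.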
### W8. Operations and Final Verification
**Goal.** End with a repeatable operator workflow, not just a working dev branch.
**Files and seams.**
- `data/llama-providers.example.json`
- operator docs under `docs/`
- OpenSpec tasks/status notes as needed
**Implement.**
1. Document the add-a-machine flow for config-managed local providers.
2. Document the smoke matrix for:
- single legacy provider fallback
- two local providers
- duplicate model names across two providers
- DeepSeek enabled
- `opencode` local parity
3. Record the final interface BooControl should consume: provider registry plus composite ids, not raw host env vars.
**Exit criteria.**
- A third machine can be added by editing config and running the smoke matrix.
- The implementation docs name the exact runtime contract BooControl should build on.
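The first smoke-matrix row (single legacy provider fallback) and the add-a-machine flow both depend on how the registry file is loaded. The sketch below assumes a file shape of `{ defaultProvider, providers: [{ name, baseUrl, sidecarUrl? }] }`; the authoritative shape is whatever `data/llama-providers.example.json` defines, and the synthesized legacy provider name `llama-swap` is likewise an assumption.

```typescript
// Sketch only: the registry shape and the legacy provider name are assumptions.
interface ProviderEntry {
  name: string;
  baseUrl: string;
  sidecarUrl?: string;
}

interface ProviderRegistry {
  defaultProvider: string;
  providers: ProviderEntry[];
}

// fileContents is null when the config file is absent.
function loadRegistry(fileContents: string | null, legacyUrl: string): ProviderRegistry {
  if (fileContents === null) {
    // Legacy fallback: one provider synthesized from the old env var value.
    return {
      defaultProvider: "llama-swap",
      providers: [{ name: "llama-swap", baseUrl: legacyUrl }],
    };
  }
  const parsed = JSON.parse(fileContents) as ProviderRegistry;
  if (!Array.isArray(parsed.providers) || parsed.providers.length === 0) {
    throw new Error("registry must list at least one provider");
  }
  const names = new Set(parsed.providers.map((p) => p.name));
  if (names.size !== parsed.providers.length) {
    throw new Error("provider names must be unique");
  }
  if (!names.has(parsed.defaultProvider)) {
    throw new Error(`defaultProvider "${parsed.defaultProvider}" is not a listed provider`);
  }
  return parsed;
}
```

Under these assumptions, adding a third machine is one more `providers` entry, and the two validation checks catch the most likely config-editing mistakes before any request is routed.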
## Verification Plan
- `pnpm -C packages/contracts build`
- `pnpm -C apps/server test`
- `pnpm -C apps/server build`
- `pnpm -C apps/coder test`
- `pnpm -C apps/coder build`
- `npx tsc -p apps/web/tsconfig.app.json --noEmit`
Add targeted tests as the work lands:
- model-ref parse/format and bare-id fallback
- provider-aware routing and DeepSeek collision cases
- context-cache isolation for duplicate model names
- favorites hide-not-delete behavior
- provider snapshot and opencode bridge behavior
- arena local-model classification across multiple providers
## Main Risks
- The W2 contract change to `/api/models` and W5 snapshot changes can drift across apps if contract parity is edited piecemeal. Follow the cross-app contract standard in [artifacts/.discovery-notes.md](artifacts/.discovery-notes.md) and land contract-first branches.
- W7 is the hardest seam. If the gateway is skipped and the old string rewrite is kept, the feature will look complete in UI while still routing the wrong machine.
- `model-context.ts` is a hidden correctness seam. If cache keys stay bare, duplicate model names will mis-share context limits and compaction behavior even after routing is fixed.
## Deferred
- BooControl itself
- picker search and richer filtering
- manual favorite reordering
- host health badges in pickers
## Definition of Done
- BooChat, native BooCoder, Arena, and `opencode` all support provider-aware local models end to end.
- Legacy bare ids remain readable.
- Two providers can expose the same wire model name without ambiguity.
- Adding another local machine is documented and smoke-tested.
- BooControl can start later without inventing a second provider registry.

@@ -0,0 +1,295 @@
# Research: Integrating two named llama-swap providers ("Sam-desktop", "embedding") with provider-grouped model dropdowns and per-model favorites in BooChat and BooCoder
Question: BooCode currently talks to exactly one llama-swap endpoint. How should a second named provider ("embedding", `100.90.172.55:8411`) be added alongside the renamed existing one ("Sam-desktop", `100.101.41.16:8401`), integrated into both BooChat and BooCoder, with the model dropdown grouped per provider and a favorite button per model (Favorites section listed first)?
Evidence mode: **strict** (default — every recommendation-bearing claim is corroborated or explicitly caveated).
## Summary
Both machines can be added to BooCode as named providers, and the right way is to give BooCode a small provider registry (a name and base URL per machine) and to store selected models as a "provider/model" pair instead of a bare name. Bare names cannot work here: five models exist on both machines under identical names today, and the configured default model has already drifted out of the live list once — so favorites and routing keyed by name alone would be ambiguous and fragile. The dropdown should follow the pattern proven in VS Code's model picker: a Favorites section on top, then one section per provider (Sam-desktop first, then embedding), a star on every row, favorited models staying visible in their provider section, and favorites that are hidden — never deleted — when a machine is offline.
The adversarial validation pass confirmed the direction but showed the change is wider than the obvious spots: chat compaction, context-window lookup, arena battles, the coder's opencode dispatch, and the sidecar routing default all silently assume a single endpoint and need the same provider-resolution change. Two extra hazards were found in the live data: a model on the embedding host literally named `deepseek-r1-qwen3-8b` trips BooCode's "starts with deepseek-" cloud-routing heuristic, and the always-on sidecar default route would swallow embedding-bound requests. The embedding host does **not** need its own llama-sidecar — but sidecar routing must become a Sam-desktop-only attribute.
Well-corroborated: live data from both hosts, direct code evidence, and multiple independent web sources agree; validation expanded the implementation scope but did not overturn the choice.
- **Confidence:** High
## Research Results
### What exists today (codebase — current-state anchor)
BooCode's entire inference surface assumes one llama-swap endpoint, configured as `LLAMA_SWAP_URL=http://100.101.41.16:8401` with `DEFAULT_MODEL=qwen3.6-35b-a3b-mxfp4` (A58). The single-endpoint assumption is hard-coded in at least nine places:
1. `GET /api/models` fetches only `{LLAMA_SWAP_URL}/v1/models` (plus DeepSeek cloud when `DEEPSEEK_API_KEY` is set) and returns a flat `ModelInfo[]` with no provider tag (A59).
2. `upstreamModel()` routes by string heuristics: model IDs starting `deepseek-` go to the DeepSeek cloud API; agents with `llama_extra_args` go to the sidecar; **and when `LLAMA_SIDECAR_URL` is configured at all — which it is in docker-compose — every remaining request routes through the sidecar by default**, falling back to llama-swap only when no sidecar is configured (A60). The provider for each base URL is a cached AI-SDK `createOpenAICompatible` instance.
3. `resolveModelEndpoint()` (used by compaction and task-model for non-streaming calls) returns `LLAMA_SWAP_URL` for every non-DeepSeek model (A60, A67).
4. `model-context.ts` fetches `{LLAMA_SWAP_URL}/upstream/<model>/props` for context windows, with a **no-TTL positive cache keyed by the raw model string**, and a `deepseek-` prefix guard that short-circuits to a static 131,072 context without calling any upstream (A66).
5. `task-model.ts` (auto-naming, summaries) falls back through `FAST_MODEL → chat model → DEFAULT_MODEL` against the single URL (A68).
6. Arena battles call `{LLAMA_SWAP_URL}/v1/chat/completions` directly with no routing abstraction at all (A69).
7. The coder's provider snapshot fetches the single llama-swap list and prefixes every ID with `llama-swap/` (A63); its dispatcher prefixes any bare (slash-less) model ID with `llama-swap/` before opencode dispatch, and passes any ID already containing `/` through unchanged (A64).
8. Model IDs persist as bare strings: `sessions.model TEXT NOT NULL`, `chats.model TEXT` nullable, validated only as a 1–200-char string (A65).
9. The BooChat dropdown (`ModelPicker.tsx`) and the BooCoder picker (`CompactPicker` inside `AgentComposerBar.tsx`) are flat lists with no grouping, search, or favorites; the coder picker persists per-provider preferences in browser localStorage, while BooChat model choice is server-persisted on the session row (A61, A70). Display code already strips `llama-swap/`-style prefixes when rendering model chips (A71). No favorites/pinning mechanism exists anywhere; the `settings` table is a key-value JSONB store currently holding `default_model` and theme keys (A65).
The coder's runtime provider config (`data/coder-providers.json`) has no `baseUrl` field — there is no way to register a second llama-swap endpoint today (A72).
### What the two hosts actually serve (provided material, retrieved live 2026-06-10)
- **embedding** (`100.90.172.55:8411`, Linux, P104-100 8GB Pascal GPU): 39 models, skewed small — gemma-3-270m through gemma-4-12b, the LFM2.5 family, granite-4.1-3b/8b, qwen3.5-0.8b/4b/9b, qwopus3.5 family, `deepseek-r1-qwen3-8b`, a reranker, extraction models (A54). Its llama-swap config is hand-tuned per model (flash-attn/KV-quant choices for Pascal, ttl 1800), with llama.cpp built from source on the box (A56).
- **Sam-desktop** (`100.101.41.16:8401`, Windows): 21 models, skewed large — qwen3.6-35b-a3b/27b, qwopus3.6 family, granite-4.1-30b, mellum2-12b, nemotron-cascade-2-30b-a3b, north-mini-code, etc. Served by `D:\llama-server` (llama.cpp CUDA build b9591) behind `D:\llama-swap` (llama-swap v224), models in `D:\models`; a `D:\llama-sidecar` directory backs the existing sidecar at `:8402` (A55, A57).
Three load-bearing facts fall out of the live inventories:
- **Five model IDs exist on both hosts**: `granite-4.1-8b`, `negentropy-4.7-9b`, `qwen3.5-9b`, `qwen3.5-9b-deepseek-v4`, `qwopus3.5-9b-coder` (A54, A55). Bare-ID favorites or routing are therefore ambiguous from day one.
- **The configured `DEFAULT_MODEL=qwen3.6-35b-a3b-mxfp4` is not in Sam-desktop's current model list** (closest: `qwen3.6-35b-a3b`) — model IDs already churn in practice, so favorites must tolerate stale references (A55, A58).
- **`deepseek-r1-qwen3-8b` on the embedding host collides with BooCode's `deepseek-` heuristics**: with `DEEPSEEK_API_KEY` set it would be routed to the DeepSeek cloud API, and the context-window guard returns a fake 131k context on the name prefix alone regardless (A54, A60, A66).
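The collision check behind the first fact is a plain set intersection over the two `/v1/models` id lists. The sketch below uses abbreviated inventories (the five shared ids plus one host-specific id each, not the full 39 and 21 entries), and `duplicateIds` is a hypothetical helper.

```typescript
// Sketch only: abbreviated inventories, hypothetical helper name.
function duplicateIds(a: readonly string[], b: readonly string[]): string[] {
  const inB = new Set(b);
  return a.filter((id) => inB.has(id)).sort();
}

const embeddingIds = [
  "granite-4.1-8b",
  "negentropy-4.7-9b",
  "qwen3.5-9b",
  "qwen3.5-9b-deepseek-v4",
  "qwopus3.5-9b-coder",
  "deepseek-r1-qwen3-8b", // embedding only
];

const samDesktopIds = [
  "qwen3.6-35b-a3b", // Sam-desktop only
  "qwopus3.5-9b-coder",
  "qwen3.5-9b-deepseek-v4",
  "qwen3.5-9b",
  "negentropy-4.7-9b",
  "granite-4.1-8b",
];

const shared = duplicateIds(embeddingIds, samDesktopIds);
// shared holds the five ids listed in the first bullet above.
```

Re-running the same intersection against fresh inventories is a cheap way to see whether the collision set has grown before any routing change ships.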
### How llama-swap identifies models (web, corroborated)
llama-swap model IDs are exactly the YAML keys in its `config.yaml`; `/v1/models` can additionally carry optional per-model `name`, `description`, and arbitrary `metadata` from config, fields that neither of Sam's hosts currently populates (A1–A4, A54, A55). llama-swap has **no instance-identity field**: two instances are distinguishable only by host:port (A3). `/running` reports load state per model (A1, A12). Peer federation exists (one llama-swap aggregating another), but peer-served models surface as `"peer-name: model-name"` IDs [single-source: A6] and same-ID collisions resolve silently to the lexicographically-first peer (A5). Decisively, and without relying on any web source, BooCode would still see one flat list with no native grouping while the two hosts' uptime becomes coupled. Standalone llama.cpp `llama-server` defaults its `/v1/models` ID to the model file path unless `--alias` is set (A8, A9), which is relevant only if a host ever bypasses llama-swap.
### How mature clients solve exactly this (web, corroborated)
Every major OpenAI-compatible client library handles multiple same-protocol providers with **separate named provider instances, each with its own baseURL, namespaced in the client's registry as `provider:model` / `provider/model`** — the model ID actually sent on the wire to each backend stays the bare upstream ID (Vercel AI SDK provider registry: A13, A14; LiteLLM model_list: A15, A16). BooCode already uses the AI SDK's `createOpenAICompatible` (A60) and the coder already namespaces with a `llama-swap/` prefix (A63, A64), so this pattern is an extension of existing conventions, not a new idiom.
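Applied to BooCode, the pattern reduces to one small resolver: the composite id picks the base URL, and the bare id goes on the wire. The sketch below is not the repo's resolver; `resolveComposite` is a hypothetical name, the `http://` scheme on the embedding host is assumed, and treating an unknown prefix as part of a bare id is one possible policy among several.

```typescript
// Sketch only: hypothetical resolver, not the implementation in provider.ts.
interface Provider {
  name: string;
  baseUrl: string;
}

interface ResolvedModel {
  provider: Provider;
  wireModel: string; // bare id sent to the upstream
}

const providers: Provider[] = [
  { name: "sam-desktop", baseUrl: "http://100.101.41.16:8401" },
  { name: "embedding", baseUrl: "http://100.90.172.55:8411" }, // scheme assumed
];

function resolveComposite(
  id: string,
  registry: readonly Provider[],
  defaultProvider: string,
): ResolvedModel {
  const slash = id.indexOf("/");
  const prefix = slash > 0 ? id.slice(0, slash) : "";
  const named = registry.find((p) => p.name === prefix);
  if (named) {
    // Strip the provider prefix only here, at wire-id construction.
    return { provider: named, wireModel: id.slice(slash + 1) };
  }
  // Bare legacy id (or a prefix that is not a registered provider):
  // fall back to the default provider and send the id unchanged.
  const fallback = registry.find((p) => p.name === defaultProvider);
  if (!fallback) throw new Error(`unknown default provider: ${defaultProvider}`);
  return { provider: fallback, wireModel: id };
}
```

For example, `resolveComposite("embedding/qwen3.5-9b", providers, "sam-desktop")` selects the embedding base URL and sends `qwen3.5-9b`, while the bare `qwen3.5-9b` resolves to Sam-desktop.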
### Dropdown + favorites prior art (web)
The closest shipped implementation of the requested UX is VS Code's model picker: models grouped by provider, a pin icon revealed on hover, pinned models lifted into a dedicated top section in stable insertion order, **while remaining visible in their provider group** (display copy, not move) (A45, A46). Cherry Studio independently demonstrates the key-collision lesson: its model identity is the composite `{id, provider}` precisely so two providers serving the same model name don't collide (A35, A36) [third-party code reference; unverifiable from here; supporting color only, see V8]. Open WebUI documents the two pitfalls to avoid: favorites keyed by bare model ID become ambiguous the moment two connections serve the same name (A27), and its stale-pin cleanup **permanently deletes** pins when a backend is temporarily down (A23); the correct behavior is to hide unavailable favorites and restore them when the host returns. LibreChat groups via admin-configured YAML and added pinning in v0.8.5 (A28, A29). Jan, Chatbox, SillyTavern, Continue.dev, BigAGI, and LM Studio offer weaker or no equivalents (A32–A34, A38–A44, A47–A52); none contradicts the VS Code pattern.
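The picker behavior described here (Favorites first, display copy rather than move, hide rather than delete) can be sketched as a pure function over stored favorites and the live inventory. `buildSections` and `PickerSection` are hypothetical names, and using the provider name as the section title is an assumption.

```typescript
// Sketch only: hypothetical names; section titles are an assumption.
interface PickerSection {
  title: string;
  models: string[]; // composite ids
}

function buildSections(
  favorites: readonly string[],
  inventory: Partial<Record<string, readonly string[]>>,
  providerOrder: readonly string[],
): PickerSection[] {
  // Composite ids that are reachable right now.
  const live = new Set<string>();
  for (const provider of providerOrder) {
    for (const model of inventory[provider] ?? []) live.add(`${provider}/${model}`);
  }

  const sections: PickerSection[] = [];

  // Hide unavailable favorites; the stored list itself is never modified.
  const visibleFavorites = favorites.filter((id) => live.has(id));
  if (visibleFavorites.length > 0) {
    sections.push({ title: "Favorites", models: visibleFavorites });
  }

  // Favorited models stay listed in their provider section as well.
  for (const provider of providerOrder) {
    const models = (inventory[provider] ?? []).map((m) => `${provider}/${m}`);
    if (models.length > 0) sections.push({ title: provider, models });
  }
  return sections;
}
```

Because the function never writes back to `favorites`, a host that is temporarily offline simply drops out of the rendered sections and its stars reappear once its inventory returns.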
### Does embedding need a llama-sidecar? No.
The llama-sidecar is a Go daemon on Sam-desktop providing a per-agent llama-server process pool so agents can carry `llama_extra_args` (cache quant, spec decoding, slot save) injected via an `X-Agent-Flags` header (A60, A74). The embedding host needs none of that: its per-model tuning is baked directly into its llama-swap `config.yaml` (A56), and no per-agent flag injection applies to it. **However**, `resolveRoute` currently makes the sidecar the default route for *all* non-DeepSeek inference whenever `LLAMA_SIDECAR_URL` is set (A60) — so under the multi-provider design, sidecar routing must become an attribute of the Sam-desktop provider entry (e.g. optional `sidecarUrl` per provider), not a global default; otherwise requests for embedding-hosted models would be sent to a sidecar that only manages Sam-desktop processes.
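A minimal sketch of sidecar routing as a per-provider attribute follows. `routeFor` and `Route` are hypothetical names; the sidecar host is assumed to be the Sam-desktop address with the `:8402` port mentioned in the host inventory, and keeping the sidecar as the default route for a provider that has one is an assumption that mirrors today's behavior rather than a decided rule.

```typescript
// Sketch only: hypothetical names; sidecar URL host is assumed.
interface Provider {
  name: string;
  baseUrl: string;
  sidecarUrl?: string; // present only where a sidecar manages this host
}

interface Route {
  baseUrl: string;
  viaSidecar: boolean;
}

function routeFor(provider: Provider): Route {
  if (provider.sidecarUrl !== undefined) {
    return { baseUrl: provider.sidecarUrl, viaSidecar: true };
  }
  // No sidecar attribute: always talk to the provider's llama-swap directly.
  return { baseUrl: provider.baseUrl, viaSidecar: false };
}

const samDesktop: Provider = {
  name: "sam-desktop",
  baseUrl: "http://100.101.41.16:8401",
  sidecarUrl: "http://100.101.41.16:8402",
};

const embedding: Provider = {
  name: "embedding",
  baseUrl: "http://100.90.172.55:8411",
};
```

With the decision attached to the provider entry, a globally set `LLAMA_SIDECAR_URL` can no longer pull embedding-bound requests onto the Sam-desktop sidecar.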
### Openspec conventions for the follow-up plan (codebase)
Per-batch docs land in `openspec/changes/<slug>/` with `proposal.md` (why + scope), `tasks.md` (numbered/checkbox action list), and optional `design.md` (architecture/data-model decisions); slugs are lowercase-hyphenated from the batch title (A73). This feature is a natural three-file batch — the provider registry + routing is design-heavy, so `design.md` is warranted.
## Options to Consider
### O1: Named provider registry with composite model IDs (`<provider>/<model>`)
- **What it is:** BooCode config gains a provider list (`{ name, baseUrl, sidecarUrl? }` per entry — "sam-desktop" and "embedding"). Models are stored and selected as `sam-desktop/qwen3.6-35b-a3b`, `embedding/gemma-4-12b`. `/api/models` returns provider-tagged groups; one routing resolver (provider prefix → baseURL, bare wire ID) replaces every `LLAMA_SWAP_URL` hardcode; bare legacy IDs fall back to the default provider (sam-desktop). Favorites, caches, and attribution all key on the composite ID.
- **Trade-offs:** Touches every call site that assumes one endpoint (the nine sites above — see Validation for the full list); needs a deliberate legacy-bare-ID fallback for existing session/chat rows and the seeded `default_model`; the coder's opencode namespace (`llama-swap/`) needs an explicit translation rule. In exchange: no DB schema change for model columns, no llama-swap config changes on either host, matches the AI-SDK idiom BooCode already uses and the coder's existing prefix convention, and makes the `deepseek-` heuristic unnecessary for prefixed IDs.
- **Rests on:** (A13, A14, A15, A16) for the pattern; (A54, A55) for the collision necessity; (A60, A63, A64) for fit with existing code.
- **Evidence status:** corroborated.
### O2: Bare model IDs plus a separate `provider` field everywhere
- **What it is:** Keep model strings as-is and add a `provider` column/field through `sessions`, `chats`, WS frames, `ModelInfo`, `ProviderModel`, and every read path.
- **Trade-offs:** Avoids string munging and display-time prefix stripping, but is strictly more invasive: two schema migrations, a `WsFrameSchema` change rebuilt through `@boocode/contracts`, and every consumer updated in lockstep — while favorites still need a composite key anyway. Higher blast radius for the same outcome.
- **Rests on:** (A65, A62) for the touched surfaces.
- **Evidence status:** corroborated (codebase-derived).
### O3: llama-swap peer federation (Sam-desktop aggregates embedding as a peer)
- **What it is:** Configure embedding as a `peers:` entry in Sam-desktop's llama-swap; BooCode keeps a single endpoint.
- **Trade-offs:** Rejected on codebase-observable grounds: BooCode would still see one flat list (no native named grouping — the feature's whole point), the two hosts' availability becomes coupled, and it requires operational changes on a host outside this repo. Additionally, peer-served model IDs surface as `"peer-name: model-name"` [single-source: A6] with silent first-lexicographic collision resolution (A5).
- **Rests on:** (A5, A6) plus codebase observation (A59, A61).
- **Evidence status:** rejection corroborated by codebase facts; the peer ID-format detail is single-source (caveated) and not load-bearing.
### O4: External aggregator proxy (LiteLLM) in front of both hosts
- **What it is:** A LiteLLM proxy with a `model_list` mapping unique aliases to each host; BooCode keeps one endpoint.
- **Trade-offs:** Proven pattern (A15, A16) but adds a third always-on service with a manually-maintained catalog (no auto-discovery from `/v1/models`), an extra network hop, and still no provider grouping signal unless encoded in alias naming conventions. Overweight for a single-user self-hosted system.
- **Rests on:** (A15, A16).
- **Evidence status:** corroborated.
### Sub-decision — favorites persistence
- **O5a: Server-side, in the `settings` table** (e.g. `favorite_models: string[]` of composite IDs). Survives browsers/devices — and multi-device use is real here (the repo's own docs describe side-by-side iPhone debugging), matching how BooChat model choice is already server-persisted on the session row. Costs a PATCH per star toggle and needs a "hide stale, never delete" rule (A23) plus acceptance that stale composite keys linger until manually unfavorited.
- **O5b: Browser localStorage**, extending the coder's existing `boocode.coder.agent-prefs` pattern (A70). Zero API surface, but per-device, per-browser, and split across the two UIs.
- **Evidence status:** both corroborated; the cross-device argument for O5a is codebase-derived inference from documented usage, not a measured requirement.
## Recommendation
- **Recommendation:** **O1** — named provider registry with `<provider>/<model>` composite IDs — combined with the VS Code-pattern dropdown (Favorites on top in stable insertion order, then Sam-desktop's models, then embedding's; star toggle per row; favorited models remain listed in their provider group) and **O5a** server-side favorites keyed by composite ID. Non-negotiable design constraints carried in from validation:
1. Prefix-strip **only** at wire-URL construction; caches (notably `model-context.ts`'s no-TTL positive cache) key on the **full composite ID**, or the five name-collided models cross-pollute context windows between hosts (V7).
2. The coder dispatcher must translate composite prefixes for opencode (map the default provider to the existing `llama-swap/` namespace, or register new opencode providers) — the current pass-through of any slash-containing ID would hand opencode an unknown provider key (V1).
3. Every single-endpoint call site is in scope: `provider.ts` (`upstreamModel` + `resolveModelEndpoint`), `models.ts`, `model-context.ts` (including its `deepseek-` static-context guard), `compaction.ts`, `task-model.ts`, `arena-model-call.ts` (+ arena callers, coder-side config), coder `provider-snapshot.ts`, coder `dispatcher.ts` (V2–V4, V9).
4. Sidecar routing becomes a Sam-desktop provider attribute, not the global default route — embedding needs no sidecar (A60, A74; post-validation verification).
5. Bare legacy IDs (existing rows, seeded `default_model`) resolve to the default provider indefinitely — new sessions inherit a bare seeded default until settings are migrated, so this is a permanent fallback, not a one-time migration (V2).
6. Favorites that reference unavailable models are hidden, never auto-deleted (A23).
- **Evidence basis:** The option choice rests on corroborated evidence throughout: the multi-provider client pattern (A13–A16), the live collision and churn data from both hosts (A54, A55, A58; provided material, independently re-checkable), and codebase fit (A60, A63, A64). The UX pattern rests on corroborated documentation (A45, A46) with the Open WebUI pitfalls as corroborated counter-evidence (A23, A27); the Cherry Studio and VS Code *code-level* references are unverifiable third-party color (V8) and nothing rests on them alone. The single-source peer-ID format (A6) supports only the rejection of O3, which stands independently on codebase facts. The cross-device justification for O5a is codebase-derived inference (documented multi-device usage), explicitly not measured evidence.
## Validation
Adversarial validation attacked the evidence, framing, recommendation, and gathering integrity. Findings (condensed; all code-verified by the validator in this repo):
### V1: "O1 extends the coder's prefix convention" was overstated
- **Strategy:** Challenge the Recommendation
- **Investigation:** `dispatcher.ts:1006-1011`, coder CLAUDE.md, `provider-snapshot.ts:66-72`.
- **Result:** Refuted as originally framed — a stored `sam-desktop/<model>` passes the dispatcher's slash-check unchanged and reaches opencode as an unknown provider key; `llama-swap/` is hardcoded in ≥4 coder locations.
- **Impact:** Recommendation now mandates an explicit opencode namespace-translation rule (constraint 2).
### V2: The bare-ID legacy fallback was asserted, not designed
- **Strategy:** Challenge the Recommendation
- **Investigation:** `provider.ts:115-135`, `stream-phase.ts:110`, `sessions.ts:113-117`, `schema.sql:222`, `model-context.ts:77`.
- **Result:** Partially refuted — architecturally plausible but unimplemented; prefixed IDs would 404 the `/upstream/<model>/props` fetch and break context/compaction display; the seeded bare `default_model` makes the fallback permanent, not migratory.
- **Impact:** Constraints 1, 3, 5 added.
### V3: The `deepseek-` hazard is wider than routing
- **Strategy:** Challenge the Evidence
- **Investigation:** `model-context.ts:40-49`, `provider.ts:98`, `compaction.ts:531`.
- **Result:** Confirmed with added scope — the context guard fires on the name prefix alone, returning a fake 131k context for embedding's `deepseek-r1-qwen3-8b` even after routing is fixed.
- **Impact:** `model-context.ts` guard added to the touch-list (constraint 3).
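One way to narrow the guard is to let the `deepseek-` prefix heuristic apply only to ids that do not name a registered local provider. The sketch below is an assumption about how that could be written, not the change made in `model-context.ts`; `isDeepSeekCloudModel` is a hypothetical name and `deepseek-chat` stands in for any bare cloud id.

```typescript
// Sketch only: hypothetical guard, not the code in model-context.ts.
function isDeepSeekCloudModel(id: string, localProviders: ReadonlySet<string>): boolean {
  const slash = id.indexOf("/");
  // A composite id naming a registered local provider is never cloud,
  // whatever its model part looks like.
  if (slash > 0 && localProviders.has(id.slice(0, slash))) return false;
  // Only bare ids keep the legacy prefix heuristic.
  return id.startsWith("deepseek-");
}

const localProviders = new Set(["sam-desktop", "embedding"]);
```

With this shape, `embedding/deepseek-r1-qwen3-8b` falls through to the normal upstream context lookup instead of receiving the static 131,072 value.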
### V4: `compaction.ts` is a missed hardcode site
- **Strategy:** Challenge the Evidence
- **Investigation:** `compaction.ts:351-357` → `resolveModelEndpoint` (`provider.ts:139-157`).
- **Result:** Refuted the original C9 list as incomplete — compaction summarization calls would go to the wrong host for embedding models.
- **Impact:** Added to the touch-list (A67, constraint 3).
### V5: Server-side favorites needed justification against the coder's localStorage pattern
- **Strategy:** Challenge the Assumptions
- **Investigation:** `AgentComposerBar.tsx:33-52`, `routes/settings.ts`, root CLAUDE.md auth model.
- **Result:** Partially refuted — the Open WebUI bug distinguishes auto-delete vs hide, not server vs client storage; the original justification conflated the two.
- **Impact:** O5a/O5b reframed as an explicit sub-decision; O5a retained on the cross-device argument, labeled as inference.
### V6: O3's rejection over-relied on a single-source claim
- **Strategy:** Challenge the Evidence-Gathering Integrity
- **Result:** Confirmed with a provenance note — O3 is independently rejectable from codebase facts; the stale GitHub issue is demoted to supporting color.
- **Impact:** O3 rejection rewritten to lead with codebase-observable reasons.
### V7: Composite IDs + naive prefix-stripping would poison the no-TTL context cache
- **Strategy:** Challenge the Recommendation
- **Investigation:** `model-context.ts:9, 26-29, 77-100`; the five cross-host duplicate IDs.
- **Result:** Refuted the unstated design — stripping before the cache key shares entries across providers with different real context windows, permanently until restart.
- **Impact:** Constraint 1 (composite cache key, strip only at URL construction) — the most subtle required design rule.
### V8: Third-party code references (Cherry Studio, VS Code PR) are unverifiable
- **Strategy:** Challenge the Evidence-Gathering Integrity
- **Result:** Partially refuted their evidentiary weight — retained as color; the composite-key argument stands on BooCode's own conventions and the live collision data.
- **Impact:** Evidence basis re-worded; nothing rests on those references alone.
### V9: Arena is the most exposed hardcode
- **Strategy:** Challenge the Evidence
- **Investigation:** `arena-model-call.ts:16-28`, `arena-analyzer.ts:90`.
- **Result:** Confirmed with elevated severity — raw fetch, no abstraction, lives in `apps/coder` with its own config type (cannot reuse the server's resolver as-is).
- **Impact:** Listed as separate coder-side scope (constraint 3).
### Adjustments Made
The recommendation survived but was rewritten: the implementation constraints (composite cache keys, opencode namespace translation, the full nine-site touch-list, permanent bare-ID fallback, hidden-not-deleted favorites) were folded into the Recommendation itself; O3's rejection was re-grounded in codebase facts; the favorites-persistence choice was reframed as an explicit sub-decision; unverifiable third-party code references were demoted to supporting color. Post-validation, the orchestrator additionally verified in `provider.ts` that the sidecar is the *default* route whenever `LLAMA_SIDECAR_URL` is set — adding constraint 4 (sidecar becomes a per-provider attribute; embedding needs none).
### Confidence Assessment
- **Confidence:** High — for the option choice. The validator rated the pre-adjustment synthesis Medium because the implementation scope was understated; that scope is now enumerated above, and no finding challenged the direction (its own words: "architecturally sound given the existing `llama-swap/` convention").
- **Remaining Risks:** (1) The opencode-side translation (V1) may also require host-side `~/.config/opencode/opencode.json` changes — outside this repo. (2) Stale favorite keys accumulate in `settings` with no cleanup mechanism by design (hide-don't-delete); acceptable for single-user but unbounded. (3) The exact `/running` JSON envelope and llama-swap peer aggregation details remain single-source — neither is load-bearing. (4) The five duplicate-ID models make any partial rollout (one call site migrated, another not) actively dangerous; the routing resolver should land as one batch.
## Sources
| ID | Source | Link / location | Retrieved | Trust class | Summary (one line) | Evidence status |
|---|---|---|---|---|---|---|
| A1 | llama-swap README | github.com/mostlygeek/llama-swap | 2026-06-10 | web | Proxy hot-swapping local inference servers; documents /v1/models, /running, /upstream, /health; v224 current | corroborated by A2, A3, A12 |
| A2 | llama-swap configuration.md | github.com/mostlygeek/llama-swap/blob/main/docs/configuration.md | 2026-06-10 | web | Model IDs are YAML keys; per-model name/description/aliases/metadata/ttl/useModelName; includeAliasesInList | corroborated by A3, A4 |
| A3 | llama-swap config-schema.json | github.com/mostlygeek/llama-swap/blob/main/config-schema.json | 2026-06-10 | web | Authoritative config schema; peers section; **no instance-identity field at any level** | corroborated by A2, A4 |
| A4 | llama-swap config.example.yaml | github.com/mostlygeek/llama-swap/blob/main/config.example.yaml | 2026-06-10 | web | Annotated example: aliases, useModelName, metadata, groups, peers | corroborated by A2, A3 |
| A5 | DeepWiki: llama-swap peers | deepwiki.com/mostlygeek/llama-swap/3.7-peer-configuration | 2026-06-10 | web | Duplicate peer model IDs route to first-lexicographic peer with only a warning | corroborated by A6 (collision); single source on aggregation detail |
| A6 | llama-swap issue #539 | github.com/mostlygeek/llama-swap/issues/539 | 2026-06-10 | web | Peer models surface as "peer-name: model-name" IDs; stale, unresolved | single source (caveated) |
| A7 | llama-swap issue #538 | github.com/mostlygeek/llama-swap/issues/538 | 2026-06-10 | web | Aliases hidden from /v1/models unless includeAliasesInList | corroborated by A2, A3 |
| A8 | llama.cpp server README | github.com/ggml-org/llama.cpp/blob/master/tools/server/README.md | 2026-06-10 | web | /v1/models id defaults to file path; --alias overrides; meta block fields | corroborated by A9, A10 |
| A9 | llama.cpp discussion #8547 | github.com/ggml-org/llama.cpp/discussions/8547 | 2026-06-10 | web | Confirms file-path default id; --override-kv doesn't change API id | corroborated by A8 |
| A10 | llama.cpp issue #17860 | github.com/ggml-org/llama.cpp/issues/17860 | 2026-06-10 | web | Only one --alias per llama-server today | corroborated by A8 |
| A11 | LM4eu/llama-swap Go pkg docs | pkg.go.dev/github.com/LM4eu/llama-swap/proxy | 2026-06-10 | web | Model struct {Id, Name, Description, State, Unlisted}; fork, not upstream | single source (caveated) |
| A12 | glukhov.org llama-swap quickstart | glukhov.org/llm-hosting/llama-swap/ | 2026-06-10 | web | /running state values; alias listing behavior | corroborated by A1, A2 |
| A13 | Vercel AI SDK provider management | ai-sdk.dev/docs/ai-sdk-core/provider-management | 2026-06-10 | web | Registry namespaces models as providerId:modelId; per-provider baseURL | corroborated by A14 |
| A14 | Vercel AI SDK OpenAI-compatible providers | ai-sdk.dev/providers/openai-compatible-providers | 2026-06-10 | web | createOpenAICompatible takes name+baseURL per provider; wire model ID unchanged | corroborated by A13 |
| A15 | LiteLLM OpenAI-compatible docs | docs.litellm.ai/docs/providers/openai_compatible | 2026-06-10 | web | Per-entry api_base; aliasing decouples client name from upstream name | corroborated by A16 |
| A16 | McDermott: Centralizing LLMs with LiteLLM | robert-mcdermott.medium.com/...9874563f3062 | 2026-06-10 | web | model_list with unique model_name per upstream resolves collisions | corroborated by A15 |
| A17 | DeepWiki: llama-swap groups | deepwiki.com/mostlygeek/llama-swap/3.4-groups-and-swapping-policies | 2026-06-10 | web | Groups/matrix control concurrency, not model IDs | corroborated by A2–A4 |
| A18 | llama-swap releases | github.com/mostlygeek/llama-swap/releases | 2026-06-10 | web | v219–v224 changed routing/perf, not /v1/models schema | single source (caveated) |
| A19 | Open WebUI discussion #3443 | github.com/open-webui/open-webui/discussions/3443 | 2026-06-10 | web | Pin-in-dropdown feature request; drag-reorder workaround breaks | corroborated by A21, A23 |
| A20 | Open WebUI discussion #5902 | github.com/open-webui/open-webui/discussions/5902 | 2026-06-10 | web | Filtering 70+ models; whitelist vs hide patterns | corroborated by A19 |
| A21 | Open WebUI env config reference | docs.openwebui.com/reference/env-configuration/ | 2026-06-10 | web | DEFAULT_PINNED_MODELS; settings.pinnedModels sorts pinned to top | corroborated by A22, A23 |
| A22 | Open WebUI database schema | docs.openwebui.com/reference/database-schema/ | 2026-06-10 | web | Pins live in user.settings JSON, keyed by **bare model ID** | corroborated by A21 |
| A23 | Open WebUI discussion #23656 | github.com/open-webui/open-webui/discussions/23656 | 2026-06-10 | web | Stale-pin cleanup permanently deletes pins during backend downtime | corroborated by A21, A53 |
| A24 | Open WebUI discussion #14854 | github.com/open-webui/open-webui/discussions/14854 | 2026-06-10 | web | Unpin buried in three-dot menu; discoverability failure | corroborated by A21 |
| A25 | Open WebUI issue #19183 | github.com/open-webui/open-webui/issues/19183 | 2026-06-10 | web | Local/External/All tabs + tag chips + Fuse.js search in selector | corroborated by A26 |
| A26 | Open WebUI discussion #21502 | github.com/open-webui/open-webui/discussions/21502 | 2026-06-10 | web | Flat select unusable at OpenRouter scale; optgroup/search proposals | corroborated by A25 |
| A27 | Open WebUI discussion #4495 | github.com/open-webui/open-webui/discussions/4495 | 2026-06-10 | web | Same-named models from two connections are indistinguishable (bare-ID failure) | corroborated by A25, A26 |
| A28 | LibreChat model specs docs | librechat.ai/docs/configuration/librechat_yaml/object_structure/model_specs | 2026-06-10 | web | Admin YAML `group` field creates named collapsible sections | corroborated by A29 |
| A29 | LibreChat v0.8.5 changelogs | librechat.ai/changelog/v0.8.5 | 2026-06-10 | web | Pin support for model specs added (PR #11219) | corroborated by A30; persistence detail single-source |
| A30 | LibreChat discussion #11044 | github.com/danny-avila/LibreChat/discussions/11044 | 2026-06-10 | web | Pinning exists; preset-active confusion | corroborated by A29 |
| A31 | DeepWiki: LibreChat DB models | deepwiki.com/danny-avila/LibreChat/7.1-database-models | 2026-06-10 | web | MongoDB/Mongoose; pinned-spec field name unconfirmed | single source (caveated) |
| A32 | Jan v0.6.9 changelog | jan.ai/changelog/2025-08-28-image-support | 2026-06-10 | web | "Favorite models" shipped; no UI detail | single source (caveated) |
| A33 | Jan manage-models docs | jan.ai/docs/desktop/manage-models | 2026-06-10 | web | Organized by source/quantization tier, not provider | corroborated by A32 |
| A34 | Jan data-folder docs | jan.ai/docs/desktop/data-folder | 2026-06-10 | web | Settings in local JSON files | corroborated by A32 |
| A35 | DeepWiki: Cherry Studio models | deepwiki.com/CherryHQ/cherry-studio/5.3-model-configuration-and-capabilities | 2026-06-10 | web | Provider-grouped UI; getModelUniqId composite {id, provider} | corroborated by A36 (see V8 caveat) |
| A36 | Cherry Studio ModelService.ts | github.com/CherryHQ/cherry-studio/.../ModelService.ts | 2026-06-10 | web | Composite-key implementation | corroborated by A35 (see V8 caveat) |
| A37 | Cherry Studio releases | github.com/CherryHQ/cherry-studio/releases | 2026-06-10 | web | No favorites changes v1.9.1–v1.9.11 | single source (caveated) |
| A38 | Chatbox issue #1540 | github.com/chatboxai/chatbox/issues/1540 | 2026-06-10 | web | Favorite-models proposal; not shipped | corroborated by A39 |
| A39 | Chatbox issue #2252 | github.com/chatboxai/chatbox/issues/2252 | 2026-06-10 | web | Two-section dropdown proposal (Preferred on top, star per row) | corroborated by A38 |
| A40 | DeepWiki: Chatbox local models | deepwiki.com/chatboxai/chatbox/4.6-local-model-integration | 2026-06-10 | web | settings.favoritedModels in localStorage | single source (caveated) |
| A41 | SillyTavern PR #5536 | github.com/SillyTavern/SillyTavern/pull/5536 | 2026-06-10 | web | Unified sort/group settings drawer across providers | corroborated by A42 |
| A42 | SillyTavern 1.13.5 notes | github.com/SillyTavern/SillyTavern/discussions/4660 | 2026-06-10 | web | Sort/group shipped in 1.13.5 | corroborated by A41 |
| A43 | SillyTavern connection profiles docs | docs.sillytavern.app/usage/core-concepts/connection-profiles/ | 2026-06-10 | web | Profiles = saved config snapshots, not per-model favorites | corroborated by A44 |
| A44 | SillyTavern issue #4565 | github.com/SillyTavern/SillyTavern/issues/4565 | 2026-06-10 | web | Better model selector request closed not-planned | corroborated by A43 |
| A45 | VS Code language models docs | code.visualstudio.com/docs/agent-customization/language-models | 2026-06-10 | web | Provider groups + hover pin + dedicated Pinned top section, stable order, model stays in group | corroborated by A46 |
| A46 | vscode-copilot-chat PR #1111 | github.com/microsoft/vscode-copilot-chat/pull/1111 | 2026-06-10 | web | BYOK models grouped into a category | corroborated by A45 (see V8 caveat) |
| A47 | Continue.dev model roles docs | docs.continue.dev/customize/model-roles/00-intro | 2026-06-10 | web | Role-based dropdowns; no grouping/favorites | corroborated by A48 |
| A48 | Continue.dev providers overview | docs.continue.dev/customize/model-providers/overview | 2026-06-10 | web | Picker reflects config.yaml order | corroborated by A47 |
| A49 | Open WebUI discussion #15449 | github.com/open-webui/open-webui/discussions/15449 | 2026-06-10 | web | Multi-model combination pinning request | single source (caveated) |
| A50 | BigAGI repo + changelog | github.com/enricoros/big-AGI | 2026-06-10 | web | No grouping/favorites evidence (negative finding) | single source (caveated) |
| A51 | LM Studio v0.4.0 changelog | lmstudio.ai/changelog/lmstudio-v0.4.0 | 2026-06-10 | web | Search/format filters; no favorites | corroborated by A52 |
| A52 | LM Studio v0.4.13 changelog | lmstudio.ai/changelog/lmstudio-v0.4.13 | 2026-06-10 | web | No picker changes | corroborated by A51 |
| A53 | Open WebUI issue #22578 | github.com/open-webui/open-webui/issues/22578 | 2026-06-10 | web | Model enable/disable state goes stale on catalog change | corroborated by A23 |
| A54 | embedding host live inventory | provided: `curl http://100.90.172.55:8411/v1/models` + `/running` | 2026-06-10 | provided | 39 models incl. deepseek-r1-qwen3-8b and 5 IDs duplicated on Sam-desktop; /running empty | corroborated by A56 (config matches) |
| A55 | Sam-desktop live inventory | provided: `curl http://100.101.41.16:8401/v1/models` + `/running` | 2026-06-10 | provided | 21 models; qwen3.6-35b-a3b-mxfp4 absent; nemotron-omni running via D:\llama-server | corroborated by A57 |
| A56 | embedding host SSH inventory | provided: `ssh samkintop@100.90.172.55` (~/llama-swap/config.yaml, ~/llama.cpp, ~/models) | 2026-06-10 | provided | P104-tuned llama-swap config (ttl 1800, per-model llama-server cmds); llama.cpp source build | corroborated by A54 |
| A57 | Sam-desktop SSH inventory | provided: `ssh samki@100.101.41.16` (dir D:\) | 2026-06-10 | provided | D:\llama-server (b9591 CUDA), D:\llama-swap (v224), D:\models, D:\llama-sidecar | corroborated by A55 |
| A58 | Current env config | `.env`, `apps/coder/.env.host` | n/a | codebase | LLAMA_SWAP_URL=http://100.101.41.16:8401; DEFAULT_MODEL=qwen3.6-35b-a3b-mxfp4 (both apps) | corroborated (read directly) |
| A59 | Models route | `apps/server/src/routes/models.ts:14-56` | n/a | codebase | GET /api/models fetches only LLAMA_SWAP_URL (+DeepSeek); flat untagged list | corroborated (read directly) |
| A60 | Inference provider/routing | `apps/server/src/services/inference/provider.ts:1-163` | n/a | codebase | resolveRoute: deepseek- prefix → cloud; LLAMA_SIDECAR_URL set → sidecar default for everything; else single swap; resolveModelEndpoint hardcodes LLAMA_SWAP_URL | corroborated (read directly) |
| A61 | BooChat model picker | `apps/web/src/components/ModelPicker.tsx:14-133` | n/a | codebase | Flat lazy list, no grouping/search/favorites; PATCHes session.model | corroborated (explorer + validator) |
| A62 | Provider snapshot contracts | `packages/contracts/src/provider-snapshot.ts` | n/a | codebase | ProviderModel has no provider field; identity implicit in parent entry name | corroborated |
| A63 | Coder provider snapshot | `apps/coder/src/services/provider-snapshot.ts:48-70,256-310` | n/a | codebase | Prefixes single llama-swap list with `llama-swap/`; merges into boocode entry | corroborated |
| A64 | Coder dispatcher prefixing | `apps/coder/src/services/dispatcher.ts:1006-1011` | n/a | codebase | Bare IDs get `llama-swap/`; slash-containing IDs pass through unchanged | corroborated (validator-verified) |
| A65 | Model/settings persistence | `apps/server/src/schema.sql:20,217-222,249`; `routes/settings.ts` | n/a | codebase | sessions.model NOT NULL, chats.model nullable, settings KV JSONB seeded with bare default_model | corroborated |
| A66 | Model context service | `apps/server/src/services/model-context.ts:9,26-29,40-49,77-100` | n/a | codebase | No-TTL positive cache keyed by raw model string; deepseek- guard returns static 131k; /upstream URL from single config | corroborated (validator-verified) |
| A67 | Compaction LLM calls | `apps/server/src/services/compaction.ts:351-357,531` | n/a | codebase | Summarization via resolveModelEndpoint → always LLAMA_SWAP_URL | corroborated (validator-verified) |
| A68 | Task model service | `apps/server/src/services/task-model.ts:59-68` | n/a | codebase | FAST_MODEL fallback chain against single endpoint (TASK_MODEL_URL escape hatch) | corroborated |
| A69 | Arena model calls | `apps/coder/src/services/arena-model-call.ts:16-28`; `arena-analyzer.ts:90` | n/a | codebase | Raw fetch to LLAMA_SWAP_URL, no routing abstraction | corroborated (validator-verified) |
| A70 | Coder composer prefs | `apps/web/src/components/AgentComposerBar.tsx:33-52,118-196` | n/a | codebase | CompactPicker flat lists; prefs in localStorage `boocode.coder.agent-prefs` | corroborated |
| A71 | Model display naming | `apps/web/src/lib/modelName.ts:6-32`; `MessageBubble.tsx:140-189` | n/a | codebase | Display chips already strip `llama-swap/`-style prefixes | corroborated |
| A72 | Coder provider config file | `data/coder-providers.example.json` | n/a | codebase | Per-provider overrides exist; no baseUrl field — second endpoint unregistrable today | corroborated |
| A73 | Openspec conventions | `openspec/README.md` | n/a | codebase | changes/<slug>/{proposal,tasks,design}.md; lowercase-hyphenated slugs | corroborated (read directly) |
| A74 | Sidecar architecture notes | `apps/server/CLAUDE.md` (sidecar sections); `/opt/forks/llama-sidecar/` | n/a | codebase | llama-sidecar = Go per-agent llama-server pool on Sam-desktop; X-Agent-Flags header; boot guard ties llama_extra_args to LLAMA_SIDECAR_URL | corroborated by A60 |
### A54/A55: Live host inventories — recommendation-bearing
- **Link / location:** provided: orchestrator-run `curl` against `http://100.90.172.55:8411` and `http://100.101.41.16:8401` (`/v1/models`, `/running`)
- **Retrieved:** 2026-06-10
- **Trust class:** provided (operator-owned infrastructure, independently re-checkable with the same commands)
- **Summary:** The embedding host serves 39 mostly small models; Sam-desktop serves 21 mostly large models. Five IDs (`granite-4.1-8b`, `negentropy-4.7-9b`, `qwen3.5-9b`, `qwen3.5-9b-deepseek-v4`, `qwopus3.5-9b-coder`) appear on both hosts, which makes composite keying mandatory rather than stylistic. The configured `DEFAULT_MODEL` is absent from Sam-desktop's live list, which demonstrates that model IDs churn. The embedding host's `deepseek-r1-qwen3-8b` collides with the `deepseek-` cloud-routing heuristic. Neither host populates llama-swap's optional `name`/`description` fields, so the UI must derive labels from IDs (as `formatModelLabel` already does).
- **Evidence status:** corroborated by A56/A57 (SSH-level configs match the served lists).
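
The collision finding can be re-checked mechanically once both `/v1/models` lists are in hand. The following is a minimal sketch, not BooCode code: `findDuplicateIds`, `compositeKey`, and the `Inventory` type are hypothetical names, and the sample lists are truncated subsets chosen for illustration rather than the full 39- and 21-model inventories.

```typescript
// Illustrative sketch: names and sample data are assumptions, not BooCode code.
type Inventory = Record<string, string[]>; // provider name -> bare model ids

// Map each bare id that more than one provider serves to its providers.
function findDuplicateIds(inventory: Inventory): Record<string, string[]> {
  const owners: Record<string, string[]> = {};
  for (const provider of Object.keys(inventory)) {
    for (const id of inventory[provider]) {
      if (!owners[id]) owners[id] = [];
      owners[id].push(provider);
    }
  }
  const duplicates: Record<string, string[]> = {};
  for (const id of Object.keys(owners)) {
    if (owners[id].length > 1) duplicates[id] = owners[id];
  }
  return duplicates;
}

// A composite key stays unique even when the bare id does not.
function compositeKey(provider: string, modelId: string): string {
  return `${provider}/${modelId}`;
}

// Truncated subsets of the two live lists, for illustration only.
const sampleInventory: Inventory = {
  embedding: ["granite-4.1-8b", "qwen3.5-9b", "deepseek-r1-qwen3-8b"],
  "sam-desktop": ["granite-4.1-8b", "qwen3.5-9b", "qwen3.6-35b-a3b"],
};

const duplicateIds = findDuplicateIds(sampleInventory);
console.log(Object.keys(duplicateIds));
```

Any bare id reported here would be ambiguous as a persisted session model or favorite, whereas `compositeKey("embedding", id)` and `compositeKey("sam-desktop", id)` remain distinct.
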
### A60: `provider.ts` routing — recommendation-bearing
- **Link / location:** `apps/server/src/services/inference/provider.ts:90-157`
- **Retrieved:** n/a
- **Trust class:** codebase (current-state anchor)
- **Summary:** The single point where all three routes (deepseek/sidecar/swap) resolve. Establishes that (a) BooCode already builds per-baseURL AI-SDK providers from a cache map — O1 slots into this with minimal new machinery; (b) the sidecar is the default route for everything when configured, which forces constraint 4; (c) `resolveModelEndpoint` is a second, parallel resolution path (compaction/task-model) that must change in lockstep.
- **Evidence status:** corroborated (read directly by orchestrator and validator).
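
To illustrate why the two resolution paths need to change in lockstep, here is a sketch of a single lookup that both could share. It is an assumption-laden illustration, not the code in `provider.ts`: `resolveSketch`, `RegistrySketch`, and the `name`/`baseUrl` field names are hypothetical, the provider name `embedding` is made up for the example, and only the two base URLs come from the live inventories above.

```typescript
// Hypothetical shapes; the real registry file format may differ.
interface ProviderSketch { name: string; baseUrl: string; }
interface RegistrySketch { defaultProvider: string; providers: ProviderSketch[]; }
interface ResolvedSketch { provider: ProviderSketch; wireModelId: string; }

function resolveSketch(registry: RegistrySketch, modelId: string): ResolvedSketch {
  const byName = (name: string): ProviderSketch | undefined =>
    registry.providers.find((p) => p.name === name);

  // Composite id: treat the text before the first slash as a provider name,
  // but only when that provider is actually configured.
  const slash = modelId.indexOf("/");
  if (slash > 0) {
    const candidate = byName(modelId.slice(0, slash));
    if (candidate) {
      return { provider: candidate, wireModelId: modelId.slice(slash + 1) };
    }
  }

  // Legacy bare id (or unknown prefix): fall back to the default provider
  // and send the id to the upstream unchanged.
  const fallback = byName(registry.defaultProvider);
  if (!fallback) {
    throw new Error(`defaultProvider "${registry.defaultProvider}" is not configured`);
  }
  return { provider: fallback, wireModelId: modelId };
}

const registrySketch: RegistrySketch = {
  defaultProvider: "sam-desktop",
  providers: [
    { name: "sam-desktop", baseUrl: "http://100.101.41.16:8401" },
    { name: "embedding", baseUrl: "http://100.90.172.55:8411" },
  ],
};

const viaComposite = resolveSketch(registrySketch, "embedding/deepseek-r1-qwen3-8b");
const viaLegacy = resolveSketch(registrySketch, "qwen3.5-9b");
```

In a scheme like this the provider prefix decides the route before any name-based heuristic runs, so `embedding/deepseek-r1-qwen3-8b` would reach the embedding host with the bare id `deepseek-r1-qwen3-8b` on the wire. If streaming, compaction, and task-model calls all went through one such function, a partially migrated call site could not silently pick the other host for a duplicated id.
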
### A13/A14: AI SDK provider registry pattern — recommendation-bearing
- **Link / location:** https://ai-sdk.dev/docs/ai-sdk-core/provider-management ; https://ai-sdk.dev/providers/openai-compatible-providers
- **Retrieved:** 2026-06-10
- **Trust class:** web
- **Summary:** The library BooCode already uses prescribes exactly O1's shape: one named `createOpenAICompatible` instance per provider, registry-level `provider:model` namespacing, bare model IDs on the wire. Adopting O1 is convergence with the upstream idiom rather than a custom scheme.
- **Evidence status:** corroborated (two official doc pages, consistent with LiteLLM's independent design A15/A16).
### A45: VS Code model picker docs — recommendation-bearing (UX)
- **Link / location:** https://code.visualstudio.com/docs/agent-customization/language-models
- **Retrieved:** 2026-06-10
- **Trust class:** web
- **Summary:** Documents the shipped pattern this feature's dropdown adapts: provider-grouped list, hover-revealed pin, dedicated Pinned top section in stable insertion order, pinned models remaining in their provider group.
- **Evidence status:** corroborated by A46; code-level detail treated as color per V8.
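
The section layout described above can be sketched as a pure function from the model catalog and the pin list to render-ready sections. This is an illustrative sketch only: `buildPickerSections` and its types are hypothetical names, pins are assumed to be stored as composite `provider/model` keys in insertion order, and skipping pins that are missing from the current catalog is an assumption about display behavior.

```typescript
// Hypothetical types for illustration; not the ModelPicker implementation.
interface PickerModel { provider: string; id: string; }
interface PickerSection { title: string; keys: string[]; }

function buildPickerSections(models: PickerModel[], pinned: string[]): PickerSection[] {
  const keyOf = (m: PickerModel): string => `${m.provider}/${m.id}`;
  const available = new Set<string>();
  for (const m of models) available.add(keyOf(m));

  const sections: PickerSection[] = [];

  // Pinned section first, in the order the pins were added.
  // Pins with no matching catalog entry are skipped for display only.
  const pinnedKeys = pinned.filter((key) => available.has(key));
  if (pinnedKeys.length > 0) sections.push({ title: "Pinned", keys: pinnedKeys });

  // Provider groups follow; a pinned model also stays in its own group.
  const groups = new Map<string, PickerSection>();
  for (const m of models) {
    let group = groups.get(m.provider);
    if (!group) {
      group = { title: m.provider, keys: [] };
      groups.set(m.provider, group);
      sections.push(group);
    }
    group.keys.push(keyOf(m));
  }
  return sections;
}

const pickerSections = buildPickerSections(
  [
    { provider: "embedding", id: "granite-4.1-8b" },
    { provider: "embedding", id: "qwen3.5-9b" },
    { provider: "sam-desktop", id: "qwen3.5-9b" },
  ],
  ["sam-desktop/qwen3.5-9b", "embedding/granite-4.1-8b", "sam-desktop/not-in-catalog"],
);
```

Keeping this as a pure function means the same pin list can drive both BooChat's picker and the coder composer without either component owning the ordering rules.
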
### A23/A27: Open WebUI pitfalls — recommendation-bearing (counter-evidence)
- **Link / location:** https://github.com/open-webui/open-webui/discussions/23656 ; https://github.com/open-webui/open-webui/discussions/4495
- **Retrieved:** 2026-06-10
- **Trust class:** web
- **Summary:** The two documented failure modes the design must avoid: bare-model-ID favorites becoming ambiguous across connections, and stale-favorite cleanup permanently destroying user preferences during transient backend downtime.
- **Evidence status:** corroborated by A21/A22/A53 (the surrounding docs and a second stale-state issue).
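
Both pitfalls can be avoided with composite keys plus a read-time filter instead of a cleanup pass. The sketch below is a minimal illustration under stated assumptions: `FavoriteStore`, `visibleFavorites`, and `toggleFavorite` are hypothetical names, and favorites are assumed to be persisted as an ordered array of composite `provider/model` keys.

```typescript
// Hypothetical store shape; the persisted settings format may differ.
interface FavoriteStore { favorites: string[]; } // composite keys, insertion order

// Hide-don't-delete: filter at read time, never rewrite the stored list.
function visibleFavorites(store: FavoriteStore, liveKeys: string[]): string[] {
  const live = new Set<string>(liveKeys);
  return store.favorites.filter((key) => live.has(key));
}

// Only an explicit user action changes what is stored.
function toggleFavorite(store: FavoriteStore, key: string): FavoriteStore {
  const isFavorite = store.favorites.indexOf(key) !== -1;
  return {
    favorites: isFavorite
      ? store.favorites.filter((k) => k !== key)
      : store.favorites.concat([key]),
  };
}

// The same bare id favorited on two hosts stays unambiguous.
let favoriteStore: FavoriteStore = { favorites: [] };
favoriteStore = toggleFavorite(favoriteStore, "embedding/qwen3.5-9b");
favoriteStore = toggleFavorite(favoriteStore, "sam-desktop/qwen3.5-9b");

// One host is unreachable: its favorite is hidden but still stored.
const whileHostDown = visibleFavorites(favoriteStore, ["embedding/qwen3.5-9b"]);

// The host returns: the favorite reappears without user action.
const afterHostReturns = visibleFavorites(favoriteStore, [
  "embedding/qwen3.5-9b",
  "sam-desktop/qwen3.5-9b",
]);
```

The trade-off is the one already listed under remaining risks: keys for models that never return stay in storage, because nothing other than `toggleFavorite` removes them.
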