# multi-llama-swap-providers-model-favorites — implementation analysis

## Scope compared

- **Current state:** the shipped implementation in `apps/server`, `apps/coder`,
  `apps/web`, and `packages/contracts`
- **Desired state:** the behavior described in
  `docs/research/2026-06-10-multi-llama-swap-providers-model-favorites.md`
  and the corresponding OpenSpec batch

Purpose: determine the safest and most coherent implementation path before
building the feature.

## Conclusion

The best implementation path is to treat this as a **shared local-model
routing subsystem**, not as a picker-only UI feature.

That subsystem needs two interfaces:

1. **An in-process resolver** used directly by BooChat and native BooCoder
   paths.
2. **A gateway surface** for consumers that cannot call the resolver directly
   and still assume one OpenAI-compatible provider contract.

Without that split, the feature looks straightforward in BooChat but stays
architecturally broken in BooCoder because the existing opencode integration
collapses provider identity back to one local llama-swap endpoint.

## Current-state findings

### F-001 — config authority is split

- `apps/server` is driven by `LLAMA_SWAP_URL`, `LLAMA_SIDECAR_URL`, and
  `DEFAULT_MODEL`.
- `apps/coder` reuses `LLAMA_SWAP_URL` for local models and has a separate
  `data/coder-providers.json` for ACP providers.

Effect: there is no single source of truth for local model providers that both
apps can consume.

### F-002 — model identity is still a raw string everywhere that matters

- `sessions.model` is `TEXT NOT NULL`.
- `chats.model` is `TEXT`.
- `model-context.ts` caches by the raw model string.
- multiple dispatchers treat the model as an opaque string and infer behavior
  from prefixes.

Effect: duplicate model names across hosts cannot be represented safely without
composite IDs.

### F-003 — routing logic is duplicated and heuristic-heavy

- BooChat streaming uses `upstreamModel()` in `provider.ts`.
- non-streaming calls use `resolveModelEndpoint()`.
- context lookup bypasses both and fetches `LLAMA_SWAP_URL` directly.
- arena local calls bypass both and hit `LLAMA_SWAP_URL` directly.

Effect: even after adding a registry, call sites will diverge unless they all
share one resolver.

### F-004 — favorites are a UI concern backed by shared settings, not a server catalog concern

- The `settings` table is already the right persistence surface.
- BooChat already reads/writes server state.
- BooCoder currently keeps picker prefs in browser localStorage, but those are
  provider-specific UI prefs, not a shared favorite-model feature.

Effect: favorites should be stored server-side and derived in the client from
`/api/settings` + provider-aware model data.

### F-005 — BooCoder has a deeper coupling than the research initially surfaced

The dangerous assumption is not only in `dispatcher.ts`. It is in the whole
opencode local-model bridge:

- the snapshot merges local llama models into the `opencode` provider by
  prefixing them as `llama-swap/<model>`
- the dispatcher treats bare IDs as `llama-swap/<model>`
- the opencode backend parses `provider/model`
- current host opencode config points every local-model family at a single
  llama-swap base URL

Effect: translating `embedding/qwen3.5-9b` back to `llama-swap/qwen3.5-9b`
reintroduces the exact ambiguity this batch is trying to remove.

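To make that ambiguity concrete, here is a minimal sketch of what happens when a provider-qualified id is collapsed into the single `llama-swap/` namespace. The function name `toOpencodeId` and the second provider name `workstation` are hypothetical; the code only illustrates the pattern described in F-005 and is not the shipped bridge.

```typescript
// Hypothetical stand-in for the current bridge behavior: whatever provider
// prefix an id carries, the opencode-facing id is rebuilt under `llama-swap/`.
function toOpencodeId(modelRef: string): string {
  const slash = modelRef.indexOf("/");
  const model = slash === -1 ? modelRef : modelRef.slice(slash + 1);
  return `llama-swap/${model}`;
}

// Two distinct hosts that serve a model with the same name.
const fromEmbedding = toOpencodeId("embedding/qwen3.5-9b");
const fromWorkstation = toOpencodeId("workstation/qwen3.5-9b"); // hypothetical provider

// Both collapse to the same id, so the original host cannot be recovered.
const collided = fromEmbedding === fromWorkstation;
console.log(fromEmbedding, fromWorkstation, collided);
```

Any cache, DB column, or routing decision keyed on the collapsed id inherits the same collision.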
### F-006 — Arena is a separate local-model consumer, not just another caller

Arena currently:

- builds its "local model" set from one live llama-swap list
- classifies local-vs-cloud contestants from that set
- performs one-shot local calls directly against `LLAMA_SWAP_URL`

Effect: arena needs the same provider-aware resolver as BooChat, but it does
not need the full BooChat picker/favorites work.

## Gap summary

### G-001 — no shared local-provider registry

What is missing:

- one schema and one loader contract for named local providers consumed by
  both server and coder

Why it matters:

- every downstream fix becomes duplicated if config remains split

### G-002 — no canonical model-ref format and parser

What is missing:

- a shared `provider/model` identity format and parse/format helpers

Why it matters:

- caches, DB values, routing, and UI rendering cannot stay aligned otherwise

### G-003 — no single provider-aware resolver

What is missing:

- one shared resolver API for:
  - route selection
  - base URL selection
  - sidecar selection
  - wire-model extraction
  - context-props endpoint selection

Why it matters:

- keeping separate "streaming", "non-streaming", "context", and "arena"
  resolution paths will re-create subtle bugs

### G-004 — no neutral provider-aware catalog contract

What is missing:

- a provider-aware model catalog response that exposes providers and models
  without baking favorites into the server payload

Why it matters:

- BooChat and BooCoder both need provider metadata, but favorites are derived
  from user settings, not from upstream inventory

### G-005 — no safe path for opencode local-model parity

What is missing:

- either:
  - a generated/synced opencode-facing local-model config, or
  - a BooCoder-hosted OpenAI-compatible gateway that preserves provider
    identity under one provider namespace, or
  - a deliberate scope cut that removes multi-provider local models from the
    `opencode` provider until that bridge exists

Why it matters:

- without one of these, the feature is correct in BooChat but false-advertised
  in the `opencode` provider

## Recommended architecture

### 1. Shared local-provider registry

Add a new shared config surface for local inference providers, separate from
`data/coder-providers.json`.

Recommendation:

- schema in `packages/contracts`
- live file such as `/data/llama-providers.json`
- fallback synthesis from `LLAMA_SWAP_URL` and `LLAMA_SIDECAR_URL` while the
  file is absent

This keeps ACP provider management and local model provider management as two
separate concerns.

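The registry and its fallback could look roughly like the sketch below. Everything beyond the two environment variable names is an assumption for illustration: the field names (`name`, `baseUrl`, `sidecarUrl`, `isDefault`), the loader name `loadLocalProviders`, the choice of `llama-swap` as the synthesized provider name, and the example URL. A real loader in `packages/contracts` would also validate each entry rather than cast.

```typescript
// Hypothetical shape of one entry in the local-provider registry.
interface LocalProvider {
  name: string;        // provider segment of a `provider/model` ref
  baseUrl: string;     // llama-swap base URL for this host
  sidecarUrl?: string; // optional sidecar URL
  isDefault: boolean;  // target for legacy bare model ids
}

type Env = Record<string, string | undefined>;

// Use the registry file contents when present; otherwise synthesize a single
// default provider from the legacy environment variables.
function loadLocalProviders(fileContents: string | null, env: Env): LocalProvider[] {
  if (fileContents !== null) {
    const parsed: unknown = JSON.parse(fileContents);
    if (!Array.isArray(parsed)) {
      throw new Error("provider registry must be a JSON array");
    }
    return parsed as LocalProvider[]; // sketch only: no per-entry validation
  }
  const baseUrl = env["LLAMA_SWAP_URL"];
  if (!baseUrl) return [];
  const sidecarUrl = env["LLAMA_SIDECAR_URL"];
  return [
    {
      name: "llama-swap", // assumed name for the synthesized default
      baseUrl,
      ...(sidecarUrl ? { sidecarUrl } : {}),
      isDefault: true,
    },
  ];
}

// File absent: the legacy variables still produce one usable provider.
const fallbackProviders = loadLocalProviders(null, {
  LLAMA_SWAP_URL: "http://localhost:8080", // example value
});
console.log(fallbackProviders);
```

Passing the environment in as a plain object keeps the loader testable and lets server and coder share it without either one reading globals.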
### 2. Shared model-ref and resolver helpers

Add shared helpers for:

- parsing `provider/model`
- resolving legacy bare IDs to the default provider
- deciding route type
- selecting upstream base URL
- extracting the wire model id

All of these should be used by:

- server streaming inference
- server non-streaming calls
- model-context lookup
- arena one-shot local calls
- any future control-plane or routing feature

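A minimal sketch of such a helper follows, covering local providers only (route-type decisions such as local versus cloud are left out). The names `resolveModel`, `ResolvedModel`, and the provider fields are hypothetical, as are the URLs. One assumption worth flagging: the prefix before the first `/` counts as a provider only when it matches a registered provider name, so bare ids that happen to contain a slash still fall back to the default provider.

```typescript
interface LocalProvider {
  name: string;
  baseUrl: string;
  isDefault: boolean;
}

interface ResolvedModel {
  provider: string;  // registry name that will serve the request
  wireModel: string; // model id sent upstream, without the provider prefix
  baseUrl: string;   // upstream base URL for that provider
  canonical: string; // `provider/model`, usable as a cache key or DB value
}

function resolveModel(ref: string, providers: LocalProvider[]): ResolvedModel {
  const slash = ref.indexOf("/");
  const prefix = slash === -1 ? undefined : ref.slice(0, slash);
  // Treat the prefix as a provider only if it names a registered one.
  const named =
    prefix === undefined ? undefined : providers.find((p) => p.name === prefix);
  // Legacy bare ids resolve to the default provider.
  const provider = named ?? providers.find((p) => p.isDefault);
  if (!provider) {
    throw new Error(`no local provider available for model ref: ${ref}`);
  }
  const wireModel = named ? ref.slice(slash + 1) : ref;
  return {
    provider: provider.name,
    wireModel,
    baseUrl: provider.baseUrl,
    canonical: `${provider.name}/${wireModel}`,
  };
}

const exampleProviders: LocalProvider[] = [
  { name: "llama-swap", baseUrl: "http://localhost:8080", isDefault: true },
  { name: "embedding", baseUrl: "http://localhost:8081", isDefault: false },
];

const qualified = resolveModel("embedding/qwen3.5-9b", exampleProviders);
const legacy = resolveModel("qwen3.5-9b", exampleProviders);
console.log(qualified.canonical, legacy.canonical);
```

Returning the canonical ref alongside the wire id lets every caller key caches on the former and send only the latter upstream.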
### 3. Provider-aware catalog, client-derived favorites

Do **not** make the server return a synthetic Favorites section.

Instead:

- `/api/models` (or a replacement contract) should return provider-grouped
  inventory only
- `/api/settings` should hold `favorite_models: string[]`
- BooChat and BooCoder should derive:
  - Favorites first
  - then provider sections
  - hide unavailable favorites without deleting them

This keeps the server contract inventory-shaped and the favorite behavior
user-shaped.

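The client-side derivation could be sketched as below. The types `CatalogProvider` and `PickerSection` and the function `buildPickerSections` are hypothetical, and the sketch assumes favorites are stored as canonical `provider/model` strings. Unavailable favorites are filtered out of the rendered list only; the stored array is never mutated.

```typescript
interface CatalogProvider {
  name: string;
  models: string[]; // wire model ids reported by this provider
}

interface PickerSection {
  title: string;
  modelRefs: string[]; // canonical `provider/model` refs
}

function buildPickerSections(
  catalog: CatalogProvider[],
  favoriteModels: string[],
): PickerSection[] {
  // Index every model that is currently available, by canonical ref.
  const available = new Set<string>();
  for (const provider of catalog) {
    for (const model of provider.models) {
      available.add(`${provider.name}/${model}`);
    }
  }

  const sections: PickerSection[] = [];

  // Favorites first; unavailable ones are hidden, not deleted from settings.
  const visibleFavorites = favoriteModels.filter((ref) => available.has(ref));
  if (visibleFavorites.length > 0) {
    sections.push({ title: "Favorites", modelRefs: visibleFavorites });
  }

  // Then one section per provider, in catalog order.
  for (const provider of catalog) {
    sections.push({
      title: provider.name,
      modelRefs: provider.models.map((model) => `${provider.name}/${model}`),
    });
  }
  return sections;
}

const storedFavorites = ["embedding/qwen3.5-9b", "offline-host/some-model"];
const pickerSections = buildPickerSections(
  [
    { name: "llama-swap", models: ["qwen3.5-9b"] },
    { name: "embedding", models: ["qwen3.5-9b"] },
  ],
  storedFavorites,
);
console.log(pickerSections);
```

Because the server payload stays inventory-only, BooChat and BooCoder can both run this same derivation against their own view of `/api/settings`.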
### 4. Treat BooCoder native and BooCoder external-agent paths differently

There are two different BooCoder consumers:

- **native `boocode` provider**
- **external-agent providers like `opencode`**

The native `boocode` provider can adopt the shared resolver directly.

The `opencode` provider cannot safely adopt `provider/model` by simple string
translation, because its current local-model bridge still assumes one local
provider.

Recommendation:

- ship native `boocode` provider parity first
- do **not** claim `opencode` parity until provider identity is preserved
  end-to-end there too

### 5. Preferred parity path for opencode: a BooCoder-hosted local-model gateway

If full `opencode` parity is required in the same initiative, the cleanest path
is a small OpenAI-compatible gateway inside `apps/coder`:

- accepts model ids that still carry provider identity
- strips provider prefix only at the final upstream boundary
- routes to the correct local provider
- becomes the single local-model base URL for `opencode`

Why this is better than adding many direct opencode providers:

- one stable provider contract for opencode
- no duplicated base-URL registry in opencode config
- the same gateway can serve arena/local utility calls later
- it stays inside an existing always-on service, not a new third service

If this gateway is not in scope now, the correct fallback is to remove or hide
multi-provider local models from the `opencode` provider until the bridge is
real.

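The core of such a gateway is a small request rewrite, sketched here under stated assumptions. The names `routeGatewayRequest`, `ChatRequestBody`, and `UpstreamRequest` are hypothetical, the URLs are examples, and the sketch assumes each local provider serves the conventional OpenAI-compatible `/v1/chat/completions` path. It shows only the routing decision, not the HTTP server or response streaming.

```typescript
// Minimal view of an OpenAI-style chat request: only `model` matters here.
interface ChatRequestBody {
  model: string;
  [key: string]: unknown;
}

interface UpstreamRequest {
  url: string;
  body: ChatRequestBody;
}

// The provider prefix survives until this point and is stripped only when the
// upstream request is built.
function routeGatewayRequest(
  body: ChatRequestBody,
  baseUrls: Map<string, string>, // provider name -> base URL
): UpstreamRequest {
  const slash = body.model.indexOf("/");
  if (slash === -1) {
    throw new Error(`model id carries no provider prefix: ${body.model}`);
  }
  const provider = body.model.slice(0, slash);
  const baseUrl = baseUrls.get(provider);
  if (baseUrl === undefined) {
    throw new Error(`unknown local provider: ${provider}`);
  }
  return {
    url: `${baseUrl}/v1/chat/completions`,
    body: { ...body, model: body.model.slice(slash + 1) },
  };
}

const gatewayBaseUrls = new Map<string, string>([
  ["llama-swap", "http://localhost:8080"],
  ["embedding", "http://localhost:8081"],
]);

const routed = routeGatewayRequest(
  { model: "embedding/qwen3.5-9b", stream: true },
  gatewayBaseUrls,
);
console.log(routed.url, routed.body.model);
```

In this sketch the gateway rejects bare ids outright; whether it should instead apply the legacy default-provider fallback is a design decision the batch would need to make.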
## Recommended sequence

### Phase 1 — shared foundation

- shared local-provider config schema
- shared `provider/model` parsing helpers
- shared resolver
- legacy bare-id fallback

### Phase 2 — BooChat + native BooCoder

- provider-aware model catalog
- server inference routing updates
- model-context cache-key fix
- compaction and task-model endpoint resolution
- BooChat picker grouping + server-side favorites
- BooCoder `boocode` provider model list grouped by local provider

### Phase 3 — arena parity

- local-model set built from the shared provider catalog, not from one
  llama-swap list
- one-shot local calls use the shared resolver

### Phase 4 — opencode parity

Choose one:

- preferred: BooCoder-hosted local-model gateway plus opencode-facing model
  sync
- fallback: temporarily stop advertising multi-provider local models under the
  `opencode` provider

### Phase 5 — BooControl

- build BooControl only after the local-provider registry and canonical model
  identity land

## What this changes in the existing OpenSpec batch

1. The design should treat favorites as **client-derived from settings**, not
   as a server-generated catalog section.
2. The design should explicitly separate **native BooCoder parity** from
   **opencode parity**.
3. The tasks should call out the `opencode` bridge as a dedicated risk area,
   not as a small dispatcher rename.

## Recommendation

Implement the shared local-provider registry and resolver first, then ship
BooChat plus native BooCoder on top of it. Treat `opencode` multi-provider
support as a distinct integration seam that either gets a real gateway or stays
out of scope for the first slice.

That is the fastest path that is still architecturally honest.