chore: snapshot working tree - pty_exited notifications + in-flight inference WIP
feat(booterm): structured pty_exited WS notifications. Plan-validated, impl-validated, code-reviewed green (contracts build clean, contracts test 29/29, booterm + web typecheck clean). wip: in-progress inference/provider refactor (agents.ts, provider.ts, new llama-providers.ts, removed llama-args-validator), plus arena, dispatcher, compaction, schema changes. openspec: pty-exit-notifications complete; x-agent-flags planned (not yet implemented).
@@ -0,0 +1,311 @@
|
||||
# multi-llama-swap-providers-model-favorites — implementation analysis
|
||||
|
||||
## Scope compared
|
||||
|
||||
- **Current state:** the shipped implementation in `apps/server`, `apps/coder`,
|
||||
`apps/web`, and `packages/contracts`
|
||||
- **Desired state:** the behavior described in
|
||||
`docs/research/2026-06-10-multi-llama-swap-providers-model-favorites.md`
|
||||
and the corresponding OpenSpec batch
|
||||
|
||||
Purpose: determine the safest and most coherent implementation path before
|
||||
building the feature.
|
||||
|
||||
## Conclusion
|
||||
|
||||
The best implementation path is to treat this as a **shared local-model
|
||||
routing subsystem**, not as a picker-only UI feature.
|
||||
|
||||
That subsystem needs two interfaces:
|
||||
|
||||
1. **An in-process resolver** used directly by BooChat and native BooCoder
|
||||
paths.
|
||||
2. **A gateway surface** for consumers that cannot call the resolver directly
|
||||
and still assume one OpenAI-compatible provider contract.
|
||||
|
||||
Without that split, the feature looks straightforward in BooChat but stays
|
||||
architecturally broken in BooCoder because the existing opencode integration
|
||||
collapses provider identity back to one local llama-swap endpoint.
|
||||
|
||||
## Current-state findings
|
||||
|
||||
### F-001 — config authority is split
|
||||
|
||||
- `apps/server` is driven by `LLAMA_SWAP_URL`, `LLAMA_SIDECAR_URL`, and
|
||||
`DEFAULT_MODEL`.
|
||||
- `apps/coder` reuses `LLAMA_SWAP_URL` for local models and has a separate
|
||||
`data/coder-providers.json` for ACP providers.
|
||||
|
||||
Effect: there is no single source of truth for local model providers that both
|
||||
apps can consume.
|
||||
|
||||
### F-002 — model identity is still a raw string everywhere that matters
|
||||
|
||||
- `sessions.model` is `TEXT NOT NULL`.
|
||||
- `chats.model` is `TEXT`.
|
||||
- `model-context.ts` caches by the raw model string.
|
||||
- multiple dispatchers treat the model as an opaque string and infer behavior
|
||||
from prefixes.
|
||||
|
||||
Effect: duplicate model names across hosts cannot be represented safely without
|
||||
composite IDs.
|
||||
|
||||
### F-003 — routing logic is duplicated and heuristic-heavy
|
||||
|
||||
- BooChat streaming uses `upstreamModel()` in `provider.ts`.
|
||||
- non-streaming calls use `resolveModelEndpoint()`.
|
||||
- context lookup bypasses both and fetches `LLAMA_SWAP_URL` directly.
|
||||
- arena local calls bypass both and hit `LLAMA_SWAP_URL` directly.
|
||||
|
||||
Effect: even after adding a registry, call sites will diverge unless they all
|
||||
share one resolver.
|
||||
|
||||
### F-004 — favorites are a UI concern backed by shared settings, not a server catalog concern
|
||||
|
||||
- The `settings` table is already the right persistence surface.
|
||||
- BooChat already reads/writes server state.
|
||||
- BooCoder currently keeps picker prefs in browser localStorage, but those are
|
||||
provider-specific UI prefs, not a shared favorite-model feature.
|
||||
|
||||
Effect: favorites should be stored server-side, with the Favorites view derived
in the client from `/api/settings` plus provider-aware model data.
|
||||
|
||||
### F-005 — BooCoder has a deeper coupling than the research initially surfaced
|
||||
|
||||
The dangerous assumption is not only in `dispatcher.ts`. It is in the whole
|
||||
opencode local-model bridge:
|
||||
|
||||
- the snapshot merges local llama models into the `opencode` provider by
|
||||
prefixing them as `llama-swap/<model>`
|
||||
- the dispatcher treats bare IDs as `llama-swap/<model>`
|
||||
- the opencode backend parses `provider/model`
|
||||
- current host opencode config points every local-model family at a single
|
||||
llama-swap base URL
|
||||
|
||||
Effect: translating `embedding/qwen3.5-9b` back to `llama-swap/qwen3.5-9b`
|
||||
reintroduces the exact ambiguity this batch is trying to remove.
|
||||
|
||||
### F-006 — Arena is a separate local-model consumer, not just another caller
|
||||
|
||||
Arena currently:
|
||||
|
||||
- builds its "local model" set from one live llama-swap list
|
||||
- classifies local-vs-cloud contestants from that set
|
||||
- performs one-shot local calls directly against `LLAMA_SWAP_URL`
|
||||
|
||||
Effect: arena needs the same provider-aware resolver as BooChat, but it does
|
||||
not need the full BooChat picker/favorites work.
|
||||
|
||||
## Gap summary
|
||||
|
||||
### G-001 — no shared local-provider registry
|
||||
|
||||
What is missing:
|
||||
|
||||
- one schema and one loader contract for named local providers consumed by
|
||||
both server and coder
|
||||
|
||||
Why it matters:
|
||||
|
||||
- every downstream fix becomes duplicated if config remains split
|
||||
|
||||
### G-002 — no canonical model-ref format and parser
|
||||
|
||||
What is missing:
|
||||
|
||||
- a shared `provider/model` identity format and parse/format helpers
|
||||
|
||||
Why it matters:
|
||||
|
||||
- caches, DB values, routing, and UI rendering cannot stay aligned otherwise
|
||||
|
||||
### G-003 — no single provider-aware resolver
|
||||
|
||||
What is missing:
|
||||
|
||||
- one shared resolver API for:
  - route selection
  - base URL selection
  - sidecar selection
  - wire-model extraction
  - context-props endpoint selection
|
||||
|
||||
Why it matters:
|
||||
|
||||
- keeping separate "streaming", "non-streaming", "context", and "arena"
|
||||
resolution paths will re-create subtle bugs
|
||||
|
||||
### G-004 — no neutral provider-aware catalog contract
|
||||
|
||||
What is missing:
|
||||
|
||||
- a provider-aware model catalog response that exposes providers and models
|
||||
without baking favorites into the server payload
|
||||
|
||||
Why it matters:
|
||||
|
||||
- BooChat and BooCoder both need provider metadata, but favorites are derived
|
||||
from user settings, not from upstream inventory
|
||||
|
||||
### G-005 — no safe path for opencode local-model parity
|
||||
|
||||
What is missing:
|
||||
|
||||
- either:
  - a generated/synced opencode-facing local-model config, or
  - a BooCoder-hosted OpenAI-compatible gateway that preserves provider
    identity under one provider namespace, or
  - a deliberate scope cut that removes multi-provider local models from the
    `opencode` provider until that bridge exists
|
||||
|
||||
Why it matters:
|
||||
|
||||
- without one of these, the feature is correct in BooChat but falsely advertised
  in the `opencode` provider
|
||||
|
||||
## Recommended architecture
|
||||
|
||||
### 1. Shared local-provider registry
|
||||
|
||||
Add a new shared config surface for local inference providers, separate from
|
||||
`data/coder-providers.json`.
|
||||
|
||||
Recommendation:
|
||||
|
||||
- schema in `packages/contracts`
|
||||
- live file such as `/data/llama-providers.json`
|
||||
- fallback synthesis from `LLAMA_SWAP_URL` and `LLAMA_SIDECAR_URL` while the
|
||||
file is absent
|
||||
|
||||
This keeps ACP provider management and local model provider management as two
|
||||
separate concerns.
|
||||
|
||||
### 2. Shared model-ref and resolver helpers
|
||||
|
||||
Add shared helpers for:
|
||||
|
||||
- parsing `provider/model`
|
||||
- resolving legacy bare IDs to the default provider
|
||||
- deciding route type
|
||||
- selecting upstream base URL
|
||||
- extracting the wire model id
|
||||
|
||||
All of these should be used by:
|
||||
|
||||
- server streaming inference
|
||||
- server non-streaming calls
|
||||
- model-context lookup
|
||||
- arena one-shot local calls
|
||||
- any future control-plane or routing feature
|
||||
|
||||
### 3. Provider-aware catalog, client-derived favorites
|
||||
|
||||
Do **not** make the server return a synthetic Favorites section.
|
||||
|
||||
Instead:
|
||||
|
||||
- `/api/models` (or a replacement contract) should return provider-grouped
|
||||
inventory only
|
||||
- `/api/settings` should hold `favorite_models: string[]`
|
||||
- BooChat and BooCoder should derive:
  - Favorites first
  - then provider sections
  - hide unavailable favorites without deleting them
|
||||
|
||||
This keeps the server contract inventory-shaped and the favorite behavior
|
||||
user-shaped.
|
||||
|
||||
### 4. Treat BooCoder native and BooCoder external-agent paths differently
|
||||
|
||||
There are two different BooCoder consumers:
|
||||
|
||||
- **native `boocode` provider**
|
||||
- **external-agent providers like `opencode`**
|
||||
|
||||
The native `boocode` provider can adopt the shared resolver directly.
|
||||
|
||||
The `opencode` provider cannot safely adopt `provider/model` by simple string
|
||||
translation, because its current local-model bridge still assumes one local
|
||||
provider.
|
||||
|
||||
Recommendation:
|
||||
|
||||
- ship native `boocode` provider parity first
|
||||
- do **not** claim `opencode` parity until provider identity is preserved
|
||||
end-to-end there too
|
||||
|
||||
### 5. Preferred parity path for opencode: a BooCoder-hosted local-model gateway
|
||||
|
||||
If full `opencode` parity is required in the same initiative, the cleanest path
|
||||
is a small OpenAI-compatible gateway inside `apps/coder`:
|
||||
|
||||
- accepts model ids that still carry provider identity
|
||||
- strips provider prefix only at the final upstream boundary
|
||||
- routes to the correct local provider
|
||||
- becomes the single local-model base URL for `opencode`
|
||||
|
||||
Why this is better than adding many direct opencode providers:
|
||||
|
||||
- one stable provider contract for opencode
|
||||
- no duplicated base-URL registry in opencode config
|
||||
- the same gateway can serve arena/local utility calls later
|
||||
- it stays inside an existing always-on service, not a new third service
|
||||
|
||||
If this gateway is not in scope now, the correct fallback is to remove or hide
|
||||
multi-provider local models from the `opencode` provider until the bridge is
|
||||
real.
|
||||
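The gateway behavior listed above can be sketched as a pure request-planning function. This is a minimal illustration under stated assumptions, not the shipped gateway: `LocalProvider`, `GatewayPlan`, and `planGatewayCall` are hypothetical names, the upstream path `/v1/chat/completions` is assumed from the OpenAI-compatible contract, and rejecting bare IDs with a 400 is a choice made for this sketch only.

```typescript
// Hypothetical types; none of these names are taken from the repo.
interface LocalProvider {
  id: string;
  baseUrl: string;
}

type ChatBody = { model: string } & Record<string, unknown>;

type GatewayPlan =
  | { status: 200; upstreamUrl: string; upstreamBody: ChatBody }
  | { status: 400; error: string };

// Decides where a request goes without performing any network call.
function planGatewayCall(
  body: ChatBody,
  providers: readonly LocalProvider[],
): GatewayPlan {
  const slash = body.model.indexOf("/");
  // Assumption: the gateway only accepts ids that still carry a provider.
  if (slash <= 0 || slash === body.model.length - 1) {
    return { status: 400, error: `model "${body.model}" is not a provider/model id` };
  }
  const providerId = body.model.slice(0, slash);
  const provider = providers.find((p) => p.id === providerId);
  if (provider === undefined) {
    // Unknown provider: reject before any upstream call is made.
    return { status: 400, error: `unknown provider "${providerId}"` };
  }
  return {
    status: 200,
    upstreamUrl: `${provider.baseUrl}/v1/chat/completions`,
    // The provider prefix is stripped only here, at the upstream boundary.
    upstreamBody: { ...body, model: body.model.slice(slash + 1) },
  };
}
```

Keeping the plan separate from the HTTP call makes the duplicate-name case easy to check: two ids that share a wire name must produce two different `upstreamUrl` values.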
|
||||
## Recommended sequence
|
||||
|
||||
### Phase 1 — shared foundation
|
||||
|
||||
- shared local-provider config schema
|
||||
- shared `provider/model` parsing helpers
|
||||
- shared resolver
|
||||
- legacy bare-id fallback
|
||||
|
||||
### Phase 2 — BooChat + native BooCoder
|
||||
|
||||
- provider-aware model catalog
|
||||
- server inference routing updates
|
||||
- model-context cache-key fix
|
||||
- compaction and task-model endpoint resolution
|
||||
- BooChat picker grouping + server-side favorites
|
||||
- BooCoder `boocode` provider model list grouped by local provider
|
||||
|
||||
### Phase 3 — arena parity
|
||||
|
||||
- local-model set built from the shared provider catalog, not one llama-swap list
|
||||
- one-shot local calls use the shared resolver
|
||||
|
||||
### Phase 4 — opencode parity
|
||||
|
||||
Choose one:
|
||||
|
||||
- preferred: BooCoder-hosted local-model gateway plus opencode-facing model
|
||||
sync
|
||||
- fallback: temporarily stop advertising multi-provider local models under the
|
||||
`opencode` provider
|
||||
|
||||
### Phase 5 — boocontrol
|
||||
|
||||
- build BooControl only after the local-provider registry and canonical model
|
||||
identity land
|
||||
|
||||
## What this changes in the existing OpenSpec batch
|
||||
|
||||
1. The design should treat favorites as **client-derived from settings**, not
|
||||
as a server-generated catalog section.
|
||||
2. The design should explicitly separate **native BooCoder parity** from
|
||||
**opencode parity**.
|
||||
3. The tasks should call out the `opencode` bridge as a dedicated risk area,
|
||||
not as a small dispatcher rename.
|
||||
|
||||
## Recommendation
|
||||
|
||||
Implement the shared local-provider registry and resolver first, then ship
|
||||
BooChat plus native BooCoder on top of it. Treat `opencode` multi-provider
|
||||
support as a distinct integration seam that either gets a real gateway or stays
|
||||
out of scope for the first slice.
|
||||
|
||||
That is the fastest path that is still architecturally honest.
|
||||
@@ -0,0 +1,238 @@
|
||||
# multi-llama-swap-providers-model-favorites — design
|
||||
|
||||
Detailed implementation plan for named local model providers, composite model
|
||||
IDs, grouped pickers, and shared favorites across BooChat and BooCoder.
|
||||
|
||||
## 1. Current state
|
||||
|
||||
Today the repo splits inference configuration across two incompatible shapes:
|
||||
|
||||
- `apps/server` reads env vars such as `LLAMA_SWAP_URL`, `LLAMA_SIDECAR_URL`,
|
||||
and `DEFAULT_MODEL`.
|
||||
- `apps/coder` reads the same `LLAMA_SWAP_URL` for BooCode's own provider, plus
|
||||
`data/coder-providers.json` for ACP providers.
|
||||
|
||||
That leaves several hardcoded single-endpoint assumptions:
|
||||
|
||||
- `/api/models` fetches from one llama-swap endpoint plus optional DeepSeek.
|
||||
- `provider.ts` routes by `deepseek-` name prefix and a global sidecar default.
|
||||
- `model-context.ts` caches by bare model string.
|
||||
- `compaction.ts`, `task-model.ts`, and coder arena use a single upstream URL.
|
||||
- BooCoder prepends `llama-swap/` and treats any other slash-containing value
|
||||
as an already-routable provider namespace.
|
||||
|
||||
## 2. Design principles
|
||||
|
||||
1. Provider identity is explicit.
|
||||
2. Wire model IDs stay bare; persisted model IDs are composite.
|
||||
3. Legacy bare model IDs remain readable indefinitely.
|
||||
4. Favorites are shared across BooChat and BooCoder.
|
||||
5. Sidecar routing is opt-in per provider, not a global fallback.
|
||||
6. Any cache keyed by model identity uses the full composite ID.
|
||||
|
||||
## 3. Recommended config authority
|
||||
|
||||
Introduce a new shared file for local inference providers:
|
||||
|
||||
- Live path: `/data/llama-providers.json`
|
||||
- Env var for both apps: `LLAMA_PROVIDERS_PATH`
|
||||
- Tracked example: `data/llama-providers.example.json`
|
||||
|
||||
Recommended shape:
|
||||
|
||||
```json
{
  "defaultProvider": "sam-desktop",
  "providers": [
    {
      "id": "sam-desktop",
      "label": "Sam-desktop",
      "baseUrl": "http://100.101.41.16:8401",
      "sidecarUrl": "http://100.101.41.16:8402",
      "kind": "llama-swap"
    },
    {
      "id": "embedding",
      "label": "embedding",
      "baseUrl": "http://100.90.172.55:8411",
      "kind": "llama-swap"
    }
  ]
}
```
|
||||
|
||||
Rules:
|
||||
|
||||
- If the file is missing, synthesize a single legacy provider from
|
||||
`LLAMA_SWAP_URL` and optional `LLAMA_SIDECAR_URL`.
|
||||
- `data/coder-providers.json` remains the ACP registry and is not extended with
|
||||
llama-swap base URLs.
|
||||
- DeepSeek credentials remain env-backed, but the model catalog should expose a
|
||||
synthetic provider group such as `deepseek` so routing no longer depends on a
|
||||
bare `deepseek-` prefix.
|
||||
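The missing-file rule above could look roughly like the following loader. It is a sketch under assumptions: `loadLocalProviders`, `LegacyEnv`, and the synthesized provider id `local` are hypothetical, the file content is passed in as text so the example stays free of filesystem calls, and returning an empty provider list when neither the file nor `LLAMA_SWAP_URL` exists is a guess at behavior the design does not specify.

```typescript
interface LocalProvider {
  id: string;
  label: string;
  baseUrl: string;
  sidecarUrl?: string;
  kind: "llama-swap";
}

interface LocalProviderRegistry {
  defaultProvider: string;
  providers: LocalProvider[];
}

interface LegacyEnv {
  LLAMA_SWAP_URL?: string;
  LLAMA_SIDECAR_URL?: string;
}

// Hypothetical id for the synthesized provider; the design does not name one.
const LEGACY_PROVIDER_ID = "local";

// `fileText` is the content of the providers file, or null when it is missing.
function loadLocalProviders(
  fileText: string | null,
  env: LegacyEnv,
): LocalProviderRegistry {
  if (fileText !== null) {
    // No schema validation here; a real loader would validate the parsed shape.
    const parsed = JSON.parse(fileText) as LocalProviderRegistry;
    if (!parsed.providers.some((p) => p.id === parsed.defaultProvider)) {
      throw new Error(`defaultProvider "${parsed.defaultProvider}" is not a listed provider`);
    }
    return parsed;
  }
  if (!env.LLAMA_SWAP_URL) {
    // Assumption: start with no local providers rather than fail startup.
    return { defaultProvider: LEGACY_PROVIDER_ID, providers: [] };
  }
  // Synthesize one legacy provider from the existing env vars.
  const legacy: LocalProvider = {
    id: LEGACY_PROVIDER_ID,
    label: LEGACY_PROVIDER_ID,
    baseUrl: env.LLAMA_SWAP_URL,
    kind: "llama-swap",
    ...(env.LLAMA_SIDECAR_URL ? { sidecarUrl: env.LLAMA_SIDECAR_URL } : {}),
  };
  return { defaultProvider: LEGACY_PROVIDER_ID, providers: [legacy] };
}
```

Because the fallback returns the same registry shape as the file, downstream code does not need a separate legacy branch.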
|
||||
## 4. Model identity and parsing
|
||||
|
||||
Persist model selections as `provider/model`.
|
||||
|
||||
Examples:
|
||||
|
||||
- `sam-desktop/qwen3.6-35b-a3b`
|
||||
- `embedding/gemma-4-12b`
|
||||
- `deepseek/deepseek-v4-pro`
|
||||
|
||||
Helper behavior:
|
||||
|
||||
- `parseModelRef(id)` returns `{ providerId, wireModelId, isLegacyBareId }`
|
||||
- Bare IDs resolve to `{ providerId: defaultProvider, wireModelId: id, isLegacyBareId: true }`
|
||||
- Only strip the prefix at the final wire-call boundary
|
||||
|
||||
This preserves existing `TEXT` columns while fixing duplicate-name ambiguity.
|
||||
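As an illustration of the helper behavior described above, a parser and formatter might look like this. It is a sketch, not the shipped helper: the `ModelRef` interface name and the `formatModelRef` function are hypothetical, and splitting on the first `/` assumes that bare wire model IDs never contain a slash themselves.

```typescript
// Field names follow the helper description; the interface name is hypothetical.
interface ModelRef {
  providerId: string;
  wireModelId: string;
  isLegacyBareId: boolean;
}

// Assumption: a composite id has a non-empty provider and a non-empty model
// on either side of its first "/"; anything else is treated as a bare id.
function parseModelRef(id: string, defaultProvider: string): ModelRef {
  const slash = id.indexOf("/");
  if (slash > 0 && slash < id.length - 1) {
    return {
      providerId: id.slice(0, slash),
      wireModelId: id.slice(slash + 1),
      isLegacyBareId: false,
    };
  }
  // Legacy bare id: route to the default provider and keep the id as-is.
  return { providerId: defaultProvider, wireModelId: id, isLegacyBareId: true };
}

// Inverse helper used when persisting a new selection.
function formatModelRef(providerId: string, wireModelId: string): string {
  return `${providerId}/${wireModelId}`;
}
```

Callers would keep the composite string for storage and cache keys and read `wireModelId` only when building the upstream request.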
|
||||
## 5. Server changes
|
||||
|
||||
### 5.1 Shared registry + model catalog
|
||||
|
||||
Add shared registry utilities in `packages/contracts` plus server-side loaders
|
||||
used by:
|
||||
|
||||
- `apps/server/src/config.ts`
|
||||
- `apps/server/src/routes/models.ts`
|
||||
- `apps/server/src/services/inference/provider.ts`
|
||||
- `apps/server/src/services/model-context.ts`
|
||||
- `apps/server/src/services/task-model.ts`
|
||||
- `apps/server/src/services/compaction.ts`
|
||||
|
||||
`GET /api/models` should return a provider-aware payload. Recommended shape:
|
||||
|
||||
```ts
interface ModelCatalogProvider {
  id: string;
  label: string;
  models: ModelInfo[];
}

interface ModelCatalogResponse {
  providers: ModelCatalogProvider[];
}
```
|
||||
|
||||
Where each `ModelInfo.id` is already composite.
|
||||
|
||||
Favorites should **not** be embedded in this payload. They are a user-level
|
||||
view derived in the client from `favorite_models` in `/api/settings`.
|
||||
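To make the payload concrete, the following sketch assembles such a catalog from per-provider model listings. The `ModelInfo` here is a minimal stand-in (the real type in the repo likely carries more fields), and `ProviderListing` and `buildCatalog` are hypothetical names introduced only for illustration.

```typescript
// Minimal stand-in for the repo's ModelInfo; only `id` matters for this sketch.
interface ModelInfo {
  id: string;
  name: string;
}

interface ModelCatalogProvider {
  id: string;
  label: string;
  models: ModelInfo[];
}

interface ModelCatalogResponse {
  providers: ModelCatalogProvider[];
}

// Hypothetical input: the bare model ids each upstream reported.
interface ProviderListing {
  id: string;
  label: string;
  wireModelIds: string[];
}

// Builds an inventory-only payload; favorites are deliberately not part of it.
function buildCatalog(listings: readonly ProviderListing[]): ModelCatalogResponse {
  return {
    providers: listings.map((p) => ({
      id: p.id,
      label: p.label,
      // Each model id becomes composite so duplicate names stay distinct.
      models: p.wireModelIds.map((wire) => ({ id: `${p.id}/${wire}`, name: wire })),
    })),
  };
}
```

With this shape, a model name that exists on two hosts appears twice in the response, once under each provider, with two different `id` values.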
|
||||
### 5.2 Routing
|
||||
|
||||
Replace string-heuristic routing with provider-aware resolution:
|
||||
|
||||
- `sam-desktop/*` routes to `baseUrl` or `sidecarUrl` depending on agent flags
|
||||
and provider capabilities.
|
||||
- `embedding/*` always routes directly to its llama-swap `baseUrl`.
|
||||
- `deepseek/*` routes to the DeepSeek SDK provider.
|
||||
|
||||
`resolveModelEndpoint()` and `upstreamModel()` must both resolve from the same
|
||||
parsed model reference to keep streaming and non-streaming behavior aligned.
|
||||
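The routing rules above can be sketched as one resolver that both call paths would share. This is an assumption-laden illustration: `resolveRoute` and the `Route` type are hypothetical, the boolean `wantsSidecar` stands in for whatever the agent flags actually are, and the DeepSeek branch only returns a marker rather than calling any SDK.

```typescript
interface LocalProvider {
  id: string;
  baseUrl: string;
  sidecarUrl?: string;
}

interface Registry {
  defaultProvider: string;
  providers: LocalProvider[];
}

type Route =
  | { kind: "llama-swap" | "sidecar"; baseUrl: string; wireModelId: string }
  | { kind: "deepseek"; wireModelId: string };

// `wantsSidecar` is a hypothetical stand-in for the agent flags in the design.
function resolveRoute(modelId: string, registry: Registry, wantsSidecar: boolean): Route {
  const slash = modelId.indexOf("/");
  const isComposite = slash > 0 && slash < modelId.length - 1;
  // Legacy bare ids fall back to the default provider.
  const providerId = isComposite ? modelId.slice(0, slash) : registry.defaultProvider;
  const wireModelId = isComposite ? modelId.slice(slash + 1) : modelId;

  // Routing keys off the provider, never off a "deepseek-" name prefix.
  if (providerId === "deepseek") {
    return { kind: "deepseek", wireModelId };
  }
  const provider = registry.providers.find((p) => p.id === providerId);
  if (provider === undefined) {
    throw new Error(`unknown provider "${providerId}"`);
  }
  // Sidecar is opt-in per provider: without a sidecarUrl it is never used.
  if (wantsSidecar && provider.sidecarUrl !== undefined) {
    return { kind: "sidecar", baseUrl: provider.sidecarUrl, wireModelId };
  }
  return { kind: "llama-swap", baseUrl: provider.baseUrl, wireModelId };
}
```

Under this sketch, `embedding/deepseek-r1-qwen3-8b` resolves to the `embedding` base URL even when a sidecar is requested, because the decision depends on the provider entry rather than on the model name.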
|
||||
### 5.3 Context lookup and cache keys
|
||||
|
||||
`model-context.ts` must key caches by the full composite ID. The provider
|
||||
prefix is stripped only when building:
|
||||
|
||||
`<provider.baseUrl>/upstream/<wireModelId>/props`
|
||||
|
||||
This avoids cross-provider cache poisoning for duplicate names.
|
||||
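A small sketch of that keying rule follows. The names `ModelContextCache`, `propsUrl`, and `ProviderEndpoint` are hypothetical, the fetcher is injected so the example makes no claim about the HTTP client in use, and caching the in-flight promise is a design choice of this sketch rather than something the design prescribes.

```typescript
interface ProviderEndpoint {
  id: string;
  baseUrl: string;
}

// Builds the props URL in the form given above; the prefix never reaches the wire.
function propsUrl(provider: ProviderEndpoint, wireModelId: string): string {
  return `${provider.baseUrl}/upstream/${wireModelId}/props`;
}

class ModelContextCache {
  // Keyed by the full composite id, so equal wire names on two hosts never collide.
  private readonly byCompositeId = new Map<string, Promise<number>>();

  // `fetchContext` is an injected, hypothetical function returning a context length.
  lookup(
    provider: ProviderEndpoint,
    wireModelId: string,
    fetchContext: (url: string) => Promise<number>,
  ): Promise<number> {
    const key = `${provider.id}/${wireModelId}`;
    let entry = this.byCompositeId.get(key);
    if (entry === undefined) {
      // Note: a rejected promise stays cached here; a real cache would evict it.
      entry = fetchContext(propsUrl(provider, wireModelId));
      this.byCompositeId.set(key, entry);
    }
    return entry;
  }

  has(compositeId: string): boolean {
    return this.byCompositeId.has(compositeId);
  }
}
```

Two lookups for the same wire name on different providers produce two upstream fetches and two cache entries, which is the property the duplicate-name test in the task list is meant to prove.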
|
||||
## 6. Persistence and settings
|
||||
|
||||
Keep:
|
||||
|
||||
- `sessions.model TEXT`
|
||||
- `chats.model TEXT`
|
||||
|
||||
Add a new `settings` key:
|
||||
|
||||
- `favorite_models: string[]`
|
||||
|
||||
Rules:
|
||||
|
||||
- Stored favorites are composite IDs only.
|
||||
- Missing/offline favorites are hidden from the picker, not deleted.
|
||||
- Legacy bare favorites are not supported as stored values; on read they are
  either ignored or, only when the default-provider mapping is unambiguous,
  normalized to a composite ID.
|
||||
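One way to read the storage rules above is as a write-side normalizer plus an explicit toggle. The function names `normalizeFavorites` and `toggleFavorite` are hypothetical, and this sketch takes the simpler of the two permitted options for legacy bare favorites by ignoring them.

```typescript
// A composite id has a non-empty provider and model around its first "/".
function isCompositeId(value: string): boolean {
  const slash = value.indexOf("/");
  return slash > 0 && slash < value.length - 1;
}

// Keeps composite string ids only, drops duplicates, preserves insertion order.
// Availability is never consulted, so offline favorites are not deleted.
function normalizeFavorites(input: unknown): string[] {
  if (!Array.isArray(input)) return [];
  const seen = new Set<string>();
  for (const item of input) {
    if (typeof item === "string" && isCompositeId(item)) seen.add(item);
  }
  return Array.from(seen);
}

// Removal happens only through an explicit user toggle.
function toggleFavorite(stored: readonly string[], id: string): string[] {
  return stored.indexOf(id) >= 0
    ? stored.filter((favorite) => favorite !== id)
    : [...stored, id];
}
```

Hiding unavailable favorites then becomes a read-side concern of the picker, which leaves the stored list untouched.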
|
||||
## 7. BooCoder integration
|
||||
|
||||
Touch points:
|
||||
|
||||
- `apps/coder/src/services/provider-snapshot.ts`
|
||||
- `apps/coder/src/services/dispatcher.ts`
|
||||
- `apps/coder/src/services/arena-model-call.ts`
|
||||
- `apps/coder/src/services/arena-analyzer.ts`
|
||||
- `apps/coder/src/config.ts`
|
||||
|
||||
### 7.1 Native `boocode` provider
|
||||
|
||||
The native `boocode` provider can use the shared local-provider registry and
|
||||
resolver directly. Its model list should expose composite `provider/model` ids
|
||||
and the UI should group them by local provider.
|
||||
|
||||
### 7.2 External-agent parity is a separate seam
|
||||
|
||||
`opencode` is not safe to migrate by a naive string rewrite. The current bridge
|
||||
assumes one local llama-swap provider and collapses identity back to
|
||||
`llama-swap/<model>`.
|
||||
|
||||
Recommended bridge rule:
|
||||
|
||||
- Composite local model IDs remain `provider/model` in native BooCode state and UI.
|
||||
- Do **not** translate `provider/model` back to `llama-swap/<wireModelId>` for
|
||||
external-agent paths; that loses provider identity for duplicate model names.
|
||||
- If full `opencode` parity is required, prefer a BooCoder-hosted
|
||||
OpenAI-compatible local-model gateway that accepts provider-aware model ids
|
||||
and routes them to the correct local upstream.
|
||||
|
||||
If the gateway is not part of the first slice, restrict the initial scope to
|
||||
native `boocode` parity and keep `opencode` local-model parity as a follow-up.
|
||||
|
||||
## 8. Picker UX
|
||||
|
||||
Both BooChat and BooCoder should converge on the same behavior:
|
||||
|
||||
- Favorites section first
|
||||
- Then one section per provider
|
||||
- Favorite toggle on every model row
|
||||
- A favorited model remains visible in its provider section
|
||||
- Provider order defaults to:
  1. `sam-desktop`
  2. `embedding`
  3. `deepseek` when configured
|
||||
|
||||
This batch does not require search. Search can be added later if model counts
|
||||
make the grouped list insufficient.
|
||||
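The shared picker behavior could be derived on the client along these lines. This is a sketch with hypothetical names (`PickerSection`, `buildPickerSections`) and a stand-in `ModelInfo`; it assumes the catalog already arrives in the desired provider order and that an empty Favorites section is omitted, neither of which the design states explicitly.

```typescript
// Minimal stand-ins for the catalog types.
interface ModelInfo {
  id: string;
  name: string;
}

interface ModelCatalogProvider {
  id: string;
  label: string;
  models: ModelInfo[];
}

interface PickerSection {
  title: string;
  models: ModelInfo[];
}

function buildPickerSections(
  providers: readonly ModelCatalogProvider[],
  favoriteIds: readonly string[],
): PickerSection[] {
  // Index the live inventory by composite id.
  const byId = new Map<string, ModelInfo>();
  for (const provider of providers) {
    for (const model of provider.models) byId.set(model.id, model);
  }
  // Favorites keep their stored order; unavailable ones are skipped, not deleted.
  const favorites: ModelInfo[] = [];
  for (const id of favoriteIds) {
    const model = byId.get(id);
    if (model !== undefined) favorites.push(model);
  }
  // A favorited model also stays in its provider section.
  const providerSections = providers.map((p) => ({ title: p.label, models: p.models }));
  return favorites.length > 0
    ? [{ title: "Favorites", models: favorites }, ...providerSections]
    : providerSections;
}
```

Because the function never writes back to `favoriteIds`, a favorite for an offline provider reappears as soon as that provider's models return to the catalog.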
|
||||
## 9. Rollout and compatibility
|
||||
|
||||
1. Land registry/parsing utilities first.
|
||||
2. Switch server routing and model catalog to composite IDs.
|
||||
3. Add favorite persistence and picker grouping.
|
||||
4. Update native BooCoder (`boocode`) model handling and arena.
|
||||
5. Decide the `opencode` parity path: gateway now, or explicit follow-up.
|
||||
6. Verify legacy bare IDs across existing chats and sessions before removing
|
||||
any old env-based assumptions.
|
||||
|
||||
Compatibility requirements:
|
||||
|
||||
- A missing `/data/llama-providers.json` must not break startup.
|
||||
- Existing DB rows with bare IDs must remain routable.
|
||||
- Existing `DEFAULT_MODEL` can stay bare during transition, but new writes
|
||||
should become composite.
|
||||
|
||||
## 10. Deferred items
|
||||
|
||||
- Picker search/filtering
|
||||
- Manual favorite ordering beyond insertion order
|
||||
- Host health badges in the picker
|
||||
- Automatic normalization of old session/chat model values
|
||||
- Full `opencode` multi-provider parity if the first slice ships native-only
|
||||
- Any boocontrol fleet UI built on top of this registry
|
||||
@@ -0,0 +1,73 @@
|
||||
# multi-llama-swap-providers-model-favorites
|
||||
|
||||
## Why
|
||||
|
||||
BooCode still treats local inference as a single `LLAMA_SWAP_URL`, but the
|
||||
actual setup is already a fleet:
|
||||
|
||||
- `sam-desktop` at `100.101.41.16:8401`
|
||||
- `embedding` at `100.90.172.55:8411`
|
||||
- optional DeepSeek cloud models when `DEEPSEEK_API_KEY` is set
|
||||
|
||||
The current model identity is only a bare model string, which is no longer
|
||||
safe. Five model IDs already exist on both llama-swap hosts, the seeded
|
||||
`DEFAULT_MODEL` has already drifted out of the live list once, and multiple
|
||||
server/coder call sites still hardcode a single upstream.
|
||||
|
||||
The research in
|
||||
`docs/research/2026-06-10-multi-llama-swap-providers-model-favorites.md`
|
||||
validated one direction:
|
||||
|
||||
1. Introduce a named provider registry.
|
||||
2. Store selected models as composite IDs: `provider/model`.
|
||||
3. Group pickers by provider with a Favorites section first.
|
||||
4. Persist favorites server-side so BooChat and BooCoder share them.
|
||||
5. Remove single-endpoint assumptions from routing, context lookup,
|
||||
compaction, arena, and coder dispatch.
|
||||
|
||||
This batch is also the prerequisite named in `openspec/changes/boocontrol/`.
|
||||
|
||||
## What Changes
|
||||
|
||||
1. Add a shared provider-registry config for local model providers.
|
||||
2. Replace bare model identity with composite `provider/model` IDs at the API,
|
||||
picker, cache, and routing layers while keeping legacy bare IDs readable.
|
||||
3. Convert the server model catalog from a flat list into grouped provider
   sections; the pickers surface favorites first, derived client-side from
   settings.
|
||||
4. Make sidecar routing an attribute of the `sam-desktop` provider instead of
|
||||
a global default for all non-DeepSeek traffic.
|
||||
5. Update BooCoder's llama-swap namespace bridge so composite IDs still
|
||||
dispatch through opencode correctly.
|
||||
6. Add server-side favorite persistence in `settings` with hide-not-delete
|
||||
behavior for unavailable models.
|
||||
|
||||
## Non-goals
|
||||
|
||||
- Replacing the existing ACP provider registry in `data/coder-providers.json`
|
||||
- Introducing llama-swap peer federation or LiteLLM as an aggregation layer
|
||||
- Adding full-text search, tags, or admin curation to the pickers in this batch
|
||||
- Cleaning up stale favorites automatically
|
||||
- Reworking session/chat schema columns from `TEXT` to structured provider fields
|
||||
|
||||
## Success Criteria
|
||||
|
||||
- `GET /api/models` returns a provider-aware catalog that can distinguish
|
||||
duplicate model names across hosts.
|
||||
- Existing sessions/chats that store a bare model ID still work, resolving to
|
||||
the default local provider without data migration.
|
||||
- `embedding/deepseek-r1-qwen3-8b` never routes to DeepSeek cloud and never
|
||||
receives the fake static 131k context window.
|
||||
- Requests for `embedding/*` models never go through llama-sidecar.
|
||||
- BooChat and BooCoder both render a Favorites section first, then provider
|
||||
groups, and a favorited model still remains visible in its provider group.
|
||||
- A favorite for an offline provider disappears from the visible list but
|
||||
returns automatically when that provider comes back.
|
||||
- Arena, compaction, task-model, and model-context all resolve the same
|
||||
provider/model pair consistently.
|
||||
|
||||
## Deliverables
|
||||
|
||||
| Doc | Purpose |
|-----|---------|
| [`design.md`](./design.md) | Registry shape, model identity rules, routing, UX, rollout |
| [`tasks.md`](./tasks.md) | Ordered implementation and verification checklist |
|
||||
@@ -0,0 +1,104 @@
|
||||
# multi-llama-swap-providers-model-favorites — tasks
|
||||
|
||||
## P0 — config and contracts
|
||||
|
||||
- [x] Add a shared local-provider config schema under `packages/contracts`.
|
||||
- [x] Add `LLAMA_PROVIDERS_PATH` to `apps/server/src/config.ts` and
|
||||
`apps/coder/src/config.ts`.
|
||||
- [x] Add `data/llama-providers.example.json` with `sam-desktop` and
|
||||
`embedding`.
|
||||
- [x] Implement a loader that falls back to the legacy single-provider env vars
|
||||
when the shared file is missing.
|
||||
|
||||
## P1 — model identity helpers
|
||||
|
||||
- [x] Add shared parsing/formatting helpers for composite model IDs:
|
||||
`provider/model`.
|
||||
- [x] Preserve indefinite support for legacy bare IDs by resolving them to the
|
||||
configured default provider.
|
||||
- [x] Update display-name helpers to strip only the provider prefix intended for
|
||||
presentation, not for routing/cache identity.
|
||||
|
||||
## P2 — server model catalog and routing
|
||||
|
||||
- [x] Refactor `apps/server/src/routes/models.ts` to emit a provider-aware model
|
||||
catalog with composite IDs.
|
||||
- [x] Refactor `apps/server/src/services/inference/provider.ts` to resolve route
|
||||
and base URL from provider identity instead of string heuristics alone.
|
||||
- [x] Make sidecar routing a per-provider attribute so `embedding/*` never hits
|
||||
`LLAMA_SIDECAR_URL`.
|
||||
- [x] Replace the bare `deepseek-` prefix special case with provider-aware
|
||||
handling for DeepSeek models.
|
||||
|
||||
## P3 — server call sites that currently assume one endpoint
|
||||
|
||||
- [x] Update `apps/server/src/services/model-context.ts` to fetch upstream props
|
||||
from the resolved provider and key caches by the full composite ID.
|
||||
- [x] Update `apps/server/src/services/compaction.ts` to use the resolved
|
||||
provider endpoint for summaries.
|
||||
- [x] Update `apps/server/src/services/task-model.ts` to resolve fallback models
|
||||
through the same provider-aware endpoint logic.
|
||||
- [x] Verify any other direct `LLAMA_SWAP_URL` usage in `apps/server` is either
|
||||
migrated or explicitly documented as legacy-only.
|
||||
|
||||
## P4 — favorites persistence
|
||||
|
||||
- [x] Add `favorite_models` handling to `apps/server/src/routes/settings.ts`.
|
||||
- [x] Define normalization rules for malformed, duplicate, or unavailable
|
||||
favorites.
|
||||
- [x] Ensure unavailable favorites are hidden from visible picker sections but
|
||||
never auto-deleted from settings.
|
||||
- [x] Keep favorites out of the server model-catalog payload; derive the
|
||||
Favorites section in the clients from settings + provider-aware inventory.
|
||||
|
||||
## P5 — BooChat UI
|
||||
|
||||
- [x] Update `apps/web/src/components/ModelPicker.tsx` to render:
|
||||
Favorites first, then provider sections.
|
||||
- [x] Add a per-model favorite toggle wired to `PATCH /api/settings`.
|
||||
- [x] Keep favorited models visible in their provider group as well as the
|
||||
Favorites section.
|
||||
- [x] Verify session model changes write composite IDs for new selections.
|
||||
|
||||
## P6 — BooCoder snapshot, dispatch, and arena
|
||||
|
||||
- [x] Update `apps/coder/src/services/provider-snapshot.ts` so BooCode's local
|
||||
`boocode` provider models retain composite IDs in snapshot data.
|
||||
- [x] Update the compact picker in
|
||||
`apps/web/src/components/AgentComposerBar.tsx` to match the grouped/favorite
|
||||
behavior used by BooChat for native local models.
|
||||
- [x] Update `apps/coder/src/services/arena-model-call.ts` and
|
||||
`apps/coder/src/services/arena-analyzer.ts` to use provider-aware routing.
|
||||
|
||||
## P7 — external-agent parity decision (`opencode`)
|
||||
|
||||
- [x] Decide whether the first slice includes `opencode` multi-provider local
|
||||
models or explicitly limits parity to native `boocode`.
|
||||
- [x] If `opencode` parity is included, add a provider-identity-preserving
|
||||
bridge instead of collapsing to `llama-swap/<wireModelId>`.
|
||||
- [x] Preferred bridge: a BooCoder-hosted OpenAI-compatible local-model gateway
|
||||
for consumers that still assume one provider namespace.
|
||||
- [x] If the bridge is deferred, stop advertising multi-provider local models
|
||||
under the `opencode` provider until the bridge exists.
|
||||
|
||||
## P8 — tests and verification
|
||||
|
||||
- [x] Add unit tests for model-ref parsing, legacy bare-ID fallback, and
|
||||
provider-aware routing.
|
||||
- [x] Add tests covering the `embedding/deepseek-r1-qwen3-8b` collision case.
|
||||
- [x] Add tests proving duplicate model names on two hosts do not share context
|
||||
cache entries.
|
||||
- [x] Add UI or route tests for favorites hide-not-delete behavior.
|
||||
(`apps/server/src/routes/__tests__/settings-favorites.test.ts`, DB-gated:
|
||||
unavailable favorite persists through PATCH/GET and unrelated writes;
|
||||
removal is explicit-only.)
|
||||
- [ ] Smoke test native BooChat/BooCoder against:
|
||||
`sam-desktop`, `embedding`, and DeepSeek-enabled configs.
|
||||
(API layer verified 2026-06-12: both hosts healthy, `/api/models` serving
|
||||
grouped composite ids live. Remaining: in-browser send-a-message pass per
|
||||
provider group + a DeepSeek-enabled config.)
|
||||
- [x] If `opencode` parity ships in-scope, add a smoke test proving duplicate
|
||||
local model names still route to the intended provider.
|
||||
(`apps/coder/src/services/__tests__/local-gateway-routing.test.ts`:
|
||||
resolver + HTTP-route level — same wire name routes to distinct baseUrls
|
||||
with the bare wire id upstream; unknown provider → 400, no upstream call.)
|
||||