chore: snapshot working tree - pty_exited notifications + in-flight inference WIP

feat(booterm): structured pty_exited WS notifications. Plan-validated, impl-validated, code-reviewed green (contracts build clean, contracts test 29/29, booterm + web typecheck clean).

wip: in-progress inference/provider refactor (agents.ts, provider.ts, new llama-providers.ts, removed llama-args-validator), plus arena, dispatcher, compaction, schema changes.

openspec: pty-exit-notifications complete; x-agent-flags planned (not yet implemented).
2026-06-14 12:48:47 +00:00
parent 0ed506f1da
commit b18de2a331
204 changed files with 25344 additions and 867 deletions

@@ -0,0 +1,311 @@
# multi-llama-swap-providers-model-favorites — implementation analysis
## Scope compared
- **Current state:** the shipped implementation in `apps/server`, `apps/coder`,
`apps/web`, and `packages/contracts`
- **Desired state:** the behavior described in
`docs/research/2026-06-10-multi-llama-swap-providers-model-favorites.md`
and the corresponding OpenSpec batch
Purpose: determine the safest and most coherent implementation path before
building the feature.
## Conclusion
The best implementation path is to treat this as a **shared local-model
routing subsystem**, not as a picker-only UI feature.
That subsystem needs two interfaces:
1. **An in-process resolver** used directly by BooChat and native BooCoder
paths.
2. **A gateway surface** for consumers that cannot call the resolver directly
and still assume one OpenAI-compatible provider contract.
Without that split, the feature looks straightforward in BooChat but stays
architecturally broken in BooCoder because the existing opencode integration
collapses provider identity back to one local llama-swap endpoint.
## Current-state findings
### F-001 — config authority is split
- `apps/server` is driven by `LLAMA_SWAP_URL`, `LLAMA_SIDECAR_URL`, and
`DEFAULT_MODEL`.
- `apps/coder` reuses `LLAMA_SWAP_URL` for local models and has a separate
`data/coder-providers.json` for ACP providers.
Effect: there is no single source of truth for local model providers that both
apps can consume.
### F-002 — model identity is still a raw string everywhere that matters
- `sessions.model` is `TEXT NOT NULL`.
- `chats.model` is `TEXT`.
- `model-context.ts` caches by the raw model string.
- multiple dispatchers treat the model as an opaque string and infer behavior
from prefixes.
Effect: duplicate model names across hosts cannot be represented safely without
composite IDs.
### F-003 — routing logic is duplicated and heuristic-heavy
- BooChat streaming uses `upstreamModel()` in `provider.ts`.
- non-streaming calls use `resolveModelEndpoint()`.
- context lookup bypasses both and fetches `LLAMA_SWAP_URL` directly.
- arena local calls bypass both and hit `LLAMA_SWAP_URL` directly.
Effect: even after adding a registry, call sites will diverge unless they all
share one resolver.
### F-004 — favorites are a UI concern backed by shared settings, not a server catalog concern
- The `settings` table is already the right persistence surface.
- BooChat already reads/writes server state.
- BooCoder currently keeps picker prefs in browser localStorage, but those are
provider-specific UI prefs, not a shared favorite-model feature.
Effect: favorites should be stored server-side and derived in the client from
`/api/settings` + provider-aware model data.
### F-005 — BooCoder has a deeper coupling than the research initially surfaced
The dangerous assumption is not only in `dispatcher.ts`. It is in the whole
opencode local-model bridge:
- the snapshot merges local llama models into the `opencode` provider by
prefixing them as `llama-swap/<model>`
- the dispatcher treats bare IDs as `llama-swap/<model>`
- the opencode backend parses `provider/model`
- current host opencode config points every local-model family at a single
llama-swap base URL
Effect: translating `embedding/qwen3.5-9b` back to `llama-swap/qwen3.5-9b`
reintroduces the exact ambiguity this batch is trying to remove.
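To make the ambiguity concrete, here is a minimal sketch. `collapseToLlamaSwap` is a hypothetical stand-in for the bridge behavior described above, not a function from the codebase, and `sam-desktop/qwen3.5-9b` is assumed purely for illustration as the duplicate of the same model on the other host.

```typescript
// Hypothetical stand-in for the current bridge: keep the wire model name,
// replace whatever provider prefix was present with "llama-swap".
function collapseToLlamaSwap(compositeId: string): string {
  const slash = compositeId.indexOf("/");
  const wireModelId = slash >= 0 ? compositeId.slice(slash + 1) : compositeId;
  return "llama-swap/" + wireModelId;
}

// Two distinct composite IDs (the second one is assumed for illustration).
const fromEmbedding = collapseToLlamaSwap("embedding/qwen3.5-9b");
const fromDesktop = collapseToLlamaSwap("sam-desktop/qwen3.5-9b");

// Both collapse to the same string, so the intended host cannot be recovered.
const identityLost = fromEmbedding === fromDesktop;
```

Once both IDs collapse to one string, nothing downstream of the bridge can tell which host was selected.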
### F-006 — Arena is a separate local-model consumer, not just another caller
Arena currently:
- builds its "local model" set from one live llama-swap list
- classifies local-vs-cloud contestants from that set
- performs one-shot local calls directly against `LLAMA_SWAP_URL`
Effect: arena needs the same provider-aware resolver as BooChat, but it does
not need the full BooChat picker/favorites work.
## Gap summary
### G-001 — no shared local-provider registry
What is missing:
- one schema and one loader contract for named local providers consumed by
both server and coder
Why it matters:
- every downstream fix becomes duplicated if config remains split
### G-002 — no canonical model-ref format and parser
What is missing:
- a shared `provider/model` identity format and parse/format helpers
Why it matters:
- caches, DB values, routing, and UI rendering cannot stay aligned otherwise
### G-003 — no single provider-aware resolver
What is missing:
- one shared resolver API for:
  - route selection
  - base URL selection
  - sidecar selection
  - wire-model extraction
  - context-props endpoint selection
Why it matters:
- keeping separate "streaming", "non-streaming", "context", and "arena"
resolution paths will re-create subtle bugs
### G-004 — no neutral provider-aware catalog contract
What is missing:
- a provider-aware model catalog response that exposes providers and models
without baking favorites into the server payload
Why it matters:
- BooChat and BooCoder both need provider metadata, but favorites are derived
from user settings, not from upstream inventory
### G-005 — no safe path for opencode local-model parity
What is missing:
- one of:
  - a generated/synced opencode-facing local-model config, or
  - a BooCoder-hosted OpenAI-compatible gateway that preserves provider
    identity under one provider namespace, or
  - a deliberate scope cut that removes multi-provider local models from the
    `opencode` provider until that bridge exists
Why it matters:
- without one of these, the feature is correct in BooChat but false-advertised
in the `opencode` provider
## Recommended architecture
### 1. Shared local-provider registry
Add a new shared config surface for local inference providers, separate from
`data/coder-providers.json`.
Recommendation:
- schema in `packages/contracts`
- live file such as `/data/llama-providers.json`
- fallback synthesis from `LLAMA_SWAP_URL` and `LLAMA_SIDECAR_URL` while the
file is absent
This keeps ACP provider management and local model provider management as two
separate concerns.
### 2. Shared model-ref and resolver helpers
Add shared helpers for:
- parsing `provider/model`
- resolving legacy bare IDs to the default provider
- deciding route type
- selecting upstream base URL
- extracting the wire model id
All of these should be used by:
- server streaming inference
- server non-streaming calls
- model-context lookup
- arena one-shot local calls
- any future control-plane or routing feature
### 3. Provider-aware catalog, client-derived favorites
Do **not** make the server return a synthetic Favorites section.
Instead:
- `/api/models` (or a replacement contract) should return provider-grouped
inventory only
- `/api/settings` should hold `favorite_models: string[]`
- BooChat and BooCoder should derive the picker view client-side:
  - Favorites first
  - then provider sections
  - unavailable favorites hidden, not deleted
This keeps the server contract inventory-shaped and the favorite behavior
user-shaped.
### 4. Treat BooCoder native and BooCoder external-agent paths differently
There are two different BooCoder consumers:
- **native `boocode` provider**
- **external-agent providers like `opencode`**
The native `boocode` provider can adopt the shared resolver directly.
The `opencode` provider cannot safely adopt `provider/model` by simple string
translation, because its current local-model bridge still assumes one local
provider.
Recommendation:
- ship native `boocode` provider parity first
- do **not** claim `opencode` parity until provider identity is preserved
end-to-end there too
### 5. Preferred parity path for opencode: a BooCoder-hosted local-model gateway
If full `opencode` parity is required in the same initiative, the cleanest path
is a small OpenAI-compatible gateway inside `apps/coder`:
- accepts model ids that still carry provider identity
- strips provider prefix only at the final upstream boundary
- routes to the correct local provider
- becomes the single local-model base URL for `opencode`
Why this is better than adding many direct opencode providers:
- one stable provider contract for opencode
- no duplicated base-URL registry in opencode config
- the same gateway can serve arena/local utility calls later
- it stays inside an existing always-on service, not a new third service
If this gateway is not in scope now, the correct fallback is to remove or hide
multi-provider local models from the `opencode` provider until the bridge is
real.
## Recommended sequence
### Phase 1 — shared foundation
- shared local-provider config schema
- shared `provider/model` parsing helpers
- shared resolver
- legacy bare-id fallback
### Phase 2 — BooChat + native BooCoder
- provider-aware model catalog
- server inference routing updates
- model-context cache-key fix
- compaction and task-model endpoint resolution
- BooChat picker grouping + server-side favorites
- BooCoder `boocode` provider model list grouped by local provider
### Phase 3 — arena parity
- local-model set built from the shared provider catalog, not one llama-swap list
- one-shot local calls use the shared resolver
### Phase 4 — opencode parity
Choose one:
- preferred: BooCoder-hosted local-model gateway plus opencode-facing model
sync
- fallback: temporarily stop advertising multi-provider local models under the
`opencode` provider
### Phase 5 — boocontrol
- build BooControl only after the local-provider registry and canonical model
identity land
## What this changes in the existing OpenSpec batch
1. The design should treat favorites as **client-derived from settings**, not
as a server-generated catalog section.
2. The design should explicitly separate **native BooCoder parity** from
**opencode parity**.
3. The tasks should call out the `opencode` bridge as a dedicated risk area,
not as a small dispatcher rename.
## Recommendation
Implement the shared local-provider registry and resolver first, then ship
BooChat plus native BooCoder on top of it. Treat `opencode` multi-provider
support as a distinct integration seam that either gets a real gateway or stays
out of scope for the first slice.
That is the fastest path that is still architecturally honest.

@@ -0,0 +1,238 @@
# multi-llama-swap-providers-model-favorites — design
Detailed implementation plan for named local model providers, composite model
IDs, grouped pickers, and shared favorites across BooChat and BooCoder.
## 1. Current state
Today the repo splits inference configuration across two incompatible shapes:
- `apps/server` reads env vars such as `LLAMA_SWAP_URL`, `LLAMA_SIDECAR_URL`,
and `DEFAULT_MODEL`.
- `apps/coder` reads the same `LLAMA_SWAP_URL` for BooCode's own provider, plus
`data/coder-providers.json` for ACP providers.
That leaves several hardcoded single-endpoint assumptions:
- `/api/models` fetches models from one llama-swap endpoint plus optional DeepSeek.
- `provider.ts` routes by `deepseek-` name prefix and a global sidecar default.
- `model-context.ts` caches by bare model string.
- `compaction.ts`, `task-model.ts`, and coder arena use a single upstream URL.
- BooCoder prepends `llama-swap/` and treats any other slash-containing value
as an already-routable provider namespace.
## 2. Design principles
1. Provider identity is explicit.
2. Wire model IDs stay bare; persisted model IDs are composite.
3. Legacy bare model IDs remain readable indefinitely.
4. Favorites are shared across BooChat and BooCoder.
5. Sidecar routing is opt-in per provider, not a global fallback.
6. Any cache keyed by model identity uses the full composite ID.
## 3. Recommended config authority
Introduce a new shared file for local inference providers:
- Live path: `/data/llama-providers.json`
- Env var for both apps: `LLAMA_PROVIDERS_PATH`
- Tracked example: `data/llama-providers.example.json`
Recommended shape:
```json
{
  "defaultProvider": "sam-desktop",
  "providers": [
    {
      "id": "sam-desktop",
      "label": "Sam-desktop",
      "baseUrl": "http://100.101.41.16:8401",
      "sidecarUrl": "http://100.101.41.16:8402",
      "kind": "llama-swap"
    },
    {
      "id": "embedding",
      "label": "embedding",
      "baseUrl": "http://100.90.172.55:8411",
      "kind": "llama-swap"
    }
  ]
}
```
Rules:
- If the file is missing, synthesize a single legacy provider from
`LLAMA_SWAP_URL` and optional `LLAMA_SIDECAR_URL`.
- `data/coder-providers.json` remains the ACP registry and is not extended with
llama-swap base URLs.
- DeepSeek credentials remain env-backed, but the model catalog should expose a
synthetic provider group such as `deepseek` so routing no longer depends on a
bare `deepseek-` prefix.
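The missing-file rule could be sketched roughly as follows. This is an illustration under assumptions, not the shipped loader: `loadRegistry`, the `LegacyEnv` type, and the synthesized provider id `llama-swap` are hypothetical names. The function takes the file contents as a string (or `null` when the file is absent) so the fallback logic stays independent of file I/O.

```typescript
interface LocalProvider {
  id: string;
  label: string;
  baseUrl: string;
  sidecarUrl?: string;
  kind: "llama-swap";
}

interface ProviderRegistry {
  defaultProvider: string;
  providers: LocalProvider[];
}

// Only the two legacy env vars named in the design are modeled here.
interface LegacyEnv {
  LLAMA_SWAP_URL?: string;
  LLAMA_SIDECAR_URL?: string;
}

// fileText is the raw file contents, or null when the file is missing.
function loadRegistry(fileText: string | null, env: LegacyEnv): ProviderRegistry {
  if (fileText !== null) {
    // A real loader would validate this against the shared schema.
    return JSON.parse(fileText) as ProviderRegistry;
  }
  // Assumption: the synthesized legacy provider is called "llama-swap".
  const legacyId = "llama-swap";
  if (!env.LLAMA_SWAP_URL) {
    // The design does not say what happens when both sources are absent;
    // this sketch returns an empty registry instead of failing startup.
    return { defaultProvider: legacyId, providers: [] };
  }
  const legacy: LocalProvider = {
    id: legacyId,
    label: legacyId,
    baseUrl: env.LLAMA_SWAP_URL,
    kind: "llama-swap",
  };
  if (env.LLAMA_SIDECAR_URL) {
    legacy.sidecarUrl = env.LLAMA_SIDECAR_URL;
  }
  return { defaultProvider: legacy.id, providers: [legacy] };
}
```

Passing the file text in, rather than a path, keeps the fallback testable without touching the filesystem.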
## 4. Model identity and parsing
Persist model selections as `provider/model`.
Examples:
- `sam-desktop/qwen3.6-35b-a3b`
- `embedding/gemma-4-12b`
- `deepseek/deepseek-v4-pro`
Helper behavior:
- `parseModelRef(id)` returns `{ providerId, wireModelId, isLegacyBareId }`
- Bare IDs resolve to `{ providerId: defaultProvider, wireModelId: id }`
- Only strip the prefix at the final wire-call boundary
This preserves existing `TEXT` columns while fixing duplicate-name ambiguity.
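A minimal sketch of the helper behavior above. The return fields follow the design; the second `defaultProvider` parameter and the `formatModelRef` name are assumptions made so the example is self-contained. It splits on the first slash, which assumes bare wire IDs never contain a slash themselves.

```typescript
interface ModelRef {
  providerId: string;
  wireModelId: string;
  isLegacyBareId: boolean;
}

// Split on the FIRST slash. IDs without a usable provider prefix are treated
// as legacy bare IDs and resolved to the default provider.
function parseModelRef(id: string, defaultProvider: string): ModelRef {
  const slash = id.indexOf("/");
  if (slash <= 0 || slash === id.length - 1) {
    return { providerId: defaultProvider, wireModelId: id, isLegacyBareId: true };
  }
  return {
    providerId: id.slice(0, slash),
    wireModelId: id.slice(slash + 1),
    isLegacyBareId: false,
  };
}

// Inverse helper (hypothetical name) for new writes, which should be composite.
function formatModelRef(providerId: string, wireModelId: string): string {
  return providerId + "/" + wireModelId;
}
```

For example, `parseModelRef("embedding/gemma-4-12b", "sam-desktop")` yields provider `embedding` and wire model `gemma-4-12b`, while a bare `qwen3.6-35b-a3b` resolves to `sam-desktop` with `isLegacyBareId` set.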
## 5. Server changes
### 5.1 Shared registry + model catalog
Add shared registry utilities in `packages/contracts` plus server-side loaders
used by:
- `apps/server/src/config.ts`
- `apps/server/src/routes/models.ts`
- `apps/server/src/services/inference/provider.ts`
- `apps/server/src/services/model-context.ts`
- `apps/server/src/services/task-model.ts`
- `apps/server/src/services/compaction.ts`
`GET /api/models` should return a provider-aware payload. Recommended shape:
```ts
interface ModelCatalogProvider {
  id: string;
  label: string;
  models: ModelInfo[];
}

interface ModelCatalogResponse {
  providers: ModelCatalogProvider[];
}
```
Where each `ModelInfo.id` is already composite.
Favorites should **not** be embedded in this payload. They are a user-level
view derived in the client from `favorite_models` in `/api/settings`.
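The client-side derivation might look like the sketch below. `buildPickerSections` and `PickerSection` are hypothetical names, `ModelInfo` is reduced to the one field the example needs, and omitting an empty Favorites section is a choice made here, not something the design specifies.

```typescript
// Reduced to the single field this sketch needs; the real type has more.
interface ModelInfo {
  id: string; // composite "provider/model"
}

interface ModelCatalogProvider {
  id: string;
  label: string;
  models: ModelInfo[];
}

interface ModelCatalogResponse {
  providers: ModelCatalogProvider[];
}

interface PickerSection {
  title: string;
  modelIds: string[];
}

function buildPickerSections(
  catalog: ModelCatalogResponse,
  favoriteModels: string[],
): PickerSection[] {
  // Index every model currently reported by any provider.
  const available: Record<string, boolean> = {};
  for (const provider of catalog.providers) {
    for (const model of provider.models) {
      available[model.id] = true;
    }
  }
  // Hide favorites that are not in the catalog; the stored list is untouched.
  const visibleFavorites = favoriteModels.filter((id) => available[id] === true);

  const sections: PickerSection[] = [];
  if (visibleFavorites.length > 0) {
    sections.push({ title: "Favorites", modelIds: visibleFavorites });
  }
  // A favorited model still appears in its own provider section.
  for (const provider of catalog.providers) {
    sections.push({
      title: provider.label,
      modelIds: provider.models.map((m) => m.id),
    });
  }
  return sections;
}
```

Because the function only filters its input, a favorite for an offline provider reappears as soon as the catalog lists that model again.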
### 5.2 Routing
Replace string-heuristic routing with provider-aware resolution:
- `sam-desktop/*` routes to `baseUrl` or `sidecarUrl` depending on agent flags
and provider capabilities.
- `embedding/*` always routes directly to its llama-swap `baseUrl`.
- `deepseek/*` routes to the DeepSeek SDK provider.
`resolveModelEndpoint()` and `upstreamModel()` must both resolve from the same
parsed model reference to keep streaming and non-streaming behavior aligned.
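One way to picture a single resolution path shared by both call sites is the sketch below. `resolveRoute`, the `Route` union, and the boolean `useSidecar` flag are hypothetical; the design only says sidecar use depends on agent flags and provider capabilities, so a plain boolean stands in for that decision here.

```typescript
interface RouteProvider {
  id: string;
  baseUrl: string;
  sidecarUrl?: string; // present only for providers that opt in
}

interface RouteRegistry {
  defaultProvider: string;
  providers: RouteProvider[];
}

type Route =
  | { kind: "deepseek"; wireModelId: string }
  | { kind: "local"; baseUrl: string; wireModelId: string; viaSidecar: boolean };

function resolveRoute(modelId: string, registry: RouteRegistry, useSidecar: boolean): Route {
  // Provider identity comes from the prefix, never from the model name.
  const slash = modelId.indexOf("/");
  const providerId = slash > 0 ? modelId.slice(0, slash) : registry.defaultProvider;
  const wireModelId = slash > 0 ? modelId.slice(slash + 1) : modelId;

  if (providerId === "deepseek") {
    return { kind: "deepseek", wireModelId };
  }
  const provider = registry.providers.filter((p) => p.id === providerId)[0];
  if (!provider) {
    throw new Error("unknown provider: " + providerId);
  }
  // Sidecar is used only when requested AND the provider declares one.
  const sidecar = provider.sidecarUrl;
  if (useSidecar && sidecar !== undefined) {
    return { kind: "local", baseUrl: sidecar, wireModelId, viaSidecar: true };
  }
  return { kind: "local", baseUrl: provider.baseUrl, wireModelId, viaSidecar: false };
}
```

With this shape, `embedding/deepseek-r1-qwen3-8b` stays on the `embedding` host even though its wire name starts with `deepseek-`, and it cannot reach a sidecar because that provider declares none.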
### 5.3 Context lookup and cache keys
`model-context.ts` must key caches by the full composite ID. The provider
prefix is stripped only when building:
`<provider.baseUrl>/upstream/<wireModelId>/props`
This avoids cross-provider cache poisoning for duplicate names.
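As a rough illustration of the keying rule, the sketch below caches a context-window size per composite ID and builds the props URL from the bare wire ID. `ContextCache` and `propsUrl` are hypothetical names, and the cached value is assumed to be a token count.

```typescript
interface PropsProvider {
  id: string;
  baseUrl: string;
}

// The provider prefix is dropped only here, when building the upstream URL.
function propsUrl(provider: PropsProvider, wireModelId: string): string {
  const base = provider.baseUrl.replace(/\/+$/, ""); // tolerate a trailing slash
  return base + "/upstream/" + wireModelId + "/props";
}

// Cache keyed by the FULL composite ID, so "sam-desktop/x" and "embedding/x"
// never share an entry.
class ContextCache {
  private entries: Record<string, number> = {};

  get(compositeId: string): number | undefined {
    return Object.prototype.hasOwnProperty.call(this.entries, compositeId)
      ? this.entries[compositeId]
      : undefined;
  }

  set(compositeId: string, contextTokens: number): void {
    this.entries[compositeId] = contextTokens;
  }
}
```

Two hosts serving the same wire name then get separate cache entries, even though both URLs end in the same `/upstream/<wireModelId>/props` suffix.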
## 6. Persistence and settings
Keep:
- `sessions.model TEXT`
- `chats.model TEXT`
Add a new `settings` key:
- `favorite_models: string[]`
Rules:
- Stored favorites are composite IDs only.
- Missing/offline favorites are hidden from the picker, not deleted.
- Legacy bare favorites are not supported. On read they are ignored, or
normalized to a composite ID only if the default-provider mapping is unambiguous.
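A write-time normalization consistent with these rules could look like the sketch below. `normalizeFavorites` is a hypothetical name, and this version takes the simpler of the two options allowed for bare IDs by dropping them. It deliberately does not check availability, since unavailable favorites must stay stored.

```typescript
// Accepts whatever arrived in the settings payload and returns a clean list
// of composite IDs, preserving first-seen order.
function normalizeFavorites(input: unknown): string[] {
  if (!Array.isArray(input)) {
    return [];
  }
  const seen: Record<string, boolean> = {};
  const out: string[] = [];
  for (const item of input) {
    if (typeof item !== "string") continue; // malformed entry
    const id = item.trim();
    const slash = id.indexOf("/");
    // Require a non-empty provider and a non-empty model part.
    if (slash <= 0 || slash === id.length - 1) continue;
    if (seen[id] === true) continue; // duplicate
    seen[id] = true;
    out.push(id);
  }
  return out;
}
```

Availability is left to the picker: hiding happens at render time, so nothing here can delete a favorite just because its provider is offline.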
## 7. BooCoder integration
Touch points:
- `apps/coder/src/services/provider-snapshot.ts`
- `apps/coder/src/services/dispatcher.ts`
- `apps/coder/src/services/arena-model-call.ts`
- `apps/coder/src/services/arena-analyzer.ts`
- `apps/coder/src/config.ts`
### 7.1 Native `boocode` provider
The native `boocode` provider can use the shared local-provider registry and
resolver directly. Its model list should expose composite `provider/model` ids
and the UI should group them by local provider.
### 7.2 External-agent parity is a separate seam
`opencode` is not safe to migrate by a naive string rewrite. The current bridge
assumes one local llama-swap provider and collapses identity back to
`llama-swap/<model>`.
Recommended bridge rule:
- Composite local model IDs remain `provider/model` in native BooCode state and UI.
- Do **not** translate `provider/model` back to `llama-swap/<wireModelId>` for
external-agent paths; that loses provider identity for duplicate model names.
- If full `opencode` parity is required, prefer a BooCoder-hosted
OpenAI-compatible local-model gateway that accepts provider-aware model ids
and routes them to the correct local upstream.
If the gateway is not part of the first slice, restrict the initial scope to
native `boocode` parity and keep `opencode` local-model parity as a follow-up.
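The core decision such a gateway has to make can be sketched as a pure function. Everything here is illustrative: `planGatewayCall`, `GatewayDecision`, and the request-body type are hypothetical names, and resolving bare IDs to the default provider is an assumption carried over from design principle 3.

```typescript
interface GatewayProvider {
  id: string;
  baseUrl: string;
}

// OpenAI-style request body; only `model` matters for routing.
interface ChatRequestBody {
  model: string;
  [key: string]: unknown;
}

type GatewayDecision =
  | { status: 200; baseUrl: string; body: ChatRequestBody }
  | { status: 400; error: string };

function planGatewayCall(
  body: ChatRequestBody,
  providers: GatewayProvider[],
  defaultProvider: string,
): GatewayDecision {
  const slash = body.model.indexOf("/");
  const providerId = slash > 0 ? body.model.slice(0, slash) : defaultProvider;
  const wireModelId = slash > 0 ? body.model.slice(slash + 1) : body.model;

  const match = providers.filter((p) => p.id === providerId)[0];
  if (!match) {
    // Reject before any upstream call is attempted.
    return { status: 400, error: "unknown provider: " + providerId };
  }
  // The prefix is stripped only here, at the final upstream boundary.
  const upstreamBody: ChatRequestBody = { ...body, model: wireModelId };
  return { status: 200, baseUrl: match.baseUrl, body: upstreamBody };
}
```

The caller would forward `body` to `baseUrl` on a 200 decision and return the error directly on a 400, so an unknown provider never reaches a llama-swap host.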
## 8. Picker UX
Both BooChat and BooCoder should converge on the same behavior:
- Favorites section first
- Then one section per provider
- Favorite toggle on every model row
- A favorited model remains visible in its provider section
- Provider order defaults to:
  1. `sam-desktop`
  2. `embedding`
  3. `deepseek` when configured
This batch does not require search. Search can be added later if model counts
make the grouped list insufficient.
## 9. Rollout and compatibility
1. Land registry/parsing utilities first.
2. Switch server routing and model catalog to composite IDs.
3. Add favorite persistence and picker grouping.
4. Update native BooCoder (`boocode`) model handling and arena.
5. Decide the `opencode` parity path: gateway now, or explicit follow-up.
6. Verify legacy bare IDs across existing chats and sessions before removing
any old env-based assumptions.
Compatibility requirements:
- Missing `/data/llama-providers.json` cannot break startup.
- Existing DB rows with bare IDs must remain routable.
- Existing `DEFAULT_MODEL` can stay bare during transition, but new writes
should become composite.
## 10. Deferred items
- Picker search/filtering
- Manual favorite ordering beyond insertion order
- Host health badges in the picker
- Automatic normalization of old session/chat model values
- Full `opencode` multi-provider parity if the first slice ships native-only
- Any boocontrol fleet UI built on top of this registry

@@ -0,0 +1,73 @@
# multi-llama-swap-providers-model-favorites
## Why
BooCode still treats local inference as a single `LLAMA_SWAP_URL`, but the
actual setup is already a fleet:
- `sam-desktop` at `100.101.41.16:8401`
- `embedding` at `100.90.172.55:8411`
- optional DeepSeek cloud models when `DEEPSEEK_API_KEY` is set
The current model identity is only a bare model string, which is no longer
safe. Five model IDs already exist on both llama-swap hosts, the seeded
`DEFAULT_MODEL` has already drifted out of the live list once, and multiple
server/coder call sites still hardcode a single upstream.
The research in
`docs/research/2026-06-10-multi-llama-swap-providers-model-favorites.md`
validated one direction:
1. Introduce a named provider registry.
2. Store selected models as composite IDs: `provider/model`.
3. Group pickers by provider with a Favorites section first.
4. Persist favorites server-side so BooChat and BooCoder share them.
5. Remove single-endpoint assumptions from routing, context lookup,
compaction, arena, and coder dispatch.
This batch is also the prerequisite named in `openspec/changes/boocontrol/`.
## What Changes
1. Add a shared provider-registry config for local model providers.
2. Replace bare model identity with composite `provider/model` IDs at the API,
picker, cache, and routing layers while keeping legacy bare IDs readable.
3. Convert the server model catalog from a flat list into provider-grouped
sections; the clients surface favorites first, derived from settings.
4. Make sidecar routing an attribute of the `sam-desktop` provider instead of
a global default for all non-DeepSeek traffic.
5. Update BooCoder's llama-swap namespace bridge so composite IDs dispatch
through opencode without losing provider identity.
6. Add server-side favorite persistence in `settings` with hide-not-delete
behavior for unavailable models.
## Non-goals
- Replacing the existing ACP provider registry in `data/coder-providers.json`
- Introducing llama-swap peer federation or LiteLLM as an aggregation layer
- Adding full-text search, tags, or admin curation to the pickers in this batch
- Cleaning up stale favorites automatically
- Reworking session/chat schema columns from `TEXT` to structured provider fields
## Success Criteria
- `GET /api/models` returns a provider-aware catalog that can distinguish
duplicate model names across hosts.
- Existing sessions/chats that store a bare model ID still work, resolving to
the default local provider without data migration.
- `embedding/deepseek-r1-qwen3-8b` never routes to DeepSeek cloud and never
receives the fake static 131k context window.
- Requests for `embedding/*` models never go through llama-sidecar.
- BooChat and BooCoder both render a Favorites section first, then provider
groups, and a favorited model still remains visible in its provider group.
- A favorite for an offline provider disappears from the visible list but
returns automatically when that provider comes back.
- Arena, compaction, task-model, and model-context all resolve the same
provider/model pair consistently.
## Deliverables
| Doc | Purpose |
|-----|---------|
| [`design.md`](./design.md) | Registry shape, model identity rules, routing, UX, rollout |
| [`tasks.md`](./tasks.md) | Ordered implementation and verification checklist |

@@ -0,0 +1,104 @@
# multi-llama-swap-providers-model-favorites — tasks
## P0 — config and contracts
- [x] Add a shared local-provider config schema under `packages/contracts`.
- [x] Add `LLAMA_PROVIDERS_PATH` to `apps/server/src/config.ts` and
`apps/coder/src/config.ts`.
- [x] Add `data/llama-providers.example.json` with `sam-desktop` and
`embedding`.
- [x] Implement a loader that falls back to the legacy single-provider env vars
when the shared file is missing.
## P1 — model identity helpers
- [x] Add shared parsing/formatting helpers for composite model IDs:
`provider/model`.
- [x] Preserve indefinite support for legacy bare IDs by resolving them to the
configured default provider.
- [x] Update display-name helpers to strip only the provider prefix intended for
presentation, not for routing/cache identity.
## P2 — server model catalog and routing
- [x] Refactor `apps/server/src/routes/models.ts` to emit a provider-aware model
catalog with composite IDs.
- [x] Refactor `apps/server/src/services/inference/provider.ts` to resolve route
and base URL from provider identity instead of string heuristics alone.
- [x] Make sidecar routing a per-provider attribute so `embedding/*` never hits
`LLAMA_SIDECAR_URL`.
- [x] Replace the bare `deepseek-` prefix special case with provider-aware
handling for DeepSeek models.
## P3 — server call sites that currently assume one endpoint
- [x] Update `apps/server/src/services/model-context.ts` to fetch upstream props
from the resolved provider and key caches by the full composite ID.
- [x] Update `apps/server/src/services/compaction.ts` to use the resolved
provider endpoint for summaries.
- [x] Update `apps/server/src/services/task-model.ts` to resolve fallback models
through the same provider-aware endpoint logic.
- [x] Verify any other direct `LLAMA_SWAP_URL` usage in `apps/server` is either
migrated or explicitly documented as legacy-only.
## P4 — favorites persistence
- [x] Add `favorite_models` handling to `apps/server/src/routes/settings.ts`.
- [x] Define normalization rules for malformed, duplicate, or unavailable
favorites.
- [x] Ensure unavailable favorites are hidden from visible picker sections but
never auto-deleted from settings.
- [x] Keep favorites out of the server model-catalog payload; derive the
Favorites section in the clients from settings + provider-aware inventory.
## P5 — BooChat UI
- [x] Update `apps/web/src/components/ModelPicker.tsx` to render:
Favorites first, then provider sections.
- [x] Add a per-model favorite toggle wired to `PATCH /api/settings`.
- [x] Keep favorited models visible in their provider group as well as the
Favorites section.
- [x] Verify session model changes write composite IDs for new selections.
## P6 — BooCoder snapshot, dispatch, and arena
- [x] Update `apps/coder/src/services/provider-snapshot.ts` so BooCode's local
`boocode` provider models retain composite IDs in snapshot data.
- [x] Update the compact picker in
`apps/web/src/components/AgentComposerBar.tsx` to match the grouped/favorite
behavior used by BooChat for native local models.
- [x] Update `apps/coder/src/services/arena-model-call.ts` and
`apps/coder/src/services/arena-analyzer.ts` to use provider-aware routing.
## P7 — external-agent parity decision (`opencode`)
- [x] Decide whether the first slice includes `opencode` multi-provider local
models or explicitly limits parity to native `boocode`.
- [x] If `opencode` parity is included, add a provider-identity-preserving
bridge instead of collapsing to `llama-swap/<wireModelId>`.
- [x] Preferred bridge: a BooCoder-hosted OpenAI-compatible local-model gateway
for consumers that still assume one provider namespace.
- [x] If the bridge is deferred, stop advertising multi-provider local models
under the `opencode` provider until the bridge exists.
## P8 — tests and verification
- [x] Add unit tests for model-ref parsing, legacy bare-ID fallback, and
provider-aware routing.
- [x] Add tests covering the `embedding/deepseek-r1-qwen3-8b` collision case.
- [x] Add tests proving duplicate model names on two hosts do not share context
cache entries.
- [x] Add UI or route tests for favorites hide-not-delete behavior.
(`apps/server/src/routes/__tests__/settings-favorites.test.ts`, DB-gated:
unavailable favorite persists through PATCH/GET and unrelated writes;
removal is explicit-only.)
- [ ] Smoke test native BooChat/BooCoder against:
`sam-desktop`, `embedding`, and DeepSeek-enabled configs.
(API layer verified 2026-06-12: both hosts healthy, `/api/models` serving
grouped composite ids live. Remaining: in-browser send-a-message pass per
provider group + a DeepSeek-enabled config.)
- [x] If `opencode` parity ships in-scope, add a smoke test proving duplicate
local model names still route to the intended provider.
(`apps/coder/src/services/__tests__/local-gateway-routing.test.ts`:
resolver + HTTP-route level — same wire name routes to distinct baseUrls
with the bare wire id upstream; unknown provider → 400, no upstream call.)