chore: snapshot working tree - pty_exited notifications + in-flight inference WIP
feat(booterm): structured pty_exited WS notifications. Plan-validated, impl-validated, code-reviewed green (contracts build clean, contracts test 29/29, booterm + web typecheck clean). wip: in-progress inference/provider refactor (agents.ts, provider.ts, new llama-providers.ts, removed llama-args-validator), plus arena, dispatcher, compaction, schema changes. openspec: pty-exit-notifications complete; x-agent-flags planned (not yet implemented).
@@ -0,0 +1,311 @@
# multi-llama-swap-providers-model-favorites — implementation analysis

## Scope compared

- **Current state:** the shipped implementation in `apps/server`, `apps/coder`,
  `apps/web`, and `packages/contracts`
- **Desired state:** the behavior described in
  `docs/research/2026-06-10-multi-llama-swap-providers-model-favorites.md`
  and the corresponding OpenSpec batch

Purpose: determine the safest and most coherent implementation path before
building the feature.

## Conclusion

The best implementation path is to treat this as a **shared local-model
routing subsystem**, not as a picker-only UI feature.

That subsystem needs two interfaces:

1. **An in-process resolver** used directly by BooChat and native BooCoder
   paths.
2. **A gateway surface** for consumers that cannot call the resolver directly
   and still assume one OpenAI-compatible provider contract.

Without that split, the feature looks straightforward in BooChat but stays
architecturally broken in BooCoder because the existing opencode integration
collapses provider identity back to one local llama-swap endpoint.

## Current-state findings

### F-001 — config authority is split

- `apps/server` is driven by `LLAMA_SWAP_URL`, `LLAMA_SIDECAR_URL`, and
  `DEFAULT_MODEL`.
- `apps/coder` reuses `LLAMA_SWAP_URL` for local models and has a separate
  `data/coder-providers.json` for ACP providers.

Effect: there is no single source of truth for local model providers that both
apps can consume.

### F-002 — model identity is still a raw string everywhere that matters

- `sessions.model` is `TEXT NOT NULL`.
- `chats.model` is `TEXT`.
- `model-context.ts` caches by the raw model string.
- multiple dispatchers treat the model as an opaque string and infer behavior
  from prefixes.

Effect: duplicate model names across hosts cannot be represented safely without
composite IDs.

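To make the F-002 effect concrete, here is a minimal sketch of the collision. It is not taken from `model-context.ts`: the `ContextInfo` shape, the `main` provider name, and the context lengths are all invented for illustration.

```typescript
// Hypothetical shape; the real cache entry in model-context.ts may differ.
interface ContextInfo {
  contextLength: number;
}

// Cache keyed by the raw model string, as described in F-002.
const rawCache = new Map<string, ContextInfo>();
rawCache.set("qwen3.5-9b", { contextLength: 32768 }); // reported by host A (made-up value)
rawCache.set("qwen3.5-9b", { contextLength: 8192 }); // host B silently overwrites host A

// Cache keyed by a composite `provider/model` id keeps both entries.
const compositeCache = new Map<string, ContextInfo>();
compositeCache.set("main/qwen3.5-9b", { contextLength: 32768 });
compositeCache.set("embedding/qwen3.5-9b", { contextLength: 8192 });
```

With the raw key, whichever host was queried last wins, so a context-length lookup for the other host returns the wrong value without any error.
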
### F-003 — routing logic is duplicated and heuristic-heavy

- BooChat streaming uses `upstreamModel()` in `provider.ts`.
- non-streaming calls use `resolveModelEndpoint()`.
- context lookup bypasses both and fetches `LLAMA_SWAP_URL` directly.
- arena local calls bypass both and hit `LLAMA_SWAP_URL` directly.

Effect: even after adding a registry, call sites will diverge unless they all
share one resolver.

### F-004 — favorites are a UI concern backed by shared settings, not a server catalog concern

- The `settings` table is already the right persistence surface.
- BooChat already reads/writes server state.
- BooCoder currently keeps picker prefs in browser localStorage, but those are
  provider-specific UI prefs, not a shared favorite-model feature.

Effect: favorites should be stored server-side and derived in the client from
`/api/settings` + provider-aware model data.

### F-005 — BooCoder has a deeper coupling than the research initially surfaced

The dangerous assumption is not only in `dispatcher.ts`. It is in the whole
opencode local-model bridge:

- the snapshot merges local llama models into the `opencode` provider by
  prefixing them as `llama-swap/<model>`
- the dispatcher treats bare IDs as `llama-swap/<model>`
- the opencode backend parses `provider/model`
- current host opencode config points every local-model family at a single
  llama-swap base URL

Effect: translating `embedding/qwen3.5-9b` back to `llama-swap/qwen3.5-9b`
reintroduces the exact ambiguity this batch is trying to remove.

### F-006 — Arena is a separate local-model consumer, not just another caller

Arena currently:

- builds its "local model" set from one live llama-swap list
- classifies local-vs-cloud contestants from that set
- performs one-shot local calls directly against `LLAMA_SWAP_URL`

Effect: arena needs the same provider-aware resolver as BooChat, but it does
not need the full BooChat picker/favorites work.

## Gap summary

### G-001 — no shared local-provider registry

What is missing:

- one schema and one loader contract for named local providers consumed by
  both server and coder

Why it matters:

- every downstream fix becomes duplicated if config remains split

### G-002 — no canonical model-ref format and parser

What is missing:

- a shared `provider/model` identity format and parse/format helpers

Why it matters:

- caches, DB values, routing, and UI rendering cannot stay aligned otherwise

### G-003 — no single provider-aware resolver

What is missing:

- one shared resolver API for:
  - route selection
  - base URL selection
  - sidecar selection
  - wire-model extraction
  - context-props endpoint selection

Why it matters:

- keeping separate "streaming", "non-streaming", "context", and "arena"
  resolution paths will re-create subtle bugs

### G-004 — no neutral provider-aware catalog contract

What is missing:

- a provider-aware model catalog response that exposes providers and models
  without baking favorites into the server payload

Why it matters:

- BooChat and BooCoder both need provider metadata, but favorites are derived
  from user settings, not from upstream inventory

### G-005 — no safe path for opencode local-model parity

What is missing:

- either:
  - a generated/synced opencode-facing local-model config, or
  - a BooCoder-hosted OpenAI-compatible gateway that preserves provider
    identity under one provider namespace, or
  - a deliberate scope cut that removes multi-provider local models from the
    `opencode` provider until that bridge exists

Why it matters:

- without one of these, the feature is correct in BooChat but falsely
  advertised in the `opencode` provider

## Recommended architecture

### 1. Shared local-provider registry

Add a new shared config surface for local inference providers, separate from
`data/coder-providers.json`.

Recommendation:

- schema in `packages/contracts`
- live file such as `/data/llama-providers.json`
- fallback synthesis from `LLAMA_SWAP_URL` and `LLAMA_SIDECAR_URL` while the
  file is absent

This keeps ACP provider management and local model provider management as two
separate concerns.

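A minimal sketch of what the registry shape and the env fallback could look like. Everything here is an assumption made for illustration: the field names, the `default` provider id, and the `loadLocalProviders` function are not part of the existing contracts package, and the inline checks stand in for whatever schema validation `packages/contracts` ends up providing.

```typescript
// Hypothetical registry shape; the real schema would live in packages/contracts.
interface LocalProvider {
  id: string; // provider namespace used in `provider/model` ids
  label: string; // display name for pickers
  baseUrl: string; // llama-swap base URL
  sidecarUrl?: string;
}

interface LocalProviderRegistry {
  defaultProviderId: string;
  providers: LocalProvider[];
}

type Env = Record<string, string | undefined>;

// `fileContents` is the raw JSON of the registry file, or null when the file is absent.
function loadLocalProviders(fileContents: string | null, env: Env): LocalProviderRegistry {
  if (fileContents !== null) {
    const parsed = JSON.parse(fileContents) as LocalProviderRegistry;
    if (!Array.isArray(parsed.providers) || parsed.providers.length === 0) {
      throw new Error("registry must list at least one provider");
    }
    if (!parsed.providers.some((p) => p.id === parsed.defaultProviderId)) {
      throw new Error("defaultProviderId must match a listed provider");
    }
    return parsed;
  }

  // Fallback: synthesize a single provider from the existing env vars.
  const baseUrl = env.LLAMA_SWAP_URL;
  if (!baseUrl) {
    throw new Error("no registry file and LLAMA_SWAP_URL is unset");
  }
  const sidecarUrl = env.LLAMA_SIDECAR_URL;
  return {
    defaultProviderId: "default",
    providers: [
      { id: "default", label: "llama-swap", baseUrl, ...(sidecarUrl ? { sidecarUrl } : {}) },
    ],
  };
}
```

Passing the environment in as a parameter rather than reading it globally keeps the loader usable from both `apps/server` and `apps/coder` and makes the fallback easy to test.
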
### 2. Shared model-ref and resolver helpers

Add shared helpers for:

- parsing `provider/model`
- resolving legacy bare IDs to the default provider
- deciding route type
- selecting upstream base URL
- extracting the wire model id

All of these should be used by:

- server streaming inference
- server non-streaming calls
- model-context lookup
- arena one-shot local calls
- any future control-plane or routing feature

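The helpers listed above could be sketched roughly as follows. All names (`parseModelRef`, `formatModelRef`, `resolveModel`, and the interfaces) are hypothetical, and route-type selection is left out. One assumption worth calling out: a prefix only counts as a provider when it matches a registered provider id, so a model name that itself contains a slash still falls back to the default provider.

```typescript
// Hypothetical shapes; field and function names are illustrative only.
interface LocalProviderEntry {
  id: string;
  baseUrl: string;
  sidecarUrl?: string;
}

interface ModelRef {
  providerId: string;
  model: string; // wire model id sent upstream
}

interface ResolvedModel extends ModelRef {
  baseUrl: string;
  sidecarUrl?: string;
}

function parseModelRef(
  raw: string,
  providers: LocalProviderEntry[],
  defaultProviderId: string,
): ModelRef {
  const slash = raw.indexOf("/");
  if (slash > 0 && slash < raw.length - 1) {
    const prefix = raw.slice(0, slash);
    if (providers.some((p) => p.id === prefix)) {
      return { providerId: prefix, model: raw.slice(slash + 1) };
    }
  }
  // Legacy bare id (or an unknown prefix): route to the default provider unchanged.
  return { providerId: defaultProviderId, model: raw };
}

function formatModelRef(ref: ModelRef): string {
  return `${ref.providerId}/${ref.model}`;
}

function resolveModel(
  raw: string,
  providers: LocalProviderEntry[],
  defaultProviderId: string,
): ResolvedModel {
  const ref = parseModelRef(raw, providers, defaultProviderId);
  const provider = providers.find((p) => p.id === ref.providerId);
  if (!provider) {
    throw new Error(`unknown local provider: ${ref.providerId}`);
  }
  return {
    ...ref,
    baseUrl: provider.baseUrl,
    ...(provider.sidecarUrl ? { sidecarUrl: provider.sidecarUrl } : {}),
  };
}
```

If streaming, non-streaming, context lookup, and arena all call one function like `resolveModel`, a given stored id maps to the same base URL and wire model on every path, which is the property F-003 says is missing today.
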
### 3. Provider-aware catalog, client-derived favorites

Do **not** make the server return a synthetic Favorites section.

Instead:

- `/api/models` (or a replacement contract) should return provider-grouped
  inventory only
- `/api/settings` should hold `favorite_models: string[]`
- BooChat and BooCoder should derive:
  - Favorites first
  - then provider sections
  - hide unavailable favorites without deleting them

This keeps the server contract inventory-shaped and the favorite behavior
user-shaped.

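A sketch of the client-side derivation, assuming the catalog arrives as a list of providers with their wire model ids and that `favorite_models` holds composite `provider/model` strings. The `CatalogProvider` and `PickerSection` shapes and the `buildPickerSections` name are invented for this example.

```typescript
// Hypothetical shapes; the real catalog contract may differ.
interface CatalogProvider {
  id: string;
  label: string;
  models: string[]; // wire model ids as reported by that provider
}

interface PickerSection {
  title: string;
  refs: string[]; // composite `provider/model` ids
}

function buildPickerSections(
  catalog: CatalogProvider[],
  favoriteModels: string[],
): PickerSection[] {
  // Every ref that is currently available somewhere in the inventory.
  const available = new Set<string>();
  for (const provider of catalog) {
    for (const model of provider.models) {
      available.add(`${provider.id}/${model}`);
    }
  }

  const sections: PickerSection[] = [];

  // Unavailable favorites are filtered from the view only; the stored
  // favoriteModels array is never modified here.
  const visibleFavorites = favoriteModels.filter((ref) => available.has(ref));
  if (visibleFavorites.length > 0) {
    sections.push({ title: "Favorites", refs: visibleFavorites });
  }

  for (const provider of catalog) {
    sections.push({
      title: provider.label,
      refs: provider.models.map((model) => `${provider.id}/${model}`),
    });
  }
  return sections;
}
```

Because the function only reads `favoriteModels`, a favorite whose host is temporarily offline reappears in the picker as soon as the catalog lists it again.
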
### 4. Treat BooCoder native and BooCoder external-agent paths differently

There are two different BooCoder consumers:

- **native `boocode` provider**
- **external-agent providers like `opencode`**

The native `boocode` provider can adopt the shared resolver directly.

The `opencode` provider cannot safely adopt `provider/model` by simple string
translation, because its current local-model bridge still assumes one local
provider.

Recommendation:

- ship native `boocode` provider parity first
- do **not** claim `opencode` parity until provider identity is preserved
  end-to-end there too

### 5. Preferred parity path for opencode: a BooCoder-hosted local-model gateway

If full `opencode` parity is required in the same initiative, the cleanest path
is a small OpenAI-compatible gateway inside `apps/coder`:

- accepts model ids that still carry provider identity
- strips provider prefix only at the final upstream boundary
- routes to the correct local provider
- becomes the single local-model base URL for `opencode`

Why this is better than adding many direct opencode providers:

- one stable provider contract for opencode
- no duplicated base-URL registry in opencode config
- the same gateway can serve arena/local utility calls later
- it stays inside an existing always-on service, not a new third service

If this gateway is not in scope now, the correct fallback is to remove or hide
multi-provider local models from the `opencode` provider until the bridge is
real.

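The core of such a gateway is a small rewrite step. The sketch below shows only that step as a pure function, under several assumptions: the names (`planUpstreamCall`, `GatewayProvider`, and so on) are hypothetical, the upstream path is the usual OpenAI-compatible `/v1/chat/completions`, and bare ids are rejected rather than defaulted so that provider identity can never be dropped silently on this path.

```typescript
// Hypothetical shapes for a gateway inside apps/coder; nothing here exists yet.
interface GatewayProvider {
  id: string;
  baseUrl: string;
}

interface ChatRequestBody {
  model: string;
  [key: string]: unknown; // messages, stream, etc. are passed through untouched
}

interface UpstreamCall {
  url: string;
  body: ChatRequestBody;
}

function planUpstreamCall(
  body: ChatRequestBody,
  providers: GatewayProvider[],
  path: string = "/v1/chat/completions",
): UpstreamCall {
  const slash = body.model.indexOf("/");
  const providerId = slash > 0 ? body.model.slice(0, slash) : "";
  const provider = providers.find((p) => p.id === providerId);
  if (!provider) {
    throw new Error(`model id does not name a known local provider: ${body.model}`);
  }
  return {
    // Trim trailing slashes so the joined URL has exactly one separator.
    url: `${provider.baseUrl.replace(/\/+$/, "")}${path}`,
    // The provider prefix is stripped here and nowhere earlier.
    body: { ...body, model: body.model.slice(slash + 1) },
  };
}
```

An HTTP handler in the gateway would call this once per request and forward `body` to `url`; `opencode` would only ever see the gateway's own base URL.
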
## Recommended sequence

### Phase 1 — shared foundation

- shared local-provider config schema
- shared `provider/model` parsing helpers
- shared resolver
- legacy bare-id fallback

### Phase 2 — BooChat + native BooCoder

- provider-aware model catalog
- server inference routing updates
- model-context cache-key fix
- compaction and task-model endpoint resolution
- BooChat picker grouping + server-side favorites
- BooCoder `boocode` provider model list grouped by local provider

### Phase 3 — arena parity

- local-model set built from the shared provider catalog, not from one
  llama-swap list
- one-shot local calls use the shared resolver

### Phase 4 — opencode parity

Choose one:

- preferred: BooCoder-hosted local-model gateway plus opencode-facing model
  sync
- fallback: temporarily stop advertising multi-provider local models under the
  `opencode` provider

### Phase 5 — BooControl

- build BooControl only after the local-provider registry and canonical model
  identity land

## What this changes in the existing OpenSpec batch

1. The design should treat favorites as **client-derived from settings**, not
   as a server-generated catalog section.
2. The design should explicitly separate **native BooCoder parity** from
   **opencode parity**.
3. The tasks should call out the `opencode` bridge as a dedicated risk area,
   not as a small dispatcher rename.

## Recommendation

Implement the shared local-provider registry and resolver first, then ship
BooChat plus native BooCoder on top of it. Treat `opencode` multi-provider
support as a distinct integration seam that either gets a real gateway or stays
out of scope for the first slice.

That is the fastest path that is still architecturally honest.