chore: snapshot working tree - pty_exited notifications + in-flight inference WIP

feat(booterm): structured pty_exited WS notifications. Plan-validated, impl-validated, code-reviewed green (contracts build clean, contracts test 29/29, booterm + web typecheck clean).

wip: in-progress inference/provider refactor (agents.ts, provider.ts, new llama-providers.ts, removed llama-args-validator), plus arena, dispatcher, compaction, schema changes.

openspec: pty-exit-notifications complete; x-agent-flags planned (not yet implemented).
2026-06-14 12:48:47 +00:00
parent 0ed506f1da
commit b18de2a331
204 changed files with 25344 additions and 867 deletions


@@ -0,0 +1,311 @@
# multi-llama-swap-providers-model-favorites — implementation analysis
## Scope compared
- **Current state:** the shipped implementation in `apps/server`, `apps/coder`,
`apps/web`, and `packages/contracts`
- **Desired state:** the behavior described in
`docs/research/2026-06-10-multi-llama-swap-providers-model-favorites.md`
and the corresponding OpenSpec batch
Purpose: determine the safest and most coherent implementation path before
building the feature.
## Conclusion
The best implementation path is to treat this as a **shared local-model
routing subsystem**, not as a picker-only UI feature.
That subsystem needs two interfaces:
1. **An in-process resolver** used directly by BooChat and native BooCoder
paths.
2. **A gateway surface** for consumers that cannot call the resolver directly
and still assume one OpenAI-compatible provider contract.
Without that split, the feature looks straightforward in BooChat but stays
architecturally broken in BooCoder because the existing opencode integration
collapses provider identity back to one local llama-swap endpoint.
## Current-state findings
### F-001 — config authority is split
- `apps/server` is driven by `LLAMA_SWAP_URL`, `LLAMA_SIDECAR_URL`, and
`DEFAULT_MODEL`.
- `apps/coder` reuses `LLAMA_SWAP_URL` for local models and has a separate
`data/coder-providers.json` for ACP providers.
Effect: there is no single source of truth for local model providers that both
apps can consume.
### F-002 — model identity is still a raw string everywhere that matters
- `sessions.model` is `TEXT NOT NULL`.
- `chats.model` is `TEXT`.
- `model-context.ts` caches by the raw model string.
- multiple dispatchers treat the model as an opaque string and infer behavior
from prefixes.
Effect: duplicate model names across hosts cannot be represented safely without
composite IDs.
### F-003 — routing logic is duplicated and heuristic-heavy
- BooChat streaming uses `upstreamModel()` in `provider.ts`.
- non-streaming calls use `resolveModelEndpoint()`.
- context lookup bypasses both and fetches `LLAMA_SWAP_URL` directly.
- arena local calls bypass both and hit `LLAMA_SWAP_URL` directly.
Effect: even after adding a registry, call sites will diverge unless they all
share one resolver.
### F-004 — favorites are a UI concern backed by shared settings, not a server catalog concern
- The `settings` table is already the right persistence surface.
- BooChat already reads/writes server state.
- BooCoder currently keeps picker prefs in browser localStorage, but those are
provider-specific UI prefs, not a shared favorite-model feature.
Effect: favorites should be stored server-side and derived in the client from
`/api/settings` + provider-aware model data.
### F-005 — BooCoder has a deeper coupling than the research initially surfaced
The dangerous assumption is not only in `dispatcher.ts`. It is in the whole
opencode local-model bridge:
- the snapshot merges local llama models into the `opencode` provider by
prefixing them as `llama-swap/<model>`
- the dispatcher treats bare IDs as `llama-swap/<model>`
- the opencode backend parses `provider/model`
- current host opencode config points every local-model family at a single
llama-swap base URL
Effect: translating `embedding/qwen3.5-9b` back to `llama-swap/qwen3.5-9b`
reintroduces the exact ambiguity this batch is trying to remove.
### F-006 — Arena is a separate local-model consumer, not just another caller
Arena currently:
- builds its "local model" set from one live llama-swap list
- classifies local-vs-cloud contestants from that set
- performs one-shot local calls directly against `LLAMA_SWAP_URL`
Effect: arena needs the same provider-aware resolver as BooChat, but it does
not need the full BooChat picker/favorites work.
## Gap summary
### G-001 — no shared local-provider registry
What is missing:
- one schema and one loader contract for named local providers consumed by
both server and coder
Why it matters:
- every downstream fix becomes duplicated if config remains split
### G-002 — no canonical model-ref format and parser
What is missing:
- a shared `provider/model` identity format and parse/format helpers
Why it matters:
- caches, DB values, routing, and UI rendering cannot stay aligned otherwise
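A minimal sketch of what such parse/format helpers could look like. The `ModelRef` type and both function names are hypothetical; the analysis only fixes the `provider/model` string format. Splitting on the first slash is an assumption that keeps model ids containing further slashes intact.

```typescript
// Hypothetical shape: the analysis only specifies the `provider/model` format.
interface ModelRef {
  provider: string; // local provider name, e.g. "embedding"
  model: string; // wire model id sent upstream, e.g. "qwen3.5-9b"
}

// Split on the FIRST slash only, so a model id that itself contains
// slashes survives a parse/format round trip.
function parseModelRef(raw: string): ModelRef | null {
  const slash = raw.indexOf("/");
  // No slash, leading slash, or trailing slash: not a canonical ref.
  if (slash <= 0 || slash === raw.length - 1) return null;
  return { provider: raw.slice(0, slash), model: raw.slice(slash + 1) };
}

function formatModelRef(ref: ModelRef): string {
  return `${ref.provider}/${ref.model}`;
}
```

Returning `null` for bare ids leaves the legacy fallback decision to the caller, so the parser itself never guesses a provider.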
### G-003 — no single provider-aware resolver
What is missing:
- one shared resolver API for:
  - route selection
  - base URL selection
  - sidecar selection
  - wire-model extraction
  - context-props endpoint selection
Why it matters:
- keeping separate "streaming", "non-streaming", "context", and "arena"
resolution paths will re-create subtle bugs
### G-004 — no neutral provider-aware catalog contract
What is missing:
- a provider-aware model catalog response that exposes providers and models
without baking favorites into the server payload
Why it matters:
- BooChat and BooCoder both need provider metadata, but favorites are derived
from user settings, not from upstream inventory
### G-005 — no safe path for opencode local-model parity
What is missing:
- one of:
  - a generated/synced opencode-facing local-model config, or
  - a BooCoder-hosted OpenAI-compatible gateway that preserves provider
    identity under one provider namespace, or
  - a deliberate scope cut that removes multi-provider local models from the
    `opencode` provider until that bridge exists
Why it matters:
- without one of these, the feature is correct in BooChat but falsely
advertised in the `opencode` provider
## Recommended architecture
### 1. Shared local-provider registry
Add a new shared config surface for local inference providers, separate from
`data/coder-providers.json`.
Recommendation:
- schema in `packages/contracts`
- live file such as `/data/llama-providers.json`
- fallback synthesis from `LLAMA_SWAP_URL` and `LLAMA_SIDECAR_URL` when the
file is absent
This keeps ACP provider management and local model provider management as two
separate concerns.
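The registry and its environment fallback could be sketched roughly as follows. Every type, field name, and the synthesized provider name `llama-swap` is an assumption made for illustration; the real schema would be defined in `packages/contracts`. The loader takes the file text and environment as parameters so it stays free of I/O.

```typescript
// Illustrative schema only; field names are assumptions, not the shipped contract.
interface LocalProvider {
  name: string; // provider namespace used in `provider/model`
  baseUrl: string; // OpenAI-compatible llama-swap base URL
  sidecarUrl?: string; // optional sidecar for this provider
}

interface LocalProviderRegistry {
  defaultProvider: string;
  providers: LocalProvider[];
}

type Env = Record<string, string | undefined>;

// `fileContents` is the raw JSON text of the registry file, or null when absent.
function loadRegistry(fileContents: string | null, env: Env): LocalProviderRegistry {
  if (fileContents !== null) {
    const parsed = JSON.parse(fileContents) as LocalProviderRegistry;
    if (!Array.isArray(parsed.providers) || parsed.providers.length === 0) {
      throw new Error("registry must list at least one provider");
    }
    if (!parsed.providers.some((p) => p.name === parsed.defaultProvider)) {
      throw new Error(`unknown default provider: ${parsed.defaultProvider}`);
    }
    return parsed;
  }
  // Fallback: synthesize a single-provider registry from the legacy env vars.
  const baseUrl = env["LLAMA_SWAP_URL"];
  if (!baseUrl) throw new Error("no registry file and LLAMA_SWAP_URL is unset");
  const sidecarUrl = env["LLAMA_SIDECAR_URL"];
  return {
    defaultProvider: "llama-swap", // assumed name for the synthesized provider
    providers: [{ name: "llama-swap", baseUrl, ...(sidecarUrl ? { sidecarUrl } : {}) }],
  };
}
```

Both apps would call the same loader, which is what removes the split config authority described in F-001.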
### 2. Shared model-ref and resolver helpers
Add shared helpers for:
- parsing `provider/model`
- resolving legacy bare IDs to the default provider
- deciding route type
- selecting upstream base URL
- extracting the wire model id
All of these should be used by:
- server streaming inference
- server non-streaming calls
- model-context lookup
- arena one-shot local calls
- any future control-plane or routing feature
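One way the shared resolver could combine these helpers, as a sketch under stated assumptions: `resolveModel`, `ResolvedModel`, and the `cacheKey` field are hypothetical names, and route-type selection is left out. A prefix counts as a provider only when it matches a registered provider name; anything else is treated as a legacy bare id on the default provider.

```typescript
interface LocalProvider {
  name: string;
  baseUrl: string;
  sidecarUrl?: string;
}

interface ResolvedModel {
  provider: string;
  baseUrl: string;
  sidecarUrl?: string;
  wireModel: string; // id sent upstream, provider prefix removed
  cacheKey: string; // canonical `provider/model`, usable as a model-context cache key
}

function resolveModel(
  raw: string,
  providers: LocalProvider[],
  defaultProvider: string,
): ResolvedModel {
  const slash = raw.indexOf("/");
  const prefix = slash > 0 ? raw.slice(0, slash) : null;
  const named = prefix !== null ? providers.find((p) => p.name === prefix) : undefined;
  // Legacy fallback: an unrecognized or missing prefix means the whole
  // string is a bare model id that belongs to the default provider.
  const provider = named ?? providers.find((p) => p.name === defaultProvider);
  if (!provider) throw new Error(`default provider not registered: ${defaultProvider}`);
  const wireModel = named ? raw.slice(slash + 1) : raw;
  if (wireModel.length === 0) throw new Error(`empty model id in: ${raw}`);
  return {
    provider: provider.name,
    baseUrl: provider.baseUrl,
    ...(provider.sidecarUrl ? { sidecarUrl: provider.sidecarUrl } : {}),
    wireModel,
    cacheKey: `${provider.name}/${wireModel}`,
  };
}
```

Because every call site would receive the same `cacheKey`, two hosts serving a model with the same name no longer collide in caches keyed by the raw string.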
### 3. Provider-aware catalog, client-derived favorites
Do **not** make the server return a synthetic Favorites section.
Instead:
- `/api/models` (or a replacement contract) should return provider-grouped
inventory only
- `/api/settings` should hold `favorite_models: string[]`
- BooChat and BooCoder should derive the picker layout client-side:
  - Favorites first
  - then provider sections
  - unavailable favorites hidden, not deleted
This keeps the server contract inventory-shaped and the favorite behavior
user-shaped.
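The client-side derivation could look something like this sketch. `CatalogProvider`, `PickerSection`, and `buildPickerSections` are hypothetical names, and the catalog shape is an assumption about what a provider-grouped `/api/models` response might contain.

```typescript
// Assumed catalog shape: one entry per provider, model ids without a prefix.
interface CatalogProvider {
  name: string;
  models: string[];
}

// Picker sections carry canonical `provider/model` refs.
interface PickerSection {
  title: string;
  models: string[];
}

function buildPickerSections(
  catalog: CatalogProvider[],
  favoriteModels: string[], // the `favorite_models` setting, as stored
): PickerSection[] {
  const providerSections: PickerSection[] = [];
  const available = new Set<string>();
  for (const provider of catalog) {
    const refs = provider.models.map((m) => `${provider.name}/${m}`);
    for (const ref of refs) available.add(ref);
    providerSections.push({ title: provider.name, models: refs });
  }
  // Unavailable favorites are filtered from the view only; the stored
  // setting is never modified here, so they reappear when the host returns.
  const visibleFavorites = favoriteModels.filter((ref) => available.has(ref));
  const sections: PickerSection[] = [];
  if (visibleFavorites.length > 0) {
    sections.push({ title: "Favorites", models: visibleFavorites });
  }
  return sections.concat(providerSections);
}
```

The function never writes back to settings, which is what lets a favorite survive a provider being temporarily offline.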
### 4. Treat BooCoder native and BooCoder external-agent paths differently
There are two different BooCoder consumers:
- **native `boocode` provider**
- **external-agent providers like `opencode`**
The native `boocode` provider can adopt the shared resolver directly.
The `opencode` provider cannot safely adopt `provider/model` by simple string
translation, because its current local-model bridge still assumes one local
provider.
Recommendation:
- ship native `boocode` provider parity first
- do **not** claim `opencode` parity until provider identity is preserved
end-to-end there too
### 5. Preferred parity path for opencode: a BooCoder-hosted local-model gateway
If full `opencode` parity is required in the same initiative, the cleanest path
is a small OpenAI-compatible gateway inside `apps/coder`:
- accepts model ids that still carry provider identity
- strips provider prefix only at the final upstream boundary
- routes to the correct local provider
- becomes the single local-model base URL for `opencode`
Why this is better than adding many direct opencode providers:
- one stable provider contract for opencode
- no duplicated base-URL registry in opencode config
- the same gateway can serve arena/local utility calls later
- it stays inside an existing always-on service, not a new third service
If this gateway is not in scope now, the correct fallback is to remove or hide
multi-provider local models from the `opencode` provider until the bridge is
real.
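The routing core of such a gateway might be sketched as a pure function, shown here without any HTTP server around it. `routeChatCompletion` and `UpstreamRequest` are hypothetical names, and the `/v1/chat/completions` path is an assumption based on the OpenAI-compatible contract the analysis refers to. Unlike a legacy-tolerant resolver, this sketch rejects ids without a known provider prefix, on the assumption that opencode-facing ids always carry provider identity.

```typescript
interface UpstreamRequest {
  url: string;
  body: Record<string, unknown>;
}

// `providers` maps provider name -> base URL. A Map is used instead of a
// plain object so names like "constructor" cannot match inherited properties.
function routeChatCompletion(
  body: Record<string, unknown>,
  providers: Map<string, string>,
): UpstreamRequest {
  const model = body["model"];
  if (typeof model !== "string") {
    throw new Error("request body needs a string `model`");
  }
  const slash = model.indexOf("/");
  const providerName = slash > 0 ? model.slice(0, slash) : "";
  const baseUrl = providers.get(providerName);
  if (baseUrl === undefined) {
    throw new Error(`unknown local provider in model id: ${model}`);
  }
  return {
    // Trim trailing slashes so the joined path has exactly one separator.
    url: `${baseUrl.replace(/\/+$/, "")}/v1/chat/completions`,
    // The provider prefix is stripped only here, at the upstream boundary.
    body: { ...body, model: model.slice(slash + 1) },
  };
}
```

Everything upstream of this function keeps the full `provider/model` id, so `embedding/qwen3.5-9b` and `llama-swap/qwen3.5-9b` stay distinguishable until the final hop.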
## Recommended sequence
### Phase 1 — shared foundation
- shared local-provider config schema
- shared `provider/model` parsing helpers
- shared resolver
- legacy bare-id fallback
### Phase 2 — BooChat + native BooCoder
- provider-aware model catalog
- server inference routing updates
- model-context cache-key fix
- compaction and task-model endpoint resolution
- BooChat picker grouping + server-side favorites
- BooCoder `boocode` provider model list grouped by local provider
### Phase 3 — arena parity
- local-model set built from the shared provider catalog, not from a single
llama-swap list
- one-shot local calls use the shared resolver
### Phase 4 — opencode parity
Choose one:
- preferred: BooCoder-hosted local-model gateway plus opencode-facing model
sync
- fallback: temporarily stop advertising multi-provider local models under the
`opencode` provider
### Phase 5 — boocontrol
- build BooControl only after the local-provider registry and canonical model
identity land
## What this changes in the existing OpenSpec batch
1. The design should treat favorites as **client-derived from settings**, not
as a server-generated catalog section.
2. The design should explicitly separate **native BooCoder parity** from
**opencode parity**.
3. The tasks should call out the `opencode` bridge as a dedicated risk area,
not as a small dispatcher rename.
## Recommendation
Implement the shared local-provider registry and resolver first, then ship
BooChat plus native BooCoder on top of it. Treat `opencode` multi-provider
support as a distinct integration seam that either gets a real gateway or stays
out of scope for the first slice.
That is the fastest path that is still architecturally honest.