chore: snapshot working tree - pty_exited notifications + in-flight inference WIP

feat(booterm): structured pty_exited WS notifications. Plan-validated, impl-validated, code-reviewed green (contracts build clean, contracts test 29/29, booterm + web typecheck clean). wip: in-progress inference/provider refactor (agents.ts, provider.ts, new llama-providers.ts, removed llama-args-validator), plus arena, dispatcher, compaction, schema changes. openspec: pty-exit-notifications complete; x-agent-flags planned (not yet implemented).
2026-06-14 12:48:47 +00:00
parent 0ed506f1da
commit b18de2a331
204 changed files with 25344 additions and 867 deletions
--- a/openspec/changes/multi-llama-swap-providers-model-favorites/artifacts/implementation-analysis.md
+++ b/openspec/changes/multi-llama-swap-providers-model-favorites/artifacts/implementation-analysis.md
@@ -0,0 +1,311 @@
+# multi-llama-swap-providers-model-favorites — implementation analysis
+
+## Scope compared
+
+- **Current state:** the shipped implementation in `apps/server`, `apps/coder`,
+  `apps/web`, and `packages/contracts`
+- **Desired state:** the behavior described in
+  `docs/research/2026-06-10-multi-llama-swap-providers-model-favorites.md`
+  and the corresponding OpenSpec batch
+
+Purpose: determine the safest and most coherent implementation path before
+building the feature.
+
+## Conclusion
+
+The best implementation path is to treat this as a **shared local-model
+routing subsystem**, not as a picker-only UI feature.
+
+That subsystem needs two interfaces:
+
+1. **An in-process resolver** used directly by BooChat and native BooCoder
+   paths.
+2. **A gateway surface** for consumers that cannot call the resolver directly
+   and still assume one OpenAI-compatible provider contract.
+
+Without that split, the feature looks straightforward in BooChat but stays
+architecturally broken in BooCoder because the existing opencode integration
+collapses provider identity back to one local llama-swap endpoint.
+
+## Current-state findings
+
+### F-001 — config authority is split
+
+- `apps/server` is driven by `LLAMA_SWAP_URL`, `LLAMA_SIDECAR_URL`, and
+  `DEFAULT_MODEL`.
+- `apps/coder` reuses `LLAMA_SWAP_URL` for local models and has a separate
+  `data/coder-providers.json` for ACP providers.
+
+Effect: there is no single source of truth for local model providers that both
+apps can consume.
+
+### F-002 — model identity is still a raw string everywhere that matters
+
+- `sessions.model` is `TEXT NOT NULL`.
+- `chats.model` is `TEXT`.
+- `model-context.ts` caches by the raw model string.
+- multiple dispatchers treat the model as an opaque string and infer behavior
+  from prefixes.
+
+Effect: duplicate model names across hosts cannot be represented safely without
+composite IDs.
+
+### F-003 — routing logic is duplicated and heuristic-heavy
+
+- BooChat streaming uses `upstreamModel()` in `provider.ts`.
+- non-streaming calls use `resolveModelEndpoint()`.
+- context lookup bypasses both and fetches `LLAMA_SWAP_URL` directly.
+- arena local calls bypass both and hit `LLAMA_SWAP_URL` directly.
+
+Effect: even after adding a registry, call sites will diverge unless they all
+share one resolver.
+
+### F-004 — favorites are a UI concern backed by shared settings, not a server catalog concern
+
+- The `settings` table is already the right persistence surface.
+- BooChat already reads/writes server state.
+- BooCoder currently keeps picker prefs in browser localStorage, but those are
+  provider-specific UI prefs, not a shared favorite-model feature.
+
+Effect: favorites should be stored server-side and derived in the client from
+`/api/settings` + provider-aware model data.
+
+### F-005 — BooCoder has a deeper coupling than the research initially surfaced
+
+The dangerous assumption is not only in `dispatcher.ts`. It is in the whole
+opencode local-model bridge:
+
+- the snapshot merges local llama models into the `opencode` provider by
+  prefixing them as `llama-swap/<model>`
+- the dispatcher treats bare IDs as `llama-swap/<model>`
+- the opencode backend parses `provider/model`
+- the current host opencode config points every local-model family at a
+  single llama-swap base URL
+
+Effect: translating `embedding/qwen3.5-9b` back to `llama-swap/qwen3.5-9b`
+reintroduces the exact ambiguity this batch is trying to remove.
+
+### F-006 — Arena is a separate local-model consumer, not just another caller
+
+Arena currently:
+
+- builds its "local model" set from one live llama-swap list
+- classifies local-vs-cloud contestants from that set
+- performs one-shot local calls directly against `LLAMA_SWAP_URL`
+
+Effect: arena needs the same provider-aware resolver as BooChat, but it does
+not need the full BooChat picker/favorites work.
+
+## Gap summary
+
+### G-001 — no shared local-provider registry
+
+What is missing:
+
+- one schema and one loader contract for named local providers consumed by
+  both server and coder
+
+Why it matters:
+
+- every downstream fix becomes duplicated if config remains split
+
+### G-002 — no canonical model-ref format and parser
+
+What is missing:
+
+- a shared `provider/model` identity format and parse/format helpers
+
+Why it matters:
+
+- caches, DB values, routing, and UI rendering cannot stay aligned otherwise
+
+### G-003 — no single provider-aware resolver
+
+What is missing:
+
+- one shared resolver API for:
+  - route selection
+  - base URL selection
+  - sidecar selection
+  - wire-model extraction
+  - context-props endpoint selection
+
+Why it matters:
+
+- keeping separate "streaming", "non-streaming", "context", and "arena"
+  resolution paths will re-create subtle bugs
+
+### G-004 — no neutral provider-aware catalog contract
+
+What is missing:
+
+- a provider-aware model catalog response that exposes providers and models
+  without baking favorites into the server payload
+
+Why it matters:
+
+- BooChat and BooCoder both need provider metadata, but favorites are derived
+  from user settings, not from upstream inventory
+
+### G-005 — no safe path for opencode local-model parity
+
+What is missing:
+
+- either:
+  - a generated/synced opencode-facing local-model config, or
+  - a BooCoder-hosted OpenAI-compatible gateway that preserves provider
+    identity under one provider namespace, or
+  - a deliberate scope cut that removes multi-provider local models from the
+    `opencode` provider until that bridge exists
+
+Why it matters:
+
+- without one of these, the feature is correct in BooChat but falsely
+  advertised in the `opencode` provider
+
+## Recommended architecture
+
+### 1. Shared local-provider registry
+
+Add a new shared config surface for local inference providers, separate from
+`data/coder-providers.json`.
+
+Recommendation:
+
+- schema in `packages/contracts`
+- live file such as `/data/llama-providers.json`
+- fallback synthesis from `LLAMA_SWAP_URL` and `LLAMA_SIDECAR_URL` while the
+  file is absent
+
+This keeps ACP provider management and local model provider management as two
+separate concerns.
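+
+A minimal sketch of the loader contract described above, assuming a
+hypothetical registry shape (`LocalProvider`, `LocalProviderRegistry`, and
+`loadRegistry` are illustrative names, not existing code). Only the
+`LLAMA_SWAP_URL` and `LLAMA_SIDECAR_URL` variables come from the current
+implementation; the synthesized provider id `llama-swap` is an assumption.
+
+```typescript
+// Hypothetical shape for one named local provider; field names are assumptions.
+interface LocalProvider {
+  id: string;          // e.g. "llama-swap"
+  baseUrl: string;     // OpenAI-compatible base URL
+  sidecarUrl?: string; // optional sidecar endpoint
+}
+
+interface LocalProviderRegistry {
+  defaultProvider: string;
+  providers: LocalProvider[];
+}
+
+type Env = Record<string, string | undefined>;
+
+// Prefer the provider file when present; otherwise synthesize a
+// single-provider registry from the legacy environment variables.
+function loadRegistry(fileJson: string | null, env: Env): LocalProviderRegistry {
+  if (fileJson !== null) {
+    const parsed = JSON.parse(fileJson) as LocalProviderRegistry;
+    if (!parsed.providers.some((p) => p.id === parsed.defaultProvider)) {
+      throw new Error(`default provider "${parsed.defaultProvider}" is not defined`);
+    }
+    return parsed;
+  }
+  const baseUrl = env["LLAMA_SWAP_URL"];
+  if (!baseUrl) {
+    throw new Error("no provider file and LLAMA_SWAP_URL is unset");
+  }
+  return {
+    defaultProvider: "llama-swap",
+    providers: [{ id: "llama-swap", baseUrl, sidecarUrl: env["LLAMA_SIDECAR_URL"] }],
+  };
+}
+```
+
+Passing the file contents and environment in as arguments keeps the loader
+pure, so server and coder could share it and test it without touching disk.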
+
+### 2. Shared model-ref and resolver helpers
+
+Add shared helpers for:
+
+- parsing `provider/model`
+- resolving legacy bare IDs to the default provider
+- deciding route type
+- selecting upstream base URL
+- extracting the wire model id
+
+All of these should be used by:
+
+- server streaming inference
+- server non-streaming calls
+- model-context lookup
+- arena one-shot local calls
+- any future control-plane or routing feature
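+
+The helpers listed above could look roughly like the following sketch. All
+names (`parseModelRef`, `formatModelRef`, `resolveModel`, `ProviderEntry`)
+are hypothetical. One assumption deserves attention: an ID whose prefix is
+not a known provider is treated as a legacy bare ID, so wire model IDs that
+themselves contain a slash keep working.
+
+```typescript
+interface ModelRef {
+  provider: string;
+  model: string; // wire model id sent upstream
+}
+
+interface ProviderEntry {
+  baseUrl: string;
+}
+
+// Split on the first "/" only. An unknown prefix, or no prefix at all,
+// falls back to the default provider with the whole string as the wire id.
+function parseModelRef(raw: string, knownProviders: Set<string>, defaultProvider: string): ModelRef {
+  const slash = raw.indexOf("/");
+  if (slash > 0) {
+    const prefix = raw.slice(0, slash);
+    if (knownProviders.has(prefix)) {
+      return { provider: prefix, model: raw.slice(slash + 1) };
+    }
+  }
+  return { provider: defaultProvider, model: raw };
+}
+
+function formatModelRef(ref: ModelRef): string {
+  return `${ref.provider}/${ref.model}`;
+}
+
+// One resolution path for every caller: base URL, wire model id, and a
+// composite cache key that stays unique across hosts.
+function resolveModel(
+  raw: string,
+  providers: Record<string, ProviderEntry>,
+  defaultProvider: string,
+): { baseUrl: string; wireModel: string; cacheKey: string } {
+  const ref = parseModelRef(raw, new Set(Object.keys(providers)), defaultProvider);
+  const entry = providers[ref.provider];
+  if (!entry) {
+    throw new Error(`unknown provider: ${ref.provider}`);
+  }
+  return { baseUrl: entry.baseUrl, wireModel: ref.model, cacheKey: formatModelRef(ref) };
+}
+```
+
+Because the cache key is always the composite form, a context cache keyed on
+it would no longer confuse identically named models served by two hosts.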
+
+### 3. Provider-aware catalog, client-derived favorites
+
+Do **not** make the server return a synthetic Favorites section.
+
+Instead:
+
+- `/api/models` (or a replacement contract) should return provider-grouped
+  inventory only
+- `/api/settings` should hold `favorite_models: string[]`
+- BooChat and BooCoder should derive the picker layout:
+  - Favorites first
+  - then provider sections
+  - unavailable favorites hidden in the picker but kept in `favorite_models`
+
+This keeps the server contract inventory-shaped and the favorite behavior
+user-shaped.
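+
+As an illustration of that client-side derivation, here is a sketch that
+assumes a hypothetical catalog shape (`CatalogProvider`) and that
+`favorite_models` stores composite `provider/model` strings. Neither the
+type names nor `derivePickerSections` exist in the codebase.
+
+```typescript
+interface CatalogProvider {
+  id: string;
+  models: string[]; // wire model ids reported by this provider
+}
+
+interface PickerSection {
+  title: string;
+  refs: string[]; // composite "provider/model" ids
+}
+
+// Build the picker layout: Favorites first, then one section per provider.
+// Favorites missing from the live catalog are skipped here, but the stored
+// favorites array itself is never modified.
+function derivePickerSections(catalog: CatalogProvider[], favoriteModels: string[]): PickerSection[] {
+  const available = new Set<string>();
+  const providerSections: PickerSection[] = [];
+  for (const provider of catalog) {
+    const refs = provider.models.map((model) => `${provider.id}/${model}`);
+    for (const ref of refs) {
+      available.add(ref);
+    }
+    providerSections.push({ title: provider.id, refs });
+  }
+  const favorites = favoriteModels.filter((ref) => available.has(ref));
+  const sections: PickerSection[] = [];
+  if (favorites.length > 0) {
+    sections.push({ title: "Favorites", refs: favorites });
+  }
+  return sections.concat(providerSections);
+}
+```
+
+A favorite whose host is offline simply drops out of the rendered list and
+reappears once the provider reports the model again.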
+
+### 4. Treat BooCoder native and BooCoder external-agent paths differently
+
+There are two different BooCoder consumers:
+
+- **native `boocode` provider**
+- **external-agent providers like `opencode`**
+
+The native `boocode` provider can adopt the shared resolver directly.
+
+The `opencode` provider cannot safely adopt `provider/model` by simple string
+translation, because its current local-model bridge still assumes one local
+provider.
+
+Recommendation:
+
+- ship native `boocode` provider parity first
+- do **not** claim `opencode` parity until provider identity is preserved
+  end-to-end there too
+
+### 5. Preferred parity path for opencode: a BooCoder-hosted local-model gateway
+
+If full `opencode` parity is required in the same initiative, the cleanest path
+is a small OpenAI-compatible gateway inside `apps/coder`:
+
+- accepts model ids that still carry provider identity
+- strips provider prefix only at the final upstream boundary
+- routes to the correct local provider
+- becomes the single local-model base URL for `opencode`
+
+Why this is better than adding many direct opencode providers:
+
+- one stable provider contract for opencode
+- no duplicated base-URL registry in opencode config
+- the same gateway can serve arena/local utility calls later
+- it stays inside an existing always-on service, not a new third service
+
+If this gateway is not in scope now, the correct fallback is to remove or hide
+multi-provider local models from the `opencode` provider until the bridge is
+real.
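+
+The core of such a gateway is a small request rewrite. The sketch below is
+illustrative only: `routeChatRequest` and its types are hypothetical, and it
+assumes each provider base URL is given without the `/v1` suffix and that
+upstreams expose the OpenAI-compatible `/v1/chat/completions` path.
+
+```typescript
+interface ChatRequest {
+  model: string;
+  [key: string]: unknown; // messages, temperature, and so on pass through
+}
+
+interface UpstreamCall {
+  url: string;
+  body: ChatRequest;
+}
+
+// Keep provider identity until the last moment, then strip the prefix and
+// pick the upstream that owns the model.
+function routeChatRequest(req: ChatRequest, providerBaseUrls: Record<string, string>): UpstreamCall {
+  const slash = req.model.indexOf("/");
+  if (slash <= 0) {
+    throw new Error(`model id lacks a provider prefix: ${req.model}`);
+  }
+  const provider = req.model.slice(0, slash);
+  const baseUrl = providerBaseUrls[provider];
+  if (baseUrl === undefined) {
+    throw new Error(`unknown local provider: ${provider}`);
+  }
+  return {
+    url: `${baseUrl.replace(/\/+$/, "")}/v1/chat/completions`,
+    body: { ...req, model: req.model.slice(slash + 1) },
+  };
+}
+```
+
+Unlike the legacy bare-ID fallback used elsewhere, this sketch rejects IDs
+without a provider prefix, since the gateway exists to preserve identity.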
+
+## Recommended sequence
+
+### Phase 1 — shared foundation
+
+- shared local-provider config schema
+- shared `provider/model` parsing helpers
+- shared resolver
+- legacy bare-id fallback
+
+### Phase 2 — BooChat + native BooCoder
+
+- provider-aware model catalog
+- server inference routing updates
+- model-context cache-key fix
+- compaction and task-model endpoint resolution
+- BooChat picker grouping + server-side favorites
+- BooCoder `boocode` provider model list grouped by local provider
+
+### Phase 3 — arena parity
+
+- local-model set built from the shared provider catalog, not from a single
+  llama-swap list
+- one-shot local calls use the shared resolver
+
+### Phase 4 — opencode parity
+
+Choose one:
+
+- preferred: BooCoder-hosted local-model gateway plus opencode-facing model
+  sync
+- fallback: temporarily stop advertising multi-provider local models under the
+  `opencode` provider
+
+### Phase 5 — BooControl
+
+- build BooControl only after the local-provider registry and canonical model
+  identity land
+
+## What this changes in the existing OpenSpec batch
+
+1. The design should treat favorites as **client-derived from settings**, not
+   as a server-generated catalog section.
+2. The design should explicitly separate **native BooCoder parity** from
+   **opencode parity**.
+3. The tasks should call out the `opencode` bridge as a dedicated risk area,
+   not as a small dispatcher rename.
+
+## Recommendation
+
+Implement the shared local-provider registry and resolver first, then ship
+BooChat plus native BooCoder on top of it. Treat `opencode` multi-provider
+support as a distinct integration seam that either gets a real gateway or stays
+out of scope for the first slice.
+
+That is the fastest path that is still architecturally honest.