chore: snapshot working tree - pty_exited notifications + in-flight inference WIP

feat(booterm): structured pty_exited WS notifications. Plan-validated, impl-validated, code-reviewed green (contracts build clean, contracts test 29/29, booterm + web typecheck clean).

wip: in-progress inference/provider refactor (agents.ts, provider.ts, new llama-providers.ts, removed llama-args-validator), plus arena, dispatcher, compaction, schema changes.

openspec: pty-exit-notifications complete; x-agent-flags planned (not yet implemented).
2026-06-14 12:48:47 +00:00
parent 0ed506f1da
commit b18de2a331
204 changed files with 25344 additions and 867 deletions


@@ -0,0 +1,238 @@
# multi-llama-swap-providers-model-favorites — design
Detailed implementation plan for named local model providers, composite model
IDs, grouped pickers, and shared favorites across BooChat and BooCoder.
## 1. Current state
Today the repo splits inference configuration across two incompatible shapes:
- `apps/server` reads env vars such as `LLAMA_SWAP_URL`, `LLAMA_SIDECAR_URL`,
and `DEFAULT_MODEL`.
- `apps/coder` reads the same `LLAMA_SWAP_URL` for BooCode's own provider, plus
`data/coder-providers.json` for ACP providers.
That leaves several hardcoded single-endpoint assumptions:
- `/api/models` fetches one llama-swap plus optional DeepSeek.
- `provider.ts` routes by `deepseek-` name prefix and a global sidecar default.
- `model-context.ts` caches by bare model string.
- `compaction.ts`, `task-model.ts`, and coder arena use a single upstream URL.
- BooCoder prepends `llama-swap/` and treats any other slash-containing value
as an already-routable provider namespace.
## 2. Design principles
1. Provider identity is explicit.
2. Wire model IDs stay bare; persisted model IDs are composite.
3. Legacy bare model IDs remain readable indefinitely.
4. Favorites are shared across BooChat and BooCoder.
5. Sidecar routing is opt-in per provider, not a global fallback.
6. Any cache keyed by model identity uses the full composite ID.
## 3. Recommended config authority
Introduce a new shared file for local inference providers:
- Live path: `/data/llama-providers.json`
- Env var for both apps: `LLAMA_PROVIDERS_PATH`
- Tracked example: `data/llama-providers.example.json`
Recommended shape:
```json
{
  "defaultProvider": "sam-desktop",
  "providers": [
    {
      "id": "sam-desktop",
      "label": "Sam-desktop",
      "baseUrl": "http://100.101.41.16:8401",
      "sidecarUrl": "http://100.101.41.16:8402",
      "kind": "llama-swap"
    },
    {
      "id": "embedding",
      "label": "embedding",
      "baseUrl": "http://100.90.172.55:8411",
      "kind": "llama-swap"
    }
  ]
}
```
Rules:
- If the file is missing, synthesize a single legacy provider from
`LLAMA_SWAP_URL` and optional `LLAMA_SIDECAR_URL`.
- `data/coder-providers.json` remains the ACP registry and is not extended with
llama-swap base URLs.
- DeepSeek credentials remain env-backed, but the model catalog should expose a
synthetic provider group such as `deepseek` so routing no longer depends on a
bare `deepseek-` prefix.
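The fallback rule above can be sketched as a small loader. This is only an illustration under stated assumptions: `loadProviderRegistry`, `synthesizeLegacyRegistry`, the injected `readText` callback, and the synthesized provider id `llama-swap` are hypothetical names, not existing code. The plan also does not say what happens when neither the file nor `LLAMA_SWAP_URL` exists, so this sketch simply throws in that case.

```ts
// Shapes mirror the recommended JSON above; function names are hypothetical.
interface LlamaProvider {
  id: string;
  label: string;
  baseUrl: string;
  sidecarUrl?: string;
  kind: "llama-swap";
}

interface ProviderRegistry {
  defaultProvider: string;
  providers: LlamaProvider[];
}

type Env = Record<string, string | undefined>;

// Builds the single-provider fallback from the legacy env vars.
function synthesizeLegacyRegistry(env: Env): ProviderRegistry {
  const baseUrl = env["LLAMA_SWAP_URL"];
  if (!baseUrl) {
    // Assumption: the plan leaves this case unspecified.
    throw new Error("No provider file found and LLAMA_SWAP_URL is unset");
  }
  const legacy: LlamaProvider = {
    id: "llama-swap", // assumed id for the synthesized provider
    label: "llama-swap",
    baseUrl,
    kind: "llama-swap",
  };
  const sidecarUrl = env["LLAMA_SIDECAR_URL"];
  if (sidecarUrl) legacy.sidecarUrl = sidecarUrl;
  return { defaultProvider: legacy.id, providers: [legacy] };
}

// `readText` returns undefined when the file does not exist, which keeps
// this sketch free of filesystem imports.
function loadProviderRegistry(
  env: Env,
  readText: (path: string) => string | undefined,
): ProviderRegistry {
  const path = env["LLAMA_PROVIDERS_PATH"] ?? "/data/llama-providers.json";
  const raw = readText(path);
  if (raw === undefined) return synthesizeLegacyRegistry(env);

  const parsed = JSON.parse(raw) as ProviderRegistry;
  if (!parsed.providers.some((p) => p.id === parsed.defaultProvider)) {
    throw new Error(`defaultProvider "${parsed.defaultProvider}" is not listed`);
  }
  return parsed;
}
```

Injecting the file reader means the same function could back both apps and be exercised in tests without touching disk.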
## 4. Model identity and parsing
Persist model selections as `provider/model`.
Examples:
- `sam-desktop/qwen3.6-35b-a3b`
- `embedding/gemma-4-12b`
- `deepseek/deepseek-v4-pro`
Helper behavior:
- `parseModelRef(id)` returns `{ providerId, wireModelId, isLegacyBareId }`.
- Bare IDs resolve to `{ providerId: defaultProvider, wireModelId: id, isLegacyBareId: true }`.
- The provider prefix is stripped only at the final wire-call boundary.
This preserves existing `TEXT` columns while fixing duplicate-name ambiguity.
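A minimal sketch of the parsing helper follows. The extra `knownProviderIds` and `defaultProvider` parameters are assumptions made so the example is self-contained; the plan itself only names `parseModelRef(id)`. Checking the prefix against known providers is also an assumption, chosen so that a legacy bare ID that happens to contain a slash is not mistaken for a composite one.

```ts
interface ModelRef {
  providerId: string;
  wireModelId: string;
  isLegacyBareId: boolean;
}

// Splits at the first slash only, so wire model IDs may contain slashes.
function parseModelRef(
  id: string,
  knownProviderIds: ReadonlySet<string>,
  defaultProvider: string,
): ModelRef {
  const slash = id.indexOf("/");
  if (slash > 0) {
    const prefix = id.slice(0, slash);
    if (knownProviderIds.has(prefix)) {
      return {
        providerId: prefix,
        wireModelId: id.slice(slash + 1),
        isLegacyBareId: false,
      };
    }
  }
  // No recognized provider prefix: treat the whole string as a bare wire ID.
  return { providerId: defaultProvider, wireModelId: id, isLegacyBareId: true };
}

const known = new Set(["sam-desktop", "embedding", "deepseek"]);
const composite = parseModelRef("embedding/gemma-4-12b", known, "sam-desktop");
const bare = parseModelRef("qwen3.6-35b-a3b", known, "sam-desktop");
```

Here `composite.providerId` is `embedding` with wire ID `gemma-4-12b`, while `bare` falls back to the default provider and keeps its full string as the wire ID.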
## 5. Server changes
### 5.1 Shared registry + model catalog
Add shared registry utilities in `packages/contracts` plus server-side loaders
used by:
- `apps/server/src/config.ts`
- `apps/server/src/routes/models.ts`
- `apps/server/src/services/inference/provider.ts`
- `apps/server/src/services/model-context.ts`
- `apps/server/src/services/task-model.ts`
- `apps/server/src/services/compaction.ts`
`GET /api/models` should return a provider-aware payload. Recommended shape:
```ts
interface ModelCatalogProvider {
  id: string;
  label: string;
  models: ModelInfo[];
}

interface ModelCatalogResponse {
  providers: ModelCatalogProvider[];
}
```
Where each `ModelInfo.id` is already composite.
Favorites should **not** be embedded in this payload. They are a user-level
view derived in the client from `favorite_models` in `/api/settings`.
### 5.2 Routing
Replace string-heuristic routing with provider-aware resolution:
- `sam-desktop/*` routes to `baseUrl` or `sidecarUrl` depending on agent flags
and provider capabilities.
- `embedding/*` always routes directly to its llama-swap `baseUrl`.
- `deepseek/*` routes to the DeepSeek SDK provider.
`resolveModelEndpoint()` and `upstreamModel()` must both resolve from the same
parsed model reference to keep streaming and non-streaming behavior aligned.
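One way the provider-aware resolution might look is sketched below. The `Endpoint` union, the `RoutableProvider` shape, and the `wantsSidecar` option are hypothetical stand-ins; in particular `wantsSidecar` collapses "agent flags and provider capabilities" into a single boolean purely for illustration, and the signature shown for `resolveModelEndpoint()` is an assumption.

```ts
interface RoutableProvider {
  id: string;
  baseUrl: string;
  sidecarUrl?: string; // present only when the provider opts in to a sidecar
}

interface ParsedRef {
  providerId: string;
  wireModelId: string;
}

type Endpoint =
  | { kind: "llama-swap"; url: string; wireModelId: string }
  | { kind: "deepseek"; wireModelId: string };

function resolveModelEndpoint(
  ref: ParsedRef,
  providers: ReadonlyMap<string, RoutableProvider>,
  opts: { wantsSidecar: boolean },
): Endpoint {
  // The synthetic DeepSeek group is routed by provider id, not name prefix.
  if (ref.providerId === "deepseek") {
    return { kind: "deepseek", wireModelId: ref.wireModelId };
  }
  const provider = providers.get(ref.providerId);
  if (!provider) {
    throw new Error(`Unknown provider: ${ref.providerId}`);
  }
  // Sidecar routing is opt-in: without a sidecarUrl there is no fallback
  // to some global sidecar, only the provider's own baseUrl.
  const url =
    opts.wantsSidecar && provider.sidecarUrl !== undefined
      ? provider.sidecarUrl
      : provider.baseUrl;
  return { kind: "llama-swap", url, wireModelId: ref.wireModelId };
}
```

If both the streaming and non-streaming paths call a single resolver like this with the same parsed reference, they cannot disagree about the upstream.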
### 5.3 Context lookup and cache keys
`model-context.ts` must key caches by the full composite ID. The provider
prefix is stripped only when building:
`<provider.baseUrl>/upstream/<wireModelId>/props`
This avoids cross-provider cache poisoning for duplicate names.
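The keying rule can be illustrated with a small sketch. `ContextCache` and `buildPropsUrl` are hypothetical names, and storing a plain number as the cached context length is an assumption about what `model-context.ts` keeps.

```ts
// Cache keyed by the full composite ID ("provider/model"), never the bare name.
class ContextCache {
  private readonly entries = new Map<string, number>();

  get(compositeId: string): number | undefined {
    return this.entries.get(compositeId);
  }

  set(compositeId: string, contextLength: number): void {
    this.entries.set(compositeId, contextLength);
  }
}

// The provider prefix is dropped only here, when the upstream URL is built.
function buildPropsUrl(baseUrl: string, wireModelId: string): string {
  const trimmed = baseUrl.replace(/\/+$/, ""); // tolerate a trailing slash
  return `${trimmed}/upstream/${wireModelId}/props`;
}

// Two providers serving the same wire model name keep separate entries.
const cache = new Map<string, number>();
const ctx = new ContextCache();
ctx.set("sam-desktop/gemma-4-12b", 32768);
ctx.set("embedding/gemma-4-12b", 8192);
```

The context lengths in the usage lines are made-up values; the point is only that the two entries do not overwrite each other, which a bare-name key would allow.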
## 6. Persistence and settings
Keep:
- `sessions.model TEXT`
- `chats.model TEXT`
Add a new `settings` key:
- `favorite_models: string[]`
Rules:
- Stored favorites are composite IDs only.
- Missing/offline favorites are hidden from the picker, not deleted.
- Legacy bare favorites are not supported. On read, a bare entry is either
ignored or normalized to a composite ID, and it is normalized only when the
default-provider mapping is unambiguous.
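The favorites rules might be expressed roughly as follows. `visibleFavorites` and `toggleFavorite` are hypothetical helpers; the sketch takes the "ignore" option for bare entries and assumes new favorites are appended in insertion order.

```ts
// Returns the favorites to show, without mutating the stored list.
// Offline or unknown IDs are hidden here but stay in `stored`.
// Bare legacy entries never match a composite catalog ID, so they are ignored.
function visibleFavorites(
  stored: readonly string[],
  availableCompositeIds: ReadonlySet<string>,
): string[] {
  return stored.filter((id) => availableCompositeIds.has(id));
}

// Adds the ID at the end if absent, removes it if present.
function toggleFavorite(stored: readonly string[], compositeId: string): string[] {
  return stored.includes(compositeId)
    ? stored.filter((id) => id !== compositeId)
    : [...stored, compositeId];
}

const stored = ["sam-desktop/qwen3.6-35b-a3b", "embedding/gemma-4-12b"];
const online = new Set(["sam-desktop/qwen3.6-35b-a3b"]);
const shown = visibleFavorites(stored, online);
```

In the usage lines, `shown` contains only the `sam-desktop` model while `stored` still holds both entries, so the `embedding` favorite reappears once that host is back.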
## 7. BooCoder integration
Touch points:
- `apps/coder/src/services/provider-snapshot.ts`
- `apps/coder/src/services/dispatcher.ts`
- `apps/coder/src/services/arena-model-call.ts`
- `apps/coder/src/services/arena-analyzer.ts`
- `apps/coder/src/config.ts`
### 7.1 Native `boocode` provider
The native `boocode` provider can use the shared local-provider registry and
resolver directly. Its model list should expose composite `provider/model` IDs,
and the UI should group them by local provider.
### 7.2 External-agent parity is a separate seam
`opencode` is not safe to migrate by a naive string rewrite. The current bridge
assumes one local llama-swap provider and collapses identity back to
`llama-swap/<model>`.
Recommended bridge rule:
- Composite local model IDs remain `provider/model` in native BooCode state and UI.
- Do **not** translate `provider/model` back to `llama-swap/<wireModelId>` for
external-agent paths; that loses provider identity for duplicate model names.
- If full `opencode` parity is required, prefer a BooCoder-hosted
OpenAI-compatible local-model gateway that accepts provider-aware model IDs
and routes them to the correct local upstream.
If the gateway is not part of the first slice, restrict the initial scope to
native `boocode` parity and keep `opencode` local-model parity as a follow-up.
## 8. Picker UX
Both BooChat and BooCoder should converge on the same behavior:
- Favorites section first
- Then one section per provider
- Favorite toggle on every model row
- A favorited model remains visible in its provider section
- Provider order defaults to:
  1. `sam-desktop`
  2. `embedding`
  3. `deepseek` when configured
This batch does not require search. Search can be added later if model counts
make the grouped list insufficient.
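The grouping behavior described in this section could be derived client-side along these lines. `buildPickerSections` and `PickerSection` are hypothetical, `ModelInfo` is reduced to the one field the sketch needs, and omitting the Favorites section when it is empty is an assumption the plan does not state.

```ts
interface ModelInfo {
  id: string; // composite "provider/model"; other fields omitted here
}

interface ModelCatalogProvider {
  id: string;
  label: string;
  models: ModelInfo[];
}

interface PickerSection {
  title: string;
  models: ModelInfo[];
}

const DEFAULT_PROVIDER_ORDER = ["sam-desktop", "embedding", "deepseek"];

function buildPickerSections(
  providers: readonly ModelCatalogProvider[],
  favoriteIds: readonly string[],
  providerOrder: readonly string[] = DEFAULT_PROVIDER_ORDER,
): PickerSection[] {
  // Index every available model so favorites can be looked up by ID.
  const byId = new Map<string, ModelInfo>();
  for (const provider of providers) {
    for (const model of provider.models) byId.set(model.id, model);
  }

  // Favorites keep their stored order; unavailable ones are skipped.
  const favorites: ModelInfo[] = [];
  for (const id of favoriteIds) {
    const model = byId.get(id);
    if (model) favorites.push(model);
  }

  // Providers missing from the order list sort after the listed ones.
  const rank = (id: string): number => {
    const index = providerOrder.indexOf(id);
    return index === -1 ? providerOrder.length : index;
  };
  const ordered = [...providers].sort((a, b) => rank(a.id) - rank(b.id));

  const sections: PickerSection[] = [];
  if (favorites.length > 0) sections.push({ title: "Favorites", models: favorites });
  // Provider sections list all their models, including favorited ones.
  for (const provider of ordered) {
    sections.push({ title: provider.label, models: provider.models });
  }
  return sections;
}
```

Because the provider sections are built from the unfiltered model lists, a favorited model shows up both under Favorites and under its own provider, matching the rule above.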
## 9. Rollout and compatibility
1. Land registry/parsing utilities first.
2. Switch server routing and model catalog to composite IDs.
3. Add favorite persistence and picker grouping.
4. Update native BooCoder (`boocode`) model handling and arena.
5. Decide the `opencode` parity path: gateway now, or explicit follow-up.
6. Verify legacy bare IDs across existing chats and sessions before removing
any old env-based assumptions.
Compatibility requirements:
- A missing `/data/llama-providers.json` must not break startup.
- Existing DB rows with bare IDs must remain routable.
- Existing `DEFAULT_MODEL` can stay bare during transition, but new writes
should become composite.
## 10. Deferred items
- Picker search/filtering
- Manual favorite ordering beyond insertion order
- Host health badges in the picker
- Automatic normalization of old session/chat model values
- Full `opencode` multi-provider parity if the first slice ships native-only
- Any boocontrol fleet UI built on top of this registry