chore: snapshot working tree - pty_exited notifications + in-flight inference WIP
feat(booterm): structured pty_exited WS notifications. Plan-validated, impl-validated, code-reviewed green (contracts build clean, contracts test 29/29, booterm + web typecheck clean). wip: in-progress inference/provider refactor (agents.ts, provider.ts, new llama-providers.ts, removed llama-args-validator), plus arena, dispatcher, compaction, schema changes. openspec: pty-exit-notifications complete; x-agent-flags planned (not yet implemented).
# multi-llama-swap-providers-model-favorites — design

Detailed implementation plan for named local model providers, composite model
IDs, grouped pickers, and shared favorites across BooChat and BooCoder.

## 1. Current state

Today the repo splits inference configuration across two incompatible shapes:

- `apps/server` reads env vars such as `LLAMA_SWAP_URL`, `LLAMA_SIDECAR_URL`,
  and `DEFAULT_MODEL`.
- `apps/coder` reads the same `LLAMA_SWAP_URL` for BooCode's own provider, plus
  `data/coder-providers.json` for ACP providers.

That leaves several hardcoded single-endpoint assumptions:

- `/api/models` fetches models from a single llama-swap instance, plus DeepSeek
  when it is configured.
- `provider.ts` routes by a `deepseek-` name prefix and a global sidecar default.
- `model-context.ts` caches by bare model string.
- `compaction.ts`, `task-model.ts`, and the coder arena use a single upstream URL.
- BooCoder prepends `llama-swap/` to bare model values and treats any other
  slash-containing value as an already-routable provider namespace.

## 2. Design principles

1. Provider identity is explicit.
2. Wire model IDs stay bare; persisted model IDs are composite.
3. Legacy bare model IDs remain readable indefinitely.
4. Favorites are shared across BooChat and BooCoder.
5. Sidecar routing is opt-in per provider, not a global fallback.
6. Any cache keyed by model identity uses the full composite ID.

## 3. Recommended config authority

Introduce a new shared file for local inference providers:

- Live path: `/data/llama-providers.json`
- Env var for both apps: `LLAMA_PROVIDERS_PATH`
- Tracked example: `data/llama-providers.example.json`

Recommended shape:

```json
{
  "defaultProvider": "sam-desktop",
  "providers": [
    {
      "id": "sam-desktop",
      "label": "Sam-desktop",
      "baseUrl": "http://100.101.41.16:8401",
      "sidecarUrl": "http://100.101.41.16:8402",
      "kind": "llama-swap"
    },
    {
      "id": "embedding",
      "label": "embedding",
      "baseUrl": "http://100.90.172.55:8411",
      "kind": "llama-swap"
    }
  ]
}
```

Rules:

- If the file is missing, synthesize a single legacy provider from
  `LLAMA_SWAP_URL` and optional `LLAMA_SIDECAR_URL`.
- `data/coder-providers.json` remains the ACP registry and is not extended with
  llama-swap base URLs.
- DeepSeek credentials remain env-backed, but the model catalog should expose a
  synthetic provider group such as `deepseek` so routing no longer depends on a
  bare `deepseek-` prefix.

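The loading rules above can be sketched as a pure function over the file contents and the environment. This is a minimal sketch, not the contracts API. Several pieces are assumptions: the name `resolveLlamaProviders`, the `legacy` provider id, the `llama-swap` label, and returning an empty registry when neither the file nor `LLAMA_SWAP_URL` is present. Reading the file at `LLAMA_PROVIDERS_PATH` is left to the caller so the function stays easy to test.

```typescript
// Hypothetical names throughout; mirrors the example JSON shape, not an
// existing contracts export.
interface LlamaProvider {
  id: string;
  label: string;
  baseUrl: string;
  sidecarUrl?: string; // present only when sidecar routing is opted into
  kind: "llama-swap";
}

interface LlamaProvidersConfig {
  defaultProvider: string;
  providers: LlamaProvider[];
}

// Assumed id for the provider synthesized from legacy env vars.
const LEGACY_PROVIDER_ID = "legacy";

// `fileText` is the contents of the file at LLAMA_PROVIDERS_PATH, or
// undefined when that file does not exist.
function resolveLlamaProviders(
  fileText: string | undefined,
  env: Record<string, string | undefined>,
): LlamaProvidersConfig {
  if (fileText !== undefined) {
    const parsed = JSON.parse(fileText) as LlamaProvidersConfig;
    const ids = new Set(parsed.providers.map((p) => p.id));
    if (ids.size !== parsed.providers.length) {
      throw new Error("llama-providers: duplicate provider id");
    }
    if (!ids.has(parsed.defaultProvider)) {
      throw new Error(`llama-providers: unknown defaultProvider "${parsed.defaultProvider}"`);
    }
    return parsed;
  }

  // Missing file: fall back to one provider built from the legacy env vars,
  // so startup never depends on the new file existing.
  const baseUrl = env.LLAMA_SWAP_URL;
  if (!baseUrl) {
    // Assumption: no file and no env var means "no local providers", not a crash.
    return { defaultProvider: LEGACY_PROVIDER_ID, providers: [] };
  }
  const legacy: LlamaProvider = {
    id: LEGACY_PROVIDER_ID,
    label: "llama-swap",
    baseUrl,
    kind: "llama-swap",
  };
  if (env.LLAMA_SIDECAR_URL) legacy.sidecarUrl = env.LLAMA_SIDECAR_URL;
  return { defaultProvider: LEGACY_PROVIDER_ID, providers: [legacy] };
}
```

Checking `defaultProvider` against the provider list at load time makes a typo in the file fail immediately. Otherwise it would surface later as an unroutable bare ID.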
## 4. Model identity and parsing

Persist model selections as `provider/model`.

Examples:

- `sam-desktop/qwen3.6-35b-a3b`
- `embedding/gemma-4-12b`
- `deepseek/deepseek-v4-pro`

Helper behavior:

- `parseModelRef(id)` returns `{ providerId, wireModelId, isLegacyBareId }`.
- Bare IDs resolve to `{ providerId: defaultProvider, wireModelId: id }`.
- Strip the provider prefix only at the final wire-call boundary.

This keeps the existing `TEXT` columns unchanged while resolving the ambiguity
between identically named models on different providers.

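One possible shape for these helpers is below. It assumes the caller passes in the set of configured provider IDs (including synthetic groups such as `deepseek`) and the registry's `defaultProvider`. The design itself only fixes `parseModelRef(id)` and its return value; `formatModelRef` is a hypothetical inverse.

```typescript
interface ModelRef {
  providerId: string;
  wireModelId: string; // the bare name sent to the upstream
  isLegacyBareId: boolean;
}

// Assumption: the known provider ids and defaultProvider come from the registry.
function parseModelRef(
  id: string,
  knownProviders: ReadonlySet<string>,
  defaultProvider: string,
): ModelRef {
  const slash = id.indexOf("/");
  if (slash > 0) {
    const prefix = id.slice(0, slash);
    // Split on the first "/" only, and only when the prefix names a configured
    // provider; anything else is treated as a legacy bare id.
    if (knownProviders.has(prefix)) {
      return { providerId: prefix, wireModelId: id.slice(slash + 1), isLegacyBareId: false };
    }
  }
  return { providerId: defaultProvider, wireModelId: id, isLegacyBareId: true };
}

// Hypothetical inverse, used when persisting a selection.
function formatModelRef(ref: Pick<ModelRef, "providerId" | "wireModelId">): string {
  return `${ref.providerId}/${ref.wireModelId}`;
}
```

Splitting on the first `/` keeps any later slashes inside `wireModelId`. Checking the prefix against known providers also prevents a legacy bare value that happens to contain a slash from being misread as composite.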
## 5. Server changes

### 5.1 Shared registry + model catalog

Add shared registry utilities in `packages/contracts`, plus server-side loaders
used by:

- `apps/server/src/config.ts`
- `apps/server/src/routes/models.ts`
- `apps/server/src/services/inference/provider.ts`
- `apps/server/src/services/model-context.ts`
- `apps/server/src/services/task-model.ts`
- `apps/server/src/services/compaction.ts`

`GET /api/models` should return a provider-aware payload. Recommended shape:

```ts
// ModelInfo: the per-model catalog entry type (defined elsewhere).
interface ModelCatalogProvider {
  id: string; // provider id, e.g. "sam-desktop"
  label: string;
  models: ModelInfo[];
}

interface ModelCatalogResponse {
  providers: ModelCatalogProvider[];
}
```

Each `ModelInfo.id` in this payload is already composite (`provider/model`).

Favorites should **not** be embedded in this payload. They are a user-level
view derived in the client from `favorite_models` in `/api/settings`.

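Below is a sketch of how the server might assemble that payload. It assumes each provider's upstream listing has already been fetched into bare model names. `ProviderListing`, `buildModelCatalog`, and the minimal `ModelInfo` stand-in are hypothetical. So is the choice to omit unreachable providers rather than fail the whole request.

```typescript
// Minimal stand-in for the existing per-model type; only `id` is assumed here.
interface ModelInfo {
  id: string;
}

interface ModelCatalogProvider {
  id: string;
  label: string;
  models: ModelInfo[];
}

interface ModelCatalogResponse {
  providers: ModelCatalogProvider[];
}

// Hypothetical input: bare model names reported by one provider's upstream,
// or null when that upstream could not be reached.
interface ProviderListing {
  id: string;
  label: string;
  wireModelIds: string[] | null;
}

function buildModelCatalog(listings: readonly ProviderListing[]): ModelCatalogResponse {
  const providers: ModelCatalogProvider[] = [];
  for (const listing of listings) {
    // Assumption: an unreachable provider is omitted instead of failing the request.
    if (listing.wireModelIds === null) continue;
    providers.push({
      id: listing.id,
      label: listing.label,
      // Prefix here so every ModelInfo.id leaving the server is composite.
      models: listing.wireModelIds.map((m) => ({ id: `${listing.id}/${m}` })),
    });
  }
  return { providers };
}
```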
### 5.2 Routing

Replace string-heuristic routing with provider-aware resolution:

- `sam-desktop/*` routes to `baseUrl` or `sidecarUrl` depending on agent flags
  and provider capabilities.
- `embedding/*` always routes directly to its llama-swap `baseUrl`.
- `deepseek/*` routes to the DeepSeek SDK provider.

`resolveModelEndpoint()` and `upstreamModel()` must both resolve from the same
parsed model reference to keep streaming and non-streaming behavior aligned.

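These routing rules could be expressed as one function over an already-parsed reference, so the streaming and non-streaming paths call the same code. The names `routeModelRef`, `RoutableProvider`, and `Route` are illustrative, not the existing `resolveModelEndpoint()` signature. The boolean `wantsSidecar` stands in for whatever agent flag requests the sidecar.

```typescript
// Hypothetical registry entry: only the fields routing needs.
interface RoutableProvider {
  id: string;
  kind: "llama-swap" | "deepseek";
  baseUrl?: string;
  sidecarUrl?: string;
}

// Already-parsed reference, as produced by the model-ref parser.
interface ParsedRef {
  providerId: string;
  wireModelId: string;
}

type Route =
  | { kind: "openai-compatible"; url: string; upstreamModel: string }
  | { kind: "deepseek"; upstreamModel: string };

function routeModelRef(
  ref: ParsedRef,
  providers: ReadonlyMap<string, RoutableProvider>,
  wantsSidecar: boolean, // stand-in for the agent flag that requests the sidecar
): Route {
  const provider = providers.get(ref.providerId);
  if (!provider) throw new Error(`unknown provider "${ref.providerId}"`);
  if (provider.kind === "deepseek") {
    return { kind: "deepseek", upstreamModel: ref.wireModelId };
  }
  // Sidecar is opt-in per provider: the flag only matters if a sidecarUrl exists.
  const url = wantsSidecar && provider.sidecarUrl ? provider.sidecarUrl : provider.baseUrl;
  if (!url) throw new Error(`provider "${provider.id}" has no baseUrl`);
  return { kind: "openai-compatible", url, upstreamModel: ref.wireModelId };
}
```

Falling back to `baseUrl` when a provider has no `sidecarUrl` keeps sidecar routing opt-in per provider. An agent flag alone never sends `embedding/*` traffic anywhere but that provider's own llama-swap.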
### 5.3 Context lookup and cache keys

`model-context.ts` must key caches by the full composite ID. The provider
prefix is stripped only when building:

`<provider.baseUrl>/upstream/<wireModelId>/props`

This avoids cross-provider cache poisoning for duplicate model names.

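The following minimal cache illustrates the keying rule. It assumes props are fetched through an injected function and that legacy bare IDs were normalized to composite form before lookup. `ModelPropsCache`, `lookup`, and the evict-on-failure behavior are hypothetical; only the URL pattern comes from this design.

```typescript
// Hypothetical: resolves a provider id to its llama-swap baseUrl.
type BaseUrlFor = (providerId: string) => string;
type FetchProps = (url: string) => Promise<unknown>;

class ModelPropsCache {
  // Keyed by the full composite id, never by the bare model name.
  private readonly entries = new Map<string, Promise<unknown>>();
  private readonly baseUrlFor: BaseUrlFor;
  private readonly fetchProps: FetchProps;

  constructor(baseUrlFor: BaseUrlFor, fetchProps: FetchProps) {
    this.baseUrlFor = baseUrlFor;
    this.fetchProps = fetchProps;
  }

  // `compositeId` must already be `provider/model`.
  lookup(compositeId: string): Promise<unknown> {
    const cached = this.entries.get(compositeId);
    if (cached !== undefined) return cached;

    const slash = compositeId.indexOf("/");
    if (slash <= 0) throw new Error(`expected provider/model, got "${compositeId}"`);
    const providerId = compositeId.slice(0, slash);
    const wireModelId = compositeId.slice(slash + 1);

    // The provider prefix is stripped only here, when building the upstream URL.
    const base = this.baseUrlFor(providerId).replace(/\/+$/, "");
    const pending = this.fetchProps(`${base}/upstream/${wireModelId}/props`);
    this.entries.set(compositeId, pending);
    // Drop failed lookups so a temporarily offline provider can be retried.
    pending.catch(() => this.entries.delete(compositeId));
    return pending;
  }

  get size(): number {
    return this.entries.size;
  }
}
```

In this sketch, two providers serving `gemma-4-12b` get two cache entries and two props requests. A single bare-keyed entry would let one provider's context size leak into the other.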
## 6. Persistence and settings

Keep:

- `sessions.model TEXT`
- `chats.model TEXT`

Add a new `settings` key:

- `favorite_models: string[]`

Rules:

- Stored favorites are composite IDs only.
- Missing/offline favorites are hidden from the picker, not deleted.
- Legacy bare favorites are never written. On read, a bare entry is either
  ignored or normalized to a composite ID, and normalization is allowed only
  when the default-provider mapping is unambiguous.

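These favorites rules might look like this on the read path. As an assumption, "unambiguous" here means the current catalog has exactly one model with that bare name and it belongs to `defaultProvider`. `normalizeFavoritesOnRead` and `visibleFavorites` are hypothetical helpers, and neither rewrites the stored setting.

```typescript
// Catalog ids are composite, so everything after the first "/" is the wire name.
function wirePart(compositeId: string): string {
  return compositeId.slice(compositeId.indexOf("/") + 1);
}

function isComposite(id: string, knownProviders: ReadonlySet<string>): boolean {
  const slash = id.indexOf("/");
  return slash > 0 && knownProviders.has(id.slice(0, slash));
}

// Read path: composite entries pass through unchanged. A legacy bare entry
// becomes `${defaultProvider}/${id}` only if exactly one catalog model has
// that bare name and it is on the default provider; otherwise it is skipped.
function normalizeFavoritesOnRead(
  stored: readonly string[],
  catalogIds: readonly string[],
  knownProviders: ReadonlySet<string>,
  defaultProvider: string,
): string[] {
  const out: string[] = [];
  for (const id of stored) {
    let resolved: string | undefined;
    if (isComposite(id, knownProviders)) {
      resolved = id;
    } else {
      const owners = catalogIds.filter((c) => wirePart(c) === id);
      if (owners.length === 1 && owners[0] === `${defaultProvider}/${id}`) {
        resolved = owners[0];
      }
    }
    // Keep insertion order and drop duplicates.
    if (resolved !== undefined && !out.includes(resolved)) out.push(resolved);
  }
  return out;
}

// Picker view: favorites not in the live catalog are hidden but stay stored.
function visibleFavorites(favorites: readonly string[], catalogIds: readonly string[]): string[] {
  const live = new Set(catalogIds);
  return favorites.filter((id) => live.has(id));
}
```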
## 7. BooCoder integration

Touch points:

- `apps/coder/src/services/provider-snapshot.ts`
- `apps/coder/src/services/dispatcher.ts`
- `apps/coder/src/services/arena-model-call.ts`
- `apps/coder/src/services/arena-analyzer.ts`
- `apps/coder/src/config.ts`

### 7.1 Native `boocode` provider

The native `boocode` provider can use the shared local-provider registry and
resolver directly. Its model list should expose composite `provider/model` IDs,
and the UI should group them by local provider.

### 7.2 External-agent parity is a separate seam

`opencode` is not safe to migrate by a naive string rewrite. The current bridge
assumes one local llama-swap provider and collapses identity back to
`llama-swap/<model>`.

Recommended bridge rule:

- Composite local model IDs remain `provider/model` in native BooCode state and UI.
- Do **not** translate `provider/model` back to `llama-swap/<wireModelId>` for
  external-agent paths; that loses provider identity for duplicate model names.
- If full `opencode` parity is required, prefer a BooCoder-hosted
  OpenAI-compatible local-model gateway that accepts provider-aware model IDs
  and routes them to the correct local upstream.

If the gateway is not part of the first slice, restrict the initial scope to
native `boocode` parity and keep `opencode` local-model parity as a follow-up.

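At its core, the gateway option is a request rewrite at the last hop. The sketch below shows only that rewrite, with no HTTP server. It assumes each local upstream exposes llama-swap's OpenAI-compatible `/v1/chat/completions` route. `planGatewayForward` and its types are hypothetical.

```typescript
// Hypothetical gateway core: pure request rewriting, no networking.
interface GatewayProvider {
  id: string;
  baseUrl: string;
}

interface ChatRequestBody {
  model: string;
  [key: string]: unknown;
}

interface ForwardPlan {
  url: string;
  body: ChatRequestBody;
}

function planGatewayForward(
  body: ChatRequestBody,
  providers: ReadonlyMap<string, GatewayProvider>,
): ForwardPlan {
  const slash = body.model.indexOf("/");
  const provider = slash > 0 ? providers.get(body.model.slice(0, slash)) : undefined;
  if (!provider) {
    // Refuse to guess: a bare or unknown id would silently lose provider identity.
    throw new Error(`model "${body.model}" does not name a configured local provider`);
  }
  return {
    // Assumed upstream route: llama-swap's OpenAI-compatible chat endpoint.
    url: `${provider.baseUrl.replace(/\/+$/, "")}/v1/chat/completions`,
    // Rewrite to the bare wire id only at this final hop; other fields pass through.
    body: { ...body, model: body.model.slice(slash + 1) },
  };
}
```

Rejecting bare or unknown IDs, instead of guessing a default provider, keeps duplicate model names from silently collapsing onto one upstream.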
## 8. Picker UX

Both BooChat and BooCoder should converge on the same behavior:

- Favorites section first
- Then one section per provider
- Favorite toggle on every model row
- A favorited model remains visible in its provider section
- Provider order defaults to:
  1. `sam-desktop`
  2. `embedding`
  3. `deepseek` when configured

This batch does not require search. Search can be added later if model counts
make the grouped list insufficient.

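Both pickers could share one grouping function. This sketch assumes the client already has the provider-grouped catalog from `/api/models` and the `favorite_models` list from `/api/settings`. Two further details are assumptions: providers outside the default order follow it in registry order, and empty provider sections are skipped. `buildPickerSections` is a hypothetical name.

```typescript
interface PickerSection {
  title: string;
  modelIds: string[];
}

interface CatalogGroup {
  id: string;
  label: string;
  models: ReadonlyArray<{ id: string }>;
}

const DEFAULT_PROVIDER_ORDER = ["sam-desktop", "embedding", "deepseek"];

function buildPickerSections(
  catalog: readonly CatalogGroup[],
  favorites: readonly string[],
): PickerSection[] {
  const available = new Set(catalog.flatMap((p) => p.models.map((m) => m.id)));
  const sections: PickerSection[] = [];

  // Favorites first, in stored (insertion) order; offline favorites are hidden.
  const favs = favorites.filter((id) => available.has(id));
  if (favs.length > 0) sections.push({ title: "Favorites", modelIds: favs });

  const rank = (id: string): number => {
    const i = DEFAULT_PROVIDER_ORDER.indexOf(id);
    return i === -1 ? DEFAULT_PROVIDER_ORDER.length : i;
  };
  // Array.prototype.sort is stable, so unlisted providers keep registry order.
  const ordered = [...catalog].sort((a, b) => rank(a.id) - rank(b.id));
  for (const p of ordered) {
    if (p.models.length === 0) continue;
    // Favorited models stay listed in their provider section as well.
    sections.push({ title: p.label, modelIds: p.models.map((m) => m.id) });
  }
  return sections;
}
```

A `deepseek` section appears only when the catalog contains that group, which covers "when configured" without a special case.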
## 9. Rollout and compatibility

1. Land registry/parsing utilities first.
2. Switch server routing and model catalog to composite IDs.
3. Add favorite persistence and picker grouping.
4. Update native BooCoder (`boocode`) model handling and arena.
5. Decide the `opencode` parity path: gateway now, or explicit follow-up.
6. Verify that legacy bare IDs in existing chats and sessions still resolve
   before removing any old env-based assumptions.

Compatibility requirements:

- A missing `/data/llama-providers.json` must not break startup.
- Existing DB rows with bare IDs must remain routable.
- Existing `DEFAULT_MODEL` can stay bare during the transition, but new writes
  should be composite.

## 10. Deferred items

- Picker search/filtering
- Manual favorite ordering beyond insertion order
- Host health badges in the picker
- Automatic normalization of old session/chat model values
- Full `opencode` multi-provider parity if the first slice ships native-only
- Any boocontrol fleet UI built on top of this registry