chore: snapshot working tree - pty_exited notifications + in-flight inference WIP
feat(booterm): structured pty_exited WS notifications. Plan-validated, impl-validated, code-reviewed green (contracts build clean, contracts test 29/29, booterm + web typecheck clean). wip: in-progress inference/provider refactor (agents.ts, provider.ts, new llama-providers.ts, removed llama-args-validator), plus arena, dispatcher, compaction, schema changes. openspec: pty-exit-notifications complete; x-agent-flags planned (not yet implemented).
@@ -0,0 +1,345 @@

# Feature Implementation Plan: Multi-Provider Local Models

This plan turns the multi-provider local-model design into a strict implementation sequence that can be executed with Orchestration. It assumes the goal is not just to “fix the picker” but to make local inference work as a small fleet with stable provider identity, shared favorites, correct routing, and an honest parity story for BooCoder.

## Source Specification

- Primary rollout outline: [build-phase-outline.md](build-phase-outline.md)
- Behavioral design: [../../../openspec/changes/multi-llama-swap-providers-model-favorites/design.md](../../../openspec/changes/multi-llama-swap-providers-model-favorites/design.md)
- Task inventory: [../../../openspec/changes/multi-llama-swap-providers-model-favorites/tasks.md](../../../openspec/changes/multi-llama-swap-providers-model-favorites/tasks.md)
- Architecture analysis: [../../../openspec/changes/multi-llama-swap-providers-model-favorites/artifacts/implementation-analysis.md](../../../openspec/changes/multi-llama-swap-providers-model-favorites/artifacts/implementation-analysis.md)
- Research note: [../../research/2026-06-10-multi-llama-swap-providers-model-favorites.md](../../research/2026-06-10-multi-llama-swap-providers-model-favorites.md)
- Discovery notes: [artifacts/.discovery-notes.md](artifacts/.discovery-notes.md)

## Outcome

When this plan is complete:

- BooChat can route local models by named provider, not by one global `LLAMA_SWAP_URL`.
- Favorites are shared across BooChat and native BooCoder, derived from settings instead of being baked into the server catalog ([D-4](artifacts/implementation-decision-log.md#d-4-favorites-are-a-settings-backed-user-view-not-a-server-catalog-section)).
- Duplicate model names on different local machines are safe because persisted and cached identity is `provider/model` ([D-2](artifacts/implementation-decision-log.md#d-2-persist-and-cache-composite-providermodel-ids-keep-wire-ids-bare)).
- Native BooCoder and Arena use the same provider-aware resolver as BooChat ([D-3](artifacts/implementation-decision-log.md#d-3-one-provider-aware-resolver-shared-across-streaming-non-streaming-context-and-arena)).
- External-agent parity is real rather than implied: `opencode` only gets multi-provider local models after a provider-preserving bridge exists ([D-5](artifacts/implementation-decision-log.md#d-5-native-boocode-parity-ships-before-opencode-parity), [D-6](artifacts/implementation-decision-log.md#d-6-opencode-parity-uses-a-boocode-local-gateway-not-a-string-rewrite)).
- Adding another local machine is a config change plus a smoke pass, not another architecture pass ([D-7](artifacts/implementation-decision-log.md#d-7-add-a-machine-stays-config-driven-in-this-initiative)).

## Working Assumptions

- The shared local-provider source of truth is `/data/llama-providers.json`, exposed to both apps through `LLAMA_PROVIDERS_PATH`, with legacy env fallback while the file is absent ([D-1](artifacts/implementation-decision-log.md#d-1-shared-local-provider-config-authority)).
- `packages/contracts` owns schemas and pure helpers; app-local loader modules own file I/O and env fallback, following the existing `provider-config` / `provider-config-registry` split in BooCoder ([D-1](artifacts/implementation-decision-log.md#d-1-shared-local-provider-config-authority)).
- The work ends at a completed multi-provider substrate. BooControl is a follow-on consumer, not part of this implementation batch.

## Orchestration Rules

- Treat each work unit below as one mergeable branch. Do not overlap branches that touch the same shared contract files.
- Never run more than one agent at a time on `packages/contracts/src/*`, `apps/server/src/services/inference/provider.ts`, `apps/web/src/api/types.ts`, or `apps/coder/src/services/provider-snapshot.ts`.
- Inside a work unit, parallelize only disjoint file groups. Contract changes first, consumers second, tests last.
- Close each work unit with its own verification before starting the next one. Do not stack W1 through W4 and debug them together afterwards.

## Work Unit Index

| # | Work Unit | Surface | Delivers | Depends On | Verification |
|---|---|---|---|---|---|
| 1 | Provider Registry Foundation | contracts + server + coder | Shared config schema, model-ref helpers, app-local registry loaders | none | contracts build, server build, coder build |
| 2 | Server Catalog and Routing | server | Provider-aware `/api/models` and unified resolver | W1 | server tests for routing + collision cases |
| 3 | Server Downstream Consumers | server | Context, compaction, and task-model stop assuming one endpoint | W2 | server tests for cache isolation + bare-id fallback |
| 4 | BooChat Favorites and Grouped Picker | server + web | Shared favorites and provider-grouped chat model selection | W2 | server tests + web smoke |
| 5 | Native BooCoder Parity | coder + web | Native `boocode` local models use composite IDs and grouped selection | W1, W4 | coder tests + BooCoder smoke |
| 6 | Arena Parity | coder | Arena local calls and local-model classification become provider-aware | W5 | coder tests + arena smoke |
| 7 | External-Agent Parity | coder | `opencode` gets multi-provider local models through a real bridge | W5 | coder tests + opencode smoke |
| 8 | Operations and Final Verification | docs + configs + smoke | Add-a-machine runbook, final matrix, ready handoff to BooControl | W7 | end-to-end smoke matrix |

## Work Units

### W1. Provider Registry Foundation

**Goal.** Make provider identity real before any routing or UI changes.

**Files and seams.**

- `packages/contracts/src/` for the new local-provider schema and pure model-ref helpers
- `packages/contracts/package.json` exports
- `apps/server/src/config.ts`
- `apps/coder/src/config.ts`
- new app-local registry loaders under `apps/server/src/services/` and `apps/coder/src/services/`
- `data/llama-providers.example.json`

**Implement.**

1. Add a new contracts subpath for local provider config, separate from the existing coder ACP provider config.
2. Define the shared file shape: `defaultProvider` plus `providers[]` with `id`, `label`, `baseUrl`, optional `sidecarUrl`, and `kind`.
3. Add pure helpers for `parseModelRef`, `formatModelRef`, and legacy bare-id resolution.
4. Add `LLAMA_PROVIDERS_PATH` to both server and coder config.
5. Implement server and coder registry loaders that read the shared file and synthesize one legacy provider from `LLAMA_SWAP_URL` and optional `LLAMA_SIDECAR_URL` when the file is absent.
6. Add a checked-in example config with `sam-desktop` and `embedding`.
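
The file shape from step 2 and the helpers from step 3 could look roughly like the sketch below. The field names, `parseModelRef`, and `formatModelRef` come from this plan; the `ModelRef` type, the `legacy` flag, the example URLs, and the rule "a prefix only counts when it names a configured provider" are assumptions made for illustration, not the decided contract.

```typescript
// Field names follow the plan; the types around them are assumptions.
interface LocalProvider {
  id: string;
  label: string;
  baseUrl: string;
  sidecarUrl?: string;
  kind: string; // allowed values are not listed in the plan
}

interface LocalProvidersConfig {
  defaultProvider: string;
  providers: LocalProvider[];
}

// Hypothetical parsed form of a model id.
interface ModelRef {
  providerId: string;
  wireModel: string; // bare id sent to the upstream
  legacy: boolean; // true when the input carried no known provider prefix
}

// Join a provider id and a wire id into the persisted composite form.
function formatModelRef(providerId: string, wireModel: string): string {
  return `${providerId}/${wireModel}`;
}

// Split on the first slash and accept the prefix only when it names a
// configured provider. Everything else is treated as a legacy bare id and
// assigned to the default provider (assumed fallback rule).
function parseModelRef(id: string, config: LocalProvidersConfig): ModelRef {
  const slash = id.indexOf("/");
  if (slash > 0 && slash < id.length - 1) {
    const prefix = id.slice(0, slash);
    if (config.providers.some((p) => p.id === prefix)) {
      return { providerId: prefix, wireModel: id.slice(slash + 1), legacy: false };
    }
  }
  return { providerId: config.defaultProvider, wireModel: id, legacy: true };
}

// Mirrors the two providers named for the example config; URLs are made up.
const exampleConfig: LocalProvidersConfig = {
  defaultProvider: "sam-desktop",
  providers: [
    {
      id: "sam-desktop",
      label: "Sam Desktop",
      baseUrl: "http://sam-desktop.local:8080",
      sidecarUrl: "http://sam-desktop.local:8081",
      kind: "llama-swap",
    },
    { id: "embedding", label: "Embedding", baseUrl: "http://embedding.local:8080", kind: "llama-swap" },
  ],
};
```

Checking the prefix against the registry, rather than trusting any slash, keeps wire ids that themselves contain a slash readable as bare ids.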

**Parallel-safe split.**

- Agent A: contracts schema + helpers + exports
- Agent B: server config + server loader after A merges
- Agent C: coder config + coder loader after A merges

**Exit criteria.**

- Both apps can start with only legacy env vars.
- Both apps can also start with a real `llama-providers.json`.
- Pure helper tests cover `provider/model` and bare fallback.
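
The first two exit criteria depend on the loader fallback from step 5. A minimal sketch of that fallback is shown here as a pure function so it stays testable; an app-local loader would read the file at `LLAMA_PROVIDERS_PATH` and pass the text in. The function names, the synthesized provider id `llama-swap`, its label, and its `kind` value are hypothetical, since the plan does not name them.

```typescript
interface LocalProvider {
  id: string;
  label: string;
  baseUrl: string;
  sidecarUrl?: string;
  kind: string;
}

interface LocalProvidersConfig {
  defaultProvider: string;
  providers: LocalProvider[];
}

type Env = Record<string, string | undefined>;

// Hypothetical id for the provider synthesized from legacy env vars.
const LEGACY_PROVIDER_ID = "llama-swap";

// Minimal hand-written validation; a real contract would likely use a schema.
function parseProvidersFile(text: string): LocalProvidersConfig {
  const raw = JSON.parse(text) as { defaultProvider?: unknown; providers?: unknown };
  const { defaultProvider, providers } = raw;
  if (typeof defaultProvider !== "string" || !Array.isArray(providers)) {
    throw new Error("llama-providers.json: expected defaultProvider and providers[]");
  }
  for (const p of providers) {
    if (typeof p?.id !== "string" || typeof p?.baseUrl !== "string") {
      throw new Error("llama-providers.json: every provider needs id and baseUrl");
    }
  }
  if (!providers.some((p) => p.id === defaultProvider)) {
    throw new Error(`llama-providers.json: defaultProvider "${defaultProvider}" is not listed`);
  }
  return { defaultProvider, providers: providers as LocalProvider[] };
}

// fileText is the content of the shared file, or undefined when it is absent.
function loadLocalProviders(fileText: string | undefined, env: Env): LocalProvidersConfig {
  if (fileText !== undefined) {
    return parseProvidersFile(fileText);
  }
  const baseUrl = env["LLAMA_SWAP_URL"];
  if (!baseUrl) {
    throw new Error("no llama-providers.json and no LLAMA_SWAP_URL");
  }
  const legacy: LocalProvider = {
    id: LEGACY_PROVIDER_ID,
    label: "Local (legacy)",
    baseUrl,
    kind: "llama-swap",
  };
  const sidecarUrl = env["LLAMA_SIDECAR_URL"];
  if (sidecarUrl) {
    legacy.sidecarUrl = sidecarUrl;
  }
  return { defaultProvider: legacy.id, providers: [legacy] };
}
```

Because the synthesized provider is also the default, bare ids from existing sessions keep resolving to the same endpoint they used before the file existed.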

### W2. Server Catalog and Routing

**Goal.** Replace server-side routing heuristics with one provider-aware resolver.

**Files and seams.**

- `apps/server/src/routes/models.ts`
- `apps/server/src/services/inference/provider.ts`
- `apps/server/src/types/api.ts`
- `apps/web/src/api/types.ts`
- `apps/web/src/api/client.ts`
- relevant provider tests

**Implement.**

1. Refactor `/api/models` to return provider-grouped inventory only, with every `ModelInfo.id` already composite ([D-4](artifacts/implementation-decision-log.md#d-4-favorites-are-a-settings-backed-user-view-not-a-server-catalog-section)).
2. Build one server resolver that answers:
   - provider identity
   - upstream base URL
   - sidecar eligibility
   - final wire model id
   - DeepSeek special handling
3. Make both `upstreamModel()` and `resolveModelEndpoint()` call that same resolver.
4. Remove the current “prefix means provider” logic as the authority; keep compatibility only at the bare-id fallback layer.

**Parallel-safe split.**

- First branch: resolver and tests
- Second branch: `/api/models` contract change plus client type updates

**Exit criteria.**

- `embedding/deepseek-r1-qwen3-8b` routes as local `embedding`, not as DeepSeek cloud.
- `embedding/*` never uses a sidecar.
- Legacy bare models still resolve through the configured default provider.
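
One way to picture the single resolver and the exit criteria above is the sketch below. It is not the server's actual code: `resolveModel`, the `ResolvedModel` shape, and the parameter lists of `upstreamModel()` and `resolveModelEndpoint()` are hypothetical. It also assumes that sidecar eligibility simply means the provider has a `sidecarUrl` configured, and that the legacy DeepSeek heuristic is a `deepseek-` name prefix that now applies to bare ids only.

```typescript
interface LocalProvider {
  id: string;
  baseUrl: string;
  sidecarUrl?: string;
}

interface Registry {
  defaultProvider: string;
  providers: LocalProvider[];
}

interface ResolvedModel {
  kind: "local" | "deepseek";
  providerId: string;
  baseUrl: string;
  useSidecar: boolean;
  wireModel: string;
}

interface ResolverOptions {
  deepseekBaseUrl?: string; // set only when DeepSeek is enabled
}

function asLocal(p: LocalProvider, wireModel: string): ResolvedModel {
  return {
    kind: "local",
    providerId: p.id,
    baseUrl: p.baseUrl,
    useSidecar: p.sidecarUrl !== undefined,
    wireModel,
  };
}

function resolveModel(id: string, registry: Registry, opts: ResolverOptions = {}): ResolvedModel {
  // 1. A prefix that names a configured provider always wins.
  const slash = id.indexOf("/");
  if (slash > 0 && slash < id.length - 1) {
    const named = registry.providers.find((p) => p.id === id.slice(0, slash));
    if (named) {
      return asLocal(named, id.slice(slash + 1));
    }
  }
  // 2. Bare-id fallback layer: the only place a name heuristic survives.
  if (opts.deepseekBaseUrl && id.startsWith("deepseek-")) {
    return {
      kind: "deepseek",
      providerId: "deepseek",
      baseUrl: opts.deepseekBaseUrl,
      useSidecar: false,
      wireModel: id,
    };
  }
  // 3. Everything else goes to the configured default provider.
  const fallback = registry.providers.find((p) => p.id === registry.defaultProvider);
  if (!fallback) {
    throw new Error(`default provider "${registry.defaultProvider}" is not configured`);
  }
  return asLocal(fallback, id);
}

// Both existing entry points become thin views over the same answer.
function upstreamModel(id: string, registry: Registry, opts?: ResolverOptions): string {
  return resolveModel(id, registry, opts).wireModel;
}

function resolveModelEndpoint(id: string, registry: Registry, opts?: ResolverOptions): string {
  return resolveModel(id, registry, opts).baseUrl;
}
```

With this ordering, `embedding/deepseek-r1-qwen3-8b` never reaches the DeepSeek check, which is the collision the first exit criterion guards against.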

### W3. Server Downstream Consumers

**Goal.** Remove the remaining single-endpoint assumptions in server call sites.

**Files and seams.**

- `apps/server/src/services/model-context.ts`
- `apps/server/src/index.ts`
- `apps/server/src/services/compaction.ts`
- `apps/server/src/services/task-model.ts`
- `apps/server/src/services/inference/error-handler.ts`
- `apps/server/src/services/__tests__/model-context.test.ts`

**Implement.**

1. Change `model-context` to key caches by composite model id, not bare wire id.
2. Move context lookup from one process-wide `LLAMA_SWAP_URL` assumption to the provider-aware resolver.
3. Update compaction to resolve the right upstream before summary calls.
4. Update task-model fallback resolution to use the same parsed model ref path as inference.
5. Audit remaining server `LLAMA_SWAP_URL` call sites and either migrate them or explicitly mark them legacy-only.

**Parallel-safe split.**

- Agent A: `model-context.ts` + tests
- Agent B: `compaction.ts` and `task-model.ts` after A lands, because both depend on the new resolver contract

**Exit criteria.**

- Two providers serving the same wire model name do not share context cache entries.
- Existing sessions with bare models still load context and complete turns.
- No server path doing local inference bypasses the shared resolver.
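
The cache-isolation criterion can be illustrated with a small sketch. `createContextCache` and `ContextInfo` are hypothetical names, and the context lengths in the usage are invented; the only idea taken from the plan is that the cache key is the composite `provider/model` id, with bare ids normalized to the default provider before lookup.

```typescript
interface ContextInfo {
  contextLength: number;
}

function createContextCache(defaultProvider: string, providerIds: readonly string[]) {
  const entries = new Map<string, ContextInfo>();

  // Normalize every id to its composite form before it touches the map.
  const toKey = (id: string): string => {
    const slash = id.indexOf("/");
    const prefix = slash > 0 ? id.slice(0, slash) : "";
    return providerIds.some((p) => p === prefix) ? id : `${defaultProvider}/${id}`;
  };

  return {
    get(id: string): ContextInfo | undefined {
      return entries.get(toKey(id));
    },
    set(id: string, info: ContextInfo): void {
      entries.set(toKey(id), info);
    },
    size(): number {
      return entries.size;
    },
  };
}

// Two machines serve the same wire model with different context limits.
const contextCache = createContextCache("sam-desktop", ["sam-desktop", "embedding"]);
contextCache.set("sam-desktop/qwen3-8b", { contextLength: 32768 });
contextCache.set("embedding/qwen3-8b", { contextLength: 8192 });
```

A bare `qwen3-8b` lookup lands on the `sam-desktop/qwen3-8b` entry here, so existing sessions keep working while the two providers stay separate.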

### W4. BooChat Favorites and Grouped Picker

**Goal.** Stabilize the end-user selection model on BooChat before deeper coding surfaces adopt it.

**Files and seams.**

- `apps/server/src/routes/settings.ts`
- `apps/server/src/services/settings.ts` or equivalent settings helper path
- `apps/web/src/components/ModelPicker.tsx`
- `apps/web/src/lib/model-label.ts`
- `apps/web/src/api/client.ts`
- `apps/web/src/api/types.ts`
- `apps/web/src/pages/Session.tsx`

**Implement.**

1. Add `favorite_models: string[]` handling in settings.
2. Normalize malformed and duplicate entries on write.
3. In the client, derive the picker sections:
   - a Favorites section first
   - then one section per provider
   - with unavailable favorites hidden, not deleted
4. Keep a favorited model visible in both Favorites and its provider section.
5. Make new model selections write composite ids.
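
Steps 2 through 4 can be sketched as two pure functions. The names `normalizeFavorites` and `buildPickerSections`, the `ProviderGroup` and `PickerSection` shapes, and the choice to treat any entry that is not of the form `provider/model` as malformed are all assumptions; the plan only fixes `favorite_models: string[]` and the section ordering.

```typescript
interface ModelInfo {
  id: string; // composite provider/model id
  label: string;
}

interface ProviderGroup {
  providerId: string;
  label: string;
  models: ModelInfo[];
}

interface PickerSection {
  title: string;
  models: ModelInfo[];
}

// Server side, on write: drop non-strings and malformed ids, dedupe, keep order.
function normalizeFavorites(input: unknown): string[] {
  if (!Array.isArray(input)) return [];
  const seen = new Set<string>();
  const out: string[] = [];
  for (const entry of input) {
    if (typeof entry !== "string") continue;
    const id = entry.trim();
    const slash = id.indexOf("/");
    if (slash <= 0 || slash === id.length - 1) continue; // not provider/model
    if (seen.has(id)) continue;
    seen.add(id);
    out.push(id);
  }
  return out;
}

// Client side: Favorites first, then one section per provider. Favorites that
// are missing from the live inventory are skipped here but stay in settings.
function buildPickerSections(favorites: string[], inventory: ProviderGroup[]): PickerSection[] {
  const byId = new Map<string, ModelInfo>();
  for (const group of inventory) {
    for (const model of group.models) byId.set(model.id, model);
  }
  const available = favorites
    .map((id) => byId.get(id))
    .filter((m): m is ModelInfo => m !== undefined);

  const sections: PickerSection[] = [];
  if (available.length > 0) {
    sections.push({ title: "Favorites", models: available });
  }
  for (const group of inventory) {
    sections.push({ title: group.label, models: group.models });
  }
  return sections;
}
```

Because hiding happens only at render time, a provider that comes back online gets its favorites back without any write to settings.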

**Parallel-safe split.**

- Server settings branch first
- Web picker branch second against the new contract

**Exit criteria.**

- Favorites persist across refresh.
- Removing a provider from live inventory hides its favorites without deleting the stored ids.
- A new chat selection stores `provider/model`.

### W5. Native BooCoder Parity

**Goal.** Move native `boocode` local model usage onto the shared provider model before touching `opencode`.

**Files and seams.**

- `apps/coder/src/services/provider-snapshot.ts`
- `apps/coder/src/services/dispatcher.ts`
- `apps/web/src/components/AgentComposerBar.tsx`
- `apps/web/src/lib/model-label.ts`
- `packages/contracts/src/provider-snapshot.ts` only if the snapshot contract truly needs new metadata

**Implement.**

1. Make the native `boocode` provider expose composite local model ids from the shared registry.
2. Update native dispatch to resolve composite local ids through the shared registry.
3. Render grouped local models for the native `boocode` path in `AgentComposerBar`.
4. If the current `opencode` snapshot path would falsely advertise multi-provider local models before W7, hide that advertising now rather than leave the UI misleading ([D-5](artifacts/implementation-decision-log.md#d-5-native-boocode-parity-ships-before-opencode-parity)).

**Parallel-safe split.**

- Coder backend first
- AgentComposerBar UI second

**Exit criteria.**

- Native BooCoder tasks can run against at least two distinct local providers.
- The native picker behavior matches BooChat’s grouped/favorites mental model closely enough that a user is not learning a second local-model identity system.
- `opencode` is not yet claiming parity it does not have.

### W6. Arena Parity

**Goal.** Make Arena consume the same local-provider substrate instead of one live llama-swap list.

**Files and seams.**

- `apps/coder/src/services/arena-model-call.ts`
- `apps/coder/src/services/arena-analyzer.ts`
- `apps/coder/src/services/arena-runner.ts`
- `apps/coder/src/index.ts`
- arena tests

**Implement.**

1. Replace direct `LLAMA_SWAP_URL` local calls with the provider-aware resolver.
2. Build Arena’s local-model set from the shared provider registry, not one fetched list.
3. Preserve ADR-0001’s two-lane scheduling rule; provider awareness changes local identity, not lane semantics.
4. Keep bare-id compatibility only where old data needs it.
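
Steps 2 and 4 might reduce to something like the following sketch. It assumes Arena can obtain, per provider, the list of wire model ids that provider currently serves; the `Inventory` type and both function names are hypothetical. Lane scheduling from ADR-0001 is deliberately not modeled here, since this plan says it does not change.

```typescript
// Provider id mapped to the wire model ids that provider serves.
type Inventory = Record<string, string[]>;

// Build the local-model set from every configured provider, keyed by composite id.
function buildLocalModelSet(inventory: Inventory): Set<string> {
  const local = new Set<string>();
  for (const [providerId, wireIds] of Object.entries(inventory)) {
    for (const wireId of wireIds) {
      local.add(`${providerId}/${wireId}`);
    }
  }
  return local;
}

// Classify a contestant id. The second lookup is the bare-id compatibility
// path for old data: a bare id is tried against the default provider.
function classifyContestant(
  id: string,
  localModels: ReadonlySet<string>,
  defaultProvider: string,
): "local" | "cloud" {
  if (localModels.has(id)) return "local";
  if (localModels.has(`${defaultProvider}/${id}`)) return "local";
  return "cloud";
}
```

Keeping classification as a pure set lookup means the same composite id that the resolver routes is also the id that decides the lane.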

**Parallel-safe split.**

- Agent A: `arena-model-call.ts` + analyzer updates
- Agent B: local-model set construction in `index.ts` + runner adjustments after A settles the model identity contract

**Exit criteria.**

- Arena can run local contestants from more than one machine.
- Local-vs-cloud classification still works.
- ADR-0001 behavior remains intact.

### W7. External-Agent Parity

**Goal.** Give `opencode` a real multi-provider local-model story instead of collapsing everything back to `llama-swap/<model>`.

**Files and seams.**

- `apps/coder/src/services/backends/opencode-server.ts`
- `apps/coder/src/services/provider-snapshot.ts`
- `apps/coder/src/services/agent-probe.ts`
- new BooCoder-hosted gateway route or service module under `apps/coder/src/services/`
- host config generation or sync for opencode local models

**Implement.**

1. Add a BooCoder-hosted OpenAI-compatible local gateway that accepts provider-preserving model ids and routes them to the correct local provider ([D-6](artifacts/implementation-decision-log.md#d-6-opencode-parity-uses-a-boocode-local-gateway-not-a-string-rewrite)).
2. Use one opencode-facing provider namespace such as `boocode-local`, where the opencode `providerID` is stable and the `modelID` is the inner composite id like `sam-desktop/qwen3.6-35b`.
3. Update provider snapshot merging so `opencode` advertises `boocode-local/<provider/model>` rather than `llama-swap/<model>`.
4. Update the opencode bridge parser and config sync so duplicate model names remain distinguishable end to end.
5. Add smoke coverage for two providers serving the same wire model name.
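
The id handling in steps 1 through 4 can be sketched without any HTTP server. The `boocode-local` namespace and the `providerID` / `modelID` split come from the plan; the function names, the `GatewayTarget` shape, and the assumption that each local provider serves the conventional OpenAI-compatible `/v1/chat/completions` path under its `baseUrl` are illustrative only.

```typescript
const OPENCODE_PROVIDER_ID = "boocode-local";

interface LocalProvider {
  id: string;
  baseUrl: string;
}

interface GatewayTarget {
  url: string;
  body: Record<string, unknown>;
}

// Snapshot side: what opencode is told to advertise for a composite id.
function toOpencodeModel(compositeId: string): string {
  return `${OPENCODE_PROVIDER_ID}/${compositeId}`;
}

// Bridge side: split only on the FIRST slash so the inner composite id survives.
function fromOpencodeModel(ref: string): { providerID: string; modelID: string } | undefined {
  const slash = ref.indexOf("/");
  if (slash <= 0 || slash === ref.length - 1) return undefined;
  return { providerID: ref.slice(0, slash), modelID: ref.slice(slash + 1) };
}

// Gateway side: the request's `model` is the inner composite id. Pick the
// provider from its prefix and swap in the bare wire id before forwarding.
function routeGatewayRequest(
  body: { model: string; [key: string]: unknown },
  providers: LocalProvider[],
): GatewayTarget {
  const slash = body.model.indexOf("/");
  const providerId = slash > 0 ? body.model.slice(0, slash) : "";
  const provider = providers.find((p) => p.id === providerId);
  if (!provider) {
    throw new Error(`unknown local provider in model id "${body.model}"`);
  }
  const base = provider.baseUrl.replace(/\/+$/, "");
  return {
    url: `${base}/v1/chat/completions`,
    body: { ...body, model: body.model.slice(slash + 1) },
  };
}
```

The gateway refuses ids without a known provider prefix instead of guessing, so a regression back to `llama-swap/<model>` fails loudly rather than hitting the wrong machine.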

**Parallel-safe split.**

- Gateway branch first
- Snapshot/config-sync branch second
- Final opencode backend/parser adjustments last

**Exit criteria.**

- `opencode` can target two local providers with overlapping wire model names and hit the correct machine both times.
- No path rewrites `provider/model` down to plain `llama-swap/model`.

### W8. Operations and Final Verification

**Goal.** End with a repeatable operator workflow, not just a working dev branch.

**Files and seams.**

- `data/llama-providers.example.json`
- operator docs under `docs/`
- OpenSpec tasks/status notes as needed

**Implement.**

1. Document the add-a-machine flow for config-managed local providers.
2. Document the smoke matrix for:
   - single legacy provider fallback
   - two local providers
   - duplicate model names across two providers
   - DeepSeek enabled
   - `opencode` local parity
3. Record the final interface BooControl should consume: provider registry plus composite ids, not raw host env vars.

**Exit criteria.**

- A third machine can be added by editing config and running the smoke matrix.
- The implementation docs name the exact runtime contract BooControl should build on.

## Verification Plan

- `pnpm -C packages/contracts build`
- `pnpm -C apps/server test`
- `pnpm -C apps/server build`
- `pnpm -C apps/coder test`
- `pnpm -C apps/coder build`
- `npx tsc -p apps/web/tsconfig.app.json --noEmit`

Add targeted tests as the work lands:

- model-ref parse/format and bare-id fallback
- provider-aware routing and DeepSeek collision cases
- context-cache isolation for duplicate model names
- favorites hide-not-delete behavior
- provider snapshot and opencode bridge behavior
- arena local-model classification across multiple providers

## Main Risks

- The W2 contract change to `/api/models` and the W5 snapshot changes can drift across apps if contract parity is edited piecemeal. Follow the cross-app contract standard in [artifacts/.discovery-notes.md](artifacts/.discovery-notes.md) and land contract-first branches.
- W7 is the hardest seam. If the gateway is skipped and the old string rewrite is kept, the feature will look complete in the UI while still routing to the wrong machine.
- `model-context.ts` is a hidden correctness seam. If cache keys stay bare, duplicate model names will mis-share context limits and compaction behavior even after routing is fixed.

## Deferred

- BooControl itself
- picker search and richer filtering
- manual favorite reordering
- host health badges in pickers

## Definition of Done

- BooChat, native BooCoder, Arena, and `opencode` all support provider-aware local models end to end.
- Legacy bare ids remain readable.
- Two providers can expose the same wire model name without ambiguity.
- Adding another local machine is documented and smoke-tested.
- BooControl can start later without inventing a second provider registry.