chore: snapshot working tree - pty_exited notifications + in-flight inference WIP

feat(booterm): structured pty_exited WS notifications. Plan-validated, impl-validated, code-reviewed green (contracts build clean, contracts test 29/29, booterm + web typecheck clean).

wip: in-progress inference/provider refactor (agents.ts, provider.ts, new llama-providers.ts, removed llama-args-validator), plus arena, dispatcher, compaction, schema changes.

openspec: pty-exit-notifications complete; x-agent-flags planned (not yet implemented).
2026-06-14 12:48:47 +00:00
parent 0ed506f1da
commit b18de2a331
204 changed files with 25344 additions and 867 deletions


@@ -0,0 +1,345 @@
# Feature Implementation Plan: Multi-Provider Local Models
This plan turns the multi-provider local-model design into a strict implementation sequence that can be executed with Orchestration. The target is not just to “fix the picker” but to make local inference work as a small fleet with stable provider identity, shared favorites, correct routing, and an honest parity story for BooCoder.
## Source Specification
- Primary rollout outline: [build-phase-outline.md](build-phase-outline.md)
- Behavioral design: [../../../openspec/changes/multi-llama-swap-providers-model-favorites/design.md](../../../openspec/changes/multi-llama-swap-providers-model-favorites/design.md)
- Task inventory: [../../../openspec/changes/multi-llama-swap-providers-model-favorites/tasks.md](../../../openspec/changes/multi-llama-swap-providers-model-favorites/tasks.md)
- Architecture analysis: [../../../openspec/changes/multi-llama-swap-providers-model-favorites/artifacts/implementation-analysis.md](../../../openspec/changes/multi-llama-swap-providers-model-favorites/artifacts/implementation-analysis.md)
- Research note: [../../research/2026-06-10-multi-llama-swap-providers-model-favorites.md](../../research/2026-06-10-multi-llama-swap-providers-model-favorites.md)
- Discovery notes: [artifacts/.discovery-notes.md](artifacts/.discovery-notes.md)
## Outcome
When this plan is complete:
- BooChat can route local models by named provider, not by one global `LLAMA_SWAP_URL`.
- Favorites are shared across BooChat and native BooCoder, derived from settings instead of being baked into the server catalog ([D-4](artifacts/implementation-decision-log.md#d-4-favorites-are-a-settings-backed-user-view-not-a-server-catalog-section)).
- Duplicate model names on different local machines are safe because persisted and cached identity is `provider/model` ([D-2](artifacts/implementation-decision-log.md#d-2-persist-and-cache-composite-providermodel-ids-keep-wire-ids-bare)).
- Native BooCoder and Arena use the same provider-aware resolver as BooChat ([D-3](artifacts/implementation-decision-log.md#d-3-one-provider-aware-resolver-shared-across-streaming-non-streaming-context-and-arena)).
- External-agent parity is real rather than implied: `opencode` only gets multi-provider local models after a provider-preserving bridge exists ([D-5](artifacts/implementation-decision-log.md#d-5-native-boocode-parity-ships-before-opencode-parity), [D-6](artifacts/implementation-decision-log.md#d-6-opencode-parity-uses-a-boocode-local-gateway-not-a-string-rewrite)).
- Adding another local machine is a config change plus a smoke pass, not another architecture pass ([D-7](artifacts/implementation-decision-log.md#d-7-add-a-machine-stays-config-driven-in-this-initiative)).
## Working Assumptions
- The shared local-provider source of truth is `/data/llama-providers.json`, exposed to both apps through `LLAMA_PROVIDERS_PATH`, with legacy env fallback while the file is absent ([D-1](artifacts/implementation-decision-log.md#d-1-shared-local-provider-config-authority)).
- `packages/contracts` owns schemas and pure helpers; app-local loader modules own file I/O and env fallback, following the existing `provider-config` / `provider-config-registry` split in BooCoder ([D-1](artifacts/implementation-decision-log.md#d-1-shared-local-provider-config-authority)).
- The work ends at a completed multi-provider substrate. BooControl is a follow-on consumer, not part of this implementation batch.
## Orchestration Rules
- Treat each work unit below as one mergeable branch. Do not overlap branches that touch the same shared contract files.
- Never run more than one agent at a time on `packages/contracts/src/*`, `apps/server/src/services/inference/provider.ts`, `apps/web/src/api/types.ts`, or `apps/coder/src/services/provider-snapshot.ts`.
- Inside a work unit, parallelize only disjoint file groups. Contract changes first, consumers second, tests last.
- Close each work unit with its own verification before starting the next one. Do not stack W1-W4 and debug later.
## Work Unit Index
| # | Work Unit | Surface | Delivers | Depends On | Verification |
|---|---|---|---|---|---|
| 1 | Provider Registry Foundation | contracts + server + coder | Shared config schema, model-ref helpers, app-local registry loaders | — | Contracts build, server build, coder build |
| 2 | Server Catalog and Routing | server | Provider-aware `/api/models` and unified resolver | W1 | server tests for routing + collision cases |
| 3 | Server Downstream Consumers | server | Context, compaction, and task-model stop assuming one endpoint | W2 | server tests for cache isolation + bare-id fallback |
| 4 | BooChat Favorites and Grouped Picker | server + web | Shared favorites and provider-grouped chat model selection | W2 | server tests + web smoke |
| 5 | Native BooCoder Parity | coder + web | Native `boocode` local models use composite IDs and grouped selection | W1, W4 | coder tests + BooCoder smoke |
| 6 | Arena Parity | coder | Arena local calls and local-model classification become provider-aware | W5 | coder tests + arena smoke |
| 7 | External-Agent Parity | coder | `opencode` gets multi-provider local models through a real bridge | W5 | coder tests + opencode smoke |
| 8 | Operations and Final Verification | docs + configs + smoke | Add-a-machine runbook, final matrix, ready handoff to BooControl | W7 | end-to-end smoke matrix |
## Work Units
### W1. Provider Registry Foundation
**Goal.** Make provider identity real before any routing or UI changes.
**Files and seams.**
- `packages/contracts/src/` for the new local-provider schema and pure model-ref helpers
- `packages/contracts/package.json` exports
- `apps/server/src/config.ts`
- `apps/coder/src/config.ts`
- new app-local registry loaders under `apps/server/src/services/` and `apps/coder/src/services/`
- `data/llama-providers.example.json`
**Implement.**
1. Add a new contracts subpath for local provider config, separate from the existing coder ACP provider config.
2. Define the shared file shape: `defaultProvider` plus `providers[]` with `id`, `label`, `baseUrl`, optional `sidecarUrl`, and `kind`.
3. Add pure helpers for `parseModelRef`, `formatModelRef`, and legacy bare-id resolution.
4. Add `LLAMA_PROVIDERS_PATH` to both server and coder config.
5. Implement server and coder registry loaders that read the shared file and synthesize one legacy provider from `LLAMA_SWAP_URL` and optional `LLAMA_SIDECAR_URL` when the file is absent.
6. Add a checked-in example config with `sam-desktop` and `embedding`.
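
Step 3 can be sketched as three pure functions. This is a minimal illustration, not the contracts package's actual code: the `ModelRef` shape and the `resolveModelRef` name and signature are assumptions, and so is the rule that only the first slash separates provider from model.

```typescript
// Hypothetical shape: the plan names the helpers but not their types.
interface ModelRef {
  provider: string; // provider id from the registry, e.g. "sam-desktop"
  model: string; // bare wire model id sent upstream
}

// Split on the FIRST slash only, so wire ids that contain slashes stay whole.
function parseModelRef(id: string): ModelRef | null {
  const slash = id.indexOf("/");
  if (slash <= 0 || slash === id.length - 1) return null; // bare or malformed
  return { provider: id.slice(0, slash), model: id.slice(slash + 1) };
}

function formatModelRef(ref: ModelRef): string {
  return `${ref.provider}/${ref.model}`;
}

// Legacy bare-id resolution (assumed rule): if the prefix is not a known
// provider, treat the whole string as a wire id under the default provider.
function resolveModelRef(
  id: string,
  knownProviders: ReadonlySet<string>,
  defaultProvider: string,
): ModelRef {
  const parsed = parseModelRef(id);
  if (parsed !== null && knownProviders.has(parsed.provider)) return parsed;
  return { provider: defaultProvider, model: id };
}
```

Checking the prefix against the known provider ids, rather than trusting any slash, keeps a bare id that happens to contain a slash on the default provider instead of producing a dangling provider name.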
**Parallel-safe split.**
- Agent A: contracts schema + helpers + exports
- Agent B: server config + server loader after A merges
- Agent C: coder config + coder loader after A merges
**Exit criteria.**
- Both apps can start with only legacy env vars.
- Both apps can also start with a real `llama-providers.json`.
- Pure helper tests cover `provider/model` and bare fallback.
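
The first two exit criteria describe a loader with two entry paths. The sketch below keeps file I/O outside the function so it stays testable. The function name, the `"legacy"` provider id, the label, and the `"llama-swap"` value for `kind` are all placeholders, since the plan does not specify them.

```typescript
interface LocalProvider {
  id: string;
  label: string;
  baseUrl: string;
  sidecarUrl?: string;
  kind: string; // the plan lists `kind` but not its allowed values
}

interface LocalProviderConfig {
  defaultProvider: string;
  providers: LocalProvider[];
}

interface LegacyEnv {
  LLAMA_SWAP_URL?: string;
  LLAMA_SIDECAR_URL?: string;
}

// fileText is the content read from LLAMA_PROVIDERS_PATH, or null when the
// file is absent. Reading the file is left to the app-local caller.
function loadLocalProviders(fileText: string | null, env: LegacyEnv): LocalProviderConfig {
  if (fileText !== null) {
    const parsed = JSON.parse(fileText) as LocalProviderConfig;
    if (!Array.isArray(parsed.providers)) {
      throw new Error("providers[] is missing");
    }
    if (!parsed.providers.some((p) => p.id === parsed.defaultProvider)) {
      throw new Error(`defaultProvider "${parsed.defaultProvider}" is not in providers[]`);
    }
    return parsed;
  }
  if (!env.LLAMA_SWAP_URL) {
    throw new Error("no provider file and no LLAMA_SWAP_URL");
  }
  // Synthesize a single legacy provider from the old env vars.
  const legacy: LocalProvider = {
    id: "legacy", // placeholder id
    label: "Legacy llama-swap", // placeholder label
    baseUrl: env.LLAMA_SWAP_URL,
    kind: "llama-swap", // placeholder kind
  };
  if (env.LLAMA_SIDECAR_URL) legacy.sidecarUrl = env.LLAMA_SIDECAR_URL;
  return { defaultProvider: legacy.id, providers: [legacy] };
}
```

A real loader would validate the parsed file against the contracts schema instead of casting. The cast here only keeps the sketch short.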
### W2. Server Catalog and Routing
**Goal.** Replace server-side routing heuristics with one provider-aware resolver.
**Files and seams.**
- `apps/server/src/routes/models.ts`
- `apps/server/src/services/inference/provider.ts`
- `apps/server/src/types/api.ts`
- `apps/web/src/api/types.ts`
- `apps/web/src/api/client.ts`
- relevant provider tests
**Implement.**
1. Refactor `/api/models` to return provider-grouped inventory only, with every `ModelInfo.id` already composite ([D-4](artifacts/implementation-decision-log.md#d-4-favorites-are-a-settings-backed-user-view-not-a-server-catalog-section)).
2. Build one server resolver that answers:
   - provider identity
   - upstream base URL
   - sidecar eligibility
   - final wire model id
   - DeepSeek special handling
3. Make both `upstreamModel()` and `resolveModelEndpoint()` call that same resolver.
4. Remove the current “prefix means provider” logic as the authority; keep compatibility only at the bare-id fallback layer.
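
One possible shape for the single resolver in step 2 is sketched below. All type and function names are hypothetical. The DeepSeek rule is injected as a predicate because the plan does not spell it out, and treating "has a `sidecarUrl`" as sidecar eligibility is an assumption.

```typescript
interface ProviderEntry {
  id: string;
  baseUrl: string;
  sidecarUrl?: string; // absent means this provider never uses a sidecar
}

interface DeepSeekRoute {
  baseUrl: string;
  matchesBareId: (bareId: string) => boolean; // stand-in for the existing special handling
}

interface ResolvedModel {
  target: "local" | "deepseek";
  providerId: string;
  baseUrl: string;
  sidecarUrl: string | null;
  wireModel: string;
}

function toLocal(p: ProviderEntry, wireModel: string): ResolvedModel {
  return {
    target: "local",
    providerId: p.id,
    baseUrl: p.baseUrl,
    sidecarUrl: p.sidecarUrl ?? null,
    wireModel,
  };
}

function resolveModel(
  modelId: string,
  providers: ProviderEntry[],
  defaultProviderId: string,
  deepseek: DeepSeekRoute | null,
): ResolvedModel {
  const slash = modelId.indexOf("/");
  const prefix = slash > 0 ? modelId.slice(0, slash) : null;
  const explicit = prefix === null ? undefined : providers.find((p) => p.id === prefix);
  // An explicit provider prefix wins over any name-based heuristic.
  if (explicit) return toLocal(explicit, modelId.slice(slash + 1));
  // Only bare ids are eligible for the DeepSeek path.
  if (deepseek !== null && deepseek.matchesBareId(modelId)) {
    return {
      target: "deepseek",
      providerId: "deepseek",
      baseUrl: deepseek.baseUrl,
      sidecarUrl: null,
      wireModel: modelId,
    };
  }
  // Bare-id fallback: route through the configured default provider.
  const fallback = providers.find((p) => p.id === defaultProviderId);
  if (!fallback) throw new Error(`default provider "${defaultProviderId}" is not configured`);
  return toLocal(fallback, modelId);
}
```

Under this ordering, `embedding/deepseek-r1-qwen3-8b` resolves to the local `embedding` provider before the DeepSeek predicate is consulted. Both `upstreamModel()` and `resolveModelEndpoint()` could then read different fields of the same result.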
**Parallel-safe split.**
- First branch: resolver and tests
- Second branch: `/api/models` contract change plus client type updates
**Exit criteria.**
- `embedding/deepseek-r1-qwen3-8b` routes as local `embedding`, not as DeepSeek cloud.
- `embedding/*` never uses a sidecar.
- Legacy bare models still resolve through the configured default provider.
### W3. Server Downstream Consumers
**Goal.** Remove the remaining single-endpoint assumptions in server call sites.
**Files and seams.**
- `apps/server/src/services/model-context.ts`
- `apps/server/src/index.ts`
- `apps/server/src/services/compaction.ts`
- `apps/server/src/services/task-model.ts`
- `apps/server/src/services/inference/error-handler.ts`
- `apps/server/src/services/__tests__/model-context.test.ts`
**Implement.**
1. Change `model-context` to key caches by composite model id, not bare wire id.
2. Move context lookup from one process-wide `LLAMA_SWAP_URL` assumption to the provider-aware resolver.
3. Update compaction to resolve the right upstream before summary calls.
4. Update task-model fallback resolution to use the same parsed model ref path as inference.
5. Audit remaining server `LLAMA_SWAP_URL` call sites and either migrate them or explicitly mark them legacy-only.
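
Step 1 comes down to the choice of cache key. The sketch below uses a hypothetical `ModelContextCache` class to show the isolation property the exit criteria ask for. The real `model-context.ts` may cache more than a context length, and its lookup is likely asynchronous.

```typescript
class ModelContextCache {
  // Keyed by composite "provider/model" id, never by bare wire id.
  private readonly entries = new Map<string, number>();

  getOrLoad(compositeId: string, load: () => number): number {
    const hit = this.entries.get(compositeId);
    if (hit !== undefined) return hit;
    const value = load(); // e.g. ask the resolved provider for its context length
    this.entries.set(compositeId, value);
    return value;
  }
}

// Two machines serving the same wire model name with different limits.
const contextCache = new ModelContextCache();
const onDesktop = contextCache.getOrLoad("sam-desktop/qwen3.6-35b", () => 32768);
const onEmbedding = contextCache.getOrLoad("embedding/qwen3.6-35b", () => 8192);
```

With a bare key, the second lookup would have returned the first machine's limit. With the composite key, each provider keeps its own entry.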
**Parallel-safe split.**
- Agent A: `model-context.ts` + tests
- Agent B: `compaction.ts` and `task-model.ts` after A lands, because both depend on the new resolver contract
**Exit criteria.**
- Two providers serving the same wire model name do not share context cache entries.
- Existing sessions with bare models still load context and complete turns.
- No server path doing local inference bypasses the shared resolver.
### W4. BooChat Favorites and Grouped Picker
**Goal.** Stabilize the end-user selection model on BooChat before deeper coding surfaces adopt it.
**Files and seams.**
- `apps/server/src/routes/settings.ts`
- `apps/server/src/services/settings.ts` or equivalent settings helper path
- `apps/web/src/components/ModelPicker.tsx`
- `apps/web/src/lib/model-label.ts`
- `apps/web/src/api/client.ts`
- `apps/web/src/api/types.ts`
- `apps/web/src/pages/Session.tsx`
**Implement.**
1. Add `favorite_models: string[]` handling in settings.
2. Normalize malformed and duplicate entries on write.
3. In the client, derive the picker sections:
   - a Favorites section first
   - then one section per provider
   - with unavailable favorites hidden, not deleted
4. Keep a favorited model visible in both Favorites and its provider section.
5. Make new model selections write composite ids.
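
Steps 2 through 4 can be sketched as two pure functions, one for the server write path and one for the client. The names and shapes are illustrative, and the definition of "malformed" (non-string, empty, or not in `provider/model` form) is an assumption the plan does not state.

```typescript
// Server side: normalize on write. Drops malformed entries and duplicates,
// keeping first-seen order.
function normalizeFavorites(input: unknown): string[] {
  if (!Array.isArray(input)) return [];
  const seen = new Set<string>();
  const out: string[] = [];
  for (const entry of input) {
    if (typeof entry !== "string") continue;
    const id = entry.trim();
    const slash = id.indexOf("/");
    if (slash <= 0 || slash === id.length - 1) continue; // not provider/model
    if (seen.has(id)) continue;
    seen.add(id);
    out.push(id);
  }
  return out;
}

interface ProviderInventory {
  label: string;
  modelIds: string[]; // composite ids as returned by /api/models
}

interface PickerSection {
  title: string;
  modelIds: string[];
}

// Client side: Favorites first, then one section per provider. Favorites
// missing from live inventory are hidden here but stay in settings.
function buildPickerSections(
  favorites: string[],
  inventory: ProviderInventory[],
): PickerSection[] {
  const available = new Set<string>();
  for (const p of inventory) for (const id of p.modelIds) available.add(id);
  const visible = favorites.filter((id) => available.has(id));
  const sections: PickerSection[] = [];
  if (visible.length > 0) sections.push({ title: "Favorites", modelIds: visible });
  for (const p of inventory) sections.push({ title: p.label, modelIds: [...p.modelIds] });
  return sections;
}
```

Because `buildPickerSections` never writes back, a provider that drops out of inventory only changes what is rendered. A favorited model also appears twice by construction, once under Favorites and once under its provider.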
**Parallel-safe split.**
- Server settings branch first
- Web picker branch second against the new contract
**Exit criteria.**
- Favorites persist across refresh.
- Removing a provider from live inventory hides its favorites without deleting the stored ids.
- A new chat selection stores `provider/model`.
### W5. Native BooCoder Parity
**Goal.** Move native `boocode` local model usage onto the shared provider model before touching `opencode`.
**Files and seams.**
- `apps/coder/src/services/provider-snapshot.ts`
- `apps/coder/src/services/dispatcher.ts`
- `apps/web/src/components/AgentComposerBar.tsx`
- `apps/web/src/lib/model-label.ts`
- `packages/contracts/src/provider-snapshot.ts` only if the snapshot contract truly needs new metadata
**Implement.**
1. Make the native `boocode` provider expose composite local model ids from the shared registry.
2. Update native dispatch to resolve composite local ids through the shared registry.
3. Render grouped local models for the native `boocode` path in `AgentComposerBar`.
4. If the current `opencode` snapshot path would falsely advertise multi-provider local models before W7, hide that advertising now rather than leave the UI misleading ([D-5](artifacts/implementation-decision-log.md#d-5-native-boocode-parity-ships-before-opencode-parity)).
**Parallel-safe split.**
- Coder backend first
- AgentComposerBar UI second
**Exit criteria.**
- Native BooCoder tasks can run against at least two distinct local providers.
- The native picker behavior matches BooChat's grouped/favorites mental model closely enough that a user is not learning a second local-model identity system.
- `opencode` is not yet claiming parity it does not have.
### W6. Arena Parity
**Goal.** Make Arena consume the same local-provider substrate instead of one live llama-swap list.
**Files and seams.**
- `apps/coder/src/services/arena-model-call.ts`
- `apps/coder/src/services/arena-analyzer.ts`
- `apps/coder/src/services/arena-runner.ts`
- `apps/coder/src/index.ts`
- arena tests
**Implement.**
1. Replace direct `LLAMA_SWAP_URL` local calls with the provider-aware resolver.
2. Build Arena's local-model set from the shared provider registry, not one fetched list.
3. Preserve ADR-0001's two-lane scheduling rule; provider awareness changes local identity, not lane semantics.
4. Keep bare-id compatibility only where old data needs it.
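
Steps 2 and 4 might look like the following sketch. It assumes the registry loader can supply each provider's wire model list, and that a bare id in old data counts as local when the default provider serves it. Both function names are hypothetical, and lane scheduling is deliberately left out.

```typescript
interface ProviderModels {
  providerId: string;
  wireModels: string[]; // bare ids as listed by that provider
}

// Build the local set from every configured provider, as composite ids.
function buildLocalModelSet(inventories: ProviderModels[]): Set<string> {
  const ids = new Set<string>();
  for (const inv of inventories) {
    for (const model of inv.wireModels) ids.add(`${inv.providerId}/${model}`);
  }
  return ids;
}

function classifyContestant(
  modelId: string,
  localModels: ReadonlySet<string>,
  defaultProviderId: string,
): "local" | "cloud" {
  if (localModels.has(modelId)) return "local";
  // Bare-id compatibility for old data only (assumed rule).
  if (localModels.has(`${defaultProviderId}/${modelId}`)) return "local";
  return "cloud";
}
```

The classifier returns only a label. Which lane a contestant runs in stays with the ADR-0001 scheduling rule.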
**Parallel-safe split.**
- Agent A: `arena-model-call.ts` + analyzer updates
- Agent B: local-model set construction in `index.ts` + runner adjustments after A settles the model identity contract
**Exit criteria.**
- Arena can run local contestants from more than one machine.
- Local-vs-cloud classification still works.
- ADR-0001 behavior remains intact.
### W7. External-Agent Parity
**Goal.** Give `opencode` a real multi-provider local-model story instead of collapsing everything back to `llama-swap/<model>`.
**Files and seams.**
- `apps/coder/src/services/backends/opencode-server.ts`
- `apps/coder/src/services/provider-snapshot.ts`
- `apps/coder/src/services/agent-probe.ts`
- new BooCoder-hosted gateway route or service module under `apps/coder/src/services/`
- host config generation or sync for opencode local models
**Implement.**
1. Add a BooCoder-hosted OpenAI-compatible local gateway that accepts provider-preserving model ids and routes them to the correct local provider ([D-6](artifacts/implementation-decision-log.md#d-6-opencode-parity-uses-a-boocode-local-gateway-not-a-string-rewrite)).
2. Use one opencode-facing provider namespace such as `boocode-local`, where the opencode `providerID` is stable and the `modelID` is the inner composite id like `sam-desktop/qwen3.6-35b`.
3. Update provider snapshot merging so `opencode` advertises `boocode-local/<provider/model>` rather than `llama-swap/<model>`.
4. Update the opencode bridge parser and config sync so duplicate model names remain distinguishable end to end.
5. Add smoke coverage for two providers serving the same wire model name.
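
The id handling in steps 2 through 4 is sketched below. `boocode-local` and `sam-desktop/qwen3.6-35b` come from the plan; the function names, the `GatewayProvider` shape, and the choice to reject bare ids at the gateway are assumptions.

```typescript
const OPENCODE_LOCAL_NAMESPACE = "boocode-local";

interface OpencodeModelRef {
  providerID: string; // stable opencode-facing namespace
  modelID: string; // inner composite id, e.g. "sam-desktop/qwen3.6-35b"
}

function toOpencodeModelId(compositeId: string): string {
  return `${OPENCODE_LOCAL_NAMESPACE}/${compositeId}`;
}

// Split on the first slash only, so the inner composite id survives intact.
function parseOpencodeModelId(full: string): OpencodeModelRef | null {
  const slash = full.indexOf("/");
  if (slash <= 0 || slash === full.length - 1) return null;
  return { providerID: full.slice(0, slash), modelID: full.slice(slash + 1) };
}

interface GatewayProvider {
  id: string;
  baseUrl: string;
}

// Gateway side: map the inner composite id to an upstream and a wire model.
// Bare ids are rejected so that nothing silently collapses to one machine.
function routeGatewayRequest(
  modelID: string,
  providers: GatewayProvider[],
): { baseUrl: string; wireModel: string } {
  const slash = modelID.indexOf("/");
  const provider =
    slash > 0 ? providers.find((p) => p.id === modelID.slice(0, slash)) : undefined;
  if (!provider) throw new Error(`unknown local provider in model id "${modelID}"`);
  return { baseUrl: provider.baseUrl, wireModel: modelID.slice(slash + 1) };
}
```

Parsing `boocode-local/sam-desktop/qwen3.6-35b` yields `providerID` `boocode-local` and `modelID` `sam-desktop/qwen3.6-35b`. The gateway then strips the provider segment only at the last hop, which is what keeps duplicate wire names distinguishable end to end.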
**Parallel-safe split.**
- Gateway branch first
- Snapshot/config-sync branch second
- Final opencode backend/parser adjustments last
**Exit criteria.**
- `opencode` can target two local providers with overlapping wire model names and hit the correct machine both times.
- No path rewrites `provider/model` down to plain `llama-swap/model`.
### W8. Operations and Final Verification
**Goal.** End with a repeatable operator workflow, not just a working dev branch.
**Files and seams.**
- `data/llama-providers.example.json`
- operator docs under `docs/`
- OpenSpec tasks/status notes as needed
**Implement.**
1. Document the add-a-machine flow for config-managed local providers.
2. Document the smoke matrix for:
   - single legacy provider fallback
   - two local providers
   - duplicate model names across two providers
   - DeepSeek enabled
   - `opencode` local parity
3. Record the final interface BooControl should consume: provider registry plus composite ids, not raw host env vars.
**Exit criteria.**
- A third machine can be added by editing config and running the smoke matrix.
- The implementation docs name the exact runtime contract BooControl should build on.
## Verification Plan
- `pnpm -C packages/contracts build`
- `pnpm -C apps/server test`
- `pnpm -C apps/server build`
- `pnpm -C apps/coder test`
- `pnpm -C apps/coder build`
- `npx tsc -p apps/web/tsconfig.app.json --noEmit`
Add targeted tests as the work lands:
- model-ref parse/format and bare-id fallback
- provider-aware routing and DeepSeek collision cases
- context-cache isolation for duplicate model names
- favorites hide-not-delete behavior
- provider snapshot and opencode bridge behavior
- arena local-model classification across multiple providers
## Main Risks
- The W2 contract change to `/api/models` and W5 snapshot changes can drift across apps if contract parity is edited piecemeal. Follow the cross-app contract standard in [artifacts/.discovery-notes.md](artifacts/.discovery-notes.md) and land contract-first branches.
- W7 is the hardest seam. If the gateway is skipped and the old string rewrite is kept, the feature will look complete in the UI while still routing to the wrong machine.
- `model-context.ts` is a hidden correctness seam. If cache keys stay bare, duplicate model names will mis-share context limits and compaction behavior even after routing is fixed.
## Deferred
- BooControl itself
- picker search and richer filtering
- manual favorite reordering
- host health badges in pickers
## Definition of Done
- BooChat, native BooCoder, Arena, and `opencode` all support provider-aware local models end to end.
- Legacy bare ids remain readable.
- Two providers can expose the same wire model name without ambiguity.
- Adding another local machine is documented and smoke-tested.
- BooControl can start later without inventing a second provider registry.