boocode/docs/plans/multi-provider-local-models/feature-implementation-plan.md
indifferentketchup b18de2a331 chore: snapshot working tree - pty_exited notifications + in-flight inference WIP
feat(booterm): structured pty_exited WS notifications. Plan-validated, impl-validated, code-reviewed green (contracts build clean, contracts test 29/29, booterm + web typecheck clean).

wip: in-progress inference/provider refactor (agents.ts, provider.ts, new llama-providers.ts, removed llama-args-validator), plus arena, dispatcher, compaction, schema changes.

openspec: pty-exit-notifications complete; x-agent-flags planned (not yet implemented).
2026-06-14 12:48:47 +00:00


Feature Implementation Plan: Multi-Provider Local Models

This plan turns the multi-provider local-model design into a strict implementation sequence that can be executed with Orchestration. It assumes the target is not just to “fix the picker” but to make local inference work as a small fleet with stable provider identity, shared favorites, correct routing, and an honest parity story for BooCoder.

Source Specification

Outcome

When this plan is complete:

  • BooChat can route local models by named provider, not by one global LLAMA_SWAP_URL.
  • Favorites are shared across BooChat and native BooCoder, derived from settings instead of being baked into the server catalog (D-4).
  • Duplicate model names on different local machines are safe because persisted and cached identity is provider/model (D-2).
  • Native BooCoder and Arena use the same provider-aware resolver as BooChat (D-3).
  • External-agent parity is real rather than implied: opencode only gets multi-provider local models after a provider-preserving bridge exists (D-5, D-6).
  • Adding another local machine is a config change plus a smoke pass, not another architecture pass (D-7).

Working Assumptions

  • The shared local-provider source of truth is /data/llama-providers.json, exposed to both apps through LLAMA_PROVIDERS_PATH, with legacy env fallback while the file is absent (D-1).
  • packages/contracts owns schemas and pure helpers; app-local loader modules own file I/O and env fallback, following the existing provider-config / provider-config-registry split in BooCoder (D-1).
  • The work ends at a completed multi-provider substrate. BooControl is a follow-on consumer, not part of this implementation batch.

Orchestration Rules

  • Treat each work unit below as one mergeable branch. Do not overlap branches that touch the same shared contract files.
  • Never run more than one agent at a time on packages/contracts/src/*, apps/server/src/services/inference/provider.ts, apps/web/src/api/types.ts, or apps/coder/src/services/provider-snapshot.ts.
  • Inside a work unit, parallelize only disjoint file groups. Contract changes first, consumers second, tests last.
  • Close each work unit with its own verification before starting the next one. Do not stack W1-W4 and debug later.

Work Unit Index

| # | Work Unit | Surface | Delivers | Depends On | Verification |
|---|---|---|---|---|---|
| 1 | Provider Registry Foundation | contracts + server + coder | Shared config schema, model-ref helpers, app-local registry loaders | none | Contracts build, server build, coder build |
| 2 | Server Catalog and Routing | server | Provider-aware /api/models and unified resolver | W1 | server tests for routing + collision cases |
| 3 | Server Downstream Consumers | server | Context, compaction, and task-model stop assuming one endpoint | W2 | server tests for cache isolation + bare-id fallback |
| 4 | BooChat Favorites and Grouped Picker | server + web | Shared favorites and provider-grouped chat model selection | W2 | server tests + web smoke |
| 5 | Native BooCoder Parity | coder + web | Native boocode local models use composite IDs and grouped selection | W1, W4 | coder tests + BooCoder smoke |
| 6 | Arena Parity | coder | Arena local calls and local-model classification become provider-aware | W5 | coder tests + arena smoke |
| 7 | External-Agent Parity | coder | opencode gets multi-provider local models through a real bridge | W5 | coder tests + opencode smoke |
| 8 | Operations and Final Verification | docs + configs + smoke | Add-a-machine runbook, final matrix, ready handoff to BooControl | W7 | end-to-end smoke matrix |

Work Units

W1. Provider Registry Foundation

Goal. Make provider identity real before any routing or UI changes.

Files and seams.

  • packages/contracts/src/ for the new local-provider schema and pure model-ref helpers
  • packages/contracts/package.json exports
  • apps/server/src/config.ts
  • apps/coder/src/config.ts
  • new app-local registry loaders under apps/server/src/services/ and apps/coder/src/services/
  • data/llama-providers.example.json

Implement.

  1. Add a new contracts subpath for local provider config, separate from the existing coder ACP provider config.
  2. Define the shared file shape: defaultProvider plus providers[] with id, label, baseUrl, optional sidecarUrl, and kind.
  3. Add pure helpers for parseModelRef, formatModelRef, and legacy bare-id resolution.
  4. Add LLAMA_PROVIDERS_PATH to both server and coder config.
  5. Implement server and coder registry loaders that read the shared file and synthesize one legacy provider from LLAMA_SWAP_URL and optional LLAMA_SIDECAR_URL when the file is absent.
  6. Add a checked-in example config with sam-desktop and embedding.
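
The file shape from step 2 and the legacy fallback from step 5 could look roughly like the sketch below. It is illustrative only: the type names, the kind values, the "legacy" provider id, and the example URLs are assumptions, not the actual exports of packages/contracts. File I/O is left to the caller so the function stays pure and easy to test.

```typescript
// Hypothetical shape for llama-providers.json. Field names follow step 2;
// the `kind` values used below are assumed for illustration.
interface LocalProvider {
  id: string;
  label: string;
  baseUrl: string;
  sidecarUrl?: string;
  kind: string;
}

interface LocalProviderConfig {
  defaultProvider: string;
  providers: LocalProvider[];
}

interface LegacyEnv {
  LLAMA_SWAP_URL?: string;
  LLAMA_SIDECAR_URL?: string;
}

// Example content for data/llama-providers.example.json (URLs are made up).
const EXAMPLE_CONFIG: LocalProviderConfig = {
  defaultProvider: "sam-desktop",
  providers: [
    {
      id: "sam-desktop",
      label: "Sam Desktop",
      baseUrl: "http://sam-desktop.example:8080",
      sidecarUrl: "http://sam-desktop.example:8081",
      kind: "llama-swap",
    },
    {
      id: "embedding",
      label: "Embedding",
      baseUrl: "http://embedding.example:8080",
      kind: "llama-swap",
    },
  ],
};

// `fileText` is the raw file contents, or null when the file is absent.
function loadLocalProviders(
  fileText: string | null,
  env: LegacyEnv,
): LocalProviderConfig {
  if (fileText !== null) {
    const parsed = JSON.parse(fileText) as LocalProviderConfig;
    const ids = new Set(parsed.providers.map((p) => p.id));
    if (ids.size !== parsed.providers.length) {
      throw new Error("duplicate provider id in provider file");
    }
    if (!ids.has(parsed.defaultProvider)) {
      throw new Error(`defaultProvider "${parsed.defaultProvider}" is not defined`);
    }
    return parsed;
  }
  // File absent: synthesize one legacy provider from the old env vars.
  if (!env.LLAMA_SWAP_URL) {
    throw new Error("no provider file and no LLAMA_SWAP_URL");
  }
  const legacy: LocalProvider = {
    id: "legacy", // assumed id for the synthesized provider
    label: "Legacy llama-swap",
    baseUrl: env.LLAMA_SWAP_URL,
    kind: "llama-swap",
  };
  if (env.LLAMA_SIDECAR_URL) legacy.sidecarUrl = env.LLAMA_SIDECAR_URL;
  return { defaultProvider: "legacy", providers: [legacy] };
}
```

In this sketch each app-local loader would read LLAMA_PROVIDERS_PATH itself, pass null when the file does not exist, and keep this function free of filesystem access.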

Parallel-safe split.

  • Agent A: contracts schema + helpers + exports
  • Agent B: server config + server loader after A merges
  • Agent C: coder config + coder loader after A merges

Exit criteria.

  • Both apps can start with only legacy env vars.
  • Both apps can also start with a real llama-providers.json.
  • Pure helper tests cover provider/model and bare fallback.
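
As a rough illustration of the helpers those tests would exercise, the sketch below parses and formats provider/model ids and falls back to the default provider for bare ids. The ModelRef type and the resolveModelRef name are assumptions. So is the rule that a prefix only counts as a provider when it is registered, which is one possible way to keep bare wire ids that happen to contain a slash working.

```typescript
interface ModelRef {
  provider: string;
  model: string;
}

// Split on the FIRST slash only, so wire ids that contain slashes survive.
function parseModelRef(id: string): ModelRef | null {
  const slash = id.indexOf("/");
  if (slash <= 0 || slash === id.length - 1) return null;
  return { provider: id.slice(0, slash), model: id.slice(slash + 1) };
}

function formatModelRef(ref: ModelRef): string {
  return `${ref.provider}/${ref.model}`;
}

// Legacy fallback (assumed rule): an id is composite only when its prefix is
// a registered provider; anything else is a bare id on the default provider.
function resolveModelRef(
  id: string,
  knownProviders: ReadonlySet<string>,
  defaultProvider: string,
): ModelRef {
  const parsed = parseModelRef(id);
  if (parsed && knownProviders.has(parsed.provider)) return parsed;
  return { provider: defaultProvider, model: id };
}
```

For example, with sam-desktop and embedding registered and sam-desktop as the default, the bare id qwen3.6-35b would resolve to sam-desktop/qwen3.6-35b, while embedding/qwen3.6-35b would be kept as written.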

W2. Server Catalog and Routing

Goal. Replace server-side routing heuristics with one provider-aware resolver.

Files and seams.

  • apps/server/src/routes/models.ts
  • apps/server/src/services/inference/provider.ts
  • apps/server/src/types/api.ts
  • apps/web/src/api/types.ts
  • apps/web/src/api/client.ts
  • relevant provider tests

Implement.

  1. Refactor /api/models to return provider-grouped inventory only, with every ModelInfo.id already composite (D-4).
  2. Build one server resolver that answers:
    • provider identity
    • upstream base URL
    • sidecar eligibility
    • final wire model id
    • DeepSeek special handling
  3. Make both upstreamModel() and resolveModelEndpoint() call that same resolver.
  4. Remove the current “prefix means provider” logic as the authority; keep compatibility only at the bare-id fallback layer.
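
A minimal sketch of such a resolver follows, under stated assumptions: DeepSeek cloud is selected by a reserved provider id rather than by matching the model name, and sidecar eligibility is simply whether the provider declares a sidecarUrl. The names resolveRoute, ResolvedRoute, and DEEPSEEK_PROVIDER_ID are hypothetical, and the DeepSeek base URL is shown only as a placeholder for whatever the server already uses.

```typescript
interface RouteProvider {
  id: string;
  baseUrl: string;
  sidecarUrl?: string;
}

interface ResolvedRoute {
  providerId: string;
  baseUrl: string;
  sidecarUrl: string | null; // null means "never use a sidecar"
  wireModel: string;
  isDeepSeekCloud: boolean;
}

// Assumed for illustration: cloud DeepSeek is addressed by provider id.
const DEEPSEEK_PROVIDER_ID = "deepseek";
const DEEPSEEK_BASE_URL = "https://api.deepseek.com";

function resolveRoute(
  modelId: string,
  providers: readonly RouteProvider[],
  defaultProvider: string,
): ResolvedRoute {
  const slash = modelId.indexOf("/");
  const prefix = slash > 0 ? modelId.slice(0, slash) : null;
  const rest = slash > 0 ? modelId.slice(slash + 1) : modelId;

  // Cloud routing is decided by provider identity, never by a substring
  // of the model name.
  if (prefix === DEEPSEEK_PROVIDER_ID) {
    return {
      providerId: DEEPSEEK_PROVIDER_ID,
      baseUrl: DEEPSEEK_BASE_URL,
      sidecarUrl: null,
      wireModel: rest,
      isDeepSeekCloud: true,
    };
  }

  const explicit =
    prefix === null ? undefined : providers.find((p) => p.id === prefix);
  // Bare-id fallback: unknown or missing prefix goes to the default provider.
  const provider = explicit ?? providers.find((p) => p.id === defaultProvider);
  if (!provider) {
    throw new Error(`no provider available for model "${modelId}"`);
  }
  return {
    providerId: provider.id,
    baseUrl: provider.baseUrl,
    sidecarUrl: provider.sidecarUrl ?? null,
    wireModel: explicit ? rest : modelId,
    isDeepSeekCloud: false,
  };
}
```

Under these assumptions, upstreamModel() would read wireModel and resolveModelEndpoint() would read baseUrl from the same result, so the two cannot disagree about where a model lives.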

Parallel-safe split.

  • First branch: resolver and tests
  • Second branch: /api/models contract change plus client type updates

Exit criteria.

  • embedding/deepseek-r1-qwen3-8b routes as local embedding, not as DeepSeek cloud.
  • embedding/* never uses a sidecar.
  • Legacy bare models still resolve through the configured default provider.

W3. Server Downstream Consumers

Goal. Remove the remaining single-endpoint assumptions in server call sites.

Files and seams.

  • apps/server/src/services/model-context.ts
  • apps/server/src/index.ts
  • apps/server/src/services/compaction.ts
  • apps/server/src/services/task-model.ts
  • apps/server/src/services/inference/error-handler.ts
  • apps/server/src/services/__tests__/model-context.test.ts

Implement.

  1. Change model-context to key caches by composite model id, not bare wire id.
  2. Move context lookup from one process-wide LLAMA_SWAP_URL assumption to the provider-aware resolver.
  3. Update compaction to resolve the right upstream before summary calls.
  4. Update task-model fallback resolution to use the same parsed model ref path as inference.
  5. Audit remaining server LLAMA_SWAP_URL call sites and either migrate them or explicitly mark them legacy-only.
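
Step 1 can be pictured with a small cache keyed by the composite id. This is a sketch only: ContextLimitCache and its methods are hypothetical names, and the real model-context.ts may store more than a token limit per model.

```typescript
// Builds the cache key from provider AND wire model, so two machines
// serving the same wire name never share an entry.
function compositeKey(providerId: string, wireModel: string): string {
  return `${providerId}/${wireModel}`;
}

class ContextLimitCache {
  private readonly limits = new Map<string, number>();

  get(providerId: string, wireModel: string): number | undefined {
    return this.limits.get(compositeKey(providerId, wireModel));
  }

  set(providerId: string, wireModel: string, contextTokens: number): void {
    this.limits.set(compositeKey(providerId, wireModel), contextTokens);
  }
}
```

A legacy bare id would first be resolved to its default provider and only then looked up, so old sessions and new sessions hit the same entry.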

Parallel-safe split.

  • Agent A: model-context.ts + tests
  • Agent B: compaction.ts and task-model.ts after A lands, because both depend on the new resolver contract

Exit criteria.

  • Two providers serving the same wire model name do not share context cache entries.
  • Existing sessions with bare models still load context and complete turns.
  • No server path doing local inference bypasses the shared resolver.

W4. BooChat Favorites and Grouped Picker

Goal. Stabilize the end-user selection model on BooChat before deeper coding surfaces adopt it.

Files and seams.

  • apps/server/src/routes/settings.ts
  • apps/server/src/services/settings.ts or equivalent settings helper path
  • apps/web/src/components/ModelPicker.tsx
  • apps/web/src/lib/model-label.ts
  • apps/web/src/api/client.ts
  • apps/web/src/api/types.ts
  • apps/web/src/pages/Session.tsx

Implement.

  1. Add favorite_models: string[] handling in settings.
  2. Normalize malformed and duplicate entries on write.
  3. In the client, derive the picker sections:
    • a Favorites section first
    • then one section per provider
    • with unavailable favorites hidden, not deleted
  4. Keep a favorited model visible in both Favorites and its provider section.
  5. Make new model selections write composite ids.
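
The write-time normalization and the client-side derivation might be sketched as follows. The function and type names are hypothetical, and treating any entry that is not a provider/model string as malformed is an assumption about what "malformed" means here.

```typescript
// Server side: keep only well-formed, unique provider/model strings,
// preserving first-seen order.
function normalizeFavorites(input: unknown): string[] {
  if (!Array.isArray(input)) return [];
  const seen = new Set<string>();
  for (const entry of input) {
    if (typeof entry !== "string") continue;
    const id = entry.trim();
    const slash = id.indexOf("/");
    if (slash <= 0 || slash === id.length - 1) continue;
    seen.add(id);
  }
  return Array.from(seen);
}

interface PickerModel {
  id: string; // composite provider/model id
  providerId: string;
}

interface PickerSection {
  title: string;
  modelIds: string[];
}

// Client side: Favorites first, then one section per provider. Favorites
// missing from live inventory are hidden here but never removed from the
// stored list, because the input array is not mutated.
function buildPickerSections(
  favorites: readonly string[],
  inventory: readonly PickerModel[],
): PickerSection[] {
  const available = new Set(inventory.map((m) => m.id));
  const sections: PickerSection[] = [];
  const visibleFavorites = favorites.filter((id) => available.has(id));
  if (visibleFavorites.length > 0) {
    sections.push({ title: "Favorites", modelIds: visibleFavorites });
  }
  const byProvider = new Map<string, string[]>();
  for (const model of inventory) {
    const ids = byProvider.get(model.providerId) ?? [];
    ids.push(model.id);
    byProvider.set(model.providerId, ids);
  }
  byProvider.forEach((modelIds, providerId) => {
    sections.push({ title: providerId, modelIds });
  });
  return sections;
}
```

Because a favorite is not filtered out of its provider section, a favorited model appears in both places, which matches step 4.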

Parallel-safe split.

  • Server settings branch first
  • Web picker branch second against the new contract

Exit criteria.

  • Favorites persist across refresh.
  • Removing a provider from live inventory hides its favorites without deleting the stored ids.
  • A new chat selection stores provider/model.

W5. Native BooCoder Parity

Goal. Move native boocode local model usage onto the shared provider model before touching opencode.

Files and seams.

  • apps/coder/src/services/provider-snapshot.ts
  • apps/coder/src/services/dispatcher.ts
  • apps/web/src/components/AgentComposerBar.tsx
  • apps/web/src/lib/model-label.ts
  • packages/contracts/src/provider-snapshot.ts only if the snapshot contract truly needs new metadata

Implement.

  1. Make the native boocode provider expose composite local model ids from the shared registry.
  2. Update native dispatch to resolve composite local ids through the shared registry.
  3. Render grouped local models for the native boocode path in AgentComposerBar.
  4. If the current opencode snapshot path would falsely advertise multi-provider local models before W7, hide that advertising now rather than leave the UI misleading (D-5).

Parallel-safe split.

  • Coder backend first
  • AgentComposerBar UI second

Exit criteria.

  • Native BooCoder tasks can run against at least two distinct local providers.
  • The native picker behavior matches BooChat's grouped/favorites mental model closely enough that a user is not learning a second local-model identity system.
  • opencode is not yet claiming parity it does not have.

W6. Arena Parity

Goal. Make Arena consume the same local-provider substrate instead of one live llama-swap list.

Files and seams.

  • apps/coder/src/services/arena-model-call.ts
  • apps/coder/src/services/arena-analyzer.ts
  • apps/coder/src/services/arena-runner.ts
  • apps/coder/src/index.ts
  • arena tests

Implement.

  1. Replace direct LLAMA_SWAP_URL local calls with the provider-aware resolver.
  2. Build Arena's local-model set from the shared provider registry, not one fetched list.
  3. Preserve ADR-0001's two-lane scheduling rule; provider awareness changes local identity, not lane semantics.
  4. Keep bare-id compatibility only where old data needs it.
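
One way to picture steps 2 and 4 is a local-model set built from per-provider inventories, with a narrow fallback for bare ids found in old Arena data. The names buildLocalModelSet and isLocalModel are hypothetical, and the sketch assumes each provider's wire model list has already been fetched by the caller.

```typescript
// Input maps provider id to the wire model ids that provider serves.
function buildLocalModelSet(
  inventories: ReadonlyMap<string, readonly string[]>,
): Set<string> {
  const local = new Set<string>();
  inventories.forEach((wireIds, providerId) => {
    for (const wireId of wireIds) local.add(`${providerId}/${wireId}`);
  });
  return local;
}

function isLocalModel(
  modelId: string,
  local: ReadonlySet<string>,
  defaultProvider: string,
): boolean {
  if (local.has(modelId)) return true;
  // Legacy bare id from old data: check it under the default provider only.
  return !modelId.includes("/") && local.has(`${defaultProvider}/${modelId}`);
}
```

Lane scheduling is untouched by this sketch: it only answers whether a contestant is local, which is the input ADR-0001's two-lane rule already relies on.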

Parallel-safe split.

  • Agent A: arena-model-call.ts + analyzer updates
  • Agent B: local-model set construction in index.ts + runner adjustments after A settles the model identity contract

Exit criteria.

  • Arena can run local contestants from more than one machine.
  • Local-vs-cloud classification still works.
  • ADR-0001 behavior remains intact.

W7. External-Agent Parity

Goal. Give opencode a real multi-provider local-model story instead of collapsing everything back to llama-swap/<model>.

Files and seams.

  • apps/coder/src/services/backends/opencode-server.ts
  • apps/coder/src/services/provider-snapshot.ts
  • apps/coder/src/services/agent-probe.ts
  • new BooCoder-hosted gateway route or service module under apps/coder/src/services/
  • host config generation or sync for opencode local models

Implement.

  1. Add a BooCoder-hosted OpenAI-compatible local gateway that accepts provider-preserving model ids and routes them to the correct local provider (D-6).
  2. Use one opencode-facing provider namespace such as boocode-local, where the opencode providerID is stable and the modelID is the inner composite id like sam-desktop/qwen3.6-35b.
  3. Update provider snapshot merging so opencode advertises boocode-local/<provider/model> rather than llama-swap/<model>.
  4. Update the opencode bridge parser and config sync so duplicate model names remain distinguishable end to end.
  5. Add smoke coverage for two providers serving the same wire model name.
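
The id handling in steps 2 through 4 could be sketched like this. It assumes the opencode-facing id is the namespace boocode-local, a slash, and then the inner composite id, and that the gateway splits the inner id on its first slash. The function names are hypothetical and nothing here reflects opencode's actual parsing code.

```typescript
const OPENCODE_LOCAL_NAMESPACE = "boocode-local";

interface GatewayTarget {
  providerId: string;
  wireModel: string;
}

// What the provider snapshot would advertise to opencode.
function toOpencodeModelId(providerId: string, wireModel: string): string {
  return `${OPENCODE_LOCAL_NAMESPACE}/${providerId}/${wireModel}`;
}

// What the bridge or gateway would recover from an opencode model id.
// Returns null for ids outside the namespace, such as llama-swap/<model>.
function fromOpencodeModelId(id: string): GatewayTarget | null {
  const prefix = `${OPENCODE_LOCAL_NAMESPACE}/`;
  if (!id.startsWith(prefix)) return null;
  const inner = id.slice(prefix.length);
  const slash = inner.indexOf("/");
  if (slash <= 0 || slash === inner.length - 1) return null;
  return {
    providerId: inner.slice(0, slash),
    wireModel: inner.slice(slash + 1),
  };
}
```

With this shape, sam-desktop and embedding can both serve qwen3.6-35b and still produce two distinct opencode ids that map back to the right machine.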

Parallel-safe split.

  • Gateway branch first
  • Snapshot/config-sync branch second
  • Final opencode backend/parser adjustments last

Exit criteria.

  • opencode can target two local providers with overlapping wire model names and hit the correct machine both times.
  • No path rewrites provider/model down to plain llama-swap/model.

W8. Operations and Final Verification

Goal. End with a repeatable operator workflow, not just a working dev branch.

Files and seams.

  • data/llama-providers.example.json
  • operator docs under docs/
  • OpenSpec tasks/status notes as needed

Implement.

  1. Document the add-a-machine flow for config-managed local providers.
  2. Document the smoke matrix for:
    • single legacy provider fallback
    • two local providers
    • duplicate model names across two providers
    • DeepSeek enabled
    • opencode local parity
  3. Record the final interface BooControl should consume: provider registry plus composite ids, not raw host env vars.

Exit criteria.

  • A third machine can be added by editing config and running the smoke matrix.
  • The implementation docs name the exact runtime contract BooControl should build on.

Verification Plan

  • pnpm -C packages/contracts build
  • pnpm -C apps/server test
  • pnpm -C apps/server build
  • pnpm -C apps/coder test
  • pnpm -C apps/coder build
  • npx tsc -p apps/web/tsconfig.app.json --noEmit

Add targeted tests as the work lands:

  • model-ref parse/format and bare-id fallback
  • provider-aware routing and DeepSeek collision cases
  • context-cache isolation for duplicate model names
  • favorites hide-not-delete behavior
  • provider snapshot and opencode bridge behavior
  • arena local-model classification across multiple providers

Main Risks

  • The W2 contract change to /api/models and W5 snapshot changes can drift across apps if contract parity is edited piecemeal. Follow the cross-app contract standard in artifacts/.discovery-notes.md and land contract-first branches.
  • W7 is the hardest seam. If the gateway is skipped and the old string rewrite is kept, the feature will look complete in the UI while still routing to the wrong machine.
  • model-context.ts is a hidden correctness seam. If cache keys stay bare, duplicate model names will mis-share context limits and compaction behavior even after routing is fixed.

Deferred

  • BooControl itself
  • picker search and richer filtering
  • manual favorite reordering
  • host health badges in pickers

Definition of Done

  • BooChat, native BooCoder, Arena, and opencode all support provider-aware local models end to end.
  • Legacy bare ids remain readable.
  • Two providers can expose the same wire model name without ambiguity.
  • Adding another local machine is documented and smoke-tested.
  • BooControl can start later without inventing a second provider registry.