Implementation Decision Log: Multi-Provider Local Models
This file records the implementation decisions committed while planning the multi-provider local-model rollout. Behavioral intent lives in ../feature-implementation-plan.md and the source artifacts it cites. Round history lives in implementation-iteration-history.md.
Source artifacts:
- ../build-phase-outline.md
- ../../../openspec/changes/multi-llama-swap-providers-model-favorites/design.md
- ../../../openspec/changes/multi-llama-swap-providers-model-favorites/artifacts/implementation-analysis.md
- ../../research/2026-06-10-multi-llama-swap-providers-model-favorites.md
- ./.discovery-notes.md
D-1: Shared local-provider config authority
- Question: Where does the source of truth for named local providers live, and what belongs in the shared package versus app-local loaders?
- Decision: Use `/data/llama-providers.json`, wired through `LLAMA_PROVIDERS_PATH`, as the shared authority for local providers. Put the schema and pure model-ref helpers in `packages/contracts`; keep file I/O and legacy env fallback in app-local registry loaders for server and coder.
- Rationale: This matches the existing BooCoder pattern of package-owned schemas plus app-local load/build caches, avoids duplicating config semantics, and avoids forcing Node-specific loader code into every consumer of the contracts package.
- Evidence: `packages/contracts/src/provider-config.ts` and `apps/coder/src/services/provider-config-registry.ts` already follow this split; the current local-provider gap is that server and coder do not share any equivalent registry.
- Rejected alternatives:
  - Keep local providers env-only forever. Rejected because server and coder already drift and more machines would multiply the drift.
  - Put file reading only in one app and make the other app consume it indirectly. Rejected because both server and coder need startup-time local-provider awareness.
- Driven by rounds: R1.
- Referenced in plan: Outcome, Working Assumptions, W1.
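
The split in D-1 could be sketched roughly as follows. This is a minimal illustration of the pure, package-side half only, not the actual contract: the field names `providers` and `baseUrl` and the function `parseLocalProviders` are assumptions made for the example. File reading via `LLAMA_PROVIDERS_PATH` and the legacy env fallback would stay in each app's loader.

```typescript
// Hypothetical shape for /data/llama-providers.json; field names are assumptions.
interface LocalProvider {
  baseUrl: string;
}

interface LocalProvidersConfig {
  defaultProvider: string;
  providers: Record<string, LocalProvider>;
}

// Pure validator: no file I/O, so any consumer of the contracts package can use it.
function parseLocalProviders(raw: unknown): LocalProvidersConfig {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("config must be an object");
  }
  const { defaultProvider, providers } = raw as Record<string, unknown>;
  if (typeof providers !== "object" || providers === null) {
    throw new Error("providers must be an object");
  }
  const parsed: Record<string, LocalProvider> = {};
  for (const [name, value] of Object.entries(providers as Record<string, unknown>)) {
    // Provider names become the prefix of composite ids, so "/" is not allowed.
    if (name.includes("/")) {
      throw new Error(`provider name must not contain "/": ${name}`);
    }
    const baseUrl = (value as { baseUrl?: unknown } | null)?.baseUrl;
    if (typeof baseUrl !== "string" || baseUrl.length === 0) {
      throw new Error(`provider ${name} needs a baseUrl`);
    }
    parsed[name] = { baseUrl };
  }
  if (
    typeof defaultProvider !== "string" ||
    !Object.prototype.hasOwnProperty.call(parsed, defaultProvider)
  ) {
    throw new Error("defaultProvider must name a configured provider");
  }
  return { defaultProvider, providers: parsed };
}
```

An app-local loader would read the file, pass the parsed JSON through a validator like this one, and cache the result.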
D-2: Persist and cache composite provider/model ids; keep wire ids bare
- Question: What is the canonical identity format for local model selections and caches?
- Decision: Persist and cache `provider/model`. Strip the provider prefix only at the final upstream call boundary. Keep indefinite support for legacy bare ids by resolving them to `defaultProvider`.
- Rationale: Duplicate wire model names across machines are otherwise impossible to represent safely. This also keeps DB migrations small because the existing columns are already free-form text.
- Evidence: `sessions.model` and `chats.model` are stringly typed; `apps/server/src/services/model-context.ts` currently keys by bare model and would otherwise cross-poison duplicate names.
- Rejected alternatives:
  - Keep persisted ids bare and use side metadata for provider. Rejected because many call sites already pass the model string around alone.
  - Prefix wire calls too. Rejected because upstream llama-swap and DeepSeek calls want the actual provider-native model id.
- Driven by rounds: R1.
- Referenced in plan: Outcome, W1, W2, W3.
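
The identity rules in D-2 can be illustrated with a few pure helpers. This is a sketch under stated assumptions: the names `ModelRef`, `parseModelRef`, `formatModelRef`, and `toWireModel` are hypothetical, and treating a prefix as a provider only when it matches a known provider name is one possible way to keep legacy bare ids that contain `/` working, not a rule stated in the source.

```typescript
interface ModelRef {
  provider: string;
  model: string;
}

// Resolve a persisted id to a provider/model pair.
// Assumption: a prefix counts as a provider only if it is a known provider name,
// so a bare legacy id that happens to contain "/" still falls back to defaultProvider.
function parseModelRef(
  id: string,
  knownProviders: ReadonlySet<string>,
  defaultProvider: string,
): ModelRef {
  const slash = id.indexOf("/");
  if (slash > 0) {
    const prefix = id.slice(0, slash);
    if (knownProviders.has(prefix)) {
      return { provider: prefix, model: id.slice(slash + 1) };
    }
  }
  return { provider: defaultProvider, model: id };
}

// Composite form used for persistence and cache keys.
function formatModelRef(ref: ModelRef): string {
  return `${ref.provider}/${ref.model}`;
}

// Bare form sent upstream; the provider prefix is dropped only here.
function toWireModel(ref: ModelRef): string {
  return ref.model;
}
```

Keying caches by `formatModelRef(ref)` rather than by the bare model name is what keeps two machines serving the same wire model from sharing a cache entry.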
D-3: One provider-aware resolver shared across streaming, non-streaming, context, and Arena
- Question: Should each consumer keep its own endpoint logic once multiple local providers exist?
- Decision: No. Build one provider-aware resolver contract and make streaming inference, non-streaming calls, context lookup, compaction, task-model resolution, and Arena all go through it.
- Rationale: The current failure mode is duplicated routing logic with slightly different heuristics. Fixing only one path would leave subtle misroutes in the others.
- Evidence: `apps/server/src/services/inference/provider.ts`, `apps/server/src/services/model-context.ts`, `apps/server/src/services/compaction.ts`, `apps/server/src/services/task-model.ts`, and `apps/coder/src/services/arena-model-call.ts` all handle local-model identity separately today.
- Rejected alternatives:
  - Only unify server inference and leave context/arena separate. Rejected because that would preserve hidden correctness bugs in context limits and Arena calls.
- Driven by rounds: R1.
- Referenced in plan: Outcome, W2, W3, W6.
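
One way to picture the single resolver in D-3 is a function that turns any persisted model id into everything a caller needs to reach the right upstream. The sketch below is illustrative only: `ResolvedTarget`, `ProviderRegistry`, and `resolveTarget` are hypothetical names, and the registry shape is assumed rather than taken from the source.

```typescript
interface ProviderRegistry {
  defaultProvider: string;
  providers: Record<string, { baseUrl: string }>;
}

interface ResolvedTarget {
  provider: string;
  baseUrl: string;
  wireModel: string; // bare id sent upstream
  cacheKey: string; // composite id for context and other caches
}

// Hypothetical shared resolver: every consumer calls this instead of
// inspecting the model string itself.
function resolveTarget(modelId: string, registry: ProviderRegistry): ResolvedTarget {
  const slash = modelId.indexOf("/");
  const prefix = slash > 0 ? modelId.slice(0, slash) : "";
  const hasPrefix = Object.prototype.hasOwnProperty.call(registry.providers, prefix);
  // Legacy bare ids resolve to the default provider.
  const provider = hasPrefix ? prefix : registry.defaultProvider;
  const wireModel = hasPrefix ? modelId.slice(slash + 1) : modelId;
  const entry = registry.providers[provider];
  if (!entry) {
    throw new Error(`unknown local provider: ${provider}`);
  }
  return {
    provider,
    baseUrl: entry.baseUrl,
    wireModel,
    cacheKey: `${provider}/${wireModel}`,
  };
}
```

With a contract like this, streaming inference, non-streaming calls, context lookup, compaction, task-model resolution, and Arena would differ only in what they do with the returned target, not in how they find it.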
D-4: Favorites are a settings-backed user view, not a server catalog section
- Question: Where should the Favorites concept live?
- Decision: Store `favorite_models: string[]` in settings and derive the Favorites section client-side from settings plus provider inventory. The server catalog returns providers and models only.
- Rationale: Inventory answers “what exists now.” Favorites answer “what this user prefers.” Keeping them separate avoids overloading the server catalog with user-specific UI state.
- Evidence: `settings` already exists server-side; the OpenSpec analysis already identified favorites as a user-level concern rather than an inventory concern.
- Rejected alternatives:
  - Return a synthetic Favorites section from `/api/models`. Rejected because it entangles inventory with user preference and complicates offline/unavailable favorite behavior.
- Driven by rounds: R1.
- Referenced in plan: Outcome, W2, W4.
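
The client-side derivation in D-4 amounts to joining the stored favorites against the current inventory. A minimal sketch, assuming the inventory arrives as a list of providers with bare model ids; the types `ProviderInventory` and `FavoriteEntry` and the function `deriveFavorites` are hypothetical:

```typescript
interface ProviderInventory {
  provider: string;
  models: string[]; // bare wire ids reported by that provider
}

interface FavoriteEntry {
  id: string; // composite provider/model id from settings.favorite_models
  available: boolean; // false when the provider or model is not in the inventory
}

// Join user favorites with the current inventory, preserving the user's order.
function deriveFavorites(
  favoriteModels: string[],
  inventory: ProviderInventory[],
): FavoriteEntry[] {
  const present = new Set<string>();
  for (const { provider, models } of inventory) {
    for (const model of models) {
      present.add(`${provider}/${model}`);
    }
  }
  return favoriteModels.map((id) => ({ id, available: present.has(id) }));
}
```

Because unavailable favorites are kept and flagged rather than dropped, the UI can still show a favorite whose machine is offline without the server catalog knowing anything about user preference.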
D-5: Native boocode parity ships before opencode parity
- Question: Should native and external-agent BooCoder paths move together?
- Decision: No. Native `boocode` parity is W5. `opencode` parity is W7 and does not begin until the native path is correct and the UI stops falsely advertising multi-provider local models under the old bridge.
- Rationale: Native `boocode` can use the shared resolver directly. `opencode` still assumes one local-provider namespace and is the riskier seam.
- Evidence: `apps/coder/src/services/provider-snapshot.ts` prefixes local models as `llama-swap/*`; `apps/coder/src/services/backends/opencode-server.ts` still assumes the outer provider namespace identifies the target upstream.
- Rejected alternatives:
  - Rename everything to `provider/model` in one pass. Rejected because the external-agent bridge would still collapse identity at the last moment.
- Driven by rounds: R1.
- Referenced in plan: Outcome, W5, W7.
D-6: opencode parity uses a boocode-local gateway, not a string rewrite
- Question: What is the safe path to external-agent parity?
- Decision: Add a BooCoder-hosted OpenAI-compatible local gateway and present it to `opencode` as one stable provider namespace such as `boocode-local`. The inner `modelID` carries the composite local identity like `sam-desktop/qwen3.6-35b`.
- Rationale: `parseModel()` in the opencode backend already splits only once at `/`, which means a stable outer provider id can safely carry the inner composite local id. That preserves provider identity without teaching opencode about every machine directly.
- Evidence: `apps/coder/src/services/backends/opencode-server.ts` `parseModel()` returns `{ providerID, modelID }` where `modelID` may contain additional slashes; current `llama-swap/<model>` mapping is the ambiguity seam.
- Rejected alternatives:
  - Keep rewriting `provider/model` back to `llama-swap/model`. Rejected because duplicate local model names would still route incorrectly.
  - Add one direct opencode provider per local machine. Rejected because it duplicates the registry and leaks fleet structure into opencode config.
- Driven by rounds: R1.
- Referenced in plan: Outcome, W7.
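
The split-once behavior that D-6 relies on can be shown with a small stand-in. This is not the real `parseModel()` from `opencode-server.ts`; `splitOnce` and `toGatewayModel` are hypothetical helpers written to illustrate the idea, and the handling of an id with no `/` is an assumption.

```typescript
// Outer namespace presented to opencode; the name comes from the decision text.
const GATEWAY_PROVIDER = "boocode-local";

// Illustrative stand-in for a parser that splits only at the first "/".
function splitOnce(model: string): { providerID: string; modelID: string } {
  const slash = model.indexOf("/");
  if (slash < 0) {
    // Assumption: an id without "/" has no provider part.
    return { providerID: "", modelID: model };
  }
  return {
    providerID: model.slice(0, slash),
    modelID: model.slice(slash + 1), // may itself contain "/"
  };
}

// Wrap a composite local id in the stable gateway namespace.
function toGatewayModel(compositeLocalId: string): string {
  return `${GATEWAY_PROVIDER}/${compositeLocalId}`;
}
```

Under these assumptions, `toGatewayModel("sam-desktop/qwen3.6-35b")` splits back into the outer provider `boocode-local` and the inner id `sam-desktop/qwen3.6-35b`, so the gateway receives the full composite identity and can route it through the shared resolver.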
D-7: Add-a-machine stays config-driven in this initiative
- Question: Does this rollout include a control-plane UI for adding local machines?
- Decision: No. Adding a machine stays a config-driven operation in this initiative, documented in W8. BooControl is the later UI/control-plane consumer.
- Rationale: The user goal is multi-provider support now, not a new admin product before the substrate exists.
- Evidence: BooControl’s own tasks call this registry work a prerequisite; current repo state has no stable local-provider substrate yet.
- Rejected alternatives:
  - Build BooControl first. Rejected because it would either duplicate registry logic or bind to today’s broken single-provider assumptions.
- Driven by rounds: R1.
- Referenced in plan: Outcome, W8, Deferred.
D-8: Work unit sequencing is contract-first, consumer-second, verification-third
- Question: How should this be broken down for Orchestration so branches do not constantly collide?
- Decision: Sequence every work unit as:
  - contracts and config
  - primary backend seam
  - downstream consumers
  - tests and smoke

  and forbid parallel editing of the shared contract and resolver files.
- Rationale: The churniest files in this repo are exactly the shared contract and coordinator files. Letting multiple branches edit them in parallel is the fastest path to merge thrash and subtle drift.
- Evidence: Recent churn is highest in `apps/web/src/api/types.ts`, `apps/web/src/api/client.ts`, `apps/server/src/index.ts`, `apps/coder/src/services/dispatcher.ts`, and `apps/coder/src/services/provider-snapshot.ts`.
- Rejected alternatives:
  - Split by app only. Rejected because this feature crosses contracts, server, web, and coder in nearly every phase.
- Driven by rounds: R1.
- Referenced in plan: Orchestration Rules, Work Unit Index, all work units.