chore: snapshot working tree - pty_exited notifications + in-flight inference WIP

feat(booterm): structured pty_exited WS notifications. Plan-validated, impl-validated, code-reviewed green (contracts build clean, contracts test 29/29, booterm + web typecheck clean). wip: in-progress inference/provider refactor (agents.ts, provider.ts, new llama-providers.ts, removed llama-args-validator), plus arena, dispatcher, compaction, schema changes. openspec: pty-exit-notifications complete; x-agent-flags planned (not yet implemented).
2026-06-14 12:48:47 +00:00
parent 0ed506f1da
commit b18de2a331
204 changed files with 25344 additions and 867 deletions
--- a/docs/plans/multi-provider-local-models/artifacts/.discovery-notes.md
+++ b/docs/plans/multi-provider-local-models/artifacts/.discovery-notes.md
@@ -0,0 +1,126 @@
+# Discovery Notes: Multi-Provider Local Models
+
+Single source of truth for implementation context. Read this before touching the plan or code.
+
+## Tech stack
+
+- Monorepo with pnpm workspaces.
+- `apps/server`: Fastify + Postgres, native inference, local-model routing, BooChat APIs.
+- `apps/web`: React + Vite SPA, shared chat and coder UI.
+- `apps/coder`: host-side BooCoder service, provider probing, native and external-agent dispatch, Arena, MCP.
+- `packages/contracts`: shared cross-app schemas and types, built before consumers.
+- TypeScript strict mode. Server and coder use NodeNext and `.js` import suffixes.
+- Tests: `pnpm -C apps/server test`, `pnpm -C apps/coder test`. No dedicated web test harness.
+
+## ADRs found
+
+- `docs/adr/0001-arena-two-lane-scheduling.md`
+  Summary: local llama-backed contestants run serially in one lane, cloud contestants run in parallel in another lane; multi-provider work must preserve this lane model.
+- `docs/adr/0002-arena-dedicated-tables-not-flow-runner.md`
+  Summary: Arena owns its own storage and runtime shape; reuse dispatcher machinery but do not fold Arena back into flow-runner abstractions.
+
+## Coding standards found
+
+- `docs/coding-standards/cross-app-contract-parity.md`
+  Summary: when a cross-app contract changes, update the canonical package source plus app-side secondary representations in the same batch; missing one side silently drops behavior at runtime.
+- `CLAUDE.md`
+  Summary: `packages/contracts` is the single source for provider-snapshot and message-metadata contracts, deploy-by-surface rules matter, and contract changes must respect app-local secondary unions and renderers where they still exist.
+
+## Relevant architecture notes
+
+- `apps/server/CLAUDE.md`
+  Summary: `services/inference/provider.ts` is the current llama-swap provider seam; `model-context.ts` and `compaction.ts` currently assume one upstream.
+- `apps/coder/CLAUDE.md`
+  Summary: provider snapshot and `opencode` integration are the main local-model seams; `llama-swap/*` is currently the local namespace assumption.
+- `apps/web/CLAUDE.md`
+  Summary: `ModelPicker` and `AgentComposerBar` are separate UI surfaces with different constraints; any provider snapshot loading-state change can make providers disappear from the coder UI.
+
+## Code touch points
+
+### Shared contracts and config patterns
+
+- `packages/contracts/src/provider-config.ts`
+  Existing coder ACP provider config schema; useful precedent, but not the right place to overload with local host inventory semantics.
+- `apps/coder/src/services/provider-config-registry.ts`
+  Existing pattern for schema-in-package plus app-local load/build cache.
+- `packages/contracts/src/provider-snapshot.ts`
+  Shared snapshot contract used by coder and web.
+
+### Server: catalog, routing, and downstream local-model consumers
+
+- `apps/server/src/config.ts`
+  Current env config includes `LLAMA_SWAP_URL`, `LLAMA_SIDECAR_URL`, and `DEFAULT_MODEL`; multi-provider config must enter here.
+- `apps/server/src/routes/models.ts`
+  Current `/api/models` route fetches one llama-swap and optionally DeepSeek.
+- `apps/server/src/services/inference/provider.ts`
+  Current route selection and AI SDK provider seam; central place to remove heuristic provider detection.
+- `apps/server/src/services/model-context.ts`
+  Current context cache keys by bare model string and assumes one `LLAMA_SWAP_URL`.
+- `apps/server/src/services/compaction.ts`
+  Uses `resolveModelEndpoint()` today, but still contains one-provider assumptions and a DeepSeek prefix special case.
+- `apps/server/src/services/task-model.ts`
+  Returns one resolved `{url, model}` pair today.
+- `apps/server/src/index.ts`
+  Calls `configureModelContext({ llamaSwapUrl })`; this wiring must change when context lookup becomes provider-aware.
+- `apps/server/src/routes/settings.ts`
+  Existing shared settings persistence surface; right place for `favorite_models`.
+
+### Web: BooChat and coder selection UI
+
+- `apps/web/src/components/ModelPicker.tsx`
+  Shared BooChat model picker component; currently assumes a flat `/api/models` list.
+- `apps/web/src/components/AgentComposerBar.tsx`
+  Native BooCoder provider/mode/model picker surface.
+- `apps/web/src/lib/model-label.ts`
+  Display-only model prettifier used by both pickers.
+- `apps/web/src/api/client.ts`
+  `models()` currently expects `ModelInfo[]`.
+- `apps/web/src/api/types.ts`
+  Holds the web-side API contract for `/api/models` and other cross-app payloads.
+
+### Coder: native, snapshot, arena, and external-agent bridge
+
+- `apps/coder/src/config.ts`
+  Current coder config still exposes `LLAMA_SWAP_URL`; multi-provider config must enter here too.
+- `apps/coder/src/services/provider-snapshot.ts`
+  Current snapshot fetches one `LLAMA_SWAP_URL`, prefixes local models as `llama-swap/*`, and merges them into `opencode`.
+- `apps/coder/src/services/dispatcher.ts`
+  Current native and external-agent dispatch logic still assumes local bare ids or `llama-swap/*` for local routing.
+- `apps/coder/src/services/backends/opencode-server.ts`
+  `parseModel()` splits only once at `/`; this is good news because a stable outer provider namespace can carry an inner composite model id.
+- `apps/coder/src/services/arena-model-call.ts`
+  Direct one-shot local model call against `LLAMA_SWAP_URL`.
+- `apps/coder/src/services/arena-analyzer.ts`
+  Local-vs-cloud checks rely on one local model set and one upstream.
+- `apps/coder/src/index.ts`
+  Builds the local-model set for Arena from one fetched llama-swap list.
+
+## Recent activity and churn
+
+High-churn files in the last 90 days:
+
+- `apps/web/src/api/types.ts`
+- `apps/web/src/api/client.ts`
+- `apps/server/src/index.ts`
+- `apps/server/src/types/api.ts`
+- `apps/coder/src/services/dispatcher.ts`
+- `apps/coder/src/index.ts`
+- `apps/coder/src/services/provider-snapshot.ts`
+- `apps/web/src/components/AgentComposerBar.tsx`
+- `apps/server/src/services/compaction.ts`
+
+Implication: keep work units narrow and avoid combining unrelated refactors in these files.
+
+## Constraints and load-bearing facts
+
+- `packages/contracts` already owns provider-snapshot types; if the snapshot contract changes, rebuild the package before touching consumers.
+- `apps/web` has no dedicated test harness, so web verification will rely on typecheck plus smoke testing.
+- Arena’s local lane semantics are intentional; multi-provider support must not collapse local models into parallel execution.
+- `opencode` local parity is not a small rename. The current host config and snapshot behavior collapse identity to one `llama-swap` namespace.
+
+## Gaps and unknowns
+
+- No existing shared local-provider config file or schema exists in-repo yet.
+- `/api/models` shape change is not yet specified in app-local types; W2 must settle the contract before W4 starts.
+- The final `opencode` gateway path is not implemented anywhere yet; W7 is net-new code, not just adaptation.
+- No dedicated docs for “add a machine” exist yet; W8 must create them.
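
The `parseModel()` note in the discovery notes above (split only once at `/`) is what lets a stable outer provider namespace carry an inner composite model id. The following is a minimal sketch of that split-once behavior. `splitModelOnce` and `ParsedModel` are hypothetical names used for illustration only; the real `parseModel()` in `opencode-server.ts` is not reproduced here and may differ.

```typescript
// Hypothetical illustration of split-once parsing; not the repo's parseModel().
interface ParsedModel {
  providerID: string;
  modelID: string;
}

function splitModelOnce(ref: string): ParsedModel | null {
  const slash = ref.indexOf("/");
  // Reject refs with no separator, an empty provider, or an empty model id.
  if (slash <= 0 || slash === ref.length - 1) return null;
  return {
    providerID: ref.slice(0, slash),
    // Everything after the first slash stays intact, including further slashes.
    modelID: ref.slice(slash + 1),
  };
}
```

Under this sketch, a ref such as `boocode-local/sam-desktop/qwen3.6-35b` keeps `sam-desktop/qwen3.6-35b` whole as the inner id, so the host name survives the hop through the outer namespace.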
--- a/docs/plans/multi-provider-local-models/artifacts/implementation-decision-log.md
+++ b/docs/plans/multi-provider-local-models/artifacts/implementation-decision-log.md
@@ -0,0 +1,109 @@
+# Implementation Decision Log: Multi-Provider Local Models
+
+This file records the implementation decisions committed while planning the multi-provider local-model rollout.
+Behavioral intent lives in [../feature-implementation-plan.md](../feature-implementation-plan.md) and the source
+artifacts it cites. Round history lives in [implementation-iteration-history.md](implementation-iteration-history.md).
+
+Source artifacts:
+
+- [../build-phase-outline.md](../build-phase-outline.md)
+- [../../../openspec/changes/multi-llama-swap-providers-model-favorites/design.md](../../../openspec/changes/multi-llama-swap-providers-model-favorites/design.md)
+- [../../../openspec/changes/multi-llama-swap-providers-model-favorites/artifacts/implementation-analysis.md](../../../openspec/changes/multi-llama-swap-providers-model-favorites/artifacts/implementation-analysis.md)
+- [../../research/2026-06-10-multi-llama-swap-providers-model-favorites.md](../../research/2026-06-10-multi-llama-swap-providers-model-favorites.md)
+- [./.discovery-notes.md](./.discovery-notes.md)
+
+### D-1: Shared local-provider config authority
+
+- **Question:** Where does the source of truth for named local providers live, and what belongs in the shared package versus app-local loaders?
+- **Decision:** Use `/data/llama-providers.json`, wired through `LLAMA_PROVIDERS_PATH`, as the shared authority for local providers. Put the schema and pure model-ref helpers in `packages/contracts`; keep file I/O and legacy env fallback in app-local registry loaders for server and coder.
+- **Rationale:** This matches the existing BooCoder pattern of package-owned schemas plus app-local load/build caches, avoids duplicating config semantics, and avoids forcing Node-specific loader code into every consumer of the contracts package.
+- **Evidence:** `packages/contracts/src/provider-config.ts` and `apps/coder/src/services/provider-config-registry.ts` already follow this split; the current local-provider gap is that server and coder do not share any equivalent registry.
+- **Rejected alternatives:**
+  - Keep local providers env-only forever. Rejected because server and coder already drift and more machines would multiply the drift.
+  - Put file reading only in one app and make the other app consume it indirectly. Rejected because both server and coder need startup-time local-provider awareness.
+- **Driven by rounds:** R1.
+- **Referenced in plan:** Outcome, Working Assumptions, W1.
+
+### D-2: Persist and cache composite `provider/model` ids; keep wire ids bare
+
+- **Question:** What is the canonical identity format for local model selections and caches?
+- **Decision:** Persist and cache `provider/model`. Strip the provider prefix only at the final upstream call boundary. Support legacy bare ids indefinitely by resolving them to `defaultProvider`.
+- **Rationale:** Duplicate wire model names across machines are otherwise impossible to represent safely. This also keeps DB migrations small because the existing columns are already free-form text.
+- **Evidence:** `sessions.model` and `chats.model` are stringly typed; `apps/server/src/services/model-context.ts` currently keys by bare model and would otherwise cross-poison duplicate names.
+- **Rejected alternatives:**
+  - Keep persisted ids bare and use side metadata for provider. Rejected because many call sites already pass the model string around alone.
+  - Prefix wire calls too. Rejected because upstream llama-swap and DeepSeek calls want the actual provider-native model id.
+- **Driven by rounds:** R1.
+- **Referenced in plan:** Outcome, W1, W2, W3.
+
+### D-3: One provider-aware resolver shared across streaming, non-streaming, context, and Arena
+
+- **Question:** Should each consumer keep its own endpoint logic once multiple local providers exist?
+- **Decision:** No. Build one provider-aware resolver contract and make streaming inference, non-streaming calls, context lookup, compaction, task-model resolution, and Arena all go through it.
+- **Rationale:** The current failure mode is duplicated routing logic with slightly different heuristics. Fixing only one path would leave subtle misroutes in the others.
+- **Evidence:** `apps/server/src/services/inference/provider.ts`, `apps/server/src/services/model-context.ts`, `apps/server/src/services/compaction.ts`, `apps/server/src/services/task-model.ts`, and `apps/coder/src/services/arena-model-call.ts` all handle local-model identity separately today.
+- **Rejected alternatives:**
+  - Only unify server inference and leave context/arena separate. Rejected because that would preserve hidden correctness bugs in context limits and Arena calls.
+- **Driven by rounds:** R1.
+- **Referenced in plan:** Outcome, W2, W3, W6.
+
+### D-4: Favorites are a settings-backed user view, not a server catalog section
+
+- **Question:** Where should the Favorites concept live?
+- **Decision:** Store `favorite_models: string[]` in settings and derive the Favorites section client-side from settings plus provider inventory. The server catalog returns providers and models only.
+- **Rationale:** Inventory answers “what exists now.” Favorites answer “what this user prefers.” Keeping them separate avoids overloading the server catalog with user-specific UI state.
+- **Evidence:** `settings` already exists server-side; the OpenSpec analysis already identified favorites as a user-level concern rather than an inventory concern.
+- **Rejected alternatives:**
+  - Return a synthetic Favorites section from `/api/models`. Rejected because it entangles inventory with user preference and complicates offline/unavailable favorite behavior.
+- **Driven by rounds:** R1.
+- **Referenced in plan:** Outcome, W2, W4.
+
+### D-5: Native `boocode` parity ships before `opencode` parity
+
+- **Question:** Should native and external-agent BooCoder paths move together?
+- **Decision:** No. Native `boocode` parity is W5. `opencode` parity is W7 and does not begin until the native path is correct and the UI stops falsely advertising multi-provider local models under the old bridge.
+- **Rationale:** Native `boocode` can use the shared resolver directly. `opencode` still assumes one local-provider namespace and is the riskier seam.
+- **Evidence:** `apps/coder/src/services/provider-snapshot.ts` prefixes local models as `llama-swap/*`; `apps/coder/src/services/backends/opencode-server.ts` still assumes the outer provider namespace identifies the target upstream.
+- **Rejected alternatives:**
+  - Rename everything to `provider/model` in one pass. Rejected because the external-agent bridge would still collapse identity at the last moment.
+- **Driven by rounds:** R1.
+- **Referenced in plan:** Outcome, W5, W7.
+
+### D-6: `opencode` parity uses a `boocode-local` gateway, not a string rewrite
+
+- **Question:** What is the safe path to external-agent parity?
+- **Decision:** Add a BooCoder-hosted OpenAI-compatible local gateway and present it to `opencode` as one stable provider namespace such as `boocode-local`. The inner `modelID` carries the composite local identity like `sam-desktop/qwen3.6-35b`.
+- **Rationale:** `parseModel()` in the opencode backend already splits only once at `/`, which means a stable outer provider id can safely carry the inner composite local id. That preserves provider identity without teaching opencode about every machine directly.
+- **Evidence:** `apps/coder/src/services/backends/opencode-server.ts` `parseModel()` returns `{ providerID, modelID }` where `modelID` may contain additional slashes; current `llama-swap/<model>` mapping is the ambiguity seam.
+- **Rejected alternatives:**
+  - Keep rewriting `provider/model` back to `llama-swap/model`. Rejected because duplicate local model names would still route incorrectly.
+  - Add one direct opencode provider per local machine. Rejected because it duplicates the registry and leaks fleet structure into opencode config.
+- **Driven by rounds:** R1.
+- **Referenced in plan:** Outcome, W7.
+
+### D-7: Add-a-machine stays config-driven in this initiative
+
+- **Question:** Does this rollout include a control-plane UI for adding local machines?
+- **Decision:** No. Adding a machine stays a config-driven operation in this initiative, documented in W8. BooControl is the later UI/control-plane consumer.
+- **Rationale:** The user goal is multi-provider support now, not a new admin product before the substrate exists.
+- **Evidence:** BooControl’s own tasks call this registry work a prerequisite; current repo state has no stable local-provider substrate yet.
+- **Rejected alternatives:**
+  - Build BooControl first. Rejected because it would either duplicate registry logic or bind to today’s broken single-provider assumptions.
+- **Driven by rounds:** R1.
+- **Referenced in plan:** Outcome, W8, Deferred.
+
+### D-8: Work unit sequencing is contract-first, consumer-second, verification-third
+
+- **Question:** How should this be broken down for Orchestration so branches do not constantly collide?
+- **Decision:** Forbid parallel editing of the shared contract and resolver files.
+  Sequence every work unit as:
+  1. contracts and config
+  2. primary backend seam
+  3. downstream consumers
+  4. tests and smoke
+- **Rationale:** The churniest files in this repo are exactly the shared contract and coordinator files. Letting multiple branches edit them in parallel is the fastest path to merge thrash and subtle drift.
+- **Evidence:** Recent churn is highest in `apps/web/src/api/types.ts`, `apps/web/src/api/client.ts`, `apps/server/src/index.ts`, `apps/coder/src/services/dispatcher.ts`, and `apps/coder/src/services/provider-snapshot.ts`.
+- **Rejected alternatives:**
+  - Split by app only. Rejected because this feature crosses contracts, server, web, and coder in nearly every phase.
+- **Driven by rounds:** R1.
+- **Referenced in plan:** Orchestration Rules, Work Unit Index, all work units.
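
D-2 and D-3 above describe one resolver that persists and caches composite `provider/model` ids, strips the prefix only at the upstream call boundary, and maps legacy bare ids to `defaultProvider`. A minimal sketch of that idea follows. The registry shape, the `resolveLocalModel` name, and the rule that an unrecognized prefix is treated as part of a bare id are all assumptions made for illustration; the actual schema in `packages/contracts` may look different.

```typescript
// Hypothetical registry shape; the real schema in packages/contracts may differ.
interface LocalProviderRegistry {
  defaultProvider: string;
  providers: Record<string, { baseUrl: string }>;
}

interface ResolvedModel {
  provider: string; // registry key
  wireModel: string; // bare id sent to the upstream
  baseUrl: string;
  canonical: string; // "provider/model", used for persistence and cache keys
}

function resolveLocalModel(ref: string, registry: LocalProviderRegistry): ResolvedModel {
  const slash = ref.indexOf("/");
  const prefix = slash > 0 ? ref.slice(0, slash) : "";
  // Assumption: a prefix counts as a provider only if the registry knows it.
  // Otherwise the whole string is a legacy bare id (which may contain slashes).
  const known =
    prefix !== "" && Object.prototype.hasOwnProperty.call(registry.providers, prefix);
  const provider = known ? prefix : registry.defaultProvider;
  const wireModel = known ? ref.slice(slash + 1) : ref;
  const entry = registry.providers[provider];
  if (!entry) throw new Error(`unknown local provider: ${provider}`);
  if (wireModel === "") throw new Error(`empty model id in: ${ref}`);
  return { provider, wireModel, baseUrl: entry.baseUrl, canonical: `${provider}/${wireModel}` };
}
```

Keying caches on `canonical` rather than `wireModel` is what keeps two hosts that serve the same wire model name from sharing a context-limit entry.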
--- a/docs/plans/multi-provider-local-models/artifacts/implementation-iteration-history.md
+++ b/docs/plans/multi-provider-local-models/artifacts/implementation-iteration-history.md
@@ -0,0 +1,38 @@
+# Implementation Iteration History: Multi-Provider Local Models
+
+This file records how the implementation plan was assembled from the existing research, OpenSpec docs, and codebase review.
+Committed decisions live in [implementation-decision-log.md](implementation-decision-log.md). The primary plan lives in
+[../feature-implementation-plan.md](../feature-implementation-plan.md).
+
+## R1: Coordinator pass grounded in source docs and local code review
+
+- **Specialists engaged:** coordinator-only pass using the existing research note, OpenSpec design/tasks, implementation analysis, root and app `CLAUDE.md` files, ADRs, coding standard, and targeted code search. No separate specialist tool round was run in this repo pass.
+- **New input provided:** [../build-phase-outline.md](../build-phase-outline.md), [./.discovery-notes.md](./.discovery-notes.md), the OpenSpec batch, and the current code seams in server, web, and coder.
+- **Claim ledger:**
+
+  | # | Claim | State | Spec-maturity |
+  |---|---|---|---|
+  | C1 | There is no single source of truth for local providers shared by server and coder | Evidenced | plan-level |
+  | C2 | Composite `provider/model` ids are required for duplicate model names across hosts | Evidenced | plan-level |
+  | C3 | Routing logic is duplicated across streaming, non-streaming, context, compaction, task-model, and Arena | Evidenced | plan-level |
+  | C4 | Favorites belong in settings plus client derivation, not in the server catalog | Evidenced | plan-level |
+  | C5 | Native BooCoder can adopt the shared resolver before `opencode` can | Evidenced | plan-level |
+  | C6 | The current `opencode` bridge collapses local identity and needs a provider-preserving gateway | Evidenced | plan-level |
+  | C7 | Arena is a separate local-model consumer and must be planned explicitly | Evidenced | plan-level |
+  | C8 | BooControl depends on this substrate and should not be built first | Evidenced | plan-level |
+
+- **Open Questions raised:**
+  - OQ-1: shared local-provider authority format and location
+    Resolution: D-1, `/data/llama-providers.json` plus `LLAMA_PROVIDERS_PATH`
+  - OQ-2: canonical local model identity format
+    Resolution: D-2, composite `provider/model`
+  - OQ-3: how to achieve external-agent parity honestly
+    Resolution: D-6, `boocode-local` gateway
+  - OQ-4: whether add-a-machine is UI-driven in this batch
+    Resolution: D-7, no, keep config-driven
+
+- **Spec-maturity tags:** all findings were plan-level. No spec-stage reopening was required because the earlier research and OpenSpec docs already settled the behavior.
+- **Resolution source:** evidence from source docs plus current code inspection.
+- **Decisions produced:** D-1, D-2, D-3, D-4, D-5, D-6, D-7, D-8.
+- **Changed in plan:** initial authoring of `feature-implementation-plan.md` and its three supporting artifacts.
+- **Next-step recommendation:** go to synthesis. The work is ready to execute as W1 through W8 in order, with W7 as the main hard seam and W8 as the operational closeout.
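
Claim C4 in the ledger above (resolved as D-4) keeps favorites in settings and derives the Favorites section client-side from settings plus provider inventory. A minimal sketch of that derivation follows. The `ProviderInventory` and `FavoriteEntry` shapes and the `deriveFavorites` name are hypothetical, and keeping unavailable favorites visible (flagged rather than dropped) is an assumption, since the `/api/models` contract is still unsettled.

```typescript
// Hypothetical catalog shape; the settled /api/models contract may differ.
interface ProviderInventory {
  provider: string;
  models: string[]; // bare wire ids as reported by that provider
}

interface FavoriteEntry {
  id: string; // composite "provider/model"
  available: boolean; // false when the provider or model is missing from inventory
}

function deriveFavorites(
  favoriteModels: string[],
  inventory: ProviderInventory[],
): FavoriteEntry[] {
  const present = new Set<string>();
  for (const p of inventory) {
    for (const m of p.models) present.add(`${p.provider}/${m}`);
  }
  // Preserve the user's ordering; flag missing favorites instead of dropping them.
  return favoriteModels.map((id) => ({ id, available: present.has(id) }));
}
```

Because the server catalog only reports providers and models, a machine going offline changes `available` on the client without touching the stored `favorite_models` list.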