chore: snapshot working tree - pty_exited notifications + in-flight inference WIP

feat(booterm): structured pty_exited WS notifications. Plan-validated, impl-validated, code-reviewed green (contracts build clean, contracts test 29/29, booterm + web typecheck clean). wip: in-progress inference/provider refactor (agents.ts, provider.ts, new llama-providers.ts, removed llama-args-validator), plus arena, dispatcher, compaction, schema changes. openspec: pty-exit-notifications complete; x-agent-flags planned (not yet implemented).
2026-06-14 12:48:47 +00:00
parent 0ed506f1da
commit b18de2a331
204 changed files with 25344 additions and 867 deletions
--- a/docs/research/2026-06-10-multi-llama-swap-providers-model-favorites.md
+++ b/docs/research/2026-06-10-multi-llama-swap-providers-model-favorites.md
@@ -0,0 +1,295 @@
+# Research: Integrating two named llama-swap providers ("Sam-desktop", "embedding") with provider-grouped model dropdowns and per-model favorites in BooChat and BooCoder
+
+Question: BooCode currently talks to exactly one llama-swap endpoint. How should a second named provider ("embedding", `100.90.172.55:8411`) be added alongside the renamed existing one ("Sam-desktop", `100.101.41.16:8401`), integrated into both BooChat and BooCoder, with the model dropdown grouped per provider and a favorite button per model (Favorites section listed first)?
+
+Evidence mode: **strict** (default — every recommendation-bearing claim is corroborated or explicitly caveated).
+
+## Summary
+
+Both machines can be added to BooCode as named providers, and the right way is to give BooCode a small provider registry (a name and base URL per machine) and to store selected models as a "provider/model" pair instead of a bare name. Bare names cannot work here: five models exist on both machines under identical names today, and the configured default model has already drifted out of the live list once — so favorites and routing keyed by name alone would be ambiguous and fragile. The dropdown should follow the pattern proven in VS Code's model picker: a Favorites section on top, then one section per provider (Sam-desktop first, then embedding), a star on every row, favorited models staying visible in their provider section, and favorites that are hidden — never deleted — when a machine is offline.
+
+The adversarial validation pass confirmed the direction but showed that the change reaches well beyond the obvious call sites: chat compaction, context-window lookup, arena battles, the coder's opencode dispatch, and the sidecar routing default all silently assume a single endpoint and need the same provider-resolution change. Two further hazards surfaced in the live data. First, a model on the embedding host literally named `deepseek-r1-qwen3-8b` trips BooCode's "starts with deepseek-" cloud-routing heuristic. Second, the always-on sidecar default route would swallow embedding-bound requests. The embedding host does **not** need its own llama-sidecar; sidecar routing must, however, become a Sam-desktop-only attribute.
+
+Well-corroborated: live data from both hosts, direct code evidence, and multiple independent web sources agree; validation expanded the implementation scope but did not overturn the choice.
+
+- **Confidence:** High
+
+## Research Results
+
+### What exists today (codebase — current-state anchor)
+
+BooCode's entire inference surface assumes one llama-swap endpoint, configured as `LLAMA_SWAP_URL=http://100.101.41.16:8401` with `DEFAULT_MODEL=qwen3.6-35b-a3b-mxfp4` (A58). The single-endpoint assumption is hard-coded in at least nine places:
+
+1. `GET /api/models` fetches only `{LLAMA_SWAP_URL}/v1/models` (plus DeepSeek cloud when `DEEPSEEK_API_KEY` is set) and returns a flat `ModelInfo[]` with no provider tag (A59).
+2. `upstreamModel()` routes by string heuristics. Model IDs starting with `deepseek-` go to the DeepSeek cloud API, and agents with `llama_extra_args` go to the sidecar. **Whenever `LLAMA_SIDECAR_URL` is configured at all (as it is in docker-compose), every remaining request also routes through the sidecar by default**; requests fall back to llama-swap only when no sidecar is configured (A60). The provider for each base URL is a cached AI-SDK `createOpenAICompatible` instance.
+3. `resolveModelEndpoint()` (used by compaction and task-model for non-streaming calls) returns `LLAMA_SWAP_URL` for every non-DeepSeek model (A60, A67).
+4. `model-context.ts` fetches `{LLAMA_SWAP_URL}/upstream/<model>/props` for context windows. It uses a **no-TTL positive cache keyed by the raw model string**, plus a `deepseek-` prefix guard that short-circuits to a static 131,072-token context window without calling any upstream (A66).
+5. `task-model.ts` (auto-naming, summaries) falls back through `FAST_MODEL → chat model → DEFAULT_MODEL` against the single URL (A68).
+6. Arena battles call `{LLAMA_SWAP_URL}/v1/chat/completions` directly with no routing abstraction at all (A69).
+7. The coder's provider snapshot fetches the single llama-swap list and prefixes every ID with `llama-swap/` (A63); its dispatcher prefixes any bare (slash-less) model ID with `llama-swap/` before opencode dispatch, and passes any ID already containing `/` through unchanged (A64).
+8. Model IDs persist as bare strings: `sessions.model TEXT NOT NULL`, `chats.model TEXT` nullable, validated only as a 1–200-char string (A65).
+9. The BooChat dropdown (`ModelPicker.tsx`) and the BooCoder picker (`CompactPicker` inside `AgentComposerBar.tsx`) are flat lists with no grouping, search, or favorites; the coder picker persists per-provider preferences in browser localStorage, while BooChat model choice is server-persisted on the session row (A61, A70). Display code already strips `llama-swap/`-style prefixes when rendering model chips (A71). No favorites/pinning mechanism exists anywhere; the `settings` table is a key-value JSONB store currently holding `default_model` and theme keys (A65).
+
+The coder's runtime provider config (`data/coder-providers.json`) has no `baseUrl` field — there is no way to register a second llama-swap endpoint today (A72).
+
+### What the two hosts actually serve (provided material, retrieved live 2026-06-10)
+
+- **embedding** (`100.90.172.55:8411`, Linux, P104-100 8GB Pascal GPU): 39 models, skewed small — gemma-3-270m through gemma-4-12b, the LFM2.5 family, granite-4.1-3b/8b, qwen3.5-0.8b/4b/9b, qwopus3.5 family, `deepseek-r1-qwen3-8b`, a reranker, extraction models (A54). Its llama-swap config is hand-tuned per model (flash-attn/KV-quant choices for Pascal, ttl 1800), with llama.cpp built from source on the box (A56).
+- **Sam-desktop** (`100.101.41.16:8401`, Windows): 21 models, skewed large — qwen3.6-35b-a3b/27b, qwopus3.6 family, granite-4.1-30b, mellum2-12b, nemotron-cascade-2-30b-a3b, north-mini-code, etc. Served by `D:\llama-server` (llama.cpp CUDA build b9591) behind `D:\llama-swap` (llama-swap v224), models in `D:\models`; a `D:\llama-sidecar` directory backs the existing sidecar at `:8402` (A55, A57).
+
+Three load-bearing facts fall out of the live inventories:
+
+- **Five model IDs exist on both hosts**: `granite-4.1-8b`, `negentropy-4.7-9b`, `qwen3.5-9b`, `qwen3.5-9b-deepseek-v4`, `qwopus3.5-9b-coder` (A54, A55). Bare-ID favorites or routing are therefore ambiguous from day one.
+- **The configured `DEFAULT_MODEL=qwen3.6-35b-a3b-mxfp4` is not in Sam-desktop's current model list** (closest: `qwen3.6-35b-a3b`) — model IDs already churn in practice, so favorites must tolerate stale references (A55, A58).
+- **`deepseek-r1-qwen3-8b` on the embedding host collides with BooCode's `deepseek-` heuristics**. With `DEEPSEEK_API_KEY` set, it would be routed to the DeepSeek cloud API. Independently of routing, the context-window guard returns a fake 131k context based on the name prefix alone (A54, A60, A66).
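+
+The TypeScript sketch below makes the ambiguity concrete. It intersects two inventories and reports every bare ID that more than one host serves. The arrays are abbreviated excerpts of model names quoted above, not the full 39- and 21-model lists. `findAmbiguousIds` is a hypothetical helper, not existing BooCode code.
+
+```typescript
+// Abbreviated excerpts of the live inventories quoted above (not the full lists).
+const embeddingIds: string[] = [
+  "gemma-4-12b",
+  "granite-4.1-3b",
+  "granite-4.1-8b",
+  "negentropy-4.7-9b",
+  "qwen3.5-9b",
+  "qwen3.5-9b-deepseek-v4",
+  "qwopus3.5-9b-coder",
+  "deepseek-r1-qwen3-8b",
+];
+const samDesktopIds: string[] = [
+  "qwen3.6-35b-a3b",
+  "granite-4.1-30b",
+  "mellum2-12b",
+  "granite-4.1-8b",
+  "negentropy-4.7-9b",
+  "qwen3.5-9b",
+  "qwen3.5-9b-deepseek-v4",
+  "qwopus3.5-9b-coder",
+];
+
+// Hypothetical helper: maps each bare ID served by two or more hosts to those hosts.
+function findAmbiguousIds(inventories: Record<string, string[]>): Map<string, string[]> {
+  const hostsById = new Map<string, string[]>();
+  for (const host of Object.keys(inventories)) {
+    // Deduplicate within one host so a repeated ID is not counted twice.
+    for (const id of Array.from(new Set(inventories[host]))) {
+      const hosts = hostsById.get(id) ?? [];
+      hosts.push(host);
+      hostsById.set(id, hosts);
+    }
+  }
+  const shared = new Map<string, string[]>();
+  hostsById.forEach((hosts, id) => {
+    if (hosts.length > 1) shared.set(id, hosts);
+  });
+  return shared;
+}
+
+const ambiguous = findAmbiguousIds({ "sam-desktop": samDesktopIds, embedding: embeddingIds });
+console.log(Array.from(ambiguous.keys()).sort());
+```
+
+Every ID this reports needs a provider prefix before anything can route it or favorite it unambiguously.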
+
+### How llama-swap identifies models (web, corroborated)
+
+llama-swap model IDs are exactly the YAML keys in its `config.yaml`. `/v1/models` can also carry optional per-model `name`, `description`, and arbitrary `metadata` from config, but neither of Sam's hosts currently populates these fields (A1–A4, A54, A55). llama-swap has **no instance-identity field**: two instances are distinguishable only by host:port (A3). `/running` reports load state per model (A1, A12). Peer federation exists (one llama-swap aggregating another). However, peer-served models surface as `"peer-name: model-name"` IDs [single-source: A6], and same-ID collisions resolve silently to the lexicographically-first peer (A5). The decisive argument needs no web source: federation would still present BooCode with one flat list and no native grouping, while coupling the two hosts' uptime. Standalone llama.cpp `llama-server` defaults its `/v1/models` ID to the model file path unless `--alias` is set (A8, A9). This matters only if a host ever bypasses llama-swap.
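+llama-swap exposes no instance identity, so the provider name has to come from BooCode's own config. BooCode would attach it when it reads each host's `/v1/models` list. The minimal sketch below assumes the standard OpenAI-style `{ object: "list", data: [...] }` envelope. `ModelsListBody`, `TaggedModel`, and `tagModels` are hypothetical names.
+
+```typescript
+// Assumed shape of an OpenAI-compatible /v1/models body. `name` and
+// `description` are llama-swap's optional config fields; neither host sets them today.
+interface ModelsListBody {
+  object: string;
+  data: Array<{ id: string; name?: string; description?: string }>;
+}
+
+// Hypothetical provider-tagged entry handed to the pickers.
+interface TaggedModel {
+  provider: string;    // from BooCode config, e.g. "sam-desktop"
+  wireId: string;      // bare ID exactly as the host serves it
+  compositeId: string; // "<provider>/<wireId>": key for routing, caches, favorites
+  label: string;       // display text
+}
+
+function tagModels(provider: string, body: ModelsListBody): TaggedModel[] {
+  return body.data.map((m) => ({
+    provider,
+    wireId: m.id,
+    compositeId: `${provider}/${m.id}`,
+    // Fall back to the bare ID when llama-swap provides no display name.
+    label: m.name ?? m.id,
+  }));
+}
+
+// Usage with a hand-written body (not captured from either host).
+const tagged = tagModels("embedding", {
+  object: "list",
+  data: [{ id: "gemma-4-12b" }, { id: "qwen3.5-9b" }],
+});
+console.log(tagged.map((t) => t.compositeId));
+```
+
+BooCode attaches the provider name itself rather than reading it from the host. Grouping therefore keeps working whether or not llama-swap's optional `name` field is ever populated.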
+
+### How mature clients solve exactly this (web, corroborated)
+
+The major OpenAI-compatible client libraries examined here (Vercel AI SDK, LiteLLM) handle multiple same-protocol providers the same way. Each provider is **a separate named instance with its own baseURL, namespaced in the client's registry as `provider:model` / `provider/model`**. The model ID actually sent on the wire to each backend stays the bare upstream ID (Vercel AI SDK provider registry: A13, A14; LiteLLM model_list: A15, A16). BooCode already uses the AI SDK's `createOpenAICompatible` (A60), and the coder already namespaces with a `llama-swap/` prefix (A63, A64). This pattern therefore extends existing conventions rather than introducing a new idiom.
+
+### Dropdown + favorites prior art (web)
+
+The closest shipped implementation of the requested UX is VS Code's model picker. Models are grouped by provider, a pin icon appears on hover, and pinned models are lifted into a dedicated top section in stable insertion order **while remaining visible in their provider group** (the pinned entry is a copy, not a move) (A45, A46). Cherry Studio independently demonstrates the key-collision lesson: its model identity is the composite `{id, provider}` precisely so that two providers serving the same model name don't collide (A35, A36) [third-party code reference; unverifiable from here; supporting color only, see V8]. Open WebUI documents the two pitfalls to avoid. Favorites keyed by bare model ID become ambiguous the moment two connections serve the same name (A27). Its stale-pin cleanup also **permanently deletes** pins when a backend is temporarily down (A23). The correct behavior is to hide unavailable favorites and restore them when the host returns. LibreChat groups models via admin-configured YAML and added pinning in v0.8.5 (A28, A29). Jan, Chatbox, SillyTavern, Continue.dev, BigAGI, and LM Studio offer weaker equivalents or none (A32–A34, A38–A44, A47–A52); none contradicts the VS Code pattern.
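+The section-building step of that pattern can be sketched as follows. The sketch assumes the picker receives provider-tagged models plus a favorites list of composite IDs in the order they were starred. `PickerModel`, `PickerSection`, and `buildSections` are hypothetical names, not existing `ModelPicker.tsx` or `CompactPicker` code.
+
+```typescript
+interface PickerModel {
+  provider: string;
+  compositeId: string;
+}
+
+interface PickerSection {
+  title: string;
+  models: PickerModel[];
+}
+
+// Hypothetical builder: Favorites first, then one section per provider in a fixed order.
+function buildSections(
+  available: PickerModel[],
+  favorites: string[],     // composite IDs, in the order they were starred
+  providerOrder: string[], // e.g. ["sam-desktop", "embedding"]
+): PickerSection[] {
+  const byId = new Map<string, PickerModel>();
+  for (const m of available) byId.set(m.compositeId, m);
+
+  // Stale favorites (host offline, model renamed) are skipped here but never
+  // removed from `favorites`, so they reappear when the model is listed again.
+  const favModels: PickerModel[] = [];
+  for (const id of favorites) {
+    const m = byId.get(id);
+    if (m) favModels.push(m);
+  }
+
+  const sections: PickerSection[] = [];
+  if (favModels.length > 0) sections.push({ title: "Favorites", models: favModels });
+  for (const provider of providerOrder) {
+    // Favorited models also stay in their provider group (copy, not move).
+    const models = available.filter((m) => m.provider === provider);
+    if (models.length > 0) sections.push({ title: provider, models });
+  }
+  return sections;
+}
+
+const sections = buildSections(
+  [
+    { provider: "embedding", compositeId: "embedding/gemma-4-12b" },
+    { provider: "sam-desktop", compositeId: "sam-desktop/qwen3.6-35b-a3b" },
+  ],
+  // The second favorite is stale: that ID is not in the available list.
+  ["embedding/gemma-4-12b", "sam-desktop/qwen3.6-35b-a3b-mxfp4"],
+  ["sam-desktop", "embedding"],
+);
+console.log(sections.map((s) => `${s.title}: ${s.models.map((m) => m.compositeId).join(", ")}`));
+```
+
+Hiding stale favorites happens only at render time, which is what keeps this clear of the Open WebUI deletion bug.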
+
+### Does embedding need a llama-sidecar? No.
+
+The llama-sidecar is a Go daemon on Sam-desktop providing a per-agent llama-server process pool so agents can carry `llama_extra_args` (cache quant, spec decoding, slot save) injected via an `X-Agent-Flags` header (A60, A74). The embedding host needs none of that: its per-model tuning is baked directly into its llama-swap `config.yaml` (A56), and no per-agent flag injection applies to it. **However**, `resolveRoute` currently makes the sidecar the default route for *all* non-DeepSeek inference whenever `LLAMA_SIDECAR_URL` is set (A60) — so under the multi-provider design, sidecar routing must become an attribute of the Sam-desktop provider entry (e.g. optional `sidecarUrl` per provider), not a global default; otherwise requests for embedding-hosted models would be sent to a sidecar that only manages Sam-desktop processes.
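+The routing change can be sketched as below, assuming the `{ name, baseUrl, sidecarUrl? }` provider entry proposed under O1. The sidecar URL (Sam-desktop's address on port 8402) is an assumption about how BooCode would reach it, and `resolveInferenceBase` is hypothetical. The sketch keeps today's "use the sidecar when one is configured" default but scopes it to the provider that declares a sidecar. It does not decide whether Sam-desktop should instead use the sidecar only for agents with `llama_extra_args`.
+
+```typescript
+interface ProviderEntry {
+  name: string;
+  baseUrl: string;     // llama-swap endpoint
+  sidecarUrl?: string; // only Sam-desktop sets this
+}
+
+// Assumed config; the sidecar address is illustrative.
+const PROVIDERS: ProviderEntry[] = [
+  { name: "sam-desktop", baseUrl: "http://100.101.41.16:8401", sidecarUrl: "http://100.101.41.16:8402" },
+  { name: "embedding", baseUrl: "http://100.90.172.55:8411" },
+];
+
+// Sidecar routing is a property of the provider, not a global default, so a
+// request for an embedding-hosted model can never reach Sam-desktop's sidecar.
+function resolveInferenceBase(providerName: string, providers: ProviderEntry[]): string {
+  const entry = providers.find((p) => p.name === providerName);
+  if (!entry) throw new Error(`unknown provider: ${providerName}`);
+  return entry.sidecarUrl ?? entry.baseUrl;
+}
+
+console.log(resolveInferenceBase("sam-desktop", PROVIDERS));
+console.log(resolveInferenceBase("embedding", PROVIDERS));
+```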
+
+### Openspec conventions for the follow-up plan (codebase)
+
+Per-batch docs land in `openspec/changes/<slug>/` with `proposal.md` (why + scope), `tasks.md` (numbered/checkbox action list), and optional `design.md` (architecture/data-model decisions); slugs are lowercase-hyphenated from the batch title (A73). This feature is a natural three-file batch — the provider registry + routing is design-heavy, so `design.md` is warranted.
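+A slug for the follow-up batch can be derived mechanically. The minimal sketch below assumes that "lowercase-hyphenated" means lowercasing the title and collapsing every run of non-alphanumeric characters into a single hyphen. The openspec tooling may apply its own rule, and both `toChangeSlug` and the example title are hypothetical.
+
+```typescript
+function toChangeSlug(title: string): string {
+  return title
+    .toLowerCase()
+    .replace(/[^a-z0-9]+/g, "-") // any run of non-alphanumerics becomes one hyphen
+    .replace(/^-+|-+$/g, "");    // trim leading/trailing hyphens
+}
+
+const slug = toChangeSlug("Multi llama-swap providers + model favorites");
+// Directory holding proposal.md, tasks.md and (here) design.md.
+console.log(`openspec/changes/${slug}/`);
+```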
+
+## Options to Consider
+
+### O1: Named provider registry with composite model IDs (`<provider>/<model>`)
+
+- **What it is:** BooCode config gains a provider list (`{ name, baseUrl, sidecarUrl? }` per entry — "sam-desktop" and "embedding"). Models are stored and selected as `sam-desktop/qwen3.6-35b-a3b`, `embedding/gemma-4-12b`. `/api/models` returns provider-tagged groups; one routing resolver (provider prefix → baseURL, bare wire ID) replaces every `LLAMA_SWAP_URL` hardcode; bare legacy IDs fall back to the default provider (sam-desktop). Favorites, caches, and attribution all key on the composite ID.
+- **Trade-offs:** Touches every call site that assumes one endpoint (the nine sites above — see Validation for the full list); needs a deliberate legacy-bare-ID fallback for existing session/chat rows and the seeded `default_model`; the coder's opencode namespace (`llama-swap/`) needs an explicit translation rule. In exchange: no DB schema change for model columns, no llama-swap config changes on either host, matches the AI-SDK idiom BooCode already uses and the coder's existing prefix convention, and makes the `deepseek-` heuristic unnecessary for prefixed IDs.
+- **Rests on:** (A13, A14, A15, A16) for the pattern; (A54, A55) for the collision necessity; (A60, A63, A64) for fit with existing code.
+- **Evidence status:** corroborated.
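+
+O1's single resolver can be sketched as follows, assuming a config list with `sam-desktop` as the default provider. `ProviderConfig`, `ResolvedModel`, `resolveModel`, and `DEFAULT_PROVIDER` are hypothetical names. Prefixed IDs resolve to their own host. Bare legacy IDs (existing rows, the seeded `default_model`) resolve to the default provider. In both cases the wire ID is the bare upstream ID.
+
+```typescript
+interface ProviderConfig {
+  name: string;
+  baseUrl: string;
+}
+
+interface ResolvedModel {
+  compositeId: string; // canonical key for caches, favorites, attribution
+  provider: ProviderConfig;
+  wireId: string;      // bare ID sent in the request body
+}
+
+const DEFAULT_PROVIDER = "sam-desktop";
+const CONFIG: ProviderConfig[] = [
+  { name: "sam-desktop", baseUrl: "http://100.101.41.16:8401" },
+  { name: "embedding", baseUrl: "http://100.90.172.55:8411" },
+];
+
+function resolveModel(id: string, config: ProviderConfig[] = CONFIG): ResolvedModel {
+  // Split on the FIRST slash only; the remainder is the upstream ID.
+  const slash = id.indexOf("/");
+  const prefix = slash > 0 ? id.slice(0, slash) : "";
+  const named = config.find((p) => p.name === prefix);
+  if (named) {
+    return { compositeId: id, provider: named, wireId: id.slice(slash + 1) };
+  }
+  // Legacy bare ID: permanent fallback to the default provider.
+  const fallback = config.find((p) => p.name === DEFAULT_PROVIDER);
+  if (!fallback) throw new Error(`default provider ${DEFAULT_PROVIDER} is not configured`);
+  return { compositeId: `${fallback.name}/${id}`, provider: fallback, wireId: id };
+}
+
+const legacy = resolveModel("qwen3.6-35b-a3b");
+const emb = resolveModel("embedding/qwen3.5-9b");
+console.log(legacy.compositeId, legacy.provider.baseUrl);
+console.log(emb.compositeId, emb.provider.baseUrl, emb.wireId);
+```
+
+Normalizing a bare ID to its composite form at resolution time gives caches and favorites one key per model, whichever form a row stored. This sketch treats an unrecognized prefix as part of a bare ID. The coder's `llama-swap/` namespace therefore still needs its own translation rule.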
+
+### O2: Bare model IDs plus a separate `provider` field everywhere
+
+- **What it is:** Keep model strings as-is and add a `provider` column/field through `sessions`, `chats`, WS frames, `ModelInfo`, `ProviderModel`, and every read path.
+- **Trade-offs:** Avoids string munging and display-time prefix stripping, but is strictly more invasive: two schema migrations, a `WsFrameSchema` change rebuilt through `@boocode/contracts`, and every consumer updated in lockstep — while favorites still need a composite key anyway. Higher blast radius for the same outcome.
+- **Rests on:** (A65, A62) for the touched surfaces.
+- **Evidence status:** corroborated (codebase-derived).
+
+### O3: llama-swap peer federation (Sam-desktop aggregates embedding as a peer)
+
+- **What it is:** Configure embedding as a `peers:` entry in Sam-desktop's llama-swap; BooCode keeps a single endpoint.
+- **Trade-offs:** Rejected on codebase-observable grounds: BooCode would still see one flat list (no native named grouping — the feature's whole point), the two hosts' availability becomes coupled, and it requires operational changes on a host outside this repo. Additionally, peer-served model IDs surface as `"peer-name: model-name"` [single-source: A6] with silent first-lexicographic collision resolution (A5).
+- **Rests on:** (A5, A6) plus codebase observation (A59, A61).
+- **Evidence status:** rejection corroborated by codebase facts; the peer ID-format detail is single-source (caveated) and not load-bearing.
+
+### O4: External aggregator proxy (LiteLLM) in front of both hosts
+
+- **What it is:** A LiteLLM proxy with a `model_list` mapping unique aliases to each host; BooCode keeps one endpoint.
+- **Trade-offs:** Proven pattern (A15, A16) but adds a third always-on service with a manually-maintained catalog (no auto-discovery from `/v1/models`), an extra network hop, and still no provider grouping signal unless encoded in alias naming conventions. Overweight for a single-user self-hosted system.
+- **Rests on:** (A15, A16).
+- **Evidence status:** corroborated.
+
+### Sub-decision — favorites persistence
+
+- **O5a: Server-side, in the `settings` table** (e.g. `favorite_models: string[]` of composite IDs). Survives browsers/devices — and multi-device use is real here (the repo's own docs describe side-by-side iPhone debugging), matching how BooChat model choice is already server-persisted on the session row. Costs a PATCH per star toggle and needs a "hide stale, never delete" rule (A23) plus acceptance that stale composite keys linger until manually unfavorited.
+- **O5b: Browser localStorage**, extending the coder's existing `boocode.coder.agent-prefs` pattern (A70). Zero API surface, but per-device, per-browser, and split across the two UIs.
+- **Evidence status:** both corroborated; the cross-device argument for O5a is codebase-derived inference from documented usage, not a measured requirement.
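+
+Under O5a, a star click would produce a toggle like the one sketched below. The sketch assumes favorites are stored as an array of composite IDs under a `favorite_models` settings key (the document's example name). `toggleFavorite` is hypothetical. Starring appends, which yields the stable insertion order the dropdown needs. Only an explicit unstar removes an entry; an availability check never does.
+
+```typescript
+// Returns a new array; the caller would persist it (for example via a settings PATCH).
+function toggleFavorite(favorites: readonly string[], compositeId: string): string[] {
+  if (favorites.indexOf(compositeId) >= 0) {
+    // Explicit unstar: the only path that removes an entry.
+    return favorites.filter((id) => id !== compositeId);
+  }
+  // Star: append, so the Favorites section keeps stable insertion order.
+  return favorites.concat(compositeId);
+}
+
+let favs: string[] = [];
+favs = toggleFavorite(favs, "sam-desktop/qwen3.6-35b-a3b");
+favs = toggleFavorite(favs, "embedding/gemma-4-12b");
+favs = toggleFavorite(favs, "sam-desktop/qwen3.6-35b-a3b");
+console.log(favs);
+```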
+
+## Recommendation
+
+- **Recommendation:** **O1** — named provider registry with `<provider>/<model>` composite IDs — combined with the VS Code-pattern dropdown (Favorites on top in stable insertion order, then Sam-desktop's models, then embedding's; star toggle per row; favorited models remain listed in their provider group) and **O5a** server-side favorites keyed by composite ID. Non-negotiable design constraints carried in from validation:
+  1. Prefix-strip **only** at wire-URL construction; caches (notably `model-context.ts`'s no-TTL positive cache) key on the **full composite ID**, or the five name-collided models cross-pollute context windows between hosts (V7).
+  2. The coder dispatcher must translate composite prefixes for opencode (map the default provider to the existing `llama-swap/` namespace, or register new opencode providers) — the current pass-through of any slash-containing ID would hand opencode an unknown provider key (V1).
+  3. Every single-endpoint call site is in scope: `provider.ts` (`upstreamModel` + `resolveModelEndpoint`), `models.ts`, `model-context.ts` (including its `deepseek-` static-context guard), `compaction.ts`, `task-model.ts`, `arena-model-call.ts` (+ arena callers, coder-side config), coder `provider-snapshot.ts`, coder `dispatcher.ts` (V2–V4, V9).
+  4. Sidecar routing becomes a Sam-desktop provider attribute, not the global default route — embedding needs no sidecar (A60, A74; post-validation verification).
+  5. Bare legacy IDs (existing rows, seeded `default_model`) resolve to the default provider indefinitely — new sessions inherit a bare seeded default until settings are migrated, so this is a permanent fallback, not a one-time migration (V2).
+  6. Favorites that reference unavailable models are hidden, never auto-deleted (A23).
+- **Evidence basis:** The option choice rests on corroborated evidence throughout: the multi-provider client pattern (A13–A16), the live collision and churn data from both hosts (A54, A55, A58 — provided material, independently re-checkable), and codebase fit (A60, A63, A64). The UX pattern rests on corroborated documentation (A45, A46) with the Open WebUI pitfalls as corroborated counter-evidence (A23, A27); the Cherry Studio and VS Code *code-level* references are unverifiable third-party color (V8) and nothing rests on them alone. The single-source peer-ID format (A6) supports only the rejection of O3, which stands independently on codebase facts. The cross-device justification for O5a is codebase-derived inference (documented multi-device usage), explicitly not measured evidence.
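+
+Constraint 1 is the subtlest rule. The minimal sketch below shows a context-window lookup that keys its cache on the full composite ID and strips the prefix only when it builds the `/upstream/<model>/props` URL. The reader is injected and synchronous so the sketch runs offline; the real lookup would be an HTTP call. `ContextWindowCache` and the stub values are hypothetical, not the current `model-context.ts` code.
+
+```typescript
+interface Provider {
+  name: string;
+  baseUrl: string;
+}
+
+const PROVIDERS = new Map<string, Provider>([
+  ["sam-desktop", { name: "sam-desktop", baseUrl: "http://100.101.41.16:8401" }],
+  ["embedding", { name: "embedding", baseUrl: "http://100.90.172.55:8411" }],
+]);
+
+class ContextWindowCache {
+  // Keyed by the FULL composite ID, so "sam-desktop/qwen3.5-9b" and
+  // "embedding/qwen3.5-9b" can never share an entry.
+  private readonly entries = new Map<string, number>();
+  private readonly readContextWindow: (propsUrl: string) => number;
+
+  constructor(readContextWindow: (propsUrl: string) => number) {
+    this.readContextWindow = readContextWindow;
+  }
+
+  get(compositeId: string): number {
+    const cached = this.entries.get(compositeId);
+    if (cached !== undefined) return cached;
+
+    const slash = compositeId.indexOf("/");
+    const provider = slash > 0 ? PROVIDERS.get(compositeId.slice(0, slash)) : undefined;
+    if (!provider) throw new Error(`not a composite ID: ${compositeId}`);
+
+    // The provider prefix is stripped only here, when building the wire URL.
+    const wireId = compositeId.slice(slash + 1);
+    const ctx = this.readContextWindow(`${provider.baseUrl}/upstream/${wireId}/props`);
+    this.entries.set(compositeId, ctx);
+    return ctx;
+  }
+}
+
+// Stub values are made up for illustration; they are not measured context windows.
+const stubProps: Record<string, number> = {
+  "http://100.101.41.16:8401/upstream/qwen3.5-9b/props": 65536,
+  "http://100.90.172.55:8411/upstream/qwen3.5-9b/props": 16384,
+};
+const propsCalls: string[] = [];
+const contextCache = new ContextWindowCache((url) => {
+  propsCalls.push(url);
+  return stubProps[url];
+});
+console.log(contextCache.get("sam-desktop/qwen3.5-9b"), contextCache.get("embedding/qwen3.5-9b"));
+```
+
+Had the cache keyed on the stripped `qwen3.5-9b`, the second call would have returned the first host's value and kept returning it until restart.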
+
+## Validation
+
+Adversarial validation attacked the evidence, framing, recommendation, and gathering integrity. Findings (condensed; all code-verified by the validator in this repo):
+
+### V1: "O1 extends the coder's prefix convention" was overstated
+- **Strategy:** Challenge the Recommendation
+- **Investigation:** `dispatcher.ts:1006-1011`, coder CLAUDE.md, `provider-snapshot.ts:66-72`.
+- **Result:** Refuted as originally framed — a stored `sam-desktop/<model>` passes the dispatcher's slash-check unchanged and reaches opencode as an unknown provider key; `llama-swap/` is hardcoded in ≥4 coder locations.
+- **Impact:** Recommendation now mandates an explicit opencode namespace-translation rule (constraint 2).
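+
+Below is one possible translation rule, sketched under stated assumptions. `sam-desktop` maps to the existing `llama-swap` opencode key. The embedding host would need a second opencode provider key, shown here as a hypothetical `llama-swap-embedding`; it would have to be registered, possibly in the host-side opencode config mentioned under Remaining Risks. Bare IDs keep today's `llama-swap/` prefixing. IDs whose prefix is not a BooCode provider pass through unchanged, as the dispatcher does now.
+
+```typescript
+// Hypothetical mapping from BooCode provider names to opencode provider keys.
+// Only "llama-swap" exists today; "llama-swap-embedding" is illustrative.
+const OPENCODE_PROVIDER_KEYS = new Map<string, string>([
+  ["sam-desktop", "llama-swap"],
+  ["embedding", "llama-swap-embedding"],
+]);
+
+function toOpencodeModel(modelId: string): string {
+  const slash = modelId.indexOf("/");
+  if (slash < 0) {
+    // Bare legacy ID: keep today's behavior (default llama-swap namespace).
+    return `llama-swap/${modelId}`;
+  }
+  const key = OPENCODE_PROVIDER_KEYS.get(modelId.slice(0, slash));
+  if (key !== undefined) {
+    // BooCode composite prefix: translate to the opencode provider key.
+    return `${key}/${modelId.slice(slash + 1)}`;
+  }
+  // Already an opencode key (e.g. "llama-swap/..."): pass through as today.
+  return modelId;
+}
+
+console.log(toOpencodeModel("sam-desktop/qwen3.6-35b-a3b"));
+console.log(toOpencodeModel("embedding/gemma-4-12b"));
+```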
+
+### V2: The bare-ID legacy fallback was asserted, not designed
+- **Strategy:** Challenge the Recommendation
+- **Investigation:** `provider.ts:115-135`, `stream-phase.ts:110`, `sessions.ts:113-117`, `schema.sql:222`, `model-context.ts:77`.
+- **Result:** Partially refuted — architecturally plausible but unimplemented; prefixed IDs would 404 the `/upstream/<model>/props` fetch and break context/compaction display; the seeded bare `default_model` makes the fallback permanent, not migratory.
+- **Impact:** Constraints 1, 3, 5 added.
+
+### V3: The `deepseek-` hazard is wider than routing
+- **Strategy:** Challenge the Evidence
+- **Investigation:** `model-context.ts:40-49`, `provider.ts:98`, `compaction.ts:531`.
+- **Result:** Confirmed with added scope — the context guard fires on the name prefix alone, returning a fake 131k context for embedding's `deepseek-r1-qwen3-8b` even after routing is fixed.
+- **Impact:** `model-context.ts` guard added to the touch-list (constraint 3).
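+
+The fix can be sketched as follows: once an ID carries a llama-swap provider prefix, its model name is never inspected. Only legacy bare IDs keep the old name heuristic, so existing cloud selections still work. A bare `deepseek-r1-qwen3-8b` would still be misrouted. However, no existing row stores it bare, because the embedding host is new. `classifyModelId` and its `Route` type are hypothetical, and `deepseek-chat` is used only as an example of a DeepSeek cloud model ID.
+
+```typescript
+const LLAMA_SWAP_PROVIDERS = new Set(["sam-desktop", "embedding"]);
+
+type Route =
+  | { kind: "llama-swap"; provider: string; wireId: string }
+  | { kind: "deepseek-cloud"; wireId: string };
+
+function classifyModelId(modelId: string): Route {
+  const slash = modelId.indexOf("/");
+  if (slash > 0 && LLAMA_SWAP_PROVIDERS.has(modelId.slice(0, slash))) {
+    // Provider-prefixed: routing and the context guard ignore the model name.
+    return { kind: "llama-swap", provider: modelId.slice(0, slash), wireId: modelId.slice(slash + 1) };
+  }
+  // Legacy bare IDs only: the old name heuristic still applies.
+  if (modelId.startsWith("deepseek-")) return { kind: "deepseek-cloud", wireId: modelId };
+  return { kind: "llama-swap", provider: "sam-desktop", wireId: modelId };
+}
+
+console.log(classifyModelId("embedding/deepseek-r1-qwen3-8b"));
+console.log(classifyModelId("deepseek-chat"));
+```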
+
+### V4: `compaction.ts` is a missed hardcode site
+- **Strategy:** Challenge the Evidence
+- **Investigation:** `compaction.ts:351-357` → `resolveModelEndpoint` (`provider.ts:139-157`).
+- **Result:** Refuted the original C9 list as incomplete — compaction summarization calls would go to the wrong host for embedding models.
+- **Impact:** Added to the touch-list (A67, constraint 3).
+
+### V5: Server-side favorites needed justification against the coder's localStorage pattern
+- **Strategy:** Challenge the Assumptions
+- **Investigation:** `AgentComposerBar.tsx:33-52`, `routes/settings.ts`, root CLAUDE.md auth model.
+- **Result:** Partially refuted — the Open WebUI bug distinguishes auto-delete vs hide, not server vs client storage; the original justification conflated the two.
+- **Impact:** O5a/O5b reframed as an explicit sub-decision; O5a retained on the cross-device argument, labeled as inference.
+
+### V6: O3's rejection over-relied on a single-source claim
+- **Strategy:** Challenge the Evidence-Gathering Integrity
+- **Result:** Confirmed with a provenance note — O3 is independently rejectable from codebase facts; the stale GitHub issue is demoted to supporting color.
+- **Impact:** O3 rejection rewritten to lead with codebase-observable reasons.
+
+### V7: Composite IDs + naive prefix-stripping would poison the no-TTL context cache
+- **Strategy:** Challenge the Recommendation
+- **Investigation:** `model-context.ts:9, 26-29, 77-100`; the five cross-host duplicate IDs.
+- **Result:** Refuted the unstated design. Stripping the prefix before computing the cache key makes providers with different real context windows share entries, and the wrong value persists until restart.
+- **Impact:** Constraint 1 (composite cache key, strip only at URL construction) — the most subtle required design rule.
+
+### V8: Third-party code references (Cherry Studio, VS Code PR) are unverifiable
+- **Strategy:** Challenge the Evidence-Gathering Integrity
+- **Result:** Partially refuted their evidentiary weight — retained as color; the composite-key argument stands on BooCode's own conventions and the live collision data.
+- **Impact:** Evidence basis re-worded; nothing rests on those references alone.
+
+### V9: Arena is the most exposed hardcode
+- **Strategy:** Challenge the Evidence
+- **Investigation:** `arena-model-call.ts:16-28`, `arena-analyzer.ts:90`.
+- **Result:** Confirmed with elevated severity — raw fetch, no abstraction, lives in `apps/coder` with its own config type (cannot reuse the server's resolver as-is).
+- **Impact:** Listed as separate coder-side scope (constraint 3).
+
+### Adjustments Made
+
+The recommendation survived but was rewritten: the implementation constraints (composite cache keys, opencode namespace translation, the full nine-site touch-list, permanent bare-ID fallback, hidden-not-deleted favorites) were folded into the Recommendation itself; O3's rejection was re-grounded in codebase facts; the favorites-persistence choice was reframed as an explicit sub-decision; unverifiable third-party code references were demoted to supporting color. Post-validation, the orchestrator additionally verified in `provider.ts` that the sidecar is the *default* route whenever `LLAMA_SIDECAR_URL` is set — adding constraint 4 (sidecar becomes a per-provider attribute; embedding needs none).
+
+### Confidence Assessment
+
+- **Confidence:** High, for the option choice. The validator rated the pre-adjustment synthesis Medium because the implementation scope was understated. That scope is now enumerated above, and no finding challenged the direction (in the validator's words: "architecturally sound given the existing `llama-swap/` convention").
+- **Remaining Risks:** (1) The opencode-side translation (V1) may also require host-side `~/.config/opencode/opencode.json` changes — outside this repo. (2) Stale favorite keys accumulate in `settings` with no cleanup mechanism by design (hide-don't-delete); acceptable for single-user but unbounded. (3) The exact `/running` JSON envelope and llama-swap peer aggregation details remain single-source — neither is load-bearing. (4) The five duplicate-ID models make any partial rollout (one call site migrated, another not) actively dangerous; the routing resolver should land as one batch.
+
+## Sources
+
+| ID | Source | Link / location | Retrieved | Trust class | Summary (one line) | Evidence status |
+|---|---|---|---|---|---|---|
+| A1 | llama-swap README | github.com/mostlygeek/llama-swap | 2026-06-10 | web | Proxy hot-swapping local inference servers; documents /v1/models, /running, /upstream, /health; v224 current | corroborated by A2, A3, A12 |
+| A2 | llama-swap configuration.md | github.com/mostlygeek/llama-swap/blob/main/docs/configuration.md | 2026-06-10 | web | Model IDs are YAML keys; per-model name/description/aliases/metadata/ttl/useModelName; includeAliasesInList | corroborated by A3, A4 |
+| A3 | llama-swap config-schema.json | github.com/mostlygeek/llama-swap/blob/main/config-schema.json | 2026-06-10 | web | Authoritative config schema; peers section; **no instance-identity field at any level** | corroborated by A2, A4 |
+| A4 | llama-swap config.example.yaml | github.com/mostlygeek/llama-swap/blob/main/config.example.yaml | 2026-06-10 | web | Annotated example: aliases, useModelName, metadata, groups, peers | corroborated by A2, A3 |
+| A5 | DeepWiki: llama-swap peers | deepwiki.com/mostlygeek/llama-swap/3.7-peer-configuration | 2026-06-10 | web | Duplicate peer model IDs route to first-lexicographic peer with only a warning | corroborated by A6 (collision); single source on aggregation detail |
+| A6 | llama-swap issue #539 | github.com/mostlygeek/llama-swap/issues/539 | 2026-06-10 | web | Peer models surface as "peer-name: model-name" IDs; stale, unresolved | single source (caveated) |
+| A7 | llama-swap issue #538 | github.com/mostlygeek/llama-swap/issues/538 | 2026-06-10 | web | Aliases hidden from /v1/models unless includeAliasesInList | corroborated by A2, A3 |
+| A8 | llama.cpp server README | github.com/ggml-org/llama.cpp/blob/master/tools/server/README.md | 2026-06-10 | web | /v1/models id defaults to file path; --alias overrides; meta block fields | corroborated by A9, A10 |
+| A9 | llama.cpp discussion #8547 | github.com/ggml-org/llama.cpp/discussions/8547 | 2026-06-10 | web | Confirms file-path default id; --override-kv doesn't change API id | corroborated by A8 |
+| A10 | llama.cpp issue #17860 | github.com/ggml-org/llama.cpp/issues/17860 | 2026-06-10 | web | Only one --alias per llama-server today | corroborated by A8 |
+| A11 | LM4eu/llama-swap Go pkg docs | pkg.go.dev/github.com/LM4eu/llama-swap/proxy | 2026-06-10 | web | Model struct {Id, Name, Description, State, Unlisted}; fork, not upstream | single source (caveated) |
+| A12 | glukhov.org llama-swap quickstart | glukhov.org/llm-hosting/llama-swap/ | 2026-06-10 | web | /running state values; alias listing behavior | corroborated by A1, A2 |
+| A13 | Vercel AI SDK provider management | ai-sdk.dev/docs/ai-sdk-core/provider-management | 2026-06-10 | web | Registry namespaces models as providerId:modelId; per-provider baseURL | corroborated by A14 |
+| A14 | Vercel AI SDK OpenAI-compatible providers | ai-sdk.dev/providers/openai-compatible-providers | 2026-06-10 | web | createOpenAICompatible takes name+baseURL per provider; wire model ID unchanged | corroborated by A13 |
+| A15 | LiteLLM OpenAI-compatible docs | docs.litellm.ai/docs/providers/openai_compatible | 2026-06-10 | web | Per-entry api_base; aliasing decouples client name from upstream name | corroborated by A16 |
+| A16 | McDermott: Centralizing LLMs with LiteLLM | robert-mcdermott.medium.com/...9874563f3062 | 2026-06-10 | web | model_list with unique model_name per upstream resolves collisions | corroborated by A15 |
+| A17 | DeepWiki: llama-swap groups | deepwiki.com/mostlygeek/llama-swap/3.4-groups-and-swapping-policies | 2026-06-10 | web | Groups/matrix control concurrency, not model IDs | corroborated by A2–A4 |
+| A18 | llama-swap releases | github.com/mostlygeek/llama-swap/releases | 2026-06-10 | web | v219–v224 changed routing/perf, not /v1/models schema | single source (caveated) |
+| A19 | Open WebUI discussion #3443 | github.com/open-webui/open-webui/discussions/3443 | 2026-06-10 | web | Pin-in-dropdown feature request; drag-reorder workaround breaks | corroborated by A21, A23 |
+| A20 | Open WebUI discussion #5902 | github.com/open-webui/open-webui/discussions/5902 | 2026-06-10 | web | Filtering 70+ models; whitelist vs hide patterns | corroborated by A19 |
+| A21 | Open WebUI env config reference | docs.openwebui.com/reference/env-configuration/ | 2026-06-10 | web | DEFAULT_PINNED_MODELS; settings.pinnedModels sorts pinned to top | corroborated by A22, A23 |
+| A22 | Open WebUI database schema | docs.openwebui.com/reference/database-schema/ | 2026-06-10 | web | Pins live in user.settings JSON, keyed by **bare model ID** | corroborated by A21 |
+| A23 | Open WebUI discussion #23656 | github.com/open-webui/open-webui/discussions/23656 | 2026-06-10 | web | Stale-pin cleanup permanently deletes pins during backend downtime | corroborated by A21, A53 |
+| A24 | Open WebUI discussion #14854 | github.com/open-webui/open-webui/discussions/14854 | 2026-06-10 | web | Unpin buried in three-dot menu; discoverability failure | corroborated by A21 |
+| A25 | Open WebUI issue #19183 | github.com/open-webui/open-webui/issues/19183 | 2026-06-10 | web | Local/External/All tabs + tag chips + Fuse.js search in selector | corroborated by A26 |
+| A26 | Open WebUI discussion #21502 | github.com/open-webui/open-webui/discussions/21502 | 2026-06-10 | web | Flat select unusable at OpenRouter scale; optgroup/search proposals | corroborated by A25 |
+| A27 | Open WebUI discussion #4495 | github.com/open-webui/open-webui/discussions/4495 | 2026-06-10 | web | Same-named models from two connections are indistinguishable (bare-ID failure) | corroborated by A25, A26 |
+| A28 | LibreChat model specs docs | librechat.ai/docs/configuration/librechat_yaml/object_structure/model_specs | 2026-06-10 | web | Admin YAML `group` field creates named collapsible sections | corroborated by A29 |
+| A29 | LibreChat v0.8.5 changelogs | librechat.ai/changelog/v0.8.5 | 2026-06-10 | web | Pin support for model specs added (PR #11219) | corroborated by A30; persistence detail single-source |
+| A30 | LibreChat discussion #11044 | github.com/danny-avila/LibreChat/discussions/11044 | 2026-06-10 | web | Pinning exists; preset-active confusion | corroborated by A29 |
+| A31 | DeepWiki: LibreChat DB models | deepwiki.com/danny-avila/LibreChat/7.1-database-models | 2026-06-10 | web | MongoDB/Mongoose; pinned-spec field name unconfirmed | single source (caveated) |
+| A32 | Jan v0.6.9 changelog | jan.ai/changelog/2025-08-28-image-support | 2026-06-10 | web | "Favorite models" shipped; no UI detail | single source (caveated) |
+| A33 | Jan manage-models docs | jan.ai/docs/desktop/manage-models | 2026-06-10 | web | Organized by source/quantization tier, not provider | corroborated by A32 |
+| A34 | Jan data-folder docs | jan.ai/docs/desktop/data-folder | 2026-06-10 | web | Settings in local JSON files | corroborated by A32 |
+| A35 | DeepWiki: Cherry Studio models | deepwiki.com/CherryHQ/cherry-studio/5.3-model-configuration-and-capabilities | 2026-06-10 | web | Provider-grouped UI; getModelUniqId composite {id, provider} | corroborated by A36 (see V8 caveat) |
+| A36 | Cherry Studio ModelService.ts | github.com/CherryHQ/cherry-studio/.../ModelService.ts | 2026-06-10 | web | Composite-key implementation | corroborated by A35 (see V8 caveat) |
+| A37 | Cherry Studio releases | github.com/CherryHQ/cherry-studio/releases | 2026-06-10 | web | No favorites changes v1.9.1–v1.9.11 | single source (caveated) |
+| A38 | Chatbox issue #1540 | github.com/chatboxai/chatbox/issues/1540 | 2026-06-10 | web | Favorite-models proposal; not shipped | corroborated by A39 |
+| A39 | Chatbox issue #2252 | github.com/chatboxai/chatbox/issues/2252 | 2026-06-10 | web | Two-section dropdown proposal (Preferred on top, star per row) | corroborated by A38 |
+| A40 | DeepWiki: Chatbox local models | deepwiki.com/chatboxai/chatbox/4.6-local-model-integration | 2026-06-10 | web | settings.favoritedModels in localStorage | single source (caveated) |
+| A41 | SillyTavern PR #5536 | github.com/SillyTavern/SillyTavern/pull/5536 | 2026-06-10 | web | Unified sort/group settings drawer across providers | corroborated by A42 |
+| A42 | SillyTavern 1.13.5 notes | github.com/SillyTavern/SillyTavern/discussions/4660 | 2026-06-10 | web | Sort/group shipped in 1.13.5 | corroborated by A41 |
+| A43 | SillyTavern connection profiles docs | docs.sillytavern.app/usage/core-concepts/connection-profiles/ | 2026-06-10 | web | Profiles = saved config snapshots, not per-model favorites | corroborated by A44 |
+| A44 | SillyTavern issue #4565 | github.com/SillyTavern/SillyTavern/issues/4565 | 2026-06-10 | web | Better model selector request closed not-planned | corroborated by A43 |
+| A45 | VS Code language models docs | code.visualstudio.com/docs/agent-customization/language-models | 2026-06-10 | web | Provider groups + hover pin + dedicated Pinned top section, stable order, model stays in group | corroborated by A46 |
+| A46 | vscode-copilot-chat PR #1111 | github.com/microsoft/vscode-copilot-chat/pull/1111 | 2026-06-10 | web | BYOK models grouped into a category | corroborated by A45 (see V8 caveat) |
+| A47 | Continue.dev model roles docs | docs.continue.dev/customize/model-roles/00-intro | 2026-06-10 | web | Role-based dropdowns; no grouping/favorites | corroborated by A48 |
+| A48 | Continue.dev providers overview | docs.continue.dev/customize/model-providers/overview | 2026-06-10 | web | Picker reflects config.yaml order | corroborated by A47 |
+| A49 | Open WebUI discussion #15449 | github.com/open-webui/open-webui/discussions/15449 | 2026-06-10 | web | Multi-model combination pinning request | single source (caveated) |
+| A50 | BigAGI repo + changelog | github.com/enricoros/big-AGI | 2026-06-10 | web | No grouping/favorites evidence (negative finding) | single source (caveated) |
+| A51 | LM Studio v0.4.0 changelog | lmstudio.ai/changelog/lmstudio-v0.4.0 | 2026-06-10 | web | Search/format filters; no favorites | corroborated by A52 |
+| A52 | LM Studio v0.4.13 changelog | lmstudio.ai/changelog/lmstudio-v0.4.13 | 2026-06-10 | web | No picker changes | corroborated by A51 |
+| A53 | Open WebUI issue #22578 | github.com/open-webui/open-webui/issues/22578 | 2026-06-10 | web | Model enable/disable state goes stale on catalog change | corroborated by A23 |
+| A54 | embedding host live inventory | provided: `curl http://100.90.172.55:8411/v1/models` + `/running` | 2026-06-10 | provided | 39 models incl. deepseek-r1-qwen3-8b and 5 IDs duplicated on Sam-desktop; /running empty | corroborated by A56 (config matches) |
+| A55 | Sam-desktop live inventory | provided: `curl http://100.101.41.16:8401/v1/models` + `/running` | 2026-06-10 | provided | 21 models; qwen3.6-35b-a3b-mxfp4 absent; nemotron-omni running via D:\llama-server | corroborated by A57 |
+| A56 | embedding host SSH inventory | provided: `ssh samkintop@100.90.172.55` (~/llama-swap/config.yaml, ~/llama.cpp, ~/models) | 2026-06-10 | provided | P104-tuned llama-swap config (ttl 1800, per-model llama-server cmds); llama.cpp source build | corroborated by A54 |
+| A57 | Sam-desktop SSH inventory | provided: `ssh samki@100.101.41.16` (dir D:\) | 2026-06-10 | provided | D:\llama-server (b9591 CUDA), D:\llama-swap (v224), D:\models, D:\llama-sidecar | corroborated by A55 |
+| A58 | Current env config | `.env`, `apps/coder/.env.host` | n/a | codebase | LLAMA_SWAP_URL=http://100.101.41.16:8401; DEFAULT_MODEL=qwen3.6-35b-a3b-mxfp4 (both apps) | corroborated (read directly) |
+| A59 | Models route | `apps/server/src/routes/models.ts:14-56` | n/a | codebase | GET /api/models fetches only LLAMA_SWAP_URL (+DeepSeek); flat untagged list | corroborated (read directly) |
+| A60 | Inference provider/routing | `apps/server/src/services/inference/provider.ts:1-163` | n/a | codebase | resolveRoute: deepseek- prefix → cloud; LLAMA_SIDECAR_URL set → sidecar default for everything; else single swap; resolveModelEndpoint hardcodes LLAMA_SWAP_URL | corroborated (read directly) |
+| A61 | BooChat model picker | `apps/web/src/components/ModelPicker.tsx:14-133` | n/a | codebase | Flat lazy list, no grouping/search/favorites; PATCHes session.model | corroborated (explorer + validator) |
+| A62 | Provider snapshot contracts | `packages/contracts/src/provider-snapshot.ts` | n/a | codebase | ProviderModel has no provider field; identity implicit in parent entry name | corroborated |
+| A63 | Coder provider snapshot | `apps/coder/src/services/provider-snapshot.ts:48-70,256-310` | n/a | codebase | Prefixes single llama-swap list with `llama-swap/`; merges into boocode entry | corroborated |
+| A64 | Coder dispatcher prefixing | `apps/coder/src/services/dispatcher.ts:1006-1011` | n/a | codebase | Bare IDs get `llama-swap/`; slash-containing IDs pass through unchanged | corroborated (validator-verified) |
+| A65 | Model/settings persistence | `apps/server/src/schema.sql:20,217-222,249`; `routes/settings.ts` | n/a | codebase | sessions.model NOT NULL, chats.model nullable, settings KV JSONB seeded with bare default_model | corroborated |
+| A66 | Model context service | `apps/server/src/services/model-context.ts:9,26-29,40-49,77-100` | n/a | codebase | No-TTL positive cache keyed by raw model string; deepseek- guard returns static 131k; /upstream URL from single config | corroborated (validator-verified) |
+| A67 | Compaction LLM calls | `apps/server/src/services/compaction.ts:351-357,531` | n/a | codebase | Summarization via resolveModelEndpoint → always LLAMA_SWAP_URL | corroborated (validator-verified) |
+| A68 | Task model service | `apps/server/src/services/task-model.ts:59-68` | n/a | codebase | FAST_MODEL fallback chain against single endpoint (TASK_MODEL_URL escape hatch) | corroborated |
+| A69 | Arena model calls | `apps/coder/src/services/arena-model-call.ts:16-28`; `arena-analyzer.ts:90` | n/a | codebase | Raw fetch to LLAMA_SWAP_URL, no routing abstraction | corroborated (validator-verified) |
+| A70 | Coder composer prefs | `apps/web/src/components/AgentComposerBar.tsx:33-52,118-196` | n/a | codebase | CompactPicker flat lists; prefs in localStorage `boocode.coder.agent-prefs` | corroborated |
+| A71 | Model display naming | `apps/web/src/lib/modelName.ts:6-32`; `MessageBubble.tsx:140-189` | n/a | codebase | Display chips already strip `llama-swap/`-style prefixes | corroborated |
+| A72 | Coder provider config file | `data/coder-providers.example.json` | n/a | codebase | Per-provider overrides exist; no baseUrl field — second endpoint unregistrable today | corroborated |
+| A73 | Openspec conventions | `openspec/README.md` | n/a | codebase | changes/<slug>/{proposal,tasks,design}.md; lowercase-hyphenated slugs | corroborated (read directly) |
+| A74 | Sidecar architecture notes | `apps/server/CLAUDE.md` (sidecar sections); `/opt/forks/llama-sidecar/` | n/a | codebase | llama-sidecar = Go per-agent llama-server pool on Sam-desktop; X-Agent-Flags header; boot guard ties llama_extra_args to LLAMA_SIDECAR_URL | corroborated by A60 |
+
+### A54/A55: Live host inventories — recommendation-bearing
+
+- **Link / location:** provided: orchestrator-run `curl` against `http://100.90.172.55:8411` and `http://100.101.41.16:8401` (`/v1/models`, `/running`)
+- **Retrieved:** 2026-06-10
+- **Trust class:** provided (operator-owned infrastructure, independently re-checkable with the same commands)
+- **Summary:** embedding serves 39 mostly-small models; Sam-desktop serves 21 mostly-large models. Five IDs (`granite-4.1-8b`, `negentropy-4.7-9b`, `qwen3.5-9b`, `qwen3.5-9b-deepseek-v4`, `qwopus3.5-9b-coder`) appear on both, which makes composite keying mandatory rather than stylistic. The configured `DEFAULT_MODEL` is absent from Sam-desktop's live list, showing that IDs already drift. embedding's `deepseek-r1-qwen3-8b` collides with the `deepseek-` cloud-routing heuristic: under today's prefix check it would be sent to the DeepSeek cloud route rather than to the host that serves it. Neither host populates llama-swap's optional `name`/`description` fields, so the UI must derive labels from IDs (as `formatModelLabel` already does).
+- **Evidence status:** corroborated by A56/A57 (SSH-level configs match the served lists).
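+
+A minimal TypeScript sketch of the duplicate check behind this conclusion, assuming each host's `/v1/models` response has already been reduced to an array of ID strings. The `Inventory` type and the `sharedModelIds` and `modelKey` helpers are hypothetical illustrations, not existing BooCode code.
+
+```typescript
+// Hypothetical shape: provider name -> model IDs reported by that host's /v1/models.
+type Inventory = Record<string, string[]>;
+
+// Returns every model ID served by more than one provider, sorted for stable output.
+// Any ID in this list cannot be routed or favorited by bare name alone.
+function sharedModelIds(inventory: Inventory): string[] {
+  const owners = new Map<string, Set<string>>();
+  for (const [provider, ids] of Object.entries(inventory)) {
+    for (const id of ids) {
+      const set = owners.get(id) ?? new Set<string>();
+      set.add(provider);
+      owners.set(id, set);
+    }
+  }
+  return Array.from(owners)
+    .filter(([, providers]) => providers.size > 1)
+    .map(([id]) => id)
+    .sort();
+}
+
+// Composite "provider/model" key; the provider names used here contain no "/".
+function modelKey(provider: string, model: string): string {
+  return `${provider}/${model}`;
+}
+```
+
+Fed abbreviated excerpts such as `{ "Sam-desktop": ["granite-4.1-8b", "qwen3.5-9b"], embedding: ["granite-4.1-8b", "qwen3.5-9b", "deepseek-r1-qwen3-8b"] }`, `sharedModelIds` returns `["granite-4.1-8b", "qwen3.5-9b"]`, while `modelKey("embedding", "qwen3.5-9b")` and `modelKey("Sam-desktop", "qwen3.5-9b")` remain distinct.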
+
+### A60: `provider.ts` routing — recommendation-bearing
+
+- **Link / location:** `apps/server/src/services/inference/provider.ts:90-157`
+- **Retrieved:** n/a
+- **Trust class:** codebase (current-state anchor)
+- **Summary:** The single point where all three routes (deepseek/sidecar/swap) resolve. Establishes that (a) BooCode already builds per-baseURL AI-SDK providers from a cache map — O1 slots into this with minimal new machinery; (b) the sidecar is the default route for everything when configured, which forces constraint 4; (c) `resolveModelEndpoint` is a second, parallel resolution path (compaction/task-model) that must change in lockstep.
+- **Evidence status:** corroborated (read directly by orchestrator and validator).
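+
+A minimal sketch of a single resolver that both `resolveRoute` and `resolveModelEndpoint` could delegate to, assuming composite `provider/model` keys. `PROVIDERS`, `ALIASES`, `DEFAULT_PROVIDER`, `Route`, and `resolveModelKey` are hypothetical names, and the sidecar branch is omitted for brevity. Checking the provider prefix before the `deepseek-` heuristic is what keeps embedding's `deepseek-r1-qwen3-8b` off the cloud route.
+
+```typescript
+// Hypothetical registry: provider name -> llama-swap base URL (hosts from A54/A55).
+const PROVIDERS: Record<string, string> = {
+  "Sam-desktop": "http://100.101.41.16:8401",
+  embedding: "http://100.90.172.55:8411",
+};
+
+// Assumption: the Coder's existing `llama-swap/` prefix maps to the renamed default host.
+const ALIASES: Record<string, string> = { "llama-swap": "Sam-desktop" };
+
+// Assumption: legacy bare IDs keep today's single-endpoint behavior.
+const DEFAULT_PROVIDER = "Sam-desktop";
+
+type Route =
+  | { kind: "llama-swap"; provider: string; baseURL: string; model: string }
+  | { kind: "deepseek"; model: string };
+
+function resolveModelKey(key: string): Route {
+  const slash = key.indexOf("/");
+  if (slash > 0) {
+    const named = key.slice(0, slash);
+    const provider = ALIASES[named] ?? named;
+    const baseURL = PROVIDERS[provider];
+    if (baseURL !== undefined) {
+      // Provider-qualified keys are resolved before any name heuristic runs;
+      // only the bare model ID is sent on the wire.
+      return { kind: "llama-swap", provider, baseURL, model: key.slice(slash + 1) };
+    }
+  }
+  // Bare IDs (and unknown prefixes): keep the deepseek- heuristic, else the default host.
+  if (key.startsWith("deepseek-")) return { kind: "deepseek", model: key };
+  return {
+    kind: "llama-swap",
+    provider: DEFAULT_PROVIDER,
+    baseURL: PROVIDERS[DEFAULT_PROVIDER],
+    model: key,
+  };
+}
+```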
+
+### A13/A14: AI SDK provider registry pattern — recommendation-bearing
+
+- **Link / location:** https://ai-sdk.dev/docs/ai-sdk-core/provider-management ; https://ai-sdk.dev/providers/openai-compatible-providers
+- **Retrieved:** 2026-06-10
+- **Trust class:** web
+- **Summary:** The library BooCode already uses prescribes exactly O1's shape: one named `createOpenAICompatible` instance per provider, registry-level `provider:model` namespacing, bare model IDs on the wire. Adopting O1 is convergence with the upstream idiom rather than a custom scheme.
+- **Evidence status:** corroborated (two official doc pages, consistent with LiteLLM's independent design A15/A16).
+
+### A45: VS Code model picker docs — recommendation-bearing (UX)
+
+- **Link / location:** https://code.visualstudio.com/docs/agent-customization/language-models
+- **Retrieved:** 2026-06-10
+- **Trust class:** web
+- **Summary:** Documents the shipped pattern this feature's dropdown adapts: provider-grouped list, hover-revealed pin, dedicated Pinned top section in stable insertion order, pinned models remaining in their provider group.
+- **Evidence status:** corroborated by A46; code-level detail treated as color per V8.
+
+### A23/A27: Open WebUI pitfalls — recommendation-bearing (counter-evidence)
+
+- **Link / location:** https://github.com/open-webui/open-webui/discussions/23656 ; https://github.com/open-webui/open-webui/discussions/4495
+- **Retrieved:** 2026-06-10
+- **Trust class:** web
+- **Summary:** The two documented failure modes the design must avoid: bare-model-ID favorites becoming ambiguous across connections, and stale-favorite cleanup permanently destroying user preferences during transient backend downtime.
+- **Evidence status:** corroborated by A21/A22/A53 (the surrounding docs and a second stale-state issue).
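+
+A minimal sketch of how the picker could derive its sections from the live catalog while following both A45's pattern and the A23/A27 lessons, assuming favorites are persisted as an ordered array of composite `provider/model` keys. `ModelEntry`, `Section`, `entryKey`, and `buildSections` are hypothetical names. The function only reads the stored favorites: a favorite whose provider is offline is not rendered, and it reappears on the next refresh once the host answers again.
+
+```typescript
+// Hypothetical catalog row: which provider served the model, plus its bare ID.
+interface ModelEntry {
+  provider: string;
+  id: string;
+}
+
+interface Section {
+  title: string;
+  items: ModelEntry[];
+}
+
+const entryKey = (m: ModelEntry): string => `${m.provider}/${m.id}`;
+
+// Favorites first (in the order the user starred them), then one section per
+// provider in a fixed order. `favorites` is never mutated, so an offline host
+// hides its favorites without deleting them.
+function buildSections(
+  live: ModelEntry[],
+  favorites: readonly string[],
+  providerOrder: readonly string[],
+): Section[] {
+  const byKey = new Map<string, ModelEntry>();
+  for (const m of live) byKey.set(entryKey(m), m);
+
+  const favItems: ModelEntry[] = [];
+  for (const k of favorites) {
+    const m = byKey.get(k);
+    if (m !== undefined) favItems.push(m); // missing: hide now, keep stored
+  }
+
+  const sections: Section[] = [];
+  if (favItems.length > 0) sections.push({ title: "Favorites", items: favItems });
+  for (const provider of providerOrder) {
+    // Starred models also stay in their provider section.
+    const items = live.filter((m) => m.provider === provider);
+    if (items.length > 0) sections.push({ title: provider, items });
+  }
+  return sections;
+}
+```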