boocode/docs/research/2026-06-10-multi-llama-swap-providers-model-favorites.md
indifferentketchup b18de2a331 chore: snapshot working tree - pty_exited notifications + in-flight inference WIP
feat(booterm): structured pty_exited WS notifications. Plan-validated, impl-validated, code-reviewed green (contracts build clean, contracts test 29/29, booterm + web typecheck clean).

wip: in-progress inference/provider refactor (agents.ts, provider.ts, new llama-providers.ts, removed llama-args-validator), plus arena, dispatcher, compaction, schema changes.

openspec: pty-exit-notifications complete; x-agent-flags planned (not yet implemented).
2026-06-14 12:48:47 +00:00

# Research: Integrating two named llama-swap providers ("Sam-desktop", "embedding") with provider-grouped model dropdowns and per-model favorites in BooChat and BooCoder
Question: BooCode currently talks to exactly one llama-swap endpoint. How should a second named provider ("embedding", `100.90.172.55:8411`) be added alongside the renamed existing one ("Sam-desktop", `100.101.41.16:8401`), integrated into both BooChat and BooCoder, with the model dropdown grouped per provider and a favorite button per model (Favorites section listed first)?
Evidence mode: **strict** (default — every recommendation-bearing claim is corroborated or explicitly caveated).
## Summary
Both machines can be added to BooCode as named providers, and the right way is to give BooCode a small provider registry (a name and base URL per machine) and to store selected models as a "provider/model" pair instead of a bare name. Bare names cannot work here: five models exist on both machines under identical names today, and the configured default model has already drifted out of the live list once — so favorites and routing keyed by name alone would be ambiguous and fragile. The dropdown should follow the pattern proven in VS Code's model picker: a Favorites section on top, then one section per provider (Sam-desktop first, then embedding), a star on every row, favorited models staying visible in their provider section, and favorites that are hidden — never deleted — when a machine is offline.
The adversarial validation pass confirmed the direction but showed the change is wider than the obvious spots: chat compaction, context-window lookup, arena battles, the coder's opencode dispatch, and the sidecar routing default all silently assume a single endpoint and need the same provider-resolution change. Two extra hazards were found in the live data: a model on the embedding host literally named `deepseek-r1-qwen3-8b` trips BooCode's "starts with deepseek-" cloud-routing heuristic, and the always-on sidecar default route would swallow embedding-bound requests. The embedding host does **not** need its own llama-sidecar — but sidecar routing must become a Sam-desktop-only attribute.
Well-corroborated: live data from both hosts, direct code evidence, and multiple independent web sources agree; validation expanded the implementation scope but did not overturn the choice.
- **Confidence:** High
## Research Results
### What exists today (codebase — current-state anchor)
BooCode's entire inference surface assumes one llama-swap endpoint, configured as `LLAMA_SWAP_URL=http://100.101.41.16:8401` with `DEFAULT_MODEL=qwen3.6-35b-a3b-mxfp4` (A58). The single-endpoint assumption is hard-coded in at least nine places:
1. `GET /api/models` fetches only `{LLAMA_SWAP_URL}/v1/models` (plus DeepSeek cloud when `DEEPSEEK_API_KEY` is set) and returns a flat `ModelInfo[]` with no provider tag (A59).
2. `upstreamModel()` routes by string heuristics: model IDs starting `deepseek-` go to the DeepSeek cloud API; agents with `llama_extra_args` go to the sidecar; **and when `LLAMA_SIDECAR_URL` is configured at all — which it is in docker-compose — every remaining request routes through the sidecar by default**, falling back to llama-swap only when no sidecar is configured (A60). The provider for each base URL is a cached AI-SDK `createOpenAICompatible` instance.
3. `resolveModelEndpoint()` (used by compaction and task-model for non-streaming calls) returns `LLAMA_SWAP_URL` for every non-DeepSeek model (A60, A67).
4. `model-context.ts` fetches `{LLAMA_SWAP_URL}/upstream/<model>/props` for context windows, with a **no-TTL positive cache keyed by the raw model string**, and a `deepseek-` prefix guard that short-circuits to a static 131,072 context without calling any upstream (A66).
5. `task-model.ts` (auto-naming, summaries) falls back through `FAST_MODEL → chat model → DEFAULT_MODEL` against the single URL (A68).
6. Arena battles call `{LLAMA_SWAP_URL}/v1/chat/completions` directly with no routing abstraction at all (A69).
7. The coder's provider snapshot fetches the single llama-swap list and prefixes every ID with `llama-swap/` (A63); its dispatcher prefixes any bare (slash-less) model ID with `llama-swap/` before opencode dispatch, and passes any ID already containing `/` through unchanged (A64).
8. Model IDs persist as bare strings: `sessions.model TEXT NOT NULL`, `chats.model TEXT` nullable, validated only as a 1200-char string (A65).
9. The BooChat dropdown (`ModelPicker.tsx`) and the BooCoder picker (`CompactPicker` inside `AgentComposerBar.tsx`) are flat lists with no grouping, search, or favorites; the coder picker persists per-provider preferences in browser localStorage, while BooChat model choice is server-persisted on the session row (A61, A70). Display code already strips `llama-swap/`-style prefixes when rendering model chips (A71). No favorites/pinning mechanism exists anywhere; the `settings` table is a key-value JSONB store currently holding `default_model` and theme keys (A65).
The coder's runtime provider config (`data/coder-providers.json`) has no `baseUrl` field — there is no way to register a second llama-swap endpoint today (A72).
### What the two hosts actually serve (provided material, retrieved live 2026-06-10)
- **embedding** (`100.90.172.55:8411`, Linux, P104-100 8GB Pascal GPU): 39 models, skewed small — gemma-3-270m through gemma-4-12b, the LFM2.5 family, granite-4.1-3b/8b, qwen3.5-0.8b/4b/9b, qwopus3.5 family, `deepseek-r1-qwen3-8b`, a reranker, extraction models (A54). Its llama-swap config is hand-tuned per model (flash-attn/KV-quant choices for Pascal, ttl 1800), with llama.cpp built from source on the box (A56).
- **Sam-desktop** (`100.101.41.16:8401`, Windows): 21 models, skewed large — qwen3.6-35b-a3b/27b, qwopus3.6 family, granite-4.1-30b, mellum2-12b, nemotron-cascade-2-30b-a3b, north-mini-code, etc. Served by `D:\llama-server` (llama.cpp CUDA build b9591) behind `D:\llama-swap` (llama-swap v224), models in `D:\models`; a `D:\llama-sidecar` directory backs the existing sidecar at `:8402` (A55, A57).
Three load-bearing facts fall out of the live inventories:
- **Five model IDs exist on both hosts**: `granite-4.1-8b`, `negentropy-4.7-9b`, `qwen3.5-9b`, `qwen3.5-9b-deepseek-v4`, `qwopus3.5-9b-coder` (A54, A55). Bare-ID favorites or routing are therefore ambiguous from day one.
- **The configured `DEFAULT_MODEL=qwen3.6-35b-a3b-mxfp4` is not in Sam-desktop's current model list** (closest: `qwen3.6-35b-a3b`) — model IDs already churn in practice, so favorites must tolerate stale references (A55, A58).
- **`deepseek-r1-qwen3-8b` on the embedding host collides with BooCode's `deepseek-` heuristics**: with `DEEPSEEK_API_KEY` set it would be routed to the DeepSeek cloud API, and the context-window guard returns a fake 131k context on the name prefix alone regardless (A54, A60, A66).
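The first and third facts can be re-checked mechanically from the two inventories. The sketch below uses only model IDs named in this document (the real lists are longer), and `looksLikeDeepSeekCloud` is a hypothetical stand-in for the name-prefix heuristic, not the actual function in `provider.ts`.

```typescript
// Partial inventories: only IDs quoted in this document, not the full lists.
const embeddingIds: string[] = [
  "granite-4.1-8b",
  "negentropy-4.7-9b",
  "qwen3.5-9b",
  "qwen3.5-9b-deepseek-v4",
  "qwopus3.5-9b-coder",
  "deepseek-r1-qwen3-8b",
];
const samDesktopIds: string[] = [
  "granite-4.1-8b",
  "negentropy-4.7-9b",
  "qwen3.5-9b",
  "qwen3.5-9b-deepseek-v4",
  "qwopus3.5-9b-coder",
  "qwen3.6-35b-a3b",
];

// IDs served by both hosts, i.e. ambiguous when used as bare names.
function sharedIds(a: string[], b: string[]): string[] {
  const inB = new Set(b);
  return a.filter((id) => inB.has(id)).sort();
}

// Hypothetical stand-in for the "starts with deepseek-" routing heuristic.
function looksLikeDeepSeekCloud(modelId: string): boolean {
  return modelId.startsWith("deepseek-");
}

const duplicates = sharedIds(embeddingIds, samDesktopIds);
const misrouted = embeddingIds.filter(looksLikeDeepSeekCloud);
const defaultModelIsLive = samDesktopIds.includes("qwen3.6-35b-a3b-mxfp4");
```

Here `duplicates` holds the five shared IDs, `misrouted` holds the one local model the prefix heuristic would claim for the cloud, and `defaultModelIsLive` is false for the configured default.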
### How llama-swap identifies models (web, corroborated)
llama-swap model IDs are exactly the YAML keys in its `config.yaml`; `/v1/models` can additionally carry optional per-model `name`, `description`, and arbitrary `metadata` from config, fields that neither of Sam's hosts currently populates (A1–A4, A54, A55). llama-swap has **no instance-identity field**: two instances are distinguishable only by host:port (A3). `/running` reports load state per model (A1, A12). Peer federation exists (one llama-swap aggregating another), but peer-served models surface as `"peer-name: model-name"` IDs [single-source: A6] and same-ID collisions resolve silently to the lexicographically-first peer (A5). Decisive without any web source: BooCode would still see one flat list with no native grouping, and the two hosts' uptime would become coupled. Standalone llama.cpp `llama-server` defaults its `/v1/models` ID to the model file path unless `--alias` is set (A8, A9), which is relevant only if a host ever bypasses llama-swap.
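Because the upstream payload carries no instance identity, the provider name has to be attached by the client when it reads each list. A minimal sketch, assuming the OpenAI-style `{ data: [{ id }] }` list shape and the optional `name`/`description` fields described above; `TaggedModel` and `tagModels` are hypothetical names, not existing BooCode types.

```typescript
// Subset of an OpenAI-style /v1/models response.
interface UpstreamModel {
  id: string;
  name?: string;
  description?: string;
}
interface UpstreamModelList {
  data: UpstreamModel[];
}

// Hypothetical provider-tagged shape for the client.
interface TaggedModel {
  provider: string;    // registry name, supplied by the client
  wireId: string;      // bare ID sent back to that host
  compositeId: string; // "<provider>/<wireId>", unique across hosts
  label: string;       // display text: optional name, else the ID
}

function tagModels(provider: string, list: UpstreamModelList): TaggedModel[] {
  return list.data.map((m) => ({
    provider,
    wireId: m.id,
    compositeId: `${provider}/${m.id}`,
    label: m.name ?? m.id,
  }));
}

const taggedEmbedding = tagModels("embedding", {
  data: [{ id: "qwen3.5-9b" }, { id: "granite-4.1-8b" }],
});
const taggedSamDesktop = tagModels("sam-desktop", {
  data: [{ id: "qwen3.5-9b" }],
});
```

The same bare `qwen3.5-9b` now yields two distinct composite IDs, one per host.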
### How mature clients solve exactly this (web, corroborated)
Every major OpenAI-compatible client library handles multiple same-protocol providers with **separate named provider instances, each with its own baseURL, namespaced in the client's registry as `provider:model` / `provider/model`** — the model ID actually sent on the wire to each backend stays the bare upstream ID (Vercel AI SDK provider registry: A13, A14; LiteLLM model_list: A15, A16). BooCode already uses the AI SDK's `createOpenAICompatible` (A60) and the coder already namespaces with a `llama-swap/` prefix (A63, A64), so this pattern is an extension of existing conventions, not a new idiom.
### Dropdown + favorites prior art (web)
The closest shipped implementation of the requested UX is VS Code's model picker: models grouped by provider, a pin icon revealed on hover, pinned models lifted into a dedicated top section in stable insertion order, **while remaining visible in their provider group** (display copy, not move) (A45, A46). Cherry Studio independently demonstrates the key-collision lesson: its model identity is the composite `{id, provider}` precisely so two providers serving the same model name don't collide (A35, A36) [third-party code reference; unverifiable from here, so supporting color only; see V8]. Open WebUI documents the two pitfalls to avoid: favorites keyed by bare model ID become ambiguous the moment two connections serve the same name (A27), and its stale-pin cleanup **permanently deletes** pins when a backend is temporarily down (A23); the correct behavior is to hide unavailable favorites and restore them when the host returns. LibreChat groups via admin-configured YAML and added pinning in v0.8.5 (A28, A29). Jan, Chatbox, SillyTavern, Continue.dev, BigAGI, and LM Studio offer weaker or no equivalents (A32–A34, A38–A44, A47–A52); none contradicts the VS Code pattern.
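The picker behavior described above reduces to a small pure function. This is a sketch under assumptions: `ModelRow`, `Section`, and `buildSections` are hypothetical names, and the favorites list is taken to be an array of composite IDs in insertion order.

```typescript
interface ModelRow {
  compositeId: string;
  provider: string;
  label: string;
}
interface Section {
  title: string;
  rows: ModelRow[];
}

// Favorites first (insertion order, offline entries hidden), then one
// section per provider. Favorited rows are copied, not moved, so they
// still appear in their provider section.
function buildSections(
  providerOrder: string[],
  available: ModelRow[],
  favorites: string[],
): Section[] {
  const byId = new Map(available.map((r) => [r.compositeId, r] as const));
  const favoriteRows = favorites
    .map((id) => byId.get(id))
    .filter((r): r is ModelRow => r !== undefined);

  const sections: Section[] = [];
  if (favoriteRows.length > 0) {
    sections.push({ title: "Favorites", rows: favoriteRows });
  }
  for (const provider of providerOrder) {
    const rows = available.filter((r) => r.provider === provider);
    if (rows.length > 0) sections.push({ title: provider, rows });
  }
  return sections;
}

// Example: the embedding host is offline, so its favorite is hidden.
const pickerFavorites = ["embedding/gemma-4-12b", "sam-desktop/qwen3.6-35b-a3b"];
const pickerSections = buildSections(
  ["sam-desktop", "embedding"],
  [
    { compositeId: "sam-desktop/qwen3.6-35b-a3b", provider: "sam-desktop", label: "qwen3.6-35b-a3b" },
    { compositeId: "sam-desktop/granite-4.1-30b", provider: "sam-desktop", label: "granite-4.1-30b" },
  ],
  pickerFavorites,
);
```

The function never mutates `favorites`, so a favorite on an unreachable host reappears once that host's models are back in `available`.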
### Does embedding need a llama-sidecar? No.
The llama-sidecar is a Go daemon on Sam-desktop providing a per-agent llama-server process pool so agents can carry `llama_extra_args` (cache quant, spec decoding, slot save) injected via an `X-Agent-Flags` header (A60, A74). The embedding host needs none of that: its per-model tuning is baked directly into its llama-swap `config.yaml` (A56), and no per-agent flag injection applies to it. **However**, `resolveRoute` currently makes the sidecar the default route for *all* non-DeepSeek inference whenever `LLAMA_SIDECAR_URL` is set (A60) — so under the multi-provider design, sidecar routing must become an attribute of the Sam-desktop provider entry (e.g. optional `sidecarUrl` per provider), not a global default; otherwise requests for embedding-hosted models would be sent to a sidecar that only manages Sam-desktop processes.
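As an illustration of sidecar routing as a provider attribute, the sketch below picks the inference base URL from the provider entry alone. `ProviderEntry` and `inferenceBaseUrl` are hypothetical names; the sidecar host is an assumption (the document only gives the port, `:8402`, on Sam-desktop), as is the `http` scheme for the embedding host.

```typescript
interface ProviderEntry {
  name: string;
  baseUrl: string;
  sidecarUrl?: string; // set only for hosts that run a llama-sidecar
}

const samDesktopEntry: ProviderEntry = {
  name: "sam-desktop",
  baseUrl: "http://100.101.41.16:8401",
  sidecarUrl: "http://100.101.41.16:8402", // assumed host; port from this document
};
const embeddingEntry: ProviderEntry = {
  name: "embedding",
  baseUrl: "http://100.90.172.55:8411", // scheme assumed
};

// The sidecar is used only when the provider itself declares one,
// never because a global sidecar URL happens to be configured.
function inferenceBaseUrl(provider: ProviderEntry): string {
  return provider.sidecarUrl ?? provider.baseUrl;
}
```

With this shape, requests for embedding-hosted models can never reach the Sam-desktop sidecar, whatever the global environment contains.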
### Openspec conventions for the follow-up plan (codebase)
Per-batch docs land in `openspec/changes/<slug>/` with `proposal.md` (why + scope), `tasks.md` (numbered/checkbox action list), and optional `design.md` (architecture/data-model decisions); slugs are lowercase-hyphenated from the batch title (A73). This feature is a natural three-file batch — the provider registry + routing is design-heavy, so `design.md` is warranted.
## Options to Consider
### O1: Named provider registry with composite model IDs (`<provider>/<model>`)
- **What it is:** BooCode config gains a provider list (`{ name, baseUrl, sidecarUrl? }` per entry — "sam-desktop" and "embedding"). Models are stored and selected as `sam-desktop/qwen3.6-35b-a3b`, `embedding/gemma-4-12b`. `/api/models` returns provider-tagged groups; one routing resolver (provider prefix → baseURL, bare wire ID) replaces every `LLAMA_SWAP_URL` hardcode; bare legacy IDs fall back to the default provider (sam-desktop). Favorites, caches, and attribution all key on the composite ID.
- **Trade-offs:** Touches every call site that assumes one endpoint (the nine sites above — see Validation for the full list); needs a deliberate legacy-bare-ID fallback for existing session/chat rows and the seeded `default_model`; the coder's opencode namespace (`llama-swap/`) needs an explicit translation rule. In exchange: no DB schema change for model columns, no llama-swap config changes on either host, matches the AI-SDK idiom BooCode already uses and the coder's existing prefix convention, and makes the `deepseek-` heuristic unnecessary for prefixed IDs.
- **Rests on:** (A13, A14, A15, A16) for the pattern; (A54, A55) for the collision necessity; (A60, A63, A64) for fit with existing code.
- **Evidence status:** corroborated.
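A minimal sketch of the single routing resolver O1 describes, assuming a two-entry registry. `PROVIDERS`, `resolveModel`, and `ResolvedModel` are hypothetical names, the `http` scheme for the embedding host is assumed, and treating an unknown prefix as part of a bare ID is one possible policy rather than a settled decision.

```typescript
interface ProviderEntry {
  name: string;
  baseUrl: string;
}
interface ResolvedModel {
  provider: ProviderEntry;
  wireId: string;      // bare ID sent to the upstream host
  compositeId: string; // key for storage, favorites, and caches
}

const PROVIDERS: ProviderEntry[] = [
  { name: "sam-desktop", baseUrl: "http://100.101.41.16:8401" },
  { name: "embedding", baseUrl: "http://100.90.172.55:8411" }, // scheme assumed
];
const DEFAULT_PROVIDER = "sam-desktop";

function resolveModel(modelId: string): ResolvedModel {
  const slash = modelId.indexOf("/");
  const prefix = slash === -1 ? undefined : modelId.slice(0, slash);
  const named = PROVIDERS.find((p) => p.name === prefix);
  if (named) {
    // Strip the prefix only here, where the wire ID is produced.
    return { provider: named, wireId: modelId.slice(slash + 1), compositeId: modelId };
  }
  // Bare legacy ID (or unknown prefix): fall back to the default provider.
  const fallback = PROVIDERS.find((p) => p.name === DEFAULT_PROVIDER);
  if (!fallback) throw new Error(`default provider ${DEFAULT_PROVIDER} is not registered`);
  return { provider: fallback, wireId: modelId, compositeId: `${fallback.name}/${modelId}` };
}
```

For example, `resolveModel("embedding/gemma-4-12b")` yields the embedding base URL with wire ID `gemma-4-12b`, while a bare `qwen3.6-35b-a3b-mxfp4` from an existing row resolves to sam-desktop.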
### O2: Bare model IDs plus a separate `provider` field everywhere
- **What it is:** Keep model strings as-is and add a `provider` column/field through `sessions`, `chats`, WS frames, `ModelInfo`, `ProviderModel`, and every read path.
- **Trade-offs:** Avoids string munging and display-time prefix stripping, but is strictly more invasive: two schema migrations, a `WsFrameSchema` change rebuilt through `@boocode/contracts`, and every consumer updated in lockstep — while favorites still need a composite key anyway. Higher blast radius for the same outcome.
- **Rests on:** (A65, A62) for the touched surfaces.
- **Evidence status:** corroborated (codebase-derived).
### O3: llama-swap peer federation (Sam-desktop aggregates embedding as a peer)
- **What it is:** Configure embedding as a `peers:` entry in Sam-desktop's llama-swap; BooCode keeps a single endpoint.
- **Trade-offs:** Rejected on codebase-observable grounds: BooCode would still see one flat list (no native named grouping — the feature's whole point), the two hosts' availability becomes coupled, and it requires operational changes on a host outside this repo. Additionally, peer-served model IDs surface as `"peer-name: model-name"` [single-source: A6] with silent first-lexicographic collision resolution (A5).
- **Rests on:** (A5, A6) plus codebase observation (A59, A61).
- **Evidence status:** rejection corroborated by codebase facts; the peer ID-format detail is single-source (caveated) and not load-bearing.
### O4: External aggregator proxy (LiteLLM) in front of both hosts
- **What it is:** A LiteLLM proxy with a `model_list` mapping unique aliases to each host; BooCode keeps one endpoint.
- **Trade-offs:** Proven pattern (A15, A16) but adds a third always-on service with a manually-maintained catalog (no auto-discovery from `/v1/models`), an extra network hop, and still no provider grouping signal unless encoded in alias naming conventions. Overweight for a single-user self-hosted system.
- **Rests on:** (A15, A16).
- **Evidence status:** corroborated.
### Sub-decision — favorites persistence
- **O5a: Server-side, in the `settings` table** (e.g. `favorite_models: string[]` of composite IDs). Survives browsers/devices — and multi-device use is real here (the repo's own docs describe side-by-side iPhone debugging), matching how BooChat model choice is already server-persisted on the session row. Costs a PATCH per star toggle and needs a "hide stale, never delete" rule (A23) plus acceptance that stale composite keys linger until manually unfavorited.
- **O5b: Browser localStorage**, extending the coder's existing `boocode.coder.agent-prefs` pattern (A70). Zero API surface, but per-device, per-browser, and split across the two UIs.
- **Evidence status:** both corroborated; the cross-device argument for O5a is codebase-derived inference from documented usage, not a measured requirement.
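Whichever store is chosen, the persisted value can stay a plain array of composite IDs. The sketch below shows a star toggle and the hide-stale read path; `toggleFavorite` and `visibleFavorites` are hypothetical helper names, and the write to `settings` or localStorage is left out.

```typescript
// Toggle a star. Returns a new array: removes the ID if present,
// otherwise appends it so insertion order stays stable.
function toggleFavorite(favorites: readonly string[], compositeId: string): string[] {
  return favorites.includes(compositeId)
    ? favorites.filter((id) => id !== compositeId)
    : [...favorites, compositeId];
}

// Read path: hide favorites whose model is not currently listed.
// The stored array is left untouched, so nothing is ever auto-deleted.
function visibleFavorites(
  favorites: readonly string[],
  liveIds: ReadonlySet<string>,
): string[] {
  return favorites.filter((id) => liveIds.has(id));
}

let storedFavorites: string[] = [];
storedFavorites = toggleFavorite(storedFavorites, "embedding/qwen3.5-9b");
storedFavorites = toggleFavorite(storedFavorites, "sam-desktop/qwen3.5-9b");

// Embedding host offline: only Sam-desktop's models are live.
const shownWhileOffline = visibleFavorites(
  storedFavorites,
  new Set(["sam-desktop/qwen3.5-9b"]),
);
```

Keeping the filter on the read path is what separates "hide" from "delete": the stored list is only ever changed by an explicit toggle.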
## Recommendation
- **Recommendation:** **O1** — named provider registry with `<provider>/<model>` composite IDs — combined with the VS Code-pattern dropdown (Favorites on top in stable insertion order, then Sam-desktop's models, then embedding's; star toggle per row; favorited models remain listed in their provider group) and **O5a** server-side favorites keyed by composite ID. Non-negotiable design constraints carried in from validation:
1. Prefix-strip **only** at wire-URL construction; caches (notably `model-context.ts`'s no-TTL positive cache) key on the **full composite ID**, or the five name-collided models cross-pollute context windows between hosts (V7).
2. The coder dispatcher must translate composite prefixes for opencode (map the default provider to the existing `llama-swap/` namespace, or register new opencode providers) — the current pass-through of any slash-containing ID would hand opencode an unknown provider key (V1).
3. Every single-endpoint call site is in scope: `provider.ts` (`upstreamModel` + `resolveModelEndpoint`), `models.ts`, `model-context.ts` (including its `deepseek-` static-context guard), `compaction.ts`, `task-model.ts`, `arena-model-call.ts` (+ arena callers, coder-side config), coder `provider-snapshot.ts`, coder `dispatcher.ts` (V2–V4, V9).
4. Sidecar routing becomes a Sam-desktop provider attribute, not the global default route — embedding needs no sidecar (A60, A74; post-validation verification).
5. Bare legacy IDs (existing rows, seeded `default_model`) resolve to the default provider indefinitely — new sessions inherit a bare seeded default until settings are migrated, so this is a permanent fallback, not a one-time migration (V2).
6. Favorites that reference unavailable models are hidden, never auto-deleted (A23).
- **Evidence basis:** The option choice rests on corroborated evidence throughout: the multi-provider client pattern (A13–A16), the live collision and churn data from both hosts (A54, A55, A58; provided material, independently re-checkable), and codebase fit (A60, A63, A64). The UX pattern rests on corroborated documentation (A45, A46) with the Open WebUI pitfalls as corroborated counter-evidence (A23, A27); the Cherry Studio and VS Code *code-level* references are unverifiable third-party color (V8) and nothing rests on them alone. The single-source peer-ID format (A6) supports only the rejection of O3, which stands independently on codebase facts. The cross-device justification for O5a is codebase-derived inference (documented multi-device usage), explicitly not measured evidence.
## Validation
Adversarial validation attacked the evidence, framing, recommendation, and gathering integrity. Findings (condensed; all code-verified by the validator in this repo):
### V1: "O1 extends the coder's prefix convention" was overstated
- **Strategy:** Challenge the Recommendation
- **Investigation:** `dispatcher.ts:1006-1011`, coder CLAUDE.md, `provider-snapshot.ts:66-72`.
- **Result:** Refuted as originally framed — a stored `sam-desktop/<model>` passes the dispatcher's slash-check unchanged and reaches opencode as an unknown provider key; `llama-swap/` is hardcoded in ≥4 coder locations.
- **Impact:** Recommendation now mandates an explicit opencode namespace-translation rule (constraint 2).
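One possible shape for that translation rule, sketched under assumptions: the default provider maps onto the existing `llama-swap/` namespace, and any other registered provider is passed through on the assumption that an opencode provider with the same key has been registered. `toOpencodeModelId` is a hypothetical name, not the dispatcher's actual function.

```typescript
const DEFAULT_PROVIDER = "sam-desktop";
const OPENCODE_NAMESPACE = "llama-swap";

function toOpencodeModelId(modelId: string): string {
  const slash = modelId.indexOf("/");
  if (slash === -1) {
    // Bare ID: same behavior the dispatcher has today.
    return `${OPENCODE_NAMESPACE}/${modelId}`;
  }
  const prefix = modelId.slice(0, slash);
  const rest = modelId.slice(slash + 1);
  if (prefix === DEFAULT_PROVIDER) {
    // Default provider reuses the namespace opencode already knows.
    return `${OPENCODE_NAMESPACE}/${rest}`;
  }
  // Any other prefix passes through unchanged. For "embedding/..." this
  // only works if opencode has a provider registered under that key
  // (an assumption of this sketch).
  return modelId;
}
```

The alternative named in constraint 2, registering every BooCode provider in opencode and dropping the mapping, would replace the middle branch with a plain pass-through.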
### V2: The bare-ID legacy fallback was asserted, not designed
- **Strategy:** Challenge the Recommendation
- **Investigation:** `provider.ts:115-135`, `stream-phase.ts:110`, `sessions.ts:113-117`, `schema.sql:222`, `model-context.ts:77`.
- **Result:** Partially refuted — architecturally plausible but unimplemented; prefixed IDs would 404 the `/upstream/<model>/props` fetch and break context/compaction display; the seeded bare `default_model` makes the fallback permanent, not migratory.
- **Impact:** Constraints 1, 3, 5 added.
### V3: The `deepseek-` hazard is wider than routing
- **Strategy:** Challenge the Evidence
- **Investigation:** `model-context.ts:40-49`, `provider.ts:98`, `compaction.ts:531`.
- **Result:** Confirmed with added scope — the context guard fires on the name prefix alone, returning a fake 131k context for embedding's `deepseek-r1-qwen3-8b` even after routing is fixed.
- **Impact:** `model-context.ts` guard added to the touch-list (constraint 3).
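A sketch of a narrowed guard, assuming composite IDs are in place: the static context applies only when the ID does not carry a local provider prefix. `staticDeepSeekContext` and `LOCAL_PROVIDERS` are hypothetical names, and `deepseek-chat` is used purely as an example of a cloud-side ID.

```typescript
const DEEPSEEK_STATIC_CONTEXT = 131072;
const LOCAL_PROVIDERS: ReadonlySet<string> = new Set(["sam-desktop", "embedding"]);

// Returns the static context only for IDs that are not namespaced under
// a local provider; local models fall through to the real props lookup.
function staticDeepSeekContext(modelId: string): number | undefined {
  const slash = modelId.indexOf("/");
  const prefix = slash === -1 ? "" : modelId.slice(0, slash);
  if (LOCAL_PROVIDERS.has(prefix)) return undefined;
  return modelId.startsWith("deepseek-") ? DEEPSEEK_STATIC_CONTEXT : undefined;
}
```

A bare `deepseek-r1-qwen3-8b` would still match under this rule, so the guard only becomes safe once the picker stores that model as `embedding/deepseek-r1-qwen3-8b`.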
### V4: `compaction.ts` is a missed hardcode site
- **Strategy:** Challenge the Evidence
- **Investigation:** `compaction.ts:351-357` → `resolveModelEndpoint` (`provider.ts:139-157`).
- **Result:** Refuted the original C9 list as incomplete — compaction summarization calls would go to the wrong host for embedding models.
- **Impact:** Added to the touch-list (A67, constraint 3).
### V5: Server-side favorites needed justification against the coder's localStorage pattern
- **Strategy:** Challenge the Assumptions
- **Investigation:** `AgentComposerBar.tsx:33-52`, `routes/settings.ts`, root CLAUDE.md auth model.
- **Result:** Partially refuted — the Open WebUI bug distinguishes auto-delete vs hide, not server vs client storage; the original justification conflated the two.
- **Impact:** O5a/O5b reframed as an explicit sub-decision; O5a retained on the cross-device argument, labeled as inference.
### V6: O3's rejection over-relied on a single-source claim
- **Strategy:** Challenge the Evidence-Gathering Integrity
- **Result:** Confirmed with a provenance note — O3 is independently rejectable from codebase facts; the stale GitHub issue is demoted to supporting color.
- **Impact:** O3 rejection rewritten to lead with codebase-observable reasons.
### V7: Composite IDs + naive prefix-stripping would poison the no-TTL context cache
- **Strategy:** Challenge the Recommendation
- **Investigation:** `model-context.ts:9, 26-29, 77-100`; the five cross-host duplicate IDs.
- **Result:** Refuted the unstated design — stripping before the cache key shares entries across providers with different real context windows, permanently until restart.
- **Impact:** Constraint 1 (composite cache key, strip only at URL construction) — the most subtle required design rule.
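The rule can be sketched as a cache whose key is the composite ID while the bare wire ID appears only in the URL. This is a simplified, synchronous illustration: `contextWindow` and the injected `load` callback are hypothetical (the real lookup would be an async fetch), and the context sizes in the example are made-up values.

```typescript
interface ProviderEntry {
  name: string;
  baseUrl: string;
}
type LoadContext = (propsUrl: string) => number;

// No-TTL positive cache, keyed by "<provider>/<wireId>".
const contextCache = new Map<string, number>();

function contextWindow(provider: ProviderEntry, wireId: string, load: LoadContext): number {
  const cacheKey = `${provider.name}/${wireId}`; // full composite ID
  const cached = contextCache.get(cacheKey);
  if (cached !== undefined) return cached;
  // The prefix is dropped only here, at URL construction.
  const value = load(`${provider.baseUrl}/upstream/${wireId}/props`);
  contextCache.set(cacheKey, value);
  return value;
}

const samDesktop: ProviderEntry = { name: "sam-desktop", baseUrl: "http://100.101.41.16:8401" };
const embedding: ProviderEntry = { name: "embedding", baseUrl: "http://100.90.172.55:8411" };

// Same wire ID on both hosts, different (illustrative) context sizes.
const ctxOnSamDesktop = contextWindow(samDesktop, "qwen3.5-9b", () => 32768);
const ctxOnEmbedding = contextWindow(embedding, "qwen3.5-9b", () => 8192);
```

Had the key been the bare `qwen3.5-9b`, the second call would have returned the first host's value until the process restarted.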
### V8: Third-party code references (Cherry Studio, VS Code PR) are unverifiable
- **Strategy:** Challenge the Evidence-Gathering Integrity
- **Result:** Partially refuted their evidentiary weight — retained as color; the composite-key argument stands on BooCode's own conventions and the live collision data.
- **Impact:** Evidence basis re-worded; nothing rests on those references alone.
### V9: Arena is the most exposed hardcode
- **Strategy:** Challenge the Evidence
- **Investigation:** `arena-model-call.ts:16-28`, `arena-analyzer.ts:90`.
- **Result:** Confirmed with elevated severity — raw fetch, no abstraction, lives in `apps/coder` with its own config type (cannot reuse the server's resolver as-is).
- **Impact:** Listed as separate coder-side scope (constraint 3).
### Adjustments Made
The recommendation survived but was rewritten: the implementation constraints (composite cache keys, opencode namespace translation, the full nine-site touch-list, permanent bare-ID fallback, hidden-not-deleted favorites) were folded into the Recommendation itself; O3's rejection was re-grounded in codebase facts; the favorites-persistence choice was reframed as an explicit sub-decision; unverifiable third-party code references were demoted to supporting color. Post-validation, the orchestrator additionally verified in `provider.ts` that the sidecar is the *default* route whenever `LLAMA_SIDECAR_URL` is set — adding constraint 4 (sidecar becomes a per-provider attribute; embedding needs none).
### Confidence Assessment
- **Confidence:** High — for the option choice. The validator rated the pre-adjustment synthesis Medium because the implementation scope was understated; that scope is now enumerated above, and no finding challenged the direction (its own words: "architecturally sound given the existing `llama-swap/` convention").
- **Remaining Risks:** (1) The opencode-side translation (V1) may also require host-side `~/.config/opencode/opencode.json` changes — outside this repo. (2) Stale favorite keys accumulate in `settings` with no cleanup mechanism by design (hide-don't-delete); acceptable for single-user but unbounded. (3) The exact `/running` JSON envelope and llama-swap peer aggregation details remain single-source — neither is load-bearing. (4) The five duplicate-ID models make any partial rollout (one call site migrated, another not) actively dangerous; the routing resolver should land as one batch.
## Sources
| ID | Source | Link / location | Retrieved | Trust class | Summary (one line) | Evidence status |
|---|---|---|---|---|---|---|
| A1 | llama-swap README | github.com/mostlygeek/llama-swap | 2026-06-10 | web | Proxy hot-swapping local inference servers; documents /v1/models, /running, /upstream, /health; v224 current | corroborated by A2, A3, A12 |
| A2 | llama-swap configuration.md | github.com/mostlygeek/llama-swap/blob/main/docs/configuration.md | 2026-06-10 | web | Model IDs are YAML keys; per-model name/description/aliases/metadata/ttl/useModelName; includeAliasesInList | corroborated by A3, A4 |
| A3 | llama-swap config-schema.json | github.com/mostlygeek/llama-swap/blob/main/config-schema.json | 2026-06-10 | web | Authoritative config schema; peers section; **no instance-identity field at any level** | corroborated by A2, A4 |
| A4 | llama-swap config.example.yaml | github.com/mostlygeek/llama-swap/blob/main/config.example.yaml | 2026-06-10 | web | Annotated example: aliases, useModelName, metadata, groups, peers | corroborated by A2, A3 |
| A5 | DeepWiki: llama-swap peers | deepwiki.com/mostlygeek/llama-swap/3.7-peer-configuration | 2026-06-10 | web | Duplicate peer model IDs route to first-lexicographic peer with only a warning | corroborated by A6 (collision); single source on aggregation detail |
| A6 | llama-swap issue #539 | github.com/mostlygeek/llama-swap/issues/539 | 2026-06-10 | web | Peer models surface as "peer-name: model-name" IDs; stale, unresolved | single source (caveated) |
| A7 | llama-swap issue #538 | github.com/mostlygeek/llama-swap/issues/538 | 2026-06-10 | web | Aliases hidden from /v1/models unless includeAliasesInList | corroborated by A2, A3 |
| A8 | llama.cpp server README | github.com/ggml-org/llama.cpp/blob/master/tools/server/README.md | 2026-06-10 | web | /v1/models id defaults to file path; --alias overrides; meta block fields | corroborated by A9, A10 |
| A9 | llama.cpp discussion #8547 | github.com/ggml-org/llama.cpp/discussions/8547 | 2026-06-10 | web | Confirms file-path default id; --override-kv doesn't change API id | corroborated by A8 |
| A10 | llama.cpp issue #17860 | github.com/ggml-org/llama.cpp/issues/17860 | 2026-06-10 | web | Only one --alias per llama-server today | corroborated by A8 |
| A11 | LM4eu/llama-swap Go pkg docs | pkg.go.dev/github.com/LM4eu/llama-swap/proxy | 2026-06-10 | web | Model struct {Id, Name, Description, State, Unlisted}; fork, not upstream | single source (caveated) |
| A12 | glukhov.org llama-swap quickstart | glukhov.org/llm-hosting/llama-swap/ | 2026-06-10 | web | /running state values; alias listing behavior | corroborated by A1, A2 |
| A13 | Vercel AI SDK provider management | ai-sdk.dev/docs/ai-sdk-core/provider-management | 2026-06-10 | web | Registry namespaces models as providerId:modelId; per-provider baseURL | corroborated by A14 |
| A14 | Vercel AI SDK OpenAI-compatible providers | ai-sdk.dev/providers/openai-compatible-providers | 2026-06-10 | web | createOpenAICompatible takes name+baseURL per provider; wire model ID unchanged | corroborated by A13 |
| A15 | LiteLLM OpenAI-compatible docs | docs.litellm.ai/docs/providers/openai_compatible | 2026-06-10 | web | Per-entry api_base; aliasing decouples client name from upstream name | corroborated by A16 |
| A16 | McDermott: Centralizing LLMs with LiteLLM | robert-mcdermott.medium.com/...9874563f3062 | 2026-06-10 | web | model_list with unique model_name per upstream resolves collisions | corroborated by A15 |
| A17 | DeepWiki: llama-swap groups | deepwiki.com/mostlygeek/llama-swap/3.4-groups-and-swapping-policies | 2026-06-10 | web | Groups/matrix control concurrency, not model IDs | corroborated by A2–A4 |
| A18 | llama-swap releases | github.com/mostlygeek/llama-swap/releases | 2026-06-10 | web | v219–v224 changed routing/perf, not /v1/models schema | single source (caveated) |
| A19 | Open WebUI discussion #3443 | github.com/open-webui/open-webui/discussions/3443 | 2026-06-10 | web | Pin-in-dropdown feature request; drag-reorder workaround breaks | corroborated by A21, A23 |
| A20 | Open WebUI discussion #5902 | github.com/open-webui/open-webui/discussions/5902 | 2026-06-10 | web | Filtering 70+ models; whitelist vs hide patterns | corroborated by A19 |
| A21 | Open WebUI env config reference | docs.openwebui.com/reference/env-configuration/ | 2026-06-10 | web | DEFAULT_PINNED_MODELS; settings.pinnedModels sorts pinned to top | corroborated by A22, A23 |
| A22 | Open WebUI database schema | docs.openwebui.com/reference/database-schema/ | 2026-06-10 | web | Pins live in user.settings JSON, keyed by **bare model ID** | corroborated by A21 |
| A23 | Open WebUI discussion #23656 | github.com/open-webui/open-webui/discussions/23656 | 2026-06-10 | web | Stale-pin cleanup permanently deletes pins during backend downtime | corroborated by A21, A53 |
| A24 | Open WebUI discussion #14854 | github.com/open-webui/open-webui/discussions/14854 | 2026-06-10 | web | Unpin buried in three-dot menu; discoverability failure | corroborated by A21 |
| A25 | Open WebUI issue #19183 | github.com/open-webui/open-webui/issues/19183 | 2026-06-10 | web | Local/External/All tabs + tag chips + Fuse.js search in selector | corroborated by A26 |
| A26 | Open WebUI discussion #21502 | github.com/open-webui/open-webui/discussions/21502 | 2026-06-10 | web | Flat select unusable at OpenRouter scale; optgroup/search proposals | corroborated by A25 |
| A27 | Open WebUI discussion #4495 | github.com/open-webui/open-webui/discussions/4495 | 2026-06-10 | web | Same-named models from two connections are indistinguishable (bare-ID failure) | corroborated by A25, A26 |
| A28 | LibreChat model specs docs | librechat.ai/docs/configuration/librechat_yaml/object_structure/model_specs | 2026-06-10 | web | Admin YAML `group` field creates named collapsible sections | corroborated by A29 |
| A29 | LibreChat v0.8.5 changelogs | librechat.ai/changelog/v0.8.5 | 2026-06-10 | web | Pin support for model specs added (PR #11219) | corroborated by A30; persistence detail single-source |
| A30 | LibreChat discussion #11044 | github.com/danny-avila/LibreChat/discussions/11044 | 2026-06-10 | web | Pinning exists; preset-active confusion | corroborated by A29 |
| A31 | DeepWiki: LibreChat DB models | deepwiki.com/danny-avila/LibreChat/7.1-database-models | 2026-06-10 | web | MongoDB/Mongoose; pinned-spec field name unconfirmed | single source (caveated) |
| A32 | Jan v0.6.9 changelog | jan.ai/changelog/2025-08-28-image-support | 2026-06-10 | web | "Favorite models" shipped; no UI detail | single source (caveated) |
| A33 | Jan manage-models docs | jan.ai/docs/desktop/manage-models | 2026-06-10 | web | Organized by source/quantization tier, not provider | corroborated by A32 |
| A34 | Jan data-folder docs | jan.ai/docs/desktop/data-folder | 2026-06-10 | web | Settings in local JSON files | corroborated by A32 |
| A35 | DeepWiki: Cherry Studio models | deepwiki.com/CherryHQ/cherry-studio/5.3-model-configuration-and-capabilities | 2026-06-10 | web | Provider-grouped UI; getModelUniqId composite {id, provider} | corroborated by A36 (see V8 caveat) |
| A36 | Cherry Studio ModelService.ts | github.com/CherryHQ/cherry-studio/.../ModelService.ts | 2026-06-10 | web | Composite-key implementation | corroborated by A35 (see V8 caveat) |
| A37 | Cherry Studio releases | github.com/CherryHQ/cherry-studio/releases | 2026-06-10 | web | No favorites changes v1.9.1–v1.9.11 | single source (caveated) |
| A38 | Chatbox issue #1540 | github.com/chatboxai/chatbox/issues/1540 | 2026-06-10 | web | Favorite-models proposal; not shipped | corroborated by A39 |
| A39 | Chatbox issue #2252 | github.com/chatboxai/chatbox/issues/2252 | 2026-06-10 | web | Two-section dropdown proposal (Preferred on top, star per row) | corroborated by A38 |
| A40 | DeepWiki: Chatbox local models | deepwiki.com/chatboxai/chatbox/4.6-local-model-integration | 2026-06-10 | web | settings.favoritedModels in localStorage | single source (caveated) |
| A41 | SillyTavern PR #5536 | github.com/SillyTavern/SillyTavern/pull/5536 | 2026-06-10 | web | Unified sort/group settings drawer across providers | corroborated by A42 |
| A42 | SillyTavern 1.13.5 notes | github.com/SillyTavern/SillyTavern/discussions/4660 | 2026-06-10 | web | Sort/group shipped in 1.13.5 | corroborated by A41 |
| A43 | SillyTavern connection profiles docs | docs.sillytavern.app/usage/core-concepts/connection-profiles/ | 2026-06-10 | web | Profiles = saved config snapshots, not per-model favorites | corroborated by A44 |
| A44 | SillyTavern issue #4565 | github.com/SillyTavern/SillyTavern/issues/4565 | 2026-06-10 | web | Better model selector request closed not-planned | corroborated by A43 |
| A45 | VS Code language models docs | code.visualstudio.com/docs/agent-customization/language-models | 2026-06-10 | web | Provider groups + hover pin + dedicated Pinned top section, stable order, model stays in group | corroborated by A46 |
| A46 | vscode-copilot-chat PR #1111 | github.com/microsoft/vscode-copilot-chat/pull/1111 | 2026-06-10 | web | BYOK models grouped into a category | corroborated by A45 (see V8 caveat) |
| A47 | Continue.dev model roles docs | docs.continue.dev/customize/model-roles/00-intro | 2026-06-10 | web | Role-based dropdowns; no grouping/favorites | corroborated by A48 |
| A48 | Continue.dev providers overview | docs.continue.dev/customize/model-providers/overview | 2026-06-10 | web | Picker reflects config.yaml order | corroborated by A47 |
| A49 | Open WebUI discussion #15449 | github.com/open-webui/open-webui/discussions/15449 | 2026-06-10 | web | Multi-model combination pinning request | single source (caveated) |
| A50 | BigAGI repo + changelog | github.com/enricoros/big-AGI | 2026-06-10 | web | No grouping/favorites evidence (negative finding) | single source (caveated) |
| A51 | LM Studio v0.4.0 changelog | lmstudio.ai/changelog/lmstudio-v0.4.0 | 2026-06-10 | web | Search/format filters; no favorites | corroborated by A52 |
| A52 | LM Studio v0.4.13 changelog | lmstudio.ai/changelog/lmstudio-v0.4.13 | 2026-06-10 | web | No picker changes | corroborated by A51 |
| A53 | Open WebUI issue #22578 | github.com/open-webui/open-webui/issues/22578 | 2026-06-10 | web | Model enable/disable state goes stale on catalog change | corroborated by A23 |
| A54 | embedding host live inventory | provided: `curl http://100.90.172.55:8411/v1/models` + `/running` | 2026-06-10 | provided | 39 models incl. deepseek-r1-qwen3-8b and 5 IDs duplicated on Sam-desktop; /running empty | corroborated by A56 (config matches) |
| A55 | Sam-desktop live inventory | provided: `curl http://100.101.41.16:8401/v1/models` + `/running` | 2026-06-10 | provided | 21 models; qwen3.6-35b-a3b-mxfp4 absent; nemotron-omni running via D:\llama-server | corroborated by A57 |
| A56 | embedding host SSH inventory | provided: `ssh samkintop@100.90.172.55` (~/llama-swap/config.yaml, ~/llama.cpp, ~/models) | 2026-06-10 | provided | P104-tuned llama-swap config (ttl 1800, per-model llama-server cmds); llama.cpp source build | corroborated by A54 |
| A57 | Sam-desktop SSH inventory | provided: `ssh samki@100.101.41.16` (dir D:\) | 2026-06-10 | provided | D:\llama-server (b9591 CUDA), D:\llama-swap (v224), D:\models, D:\llama-sidecar | corroborated by A55 |
| A58 | Current env config | `.env`, `apps/coder/.env.host` | n/a | codebase | LLAMA_SWAP_URL=http://100.101.41.16:8401; DEFAULT_MODEL=qwen3.6-35b-a3b-mxfp4 (both apps) | corroborated (read directly) |
| A59 | Models route | `apps/server/src/routes/models.ts:14-56` | n/a | codebase | GET /api/models fetches only LLAMA_SWAP_URL (+DeepSeek); flat untagged list | corroborated (read directly) |
| A60 | Inference provider/routing | `apps/server/src/services/inference/provider.ts:1-163` | n/a | codebase | resolveRoute: deepseek- prefix → cloud; LLAMA_SIDECAR_URL set → sidecar default for everything; else single swap; resolveModelEndpoint hardcodes LLAMA_SWAP_URL | corroborated (read directly) |
| A61 | BooChat model picker | `apps/web/src/components/ModelPicker.tsx:14-133` | n/a | codebase | Flat lazy list, no grouping/search/favorites; PATCHes session.model | corroborated (explorer + validator) |
| A62 | Provider snapshot contracts | `packages/contracts/src/provider-snapshot.ts` | n/a | codebase | ProviderModel has no provider field; identity implicit in parent entry name | corroborated |
| A63 | Coder provider snapshot | `apps/coder/src/services/provider-snapshot.ts:48-70,256-310` | n/a | codebase | Prefixes single llama-swap list with `llama-swap/`; merges into boocode entry | corroborated |
| A64 | Coder dispatcher prefixing | `apps/coder/src/services/dispatcher.ts:1006-1011` | n/a | codebase | Bare IDs get `llama-swap/`; slash-containing IDs pass through unchanged | corroborated (validator-verified) |
| A65 | Model/settings persistence | `apps/server/src/schema.sql:20,217-222,249`; `routes/settings.ts` | n/a | codebase | sessions.model NOT NULL, chats.model nullable, settings KV JSONB seeded with bare default_model | corroborated |
| A66 | Model context service | `apps/server/src/services/model-context.ts:9,26-29,40-49,77-100` | n/a | codebase | No-TTL positive cache keyed by raw model string; deepseek- guard returns static 131k; /upstream URL from single config | corroborated (validator-verified) |
| A67 | Compaction LLM calls | `apps/server/src/services/compaction.ts:351-357,531` | n/a | codebase | Summarization via resolveModelEndpoint → always LLAMA_SWAP_URL | corroborated (validator-verified) |
| A68 | Task model service | `apps/server/src/services/task-model.ts:59-68` | n/a | codebase | FAST_MODEL fallback chain against single endpoint (TASK_MODEL_URL escape hatch) | corroborated |
| A69 | Arena model calls | `apps/coder/src/services/arena-model-call.ts:16-28`; `arena-analyzer.ts:90` | n/a | codebase | Raw fetch to LLAMA_SWAP_URL, no routing abstraction | corroborated (validator-verified) |
| A70 | Coder composer prefs | `apps/web/src/components/AgentComposerBar.tsx:33-52,118-196` | n/a | codebase | CompactPicker flat lists; prefs in localStorage `boocode.coder.agent-prefs` | corroborated |
| A71 | Model display naming | `apps/web/src/lib/modelName.ts:6-32`; `MessageBubble.tsx:140-189` | n/a | codebase | Display chips already strip `llama-swap/`-style prefixes | corroborated |
| A72 | Coder provider config file | `data/coder-providers.example.json` | n/a | codebase | Per-provider overrides exist; no baseUrl field — second endpoint unregistrable today | corroborated |
| A73 | Openspec conventions | `openspec/README.md` | n/a | codebase | changes/<slug>/{proposal,tasks,design}.md; lowercase-hyphenated slugs | corroborated (read directly) |
| A74 | Sidecar architecture notes | `apps/server/CLAUDE.md` (sidecar sections); `/opt/forks/llama-sidecar/` | n/a | codebase | llama-sidecar = Go per-agent llama-server pool on Sam-desktop; X-Agent-Flags header; boot guard ties llama_extra_args to LLAMA_SIDECAR_URL | corroborated by A60 |
### A54/A55: Live host inventories — recommendation-bearing
- **Link / location:** provided: orchestrator-run `curl` against `http://100.90.172.55:8411` and `http://100.101.41.16:8401` (`/v1/models`, `/running`)
- **Retrieved:** 2026-06-10
- **Trust class:** provided (operator-owned infrastructure, independently re-checkable with the same commands)
- **Summary:** embedding serves 39 mostly small models; Sam-desktop serves 21 mostly large models. Five IDs (`granite-4.1-8b`, `negentropy-4.7-9b`, `qwen3.5-9b`, `qwen3.5-9b-deepseek-v4`, `qwopus3.5-9b-coder`) appear on both hosts, which makes composite keying mandatory rather than stylistic. The configured `DEFAULT_MODEL` (`qwen3.6-35b-a3b-mxfp4`) is absent from Sam-desktop's live list, which shows that model IDs churn. embedding's `deepseek-r1-qwen3-8b` collides with the `deepseek-` cloud-routing heuristic. Neither host populates llama-swap's optional `name`/`description` fields, so the UI must derive labels from IDs (as `formatModelLabel` already does).
- **Evidence status:** corroborated by A56/A57 (SSH-level configs match the served lists).
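
To make the composite-keying point concrete, here is a minimal sketch of a "provider/model" key. The type and the helper names (`ModelRef`, `formatModelRef`, `parseModelRef`) are hypothetical and do not exist in the codebase; only the provider names and model IDs come from the inventories above.

```typescript
// Hypothetical composite key "<provider>/<model>"; all names are illustrative.
type ModelRef = { provider: string; model: string };

// Provider names as used in this document.
const KNOWN_PROVIDERS: ReadonlySet<string> = new Set(["Sam-desktop", "embedding"]);

function formatModelRef(ref: ModelRef): string {
  return `${ref.provider}/${ref.model}`;
}

// Splits on the FIRST slash so a model ID that itself contains "/" survives intact.
// Returns null for bare IDs and unknown providers; the caller picks a fallback.
function parseModelRef(key: string): ModelRef | null {
  const slash = key.indexOf("/");
  if (slash <= 0 || slash === key.length - 1) return null;
  const provider = key.slice(0, slash);
  if (!KNOWN_PROVIDERS.has(provider)) return null;
  return { provider, model: key.slice(slash + 1) };
}

// The same ID on two hosts yields two distinct keys.
const samKey = formatModelRef({ provider: "Sam-desktop", model: "qwen3.5-9b" });
const embKey = formatModelRef({ provider: "embedding", model: "qwen3.5-9b" });
console.log(samKey !== embKey); // true
```

Under this scheme `embedding/deepseek-r1-qwen3-8b` carries its provider explicitly, so a router can consult the prefix before falling back to any `deepseek-` name heuristic.
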
### A60: `provider.ts` routing — recommendation-bearing
- **Link / location:** `apps/server/src/services/inference/provider.ts:90-157`
- **Retrieved:** n/a
- **Trust class:** codebase (current-state anchor)
- **Summary:** The single point where all three routes (deepseek/sidecar/swap) resolve. Establishes that (a) BooCode already builds per-baseURL AI-SDK providers from a cache map — O1 slots into this with minimal new machinery; (b) the sidecar is the default route for everything when configured, which forces constraint 4; (c) `resolveModelEndpoint` is a second, parallel resolution path (compaction/task-model) that must change in lockstep.
- **Evidence status:** corroborated (read directly by orchestrator and validator).
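
One way the two resolution paths could share a single lookup is sketched below. This is an assumption-laden illustration, not the code in `provider.ts`: `Provider`, `Route`, and `resolveRouteSketch` are hypothetical names, the sidecar route is deliberately left out, and treating the first registry entry as the default for bare IDs is an assumption.

```typescript
// Hypothetical registry entry and route shapes; none of these exist in the codebase.
type Provider = { name: string; baseUrl: string };
type Route =
  | { kind: "swap"; provider: string; baseUrl: string; model: string }
  | { kind: "cloud"; model: string };

// Names and addresses taken from this document.
const PROVIDERS: Provider[] = [
  { name: "Sam-desktop", baseUrl: "http://100.101.41.16:8401" },
  { name: "embedding", baseUrl: "http://100.90.172.55:8411" },
];

function resolveRouteSketch(key: string, providers: Provider[] = PROVIDERS): Route {
  // 1. An explicit, known provider prefix wins; the bare model ID goes on the wire.
  const slash = key.indexOf("/");
  if (slash > 0) {
    const hit = providers.find((p) => p.name === key.slice(0, slash));
    if (hit) {
      return { kind: "swap", provider: hit.name, baseUrl: hit.baseUrl, model: key.slice(slash + 1) };
    }
  }
  // 2. Only bare IDs reach the legacy name heuristic.
  if (key.startsWith("deepseek-")) return { kind: "cloud", model: key };
  // 3. Assumption: remaining bare IDs fall back to the first registered provider.
  const fallback = providers[0];
  if (!fallback) throw new Error("no providers registered");
  return { kind: "swap", provider: fallback.name, baseUrl: fallback.baseUrl, model: key };
}

console.log(resolveRouteSketch("embedding/deepseek-r1-qwen3-8b").kind); // "swap"
```

If both the chat route and the compaction/task-model path called one function of this shape, they could not drift apart. Legacy `llama-swap/`-prefixed IDs are not handled here and would need their own migration rule.
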
### A13/A14: AI SDK provider registry pattern — recommendation-bearing
- **Link / location:** https://ai-sdk.dev/docs/ai-sdk-core/provider-management ; https://ai-sdk.dev/providers/openai-compatible-providers
- **Retrieved:** 2026-06-10
- **Trust class:** web
- **Summary:** The library BooCode already uses prescribes exactly O1's shape: one named `createOpenAICompatible` instance per provider, registry-level `provider:model` namespacing, bare model IDs on the wire. Adopting O1 is convergence with the upstream idiom rather than a custom scheme.
- **Evidence status:** corroborated (two official doc pages, consistent with LiteLLM's independent design A15/A16).
### A45: VS Code model picker docs — recommendation-bearing (UX)
- **Link / location:** https://code.visualstudio.com/docs/agent-customization/language-models
- **Retrieved:** 2026-06-10
- **Trust class:** web
- **Summary:** Documents the shipped pattern this feature's dropdown adapts: provider-grouped list, hover-revealed pin, dedicated Pinned top section in stable insertion order, pinned models remaining in their provider group.
- **Evidence status:** corroborated by A46; code-level detail treated as color per V8.
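
The section layout described above can be sketched as a pure function that turns per-provider catalogs and a stored favorites list into dropdown sections. `Row`, `Section`, `Catalog`, and `buildSections` are hypothetical names for illustration, and the "provider/model" key format is the one proposed in this document rather than an existing contract.

```typescript
// Hypothetical view-model types for the dropdown; not part of the codebase.
type Row = { key: string; label: string; favorited: boolean };
type Section = { title: string; rows: Row[] };
type Catalog = { provider: string; models: string[] };

// favorites: composite "provider/model" keys in the order the user starred them.
function buildSections(catalogs: Catalog[], favorites: string[]): Section[] {
  const fav = new Set(favorites);
  const live = new Set<string>();

  // One section per provider, in the order given; starred rows stay in place.
  const providerSections: Section[] = catalogs.map(({ provider, models }) => ({
    title: provider,
    rows: models.map((model) => {
      const key = `${provider}/${model}`;
      live.add(key);
      return { key, label: model, favorited: fav.has(key) };
    }),
  }));

  // Favorites section first, in stable insertion order, showing only live models.
  // The label keeps the provider so same-named models remain distinguishable.
  const favoriteRows: Row[] = favorites
    .filter((key) => live.has(key))
    .map((key) => ({ key, label: key, favorited: true }));

  return [{ title: "Favorites", rows: favoriteRows }, ...providerSections];
}

const sections = buildSections(
  [
    { provider: "Sam-desktop", models: ["qwen3.5-9b"] },
    { provider: "embedding", models: ["qwen3.5-9b", "granite-4.1-8b"] },
  ],
  ["embedding/qwen3.5-9b"],
);
console.log(sections.map((s) => s.title)); // ["Favorites", "Sam-desktop", "embedding"]
```
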
### A23/A27: Open WebUI pitfalls — recommendation-bearing (counter-evidence)
- **Link / location:** https://github.com/open-webui/open-webui/discussions/23656 ; https://github.com/open-webui/open-webui/discussions/4495
- **Retrieved:** 2026-06-10
- **Trust class:** web
- **Summary:** The two documented failure modes the design must avoid: bare-model-ID favorites becoming ambiguous across connections, and stale-favorite cleanup permanently destroying user preferences during transient backend downtime.
- **Evidence status:** corroborated by A21/A22/A53 (the surrounding docs and a second stale-state issue).
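
A minimal sketch of the "hide, never delete" rule follows, assuming favorites are stored as composite "provider/model" keys and that an unreachable host is simply absent from the live map. `partitionFavorites` is a hypothetical helper, not existing code.

```typescript
// live: provider name -> model IDs currently served.
// Assumption: a provider missing from the map is offline or unreachable.
function partitionFavorites(
  stored: readonly string[],
  live: ReadonlyMap<string, ReadonlySet<string>>,
): { visible: string[]; hidden: string[] } {
  const visible: string[] = [];
  const hidden: string[] = [];
  for (const key of stored) {
    const slash = key.indexOf("/");
    const models = slash > 0 ? live.get(key.slice(0, slash)) : undefined;
    const isLive = models?.has(key.slice(slash + 1)) ?? false;
    // Hidden keys are only withheld from rendering; `stored` is never modified.
    (isLive ? visible : hidden).push(key);
  }
  return { visible, hidden };
}

const storedFavorites = ["Sam-desktop/qwen3.5-9b", "embedding/granite-4.1-8b"];

// embedding is down: its favorite is hidden, but it is still in storedFavorites.
const whileDown = partitionFavorites(
  storedFavorites,
  new Map([["Sam-desktop", new Set(["qwen3.5-9b"])]]),
);
console.log(whileDown.hidden); // ["embedding/granite-4.1-8b"]
```

Because the function only reads the stored list, a favorite reappears as soon as its host and model are back in the live map, which avoids the destructive cleanup described in the summary.
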