boocode/docs/research/2026-06-10-multi-llama-swap-providers-model-favorites.md

Research: Integrating two named llama-swap providers ("Sam-desktop", "embedding") with provider-grouped model dropdowns and per-model favorites in BooChat and BooCoder

Question: BooCode currently talks to exactly one llama-swap endpoint. How should a second named provider ("embedding", 100.90.172.55:8411) be added alongside the renamed existing one ("Sam-desktop", 100.101.41.16:8401), integrated into both BooChat and BooCoder, with the model dropdown grouped per provider and a favorite button per model (Favorites section listed first)?

Evidence mode: strict (default — every recommendation-bearing claim is corroborated or explicitly caveated).

Summary

Both machines can be added to BooCode as named providers, and the right way is to give BooCode a small provider registry (a name and base URL per machine) and to store selected models as a "provider/model" pair instead of a bare name. Bare names cannot work here: five models exist on both machines under identical names today, and the configured default model has already drifted out of the live list once — so favorites and routing keyed by name alone would be ambiguous and fragile. The dropdown should follow the pattern proven in VS Code's model picker: a Favorites section on top, then one section per provider (Sam-desktop first, then embedding), a star on every row, favorited models staying visible in their provider section, and favorites that are hidden — never deleted — when a machine is offline.

The adversarial validation pass confirmed the direction but showed the change is wider than the obvious spots: chat compaction, context-window lookup, arena battles, the coder's opencode dispatch, and the sidecar routing default all silently assume a single endpoint and need the same provider-resolution change. Two extra hazards were found in the live data: a model on the embedding host literally named deepseek-r1-qwen3-8b trips BooCode's "starts with deepseek-" cloud-routing heuristic, and the always-on sidecar default route would swallow embedding-bound requests. The embedding host does not need its own llama-sidecar — but sidecar routing must become a Sam-desktop-only attribute.

Well-corroborated: live data from both hosts, direct code evidence, and multiple independent web sources agree; validation expanded the implementation scope but did not overturn the choice.

  • Confidence: High

Research Results

What exists today (codebase — current-state anchor)

BooCode's entire inference surface assumes one llama-swap endpoint, configured as LLAMA_SWAP_URL=http://100.101.41.16:8401 with DEFAULT_MODEL=qwen3.6-35b-a3b-mxfp4 (A58). The single-endpoint assumption is hard-coded in at least nine places:

  1. GET /api/models fetches only {LLAMA_SWAP_URL}/v1/models (plus DeepSeek cloud when DEEPSEEK_API_KEY is set) and returns a flat ModelInfo[] with no provider tag (A59).
  2. upstreamModel() routes by string heuristics: model IDs starting deepseek- go to the DeepSeek cloud API; agents with llama_extra_args go to the sidecar; and when LLAMA_SIDECAR_URL is configured at all — which it is in docker-compose — every remaining request routes through the sidecar by default, falling back to llama-swap only when no sidecar is configured (A60). The provider for each base URL is a cached AI-SDK createOpenAICompatible instance.
  3. resolveModelEndpoint() (used by compaction and task-model for non-streaming calls) returns LLAMA_SWAP_URL for every non-DeepSeek model (A60, A67).
  4. model-context.ts fetches {LLAMA_SWAP_URL}/upstream/<model>/props for context windows, with a no-TTL positive cache keyed by the raw model string, and a deepseek- prefix guard that short-circuits to a static 131,072 context without calling any upstream (A66).
  5. task-model.ts (auto-naming, summaries) falls back through FAST_MODEL → chat model → DEFAULT_MODEL against the single URL (A68).
  6. Arena battles call {LLAMA_SWAP_URL}/v1/chat/completions directly with no routing abstraction at all (A69).
  7. The coder's provider snapshot fetches the single llama-swap list and prefixes every ID with llama-swap/ (A63); its dispatcher prefixes any bare (slash-less) model ID with llama-swap/ before opencode dispatch, and passes any ID already containing / through unchanged (A64).
  8. Model IDs persist as bare strings: sessions.model TEXT NOT NULL, chats.model TEXT nullable, validated only as a 1200-char string (A65).
  9. The BooChat dropdown (ModelPicker.tsx) and the BooCoder picker (CompactPicker inside AgentComposerBar.tsx) are flat lists with no grouping, search, or favorites; the coder picker persists per-provider preferences in browser localStorage, while BooChat model choice is server-persisted on the session row (A61, A70). Display code already strips llama-swap/-style prefixes when rendering model chips (A71). No favorites/pinning mechanism exists anywhere; the settings table is a key-value JSONB store currently holding default_model and theme keys (A65).

The coder's runtime provider config (data/coder-providers.json) has no baseUrl field — there is no way to register a second llama-swap endpoint today (A72).

What the two hosts actually serve (provided material, retrieved live 2026-06-10)

  • embedding (100.90.172.55:8411, Linux, P104-100 8GB Pascal GPU): 39 models, skewed small — gemma-3-270m through gemma-4-12b, the LFM2.5 family, granite-4.1-3b/8b, qwen3.5-0.8b/4b/9b, qwopus3.5 family, deepseek-r1-qwen3-8b, a reranker, extraction models (A54). Its llama-swap config is hand-tuned per model (flash-attn/KV-quant choices for Pascal, ttl 1800), with llama.cpp built from source on the box (A56).
  • Sam-desktop (100.101.41.16:8401, Windows): 21 models, skewed large — qwen3.6-35b-a3b/27b, qwopus3.6 family, granite-4.1-30b, mellum2-12b, nemotron-cascade-2-30b-a3b, north-mini-code, etc. Served by D:\llama-server (llama.cpp CUDA build b9591) behind D:\llama-swap (llama-swap v224), models in D:\models; a D:\llama-sidecar directory backs the existing sidecar at :8402 (A55, A57).

Three load-bearing facts fall out of the live inventories:

  • Five model IDs exist on both hosts: granite-4.1-8b, negentropy-4.7-9b, qwen3.5-9b, qwen3.5-9b-deepseek-v4, qwopus3.5-9b-coder (A54, A55). Bare-ID favorites or routing are therefore ambiguous from day one.
  • The configured DEFAULT_MODEL=qwen3.6-35b-a3b-mxfp4 is not in Sam-desktop's current model list (closest: qwen3.6-35b-a3b) — model IDs already churn in practice, so favorites must tolerate stale references (A55, A58).
  • deepseek-r1-qwen3-8b on the embedding host collides with BooCode's deepseek- heuristics: with DEEPSEEK_API_KEY set it would be routed to the DeepSeek cloud API, and the context-window guard returns a fake 131k context on the name prefix alone regardless (A54, A60, A66).
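The first bullet's collision can be checked mechanically by intersecting the two hosts' /v1/models ID lists. The following is a minimal sketch, assuming both lists have already been fetched. The helper name duplicateIds is hypothetical, and the sample arrays are abbreviated to the five shared IDs plus one host-only ID from each inventory.

```typescript
// Hypothetical helper: returns the IDs present in both lists,
// preserving the order of the first list.
function duplicateIds(a: readonly string[], b: readonly string[]): string[] {
  const inB = new Set(b);
  return a.filter((id) => inB.has(id));
}

// Abbreviated samples; the live inventories hold 21 and 39 entries.
const samDesktopIds: string[] = [
  "qwen3.6-35b-a3b", // Sam-desktop only
  "granite-4.1-8b",
  "negentropy-4.7-9b",
  "qwen3.5-9b",
  "qwen3.5-9b-deepseek-v4",
  "qwopus3.5-9b-coder",
];
const embeddingIds: string[] = [
  "deepseek-r1-qwen3-8b", // embedding only
  "granite-4.1-8b",
  "negentropy-4.7-9b",
  "qwen3.5-9b",
  "qwen3.5-9b-deepseek-v4",
  "qwopus3.5-9b-coder",
];

// Every ID in this result is ambiguous when stored as a bare string.
const shared = duplicateIds(samDesktopIds, embeddingIds);
```

Any ID in shared would need a provider qualifier before it could be used as a favorite key or a routing key.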

How llama-swap identifies models (web, corroborated)

llama-swap model IDs are exactly the YAML keys in its config.yaml; /v1/models can additionally carry optional per-model name, description, and arbitrary metadata from config, fields that neither of Sam's hosts currently populates (A1-A4, A54, A55). llama-swap has no instance-identity field: two instances are distinguishable only by host:port (A3). /running reports load state per model (A1, A12). Peer federation exists (one llama-swap aggregating another), but peer-served models surface as "peer-name: model-name" IDs [single-source: A6] and same-ID collisions resolve silently to the lexicographically-first peer (A5). Decisively, and without relying on any web source, BooCode would still see one flat list with no native grouping, while the two hosts' uptime would become coupled. Standalone llama.cpp llama-server defaults its /v1/models ID to the model file path unless --alias is set (A8, A9), which is relevant only if a host ever bypasses llama-swap.

How mature clients solve exactly this (web, corroborated)

Every major OpenAI-compatible client library handles multiple same-protocol providers with separate named provider instances, each with its own baseURL, namespaced in the client's registry as provider:model / provider/model — the model ID actually sent on the wire to each backend stays the bare upstream ID (Vercel AI SDK provider registry: A13, A14; LiteLLM model_list: A15, A16). BooCode already uses the AI SDK's createOpenAICompatible (A60) and the coder already namespaces with a llama-swap/ prefix (A63, A64), so this pattern is an extension of existing conventions, not a new idiom.

Dropdown + favorites prior art (web)

The closest shipped implementation of the requested UX is VS Code's model picker: models grouped by provider, a pin icon revealed on hover, pinned models lifted into a dedicated top section in stable insertion order, while remaining visible in their provider group (display copy, not move) (A45, A46). Cherry Studio independently demonstrates the key-collision lesson: its model identity is the composite {id, provider} precisely so two providers serving the same model name don't collide (A35, A36) [third-party code reference; unverifiable from here; supporting color only, see V8]. Open WebUI documents the two pitfalls to avoid: favorites keyed by bare model ID become ambiguous the moment two connections serve the same name (A27), and its stale-pin cleanup permanently deletes pins when a backend is temporarily down (A23). The correct behavior is to hide unavailable favorites and restore them when the host returns. LibreChat groups via admin-configured YAML and added pinning in v0.8.5 (A28, A29). Jan, Chatbox, SillyTavern, Continue.dev, BigAGI, and LM Studio offer weaker or no equivalents (A32-A34, A38-A44, A47-A52); none contradicts the VS Code pattern.
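The picker behavior described above (Favorites on top in stable insertion order, models still listed under their provider, unavailable favorites hidden rather than deleted) can be sketched as a pure function. This is an illustrative sketch only: ModelRow, Section, and buildSections are hypothetical names, and hiding empty provider sections is an assumption rather than something the prior art prescribes.

```typescript
interface ModelRow { provider: string; id: string }
interface Section { title: string; rows: ModelRow[] }

// Composite key, "<provider>/<model>".
const rowKey = (m: ModelRow): string => `${m.provider}/${m.id}`;

// Hypothetical builder for the dropdown's sections.
function buildSections(
  providerOrder: readonly string[],
  available: readonly ModelRow[],
  favorites: readonly string[], // composite keys in insertion order
): Section[] {
  const byKey = new Map<string, ModelRow>();
  for (const m of available) byKey.set(rowKey(m), m);

  // Favorites whose host is offline are skipped here, but the stored
  // favorites array is never modified (hide, don't delete).
  const favRows: ModelRow[] = [];
  for (const f of favorites) {
    const row = byKey.get(f);
    if (row !== undefined) favRows.push(row);
  }

  const sections: Section[] = [];
  if (favRows.length > 0) sections.push({ title: "Favorites", rows: favRows });
  for (const p of providerOrder) {
    // Favorited models stay in their provider group (display copy, not move).
    const rows = available.filter((m) => m.provider === p);
    if (rows.length > 0) sections.push({ title: p, rows });
  }
  return sections;
}

// Example: the embedding host is offline, so only Sam-desktop rows exist.
const availableNow: ModelRow[] = [
  { provider: "sam-desktop", id: "qwen3.6-35b-a3b" },
  { provider: "sam-desktop", id: "qwen3.5-9b" },
];
const storedFavorites = ["embedding/gemma-4-12b", "sam-desktop/qwen3.5-9b"];
const sections = buildSections(["sam-desktop", "embedding"], availableNow, storedFavorites);
```

In this example the embedding favorite is simply absent from the rendered Favorites section; it reappears once the host's models are back in the available list.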

Does embedding need a llama-sidecar? No.

The llama-sidecar is a Go daemon on Sam-desktop providing a per-agent llama-server process pool so agents can carry llama_extra_args (cache quant, spec decoding, slot save) injected via an X-Agent-Flags header (A60, A74). The embedding host needs none of that: its per-model tuning is baked directly into its llama-swap config.yaml (A56), and no per-agent flag injection applies to it. However, resolveRoute currently makes the sidecar the default route for all non-DeepSeek inference whenever LLAMA_SIDECAR_URL is set (A60) — so under the multi-provider design, sidecar routing must become an attribute of the Sam-desktop provider entry (e.g. optional sidecarUrl per provider), not a global default; otherwise requests for embedding-hosted models would be sent to a sidecar that only manages Sam-desktop processes.
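Treating the sidecar as a per-provider attribute could look like the sketch below. It is a simplified illustration: routeBaseUrl is a hypothetical name, the sidecar address is assumed to be Sam-desktop's host on the :8402 port mentioned earlier, and any policy for agents that carry llama_extra_args against a provider without a sidecar is left out.

```typescript
interface Provider {
  name: string;
  baseUrl: string;
  sidecarUrl?: string; // set only for providers that run a llama-sidecar
}

// Assumed addresses; the sidecar host is an inference from the ":8402" note.
const samDesktop: Provider = {
  name: "sam-desktop",
  baseUrl: "http://100.101.41.16:8401",
  sidecarUrl: "http://100.101.41.16:8402",
};
const embedding: Provider = {
  name: "embedding",
  baseUrl: "http://100.90.172.55:8411",
};

// Hypothetical resolver: the sidecar is used only when the selected
// provider declares one, never as a global default.
function routeBaseUrl(provider: Provider): string {
  return provider.sidecarUrl ?? provider.baseUrl;
}
```

With this shape, requests for embedding-hosted models go straight to that host's llama-swap, while Sam-desktop keeps its current sidecar-first behavior.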

Openspec conventions for the follow-up plan (codebase)

Per-batch docs land in openspec/changes/<slug>/ with proposal.md (why + scope), tasks.md (numbered/checkbox action list), and optional design.md (architecture/data-model decisions); slugs are lowercase-hyphenated from the batch title (A73). This feature is a natural three-file batch — the provider registry + routing is design-heavy, so design.md is warranted.

Options to Consider

O1: Named provider registry with composite model IDs (<provider>/<model>)

  • What it is: BooCode config gains a provider list ({ name, baseUrl, sidecarUrl? } per entry — "sam-desktop" and "embedding"). Models are stored and selected as sam-desktop/qwen3.6-35b-a3b, embedding/gemma-4-12b. /api/models returns provider-tagged groups; one routing resolver (provider prefix → baseURL, bare wire ID) replaces every LLAMA_SWAP_URL hardcode; bare legacy IDs fall back to the default provider (sam-desktop). Favorites, caches, and attribution all key on the composite ID.
  • Trade-offs: Touches every call site that assumes one endpoint (the nine sites above — see Validation for the full list); needs a deliberate legacy-bare-ID fallback for existing session/chat rows and the seeded default_model; the coder's opencode namespace (llama-swap/) needs an explicit translation rule. In exchange: no DB schema change for model columns, no llama-swap config changes on either host, matches the AI-SDK idiom BooCode already uses and the coder's existing prefix convention, and makes the deepseek- heuristic unnecessary for prefixed IDs.
  • Rests on: (A13, A14, A15, A16) for the pattern; (A54, A55) for the collision necessity; (A60, A63, A64) for fit with existing code.
  • Evidence status: corroborated.
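A minimal sketch of O1's single routing resolver follows, assuming the registry shape given above ({ name, baseUrl, sidecarUrl? }). The names ProviderEntry, ResolvedModel, and resolveModel are hypothetical, and treating an unrecognized prefix as part of a legacy bare ID is an assumption; a stricter design could reject it instead.

```typescript
interface ProviderEntry { name: string; baseUrl: string; sidecarUrl?: string }

interface ResolvedModel {
  provider: ProviderEntry;
  wireId: string;      // bare ID sent to the upstream host
  compositeId: string; // "<provider>/<model>", the key for favorites and caches
}

const PROVIDERS: ProviderEntry[] = [
  { name: "sam-desktop", baseUrl: "http://100.101.41.16:8401" },
  { name: "embedding", baseUrl: "http://100.90.172.55:8411" },
];
const DEFAULT_PROVIDER = "sam-desktop";

function resolveModel(id: string, providers: ProviderEntry[] = PROVIDERS): ResolvedModel {
  const slash = id.indexOf("/");
  if (slash > 0) {
    const prefix = id.slice(0, slash);
    const provider = providers.find((p) => p.name === prefix);
    if (provider !== undefined) {
      return { provider, wireId: id.slice(slash + 1), compositeId: id };
    }
  }
  // Legacy bare ID (or unrecognized prefix): fall back to the default provider.
  const fallback = providers.find((p) => p.name === DEFAULT_PROVIDER);
  if (fallback === undefined) throw new Error("default provider is not registered");
  return { provider: fallback, wireId: id, compositeId: `${DEFAULT_PROVIDER}/${id}` };
}
```

For example, resolveModel("embedding/gemma-4-12b") yields the embedding base URL with wire ID gemma-4-12b, while a bare qwen3.5-9b from an existing session row resolves to sam-desktop.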

O2: Bare model IDs plus a separate provider field everywhere

  • What it is: Keep model strings as-is and add a provider column/field through sessions, chats, WS frames, ModelInfo, ProviderModel, and every read path.
  • Trade-offs: Avoids string munging and display-time prefix stripping, but is strictly more invasive: two schema migrations, a WsFrameSchema change rebuilt through @boocode/contracts, and every consumer updated in lockstep — while favorites still need a composite key anyway. Higher blast radius for the same outcome.
  • Rests on: (A65, A62) for the touched surfaces.
  • Evidence status: corroborated (codebase-derived).

O3: llama-swap peer federation (Sam-desktop aggregates embedding as a peer)

  • What it is: Configure embedding as a peers: entry in Sam-desktop's llama-swap; BooCode keeps a single endpoint.
  • Trade-offs: Rejected on codebase-observable grounds: BooCode would still see one flat list (no native named grouping — the feature's whole point), the two hosts' availability becomes coupled, and it requires operational changes on a host outside this repo. Additionally, peer-served model IDs surface as "peer-name: model-name" [single-source: A6] with silent first-lexicographic collision resolution (A5).
  • Rests on: (A5, A6) plus codebase observation (A59, A61).
  • Evidence status: rejection corroborated by codebase facts; the peer ID-format detail is single-source (caveated) and not load-bearing.

O4: External aggregator proxy (LiteLLM) in front of both hosts

  • What it is: A LiteLLM proxy with a model_list mapping unique aliases to each host; BooCode keeps one endpoint.
  • Trade-offs: Proven pattern (A15, A16) but adds a third always-on service with a manually-maintained catalog (no auto-discovery from /v1/models), an extra network hop, and still no provider grouping signal unless encoded in alias naming conventions. Overweight for a single-user self-hosted system.
  • Rests on: (A15, A16).
  • Evidence status: corroborated.

Sub-decision — favorites persistence

  • O5a: Server-side, in the settings table (e.g. favorite_models: string[] of composite IDs). Survives browsers/devices — and multi-device use is real here (the repo's own docs describe side-by-side iPhone debugging), matching how BooChat model choice is already server-persisted on the session row. Costs a PATCH per star toggle and needs a "hide stale, never delete" rule (A23) plus acceptance that stale composite keys linger until manually unfavorited.
  • O5b: Browser localStorage, extending the coder's existing boocode.coder.agent-prefs pattern (A70). Zero API surface, but per-device, per-browser, and split across the two UIs.
  • Evidence status: both corroborated; the cross-device argument for O5a is codebase-derived inference from documented usage, not a measured requirement.
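For O5a, the star toggle reduces to a small pure function over the stored favorite_models array. This sketch assumes the array holds composite IDs and is written back with the PATCH mentioned above; toggleFavorite is a hypothetical name.

```typescript
// Hypothetical toggle: removes the ID if present, otherwise appends it.
// Appending keeps the stable insertion order used by the Favorites section.
function toggleFavorite(favorites: readonly string[], compositeId: string): string[] {
  return favorites.indexOf(compositeId) !== -1
    ? favorites.filter((f) => f !== compositeId)
    : [...favorites, compositeId];
}

const before = ["sam-desktop/qwen3.5-9b"];
const starred = toggleFavorite(before, "embedding/qwen3.5-9b");
const unstarred = toggleFavorite(starred, "sam-desktop/qwen3.5-9b");
```

Because the keys are composite, starring embedding's qwen3.5-9b leaves the Sam-desktop entry of the same name untouched. Only an explicit toggle removes an entry, which matches the "hide stale, never delete" rule.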

Recommendation

  • Recommendation: O1 — named provider registry with <provider>/<model> composite IDs — combined with the VS Code-pattern dropdown (Favorites on top in stable insertion order, then Sam-desktop's models, then embedding's; star toggle per row; favorited models remain listed in their provider group) and O5a server-side favorites keyed by composite ID. Non-negotiable design constraints carried in from validation:
    1. Prefix-strip only at wire-URL construction; caches (notably model-context.ts's no-TTL positive cache) key on the full composite ID, or the five name-collided models cross-pollute context windows between hosts (V7).
    2. The coder dispatcher must translate composite prefixes for opencode (map the default provider to the existing llama-swap/ namespace, or register new opencode providers) — the current pass-through of any slash-containing ID would hand opencode an unknown provider key (V1).
    3. Every single-endpoint call site is in scope: provider.ts (upstreamModel + resolveModelEndpoint), models.ts, model-context.ts (including its deepseek- static-context guard), compaction.ts, task-model.ts, arena-model-call.ts (+ arena callers, coder-side config), coder provider-snapshot.ts, coder dispatcher.ts (V2-V4, V9).
    4. Sidecar routing becomes a Sam-desktop provider attribute, not the global default route — embedding needs no sidecar (A60, A74; post-validation verification).
    5. Bare legacy IDs (existing rows, seeded default_model) resolve to the default provider indefinitely — new sessions inherit a bare seeded default until settings are migrated, so this is a permanent fallback, not a one-time migration (V2).
    6. Favorites that reference unavailable models are hidden, never auto-deleted (A23).
  • Evidence basis: The option choice rests on corroborated evidence throughout: the multi-provider client pattern (A13-A16), the live collision and churn data from both hosts (A54, A55, A58; provided material, independently re-checkable), and codebase fit (A60, A63, A64). The UX pattern rests on corroborated documentation (A45, A46) with the Open WebUI pitfalls as corroborated counter-evidence (A23, A27); the Cherry Studio and VS Code code-level references are unverifiable third-party color (V8) and nothing rests on them alone. The single-source peer-ID format (A6) supports only the rejection of O3, which stands independently on codebase facts. The cross-device justification for O5a is codebase-derived inference (documented multi-device usage), explicitly not measured evidence.

Validation

Adversarial validation attacked the evidence, framing, recommendation, and gathering integrity. Findings (condensed; all code-verified by the validator in this repo):

V1: "O1 extends the coder's prefix convention" was overstated

  • Strategy: Challenge the Recommendation
  • Investigation: dispatcher.ts:1006-1011, coder CLAUDE.md, provider-snapshot.ts:66-72.
  • Result: Refuted as originally framed — a stored sam-desktop/<model> passes the dispatcher's slash-check unchanged and reaches opencode as an unknown provider key; llama-swap/ is hardcoded in ≥4 coder locations.
  • Impact: Recommendation now mandates an explicit opencode namespace-translation rule (constraint 2).
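One possible shape for the namespace-translation rule is sketched below. It takes the first option from constraint 2 (map the default provider onto the existing llama-swap/ namespace). The opencode key for the embedding provider is a placeholder that would have to be registered in opencode separately, and toOpencodeModelId is a hypothetical name.

```typescript
// BooCode provider name -> opencode provider key.
// "llama-swap" is the existing namespace; the embedding key is a placeholder.
const OPENCODE_NAMESPACE = new Map<string, string>([
  ["sam-desktop", "llama-swap"],
  ["embedding", "llama-swap-embedding"], // hypothetical, not yet registered
]);

function toOpencodeModelId(id: string): string {
  const slash = id.indexOf("/");
  // Bare IDs keep today's behavior: prefix with llama-swap/.
  if (slash === -1) return `llama-swap/${id}`;
  const namespace = OPENCODE_NAMESPACE.get(id.slice(0, slash));
  // Known BooCode providers are translated; any other slash-containing ID
  // passes through unchanged, as the dispatcher does now.
  return namespace !== undefined ? `${namespace}/${id.slice(slash + 1)}` : id;
}
```

Under this rule a stored sam-desktop/qwen3.6-35b-a3b reaches opencode as llama-swap/qwen3.6-35b-a3b instead of an unknown provider key.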

V2: The bare-ID legacy fallback was asserted, not designed

  • Strategy: Challenge the Recommendation
  • Investigation: provider.ts:115-135, stream-phase.ts:110, sessions.ts:113-117, schema.sql:222, model-context.ts:77.
  • Result: Partially refuted — architecturally plausible but unimplemented; prefixed IDs would 404 the /upstream/<model>/props fetch and break context/compaction display; the seeded bare default_model makes the fallback permanent, not migratory.
  • Impact: Constraints 1, 3, 5 added.

V3: The deepseek- hazard is wider than routing

  • Strategy: Challenge the Evidence
  • Investigation: model-context.ts:40-49, provider.ts:98, compaction.ts:531.
  • Result: Confirmed with added scope — the context guard fires on the name prefix alone, returning a fake 131k context for embedding's deepseek-r1-qwen3-8b even after routing is fixed.
  • Impact: model-context.ts guard added to the touch-list (constraint 3).
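The misfire and one way to scope the guard can be sketched as follows. This is an illustration under assumptions, not the code in model-context.ts or provider.ts: both function names are hypothetical, and deepseek-chat is used only as an example of a cloud model ID.

```typescript
const LOCAL_PROVIDERS = new Set<string>(["sam-desktop", "embedding"]);

// Name-prefix heuristic as described in this report: any ID starting
// with "deepseek-" is treated as a DeepSeek cloud model.
function legacyIsDeepSeekCloud(id: string): boolean {
  return id.startsWith("deepseek-");
}

// Provider-aware variant: an ID carrying a known local provider prefix is
// never a cloud model, whatever the model's own name looks like.
function isDeepSeekCloud(id: string): boolean {
  const slash = id.indexOf("/");
  if (slash > 0 && LOCAL_PROVIDERS.has(id.slice(0, slash))) return false;
  return legacyIsDeepSeekCloud(id); // bare legacy IDs keep the old behavior
}
```

The same check would need to gate both the cloud routing and the static 131,072-token context guard, since the guard fires on the name prefix independently of routing.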

V4: compaction.ts is a missed hardcode site

  • Strategy: Challenge the Evidence
  • Investigation: compaction.ts:351-357 → resolveModelEndpoint (provider.ts:139-157).
  • Result: Refuted the original C9 list as incomplete — compaction summarization calls would go to the wrong host for embedding models.
  • Impact: Added to the touch-list (A67, constraint 3).

V5: Server-side favorites needed justification against the coder's localStorage pattern

  • Strategy: Challenge the Assumptions
  • Investigation: AgentComposerBar.tsx:33-52, routes/settings.ts, root CLAUDE.md auth model.
  • Result: Partially refuted — the Open WebUI bug distinguishes auto-delete vs hide, not server vs client storage; the original justification conflated the two.
  • Impact: O5a/O5b reframed as an explicit sub-decision; O5a retained on the cross-device argument, labeled as inference.

V6: O3's rejection over-relied on a single-source claim

  • Strategy: Challenge the Evidence-Gathering Integrity
  • Result: Confirmed with a provenance note — O3 is independently rejectable from codebase facts; the stale GitHub issue is demoted to supporting color.
  • Impact: O3 rejection rewritten to lead with codebase-observable reasons.

V7: Composite IDs + naive prefix-stripping would poison the no-TTL context cache

  • Strategy: Challenge the Recommendation
  • Investigation: model-context.ts:9, 26-29, 77-100; the five cross-host duplicate IDs.
  • Result: Refuted the unstated design — stripping before the cache key shares entries across providers with different real context windows, permanently until restart.
  • Impact: Constraint 1 (composite cache key, strip only at URL construction) — the most subtle required design rule.
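Constraint 1 can be illustrated with a cache that keys on the composite ID and strips the prefix only when building the upstream URL. The sketch is synchronous and takes the fetch as an injected function to stay self-contained; the real lookup would be asynchronous. propsUrl and contextWindow are hypothetical names.

```typescript
const BASE_URLS = new Map<string, string>([
  ["sam-desktop", "http://100.101.41.16:8401"],
  ["embedding", "http://100.90.172.55:8411"],
]);

// Positive cache keyed by the full composite ID, never the bare model name.
const contextCache = new Map<string, number>();

// The provider prefix is stripped here and nowhere else.
function propsUrl(compositeId: string): string {
  const slash = compositeId.indexOf("/");
  const base = slash > 0 ? BASE_URLS.get(compositeId.slice(0, slash)) : undefined;
  if (base === undefined) throw new Error(`unknown provider in "${compositeId}"`);
  return `${base}/upstream/${compositeId.slice(slash + 1)}/props`;
}

function contextWindow(compositeId: string, fetchContext: (url: string) => number): number {
  const cached = contextCache.get(compositeId);
  if (cached !== undefined) return cached;
  const value = fetchContext(propsUrl(compositeId));
  contextCache.set(compositeId, value);
  return value;
}
```

With this keying, sam-desktop/qwen3.5-9b and embedding/qwen3.5-9b occupy separate cache entries, so a context window learned from one host is never served for the other.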

V8: Third-party code references (Cherry Studio, VS Code PR) are unverifiable

  • Strategy: Challenge the Evidence-Gathering Integrity
  • Result: Partially refuted their evidentiary weight — retained as color; the composite-key argument stands on BooCode's own conventions and the live collision data.
  • Impact: Evidence basis re-worded; nothing rests on those references alone.

V9: Arena is the most exposed hardcode

  • Strategy: Challenge the Evidence
  • Investigation: arena-model-call.ts:16-28, arena-analyzer.ts:90.
  • Result: Confirmed with elevated severity — raw fetch, no abstraction, lives in apps/coder with its own config type (cannot reuse the server's resolver as-is).
  • Impact: Listed as separate coder-side scope (constraint 3).

Adjustments Made

The recommendation survived but was rewritten: the implementation constraints (composite cache keys, opencode namespace translation, the full nine-site touch-list, permanent bare-ID fallback, hidden-not-deleted favorites) were folded into the Recommendation itself; O3's rejection was re-grounded in codebase facts; the favorites-persistence choice was reframed as an explicit sub-decision; unverifiable third-party code references were demoted to supporting color. Post-validation, the orchestrator additionally verified in provider.ts that the sidecar is the default route whenever LLAMA_SIDECAR_URL is set — adding constraint 4 (sidecar becomes a per-provider attribute; embedding needs none).

Confidence Assessment

  • Confidence: High — for the option choice. The validator rated the pre-adjustment synthesis Medium because the implementation scope was understated; that scope is now enumerated above, and no finding challenged the direction (its own words: "architecturally sound given the existing llama-swap/ convention").
  • Remaining Risks: (1) The opencode-side translation (V1) may also require host-side ~/.config/opencode/opencode.json changes — outside this repo. (2) Stale favorite keys accumulate in settings with no cleanup mechanism by design (hide-don't-delete); acceptable for single-user but unbounded. (3) The exact /running JSON envelope and llama-swap peer aggregation details remain single-source — neither is load-bearing. (4) The five duplicate-ID models make any partial rollout (one call site migrated, another not) actively dangerous; the routing resolver should land as one batch.

Sources

| ID | Source | Link / location | Retrieved | Trust class | Summary (one line) | Evidence status |
| --- | --- | --- | --- | --- | --- | --- |
| A1 | llama-swap README | github.com/mostlygeek/llama-swap | 2026-06-10 | web | Proxy hot-swapping local inference servers; documents /v1/models, /running, /upstream, /health; v224 current | corroborated by A2, A3, A12 |
| A2 | llama-swap configuration.md | github.com/mostlygeek/llama-swap/blob/main/docs/configuration.md | 2026-06-10 | web | Model IDs are YAML keys; per-model name/description/aliases/metadata/ttl/useModelName; includeAliasesInList | corroborated by A3, A4 |
| A3 | llama-swap config-schema.json | github.com/mostlygeek/llama-swap/blob/main/config-schema.json | 2026-06-10 | web | Authoritative config schema; peers section; no instance-identity field at any level | corroborated by A2, A4 |
| A4 | llama-swap config.example.yaml | github.com/mostlygeek/llama-swap/blob/main/config.example.yaml | 2026-06-10 | web | Annotated example: aliases, useModelName, metadata, groups, peers | corroborated by A2, A3 |
| A5 | DeepWiki: llama-swap peers | deepwiki.com/mostlygeek/llama-swap/3.7-peer-configuration | 2026-06-10 | web | Duplicate peer model IDs route to first-lexicographic peer with only a warning | corroborated by A6 (collision); single source on aggregation detail |
| A6 | llama-swap issue #539 | github.com/mostlygeek/llama-swap/issues/539 | 2026-06-10 | web | Peer models surface as "peer-name: model-name" IDs; stale, unresolved | single source (caveated) |
| A7 | llama-swap issue #538 | github.com/mostlygeek/llama-swap/issues/538 | 2026-06-10 | web | Aliases hidden from /v1/models unless includeAliasesInList | corroborated by A2, A3 |
| A8 | llama.cpp server README | github.com/ggml-org/llama.cpp/blob/master/tools/server/README.md | 2026-06-10 | web | /v1/models id defaults to file path; --alias overrides; meta block fields | corroborated by A9, A10 |
| A9 | llama.cpp discussion #8547 | github.com/ggml-org/llama.cpp/discussions/8547 | 2026-06-10 | web | Confirms file-path default id; --override-kv doesn't change API id | corroborated by A8 |
| A10 | llama.cpp issue #17860 | github.com/ggml-org/llama.cpp/issues/17860 | 2026-06-10 | web | Only one --alias per llama-server today | corroborated by A8 |
| A11 | LM4eu/llama-swap Go pkg docs | pkg.go.dev/github.com/LM4eu/llama-swap/proxy | 2026-06-10 | web | Model struct {Id, Name, Description, State, Unlisted}; fork, not upstream | single source (caveated) |
| A12 | glukhov.org llama-swap quickstart | glukhov.org/llm-hosting/llama-swap/ | 2026-06-10 | web | /running state values; alias listing behavior | corroborated by A1, A2 |
| A13 | Vercel AI SDK provider management | ai-sdk.dev/docs/ai-sdk-core/provider-management | 2026-06-10 | web | Registry namespaces models as providerId:modelId; per-provider baseURL | corroborated by A14 |
| A14 | Vercel AI SDK OpenAI-compatible providers | ai-sdk.dev/providers/openai-compatible-providers | 2026-06-10 | web | createOpenAICompatible takes name+baseURL per provider; wire model ID unchanged | corroborated by A13 |
| A15 | LiteLLM OpenAI-compatible docs | docs.litellm.ai/docs/providers/openai_compatible | 2026-06-10 | web | Per-entry api_base; aliasing decouples client name from upstream name | corroborated by A16 |
| A16 | McDermott: Centralizing LLMs with LiteLLM | robert-mcdermott.medium.com/...9874563f3062 | 2026-06-10 | web | model_list with unique model_name per upstream resolves collisions | corroborated by A15 |
| A17 | DeepWiki: llama-swap groups | deepwiki.com/mostlygeek/llama-swap/3.4-groups-and-swapping-policies | 2026-06-10 | web | Groups/matrix control concurrency, not model IDs | corroborated by A2-A4 |
| A18 | llama-swap releases | github.com/mostlygeek/llama-swap/releases | 2026-06-10 | web | v219-v224 changed routing/perf, not /v1/models schema | single source (caveated) |
| A19 | Open WebUI discussion #3443 | github.com/open-webui/open-webui/discussions/3443 | 2026-06-10 | web | Pin-in-dropdown feature request; drag-reorder workaround breaks | corroborated by A21, A23 |
| A20 | Open WebUI discussion #5902 | github.com/open-webui/open-webui/discussions/5902 | 2026-06-10 | web | Filtering 70+ models; whitelist vs hide patterns | corroborated by A19 |
| A21 | Open WebUI env config reference | docs.openwebui.com/reference/env-configuration/ | 2026-06-10 | web | DEFAULT_PINNED_MODELS; settings.pinnedModels sorts pinned to top | corroborated by A22, A23 |
| A22 | Open WebUI database schema | docs.openwebui.com/reference/database-schema/ | 2026-06-10 | web | Pins live in user.settings JSON, keyed by bare model ID | corroborated by A21 |
| A23 | Open WebUI discussion #23656 | github.com/open-webui/open-webui/discussions/23656 | 2026-06-10 | web | Stale-pin cleanup permanently deletes pins during backend downtime | corroborated by A21, A53 |
| A24 | Open WebUI discussion #14854 | github.com/open-webui/open-webui/discussions/14854 | 2026-06-10 | web | Unpin buried in three-dot menu; discoverability failure | corroborated by A21 |
| A25 | Open WebUI issue #19183 | github.com/open-webui/open-webui/issues/19183 | 2026-06-10 | web | Local/External/All tabs + tag chips + Fuse.js search in selector | corroborated by A26 |
| A26 | Open WebUI discussion #21502 | github.com/open-webui/open-webui/discussions/21502 | 2026-06-10 | web | Flat select unusable at OpenRouter scale; optgroup/search proposals | corroborated by A25 |
| A27 | Open WebUI discussion #4495 | github.com/open-webui/open-webui/discussions/4495 | 2026-06-10 | web | Same-named models from two connections are indistinguishable (bare-ID failure) | corroborated by A25, A26 |
| A28 | LibreChat model specs docs | librechat.ai/docs/configuration/librechat_yaml/object_structure/model_specs | 2026-06-10 | web | Admin YAML group field creates named collapsible sections | corroborated by A29 |
| A29 | LibreChat v0.8.5 changelogs | librechat.ai/changelog/v0.8.5 | 2026-06-10 | web | Pin support for model specs added (PR #11219) | corroborated by A30; persistence detail single-source |
| A30 | LibreChat discussion #11044 | github.com/danny-avila/LibreChat/discussions/11044 | 2026-06-10 | web | Pinning exists; preset-active confusion | corroborated by A29 |
| A31 | DeepWiki: LibreChat DB models | deepwiki.com/danny-avila/LibreChat/7.1-database-models | 2026-06-10 | web | MongoDB/Mongoose; pinned-spec field name unconfirmed | single source (caveated) |
| A32 | Jan v0.6.9 changelog | jan.ai/changelog/2025-08-28-image-support | 2026-06-10 | web | "Favorite models" shipped; no UI detail | single source (caveated) |
| A33 | Jan manage-models docs | jan.ai/docs/desktop/manage-models | 2026-06-10 | web | Organized by source/quantization tier, not provider | corroborated by A32 |
| A34 | Jan data-folder docs | jan.ai/docs/desktop/data-folder | 2026-06-10 | web | Settings in local JSON files | corroborated by A32 |
| A35 | DeepWiki: Cherry Studio models | deepwiki.com/CherryHQ/cherry-studio/5.3-model-configuration-and-capabilities | 2026-06-10 | web | Provider-grouped UI; getModelUniqId composite {id, provider} | corroborated by A36 (see V8 caveat) |
| A36 | Cherry Studio ModelService.ts | github.com/CherryHQ/cherry-studio/.../ModelService.ts | 2026-06-10 | web | Composite-key implementation | corroborated by A35 (see V8 caveat) |
| A37 | Cherry Studio releases | github.com/CherryHQ/cherry-studio/releases | 2026-06-10 | web | No favorites changes v1.9.1-v1.9.11 | single source (caveated) |
| A38 | Chatbox issue #1540 | github.com/chatboxai/chatbox/issues/1540 | 2026-06-10 | web | Favorite-models proposal; not shipped | corroborated by A39 |
| A39 | Chatbox issue #2252 | github.com/chatboxai/chatbox/issues/2252 | 2026-06-10 | web | Two-section dropdown proposal (Preferred on top, star per row) | corroborated by A38 |
| A40 | DeepWiki: Chatbox local models | deepwiki.com/chatboxai/chatbox/4.6-local-model-integration | 2026-06-10 | web | settings.favoritedModels in localStorage | single source (caveated) |
| A41 | SillyTavern PR #5536 | github.com/SillyTavern/SillyTavern/pull/5536 | 2026-06-10 | web | Unified sort/group settings drawer across providers | corroborated by A42 |
| A42 | SillyTavern 1.13.5 notes | github.com/SillyTavern/SillyTavern/discussions/4660 | 2026-06-10 | web | Sort/group shipped in 1.13.5 | corroborated by A41 |
| A43 | SillyTavern connection profiles docs | docs.sillytavern.app/usage/core-concepts/connection-profiles/ | 2026-06-10 | web | Profiles = saved config snapshots, not per-model favorites | corroborated by A44 |
| A44 | SillyTavern issue #4565 | github.com/SillyTavern/SillyTavern/issues/4565 | 2026-06-10 | web | Better model selector request closed not-planned | corroborated by A43 |
| A45 | VS Code language models docs | code.visualstudio.com/docs/agent-customization/language-models | 2026-06-10 | web | Provider groups + hover pin + dedicated Pinned top section, stable order, model stays in group | corroborated by A46 |
| A46 | vscode-copilot-chat PR #1111 | github.com/microsoft/vscode-copilot-chat/pull/1111 | 2026-06-10 | web | BYOK models grouped into a category | corroborated by A45 (see V8 caveat) |
| A47 | Continue.dev model roles docs | docs.continue.dev/customize/model-roles/00-intro | 2026-06-10 | web | Role-based dropdowns; no grouping/favorites | corroborated by A48 |
| A48 | Continue.dev providers overview | docs.continue.dev/customize/model-providers/overview | 2026-06-10 | web | Picker reflects config.yaml order | corroborated by A47 |
| A49 | Open WebUI discussion #15449 | github.com/open-webui/open-webui/discussions/15449 | 2026-06-10 | web | Multi-model combination pinning request | single source (caveated) |
| A50 | BigAGI repo + changelog | github.com/enricoros/big-AGI | 2026-06-10 | web | No grouping/favorites evidence (negative finding) | single source (caveated) |
| A51 | LM Studio v0.4.0 changelog | lmstudio.ai/changelog/lmstudio-v0.4.0 | 2026-06-10 | web | Search/format filters; no favorites | corroborated by A52 |
| A52 | LM Studio v0.4.13 changelog | lmstudio.ai/changelog/lmstudio-v0.4.13 | 2026-06-10 | web | No picker changes | corroborated by A51 |
| A53 | Open WebUI issue #22578 | github.com/open-webui/open-webui/issues/22578 | 2026-06-10 | web | Model enable/disable state goes stale on catalog change | corroborated by A23 |
| A54 | embedding host live inventory | provided: curl http://100.90.172.55:8411/v1/models + /running | 2026-06-10 | provided | 39 models incl. deepseek-r1-qwen3-8b and 5 IDs duplicated on Sam-desktop; /running empty | corroborated by A56 (config matches) |
| A55 | Sam-desktop live inventory | provided: curl http://100.101.41.16:8401/v1/models + /running | 2026-06-10 | provided | 21 models; qwen3.6-35b-a3b-mxfp4 absent; nemotron-omni running via D:\llama-server | corroborated by A57 |
| A56 | embedding host SSH inventory | provided: ssh samkintop@100.90.172.55 (~/llama-swap/config.yaml, ~/llama.cpp, ~/models) | 2026-06-10 | provided | P104-tuned llama-swap config (ttl 1800, per-model llama-server cmds); llama.cpp source build | corroborated by A54 |
A56 embedding host SSH inventory provided: ssh samkintop@100.90.172.55 (~/llama-swap/config.yaml, ~/llama.cpp, ~/models) 2026-06-10 provided P104-tuned llama-swap config (ttl 1800, per-model llama-server cmds); llama.cpp source build corroborated by A54
A57 Sam-desktop SSH inventory provided: ssh samki@100.101.41.16 (dir D:) 2026-06-10 provided D:\llama-server (b9591 CUDA), D:\llama-swap (v224), D:\models, D:\llama-sidecar corroborated by A55
A58 Current env config .env, apps/coder/.env.host n/a codebase LLAMA_SWAP_URL=http://100.101.41.16:8401; DEFAULT_MODEL=qwen3.6-35b-a3b-mxfp4 (both apps) corroborated (read directly)
A59 Models route apps/server/src/routes/models.ts:14-56 n/a codebase GET /api/models fetches only LLAMA_SWAP_URL (+DeepSeek); flat untagged list corroborated (read directly)
A60 Inference provider/routing apps/server/src/services/inference/provider.ts:1-163 n/a codebase resolveRoute: deepseek- prefix → cloud; LLAMA_SIDECAR_URL set → sidecar default for everything; else single swap; resolveModelEndpoint hardcodes LLAMA_SWAP_URL corroborated (read directly)
A61 BooChat model picker apps/web/src/components/ModelPicker.tsx:14-133 n/a codebase Flat lazy list, no grouping/search/favorites; PATCHes session.model corroborated (explorer + validator)
A62 Provider snapshot contracts packages/contracts/src/provider-snapshot.ts n/a codebase ProviderModel has no provider field; identity implicit in parent entry name corroborated
A63 Coder provider snapshot apps/coder/src/services/provider-snapshot.ts:48-70,256-310 n/a codebase Prefixes single llama-swap list with llama-swap/; merges into boocode entry corroborated
A64 Coder dispatcher prefixing apps/coder/src/services/dispatcher.ts:1006-1011 n/a codebase Bare IDs get llama-swap/; slash-containing IDs pass through unchanged corroborated (validator-verified)
A65 Model/settings persistence apps/server/src/schema.sql:20,217-222,249; routes/settings.ts n/a codebase sessions.model NOT NULL, chats.model nullable, settings KV JSONB seeded with bare default_model corroborated
A66 Model context service apps/server/src/services/model-context.ts:9,26-29,40-49,77-100 n/a codebase No-TTL positive cache keyed by raw model string; deepseek- guard returns static 131k; /upstream URL from single config corroborated (validator-verified)
A67 Compaction LLM calls apps/server/src/services/compaction.ts:351-357,531 n/a codebase Summarization via resolveModelEndpoint → always LLAMA_SWAP_URL corroborated (validator-verified)
A68 Task model service apps/server/src/services/task-model.ts:59-68 n/a codebase FAST_MODEL fallback chain against single endpoint (TASK_MODEL_URL escape hatch) corroborated
A69 Arena model calls apps/coder/src/services/arena-model-call.ts:16-28; arena-analyzer.ts:90 n/a codebase Raw fetch to LLAMA_SWAP_URL, no routing abstraction corroborated (validator-verified)
A70 Coder composer prefs apps/web/src/components/AgentComposerBar.tsx:33-52,118-196 n/a codebase CompactPicker flat lists; prefs in localStorage boocode.coder.agent-prefs corroborated
A71 Model display naming apps/web/src/lib/modelName.ts:6-32; MessageBubble.tsx:140-189 n/a codebase Display chips already strip llama-swap/-style prefixes corroborated
A72 Coder provider config file data/coder-providers.example.json n/a codebase Per-provider overrides exist; no baseUrl field — second endpoint unregistrable today corroborated
A73 Openspec conventions openspec/README.md n/a codebase changes//{proposal,tasks,design}.md; lowercase-hyphenated slugs corroborated (read directly)
A74 Sidecar architecture notes apps/server/CLAUDE.md (sidecar sections); /opt/forks/llama-sidecar/ n/a codebase llama-sidecar = Go per-agent llama-server pool on Sam-desktop; X-Agent-Flags header; boot guard ties llama_extra_args to LLAMA_SIDECAR_URL corroborated by A60

A54/A55: Live host inventories — recommendation-bearing

  • Link / location: provided: orchestrator-run curl against http://100.90.172.55:8411 and http://100.101.41.16:8401 (/v1/models, /running)
  • Retrieved: 2026-06-10
  • Trust class: provided (operator-owned infrastructure, independently re-checkable with the same commands)
  • Summary: embedding serves 39 mostly-small models; Sam-desktop serves 21 mostly-large models. Five IDs (granite-4.1-8b, negentropy-4.7-9b, qwen3.5-9b, qwen3.5-9b-deepseek-v4, qwopus3.5-9b-coder) appear on both — making composite keying mandatory, not stylistic. The configured DEFAULT_MODEL is absent from Sam-desktop's live list, proving ID churn. embedding's deepseek-r1-qwen3-8b collides with the deepseek- cloud-routing heuristic. Neither host populates llama-swap's optional name/description fields, so the UI must derive labels from IDs (as formatModelLabel already does).
  • Evidence status: corroborated by A56/A57 (SSH-level configs match the served lists).
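
The duplicate-ID finding can be made concrete with a small sketch. The inventories below are an abbreviated excerpt (the five shared IDs plus one embedding-only ID named in the summary), not the full 21- and 39-model lists, and `compositeKey`, `servedBy`, and `duplicatedIds` are hypothetical helper names rather than existing BooCode functions.

```typescript
// Abbreviated excerpt of the two live inventories (assumption: IDs are unique per host).
const SHARED_IDS: string[] = [
  "granite-4.1-8b",
  "negentropy-4.7-9b",
  "qwen3.5-9b",
  "qwen3.5-9b-deepseek-v4",
  "qwopus3.5-9b-coder",
];

const inventories: Record<string, string[]> = {
  "Sam-desktop": SHARED_IDS.slice(),
  embedding: SHARED_IDS.concat(["deepseek-r1-qwen3-8b"]),
};

// Hypothetical composite key: provider name plus bare model ID.
function compositeKey(provider: string, model: string): string {
  return `${provider}/${model}`;
}

// Which providers serve a given bare ID?
function servedBy(model: string, inv: Record<string, string[]>): string[] {
  return Object.keys(inv).filter((p) => (inv[p] ?? []).indexOf(model) !== -1);
}

// Bare IDs that appear on more than one provider.
function duplicatedIds(inv: Record<string, string[]>): string[] {
  const counts: Record<string, number> = {};
  for (const provider of Object.keys(inv)) {
    for (const id of inv[provider] ?? []) {
      counts[id] = (counts[id] ?? 0) + 1;
    }
  }
  return Object.keys(counts)
    .filter((id) => (counts[id] ?? 0) > 1)
    .sort();
}

// A favorite stored as the bare ID "qwen3.5-9b" matches two hosts;
// the composite key names exactly one.
const ambiguousHosts = servedBy("qwen3.5-9b", inventories);
const unambiguousKey = compositeKey("embedding", "qwen3.5-9b");
```

With this excerpt, `duplicatedIds(inventories)` returns the five shared IDs, which is why a favorite or a routing decision keyed on the bare name cannot say which machine is meant.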

A60: provider.ts routing — recommendation-bearing

  • Link / location: apps/server/src/services/inference/provider.ts:90-157
  • Retrieved: n/a
  • Trust class: codebase (current-state anchor)
  • Summary: The single point where all three routes (deepseek/sidecar/swap) resolve. Establishes that (a) BooCode already builds per-baseURL AI-SDK providers from a cache map — O1 slots into this with minimal new machinery; (b) the sidecar is the default route for everything when configured, which forces constraint 4; (c) resolveModelEndpoint is a second, parallel resolution path (compaction/task-model) that must change in lockstep.
  • Evidence status: corroborated (read directly by orchestrator and validator).
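
The route precedence described above can be modelled in a few lines. This is a simplified illustration of the order reported for `resolveRoute` (deepseek- prefix, then sidecar, then the single swap endpoint), not the actual code in provider.ts; `resolveRouteSketch`, the `Route` type, and the sidecar URL are hypothetical.

```typescript
// Simplified model of the reported precedence; NOT the real provider.ts implementation.
type Route =
  | { kind: "cloud" }
  | { kind: "sidecar"; baseUrl: string }
  | { kind: "swap"; baseUrl: string };

interface RouteConfig {
  swapUrl: string; // plays the role of LLAMA_SWAP_URL
  sidecarUrl?: string; // plays the role of LLAMA_SIDECAR_URL (optional)
}

function resolveRouteSketch(model: string, cfg: RouteConfig): Route {
  // 1. Prefix heuristic: anything starting with "deepseek-" goes to the cloud route.
  if (model.indexOf("deepseek-") === 0) return { kind: "cloud" };
  // 2. If a sidecar is configured, it is the default for every other model.
  if (cfg.sidecarUrl) return { kind: "sidecar", baseUrl: cfg.sidecarUrl };
  // 3. Otherwise fall back to the single llama-swap endpoint.
  return { kind: "swap", baseUrl: cfg.swapUrl };
}

const swapOnly: RouteConfig = { swapUrl: "http://100.101.41.16:8401" };
const withSidecar: RouteConfig = {
  swapUrl: "http://100.101.41.16:8401",
  sidecarUrl: "http://sidecar.example:9000", // placeholder URL, assumption
};

// A local model on the embedding host is captured by the cloud heuristic.
const collision = resolveRouteSketch("deepseek-r1-qwen3-8b", swapOnly);
// With a sidecar configured, every other model is sent to the sidecar.
const sidecarDefault = resolveRouteSketch("qwen3.5-9b", withSidecar);
```

Under this model neither step can express "this model lives on the embedding host", which is the gap a provider-qualified reference would have to close in both `resolveRoute` and `resolveModelEndpoint`.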

A13/A14: AI SDK provider registry pattern — recommendation-bearing

  • Link / location: https://ai-sdk.dev/docs/ai-sdk-core/provider-management ; https://ai-sdk.dev/providers/openai-compatible-providers
  • Retrieved: 2026-06-10
  • Trust class: web
  • Summary: The library BooCode already uses prescribes exactly O1's shape: one named createOpenAICompatible instance per provider, registry-level provider:model namespacing, bare model IDs on the wire. Adopting O1 is convergence with the upstream idiom rather than a custom scheme.
  • Evidence status: corroborated (two official doc pages, consistent with LiteLLM's independent design A15/A16).
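
The namespacing idea (qualified reference in the application, bare model ID on the wire) can be sketched without the SDK. The block below does not call the AI SDK; `ProviderEntry`, `toWireRequest`, and the `/` separator are illustrative assumptions, and the `/v1/chat/completions` path is assumed from the OpenAI-compatible `/v1/models` endpoint used for the inventories.

```typescript
// Hand-rolled illustration of provider namespacing; the AI SDK registry would replace this.
interface ProviderEntry {
  name: string;
  baseUrl: string;
}

const REGISTRY: ProviderEntry[] = [
  { name: "Sam-desktop", baseUrl: "http://100.101.41.16:8401" },
  { name: "embedding", baseUrl: "http://100.90.172.55:8411" },
];

interface WireRequest {
  url: string;
  body: { model: string };
}

function toWireRequest(
  ref: string,
  registry: ProviderEntry[],
  separator: string = "/",
): WireRequest {
  // Split on the FIRST separator so model IDs that contain it stay intact.
  const cut = ref.indexOf(separator);
  if (cut <= 0) throw new Error(`not a provider-qualified reference: ${ref}`);
  const providerName = ref.slice(0, cut);
  const model = ref.slice(cut + separator.length);

  const entry = registry.filter((p) => p.name === providerName)[0];
  if (!entry) throw new Error(`unknown provider: ${providerName}`);

  // Only the bare model ID is sent upstream; the provider picks the base URL.
  return { url: `${entry.baseUrl}/v1/chat/completions`, body: { model } };
}

const request = toWireRequest("embedding/qwen3.5-9b", REGISTRY);
```

Here `request.url` points at the embedding host and `request.body.model` is the bare `qwen3.5-9b`, so the same ID on Sam-desktop is reachable only through a different reference.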

A45: VS Code model picker docs — recommendation-bearing (UX)

  • Link / location: https://code.visualstudio.com/docs/agent-customization/language-models
  • Retrieved: 2026-06-10
  • Trust class: web
  • Summary: Documents the shipped pattern this feature's dropdown adapts: provider-grouped list, hover-revealed pin, dedicated Pinned top section in stable insertion order, pinned models remaining in their provider group.
  • Evidence status: corroborated by A46; code-level detail treated as color per V8.
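
A minimal sketch of how that pattern could translate into dropdown sections follows, assuming favorites are stored as an ordered list of `provider/model` keys. `ProviderInventory`, `Section`, and `buildSections` are hypothetical names, and the `online` flag stands in for whatever reachability check the real picker would use.

```typescript
interface ProviderInventory {
  name: string;
  online: boolean; // assumption: set by a reachability check elsewhere
  models: string[]; // bare model IDs as served by the host
}

interface Section {
  title: string;
  items: string[]; // "provider/model" keys
}

function buildSections(providers: ProviderInventory[], favorites: string[]): Section[] {
  // Index every key that is currently selectable.
  const available: Record<string, true> = {};
  for (const p of providers) {
    if (!p.online) continue;
    for (const m of p.models) available[`${p.name}/${m}`] = true;
  }

  const sections: Section[] = [];

  // Favorites first, in stored (insertion) order; offline entries are
  // filtered from the view but the stored list itself is not modified.
  const visibleFavorites = favorites.filter((key) => available[key] === true);
  if (visibleFavorites.length > 0) {
    sections.push({ title: "Favorites", items: visibleFavorites });
  }

  // Then one section per online provider, in registry order. Favorited
  // models are NOT removed from their provider section.
  for (const p of providers) {
    if (!p.online) continue;
    sections.push({ title: p.name, items: p.models.map((m) => `${p.name}/${m}`) });
  }
  return sections;
}

const storedFavorites = ["embedding/qwen3.5-9b", "Sam-desktop/granite-4.1-8b"];
const sections = buildSections(
  [
    { name: "Sam-desktop", online: true, models: ["qwen3.5-9b", "granite-4.1-8b"] },
    { name: "embedding", online: false, models: ["qwen3.5-9b"] },
  ],
  storedFavorites,
);
```

In this example the embedding host is offline, so its favorite is absent from the rendered sections while `storedFavorites` still holds both entries and the key reappears once the host is reachable again.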

A23/A27: Open WebUI pitfalls — recommendation-bearing (counter-evidence)