Research: Integrating two named llama-swap providers ("Sam-desktop", "embedding") with provider-grouped model dropdowns and per-model favorites in BooChat and BooCoder
Question: BooCode currently talks to exactly one llama-swap endpoint. How should a second named provider ("embedding", 100.90.172.55:8411) be added alongside the renamed existing one ("Sam-desktop", 100.101.41.16:8401), integrated into both BooChat and BooCoder, with the model dropdown grouped per provider and a favorite button per model (Favorites section listed first)?
Evidence mode: strict (default — every recommendation-bearing claim is corroborated or explicitly caveated).
Summary
Both machines can be added to BooCode as named providers, and the right way is to give BooCode a small provider registry (a name and base URL per machine) and to store selected models as a "provider/model" pair instead of a bare name. Bare names cannot work here: five models exist on both machines under identical names today, and the configured default model has already drifted out of the live list once — so favorites and routing keyed by name alone would be ambiguous and fragile. The dropdown should follow the pattern proven in VS Code's model picker: a Favorites section on top, then one section per provider (Sam-desktop first, then embedding), a star on every row, favorited models staying visible in their provider section, and favorites that are hidden — never deleted — when a machine is offline.
The adversarial validation pass confirmed the direction but showed the change is wider than the obvious spots: chat compaction, context-window lookup, arena battles, the coder's opencode dispatch, and the sidecar routing default all silently assume a single endpoint and need the same provider-resolution change. Two extra hazards were found in the live data: a model on the embedding host literally named deepseek-r1-qwen3-8b trips BooCode's "starts with deepseek-" cloud-routing heuristic, and the always-on sidecar default route would swallow embedding-bound requests. The embedding host does not need its own llama-sidecar — but sidecar routing must become a Sam-desktop-only attribute.
Well-corroborated: live data from both hosts, direct code evidence, and multiple independent web sources agree; validation expanded the implementation scope but did not overturn the choice.
- Confidence: High
Research Results
What exists today (codebase — current-state anchor)
BooCode's entire inference surface assumes one llama-swap endpoint, configured as LLAMA_SWAP_URL=http://100.101.41.16:8401 with DEFAULT_MODEL=qwen3.6-35b-a3b-mxfp4 (A58). The single-endpoint assumption is hard-coded in at least nine places:
- GET /api/models fetches only {LLAMA_SWAP_URL}/v1/models (plus DeepSeek cloud when DEEPSEEK_API_KEY is set) and returns a flat ModelInfo[] with no provider tag (A59).
- upstreamModel() routes by string heuristics: model IDs starting with deepseek- go to the DeepSeek cloud API; agents with llama_extra_args go to the sidecar; and when LLAMA_SIDECAR_URL is configured at all (which it is in docker-compose), every remaining request routes through the sidecar by default, falling back to llama-swap only when no sidecar is configured (A60). The provider for each base URL is a cached AI-SDK createOpenAICompatible instance.
- resolveModelEndpoint() (used by compaction and task-model for non-streaming calls) returns LLAMA_SWAP_URL for every non-DeepSeek model (A60, A67).
- model-context.ts fetches {LLAMA_SWAP_URL}/upstream/<model>/props for context windows, with a no-TTL positive cache keyed by the raw model string, and a deepseek- prefix guard that short-circuits to a static 131,072 context without calling any upstream (A66).
- task-model.ts (auto-naming, summaries) falls back through FAST_MODEL → chat model → DEFAULT_MODEL against the single URL (A68).
- Arena battles call {LLAMA_SWAP_URL}/v1/chat/completions directly with no routing abstraction at all (A69).
- The coder's provider snapshot fetches the single llama-swap list and prefixes every ID with llama-swap/ (A63); its dispatcher prefixes any bare (slash-less) model ID with llama-swap/ before opencode dispatch, and passes any ID already containing / through unchanged (A64).
- Model IDs persist as bare strings: sessions.model TEXT NOT NULL, chats.model TEXT nullable, validated only as a 1–200-char string (A65).
- The BooChat dropdown (ModelPicker.tsx) and the BooCoder picker (CompactPicker inside AgentComposerBar.tsx) are flat lists with no grouping, search, or favorites; the coder picker persists per-provider preferences in browser localStorage, while BooChat model choice is server-persisted on the session row (A61, A70). Display code already strips llama-swap/-style prefixes when rendering model chips (A71). No favorites/pinning mechanism exists anywhere; the settings table is a key-value JSONB store currently holding default_model and theme keys (A65).
The coder's runtime provider config (data/coder-providers.json) has no baseUrl field — there is no way to register a second llama-swap endpoint today (A72).
What the two hosts actually serve (provided material, retrieved live 2026-06-10)
- embedding (100.90.172.55:8411, Linux, P104-100 8GB Pascal GPU): 39 models, skewed small: gemma-3-270m through gemma-4-12b, the LFM2.5 family, granite-4.1-3b/8b, qwen3.5-0.8b/4b/9b, qwopus3.5 family, deepseek-r1-qwen3-8b, a reranker, extraction models (A54). Its llama-swap config is hand-tuned per model (flash-attn/KV-quant choices for Pascal, ttl 1800), with llama.cpp built from source on the box (A56).
- Sam-desktop (100.101.41.16:8401, Windows): 21 models, skewed large: qwen3.6-35b-a3b/27b, qwopus3.6 family, granite-4.1-30b, mellum2-12b, nemotron-cascade-2-30b-a3b, north-mini-code, etc. Served by D:\llama-server (llama.cpp CUDA build b9591) behind D:\llama-swap (llama-swap v224), models in D:\models; a D:\llama-sidecar directory backs the existing sidecar at :8402 (A55, A57).
Three load-bearing facts fall out of the live inventories:
- Five model IDs exist on both hosts: granite-4.1-8b, negentropy-4.7-9b, qwen3.5-9b, qwen3.5-9b-deepseek-v4, qwopus3.5-9b-coder (A54, A55). Bare-ID favorites or routing are therefore ambiguous from day one.
- The configured DEFAULT_MODEL=qwen3.6-35b-a3b-mxfp4 is not in Sam-desktop's current model list (closest: qwen3.6-35b-a3b); model IDs already churn in practice, so favorites must tolerate stale references (A55, A58).
- deepseek-r1-qwen3-8b on the embedding host collides with BooCode's deepseek- heuristics: with DEEPSEEK_API_KEY set it would be routed to the DeepSeek cloud API, and the context-window guard returns a fake 131k context on the name prefix alone regardless (A54, A60, A66).
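The collision in the first bullet can be re-checked mechanically whenever either inventory changes. The following is a minimal sketch, assuming each host's /v1/models response has already been reduced to a list of ID strings; findSharedIds is a hypothetical helper name, and the sample arrays are small subsets of the inventories above rather than the full lists.

```typescript
// Hypothetical helper: returns the model IDs present in both inventories.
function findSharedIds(a: readonly string[], b: readonly string[]): string[] {
  const inB = new Set(b);
  // Deduplicate `a`, keep only IDs also served by `b`, sort for stable output.
  return Array.from(new Set(a)).filter((id) => inB.has(id)).sort();
}

// Illustrative subsets of the two inventories described above, not the full lists.
const embeddingIds = ["gemma-4-12b", "granite-4.1-8b", "qwen3.5-9b", "deepseek-r1-qwen3-8b"];
const samDesktopIds = ["qwen3.6-35b-a3b", "granite-4.1-8b", "qwen3.5-9b", "granite-4.1-30b"];

// Any ID in this list cannot serve as a bare favorites or routing key.
const sharedIds = findSharedIds(embeddingIds, samDesktopIds);
// sharedIds is ["granite-4.1-8b", "qwen3.5-9b"]
```

Run against the complete inventories, the same check should reproduce the five shared IDs listed above.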
How llama-swap identifies models (web, corroborated)
llama-swap model IDs are exactly the YAML keys in its config.yaml; /v1/models can additionally carry optional per-model name, description, and arbitrary metadata from config — fields neither of Sam's hosts currently populates (A1–A4, A54, A55). llama-swap has no instance-identity field: two instances are distinguishable only by host:port (A3). /running reports load state per model (A1, A12). Peer federation exists (one llama-swap aggregating another), but peer-served models surface as "peer-name: model-name" IDs [single-source: A6] and same-ID collisions resolve silently to the lexicographically-first peer (A5) — and, decisive without any web source, BooCode would still see one flat list with no native grouping while the two hosts' uptime becomes coupled. Standalone llama.cpp llama-server defaults its /v1/models ID to the model file path unless --alias is set (A8, A9) — relevant only if a host ever bypasses llama-swap.
How mature clients solve exactly this (web, corroborated)
Every major OpenAI-compatible client library handles multiple same-protocol providers with separate named provider instances, each with its own baseURL, namespaced in the client's registry as provider:model / provider/model — the model ID actually sent on the wire to each backend stays the bare upstream ID (Vercel AI SDK provider registry: A13, A14; LiteLLM model_list: A15, A16). BooCode already uses the AI SDK's createOpenAICompatible (A60) and the coder already namespaces with a llama-swap/ prefix (A63, A64), so this pattern is an extension of existing conventions, not a new idiom.
Dropdown + favorites prior art (web)
The closest shipped implementation of the requested UX is VS Code's model picker: models grouped by provider, a pin icon revealed on hover, pinned models lifted into a dedicated top section in stable insertion order, while remaining visible in their provider group (display copy, not move) (A45, A46). Cherry Studio independently demonstrates the key-collision lesson: its model identity is the composite {id, provider} precisely so two providers serving the same model name don't collide (A35, A36) [third-party code reference; unverifiable from here — supporting color only, see V8]. Open WebUI documents the two pitfalls to avoid: favorites keyed by bare model ID become ambiguous the moment two connections serve the same name (A27), and its stale-pin cleanup permanently deletes pins when a backend is temporarily down (A23) — the correct behavior is to hide unavailable favorites and restore them when the host returns. LibreChat groups via admin-configured YAML and added pinning in v0.8.5 (A28, A29). Jan, Chatbox, SillyTavern, Continue.dev, BigAGI, and LM Studio offer weaker or no equivalents (A32–A34, A38–A44, A47–A52) — none contradicts the VS Code pattern.
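The section ordering described above (Favorites first, favorited models copied rather than moved, offline favorites hidden but kept) can be sketched as a pure function. This is an illustrative sketch only: buildSections, ModelEntry, and Section are hypothetical names, and it assumes favorites are stored as composite provider/model strings in insertion order.

```typescript
interface ModelEntry {
  provider: string; // provider name, also used as the section title
  id: string;       // bare upstream model ID
}

interface Section {
  title: string;
  items: ModelEntry[];
}

const keyOf = (m: ModelEntry): string => `${m.provider}/${m.id}`;

function buildSections(
  providerOrder: readonly string[],
  available: readonly ModelEntry[],
  favorites: readonly string[], // composite IDs in insertion order
): Section[] {
  const byKey = new Map<string, ModelEntry>();
  for (const m of available) byKey.set(keyOf(m), m);

  // Favorites keep their stored order; entries whose host is offline are
  // skipped for display only and remain in the stored list.
  const favItems: ModelEntry[] = [];
  for (const key of favorites) {
    const m = byKey.get(key);
    if (m !== undefined) favItems.push(m);
  }

  const sections: Section[] = [];
  if (favItems.length > 0) sections.push({ title: "Favorites", items: favItems });
  for (const provider of providerOrder) {
    // Favorited models are copied, not moved: they still appear in their group.
    const items = available.filter((m) => m.provider === provider);
    if (items.length > 0) sections.push({ title: provider, items });
  }
  return sections;
}

// Example: the embedding host is offline, so only Sam-desktop models are available.
const exampleSections = buildSections(
  ["sam-desktop", "embedding"],
  [
    { provider: "sam-desktop", id: "qwen3.6-35b-a3b" },
    { provider: "sam-desktop", id: "granite-4.1-8b" },
  ],
  ["embedding/granite-4.1-8b", "sam-desktop/granite-4.1-8b"],
);
```

In this example the Favorites section shows only the Sam-desktop copy of granite-4.1-8b; the embedding favorite reappears once that host's models are back in the available list.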
Does embedding need a llama-sidecar? No.
The llama-sidecar is a Go daemon on Sam-desktop providing a per-agent llama-server process pool so agents can carry llama_extra_args (cache quant, spec decoding, slot save) injected via an X-Agent-Flags header (A60, A74). The embedding host needs none of that: its per-model tuning is baked directly into its llama-swap config.yaml (A56), and no per-agent flag injection applies to it. However, resolveRoute currently makes the sidecar the default route for all non-DeepSeek inference whenever LLAMA_SIDECAR_URL is set (A60) — so under the multi-provider design, sidecar routing must become an attribute of the Sam-desktop provider entry (e.g. optional sidecarUrl per provider), not a global default; otherwise requests for embedding-hosted models would be sent to a sidecar that only manages Sam-desktop processes.
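A minimal sketch of what "sidecar as a provider attribute" could look like follows. The names resolveProviderRoute and Route are hypothetical, and the sidecar URL assumes the existing sidecar is reachable on the Sam-desktop host at :8402; neither is taken from the current code.

```typescript
interface ProviderEntry {
  name: string;
  baseUrl: string;
  sidecarUrl?: string; // set only for hosts that actually run a llama-sidecar
}

interface Route {
  baseUrl: string;
  viaSidecar: boolean;
}

// Hypothetical resolver: the sidecar remains the default route, but only for
// the provider entry that declares one.
function resolveProviderRoute(provider: ProviderEntry): Route {
  if (provider.sidecarUrl !== undefined) {
    return { baseUrl: provider.sidecarUrl, viaSidecar: true };
  }
  return { baseUrl: provider.baseUrl, viaSidecar: false };
}

const samDesktop: ProviderEntry = {
  name: "sam-desktop",
  baseUrl: "http://100.101.41.16:8401",
  sidecarUrl: "http://100.101.41.16:8402", // assumption: same host, port 8402
};
const embedding: ProviderEntry = {
  name: "embedding",
  baseUrl: "http://100.90.172.55:8411",
};

const samRoute = resolveProviderRoute(samDesktop);      // goes through the sidecar
const embeddingRoute = resolveProviderRoute(embedding); // goes straight to llama-swap
```

Because the decision reads only the provider entry, no global LLAMA_SIDECAR_URL check is left that could capture embedding-bound requests.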
Openspec conventions for the follow-up plan (codebase)
Per-batch docs land in openspec/changes/<slug>/ with proposal.md (why + scope), tasks.md (numbered/checkbox action list), and optional design.md (architecture/data-model decisions); slugs are lowercase-hyphenated from the batch title (A73). This feature is a natural three-file batch — the provider registry + routing is design-heavy, so design.md is warranted.
Options to Consider
O1: Named provider registry with composite model IDs (<provider>/<model>)
- What it is: BooCode config gains a provider list ({ name, baseUrl, sidecarUrl? } per entry: "sam-desktop" and "embedding"). Models are stored and selected as sam-desktop/qwen3.6-35b-a3b, embedding/gemma-4-12b. /api/models returns provider-tagged groups; one routing resolver (provider prefix → baseURL, bare wire ID) replaces every LLAMA_SWAP_URL hardcode; bare legacy IDs fall back to the default provider (sam-desktop). Favorites, caches, and attribution all key on the composite ID.
- Trade-offs: Touches every call site that assumes one endpoint (the nine sites above; see Validation for the full list); needs a deliberate legacy-bare-ID fallback for existing session/chat rows and the seeded default_model; the coder's opencode namespace (llama-swap/) needs an explicit translation rule. In exchange: no DB schema change for model columns, no llama-swap config changes on either host, matches the AI-SDK idiom BooCode already uses and the coder's existing prefix convention, and makes the deepseek- heuristic unnecessary for prefixed IDs.
- Rests on: (A13, A14, A15, A16) for the pattern; (A54, A55) for the collision necessity; (A60, A63, A64) for fit with existing code.
- Evidence status: corroborated.
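The O1 resolver can be sketched in a few lines. This is an illustration under stated assumptions, not the planned implementation: resolveModel and ResolvedModel are hypothetical names, the registry is inlined as a constant rather than loaded from config, and treating an unrecognized prefix as part of a bare ID is one possible fallback policy among several.

```typescript
interface ProviderEntry {
  name: string;        // registry key, e.g. "sam-desktop"
  baseUrl: string;     // llama-swap base URL for this host
  sidecarUrl?: string; // optional; omitted in this sketch
}

interface ResolvedModel {
  provider: ProviderEntry;
  wireModel: string;   // bare ID sent upstream
  compositeId: string; // "<provider>/<model>", for favorites and cache keys
}

const DEFAULT_PROVIDER: ProviderEntry = {
  name: "sam-desktop",
  baseUrl: "http://100.101.41.16:8401",
};
const PROVIDERS: readonly ProviderEntry[] = [
  DEFAULT_PROVIDER,
  { name: "embedding", baseUrl: "http://100.90.172.55:8411" },
];

function resolveModel(storedId: string): ResolvedModel {
  const slash = storedId.indexOf("/");
  const prefix = slash > 0 ? storedId.slice(0, slash) : "";
  const known = PROVIDERS.find((p) => p.name === prefix);
  // Assumption: bare legacy IDs, and IDs whose prefix is not a registered
  // provider, go to the default provider with the ID passed through unchanged.
  const provider = known ?? DEFAULT_PROVIDER;
  const wireModel = known !== undefined ? storedId.slice(slash + 1) : storedId;
  return { provider, wireModel, compositeId: `${provider.name}/${wireModel}` };
}

// The same bare name resolves to different hosts depending on the prefix.
const onEmbedding = resolveModel("embedding/granite-4.1-8b");
const onDefault = resolveModel("granite-4.1-8b");
```

Here onEmbedding targets http://100.90.172.55:8411 and onDefault targets http://100.101.41.16:8401, while both send granite-4.1-8b as the wire model.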
O2: Bare model IDs plus a separate provider field everywhere
- What it is: Keep model strings as-is and add a provider column/field through sessions, chats, WS frames, ModelInfo, ProviderModel, and every read path.
- Trade-offs: Avoids string munging and display-time prefix stripping, but is strictly more invasive: two schema migrations, a WsFrameSchema change rebuilt through @boocode/contracts, and every consumer updated in lockstep, while favorites still need a composite key anyway. Higher blast radius for the same outcome.
- Rests on: (A65, A62) for the touched surfaces.
- Evidence status: corroborated (codebase-derived).
O3: llama-swap peer federation (Sam-desktop aggregates embedding as a peer)
- What it is: Configure embedding as a peers: entry in Sam-desktop's llama-swap; BooCode keeps a single endpoint.
- Trade-offs: Rejected on codebase-observable grounds: BooCode would still see one flat list (no native named grouping, which is the feature's whole point), the two hosts' availability becomes coupled, and it requires operational changes on a host outside this repo. Additionally, peer-served model IDs surface as "peer-name: model-name" [single-source: A6] with silent first-lexicographic collision resolution (A5).
- Rests on: (A5, A6) plus codebase observation (A59, A61).
- Evidence status: rejection corroborated by codebase facts; the peer ID-format detail is single-source (caveated) and not load-bearing.
O4: External aggregator proxy (LiteLLM) in front of both hosts
- What it is: A LiteLLM proxy with a model_list mapping unique aliases to each host; BooCode keeps one endpoint.
- Trade-offs: Proven pattern (A15, A16) but adds a third always-on service with a manually-maintained catalog (no auto-discovery from /v1/models), an extra network hop, and still no provider grouping signal unless encoded in alias naming conventions. Overweight for a single-user self-hosted system.
- Rests on: (A15, A16).
- Evidence status: corroborated.
Sub-decision — favorites persistence
- O5a: Server-side, in the settings table (e.g. favorite_models: string[] of composite IDs). Survives browsers/devices; multi-device use is real here (the repo's own docs describe side-by-side iPhone debugging), matching how BooChat model choice is already server-persisted on the session row. Costs a PATCH per star toggle and needs a "hide stale, never delete" rule (A23) plus acceptance that stale composite keys linger until manually unfavorited.
- O5b: Browser localStorage, extending the coder's existing boocode.coder.agent-prefs pattern (A70). Zero API surface, but per-device, per-browser, and split across the two UIs.
- Evidence status: both corroborated; the cross-device argument for O5a is codebase-derived inference from documented usage, not a measured requirement.
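For O5a, the stored value is just an ordered array of composite IDs, and the "hide stale, never delete" rule reduces to keeping the display filter separate from the write path. A minimal sketch, assuming hypothetical helper names (toggleFavorite, visibleFavorites) and leaving out the actual PATCH call:

```typescript
// Hypothetical pure helper: returns the next stored list after a star toggle.
function toggleFavorite(favorites: readonly string[], compositeId: string): string[] {
  if (favorites.indexOf(compositeId) >= 0) {
    return favorites.filter((f) => f !== compositeId); // explicit unfavorite
  }
  return favorites.concat(compositeId); // appending preserves insertion order
}

// Display-time filter: stale entries are hidden here but never written back.
function visibleFavorites(
  favorites: readonly string[],
  liveIds: ReadonlySet<string>,
): string[] {
  return favorites.filter((f) => liveIds.has(f));
}

let stored: string[] = [];
stored = toggleFavorite(stored, "embedding/gemma-4-12b");
stored = toggleFavorite(stored, "sam-desktop/qwen3.6-35b-a3b");

// With the embedding host offline, only the Sam-desktop favorite is shown,
// while `stored` still holds both entries for when the host returns.
const shown = visibleFavorites(stored, new Set(["sam-desktop/qwen3.6-35b-a3b"]));
```

Only toggleFavorite ever produces a value that would be persisted, so a temporarily unreachable host cannot shrink the stored list.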
Recommendation
- Recommendation: O1 (named provider registry with <provider>/<model> composite IDs), combined with the VS Code-pattern dropdown (Favorites on top in stable insertion order, then Sam-desktop's models, then embedding's; star toggle per row; favorited models remain listed in their provider group) and O5a server-side favorites keyed by composite ID. Non-negotiable design constraints carried in from validation:
  1. Prefix-strip only at wire-URL construction; caches (notably model-context.ts's no-TTL positive cache) key on the full composite ID, or the five name-collided models cross-pollute context windows between hosts (V7).
  2. The coder dispatcher must translate composite prefixes for opencode (map the default provider to the existing llama-swap/ namespace, or register new opencode providers); the current pass-through of any slash-containing ID would hand opencode an unknown provider key (V1).
  3. Every single-endpoint call site is in scope: provider.ts (upstreamModel + resolveModelEndpoint), models.ts, model-context.ts (including its deepseek- static-context guard), compaction.ts, task-model.ts, arena-model-call.ts (+ arena callers, coder-side config), coder provider-snapshot.ts, coder dispatcher.ts (V2–V4, V9).
  4. Sidecar routing becomes a Sam-desktop provider attribute, not the global default route; embedding needs no sidecar (A60, A74; post-validation verification).
  5. Bare legacy IDs (existing rows, seeded default_model) resolve to the default provider indefinitely: new sessions inherit a bare seeded default until settings are migrated, so this is a permanent fallback, not a one-time migration (V2).
  6. Favorites that reference unavailable models are hidden, never auto-deleted (A23).
- Evidence basis: The option choice rests on corroborated evidence throughout: the multi-provider client pattern (A13–A16), the live collision and churn data from both hosts (A54, A55, A58; provided material, independently re-checkable), and codebase fit (A60, A63, A64). The UX pattern rests on corroborated documentation (A45, A46) with the Open WebUI pitfalls as corroborated counter-evidence (A23, A27); the Cherry Studio and VS Code code-level references are unverifiable third-party color (V8) and nothing rests on them alone. The single-source peer-ID format (A6) supports only the rejection of O3, which stands independently on codebase facts. The cross-device justification for O5a is codebase-derived inference (documented multi-device usage), explicitly not measured evidence.
Validation
Adversarial validation attacked the evidence, framing, recommendation, and gathering integrity. Findings (condensed; all code-verified by the validator in this repo):
V1: "O1 extends the coder's prefix convention" was overstated
- Strategy: Challenge the Recommendation
- Investigation: dispatcher.ts:1006-1011, coder CLAUDE.md, provider-snapshot.ts:66-72.
- Result: Refuted as originally framed: a stored sam-desktop/<model> passes the dispatcher's slash-check unchanged and reaches opencode as an unknown provider key; llama-swap/ is hardcoded in ≥4 coder locations.
- Impact: Recommendation now mandates an explicit opencode namespace-translation rule (constraint 2).
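One shape the translation rule could take is a small lookup applied before opencode dispatch. This is a sketch under assumptions: toOpencodeModelId is a hypothetical name, and the "embedding" opencode provider key assumes such a provider would be registered in opencode's own config, which is outside this repo.

```typescript
// Hypothetical mapping from BooCode provider name to opencode provider key.
const OPENCODE_PROVIDER_KEY = new Map<string, string>([
  ["sam-desktop", "llama-swap"], // default provider keeps the existing namespace
  ["embedding", "embedding"],    // assumption: registered separately in opencode
]);

function toOpencodeModelId(storedId: string): string {
  const slash = storedId.indexOf("/");
  // Bare ID: same behaviour as the dispatcher has today.
  if (slash < 0) return `llama-swap/${storedId}`;
  const mapped = OPENCODE_PROVIDER_KEY.get(storedId.slice(0, slash));
  // Prefixes that are not BooCode providers pass through unchanged.
  if (mapped === undefined) return storedId;
  return `${mapped}/${storedId.slice(slash + 1)}`;
}

const translated = toOpencodeModelId("sam-desktop/qwen3.6-35b-a3b");
// translated is "llama-swap/qwen3.6-35b-a3b"
```

IDs that already carry the llama-swap/ prefix are left alone, so existing stored values keep dispatching as before.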
V2: The bare-ID legacy fallback was asserted, not designed
- Strategy: Challenge the Recommendation
- Investigation: provider.ts:115-135, stream-phase.ts:110, sessions.ts:113-117, schema.sql:222, model-context.ts:77.
- Result: Partially refuted: architecturally plausible but unimplemented; prefixed IDs would 404 the /upstream/<model>/props fetch and break context/compaction display; the seeded bare default_model makes the fallback permanent, not migratory.
- Impact: Constraints 1, 3, 5 added.
V3: The deepseek- hazard is wider than routing
- Strategy: Challenge the Evidence
- Investigation: model-context.ts:40-49, provider.ts:98, compaction.ts:531.
- Result: Confirmed with added scope: the context guard fires on the name prefix alone, returning a fake 131k context for embedding's deepseek-r1-qwen3-8b even after routing is fixed.
- Impact: model-context.ts guard added to the touch-list (constraint 3).
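One way to keep the deepseek- heuristic from firing on locally hosted models is to evaluate it against the full stored ID and skip it whenever a registered provider prefix is present. A minimal sketch with hypothetical names (isDeepSeekCloudModel, staticContextFor); the provider list is inlined for illustration.

```typescript
const REGISTERED_PROVIDERS: readonly string[] = ["sam-desktop", "embedding"];

// Hypothetical guard: the deepseek- heuristic applies only to IDs that carry
// no registered provider prefix, and it inspects the full stored ID.
function isDeepSeekCloudModel(storedId: string): boolean {
  const slash = storedId.indexOf("/");
  const prefix = slash > 0 ? storedId.slice(0, slash) : "";
  if (REGISTERED_PROVIDERS.indexOf(prefix) >= 0) return false;
  return storedId.indexOf("deepseek-") === 0;
}

// Static context applies to cloud models only; everything else asks upstream.
function staticContextFor(storedId: string): number | undefined {
  return isDeepSeekCloudModel(storedId) ? 131072 : undefined;
}

const localGuard = staticContextFor("embedding/deepseek-r1-qwen3-8b"); // undefined
const cloudGuard = staticContextFor("deepseek-chat");                  // 131072
```

A bare deepseek-r1-qwen3-8b would still match the heuristic, so the embedding host's model has to be stored with its provider prefix for this guard to help.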
V4: compaction.ts is a missed hardcode site
- Strategy: Challenge the Evidence
- Investigation: compaction.ts:351-357 → resolveModelEndpoint (provider.ts:139-157).
- Result: Refuted the original C9 list as incomplete: compaction summarization calls would go to the wrong host for embedding models.
- Impact: Added to the touch-list (A67, constraint 3).
V5: Server-side favorites needed justification against the coder's localStorage pattern
- Strategy: Challenge the Assumptions
- Investigation: AgentComposerBar.tsx:33-52, routes/settings.ts, root CLAUDE.md auth model.
- Result: Partially refuted: the Open WebUI bug distinguishes auto-delete vs hide, not server vs client storage; the original justification conflated the two.
- Impact: O5a/O5b reframed as an explicit sub-decision; O5a retained on the cross-device argument, labeled as inference.
V6: O3's rejection over-relied on a single-source claim
- Strategy: Challenge the Evidence-Gathering Integrity
- Result: Confirmed with a provenance note — O3 is independently rejectable from codebase facts; the stale GitHub issue is demoted to supporting color.
- Impact: O3 rejection rewritten to lead with codebase-observable reasons.
V7: Composite IDs + naive prefix-stripping would poison the no-TTL context cache
- Strategy: Challenge the Recommendation
- Investigation: model-context.ts:9, 26-29, 77-100; the five cross-host duplicate IDs.
- Result: Refuted the unstated design: stripping before the cache key shares entries across providers with different real context windows, permanently until restart.
- Impact: Constraint 1 (composite cache key, strip only at URL construction) — the most subtle required design rule.
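The rule is easier to see in code: the cache key is built from the provider name plus the bare ID, and the bare ID alone appears only where the upstream URL is assembled. The sketch below is illustrative; contextWindow and propsUrl are hypothetical names, the loader callback stands in for the real asynchronous props fetch, and the context sizes are made-up values.

```typescript
// Hypothetical cache keyed by the full composite ID ("<provider>/<model>").
const contextCache = new Map<string, number>();

// The bare wire ID is used only here, when the upstream URL is built.
function propsUrl(baseUrl: string, wireModel: string): string {
  return `${baseUrl}/upstream/${wireModel}/props`;
}

// `load` stands in for the real asynchronous fetch so the sketch stays synchronous.
function contextWindow(
  provider: { name: string; baseUrl: string },
  wireModel: string,
  load: (url: string) => number,
): number {
  const key = `${provider.name}/${wireModel}`;
  const hit = contextCache.get(key);
  if (hit !== undefined) return hit;
  const value = load(propsUrl(provider.baseUrl, wireModel));
  contextCache.set(key, value);
  return value;
}

// Illustrative values only; real context sizes come from each host's props.
const fakeLoad = (url: string): number => (url.indexOf(":8411") >= 0 ? 8192 : 32768);

const ctxOnSam = contextWindow(
  { name: "sam-desktop", baseUrl: "http://100.101.41.16:8401" }, "qwen3.5-9b", fakeLoad);
const ctxOnEmbedding = contextWindow(
  { name: "embedding", baseUrl: "http://100.90.172.55:8411" }, "qwen3.5-9b", fakeLoad);
```

The two lookups for qwen3.5-9b land in separate cache entries, so a value learned from one host is never served for the other.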
V8: Third-party code references (Cherry Studio, VS Code PR) are unverifiable
- Strategy: Challenge the Evidence-Gathering Integrity
- Result: Partially refuted their evidentiary weight — retained as color; the composite-key argument stands on BooCode's own conventions and the live collision data.
- Impact: Evidence basis re-worded; nothing rests on those references alone.
V9: Arena is the most exposed hardcode
- Strategy: Challenge the Evidence
- Investigation: arena-model-call.ts:16-28, arena-analyzer.ts:90.
- Result: Confirmed with elevated severity: raw fetch, no abstraction, lives in apps/coder with its own config type (cannot reuse the server's resolver as-is).
- Impact: Listed as separate coder-side scope (constraint 3).
Adjustments Made
The recommendation survived but was rewritten: the implementation constraints (composite cache keys, opencode namespace translation, the full nine-site touch-list, permanent bare-ID fallback, hidden-not-deleted favorites) were folded into the Recommendation itself; O3's rejection was re-grounded in codebase facts; the favorites-persistence choice was reframed as an explicit sub-decision; unverifiable third-party code references were demoted to supporting color. Post-validation, the orchestrator additionally verified in provider.ts that the sidecar is the default route whenever LLAMA_SIDECAR_URL is set — adding constraint 4 (sidecar becomes a per-provider attribute; embedding needs none).
Confidence Assessment
- Confidence: High, for the option choice. The validator rated the pre-adjustment synthesis Medium because the implementation scope was understated; that scope is now enumerated above, and no finding challenged the direction (its own words: "architecturally sound given the existing llama-swap/ convention").
- Remaining Risks: (1) The opencode-side translation (V1) may also require host-side ~/.config/opencode/opencode.json changes, which are outside this repo. (2) Stale favorite keys accumulate in settings with no cleanup mechanism by design (hide-don't-delete); acceptable for single-user but unbounded. (3) The exact /running JSON envelope and llama-swap peer aggregation details remain single-source; neither is load-bearing. (4) The five duplicate-ID models make any partial rollout (one call site migrated, another not) actively dangerous; the routing resolver should land as one batch.
Sources
| ID | Source | Link / location | Retrieved | Trust class | Summary (one line) | Evidence status |
|---|---|---|---|---|---|---|
| A1 | llama-swap README | github.com/mostlygeek/llama-swap | 2026-06-10 | web | Proxy hot-swapping local inference servers; documents /v1/models, /running, /upstream, /health; v224 current | corroborated by A2, A3, A12 |
| A2 | llama-swap configuration.md | github.com/mostlygeek/llama-swap/blob/main/docs/configuration.md | 2026-06-10 | web | Model IDs are YAML keys; per-model name/description/aliases/metadata/ttl/useModelName; includeAliasesInList | corroborated by A3, A4 |
| A3 | llama-swap config-schema.json | github.com/mostlygeek/llama-swap/blob/main/config-schema.json | 2026-06-10 | web | Authoritative config schema; peers section; no instance-identity field at any level | corroborated by A2, A4 |
| A4 | llama-swap config.example.yaml | github.com/mostlygeek/llama-swap/blob/main/config.example.yaml | 2026-06-10 | web | Annotated example: aliases, useModelName, metadata, groups, peers | corroborated by A2, A3 |
| A5 | DeepWiki: llama-swap peers | deepwiki.com/mostlygeek/llama-swap/3.7-peer-configuration | 2026-06-10 | web | Duplicate peer model IDs route to first-lexicographic peer with only a warning | corroborated by A6 (collision); single source on aggregation detail |
| A6 | llama-swap issue #539 | github.com/mostlygeek/llama-swap/issues/539 | 2026-06-10 | web | Peer models surface as "peer-name: model-name" IDs; stale, unresolved | single source (caveated) |
| A7 | llama-swap issue #538 | github.com/mostlygeek/llama-swap/issues/538 | 2026-06-10 | web | Aliases hidden from /v1/models unless includeAliasesInList | corroborated by A2, A3 |
| A8 | llama.cpp server README | github.com/ggml-org/llama.cpp/blob/master/tools/server/README.md | 2026-06-10 | web | /v1/models id defaults to file path; --alias overrides; meta block fields | corroborated by A9, A10 |
| A9 | llama.cpp discussion #8547 | github.com/ggml-org/llama.cpp/discussions/8547 | 2026-06-10 | web | Confirms file-path default id; --override-kv doesn't change API id | corroborated by A8 |
| A10 | llama.cpp issue #17860 | github.com/ggml-org/llama.cpp/issues/17860 | 2026-06-10 | web | Only one --alias per llama-server today | corroborated by A8 |
| A11 | LM4eu/llama-swap Go pkg docs | pkg.go.dev/github.com/LM4eu/llama-swap/proxy | 2026-06-10 | web | Model struct {Id, Name, Description, State, Unlisted}; fork, not upstream | single source (caveated) |
| A12 | glukhov.org llama-swap quickstart | glukhov.org/llm-hosting/llama-swap/ | 2026-06-10 | web | /running state values; alias listing behavior | corroborated by A1, A2 |
| A13 | Vercel AI SDK provider management | ai-sdk.dev/docs/ai-sdk-core/provider-management | 2026-06-10 | web | Registry namespaces models as providerId:modelId; per-provider baseURL | corroborated by A14 |
| A14 | Vercel AI SDK OpenAI-compatible providers | ai-sdk.dev/providers/openai-compatible-providers | 2026-06-10 | web | createOpenAICompatible takes name+baseURL per provider; wire model ID unchanged | corroborated by A13 |
| A15 | LiteLLM OpenAI-compatible docs | docs.litellm.ai/docs/providers/openai_compatible | 2026-06-10 | web | Per-entry api_base; aliasing decouples client name from upstream name | corroborated by A16 |
| A16 | McDermott: Centralizing LLMs with LiteLLM | robert-mcdermott.medium.com/...9874563f3062 | 2026-06-10 | web | model_list with unique model_name per upstream resolves collisions | corroborated by A15 |
| A17 | DeepWiki: llama-swap groups | deepwiki.com/mostlygeek/llama-swap/3.4-groups-and-swapping-policies | 2026-06-10 | web | Groups/matrix control concurrency, not model IDs | corroborated by A2–A4 |
| A18 | llama-swap releases | github.com/mostlygeek/llama-swap/releases | 2026-06-10 | web | v219–v224 changed routing/perf, not /v1/models schema | single source (caveated) |
| A19 | Open WebUI discussion #3443 | github.com/open-webui/open-webui/discussions/3443 | 2026-06-10 | web | Pin-in-dropdown feature request; drag-reorder workaround breaks | corroborated by A21, A23 |
| A20 | Open WebUI discussion #5902 | github.com/open-webui/open-webui/discussions/5902 | 2026-06-10 | web | Filtering 70+ models; whitelist vs hide patterns | corroborated by A19 |
| A21 | Open WebUI env config reference | docs.openwebui.com/reference/env-configuration/ | 2026-06-10 | web | DEFAULT_PINNED_MODELS; settings.pinnedModels sorts pinned to top | corroborated by A22, A23 |
| A22 | Open WebUI database schema | docs.openwebui.com/reference/database-schema/ | 2026-06-10 | web | Pins live in user.settings JSON, keyed by bare model ID | corroborated by A21 |
| A23 | Open WebUI discussion #23656 | github.com/open-webui/open-webui/discussions/23656 | 2026-06-10 | web | Stale-pin cleanup permanently deletes pins during backend downtime | corroborated by A21, A53 |
| A24 | Open WebUI discussion #14854 | github.com/open-webui/open-webui/discussions/14854 | 2026-06-10 | web | Unpin buried in three-dot menu; discoverability failure | corroborated by A21 |
| A25 | Open WebUI issue #19183 | github.com/open-webui/open-webui/issues/19183 | 2026-06-10 | web | Local/External/All tabs + tag chips + Fuse.js search in selector | corroborated by A26 |
| A26 | Open WebUI discussion #21502 | github.com/open-webui/open-webui/discussions/21502 | 2026-06-10 | web | Flat select unusable at OpenRouter scale; optgroup/search proposals | corroborated by A25 |
| A27 | Open WebUI discussion #4495 | github.com/open-webui/open-webui/discussions/4495 | 2026-06-10 | web | Same-named models from two connections are indistinguishable (bare-ID failure) | corroborated by A25, A26 |
| A28 | LibreChat model specs docs | librechat.ai/docs/configuration/librechat_yaml/object_structure/model_specs | 2026-06-10 | web | Admin YAML group field creates named collapsible sections | corroborated by A29 |
| A29 | LibreChat v0.8.5 changelogs | librechat.ai/changelog/v0.8.5 | 2026-06-10 | web | Pin support for model specs added (PR #11219) | corroborated by A30; persistence detail single-source |
| A30 | LibreChat discussion #11044 | github.com/danny-avila/LibreChat/discussions/11044 | 2026-06-10 | web | Pinning exists; preset-active confusion | corroborated by A29 |
| A31 | DeepWiki: LibreChat DB models | deepwiki.com/danny-avila/LibreChat/7.1-database-models | 2026-06-10 | web | MongoDB/Mongoose; pinned-spec field name unconfirmed | single source (caveated) |
| A32 | Jan v0.6.9 changelog | jan.ai/changelog/2025-08-28-image-support | 2026-06-10 | web | "Favorite models" shipped; no UI detail | single source (caveated) |
| A33 | Jan manage-models docs | jan.ai/docs/desktop/manage-models | 2026-06-10 | web | Organized by source/quantization tier, not provider | corroborated by A32 |
| A34 | Jan data-folder docs | jan.ai/docs/desktop/data-folder | 2026-06-10 | web | Settings in local JSON files | corroborated by A32 |
| A35 | DeepWiki: Cherry Studio models | deepwiki.com/CherryHQ/cherry-studio/5.3-model-configuration-and-capabilities | 2026-06-10 | web | Provider-grouped UI; getModelUniqId composite {id, provider} | corroborated by A36 (see V8 caveat) |
| A36 | Cherry Studio ModelService.ts | github.com/CherryHQ/cherry-studio/.../ModelService.ts | 2026-06-10 | web | Composite-key implementation | corroborated by A35 (see V8 caveat) |
| A37 | Cherry Studio releases | github.com/CherryHQ/cherry-studio/releases | 2026-06-10 | web | No favorites changes v1.9.1–v1.9.11 | single source (caveated) |
| A38 | Chatbox issue #1540 | github.com/chatboxai/chatbox/issues/1540 | 2026-06-10 | web | Favorite-models proposal; not shipped | corroborated by A39 |
| A39 | Chatbox issue #2252 | github.com/chatboxai/chatbox/issues/2252 | 2026-06-10 | web | Two-section dropdown proposal (Preferred on top, star per row) | corroborated by A38 |
| A40 | DeepWiki: Chatbox local models | deepwiki.com/chatboxai/chatbox/4.6-local-model-integration | 2026-06-10 | web | settings.favoritedModels in localStorage | single source (caveated) |
| A41 | SillyTavern PR #5536 | github.com/SillyTavern/SillyTavern/pull/5536 | 2026-06-10 | web | Unified sort/group settings drawer across providers | corroborated by A42 |
| A42 | SillyTavern 1.13.5 notes | github.com/SillyTavern/SillyTavern/discussions/4660 | 2026-06-10 | web | Sort/group shipped in 1.13.5 | corroborated by A41 |
| A43 | SillyTavern connection profiles docs | docs.sillytavern.app/usage/core-concepts/connection-profiles/ | 2026-06-10 | web | Profiles = saved config snapshots, not per-model favorites | corroborated by A44 |
| A44 | SillyTavern issue #4565 | github.com/SillyTavern/SillyTavern/issues/4565 | 2026-06-10 | web | Better model selector request closed not-planned | corroborated by A43 |
| A45 | VS Code language models docs | code.visualstudio.com/docs/agent-customization/language-models | 2026-06-10 | web | Provider groups + hover pin + dedicated Pinned top section, stable order, model stays in group | corroborated by A46 |
| A46 | vscode-copilot-chat PR #1111 | github.com/microsoft/vscode-copilot-chat/pull/1111 | 2026-06-10 | web | BYOK models grouped into a category | corroborated by A45 (see V8 caveat) |
| A47 | Continue.dev model roles docs | docs.continue.dev/customize/model-roles/00-intro | 2026-06-10 | web | Role-based dropdowns; no grouping/favorites | corroborated by A48 |
| A48 | Continue.dev providers overview | docs.continue.dev/customize/model-providers/overview | 2026-06-10 | web | Picker reflects config.yaml order | corroborated by A47 |
| A49 | Open WebUI discussion #15449 | github.com/open-webui/open-webui/discussions/15449 | 2026-06-10 | web | Multi-model combination pinning request | single source (caveated) |
| A50 | BigAGI repo + changelog | github.com/enricoros/big-AGI | 2026-06-10 | web | No grouping/favorites evidence (negative finding) | single source (caveated) |
| A51 | LM Studio v0.4.0 changelog | lmstudio.ai/changelog/lmstudio-v0.4.0 | 2026-06-10 | web | Search/format filters; no favorites | corroborated by A52 |
| A52 | LM Studio v0.4.13 changelog | lmstudio.ai/changelog/lmstudio-v0.4.13 | 2026-06-10 | web | No picker changes | corroborated by A51 |
| A53 | Open WebUI issue #22578 | github.com/open-webui/open-webui/issues/22578 | 2026-06-10 | web | Model enable/disable state goes stale on catalog change | corroborated by A23 |
| A54 | embedding host live inventory | provided: curl http://100.90.172.55:8411/v1/models + /running | 2026-06-10 | provided | 39 models incl. deepseek-r1-qwen3-8b and 5 IDs duplicated on Sam-desktop; /running empty | corroborated by A56 (config matches) |
| A55 | Sam-desktop live inventory | provided: curl http://100.101.41.16:8401/v1/models + /running | 2026-06-10 | provided | 21 models; qwen3.6-35b-a3b-mxfp4 absent; nemotron-omni running via D:\llama-server | corroborated by A57 |
| A56 | embedding host SSH inventory | provided: ssh samkintop@100.90.172.55 (~/llama-swap/config.yaml, ~/llama.cpp, ~/models) | 2026-06-10 | provided | P104-tuned llama-swap config (ttl 1800, per-model llama-server cmds); llama.cpp source build | corroborated by A54 |
| A57 | Sam-desktop SSH inventory | provided: ssh samki@100.101.41.16 (dir D:) | 2026-06-10 | provided | D:\llama-server (b9591 CUDA), D:\llama-swap (v224), D:\models, D:\llama-sidecar | corroborated by A55 |
| A58 | Current env config | .env, apps/coder/.env.host | n/a | codebase | LLAMA_SWAP_URL=http://100.101.41.16:8401; DEFAULT_MODEL=qwen3.6-35b-a3b-mxfp4 (both apps) | corroborated (read directly) |
| A59 | Models route | apps/server/src/routes/models.ts:14-56 | n/a | codebase | GET /api/models fetches only LLAMA_SWAP_URL (+DeepSeek); flat untagged list | corroborated (read directly) |
| A60 | Inference provider/routing | apps/server/src/services/inference/provider.ts:1-163 | n/a | codebase | resolveRoute: deepseek- prefix → cloud; LLAMA_SIDECAR_URL set → sidecar default for everything; else single swap; resolveModelEndpoint hardcodes LLAMA_SWAP_URL | corroborated (read directly) |
| A61 | BooChat model picker | apps/web/src/components/ModelPicker.tsx:14-133 | n/a | codebase | Flat lazy list, no grouping/search/favorites; PATCHes session.model | corroborated (explorer + validator) |
| A62 | Provider snapshot contracts | packages/contracts/src/provider-snapshot.ts | n/a | codebase | ProviderModel has no provider field; identity implicit in parent entry name | corroborated |
| A63 | Coder provider snapshot | apps/coder/src/services/provider-snapshot.ts:48-70,256-310 | n/a | codebase | Prefixes single llama-swap list with llama-swap/; merges into boocode entry | corroborated |
| A64 | Coder dispatcher prefixing | apps/coder/src/services/dispatcher.ts:1006-1011 | n/a | codebase | Bare IDs get llama-swap/; slash-containing IDs pass through unchanged | corroborated (validator-verified) |
| A65 | Model/settings persistence | apps/server/src/schema.sql:20,217-222,249; routes/settings.ts | n/a | codebase | sessions.model NOT NULL, chats.model nullable, settings KV JSONB seeded with bare default_model | corroborated |
| A66 | Model context service | apps/server/src/services/model-context.ts:9,26-29,40-49,77-100 | n/a | codebase | No-TTL positive cache keyed by raw model string; deepseek- guard returns static 131k; /upstream URL from single config | corroborated (validator-verified) |
| A67 | Compaction LLM calls | apps/server/src/services/compaction.ts:351-357,531 | n/a | codebase | Summarization via resolveModelEndpoint → always LLAMA_SWAP_URL | corroborated (validator-verified) |
| A68 | Task model service | apps/server/src/services/task-model.ts:59-68 | n/a | codebase | FAST_MODEL fallback chain against single endpoint (TASK_MODEL_URL escape hatch) | corroborated |
| A69 | Arena model calls | apps/coder/src/services/arena-model-call.ts:16-28; arena-analyzer.ts:90 | n/a | codebase | Raw fetch to LLAMA_SWAP_URL, no routing abstraction | corroborated (validator-verified) |
| A70 | Coder composer prefs | apps/web/src/components/AgentComposerBar.tsx:33-52,118-196 | n/a | codebase | CompactPicker flat lists; prefs in localStorage boocode.coder.agent-prefs | corroborated |
| A71 | Model display naming | apps/web/src/lib/modelName.ts:6-32; MessageBubble.tsx:140-189 | n/a | codebase | Display chips already strip llama-swap/-style prefixes | corroborated |
| A72 | Coder provider config file | data/coder-providers.example.json | n/a | codebase | Per-provider overrides exist; no baseUrl field (second endpoint unregistrable today) | corroborated |
| A73 | Openspec conventions | openspec/README.md | n/a | codebase | changes//{proposal,tasks,design}.md; lowercase-hyphenated slugs | corroborated (read directly) |
| A74 | Sidecar architecture notes | apps/server/CLAUDE.md (sidecar sections); /opt/forks/llama-sidecar/ | n/a | codebase | llama-sidecar = Go per-agent llama-server pool on Sam-desktop; X-Agent-Flags header; boot guard ties llama_extra_args to LLAMA_SIDECAR_URL | corroborated by A60 |

A54/A55: Live host inventories — recommendation-bearing
- Link / location: provided: orchestrator-run curl against http://100.90.172.55:8411 and http://100.101.41.16:8401 (/v1/models, /running)
- Retrieved: 2026-06-10
- Trust class: provided (operator-owned infrastructure, independently re-checkable with the same commands)
- Summary: embedding serves 39 mostly-small models; Sam-desktop serves 21 mostly-large models. Five IDs (granite-4.1-8b, negentropy-4.7-9b, qwen3.5-9b, qwen3.5-9b-deepseek-v4, qwopus3.5-9b-coder) appear on both, which makes composite keying mandatory, not stylistic. The configured DEFAULT_MODEL is absent from Sam-desktop's live list, proving ID churn. embedding's deepseek-r1-qwen3-8b collides with the deepseek- cloud-routing heuristic. Neither host populates llama-swap's optional name/description fields, so the UI must derive labels from IDs (as formatModelLabel already does).
- Evidence status: corroborated by A56/A57 (SSH-level configs match the served lists).
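
The composite keying that the duplicated IDs force can be sketched as follows. This is a minimal illustration, not code from the BooCode repository: the names ModelRef, formatModelRef, parseModelRef and duplicatedIds are hypothetical, and the "/" separator is assumed from the "provider/model" pair described in the summary.

```typescript
// Hypothetical types and helpers; none of these names come from the codebase.
type ProviderName = "Sam-desktop" | "embedding";

interface ModelRef {
  provider: ProviderName;
  modelId: string; // bare ID, as it would be sent on the wire to llama-swap
}

const PROVIDERS: readonly ProviderName[] = ["Sam-desktop", "embedding"];

// Serialise a reference as "provider/model" for storage (sessions, favorites).
function formatModelRef(ref: ModelRef): string {
  return `${ref.provider}/${ref.modelId}`;
}

// Parse a stored key. Splits on the FIRST slash only, so a model ID that
// itself contains a slash survives. Returns null for bare or unknown-provider
// keys, which a caller would treat as legacy values needing migration.
function parseModelRef(key: string): ModelRef | null {
  const slash = key.indexOf("/");
  if (slash <= 0) return null;
  const provider = key.slice(0, slash);
  const modelId = key.slice(slash + 1);
  if (!PROVIDERS.includes(provider as ProviderName) || modelId === "") {
    return null;
  }
  return { provider: provider as ProviderName, modelId };
}

// IDs served by both hosts: exactly the cases where a bare name is ambiguous.
function duplicatedIds(a: readonly string[], b: readonly string[]): string[] {
  const inB = new Set(b);
  return a.filter((id) => inB.has(id)).sort();
}
```

With this shape, "Sam-desktop/qwen3.5-9b" and "embedding/qwen3.5-9b" are distinct favorites even though the bare ID is identical on both hosts.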
A60: provider.ts routing — recommendation-bearing
- Link / location: apps/server/src/services/inference/provider.ts:90-157
- Retrieved: n/a
- Trust class: codebase (current-state anchor)
- Summary: The single point where all three routes (deepseek/sidecar/swap) resolve. Establishes that (a) BooCode already builds per-baseURL AI-SDK providers from a cache map, so O1 slots into this with minimal new machinery; (b) the sidecar is the default route for everything when configured, which forces constraint 4; (c) resolveModelEndpoint is a second, parallel resolution path (compaction/task-model) that must change in lockstep.
- Evidence status: corroborated (read directly by orchestrator and validator).
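
One way the provider-aware resolution could look is sketched below. It is an assumption-laden illustration rather than the actual resolveRoute: ProviderEntry, the viaSidecar flag, the Route union and resolveRouteSketch are all hypothetical names, and the rule that the deepseek- heuristic applies to bare IDs only is one possible way to defuse the deepseek-r1-qwen3-8b collision.

```typescript
// Hypothetical shapes; the real resolveRoute in provider.ts may differ.
interface ProviderEntry {
  name: string;
  baseUrl: string;
  viaSidecar: boolean; // assumed flag: true only for the host the sidecar fronts
}

type Route =
  | { kind: "cloud"; modelId: string }
  | { kind: "sidecar"; baseUrl: string; modelId: string }
  | { kind: "swap"; baseUrl: string; modelId: string };

function resolveRouteSketch(
  model: string,
  registry: ProviderEntry[],
  defaultProvider: string,
  sidecarUrl?: string,
): Route {
  const slash = model.indexOf("/");
  const qualified =
    slash > 0
      ? registry.find((p) => p.name === model.slice(0, slash))
      : undefined;

  // The cloud heuristic is applied to bare IDs only, so a provider-qualified
  // "embedding/deepseek-r1-qwen3-8b" is never mistaken for a cloud model.
  if (!qualified && model.startsWith("deepseek-")) {
    return { kind: "cloud", modelId: model };
  }

  const entry = qualified ?? registry.find((p) => p.name === defaultProvider);
  if (!entry) throw new Error(`unknown provider for model "${model}"`);
  const modelId = qualified ? model.slice(slash + 1) : model;

  // Sidecar routing is a per-provider attribute, not a global default.
  if (entry.viaSidecar && sidecarUrl) {
    return { kind: "sidecar", baseUrl: sidecarUrl, modelId };
  }
  return { kind: "swap", baseUrl: entry.baseUrl, modelId };
}
```

If both resolveRoute and resolveModelEndpoint delegated to a single helper of this kind, the two resolution paths could not drift apart.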
A13/A14: AI SDK provider registry pattern — recommendation-bearing
- Link / location: https://ai-sdk.dev/docs/ai-sdk-core/provider-management ; https://ai-sdk.dev/providers/openai-compatible-providers
- Retrieved: 2026-06-10
- Trust class: web
- Summary: The library BooCode already uses prescribes exactly O1's shape: one named createOpenAICompatible instance per provider, registry-level provider:model namespacing, bare model IDs on the wire. Adopting O1 is convergence with the upstream idiom rather than a custom scheme.
- Evidence status: corroborated (two official doc pages, consistent with LiteLLM's independent design A15/A16).
A45: VS Code model picker docs — recommendation-bearing (UX)
- Link / location: https://code.visualstudio.com/docs/agent-customization/language-models
- Retrieved: 2026-06-10
- Trust class: web
- Summary: Documents the shipped pattern this feature's dropdown adapts: provider-grouped list, hover-revealed pin, dedicated Pinned top section in stable insertion order, pinned models remaining in their provider group.
- Evidence status: corroborated by A46; code-level detail treated as color per V8.
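
The adapted layout (Favorites on top in stable insertion order, then one section per provider, with favorited models still listed in their provider section) can be sketched as a pure function. The names PickerModel, PickerRow, PickerSection and buildPickerSections are hypothetical, and the "provider/model" key format is assumed.

```typescript
// Hypothetical picker types; not taken from ModelPicker.tsx.
interface PickerModel {
  provider: string;
  modelId: string;
}
interface PickerRow {
  key: string; // "provider/model" composite key
  label: string;
  favorite: boolean; // drives the filled/unfilled star on the row
}
interface PickerSection {
  title: string;
  rows: PickerRow[];
}

function buildPickerSections(
  providerOrder: readonly string[],
  catalog: readonly PickerModel[],
  favorites: readonly string[], // stored keys, in the order they were starred
): PickerSection[] {
  const keyOf = (m: PickerModel): string => `${m.provider}/${m.modelId}`;
  const byKey = new Map<string, PickerModel>();
  for (const m of catalog) byKey.set(keyOf(m), m);
  const favoriteSet = new Set(favorites);
  const toRow = (m: PickerModel): PickerRow => ({
    key: keyOf(m),
    label: m.modelId,
    favorite: favoriteSet.has(keyOf(m)),
  });

  const sections: PickerSection[] = [];

  // Favorites first, in insertion order; keys missing from the live catalog
  // are skipped for display only.
  const favoriteRows: PickerRow[] = [];
  for (const key of favorites) {
    const m = byKey.get(key);
    if (m) favoriteRows.push(toRow(m));
  }
  if (favoriteRows.length > 0) {
    sections.push({ title: "Favorites", rows: favoriteRows });
  }

  // Then one section per provider; favorited models stay in their group.
  for (const provider of providerOrder) {
    const rows = catalog.filter((m) => m.provider === provider).map(toRow);
    if (rows.length > 0) sections.push({ title: provider, rows });
  }
  return sections;
}
```

Keeping this as a pure function of catalog and favorites would let BooChat's picker and the coder composer share one grouping implementation.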
A23/A27: Open WebUI pitfalls — recommendation-bearing (counter-evidence)
- Link / location: https://github.com/open-webui/open-webui/discussions/23656 ; https://github.com/open-webui/open-webui/discussions/4495
- Retrieved: 2026-06-10
- Trust class: web
- Summary: The two documented failure modes the design must avoid: bare-model-ID favorites becoming ambiguous across connections, and stale-favorite cleanup permanently destroying user preferences during transient backend downtime.
- Evidence status: corroborated by A21/A22/A53 (the surrounding docs and a second stale-state issue).
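
The second pitfall suggests treating the displayed favorites as a derived view and never writing that view back to storage. A minimal sketch, with hypothetical function names and an assumed "provider/model" key format:

```typescript
// Hypothetical helpers; the stored list changes only on explicit user action.

// Favorites to show right now: stored keys whose model is currently reachable.
// The stored list itself is not modified.
function visibleFavorites(
  stored: readonly string[],
  reachableKeys: ReadonlySet<string>,
): string[] {
  return stored.filter((key) => reachableKeys.has(key));
}

// The only write path: the user clicking the star on a row.
function toggleFavorite(stored: readonly string[], key: string): string[] {
  return stored.includes(key)
    ? stored.filter((k) => k !== key)
    : [...stored, key];
}
```

Under this split, a favorite on an offline host disappears from the dropdown but is still in storage, so it reappears once the host answers /v1/models again.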