indifferentketchup b18de2a331 chore: snapshot working tree - pty_exited notifications + in-flight inference WIP

feat(booterm): structured pty_exited WS notifications. Plan-validated, impl-validated, code-reviewed green (contracts build clean, contracts test 29/29, booterm + web typecheck clean).

wip: in-progress inference/provider refactor (agents.ts, provider.ts, new llama-providers.ts, removed llama-args-validator), plus arena, dispatcher, compaction, schema changes.

openspec: pty-exit-notifications complete; x-agent-flags planned (not yet implemented).

2026-06-14 12:48:47 +00:00

Research: Integrating two named llama-swap providers ("Sam-desktop", "embedding") with provider-grouped model dropdowns and per-model favorites in BooChat and BooCoder

Question: BooCode currently talks to exactly one llama-swap endpoint. How should a second named provider ("embedding", 100.90.172.55:8411) be added alongside the renamed existing one ("Sam-desktop", 100.101.41.16:8401), integrated into both BooChat and BooCoder, with the model dropdown grouped per provider and a favorite button per model (Favorites section listed first)?

Evidence mode: strict (default — every recommendation-bearing claim is corroborated or explicitly caveated).

Summary

Both machines can be added to BooCode as named providers, and the right way is to give BooCode a small provider registry (a name and base URL per machine) and to store selected models as a "provider/model" pair instead of a bare name. Bare names cannot work here: five models exist on both machines under identical names today, and the configured default model has already drifted out of the live list once — so favorites and routing keyed by name alone would be ambiguous and fragile. The dropdown should follow the pattern proven in VS Code's model picker: a Favorites section on top, then one section per provider (Sam-desktop first, then embedding), a star on every row, favorited models staying visible in their provider section, and favorites that are hidden — never deleted — when a machine is offline.

The adversarial validation pass confirmed the direction but showed the change is wider than the obvious spots: chat compaction, context-window lookup, arena battles, the coder's opencode dispatch, and the sidecar routing default all silently assume a single endpoint and need the same provider-resolution change. Two extra hazards were found in the live data: a model on the embedding host literally named deepseek-r1-qwen3-8b trips BooCode's "starts with deepseek-" cloud-routing heuristic, and the always-on sidecar default route would swallow embedding-bound requests. The embedding host does not need its own llama-sidecar — but sidecar routing must become a Sam-desktop-only attribute.

Well-corroborated: live data from both hosts, direct code evidence, and multiple independent web sources agree; validation expanded the implementation scope but did not overturn the choice.

Confidence: High

Research Results

What exists today (codebase — current-state anchor)

BooCode's entire inference surface assumes one llama-swap endpoint, configured as LLAMA_SWAP_URL=http://100.101.41.16:8401 with DEFAULT_MODEL=qwen3.6-35b-a3b-mxfp4 (A58). The single-endpoint assumption is hard-coded in at least nine places:

GET /api/models fetches only {LLAMA_SWAP_URL}/v1/models (plus DeepSeek cloud when DEEPSEEK_API_KEY is set) and returns a flat ModelInfo[] with no provider tag (A59).
upstreamModel() routes by string heuristics: model IDs starting with deepseek- go to the DeepSeek cloud API; agents with llama_extra_args go to the sidecar; and whenever LLAMA_SIDECAR_URL is configured (it is in docker-compose), every remaining request routes through the sidecar by default, falling back to llama-swap only when no sidecar is configured (A60). The provider for each base URL is a cached AI-SDK createOpenAICompatible instance.
resolveModelEndpoint() (used by compaction and task-model for non-streaming calls) returns LLAMA_SWAP_URL for every non-DeepSeek model (A60, A67).
model-context.ts fetches {LLAMA_SWAP_URL}/upstream/<model>/props for context windows, with a no-TTL positive cache keyed by the raw model string, and a deepseek- prefix guard that short-circuits to a static 131,072 context without calling any upstream (A66).
task-model.ts (auto-naming, summaries) falls back through FAST_MODEL → chat model → DEFAULT_MODEL against the single URL (A68).
Arena battles call {LLAMA_SWAP_URL}/v1/chat/completions directly with no routing abstraction at all (A69).
The coder's provider snapshot fetches the single llama-swap list and prefixes every ID with llama-swap/ (A63); its dispatcher prefixes any bare (slash-less) model ID with llama-swap/ before opencode dispatch, and passes any ID already containing / through unchanged (A64).
Model IDs persist as bare strings: sessions.model TEXT NOT NULL, chats.model TEXT nullable, validated only as a 1–200-char string (A65).
The BooChat dropdown (ModelPicker.tsx) and the BooCoder picker (CompactPicker inside AgentComposerBar.tsx) are flat lists with no grouping, search, or favorites; the coder picker persists per-provider preferences in browser localStorage, while BooChat model choice is server-persisted on the session row (A61, A70). Display code already strips llama-swap/-style prefixes when rendering model chips (A71). No favorites/pinning mechanism exists anywhere; the settings table is a key-value JSONB store currently holding default_model and theme keys (A65).

The coder's runtime provider config (data/coder-providers.json) has no baseUrl field — there is no way to register a second llama-swap endpoint today (A72).

What the two hosts actually serve (provided material, retrieved live 2026-06-10)

embedding (100.90.172.55:8411, Linux, P104-100 8GB Pascal GPU): 39 models, skewed small: gemma-3-270m through gemma-4-12b, the LFM2.5 family, granite-4.1-3b/8b, qwen3.5-0.8b/4b/9b, the qwopus3.5 family, deepseek-r1-qwen3-8b, a reranker, and extraction models (A54). Its llama-swap config is hand-tuned per model (flash-attn/KV-quant choices for Pascal, ttl 1800), with llama.cpp built from source on the box (A56).
Sam-desktop (100.101.41.16:8401, Windows): 21 models, skewed large — qwen3.6-35b-a3b/27b, qwopus3.6 family, granite-4.1-30b, mellum2-12b, nemotron-cascade-2-30b-a3b, north-mini-code, etc. Served by D:\llama-server (llama.cpp CUDA build b9591) behind D:\llama-swap (llama-swap v224), models in D:\models; a D:\llama-sidecar directory backs the existing sidecar at :8402 (A55, A57).

Three load-bearing facts fall out of the live inventories:

Five model IDs exist on both hosts: granite-4.1-8b, negentropy-4.7-9b, qwen3.5-9b, qwen3.5-9b-deepseek-v4, qwopus3.5-9b-coder (A54, A55). Bare-ID favorites or routing are therefore ambiguous from day one.
The configured DEFAULT_MODEL=qwen3.6-35b-a3b-mxfp4 is not in Sam-desktop's current model list (closest: qwen3.6-35b-a3b) — model IDs already churn in practice, so favorites must tolerate stale references (A55, A58).
deepseek-r1-qwen3-8b on the embedding host collides with BooCode's deepseek- heuristics: with DEEPSEEK_API_KEY set it would be routed to the DeepSeek cloud API, and the context-window guard returns a fake 131k context on the name prefix alone regardless (A54, A60, A66).
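
The collision check behind the first fact can be reproduced mechanically. The sketch below is a minimal illustration, assuming each host's /v1/models response has already been fetched and parsed; the helper name findSharedModelIds and the abbreviated sample inventories are hypothetical and not part of BooCode.

```typescript
// Shape of the relevant part of an OpenAI-style /v1/models response.
type ModelList = { data: { id: string }[] };

// Hypothetical helper: map each model ID served by two or more providers
// to the list of providers that serve it.
function findSharedModelIds(lists: Record<string, ModelList>): Map<string, string[]> {
  const owners = new Map<string, string[]>();
  for (const provider of Object.keys(lists)) {
    for (const model of lists[provider].data) {
      const seen = owners.get(model.id) ?? [];
      seen.push(provider);
      owners.set(model.id, seen);
    }
  }
  const shared = new Map<string, string[]>();
  owners.forEach((providers, id) => {
    if (providers.length > 1) shared.set(id, providers);
  });
  return shared;
}

// Abbreviated sample inventories: a hand-picked subset of the IDs named above.
const sampleInventories: Record<string, ModelList> = {
  "sam-desktop": { data: [{ id: "qwen3.6-35b-a3b" }, { id: "granite-4.1-8b" }, { id: "qwen3.5-9b" }] },
  embedding: { data: [{ id: "gemma-4-12b" }, { id: "granite-4.1-8b" }, { id: "qwen3.5-9b" }] },
};

const sharedIds = findSharedModelIds(sampleInventories);
sharedIds.forEach((providers, id) => {
  console.log(`${id} is served by ${providers.join(" and ")}`);
});
```

Run against the full live lists, a check like this would be expected to report the five duplicated IDs; any non-empty result means a bare ID cannot identify a model on its own.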

How llama-swap identifies models (web, corroborated)

llama-swap model IDs are exactly the YAML keys in its config.yaml; /v1/models can additionally carry optional per-model name, description, and arbitrary metadata from config, fields that neither of Sam's hosts currently populates (A1–A4, A54, A55). llama-swap has no instance-identity field: two instances are distinguishable only by host:port (A3). /running reports load state per model (A1, A12). Peer federation exists (one llama-swap aggregating another), but peer-served models surface as "peer-name: model-name" IDs [single-source: A6], and same-ID collisions resolve to the lexicographically first peer with only a warning (A5). Independently of any web source, two codebase-observable facts are decisive against federation: BooCode would still see one flat list with no native grouping, and the two hosts' uptime would become coupled. Standalone llama.cpp llama-server defaults its /v1/models ID to the model file path unless --alias is set (A8, A9), which matters only if a host ever bypasses llama-swap.

How mature clients solve exactly this (web, corroborated)

Every major OpenAI-compatible client library handles multiple same-protocol providers with separate named provider instances, each with its own baseURL, namespaced in the client's registry as provider:model / provider/model — the model ID actually sent on the wire to each backend stays the bare upstream ID (Vercel AI SDK provider registry: A13, A14; LiteLLM model_list: A15, A16). BooCode already uses the AI SDK's createOpenAICompatible (A60) and the coder already namespaces with a llama-swap/ prefix (A63, A64), so this pattern is an extension of existing conventions, not a new idiom.
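
The split between the namespaced registry ID and the bare wire ID can be illustrated without any SDK. This is a minimal sketch, not the AI SDK's registry API: CatalogEntry, tagCatalog, and chatRequestBody are hypothetical names, and the model lists are abbreviated samples.

```typescript
interface CatalogEntry {
  compositeId: string; // what the client stores and displays, e.g. "embedding/qwen3.5-9b"
  provider: string;
  wireId: string; // what is actually sent to the backend in the request body
}

// Tag every upstream ID from one provider with that provider's name.
function tagCatalog(provider: string, upstreamIds: string[]): CatalogEntry[] {
  return upstreamIds.map((wireId) => ({
    compositeId: `${provider}/${wireId}`,
    provider,
    wireId,
  }));
}

// The request body carries the bare upstream ID, never the composite one.
function chatRequestBody(entry: CatalogEntry, prompt: string) {
  return { model: entry.wireId, messages: [{ role: "user", content: prompt }] };
}

const mergedCatalog: CatalogEntry[] = [
  ...tagCatalog("sam-desktop", ["qwen3.5-9b", "qwen3.6-35b-a3b"]),
  ...tagCatalog("embedding", ["qwen3.5-9b", "gemma-4-12b"]),
];

// Both hosts serve "qwen3.5-9b", yet the merged catalog has no duplicate keys.
console.log(mergedCatalog.map((entry) => entry.compositeId));
```

The prefix exists only on the client side; each backend keeps seeing the IDs it already knows.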

The closest shipped implementation of the requested UX is VS Code's model picker: models grouped by provider, a pin icon revealed on hover, pinned models lifted into a dedicated top section in stable insertion order, while remaining visible in their provider group (display copy, not move) (A45, A46). Cherry Studio independently demonstrates the key-collision lesson: its model identity is the composite {id, provider} precisely so two providers serving the same model name don't collide (A35, A36) [third-party code reference; unverifiable from here — supporting color only, see V8]. Open WebUI documents the two pitfalls to avoid: favorites keyed by bare model ID become ambiguous the moment two connections serve the same name (A27), and its stale-pin cleanup permanently deletes pins when a backend is temporarily down (A23) — the correct behavior is to hide unavailable favorites and restore them when the host returns. LibreChat groups via admin-configured YAML and added pinning in v0.8.5 (A28, A29). Jan, Chatbox, SillyTavern, Continue.dev, BigAGI, and LM Studio offer weaker or no equivalents (A32–A34, A38–A44, A47–A52) — none contradicts the VS Code pattern.
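
The picker behavior described above (Favorites first in insertion order, favorited rows still shown in their provider group, unavailable favorites hidden but kept) can be sketched as a pure function. All names here are hypothetical; this is an illustration of the pattern, not VS Code's or BooCode's implementation.

```typescript
interface PickerModel {
  provider: string;
  id: string; // bare upstream ID
}

interface PickerRow {
  compositeId: string;
  favorite: boolean;
}

interface PickerSection {
  title: string;
  rows: PickerRow[];
}

function buildPickerSections(
  providerOrder: string[],
  available: PickerModel[],
  favorites: string[], // composite IDs in the order they were starred
): PickerSection[] {
  const availableIds = new Set(available.map((m) => `${m.provider}/${m.id}`));
  const favoriteIds = new Set(favorites);
  const sections: PickerSection[] = [];

  // Favorites come first. Entries whose host is offline are skipped for
  // display only; the stored favorites array is never modified here.
  const visible = favorites.filter((id) => availableIds.has(id));
  if (visible.length > 0) {
    sections.push({
      title: "Favorites",
      rows: visible.map((compositeId) => ({ compositeId, favorite: true })),
    });
  }

  // One section per provider; favorited models stay listed in their group.
  for (const provider of providerOrder) {
    const rows = available
      .filter((m) => m.provider === provider)
      .map((m) => {
        const compositeId = `${m.provider}/${m.id}`;
        return { compositeId, favorite: favoriteIds.has(compositeId) };
      });
    if (rows.length > 0) sections.push({ title: provider, rows });
  }
  return sections;
}
```

Because the function only filters for display, a favorite that points at an offline host reappears as soon as that host's models are back in the available list.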

Does embedding need a llama-sidecar? No.

The llama-sidecar is a Go daemon on Sam-desktop providing a per-agent llama-server process pool so agents can carry llama_extra_args (cache quant, spec decoding, slot save) injected via an X-Agent-Flags header (A60, A74). The embedding host needs none of that: its per-model tuning is baked directly into its llama-swap config.yaml (A56), and no per-agent flag injection applies to it. However, resolveRoute currently makes the sidecar the default route for all non-DeepSeek inference whenever LLAMA_SIDECAR_URL is set (A60) — so under the multi-provider design, sidecar routing must become an attribute of the Sam-desktop provider entry (e.g. optional sidecarUrl per provider), not a global default; otherwise requests for embedding-hosted models would be sent to a sidecar that only manages Sam-desktop processes.
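
A per-provider sidecar attribute might look like the following sketch. The ProviderEntry shape follows the optional sidecarUrl idea above; the sidecar host in the sample entry is an assumption (the source gives only the port, :8402), and the sketch assumes the current "sidecar by default when configured" behavior is kept for the provider that has one.

```typescript
interface ProviderEntry {
  name: string;
  baseUrl: string; // llama-swap endpoint
  sidecarUrl?: string; // set only where a llama-sidecar actually runs
}

const providerEntries: ProviderEntry[] = [
  {
    name: "sam-desktop",
    baseUrl: "http://100.101.41.16:8401",
    // Assumed host: the source states only that the sidecar listens on :8402.
    sidecarUrl: "http://100.101.41.16:8402",
  },
  { name: "embedding", baseUrl: "http://100.90.172.55:8411" },
];

// The sidecar is consulted per provider, never as a global default route.
function inferenceBaseUrl(provider: ProviderEntry): string {
  return provider.sidecarUrl ?? provider.baseUrl;
}

for (const entry of providerEntries) {
  console.log(`${entry.name} -> ${inferenceBaseUrl(entry)}`);
}
```

With this shape, a request for an embedding-hosted model can never reach the Sam-desktop sidecar, because the embedding entry simply has no sidecarUrl.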

Openspec conventions for the follow-up plan (codebase)

Per-batch docs land in openspec/changes/<slug>/ with proposal.md (why + scope), tasks.md (numbered/checkbox action list), and optional design.md (architecture/data-model decisions); slugs are lowercase-hyphenated from the batch title (A73). This feature is a natural three-file batch — the provider registry + routing is design-heavy, so design.md is warranted.

Options to Consider

O1: Named provider registry with composite model IDs (`<provider>/<model>`)

What it is: BooCode config gains a provider list ({ name, baseUrl, sidecarUrl? } per entry — "sam-desktop" and "embedding"). Models are stored and selected as sam-desktop/qwen3.6-35b-a3b, embedding/gemma-4-12b. /api/models returns provider-tagged groups; one routing resolver (provider prefix → baseURL, bare wire ID) replaces every LLAMA_SWAP_URL hardcode; bare legacy IDs fall back to the default provider (sam-desktop). Favorites, caches, and attribution all key on the composite ID.
Trade-offs: Touches every call site that assumes one endpoint (the nine sites above — see Validation for the full list); needs a deliberate legacy-bare-ID fallback for existing session/chat rows and the seeded default_model; the coder's opencode namespace (llama-swap/) needs an explicit translation rule. In exchange: no DB schema change for model columns, no llama-swap config changes on either host, matches the AI-SDK idiom BooCode already uses and the coder's existing prefix convention, and makes the deepseek- heuristic unnecessary for prefixed IDs.
Rests on: (A13, A14, A15, A16) for the pattern; (A54, A55) for the collision necessity; (A60, A63, A64) for fit with existing code.
Evidence status: corroborated.
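
A minimal sketch of the single routing resolver O1 describes, assuming lowercase provider names as registry keys. The names ProviderConfig, ResolvedRoute, and resolveModelRoute are hypothetical, and treating an unrecognized prefix as part of a bare ID is a design assumption, not something the source specifies.

```typescript
interface ProviderConfig {
  name: string;
  baseUrl: string;
}

interface ResolvedRoute {
  provider: string;
  baseUrl: string;
  wireModel: string; // bare ID sent to the backend
}

const PROVIDER_REGISTRY: ProviderConfig[] = [
  { name: "sam-desktop", baseUrl: "http://100.101.41.16:8401" },
  { name: "embedding", baseUrl: "http://100.90.172.55:8411" },
];
const DEFAULT_PROVIDER_NAME = "sam-desktop";

function resolveModelRoute(
  modelId: string,
  registry: ProviderConfig[] = PROVIDER_REGISTRY,
  defaultName: string = DEFAULT_PROVIDER_NAME,
): ResolvedRoute {
  const slash = modelId.indexOf("/");
  if (slash > 0) {
    const prefix = modelId.slice(0, slash);
    const match = registry.find((p) => p.name === prefix);
    if (match) {
      return { provider: match.name, baseUrl: match.baseUrl, wireModel: modelId.slice(slash + 1) };
    }
  }
  // Bare legacy ID (or a slash that is not a registered provider prefix):
  // fall back to the default provider and send the ID unchanged.
  const fallback = registry.find((p) => p.name === defaultName);
  if (!fallback) throw new Error(`default provider "${defaultName}" is not registered`);
  return { provider: fallback.name, baseUrl: fallback.baseUrl, wireModel: modelId };
}

console.log(resolveModelRoute("embedding/gemma-4-12b"));
console.log(resolveModelRoute("qwen3.6-35b-a3b")); // legacy bare ID
```

Every call site that currently reads LLAMA_SWAP_URL would call one function like this instead, which is what keeps the five duplicated IDs from being routed inconsistently.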

O2: Bare model IDs plus a separate `provider` field everywhere

What it is: Keep model strings as-is and add a provider column/field through sessions, chats, WS frames, ModelInfo, ProviderModel, and every read path.
Trade-offs: Avoids string munging and display-time prefix stripping, but is strictly more invasive: two schema migrations, a WsFrameSchema change rebuilt through @boocode/contracts, and every consumer updated in lockstep — while favorites still need a composite key anyway. Higher blast radius for the same outcome.
Rests on: (A65, A62) for the touched surfaces.
Evidence status: corroborated (codebase-derived).

O3: llama-swap peer federation (Sam-desktop aggregates embedding as a peer)

What it is: Configure embedding as a peers: entry in Sam-desktop's llama-swap; BooCode keeps a single endpoint.
Trade-offs: Rejected on codebase-observable grounds: BooCode would still see one flat list (no native named grouping, which is the feature's whole point), the two hosts' availability becomes coupled, and it requires operational changes on a host outside this repo. Additionally, peer-served model IDs surface as "peer-name: model-name" [single-source: A6], and duplicate IDs resolve to the lexicographically first peer with only a warning (A5).
Rests on: (A5, A6) plus codebase observation (A59, A61).
Evidence status: rejection corroborated by codebase facts; the peer ID-format detail is single-source (caveated) and not load-bearing.

O4: External aggregator proxy (LiteLLM) in front of both hosts

What it is: A LiteLLM proxy with a model_list mapping unique aliases to each host; BooCode keeps one endpoint.
Trade-offs: Proven pattern (A15, A16) but adds a third always-on service with a manually-maintained catalog (no auto-discovery from /v1/models), an extra network hop, and still no provider grouping signal unless encoded in alias naming conventions. Overweight for a single-user self-hosted system.
Rests on: (A15, A16).
Evidence status: corroborated.

Sub-decision — favorites persistence

O5a: Server-side, in the settings table (e.g. favorite_models: string[] of composite IDs). Survives across browsers and devices, and multi-device use is real here (the repo's own docs describe side-by-side iPhone debugging); it also matches how BooChat model choice is already server-persisted on the session row. Costs a PATCH per star toggle and needs a "hide stale, never delete" rule (A23), plus acceptance that stale composite keys linger until manually unfavorited.
O5b: Browser localStorage, extending the coder's existing boocode.coder.agent-prefs pattern (A70). Zero API surface, but per-device, per-browser, and split across the two UIs.
Evidence status: both corroborated; the cross-device argument for O5a is codebase-derived inference from documented usage, not a measured requirement.
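
For O5a, the star toggle reduces to a small pure update of the stored array. The sketch below assumes the favorite_models key suggested above; toggleFavorite and favoritesPatch are hypothetical names, and the shape of the PATCH payload is an assumption about the settings route, which the source does not specify.

```typescript
const FAVORITES_KEY = "favorite_models"; // key name suggested in the text above

// Add the composite ID if absent, remove it if present. Appending keeps
// the array in starring order, which the Favorites section relies on.
function toggleFavorite(current: readonly string[], compositeId: string): string[] {
  return current.indexOf(compositeId) !== -1
    ? current.filter((id) => id !== compositeId)
    : [...current, compositeId];
}

// Assumed payload shape for the settings PATCH.
function favoritesPatch(next: string[]): Record<string, string[]> {
  return { [FAVORITES_KEY]: next };
}

let storedFavorites: string[] = [];
storedFavorites = toggleFavorite(storedFavorites, "sam-desktop/qwen3.6-35b-a3b");
storedFavorites = toggleFavorite(storedFavorites, "embedding/gemma-4-12b");
storedFavorites = toggleFavorite(storedFavorites, "sam-desktop/qwen3.6-35b-a3b"); // second click removes the star
console.log(JSON.stringify(favoritesPatch(storedFavorites)));
// {"favorite_models":["embedding/gemma-4-12b"]}
```

Nothing in this path checks availability, so a favorite is removed only by an explicit toggle, which is the "hide stale, never delete" rule expressed in storage terms.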

Recommendation

Recommendation: O1 — named provider registry with <provider>/<model> composite IDs — combined with the VS Code-pattern dropdown (Favorites on top in stable insertion order, then Sam-desktop's models, then embedding's; star toggle per row; favorited models remain listed in their provider group) and O5a server-side favorites keyed by composite ID. Non-negotiable design constraints carried in from validation:
1. Prefix-strip only at wire-URL construction; caches (notably model-context.ts's no-TTL positive cache) key on the full composite ID. Otherwise the five name-colliding models cross-pollute context windows between hosts (V7).
2. The coder dispatcher must translate composite prefixes for opencode (map the default provider to the existing llama-swap/ namespace, or register new opencode providers) — the current pass-through of any slash-containing ID would hand opencode an unknown provider key (V1).
3. Every single-endpoint call site is in scope: provider.ts (upstreamModel + resolveModelEndpoint), models.ts, model-context.ts (including its deepseek- static-context guard), compaction.ts, task-model.ts, arena-model-call.ts (+ arena callers, coder-side config), coder provider-snapshot.ts, coder dispatcher.ts (V2–V4, V9).
4. Sidecar routing becomes a Sam-desktop provider attribute, not the global default route — embedding needs no sidecar (A60, A74; post-validation verification).
5. Bare legacy IDs (existing rows, seeded default_model) resolve to the default provider indefinitely — new sessions inherit a bare seeded default until settings are migrated, so this is a permanent fallback, not a one-time migration (V2).
6. Favorites that reference unavailable models are hidden, never auto-deleted (A23).
Evidence basis: The option choice rests on corroborated evidence throughout: the multi-provider client pattern (A13–A16), the live collision and churn data from both hosts (A54, A55, A58 — provided material, independently re-checkable), and codebase fit (A60, A63, A64). The UX pattern rests on corroborated documentation (A45, A46) with the Open WebUI pitfalls as corroborated counter-evidence (A23, A27); the Cherry Studio and VS Code code-level references are unverifiable third-party color (V8) and nothing rests on them alone. The single-source peer-ID format (A6) supports only the rejection of O3, which stands independently on codebase facts. The cross-device justification for O5a is codebase-derived inference (documented multi-device usage), explicitly not measured evidence.

Validation

Adversarial validation attacked the evidence, framing, recommendation, and gathering integrity. Findings (condensed; all code-verified by the validator in this repo):

V1: "O1 extends the coder's prefix convention" was overstated

Strategy: Challenge the Recommendation
Investigation: dispatcher.ts:1006-1011, coder CLAUDE.md, provider-snapshot.ts:66-72.
Result: Refuted as originally framed — a stored sam-desktop/<model> passes the dispatcher's slash-check unchanged and reaches opencode as an unknown provider key; llama-swap/ is hardcoded in ≥4 coder locations.
Impact: Recommendation now mandates an explicit opencode namespace-translation rule (constraint 2).
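
One possible shape for that translation rule is sketched below. The mapping table is hypothetical: "llama-swap" is the existing namespace, while "llama-swap-embedding" is an invented opencode provider key that would have to be registered before use. IDs with any other prefix pass through unchanged, mirroring the dispatcher's current behavior.

```typescript
// Hypothetical mapping from BooCode provider names to opencode provider keys.
const OPENCODE_PROVIDER_KEYS = new Map<string, string>([
  ["sam-desktop", "llama-swap"], // existing opencode namespace
  ["embedding", "llama-swap-embedding"], // assumed key, not an existing opencode provider
]);

function toOpencodeModelId(modelId: string): string {
  const slash = modelId.indexOf("/");
  // Bare ID: keep today's behavior and prefix with llama-swap/.
  if (slash === -1) return `llama-swap/${modelId}`;

  const prefix = modelId.slice(0, slash);
  const opencodeKey = OPENCODE_PROVIDER_KEYS.get(prefix);
  // Not a BooCode provider prefix: pass through unchanged, as the dispatcher does now.
  if (opencodeKey === undefined) return modelId;

  // BooCode composite ID: swap the provider prefix for the opencode key.
  return `${opencodeKey}/${modelId.slice(slash + 1)}`;
}

console.log(toOpencodeModelId("sam-desktop/qwen3.6-35b-a3b"));
console.log(toOpencodeModelId("qwen3.6-35b-a3b"));
```

Both calls yield the same llama-swap/ ID, so sessions stored before and after the change reach opencode identically for the default provider.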

V2: The bare-ID legacy fallback was asserted, not designed

Strategy: Challenge the Recommendation
Investigation: provider.ts:115-135, stream-phase.ts:110, sessions.ts:113-117, schema.sql:222, model-context.ts:77.
Result: Partially refuted — architecturally plausible but unimplemented; prefixed IDs would 404 the /upstream/<model>/props fetch and break context/compaction display; the seeded bare default_model makes the fallback permanent, not migratory.
Impact: Constraints 1, 3, 5 added.

V3: The `deepseek-` hazard is wider than routing

Strategy: Challenge the Evidence
Investigation: model-context.ts:40-49, provider.ts:98, compaction.ts:531.
Result: Confirmed with added scope — the context guard fires on the name prefix alone, returning a fake 131k context for embedding's deepseek-r1-qwen3-8b even after routing is fixed.
Impact: model-context.ts guard added to the touch-list (constraint 3).
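
A narrowed guard could look like the sketch below: a composite ID whose prefix names a registered local provider is never treated as DeepSeek cloud, whatever the model is called. The function name and provider list are hypothetical. Note that a bare deepseek-r1-qwen3-8b still matches, so the hazard disappears only once the ID is stored in composite form.

```typescript
const LOCAL_PROVIDER_NAMES = ["sam-desktop", "embedding"];

// Returns true only for IDs that should go to the DeepSeek cloud API.
function isDeepSeekCloudId(
  modelId: string,
  localProviders: string[] = LOCAL_PROVIDER_NAMES,
): boolean {
  const slash = modelId.indexOf("/");
  if (slash > 0 && localProviders.indexOf(modelId.slice(0, slash)) !== -1) {
    // Explicitly addressed to a local provider: the name prefix is irrelevant.
    return false;
  }
  // Legacy heuristic, kept for bare IDs only.
  return modelId.startsWith("deepseek-");
}

console.log(isDeepSeekCloudId("embedding/deepseek-r1-qwen3-8b")); // false
console.log(isDeepSeekCloudId("deepseek-r1-qwen3-8b")); // true: bare IDs keep the old behavior
```

The same predicate would need to gate both the routing decision and the static-context shortcut in model-context.ts, so that the two cannot disagree about the same model.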

V4: `compaction.ts` is a missed hardcode site

Strategy: Challenge the Evidence
Investigation: compaction.ts:351-357 → resolveModelEndpoint (provider.ts:139-157).
Result: Refuted the original C9 list as incomplete — compaction summarization calls would go to the wrong host for embedding models.
Impact: Added to the touch-list (A67, constraint 3).

V5: Server-side favorites needed justification against the coder's localStorage pattern

Strategy: Challenge the Assumptions
Investigation: AgentComposerBar.tsx:33-52, routes/settings.ts, root CLAUDE.md auth model.
Result: Partially refuted — the Open WebUI bug distinguishes auto-delete vs hide, not server vs client storage; the original justification conflated the two.
Impact: O5a/O5b reframed as an explicit sub-decision; O5a retained on the cross-device argument, labeled as inference.

V6: O3's rejection over-relied on a single-source claim

Strategy: Challenge the Evidence-Gathering Integrity
Result: Confirmed with a provenance note — O3 is independently rejectable from codebase facts; the stale GitHub issue is demoted to supporting color.
Impact: O3 rejection rewritten to lead with codebase-observable reasons.

V7: Composite IDs + naive prefix-stripping would poison the no-TTL context cache

Strategy: Challenge the Recommendation
Investigation: model-context.ts:9, 26-29, 77-100; the five cross-host duplicate IDs.
Result: Refuted the unstated design — stripping before the cache key shares entries across providers with different real context windows, permanently until restart.
Impact: Constraint 1 (composite cache key, strip only at URL construction) — the most subtle required design rule.
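
The rule can be made concrete with a small sketch: the cache key is the full composite ID, and the prefix is stripped only while building the /upstream/<model>/props URL. The lookup is injected and synchronous here purely to keep the example self-contained; the real fetch would be asynchronous. All names and the context sizes used in the demo are hypothetical.

```typescript
const contextWindowCache = new Map<string, number>();

const PROVIDER_BASE_URLS = new Map<string, string>([
  ["sam-desktop", "http://100.101.41.16:8401"],
  ["embedding", "http://100.90.172.55:8411"],
]);

function getContextWindow(compositeId: string, lookup: (url: string) => number): number {
  // Key on the composite ID so same-named models on different hosts never share an entry.
  const cached = contextWindowCache.get(compositeId);
  if (cached !== undefined) return cached;

  const slash = compositeId.indexOf("/");
  const baseUrl = slash > 0 ? PROVIDER_BASE_URLS.get(compositeId.slice(0, slash)) : undefined;
  if (baseUrl === undefined) throw new Error(`no registered provider for "${compositeId}"`);

  // The only place the prefix is stripped: wire-URL construction.
  const wireModel = compositeId.slice(slash + 1);
  const contextWindow = lookup(`${baseUrl}/upstream/${wireModel}/props`);
  contextWindowCache.set(compositeId, contextWindow);
  return contextWindow;
}

// Demo lookup with made-up context sizes per host.
const demoLookup = (url: string): number => (url.indexOf(":8401") !== -1 ? 32768 : 8192);
console.log(getContextWindow("sam-desktop/qwen3.5-9b", demoLookup));
console.log(getContextWindow("embedding/qwen3.5-9b", demoLookup));
```

Had the key been the stripped name qwen3.5-9b, the second call would have returned the first host's value for as long as the process lived.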

V8: Third-party code references (Cherry Studio, VS Code PR) are unverifiable

Strategy: Challenge the Evidence-Gathering Integrity
Result: Partially refuted their evidentiary weight — retained as color; the composite-key argument stands on BooCode's own conventions and the live collision data.
Impact: Evidence basis re-worded; nothing rests on those references alone.

V9: Arena is the most exposed hardcode

Strategy: Challenge the Evidence
Investigation: arena-model-call.ts:16-28, arena-analyzer.ts:90.
Result: Confirmed with elevated severity — raw fetch, no abstraction, lives in apps/coder with its own config type (cannot reuse the server's resolver as-is).
Impact: Listed as separate coder-side scope (constraint 3).

Adjustments Made

The recommendation survived but was rewritten: the implementation constraints (composite cache keys, opencode namespace translation, the full nine-site touch-list, permanent bare-ID fallback, hidden-not-deleted favorites) were folded into the Recommendation itself; O3's rejection was re-grounded in codebase facts; the favorites-persistence choice was reframed as an explicit sub-decision; unverifiable third-party code references were demoted to supporting color. Post-validation, the orchestrator additionally verified in provider.ts that the sidecar is the default route whenever LLAMA_SIDECAR_URL is set — adding constraint 4 (sidecar becomes a per-provider attribute; embedding needs none).

Confidence Assessment

Confidence: High — for the option choice. The validator rated the pre-adjustment synthesis Medium because the implementation scope was understated; that scope is now enumerated above, and no finding challenged the direction (its own words: "architecturally sound given the existing llama-swap/ convention").
Remaining Risks: (1) The opencode-side translation (V1) may also require host-side ~/.config/opencode/opencode.json changes — outside this repo. (2) Stale favorite keys accumulate in settings with no cleanup mechanism by design (hide-don't-delete); acceptable for single-user but unbounded. (3) The exact /running JSON envelope and llama-swap peer aggregation details remain single-source — neither is load-bearing. (4) The five duplicate-ID models make any partial rollout (one call site migrated, another not) actively dangerous; the routing resolver should land as one batch.

Sources

ID	Source	Link / location	Retrieved	Trust class	Summary (one line)	Evidence status
A1	llama-swap README	github.com/mostlygeek/llama-swap	2026-06-10	web	Proxy hot-swapping local inference servers; documents /v1/models, /running, /upstream, /health; v224 current	corroborated by A2, A3, A12
A2	llama-swap configuration.md	github.com/mostlygeek/llama-swap/blob/main/docs/configuration.md	2026-06-10	web	Model IDs are YAML keys; per-model name/description/aliases/metadata/ttl/useModelName; includeAliasesInList	corroborated by A3, A4
A3	llama-swap config-schema.json	github.com/mostlygeek/llama-swap/blob/main/config-schema.json	2026-06-10	web	Authoritative config schema; peers section; no instance-identity field at any level	corroborated by A2, A4
A4	llama-swap config.example.yaml	github.com/mostlygeek/llama-swap/blob/main/config.example.yaml	2026-06-10	web	Annotated example: aliases, useModelName, metadata, groups, peers	corroborated by A2, A3
A5	DeepWiki: llama-swap peers	deepwiki.com/mostlygeek/llama-swap/3.7-peer-configuration	2026-06-10	web	Duplicate peer model IDs route to first-lexicographic peer with only a warning	corroborated by A6 (collision); single source on aggregation detail
A6	llama-swap issue #539	github.com/mostlygeek/llama-swap/issues/539	2026-06-10	web	Peer models surface as "peer-name: model-name" IDs; stale, unresolved	single source (caveated)
A7	llama-swap issue #538	github.com/mostlygeek/llama-swap/issues/538	2026-06-10	web	Aliases hidden from /v1/models unless includeAliasesInList	corroborated by A2, A3
A8	llama.cpp server README	github.com/ggml-org/llama.cpp/blob/master/tools/server/README.md	2026-06-10	web	/v1/models id defaults to file path; --alias overrides; meta block fields	corroborated by A9, A10
A9	llama.cpp discussion #8547	github.com/ggml-org/llama.cpp/discussions/8547	2026-06-10	web	Confirms file-path default id; --override-kv doesn't change API id	corroborated by A8
A10	llama.cpp issue #17860	github.com/ggml-org/llama.cpp/issues/17860	2026-06-10	web	Only one --alias per llama-server today	corroborated by A8
A11	LM4eu/llama-swap Go pkg docs	pkg.go.dev/github.com/LM4eu/llama-swap/proxy	2026-06-10	web	Model struct {Id, Name, Description, State, Unlisted}; fork, not upstream	single source (caveated)
A12	glukhov.org llama-swap quickstart	glukhov.org/llm-hosting/llama-swap/	2026-06-10	web	/running state values; alias listing behavior	corroborated by A1, A2
A13	Vercel AI SDK provider management	ai-sdk.dev/docs/ai-sdk-core/provider-management	2026-06-10	web	Registry namespaces models as providerId:modelId; per-provider baseURL	corroborated by A14
A14	Vercel AI SDK OpenAI-compatible providers	ai-sdk.dev/providers/openai-compatible-providers	2026-06-10	web	createOpenAICompatible takes name+baseURL per provider; wire model ID unchanged	corroborated by A13
A15	LiteLLM OpenAI-compatible docs	docs.litellm.ai/docs/providers/openai_compatible	2026-06-10	web	Per-entry api_base; aliasing decouples client name from upstream name	corroborated by A16
A16	McDermott: Centralizing LLMs with LiteLLM	robert-mcdermott.medium.com/...9874563f3062	2026-06-10	web	model_list with unique model_name per upstream resolves collisions	corroborated by A15
A17	DeepWiki: llama-swap groups	deepwiki.com/mostlygeek/llama-swap/3.4-groups-and-swapping-policies	2026-06-10	web	Groups/matrix control concurrency, not model IDs	corroborated by A2–A4
A18	llama-swap releases	github.com/mostlygeek/llama-swap/releases	2026-06-10	web	v219–v224 changed routing/perf, not /v1/models schema	single source (caveated)
A19	Open WebUI discussion #3443	github.com/open-webui/open-webui/discussions/3443	2026-06-10	web	Pin-in-dropdown feature request; drag-reorder workaround breaks	corroborated by A21, A23
A20	Open WebUI discussion #5902	github.com/open-webui/open-webui/discussions/5902	2026-06-10	web	Filtering 70+ models; whitelist vs hide patterns	corroborated by A19
A21	Open WebUI env config reference	docs.openwebui.com/reference/env-configuration/	2026-06-10	web	DEFAULT_PINNED_MODELS; settings.pinnedModels sorts pinned to top	corroborated by A22, A23
A22	Open WebUI database schema	docs.openwebui.com/reference/database-schema/	2026-06-10	web	Pins live in user.settings JSON, keyed by bare model ID	corroborated by A21
A23	Open WebUI discussion #23656	github.com/open-webui/open-webui/discussions/23656	2026-06-10	web	Stale-pin cleanup permanently deletes pins during backend downtime	corroborated by A21, A53
A24	Open WebUI discussion #14854	github.com/open-webui/open-webui/discussions/14854	2026-06-10	web	Unpin buried in three-dot menu; discoverability failure	corroborated by A21
A25	Open WebUI issue #19183	github.com/open-webui/open-webui/issues/19183	2026-06-10	web	Local/External/All tabs + tag chips + Fuse.js search in selector	corroborated by A26
A26	Open WebUI discussion #21502	github.com/open-webui/open-webui/discussions/21502	2026-06-10	web	Flat select unusable at OpenRouter scale; optgroup/search proposals	corroborated by A25
A27	Open WebUI discussion #4495	github.com/open-webui/open-webui/discussions/4495	2026-06-10	web	Same-named models from two connections are indistinguishable (bare-ID failure)	corroborated by A25, A26
A28	LibreChat model specs docs	librechat.ai/docs/configuration/librechat_yaml/object_structure/model_specs	2026-06-10	web	Admin YAML `group` field creates named collapsible sections	corroborated by A29
A29	LibreChat v0.8.5 changelogs	librechat.ai/changelog/v0.8.5	2026-06-10	web	Pin support for model specs added (PR #11219)	corroborated by A30; persistence detail single-source
A30	LibreChat discussion #11044	github.com/danny-avila/LibreChat/discussions/11044	2026-06-10	web	Pinning exists; preset-active confusion	corroborated by A29
A31	DeepWiki: LibreChat DB models	deepwiki.com/danny-avila/LibreChat/7.1-database-models	2026-06-10	web	MongoDB/Mongoose; pinned-spec field name unconfirmed	single source (caveated)
A32	Jan v0.6.9 changelog	jan.ai/changelog/2025-08-28-image-support	2026-06-10	web	"Favorite models" shipped; no UI detail	single source (caveated)
A33	Jan manage-models docs	jan.ai/docs/desktop/manage-models	2026-06-10	web	Organized by source/quantization tier, not provider	corroborated by A32
A34	Jan data-folder docs	jan.ai/docs/desktop/data-folder	2026-06-10	web	Settings in local JSON files	corroborated by A32
A35	DeepWiki: Cherry Studio models	deepwiki.com/CherryHQ/cherry-studio/5.3-model-configuration-and-capabilities	2026-06-10	web	Provider-grouped UI; getModelUniqId composite {id, provider}	corroborated by A36 (see V8 caveat)
A36	Cherry Studio ModelService.ts	github.com/CherryHQ/cherry-studio/.../ModelService.ts	2026-06-10	web	Composite-key implementation	corroborated by A35 (see V8 caveat)
A37	Cherry Studio releases	github.com/CherryHQ/cherry-studio/releases	2026-06-10	web	No favorites changes v1.9.1–v1.9.11	single source (caveated)
A38	Chatbox issue #1540	github.com/chatboxai/chatbox/issues/1540	2026-06-10	web	Favorite-models proposal; not shipped	corroborated by A39
A39	Chatbox issue #2252	github.com/chatboxai/chatbox/issues/2252	2026-06-10	web	Two-section dropdown proposal (Preferred on top, star per row)	corroborated by A38
A40	DeepWiki: Chatbox local models	deepwiki.com/chatboxai/chatbox/4.6-local-model-integration	2026-06-10	web	settings.favoritedModels in localStorage	single source (caveated)
A41	SillyTavern PR #5536	github.com/SillyTavern/SillyTavern/pull/5536	2026-06-10	web	Unified sort/group settings drawer across providers	corroborated by A42
A42	SillyTavern 1.13.5 notes	github.com/SillyTavern/SillyTavern/discussions/4660	2026-06-10	web	Sort/group shipped in 1.13.5	corroborated by A41
A43	SillyTavern connection profiles docs	docs.sillytavern.app/usage/core-concepts/connection-profiles/	2026-06-10	web	Profiles = saved config snapshots, not per-model favorites	corroborated by A44
A44	SillyTavern issue #4565	github.com/SillyTavern/SillyTavern/issues/4565	2026-06-10	web	Better model selector request closed not-planned	corroborated by A43
A45	VS Code language models docs	code.visualstudio.com/docs/agent-customization/language-models	2026-06-10	web	Provider groups + hover pin + dedicated Pinned top section, stable order, model stays in group	corroborated by A46
A46	vscode-copilot-chat PR #1111	github.com/microsoft/vscode-copilot-chat/pull/1111	2026-06-10	web	BYOK models grouped into a category	corroborated by A45 (see V8 caveat)
A47	Continue.dev model roles docs	docs.continue.dev/customize/model-roles/00-intro	2026-06-10	web	Role-based dropdowns; no grouping/favorites	corroborated by A48
A48	Continue.dev providers overview	docs.continue.dev/customize/model-providers/overview	2026-06-10	web	Picker reflects config.yaml order	corroborated by A47
A49	Open WebUI discussion #15449	github.com/open-webui/open-webui/discussions/15449	2026-06-10	web	Multi-model combination pinning request	single source (caveated)
A50	BigAGI repo + changelog	github.com/enricoros/big-AGI	2026-06-10	web	No grouping/favorites evidence (negative finding)	single source (caveated)
A51	LM Studio v0.4.0 changelog	lmstudio.ai/changelog/lmstudio-v0.4.0	2026-06-10	web	Search/format filters; no favorites	corroborated by A52
A52	LM Studio v0.4.13 changelog	lmstudio.ai/changelog/lmstudio-v0.4.13	2026-06-10	web	No picker changes	corroborated by A51
A53	Open WebUI issue #22578	github.com/open-webui/open-webui/issues/22578	2026-06-10	web	Model enable/disable state goes stale on catalog change	corroborated by A23
A54	embedding host live inventory	provided: `curl http://100.90.172.55:8411/v1/models` + `/running`	2026-06-10	provided	39 models incl. deepseek-r1-qwen3-8b and 5 IDs duplicated on Sam-desktop; /running empty	corroborated by A56 (config matches)
A55	Sam-desktop live inventory	provided: `curl http://100.101.41.16:8401/v1/models` + `/running`	2026-06-10	provided	21 models; qwen3.6-35b-a3b-mxfp4 absent; nemotron-omni running via D:\llama-server	corroborated by A57
A56	embedding host SSH inventory	provided: `ssh samkintop@100.90.172.55` (~/llama-swap/config.yaml, ~/llama.cpp, ~/models)	2026-06-10	provided	P104-tuned llama-swap config (ttl 1800, per-model llama-server cmds); llama.cpp source build	corroborated by A54
A57	Sam-desktop SSH inventory	provided: `ssh samki@100.101.41.16` (dir D:)	2026-06-10	provided	D:\llama-server (b9591 CUDA), D:\llama-swap (v224), D:\models, D:\llama-sidecar	corroborated by A55
A58	Current env config	`.env`, `apps/coder/.env.host`	n/a	codebase	LLAMA_SWAP_URL=http://100.101.41.16:8401; DEFAULT_MODEL=qwen3.6-35b-a3b-mxfp4 (both apps)	corroborated (read directly)
A59	Models route	`apps/server/src/routes/models.ts:14-56`	n/a	codebase	GET /api/models fetches only LLAMA_SWAP_URL (+DeepSeek); flat untagged list	corroborated (read directly)
A60	Inference provider/routing	`apps/server/src/services/inference/provider.ts:1-163`	n/a	codebase	resolveRoute: deepseek- prefix → cloud; LLAMA_SIDECAR_URL set → sidecar default for everything; else single swap; resolveModelEndpoint hardcodes LLAMA_SWAP_URL	corroborated (read directly)
A61	BooChat model picker	`apps/web/src/components/ModelPicker.tsx:14-133`	n/a	codebase	Flat lazy list, no grouping/search/favorites; PATCHes session.model	corroborated (explorer + validator)
A62	Provider snapshot contracts	`packages/contracts/src/provider-snapshot.ts`	n/a	codebase	ProviderModel has no provider field; identity implicit in parent entry name	corroborated
A63	Coder provider snapshot	`apps/coder/src/services/provider-snapshot.ts:48-70,256-310`	n/a	codebase	Prefixes single llama-swap list with `llama-swap/`; merges into boocode entry	corroborated
A64	Coder dispatcher prefixing	`apps/coder/src/services/dispatcher.ts:1006-1011`	n/a	codebase	Bare IDs get `llama-swap/`; slash-containing IDs pass through unchanged	corroborated (validator-verified)
A65	Model/settings persistence	`apps/server/src/schema.sql:20,217-222,249`; `routes/settings.ts`	n/a	codebase	sessions.model NOT NULL, chats.model nullable, settings KV JSONB seeded with bare default_model	corroborated
A66	Model context service	`apps/server/src/services/model-context.ts:9,26-29,40-49,77-100`	n/a	codebase	No-TTL positive cache keyed by raw model string; deepseek- guard returns static 131k; /upstream URL from single config	corroborated (validator-verified)
A67	Compaction LLM calls	`apps/server/src/services/compaction.ts:351-357,531`	n/a	codebase	Summarization via resolveModelEndpoint → always LLAMA_SWAP_URL	corroborated (validator-verified)
A68	Task model service	`apps/server/src/services/task-model.ts:59-68`	n/a	codebase	FAST_MODEL fallback chain against single endpoint (TASK_MODEL_URL escape hatch)	corroborated
A69	Arena model calls	`apps/coder/src/services/arena-model-call.ts:16-28`; `arena-analyzer.ts:90`	n/a	codebase	Raw fetch to LLAMA_SWAP_URL, no routing abstraction	corroborated (validator-verified)
A70	Coder composer prefs	`apps/web/src/components/AgentComposerBar.tsx:33-52,118-196`	n/a	codebase	CompactPicker flat lists; prefs in localStorage `boocode.coder.agent-prefs`	corroborated
A71	Model display naming	`apps/web/src/lib/modelName.ts:6-32`; `MessageBubble.tsx:140-189`	n/a	codebase	Display chips already strip `llama-swap/`-style prefixes	corroborated
A72	Coder provider config file	`data/coder-providers.example.json`	n/a	codebase	Per-provider overrides exist; no baseUrl field — second endpoint unregistrable today	corroborated
A73	Openspec conventions	`openspec/README.md`	n/a	codebase	changes//{proposal,tasks,design}.md; lowercase-hyphenated slugs	corroborated (read directly)
A74	Sidecar architecture notes	`apps/server/CLAUDE.md` (sidecar sections); `/opt/forks/llama-sidecar/`	n/a	codebase	llama-sidecar = Go per-agent llama-server pool on Sam-desktop; X-Agent-Flags header; boot guard ties llama_extra_args to LLAMA_SIDECAR_URL	corroborated by A60

A54/A55: Live host inventories — recommendation-bearing

Link / location: provided: orchestrator-run curl against http://100.90.172.55:8411 and http://100.101.41.16:8401 (/v1/models, /running)
Retrieved: 2026-06-10
Trust class: provided (operator-owned infrastructure, independently re-checkable with the same commands)
Summary: embedding serves 39 mostly-small models; Sam-desktop serves 21 mostly-large models. Five IDs (granite-4.1-8b, negentropy-4.7-9b, qwen3.5-9b, qwen3.5-9b-deepseek-v4, qwopus3.5-9b-coder) appear on both — making composite keying mandatory, not stylistic. The configured DEFAULT_MODEL is absent from Sam-desktop's live list, proving ID churn. embedding's deepseek-r1-qwen3-8b collides with the deepseek- cloud-routing heuristic. Neither host populates llama-swap's optional name/description fields, so the UI must derive labels from IDs (as formatModelLabel already does).
Evidence status: corroborated by A56/A57 (SSH-level configs match the served lists).
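
To make the collision concrete, here is a minimal sketch of composite keying and of detecting IDs served by more than one host. All type and function names (`ModelRef`, `toModelKey`, `parseModelKey`, `findCollisions`) are hypothetical and not taken from the BooCode codebase; the `/` separator is assumed from the "provider/model" pair described in the summary.

```typescript
// Hypothetical helper types; names are illustrative, not from the codebase.
interface ModelRef {
  provider: string; // e.g. "Sam-desktop" or "embedding"
  model: string;    // bare ID as the host lists it
}

// Join provider and model into one storable key, e.g. "embedding/qwen3.5-9b".
function toModelKey(ref: ModelRef): string {
  return ref.provider + "/" + ref.model;
}

// Split on the FIRST slash only, so a model ID that itself contains "/" survives.
function parseModelKey(key: string): ModelRef | null {
  const i = key.indexOf("/");
  if (i <= 0 || i === key.length - 1) return null;
  return { provider: key.slice(0, i), model: key.slice(i + 1) };
}

// Return the bare IDs that appear under more than one provider, sorted.
function findCollisions(inventory: Record<string, string[]>): string[] {
  const owners: Record<string, string[]> = Object.create(null);
  for (const provider of Object.keys(inventory)) {
    for (const id of inventory[provider] ?? []) {
      const list = owners[id] ?? [];
      if (list.indexOf(provider) === -1) list.push(provider);
      owners[id] = list;
    }
  }
  return Object.keys(owners)
    .filter((id) => (owners[id] ?? []).length > 1)
    .sort();
}
```

Run against the two live lists, `findCollisions` would be expected to report the five shared IDs named above; any favorite or session stored under one of those bare names could not say which host it meant.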

A60: `provider.ts` routing — recommendation-bearing

Link / location: apps/server/src/services/inference/provider.ts:90-157
Retrieved: n/a
Trust class: codebase (current-state anchor)
Summary: The single point where all three routes (deepseek/sidecar/swap) resolve. Establishes that (a) BooCode already builds per-baseURL AI-SDK providers from a cache map — O1 slots into this with minimal new machinery; (b) the sidecar is the default route for everything when configured, which forces constraint 4; (c) resolveModelEndpoint is a second, parallel resolution path (compaction/task-model) that must change in lockstep.
Evidence status: corroborated (read directly by orchestrator and validator).
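
One possible resolution order can be sketched as follows. This is not the code in `provider.ts`: `Route`, `RouteConfig`, and `resolveRouteSketch` are hypothetical names, and how named providers should interact with the sidecar default is deliberately left out. The sketch only shows an explicit provider prefix being checked before the `deepseek-` heuristic, with bare IDs keeping the order described in A60.

```typescript
// Hypothetical shapes for illustration only.
type Route =
  | { kind: "cloud"; model: string }
  | { kind: "sidecar"; baseUrl: string; model: string }
  | { kind: "swap"; baseUrl: string; model: string };

interface RouteConfig {
  providers: Record<string, string>; // provider name -> base URL
  defaultProvider: string;
  sidecarUrl?: string;
}

function resolveRouteSketch(selected: string, cfg: RouteConfig): Route {
  const has = (name: string): boolean =>
    Object.prototype.hasOwnProperty.call(cfg.providers, name);

  // 1. An explicit "provider/model" key wins over every heuristic.
  const slash = selected.indexOf("/");
  if (slash > 0) {
    const provider = selected.slice(0, slash);
    const baseUrl = cfg.providers[provider];
    if (has(provider) && baseUrl !== undefined) {
      return { kind: "swap", baseUrl, model: selected.slice(slash + 1) };
    }
  }

  // 2. Bare IDs keep the order described in A60.
  if (selected.indexOf("deepseek-") === 0) {
    return { kind: "cloud", model: selected };
  }
  if (cfg.sidecarUrl !== undefined && cfg.sidecarUrl !== "") {
    return { kind: "sidecar", baseUrl: cfg.sidecarUrl, model: selected };
  }
  const fallback = cfg.providers[cfg.defaultProvider];
  if (!has(cfg.defaultProvider) || fallback === undefined) {
    throw new Error("default provider is not registered: " + cfg.defaultProvider);
  }
  return { kind: "swap", baseUrl: fallback, model: selected };
}
```

Under this ordering, `embedding/deepseek-r1-qwen3-8b` resolves to the embedding host, while the bare `deepseek-r1-qwen3-8b` would still be sent to the cloud route. Whatever ordering is chosen, `resolveModelEndpoint` would need to apply the same lookup so that compaction and task-model calls do not diverge from chat inference.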

A13/A14: AI SDK provider registry pattern — recommendation-bearing

Link / location: https://ai-sdk.dev/docs/ai-sdk-core/provider-management ; https://ai-sdk.dev/providers/openai-compatible-providers
Retrieved: 2026-06-10
Trust class: web
Summary: The library BooCode already uses prescribes exactly O1's shape: one named createOpenAICompatible instance per provider, registry-level provider:model namespacing, bare model IDs on the wire. Adopting O1 is convergence with the upstream idiom rather than a custom scheme.
Evidence status: corroborated (two official doc pages, consistent with LiteLLM's independent design A15/A16).

A45: VS Code model picker docs — recommendation-bearing (UX)

Link / location: https://code.visualstudio.com/docs/agent-customization/language-models
Retrieved: 2026-06-10
Trust class: web
Summary: Documents the shipped pattern this feature's dropdown adapts: provider-grouped list, hover-revealed pin, dedicated Pinned top section in stable insertion order, pinned models remaining in their provider group.
Evidence status: corroborated by A46; code-level detail treated as color per V8.
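
The section layout described here can be sketched as a pure function. The names `PickerModel`, `PickerSection`, and `buildSections` are hypothetical, and favorites are assumed to be stored as an ordered list of "provider/model" keys; none of this is taken from VS Code or from the existing `ModelPicker.tsx`.

```typescript
// Hypothetical shapes for illustration only.
interface PickerModel {
  provider: string;
  id: string;
}

interface PickerSection {
  title: string;
  models: PickerModel[];
}

function buildSections(
  providerOrder: string[],   // e.g. ["Sam-desktop", "embedding"]
  models: PickerModel[],     // currently reachable models from all providers
  favoriteKeys: string[],    // stored "provider/model" keys, in insertion order
): PickerSection[] {
  const keyOf = (m: PickerModel): string => m.provider + "/" + m.id;
  const sections: PickerSection[] = [];

  // Favorites first, in stored insertion order; keys with no live match are skipped.
  const favorites: PickerModel[] = [];
  for (const key of favoriteKeys) {
    const hit = models.filter((m) => keyOf(m) === key)[0];
    if (hit !== undefined) favorites.push(hit);
  }
  if (favorites.length > 0) sections.push({ title: "Favorites", models: favorites });

  // Then one section per provider; favorited models stay listed in their group.
  for (const provider of providerOrder) {
    const own = models.filter((m) => m.provider === provider);
    if (own.length > 0) sections.push({ title: provider, models: own });
  }
  return sections;
}
```

Because the function never removes a favorited model from its provider section, un-starring a model only shortens the top section and leaves the rest of the list stable.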

A23/A27: Open WebUI pitfalls — recommendation-bearing (counter-evidence)

Link / location: https://github.com/open-webui/open-webui/discussions/23656 ; https://github.com/open-webui/open-webui/discussions/4495
Retrieved: 2026-06-10
Trust class: web
Summary: The two documented failure modes the design must avoid: bare-model-ID favorites becoming ambiguous across connections, and stale-favorite cleanup permanently destroying user preferences during transient backend downtime.
Evidence status: corroborated by A21/A22/A53 (the surrounding docs and a second stale-state issue).
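
Avoiding the second failure mode comes down to separating what is stored from what is shown. A minimal sketch, assuming favorites are persisted as a list of "provider/model" keys; `visibleFavorites` and `toggleFavorite` are hypothetical names:

```typescript
// Filter for display only; the stored list is never pruned by availability.
function visibleFavorites(
  stored: readonly string[],
  liveKeys: readonly string[],
): string[] {
  return stored.filter((key) => liveKeys.indexOf(key) !== -1);
}

// Only an explicit user action changes the stored list.
function toggleFavorite(stored: readonly string[], key: string): string[] {
  return stored.indexOf(key) === -1
    ? stored.concat([key])
    : stored.filter((k) => k !== key);
}

// Example: the embedding host goes offline and later returns.
const storedFavorites = ["embedding/qwen3.5-9b", "Sam-desktop/granite-4.1-8b"];
const whileOffline = visibleFavorites(storedFavorites, ["Sam-desktop/granite-4.1-8b"]);
const afterReturn = visibleFavorites(storedFavorites, [
  "Sam-desktop/granite-4.1-8b",
  "embedding/qwen3.5-9b",
]);
```

Here `whileOffline` contains only the Sam-desktop entry, while `storedFavorites` still holds both keys, so `afterReturn` shows both again in their original order without the user re-starring anything.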
