boocode/openspec/changes/multi-llama-swap-providers-model-favorites/design.md

multi-llama-swap-providers-model-favorites — design

Detailed implementation plan for named local model providers, composite model IDs, grouped pickers, and shared favorites across BooChat and BooCoder.

1. Current state

Today the repo splits inference configuration across two incompatible shapes:

  • apps/server reads env vars such as LLAMA_SWAP_URL, LLAMA_SIDECAR_URL, and DEFAULT_MODEL.
  • apps/coder reads the same LLAMA_SWAP_URL for BooCode's own provider, plus data/coder-providers.json for ACP providers.

That leaves several hardcoded single-endpoint assumptions:

  • /api/models fetches from a single llama-swap endpoint plus optional DeepSeek.
  • provider.ts routes by deepseek- name prefix and a global sidecar default.
  • model-context.ts caches by bare model string.
  • compaction.ts, task-model.ts, and coder arena use a single upstream URL.
  • BooCoder prepends llama-swap/ and treats any other slash-containing value as an already-routable provider namespace.

2. Design principles

  1. Provider identity is explicit.
  2. Wire model IDs stay bare; persisted model IDs are composite.
  3. Legacy bare model IDs remain readable indefinitely.
  4. Favorites are shared across BooChat and BooCoder.
  5. Sidecar routing is opt-in per provider, not a global fallback.
  6. Any cache keyed by model identity uses the full composite ID.

3. Provider registry

Introduce a new shared file for local inference providers:

  • Live path: /data/llama-providers.json
  • Env var for both apps: LLAMA_PROVIDERS_PATH
  • Tracked example: data/llama-providers.example.json

Recommended shape:

{
  "defaultProvider": "sam-desktop",
  "providers": [
    {
      "id": "sam-desktop",
      "label": "Sam-desktop",
      "baseUrl": "http://100.101.41.16:8401",
      "sidecarUrl": "http://100.101.41.16:8402",
      "kind": "llama-swap"
    },
    {
      "id": "embedding",
      "label": "embedding",
      "baseUrl": "http://100.90.172.55:8411",
      "kind": "llama-swap"
    }
  ]
}

Rules:

  • If the file is missing, synthesize a single legacy provider from LLAMA_SWAP_URL and optional LLAMA_SIDECAR_URL.
  • data/coder-providers.json remains the ACP registry and is not extended with llama-swap base URLs.
  • DeepSeek credentials remain env-backed, but the model catalog should expose a synthetic provider group such as deepseek so routing no longer depends on a bare deepseek- prefix.
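The missing-file fallback can be sketched as a pure function over the file contents and the environment. This is a minimal sketch, not the planned loader: `resolveRegistry`, the `LlamaProvider` and `ProviderRegistry` types, and the legacy provider id `llama-swap` are all assumptions made for illustration, since the design does not name the synthesized provider. Throwing when neither source is configured is also an assumption.

```typescript
// Hypothetical types mirroring the recommended llama-providers.json shape.
interface LlamaProvider {
  id: string;
  label: string;
  baseUrl: string;
  sidecarUrl?: string;
  kind: "llama-swap";
}

interface ProviderRegistry {
  defaultProvider: string;
  providers: LlamaProvider[];
}

type Env = Record<string, string | undefined>;

// Assumed id for the synthesized provider; the design does not specify one.
const LEGACY_PROVIDER_ID = "llama-swap";

// fileContents is the raw text read from LLAMA_PROVIDERS_PATH,
// or null when the file does not exist.
function resolveRegistry(fileContents: string | null, env: Env): ProviderRegistry {
  if (fileContents !== null) {
    const parsed = JSON.parse(fileContents) as ProviderRegistry;
    if (!Array.isArray(parsed.providers)) {
      throw new Error("providers must be an array");
    }
    if (!parsed.providers.some((p) => p.id === parsed.defaultProvider)) {
      throw new Error(`defaultProvider "${parsed.defaultProvider}" is not in providers`);
    }
    return parsed;
  }

  // Missing file: synthesize one legacy provider from the old env vars.
  const baseUrl = env["LLAMA_SWAP_URL"];
  if (!baseUrl) {
    // Assumption: with neither source configured there is nothing to route to.
    throw new Error("No provider file found and LLAMA_SWAP_URL is unset");
  }
  const legacy: LlamaProvider = {
    id: LEGACY_PROVIDER_ID,
    label: LEGACY_PROVIDER_ID,
    baseUrl,
    kind: "llama-swap",
  };
  const sidecarUrl = env["LLAMA_SIDECAR_URL"];
  if (sidecarUrl) legacy.sidecarUrl = sidecarUrl;
  return { defaultProvider: legacy.id, providers: [legacy] };
}
```

Keeping the function free of direct file and `process.env` access would let both apps share it and supply their own I/O.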

4. Model identity and parsing

Persist model selections as provider/model.

Examples:

  • sam-desktop/qwen3.6-35b-a3b
  • embedding/gemma-4-12b
  • deepseek/deepseek-v4-pro

Helper behavior:

  • parseModelRef(id) returns { providerId, wireModelId, isLegacyBareId }
  • Bare IDs resolve to { providerId: defaultProvider, wireModelId: id, isLegacyBareId: true }
  • Only strip the prefix at the final wire-call boundary

This preserves existing TEXT columns while fixing duplicate-name ambiguity.
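The helper behavior above can be sketched as follows. The return shape comes from this section; the `knownProviderIds` and `defaultProvider` parameters are assumptions, as is the rule that a prefix counts as a provider only when it matches a registered id. That rule is one possible way to keep legacy bare IDs that happen to contain a slash routable, not something the design specifies.

```typescript
interface ModelRef {
  providerId: string;
  wireModelId: string;
  isLegacyBareId: boolean;
}

// knownProviderIds and defaultProvider are assumed to come from the provider registry.
function parseModelRef(
  id: string,
  knownProviderIds: ReadonlySet<string>,
  defaultProvider: string,
): ModelRef {
  const slash = id.indexOf("/");
  if (slash > 0) {
    const prefix = id.slice(0, slash);
    const rest = id.slice(slash + 1);
    // Only a registered provider id counts as a composite prefix.
    if (rest.length > 0 && knownProviderIds.has(prefix)) {
      return { providerId: prefix, wireModelId: rest, isLegacyBareId: false };
    }
  }
  // No recognized prefix: treat the whole string as a legacy bare ID.
  return { providerId: defaultProvider, wireModelId: id, isLegacyBareId: true };
}

// Example usage with the provider ids from this document.
const exampleIds = new Set(["sam-desktop", "embedding", "deepseek"]);
const exampleRef = parseModelRef("embedding/gemma-4-12b", exampleIds, "sam-desktop");
console.log(exampleRef.providerId, exampleRef.wireModelId);
```

Because only the first slash is considered, a wire model ID that itself contains a slash survives intact after the provider prefix.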

5. Server changes

5.1 Shared registry + model catalog

Add shared registry utilities in packages/contracts plus server-side loaders used by:

  • apps/server/src/config.ts
  • apps/server/src/routes/models.ts
  • apps/server/src/services/inference/provider.ts
  • apps/server/src/services/model-context.ts
  • apps/server/src/services/task-model.ts
  • apps/server/src/services/compaction.ts

GET /api/models should return a provider-aware payload. Recommended shape:

interface ModelCatalogProvider {
  id: string;
  label: string;
  models: ModelInfo[];
}

interface ModelCatalogResponse {
  providers: ModelCatalogProvider[];
}

Where each ModelInfo.id is already composite.

Favorites should not be embedded in this payload. They are a user-level view derived in the client from favorite_models in /api/settings.
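As an illustration of how the catalog could assign composite IDs, the sketch below builds the response from per-provider listings of wire model IDs. `ProviderListing` and `buildCatalog` are hypothetical names, and `ModelInfo` is reduced to a minimal stand-in with only an `id` field; the real type in the repo may carry more.

```typescript
// Minimal stand-in; the real ModelInfo may have additional fields.
interface ModelInfo {
  id: string;
}

interface ModelCatalogProvider {
  id: string;
  label: string;
  models: ModelInfo[];
}

interface ModelCatalogResponse {
  providers: ModelCatalogProvider[];
}

// Hypothetical input: what one upstream reported, as bare wire model IDs.
interface ProviderListing {
  id: string;
  label: string;
  wireModelIds: string[];
}

function buildCatalog(listings: ProviderListing[]): ModelCatalogResponse {
  return {
    providers: listings.map((p) => ({
      id: p.id,
      label: p.label,
      // The provider prefix is attached here, so clients only ever see composite IDs.
      models: p.wireModelIds.map((wire) => ({ id: `${p.id}/${wire}` })),
    })),
  };
}
```

Two providers that serve a model with the same wire name then produce two distinct catalog entries, which is the ambiguity the composite scheme is meant to remove.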

5.2 Routing

Replace string-heuristic routing with provider-aware resolution:

  • sam-desktop/* routes to baseUrl or sidecarUrl depending on agent flags and provider capabilities.
  • embedding/* always routes directly to its llama-swap baseUrl.
  • deepseek/* routes to the DeepSeek SDK provider.

resolveModelEndpoint() and upstreamModel() must both resolve from the same parsed model reference to keep streaming and non-streaming behavior aligned.
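One way to keep the two in agreement is to derive both the endpoint and the wire model ID from a single resolution step. The sketch below is not the signature of the existing `resolveModelEndpoint()`; `resolveEndpointFor`, the `deepseek` provider kind, and the `wantsSidecar` flag (standing in for "agent flags and provider capabilities") are assumptions for illustration.

```typescript
// Hypothetical provider config; "deepseek" as a kind is an assumption.
type ProviderConfig =
  | { kind: "llama-swap"; id: string; baseUrl: string; sidecarUrl?: string }
  | { kind: "deepseek"; id: string };

type ResolvedEndpoint =
  | { transport: "http"; url: string; wireModelId: string }
  | { transport: "deepseek-sdk"; wireModelId: string };

interface ParsedRef {
  providerId: string;
  wireModelId: string;
}

function resolveEndpointFor(
  ref: ParsedRef,
  providers: readonly ProviderConfig[],
  wantsSidecar: boolean,
): ResolvedEndpoint {
  const provider = providers.find((p) => p.id === ref.providerId);
  if (!provider) {
    throw new Error(`Unknown provider: ${ref.providerId}`);
  }
  if (provider.kind === "deepseek") {
    return { transport: "deepseek-sdk", wireModelId: ref.wireModelId };
  }
  // Sidecar routing is opt-in per provider: without a sidecarUrl the request
  // goes to baseUrl even when the caller asks for the sidecar.
  const url = wantsSidecar && provider.sidecarUrl ? provider.sidecarUrl : provider.baseUrl;
  return { transport: "http", url, wireModelId: ref.wireModelId };
}
```

Streaming and non-streaming call sites that both consume the returned `url` and `wireModelId` cannot drift apart, because neither re-derives routing from the model string.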

5.3 Context lookup and cache keys

model-context.ts must key caches by the full composite ID. The provider prefix is stripped only when building:

<provider.baseUrl>/upstream/<wireModelId>/props

This avoids cross-provider cache poisoning for duplicate names.
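A minimal sketch of the split between cache key and request path follows. `propsUrl`, `contextCache`, and `rememberContext` are hypothetical names, and caching a numeric context length is an assumption about what `model-context.ts` stores.

```typescript
interface ProviderBase {
  id: string;
  baseUrl: string;
}

// Builds <provider.baseUrl>/upstream/<wireModelId>/props from the bare wire ID.
function propsUrl(provider: ProviderBase, wireModelId: string): string {
  const base = provider.baseUrl.replace(/\/+$/, ""); // tolerate a trailing slash
  return `${base}/upstream/${wireModelId}/props`;
}

// Keyed by the full composite ID; the cached value type is an assumption.
const contextCache = new Map<string, number>();

function rememberContext(provider: ProviderBase, wireModelId: string, contextLength: number): void {
  contextCache.set(`${provider.id}/${wireModelId}`, contextLength);
}

// The same wire name on two providers occupies two separate cache entries.
const desktop: ProviderBase = { id: "sam-desktop", baseUrl: "http://100.101.41.16:8401" };
const embedding: ProviderBase = { id: "embedding", baseUrl: "http://100.90.172.55:8411" };
rememberContext(desktop, "gemma-4-12b", 32768);
rememberContext(embedding, "gemma-4-12b", 8192);
```

The context lengths in the usage lines are made-up values chosen only to show that the two entries do not overwrite each other.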

6. Persistence and settings

Keep:

  • sessions.model TEXT
  • chats.model TEXT

Add a new settings key:

  • favorite_models: string[]

Rules:

  • Stored favorites are composite IDs only.
  • Missing/offline favorites are hidden from the picker, not deleted.
  • Legacy bare favorites are not supported. On read they are ignored, unless the default-provider mapping is unambiguous, in which case they may be normalized to composite IDs.
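The rules above might translate into three small helpers: one that sanitizes the stored value on read, one that hides unavailable favorites without touching storage, and one that toggles a favorite. All function names are hypothetical, and this sketch takes the simpler option of ignoring bare entries rather than normalizing them.

```typescript
// A composite ID has a non-empty provider prefix and a non-empty model part.
function isCompositeId(id: string): boolean {
  const slash = id.indexOf("/");
  return slash > 0 && slash < id.length - 1;
}

// Sanitizes whatever is stored under favorite_models; bare or malformed entries are ignored.
function readFavorites(stored: unknown): string[] {
  if (!Array.isArray(stored)) return [];
  const out: string[] = [];
  for (const entry of stored) {
    if (typeof entry === "string" && isCompositeId(entry) && !out.includes(entry)) {
      out.push(entry);
    }
  }
  return out;
}

// Hides favorites whose model is currently missing or offline.
// Returns a new array, so the persisted list is left untouched.
function visibleFavorites(
  favorites: readonly string[],
  availableIds: ReadonlySet<string>,
): string[] {
  return favorites.filter((id) => availableIds.has(id));
}

// Adds the ID at the end (insertion order) or removes it if already present.
function toggleFavorite(favorites: readonly string[], compositeId: string): string[] {
  return favorites.includes(compositeId)
    ? favorites.filter((id) => id !== compositeId)
    : [...favorites, compositeId];
}
```

Filtering only at display time means a favorite on a host that is temporarily down reappears in the picker once that host is back.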

7. BooCoder integration

Touch points:

  • apps/coder/src/services/provider-snapshot.ts
  • apps/coder/src/services/dispatcher.ts
  • apps/coder/src/services/arena-model-call.ts
  • apps/coder/src/services/arena-analyzer.ts
  • apps/coder/src/config.ts

7.1 Native boocode provider

The native boocode provider can use the shared local-provider registry and resolver directly. Its model list should expose composite provider/model ids and the UI should group them by local provider.

7.2 External-agent parity is a separate seam

opencode is not safe to migrate by a naive string rewrite. The current bridge assumes one local llama-swap provider and collapses identity back to llama-swap/<model>.

Recommended bridge rule:

  • Composite local model IDs remain provider/model in native BooCode state and UI.
  • Do not translate provider/model back to llama-swap/<wireModelId> for external-agent paths; that loses provider identity for duplicate model names.
  • If full opencode parity is required, prefer a BooCoder-hosted OpenAI-compatible local-model gateway that accepts provider-aware model ids and routes them to the correct local upstream.

If the gateway is not part of the first slice, restrict the initial scope to native boocode parity and keep opencode local-model parity as a follow-up.

8. Picker UX

Both BooChat and BooCoder should converge on the same behavior:

  • Favorites section first
  • Then one section per provider
  • Favorite toggle on every model row
  • A favorited model remains visible in its provider section
  • Provider order defaults to:
    1. sam-desktop
    2. embedding
    3. deepseek when configured

This batch does not require search. Search can be added later if model counts make the grouped list insufficient.
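The grouping behavior could be derived on the client from the catalog and the favorites list roughly as follows. `buildPickerSections` and the section types are hypothetical, and omitting the Favorites section when it would be empty is an assumption the list above does not settle.

```typescript
interface PickerModel {
  id: string; // composite provider/model ID
}

interface PickerProvider {
  id: string;
  label: string;
  models: PickerModel[];
}

interface PickerSection {
  kind: "favorites" | "provider";
  title: string;
  models: PickerModel[];
}

function buildPickerSections(
  providers: readonly PickerProvider[],
  favorites: readonly string[],
  providerOrder: readonly string[],
): PickerSection[] {
  const byId = new Map<string, PickerModel>();
  for (const p of providers) {
    for (const m of p.models) byId.set(m.id, m);
  }

  // Favorites keep their stored order; ones missing from the catalog are hidden.
  const favoriteModels: PickerModel[] = [];
  for (const id of favorites) {
    const model = byId.get(id);
    if (model) favoriteModels.push(model);
  }

  // Providers not named in providerOrder sort after the named ones.
  const rank = (id: string): number => {
    const index = providerOrder.indexOf(id);
    return index === -1 ? providerOrder.length : index;
  };
  const ordered = [...providers].sort((a, b) => rank(a.id) - rank(b.id));

  const sections: PickerSection[] = [];
  if (favoriteModels.length > 0) {
    sections.push({ kind: "favorites", title: "Favorites", models: favoriteModels });
  }
  for (const p of ordered) {
    // Favorited models are not removed from their provider section.
    sections.push({ kind: "provider", title: p.label, models: p.models });
  }
  return sections;
}
```

Passing `["sam-desktop", "embedding", "deepseek"]` as `providerOrder` reproduces the default order listed above.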

9. Rollout and compatibility

  1. Land registry/parsing utilities first.
  2. Switch server routing and model catalog to composite IDs.
  3. Add favorite persistence and picker grouping.
  4. Update native BooCoder (boocode) model handling and arena.
  5. Decide the opencode parity path: gateway now, or explicit follow-up.
  6. Verify legacy bare IDs across existing chats and sessions before removing any old env-based assumptions.

Compatibility requirements:

  • A missing /data/llama-providers.json must not break startup.
  • Existing DB rows with bare IDs must remain routable.
  • Existing DEFAULT_MODEL can stay bare during transition, but new writes should become composite.

10. Deferred items

  • Picker search/filtering
  • Manual favorite ordering beyond insertion order
  • Host health badges in the picker
  • Automatic normalization of old session/chat model values
  • Full opencode multi-provider parity if the first slice ships native-only
  • Any boocontrol fleet UI built on top of this registry