# Plan: boocontrol ## Folder `openspec/changes/boocontrol/` ## Task count 51 (P0: 2, P1: 15, P2: 5, P3: 5, P4: 4, P5: 4, P6: 2, P7: 4, P8: 1 outline, P9: 1 outline) ## Size Large -- 10-phase program spanning 4 apps + contracts, ~12 new DB tables, 5 new WS frame types, new host service, routing gateway, eval sandbox ## Validation `openspec validate boocontrol`: skipped (pre-spec-format acceptance; validation against openspec CLI format not applicable to accepted spec) Adversarial validator: 18 findings (3 CRITICAL folded, 7 MINOR folded, 8 CONFIRMED) Junior developer: 24 findings (7 clarifying folded, 3 polish noted, 2 specialist handoffs deferred, 12 confirmed) --- ## Findings folded into this plan **Critical (folded):** - **V1 (jitter):** The `opencode-sse.ts` pattern referenced in design S4 has backoff + circuit-breaker but NO jitter. The BooControl SSE connector must add jitter explicitly (random 0-50% of computed delay) to avoid thundering-herd reconnections across N hosts. - **V7 (waitForTable):** No `waitForTable` function exists anywhere in the codebase. P1 must create it in `apps/control/src/db.ts` as an explicit task. - **V11 (schema indexes):** P1 schema creates tables but defines zero indexes. The retention job queries `control_requests` by `(provider_id, ts)`, the perf poller recovers watermarks via `MAX(ts)`, and the activity feed sorts by `ts`. Without indexes these queries scan full tables as rows accumulate (~35k/day raw). Add explicit index tasks for `control_requests(provider_id, ts)`, `control_perf_samples(provider_id, ts)`, `control_model_events(provider_id, ts)`. **Clarifying (folded):** - **JD1 (server loose union):** Control frames skip the server's broker entirely (they relay raw bytes through the proxy). Adding them to the server's `InferenceFrame` union is dead code. Skip the server union update; document that control frames use a 2-location pattern (contracts + web strict union only). - **JD3 (control_hosts seed):** Seed `os` and `gpu_label` as hardcoded display metadata (`'Windows'`/`'RTX 5090 32GB'`, `'Linux'`/`'P104-100 8GB'`); `ssh_*`, `config_path`, `restart_cmd` are NULL until P9. - **JD5 (@fastify/websocket):** Add `@fastify/websocket` to P1 scaffolding dependencies. - **JD6 (capture cap):** The 256KB capture cap is application-enforced in the capture-fetch handler, not a DB constraint. - **JD7 (acquireHostAccess):** Scaffold `acquireHostAccess` in P1 as a no-op (`{ok: true}`) so P3 calls it and P8 swaps its body. - **JD8 (gap_suspected):** Store as a row in `control_model_events` with `model = '*'` and `state = 'gap_suspected'`, timestamps in `detail` JSONB. - **JD14 (schema overview):** Only create P1 tables in P1; annotate the design S3 schema overview with phase tags. - **JD16 (P1 source):** P1 activity feed shows `source = NULL`; per-consumer filtering lands in P4. **Minor (folded):** - **V2 (drift test):** The existing `ws-frames.test.ts` only checks `KNOWN_FRAME_TYPES` vs `WsFrameSchema` alignment, not web strict union sync. Add a comment to the P1 task noting web union sync is manual. - **V3 (blast radius, corrected by plan validation F1/F4):** `upstreamModel` has exactly 1 production importer (`stream-phase-adapter.ts:16`), not ~5 and not 28/13. The other provider-module consumers import `resolveModelProvider`/`resolveModelEndpoint`/`resolveRoute`/`getModelContext` instead. The additive-change constraint stands; the real P7 blast surface is `resolveModelProvider`'s 6 direct callers propagating to ~10 downstream call sites. - **V6 (local-gateway):** local-gateway.ts omits `X-Boo-Source` (doesn't include it) rather than actively stripping it. Same fix either way. - **JD4 (proxy WS path):** The control proxy WS path is static (`/api/control/ws`), not parameterized like coder-proxy's per-session path. **New findings (folded):** - **V12 (P7 caller audit detail):** The prior plan says "audit all 5 callers" but doesn't specify what each caller needs. Added per-caller change specs: `getModelContext`/`invalidateModelContext` (model-context.ts) must handle gateway `baseUrl`; `resolveRoute` (provider.ts) must return `{route: 'gateway'}`; `upstreamModel` (provider.ts) must add gateway branch before swap fallback; `resolveModelEndpoint` (provider.ts) must handle gateway headers. - **V13 (ECharts theme integration):** The plan says "dark-theme tokens from active oklch palette" but doesn't specify how. Added: use `echarts.init(dom, themeObject)` with a theme object built from the CSS custom properties (`--background`, `--foreground`, `--muted`, `--accent`) via `getComputedStyle`. One theme-build helper, not per-chart. - **V14 (action queue semantics):** "unload-during-bench -> takeover confirmation" needs explicit HTTP semantics. Added: the action endpoint returns 409 with `{error: 'bench in progress', requiresConfirmation: true}`; the client shows a confirmation dialog and re-submits with `?confirm=true`. - **V15 (capture total budget default):** The plan mentions "total budget prune" but gives no default. Added: 50MB default, configurable via `CAPTURE_BUDGET_MB` env var. - **V16 (openevals reference verified):** `/opt/forks/openevals` exists and contains `js/`, `python/`, `sandbox/` directories. The sandbox pattern (Docker hardened containers) is confirmed available. - **V17 (P7 gateway error shape):** `InferenceRoute` extension needs explicit error representation. Added: `'gateway' | 'gateway_error'` variants; `gateway_error` carries `{reason: 'offline' | 'unhealthy'}`. The 5 callers must handle both. - **V18 (SSE connector event shape delta):** The opencode-sse.ts pattern is for the opencode SDK's `Event` type; BooControl consumes raw llama-swap SSE (`/api/events`) with a different envelope (`modelStatus | logData | metrics | inflight`). The reconnect/backoff/circuit-breaker pattern ports directly; the event parsing is new code, not a port. Noted in P1.4. **Junior developer new findings (folded):** - **JD17 (schema index timing):** Indexes should be created in the same P1 task as the tables they index, not as a separate phase. Consolidated into P1.3. - **JD18 (action queue depth cap message):** When the queue is full (depth=4), the error message should include the current queue contents so the user knows what's pending. Added to P2.1 spec. - **JD19 (acquireHostAccess signature):** The function signature must be `acquireHostAccess(providerId: string, purpose: string): Promise<{ok: boolean, reason?: string}>` -- explicit in P1.14, called by P3.1. - **JD20 (snapshot rebuild on restart):** When the control service restarts, the in-memory fleet state is lost. The WS endpoint must rebuild from DB (control_model_events for latest state, control_requests for last-seen activity) before serving snapshots. Added to P1.6. - **JD21 (activity feed sort order):** The live activity feed must sort by `ts DESC` (newest first) with react-virtuoso's `followOutput="bottom"` for live insertion. Added to P1.12. - **JD22 (ECharts bundle impact):** Per-chart `echarts/core` imports add ~15-25KB per chart type (gauge, line, scatter). With 3-4 charts in P1, the incremental bundle is ~60-100KB. Acceptable given the batteries-included tradeoff documented in design S9. Noted in P1.13. - **JD23 (P7 provider.ts callers -- compile check):** All 5 callers must compile unchanged for the new `InferenceRoute` variant. The `upstreamModel` function's implicit else branch (line 192) currently always reaches `getSwapProvider` -- the gateway variant must be handled before it. Added explicit check. - **JD24 (deploy docs in P1.1):** The systemd unit file and deploy docs must include the `BOOCONTROL_URL` env var (for apps/server's proxy) and `DATABASE_URL` (shared boochat DB). Added to P1.1 spec. --- ## P0 -- prerequisite gate (separate batch: multi-llama-swap provider registry) **Gate:** P0 must be committed and reviewed before P1 starts. BooControl keys every host-scoped row on `LlamaProvider.id` from `packages/contracts/src/llama-providers.ts`. The committed contract is the foundation. - [ ] Finish remaining tasks in `openspec/changes/multi-llama-swap-providers-model-favorites/tasks.md`: favorites hide-not-delete UI/route tests; smoke test sam-desktop + embedding (+ DeepSeek config). - [ ] Sam reviews and commits the batch (currently working-tree only). --- ## P1 -- read-only cockpit **Demo:** Watch both hosts live (models, swaps, VRAM/temp, request feed) while chatting. ### Scaffold + DB - [x] **P1.1** Scaffold `apps/control`: new directory, Fastify + `@fastify/websocket` + `postgres` + `zod` dependencies, TS NodeNext, `.env.example`/`.env.host`, port 9503, `/api/health` endpoint, systemd unit `boocontrol.service`. Deploy docs in root CLAUDE.md (include `BOOCONTROL_URL` for apps/server proxy, `DATABASE_URL` for shared boochat DB). Pattern: `apps/coder/src/index.ts` for Fastify bootstrap, `apps/coder/src/db.ts` for `getSql`/`applySchema`/`pingDb`/`closeDb`. - [x] **P1.2** `apps/control/src/db.ts` with `applySchema` + `waitForTable` helper. `waitForTable(sql, tableName, timeoutMs)` polls `information_schema.tables WHERE table_name = $1` with exponential backoff (100ms base, 2s cap); throws on timeout so systemd `Restart=on-failure` retries. Call `waitForTable(sql, 'sessions', 30_000)` before `applySchema()`. Pattern: `apps/coder/src/db.ts` for the `getSql`/`applySchema`/`pingDb`/`closeDb` shape; `waitForTable` is new (no existing implementation). - [x] **P1.3** `apps/control/src/schema.sql` -- P1 tables only (do NOT create bench_*/eval_*/route_policies/control_reports tables yet): - `control_hosts`: `provider_id TEXT PK` (FK-by-convention to `LlamaProvider.id`), `ssh_host TEXT`, `ssh_user TEXT`, `ssh_key_path TEXT`, `config_path TEXT`, `restart_cmd TEXT`, `os TEXT`, `gpu_label TEXT`, `enabled BOOLEAN DEFAULT true`. Seed: `INSERT INTO control_hosts (provider_id, os, gpu_label) VALUES ('sam-desktop', 'Windows', 'RTX 5090 32GB'), ('embedding', 'Linux', 'P104-100 8GB') ON CONFLICT DO NOTHING`. SSH/config columns NULL until P9. - `control_requests`: `id BIGSERIAL PK`, `provider_id TEXT`, `swap_entry_id INT`, `ts TIMESTAMPTZ`, `model TEXT`, `req_path TEXT`, `status_code INT`, `duration_ms INT`, `cache_tokens INT`, `input_tokens INT`, `output_tokens INT`, `prompt_tps REAL`, `gen_tps REAL`, `has_capture BOOLEAN`, `capture JSONB`. `UNIQUE (provider_id, swap_entry_id, ts)`. NO `source` column (P4 adds it). Index: `CREATE INDEX IF NOT EXISTS idx_control_requests_provider_ts ON control_requests (provider_id, ts DESC)`. - `control_perf_samples`: `provider_id TEXT`, `ts TIMESTAMPTZ`, `gpu JSONB`, `sys JSONB`. `UNIQUE (provider_id, ts)`. Index: `CREATE INDEX IF NOT EXISTS idx_control_perf_samples_provider_ts ON control_perf_samples (provider_id, ts DESC)`. - `control_perf_rollup_5m`: `provider_id TEXT`, `bucket TIMESTAMPTZ`, `gpu_agg JSONB`, `sys_agg JSONB`. `UNIQUE (provider_id, bucket)`. - `control_model_events`: `provider_id TEXT`, `model TEXT`, `state TEXT`, `ts TIMESTAMPTZ`, `detail JSONB`. `UNIQUE (provider_id, model, state, ts)`. Index: `CREATE INDEX IF NOT EXISTS idx_control_model_events_provider_ts ON control_model_events (provider_id, ts DESC)`. - All use `clock_timestamp()` for created_at; JSONB via `sql.json(value as never)`. ### Connectors + ingestion - [x] **P1.4** Fleet connector per enabled host: SSE client consuming `GET /api/events` with exponential backoff (base 1s, max 30s) + **jitter** (random 0-50% of computed delay) + circuit-breaker (6 consecutive failures -> give-up). Port the `opencode-sse.ts` `reconnectDecision` function (add jitter to the BooControl copy). Note: the reconnect/backoff/circuit-breaker pattern ports directly from `opencode-sse.ts`; the event parsing is new code because llama-swap's SSE envelope (`modelStatus | logData | metrics | inflight`) differs from the opencode SDK's `Event` type. Explicit `connected | reconnecting | down` liveness state machine + `last_seen_at` in-memory. On reconnect, reconcile via `GET /api/metrics` (full ring) with `INSERT ... ON CONFLICT DO NOTHING` (never check-then-act). Gap detection: if oldest reconcile entry is newer than newest persisted entry for that provider, insert `gap_suspected` model event with `model='*'` and timestamps in `detail` JSONB. - [x] **P1.5** Perf poller: `GET /api/performance?after=` every 5s per host. Watermark recovered from `MAX(ts)` per provider in `control_perf_samples` on restart. NULL watermark (fresh install) -> omit `after` param, ingest returned window (UNIQUE constraint makes over-fetch harmless). - [x] **P1.6** In-memory fleet state with per-host monotonic `seq` counter, incremented on every mutation. WS endpoint `/api/ws/control`: snapshot-on-join carrying current seqs + seq-stamped deltas. Client rule: buffer pre-snapshot deltas, replay after snapshot applying only `seq > snapshot_seq`. On service restart, rebuild fleet state from DB before serving snapshots: query `control_model_events` for latest model state per provider, `control_requests` for last activity, `control_perf_samples` for latest perf sample. ### Retention (same P1 slice) - [x] **P1.7** Retention job: daily in-process timer. Rollup as idempotent upsert (`INSERT INTO control_perf_rollup_5m ... ON CONFLICT (provider_id, bucket) DO UPDATE` recomputed from raw). Delete raw only after covering buckets committed, in chunked transactions (one per provider per 1-hour window, never one mega-transaction). Activity prune > 90d. Capture size: 256KB per-row cap enforced in application code before INSERT (not a DB constraint); total budget prune with 50MB default, configurable via `CAPTURE_BUDGET_MB` env var. All windows configurable via `.env.host`. ### Contracts (build FIRST) - [x] **P1.8** Add 5 frame types to `packages/contracts/src/ws-frames.ts`: - `control_fleet` -- full snapshot on join + seq-stamped state deltas (hosts, liveness, models, states, ttl, inflight) - `control_activity` -- new request rows (live feed) - `control_perf` -- appended samples per host - `control_log` -- `{provider_id, source: proxy|upstream, line}` batches - `control_job` -- bench/eval run progress events Add to both `WsFrameSchema` discriminated union AND `KNOWN_FRAME_TYPES` array. Rebuild package (`pnpm -C packages/contracts build`). **Note:** Control frames use a 2-location sync pattern (contracts + web strict union only). They skip the server's `InferenceFrame` union because they never flow through the server's broker. The web strict union is the wire-format gate; missing it silently drops frames at JSON parse. **Drift test note:** The existing `ws-frames.test.ts` checks `KNOWN_FRAME_TYPES` vs `WsFrameSchema` alignment. There is no automated check for web strict union sync -- that alignment is manual and verified by the implementer. Add a comment in the test noting this limitation. ### Server proxy - [x] **P1.9** `apps/server/src/routes/control-proxy.ts`: `registerControlProxy(app, boocontrolOrigin)` following the same structure as `registerCoderProxy` but with a static WS path `/api/control/ws` (not parameterized per-session). HTTP all-catch at `/api/control/*`. Add keep-in-sync comment in both `coder-proxy.ts` and `control-proxy.ts`. `BOOCONTROL_URL` env var. Register in `apps/server/src/index.ts`. ### Web UI - [x] **P1.10** Web: `/control` route in `App.tsx`, nav entry in `ProjectSidebar.tsx` (under Memory cluster, `Radio` icon from lucide), `pages/Control.tsx` shell with Fleet + Activity tabs. `useControlStream` as a second app-level WS singleton (own React context + connection guard, targets proxied `/api/control/ws`). Client discards deltas with `seq <= snapshot_seq`. Activity feed note: shows `source = NULL` in P1; per-consumer breakdown lands in P4. - [x] **P1.11** Fleet tab: host cards as instrument clusters. State chips with color/glow (amber pulse `starting`, green steady `ready`, red `error`, grey `down` with last-seen relative time). VRAM/temp/power readouts. TTL countdown rings. Dark mission-control aesthetic. Orbitron for numerals, Inter for prose. - [x] **P1.12** Activity feed: react-virtuoso tail-follow viewer (already a dep) with `followOutput="bottom"` for live insertion, `ts DESC` sort order. Filter chips for model and host. Pause-on-scroll toggle. - [x] **P1.13** Charts: integrate ECharts (per-chart module imports via `echarts/core` + needed renderers). Dark theme: build a theme object from CSS custom properties (`--background`, `--foreground`, `--muted`, `--accent`) via `getComputedStyle(document.documentElement)` and pass to `echarts.init(dom, theme)`. One `buildEChartsTheme()` helper, not per-chart. Incremental bundle impact ~60-100KB for 3-4 chart types (gauge, line, scatter) -- acceptable per design S9 tradeoff. ### Host-access seam - [x] **P1.14** Create `apps/control/src/services/host-access.ts` with `acquireHostAccess(providerId: string, purpose: string): Promise<{ok: boolean, reason?: string}>`. V1 body: no-op returning `{ok: true}`. This is the P8 seam -- P8 swaps the body for a DB lease without touching the bench engine. Export for P3.1 to import. ### Tests - [x] **P1.15** Tests: connector dedup/reconcile + gap detection as pure helpers (`turn-guard.ts` pattern); liveness state machine transitions; retention idempotency (re-run same window produces identical rollups); seq logic (buffer, discard stale, apply snapshot). DB tests `describe.runIf(process.env.DATABASE_URL)`. --- ## P2 -- hands on the controls **Demo:** Unload from UI, watch the swap stream, open a capture. - [x] **P2.1** Per-host FIFO action queue in the control service. Actions: warm (1-token `POST /v1/chat/completions` with bare wire ID), unload one/all (`POST /api/models/unload/:model` or `/api/models/unload`). Serialize through single FIFO queue per `provider_id`. Unload-during-bench -> return 409 with `{error: 'bench in progress', requiresConfirmation: true}`; client shows confirmation dialog and re-submits with `?confirm=true`. Reject submissions while host is `down` ("host offline" toast). Cap depth (4) with reject-on-full; error response includes current queue contents so the user knows what's pending. Re-check liveness on dequeue + skip stale actions (design S5). Pattern: `arena-runner.ts` `advanceChain` promise-chain + read-fresh-state-or-skip. - [x] **P2.2** Optimistic UI off `control_fleet` frames only. No local emits after API calls (event-dedup discipline per CLAUDE.md). The API call triggers a server-side mutation that publishes a `control_fleet` delta; the frontend updates from the WS frame, not from a local state change. - [x] **P2.3** Logs tab: relay `/api/events` logData -> `control_log` frame. In-memory 2k-line tail buffer per host for late joiners. React-virtuoso tail-follow viewer with per-source filter (proxy/upstream/model) + pause-on-scroll. - [x] **P2.4** Inspector: activity table (virtuoso) -> capture drawer. `GET /api/captures/:id` via control service, decode base64, persist trimmed copy (256KB cap enforced in application code before INSERT), render with shiki-highlighted JSON. "Open in Playground" stub (links to P3). - [x] **P2.5** Op task (manual, documented in design): enable `captureBuffer` + review `metricsMaxInMemory` on both hosts' llama-swap configs. --- ## P3 -- playground + speed bench (manual, safe-by-construction) **Demo:** TTFT-vs-concurrency curves for two quants, run by hand without disturbing a live chat. - [x] **P3.1** Playground tab: model select (grouped picker from provider registry), param controls, streaming chat, side-by-side A/B compare (two `ModelBubble` components in parallel, same prompt, different models). "Battle in Arena" handoff link (opens Arena dialog with pre-filled prompt + contestants via the existing `ArenaLauncherDialog` pattern). - [x] **P3.2** Bench engine: suite model (`data/` YAML, grid of prompt_len x gen_len x concurrency x repetitions). Runner with TTFT capture (client-side first delta) + llama.cpp `timings` parse (`prompt_per_second`, `predicted_per_second`, `cache_n` from final stream chunk). Bounded fan-out (`Promise.allSettled`, suite-declared concurrency only). Results as aggregates + raw samples to `bench_suites`/`bench_runs`/`bench_samples` tables. Add schema for these 3 tables in this task. - [x] **P3.3** V1 safety: user-initiated runs only; takeover confirmation when target host shows recent traffic; embedding-host-first defaults; `concurrent_foreign_requests` recorded per run from activity stream to flag polluted results. Unattended scheduling deliberately absent (P8). - [x] **P3.4** Wire `acquireHostAccess(providerId, purpose)` from P1.14 into the bench runner. The runner MUST gate every run through this function -- never inline the inflight check. P8 swaps its body. - [x] **P3.5** Bench UI: run launcher, live progress via `control_job` frames, history charts (TTFT vs concurrency, tok/s over time via ECharts), baseline + regression flags (delta beyond -10% gen tok/s threshold). --- ## P4 -- per-consumer attribution (X-Boo-Source, end-to-end) **Demo:** Activity feed filtered to "arena" shows only Arena traffic; nothing reads NULL. - [x] **P4.1** `apps/server`: per-turn fetch-wrapper injection on AI-SDK streaming path. Thread `source` through the call site. `getSwapProvider` cache keyed by `baseURL+source` (label set: `boochat|boocoder|arena|control-bench|control-eval`). `upstreamModel` signature change must be additive (optional `source` param -- 1 production importer: `stream-phase-adapter.ts:309`; validated by plan-validation F1). Extend headers in `compaction.ts` and `task-model.ts` direct fetches. - [x] **P4.2** `apps/coder`: forward inbound `x-boo-source` header in `local-gateway.ts` (currently omitted from forwarded headers). Set it at Arena + dispatch fetch sites. - [x] **P4.3** Migration: `ALTER TABLE control_requests ADD COLUMN source TEXT`. Surface as Activity filter + per-source token aggregates in the UI. - [x] **P4.4** Tests: header present on all three paths (server streaming, gateway-forwarded opencode, arena direct); rows attribute correctly in `control_requests`. --- ## P5 -- quality evals + sandbox **Demo:** Fleet leaderboard with speed x quality scatter. - [x] **P5.1** Suite format (`data/` YAML: chat rubric tasks, code tasks with tests); CRUD + versioning. Four suites in priority order: (1) agent coding tasks, (2) chat assistant quality, (3) long-context retrieval, (4) utility calls (titles/summaries). Add schema for `eval_suites`/`eval_runs`/`eval_results` tables in this task. - [x] **P5.2** Judge runner: temperature 0, pinned judge model+version, rubric scoring, rationale capture. Pairwise tie-breaks delegate to Arena (links/launches battles, not re-implements). Judge = strongest local model by default. - [x] **P5.3** Code sandbox runner: ephemeral Docker containers (`--network none`, non-root, caps dropped, tmpfs workdir, `--rm`, kill-on-timeout, `boocontrol-eval` label for orphan findability). Orphan prune at engine start (`docker ps --filter label=boocontrol-eval`). Bounded concurrency (default 4) + `Promise.allSettled` + per-task `finally` cleanup. Pass@1 scoring. Patterns from `/opt/forks/openevals` (verified: `sandbox/` directory exists with Docker hardened container patterns). Harden: `--security-opt=no-new-privileges`, `--cap-drop=ALL`. - [x] **P5.4** Leaderboard UI + speed x quality scatter per (provider_id, model, quant) using ECharts (reuse the `buildEChartsTheme()` helper from P1.13). --- ## P6 -- advisory routing + reports **Demo:** Picker badges "best code model right now"; Monday-morning fleet report. - [ ] **P6.1** Advisory scores API (eval results + live latency + host health) -> model-picker badges. Expose via `GET /api/control/routing/scores`. - [ ] **P6.2** Reports: scheduled digest job (usage, trends, swap counts, leaderboard deltas, anomalies vs baselines) -> `control_reports`. Same in-process timer pattern as retention (P1), `schedule_meta = {interval, enabled, last_run_at}` with catch-up on boot. Reports tab + markdown export. Add `control_reports` schema in this task. --- ## P7 -- live `auto:*` gateway (committed) **Demo:** An `auto:code` session in BooChat routes to the current best code model with failover. - [ ] **P7.1** Control service: OpenAI-compatible virtual models (`auto`, `auto:code`, `auto:fast`, `auto:cheap`) backed by `route_policies` table. Policy: rule match -> candidate ordering -> health/ctx-fit filter -> dispatch with failover. Gateway forwards `X-Boo-Source` to target host. Add `route_policies` schema in this task. - [ ] **P7.2** Registry entry: `kind: "boocontrol-gateway"` with `baseUrl: "http://100.114.205.53:9503"`. BooChat adopts with zero inference-path changes. - [ ] **P7.3** `apps/server/src/services/inference/provider.ts` -- the code change required for orphaned-session handling: - Extend `InferenceRoute` from `'swap' | 'deepseek'` to `'swap' | 'deepseek' | 'gateway' | 'gateway_error'` - `gateway_error` carries `{reason: 'offline' | 'unhealthy'}` for structured error reporting - Override the unknown-provider fallback (current behavior at line 147: composite id with unknown provider silently routes to `LLAMA_SWAP_URL`). For gateway-kind ids that are missing/disabled, resolve to `route: 'gateway_error'` with `reason: 'offline'`, never the swap fallback. - **Audit all 5 callers** with explicit per-caller changes: 1. `getModelContext` (model-context.ts:85) -- must handle gateway `baseUrl` (query `/upstream//props` against the control service, not the target host) 2. `invalidateModelContext` (model-context.ts:160) -- must handle gateway variant (no-op; gateway doesn't cache model context) 3. `resolveRoute` (provider.ts:175) -- must return `{route: 'gateway'}` for gateway-kind ids 4. `upstreamModel` (provider.ts:184) -- **must add gateway branch before the swap fallback** at line 192; the implicit else currently always reaches `getSwapProvider` 5. `resolveModelEndpoint` (provider.ts:201) -- must handle gateway headers (forward `X-Boo-Source`) - Propagation note (plan-validation F2): these 5 direct call sites fan out to ~10 downstream production call sites (stream-phase-adapter, compaction, task-model, system-prompt, error-handler, tool-phase, chats, stream-phase); none need signature changes (gateway handling is internal to each function) but all need test coverage. - Audit clarification (plan-validation F7): `system-prompt.ts:195` calls `resolveRoute(agent)` with no config/modelId, so it always returns `{route: 'swap'}` and needs NO gateway handling. - All must compile unchanged for the new variant (additive, not breaking) - The session keeps its id; the picker flags affected sessions. - [ ] **P7.4** Policy editor UI (route_policies CRUD) + per-policy dispatch log in the Reports tab. --- ## P8 -- fleet coordination lease (cross-service batch, own design pass) **Outline only.** The proper fix for the four-writer TOCTOU. P3 left a seam (`acquireHostAccess` in `host-access.ts`) that P8 swaps. - [ ] **P8.1** Design + ship `control_host_leases` (holder, purpose, expires_at, heartbeat) and the honor-protocol in all four writers (BooChat, BooCoder, Arena, BooControl). Scope: separate proposal under `openspec/changes/`. The BooControl bench scheduler consumes it through the `acquireHostAccess` seam left in P3. Unattended bench scheduling + reproducible concurrency sweeps unlock here. --- ## P9 -- remote hands + optional **Outline only.** - [ ] **P9.1** SSH config editor: SFTP read -> schema-validated edit (config-schema.json from the fork) -> diff preview -> timestamped backup -> SFTP write -> restart (nssm/systemctl) -> health-wait. Key in `secrets/` (gitignored). Tests for the failure paths. - [ ] **P9.2** `llama-bench`-over-SSH ingestion for device-level numbers. - [ ] **P9.3** `boocontrol.indifferentketchup.com` vhost (Caddy/Authelia rewrite -> `/control`). - [ ] **P9.4** Frontier providers as routing targets; slim `control` pane kind for in-workspace mini-cockpit. --- ## Deferred (YAGNI) Items removed from active scope with reopen triggers: - **Prometheus/Grafana integration** -- BooControl persists its own samples; `/metrics` endpoints stay available. Reopen when an external monitoring stack is actually deployed. - **Multi-user/auth** -- Authelia at the proxy layer. Reopen when multi-user is needed. - **Non-llama-swap engine connectors** (vLLM, Ollama, infinity-emb) -- connector interface should not preclude them. Reopen when a second engine kind is actually added. - **Cross-process GPU arbitration** -- four uncoordinated writers is accepted in v1. Reopen when the P8 lease proves insufficient. - **Log persistence to file** -- logs are relay-only with in-memory tail. Reopen when log volume warrants durable storage. - **llama-bench over SSH** (P9.2) -- device-level numbers. Reopen when SSH plumbing from P9.1 lands. - **`llama-swap` peers federation** -- flat list, coupled uptime, silent ID collisions. Reopen if the provider registry proves insufficient for host coordination. --- ## Next step Validate independently with boo-validating-changes boocontrol, then implement with boo-implementing-changes boocontrol. P0 gate first (commit the multi-provider batch), then P1.