# BooControl — design **Status:** ACCEPTED — decisions resolved 2026-06-11; architecture-analysis findings folded in; verification-pass fixes applied 2026-06-12 (chart lib decided: ECharts, §9). No open design items. ## 1. Topology ``` ┌─ Tailscale mesh ──────────────────────────────────────────────────────────┐ │ │ │ sam-desktop 100.101.41.16 (Windows, RTX 5090 32GB) │ │ llama-swap v224 :8401 ─ /api/events SSE, /api/performance(GPU), │ │ D:\llama-server (CUDA) /api/metrics, /api/captures, /running, │ │ /logs/stream, POST /api/models/unload │ │ │ │ embedding 100.90.172.55 (Linux, P104-100 8GB) │ │ llama-swap :8411 ─ same API surface; 39 small models, ttl 1800 │ │ │ │ ubuntu-homelab 100.114.205.53 (no GPU) │ │ boocode container :9500 (apps/server + apps/web) │ │ booterm container :9501 │ │ boocoder host svc :9502 (apps/coder) │ │ boocontrol host svc :9503 (apps/control) ◄── NEW │ │ postgres :5500 (boochat DB) │ └───────────────────────────────────────────────────────────────────────────┘ Browser ──WS/HTTP──► apps/server (/api/control/* proxy, WS relay) └────────► apps/control :9503 ├─ SSE client per provider (events) ├─ pollers (/api/performance?after=, /running) ├─ per-host action queue (warm/unload serialization) ├─ bench + eval engines (manual v1) ├─ ssh2 (P9 only: config edit + restart) └─ Postgres (third schema owner, ordered startup) ``` Key fact that shapes everything: **the llama-swap fork exposes GPU/system telemetry, token metrics, request captures, and log streams over HTTP per instance** (`internal/perf/types.go` GpuStat/SysStat; `internal/server/apigroup.go`). The control service needs no agent on the GPU hosts. SSH is required only for config editing + service restart (P9). Why a host service and not a container: SSH key handling (P9), spawning sandbox containers for code evals (talking to dockerd from inside a container is a privilege escalation we don't need), and parity with the boocoder operational pattern (systemd, `.env.host`, deploy via `pnpm -C packages/contracts build && pnpm -C apps/control build && sudo systemctl restart boocontrol`). **There is no sidecar.** The llama-sidecar (:8402, per-agent flags) has been removed from the system entirely. No control-plane table, connector, or registry field references it. ## 2. Fleet identity = the provider registry (`LlamaProvider.id`) The multi-provider batch introduces the shipped contract (`packages/contracts/src/llama-providers.ts`): ```ts LlamaProviderSchema = { id, label, baseUrl, kind } // ids: "sam-desktop", "embedding" ``` BooControl keys every host-scoped row on **`provider_id` = `LlamaProvider.id`** — the field that actually exists and that `resolveModelProvider` already resolves by. (Earlier drafts said `provider_name` against a `{name, sidecarUrl?}` shape; that shape was never shipped.) Control-plane attributes extend the registry entry rather than inventing a parallel hosts table: ``` control_hosts provider_id TEXT PK -- FK-by-convention to LlamaProvider.id ("sam-desktop", "embedding") ssh_host TEXT, ssh_user TEXT, ssh_key_path TEXT -- nullable: no SSH = no config editing (P9) config_path TEXT -- D:\llama-swap\config.yaml | ~/llama-swap/config.yaml (P9) restart_cmd TEXT -- nssm/systemctl invocation (P9) os TEXT, gpu_label TEXT -- display metadata enabled BOOLEAN DEFAULT true ``` Lesson imported from stackctl's worst bug: its machines table was dropped + re-seeded on every container rebuild, losing user-added hosts. `control_hosts` rows are durable; seeding is `INSERT ... ON CONFLICT DO NOTHING`. ## 3. Schema ownership + startup ordering (third schema owner) `apps/control/src/schema.sql`, applied by `apps/control/src/db.ts:applySchema()` on boot — the coder precedent. Two hardening rules the coder precedent lacks: 1. **Startup ordering guard.** The coder schema holds real FKs into server-owned tables (`REFERENCES sessions(id)`, `chats(id)`); today the server-before-coder ordering is an accident of Docker-vs-host start timing. A third concurrent `applySchema` caller widens that race, so `apps/control` makes the ordering explicit: ```ts // apps/control/src/index.ts — before applySchema() await waitForTable(sql, 'sessions', 30_000); // poll information_schema; THROWS on timeout await applySchema(sql); ``` "Fail loud" means **throw → process exits nonzero → systemd (`Restart=on-failure`) retries**. The guard is enforcing, not advisory: `applySchema` is never reached if the server schema is absent, so a partial-DDL state cannot occur. (Control tables themselves currently take no FKs into server tables, but the guard costs one query and removes the timing dependency for any future FK.) 2. **Dedup is enforced by the database, not application checks.** Every ingest table whose dedup matters carries a UNIQUE constraint and is written with `INSERT ... ON CONFLICT DO NOTHING` — check-then-act application dedup is racy under concurrent SSE + reconcile writers (analysis C2/C7). ``` control_requests -- persisted ActivityLogEntry stream (the thing llama-swap forgets on restart) id BIGSERIAL PK, provider_id TEXT, swap_entry_id INT, -- llama-swap's ring id ts TIMESTAMPTZ, model TEXT, req_path TEXT, status_code INT, duration_ms INT, cache_tokens INT, input_tokens INT, output_tokens INT, prompt_tps REAL, gen_tps REAL, has_capture BOOLEAN, capture JSONB, -- nullable; fetched-on-demand copy (req/resp, capped) UNIQUE (provider_id, swap_entry_id, ts) -- survives ring-id reset; INSERT ... ON CONFLICT DO NOTHING -- NOTE: no `source` column in P1. The X-Boo-Source attribution column is added by the -- P4 migration, when injection actually works end-to-end (see §7). No NULL-forever rows. control_perf_samples -- raw SysStat+GpuStat, short retention (48h default) provider_id TEXT, ts TIMESTAMPTZ, gpu JSONB, sys JSONB, UNIQUE (provider_id, ts) -- restart-safe: re-polled samples no-op control_perf_rollup_5m -- avg/max per 5min bucket, long retention (90d) provider_id TEXT, bucket TIMESTAMPTZ, gpu_agg JSONB, sys_agg JSONB, UNIQUE (provider_id, bucket) -- rollup is an idempotent upsert (§6) control_model_events -- state transitions (stopped→starting→ready→stopping), swap durations provider_id, model, state, ts, detail JSONB, UNIQUE (provider_id, model, state, ts) -- reconcile can re-deliver model status; same ON CONFLICT DO NOTHING discipline bench_suites / bench_runs / bench_samples -- suite: {prompt_tokens[], gen_tokens[], concurrency[], repetitions} -- sample: per-request timings (ttft_ms, prompt_tps, gen_tps, total_ms) + run aggregates eval_suites / eval_runs / eval_results -- suite: kind chat|code, tasks JSONB (prompt, reference, checker), judge_model -- result: per-task score, judge rationale / execution log, sandbox exit info route_policies -- P7: name, match rules JSONB, target ordering, fallback control_reports -- generated digests (markdown + JSONB stats) + schedule meta: {interval: 'daily'|'weekly', enabled, last_run_at TIMESTAMPTZ} -- driven by the SAME in-process timer pattern as the retention job (P6): hourly tick -- checks last_run_at vs interval, runs if due (catch-up on boot included). No cron dep, -- no new scheduler abstraction (S7 stays YAGNI-deferred; reopen trigger unchanged). ``` `clock_timestamp()` inside transactions per repo convention; JSONB via `sql.json(...)`. ## 4. Ingestion semantics - **SSE consumer** per enabled host: `GET /api/events` → envelopes `modelStatus | logData | metrics | inflight`. Reconnect with backoff + jitter (reconnect/circuit-breaker pattern: `apps/coder/src/services/backends/opencode-sse.ts` — NOTE the source has exponential backoff + circuit breaker but NO jitter; add jitter explicitly here, random 0-50% of the computed delay, per plan finding V1/F3). On reconnect, reconcile via `GET /api/metrics` (full ring). Reconcile and live SSE may both insert the same entry concurrently — that is fine **because dedup is the DB UNIQUE constraint** (`ON CONFLICT DO NOTHING`), not a check-then-act. The dedup key `(provider_id, swap_entry_id, ts)` includes the timestamp because llama-swap's ring ids restart from 0 on its restart. - **Known bound, accepted:** the ring holds 1000 entries. An outage longer than 1000 requests loses the overwritten tail permanently — log a `gap_suspected` model event so the loss is visible rather than silent. **Detection rule (no-overlap heuristic):** if the *oldest* entry in the reconcile fetch is newer than the newest already-persisted entry for that provider, the ring wrapped past our tail; emit `gap_suspected` with both timestamps in `detail`. Overlap present = no gap, no event. - **Second accepted residual:** a genuinely-new post-restart entry whose `(swap_entry_id, ts)` exactly collides with a pre-restart row (same ring slot, same timestamp to llama-swap's `ts` precision) is silently dropped by the UNIQUE constraint. Window = one entry per restart at sub-precision coincidence; accepted, not solvable client-side without a content hash in the key. - **Perf poller**: `GET /api/performance?after=` every 5s (llama-swap's own minimum collection interval). The watermark is recovered on restart from `MAX(ts)` per provider in `control_perf_samples` (not in-memory only); duplicate polls no-op on the UNIQUE constraint. **Cold start (`MAX(ts)` = NULL, fresh install):** omit `after` entirely and ingest whatever window the host returns — the UNIQUE constraint makes over-fetch harmless, and the next poll has a watermark. - **Host liveness is explicit state, not absence of data.** Each connector runs a small state machine `connected | reconnecting | down` (down after N failed reconnects); transitions publish a `control_fleet` delta and stamp `control_hosts`-adjacent in-memory state with `last_seen_at`. A late-joining browser therefore sees `down + last_seen_at`, never a stale "ready" snapshot (analysis B3). - **Snapshot/delta consistency.** The fleet state keeps a per-host monotonic `seq`, incremented on every mutation. The join snapshot carries the current `seq`s; every delta carries its `seq`. Client rule: **buffer (do not apply, do not discard) any delta that arrives before the snapshot**; after applying the snapshot, replay the buffer dropping deltas with `seq <=` the snapshot's per-host seq, and apply the filter to all subsequent deltas. On a single FIFO WS pre-snapshot deltas should not occur, but buffering makes the rule transport-independent. This closes the join race where a delta arrives during snapshot serialization (analysis B4). - **Logs are not persisted** by default (volume + low value at rest); they relay live SSE → WS with an in-memory tail buffer (last ~2k lines per host) for late joiners. Optional "record to file" toggle later. - **Fan-out to browser**: the control service publishes over its own WS (`/api/ws/control`), relayed by apps/server's proxy as `/api/control/ws`. This is a **second app-level WS connection** in the browser — `useControlStream` gets its own singleton guard + context; it does NOT share `useUserEvents`' `/api/ws/user` channel. Frames (added to `packages/contracts/src/ws-frames.ts` **first**, then the server loose union, then the web strict union — and the contracts drift test extended to cover them, so a partial edit fails the suite): - `control_fleet` — full snapshot on join + seq-stamped state deltas (hosts, liveness, models, states, ttl deadlines, inflight) - `control_activity` — new request rows (the live feed) - `control_perf` — appended samples per host - `control_log` — `{provider_id, source: proxy|upstream, line}` batches - `control_job` — bench/eval run progress events ## 5. Actions | Action | Mechanism | |---|---| | Warm/load model | 1-token `POST /v1/chat/completions` with the bare wire ID (stackctl-proven; llama-swap loads on demand — there is no load endpoint) | | Unload one/all | `POST /api/models/unload/:model` / `/api/models/unload` | | Inspect request | `GET /api/captures/:id` on the host, decode base64, persist trimmed copy, render | | Bench/eval runs | engines below (manual v1) | | Edit config / restart llama-swap | P9 (SFTP + schema validation + diff + timestamped backup + restart + health-wait) | **Per-host action queue.** All host-mutating actions (warm, unload, bench warm-up) from BooControl serialize through a single FIFO queue per `provider_id` inside the control service — double-clicks, warm-during-warm, and unload-during-bench from *this* service cannot interleave (analysis C3). An unload request while a bench run holds the host is rejected with a "bench in progress — takeover?" confirmation. Queue discipline (verification C-N1): **submissions are rejected immediately while the host's liveness state is `down`** ("host offline" toast); queue depth is capped (4) with reject-on-full; each action **re-checks liveness on dequeue and skips itself if stale** — a recovered host never replays a backlog of stale warms. (Pattern precedent: `arena-runner.ts` `advanceChain` promise-chain, plus its read-fresh-state-or-skip discipline.) This serializes BooControl's own hands only; BooChat/BooCoder/Arena traffic is uncoordinated until P8. All mutating actions publish `control_job`/`control_fleet` frames; UI handlers stay idempotent (event-dedup discipline per CLAUDE.md — no local emit after API call). **Manual op checklist (P2.5):** Before the capture inspector works end-to-end, enable `captureBuffer` and review `metricsMaxInMemory` on both hosts' llama-swap configs. These are per-host settings in `config.yaml` and must be set before captures will be available: - [ ] sam-desktop: set `captureBuffer: true` and verify `metricsMaxInMemory` (default 1000, sufficient for most workloads) - [ ] embedding: set `captureBuffer: true` and verify `metricsMaxInMemory` - [ ] Restart llama-swap on both hosts after config changes ## 6. Retention (ships in the same P1 slice as ingestion) Daily job, crash-safe by construction: 1. **Rollup is an idempotent upsert**: `INSERT INTO control_perf_rollup_5m ... ON CONFLICT (provider_id, bucket) DO UPDATE` recomputed from raw — a re-run after a crash recomputes the same buckets, never double-counts. 2. **Delete raw only after the covering buckets are committed**, in **chunked transactions: one transaction per provider per 1-hour window** (≤720 rows each), never one 48h mega-transaction — bounds lock hold time so the live 5s poller's inserts into the same table never queue behind a multi-second aggregate+delete (verification C-N2). A crash between chunks leaves whole-hour windows either fully migrated or fully raw; the next run recomputes idempotently. 3. Activity > 90d pruned; captures capped per-row (256KB) and pruned by total budget. All windows configurable via `.env.host`. Retention is a **P1 task in the same slice as ingestion**, not a fast-follow — the bloat window between "ingestion starts" and "retention exists" degrades the shared DB that serves all of BooChat (analysis R3). ## 7. Attribution (X-Boo-Source) — own phase (P4), two blockers solved together The naive plan ("inject a header, small touch") is blocked on both inference paths: - **apps/server (BooChat streaming)**: `getSwapProvider()` caches `createOpenAICompatible` instances by `baseURL` in `swapCache`; headers are provider-level, baked at construction. Fix: a per-turn **fetch wrapper** — thread the source label through the call site and pass a wrapping `fetch` that injects `X-Boo-Source` (cache keyed by `baseURL+source` since the label set is tiny: `boochat|boocoder|arena|control-bench|control-eval`). **Interface constraint (verification S-N2):** `getSwapProvider` is private (fan-in 1), but the label must travel through the exported `upstreamModel`, whose file has a 28-file/13-route blast radius — the change MUST be additive (`upstreamModel(config, modelId, agent?, source?)` or an options object with optional `source`), never a breaking signature change; all existing call sites compile unchanged. The direct-fetch paths (`compaction.ts`, `task-model.ts`) just extend their existing headers object. - **apps/coder (opencode local gateway)**: `local-gateway.ts` builds a fresh headers object and silently strips inbound `X-Boo-Source`. Fix: forward it explicitly when present. Arena/dispatch direct paths set it at their own fetch sites. P4 lands: both fixes + the `control_requests.source` column migration + the `source` filter in the Activity UI. llama-swap's header capture (`captureBuffer`) must be enabled on the hosts first (P2 op task). Acceptance: a BooChat turn, a BooCoder dispatch, and an Arena battle each show their own label in the Activity feed; nothing shows NULL except genuinely external traffic. #### Implementation notes **P6.2 schedule meta lives in its own table, not on `control_reports`.** §3 sketched `control_reports + schedule meta: {interval, enabled, last_run_at}`. In implementation the scheduler state was split into a dedicated single-row `control_schedule_meta` table (keyed by schedule `name`, seeded `report-digest`) so generated `control_reports` rows stay immutable snapshots and the boot catch-up reads/writes one well-known row instead of scanning report history for the latest `last_run_at`. The retention-style hourly tick (`runReportSchedulerTick`) and the `{interval, enabled, last_run_at}` contract are unchanged. **P7 gateway identity.** The gateway registers as provider id `auto` (kind `boocontrol-gateway`); its virtual models are `auto`, `auto:code`, `auto:fast`, `auto:cheap`, so BooChat composite ids are `auto/auto:code` etc. and the wire model sent to the gateway is the bare virtual token. `getModelContext` reads `n_ctx` from the gateway's own `/upstream//props`, which proxies the first healthy candidate's props. The gateway is reached server-to-server via the registry baseUrl (not the `/api/control` proxy, which buffers responses and would break streaming). **P7 orphan detection.** An orphaned auto:* session is detected two ways: by registry `kind === 'boocontrol-gateway'` when the gateway is present (→ `gateway`), and by the virtual-model token shape (`auto` / `auto:*`) when the provider is absent (→ `gateway_error`, reason `offline`). The unknown-composite-provider swap fallback is overridden only for that token shape; all other unknown composites keep their existing best-effort swap behavior. **P9.1 uses shelled `ssh`, not an ssh2/SFTP library.** §5 and the P9 task say "SFTP read ... SFTP write". Implementation shells out to the system `ssh` (`cat ` to read, `cp` for the timestamped backup, `cat > ` over stdin to write, the configured `restart_cmd` to restart) with an explicit `-i -o IdentitiesOnly=yes -o BatchMode=yes`. This matches the established booterm SSH-via-shell precedent and the Gitea deploy-key lesson (never offer the agent's default key), and avoids adding an `ssh2` native dependency. The exec is injected (`SshExec`) so every failure path (unreadable host, backup fail, write fail, restart fail, health never recovers) is unit-tested without a live host. The fork `config-schema.json` is bundled at `apps/control/data/config-schema.json` and validated with ajv (added as a control dependency). Backup always precedes write, so a failed write leaves the timestamped backup intact. Not live-smoked: there is no reachable Windows SSH target in the implementation session (the documented "Windows SSH fiddliness" risk); the failure-path suite is the standing verification. **ActivityLogEntry does not carry request headers.** The llama-swap fork's `ActivityLogEntry` struct (`internal/server/metrics.go`) contains `ID`, `Timestamp`, `Model`, `ReqPath`, `RespContentType`, `RespStatusCode`, `Tokens`, `DurationMs`, `HasCapture` -- no `source` field and no request headers. The `X-Boo-Source` header IS captured in `ReqRespCapture.ReqHeaders` (`internal/server/captures.go`), but captures are stored separately in a zstd-compressed cache and fetched on-demand via `GET /api/captures/:id`, not in the metrics ring. Therefore the `control_requests.source` column is NULL for ring-ingested data. The column exists for: (1) future llama-swap versions that may add source to ActivityLogEntry, (2) manual backfill from captures, (3) non-ring sources (bench/eval direct calls that set source explicitly). The metrics ingest mapper writes NULL for source, matching what the ring provides. ## 8. Benchmark, eval, routing ### Speed bench (P3 — manual, safe-by-construction) - HTTP-level, through llama-swap (measures what BooChat actually experiences) with llama.cpp `timings` (`prompt_per_second`, `predicted_per_second`, `cache_n`) parsed from the final stream chunk; TTFT measured client-side at first delta. - Suite = grid of (prompt_len × gen_len × concurrency) × N repetitions; warmup excluded; results as aggregates + raw samples. Runner fan-out is **bounded** (suite-declared concurrency only, `Promise.allSettled`, never unbounded `Promise.all`). - **v1 safety model**: every run is user-initiated with an explicit takeover confirmation when the target host shows recent traffic; embedding-host-first defaults. The `inflight==0` check is a *courtesy gate*, not a guarantee — BooChat/BooCoder/Arena can race it (TOCTOU, four uncoordinated writers). v1 accepts this because a human clicked "run"; **unattended scheduling is explicitly deferred to P8** (fleet lease). Bench results note `concurrent_foreign_requests` observed during the run (from the activity stream) so polluted runs are flagged, not silently trusted. - Baselines + regression: each (provider_id, model) keeps a baseline aggregate; new runs flag deltas beyond threshold (e.g. gen tok/s −10%) → surfaces in Reports and as a fleet-card badge. - Later: `llama-bench` over SSH for device-level (no-server) numbers, JSON output ingested alongside (P9, with the SSH plumbing). ### Quality evals (P5) - **Suite program** (decided 2026-06-12): four suites measuring Sam's real workloads, in priority order — (1) **agent coding tasks** (TS/code-edit tasks like BooCoder dispatches, sandboxed pass@1), (2) **chat assistant quality** (judge rubrics), (3) **long-context retrieval** (needle/doc-QA for file-heavy sessions), (4) **utility calls** (titles/summaries/compaction — directly tunes the `FAST_MODEL` choice). - **Chat**: suite of curated prompts (data/ YAML, editable) scored by LLM-as-judge (rubric single-answer grading, MT-bench style; temperature 0, judge model + version pinned per run). Judge = strongest local model by default. Pairwise comparisons delegate to **Arena** (exists in apps/coder) — BooControl links/launches battles rather than re-implementing. - **Code**: HumanEval+/MBPP+-style tasks, executed in ephemeral sandbox containers on the homelab: `--network none`, non-root, mem/cpu/time caps, tmpfs workdir, `--rm`, kill-on-timeout, and a `boocontrol-eval` label so orphans are findable (`docker ps --filter label=...`) and pruned at engine start. Runner: **bounded concurrency** (default 4), `Promise.allSettled`, per-task `finally` cleanup — a single task failure never abandons in-flight containers (analysis C5; the CLAUDE.md child-supervisor lesson applies). `/opt/forks/openevals` is the reference implementation to borrow patterns from (TS). - Scorecards: per (provider_id, model, quant) leaderboard with speed × quality scatter — "is the Q4 actually worse for my use?" answered with my own suite, on my own hardware. ### Routing (P6 advisory → P7 live gateway, committed) - **P6 — advisory**: routing scores (eval results + live latency + host health) exposed via API; the model picker badges "best code model right now". - **P7 — gateway**: control service exposes OpenAI-compatible virtual models (`auto`, `auto:code`, `auto:fast`, `auto:cheap`) implementing policy: rule match → candidate ordering → health/ctx-fit filter → dispatch with failover. BooChat adopts by adding a registry entry (`{id: "auto", baseUrl: "http://100.114.205.53:9503", kind: "boocontrol-gateway"}`) — zero inference-path changes elsewhere. Frontier providers slot in as policy targets when added to the registry. - **Orphaned-session handling (explicit — REQUIRES a `provider.ts` code change, verification S-N1/B-N3)**: today `resolveModelProvider` silently falls back to `LLAMA_SWAP_URL` for any composite id with an unknown provider ("best-effort fallback, config incomplete" branch) — exactly the mis-route this section forbids. P7 must (a) extend the `InferenceRoute` union (currently `'swap' | 'deepseek'`) with a `'gateway'` variant (and an unhealthy/error representation), and (b) change the unknown-provider fallback so a known-`kind` gateway id that is missing/disabled resolves to a clean "routing gateway offline" error, never the swap fallback. All **5 callers** of `resolveModelProvider` must be audited for the new variant: `getModelContext`, `invalidateModelContext` (model-context.ts), `resolveRoute`, `upstreamModel`, `resolveModelEndpoint` (provider.ts). The session keeps its id, the picker flags it. Gateway-dispatched requests carry `X-Boo-Source` through to the target host so attribution survives the extra hop. - llama-swap `peers` could federate hosts at the proxy layer instead, but was rejected for the same reasons as the provider-registry research rejected it (flat list, coupled uptime, silent ID collisions). ### Fleet coordination lease (P8 — cross-service) The proper fix for the four-writer TOCTOU: a per-host advisory lease in the shared DB (`control_host_leases`: holder, purpose, expires_at, heartbeat) that BooControl's scheduler *requires* and BooChat/BooCoder/Arena *honor* (check-before-dispatch, or queue behind an exclusive bench lease). This touches all four services and is therefore its own batch with its own design pass. **The P3 seam is a named function, not a convention** (verification C1'): the bench runner gates every run through `acquireHostAccess(providerId, purpose): Promise` — the v1 implementation is the courtesy check (inflight==0 + takeover confirmation); P8 swaps its body for the lease without touching the bench engine. P3 implementers must NOT inline the inflight check in the runner. Unattended/scheduled benches and reproducible concurrency sweeps unlock here. ## 9. UI design direction Route `/control`, nav entry under Memory (ProjectSidebar bottom cluster). Sub-views as tabs within the page: **Fleet · Activity · Logs · Models · Bench · Evals · Reports**. - **Aesthetic**: dark mission-control. Host cards as instrument clusters: VRAM arc gauge, GPU temp/power readouts, model chips with state glow (amber pulse `starting`, green steady `ready`, red `error`, grey `down` with last-seen), TTL countdown rings. Orbitron (already in the font pipeline) for numerals only; Inter for prose; JetBrains Mono for logs/JSON. - **Motion**: framer-motion (already a dep) — spring layout transitions on model chips during swaps, count-up tweens on token totals, animated activity-feed inserts. Respect `prefers-reduced-motion`. - **Charts**: **ECharts** (decided 2026-06-12). Gauges, scatter, heatmaps built in — covers the VRAM arcs, speed×quality scatter, and perf timelines from one lib; dark-theme native; 5s streaming append handled via `appendData`/`setOption`. The <100KB preference is consciously traded for batteries-included breadth; import per-chart modules (`echarts/core` + needed renderers) to keep the bundle sane. - **Logs**: react-virtuoso tail-follow viewer (already a dep), per-source filter (proxy/upstream/model), pause-on-scroll. - **Inspector**: activity table (virtuoso) → capture drawer: headers table + shiki-highlighted JSON bodies + "Open in Playground" replay. - **Playground**: param-tweakable single-model chat + A/B compare; "Battle in Arena" handoff for full cross-examination. - Skills to drive the build pass: `frontend-design` (aesthetic direction), `ui-ux-pro-max` (dashboard/chart patterns), `frontend-ui-engineering` (production quality), existing theme tokens (oklch palettes) so BooControl follows the active theme. ## 10. Risks | Risk | Mitigation | |---|---| | PG bloat from time-series + captures | raw/rollup split; **retention job ships in the same P1 slice as ingestion**; UNIQUE constraints prevent restart-duplication inflation; capture size caps; measured in Reports (P7) | | Bench/eval evicts a model in active use | v1: manual runs + takeover confirmation + embedding-first + per-host action queue. Honest limit: `inflight==0` is a courtesy gate (TOCTOU vs 3 other writers). Real fix is the P8 lease | | llama-swap ring-id reset breaks dedup | DB UNIQUE on (provider_id, swap_entry_id, ts) + ON CONFLICT DO NOTHING — enforced at insert, not check-then-act | | Ring wraps during long outage | accepted bound; `gap_suspected` event logged with reconcile delta so loss is visible | | SSE disconnects / host down | backoff + jitter (opencode-sse pattern); explicit connected/reconnecting/down state machine + last_seen_at in control_fleet; favorites-style "hide, never delete" for offline hosts | | Snapshot/delta join race | per-host monotonic seq; client discards deltas ≤ snapshot seq | | Perf-poller restart duplicates | watermark recovered from MAX(ts) in DB; UNIQUE (provider_id, ts) | | Rollup crash double-count/loss | idempotent upsert + rollup-and-delete in one transaction | | Attribution silently NULL | no source column until P4; P4 solves both path blockers (server fetch wrapper + gateway forward) together with the migration | | Sandbox escape from generated code | no-network, non-root, caps, tmpfs, --rm, labeled for orphan prune; bounded allSettled runner with finally-cleanup; gVisor as upgrade path. Residual risk accepted for single-user | | LLM-judge bias/noise in chat evals | fixed rubrics, temperature 0, judge version pinned per run, pairwise via Arena for tie-breaks | | Windows SSH fiddliness (P9 config edit) | pre-apply JSON-schema validation (config-schema.json lives in the fork), timestamped backups before every write, health-wait after restart; stackctl's flow is the reference but gets tests here | | Orphaned `auto:*` sessions if gateway removed | resolver treats missing gateway provider as unhealthy-not-absent: clean error, no silent mis-route to LLAMA_SWAP_URL | | 5s × 2 hosts perf polling forever | trivial volume (~35k rows/day raw), rolled up + pruned at 48h | | Three applySchema callers race on restart | startup ordering guard: control waits for server-owned `sessions` table before applying schema |