BooControl — design
Status: ACCEPTED — decisions resolved 2026-06-11; architecture-analysis findings folded in; verification-pass fixes applied 2026-06-12 (chart lib decided: ECharts, §9). No open design items.
1. Topology
┌─ Tailscale mesh ──────────────────────────────────────────────────────────┐
│ │
│ sam-desktop 100.101.41.16 (Windows, RTX 5090 32GB) │
│ llama-swap v224 :8401 ─ /api/events SSE, /api/performance(GPU), │
│ D:\llama-server (CUDA) /api/metrics, /api/captures, /running, │
│ /logs/stream, POST /api/models/unload │
│ │
│ embedding 100.90.172.55 (Linux, P104-100 8GB) │
│ llama-swap :8411 ─ same API surface; 39 small models, ttl 1800 │
│ │
│ ubuntu-homelab 100.114.205.53 (no GPU) │
│ boocode container :9500 (apps/server + apps/web) │
│ booterm container :9501 │
│ boocoder host svc :9502 (apps/coder) │
│ boocontrol host svc :9503 (apps/control) ◄── NEW │
│ postgres :5500 (boochat DB) │
└───────────────────────────────────────────────────────────────────────────┘
Browser ──WS/HTTP──► apps/server (/api/control/* proxy, WS relay)
└────────► apps/control :9503
├─ SSE client per provider (events)
├─ pollers (/api/performance?after=, /running)
├─ per-host action queue (warm/unload serialization)
├─ bench + eval engines (manual v1)
├─ ssh2 (P9 only: config edit + restart)
└─ Postgres (third schema owner, ordered startup)
Key fact that shapes everything: the llama-swap fork exposes GPU/system telemetry, token metrics, request captures, and log streams over HTTP per instance (internal/perf/types.go GpuStat/SysStat; internal/server/apigroup.go). The control service needs no agent on the GPU hosts. SSH is required only for config editing + service restart (P9).
Why a host service and not a container: SSH key handling (P9), spawning sandbox containers for code evals (talking to dockerd from inside a container is a privilege escalation we don't need), and parity with the boocoder operational pattern (systemd, .env.host, deploy via pnpm -C packages/contracts build && pnpm -C apps/control build && sudo systemctl restart boocontrol).
There is no sidecar. The llama-sidecar (:8402, per-agent flags) has been removed from the system entirely. No control-plane table, connector, or registry field references it.
2. Fleet identity = the provider registry (LlamaProvider.id)
The multi-provider batch introduces the shipped contract (packages/contracts/src/llama-providers.ts):
LlamaProviderSchema = { id, label, baseUrl, kind } // ids: "sam-desktop", "embedding"
BooControl keys every host-scoped row on provider_id = LlamaProvider.id — the field that actually exists and that resolveModelProvider already resolves by. (Earlier drafts said provider_name against a {name, sidecarUrl?} shape; that shape was never shipped.) Control-plane attributes extend the registry entry rather than inventing a parallel hosts table:
control_hosts
provider_id TEXT PK -- FK-by-convention to LlamaProvider.id ("sam-desktop", "embedding")
ssh_host TEXT, ssh_user TEXT, ssh_key_path TEXT -- nullable: no SSH = no config editing (P9)
config_path TEXT -- D:\llama-swap\config.yaml | ~/llama-swap/config.yaml (P9)
restart_cmd TEXT -- nssm/systemctl invocation (P9)
os TEXT, gpu_label TEXT -- display metadata
enabled BOOLEAN DEFAULT true
Lesson imported from stackctl's worst bug: its machines table was dropped + re-seeded on every container rebuild, losing user-added hosts. control_hosts rows are durable; seeding is INSERT ... ON CONFLICT DO NOTHING.
3. Schema ownership + startup ordering (third schema owner)
apps/control/src/schema.sql, applied by apps/control/src/db.ts:applySchema() on boot — the coder precedent. Two hardening rules the coder precedent lacks:
- Startup ordering guard. The coder schema holds real FKs into server-owned tables (REFERENCES sessions(id), chats(id)); today the server-before-coder ordering is an accident of Docker-vs-host start timing. A third concurrent applySchema caller widens that race, so apps/control makes the ordering explicit:
// apps/control/src/index.ts — before applySchema()
await waitForTable(sql, 'sessions', 30_000); // poll information_schema; THROWS on timeout
await applySchema(sql);
"Fail loud" means throw → process exits nonzero → systemd (Restart=on-failure) retries. The guard is enforcing, not advisory: applySchema is never reached if the server schema is absent, so a partial-DDL state cannot occur.
(Control tables themselves currently take no FKs into server tables, but the guard costs one query and removes the timing dependency for any future FK.)
- Dedup is enforced by the database, not application checks. Every ingest table whose dedup matters carries a UNIQUE constraint and is written with INSERT ... ON CONFLICT DO NOTHING; check-then-act application dedup is racy under concurrent SSE + reconcile writers (analysis C2/C7).
control_requests -- persisted ActivityLogEntry stream (the thing llama-swap forgets on restart)
id BIGSERIAL PK, provider_id TEXT, swap_entry_id INT, -- llama-swap's ring id
ts TIMESTAMPTZ, model TEXT, req_path TEXT, status_code INT,
duration_ms INT, cache_tokens INT, input_tokens INT, output_tokens INT,
prompt_tps REAL, gen_tps REAL, has_capture BOOLEAN,
capture JSONB, -- nullable; fetched-on-demand copy (req/resp, capped)
UNIQUE (provider_id, swap_entry_id, ts) -- survives ring-id reset; INSERT ... ON CONFLICT DO NOTHING
-- NOTE: no `source` column in P1. The X-Boo-Source attribution column is added by the
-- P4 migration, when injection actually works end-to-end (see §7). No NULL-forever rows.
control_perf_samples -- raw SysStat+GpuStat, short retention (48h default)
provider_id TEXT, ts TIMESTAMPTZ, gpu JSONB, sys JSONB,
UNIQUE (provider_id, ts) -- restart-safe: re-polled samples no-op
control_perf_rollup_5m -- avg/max per 5min bucket, long retention (90d)
provider_id TEXT, bucket TIMESTAMPTZ, gpu_agg JSONB, sys_agg JSONB,
UNIQUE (provider_id, bucket) -- rollup is an idempotent upsert (§6)
control_model_events -- state transitions (stopped→starting→ready→stopping), swap durations
provider_id, model, state, ts, detail JSONB,
UNIQUE (provider_id, model, state, ts) -- reconcile can re-deliver model status; same ON CONFLICT DO NOTHING discipline
bench_suites / bench_runs / bench_samples
-- suite: {prompt_tokens[], gen_tokens[], concurrency[], repetitions}
-- sample: per-request timings (ttft_ms, prompt_tps, gen_tps, total_ms) + run aggregates
eval_suites / eval_runs / eval_results
-- suite: kind chat|code, tasks JSONB (prompt, reference, checker), judge_model
-- result: per-task score, judge rationale / execution log, sandbox exit info
route_policies -- P7: name, match rules JSONB, target ordering, fallback
control_reports -- generated digests (markdown + JSONB stats)
+ schedule meta: {interval: 'daily'|'weekly', enabled, last_run_at TIMESTAMPTZ}
-- driven by the SAME in-process timer pattern as the retention job (P6): hourly tick
-- checks last_run_at vs interval, runs if due (catch-up on boot included). No cron dep,
-- no new scheduler abstraction (S7 stays YAGNI-deferred; reopen trigger unchanged).
clock_timestamp() inside transactions per repo convention; JSONB via sql.json(...).
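The startup ordering guard from this section can be sketched as a polling loop that throws on timeout. This is a minimal sketch, not the shipped apps/control code: it takes a hypothetical probe callback instead of the sql handle shown earlier, so the information_schema query itself is left to the caller, and the 500 ms poll interval is an assumption.

```typescript
// Hypothetical probe: resolves true once the table is visible. In the real
// service this would wrap a query against information_schema.tables.
type TableProbe = (table: string) => Promise<boolean>;

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls until the table exists. THROWS on timeout, so a caller that awaits
// this before applySchema() never reaches the DDL when the table is absent.
async function waitForTable(
  probe: TableProbe,
  table: string,
  timeoutMs: number,
  pollMs = 500, // assumed poll interval
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    if (await probe(table)) return;
    if (Date.now() + pollMs > deadline) {
      throw new Error(`waitForTable: "${table}" not present after ${timeoutMs}ms`);
    }
    await sleep(pollMs);
  }
}
```

Letting the error propagate out of the entry point is what turns the timeout into a nonzero exit that systemd can retry.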
4. Ingestion semantics
- SSE consumer per enabled host: GET /api/events → envelopes modelStatus | logData | metrics | inflight. Reconnect with backoff + jitter (reconnect/circuit-breaker pattern: apps/coder/src/services/backends/opencode-sse.ts. NOTE: the source has exponential backoff + circuit breaker but NO jitter; add jitter explicitly here, random 0-50% of the computed delay, per plan finding V1/F3). On reconnect, reconcile via GET /api/metrics (full ring). Reconcile and live SSE may both insert the same entry concurrently; that is fine because dedup is the DB UNIQUE constraint (ON CONFLICT DO NOTHING), not a check-then-act. The dedup key (provider_id, swap_entry_id, ts) includes the timestamp because llama-swap's ring ids restart from 0 on its restart.
  - Known bound, accepted: the ring holds 1000 entries. An outage longer than 1000 requests loses the overwritten tail permanently; log a gap_suspected model event so the loss is visible rather than silent. Detection rule (no-overlap heuristic): if the oldest entry in the reconcile fetch is newer than the newest already-persisted entry for that provider, the ring wrapped past our tail; emit gap_suspected with both timestamps in detail. Overlap present = no gap, no event.
  - Second accepted residual: a genuinely-new post-restart entry whose (swap_entry_id, ts) exactly collides with a pre-restart row (same ring slot, same timestamp to llama-swap's ts precision) is silently dropped by the UNIQUE constraint. Window = one entry per restart at sub-precision coincidence; accepted, not solvable client-side without a content hash in the key.
- Perf poller: GET /api/performance?after=<last-ts> every 5s (llama-swap's own minimum collection interval). The watermark is recovered on restart from MAX(ts) per provider in control_perf_samples (not in-memory only); duplicate polls no-op on the UNIQUE constraint. Cold start (MAX(ts) = NULL, fresh install): omit the after parameter entirely and ingest whatever window the host returns; the UNIQUE constraint makes over-fetch harmless, and the next poll has a watermark.
- Host liveness is explicit state, not absence of data. Each connector runs a small state machine connected | reconnecting | down (down after N failed reconnects); transitions publish a control_fleet delta and stamp control_hosts-adjacent in-memory state with last_seen_at. A late-joining browser therefore sees down + last_seen_at, never a stale "ready" snapshot (analysis B3).
- Snapshot/delta consistency. The fleet state keeps a per-host monotonic seq, incremented on every mutation. The join snapshot carries the current seqs; every delta carries its seq. Client rule: buffer (do not apply, do not discard) any delta that arrives before the snapshot; after applying the snapshot, replay the buffer dropping deltas with seq <= the snapshot's per-host seq, and apply the filter to all subsequent deltas. On a single FIFO WS pre-snapshot deltas should not occur, but buffering makes the rule transport-independent. This closes the join race where a delta arrives during snapshot serialization (analysis B4).
- Logs are not persisted by default (volume + low value at rest); they relay live SSE → WS with an in-memory tail buffer (last ~2k lines per host) for late joiners. Optional "record to file" toggle later.
- Fan-out to browser: the control service publishes over its own WS (/api/ws/control), relayed by apps/server's proxy as /api/control/ws. This is a second app-level WS connection in the browser: useControlStream gets its own singleton guard + context; it does NOT share useUserEvents' /api/ws/user channel. Frames (added to packages/contracts/src/ws-frames.ts first, then the server loose union, then the web strict union, with the contracts drift test extended to cover them so a partial edit fails the suite):
  - control_fleet: full snapshot on join + seq-stamped state deltas (hosts, liveness, models, states, ttl deadlines, inflight)
  - control_activity: new request rows (the live feed)
  - control_perf: appended samples per host
  - control_log: {provider_id, source: proxy|upstream, line} batches
  - control_job: bench/eval run progress events
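The snapshot/delta client rule above can be sketched as a small state holder. This is an illustrative sketch only: the Delta and Snapshot shapes, the patch-merge semantics, and the class name are assumptions, not the contracts in ws-frames.ts.

```typescript
// Assumed frame shapes for illustration; the real ones live in the contracts package.
type Delta = { providerId: string; seq: number; patch: Record<string, unknown> };
type Snapshot = {
  seqs: Record<string, number>; // per-host seq at snapshot time
  hosts: Record<string, Record<string, unknown>>;
};

class FleetClientState {
  hosts: Record<string, Record<string, unknown>> = {};
  private snapshotSeqs: Record<string, number> | null = null;
  private buffered: Delta[] = [];

  onDelta(delta: Delta): void {
    if (this.snapshotSeqs === null) {
      this.buffered.push(delta); // before the snapshot: buffer, do not apply, do not discard
      return;
    }
    this.applyIfNewer(delta);
  }

  onSnapshot(snap: Snapshot): void {
    this.hosts = { ...snap.hosts };
    this.snapshotSeqs = { ...snap.seqs };
    const pending = this.buffered;
    this.buffered = [];
    for (const delta of pending) this.applyIfNewer(delta); // replay through the same filter
  }

  private applyIfNewer(delta: Delta): void {
    const seqs = this.snapshotSeqs ?? {};
    const snapshotSeq = seqs[delta.providerId] ?? -1; // host unknown to the snapshot: accept
    if (delta.seq <= snapshotSeq) return; // already reflected in the snapshot
    this.hosts[delta.providerId] = { ...(this.hosts[delta.providerId] ?? {}), ...delta.patch };
  }
}
```

Because the buffered deltas and the live deltas go through one filter, the rule does not depend on the transport delivering the snapshot first.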
5. Actions
| Action | Mechanism |
|---|---|
| Warm/load model | 1-token POST /v1/chat/completions with the bare wire ID (stackctl-proven; llama-swap loads on demand — there is no load endpoint) |
| Unload one/all | POST /api/models/unload/:model / /api/models/unload |
| Inspect request | GET /api/captures/:id on the host, decode base64, persist trimmed copy, render |
| Bench/eval runs | engines below (manual v1) |
| Edit config / restart llama-swap | P9 (SFTP + schema validation + diff + timestamped backup + restart + health-wait) |
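As a rough illustration of the warm/load row, the 1-token request could be built like this. The function name, the "hi" prompt, and the non-streaming flag are assumptions; only the endpoint path, the bare wire ID as the model, and the 1-token limit come from the table.

```typescript
type WarmRequest = {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
};

// Builds (but does not send) the warm request for one host.
function buildWarmRequest(baseUrl: string, wireModelId: string): WarmRequest {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/v1/chat/completions`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: wireModelId, // bare wire ID, no provider prefix
        messages: [{ role: "user", content: "hi" }], // assumed throwaway prompt
        max_tokens: 1, // generate a single token: the point is the load, not the answer
        stream: false,
      }),
    },
  };
}
```

Keeping the builder separate from the send makes it easy to push the actual call through the per-host action queue.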
Per-host action queue. All host-mutating actions (warm, unload, bench warm-up) from BooControl serialize through a single FIFO queue per provider_id inside the control service — double-clicks, warm-during-warm, and unload-during-bench from this service cannot interleave (analysis C3). An unload request while a bench run holds the host is rejected with a "bench in progress — takeover?" confirmation. Queue discipline (verification C-N1): submissions are rejected immediately while the host's liveness state is down ("host offline" toast); queue depth is capped (4) with reject-on-full; each action re-checks liveness on dequeue and skips itself if stale — a recovered host never replays a backlog of stale warms. (Pattern precedent: arena-runner.ts advanceChain promise-chain, plus its read-fresh-state-or-skip discipline.) This serializes BooControl's own hands only; BooChat/BooCoder/Arena traffic is uncoordinated until P8.
All mutating actions publish control_job/control_fleet frames; UI handlers stay idempotent (event-dedup discipline per CLAUDE.md — no local emit after API call).
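The queue discipline described above (FIFO per host, reject while down, depth cap, re-check on dequeue) can be sketched with a promise chain. This is a sketch under assumptions: the class and method names are hypothetical, "stale" is interpreted as "host is down at dequeue time", and the depth count includes the action currently running.

```typescript
type Liveness = "connected" | "reconnecting" | "down";

class HostActionQueue {
  private tail: Promise<unknown> = Promise.resolve();
  private depth = 0; // running + waiting actions
  private readonly getLiveness: () => Liveness;
  private readonly maxDepth: number;

  constructor(getLiveness: () => Liveness, maxDepth = 4) {
    this.getLiveness = getLiveness;
    this.maxDepth = maxDepth;
  }

  submit<T>(action: () => Promise<T>): Promise<T | "skipped"> {
    // Reject immediately rather than queueing work for an offline host.
    if (this.getLiveness() === "down") return Promise.reject(new Error("host offline"));
    if (this.depth >= this.maxDepth) return Promise.reject(new Error("queue full"));
    this.depth++;
    const run: Promise<T | "skipped"> = this.tail.then(async (): Promise<T | "skipped"> => {
      try {
        // Re-check on dequeue: a host that went down while we waited gets no stale replay.
        if (this.getLiveness() === "down") return "skipped";
        return await action();
      } finally {
        this.depth--;
      }
    });
    // Keep the chain alive even if this action fails.
    this.tail = run.catch(() => undefined);
    return run;
  }
}
```

One instance per provider_id gives the per-host serialization; a failed action rejects only its own caller and the next queued action still runs.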
Manual op checklist (P2.5): Before the capture inspector works end-to-end, enable captureBuffer and review metricsMaxInMemory on both hosts' llama-swap configs. These are per-host settings in config.yaml and must be set before captures will be available:
- sam-desktop: set captureBuffer: true and verify metricsMaxInMemory (default 1000, sufficient for most workloads)
- embedding: set captureBuffer: true and verify metricsMaxInMemory
- Restart llama-swap on both hosts after config changes
6. Retention (ships in the same P1 slice as ingestion)
Daily job, crash-safe by construction:
- Rollup is an idempotent upsert: INSERT INTO control_perf_rollup_5m ... ON CONFLICT (provider_id, bucket) DO UPDATE recomputed from raw; a re-run after a crash recomputes the same buckets, never double-counts.
- Delete raw only after the covering buckets are committed, in chunked transactions: one transaction per provider per 1-hour window (≤720 rows each), never one 48h mega-transaction. This bounds lock hold time so the live 5s poller's inserts into the same table never queue behind a multi-second aggregate+delete (verification C-N2). A crash between chunks leaves whole-hour windows either fully migrated or fully raw; the next run recomputes idempotently.
- Activity > 90d pruned; captures capped per-row (256KB) and pruned by total budget. All windows configurable via .env.host.
Retention is a P1 task in the same slice as ingestion, not a fast-follow — the bloat window between "ingestion starts" and "retention exists" degrades the shared DB that serves all of BooChat (analysis R3).
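The window arithmetic behind the chunked retention job can be sketched as two pure helpers. The function names are hypothetical, and the choice to process only whole hours that lie entirely before the cutoff is an assumption about how the job would align its chunks.

```typescript
const FIVE_MIN_MS = 5 * 60 * 1000;
const HOUR_MS = 60 * 60 * 1000;

// Floors a sample timestamp to the start of its 5-minute rollup bucket.
function bucketStart(ts: Date): Date {
  return new Date(Math.floor(ts.getTime() / FIVE_MIN_MS) * FIVE_MIN_MS);
}

type HourWindow = { from: Date; to: Date }; // half-open: [from, to)

// Lists the 1-hour chunks between the oldest raw sample and the retention cutoff.
// Each chunk is meant to be one "upsert buckets, then delete raw" transaction.
function hourWindows(oldestRaw: Date, cutoff: Date): HourWindow[] {
  const windows: HourWindow[] = [];
  const end = Math.floor(cutoff.getTime() / HOUR_MS) * HOUR_MS; // whole hours only
  let from = Math.floor(oldestRaw.getTime() / HOUR_MS) * HOUR_MS;
  for (; from < end; from += HOUR_MS) {
    windows.push({ from: new Date(from), to: new Date(from + HOUR_MS) });
  }
  return windows;
}
```

At one sample every 5 s, each window covers at most 3600 / 5 = 720 raw rows per provider, which matches the bound stated above.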
7. Attribution (X-Boo-Source) — own phase (P4), two blockers solved together
The naive plan ("inject a header, small touch") is blocked on both inference paths:
- apps/server (BooChat streaming): getSwapProvider() caches createOpenAICompatible instances by baseURL in swapCache; headers are provider-level, baked at construction. Fix: a per-turn fetch wrapper. Thread the source label through the call site and pass a wrapping fetch that injects X-Boo-Source (cache keyed by baseURL+source since the label set is tiny: boochat|boocoder|arena|control-bench|control-eval). Interface constraint (verification S-N2): getSwapProvider is private (fan-in 1), but the label must travel through the exported upstreamModel, whose file has a 28-file/13-route blast radius, so the change MUST be additive (upstreamModel(config, modelId, agent?, source?) or an options object with optional source), never a breaking signature change; all existing call sites compile unchanged. The direct-fetch paths (compaction.ts, task-model.ts) just extend their existing headers object.
- apps/coder (opencode local gateway): local-gateway.ts builds a fresh headers object and silently strips inbound X-Boo-Source. Fix: forward it explicitly when present. Arena/dispatch direct paths set it at their own fetch sites.
P4 lands: both fixes + the control_requests.source column migration + the source filter in the Activity UI. llama-swap's header capture (captureBuffer) must be enabled on the hosts first (P2 op task). Acceptance: a BooChat turn, a BooCoder dispatch, and an Arena battle each show their own label in the Activity feed; nothing shows NULL except genuinely external traffic.
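Both fixes come down to small header manipulations, sketched here with a simplified header type. This is illustrative only: headers are modelled as a plain record rather than the standard Headers class, and withBooSource, forwardBooSource, and FetchLike are hypothetical names, not functions in apps/server or apps/coder.

```typescript
type BooSource = "boochat" | "boocoder" | "arena" | "control-bench" | "control-eval";
type HeaderBag = Record<string, string>;
type MinimalInit = { method?: string; headers?: HeaderBag; body?: string };
// Simplified stand-in for a fetch function; R is whatever the base fetch resolves to.
type FetchLike<R> = (url: string, init?: MinimalInit) => Promise<R>;

// Fix 1 (server path): wrap a fetch so every request carries the source label.
function withBooSource<R>(baseFetch: FetchLike<R>, source: BooSource): FetchLike<R> {
  return (url, init) =>
    baseFetch(url, {
      ...init,
      headers: { ...(init?.headers ?? {}), "X-Boo-Source": source },
    });
}

// Fix 2 (gateway path): copy an inbound label onto freshly built outbound headers.
function forwardBooSource(inbound: HeaderBag, outbound: HeaderBag): HeaderBag {
  const key = Object.keys(inbound).find((k) => k.toLowerCase() === "x-boo-source");
  const value = key === undefined ? undefined : inbound[key];
  return value === undefined ? outbound : { ...outbound, "X-Boo-Source": value };
}
```

Since the wrapper is built per source label, a provider cache keyed by baseURL plus source would hold at most five entries per host.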
Implementation notes
P6.2 schedule meta lives in its own table, not on control_reports. §3 sketched control_reports + schedule meta: {interval, enabled, last_run_at}. In implementation the scheduler state was split into a dedicated single-row control_schedule_meta table (keyed by schedule name, seeded report-digest) so generated control_reports rows stay immutable snapshots and the boot catch-up reads/writes one well-known row instead of scanning report history for the latest last_run_at. The retention-style hourly tick (runReportSchedulerTick) and the {interval, enabled, last_run_at} contract are unchanged.
P7 gateway identity. The gateway registers as provider id auto (kind boocontrol-gateway); its virtual models are auto, auto:code, auto:fast, auto:cheap, so BooChat composite ids are auto/auto:code etc. and the wire model sent to the gateway is the bare virtual token. getModelContext reads n_ctx from the gateway's own /upstream/<virtual>/props, which proxies the first healthy candidate's props. The gateway is reached server-to-server via the registry baseUrl (not the /api/control proxy, which buffers responses and would break streaming).
P7 orphan detection. An orphaned auto:* session is detected two ways: by registry kind === 'boocontrol-gateway' when the gateway is present (→ gateway), and by the virtual-model token shape (auto / auto:*) when the provider is absent (→ gateway_error, reason offline). The unknown-composite-provider swap fallback is overridden only for that token shape; all other unknown composites keep their existing best-effort swap behavior.
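The two detection paths can be sketched as a single resolver. This is a simplified illustration, not the provider.ts code: the Route shape and function name are hypothetical, the 'deepseek' variant is omitted, and composite ids are assumed to be provider/model as in the auto/auto:code example.

```typescript
type Provider = { id: string; label: string; baseUrl: string; kind: string };
type Route =
  | { route: "swap"; baseUrl: string }
  | { route: "gateway"; baseUrl: string }
  | { route: "gateway_error"; reason: "offline" };

// Virtual-model token shape: "auto" or "auto:<something>".
const isVirtualToken = (model: string): boolean => /^auto(:.*)?$/.test(model);

function resolveCompositeRoute(
  compositeId: string,
  registry: Provider[],
  swapFallbackUrl: string,
): Route {
  const slash = compositeId.indexOf("/");
  const providerId = slash === -1 ? "" : compositeId.slice(0, slash);
  const model = slash === -1 ? compositeId : compositeId.slice(slash + 1);
  const provider = registry.find((p) => p.id === providerId);

  // Path 1: the gateway is registered, so the registry kind decides.
  if (provider?.kind === "boocontrol-gateway") return { route: "gateway", baseUrl: provider.baseUrl };
  if (provider) return { route: "swap", baseUrl: provider.baseUrl };
  // Path 2: provider absent; only the virtual-token shape overrides the fallback.
  if (isVirtualToken(model)) return { route: "gateway_error", reason: "offline" };
  // Every other unknown composite keeps the existing best-effort swap behavior.
  return { route: "swap", baseUrl: swapFallbackUrl };
}
```

The ordering matters: the token-shape check runs only after the registry lookup fails, so a healthy gateway is never reported offline.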
P9.1 uses shelled ssh, not an ssh2/SFTP library. §5 and the P9 task say "SFTP read ... SFTP write". Implementation shells out to the system ssh (cat <path> to read, cp for the timestamped backup, cat > <path> over stdin to write, the configured restart_cmd to restart) with an explicit -i <key> -o IdentitiesOnly=yes -o BatchMode=yes. This matches the established booterm SSH-via-shell precedent and the Gitea deploy-key lesson (never offer the agent's default key), and avoids adding an ssh2 native dependency. The exec is injected (SshExec) so every failure path (unreadable host, backup fail, write fail, restart fail, health never recovers) is unit-tested without a live host. The fork config-schema.json is bundled at apps/control/data/config-schema.json and validated with ajv (added as a control dependency). Backup always precedes write, so a failed write leaves the timestamped backup intact. Not live-smoked: there is no reachable Windows SSH target in the implementation session (the documented "Windows SSH fiddliness" risk); the failure-path suite is the standing verification.
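A rough sketch of how the ssh argument vector and the four remote commands could be assembled. Everything named here is hypothetical (sshArgs, editCommands, the backup file naming), and the quoting helper covers POSIX shells only; quoting for a Windows remote shell is not addressed.

```typescript
type SshTarget = { host: string; user: string; keyPath: string };

// Explicit key plus IdentitiesOnly/BatchMode, so no default key is offered
// and ssh fails instead of prompting.
function sshArgs(target: SshTarget, remoteCommand: string): string[] {
  return [
    "-i", target.keyPath,
    "-o", "IdentitiesOnly=yes",
    "-o", "BatchMode=yes",
    `${target.user}@${target.host}`,
    remoteCommand,
  ];
}

// POSIX single-quote escaping (assumption: the remote shell is POSIX-like).
const shq = (s: string): string => `'${s.replace(/'/g, "'\\''")}'`;

// Assumed naming scheme for the timestamped backup.
function backupPath(configPath: string, now: Date): string {
  return `${configPath}.${now.toISOString().replace(/[:.]/g, "-")}.bak`;
}

// The remote commands in execution order: read, backup, write (stdin), restart.
function editCommands(configPath: string, restartCmd: string, now: Date) {
  return {
    read: `cat ${shq(configPath)}`,
    backup: `cp ${shq(configPath)} ${shq(backupPath(configPath, now))}`,
    write: `cat > ${shq(configPath)}`,
    restart: restartCmd,
  };
}
```

Building the commands as data keeps them testable through an injected exec, in the same spirit as the SshExec seam described above.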
ActivityLogEntry does not carry request headers. The llama-swap fork's ActivityLogEntry struct (internal/server/metrics.go) contains ID, Timestamp, Model, ReqPath, RespContentType, RespStatusCode, Tokens, DurationMs, HasCapture -- no source field and no request headers. The X-Boo-Source header IS captured in ReqRespCapture.ReqHeaders (internal/server/captures.go), but captures are stored separately in a zstd-compressed cache and fetched on-demand via GET /api/captures/:id, not in the metrics ring.
Therefore the control_requests.source column is NULL for ring-ingested data. The column exists for: (1) future llama-swap versions that may add source to ActivityLogEntry, (2) manual backfill from captures, (3) non-ring sources (bench/eval direct calls that set source explicitly). The metrics ingest mapper writes NULL for source, matching what the ring provides.
8. Benchmark, eval, routing
Speed bench (P3 — manual, safe-by-construction)
- HTTP-level, through llama-swap (measures what BooChat actually experiences) with llama.cpp timings (prompt_per_second, predicted_per_second, cache_n) parsed from the final stream chunk; TTFT measured client-side at first delta.
- Suite = grid of (prompt_len × gen_len × concurrency) × N repetitions; warmup excluded; results as aggregates + raw samples. Runner fan-out is bounded (suite-declared concurrency only, Promise.allSettled, never unbounded Promise.all).
- v1 safety model: every run is user-initiated with an explicit takeover confirmation when the target host shows recent traffic; embedding-host-first defaults. The inflight==0 check is a courtesy gate, not a guarantee: BooChat/BooCoder/Arena can race it (TOCTOU, four uncoordinated writers). v1 accepts this because a human clicked "run"; unattended scheduling is explicitly deferred to P8 (fleet lease). Bench results note concurrent_foreign_requests observed during the run (from the activity stream) so polluted runs are flagged, not silently trusted.
- Baselines + regression: each (provider_id, model) keeps a baseline aggregate; new runs flag deltas beyond threshold (e.g. gen tok/s −10%) → surfaces in Reports and as a fleet-card badge.
- Later: llama-bench over SSH for device-level (no-server) numbers, JSON output ingested alongside (P9, with the SSH plumbing).
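The suite grid and the regression flag from the list above can be sketched as two pure functions. The type and function names are hypothetical; the field names follow the suite comment in §3, and the 10% default threshold mirrors the example given above.

```typescript
type BenchSuite = {
  promptTokens: number[];
  genTokens: number[];
  concurrency: number[];
  repetitions: number;
};
type BenchCell = { promptLen: number; genLen: number; concurrency: number; rep: number };

// Expands the suite into one cell per (prompt_len × gen_len × concurrency) × repetition.
function expandGrid(suite: BenchSuite): BenchCell[] {
  const cells: BenchCell[] = [];
  for (const promptLen of suite.promptTokens) {
    for (const genLen of suite.genTokens) {
      for (const concurrency of suite.concurrency) {
        for (let rep = 0; rep < suite.repetitions; rep++) {
          cells.push({ promptLen, genLen, concurrency, rep });
        }
      }
    }
  }
  return cells;
}

// Flags a run whose generation speed fell more than `threshold` below the baseline.
function isRegression(baselineGenTps: number, runGenTps: number, threshold = 0.1): boolean {
  return runGenTps < baselineGenTps * (1 - threshold);
}
```

For example, a suite with two prompt lengths, one generation length, two concurrency levels, and three repetitions expands to 2 × 1 × 2 × 3 = 12 cells.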
Quality evals (P5)
- Suite program (decided 2026-06-12): four suites measuring Sam's real workloads, in priority order: (1) agent coding tasks (TS/code-edit tasks like BooCoder dispatches, sandboxed pass@1), (2) chat assistant quality (judge rubrics), (3) long-context retrieval (needle/doc-QA for file-heavy sessions), (4) utility calls (titles/summaries/compaction; directly tunes the FAST_MODEL choice).
- Chat: suite of curated prompts (data/ YAML, editable) scored by LLM-as-judge (rubric single-answer grading, MT-bench style; temperature 0, judge model + version pinned per run). Judge = strongest local model by default. Pairwise comparisons delegate to Arena (exists in apps/coder); BooControl links/launches battles rather than re-implementing.
- Code: HumanEval+/MBPP+-style tasks, executed in ephemeral sandbox containers on the homelab: --network none, non-root, mem/cpu/time caps, tmpfs workdir, --rm, kill-on-timeout, and a boocontrol-eval label so orphans are findable (docker ps --filter label=...) and pruned at engine start. Runner: bounded concurrency (default 4), Promise.allSettled, per-task finally cleanup, so a single task failure never abandons in-flight containers (analysis C5; the CLAUDE.md child-supervisor lesson applies). /opt/forks/openevals is the reference implementation to borrow patterns from (TS).
- Scorecards: per (provider_id, model, quant) leaderboard with speed × quality scatter: "is the Q4 actually worse for my use?" answered with my own suite, on my own hardware.
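The sandbox constraints listed for code evals map onto standard docker run flags, sketched here as an argument builder. The uid 65534, the /work path, the limit values, and the function names are assumptions; the time cap is not a docker flag and would be enforced by the runner's kill-on-timeout.

```typescript
type SandboxLimits = { memory: string; cpus: string; tmpfsSizeMb: number };

// Arguments for `docker run`, one per constraint from the design.
function sandboxRunArgs(image: string, command: string[], limits: SandboxLimits): string[] {
  return [
    "run",
    "--rm",                       // container is removed on exit
    "--network", "none",          // no network access
    "--user", "65534:65534",      // non-root (assumed uid/gid)
    "--memory", limits.memory,    // memory cap
    "--cpus", limits.cpus,        // CPU cap
    "--tmpfs", `/work:rw,size=${limits.tmpfsSizeMb}m,mode=1777`, // in-memory workdir
    "--workdir", "/work",
    "--label", "boocontrol-eval=1", // makes orphans findable
    image,
    ...command,
  ];
}

// Arguments for listing leftover sandboxes at engine start.
function orphanListArgs(): string[] {
  return ["ps", "-aq", "--filter", "label=boocontrol-eval"];
}
```

Filtering on the label key alone matches any value, so the prune step does not need to know what value the runner stamped.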
Routing (P6 advisory → P7 live gateway, committed)
- P6 (advisory): routing scores (eval results + live latency + host health) exposed via API; the model picker badges "best code model right now".
- P7 (gateway): control service exposes OpenAI-compatible virtual models (auto, auto:code, auto:fast, auto:cheap) implementing policy: rule match → candidate ordering → health/ctx-fit filter → dispatch with failover. BooChat adopts by adding a registry entry ({id: "auto", baseUrl: "http://100.114.205.53:9503", kind: "boocontrol-gateway"}), with zero inference-path changes elsewhere. Frontier providers slot in as policy targets when added to the registry.
  - Orphaned-session handling (explicit; REQUIRES a provider.ts code change, verification S-N1/B-N3): today resolveModelProvider silently falls back to LLAMA_SWAP_URL for any composite id with an unknown provider ("best-effort fallback, config incomplete" branch), which is exactly the mis-route this section forbids. P7 must (a) extend the InferenceRoute union (currently 'swap' | 'deepseek') with a 'gateway' variant (and an unhealthy/error representation), and (b) change the unknown-provider fallback so a known-kind gateway id that is missing/disabled resolves to a clean "routing gateway offline" error, never the swap fallback. All 5 callers of resolveModelProvider must be audited for the new variant: getModelContext, invalidateModelContext (model-context.ts), resolveRoute, upstreamModel, resolveModelEndpoint (provider.ts). The session keeps its id, the picker flags it. Gateway-dispatched requests carry X-Boo-Source through to the target host so attribution survives the extra hop.
- llama-swap peers could federate hosts at the proxy layer instead, but that option was rejected for the same reasons the provider-registry research rejected it (flat list, coupled uptime, silent ID collisions).
Fleet coordination lease (P8 — cross-service)
The proper fix for the four-writer TOCTOU: a per-host advisory lease in the shared DB (control_host_leases: holder, purpose, expires_at, heartbeat) that BooControl's scheduler requires and BooChat/BooCoder/Arena honor (check-before-dispatch, or queue behind an exclusive bench lease). This touches all four services and is therefore its own batch with its own design pass. The P3 seam is a named function, not a convention (verification C1'): the bench runner gates every run through acquireHostAccess(providerId, purpose): Promise<HostGrant> — the v1 implementation is the courtesy check (inflight==0 + takeover confirmation); P8 swaps its body for the lease without touching the bench engine. P3 implementers must NOT inline the inflight check in the runner. Unattended/scheduled benches and reproducible concurrency sweeps unlock here.
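The named seam could look roughly like the following in v1. This is a sketch under assumptions: the HostGrant fields, the dependency-injection shape, and the use of an inflight count as the "recent traffic" signal are all hypothetical; only the acquireHostAccess(providerId, purpose): Promise<HostGrant> signature comes from the text.

```typescript
type HostGrant = { providerId: string; purpose: string; release: () => void };
type AcquireHostAccess = (providerId: string, purpose: string) => Promise<HostGrant>;

// Injected so the bench engine never sees how access is decided.
type CourtesyDeps = {
  getInflight: (providerId: string) => Promise<number>;
  confirmTakeover: (providerId: string, inflight: number) => Promise<boolean>;
};

// v1 body: courtesy check only. A P8 lease implementation would replace this
// factory while keeping the AcquireHostAccess signature unchanged.
function makeCourtesyAcquire(deps: CourtesyDeps): AcquireHostAccess {
  return async (providerId, purpose) => {
    const inflight = await deps.getInflight(providerId);
    if (inflight > 0 && !(await deps.confirmTakeover(providerId, inflight))) {
      throw new Error(`host ${providerId} busy (${inflight} inflight); takeover declined`);
    }
    // Nothing to release in v1; a lease would free its row here.
    return { providerId, purpose, release: () => {} };
  };
}
```

Because the runner only ever calls the returned function, swapping in the lease later does not touch the bench engine.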
9. UI design direction
Route /control, nav entry under Memory (ProjectSidebar bottom cluster). Sub-views as tabs within the page: Fleet · Activity · Logs · Models · Bench · Evals · Reports.
- Aesthetic: dark mission-control. Host cards as instrument clusters: VRAM arc gauge, GPU temp/power readouts, model chips with state glow (amber pulse for starting, green steady for ready, red for error, grey for down with last-seen), TTL countdown rings. Orbitron (already in the font pipeline) for numerals only; Inter for prose; JetBrains Mono for logs/JSON.
- Motion: framer-motion (already a dep). Spring layout transitions on model chips during swaps, count-up tweens on token totals, animated activity-feed inserts. Respect prefers-reduced-motion.
- Charts: ECharts (decided 2026-06-12). Gauges, scatter, heatmaps built in; covers the VRAM arcs, speed×quality scatter, and perf timelines from one lib; dark-theme native; 5s streaming append handled via appendData/setOption. The <100KB preference is consciously traded for batteries-included breadth; import per-chart modules (echarts/core + needed renderers) to keep the bundle sane.
- Logs: react-virtuoso tail-follow viewer (already a dep), per-source filter (proxy/upstream/model), pause-on-scroll.
- Inspector: activity table (virtuoso) → capture drawer: headers table + shiki-highlighted JSON bodies + "Open in Playground" replay.
- Playground: param-tweakable single-model chat + A/B compare; "Battle in Arena" handoff for full cross-examination.
- Skills to drive the build pass: frontend-design (aesthetic direction), ui-ux-pro-max (dashboard/chart patterns), frontend-ui-engineering (production quality), existing theme tokens (oklch palettes) so BooControl follows the active theme.
10. Risks
| Risk | Mitigation |
|---|---|
| PG bloat from time-series + captures | raw/rollup split; retention job ships in the same P1 slice as ingestion; UNIQUE constraints prevent restart-duplication inflation; capture size caps; measured in Reports (P7) |
| Bench/eval evicts a model in active use | v1: manual runs + takeover confirmation + embedding-first + per-host action queue. Honest limit: inflight==0 is a courtesy gate (TOCTOU vs 3 other writers). Real fix is the P8 lease |
| llama-swap ring-id reset breaks dedup | DB UNIQUE on (provider_id, swap_entry_id, ts) + ON CONFLICT DO NOTHING — enforced at insert, not check-then-act |
| Ring wraps during long outage | accepted bound; gap_suspected event logged with reconcile delta so loss is visible |
| SSE disconnects / host down | backoff (opencode-sse pattern) with jitter added explicitly (§4); explicit connected/reconnecting/down state machine + last_seen_at in control_fleet; favorites-style "hide, never delete" for offline hosts |
| Snapshot/delta join race | per-host monotonic seq; client discards deltas ≤ snapshot seq |
| Perf-poller restart duplicates | watermark recovered from MAX(ts) in DB; UNIQUE (provider_id, ts) |
| Rollup crash double-count/loss | idempotent upsert; raw rows deleted only after the covering buckets commit, in chunked per-provider, per-hour transactions (§6) |
| Attribution silently NULL | no source column until P4; P4 solves both path blockers (server fetch wrapper + gateway forward) together with the migration |
| Sandbox escape from generated code | no-network, non-root, caps, tmpfs, --rm, labeled for orphan prune; bounded allSettled runner with finally-cleanup; gVisor as upgrade path. Residual risk accepted for single-user |
| LLM-judge bias/noise in chat evals | fixed rubrics, temperature 0, judge version pinned per run, pairwise via Arena for tie-breaks |
| Windows SSH fiddliness (P9 config edit) | pre-apply JSON-schema validation (config-schema.json lives in the fork), timestamped backups before every write, health-wait after restart; stackctl's flow is the reference but gets tests here |
| Orphaned auto:* sessions if gateway removed | resolver treats missing gateway provider as unhealthy-not-absent: clean error, no silent mis-route to LLAMA_SWAP_URL |
| 5s × 2 hosts perf polling forever | trivial volume (~35k rows/day raw), rolled up + pruned at 48h |
| Three applySchema callers race on restart | startup ordering guard: control waits for server-owned sessions table before applying schema |
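Two mitigations from the table, the jittered reconnect backoff and the gap_suspected no-overlap rule from §4, reduce to small pure functions. This is a sketch under assumptions: the base delay, the cap, and the function names are hypothetical, and the jitter is interpreted as an extra 0-50% added on top of the computed delay.

```typescript
// Exponential backoff with a cap, plus jitter of 0-50% of the computed delay.
function reconnectDelayMs(
  attempt: number,
  baseMs = 1000,   // assumed base delay
  capMs = 30_000,  // assumed cap
  random: () => number = Math.random,
): number {
  const computed = Math.min(capMs, baseMs * 2 ** attempt);
  return computed + computed * 0.5 * random();
}

type GapDetail = { oldestFetched: string; newestPersisted: string };

// No-overlap heuristic: if the oldest entry in the reconcile fetch is newer than
// the newest persisted entry, the ring wrapped past our tail.
function gapSuspected(oldestFetched: Date | null, newestPersisted: Date | null): GapDetail | null {
  if (oldestFetched === null || newestPersisted === null) return null; // nothing to compare
  if (oldestFetched.getTime() <= newestPersisted.getTime()) return null; // overlap: no gap
  return {
    oldestFetched: oldestFetched.toISOString(),
    newestPersisted: newestPersisted.toISOString(),
  };
}
```

The returned object is shaped to drop straight into the detail JSONB of a gap_suspected model event, carrying both timestamps.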