boocode/openspec/changes/boocontrol/design.md
indifferentketchup b18de2a331 chore: snapshot working tree - pty_exited notifications + in-flight inference WIP
feat(booterm): structured pty_exited WS notifications. Plan-validated, impl-validated, code-reviewed green (contracts build clean, contracts test 29/29, booterm + web typecheck clean).

wip: in-progress inference/provider refactor (agents.ts, provider.ts, new llama-providers.ts, removed llama-args-validator), plus arena, dispatcher, compaction, schema changes.

openspec: pty-exit-notifications complete; x-agent-flags planned (not yet implemented).
2026-06-14 12:48:47 +00:00

# BooControl — design
**Status:** ACCEPTED — decisions resolved 2026-06-11; architecture-analysis findings folded in; verification-pass fixes applied 2026-06-12 (chart lib decided: ECharts, §9). No open design items.
## 1. Topology
```
┌─ Tailscale mesh ──────────────────────────────────────────────────────────┐
│                                                                           │
│  sam-desktop     100.101.41.16   (Windows, RTX 5090 32GB)                 │
│    llama-swap v224 :8401 ─ /api/events SSE, /api/performance(GPU),        │
│    D:\llama-server (CUDA)  /api/metrics, /api/captures, /running,         │
│                            /logs/stream, POST /api/models/unload          │
│                                                                           │
│  embedding       100.90.172.55   (Linux, P104-100 8GB)                    │
│    llama-swap :8411 ─ same API surface; 39 small models, ttl 1800         │
│                                                                           │
│  ubuntu-homelab  100.114.205.53  (no GPU)                                 │
│    boocode container    :9500  (apps/server + apps/web)                   │
│    booterm container    :9501                                             │
│    boocoder host svc    :9502  (apps/coder)                               │
│    boocontrol host svc  :9503  (apps/control) ◄── NEW                     │
│    postgres             :5500  (boochat DB)                               │
└───────────────────────────────────────────────────────────────────────────┘
Browser ──WS/HTTP──► apps/server (/api/control/* proxy, WS relay)
          └────────► apps/control :9503
                       ├─ SSE client per provider (events)
                       ├─ pollers (/api/performance?after=, /running)
                       ├─ per-host action queue (warm/unload serialization)
                       ├─ bench + eval engines (manual v1)
                       ├─ ssh2 (P9 only: config edit + restart)
                       └─ Postgres (third schema owner, ordered startup)
```
Key fact that shapes everything: **the llama-swap fork exposes GPU/system telemetry, token metrics, request captures, and log streams over HTTP per instance** (`internal/perf/types.go` GpuStat/SysStat; `internal/server/apigroup.go`). The control service needs no agent on the GPU hosts. SSH is required only for config editing + service restart (P9).
Why a host service and not a container: SSH key handling (P9), spawning sandbox containers for code evals (talking to dockerd from inside a container is a privilege escalation we don't need), and parity with the boocoder operational pattern (systemd, `.env.host`, deploy via `pnpm -C packages/contracts build && pnpm -C apps/control build && sudo systemctl restart boocontrol`).
**There is no sidecar.** The llama-sidecar (:8402, per-agent flags) has been removed from the system entirely. No control-plane table, connector, or registry field references it.
## 2. Fleet identity = the provider registry (`LlamaProvider.id`)
The multi-provider batch introduces the shipped contract (`packages/contracts/src/llama-providers.ts`):
```ts
LlamaProviderSchema = { id, label, baseUrl, kind } // ids: "sam-desktop", "embedding"
```
BooControl keys every host-scoped row on **`provider_id` = `LlamaProvider.id`** — the field that actually exists and that `resolveModelProvider` already resolves by. (Earlier drafts said `provider_name` against a `{name, sidecarUrl?}` shape; that shape was never shipped.) Control-plane attributes extend the registry entry rather than inventing a parallel hosts table:
```
control_hosts
  provider_id TEXT PK  -- FK-by-convention to LlamaProvider.id ("sam-desktop", "embedding")
  ssh_host TEXT, ssh_user TEXT, ssh_key_path TEXT  -- nullable: no SSH = no config editing (P9)
  config_path TEXT  -- D:\llama-swap\config.yaml | ~/llama-swap/config.yaml (P9)
  restart_cmd TEXT  -- nssm/systemctl invocation (P9)
  os TEXT, gpu_label TEXT  -- display metadata
  enabled BOOLEAN DEFAULT true
Lesson imported from stackctl's worst bug: its machines table was dropped + re-seeded on every container rebuild, losing user-added hosts. `control_hosts` rows are durable; seeding is `INSERT ... ON CONFLICT DO NOTHING`.
## 3. Schema ownership + startup ordering (third schema owner)
`apps/control/src/schema.sql`, applied by `apps/control/src/db.ts:applySchema()` on boot — the coder precedent. Two hardening rules the coder precedent lacks:
1. **Startup ordering guard.** The coder schema holds real FKs into server-owned tables (`REFERENCES sessions(id)`, `chats(id)`); today the server-before-coder ordering is an accident of Docker-vs-host start timing. A third concurrent `applySchema` caller widens that race, so `apps/control` makes the ordering explicit:
```ts
// apps/control/src/index.ts — before applySchema()
await waitForTable(sql, 'sessions', 30_000); // poll information_schema; THROWS on timeout
await applySchema(sql);
```
"Fail loud" means **throw → process exits nonzero → systemd (`Restart=on-failure`) retries**. The guard is enforcing, not advisory: `applySchema` is never reached if the server schema is absent, so a partial-DDL state cannot occur.
(Control tables themselves currently take no FKs into server tables, but the guard costs one query and removes the timing dependency for any future FK.)
2. **Dedup is enforced by the database, not application checks.** Every ingest table whose dedup matters carries a UNIQUE constraint and is written with `INSERT ... ON CONFLICT DO NOTHING` — check-then-act application dedup is racy under concurrent SSE + reconcile writers (analysis C2/C7).
```
control_requests  -- persisted ActivityLogEntry stream (the thing llama-swap forgets on restart)
  id BIGSERIAL PK, provider_id TEXT, swap_entry_id INT,  -- llama-swap's ring id
  ts TIMESTAMPTZ, model TEXT, req_path TEXT, status_code INT,
  duration_ms INT, cache_tokens INT, input_tokens INT, output_tokens INT,
  prompt_tps REAL, gen_tps REAL, has_capture BOOLEAN,
  capture JSONB,  -- nullable; fetched-on-demand copy (req/resp, capped)
  UNIQUE (provider_id, swap_entry_id, ts)  -- survives ring-id reset; INSERT ... ON CONFLICT DO NOTHING
  -- NOTE: no `source` column in P1. The X-Boo-Source attribution column is added by the
  -- P4 migration, when injection actually works end-to-end (see §7). No NULL-forever rows.

control_perf_samples  -- raw SysStat+GpuStat, short retention (48h default)
  provider_id TEXT, ts TIMESTAMPTZ, gpu JSONB, sys JSONB,
  UNIQUE (provider_id, ts)  -- restart-safe: re-polled samples no-op

control_perf_rollup_5m  -- avg/max per 5min bucket, long retention (90d)
  provider_id TEXT, bucket TIMESTAMPTZ, gpu_agg JSONB, sys_agg JSONB,
  UNIQUE (provider_id, bucket)  -- rollup is an idempotent upsert (§6)

control_model_events  -- state transitions (stopped→starting→ready→stopping), swap durations
  provider_id, model, state, ts, detail JSONB,
  UNIQUE (provider_id, model, state, ts)  -- reconcile can re-deliver model status; same ON CONFLICT DO NOTHING discipline

bench_suites / bench_runs / bench_samples
  -- suite: {prompt_tokens[], gen_tokens[], concurrency[], repetitions}
  -- sample: per-request timings (ttft_ms, prompt_tps, gen_tps, total_ms) + run aggregates

eval_suites / eval_runs / eval_results
  -- suite: kind chat|code, tasks JSONB (prompt, reference, checker), judge_model
  -- result: per-task score, judge rationale / execution log, sandbox exit info

route_policies  -- P7: name, match rules JSONB, target ordering, fallback

control_reports  -- generated digests (markdown + JSONB stats)
  + schedule meta: {interval: 'daily'|'weekly', enabled, last_run_at TIMESTAMPTZ}
  -- driven by the SAME in-process timer pattern as the retention job (P6): hourly tick
  -- checks last_run_at vs interval, runs if due (catch-up on boot included). No cron dep,
  -- no new scheduler abstraction (S7 stays YAGNI-deferred; reopen trigger unchanged).
```
`clock_timestamp()` inside transactions per repo convention; JSONB via `sql.json(...)`.
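The startup ordering guard from rule 1 could look roughly like the sketch below. It is not the shipped `waitForTable`: to keep it runnable without a database, the `sql` handle is replaced by a hypothetical `TableProbe` callback (in practice it would wrap a query against `information_schema.tables`), and the `pollMs` parameter is an assumption.

```typescript
// Hypothetical probe: resolves true once `table` is visible to this connection.
// In apps/control it would presumably wrap an information_schema.tables lookup.
type TableProbe = (table: string) => Promise<boolean>;

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function waitForTable(
  probe: TableProbe,
  table: string,
  timeoutMs: number,
  pollMs = 500, // assumed poll interval, not taken from the design
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    if (await probe(table)) return; // table exists: safe to apply schema
    if (Date.now() + pollMs > deadline) {
      // Fail loud: the caller lets this propagate so the process exits nonzero
      // and systemd (Restart=on-failure) retries.
      throw new Error(`waitForTable: "${table}" not present after ${timeoutMs}ms`);
    }
    await sleep(pollMs);
  }
}
```

Because the function throws rather than returning a boolean, a caller cannot accidentally ignore the timeout and fall through to `applySchema`.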
## 4. Ingestion semantics
- **SSE consumer** per enabled host: `GET /api/events` → envelopes `modelStatus | logData | metrics | inflight`. Reconnect with backoff + jitter (reconnect/circuit-breaker pattern: `apps/coder/src/services/backends/opencode-sse.ts` — NOTE the source has exponential backoff + circuit breaker but NO jitter; add jitter explicitly here, random 0-50% of the computed delay, per plan finding V1/F3). On reconnect, reconcile via `GET /api/metrics` (full ring). Reconcile and live SSE may both insert the same entry concurrently — that is fine **because dedup is the DB UNIQUE constraint** (`ON CONFLICT DO NOTHING`), not a check-then-act. The dedup key `(provider_id, swap_entry_id, ts)` includes the timestamp because llama-swap's ring ids restart from 0 on its restart.
- **Known bound, accepted:** the ring holds 1000 entries. An outage longer than 1000 requests loses the overwritten tail permanently — log a `gap_suspected` model event so the loss is visible rather than silent. **Detection rule (no-overlap heuristic):** if the *oldest* entry in the reconcile fetch is newer than the newest already-persisted entry for that provider, the ring wrapped past our tail; emit `gap_suspected` with both timestamps in `detail`. Overlap present = no gap, no event.
- **Second accepted residual:** a genuinely-new post-restart entry whose `(swap_entry_id, ts)` exactly collides with a pre-restart row (same ring slot, same timestamp to llama-swap's `ts` precision) is silently dropped by the UNIQUE constraint. Window = one entry per restart at sub-precision coincidence; accepted, not solvable client-side without a content hash in the key.
- **Perf poller**: `GET /api/performance?after=<last-ts>` every 5s (llama-swap's own minimum collection interval). The watermark is recovered on restart from `MAX(ts)` per provider in `control_perf_samples` (not in-memory only); duplicate polls no-op on the UNIQUE constraint. **Cold start (`MAX(ts)` = NULL, fresh install):** omit `after` entirely and ingest whatever window the host returns — the UNIQUE constraint makes over-fetch harmless, and the next poll has a watermark.
- **Host liveness is explicit state, not absence of data.** Each connector runs a small state machine `connected | reconnecting | down` (down after N failed reconnects); transitions publish a `control_fleet` delta and stamp `control_hosts`-adjacent in-memory state with `last_seen_at`. A late-joining browser therefore sees `down + last_seen_at`, never a stale "ready" snapshot (analysis B3).
- **Snapshot/delta consistency.** The fleet state keeps a per-host monotonic `seq`, incremented on every mutation. The join snapshot carries the current `seq`s; every delta carries its `seq`. Client rule: **buffer (do not apply, do not discard) any delta that arrives before the snapshot**; after applying the snapshot, replay the buffer dropping deltas with `seq <=` the snapshot's per-host seq, and apply the filter to all subsequent deltas. On a single FIFO WS pre-snapshot deltas should not occur, but buffering makes the rule transport-independent. This closes the join race where a delta arrives during snapshot serialization (analysis B4).
- **Logs are not persisted** by default (volume + low value at rest); they relay live SSE → WS with an in-memory tail buffer (last ~2k lines per host) for late joiners. Optional "record to file" toggle later.
- **Fan-out to browser**: the control service publishes over its own WS (`/api/ws/control`), relayed by apps/server's proxy as `/api/control/ws`. This is a **second app-level WS connection** in the browser — `useControlStream` gets its own singleton guard + context; it does NOT share `useUserEvents`' `/api/ws/user` channel. Frames (added to `packages/contracts/src/ws-frames.ts` **first**, then the server loose union, then the web strict union — and the contracts drift test extended to cover them, so a partial edit fails the suite):
- `control_fleet` — full snapshot on join + seq-stamped state deltas (hosts, liveness, models, states, ttl deadlines, inflight)
- `control_activity` — new request rows (the live feed)
- `control_perf` — appended samples per host
- `control_log` — `{provider_id, source: proxy|upstream, line}` batches
- `control_job` — bench/eval run progress events
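The snapshot/delta client rule described above (buffer before the snapshot, then filter by per-host `seq`) can be sketched as follows. The frame shapes (`providerId`, `patch`, `hosts`) are hypothetical stand-ins for whatever `ws-frames.ts` ends up defining; only the buffering and `seq <=` filtering logic is taken from the design.

```typescript
// Hypothetical frame shapes; the real ones live in packages/contracts.
interface FleetDelta {
  providerId: string;
  seq: number;
  patch: Record<string, unknown>;
}
interface FleetSnapshot {
  hosts: Record<string, { seq: number; state: Record<string, unknown> }>;
}

class FleetStateClient {
  // null until the join snapshot has been applied
  private snapshotSeq: Map<string, number> | null = null;
  private buffer: FleetDelta[] = [];
  readonly hosts = new Map<string, Record<string, unknown>>();

  onSnapshot(snap: FleetSnapshot): void {
    this.hosts.clear();
    const seqs = new Map<string, number>();
    for (const [id, host] of Object.entries(snap.hosts)) {
      this.hosts.set(id, { ...host.state });
      seqs.set(id, host.seq);
    }
    this.snapshotSeq = seqs;
    // Replay everything that arrived early; onDelta now applies the filter.
    const pending = this.buffer;
    this.buffer = [];
    for (const delta of pending) this.onDelta(delta);
  }

  onDelta(delta: FleetDelta): void {
    if (this.snapshotSeq === null) {
      this.buffer.push(delta); // buffer: do not apply, do not discard
      return;
    }
    const floor = this.snapshotSeq.get(delta.providerId) ?? -1;
    if (delta.seq <= floor) return; // already reflected in the snapshot
    const current = this.hosts.get(delta.providerId) ?? {};
    this.hosts.set(delta.providerId, { ...current, ...delta.patch });
  }
}
```

The filter compares against the snapshot's seq only, as the rule states; tracking the last applied seq per host to also drop duplicate deltas would be a possible extension.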
## 5. Actions
| Action | Mechanism |
|---|---|
| Warm/load model | 1-token `POST /v1/chat/completions` with the bare wire ID (stackctl-proven; llama-swap loads on demand — there is no load endpoint) |
| Unload one/all | `POST /api/models/unload/:model` / `/api/models/unload` |
| Inspect request | `GET /api/captures/:id` on the host, decode base64, persist trimmed copy, render |
| Bench/eval runs | engines below (manual v1) |
| Edit config / restart llama-swap | P9 (SFTP + schema validation + diff + timestamped backup + restart + health-wait) |
**Per-host action queue.** All host-mutating actions (warm, unload, bench warm-up) from BooControl serialize through a single FIFO queue per `provider_id` inside the control service — double-clicks, warm-during-warm, and unload-during-bench from *this* service cannot interleave (analysis C3). An unload request while a bench run holds the host is rejected with a "bench in progress — takeover?" confirmation. Queue discipline (verification C-N1): **submissions are rejected immediately while the host's liveness state is `down`** ("host offline" toast); queue depth is capped (4) with reject-on-full; each action **re-checks liveness on dequeue and skips itself if stale** — a recovered host never replays a backlog of stale warms. (Pattern precedent: `arena-runner.ts` `advanceChain` promise-chain, plus its read-fresh-state-or-skip discipline.) This serializes BooControl's own hands only; BooChat/BooCoder/Arena traffic is uncoordinated until P8.
All mutating actions publish `control_job`/`control_fleet` frames; UI handlers stay idempotent (event-dedup discipline per CLAUDE.md — no local emit after API call).
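A minimal sketch of the per-host queue discipline, assuming a promise-chain FIFO like the `advanceChain` precedent. `HostView` and its `downCount()` counter are hypothetical: the design only says actions "skip if stale", and this sketch interprets stale as "the host is down now, or went down at least once since the action was submitted". The bench takeover confirmation is not modeled.

```typescript
type Liveness = "connected" | "reconnecting" | "down";

// Hypothetical view of the connector state for one provider_id.
interface HostView {
  liveness(): Liveness;
  downCount(): number; // assumed counter, bumped on every transition to "down"
}

type ActionResult<T> = { status: "done"; value: T } | { status: "skipped" };

class HostActionQueue {
  private tail: Promise<unknown> = Promise.resolve();
  private depth = 0;
  private readonly host: HostView;
  private readonly maxDepth: number;

  constructor(host: HostView, maxDepth = 4) {
    this.host = host;
    this.maxDepth = maxDepth;
  }

  submit<T>(action: () => Promise<T>): Promise<ActionResult<T>> {
    // Reject immediately while the host is down or the queue is full.
    if (this.host.liveness() === "down") return Promise.reject(new Error("host offline"));
    if (this.depth >= this.maxDepth) return Promise.reject(new Error("action queue full"));
    this.depth++;
    const downsAtSubmit = this.host.downCount();
    const run = this.tail.then(async (): Promise<ActionResult<T>> => {
      try {
        // Re-check on dequeue so a recovered host never replays stale work.
        const stale =
          this.host.liveness() === "down" || this.host.downCount() !== downsAtSubmit;
        if (stale) return { status: "skipped" };
        return { status: "done", value: await action() };
      } finally {
        this.depth--;
      }
    });
    this.tail = run.catch(() => undefined); // a failed action must not wedge the chain
    return run;
  }
}
```

One queue instance per `provider_id` gives the serialization described above; actions on different hosts still run in parallel.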
**Manual op checklist (P2.5):** Before the capture inspector works end-to-end, enable `captureBuffer` and review `metricsMaxInMemory` on both hosts' llama-swap configs. These are per-host settings in `config.yaml` and must be set before captures will be available:
- [ ] sam-desktop: set `captureBuffer: true` and verify `metricsMaxInMemory` (default 1000, sufficient for most workloads)
- [ ] embedding: set `captureBuffer: true` and verify `metricsMaxInMemory`
- [ ] Restart llama-swap on both hosts after config changes
## 6. Retention (ships in the same P1 slice as ingestion)
Daily job, crash-safe by construction:
1. **Rollup is an idempotent upsert**: `INSERT INTO control_perf_rollup_5m ... ON CONFLICT (provider_id, bucket) DO UPDATE` recomputed from raw — a re-run after a crash recomputes the same buckets, never double-counts.
2. **Delete raw only after the covering buckets are committed**, in **chunked transactions: one transaction per provider per 1-hour window** (≤720 rows each), never one 48h mega-transaction — bounds lock hold time so the live 5s poller's inserts into the same table never queue behind a multi-second aggregate+delete (verification C-N2). A crash between chunks leaves whole-hour windows either fully migrated or fully raw; the next run recomputes idempotently.
3. Activity > 90d pruned; captures capped per-row (256KB) and pruned by total budget. All windows configurable via `.env.host`.
Retention is a **P1 task in the same slice as ingestion**, not a fast-follow — the bloat window between "ingestion starts" and "retention exists" degrades the shared DB that serves all of BooChat (analysis R3).
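The chunked rollup-then-delete loop from step 2 might be structured like this. `RetentionTx` and `RunInTx` are hypothetical seams standing in for the real SQL (an `ON CONFLICT (provider_id, bucket) DO UPDATE` upsert followed by a raw delete); the sketch only shows how whole-hour windows older than the 48h cutoff are enumerated and each handled in its own transaction.

```typescript
const HOUR_MS = 3_600_000;

interface HourWindow {
  startMs: number;
  endMs: number; // exclusive
}

// Whole-hour windows that lie entirely before the retention cutoff.
function expiredHourWindows(
  oldestMs: number,
  nowMs: number,
  retentionMs = 48 * HOUR_MS,
): HourWindow[] {
  const cutoff = Math.floor((nowMs - retentionMs) / HOUR_MS) * HOUR_MS;
  const windows: HourWindow[] = [];
  const first = Math.floor(oldestMs / HOUR_MS) * HOUR_MS;
  for (let start = first; start + HOUR_MS <= cutoff; start += HOUR_MS) {
    windows.push({ startMs: start, endMs: start + HOUR_MS });
  }
  return windows;
}

// Hypothetical storage seam; a real implementation would issue SQL here.
interface RetentionTx {
  upsertRollup(providerId: string, w: HourWindow): Promise<void>; // idempotent upsert
  deleteRaw(providerId: string, w: HourWindow): Promise<void>;
}
type RunInTx = (fn: (tx: RetentionTx) => Promise<void>) => Promise<void>;

async function retainProvider(
  providerId: string,
  oldestMs: number,
  nowMs: number,
  runInTx: RunInTx,
): Promise<number> {
  const windows = expiredHourWindows(oldestMs, nowMs);
  for (const w of windows) {
    // One transaction per provider per hour: rollup commits together with the
    // delete, so a crash leaves the window fully migrated or fully raw.
    await runInTx(async (tx) => {
      await tx.upsertRollup(providerId, w);
      await tx.deleteRaw(providerId, w);
    });
  }
  return windows.length;
}
```

At a 5s poll interval each window covers at most 3600 / 5 = 720 raw rows, which is where the bound quoted in step 2 comes from.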
## 7. Attribution (X-Boo-Source) — own phase (P4), two blockers solved together
The naive plan ("inject a header, small touch") is blocked on both inference paths:
- **apps/server (BooChat streaming)**: `getSwapProvider()` caches `createOpenAICompatible` instances by `baseURL` in `swapCache`; headers are provider-level, baked at construction. Fix: a per-turn **fetch wrapper** — thread the source label through the call site and pass a wrapping `fetch` that injects `X-Boo-Source` (cache keyed by `baseURL+source` since the label set is tiny: `boochat|boocoder|arena|control-bench|control-eval`). **Interface constraint (verification S-N2):** `getSwapProvider` is private (fan-in 1), but the label must travel through the exported `upstreamModel`, whose file has a 28-file/13-route blast radius — the change MUST be additive (`upstreamModel(config, modelId, agent?, source?)` or an options object with optional `source`), never a breaking signature change; all existing call sites compile unchanged. The direct-fetch paths (`compaction.ts`, `task-model.ts`) just extend their existing headers object.
- **apps/coder (opencode local gateway)**: `local-gateway.ts` builds a fresh headers object and silently strips inbound `X-Boo-Source`. Fix: forward it explicitly when present. Arena/dispatch direct paths set it at their own fetch sites.
P4 lands: both fixes + the `control_requests.source` column migration + the `source` filter in the Activity UI. llama-swap's header capture (`captureBuffer`) must be enabled on the hosts first (the P2.5 manual op checklist in §5). Acceptance: a BooChat turn, a BooCoder dispatch, and an Arena battle each show their own label in the Activity feed; nothing shows NULL except genuinely external traffic.
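As an illustration of the per-turn fetch wrapper, here is a minimal sketch built only on the standard Fetch API (`Headers`, `Request`). The names `withSource` and `providerCacheKey` are hypothetical; the label set is the one listed above. How the wrapped function is handed to the provider factory is left out, since that depends on the real `getSwapProvider` code.

```typescript
type SourceLabel = "boochat" | "boocoder" | "arena" | "control-bench" | "control-eval";

// Wraps a fetch implementation so every request carries X-Boo-Source.
function withSource(base: typeof fetch, source: SourceLabel): typeof fetch {
  return (input, init) => {
    // Keep headers from init, or from a Request object when init has none,
    // because passing init.headers replaces the Request's own headers.
    const inherited = input instanceof Request ? input.headers : undefined;
    const headers = new Headers(init?.headers ?? inherited);
    headers.set("X-Boo-Source", source);
    return base(input, { ...init, headers });
  };
}

// Cache key for provider instances: one per (baseURL, source) pair.
function providerCacheKey(baseURL: string, source: SourceLabel): string {
  return `${baseURL}|${source}`;
}
```

Because the label set has five members, keying the cache by `baseURL+source` multiplies the number of cached provider instances by at most five per host.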
### Implementation notes
**P6.2 schedule meta lives in its own table, not on `control_reports`.** §3 sketched `control_reports + schedule meta: {interval, enabled, last_run_at}`. In implementation the scheduler state was split into a dedicated single-row `control_schedule_meta` table (keyed by schedule `name`, seeded `report-digest`) so generated `control_reports` rows stay immutable snapshots and the boot catch-up reads/writes one well-known row instead of scanning report history for the latest `last_run_at`. The retention-style hourly tick (`runReportSchedulerTick`) and the `{interval, enabled, last_run_at}` contract are unchanged.
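The due-check that an hourly tick such as `runReportSchedulerTick` performs against the `{interval, enabled, last_run_at}` contract could be as small as the sketch below. Two assumptions are mine rather than the design's: a schedule that has never run counts as due, and intervals are fixed-length periods (24h, 7d) rather than calendar boundaries.

```typescript
type ReportInterval = "daily" | "weekly";

interface ScheduleMeta {
  interval: ReportInterval;
  enabled: boolean;
  lastRunAt: Date | null; // null = never run (assumed to mean "due")
}

const PERIOD_MS: Record<ReportInterval, number> = {
  daily: 24 * 3_600_000,
  weekly: 7 * 24 * 3_600_000,
};

// True when the schedule is enabled and a full period has elapsed since the
// last run. Calling this on boot gives the catch-up behaviour for free.
function isDue(meta: ScheduleMeta, now: Date): boolean {
  if (!meta.enabled) return false;
  if (meta.lastRunAt === null) return true;
  return now.getTime() - meta.lastRunAt.getTime() >= PERIOD_MS[meta.interval];
}
```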
**P7 gateway identity.** The gateway registers as provider id `auto` (kind `boocontrol-gateway`); its virtual models are `auto`, `auto:code`, `auto:fast`, `auto:cheap`, so BooChat composite ids are `auto/auto:code` etc. and the wire model sent to the gateway is the bare virtual token. `getModelContext` reads `n_ctx` from the gateway's own `/upstream/<virtual>/props`, which proxies the first healthy candidate's props. The gateway is reached server-to-server via the registry baseUrl (not the `/api/control` proxy, which buffers responses and would break streaming).
**P7 orphan detection.** An orphaned auto:* session is detected two ways: by registry `kind === 'boocontrol-gateway'` when the gateway is present (→ `gateway`), and by the virtual-model token shape (`auto` / `auto:*`) when the provider is absent (→ `gateway_error`, reason `offline`). The unknown-composite-provider swap fallback is overridden only for that token shape; all other unknown composites keep their existing best-effort swap behavior.
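The two-way detection can be expressed as one small classifier. This is a sketch, not the `provider.ts` code: `classifyGatewaySession`, the `GatewayCheck` union, and splitting the composite id on the first `/` are assumptions; the `kind` value and the `auto` / `auto:*` token shape come from the note above.

```typescript
interface RegistryEntry {
  id: string;
  kind: string;
}

type GatewayCheck =
  | { route: "gateway"; providerId: string }
  | { route: "gateway_error"; reason: "offline" }
  | { route: "unchanged" }; // fall through to the existing resolver behaviour

// Virtual-model token shape: "auto" or "auto:<something>".
const isVirtualAutoModel = (model: string): boolean =>
  model === "auto" || /^auto:.+/.test(model);

function classifyGatewaySession(
  compositeId: string,
  registry: readonly RegistryEntry[],
): GatewayCheck {
  const slash = compositeId.indexOf("/");
  if (slash < 0) return { route: "unchanged" };
  const providerId = compositeId.slice(0, slash);
  const model = compositeId.slice(slash + 1);
  const entry = registry.find((p) => p.id === providerId);
  // Gateway present in the registry: detect by kind.
  if (entry?.kind === "boocontrol-gateway") return { route: "gateway", providerId };
  // Provider absent: only the auto / auto:* shape overrides the swap fallback.
  if (!entry && isVirtualAutoModel(model)) return { route: "gateway_error", reason: "offline" };
  return { route: "unchanged" };
}
```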
**P9.1 uses shelled `ssh`, not an ssh2/SFTP library.** §5 and the P9 task say "SFTP read ... SFTP write". Implementation shells out to the system `ssh` (`cat <path>` to read, `cp` for the timestamped backup, `cat > <path>` over stdin to write, the configured `restart_cmd` to restart) with an explicit `-i <key> -o IdentitiesOnly=yes -o BatchMode=yes`. This matches the established booterm SSH-via-shell precedent and the Gitea deploy-key lesson (never offer the agent's default key), and avoids adding an `ssh2` native dependency. The exec is injected (`SshExec`) so every failure path (unreadable host, backup fail, write fail, restart fail, health never recovers) is unit-tested without a live host. The fork `config-schema.json` is bundled at `apps/control/data/config-schema.json` and validated with ajv (added as a control dependency). Backup always precedes write, so a failed write leaves the timestamped backup intact. Not live-smoked: there is no reachable Windows SSH target in the implementation session (the documented "Windows SSH fiddliness" risk); the failure-path suite is the standing verification.
**ActivityLogEntry does not carry request headers.** The llama-swap fork's `ActivityLogEntry` struct (`internal/server/metrics.go`) contains `ID`, `Timestamp`, `Model`, `ReqPath`, `RespContentType`, `RespStatusCode`, `Tokens`, `DurationMs`, `HasCapture` -- no `source` field and no request headers. The `X-Boo-Source` header IS captured in `ReqRespCapture.ReqHeaders` (`internal/server/captures.go`), but captures are stored separately in a zstd-compressed cache and fetched on-demand via `GET /api/captures/:id`, not in the metrics ring.
Therefore the `control_requests.source` column is NULL for ring-ingested data. The column exists for: (1) future llama-swap versions that may add source to ActivityLogEntry, (2) manual backfill from captures, (3) non-ring sources (bench/eval direct calls that set source explicitly). The metrics ingest mapper writes NULL for source, matching what the ring provides.
## 8. Benchmark, eval, routing
### Speed bench (P3 — manual, safe-by-construction)
- HTTP-level, through llama-swap (measures what BooChat actually experiences) with llama.cpp `timings` (`prompt_per_second`, `predicted_per_second`, `cache_n`) parsed from the final stream chunk; TTFT measured client-side at first delta.
- Suite = grid of (prompt_len × gen_len × concurrency) × N repetitions; warmup excluded; results as aggregates + raw samples. Runner fan-out is **bounded** (suite-declared concurrency only, `Promise.allSettled`, never unbounded `Promise.all`).
- **v1 safety model**: every run is user-initiated with an explicit takeover confirmation when the target host shows recent traffic; embedding-host-first defaults. The `inflight==0` check is a *courtesy gate*, not a guarantee — BooChat/BooCoder/Arena can race it (TOCTOU, four uncoordinated writers). v1 accepts this because a human clicked "run"; **unattended scheduling is explicitly deferred to P8** (fleet lease). Bench results note `concurrent_foreign_requests` observed during the run (from the activity stream) so polluted runs are flagged, not silently trusted.
- Baselines + regression: each (provider_id, model) keeps a baseline aggregate; a new run whose aggregates deviate from the baseline by more than a threshold (e.g. gen tok/s off by 10%) is flagged, and the flag surfaces in Reports and as a fleet-card badge.
- Later: `llama-bench` over SSH for device-level (no-server) numbers, JSON output ingested alongside (P9, with the SSH plumbing).
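To make the suite grid and the baseline comparison concrete, here is a small sketch. The `BenchSuite` fields mirror the suite shape in §3; `BenchCell`, `expandSuite`, and `compareToBaseline` are hypothetical names, and treating the threshold as a symmetric relative delta on gen tok/s is an assumption.

```typescript
interface BenchSuite {
  prompt_tokens: number[];
  gen_tokens: number[];
  concurrency: number[];
  repetitions: number;
}

interface BenchCell {
  promptTokens: number;
  genTokens: number;
  concurrency: number;
  repetition: number; // 1-based
}

// Expands (prompt_len x gen_len x concurrency) x N repetitions into a flat list.
function expandSuite(suite: BenchSuite): BenchCell[] {
  const cells: BenchCell[] = [];
  for (const promptTokens of suite.prompt_tokens) {
    for (const genTokens of suite.gen_tokens) {
      for (const concurrency of suite.concurrency) {
        for (let repetition = 1; repetition <= suite.repetitions; repetition++) {
          cells.push({ promptTokens, genTokens, concurrency, repetition });
        }
      }
    }
  }
  return cells;
}

interface BaselineComparison {
  delta: number; // relative change vs baseline, e.g. -0.15 = 15% slower
  flagged: boolean;
}

function compareToBaseline(
  baselineGenTps: number,
  runGenTps: number,
  threshold = 0.1, // assumed default, matching the 10% example
): BaselineComparison {
  if (baselineGenTps <= 0) return { delta: 0, flagged: false }; // no usable baseline
  const delta = (runGenTps - baselineGenTps) / baselineGenTps;
  return { delta, flagged: Math.abs(delta) > threshold };
}
```

A 2 × 1 × 2 grid with 3 repetitions expands to 12 cells; warmup requests would be issued separately and excluded from the samples.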
### Quality evals (P5)
- **Suite program** (decided 2026-06-12): four suites measuring Sam's real workloads, in priority order — (1) **agent coding tasks** (TS/code-edit tasks like BooCoder dispatches, sandboxed pass@1), (2) **chat assistant quality** (judge rubrics), (3) **long-context retrieval** (needle/doc-QA for file-heavy sessions), (4) **utility calls** (titles/summaries/compaction — directly tunes the `FAST_MODEL` choice).
- **Chat**: suite of curated prompts (data/ YAML, editable) scored by LLM-as-judge (rubric single-answer grading, MT-bench style; temperature 0, judge model + version pinned per run). Judge = strongest local model by default. Pairwise comparisons delegate to **Arena** (exists in apps/coder) — BooControl links/launches battles rather than re-implementing.
- **Code**: HumanEval+/MBPP+-style tasks, executed in ephemeral sandbox containers on the homelab: `--network none`, non-root, mem/cpu/time caps, tmpfs workdir, `--rm`, kill-on-timeout, and a `boocontrol-eval` label so orphans are findable (`docker ps --filter label=...`) and pruned at engine start. Runner: **bounded concurrency** (default 4), `Promise.allSettled`, per-task `finally` cleanup — a single task failure never abandons in-flight containers (analysis C5; the CLAUDE.md child-supervisor lesson applies). `/opt/forks/openevals` is the reference implementation to borrow patterns from (TS).
- Scorecards: per (provider_id, model, quant) leaderboard with speed × quality scatter — "is the Q4 actually worse for my use?" answered with my own suite, on my own hardware.
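The bounded runner described for code evals (bounded concurrency, `Promise.allSettled`, per-task `finally` cleanup) can be sketched generically. `runBounded` is a hypothetical helper; in the real engine `run` would start a sandbox container for one task and `cleanup` would make sure that container is gone.

```typescript
async function runBounded<T, R>(
  tasks: readonly T[],
  limit: number,
  run: (task: T) => Promise<R>,
  cleanup: (task: T) => Promise<void>,
): Promise<PromiseSettledResult<R>[]> {
  const results: PromiseSettledResult<R>[] = new Array(tasks.length);
  let next = 0;

  // Each worker pulls the next unclaimed task until none remain.
  const worker = async (): Promise<void> => {
    while (next < tasks.length) {
      const i = next++;
      const task = tasks[i] as T;
      try {
        results[i] = { status: "fulfilled", value: await run(task) };
      } catch (reason) {
        results[i] = { status: "rejected", reason };
      } finally {
        // Cleanup runs for every task; its own failure is swallowed so one
        // bad cleanup cannot abandon the remaining work.
        await cleanup(task).catch(() => undefined);
      }
    }
  };

  const workerCount = Math.max(1, Math.min(limit, tasks.length));
  await Promise.allSettled(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

A failing task becomes a `rejected` entry in its slot while the other workers keep draining the list, which is the "single task failure never abandons in-flight containers" property.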
### Routing (P6 advisory → P7 live gateway, committed)
- **P6 — advisory**: routing scores (eval results + live latency + host health) exposed via API; the model picker badges "best code model right now".
- **P7 — gateway**: control service exposes OpenAI-compatible virtual models (`auto`, `auto:code`, `auto:fast`, `auto:cheap`) implementing policy: rule match → candidate ordering → health/ctx-fit filter → dispatch with failover. BooChat adopts by adding a registry entry (`{id: "auto", baseUrl: "http://100.114.205.53:9503", kind: "boocontrol-gateway"}`) — zero inference-path changes elsewhere. Frontier providers slot in as policy targets when added to the registry.
- **Orphaned-session handling (explicit — REQUIRES a `provider.ts` code change, verification S-N1/B-N3)**: today `resolveModelProvider` silently falls back to `LLAMA_SWAP_URL` for any composite id with an unknown provider ("best-effort fallback, config incomplete" branch) — exactly the mis-route this section forbids. P7 must (a) extend the `InferenceRoute` union (currently `'swap' | 'deepseek'`) with a `'gateway'` variant (and an unhealthy/error representation), and (b) change the unknown-provider fallback so a known-`kind` gateway id that is missing/disabled resolves to a clean "routing gateway offline" error, never the swap fallback. All **5 callers** of `resolveModelProvider` must be audited for the new variant: `getModelContext`, `invalidateModelContext` (model-context.ts), `resolveRoute`, `upstreamModel`, `resolveModelEndpoint` (provider.ts). The session keeps its id, the picker flags it. Gateway-dispatched requests carry `X-Boo-Source` through to the target host so attribution survives the extra hop.
- llama-swap `peers` could federate hosts at the proxy layer instead, but was rejected for the same reasons as the provider-registry research rejected it (flat list, coupled uptime, silent ID collisions).
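The P7 policy pipeline (rule match → candidate ordering → health/ctx-fit filter → dispatch with failover) might be decomposed as below. All type and function names are hypothetical, as is the `"provider/model"` target key format; the sketch only shows the order of the steps.

```typescript
interface Candidate {
  providerId: string;
  model: string;
  healthy: boolean;
  nCtx: number;
}

interface RoutePolicy {
  name: string;
  matches(virtualModel: string): boolean; // rule match
  targets: string[]; // "provider/model" keys, best first
}

// Rule match, then policy ordering, then health and context-fit filtering.
function orderCandidates(
  virtualModel: string,
  policies: readonly RoutePolicy[],
  candidates: readonly Candidate[],
  neededCtx: number,
): Candidate[] {
  const policy = policies.find((p) => p.matches(virtualModel));
  if (!policy) return [];
  const byKey = new Map(candidates.map((c) => [`${c.providerId}/${c.model}`, c] as const));
  return policy.targets
    .map((key) => byKey.get(key))
    .filter((c): c is Candidate => c !== undefined && c.healthy && c.nCtx >= neededCtx);
}

// Tries candidates in order; the first success wins.
async function dispatchWithFailover<R>(
  ordered: readonly Candidate[],
  send: (c: Candidate) => Promise<R>,
): Promise<R> {
  const errors: unknown[] = [];
  for (const candidate of ordered) {
    try {
      return await send(candidate);
    } catch (err) {
      errors.push(err); // remember why this candidate failed, try the next
    }
  }
  throw new Error(`routing gateway: all ${ordered.length} candidates failed (${errors.length} errors)`);
}
```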
### Fleet coordination lease (P8 — cross-service)
The proper fix for the four-writer TOCTOU: a per-host advisory lease in the shared DB (`control_host_leases`: holder, purpose, expires_at, heartbeat) that BooControl's scheduler *requires* and BooChat/BooCoder/Arena *honor* (check-before-dispatch, or queue behind an exclusive bench lease). This touches all four services and is therefore its own batch with its own design pass. **The P3 seam is a named function, not a convention** (verification C1'): the bench runner gates every run through `acquireHostAccess(providerId, purpose): Promise<HostGrant>` — the v1 implementation is the courtesy check (inflight==0 + takeover confirmation); P8 swaps its body for the lease without touching the bench engine. P3 implementers must NOT inline the inflight check in the runner. Unattended/scheduled benches and reproducible concurrency sweeps unlock here.
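A sketch of the named seam, assuming a `HostGrant` shape that the design does not spell out (the `release()` method and the injected `inflight` / `confirmTakeover` dependencies are hypothetical). The point is that the bench engine only ever sees the `AcquireHostAccess` function type, so P8 can replace `courtesyGate` with a lease-backed implementation of the same signature.

```typescript
// Assumed grant shape; only the acquireHostAccess signature is from the design.
interface HostGrant {
  providerId: string;
  purpose: string;
  release(): Promise<void>;
}

type AcquireHostAccess = (providerId: string, purpose: string) => Promise<HostGrant>;

// Hypothetical dependencies of the v1 courtesy check.
interface CourtesyDeps {
  inflight(providerId: string): Promise<number>;
  confirmTakeover(providerId: string): Promise<boolean>;
}

// v1: inflight == 0 passes straight through; otherwise ask the user.
// This is a courtesy gate, not a guarantee: other services can still race it.
function courtesyGate(deps: CourtesyDeps): AcquireHostAccess {
  return async (providerId, purpose) => {
    const busy = (await deps.inflight(providerId)) > 0;
    if (busy && !(await deps.confirmTakeover(providerId))) {
      throw new Error(`host ${providerId} busy and takeover declined`);
    }
    return { providerId, purpose, release: async () => undefined };
  };
}
```

The bench runner would receive an `AcquireHostAccess` at construction time and call it before every run, never reading `inflight` itself.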
## 9. UI design direction
Route `/control`, nav entry under Memory (ProjectSidebar bottom cluster). Sub-views as tabs within the page: **Fleet · Activity · Logs · Models · Bench · Evals · Reports**.
- **Aesthetic**: dark mission-control. Host cards as instrument clusters: VRAM arc gauge, GPU temp/power readouts, model chips with state glow (amber pulse `starting`, green steady `ready`, red `error`, grey `down` with last-seen), TTL countdown rings. Orbitron (already in the font pipeline) for numerals only; Inter for prose; JetBrains Mono for logs/JSON.
- **Motion**: framer-motion (already a dep) — spring layout transitions on model chips during swaps, count-up tweens on token totals, animated activity-feed inserts. Respect `prefers-reduced-motion`.
- **Charts**: **ECharts** (decided 2026-06-12). Gauges, scatter, heatmaps built in — covers the VRAM arcs, speed×quality scatter, and perf timelines from one lib; dark-theme native; 5s streaming append handled via `appendData`/`setOption`. The <100KB preference is consciously traded for batteries-included breadth; import per-chart modules (`echarts/core` + needed renderers) to keep the bundle sane.
- **Logs**: react-virtuoso tail-follow viewer (already a dep), per-source filter (proxy/upstream/model), pause-on-scroll.
- **Inspector**: activity table (virtuoso) → capture drawer: headers table + shiki-highlighted JSON bodies + "Open in Playground" replay.
- **Playground**: param-tweakable single-model chat + A/B compare; "Battle in Arena" handoff for full cross-examination.
- Skills to drive the build pass: `frontend-design` (aesthetic direction), `ui-ux-pro-max` (dashboard/chart patterns), `frontend-ui-engineering` (production quality), existing theme tokens (oklch palettes) so BooControl follows the active theme.
## 10. Risks
| Risk | Mitigation |
|---|---|
| PG bloat from time-series + captures | raw/rollup split; **retention job ships in the same P1 slice as ingestion**; UNIQUE constraints prevent restart-duplication inflation; capture size caps; measured in Reports (P7) |
| Bench/eval evicts a model in active use | v1: manual runs + takeover confirmation + embedding-first + per-host action queue. Honest limit: `inflight==0` is a courtesy gate (TOCTOU vs 3 other writers). Real fix is the P8 lease |
| llama-swap ring-id reset breaks dedup | DB UNIQUE on (provider_id, swap_entry_id, ts) + ON CONFLICT DO NOTHING — enforced at insert, not check-then-act |
| Ring wraps during long outage | accepted bound; `gap_suspected` event logged with reconcile delta so loss is visible |
| SSE disconnects / host down | exponential backoff + circuit breaker (opencode-sse pattern) with jitter added explicitly (§4); explicit connected/reconnecting/down state machine + last_seen_at in control_fleet; favorites-style "hide, never delete" for offline hosts |
| Snapshot/delta join race | per-host monotonic seq; client buffers pre-snapshot deltas, then drops any delta with seq ≤ the snapshot's per-host seq |
| Perf-poller restart duplicates | watermark recovered from MAX(ts) in DB; UNIQUE (provider_id, ts) |
| Rollup crash double-count/loss | idempotent upsert; raw rows deleted only after the covering buckets commit, in chunked transactions (one per provider per 1-hour window, §6) |
| Attribution silently NULL | no source column until P4; P4 solves both path blockers (server fetch wrapper + gateway forward) together with the migration |
| Sandbox escape from generated code | no-network, non-root, caps, tmpfs, --rm, labeled for orphan prune; bounded allSettled runner with finally-cleanup; gVisor as upgrade path. Residual risk accepted for single-user |
| LLM-judge bias/noise in chat evals | fixed rubrics, temperature 0, judge version pinned per run, pairwise via Arena for tie-breaks |
| Windows SSH fiddliness (P9 config edit) | pre-apply JSON-schema validation (config-schema.json lives in the fork), timestamped backups before every write, health-wait after restart; stackctl's flow is the reference but gets tests here |
| Orphaned `auto:*` sessions if gateway removed | resolver treats missing gateway provider as unhealthy-not-absent: clean error, no silent mis-route to LLAMA_SWAP_URL |
| 5s × 2 hosts perf polling forever | trivial volume (~35k rows/day raw), rolled up + pruned at 48h |
| Three applySchema callers race on restart | startup ordering guard: control waits for server-owned `sessions` table before applying schema |