boocode/openspec/changes/boocontrol/tasks.md

# BooControl — tasks

**Status:** READY (decisions resolved 2026-06-11). Gate: P0 must be **committed and reviewed** before P1 starts. Each phase is a vertical slice with a demo; the whole idea ships eventually — P1→P3 are the cockpit, P4→P7 are intelligence, P8→P9 are coordination + remote hands.

## P0 — prerequisite gate (separate batch: multi-llama-swap provider registry)
- [ ] Finish remaining tasks in `openspec/changes/multi-llama-swap-providers-model-favorites/tasks.md`: favorites hide-not-delete UI/route tests; smoke test sam-desktop + embedding (+ DeepSeek config); opencode duplicate-name routing smoke if in scope.
- [ ] Sam reviews and **commits** the batch (currently working-tree only). BooControl keys on `LlamaProvider.id` — the committed contract is the foundation.

## P1 — read-only cockpit
**Demo: watch both hosts live (models, swaps, VRAM/temp, request feed) while chatting.**
- [ ] Scaffold `apps/control`: Fastify, TS NodeNext, `.env.example`/`.env.host`, port 9503, `/api/health`, systemd unit `boocontrol.service`, deploy docs in root CLAUDE.md.
- [ ] `db.ts` with `applySchema` + **startup ordering guard** (`waitForTable(sql, 'sessions')` before DDL — design §3).
- [ ] `schema.sql`: `control_hosts` seed (sam-desktop, embedding) `ON CONFLICT DO NOTHING`; `control_requests` (NO source column — that's P4) with `UNIQUE (provider_id, swap_entry_id, ts)`; `control_perf_samples` with `UNIQUE (provider_id, ts)`; `control_perf_rollup_5m` with `UNIQUE (provider_id, bucket)`; `control_model_events` with `UNIQUE (provider_id, model, state, ts)`.
- [ ] Fleet connector per enabled host: SSE client w/ backoff+jitter+circuit-breaker (port the `opencode-sse.ts` pattern); explicit `connected|reconnecting|down` liveness state machine + `last_seen_at`; reconcile via `/api/metrics` on reconnect with `INSERT ... ON CONFLICT DO NOTHING` (never check-then-act); `gap_suspected` via the no-overlap heuristic (design §4).
- [ ] Perf poller (5s, `/api/performance?after=`); watermark recovered from `MAX(ts)` on restart; NULL watermark (fresh install) → omit `after`, ingest returned window (design §4).
- [ ] In-memory fleet state with per-host monotonic `seq`; WS endpoint `/api/ws/control`: snapshot-on-join carrying seqs + seq-stamped deltas.
- [ ] **Retention job in this slice** (not a fast-follow): rollup as idempotent upsert + raw delete in chunked per-provider-per-hour transactions (design §6); activity prune; configurable windows.
- [ ] Contracts: add `control_fleet`, `control_activity`, `control_perf`, `control_log`, `control_job` to `WsFrameSchema` + `KNOWN_FRAME_TYPES`; rebuild package; mirror in the web strict union; extend the contracts drift test to cover the five new frames. (Server loose union NOT needed — control frames bypass the broker via the raw proxy relay, so this is a 2-location sync; plan finding JD1.)
- [ ] `apps/server`: `registerControlProxy` (`/api/control/*` HTTP + `/api/control/ws` WS relay; clone of `routes/coder-proxy.ts` with keep-in-sync comments in both files); `BOOCONTROL_URL` env.
- [ ] Web: `/control` route (`App.tsx`), nav entry (`ProjectSidebar.tsx`), `pages/Control.tsx` shell with Fleet + Activity tabs; `useControlStream` as a **second app-level WS singleton** (own context + connection guard; client discards deltas ≤ snapshot seq); host cards (state chips incl. grey `down`+last-seen, VRAM/temp/power readouts, TTL countdowns); live activity feed (virtuoso).
- [ ] Charts: integrate ECharts (per-chart module imports via `echarts/core`) for perf timelines; dark-theme tokens from active palette.
- [ ] Tests: connector dedup/reconcile + seq logic as pure helpers (`turn-guard.ts` pattern); liveness state machine; retention idempotency (re-run same window → identical rollups); DB tests `describe.runIf(DATABASE_URL)`.

## P2 — hands on the controls
**Demo: unload from UI, watch the swap stream, open a capture.**
- [x] Per-host FIFO action queue in the control service; warm (1-token completion w/ bare wire ID) + unload one/all routed through it; unload-during-bench → takeover confirmation; reject submissions while host is `down`, cap depth (4), re-check liveness on dequeue + skip stale actions (design §5).
- [x] Optimistic UI off `control_fleet` frames only (no local emits, per event-dedup discipline).
- [x] Logs tab: relay `/api/events` logData → `control_log`; in-memory 2k-line tail for late joiners; virtuoso tail-follow viewer w/ source filters + pause-on-scroll.
- [x] Inspector: activity table → capture drawer (`GET /api/captures/:id` via control svc, trimmed persist, shiki JSON, headers); "Open in Playground" stub.
- [x] Op task (manual, documented in design): enable `captureBuffer` + review `metricsMaxInMemory` on both hosts' llama-swap configs.

## P3 — playground + speed bench (manual, safe-by-construction)
**Demo: TTFT-vs-concurrency curves for two quants, run by hand without disturbing a live chat.**
- [x] Playground tab: model select (grouped picker from P0), param controls, streaming chat, side-by-side A/B; "Battle in Arena" handoff link.
- [x] Bench engine: suite model (grid + repetitions), runner w/ TTFT capture + `timings` parse; bounded fan-out (`Promise.allSettled`, suite-declared concurrency only); aggregates + raw samples to `bench_*` tables.
- [x] v1 safety: user-initiated runs only; takeover confirmation when target host shows recent traffic; embedding-host-first defaults; `concurrent_foreign_requests` recorded per run to flag polluted results. (Unattended scheduling deliberately absent — P8.)
- [x] The P8 seam: every run gates through `acquireHostAccess(providerId, purpose)` (v1 body = courtesy check + confirmation); never inline the inflight check in the runner (design §8).
- [x] Bench UI: run launcher, live progress via `control_job`, history charts (TTFT vs concurrency, tok/s over time), baseline + regression flags.

## P4 — per-consumer attribution (X-Boo-Source, end-to-end)
**Demo: Activity feed filtered to "arena" shows only Arena traffic; nothing reads NULL.**
- [x] `apps/server`: per-turn fetch-wrapper injection on the AI-SDK streaming path (thread source through the call site; wrapper-aware `getSwapProvider`, cache keyed by baseURL+source). **`upstreamModel` change must be additive** (optional `source` param/options — its file has 28-file/13-route blast radius, design §7); extend headers in `compaction.ts` + `task-model.ts` direct fetches.
- [x] `apps/coder`: forward inbound `x-boo-source` in `local-gateway.ts`; set it at arena + dispatch fetch sites.
- [x] Migration: add `source TEXT` to `control_requests`; surface as Activity filter + per-source token aggregates.
- [x] Tests: header present on all three paths (server streaming, gateway-forwarded opencode, arena direct); rows attribute correctly.

## P5 — quality evals + sandbox
**Demo: fleet leaderboard with speed×quality scatter.**
- [x] Suite format (data/ YAML: chat rubric tasks; code tasks with tests); CRUD + versioning.
- [x] Judge runner (temperature 0, pinned judge model+version, rubric scoring, rationale capture); pairwise tie-breaks delegate to Arena.
- [x] Code sandbox runner: ephemeral containers (`--network none`, non-root, mem/cpu/time caps, tmpfs, `--rm`, `boocontrol-eval` label); orphan prune at engine start; bounded concurrency (default 4) + `Promise.allSettled` + per-task `finally` cleanup; pass@1 scoring; borrow patterns from `/opt/forks/openevals`.
- [x] Leaderboard UI + speed×quality scatter per (provider_id, model, quant).

## P6 — advisory routing + reports
**Demo: picker badges "best code model right now"; Monday-morning fleet report.**
- [x] Advisory scores API (evals + live latency + host health) → model-picker badges. `services/routing-scores.ts` (`assignBadges` pure helper, unit-tested), `GET /api/control/routing/scores`; `ModelPicker.tsx` fetches badges (non-fatal) and renders best-code/best-chat/best-fast chips. Verify: `pnpm -C apps/control test` (routing-scores 4), `npx tsc -p apps/web/tsconfig.app.json --noEmit`.
- [x] Reports: scheduled digest job (usage, trends, swap counts, leaderboard deltas, anomalies vs baselines) → `control_reports`; same in-process timer pattern as retention, schedule meta in `control_schedule_meta` table (`{interval, enabled, last_run_at}`) w/ catch-up on boot; Reports tab + markdown export (`renderReportMarkdown`/`isReportDue` pure, unit-tested). See design `## Implementation notes` for the schedule-meta-table deviation. Verify: `pnpm -C apps/control test` (reports 7).

## P7 — live `auto:*` gateway (committed)
**Demo: an `auto:code` session in BooChat routes to the current best code model with failover.**
- [x] OpenAI-compatible virtual models (`auto`, `auto:code`, `auto:fast`, `auto:cheap`) backed by `route_policies`: rule match → candidate ordering → health/ctx-fit filter → dispatch w/ failover; gateway forwards `X-Boo-Source` to the target host. `routes/gateway.ts` (`/v1/models`, `/v1/chat/completions`, `/upstream/:model/props`) + `services/gateway.ts` (`orderCandidates` pure, unit-tested). Reached server-to-server (registry baseUrl), not via the buffering /api/control proxy, so streaming survives. Verify: `pnpm -C apps/control test` (gateway 11) + live smoke.
- [x] Registry entry (`kind: "boocontrol-gateway"`) so BooChat adopts with zero inference-path changes. Added to `data/llama-providers.example.json`; control service filters gateway-kind providers out of fleet connectors/pollers/retention (`fleetProviders` in `index.ts`) so it never SSE-connects to itself.
- [x] **Orphaned-session handling — `provider.ts` code change** (design §8): `InferenceRoute` extended to `'swap' | 'deepseek' | 'gateway' | 'gateway_error'` (gateway_error carries `gatewayReason`); known gateway-kind id → `'gateway'`; orphaned auto:* id (provider missing) → `'gateway_error'` reason `offline`, NEVER the swap fallback. All callers audited: `upstreamModel`/`resolveModelEndpoint` add gateway branch + throw on gateway_error; `getModelContext` proxies gateway props / null on gateway_error; `resolveRoute` returns the new variant (system-prompt.ts `ObservedInputs.route` widened to `InferenceRoute`); `invalidateModelContext` unchanged (composite-key path covers it). Picker flags orphaned sessions (`isOrphanedGatewayValue` banner in `ModelPicker.tsx`). Verify: `pnpm -C apps/server test` (provider gateway tests), `pnpm -C apps/server build`.
- [x] Policy editor UI (route_policies CRUD) + per-policy dispatch log. `routes/policies.ts` (CRUD + `/dispatch-log`); `ReportsTab.tsx` Policies + Dispatch Log sub-views. Verify: `npx tsc -p apps/web/tsconfig.app.json --noEmit`.

## P8 — fleet coordination lease (cross-service batch, own design pass)
**Demo: a scheduled overnight bench runs unattended without ever evicting a live model.**
- [x] Outlined, see `openspec/changes/fleet-coordination-lease/` (proposal + tasks, OUTLINE status). Design + ship `control_host_leases` (holder, purpose, expires_at, heartbeat) and the honor-protocol in all four writers (BooChat, BooCoder, Arena, BooControl); BooControl consumes it through the `acquireHostAccess` seam left in P3. NOT implemented here — outline only per the program decision.
- [x] Outlined, see `openspec/changes/fleet-coordination-lease/` (tasks L4). Unattended bench scheduling + reproducible concurrency sweeps unlock behind the lease.

## P9 — remote hands + optional
- [x] SSH config editor: SSH read → schema-validated edit (config-schema.json from the fork, bundled at `apps/control/data/config-schema.json`, ajv-validated) → diff preview → timestamped backup → write → restart → health-wait. `services/ssh-config.ts` (pure `validateLlamaConfig`/`computeDiff`/`backupFilename` + injectable-exec `applyRemoteConfig` pipeline) + `routes/ssh-config.ts` (`GET/PATCH /api/hosts`, `/config`, `/config/validate`, `/config/diff`, `/config/apply`) + `HostConfigEditor.tsx` (gear button on each Fleet card). SSH via shelled `ssh` (booterm precedent, key from `control_hosts.ssh_key_path` → `secrets/`, gitignored) instead of an ssh2 dependency. Failure-path tests for every pipeline step (`ssh-config.test.ts`, 15 tests). NOTE deviation: SFTP replaced by `ssh cat`/`cat >` (no ssh2 dep); recorded in design `## Implementation notes`. Verify: `pnpm -C apps/control test` (ssh-config 15). Not live-smoked — no reachable Windows SSH target in this session (the "Windows SSH fiddliness" risk); the failure-path test suite stands in.
- [ ] DEFERRED — `llama-bench`-over-SSH ingestion for device-level numbers. Reason: depends on the SSH plumbing from P9.1 *landing + a live host to run `llama-bench` on*; it is also explicitly YAGNI-deferred in the implementation-plan ("Reopen when SSH plumbing from P9.1 lands"). The P9.1 exec seam (`SshExec`) is the hook a follow-up reuses.
- [ ] DEFERRED — boocontrol.indifferentketchup.com vhost (Caddy/Authelia rewrite → `/control`). Reason: pure reverse-proxy/ops config (Caddyfile + Authelia rules) on the homelab host, no repo code; `/control` already works behind the existing boocode origin via the `registerControlProxy` relay. Out of scope for a code batch.
- [ ] DEFERRED — Frontier providers as routing targets; slim `control` pane kind for in-workspace mini-cockpit. Reason: two sizeable independent features (frontier-provider routing belongs with the registry/provider work; a new workspace pane kind is its own UI batch). Marked optional in the implementation-plan Deferred section; out of reach for an additive P6–P9 pass without dedicated design.