BooControl — tasks
Status: READY (decisions resolved 2026-06-11). Gate: P0 must be committed and reviewed before P1 starts. Each phase is a vertical slice with a demo; the whole idea ships eventually — P1→P3 are the cockpit, P4→P7 are intelligence, P8→P9 are coordination + remote hands.
P0 — prerequisite gate (separate batch: multi-llama-swap provider registry)
- Finish remaining tasks in `openspec/changes/multi-llama-swap-providers-model-favorites/tasks.md`: favorites hide-not-delete UI/route tests; smoke test sam-desktop + embedding (+ DeepSeek config); opencode duplicate-name routing smoke if in scope.
- Sam reviews and commits the batch (currently working-tree only). BooControl keys on `LlamaProvider.id`; the committed contract is the foundation.
P1 — read-only cockpit
Demo: watch both hosts live (models, swaps, VRAM/temp, request feed) while chatting.
- Scaffold `apps/control`: Fastify, TS NodeNext, `.env.example`/`.env.host`, port 9503, `/api/health`, systemd unit `boocontrol.service`, deploy docs in root CLAUDE.md.
- `db.ts` with `applySchema` + startup ordering guard (`waitForTable(sql, 'sessions')` before DDL; design §3).
- `schema.sql`: `control_hosts` seed (sam-desktop, embedding) `ON CONFLICT DO NOTHING`; `control_requests` (NO source column; that's P4) with `UNIQUE (provider_id, swap_entry_id, ts)`; `control_perf_samples` with `UNIQUE (provider_id, ts)`; `control_perf_rollup_5m` with `UNIQUE (provider_id, bucket)`; `control_model_events` with `UNIQUE (provider_id, model, state, ts)`.
- Fleet connector per enabled host: SSE client w/ backoff+jitter+circuit-breaker (port the `opencode-sse.ts` pattern); explicit `connected|reconnecting|down` liveness state machine + `last_seen_at`; reconcile via `/api/metrics` on reconnect with `INSERT ... ON CONFLICT DO NOTHING` (never check-then-act); `gap_suspected` via the no-overlap heuristic (design §4).
- Perf poller (5s, `/api/performance?after=`); watermark recovered from `MAX(ts)` on restart; NULL watermark (fresh install) → omit `after`, ingest returned window (design §4).
- In-memory fleet state with per-host monotonic `seq`; WS endpoint `/api/ws/control`: snapshot-on-join carrying seqs + seq-stamped deltas.
- Retention job in this slice (not a fast-follow): rollup as idempotent upsert + raw delete in chunked per-provider-per-hour transactions (design §6); activity prune; configurable windows.
- Contracts: add `control_fleet`, `control_activity`, `control_perf`, `control_log`, `control_job` to `WsFrameSchema` + `KNOWN_FRAME_TYPES`; rebuild package; mirror in the web strict union; extend the contracts drift test to cover the five new frames. (Server loose union NOT needed: control frames bypass the broker via the raw proxy relay, so this is a 2-location sync; plan finding JD1.)
- `apps/server`: `registerControlProxy` (`/api/control/*` HTTP + `/api/control/ws` WS relay; clone of `routes/coder-proxy.ts` with keep-in-sync comments in both files); `BOOCONTROL_URL` env.
- Web: `/control` route (App.tsx), nav entry (ProjectSidebar.tsx), `pages/Control.tsx` shell with Fleet + Activity tabs; `useControlStream` as a second app-level WS singleton (own context + connection guard; client discards deltas ≤ snapshot seq); host cards (state chips incl. grey `down`+last-seen, VRAM/temp/power readouts, TTL countdowns); live activity feed (virtuoso).
- Charts: integrate ECharts (per-chart module imports via `echarts/core`) for perf timelines; dark-theme tokens from active palette.
- Tests: connector dedup/reconcile + seq logic as pure helpers (`turn-guard.ts` pattern); liveness state machine; retention idempotency (re-run same window → identical rollups); DB tests `describe.runIf(DATABASE_URL)`.
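
The "client discards deltas ≤ snapshot seq" rule above could be expressed as a pure helper along these lines. This is a minimal sketch, not the repository's code: the `FleetSnapshot` and `FleetDelta` shapes and the function names are hypothetical, since the real `WsFrameSchema` is not shown here.

```typescript
// Hypothetical frame shapes (assumptions for illustration only).
interface FleetSnapshot {
  // Highest seq already folded into the snapshot, keyed by host id.
  seqs: Record<string, number>;
}

interface FleetDelta {
  hostId: string;
  seq: number; // per-host, monotonically increasing
  patch: unknown;
}

// True when the delta is newer than anything the client already holds for that host.
function shouldApplyDelta(lastSeen: Record<string, number>, delta: FleetDelta): boolean {
  const seen = lastSeen[delta.hostId];
  return seen === undefined || delta.seq > seen;
}

// Walks deltas in arrival order and drops any at or below the last seen seq,
// which covers both pre-snapshot deltas and duplicates after a reconnect.
function applyDeltas(
  snapshot: FleetSnapshot,
  deltas: FleetDelta[],
): { applied: FleetDelta[]; seqs: Record<string, number> } {
  const seqs: Record<string, number> = { ...snapshot.seqs };
  const applied: FleetDelta[] = [];
  for (const delta of deltas) {
    if (!shouldApplyDelta(seqs, delta)) continue;
    seqs[delta.hostId] = delta.seq;
    applied.push(delta);
  }
  return { applied, seqs };
}
```

Keeping the comparison in a pure function means the seq logic can be unit-tested without a WebSocket, in line with the pure-helper test task above.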
P2 — hands on the controls
Demo: unload from UI, watch the swap stream, open a capture.
- Per-host FIFO action queue in the control service; warm (1-token completion w/ bare wire ID) + unload one/all routed through it; unload-during-bench → takeover confirmation; reject submissions while host is `down`, cap depth (4), re-check liveness on dequeue + skip stale actions (design §5).
- Optimistic UI off `control_fleet` frames only (no local emits, per event-dedup discipline).
- Logs tab: relay `/api/events` logData → `control_log`; in-memory 2k-line tail for late joiners; virtuoso tail-follow viewer w/ source filters + pause-on-scroll.
- Inspector: activity table → capture drawer (`GET /api/captures/:id` via control svc, trimmed persist, shiki JSON, headers); "Open in Playground" stub.
- Op task (manual, documented in design): enable `captureBuffer` + review `metricsMaxInMemory` on both hosts' llama-swap configs.
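
The queue rules in the first bullet (reject while `down`, depth cap of 4, liveness re-check and stale skip on dequeue) might look roughly like the sketch below. Everything here is illustrative: the class name, the `HostAction` shape, and the age-based definition of "stale" (30 seconds) are assumptions, since design §5 is not reproduced in this file.

```typescript
type Liveness = "connected" | "reconnecting" | "down";

// Hypothetical action shape.
interface HostAction {
  kind: "warm" | "unload";
  model?: string;
  enqueuedAt: number; // ms since epoch
}

type SubmitResult = { ok: true } | { ok: false; reason: "host_down" | "queue_full" };

class HostActionQueue {
  private items: HostAction[] = [];
  private getLiveness: () => Liveness;
  private maxDepth: number;
  private staleAfterMs: number;

  constructor(getLiveness: () => Liveness, maxDepth = 4, staleAfterMs = 30_000) {
    this.getLiveness = getLiveness;
    this.maxDepth = maxDepth;
    this.staleAfterMs = staleAfterMs; // assumption: staleness is measured by age
  }

  // Rejects up front instead of queueing work for an unreachable host.
  submit(action: HostAction): SubmitResult {
    if (this.getLiveness() === "down") return { ok: false, reason: "host_down" };
    if (this.items.length >= this.maxDepth) return { ok: false, reason: "queue_full" };
    this.items.push(action);
    return { ok: true };
  }

  // Re-checks liveness at dequeue time and silently drops actions that aged out.
  // While the host is down, queued items are left in place (an assumption).
  next(now: number): HostAction | undefined {
    if (this.getLiveness() === "down") return undefined;
    while (this.items.length > 0) {
      const action = this.items.shift()!;
      if (now - action.enqueuedAt <= this.staleAfterMs) return action;
    }
    return undefined;
  }
}
```

Passing liveness in as a callback keeps the queue independent of the connector, so the same class can be driven by a fake state in tests.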
P3 — playground + speed bench (manual, safe-by-construction)
Demo: TTFT-vs-concurrency curves for two quants, run by hand without disturbing a live chat.
- Playground tab: model select (grouped picker from P0), param controls, streaming chat, side-by-side A/B; "Battle in Arena" handoff link.
- Bench engine: suite model (grid + repetitions), runner w/ TTFT capture + `timings` parse; bounded fan-out (`Promise.allSettled`, suite-declared concurrency only); aggregates + raw samples to `bench_*` tables.
- v1 safety: user-initiated runs only; takeover confirmation when target host shows recent traffic; embedding-host-first defaults; `concurrent_foreign_requests` recorded per run to flag polluted results. (Unattended scheduling deliberately absent; see P8.)
- The P8 seam: every run gates through `acquireHostAccess(providerId, purpose)` (v1 body = courtesy check + confirmation); never inline the inflight check in the runner (design §8).
- Bench UI: run launcher, live progress via `control_job`, history charts (TTFT vs concurrency, tok/s over time), baseline + regression flags.
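
One way to read "bounded fan-out (`Promise.allSettled`, suite-declared concurrency only)" is a wave-based runner: start exactly `concurrency` requests, wait for the whole wave to settle, then start the next. The sketch below assumes that interpretation; `runBounded` is a hypothetical name, and the actual runner may use a sliding window instead.

```typescript
// Runs tasks in waves of `concurrency`; a rejected task never aborts its wave,
// because Promise.allSettled records the failure instead of throwing.
async function runBounded<T>(
  tasks: Array<() => Promise<T>>,
  concurrency: number,
): Promise<PromiseSettledResult<T>[]> {
  if (!Number.isInteger(concurrency) || concurrency < 1) {
    throw new RangeError("concurrency must be a positive integer");
  }
  const results: PromiseSettledResult<T>[] = [];
  for (let i = 0; i < tasks.length; i += concurrency) {
    // The async wrapper turns a synchronous throw inside a task into a rejection.
    const wave = tasks.slice(i, i + concurrency).map(async (task) => task());
    results.push(...(await Promise.allSettled(wave)));
  }
  return results;
}
```

Waves keep the in-flight count at or below the suite's declared concurrency, which is what a TTFT-vs-concurrency curve needs; the cost is that a slow request holds up the next wave.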
P4 — per-consumer attribution (X-Boo-Source, end-to-end)
Demo: Activity feed filtered to "arena" shows only Arena traffic; nothing reads NULL.
- `apps/server`: per-turn fetch-wrapper injection on the AI-SDK streaming path (thread source through the call site; wrapper-aware `getSwapProvider`, cache keyed by baseURL+source). `upstreamModel` change must be additive (optional `source` param/options; its file has 28-file/13-route blast radius, design §7); extend headers in `compaction.ts` + `task-model.ts` direct fetches.
- `apps/coder`: forward inbound `x-boo-source` in `local-gateway.ts`; set it at arena + dispatch fetch sites.
- Migration: add `source TEXT` to `control_requests`; surface as Activity filter + per-source token aggregates.
- Tests: header present on all three paths (server streaming, gateway-forwarded opencode, arena direct); rows attribute correctly.
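
The fetch-wrapper injection in the `apps/server` bullet can be pictured as a higher-order function that stamps the header on every outgoing request. This is a simplified sketch under stated assumptions: `FetchLike` is a cut-down stand-in for the real `fetch` type, `withBooSource` is a hypothetical name, and letting the wrapper's value replace any caller-supplied header is a design guess rather than documented behaviour.

```typescript
// Minimal stand-ins for the fetch types (assumption: plain-object headers only).
type HeaderMap = Record<string, string>;
interface FetchInit {
  method?: string;
  headers?: HeaderMap;
  body?: string;
}
type FetchLike = (url: string, init?: FetchInit) => Promise<unknown>;

// Returns a fetch that always carries X-Boo-Source for the given consumer.
function withBooSource(baseFetch: FetchLike, source: string): FetchLike {
  return (url, init = {}) => {
    const headers: HeaderMap = {};
    // Copy caller headers, dropping any existing source header in any casing
    // so the request never carries two conflicting values.
    for (const [name, value] of Object.entries(init.headers ?? {})) {
      if (name.toLowerCase() !== "x-boo-source") headers[name] = value;
    }
    headers["X-Boo-Source"] = source;
    return baseFetch(url, { ...init, headers });
  };
}
```

Because the wrapper closes over `source`, a provider cache keyed only by base URL would hand one consumer's wrapper to another, which is presumably why the task keys the cache by baseURL+source.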
P5 — quality evals + sandbox
Demo: fleet leaderboard with speed×quality scatter.
- Suite format (data/ YAML: chat rubric tasks; code tasks with tests); CRUD + versioning.
- Judge runner (temperature 0, pinned judge model+version, rubric scoring, rationale capture); pairwise tie-breaks delegate to Arena.
- Code sandbox runner: ephemeral containers (`--network none`, non-root, mem/cpu/time caps, tmpfs, `--rm`, `boocontrol-eval` label); orphan prune at engine start; bounded concurrency (default 4) + `Promise.allSettled` + per-task `finally` cleanup; pass@1 scoring; borrow patterns from `/opt/forks/openevals`.
- Leaderboard UI + speed×quality scatter per (provider_id, model, quant).
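
The container constraints listed for the sandbox runner map onto standard `docker run` flags. The sketch below only builds argument arrays; the function names, the `65534:65534` (nobody) user, the `/tmp` mount point, and the limit values are illustrative assumptions. Docker has no wall-clock flag, so the time cap is assumed to be enforced by the caller killing the process.

```typescript
interface SandboxLimits {
  memory: string; // e.g. "512m"
  cpus: string; // e.g. "1.0"
  tmpfsSize: string; // e.g. "64m"
}

const EVAL_LABEL = "boocontrol-eval";

// Arguments for one ephemeral, network-less, non-root container.
function buildSandboxArgs(image: string, command: string[], limits: SandboxLimits): string[] {
  return [
    "run",
    "--rm", // remove the container as soon as it exits
    "--network", "none", // no network access for evaluated code
    "--user", "65534:65534", // assumption: run as nobody:nogroup
    "--memory", limits.memory,
    "--cpus", limits.cpus,
    "--tmpfs", `/tmp:rw,size=${limits.tmpfsSize}`,
    "--label", EVAL_LABEL, // lets the orphan prune find leftovers
    image,
    ...command,
  ];
}

// Arguments that list every container carrying the eval label, running or not,
// so an engine start can force-remove orphans from a previous crash.
function buildOrphanListArgs(): string[] {
  return ["ps", "-aq", "--filter", `label=${EVAL_LABEL}`];
}
```

Building the arguments in a pure function keeps the security-relevant flags in one reviewable place and makes them assertable in a unit test without a Docker daemon.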
P6 — advisory routing + reports
Demo: picker badges "best code model right now"; Monday-morning fleet report.
- Advisory scores API (evals + live latency + host health) → model-picker badges. `services/routing-scores.ts` (`assignBadges` pure helper, unit-tested), `GET /api/control/routing/scores`; `ModelPicker.tsx` fetches badges (non-fatal) and renders best-code/best-chat/best-fast chips. Verify: `pnpm -C apps/control test` (routing-scores 4), `npx tsc -p apps/web/tsconfig.app.json --noEmit`.
- Reports: scheduled digest job (usage, trends, swap counts, leaderboard deltas, anomalies vs baselines) → `control_reports`; same in-process timer pattern as retention, schedule meta in `control_schedule_meta` table (`{interval, enabled, last_run_at}`) w/ catch-up on boot; Reports tab + markdown export (`renderReportMarkdown`/`isReportDue` pure, unit-tested). See design `## Implementation notes` for the schedule-meta-table deviation. Verify: `pnpm -C apps/control test` (reports 7).
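
The schedule-meta row (`{interval, enabled, last_run_at}`) with catch-up on boot suggests a due check of roughly the following shape. This is a guess at the logic, not the repository's `isReportDue`: the name `isDue`, the millisecond units, and the numeric timestamp are all assumptions.

```typescript
interface ScheduleMeta {
  interval: number; // assumption: milliseconds between runs
  enabled: boolean;
  last_run_at: number | null; // assumption: ms since epoch, null = never ran
}

// Pure due check. Calling it once at boot gives catch-up for free:
// if the service was down past the interval, the first check returns true.
function isDue(meta: ScheduleMeta, now: number): boolean {
  if (!meta.enabled) return false;
  if (meta.last_run_at === null) return true;
  return now - meta.last_run_at >= meta.interval;
}
```

For example, with a weekly interval and a `last_run_at` nine days old, the check returns true on the first timer tick after restart, so a missed digest runs once rather than being skipped.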
P7 — live auto:* gateway (committed)
Demo: an auto:code session in BooChat routes to the current best code model with failover.
- OpenAI-compatible virtual models (`auto`, `auto:code`, `auto:fast`, `auto:cheap`) backed by `route_policies`: rule match → candidate ordering → health/ctx-fit filter → dispatch w/ failover; gateway forwards `X-Boo-Source` to the target host. `routes/gateway.ts` (`/v1/models`, `/v1/chat/completions`, `/upstream/:model/props`) + `services/gateway.ts` (`orderCandidates` pure, unit-tested). Reached server-to-server (registry baseUrl), not via the buffering /api/control proxy, so streaming survives. Verify: `pnpm -C apps/control test` (gateway 11) + live smoke.
- Registry entry (`kind: "boocontrol-gateway"`) so BooChat adopts with zero inference-path changes. Added to `data/llama-providers.example.json`; control service filters gateway-kind providers out of fleet connectors/pollers/retention (`fleetProviders` in `index.ts`) so it never SSE-connects to itself.
- Orphaned-session handling (`provider.ts` code change, design §8): `InferenceRoute` extended to `'swap' | 'deepseek' | 'gateway' | 'gateway_error'` (gateway_error carries `gatewayReason`); known gateway-kind id → `'gateway'`; orphaned auto:* id (provider missing) → `'gateway_error'` reason `offline`, NEVER the swap fallback. All callers audited: `upstreamModel`/`resolveModelEndpoint` add gateway branch + throw on gateway_error; `getModelContext` proxies gateway props / null on gateway_error; `resolveRoute` returns the new variant (`system-prompt.ts` `ObservedInputs.route` widened to `InferenceRoute`); `invalidateModelContext` unchanged (composite-key path covers it). Picker flags orphaned sessions (`isOrphanedGatewayValue` banner in `ModelPicker.tsx`). Verify: `pnpm -C apps/server test` (provider gateway tests), `pnpm -C apps/server build`.
- Policy editor UI (route_policies CRUD) + per-policy dispatch log. `routes/policies.ts` (CRUD + `/dispatch-log`); `ReportsTab.tsx` Policies + Dispatch Log sub-views. Verify: `npx tsc -p apps/web/tsconfig.app.json --noEmit`.
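
The orphaned-session rule above (a missing provider behind an auto:* model must surface as `gateway_error` with reason `offline`, never fall through to swap) can be illustrated with a small resolver. This is a sketch, not `provider.ts`: the function names, the `RouteResult` shape, the registry as a `Map`, and the `"llama-swap"` kind string are assumptions; only `"boocontrol-gateway"` and the four route names come from the task text.

```typescript
type InferenceRoute = "swap" | "deepseek" | "gateway" | "gateway_error";

interface RouteResult {
  route: InferenceRoute;
  gatewayReason?: string; // only set for gateway_error
}

// Assumption: provider kinds as they might appear in the registry.
type ProviderKind = "llama-swap" | "deepseek" | "boocontrol-gateway";

// Virtual model ids served by the gateway: "auto" or any "auto:<variant>".
function isAutoModel(model: string): boolean {
  return model === "auto" || model.startsWith("auto:");
}

function resolveRouteSketch(
  providerId: string,
  model: string,
  registry: ReadonlyMap<string, ProviderKind>,
): RouteResult {
  const kind = registry.get(providerId);
  if (kind === "boocontrol-gateway") return { route: "gateway" };
  if (kind === "deepseek") return { route: "deepseek" };
  // Orphaned session: the gateway entry is gone but the session still names
  // an auto:* model. Falling back to swap would send a virtual id to a host
  // that has no such model, so report the gateway as offline instead.
  if (kind === undefined && isAutoModel(model)) {
    return { route: "gateway_error", gatewayReason: "offline" };
  }
  return { route: "swap" };
}
```

Making the error a distinct route value forces each caller to handle it explicitly, which matches the "all callers audited" note above.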
P8 — fleet coordination lease (cross-service batch, own design pass)
Demo: a scheduled overnight bench runs unattended without ever evicting a live model.
- Outlined, see `openspec/changes/fleet-coordination-lease/` (proposal + tasks, OUTLINE status). Design + ship `control_host_leases` (holder, purpose, expires_at, heartbeat) and the honor-protocol in all four writers (BooChat, BooCoder, Arena, BooControl); BooControl consumes it through the `acquireHostAccess` seam left in P3. NOT implemented here; outline only per the program decision.
- Outlined, see `openspec/changes/fleet-coordination-lease/` (tasks L4). Unattended bench scheduling + reproducible concurrency sweeps unlock behind the lease.
P9 — remote hands + optional
- SSH config editor: SSH read → schema-validated edit (config-schema.json from the fork, bundled at `apps/control/data/config-schema.json`, ajv-validated) → diff preview → timestamped backup → write → restart → health-wait. `services/ssh-config.ts` (pure `validateLlamaConfig`/`computeDiff`/`backupFilename` + injectable-exec `applyRemoteConfig` pipeline) + `routes/ssh-config.ts` (`GET/PATCH /api/hosts`, `/config`, `/config/validate`, `/config/diff`, `/config/apply`) + `HostConfigEditor.tsx` (gear button on each Fleet card). SSH via shelled `ssh` (booterm precedent, key from `control_hosts.ssh_key_path` → `secrets/`, gitignored) instead of an ssh2 dependency. Failure-path tests for every pipeline step (`ssh-config.test.ts`, 15 tests). NOTE deviation: SFTP replaced by `ssh cat`/`cat >` (no ssh2 dep); recorded in design `## Implementation notes`. Verify: `pnpm -C apps/control test` (ssh-config 15). Not live-smoked: no reachable Windows SSH target in this session (the "Windows SSH fiddliness" risk); the failure-path test suite stands in.
- DEFERRED: `llama-bench`-over-SSH ingestion for device-level numbers. Reason: depends on the SSH plumbing from P9.1 landing + a live host to run `llama-bench` on; it is also explicitly YAGNI-deferred in the implementation-plan ("Reopen when SSH plumbing from P9.1 lands"). The P9.1 exec seam (`SshExec`) is the hook a follow-up reuses.
- DEFERRED: boocontrol.indifferentketchup.com vhost (Caddy/Authelia rewrite → `/control`). Reason: pure reverse-proxy/ops config (Caddyfile + Authelia rules) on the homelab host, no repo code; `/control` already works behind the existing boocode origin via the `registerControlProxy` relay. Out of scope for a code batch.
- DEFERRED: Frontier providers as routing targets; slim `control` pane kind for in-workspace mini-cockpit. Reason: two sizeable independent features (frontier-provider routing belongs with the registry/provider work; a new workspace pane kind is its own UI batch). Marked optional in the implementation-plan Deferred section; out of reach for an additive P6–P9 pass without dedicated design.
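
The injectable-exec pipeline described in the SSH config editor bullet lends itself to failure-path testing because every remote step goes through one function that a test can replace. The sketch below shows that idea only; it is not `applyRemoteConfig`. The `Exec` signature, the step names, the `cp`-based backup, the backup filename format, and the single-shot health command (the real step waits for health) are all assumptions, and paths are interpolated unquoted for brevity.

```typescript
// Hypothetical exec seam: runs one remote command, optionally feeding stdin.
type Exec = (command: string, stdin?: string) => Promise<{ code: number; stdout: string }>;

type Step = "validate" | "read" | "backup" | "write" | "restart" | "health";
type ApplyResult = { ok: true } | { ok: false; failedStep: Step; message: string };

// Timestamped backup name, e.g. config.yaml.20260611T120000Z.bak (format is an assumption).
function backupName(path: string, at: Date): string {
  const stamp = at.toISOString().replace(/[-:]/g, "").replace(/\.\d+Z$/, "Z");
  return `${path}.${stamp}.bak`;
}

interface ApplyOptions {
  exec: Exec;
  path: string;
  newConfig: string;
  validate: (text: string) => string | null; // error message, or null when valid
  restartCommand: string;
  healthCommand: string;
  now: Date;
}

// Stops at the first failing step and reports which one failed, so nothing
// is written or restarted after an earlier step has gone wrong.
async function applyConfigSketch(o: ApplyOptions): Promise<ApplyResult> {
  const invalid = o.validate(o.newConfig);
  if (invalid !== null) return { ok: false, failedStep: "validate", message: invalid };

  const steps: Array<[Step, string, string | undefined]> = [
    ["read", `cat ${o.path}`, undefined],
    ["backup", `cp ${o.path} ${backupName(o.path, o.now)}`, undefined],
    ["write", `cat > ${o.path}`, o.newConfig],
    ["restart", o.restartCommand, undefined],
    ["health", o.healthCommand, undefined],
  ];
  for (const [step, command, stdin] of steps) {
    try {
      const res = await o.exec(command, stdin);
      if (res.code !== 0) return { ok: false, failedStep: step, message: `exit ${res.code}` };
    } catch (err) {
      return { ok: false, failedStep: step, message: err instanceof Error ? err.message : String(err) };
    }
  }
  return { ok: true };
}
```

A test can pass an `exec` that fails only on the write command and assert that the result names `write` and that restart was never attempted.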