
indifferentketchup b18de2a331 chore: snapshot working tree - pty_exited notifications + in-flight inference WIP

feat(booterm): structured pty_exited WS notifications. Plan-validated, impl-validated, code-reviewed green (contracts build clean, contracts test 29/29, booterm + web typecheck clean).

wip: in-progress inference/provider refactor (agents.ts, provider.ts, new llama-providers.ts, removed llama-args-validator), plus arena, dispatcher, compaction, schema changes.

openspec: pty-exit-notifications complete; x-agent-flags planned (not yet implemented).

2026-06-14 12:48:47 +00:00


Plan: boocontrol

Folder

openspec/changes/boocontrol/

Task count

43 (P0: 2, P1: 15, P2: 5, P3: 5, P4: 4, P5: 4, P6: 2, P7: 4, P8: 1 outline, P9: 1 outline)

Size

Large -- 10-phase program spanning 4 apps + contracts, ~12 new DB tables, 5 new WS frame types, new host service, routing gateway, eval sandbox

Validation

openspec validate boocontrol: skipped (pre-spec-format acceptance; validation against openspec CLI format not applicable to accepted spec)
Adversarial validator: 18 findings (3 CRITICAL folded, 7 MINOR folded, 8 CONFIRMED)
Junior developer: 24 findings (7 clarifying folded, 3 polish noted, 2 specialist handoffs deferred, 12 confirmed)

Findings folded into this plan

Critical (folded):

V1 (jitter): The opencode-sse.ts pattern referenced in design S4 has backoff + circuit-breaker but NO jitter. The BooControl SSE connector must add jitter explicitly (random 0-50% of computed delay) to avoid thundering-herd reconnections across N hosts.
V7 (waitForTable): No waitForTable function exists anywhere in the codebase. P1 must create it in apps/control/src/db.ts as an explicit task.
V11 (schema indexes): P1 schema creates tables but defines zero indexes. The retention job queries control_requests by (provider_id, ts), the perf poller recovers watermarks via MAX(ts), and the activity feed sorts by ts. Without indexes these queries scan full tables as rows accumulate (~35k/day raw). Add explicit index tasks for control_requests(provider_id, ts), control_perf_samples(provider_id, ts), control_model_events(provider_id, ts).

Clarifying (folded):

JD1 (server loose union): Control frames skip the server's broker entirely (they relay raw bytes through the proxy). Adding them to the server's InferenceFrame union is dead code. Skip the server union update; document that control frames use a 2-location pattern (contracts + web strict union only).
JD3 (control_hosts seed): Seed os and gpu_label as hardcoded display metadata ('Windows'/'RTX 5090 32GB', 'Linux'/'P104-100 8GB'); ssh_*, config_path, restart_cmd are NULL until P9.
JD5 (@fastify/websocket): Add @fastify/websocket to P1 scaffolding dependencies.
JD6 (capture cap): The 256KB capture cap is application-enforced in the capture-fetch handler, not a DB constraint.
JD7 (acquireHostAccess): Scaffold acquireHostAccess in P1 as a no-op ({ok: true}) so P3 calls it and P8 swaps its body.
JD8 (gap_suspected): Store as a row in control_model_events with model = '*' and state = 'gap_suspected', timestamps in detail JSONB.
JD14 (schema overview): Only create P1 tables in P1; annotate the design S3 schema overview with phase tags.
JD16 (P1 source): P1 activity feed shows source = NULL; per-consumer filtering lands in P4.

Minor (folded):

V2 (drift test): The existing ws-frames.test.ts only checks KNOWN_FRAME_TYPES vs WsFrameSchema alignment, not web strict union sync. Add a comment to the P1 task noting web union sync is manual.
V3 (blast radius, corrected by plan validation F1/F4): upstreamModel has exactly 1 production importer (stream-phase-adapter.ts:16), not ~5 and not 28/13. The other provider-module consumers import resolveModelProvider/resolveModelEndpoint/resolveRoute/getModelContext instead. The additive-change constraint stands; the real P7 blast surface is resolveModelProvider's 6 direct callers propagating to ~10 downstream call sites.
V6 (local-gateway): local-gateway.ts omits X-Boo-Source (doesn't include it) rather than actively stripping it. Same fix either way.
JD4 (proxy WS path): The control proxy WS path is static (/api/control/ws), not parameterized like coder-proxy's per-session path.

New findings (folded):

V12 (P7 caller audit detail): The prior plan says "audit all 5 callers" but doesn't specify what each caller needs. Added per-caller change specs: getModelContext/invalidateModelContext (model-context.ts) must handle gateway baseUrl; resolveRoute (provider.ts) must return {route: 'gateway'}; upstreamModel (provider.ts) must add gateway branch before swap fallback; resolveModelEndpoint (provider.ts) must handle gateway headers.
V13 (ECharts theme integration): The plan says "dark-theme tokens from active oklch palette" but doesn't specify how. Added: use echarts.init(dom, themeObject) with a theme object built from the CSS custom properties (--background, --foreground, --muted, --accent) via getComputedStyle. One theme-build helper, not per-chart.
V14 (action queue semantics): "unload-during-bench -> takeover confirmation" needs explicit HTTP semantics. Added: the action endpoint returns 409 with {error: 'bench in progress', requiresConfirmation: true}; the client shows a confirmation dialog and re-submits with ?confirm=true.
V15 (capture total budget default): The plan mentions "total budget prune" but gives no default. Added: 50MB default, configurable via CAPTURE_BUDGET_MB env var.
V16 (openevals reference verified): /opt/forks/openevals exists and contains js/, python/, sandbox/ directories. The sandbox pattern (Docker hardened containers) is confirmed available.
V17 (P7 gateway error shape): InferenceRoute extension needs explicit error representation. Added: 'gateway' | 'gateway_error' variants; gateway_error carries {reason: 'offline' | 'unhealthy'}. The 5 callers must handle both.
V18 (SSE connector event shape delta): The opencode-sse.ts pattern is for the opencode SDK's Event type; BooControl consumes raw llama-swap SSE (/api/events) with a different envelope (modelStatus | logData | metrics | inflight). The reconnect/backoff/circuit-breaker pattern ports directly; the event parsing is new code, not a port. Noted in P1.4.

Junior developer new findings (folded):

JD17 (schema index timing): Indexes should be created in the same P1 task as the tables they index, not as a separate phase. Consolidated into P1.3.
JD18 (action queue depth cap message): When the queue is full (depth=4), the error message should include the current queue contents so the user knows what's pending. Added to P2.1 spec.
JD19 (acquireHostAccess signature): The function signature must be acquireHostAccess(providerId: string, purpose: string): Promise<{ok: boolean, reason?: string}> -- explicit in P1.14, called by the bench runner wiring in P3.4.
JD20 (snapshot rebuild on restart): When the control service restarts, the in-memory fleet state is lost. The WS endpoint must rebuild from DB (control_model_events for latest state, control_requests for last-seen activity) before serving snapshots. Added to P1.6.
JD21 (activity feed sort order): The live activity feed must sort by ts DESC (newest first) with react-virtuoso's followOutput="bottom" for live insertion. Added to P1.12.
JD22 (ECharts bundle impact): Per-chart echarts/core imports add ~15-25KB per chart type (gauge, line, scatter). With 3-4 charts in P1, the incremental bundle is ~60-100KB. Acceptable given the batteries-included tradeoff documented in design S9. Noted in P1.13.
JD23 (P7 provider.ts callers -- compile check): All 5 callers must compile unchanged for the new InferenceRoute variant. The upstreamModel function's implicit else branch (line 192) currently always reaches getSwapProvider -- the gateway variant must be handled before it. Added explicit check.
JD24 (deploy docs in P1.1): The systemd unit file and deploy docs must include the BOOCONTROL_URL env var (for apps/server's proxy) and DATABASE_URL (shared boochat DB). Added to P1.1 spec.

P0 -- prerequisite gate (separate batch: multi-llama-swap provider registry)

Gate: P0 must be committed and reviewed before P1 starts. BooControl keys every host-scoped row on LlamaProvider.id from packages/contracts/src/llama-providers.ts. The committed contract is the foundation.

Finish remaining tasks in openspec/changes/multi-llama-swap-providers-model-favorites/tasks.md: favorites hide-not-delete UI/route tests; smoke test sam-desktop + embedding (+ DeepSeek config).
Sam reviews and commits the batch (currently working-tree only).

P1 -- read-only cockpit

Demo: Watch both hosts live (models, swaps, VRAM/temp, request feed) while chatting.

Scaffold + DB

P1.1 Scaffold apps/control: new directory, Fastify + @fastify/websocket + postgres + zod dependencies, TS NodeNext, .env.example/.env.host, port 9503, /api/health endpoint, systemd unit boocontrol.service. Deploy docs in root CLAUDE.md (include BOOCONTROL_URL for apps/server proxy, DATABASE_URL for shared boochat DB). Pattern: apps/coder/src/index.ts for Fastify bootstrap, apps/coder/src/db.ts for getSql/applySchema/pingDb/closeDb.
P1.2 apps/control/src/db.ts with applySchema + waitForTable helper. waitForTable(sql, tableName, timeoutMs) polls information_schema.tables WHERE table_name = $1 with exponential backoff (100ms base, 2s cap); throws on timeout so systemd Restart=on-failure retries. Call waitForTable(sql, 'sessions', 30_000) before applySchema(). Pattern: apps/coder/src/db.ts for the getSql/applySchema/pingDb/closeDb shape; waitForTable is new (no existing implementation).
P1.3 apps/control/src/schema.sql -- P1 tables only (do NOT create bench_/eval_/route_policies/control_reports tables yet):
- control_hosts: provider_id TEXT PK (FK-by-convention to LlamaProvider.id), ssh_host TEXT, ssh_user TEXT, ssh_key_path TEXT, config_path TEXT, restart_cmd TEXT, os TEXT, gpu_label TEXT, enabled BOOLEAN DEFAULT true. Seed: INSERT INTO control_hosts (provider_id, os, gpu_label) VALUES ('sam-desktop', 'Windows', 'RTX 5090 32GB'), ('embedding', 'Linux', 'P104-100 8GB') ON CONFLICT DO NOTHING. SSH/config columns NULL until P9.
- control_requests: id BIGSERIAL PK, provider_id TEXT, swap_entry_id INT, ts TIMESTAMPTZ, model TEXT, req_path TEXT, status_code INT, duration_ms INT, cache_tokens INT, input_tokens INT, output_tokens INT, prompt_tps REAL, gen_tps REAL, has_capture BOOLEAN, capture JSONB. UNIQUE (provider_id, swap_entry_id, ts). NO source column (P4 adds it). Index: CREATE INDEX IF NOT EXISTS idx_control_requests_provider_ts ON control_requests (provider_id, ts DESC).
- control_perf_samples: provider_id TEXT, ts TIMESTAMPTZ, gpu JSONB, sys JSONB. UNIQUE (provider_id, ts). Index: CREATE INDEX IF NOT EXISTS idx_control_perf_samples_provider_ts ON control_perf_samples (provider_id, ts DESC).
- control_perf_rollup_5m: provider_id TEXT, bucket TIMESTAMPTZ, gpu_agg JSONB, sys_agg JSONB. UNIQUE (provider_id, bucket).
- control_model_events: provider_id TEXT, model TEXT, state TEXT, ts TIMESTAMPTZ, detail JSONB. UNIQUE (provider_id, model, state, ts). Index: CREATE INDEX IF NOT EXISTS idx_control_model_events_provider_ts ON control_model_events (provider_id, ts DESC).
- All use clock_timestamp() for created_at; JSONB via sql.json(value as never).
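The waitForTable helper from P1.2 could look roughly like the sketch below. It is a minimal illustration, not the planned implementation: instead of taking the postgres client directly, it takes a hypothetical `probe` callback (assumed to wrap the information_schema.tables query), and the `now`/`sleep` options are assumptions added so the backoff can be tested without real timers.

```typescript
// Hypothetical probe: resolves true once the table is visible.
// In apps/control it would wrap the information_schema.tables query.
type TableProbe = (tableName: string) => Promise<boolean>;

interface WaitOptions {
  baseMs?: number; // first retry delay (plan: 100ms)
  capMs?: number; // maximum retry delay (plan: 2s)
  now?: () => number; // injectable clock (assumption, for tests)
  sleep?: (ms: number) => Promise<void>; // injectable sleep (assumption, for tests)
}

async function waitForTable(
  probe: TableProbe,
  tableName: string,
  timeoutMs: number,
  opts: WaitOptions = {},
): Promise<void> {
  const baseMs = opts.baseMs ?? 100;
  const capMs = opts.capMs ?? 2_000;
  const now = opts.now ?? (() => Date.now());
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));

  const deadline = now() + timeoutMs;
  let delay = baseMs;
  for (;;) {
    if (await probe(tableName)) return;
    const remaining = deadline - now();
    if (remaining <= 0) {
      // Throwing lets systemd Restart=on-failure retry the whole service.
      throw new Error(`waitForTable: "${tableName}" not found after ${timeoutMs}ms`);
    }
    // Never sleep past the deadline; double the delay up to the cap.
    await sleep(Math.min(delay, remaining));
    delay = Math.min(delay * 2, capMs);
  }
}
```

Passing the probe in keeps the backoff logic independent of the database driver, which fits the pure-helper testing style described in P1.15.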

Connectors + ingestion

P1.4 Fleet connector per enabled host: SSE client consuming GET /api/events with exponential backoff (base 1s, max 30s) + jitter (random 0-50% of computed delay) + circuit-breaker (6 consecutive failures -> give-up). Port the opencode-sse.ts reconnectDecision function (add jitter to the BooControl copy). Note: the reconnect/backoff/circuit-breaker pattern ports directly from opencode-sse.ts; the event parsing is new code because llama-swap's SSE envelope (modelStatus | logData | metrics | inflight) differs from the opencode SDK's Event type. Explicit connected | reconnecting | down liveness state machine + last_seen_at in-memory. On reconnect, reconcile via GET /api/metrics (full ring) with INSERT ... ON CONFLICT DO NOTHING (never check-then-act). Gap detection: if oldest reconcile entry is newer than newest persisted entry for that provider, insert gap_suspected model event with model='*' and timestamps in detail JSONB.
P1.5 Perf poller: GET /api/performance?after=<watermark> every 5s per host. Watermark recovered from MAX(ts) per provider in control_perf_samples on restart. NULL watermark (fresh install) -> omit after param, ingest returned window (UNIQUE constraint makes over-fetch harmless).
P1.6 In-memory fleet state with per-host monotonic seq counter, incremented on every mutation. WS endpoint /api/ws/control: snapshot-on-join carrying current seqs + seq-stamped deltas. Client rule: buffer pre-snapshot deltas, replay after snapshot applying only seq > snapshot_seq. On service restart, rebuild fleet state from DB before serving snapshots: query control_model_events for latest model state per provider, control_requests for last activity, control_perf_samples for latest perf sample.
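The reconnect policy in P1.4 (exponential backoff, jitter, circuit breaker) can be sketched as a pure decision function. This is an illustration under assumptions: the real reconnectDecision in opencode-sse.ts may have a different signature, the name `reconnectDecisionWithJitter` is hypothetical, and the jitter is assumed to be added on top of the computed delay (so the final delay can exceed the nominal maximum by up to 50%).

```typescript
type ReconnectDecision =
  | { action: "retry"; delayMs: number }
  | { action: "give_up" };

interface BackoffConfig {
  baseMs: number; // plan: 1s
  maxMs: number; // plan: 30s
  maxFailures: number; // plan: 6 consecutive failures -> give up
}

const DEFAULT_BACKOFF: BackoffConfig = { baseMs: 1_000, maxMs: 30_000, maxFailures: 6 };

function reconnectDecisionWithJitter(
  consecutiveFailures: number,
  random: () => number = Math.random, // injectable so tests are deterministic
  cfg: BackoffConfig = DEFAULT_BACKOFF,
): ReconnectDecision {
  // Circuit breaker: stop retrying once the failure budget is used up.
  if (consecutiveFailures >= cfg.maxFailures) return { action: "give_up" };

  // First failure waits baseMs, then the delay doubles up to maxMs.
  const exponent = Math.max(0, consecutiveFailures - 1);
  const computed = Math.min(cfg.baseMs * Math.pow(2, exponent), cfg.maxMs);

  // Jitter: a random 0-50% of the computed delay, added on top (assumption).
  const jitter = computed * 0.5 * random();
  return { action: "retry", delayMs: Math.round(computed + jitter) };
}
```

With N hosts reconnecting after the same outage, the random component spreads the retries out instead of letting them land on the same tick.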

Retention (same P1 slice)

P1.7 Retention job: daily in-process timer. Rollup as idempotent upsert (INSERT INTO control_perf_rollup_5m ... ON CONFLICT (provider_id, bucket) DO UPDATE recomputed from raw). Delete raw only after covering buckets committed, in chunked transactions (one per provider per 1-hour window, never one mega-transaction). Activity prune > 90d. Capture size: 256KB per-row cap enforced in application code before INSERT (not a DB constraint); total budget prune with 50MB default, configurable via CAPTURE_BUDGET_MB env var. All windows configurable via .env.host.
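Two pieces of the retention job lend themselves to pure helpers: assigning a sample to its 5-minute bucket and splitting a delete range into 1-hour chunks. The sketch below is illustrative only; the names `rollupBucket`, `hourlyChunks`, and `TimeWindow` are hypothetical, and the aggregation itself is left out because the gpu_agg/sys_agg shape is not specified here.

```typescript
const FIVE_MIN_MS = 5 * 60 * 1000;
const HOUR_MS = 60 * 60 * 1000;

// Half-open interval [from, to).
interface TimeWindow {
  from: Date;
  to: Date;
}

// Floors a timestamp to the start of its 5-minute bucket (UTC-aligned).
// Deterministic, so re-running a rollup hits the same (provider_id, bucket) keys.
function rollupBucket(ts: Date): Date {
  return new Date(Math.floor(ts.getTime() / FIVE_MIN_MS) * FIVE_MIN_MS);
}

// Splits [from, to) on hour boundaries so each chunk can be deleted
// in its own transaction, per provider.
function hourlyChunks(from: Date, to: Date): TimeWindow[] {
  const start = from.getTime();
  const end = to.getTime();
  if (start >= end) return [];

  const chunks: TimeWindow[] = [];
  let cursor = Math.floor(start / HOUR_MS) * HOUR_MS;
  while (cursor < end) {
    chunks.push({
      from: new Date(Math.max(cursor, start)),
      to: new Date(Math.min(cursor + HOUR_MS, end)),
    });
    cursor += HOUR_MS;
  }
  return chunks;
}
```

Because bucket assignment depends only on the timestamp, recomputing a rollup from the same raw rows produces the same upsert keys, which is what the idempotency test in P1.15 would rely on.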

Contracts (build FIRST)

P1.8 Add 5 frame types to packages/contracts/src/ws-frames.ts:
- control_fleet -- full snapshot on join + seq-stamped state deltas (hosts, liveness, models, states, ttl, inflight)
- control_activity -- new request rows (live feed)
- control_perf -- appended samples per host
- control_log -- {provider_id, source: proxy|upstream, line} batches
- control_job -- bench/eval run progress events
Add to both WsFrameSchema discriminated union AND KNOWN_FRAME_TYPES array. Rebuild package (pnpm -C packages/contracts build).

Note: Control frames use a 2-location sync pattern (contracts + web strict union only). They skip the server's InferenceFrame union because they never flow through the server's broker. The web strict union is the wire-format gate; missing it silently drops frames at JSON parse.

Drift test note: The existing ws-frames.test.ts checks KNOWN_FRAME_TYPES vs WsFrameSchema alignment. There is no automated check for web strict union sync -- that alignment is manual and verified by the implementer. Add a comment in the test noting this limitation.
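If the manual web-union alignment ever becomes a burden, one optional approach is a compile-time guard. The sketch below uses plain TypeScript stand-ins rather than the real zod-based WsFrameSchema; `CONTROL_FRAME_TYPES`, `WebControlFrame`, and the `payload` fields are hypothetical placeholders, and only the five type names come from this plan.

```typescript
const CONTROL_FRAME_TYPES = [
  "control_fleet",
  "control_activity",
  "control_perf",
  "control_log",
  "control_job",
] as const;
type ControlFrameType = (typeof CONTROL_FRAME_TYPES)[number];

// Stand-in for the web strict union; payload shapes are placeholders.
type WebControlFrame =
  | { type: "control_fleet"; payload: unknown }
  | { type: "control_activity"; payload: unknown }
  | { type: "control_perf"; payload: unknown }
  | { type: "control_log"; payload: unknown }
  | { type: "control_job"; payload: unknown };

// Compile-time guard: if a contract type is missing from the web union,
// MissingFromWeb is non-empty and the assignment below fails to type-check.
type MissingFromWeb = Exclude<ControlFrameType, WebControlFrame["type"]>;
const webUnionCoversContracts: [MissingFromWeb] extends [never] ? true : never = true;

// Runtime guard for raw strings coming off the wire.
function isControlFrameType(value: string): value is ControlFrameType {
  return (CONTROL_FRAME_TYPES as readonly string[]).indexOf(value) !== -1;
}
```

Removing one member from `WebControlFrame` turns the `webUnionCoversContracts` line into a type error, so the drift surfaces at typecheck time rather than as silently dropped frames.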

Server proxy

P1.9 apps/server/src/routes/control-proxy.ts: registerControlProxy(app, boocontrolOrigin) following the same structure as registerCoderProxy but with a static WS path /api/control/ws (not parameterized per-session). HTTP all-catch at /api/control/*. Add keep-in-sync comment in both coder-proxy.ts and control-proxy.ts. BOOCONTROL_URL env var. Register in apps/server/src/index.ts.

Web UI

P1.10 Web: /control route in App.tsx, nav entry in ProjectSidebar.tsx (under Memory cluster, Radio icon from lucide), pages/Control.tsx shell with Fleet + Activity tabs. useControlStream as a second app-level WS singleton (own React context + connection guard, targets proxied /api/control/ws). Client discards deltas with seq <= snapshot_seq. Activity feed note: shows source = NULL in P1; per-consumer breakdown lands in P4.
P1.11 Fleet tab: host cards as instrument clusters. State chips with color/glow (amber pulse starting, green steady ready, red error, grey down with last-seen relative time). VRAM/temp/power readouts. TTL countdown rings. Dark mission-control aesthetic. Orbitron for numerals, Inter for prose.
P1.12 Activity feed: react-virtuoso tail-follow viewer (already a dep) with followOutput="bottom" for live insertion, ts DESC sort order. Filter chips for model and host. Pause-on-scroll toggle.
P1.13 Charts: integrate ECharts (per-chart module imports via echarts/core + needed renderers). Dark theme: build a theme object from CSS custom properties (--background, --foreground, --muted, --accent) via getComputedStyle(document.documentElement) and pass to echarts.init(dom, theme). One buildEChartsTheme() helper, not per-chart. Incremental bundle impact ~60-100KB for 3-4 chart types (gauge, line, scatter) -- acceptable per design S9 tradeoff.
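A possible shape for the buildEChartsTheme() helper in P1.13 is sketched below. It is an assumption-laden illustration: the helper takes a hypothetical `readVar` callback instead of touching the DOM directly, the mapping of tokens to theme keys is a guess at a reasonable default, and whether ECharts handles oklch() color strings in every code path is not confirmed here.

```typescript
// Reads one CSS custom property value, e.g. "--background".
type CssVarReader = (name: string) => string;

interface AxisThemeSketch {
  axisLine: { lineStyle: { color: string } };
  axisLabel: { color: string };
  splitLine: { lineStyle: { color: string[] } };
}

interface ChartThemeSketch {
  backgroundColor: string;
  color: string[]; // series palette
  textStyle: { color: string };
  title: { textStyle: { color: string } };
  legend: { textStyle: { color: string } };
  categoryAxis: AxisThemeSketch;
  valueAxis: AxisThemeSketch;
}

function buildEChartsTheme(readVar: CssVarReader): ChartThemeSketch {
  // getPropertyValue often returns leading whitespace, so trim every token.
  const token = (name: string): string => readVar(name).trim();
  const background = token("--background");
  const foreground = token("--foreground");
  const muted = token("--muted");
  const accent = token("--accent");

  const axis = (): AxisThemeSketch => ({
    axisLine: { lineStyle: { color: muted } },
    axisLabel: { color: foreground },
    splitLine: { lineStyle: { color: [muted] } },
  });

  return {
    backgroundColor: background,
    color: [accent, foreground, muted],
    textStyle: { color: foreground },
    title: { textStyle: { color: foreground } },
    legend: { textStyle: { color: foreground } },
    categoryAxis: axis(),
    valueAxis: axis(),
  };
}
```

In the browser, the reader would be something like `(name) => getComputedStyle(document.documentElement).getPropertyValue(name)`, and the result would be passed as the second argument to echarts.init(dom, theme).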

Host-access seam

P1.14 Create apps/control/src/services/host-access.ts with acquireHostAccess(providerId: string, purpose: string): Promise<{ok: boolean, reason?: string}>. V1 body: no-op returning {ok: true}. This is the P8 seam -- P8 swaps the body for a DB lease without touching the bench engine. Export for the bench runner wiring in P3.4 to import.
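As a rough illustration of the seam, the V1 body and a gated caller might look like this. The acquireHostAccess signature is taken from the task above; `runGated` and `HostAccessResult` are hypothetical names used only to show how a caller would go through the seam instead of inlining its own check.

```typescript
interface HostAccessResult {
  ok: boolean;
  reason?: string;
}

// V1: no-op. P8 is expected to replace this body with a DB lease.
async function acquireHostAccess(providerId: string, purpose: string): Promise<HostAccessResult> {
  // Arguments are accepted now so callers do not change when the body does.
  void providerId;
  void purpose;
  return { ok: true };
}

// Hypothetical caller shape: every run passes through the seam.
async function runGated<T>(
  providerId: string,
  purpose: string,
  run: () => Promise<T>,
): Promise<T> {
  const access = await acquireHostAccess(providerId, purpose);
  if (!access.ok) {
    throw new Error(`host access denied: ${access.reason ?? "unknown"}`);
  }
  return run();
}
```

Keeping the denial path in the caller from day one means the P8 swap only changes what acquireHostAccess returns, not how callers react to it.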

Tests

P1.15 Tests: connector dedup/reconcile + gap detection as pure helpers (turn-guard.ts pattern); liveness state machine transitions; retention idempotency (re-run same window produces identical rollups); seq logic (buffer, discard stale, apply snapshot). DB tests describe.runIf(process.env.DATABASE_URL).
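The seq logic named above (buffer, discard stale, apply snapshot) can be expressed as a small pure class that is easy to unit test. This is a sketch under assumptions: `SeqGate`, `FleetDelta`, and `FleetSnapshot` are hypothetical names, seq values are assumed to start above 0, and advancing the watermark after each applied delta (to drop duplicates) goes slightly beyond the stated "discard seq <= snapshot_seq" rule.

```typescript
interface FleetDelta {
  providerId: string;
  seq: number;
  patch: unknown;
}

interface FleetSnapshot {
  seqs: Record<string, number>; // per-host seq at snapshot time
  state: unknown;
}

class SeqGate {
  private lastSeq: Record<string, number> | null = null; // null until the first snapshot
  private buffered: FleetDelta[] = [];

  // Returns the deltas that should be applied right now (zero or one).
  onDelta(delta: FleetDelta): FleetDelta[] {
    if (this.lastSeq === null) {
      this.buffered.push(delta); // pre-snapshot: hold for replay
      return [];
    }
    return this.accept(this.lastSeq, delta) ? [delta] : [];
  }

  // Installs the snapshot watermarks and returns buffered deltas newer than it.
  onSnapshot(snapshot: FleetSnapshot): FleetDelta[] {
    const watermarks = { ...snapshot.seqs };
    this.lastSeq = watermarks;
    const pending = this.buffered.slice().sort((a, b) => a.seq - b.seq);
    this.buffered = [];
    return pending.filter((d) => this.accept(watermarks, d));
  }

  private accept(watermarks: Record<string, number>, delta: FleetDelta): boolean {
    const last = watermarks[delta.providerId] ?? 0; // unknown host: accept any seq > 0
    if (delta.seq <= last) return false; // stale or duplicate
    watermarks[delta.providerId] = delta.seq;
    return true;
  }
}
```

A reconnect would typically create a fresh gate so that deltas from the old connection cannot leak into the new snapshot's replay.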

P2 -- hands on the controls

Demo: Unload from UI, watch the swap stream, open a capture.

P2.1 Per-host FIFO action queue in the control service. Actions: warm (1-token POST /v1/chat/completions with bare wire ID), unload one/all (POST /api/models/unload/:model or /api/models/unload). Serialize through single FIFO queue per provider_id. Unload-during-bench -> return 409 with {error: 'bench in progress', requiresConfirmation: true}; client shows confirmation dialog and re-submits with ?confirm=true. Reject submissions while host is down ("host offline" toast). Cap depth (4) with reject-on-full; error response includes current queue contents so the user knows what's pending. Re-check liveness on dequeue + skip stale actions (design S5). Pattern: arena-runner.ts advanceChain promise-chain + read-fresh-state-or-skip.
P2.2 Optimistic UI off control_fleet frames only. No local emits after API calls (event-dedup discipline per CLAUDE.md). The API call triggers a server-side mutation that publishes a control_fleet delta; the frontend updates from the WS frame, not from a local state change.
P2.3 Logs tab: relay /api/events logData -> control_log frame. In-memory 2k-line tail buffer per host for late joiners. React-virtuoso tail-follow viewer with per-source filter (proxy/upstream/model) + pause-on-scroll.
P2.4 Inspector: activity table (virtuoso) -> capture drawer. GET /api/captures/:id via control service, decode base64, persist trimmed copy (256KB cap enforced in application code before INSERT), render with shiki-highlighted JSON. "Open in Playground" stub (links to P3).
P2.5 Op task (manual, documented in design): enable captureBuffer + review metricsMaxInMemory on both hosts' llama-swap configs.
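The admission rules for the P2.1 action queue (offline reject, bench takeover confirmation, depth cap) can be sketched as one decision function. The sketch is illustrative: `admitAction`, `HostQueueState`, and the order of the checks are assumptions, and only the 409 response has a status code because that is the only one this plan specifies.

```typescript
type ActionKind = "warm" | "unload" | "unload_all";

interface QueuedAction {
  id: string;
  kind: ActionKind;
  model?: string;
}

interface HostQueueState {
  liveness: "connected" | "reconnecting" | "down";
  benchRunning: boolean;
  queue: QueuedAction[];
}

type Admission =
  | { accepted: true }
  | {
      accepted: false;
      status?: number; // set only where the plan names a code
      body: { error: string; requiresConfirmation?: boolean; pending?: QueuedAction[] };
    };

const MAX_QUEUE_DEPTH = 4;

function admitAction(host: HostQueueState, action: QueuedAction, confirm = false): Admission {
  if (host.liveness === "down") {
    return { accepted: false, body: { error: "host offline" } };
  }
  // Unloading while a bench runs needs an explicit ?confirm=true re-submit.
  const isUnload = action.kind !== "warm";
  if (isUnload && host.benchRunning && !confirm) {
    return {
      accepted: false,
      status: 409,
      body: { error: "bench in progress", requiresConfirmation: true },
    };
  }
  // Reject-on-full, returning a copy of what is already pending.
  if (host.queue.length >= MAX_QUEUE_DEPTH) {
    return { accepted: false, body: { error: "queue full", pending: host.queue.slice() } };
  }
  host.queue.push(action);
  return { accepted: true };
}
```

Liveness is checked again on dequeue in the plan, so this function only covers the submit side; the dequeue-time re-check and stale-action skip would live in the queue runner.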

P3 -- playground + speed bench (manual, safe-by-construction)

Demo: TTFT-vs-concurrency curves for two quants, run by hand without disturbing a live chat.

P3.1 Playground tab: model select (grouped picker from provider registry), param controls, streaming chat, side-by-side A/B compare (two ModelBubble components in parallel, same prompt, different models). "Battle in Arena" handoff link (opens Arena dialog with pre-filled prompt + contestants via the existing ArenaLauncherDialog pattern).
P3.2 Bench engine: suite model (data/ YAML, grid of prompt_len x gen_len x concurrency x repetitions). Runner with TTFT capture (client-side first delta) + llama.cpp timings parse (prompt_per_second, predicted_per_second, cache_n from final stream chunk). Bounded fan-out (Promise.allSettled, suite-declared concurrency only). Results as aggregates + raw samples to bench_suites/bench_runs/bench_samples tables. Add schema for these 3 tables in this task.
P3.3 V1 safety: user-initiated runs only; takeover confirmation when target host shows recent traffic; embedding-host-first defaults; concurrent_foreign_requests recorded per run from activity stream to flag polluted results. Unattended scheduling deliberately absent (P8).
P3.4 Wire acquireHostAccess(providerId, purpose) from P1.14 into the bench runner. The runner MUST gate every run through this function -- never inline the inflight check. P8 swaps its body.
P3.5 Bench UI: run launcher, live progress via control_job frames, history charts (TTFT vs concurrency, tok/s over time via ECharts), baseline + regression flags (delta beyond -10% gen tok/s threshold).
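Two small calculations sit behind the bench tasks: expanding a suite grid into individual cases, and flagging a regression against a baseline. The sketch below is a minimal illustration; the field names (`promptLens`, `genLens`, and so on) and the function names are hypothetical, and "beyond -10%" is read as strictly worse than a 10% drop.

```typescript
interface BenchGrid {
  promptLens: number[];
  genLens: number[];
  concurrencies: number[];
  repetitions: number;
}

interface BenchCase {
  promptLen: number;
  genLen: number;
  concurrency: number;
  repetition: number; // 1-based
}

// Cartesian product of prompt_len x gen_len x concurrency x repetitions.
function expandGrid(grid: BenchGrid): BenchCase[] {
  const cases: BenchCase[] = [];
  for (const promptLen of grid.promptLens) {
    for (const genLen of grid.genLens) {
      for (const concurrency of grid.concurrencies) {
        for (let repetition = 1; repetition <= grid.repetitions; repetition++) {
          cases.push({ promptLen, genLen, concurrency, repetition });
        }
      }
    }
  }
  return cases;
}

// Flags a run whose gen tok/s dropped by more than the threshold vs baseline.
function isRegression(baselineGenTps: number, currentGenTps: number, threshold = -0.1): boolean {
  if (baselineGenTps <= 0) return false; // no usable baseline
  const delta = (currentGenTps - baselineGenTps) / baselineGenTps;
  return delta < threshold;
}
```

For example, a baseline of 100 tok/s and a current run of 85 tok/s gives a delta of -15%, which would be flagged; 95 tok/s (-5%) would not.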

P4 -- per-consumer attribution (X-Boo-Source, end-to-end)

Demo: Activity feed filtered to "arena" shows only Arena traffic; nothing reads NULL.

P4.1 apps/server: per-turn fetch-wrapper injection on AI-SDK streaming path. Thread source through the call site. getSwapProvider cache keyed by baseURL+source (label set: boochat|boocoder|arena|control-bench|control-eval). upstreamModel signature change must be additive (optional source param -- 1 production importer: stream-phase-adapter.ts:309; validated by plan-validation F1). Extend headers in compaction.ts and task-model.ts direct fetches.
P4.2 apps/coder: forward inbound x-boo-source header in local-gateway.ts (currently omitted from forwarded headers). Set it at Arena + dispatch fetch sites.
P4.3 Migration: ALTER TABLE control_requests ADD COLUMN source TEXT. Surface as Activity filter + per-source token aggregates in the UI.
P4.4 Tests: header present on all three paths (server streaming, gateway-forwarded opencode, arena direct); rows attribute correctly in control_requests.
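The header plumbing in P4.1 and P4.2 reduces to three small operations: keying the provider cache by baseURL plus source, attaching the header on outbound calls, and forwarding it in the gateway. The sketch below is illustrative only; the function names and the cache-key format are assumptions, while the label set and header name come from this plan.

```typescript
const BOO_SOURCES = ["boochat", "boocoder", "arena", "control-bench", "control-eval"] as const;
type BooSource = (typeof BOO_SOURCES)[number];

// Cache key so providers with different source labels do not share an instance.
function swapProviderCacheKey(baseURL: string, source?: BooSource): string {
  return `${baseURL}|${source ?? ""}`;
}

// Additive: callers that pass no source get their headers back unchanged.
function withBooSource(
  headers: Record<string, string>,
  source?: BooSource,
): Record<string, string> {
  return source ? { ...headers, "X-Boo-Source": source } : { ...headers };
}

// Gateway side: copy an inbound x-boo-source header onto the outbound request.
// Node lowercases inbound header names, hence the lowercase lookup.
function forwardBooSource(
  inbound: Record<string, string | string[] | undefined>,
  outbound: Record<string, string>,
): Record<string, string> {
  const value = inbound["x-boo-source"];
  return typeof value === "string" && value !== ""
    ? { ...outbound, "X-Boo-Source": value }
    : { ...outbound };
}
```

Keeping `source` optional everywhere mirrors the additive constraint on upstreamModel: existing call sites compile and behave as before.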

P5 -- quality evals + sandbox

Demo: Fleet leaderboard with speed x quality scatter.

P5.1 Suite format (data/ YAML: chat rubric tasks, code tasks with tests); CRUD + versioning. Four suites in priority order: (1) agent coding tasks, (2) chat assistant quality, (3) long-context retrieval, (4) utility calls (titles/summaries). Add schema for eval_suites/eval_runs/eval_results tables in this task.
P5.2 Judge runner: temperature 0, pinned judge model+version, rubric scoring, rationale capture. Pairwise tie-breaks delegate to Arena (links/launches battles, not re-implements). Judge = strongest local model by default.
P5.3 Code sandbox runner: ephemeral Docker containers (--network none, non-root, caps dropped, tmpfs workdir, --rm, kill-on-timeout, boocontrol-eval label for orphan findability). Orphan prune at engine start (docker ps --filter label=boocontrol-eval). Bounded concurrency (default 4) + Promise.allSettled + per-task finally cleanup. Pass@1 scoring. Patterns from /opt/forks/openevals (verified: sandbox/ directory exists with Docker hardened container patterns). Harden: --security-opt=no-new-privileges, --cap-drop=ALL.
P5.4 Leaderboard UI + speed x quality scatter per (provider_id, model, quant) using ECharts (reuse the buildEChartsTheme() helper from P1.13).
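The container hardening listed in P5.3 can be captured in one argument builder, which keeps the flag set reviewable and testable without starting Docker. This is a sketch, not the runner itself: `buildSandboxArgs`, the UID 65534, the 64m tmpfs size, the /work path, and the container name format are all assumptions; the flags themselves are standard docker run options named in the task.

```typescript
interface SandboxSpec {
  image: string; // image that carries the task's toolchain
  command: string[]; // test command to run inside the container
  runId: string; // used only to build a findable container name
}

// Builds the argv for `docker run`; spawning and kill-on-timeout live in the runner.
function buildSandboxArgs(spec: SandboxSpec): string[] {
  return [
    "run",
    "--rm", // remove the container when it exits
    "--network", "none", // no network access
    "--user", "65534:65534", // non-root (assumed nobody:nogroup)
    "--cap-drop=ALL", // drop every Linux capability
    "--security-opt=no-new-privileges",
    "--tmpfs", "/work:rw,size=64m", // ephemeral workdir (size is an assumption)
    "--workdir", "/work",
    "--label", "boocontrol-eval", // matches the orphan-prune filter
    "--name", `boocontrol-eval-${spec.runId}`,
    spec.image,
    ...spec.command,
  ];
}
```

The `--label boocontrol-eval` entry is what makes `docker ps --filter label=boocontrol-eval` find leftovers at engine start.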

P6 -- advisory routing + reports

Demo: Picker badges "best code model right now"; Monday-morning fleet report.

P6.1 Advisory scores API (eval results + live latency + host health) -> model-picker badges. Expose via GET /api/control/routing/scores.
P6.2 Reports: scheduled digest job (usage, trends, swap counts, leaderboard deltas, anomalies vs baselines) -> control_reports. Same in-process timer pattern as retention (P1), schedule_meta = {interval, enabled, last_run_at} with catch-up on boot. Reports tab + markdown export. Add control_reports schema in this task.

P7 -- live `auto:*` gateway (committed)

Demo: An auto:code session in BooChat routes to the current best code model with failover.

P7.1 Control service: OpenAI-compatible virtual models (auto, auto:code, auto:fast, auto:cheap) backed by route_policies table. Policy: rule match -> candidate ordering -> health/ctx-fit filter -> dispatch with failover. Gateway forwards X-Boo-Source to target host. Add route_policies schema in this task.
P7.2 Registry entry: kind: "boocontrol-gateway" with baseUrl: "http://100.114.205.53:9503". BooChat adopts with zero inference-path changes.
P7.3 apps/server/src/services/inference/provider.ts -- the code change required for orphaned-session handling:
- Extend InferenceRoute from 'swap' | 'deepseek' to 'swap' | 'deepseek' | 'gateway' | 'gateway_error'
- gateway_error carries {reason: 'offline' | 'unhealthy'} for structured error reporting
- Override the unknown-provider fallback (current behavior at line 147: composite id with unknown provider silently routes to LLAMA_SWAP_URL). For gateway-kind ids that are missing/disabled, resolve to route: 'gateway_error' with reason: 'offline', never the swap fallback.
- Audit all 5 callers with explicit per-caller changes:
  1. getModelContext (model-context.ts:85) -- must handle gateway baseUrl (query /upstream/<model>/props against the control service, not the target host)
  2. invalidateModelContext (model-context.ts:160) -- must handle gateway variant (no-op; gateway doesn't cache model context)
  3. resolveRoute (provider.ts:175) -- must return {route: 'gateway'} for gateway-kind ids
  4. upstreamModel (provider.ts:184) -- must add gateway branch before the swap fallback at line 192; the implicit else currently always reaches getSwapProvider
  5. resolveModelEndpoint (provider.ts:201) -- must handle gateway headers (forward X-Boo-Source)
- Propagation note (plan-validation F2): these 5 direct call sites fan out to ~10 downstream production call sites (stream-phase-adapter, compaction, task-model, system-prompt, error-handler, tool-phase, chats, stream-phase); none need signature changes (gateway handling is internal to each function) but all need test coverage.
- Audit clarification (plan-validation F7): system-prompt.ts:195 calls resolveRoute(agent) with no config/modelId, so it always returns {route: 'swap'} and needs NO gateway handling.
- All must compile unchanged for the new variant (additive, not breaking)
- The session keeps its id; the picker flags affected sessions.
P7.4 Policy editor UI (route_policies CRUD) + per-policy dispatch log in the Reports tab.
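The InferenceRoute extension from P7.3 can be sketched as a discriminated union with an exhaustive switch, which is one way to make the compiler point at every caller that has not handled the new variants. The sketch is an assumption-heavy illustration: `ProviderEntrySketch`, `resolveRouteSketch`, the `idIsGatewayKind` flag, and the "deepseek" kind string are hypothetical; only the route names, the reason values, and the "boocontrol-gateway" kind come from this plan.

```typescript
type InferenceRoute =
  | { route: "swap" }
  | { route: "deepseek" }
  | { route: "gateway" }
  | { route: "gateway_error"; reason: "offline" | "unhealthy" };

// Hypothetical registry row; the real LlamaProvider shape may differ.
interface ProviderEntrySketch {
  id: string;
  kind: string;
  enabled: boolean;
}

// idIsGatewayKind stands in for however the real code recognises a
// gateway-kind id whose registry entry has gone missing.
function resolveRouteSketch(
  entry: ProviderEntrySketch | undefined,
  idIsGatewayKind: boolean,
): InferenceRoute {
  const gatewayKind = entry ? entry.kind === "boocontrol-gateway" : idIsGatewayKind;
  if (gatewayKind) {
    // Missing or disabled gateway: structured error, never the swap fallback.
    if (entry === undefined || !entry.enabled) {
      return { route: "gateway_error", reason: "offline" };
    }
    return { route: "gateway" };
  }
  if (entry !== undefined && entry.kind === "deepseek") return { route: "deepseek" };
  return { route: "swap" }; // existing fallback for non-gateway ids
}

// Exhaustive handling: adding a variant without a case fails to compile.
function describeRoute(r: InferenceRoute): string {
  switch (r.route) {
    case "swap":
      return "llama-swap";
    case "deepseek":
      return "deepseek";
    case "gateway":
      return "boocontrol gateway";
    case "gateway_error":
      return `gateway error (${r.reason})`;
    default: {
      const unreachable: never = r;
      return unreachable;
    }
  }
}
```

Callers that only branch on `route === 'swap'` or `'deepseek'` keep compiling, which matches the additive constraint; the `never` check is useful in the places that are meant to handle every variant.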

P8 -- fleet coordination lease (cross-service batch, own design pass)

Outline only. The proper fix for the four-writer TOCTOU. P1.14 created a seam (acquireHostAccess in host-access.ts), wired into the bench runner in P3.4, that P8 swaps.

P8.1 Design + ship control_host_leases (holder, purpose, expires_at, heartbeat) and the honor-protocol in all four writers (BooChat, BooCoder, Arena, BooControl). Scope: separate proposal under openspec/changes/. The BooControl bench scheduler consumes it through the acquireHostAccess seam wired in P3.4. Unattended bench scheduling + reproducible concurrency sweeps unlock here.

P9 -- remote hands + optional

Outline only.

P9.1 SSH config editor: SFTP read -> schema-validated edit (config-schema.json from the fork) -> diff preview -> timestamped backup -> SFTP write -> restart (nssm/systemctl) -> health-wait. Key in secrets/ (gitignored). Tests for the failure paths.
P9.2 llama-bench-over-SSH ingestion for device-level numbers.
P9.3 boocontrol.indifferentketchup.com vhost (Caddy/Authelia rewrite -> /control).
P9.4 Frontier providers as routing targets; slim control pane kind for in-workspace mini-cockpit.

Deferred (YAGNI)

Items removed from active scope with reopen triggers:

Prometheus/Grafana integration -- BooControl persists its own samples; /metrics endpoints stay available. Reopen when an external monitoring stack is actually deployed.
Multi-user/auth -- Authelia at the proxy layer. Reopen when multi-user is needed.
Non-llama-swap engine connectors (vLLM, Ollama, infinity-emb) -- connector interface should not preclude them. Reopen when a second engine kind is actually added.
Cross-process GPU arbitration -- four uncoordinated writers are accepted in v1. Reopen when the P8 lease proves insufficient.
Log persistence to file -- logs are relay-only with in-memory tail. Reopen when log volume warrants durable storage.
llama-bench over SSH (P9.2) -- device-level numbers. Reopen when SSH plumbing from P9.1 lands.
llama-swap peers federation -- flat list, coupled uptime, silent ID collisions. Reopen if the provider registry proves insufficient for host coordination.

Next step

Validate independently with boo-validating-changes boocontrol, then implement with boo-implementing-changes boocontrol. P0 gate first (commit the multi-provider batch), then P1.
