boocode/openspec/changes/boocontrol/artifacts/implementation-plan.md
indifferentketchup b18de2a331 chore: snapshot working tree - pty_exited notifications + in-flight inference WIP
feat(booterm): structured pty_exited WS notifications. Plan-validated, impl-validated, code-reviewed green (contracts build clean, contracts test 29/29, booterm + web typecheck clean).

wip: in-progress inference/provider refactor (agents.ts, provider.ts, new llama-providers.ts, removed llama-args-validator), plus arena, dispatcher, compaction, schema changes.

openspec: pty-exit-notifications complete; x-agent-flags planned (not yet implemented).
2026-06-14 12:48:47 +00:00


Plan: boocontrol

Folder

openspec/changes/boocontrol/

Task count

43 (P0: 2, P1: 15, P2: 5, P3: 5, P4: 4, P5: 4, P6: 2, P7: 4, P8: 1 outline, P9: 1 outline)

Size

Large -- 10-phase program spanning 4 apps + contracts, ~12 new DB tables, 5 new WS frame types, new host service, routing gateway, eval sandbox

Validation

  • openspec validate boocontrol: skipped (pre-spec-format acceptance; validation against openspec CLI format not applicable to accepted spec)
  • Adversarial validator: 18 findings (3 CRITICAL folded, 7 MINOR folded, 8 CONFIRMED)
  • Junior developer: 24 findings (7 clarifying folded, 3 polish noted, 2 specialist handoffs deferred, 12 confirmed)


Findings folded into this plan

Critical (folded):

  • V1 (jitter): The opencode-sse.ts pattern referenced in design S4 has backoff + circuit-breaker but NO jitter. The BooControl SSE connector must add jitter explicitly (random 0-50% of computed delay) to avoid thundering-herd reconnections across N hosts.
  • V7 (waitForTable): No waitForTable function exists anywhere in the codebase. P1 must create it in apps/control/src/db.ts as an explicit task.
  • V11 (schema indexes): P1 schema creates tables but defines zero indexes. The retention job queries control_requests by (provider_id, ts), the perf poller recovers watermarks via MAX(ts), and the activity feed sorts by ts. Without indexes these queries scan full tables as rows accumulate (~35k/day raw). Add explicit index tasks for control_requests(provider_id, ts), control_perf_samples(provider_id, ts), control_model_events(provider_id, ts).
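
For V1, the jittered reconnect decision can be sketched as a pure function. This is a minimal sketch, not the opencode-sse.ts code: the real reconnectDecision signature is not shown in this plan, so the parameter shape, the constant names, and the choice to add jitter on top of the computed delay (rather than subtract it) are all assumptions. The base, cap, and failure limit are the P1.4 values.

```typescript
// Hypothetical shape; only the name reconnectDecision comes from the plan.
type ReconnectDecision =
  | { action: "retry"; delayMs: number }
  | { action: "give-up" };

const BASE_MS = 1_000; // P1.4: base 1s
const MAX_MS = 30_000; // P1.4: max 30s
const MAX_CONSECUTIVE_FAILURES = 6; // P1.4: circuit-breaker threshold

// consecutiveFailures counts failed attempts since the last successful
// connect (1 = first failure). random is injectable so tests are deterministic.
function reconnectDecision(
  consecutiveFailures: number,
  random: () => number = Math.random,
): ReconnectDecision {
  if (consecutiveFailures >= MAX_CONSECUTIVE_FAILURES) {
    return { action: "give-up" };
  }
  const backoff = Math.min(BASE_MS * 2 ** (consecutiveFailures - 1), MAX_MS);
  // Assumption: jitter is additive, 0-50% of the computed delay.
  const jitter = random() * 0.5 * backoff;
  return { action: "retry", delayMs: Math.round(backoff + jitter) };
}
```

Because each host draws its own random factor, N connectors that lose the same upstream at the same instant spread their retries across a window instead of reconnecting in lockstep.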

Clarifying (folded):

  • JD1 (server loose union): Control frames skip the server's broker entirely (the proxy relays them as raw bytes). Adding them to the server's InferenceFrame union is dead code. Skip the server union update; document that control frames use a 2-location pattern (contracts + web strict union only).
  • JD3 (control_hosts seed): Seed os and gpu_label as hardcoded display metadata ('Windows'/'RTX 5090 32GB', 'Linux'/'P104-100 8GB'); ssh_*, config_path, restart_cmd are NULL until P9.
  • JD5 (@fastify/websocket): Add @fastify/websocket to P1 scaffolding dependencies.
  • JD6 (capture cap): The 256KB capture cap is application-enforced in the capture-fetch handler, not a DB constraint.
  • JD7 (acquireHostAccess): Scaffold acquireHostAccess in P1 as a no-op ({ok: true}) so P3 calls it and P8 swaps its body.
  • JD8 (gap_suspected): Store as a row in control_model_events with model = '*' and state = 'gap_suspected', timestamps in detail JSONB.
  • JD14 (schema overview): Only create P1 tables in P1; annotate the design S3 schema overview with phase tags.
  • JD16 (P1 source): P1 activity feed shows source = NULL; per-consumer filtering lands in P4.

Minor (folded):

  • V2 (drift test): The existing ws-frames.test.ts only checks KNOWN_FRAME_TYPES vs WsFrameSchema alignment, not web strict union sync. Add a comment to the P1 task noting web union sync is manual.
  • V3 (blast radius, corrected by plan validation F1/F4): upstreamModel has exactly 1 production importer (stream-phase-adapter.ts:16), not ~5 and not 28/13. The other provider-module consumers import resolveModelProvider/resolveModelEndpoint/resolveRoute/getModelContext instead. The additive-change constraint stands; the real P7 blast surface is resolveModelProvider's 6 direct callers propagating to ~10 downstream call sites.
  • V6 (local-gateway): local-gateway.ts never forwards X-Boo-Source; it omits the header rather than actively stripping it. The fix is the same either way.
  • JD4 (proxy WS path): The control proxy WS path is static (/api/control/ws), not parameterized like coder-proxy's per-session path.

New findings (folded):

  • V12 (P7 caller audit detail): The prior plan says "audit all 5 callers" but doesn't specify what each caller needs. Added per-caller change specs: getModelContext/invalidateModelContext (model-context.ts) must handle gateway baseUrl; resolveRoute (provider.ts) must return {route: 'gateway'}; upstreamModel (provider.ts) must add gateway branch before swap fallback; resolveModelEndpoint (provider.ts) must handle gateway headers.
  • V13 (ECharts theme integration): The plan says "dark-theme tokens from active oklch palette" but doesn't specify how. Added: use echarts.init(dom, themeObject) with a theme object built from the CSS custom properties (--background, --foreground, --muted, --accent) via getComputedStyle. One theme-build helper, not per-chart.
  • V14 (action queue semantics): "unload-during-bench -> takeover confirmation" needs explicit HTTP semantics. Added: the action endpoint returns 409 with {error: 'bench in progress', requiresConfirmation: true}; the client shows a confirmation dialog and re-submits with ?confirm=true.
  • V15 (capture total budget default): The plan mentions "total budget prune" but gives no default. Added: 50MB default, configurable via CAPTURE_BUDGET_MB env var.
  • V16 (openevals reference verified): /opt/forks/openevals exists and contains js/, python/, sandbox/ directories. The sandbox pattern (Docker hardened containers) is confirmed available.
  • V17 (P7 gateway error shape): InferenceRoute extension needs explicit error representation. Added: 'gateway' | 'gateway_error' variants; gateway_error carries {reason: 'offline' | 'unhealthy'}. The 5 callers must handle both.
  • V18 (SSE connector event shape delta): The opencode-sse.ts pattern is for the opencode SDK's Event type; BooControl consumes raw llama-swap SSE (/api/events) with a different envelope (modelStatus | logData | metrics | inflight). The reconnect/backoff/circuit-breaker pattern ports directly; the event parsing is new code, not a port. Noted in P1.4.

Junior developer new findings (folded):

  • JD17 (schema index timing): Indexes should be created in the same P1 task as the tables they index, not as a separate phase. Consolidated into P1.3.
  • JD18 (action queue depth cap message): When the queue is full (depth=4), the error message should include the current queue contents so the user knows what's pending. Added to P2.1 spec.
  • JD19 (acquireHostAccess signature): The function signature must be acquireHostAccess(providerId: string, purpose: string): Promise<{ok: boolean, reason?: string}> -- explicit in P1.14, called by the bench runner wiring in P3.4.
  • JD20 (snapshot rebuild on restart): When the control service restarts, the in-memory fleet state is lost. The WS endpoint must rebuild from DB (control_model_events for latest state, control_requests for last-seen activity) before serving snapshots. Added to P1.6.
  • JD21 (activity feed sort order): The live activity feed must sort by ts DESC (newest first) with react-virtuoso's followOutput="bottom" for live insertion. Added to P1.12.
  • JD22 (ECharts bundle impact): Per-chart echarts/core imports add ~15-25KB per chart type (gauge, line, scatter). With 3-4 charts in P1, the incremental bundle is ~60-100KB. Acceptable given the batteries-included tradeoff documented in design S9. Noted in P1.13.
  • JD23 (P7 provider.ts callers -- compile check): All 5 callers must compile unchanged for the new InferenceRoute variant. The upstreamModel function's implicit else branch (line 192) currently always reaches getSwapProvider -- the gateway variant must be handled before it. Added explicit check.
  • JD24 (deploy docs in P1.1): The systemd unit file and deploy docs must include the BOOCONTROL_URL env var (for apps/server's proxy) and DATABASE_URL (shared boochat DB). Added to P1.1 spec.

P0 -- prerequisite gate (separate batch: multi-llama-swap provider registry)

Gate: P0 must be committed and reviewed before P1 starts. BooControl keys every host-scoped row on LlamaProvider.id from packages/contracts/src/llama-providers.ts. The committed contract is the foundation.

  • Finish remaining tasks in openspec/changes/multi-llama-swap-providers-model-favorites/tasks.md: favorites hide-not-delete UI/route tests; smoke test sam-desktop + embedding (+ DeepSeek config).
  • Sam reviews and commits the batch (currently working-tree only).

P1 -- read-only cockpit

Demo: Watch both hosts live (models, swaps, VRAM/temp, request feed) while chatting.

Scaffold + DB

  • P1.1 Scaffold apps/control: new directory, Fastify + @fastify/websocket + postgres + zod dependencies, TS NodeNext, .env.example/.env.host, port 9503, /api/health endpoint, systemd unit boocontrol.service. Deploy docs in root CLAUDE.md (include BOOCONTROL_URL for apps/server proxy, DATABASE_URL for shared boochat DB). Pattern: apps/coder/src/index.ts for Fastify bootstrap, apps/coder/src/db.ts for getSql/applySchema/pingDb/closeDb.

  • P1.2 apps/control/src/db.ts with applySchema + waitForTable helper. waitForTable(sql, tableName, timeoutMs) polls information_schema.tables WHERE table_name = $1 with exponential backoff (100ms base, 2s cap); throws on timeout so systemd Restart=on-failure retries. Call waitForTable(sql, 'sessions', 30_000) before applySchema(). Pattern: apps/coder/src/db.ts for the getSql/applySchema/pingDb/closeDb shape; waitForTable is new (no existing implementation).

  • P1.3 apps/control/src/schema.sql -- P1 tables only (do NOT create bench_/eval_/route_policies/control_reports tables yet):

    • control_hosts: provider_id TEXT PK (FK-by-convention to LlamaProvider.id), ssh_host TEXT, ssh_user TEXT, ssh_key_path TEXT, config_path TEXT, restart_cmd TEXT, os TEXT, gpu_label TEXT, enabled BOOLEAN DEFAULT true. Seed: INSERT INTO control_hosts (provider_id, os, gpu_label) VALUES ('sam-desktop', 'Windows', 'RTX 5090 32GB'), ('embedding', 'Linux', 'P104-100 8GB') ON CONFLICT DO NOTHING. SSH/config columns NULL until P9.
    • control_requests: id BIGSERIAL PK, provider_id TEXT, swap_entry_id INT, ts TIMESTAMPTZ, model TEXT, req_path TEXT, status_code INT, duration_ms INT, cache_tokens INT, input_tokens INT, output_tokens INT, prompt_tps REAL, gen_tps REAL, has_capture BOOLEAN, capture JSONB. UNIQUE (provider_id, swap_entry_id, ts). NO source column (P4 adds it). Index: CREATE INDEX IF NOT EXISTS idx_control_requests_provider_ts ON control_requests (provider_id, ts DESC).
    • control_perf_samples: provider_id TEXT, ts TIMESTAMPTZ, gpu JSONB, sys JSONB. UNIQUE (provider_id, ts). Index: CREATE INDEX IF NOT EXISTS idx_control_perf_samples_provider_ts ON control_perf_samples (provider_id, ts DESC).
    • control_perf_rollup_5m: provider_id TEXT, bucket TIMESTAMPTZ, gpu_agg JSONB, sys_agg JSONB. UNIQUE (provider_id, bucket).
    • control_model_events: provider_id TEXT, model TEXT, state TEXT, ts TIMESTAMPTZ, detail JSONB. UNIQUE (provider_id, model, state, ts). Index: CREATE INDEX IF NOT EXISTS idx_control_model_events_provider_ts ON control_model_events (provider_id, ts DESC).
    • All use clock_timestamp() for created_at; JSONB via sql.json(value as never).
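
The waitForTable helper from P1.2 could look roughly like the sketch below. To keep it runnable without a database, the plan's sql argument is replaced here by an injected deps object; tableExists, sleep, and now are hypothetical names, and in apps/control the probe would be the information_schema.tables query described in P1.2.

```typescript
// Sketch under assumptions: deps stands in for the real `sql` handle.
type WaitDeps = {
  // e.g. SELECT 1 FROM information_schema.tables WHERE table_name = $1
  tableExists: (tableName: string) => Promise<boolean>;
  sleep: (ms: number) => Promise<void>;
  now: () => number; // epoch ms
};

const BASE_DELAY_MS = 100; // P1.2: 100ms base
const MAX_DELAY_MS = 2_000; // P1.2: 2s cap

async function waitForTable(
  deps: WaitDeps,
  tableName: string,
  timeoutMs: number,
): Promise<void> {
  const deadline = deps.now() + timeoutMs;
  let delay = BASE_DELAY_MS;
  for (;;) {
    if (await deps.tableExists(tableName)) return;
    const remaining = deadline - deps.now();
    if (remaining <= 0) {
      // Throwing lets systemd Restart=on-failure retry the whole service.
      throw new Error(`waitForTable: "${tableName}" not found after ${timeoutMs}ms`);
    }
    await deps.sleep(Math.min(delay, remaining));
    delay = Math.min(delay * 2, MAX_DELAY_MS);
  }
}
```

In production the sleep dependency would typically be a setTimeout-based promise and now would be Date.now; injecting both keeps the backoff schedule testable with a fake clock.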

Connectors + ingestion

  • P1.4 Fleet connector per enabled host: SSE client consuming GET /api/events with exponential backoff (base 1s, max 30s) + jitter (random 0-50% of computed delay) + circuit-breaker (6 consecutive failures -> give-up). Port the opencode-sse.ts reconnectDecision function (add jitter to the BooControl copy). Note: the reconnect/backoff/circuit-breaker pattern ports directly from opencode-sse.ts; the event parsing is new code because llama-swap's SSE envelope (modelStatus | logData | metrics | inflight) differs from the opencode SDK's Event type. Explicit connected | reconnecting | down liveness state machine + last_seen_at in-memory. On reconnect, reconcile via GET /api/metrics (full ring) with INSERT ... ON CONFLICT DO NOTHING (never check-then-act). Gap detection: if oldest reconcile entry is newer than newest persisted entry for that provider, insert gap_suspected model event with model='*' and timestamps in detail JSONB.

  • P1.5 Perf poller: GET /api/performance?after=<watermark> every 5s per host. Watermark recovered from MAX(ts) per provider in control_perf_samples on restart. NULL watermark (fresh install) -> omit after param, ingest returned window (UNIQUE constraint makes over-fetch harmless).

  • P1.6 In-memory fleet state with per-host monotonic seq counter, incremented on every mutation. WS endpoint /api/ws/control: snapshot-on-join carrying current seqs + seq-stamped deltas. Client rule: buffer pre-snapshot deltas, replay after snapshot applying only seq > snapshot_seq. On service restart, rebuild fleet state from DB before serving snapshots: query control_model_events for latest model state per provider, control_requests for last activity, control_perf_samples for latest perf sample.
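
The P1.6 client rule (buffer pre-snapshot deltas, then replay only seq > snapshot_seq) can be sketched as a small state holder. The frame shapes and the class name are hypothetical; the real control_fleet payload is defined in packages/contracts. The sketch also advances the per-host watermark after each applied delta so duplicates are dropped, which goes slightly beyond the rule as written.

```typescript
// Hypothetical shapes for illustration only.
type Delta = { providerId: string; seq: number; patch: unknown };
type Snapshot = { seqs: Record<string, number>; state: unknown };

class FleetStreamClient {
  private lastSeq: Record<string, number> | null = null; // null until snapshot
  private buffered: Delta[] = [];
  readonly applied: Delta[] = []; // stand-in for "apply to UI state"

  onDelta(delta: Delta): void {
    if (this.lastSeq === null) {
      this.buffered.push(delta); // arrived before the snapshot: hold it
      return;
    }
    this.applyIfFresh(delta);
  }

  onSnapshot(snapshot: Snapshot): void {
    this.lastSeq = { ...snapshot.seqs };
    const pending = this.buffered;
    this.buffered = [];
    for (const delta of pending) this.applyIfFresh(delta);
  }

  private applyIfFresh(delta: Delta): void {
    const seqs = this.lastSeq;
    if (seqs === null) return;
    // Assumption: a host absent from the snapshot starts at seq 0.
    const last = seqs[delta.providerId] ?? 0;
    if (delta.seq <= last) return; // already covered by the snapshot, or a duplicate
    seqs[delta.providerId] = delta.seq;
    this.applied.push(delta);
  }
}
```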

Retention (same P1 slice)

  • P1.7 Retention job: daily in-process timer. Rollup as idempotent upsert (INSERT INTO control_perf_rollup_5m ... ON CONFLICT (provider_id, bucket) DO UPDATE recomputed from raw). Delete raw only after covering buckets committed, in chunked transactions (one per provider per 1-hour window, never one mega-transaction). Activity prune > 90d. Capture size: 256KB per-row cap enforced in application code before INSERT (not a DB constraint); total budget prune with 50MB default, configurable via CAPTURE_BUDGET_MB env var. All windows configurable via .env.host.
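
Two pure helpers cover most of the P1.7 arithmetic: mapping a sample timestamp to its 5-minute rollup bucket, and splitting a delete range into 1-hour chunks so each chunk gets its own transaction. This is a sketch with hypothetical function names; the SQL that consumes these values is not shown.

```typescript
const FIVE_MIN_MS = 5 * 60 * 1000;
const HOUR_MS = 60 * 60 * 1000;

// Start of the 5-minute bucket containing ts (UTC-aligned).
function bucketStart5m(ts: Date): Date {
  return new Date(Math.floor(ts.getTime() / FIVE_MIN_MS) * FIVE_MIN_MS);
}

// Half-open [start, end) windows of at most one hour covering [from, to).
// One window = one chunked transaction per provider.
function hourWindows(from: Date, to: Date): Array<{ start: Date; end: Date }> {
  const fromMs = from.getTime();
  const toMs = to.getTime();
  const windows: Array<{ start: Date; end: Date }> = [];
  for (let t = Math.floor(fromMs / HOUR_MS) * HOUR_MS; t < toMs; t += HOUR_MS) {
    windows.push({
      start: new Date(Math.max(t, fromMs)),
      end: new Date(Math.min(t + HOUR_MS, toMs)),
    });
  }
  return windows;
}
```

Because the bucket is a pure function of the timestamp, re-running the rollup over the same raw rows recomputes the same (provider_id, bucket) keys, which is what makes the ON CONFLICT upsert idempotent.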

Contracts (build FIRST)

  • P1.8 Add 5 frame types to packages/contracts/src/ws-frames.ts:

    • control_fleet -- full snapshot on join + seq-stamped state deltas (hosts, liveness, models, states, ttl, inflight)
    • control_activity -- new request rows (live feed)
    • control_perf -- appended samples per host
    • control_log -- {provider_id, source: proxy|upstream, line} batches
    • control_job -- bench/eval run progress events

    Add to both WsFrameSchema discriminated union AND KNOWN_FRAME_TYPES array. Rebuild package (pnpm -C packages/contracts build).

    Note: Control frames use a 2-location sync pattern (contracts + web strict union only). They skip the server's InferenceFrame union because they never flow through the server's broker. The web strict union is the wire-format gate; missing it silently drops frames at JSON parse.

    Drift test note: The existing ws-frames.test.ts checks KNOWN_FRAME_TYPES vs WsFrameSchema alignment. There is no automated check for web strict union sync -- that alignment is manual and verified by the implementer. Add a comment in the test noting this limitation.
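
    To illustrate why the web strict union acts as the wire-format gate, here is a minimal sketch of a strict parser plus a drift helper. It uses plain TypeScript rather than the zod-based WsFrameSchema, and parseControlFrame and missingFrom are hypothetical names; only the five frame type strings come from this plan.

```typescript
const CONTROL_FRAME_TYPES = [
  "control_fleet",
  "control_activity",
  "control_perf",
  "control_log",
  "control_job",
] as const;

type ControlFrameType = (typeof CONTROL_FRAME_TYPES)[number];
type ControlFrame = { type: ControlFrameType; [key: string]: unknown };

// Strict gate: anything whose `type` is not in the union is dropped (null).
function parseControlFrame(raw: string): ControlFrame | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const type = (value as { type?: unknown }).type;
  if (typeof type !== "string") return null;
  if (!(CONTROL_FRAME_TYPES as readonly string[]).includes(type)) return null;
  return value as ControlFrame;
}

// Drift helper: frame types known to contracts but absent from a strict union list.
function missingFrom(known: readonly string[], strict: readonly string[]): string[] {
  return known.filter((t) => !strict.includes(t));
}
```

    A frame type added to contracts but not to the web union falls into the null branch with no error, which is the silent drop the note above warns about.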

Server proxy

  • P1.9 apps/server/src/routes/control-proxy.ts: registerControlProxy(app, boocontrolOrigin) following the same structure as registerCoderProxy but with a static WS path /api/control/ws (not parameterized per-session). HTTP catch-all at /api/control/*. Add keep-in-sync comment in both coder-proxy.ts and control-proxy.ts. BOOCONTROL_URL env var. Register in apps/server/src/index.ts.

Web UI

  • P1.10 Web: /control route in App.tsx, nav entry in ProjectSidebar.tsx (under Memory cluster, Radio icon from lucide), pages/Control.tsx shell with Fleet + Activity tabs. useControlStream as a second app-level WS singleton (own React context + connection guard, targets proxied /api/control/ws). Client discards deltas with seq <= snapshot_seq. Activity feed note: shows source = NULL in P1; per-consumer breakdown lands in P4.

  • P1.11 Fleet tab: host cards as instrument clusters. State chips with color/glow (amber pulse starting, green steady ready, red error, grey down with last-seen relative time). VRAM/temp/power readouts. TTL countdown rings. Dark mission-control aesthetic. Orbitron for numerals, Inter for prose.

  • P1.12 Activity feed: react-virtuoso tail-follow viewer (already a dep) with followOutput="bottom" for live insertion, ts DESC sort order. Filter chips for model and host. Pause-on-scroll toggle.

  • P1.13 Charts: integrate ECharts (per-chart module imports via echarts/core + needed renderers). Dark theme: build a theme object from CSS custom properties (--background, --foreground, --muted, --accent) via getComputedStyle(document.documentElement) and pass to echarts.init(dom, theme). One buildEChartsTheme() helper, not per-chart. Incremental bundle impact ~60-100KB for 3-4 chart types (gauge, line, scatter) -- acceptable per design S9 tradeoff.
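
A possible shape for the buildEChartsTheme() helper in P1.13 is sketched below. The CSS variable reader is injected so the function stays pure and testable; which theme keys the cockpit actually needs is an assumption, and whether the palette's oklch values can be passed to ECharts unchanged has not been verified here.

```typescript
// Reader is injected; in the browser it would be something like:
//   (name) => getComputedStyle(document.documentElement).getPropertyValue(name).trim()
type ReadCssVar = (name: string) => string;

function buildEChartsTheme(readVar: ReadCssVar) {
  const background = readVar("--background");
  const foreground = readVar("--foreground");
  const muted = readVar("--muted");
  const accent = readVar("--accent");
  const axis = {
    axisLine: { lineStyle: { color: muted } },
    axisLabel: { color: foreground },
    splitLine: { lineStyle: { color: muted } },
  };
  return {
    backgroundColor: background,
    textStyle: { color: foreground },
    color: [accent, foreground, muted], // series palette, accent first
    categoryAxis: axis,
    valueAxis: axis,
  };
}
```

Every chart would then call echarts.init(dom, theme) with the one object returned here, so a palette change only touches this helper.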

Host-access seam

  • P1.14 Create apps/control/src/services/host-access.ts with acquireHostAccess(providerId: string, purpose: string): Promise<{ok: boolean, reason?: string}>. V1 body: no-op returning {ok: true}. This is the P8 seam -- P8 swaps the body for a DB lease without touching the bench engine. Export it for the bench runner wiring in P3.4.
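
A minimal sketch of the seam follows. The acquireHostAccess signature is the one given above; runGated is a hypothetical wrapper showing how a caller would gate work through it. In the real file the function would be exported.

```typescript
type HostAccessResult = { ok: boolean; reason?: string };

// V1 body: no-op. P8 is expected to replace this body with a DB lease check.
async function acquireHostAccess(
  providerId: string,
  purpose: string,
): Promise<HostAccessResult> {
  void providerId; // unused until P8
  void purpose;
  return { ok: true };
}

// Hypothetical caller-side wrapper: never inline the check, always go through the seam.
async function runGated<T>(
  providerId: string,
  purpose: string,
  run: () => Promise<T>,
): Promise<T> {
  const access = await acquireHostAccess(providerId, purpose);
  if (!access.ok) {
    throw new Error(`host access denied: ${access.reason ?? "unknown"}`);
  }
  return run();
}
```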

Tests

  • P1.15 Tests: connector dedup/reconcile + gap detection as pure helpers (turn-guard.ts pattern); liveness state machine transitions; retention idempotency (re-run same window produces identical rollups); seq logic (buffer, discard stale, apply snapshot). DB tests describe.runIf(process.env.DATABASE_URL).
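
As an example of the pure-helper style, the gap check from P1.4 reduces to one comparison. The entry shape and the function name are hypothetical; the rule itself (oldest reconciled entry newer than newest persisted entry) is the one stated in P1.4.

```typescript
type RingEntry = { ts: number }; // epoch ms; hypothetical minimal shape

type GapDetail = { oldestReconciledTs: number; newestPersistedTs: number };

// Returns the detail payload for a gap_suspected event, or null when the
// reconcile window overlaps what is already persisted.
function detectGap(
  reconciled: RingEntry[],
  newestPersistedTs: number | null,
): GapDetail | null {
  // Fresh install or empty ring: nothing to compare, so no gap is claimed.
  if (newestPersistedTs === null || reconciled.length === 0) return null;
  const oldestReconciledTs = Math.min(...reconciled.map((e) => e.ts));
  return oldestReconciledTs > newestPersistedTs
    ? { oldestReconciledTs, newestPersistedTs }
    : null;
}
```

A non-null result would be stored as the control_model_events row with model = '*' and state = 'gap_suspected', with the two timestamps in detail.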

P2 -- hands on the controls

Demo: Unload from UI, watch the swap stream, open a capture.

  • P2.1 Per-host FIFO action queue in the control service. Actions: warm (1-token POST /v1/chat/completions with bare wire ID), unload one/all (POST /api/models/unload/:model or /api/models/unload). Serialize through single FIFO queue per provider_id. Unload-during-bench -> return 409 with {error: 'bench in progress', requiresConfirmation: true}; client shows confirmation dialog and re-submits with ?confirm=true. Reject submissions while host is down ("host offline" toast). Cap depth (4) with reject-on-full; error response includes current queue contents so the user knows what's pending. Re-check liveness on dequeue + skip stale actions (design S5). Pattern: arena-runner.ts advanceChain promise-chain + read-fresh-state-or-skip.

  • P2.2 Optimistic UI off control_fleet frames only. No local emits after API calls (event-dedup discipline per CLAUDE.md). The API call triggers a server-side mutation that publishes a control_fleet delta; the frontend updates from the WS frame, not from a local state change.

  • P2.3 Logs tab: relay /api/events logData -> control_log frame. In-memory 2k-line tail buffer per host for late joiners. React-virtuoso tail-follow viewer with per-source filter (proxy/upstream/model) + pause-on-scroll.

  • P2.4 Inspector: activity table (virtuoso) -> capture drawer. GET /api/captures/:id via control service, decode base64, persist trimmed copy (256KB cap enforced in application code before INSERT), render with shiki-highlighted JSON. "Open in Playground" stub (links to P3).

  • P2.5 Op task (manual, documented in design): enable captureBuffer + review metricsMaxInMemory on both hosts' llama-swap configs.
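
The admission rules for the P2.1 action queue can be sketched as follows. Only the 409 body and the depth cap of 4 come from the plan; the result tags, the context object, and the decision to simply drop an action when the host is down at dequeue time are assumptions.

```typescript
type Action = { kind: "warm" | "unload" | "unload-all"; model?: string };

type SubmitResult =
  | { kind: "accepted"; position: number }
  | { kind: "needs_confirmation"; httpStatus: 409; body: { error: "bench in progress"; requiresConfirmation: true } }
  | { kind: "queue_full"; error: string; queued: Action[] }
  | { kind: "host_offline"; error: string };

type SubmitContext = { hostUp: boolean; benchRunning: boolean; confirm: boolean };

class HostActionQueue {
  private readonly pending: Action[] = [];
  private readonly maxDepth: number;

  constructor(maxDepth = 4) {
    this.maxDepth = maxDepth;
  }

  submit(action: Action, ctx: SubmitContext): SubmitResult {
    if (!ctx.hostUp) return { kind: "host_offline", error: "host offline" };
    const isUnload = action.kind !== "warm";
    if (isUnload && ctx.benchRunning && !ctx.confirm) {
      // Client re-submits with ?confirm=true after the dialog.
      return {
        kind: "needs_confirmation",
        httpStatus: 409,
        body: { error: "bench in progress", requiresConfirmation: true },
      };
    }
    if (this.pending.length >= this.maxDepth) {
      // Include current contents so the user can see what is pending.
      return { kind: "queue_full", error: "queue full", queued: [...this.pending] };
    }
    this.pending.push(action);
    return { kind: "accepted", position: this.pending.length };
  }

  // Liveness is re-checked at dequeue; an action for a down host is skipped.
  next(hostUp: boolean): Action | undefined {
    const action = this.pending.shift();
    return hostUp ? action : undefined;
  }
}
```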


P3 -- playground + speed bench (manual, safe-by-construction)

Demo: TTFT-vs-concurrency curves for two quants, run by hand without disturbing a live chat.

  • P3.1 Playground tab: model select (grouped picker from provider registry), param controls, streaming chat, side-by-side A/B compare (two ModelBubble components in parallel, same prompt, different models). "Battle in Arena" handoff link (opens Arena dialog with pre-filled prompt + contestants via the existing ArenaLauncherDialog pattern).

  • P3.2 Bench engine: suite model (data/ YAML, grid of prompt_len x gen_len x concurrency x repetitions). Runner with TTFT capture (client-side first delta) + llama.cpp timings parse (prompt_per_second, predicted_per_second, cache_n from final stream chunk). Bounded fan-out (Promise.allSettled, suite-declared concurrency only). Results as aggregates + raw samples to bench_suites/bench_runs/bench_samples tables. Add schema for these 3 tables in this task.

  • P3.3 V1 safety: user-initiated runs only; takeover confirmation when target host shows recent traffic; embedding-host-first defaults; concurrent_foreign_requests recorded per run from activity stream to flag polluted results. Unattended scheduling deliberately absent (P8).

  • P3.4 Wire acquireHostAccess(providerId, purpose) from P1.14 into the bench runner. The runner MUST gate every run through this function -- never inline the inflight check. P8 swaps its body.

  • P3.5 Bench UI: run launcher, live progress via control_job frames, history charts (TTFT vs concurrency, tok/s over time via ECharts), baseline + regression flags (delta beyond -10% gen tok/s threshold).
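
Two pieces of the bench arithmetic are easy to pin down in code: expanding the P3.2 suite grid into individual cells, and the P3.5 regression flag. This is a sketch with hypothetical type and function names; the YAML field names are not specified in this plan, and the strict comparison (a drop of exactly 10% does not flag) is an assumption about what "beyond -10%" means.

```typescript
type BenchSuite = {
  promptLens: number[];
  genLens: number[];
  concurrencies: number[];
  repetitions: number;
};

type BenchCell = { promptLen: number; genLen: number; concurrency: number; repetition: number };

// Cartesian product: prompt_len x gen_len x concurrency x repetitions.
function expandGrid(suite: BenchSuite): BenchCell[] {
  const cells: BenchCell[] = [];
  for (const promptLen of suite.promptLens) {
    for (const genLen of suite.genLens) {
      for (const concurrency of suite.concurrencies) {
        for (let repetition = 0; repetition < suite.repetitions; repetition++) {
          cells.push({ promptLen, genLen, concurrency, repetition });
        }
      }
    }
  }
  return cells;
}

// Flags a run whose gen tok/s dropped by more than 10% relative to baseline.
function isRegression(baselineGenTps: number, currentGenTps: number, threshold = -0.1): boolean {
  if (baselineGenTps <= 0) return false; // no usable baseline
  const delta = (currentGenTps - baselineGenTps) / baselineGenTps;
  return delta < threshold;
}
```

For example, a suite with 2 prompt lengths, 1 generation length, 2 concurrency levels, and 3 repetitions expands to 2 x 1 x 2 x 3 = 12 cells.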


P4 -- per-consumer attribution (X-Boo-Source, end-to-end)

Demo: Activity feed filtered to "arena" shows only Arena traffic; nothing reads NULL.

  • P4.1 apps/server: per-turn fetch-wrapper injection on AI-SDK streaming path. Thread source through the call site. getSwapProvider cache keyed by baseURL+source (label set: boochat|boocoder|arena|control-bench|control-eval). upstreamModel signature change must be additive (optional source param -- 1 production importer: stream-phase-adapter.ts:309; validated by plan-validation F1). Extend headers in compaction.ts and task-model.ts direct fetches.

  • P4.2 apps/coder: forward inbound x-boo-source header in local-gateway.ts (currently omitted from forwarded headers). Set it at Arena + dispatch fetch sites.

  • P4.3 Migration: ALTER TABLE control_requests ADD COLUMN source TEXT. Surface as Activity filter + per-source token aggregates in the UI.

  • P4.4 Tests: header present on all three paths (server streaming, gateway-forwarded opencode, arena direct); rows attribute correctly in control_requests.
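
The per-turn fetch wrapper in P4.1 might look like the sketch below. FetchLike is a deliberately narrow hypothetical type so the example does not depend on DOM or Node typings; the real wrapper would wrap the fetch handed to the AI SDK provider. The cache-key format is also an assumption; the plan only says the key is baseURL+source.

```typescript
type BooSource = "boochat" | "boocoder" | "arena" | "control-bench" | "control-eval";

// Narrow stand-in for fetch; hypothetical, for illustration only.
type FetchInit = { method?: string; headers?: Record<string, string>; body?: string };
type FetchLike = (url: string, init?: FetchInit) => Promise<unknown>;

// Returns a fetch that stamps X-Boo-Source on every request it forwards.
function withBooSource(baseFetch: FetchLike, source: BooSource): FetchLike {
  return (url, init = {}) =>
    baseFetch(url, {
      ...init,
      headers: { ...(init.headers ?? {}), "X-Boo-Source": source },
    });
}

// One cached provider per (baseURL, source) pair, so labels never leak across consumers.
function providerCacheKey(baseURL: string, source: BooSource): string {
  return `${baseURL}|${source}`;
}
```

Keying the cache on source as well as baseURL matters because a provider instance captures its wrapped fetch; sharing one instance across consumers would attribute every request to whichever consumer created it first.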


P5 -- quality evals + sandbox

Demo: Fleet leaderboard with speed x quality scatter.

  • P5.1 Suite format (data/ YAML: chat rubric tasks, code tasks with tests); CRUD + versioning. Four suites in priority order: (1) agent coding tasks, (2) chat assistant quality, (3) long-context retrieval, (4) utility calls (titles/summaries). Add schema for eval_suites/eval_runs/eval_results tables in this task.

  • P5.2 Judge runner: temperature 0, pinned judge model+version, rubric scoring, rationale capture. Pairwise tie-breaks delegate to Arena (links/launches battles, not re-implements). Judge = strongest local model by default.

  • P5.3 Code sandbox runner: ephemeral Docker containers (--network none, non-root, caps dropped, tmpfs workdir, --rm, kill-on-timeout, boocontrol-eval label for orphan findability). Orphan prune at engine start (docker ps --filter label=boocontrol-eval). Bounded concurrency (default 4) + Promise.allSettled + per-task finally cleanup. Pass@1 scoring. Patterns from /opt/forks/openevals (verified: sandbox/ directory exists with Docker hardened container patterns). Harden: --security-opt=no-new-privileges, --cap-drop=ALL.

  • P5.4 Leaderboard UI + speed x quality scatter per (provider_id, model, quant) using ECharts (reuse the buildEChartsTheme() helper from P1.13).
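
The P5.3 container flags can be assembled by a pure function, which keeps the hardening list reviewable and testable in one place. This is a sketch: the function name, the /work path, and the 65534:65534 (nobody) user are assumptions, and kill-on-timeout is left to the caller that spawns docker.

```typescript
type SandboxOptions = { image: string; command: string[] };

const SANDBOX_LABEL = "boocontrol-eval";

// argv for `docker run`, without the leading "docker".
function sandboxRunArgs(opts: SandboxOptions): string[] {
  return [
    "run",
    "--rm", // remove the container on exit
    "--network", "none", // no network access
    "--user", "65534:65534", // non-root; assumed uid/gid
    "--cap-drop=ALL", // drop all Linux capabilities
    "--security-opt=no-new-privileges",
    "--tmpfs", "/work", // ephemeral workdir; assumed path
    "--workdir", "/work",
    "--label", SANDBOX_LABEL, // lets the orphan prune find it
    opts.image,
    ...opts.command,
  ];
}

// argv for listing leftover sandbox containers at engine start.
const ORPHAN_LIST_ARGS: string[] = [
  "ps", "--all", "--quiet", "--filter", `label=${SANDBOX_LABEL}`,
];
```

The image and command in any call are supplied by the eval suite; nothing in this sketch decides which image the code tasks run in.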


P6 -- advisory routing + reports

Demo: Picker badges "best code model right now"; Monday-morning fleet report.

  • P6.1 Advisory scores API (eval results + live latency + host health) -> model-picker badges. Expose via GET /api/control/routing/scores.

  • P6.2 Reports: scheduled digest job (usage, trends, swap counts, leaderboard deltas, anomalies vs baselines) -> control_reports. Same in-process timer pattern as retention (P1), schedule_meta = {interval, enabled, last_run_at} with catch-up on boot. Reports tab + markdown export. Add control_reports schema in this task.


P7 -- live auto:* gateway (committed)

Demo: An auto:code session in BooChat routes to the current best code model with failover.

  • P7.1 Control service: OpenAI-compatible virtual models (auto, auto:code, auto:fast, auto:cheap) backed by route_policies table. Policy: rule match -> candidate ordering -> health/ctx-fit filter -> dispatch with failover. Gateway forwards X-Boo-Source to target host. Add route_policies schema in this task.

  • P7.2 Registry entry: kind: "boocontrol-gateway" with baseUrl: "http://100.114.205.53:9503". BooChat adopts with zero inference-path changes.

  • P7.3 apps/server/src/services/inference/provider.ts -- the code change required for orphaned-session handling:

    • Extend InferenceRoute from 'swap' | 'deepseek' to 'swap' | 'deepseek' | 'gateway' | 'gateway_error'
    • gateway_error carries {reason: 'offline' | 'unhealthy'} for structured error reporting
    • Override the unknown-provider fallback (current behavior at line 147: composite id with unknown provider silently routes to LLAMA_SWAP_URL). For gateway-kind ids that are missing/disabled, resolve to route: 'gateway_error' with reason: 'offline', never the swap fallback.
    • Audit all 5 callers with explicit per-caller changes:
      1. getModelContext (model-context.ts:85) -- must handle gateway baseUrl (query /upstream/<model>/props against the control service, not the target host)
      2. invalidateModelContext (model-context.ts:160) -- must handle gateway variant (no-op; gateway doesn't cache model context)
      3. resolveRoute (provider.ts:175) -- must return {route: 'gateway'} for gateway-kind ids
      4. upstreamModel (provider.ts:184) -- must add gateway branch before the swap fallback at line 192; the implicit else currently always reaches getSwapProvider
      5. resolveModelEndpoint (provider.ts:201) -- must handle gateway headers (forward X-Boo-Source)
    • Propagation note (plan-validation F2): these 5 direct call sites fan out to ~10 downstream production call sites (stream-phase-adapter, compaction, task-model, system-prompt, error-handler, tool-phase, chats, stream-phase); none need signature changes (gateway handling is internal to each function) but all need test coverage.
    • Audit clarification (plan-validation F7): system-prompt.ts:195 calls resolveRoute(agent) with no config/modelId, so it always returns {route: 'swap'} and needs NO gateway handling.
    • All must compile unchanged for the new variant (additive, not breaking)
    • The session keeps its id; the picker flags affected sessions.
  • P7.4 Policy editor UI (route_policies CRUD) + per-policy dispatch log in the Reports tab.
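
The extended InferenceRoute from P7.3 can be modeled as a discriminated union so the compiler forces every caller to handle both gateway variants. The sketch below is not the provider.ts code: GatewayEntry, isGatewayModel, resolveRouteSketch, and the baseUrl field on the gateway variant are hypothetical, and detecting gateway ids by the auto / auto:* names from P7.1 is an assumption. The deepseek branch is elided.

```typescript
type InferenceRoute =
  | { route: "swap" }
  | { route: "deepseek" }
  | { route: "gateway"; baseUrl: string }
  | { route: "gateway_error"; reason: "offline" | "unhealthy" };

// Hypothetical registry view of the boocontrol-gateway entry.
type GatewayEntry = { baseUrl: string; enabled: boolean; healthy: boolean };

// Assumption: virtual gateway models are "auto" and "auto:<profile>".
function isGatewayModel(modelId: string): boolean {
  return modelId === "auto" || modelId.startsWith("auto:");
}

function resolveRouteSketch(modelId: string, gateway: GatewayEntry | undefined): InferenceRoute {
  if (!isGatewayModel(modelId)) return { route: "swap" }; // existing behavior unchanged
  // Missing or disabled gateway must never fall through to the swap fallback.
  if (gateway === undefined || !gateway.enabled) {
    return { route: "gateway_error", reason: "offline" };
  }
  if (!gateway.healthy) return { route: "gateway_error", reason: "unhealthy" };
  return { route: "gateway", baseUrl: gateway.baseUrl };
}

// Exhaustive switch: adding a variant without handling it fails to compile.
function describeRoute(r: InferenceRoute): string {
  switch (r.route) {
    case "swap": return "llama-swap";
    case "deepseek": return "deepseek";
    case "gateway": return `gateway at ${r.baseUrl}`;
    case "gateway_error": return `gateway unavailable (${r.reason})`;
    default: {
      const unreachable: never = r;
      return unreachable;
    }
  }
}
```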


P8 -- fleet coordination lease (cross-service batch, own design pass)

Outline only. The proper fix for the four-writer TOCTOU. P1.14 created a seam (acquireHostAccess in host-access.ts), wired into the bench runner in P3.4, that P8 swaps.

  • P8.1 Design + ship control_host_leases (holder, purpose, expires_at, heartbeat) and the honor-protocol in all four writers (BooChat, BooCoder, Arena, BooControl). Scope: separate proposal under openspec/changes/. The BooControl bench scheduler consumes it through the acquireHostAccess seam left in P3. Unattended bench scheduling + reproducible concurrency sweeps unlock here.

P9 -- remote hands + optional

Outline only.

  • P9.1 SSH config editor: SFTP read -> schema-validated edit (config-schema.json from the fork) -> diff preview -> timestamped backup -> SFTP write -> restart (nssm/systemctl) -> health-wait. Key in secrets/ (gitignored). Tests for the failure paths.

  • P9.2 llama-bench-over-SSH ingestion for device-level numbers.

  • P9.3 boocontrol.indifferentketchup.com vhost (Caddy/Authelia rewrite -> /control).

  • P9.4 Frontier providers as routing targets; slim control pane kind for in-workspace mini-cockpit.


Deferred (YAGNI)

Items removed from active scope with reopen triggers:

  • Prometheus/Grafana integration -- BooControl persists its own samples; /metrics endpoints stay available. Reopen when an external monitoring stack is actually deployed.
  • Multi-user/auth -- Authelia at the proxy layer. Reopen when multi-user is needed.
  • Non-llama-swap engine connectors (vLLM, Ollama, infinity-emb) -- connector interface should not preclude them. Reopen when a second engine kind is actually added.
  • Cross-process GPU arbitration -- four uncoordinated writers is accepted in v1. Reopen when the P8 lease proves insufficient.
  • Log persistence to file -- logs are relay-only with in-memory tail. Reopen when log volume warrants durable storage.
  • llama-bench over SSH (P9.2) -- device-level numbers. Reopen when SSH plumbing from P9.1 lands.
  • llama-swap peers federation -- flat list, coupled uptime, silent ID collisions. Reopen if the provider registry proves insufficient for host coordination.

Next step

Validate independently with boo-validating-changes boocontrol, then implement with boo-implementing-changes boocontrol. P0 gate first (commit the multi-provider batch), then P1.