feat(booterm): structured pty_exited WS notifications. Plan-validated, impl-validated, code-reviewed green (contracts build clean, contracts test 29/29, booterm + web typecheck clean). wip: in-progress inference/provider refactor (agents.ts, provider.ts, new llama-providers.ts, removed llama-args-validator), plus arena, dispatcher, compaction, schema changes. openspec: pty-exit-notifications complete; x-agent-flags planned (not yet implemented).
Plan: boocontrol
Folder: openspec/changes/boocontrol/
Task count: 51 (P0: 2, P1: 15, P2: 5, P3: 5, P4: 4, P5: 4, P6: 2, P7: 4, P8: 1 outline, P9: 1 outline)
Size: Large -- 10-phase program spanning 4 apps + contracts, ~12 new DB tables, 5 new WS frame types, new host service, routing gateway, eval sandbox
Validation:
- openspec validate boocontrol: skipped (pre-spec-format acceptance; validation against openspec CLI format not applicable to accepted spec)
- Adversarial validator: 18 findings (3 CRITICAL folded, 7 MINOR folded, 8 CONFIRMED)
- Junior developer: 24 findings (7 clarifying folded, 3 polish noted, 2 specialist handoffs deferred, 12 confirmed)
Findings folded into this plan
Critical (folded):
- V1 (jitter): The opencode-sse.ts pattern referenced in design S4 has backoff + circuit-breaker but NO jitter. The BooControl SSE connector must add jitter explicitly (random 0-50% of computed delay) to avoid thundering-herd reconnections across N hosts.
- V7 (waitForTable): No waitForTable function exists anywhere in the codebase. P1 must create it in apps/control/src/db.ts as an explicit task.
- V11 (schema indexes): P1 schema creates tables but defines zero indexes. The retention job queries control_requests by (provider_id, ts), the perf poller recovers watermarks via MAX(ts), and the activity feed sorts by ts. Without indexes these queries scan full tables as rows accumulate (~35k/day raw). Add explicit index tasks for control_requests(provider_id, ts), control_perf_samples(provider_id, ts), control_model_events(provider_id, ts).
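The jitter rule from V1 can be sketched as a pure decision function. This is a minimal sketch, not the actual reconnectDecision from opencode-sse.ts: the type names, the option object, and the choice to add jitter on top of the capped delay (so a retry can land up to 50% past the cap) are all assumptions made for illustration. The numeric defaults come from P1.4 (base 1s, max 30s, 6 consecutive failures).

```typescript
// Hypothetical sketch: exponential backoff + additive jitter + circuit breaker.
// None of these names are taken from the real opencode-sse.ts module.

type ReconnectDecision =
  | { action: "retry"; delayMs: number }
  | { action: "give_up" };

interface BackoffOptions {
  baseMs: number;      // delay after the first failure
  maxMs: number;       // cap applied before jitter is added
  maxFailures: number; // consecutive failures that trip the breaker
  jitterRatio: number; // 0.5 means "random 0-50% of computed delay"
}

const DEFAULT_BACKOFF: BackoffOptions = {
  baseMs: 1_000,
  maxMs: 30_000,
  maxFailures: 6,
  jitterRatio: 0.5,
};

function reconnectDecision(
  consecutiveFailures: number,
  opts: BackoffOptions = DEFAULT_BACKOFF,
  random: () => number = Math.random, // injectable so tests are deterministic
): ReconnectDecision {
  // Circuit breaker: stop retrying once the failure budget is spent.
  if (consecutiveFailures >= opts.maxFailures) return { action: "give_up" };

  // 1st failure -> base, 2nd -> 2x base, 3rd -> 4x base, ... capped at maxMs.
  const exponent = Math.max(0, consecutiveFailures - 1);
  const computed = Math.min(opts.maxMs, opts.baseMs * 2 ** exponent);

  // Jitter spreads N hosts' reconnects over [computed, computed * 1.5).
  const jitter = computed * opts.jitterRatio * random();
  return { action: "retry", delayMs: Math.round(computed + jitter) };
}
```

Injecting `random` keeps the function pure, which matches the plan's preference for testing connector logic as pure helpers.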
Clarifying (folded):
- JD1 (server loose union): Control frames skip the server's broker entirely (they relay raw bytes through the proxy). Adding them to the server's InferenceFrame union is dead code. Skip the server union update; document that control frames use a 2-location pattern (contracts + web strict union only).
- JD3 (control_hosts seed): Seed os and gpu_label as hardcoded display metadata ('Windows'/'RTX 5090 32GB', 'Linux'/'P104-100 8GB'); ssh_*, config_path, restart_cmd are NULL until P9.
- JD5 (@fastify/websocket): Add @fastify/websocket to P1 scaffolding dependencies.
- JD6 (capture cap): The 256KB capture cap is application-enforced in the capture-fetch handler, not a DB constraint.
- JD7 (acquireHostAccess): Scaffold acquireHostAccess in P1 as a no-op ({ok: true}) so P3 calls it and P8 swaps its body.
- JD8 (gap_suspected): Store as a row in control_model_events with model = '*' and state = 'gap_suspected', timestamps in detail JSONB.
- JD14 (schema overview): Only create P1 tables in P1; annotate the design S3 schema overview with phase tags.
- JD16 (P1 source): P1 activity feed shows source = NULL; per-consumer filtering lands in P4.
Minor (folded):
- V2 (drift test): The existing ws-frames.test.ts only checks KNOWN_FRAME_TYPES vs WsFrameSchema alignment, not web strict union sync. Add a comment to the P1 task noting web union sync is manual.
- V3 (blast radius, corrected by plan validation F1/F4): upstreamModel has exactly 1 production importer (stream-phase-adapter.ts:16), not ~5 and not 28/13. The other provider-module consumers import resolveModelProvider/resolveModelEndpoint/resolveRoute/getModelContext instead. The additive-change constraint stands; the real P7 blast surface is resolveModelProvider's 6 direct callers propagating to ~10 downstream call sites.
- V6 (local-gateway): local-gateway.ts omits X-Boo-Source (doesn't include it) rather than actively stripping it. Same fix either way.
- JD4 (proxy WS path): The control proxy WS path is static (/api/control/ws), not parameterized like coder-proxy's per-session path.
New findings (folded):
- V12 (P7 caller audit detail): The prior plan says "audit all 5 callers" but doesn't specify what each caller needs. Added per-caller change specs: getModelContext/invalidateModelContext (model-context.ts) must handle gateway baseUrl; resolveRoute (provider.ts) must return {route: 'gateway'}; upstreamModel (provider.ts) must add gateway branch before swap fallback; resolveModelEndpoint (provider.ts) must handle gateway headers.
- V13 (ECharts theme integration): The plan says "dark-theme tokens from active oklch palette" but doesn't specify how. Added: use echarts.init(dom, themeObject) with a theme object built from the CSS custom properties (--background, --foreground, --muted, --accent) via getComputedStyle. One theme-build helper, not per-chart.
- V14 (action queue semantics): "unload-during-bench -> takeover confirmation" needs explicit HTTP semantics. Added: the action endpoint returns 409 with {error: 'bench in progress', requiresConfirmation: true}; the client shows a confirmation dialog and re-submits with ?confirm=true.
- V15 (capture total budget default): The plan mentions "total budget prune" but gives no default. Added: 50MB default, configurable via CAPTURE_BUDGET_MB env var.
- V16 (openevals reference verified): /opt/forks/openevals exists and contains js/, python/, sandbox/ directories. The sandbox pattern (Docker hardened containers) is confirmed available.
- V17 (P7 gateway error shape): InferenceRoute extension needs explicit error representation. Added: 'gateway' | 'gateway_error' variants; gateway_error carries {reason: 'offline' | 'unhealthy'}. The 5 callers must handle both.
- V18 (SSE connector event shape delta): The opencode-sse.ts pattern is for the opencode SDK's Event type; BooControl consumes raw llama-swap SSE (/api/events) with a different envelope (modelStatus | logData | metrics | inflight). The reconnect/backoff/circuit-breaker pattern ports directly; the event parsing is new code, not a port. Noted in P1.4.
Junior developer new findings (folded):
- JD17 (schema index timing): Indexes should be created in the same P1 task as the tables they index, not as a separate phase. Consolidated into P1.3.
- JD18 (action queue depth cap message): When the queue is full (depth=4), the error message should include the current queue contents so the user knows what's pending. Added to P2.1 spec.
- JD19 (acquireHostAccess signature): The function signature must be acquireHostAccess(providerId: string, purpose: string): Promise<{ok: boolean, reason?: string}> -- explicit in P1.14, called by P3.1.
- JD20 (snapshot rebuild on restart): When the control service restarts, the in-memory fleet state is lost. The WS endpoint must rebuild from DB (control_model_events for latest state, control_requests for last-seen activity) before serving snapshots. Added to P1.6.
- JD21 (activity feed sort order): The live activity feed must sort by ts DESC (newest first) with react-virtuoso's followOutput="bottom" for live insertion. Added to P1.12.
- JD22 (ECharts bundle impact): Per-chart echarts/core imports add ~15-25KB per chart type (gauge, line, scatter). With 3-4 charts in P1, the incremental bundle is ~60-100KB. Acceptable given the batteries-included tradeoff documented in design S9. Noted in P1.13.
- JD23 (P7 provider.ts callers -- compile check): All 5 callers must compile unchanged for the new InferenceRoute variant. The upstreamModel function's implicit else branch (line 192) currently always reaches getSwapProvider -- the gateway variant must be handled before it. Added explicit check.
- JD24 (deploy docs in P1.1): The systemd unit file and deploy docs must include the BOOCONTROL_URL env var (for apps/server's proxy) and DATABASE_URL (shared boochat DB). Added to P1.1 spec.
P0 -- prerequisite gate (separate batch: multi-llama-swap provider registry)
Gate: P0 must be committed and reviewed before P1 starts. BooControl keys every host-scoped row on LlamaProvider.id from packages/contracts/src/llama-providers.ts. The committed contract is the foundation.
- Finish remaining tasks in openspec/changes/multi-llama-swap-providers-model-favorites/tasks.md: favorites hide-not-delete UI/route tests; smoke test sam-desktop + embedding (+ DeepSeek config).
- Sam reviews and commits the batch (currently working-tree only).
P1 -- read-only cockpit
Demo: Watch both hosts live (models, swaps, VRAM/temp, request feed) while chatting.
Scaffold + DB
- P1.1 Scaffold apps/control: new directory, Fastify + @fastify/websocket + postgres + zod dependencies, TS NodeNext, .env.example/.env.host, port 9503, /api/health endpoint, systemd unit boocontrol.service. Deploy docs in root CLAUDE.md (include BOOCONTROL_URL for apps/server proxy, DATABASE_URL for shared boochat DB). Pattern: apps/coder/src/index.ts for Fastify bootstrap, apps/coder/src/db.ts for getSql/applySchema/pingDb/closeDb.
- P1.2 apps/control/src/db.ts with applySchema + waitForTable helper. waitForTable(sql, tableName, timeoutMs) polls information_schema.tables WHERE table_name = $1 with exponential backoff (100ms base, 2s cap); throws on timeout so systemd Restart=on-failure retries. Call waitForTable(sql, 'sessions', 30_000) before applySchema(). Pattern: apps/coder/src/db.ts for the getSql/applySchema/pingDb/closeDb shape; waitForTable is new (no existing implementation).
- P1.3 apps/control/src/schema.sql -- P1 tables only (do NOT create bench_/eval_/route_policies/control_reports tables yet):
  - control_hosts: provider_id TEXT PK (FK-by-convention to LlamaProvider.id), ssh_host TEXT, ssh_user TEXT, ssh_key_path TEXT, config_path TEXT, restart_cmd TEXT, os TEXT, gpu_label TEXT, enabled BOOLEAN DEFAULT true. Seed: INSERT INTO control_hosts (provider_id, os, gpu_label) VALUES ('sam-desktop', 'Windows', 'RTX 5090 32GB'), ('embedding', 'Linux', 'P104-100 8GB') ON CONFLICT DO NOTHING. SSH/config columns NULL until P9.
  - control_requests: id BIGSERIAL PK, provider_id TEXT, swap_entry_id INT, ts TIMESTAMPTZ, model TEXT, req_path TEXT, status_code INT, duration_ms INT, cache_tokens INT, input_tokens INT, output_tokens INT, prompt_tps REAL, gen_tps REAL, has_capture BOOLEAN, capture JSONB. UNIQUE (provider_id, swap_entry_id, ts). NO source column (P4 adds it). Index: CREATE INDEX IF NOT EXISTS idx_control_requests_provider_ts ON control_requests (provider_id, ts DESC).
  - control_perf_samples: provider_id TEXT, ts TIMESTAMPTZ, gpu JSONB, sys JSONB. UNIQUE (provider_id, ts). Index: CREATE INDEX IF NOT EXISTS idx_control_perf_samples_provider_ts ON control_perf_samples (provider_id, ts DESC).
  - control_perf_rollup_5m: provider_id TEXT, bucket TIMESTAMPTZ, gpu_agg JSONB, sys_agg JSONB. UNIQUE (provider_id, bucket).
  - control_model_events: provider_id TEXT, model TEXT, state TEXT, ts TIMESTAMPTZ, detail JSONB. UNIQUE (provider_id, model, state, ts). Index: CREATE INDEX IF NOT EXISTS idx_control_model_events_provider_ts ON control_model_events (provider_id, ts DESC).
  - All use clock_timestamp() for created_at; JSONB via sql.json(value as never).
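The polling behavior described for waitForTable in P1.2 might look roughly like the following. This is a sketch under stated assumptions: the plan's signature takes a `sql` client, whereas this version takes an injected `probe` function (hypothetical) so the example runs without a database. In the real helper the probe would presumably be the information_schema.tables query. The `sleep` and `now` parameters are also illustrative additions for testability.

```typescript
// Hypothetical sketch of poll-until-present with exponential backoff
// (100ms base, 2s cap) and a hard timeout. Not the actual db.ts code.

type TableProbe = (tableName: string) => Promise<boolean>;

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function waitForTable(
  probe: TableProbe,
  tableName: string,
  timeoutMs: number,
  sleep: (ms: number) => Promise<void> = defaultSleep,
  now: () => number = Date.now,
): Promise<void> {
  const deadline = now() + timeoutMs;
  const capMs = 2_000;
  let delayMs = 100;

  for (;;) {
    if (await probe(tableName)) return;

    // Throwing lets the process exit non-zero so systemd can restart it.
    if (now() >= deadline) {
      throw new Error(`waitForTable: "${tableName}" not present after ${timeoutMs}ms`);
    }

    // Never sleep past the deadline; double the delay up to the cap.
    await sleep(Math.min(delayMs, Math.max(0, deadline - now())));
    delayMs = Math.min(capMs, delayMs * 2);
  }
}
```

At startup the call order from P1.2 would then be waitForTable for 'sessions' with a 30_000 ms timeout, followed by applySchema().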
Connectors + ingestion
- P1.4 Fleet connector per enabled host: SSE client consuming GET /api/events with exponential backoff (base 1s, max 30s) + jitter (random 0-50% of computed delay) + circuit-breaker (6 consecutive failures -> give-up). Port the opencode-sse.ts reconnectDecision function (add jitter to the BooControl copy). Note: the reconnect/backoff/circuit-breaker pattern ports directly from opencode-sse.ts; the event parsing is new code because llama-swap's SSE envelope (modelStatus | logData | metrics | inflight) differs from the opencode SDK's Event type. Explicit connected | reconnecting | down liveness state machine + last_seen_at in-memory. On reconnect, reconcile via GET /api/metrics (full ring) with INSERT ... ON CONFLICT DO NOTHING (never check-then-act). Gap detection: if oldest reconcile entry is newer than newest persisted entry for that provider, insert gap_suspected model event with model='*' and timestamps in detail JSONB.
- P1.5 Perf poller: GET /api/performance?after=<watermark> every 5s per host. Watermark recovered from MAX(ts) per provider in control_perf_samples on restart. NULL watermark (fresh install) -> omit after param, ingest returned window (UNIQUE constraint makes over-fetch harmless).
- P1.6 In-memory fleet state with per-host monotonic seq counter, incremented on every mutation. WS endpoint /api/ws/control: snapshot-on-join carrying current seqs + seq-stamped deltas. Client rule: buffer pre-snapshot deltas, replay after snapshot applying only seq > snapshot_seq. On service restart, rebuild fleet state from DB before serving snapshots: query control_model_events for latest model state per provider, control_requests for last activity, control_perf_samples for latest perf sample.
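The client rule in P1.6 (buffer pre-snapshot deltas, then replay only those with seq > snapshot_seq) can be illustrated with a small reducer. The frame shapes and the class name below are hypothetical and are not the control_fleet contract; only the buffering and discard rule is taken from the plan.

```typescript
// Hypothetical client-side state holder for snapshot + seq-stamped deltas.

interface Delta {
  providerId: string;
  seq: number;                      // per-host monotonic counter
  patch: Record<string, unknown>;   // shallow state change (assumed shape)
}

interface Snapshot {
  seqs: Record<string, number>;                    // seq per host at snapshot time
  hosts: Record<string, Record<string, unknown>>;  // state per host
}

class FleetStateClient {
  private hosts: Record<string, Record<string, unknown>> = {};
  private seqs: Record<string, number> = {};
  private pending: Delta[] = [];
  private ready = false;

  onDelta(delta: Delta): void {
    // Before the snapshot arrives there is nothing to apply against: buffer.
    if (!this.ready) {
      this.pending.push(delta);
      return;
    }
    this.apply(delta);
  }

  onSnapshot(snapshot: Snapshot): void {
    this.hosts = {};
    for (const id of Object.keys(snapshot.hosts)) {
      this.hosts[id] = { ...snapshot.hosts[id] };
    }
    this.seqs = { ...snapshot.seqs };
    this.ready = true;

    // Replay buffered deltas in arrival order; stale ones are dropped in apply().
    const buffered = this.pending;
    this.pending = [];
    for (const delta of buffered) this.apply(delta);
  }

  get(providerId: string): Record<string, unknown> | undefined {
    return this.hosts[providerId];
  }

  private apply(delta: Delta): void {
    const last = this.seqs[delta.providerId] ?? 0;
    if (delta.seq <= last) return; // already reflected in the snapshot or a prior delta
    this.hosts[delta.providerId] = { ...(this.hosts[delta.providerId] ?? {}), ...delta.patch };
    this.seqs[delta.providerId] = delta.seq;
  }
}
```

Tracking the last applied seq per host also makes duplicate deltas after the snapshot harmless, which is a slightly stronger guarantee than the plan's stated rule.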
Retention (same P1 slice)
- P1.7 Retention job: daily in-process timer. Rollup as idempotent upsert (INSERT INTO control_perf_rollup_5m ... ON CONFLICT (provider_id, bucket) DO UPDATE recomputed from raw). Delete raw only after covering buckets committed, in chunked transactions (one per provider per 1-hour window, never one mega-transaction). Activity prune > 90d. Capture size: 256KB per-row cap enforced in application code before INSERT (not a DB constraint); total budget prune with 50MB default, configurable via CAPTURE_BUDGET_MB env var. All windows configurable via .env.host.
Contracts (build FIRST)
- P1.8 Add 5 frame types to packages/contracts/src/ws-frames.ts:
  - control_fleet -- full snapshot on join + seq-stamped state deltas (hosts, liveness, models, states, ttl, inflight)
  - control_activity -- new request rows (live feed)
  - control_perf -- appended samples per host
  - control_log -- {provider_id, source: proxy|upstream, line} batches
  - control_job -- bench/eval run progress events

  Add to both WsFrameSchema discriminated union AND KNOWN_FRAME_TYPES array. Rebuild package (pnpm -C packages/contracts build).

  Note: Control frames use a 2-location sync pattern (contracts + web strict union only). They skip the server's InferenceFrame union because they never flow through the server's broker. The web strict union is the wire-format gate; missing it silently drops frames at JSON parse.

  Drift test note: The existing ws-frames.test.ts checks KNOWN_FRAME_TYPES vs WsFrameSchema alignment. There is no automated check for web strict union sync -- that alignment is manual and verified by the implementer. Add a comment in the test noting this limitation.
Server proxy
- P1.9 apps/server/src/routes/control-proxy.ts: registerControlProxy(app, boocontrolOrigin) following the same structure as registerCoderProxy but with a static WS path /api/control/ws (not parameterized per-session). HTTP all-catch at /api/control/*. Add keep-in-sync comment in both coder-proxy.ts and control-proxy.ts. BOOCONTROL_URL env var. Register in apps/server/src/index.ts.
Web UI
- P1.10 Web: /control route in App.tsx, nav entry in ProjectSidebar.tsx (under Memory cluster, Radio icon from lucide), pages/Control.tsx shell with Fleet + Activity tabs. useControlStream as a second app-level WS singleton (own React context + connection guard, targets proxied /api/control/ws). Client discards deltas with seq <= snapshot_seq. Activity feed note: shows source = NULL in P1; per-consumer breakdown lands in P4.
- P1.11 Fleet tab: host cards as instrument clusters. State chips with color/glow (amber pulse starting, green steady ready, red error, grey down with last-seen relative time). VRAM/temp/power readouts. TTL countdown rings. Dark mission-control aesthetic. Orbitron for numerals, Inter for prose.
- P1.12 Activity feed: react-virtuoso tail-follow viewer (already a dep) with followOutput="bottom" for live insertion, ts DESC sort order. Filter chips for model and host. Pause-on-scroll toggle.
- P1.13 Charts: integrate ECharts (per-chart module imports via echarts/core + needed renderers). Dark theme: build a theme object from CSS custom properties (--background, --foreground, --muted, --accent) via getComputedStyle(document.documentElement) and pass to echarts.init(dom, theme). One buildEChartsTheme() helper, not per-chart. Incremental bundle impact ~60-100KB for 3-4 chart types (gauge, line, scatter) -- acceptable per design S9 tradeoff.
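One possible shape for the buildEChartsTheme() helper named in P1.13 is sketched below. The helper name and the four CSS custom properties come from the plan; everything else is an assumption: the `readVar` parameter is injected so the function can run outside a browser, and the selection of theme keys (backgroundColor, textStyle, color, categoryAxis, valueAxis) is just one plausible subset. Whether ECharts renders oklch() color strings correctly is not verified here.

```typescript
// Hypothetical sketch: derive an ECharts theme object from CSS custom properties.

interface AxisTheme {
  axisLine: { lineStyle: { color: string } };
  axisLabel: { color: string };
  splitLine: { lineStyle: { color: string } };
}

interface EChartsThemeSketch {
  backgroundColor: string;
  textStyle: { color: string };
  color: string[]; // series palette
  categoryAxis: AxisTheme;
  valueAxis: AxisTheme;
}

// In the web app this would wrap getComputedStyle(...).getPropertyValue(name).
type ReadCssVar = (name: string) => string;

function buildEChartsTheme(readVar: ReadCssVar): EChartsThemeSketch {
  // getPropertyValue commonly returns values with leading whitespace, so trim.
  const v = (name: string): string => readVar(name).trim();

  const background = v("--background");
  const foreground = v("--foreground");
  const muted = v("--muted");
  const accent = v("--accent");

  const axis: AxisTheme = {
    axisLine: { lineStyle: { color: muted } },
    axisLabel: { color: foreground },
    splitLine: { lineStyle: { color: muted } },
  };

  return {
    backgroundColor: background,
    textStyle: { color: foreground },
    color: [accent, foreground, muted],
    categoryAxis: axis,
    valueAxis: axis,
  };
}
```

In the browser the reader would be built once from getComputedStyle(document.documentElement) and the resulting object passed as the second argument to echarts.init(dom, theme), so every chart shares one theme.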
Host-access seam
- P1.14 Create apps/control/src/services/host-access.ts with acquireHostAccess(providerId: string, purpose: string): Promise<{ok: boolean, reason?: string}>. V1 body: no-op returning {ok: true}. This is the P8 seam -- P8 swaps the body for a DB lease without touching the bench engine. Export for P3.1 to import.
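The seam in P1.14 is small enough to show in full. The function signature and the no-op V1 body are taken from the plan; the `HostAccessResult` alias and the `runGated` caller are hypothetical, added only to show how a runner would gate on the result rather than inlining its own check.

```typescript
// Sketch of the host-access seam. Only acquireHostAccess's signature and
// no-op body are specified by the plan; the rest is illustrative.

type HostAccessResult = { ok: boolean; reason?: string };

async function acquireHostAccess(
  providerId: string,
  purpose: string,
): Promise<HostAccessResult> {
  // V1: always grants access. P8 is expected to replace this body with a lease.
  void providerId;
  void purpose;
  return { ok: true };
}

// Hypothetical caller: every run goes through the seam, so swapping the
// body later changes behavior without touching this code.
async function runGated<T>(
  providerId: string,
  purpose: string,
  run: () => Promise<T>,
): Promise<T> {
  const access = await acquireHostAccess(providerId, purpose);
  if (!access.ok) {
    throw new Error(`host access denied for ${providerId}: ${access.reason ?? "no reason given"}`);
  }
  return run();
}
```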
Tests
- P1.15 Tests: connector dedup/reconcile + gap detection as pure helpers (turn-guard.ts pattern); liveness state machine transitions; retention idempotency (re-run same window produces identical rollups); seq logic (buffer, discard stale, apply snapshot). DB tests describe.runIf(process.env.DATABASE_URL).
P2 -- hands on the controls
Demo: Unload from UI, watch the swap stream, open a capture.
- P2.1 Per-host FIFO action queue in the control service. Actions: warm (1-token POST /v1/chat/completions with bare wire ID), unload one/all (POST /api/models/unload/:model or /api/models/unload). Serialize through single FIFO queue per provider_id. Unload-during-bench -> return 409 with {error: 'bench in progress', requiresConfirmation: true}; client shows confirmation dialog and re-submits with ?confirm=true. Reject submissions while host is down ("host offline" toast). Cap depth (4) with reject-on-full; error response includes current queue contents so the user knows what's pending. Re-check liveness on dequeue + skip stale actions (design S5). Pattern: arena-runner.ts advanceChain promise-chain + read-fresh-state-or-skip.
- P2.2 Optimistic UI off control_fleet frames only. No local emits after API calls (event-dedup discipline per CLAUDE.md). The API call triggers a server-side mutation that publishes a control_fleet delta; the frontend updates from the WS frame, not from a local state change.
- P2.3 Logs tab: relay /api/events logData -> control_log frame. In-memory 2k-line tail buffer per host for late joiners. React-virtuoso tail-follow viewer with per-source filter (proxy/upstream/model) + pause-on-scroll.
- P2.4 Inspector: activity table (virtuoso) -> capture drawer. GET /api/captures/:id via control service, decode base64, persist trimmed copy (256KB cap enforced in application code before INSERT), render with shiki-highlighted JSON. "Open in Playground" stub (links to P3).
- P2.5 Op task (manual, documented in design): enable captureBuffer + review metricsMaxInMemory on both hosts' llama-swap configs.
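The queue semantics in P2.1 (serialize per host, cap depth at 4, report pending contents on rejection, re-check liveness on dequeue) can be sketched with a promise chain. This is an illustration, not the arena-runner.ts advanceChain code: the class and result type are hypothetical, and counting the in-flight action toward the depth cap is an assumption the plan does not settle.

```typescript
// Hypothetical per-host FIFO action queue built on a promise chain.

type QueueResult<T> =
  | { ok: true; value: T }
  | { ok: false; error: "queue_full"; pending: string[] }
  | { ok: false; error: "skipped_stale" };

class HostActionQueue {
  private chain: Promise<void> = Promise.resolve();
  private pending: string[] = []; // labels of queued + in-flight actions
  private maxDepth: number;
  private isHostLive: () => boolean;

  constructor(maxDepth = 4, isHostLive: () => boolean = () => true) {
    this.maxDepth = maxDepth;
    this.isHostLive = isHostLive;
  }

  submit<T>(label: string, action: () => Promise<T>): Promise<QueueResult<T>> {
    if (this.pending.length >= this.maxDepth) {
      // Reject-on-full, telling the caller what is already waiting.
      const full: QueueResult<T> = { ok: false, error: "queue_full", pending: [...this.pending] };
      return Promise.resolve(full);
    }
    this.pending.push(label);

    const result = this.chain.then(async (): Promise<QueueResult<T>> => {
      try {
        // Read fresh state at dequeue time; skip if the host went away meanwhile.
        if (!this.isHostLive()) return { ok: false, error: "skipped_stale" };
        return { ok: true, value: await action() };
      } finally {
        this.pending.shift(); // FIFO: the head is always the action that just ran
      }
    });

    // Keep the chain alive even if an action rejects.
    this.chain = result.then(() => undefined, () => undefined);
    return result;
  }
}
```

An HTTP handler would keep one instance per provider_id and translate a queue_full result into an error response that lists `pending`.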
P3 -- playground + speed bench (manual, safe-by-construction)
Demo: TTFT-vs-concurrency curves for two quants, run by hand without disturbing a live chat.
- P3.1 Playground tab: model select (grouped picker from provider registry), param controls, streaming chat, side-by-side A/B compare (two ModelBubble components in parallel, same prompt, different models). "Battle in Arena" handoff link (opens Arena dialog with pre-filled prompt + contestants via the existing ArenaLauncherDialog pattern).
- P3.2 Bench engine: suite model (data/ YAML, grid of prompt_len x gen_len x concurrency x repetitions). Runner with TTFT capture (client-side first delta) + llama.cpp timings parse (prompt_per_second, predicted_per_second, cache_n from final stream chunk). Bounded fan-out (Promise.allSettled, suite-declared concurrency only). Results as aggregates + raw samples to bench_suites/bench_runs/bench_samples tables. Add schema for these 3 tables in this task.
- P3.3 V1 safety: user-initiated runs only; takeover confirmation when target host shows recent traffic; embedding-host-first defaults; concurrent_foreign_requests recorded per run from activity stream to flag polluted results. Unattended scheduling deliberately absent (P8).
- P3.4 Wire acquireHostAccess(providerId, purpose) from P1.14 into the bench runner. The runner MUST gate every run through this function -- never inline the inflight check. P8 swaps its body.
- P3.5 Bench UI: run launcher, live progress via control_job frames, history charts (TTFT vs concurrency, tok/s over time via ECharts), baseline + regression flags (delta beyond -10% gen tok/s threshold).
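Two pieces of the bench plan lend themselves to a short worked example: expanding the suite grid from P3.2 and the regression flag from P3.5. The sketch below assumes a suite has already been parsed into arrays (the field names are hypothetical, not the YAML schema) and reads "delta beyond -10%" as a relative change strictly below -0.10 against the baseline.

```typescript
// Hypothetical helpers: grid expansion and the gen tok/s regression flag.

interface BenchGrid {
  promptLens: number[];
  genLens: number[];
  concurrencies: number[];
  repetitions: number;
}

interface BenchCell {
  promptLen: number;
  genLen: number;
  concurrency: number;
  repetition: number; // 1-based
}

// Cartesian product: prompt_len x gen_len x concurrency x repetitions.
function expandGrid(grid: BenchGrid): BenchCell[] {
  const cells: BenchCell[] = [];
  for (const promptLen of grid.promptLens) {
    for (const genLen of grid.genLens) {
      for (const concurrency of grid.concurrencies) {
        for (let repetition = 1; repetition <= grid.repetitions; repetition++) {
          cells.push({ promptLen, genLen, concurrency, repetition });
        }
      }
    }
  }
  return cells;
}

// Flags a run whose generation throughput dropped more than 10% vs baseline.
function isGenTpsRegression(
  baselineTps: number,
  currentTps: number,
  threshold = -0.1,
): boolean {
  if (!(baselineTps > 0)) throw new RangeError("baselineTps must be > 0");
  const relativeDelta = (currentTps - baselineTps) / baselineTps;
  return relativeDelta < threshold;
}
```

For example, a grid with 2 prompt lengths, 1 generation length, 3 concurrency levels, and 3 repetitions expands to 2 x 1 x 3 x 3 = 18 cells, and a drop from 100 to 85 tok/s (a -15% delta) is flagged while a drop to 95 tok/s is not.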
P4 -- per-consumer attribution (X-Boo-Source, end-to-end)
Demo: Activity feed filtered to "arena" shows only Arena traffic; nothing reads NULL.
- P4.1 apps/server: per-turn fetch-wrapper injection on AI-SDK streaming path. Thread source through the call site. getSwapProvider cache keyed by baseURL+source (label set: boochat|boocoder|arena|control-bench|control-eval). upstreamModel signature change must be additive (optional source param -- 1 production importer: stream-phase-adapter.ts:309; validated by plan-validation F1). Extend headers in compaction.ts and task-model.ts direct fetches.
- P4.2 apps/coder: forward inbound x-boo-source header in local-gateway.ts (currently omitted from forwarded headers). Set it at Arena + dispatch fetch sites.
- P4.3 Migration: ALTER TABLE control_requests ADD COLUMN source TEXT. Surface as Activity filter + per-source token aggregates in the UI.
- P4.4 Tests: header present on all three paths (server streaming, gateway-forwarded opencode, arena direct); rows attribute correctly in control_requests.
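The fetch-wrapper injection in P4.1 could take a form like the one below. The header name and the label set come from the plan; the `withBooSource` function and the `FetchLike` type are hypothetical, and the wrapper is deliberately limited to string or URL inputs so it never has to merge headers from a Request object.

```typescript
// Hypothetical fetch wrapper that stamps X-Boo-Source on every outgoing call.

type BooSource = "boochat" | "boocoder" | "arena" | "control-bench" | "control-eval";

// Narrower than the global fetch signature on purpose (no Request inputs).
type FetchLike = (input: string | URL, init?: RequestInit) => Promise<Response>;

function withBooSource(source: BooSource, baseFetch: FetchLike = fetch): FetchLike {
  return (input, init) => {
    // Copy caller headers first so existing ones (auth, content-type) survive.
    const headers = new Headers(init?.headers);
    headers.set("X-Boo-Source", source);
    return baseFetch(input, { ...init, headers });
  };
}
```

Because the wrapper is built per source label, a provider cache keyed by baseURL+source (as P4.1 requires) can hold one wrapped fetch per label without the labels leaking between consumers.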
P5 -- quality evals + sandbox
Demo: Fleet leaderboard with speed x quality scatter.
- P5.1 Suite format (data/ YAML: chat rubric tasks, code tasks with tests); CRUD + versioning. Four suites in priority order: (1) agent coding tasks, (2) chat assistant quality, (3) long-context retrieval, (4) utility calls (titles/summaries). Add schema for eval_suites/eval_runs/eval_results tables in this task.
- P5.2 Judge runner: temperature 0, pinned judge model+version, rubric scoring, rationale capture. Pairwise tie-breaks delegate to Arena (links/launches battles, not re-implements). Judge = strongest local model by default.
- P5.3 Code sandbox runner: ephemeral Docker containers (--network none, non-root, caps dropped, tmpfs workdir, --rm, kill-on-timeout, boocontrol-eval label for orphan findability). Orphan prune at engine start (docker ps --filter label=boocontrol-eval). Bounded concurrency (default 4) + Promise.allSettled + per-task finally cleanup. Pass@1 scoring. Patterns from /opt/forks/openevals (verified: sandbox/ directory exists with Docker hardened container patterns). Harden: --security-opt=no-new-privileges, --cap-drop=ALL.
- P5.4 Leaderboard UI + speed x quality scatter per (provider_id, model, quant) using ECharts (reuse the buildEChartsTheme() helper from P1.13).
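The container flags listed in P5.3 can be collected into a single argument builder, which keeps the hardening in one reviewable place. This is a sketch: the function and option names are hypothetical, and the specific UID/GID, tmpfs path, and tmpfs size are placeholder assumptions rather than values from the plan or from openevals. Kill-on-timeout is not a docker run flag, so it is left to the caller.

```typescript
// Hypothetical builder for `docker run` arguments matching the P5.3 flag list.

interface SandboxOptions {
  image: string;
  command: string[];
  runId: string;       // used only to name the container
  workdir?: string;    // assumed default: /work
  tmpfsSize?: string;  // assumed default: 64m
  user?: string;       // assumed default: 65534:65534 (non-root)
}

function buildSandboxArgs(opts: SandboxOptions): string[] {
  const workdir = opts.workdir ?? "/work";
  const tmpfsSize = opts.tmpfsSize ?? "64m";
  const user = opts.user ?? "65534:65534";

  return [
    "run",
    "--rm",                              // remove the container on exit
    "--network", "none",                 // no network access
    "--user", user,                      // non-root
    "--cap-drop=ALL",                    // drop all capabilities
    "--security-opt=no-new-privileges",  // block privilege escalation
    "--tmpfs", `${workdir}:rw,size=${tmpfsSize}`, // in-memory workdir
    "--workdir", workdir,
    "--label", "boocontrol-eval",        // lets the orphan prune find it
    "--name", `boocontrol-eval-${opts.runId}`,
    opts.image,
    ...opts.command,
  ];
}
```

The resulting array is meant to be passed to a process spawner as discrete arguments rather than joined into a shell string, which avoids quoting problems in task commands.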
P6 -- advisory routing + reports
Demo: Picker badges "best code model right now"; Monday-morning fleet report.
- P6.1 Advisory scores API (eval results + live latency + host health) -> model-picker badges. Expose via GET /api/control/routing/scores.
- P6.2 Reports: scheduled digest job (usage, trends, swap counts, leaderboard deltas, anomalies vs baselines) -> control_reports. Same in-process timer pattern as retention (P1), schedule_meta = {interval, enabled, last_run_at} with catch-up on boot. Reports tab + markdown export. Add control_reports schema in this task.
P7 -- live auto:* gateway (committed)
Demo: An auto:code session in BooChat routes to the current best code model with failover.
- P7.1 Control service: OpenAI-compatible virtual models (auto, auto:code, auto:fast, auto:cheap) backed by route_policies table. Policy: rule match -> candidate ordering -> health/ctx-fit filter -> dispatch with failover. Gateway forwards X-Boo-Source to target host. Add route_policies schema in this task.
- P7.2 Registry entry: kind: "boocontrol-gateway" with baseUrl: "http://100.114.205.53:9503". BooChat adopts with zero inference-path changes.
- P7.3 apps/server/src/services/inference/provider.ts -- the code change required for orphaned-session handling:
  - Extend InferenceRoute from 'swap' | 'deepseek' to 'swap' | 'deepseek' | 'gateway' | 'gateway_error'
  - gateway_error carries {reason: 'offline' | 'unhealthy'} for structured error reporting
  - Override the unknown-provider fallback (current behavior at line 147: composite id with unknown provider silently routes to LLAMA_SWAP_URL). For gateway-kind ids that are missing/disabled, resolve to route: 'gateway_error' with reason: 'offline', never the swap fallback.
  - Audit all 5 callers with explicit per-caller changes:
    - getModelContext (model-context.ts:85) -- must handle gateway baseUrl (query /upstream/<model>/props against the control service, not the target host)
    - invalidateModelContext (model-context.ts:160) -- must handle gateway variant (no-op; gateway doesn't cache model context)
    - resolveRoute (provider.ts:175) -- must return {route: 'gateway'} for gateway-kind ids
    - upstreamModel (provider.ts:184) -- must add gateway branch before the swap fallback at line 192; the implicit else currently always reaches getSwapProvider
    - resolveModelEndpoint (provider.ts:201) -- must handle gateway headers (forward X-Boo-Source)
  - Propagation note (plan-validation F2): these 5 direct call sites fan out to ~10 downstream production call sites (stream-phase-adapter, compaction, task-model, system-prompt, error-handler, tool-phase, chats, stream-phase); none need signature changes (gateway handling is internal to each function) but all need test coverage.
  - Audit clarification (plan-validation F7): system-prompt.ts:195 calls resolveRoute(agent) with no config/modelId, so it always returns {route: 'swap'} and needs NO gateway handling.
  - All must compile unchanged for the new variant (additive, not breaking)
  - The session keeps its id; the picker flags affected sessions.
- P7.4 Policy editor UI (route_policies CRUD) + per-policy dispatch log in the Reports tab.
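The InferenceRoute extension in P7.3 can be pictured as a discriminated union with an exhaustive switch. The four route names and the gateway_error reasons come from the plan; the registry entry shape, the `healthy` flag, the "deepseek" kind string, and the `looksLikeGateway` predicate for ids missing from the registry are all hypothetical. How the real provider.ts recognizes a gateway-kind id that is absent from the registry is not stated in the plan.

```typescript
// Hypothetical sketch of the extended route union and a resolver that never
// lets a missing or disabled gateway id fall through to the swap fallback.

type GatewayErrorReason = "offline" | "unhealthy";

type InferenceRoute =
  | { route: "swap" }
  | { route: "deepseek" }
  | { route: "gateway" }
  | { route: "gateway_error"; reason: GatewayErrorReason };

interface ProviderEntry {
  id: string;
  kind: string;      // "boocontrol-gateway" is the only kind named in the plan
  enabled: boolean;
  healthy: boolean;  // assumed health flag
}

function resolveRouteSketch(
  providerId: string,
  registry: ProviderEntry[],
  looksLikeGateway: (id: string) => boolean, // assumed id convention
): InferenceRoute {
  const entry = registry.find((p) => p.id === providerId);
  const gatewayKind = entry ? entry.kind === "boocontrol-gateway" : looksLikeGateway(providerId);

  if (gatewayKind) {
    if (!entry || !entry.enabled) return { route: "gateway_error", reason: "offline" };
    if (!entry.healthy) return { route: "gateway_error", reason: "unhealthy" };
    return { route: "gateway" };
  }
  if (entry?.kind === "deepseek") return { route: "deepseek" };
  return { route: "swap" }; // existing fallback, now reachable only for non-gateway ids
}

// Exhaustive switch: adding a variant without handling it fails to compile.
function describeRoute(r: InferenceRoute): string {
  switch (r.route) {
    case "swap": return "llama-swap";
    case "deepseek": return "deepseek";
    case "gateway": return "gateway";
    case "gateway_error": return `gateway unavailable (${r.reason})`;
    default: {
      const unreachable: never = r;
      return unreachable;
    }
  }
}
```

The `never` check in describeRoute is one way to make the "all callers must handle both variants" requirement a compile-time property rather than a review item.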
P8 -- fleet coordination lease (cross-service batch, own design pass)
Outline only. The proper fix for the four-writer TOCTOU. P1.14 scaffolded the seam (acquireHostAccess in host-access.ts) and P3.4 wired the bench runner through it; P8 swaps its body.
- P8.1 Design + ship control_host_leases (holder, purpose, expires_at, heartbeat) and the honor-protocol in all four writers (BooChat, BooCoder, Arena, BooControl). Scope: separate proposal under openspec/changes/. The BooControl bench scheduler consumes it through the acquireHostAccess seam (scaffolded in P1.14, wired in P3.4). Unattended bench scheduling + reproducible concurrency sweeps unlock here.
P9 -- remote hands + optional
Outline only.
- P9.1 SSH config editor: SFTP read -> schema-validated edit (config-schema.json from the fork) -> diff preview -> timestamped backup -> SFTP write -> restart (nssm/systemctl) -> health-wait. Key in secrets/ (gitignored). Tests for the failure paths.
- P9.2 llama-bench-over-SSH ingestion for device-level numbers.
- P9.3 boocontrol.indifferentketchup.com vhost (Caddy/Authelia rewrite -> /control).
- P9.4 Frontier providers as routing targets; slim control pane kind for in-workspace mini-cockpit.
Deferred (YAGNI)
Items removed from active scope with reopen triggers:
- Prometheus/Grafana integration -- BooControl persists its own samples; /metrics endpoints stay available. Reopen when an external monitoring stack is actually deployed.
- Multi-user/auth -- Authelia at the proxy layer. Reopen when multi-user is needed.
- Non-llama-swap engine connectors (vLLM, Ollama, infinity-emb) -- connector interface should not preclude them. Reopen when a second engine kind is actually added.
- Cross-process GPU arbitration -- four uncoordinated writers is accepted in v1. Reopen when the P8 lease proves insufficient.
- Log persistence to file -- logs are relay-only with in-memory tail. Reopen when log volume warrants durable storage.
- llama-bench over SSH (P9.2) -- device-level numbers. Reopen when SSH plumbing from P9.1 lands.
- llama-swap peers federation -- flat list, coupled uptime, silent ID collisions. Reopen if the provider registry proves insufficient for host coordination.
Next step
Validate independently with boo-validating-changes boocontrol, then implement with boo-implementing-changes boocontrol. P0 gate first (commit the multi-provider batch), then P1.