indifferentketchup b18de2a331 chore: snapshot working tree - pty_exited notifications + in-flight inference WIP

feat(booterm): structured pty_exited WS notifications. Plan-validated, impl-validated, code-reviewed green (contracts build clean, contracts test 29/29, booterm + web typecheck clean).

wip: in-progress inference/provider refactor (agents.ts, provider.ts, new llama-providers.ts, removed llama-args-validator), plus arena, dispatcher, compaction, schema changes.

openspec: pty-exit-notifications complete; x-agent-flags planned (not yet implemented).

2026-06-14 12:48:47 +00:00

13 KiB

BooControl — tasks

Status: READY (decisions resolved 2026-06-11). Gate: P0 must be committed and reviewed before P1 starts. Each phase is a vertical slice with its own demo, and the full scope ships eventually: P1→P3 are the cockpit, P4→P7 are intelligence, and P8→P9 are coordination + remote hands.

P0 — prerequisite gate (separate batch: multi-llama-swap provider registry)

Finish the remaining tasks in openspec/changes/multi-llama-swap-providers-model-favorites/tasks.md: favorites hide-not-delete UI/route tests; smoke-test sam-desktop + embedding (+ DeepSeek config); opencode duplicate-name routing smoke test, if in scope.
Sam reviews and commits the batch (currently working-tree only). BooControl keys on LlamaProvider.id — the committed contract is the foundation.

P1 — read-only cockpit

Demo: watch both hosts live (models, swaps, VRAM/temp, request feed) while chatting.

Scaffold apps/control: Fastify, TS NodeNext, .env.example/.env.host, port 9503, /api/health, systemd unit boocontrol.service, deploy docs in root CLAUDE.md.
db.ts with applySchema + startup ordering guard (waitForTable(sql, 'sessions') before DDL — design §3).
schema.sql: control_hosts seed (sam-desktop, embedding) ON CONFLICT DO NOTHING; control_requests (NO source column — that's P4) with UNIQUE (provider_id, swap_entry_id, ts); control_perf_samples with UNIQUE (provider_id, ts); control_perf_rollup_5m with UNIQUE (provider_id, bucket); control_model_events with UNIQUE (provider_id, model, state, ts).
Fleet connector per enabled host: SSE client w/ backoff+jitter+circuit-breaker (port the opencode-sse.ts pattern); explicit connected|reconnecting|down liveness state machine + last_seen_at; reconcile via /api/metrics on reconnect with INSERT ... ON CONFLICT DO NOTHING (never check-then-act); gap_suspected via the no-overlap heuristic (design §4).
Perf poller (every 5s, /api/performance?after=); watermark recovered from MAX(ts) on restart; NULL watermark (fresh install) → omit the after parameter and ingest the window the host returns (design §4).
In-memory fleet state with per-host monotonic seq; WS endpoint /api/ws/control: snapshot-on-join carrying seqs + seq-stamped deltas.
Retention job in this slice (not a fast-follow): rollup as idempotent upsert + raw delete in chunked per-provider-per-hour transactions (design §6); activity prune; configurable windows.
Contracts: add control_fleet, control_activity, control_perf, control_log, control_job to WsFrameSchema + KNOWN_FRAME_TYPES; rebuild package; mirror in the web strict union; extend the contracts drift test to cover the five new frames. (Server loose union NOT needed — control frames bypass the broker via the raw proxy relay, so this is a 2-location sync; plan finding JD1.)
apps/server: registerControlProxy (/api/control/* HTTP + /api/control/ws WS relay; clone of routes/coder-proxy.ts with keep-in-sync comments in both files); BOOCONTROL_URL env.
Web: /control route (App.tsx), nav entry (ProjectSidebar.tsx), pages/Control.tsx shell with Fleet + Activity tabs; useControlStream as a second app-level WS singleton (own context + connection guard; client discards deltas ≤ snapshot seq); host cards (state chips incl. grey down+last-seen, VRAM/temp/power readouts, TTL countdowns); live activity feed (virtuoso).
Charts: integrate ECharts (per-chart module imports via echarts/core) for perf timelines; dark-theme tokens come from the active palette.
Tests: connector dedup/reconcile + seq logic as pure helpers (turn-guard.ts pattern); liveness state machine; retention idempotency (re-run same window → identical rollups); DB tests gated with describe.runIf(DATABASE_URL).
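
For the db.ts startup ordering guard, a minimal sketch of a waitForTable-style poll. It assumes a hypothetical `TableProbe` callback in place of the real `sql` client (against Postgres the probe might run `SELECT to_regclass('public.sessions') IS NOT NULL`); the interval and timeout defaults are illustrative, not taken from design §3.

```typescript
// Hypothetical probe: resolves true once `table` exists. In db.ts this would
// wrap a query such as SELECT to_regclass('public.sessions') IS NOT NULL.
type TableProbe = (table: string) => Promise<boolean>;

interface WaitOptions {
  intervalMs?: number; // delay between probes (illustrative default)
  timeoutMs?: number; // give up after this long (illustrative default)
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll until `table` exists so that schema DDL never runs before it.
async function waitForTable(
  probe: TableProbe,
  table: string,
  { intervalMs = 500, timeoutMs = 60_000 }: WaitOptions = {},
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    if (await probe(table)) return;
    if (Date.now() >= deadline) {
      throw new Error(`waitForTable: '${table}' did not appear within ${timeoutMs}ms`);
    }
    await sleep(intervalMs);
  }
}
```

applySchema would simply `await waitForTable(probe, "sessions")` before issuing any DDL.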
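
The fleet connector's connected|reconnecting|down liveness state machine can be kept as a pure transition function, matching the pure-helper testing approach in the Tests task. This sketch assumes a consecutive-failure threshold as the circuit breaker and "full jitter" exponential backoff; `nextLink`, `backoffDelay` and the constants are hypothetical, not ported from opencode-sse.ts.

```typescript
type Liveness = "connected" | "reconnecting" | "down";

interface HostLink {
  state: Liveness;
  failures: number; // consecutive failed connects since the last success
  lastSeenAt: number | null; // epoch ms of the last SSE event received
}

type LinkEvent =
  | { kind: "opened"; at: number }
  | { kind: "message"; at: number }
  | { kind: "error" };

// Illustrative thresholds; the real values would live in connector config.
const DOWN_AFTER_FAILURES = 5;
const BASE_DELAY_MS = 1_000;
const MAX_DELAY_MS = 30_000;

// Pure transition: unit-testable without a live SSE stream.
function nextLink(link: HostLink, ev: LinkEvent): HostLink {
  if (ev.kind === "opened") return { state: "connected", failures: 0, lastSeenAt: ev.at };
  if (ev.kind === "message") return { ...link, lastSeenAt: ev.at };
  // ev.kind === "error": after enough consecutive failures the circuit opens
  // and the host shows as "down" (grey chip + last seen) instead of
  // "reconnecting" forever. lastSeenAt is preserved for that display.
  const failures = link.failures + 1;
  const state: Liveness = failures >= DOWN_AFTER_FAILURES ? "down" : "reconnecting";
  return { ...link, state, failures };
}

// Full jitter: uniform random delay in [0, min(cap, base * 2^attempt)).
function backoffDelay(attempt: number, random: () => number = Math.random): number {
  const ceiling = Math.min(MAX_DELAY_MS, BASE_DELAY_MS * 2 ** attempt);
  return Math.floor(random() * ceiling);
}
```

Injecting `random` keeps the jitter deterministic in tests; last_seen_at persistence would be a side effect of the caller, not of the transition.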
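
Snapshot-on-join plus seq-stamped deltas reduces, on the client, to one rule: drop any delta whose seq is at or below the last seq applied for that host. A sketch with hypothetical frame shapes (`HostSnapshot`, `HostDelta`) and a hypothetical server-side `stamp` counter:

```typescript
interface HostSnapshot { hostId: string; seq: number; state: Record<string, unknown> }
interface HostDelta { hostId: string; seq: number; patch: Record<string, unknown> }

// Client-side view: last applied seq and merged state per host.
type FleetView = Map<string, { seq: number; state: Record<string, unknown> }>;

// Server side: per-host monotonic counter used to stamp every delta.
function stamp(counters: Map<string, number>, hostId: string): number {
  const next = (counters.get(hostId) ?? 0) + 1;
  counters.set(hostId, next);
  return next;
}

function applySnapshot(view: FleetView, snaps: HostSnapshot[]): void {
  for (const s of snaps) view.set(s.hostId, { seq: s.seq, state: { ...s.state } });
}

// Returns true if the delta was applied. A delta at or below the host's
// current seq is already reflected in the snapshot (or an earlier delta) and
// is discarded; a delta for a host with no snapshot yet is ignored too.
function applyDelta(view: FleetView, d: HostDelta): boolean {
  const cur = view.get(d.hostId);
  if (!cur || d.seq <= cur.seq) return false;
  view.set(d.hostId, { seq: d.seq, state: { ...cur.state, ...d.patch } });
  return true;
}
```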
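
The retention rollup stays idempotent if each 5-minute row is a pure function of the raw samples in its window: re-running a window and upserting with `ON CONFLICT (provider_id, bucket) DO UPDATE` writes identical values. A sketch assuming a hypothetical single-metric `PerfSample` shape; the real control_perf_samples columns and aggregates will differ, and the SQL itself is not shown.

```typescript
const BUCKET_MS = 5 * 60_000;
const HOUR_MS = 60 * 60_000;

interface PerfSample { providerId: string; ts: number; value: number } // hypothetical shape
interface RollupRow { providerId: string; bucket: number; n: number; avg: number; max: number }

const bucketStart = (ts: number) => ts - (ts % BUCKET_MS);

// Hour windows overlapping [from, to): each (provider, hour) chunk becomes one
// transaction, so a failed chunk can be retried on its own.
function hourChunks(from: number, to: number): Array<[number, number]> {
  const chunks: Array<[number, number]> = [];
  for (let h = from - (from % HOUR_MS); h < to; h += HOUR_MS) chunks.push([h, h + HOUR_MS]);
  return chunks;
}

// Deterministic: the same raw rows always yield the same rollup rows,
// regardless of input order.
function rollup(samples: PerfSample[]): RollupRow[] {
  const acc = new Map<string, RollupRow & { sum: number }>();
  for (const s of samples) {
    const bucket = bucketStart(s.ts);
    const key = `${s.providerId}|${bucket}`;
    const row = acc.get(key) ?? { providerId: s.providerId, bucket, n: 0, avg: 0, max: -Infinity, sum: 0 };
    row.n += 1;
    row.sum += s.value;
    row.max = Math.max(row.max, s.value);
    acc.set(key, row);
  }
  return Array.from(acc.values())
    .map(({ sum, ...r }) => ({ ...r, avg: sum / r.n }))
    .sort((a, b) => a.providerId.localeCompare(b.providerId) || a.bucket - b.bucket);
}
```

The raw delete for a chunk would run in the same transaction as its upsert, after the rollup rows are written.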

P2 — hands on the controls

Demo: unload from UI, watch the swap stream, open a capture.

Per-host FIFO action queue in the control service; warm (1-token completion w/ bare wire ID) + unload one/all routed through it; unload-during-bench → takeover confirmation; reject submissions while host is down, cap depth (4), re-check liveness on dequeue + skip stale actions (design §5).
Optimistic UI off control_fleet frames only (no local emits, per event-dedup discipline).
Logs tab: relay /api/events logData → control_log; in-memory 2k-line tail for late joiners; virtuoso tail-follow viewer w/ source filters + pause-on-scroll.
Inspector: activity table → capture drawer (GET /api/captures/:id via control svc, trimmed persist, shiki JSON, headers); "Open in Playground" stub.
Op task (manual, documented in design): enable captureBuffer + review metricsMaxInMemory on both hosts' llama-swap configs.
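
The per-host FIFO action queue rules (reject while down, depth cap 4, liveness re-check on dequeue, stale skip) fit in a small class. `HostActionQueue` and its fields are hypothetical, and "stale" is modeled here as an age limit (`maxAgeMs`), which is an assumption about design §5.

```typescript
type Liveness = "connected" | "reconnecting" | "down";

interface ControlAction {
  id: string;
  kind: "warm" | "unload" | "unload_all";
  model?: string;
  enqueuedAt: number; // epoch ms
}

type SubmitResult = { ok: true } | { ok: false; reason: "host_down" | "queue_full" };

class HostActionQueue {
  private items: ControlAction[] = [];
  private liveness: () => Liveness;
  private maxDepth: number;
  private maxAgeMs: number; // assumed staleness rule

  constructor(liveness: () => Liveness, maxDepth = 4, maxAgeMs = 60_000) {
    this.liveness = liveness;
    this.maxDepth = maxDepth;
    this.maxAgeMs = maxAgeMs;
  }

  submit(action: ControlAction): SubmitResult {
    if (this.liveness() === "down") return { ok: false, reason: "host_down" };
    if (this.items.length >= this.maxDepth) return { ok: false, reason: "queue_full" };
    this.items.push(action);
    return { ok: true };
  }

  // Next runnable action, or null. Liveness is re-checked at dequeue time
  // (the host may have gone down while the action waited); queued items stay
  // put in that case and will age out. Too-old actions are skipped, not run late.
  next(now: number): { action: ControlAction | null; skipped: ControlAction[] } {
    const skipped: ControlAction[] = [];
    if (this.liveness() === "down") return { action: null, skipped };
    while (this.items.length > 0) {
      const a = this.items.shift()!;
      if (now - a.enqueuedAt > this.maxAgeMs) {
        skipped.push(a);
        continue;
      }
      return { action: a, skipped };
    }
    return { action: null, skipped };
  }
}
```

Returning `skipped` lets the service emit a control_fleet frame for each dropped action, so the optimistic UI still updates only from server frames.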
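
The 2k-line in-memory tail for late joiners in the Logs tab task is a fixed-capacity ring buffer. A sketch with a hypothetical `LogLine` shape; the optional source filter mirrors the viewer's source filters.

```typescript
interface LogLine { source: string; text: string; ts: number } // hypothetical shape

// Keeps the most recent `capacity` lines so a late-joining client can be sent
// the backlog before live control_log frames.
class LogTail {
  private buf: Array<LogLine | undefined>;
  private start = 0; // index of the oldest line
  private size = 0;
  private capacity: number;

  constructor(capacity = 2000) {
    this.capacity = capacity;
    this.buf = new Array<LogLine | undefined>(capacity);
  }

  push(line: LogLine): void {
    const end = (this.start + this.size) % this.capacity;
    this.buf[end] = line;
    if (this.size < this.capacity) this.size++;
    else this.start = (this.start + 1) % this.capacity; // overwrote the oldest line
  }

  // Oldest-first copy, optionally filtered by source.
  snapshot(source?: string): LogLine[] {
    const out: LogLine[] = [];
    for (let i = 0; i < this.size; i++) {
      const line = this.buf[(this.start + i) % this.capacity]!;
      if (source === undefined || line.source === source) out.push(line);
    }
    return out;
  }
}
```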

P3 — playground + speed bench (manual, safe-by-construction)

Demo: TTFT-vs-concurrency curves for two quants, run by hand without disturbing a live chat.

Playground tab: model select (grouped picker from P0), param controls, streaming chat, side-by-side A/B; "Battle in Arena" handoff link.
Bench engine: suite model (grid + repetitions), runner w/ TTFT capture + timings parse; bounded fan-out (Promise.allSettled, suite-declared concurrency only); aggregates + raw samples to bench_* tables.
v1 safety: user-initiated runs only; takeover confirmation when target host shows recent traffic; embedding-host-first defaults; concurrent_foreign_requests recorded per run to flag polluted results. (Unattended scheduling deliberately absent — P8.)
The P8 seam: every run gates through acquireHostAccess(providerId, purpose) (v1 body = courtesy check + confirmation); never inline the inflight check in the runner (design §8).
Bench UI: run launcher, live progress via control_job, history charts (TTFT vs concurrency, tok/s over time), baseline + regression flags.
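
The bench engine's bounded fan-out can be sketched as follows: at each suite-declared concurrency level, fire exactly that many requests and collect them with `Promise.allSettled`, so one failed request neither aborts the step nor leaves others unobserved. `RunOne`, `Sample`, `runSweep` and `percentile` are hypothetical; TTFT capture and timings parsing would live inside `runOne`.

```typescript
interface Sample { ttftMs: number; tokensPerSec: number } // hypothetical per-request result
type RunOne = () => Promise<Sample>;

interface SuitePoint { concurrency: number; ok: Sample[]; errors: string[] }

// Concurrency never exceeds what the suite declares for the current level.
async function runSweep(levels: number[], repetitions: number, runOne: RunOne): Promise<SuitePoint[]> {
  const points: SuitePoint[] = [];
  for (const concurrency of levels) {
    const point: SuitePoint = { concurrency, ok: [], errors: [] };
    for (let rep = 0; rep < repetitions; rep++) {
      const settled = await Promise.allSettled(Array.from({ length: concurrency }, () => runOne()));
      for (const r of settled) {
        if (r.status === "fulfilled") point.ok.push(r.value);
        else point.errors.push(String(r.reason));
      }
    }
    points.push(point);
  }
  return points;
}

// Nearest-rank percentile for aggregates such as p50/p95 TTFT.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return NaN;
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.min(sorted.length - 1, Math.max(0, rank - 1))];
}
```

Raw `Sample`s would go to the bench_* tables alongside the aggregates, together with the run's concurrent_foreign_requests count.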
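
The P8 seam keeps every bench run behind a single function. A sketch of the v1 body (courtesy check plus confirmation), where `AccessDeps`, `recentForeignRequests`, `confirmTakeover` and the purpose values are hypothetical injection points:

```typescript
type AccessPurpose = "bench" | "warm" | "unload"; // hypothetical purposes

interface AccessGrant { providerId: string; purpose: AccessPurpose; release(): void }

interface AccessDeps {
  recentForeignRequests(providerId: string): Promise<number>; // live traffic from other consumers
  confirmTakeover(providerId: string, inflight: number): Promise<boolean>; // UI confirmation
}

// Runners call only acquireHostAccess and never the in-flight check directly,
// so P8 can replace this body with a real lease without touching the runner.
function makeAcquireHostAccess(deps: AccessDeps) {
  return async function acquireHostAccess(
    providerId: string,
    purpose: AccessPurpose,
  ): Promise<AccessGrant | null> {
    const inflight = await deps.recentForeignRequests(providerId);
    if (inflight > 0 && !(await deps.confirmTakeover(providerId, inflight))) return null;
    return { providerId, purpose, release: () => { /* v1: nothing held, nothing to release */ } };
  };
}
```

A `null` grant means the user declined the takeover, and the runner must not start.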

P4 — per-consumer attribution (X-Boo-Source, end-to-end)

Demo: the Activity feed filtered to "arena" shows only Arena traffic, and no request row has a NULL source.

apps/server: per-turn fetch-wrapper injection on the AI-SDK streaming path (thread source through the call site; wrapper-aware getSwapProvider, cache keyed by baseURL+source). The upstreamModel change must be additive (optional source param/options), since its file has a 28-file/13-route blast radius (design §7); extend headers in the compaction.ts + task-model.ts direct fetches.
apps/coder: forward inbound x-boo-source in local-gateway.ts; set it at arena + dispatch fetch sites.
Migration: add source TEXT to control_requests; surface as Activity filter + per-source token aggregates.
Tests: header present on all three paths (server streaming, gateway-forwarded opencode, arena direct); rows attribute correctly.
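
The per-turn header injection can be sketched as a fetch wrapper plus a client cache keyed by baseURL and source. `FetchLike`, `withBooSource`, `ProviderClient` and `getClient` are hypothetical; in practice the wrapper would be handed to the provider factory's custom `fetch` option (AI SDK provider factories such as `createOpenAI` accept one).

```typescript
// Minimal fetch shape so the sketch does not depend on DOM lib typings.
type FetchInit = { headers?: Record<string, string>; [key: string]: unknown };
type FetchLike = (url: string, init?: FetchInit) => Promise<unknown>;

// Wrap a fetch so every request carries the consumer's source tag.
function withBooSource(inner: FetchLike, source: string): FetchLike {
  return (url, init = {}) =>
    inner(url, { ...init, headers: { ...(init.headers ?? {}), "X-Boo-Source": source } });
}

// Clients are cached per (baseURL, source): a client built for "arena" must
// never be reused for "chat", or attribution would silently drift.
type ProviderClient = { baseURL: string; source: string; fetch: FetchLike };
const clientCache = new Map<string, ProviderClient>();

function getClient(baseURL: string, source: string, baseFetch: FetchLike): ProviderClient {
  const key = `${baseURL}\u0000${source}`;
  let client = clientCache.get(key);
  if (!client) {
    client = { baseURL, source, fetch: withBooSource(baseFetch, source) };
    clientCache.set(key, client);
  }
  return client;
}
```

Keying on baseURL alone would be the subtle bug here: the first consumer to warm the cache would stamp its source onto everyone else's requests.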

P5 — quality evals + sandbox

Demo: fleet leaderboard with speed×quality scatter.

Suite format (data/ YAML: chat rubric tasks; code tasks with tests); CRUD + versioning.
Judge runner (temperature 0, pinned judge model+version, rubric scoring, rationale capture); pairwise tie-breaks delegate to Arena.
Code sandbox runner: ephemeral containers (--network none, non-root, mem/cpu/time caps, tmpfs, --rm, boocontrol-eval label); orphan prune at engine start; bounded concurrency (default 4) + Promise.allSettled + per-task finally cleanup; pass@1 scoring; borrow patterns from /opt/forks/openevals.
Leaderboard UI + speed×quality scatter per (provider_id, model, quant).
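
The sandbox constraints in the code runner task map onto standard `docker run` flags. A sketch of an argv builder, an orphan-listing command and a pass@1 helper; the image, limits and the `nobody` UID are illustrative, and the runner is assumed to spawn `docker` directly (no shell) so task content never goes through shell parsing.

```typescript
interface SandboxLimits { memory: string; cpus: number; tmpfsSize: string } // illustrative values

const EVAL_LABEL = "boocontrol-eval";

// docker run has no wall-clock timeout flag: the runner is assumed to kill the
// container after its deadline, inside the per-task finally.
function sandboxRunArgs(image: string, cmd: string[], lim: SandboxLimits): string[] {
  return [
    "run", "--rm",
    "--network", "none", // no network access
    "--user", "65534:65534", // non-root (nobody:nogroup on most images)
    "--memory", lim.memory,
    "--cpus", String(lim.cpus),
    "--tmpfs", `/tmp:rw,size=${lim.tmpfsSize}`,
    "--label", `${EVAL_LABEL}=1`, // lets the orphan prune find leftovers
    image, ...cmd,
  ];
}

// Orphan prune at engine start: list matching containers, then `docker rm -f` them.
const orphanListArgs = ["ps", "-aq", "--filter", `label=${EVAL_LABEL}`];

// pass@1 with n samples per task and c passing is c/n (the k = 1 case of the
// unbiased pass@k estimator); the suite score averages over tasks.
function passAt1(results: Array<{ passed: number; samples: number }>): number {
  if (results.length === 0) return 0;
  return results.reduce((sum, r) => sum + r.passed / r.samples, 0) / results.length;
}
```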

P6 — advisory routing + reports

Demo: picker badges "best code model right now"; Monday-morning fleet report.

Advisory scores API (evals + live latency + host health) → model-picker badges. services/routing-scores.ts (assignBadges pure helper, unit-tested), GET /api/control/routing/scores; ModelPicker.tsx fetches badges (non-fatal) and renders best-code/best-chat/best-fast chips. Verify: pnpm -C apps/control test (routing-scores 4), npx tsc -p apps/web/tsconfig.app.json --noEmit.
Reports: scheduled digest job (usage, trends, swap counts, leaderboard deltas, anomalies vs baselines) → control_reports; same in-process timer pattern as retention, schedule meta in control_schedule_meta table ({interval, enabled, last_run_at}) w/ catch-up on boot; Reports tab + markdown export (renderReportMarkdown/isReportDue pure, unit-tested). See design ## Implementation notes for the schedule-meta-table deviation. Verify: pnpm -C apps/control test (reports 7).
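
A sketch of an assignBadges-style pure helper: one badge per metric, awarded to the best healthy candidate, with ties broken on the provider/model key so the same inputs always give the same badges. The `ModelScore` shape is hypothetical; the real helper's inputs and scoring may differ.

```typescript
type Badge = "best-code" | "best-chat" | "best-fast";

interface ModelScore { // hypothetical row from the scores API
  providerId: string;
  model: string;
  code: number | null; // eval score, null if never evaluated
  chat: number | null;
  tokPerSec: number | null; // recent live throughput
  healthy: boolean; // host up and model loadable
}

const keyOf = (s: ModelScore) => `${s.providerId}/${s.model}`;

function assignBadges(scores: ModelScore[]): Map<string, Badge[]> {
  const metrics: Array<[Badge, (s: ModelScore) => number | null]> = [
    ["best-code", (s) => s.code],
    ["best-chat", (s) => s.chat],
    ["best-fast", (s) => s.tokPerSec],
  ];
  const out = new Map<string, Badge[]>();
  for (const [badge, metric] of metrics) {
    let best: ModelScore | null = null;
    for (const s of scores) {
      const v = metric(s);
      if (!s.healthy || v === null) continue; // unhealthy or unscored: never badged
      const bv = best ? metric(best)! : -Infinity;
      if (v > bv || (v === bv && best !== null && keyOf(s) < keyOf(best))) best = s;
    }
    if (best) out.set(keyOf(best), [...(out.get(keyOf(best)) ?? []), badge]);
  }
  return out;
}
```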
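
Catch-up on boot falls out of comparing against `last_run_at` rather than fixed wall-clock slots: a digest missed while the service was down fires once at the next check, not once per missed slot. A sketch of an isReportDue-style check plus a hypothetical `tick`, assuming `interval` is stored in milliseconds (the real unit is not stated):

```typescript
interface ScheduleMeta { interval: number; enabled: boolean; last_run_at: Date | null } // interval in ms (assumed)

function isReportDue(meta: ScheduleMeta, now: Date): boolean {
  if (!meta.enabled) return false;
  if (meta.last_run_at === null) return true; // never run: due immediately
  return now.getTime() - meta.last_run_at.getTime() >= meta.interval;
}

// One timer tick (also called once at boot): run the job if due, then persist
// the new last_run_at so a restart does not re-run it.
async function tick(
  meta: ScheduleMeta,
  now: Date,
  run: () => Promise<void>,
  save: (m: ScheduleMeta) => Promise<void>,
): Promise<ScheduleMeta> {
  if (!isReportDue(meta, now)) return meta;
  await run();
  const next = { ...meta, last_run_at: now };
  await save(next);
  return next;
}
```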

P7 — live `auto:*` gateway (committed)

Demo: an auto:code session in BooChat routes to the current best code model with failover.

OpenAI-compatible virtual models (auto, auto:code, auto:fast, auto:cheap) backed by route_policies: rule match → candidate ordering → health/ctx-fit filter → dispatch w/ failover; gateway forwards X-Boo-Source to the target host. routes/gateway.ts (/v1/models, /v1/chat/completions, /upstream/:model/props) + services/gateway.ts (orderCandidates pure, unit-tested). Reached server-to-server (registry baseUrl), not via the buffering /api/control proxy, so streaming survives. Verify: pnpm -C apps/control test (gateway 11) + live smoke.
Registry entry (kind: "boocontrol-gateway") so BooChat adopts the gateway with zero inference-path changes. Added to data/llama-providers.example.json; the control service filters gateway-kind providers out of fleet connectors/pollers/retention (fleetProviders in index.ts) so it never SSE-connects to itself.
Orphaned-session handling — provider.ts code change (design §8): InferenceRoute extended to 'swap' | 'deepseek' | 'gateway' | 'gateway_error' (gateway_error carries gatewayReason); known gateway-kind id → 'gateway'; orphaned auto:* id (provider missing) → 'gateway_error' reason offline, NEVER the swap fallback. All callers audited: upstreamModel/resolveModelEndpoint add gateway branch + throw on gateway_error; getModelContext proxies gateway props / null on gateway_error; resolveRoute returns the new variant (system-prompt.ts ObservedInputs.route widened to InferenceRoute); invalidateModelContext unchanged (composite-key path covers it). Picker flags orphaned sessions (isOrphanedGatewayValue banner in ModelPicker.tsx). Verify: pnpm -C apps/server test (provider gateway tests), pnpm -C apps/server build.
Policy editor UI (route_policies CRUD) + per-policy dispatch log. routes/policies.ts (CRUD + /dispatch-log); ReportsTab.tsx Policies + Dispatch Log sub-views. Verify: npx tsc -p apps/web/tsconfig.app.json --noEmit.
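
A sketch of the auto:* gateway's candidate pipeline (health and context-fit filter, best-first ordering, dispatch with failover). `Candidate`, `dispatchWithFailover` and the single `requiredTokens` figure are hypothetical simplifications of whatever the real orderCandidates applies.

```typescript
interface Candidate { // hypothetical; real data comes from route_policies + fleet state
  providerId: string;
  model: string;
  score: number; // policy/advisory score, higher is better
  healthy: boolean;
  ctxSize: number; // model context window, tokens
}

// Pure: drop unhealthy hosts and models whose context cannot fit the request,
// then order best-first with a stable tie-break for reproducible dispatch.
function orderCandidates(cands: Candidate[], requiredTokens: number): Candidate[] {
  return cands
    .filter((c) => c.healthy && c.ctxSize >= requiredTokens)
    .sort((a, b) => b.score - a.score || `${a.providerId}/${a.model}`.localeCompare(`${b.providerId}/${b.model}`));
}

// Try candidates in order; the first success wins. In a streaming gateway this
// is only safe before any bytes have been forwarded to the client.
async function dispatchWithFailover<T>(
  ordered: Candidate[],
  send: (c: Candidate) => Promise<T>,
): Promise<{ via: Candidate; result: T }> {
  const errors: string[] = [];
  for (const c of ordered) {
    try {
      return { via: c, result: await send(c) };
    } catch (err) {
      errors.push(`${c.providerId}/${c.model}: ${String(err)}`);
    }
  }
  throw new Error(`no candidate succeeded (${errors.join("; ") || "none eligible"})`);
}
```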
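
The orphaned-session rule can be expressed as a discriminated union plus a resolver that never falls back to swap for an auto:* model whose provider is gone. The shapes below, including the `"llama-swap"` registry kind and the `defaultSwapId` fallback, are hypothetical stand-ins for provider.ts's real types.

```typescript
type InferenceRoute =
  | { kind: "swap"; providerId: string }
  | { kind: "deepseek" }
  | { kind: "gateway"; providerId: string }
  | { kind: "gateway_error"; gatewayReason: "offline" };

interface RegistryEntry { id: string; kind: "llama-swap" | "deepseek" | "boocontrol-gateway" }

const isAutoModel = (model: string) => model === "auto" || model.startsWith("auto:");

function resolveRoute(providerId: string, model: string, registry: RegistryEntry[], defaultSwapId: string): InferenceRoute {
  const entry = registry.find((p) => p.id === providerId);
  if (entry && entry.kind === "boocontrol-gateway") return { kind: "gateway", providerId: entry.id };
  if (entry && entry.kind === "deepseek") return { kind: "deepseek" };
  // Orphaned auto:* session: falling back to llama-swap would send a virtual
  // model name to a real host, so surface an error the picker can flag.
  if (!entry && isAutoModel(model)) return { kind: "gateway_error", gatewayReason: "offline" };
  return { kind: "swap", providerId: entry ? entry.id : defaultSwapId };
}
```

Because `gateway_error` is a distinct variant, any caller switching on `kind` is forced by the compiler to handle it rather than silently treating it as swap.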

P8 — fleet coordination lease (cross-service batch, own design pass)

Demo: a scheduled overnight bench runs unattended without ever evicting a live model.

Outlined, see openspec/changes/fleet-coordination-lease/ (proposal + tasks, OUTLINE status). Design + ship control_host_leases (holder, purpose, expires_at, heartbeat) and the honor-protocol in all four writers (BooChat, BooCoder, Arena, BooControl); BooControl consumes it through the acquireHostAccess seam left in P3. NOT implemented here — outline only per the program decision.
Outlined, see openspec/changes/fleet-coordination-lease/ (tasks L4). Unattended bench scheduling + reproducible concurrency sweeps unlock behind the lease.

P9 — remote hands + optional

SSH config editor: SSH read → schema-validated edit (config-schema.json from the fork, bundled at apps/control/data/config-schema.json, ajv-validated) → diff preview → timestamped backup → write → restart → health-wait. services/ssh-config.ts (pure validateLlamaConfig/computeDiff/backupFilename + injectable-exec applyRemoteConfig pipeline) + routes/ssh-config.ts (GET/PATCH /api/hosts, /config, /config/validate, /config/diff, /config/apply) + HostConfigEditor.tsx (gear button on each Fleet card). SSH via shelled ssh (booterm precedent, key from control_hosts.ssh_key_path → secrets/, gitignored) instead of an ssh2 dependency. Failure-path tests for every pipeline step (ssh-config.test.ts, 15 tests). NOTE deviation: SFTP replaced by ssh cat/cat > (no ssh2 dep); recorded in design ## Implementation notes. Verify: pnpm -C apps/control test (ssh-config 15). Not live-smoked — no reachable Windows SSH target in this session (the "Windows SSH fiddliness" risk); the failure-path test suite stands in.
DEFERRED — llama-bench-over-SSH ingestion for device-level numbers. Reason: depends on the SSH plumbing from P9.1 landing + a live host to run llama-bench on; it is also explicitly YAGNI-deferred in the implementation-plan ("Reopen when SSH plumbing from P9.1 lands"). The P9.1 exec seam (SshExec) is the hook a follow-up reuses.
DEFERRED — boocontrol.indifferentketchup.com vhost (Caddy/Authelia rewrite → /control). Reason: pure reverse-proxy/ops config (Caddyfile + Authelia rules) on the homelab host, no repo code; /control already works behind the existing boocode origin via the registerControlProxy relay. Out of scope for a code batch.
DEFERRED — Frontier providers as routing targets; slim control pane kind for in-workspace mini-cockpit. Reason: two sizeable independent features (frontier-provider routing belongs with the registry/provider work; a new workspace pane kind is its own UI batch). Marked optional in the implementation-plan Deferred section; out of reach for an additive P6–P9 pass without dedicated design.
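
The P9 config-apply pipeline is a chain of steps that stops at the first failure, which is what makes per-step failure-path tests straightforward. A sketch with a hypothetical `SshExec` signature, a `cp`-based backup and a hypothetical backup-name format, assuming a POSIX-style remote shell (a Windows OpenSSH target may need different commands and quoting). The diff step is omitted; it would consume the read step's stdout.

```typescript
// Stand-in for the SshExec seam; the real signature may differ.
type SshExec = (command: string, stdin?: string) => Promise<{ code: number; stdout: string; stderr: string }>;

type Step = "read" | "validate" | "backup" | "write" | "restart" | "health";
type ApplyResult = { ok: true; backup: string } | { ok: false; failedAt: Step; detail: string };

interface ApplyDeps {
  exec: SshExec;
  validate: (text: string) => string[]; // error messages; empty means valid
  healthy: () => Promise<boolean>; // health-wait probe
  restartCmd: string;
}

// config.yaml -> config.yaml.bak-20260614T124847Z
function backupFilename(path: string, now: Date): string {
  const stamp = now.toISOString().replace(/[-:]/g, "").replace(/\.\d{3}Z$/, "Z");
  return `${path}.bak-${stamp}`;
}

// POSIX single-quote escaping for paths interpolated into remote commands.
const shellQuote = (s: string) => `'${s.replace(/'/g, "'\\''")}'`;

async function applyRemoteConfig(deps: ApplyDeps, path: string, next: string, now: Date): Promise<ApplyResult> {
  const fail = (failedAt: Step, detail: string): ApplyResult => ({ ok: false, failedAt, detail });
  const target = shellQuote(path);

  const current = await deps.exec(`cat ${target}`); // current.stdout would feed the diff preview
  if (current.code !== 0) return fail("read", current.stderr);

  const errors = deps.validate(next);
  if (errors.length > 0) return fail("validate", errors.join("; "));

  const backup = backupFilename(path, now);
  const bak = await deps.exec(`cp ${target} ${shellQuote(backup)}`);
  if (bak.code !== 0) return fail("backup", bak.stderr);

  const written = await deps.exec(`cat > ${target}`, next); // new config piped via stdin
  if (written.code !== 0) return fail("write", written.stderr);

  const restarted = await deps.exec(deps.restartCmd);
  if (restarted.code !== 0) return fail("restart", restarted.stderr);

  if (!(await deps.healthy())) return fail("health", `unhealthy after restart; backup at ${backup}`);
  return { ok: true, backup };
}
```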
