boocode/openspec/changes/boocontrol/artifacts/p1-fix-analysis.md
indifferentketchup b18de2a331 chore: snapshot working tree - pty_exited notifications + in-flight inference WIP
feat(booterm): structured pty_exited WS notifications. Plan-validated, impl-validated, code-reviewed green (contracts build clean, contracts test 29/29, booterm + web typecheck clean).

wip: in-progress inference/provider refactor (agents.ts, provider.ts, new llama-providers.ts, removed llama-args-validator), plus arena, dispatcher, compaction, schema changes.

openspec: pty-exit-notifications complete; x-agent-flags planned (not yet implemented).
2026-06-14 12:48:47 +00:00

BooControl P1 Fix Analysis

Date: 2026-06-12
Mode: Fix (two prior agents cancelled mid-edit; tree was in a broken intermediate state)
Result: All builds green, all 51 tests passing (was 32)

Summary

Two prior agents were cancelled mid-edit, leaving the tree with broken TypeScript types: DeltaEmitter.publish missing from its type, wrong import paths in ws.ts, a duplicate identifier in the parseSseLine return type, and a non-existent return type on buildEChartsTheme. This batch resolved all 8 blocking findings and the key advisory findings, and raised the test count from 32 to 51.

Blocking Findings (B1-B8)

B1: SSE line parser inverted -- FIXED

  • Evidence: apps/control/src/services/fleet-connector.ts:116-159
  • The parser was completely rewritten. It now handles standard SSE (event: + data: lines) and non-standard single-line (type: json) formats. The parseSseLine function returns { event, eventType } with correct typing. The old contradictory startsWith('data:') filter is gone.
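As a rough illustration of the two input shapes described above, here is a minimal sketch. It is not the code in fleet-connector.ts: the createSseParser wrapper, the "message" default, and the handling of id:/retry: lines are assumptions made for the example. Only the { event, eventType } result shape and the null-on-bad-JSON behavior come from this report.

```typescript
// Result shape named in the report; everything else here is hypothetical.
type ParsedSse = { event: unknown; eventType: string };

// Hypothetical stateful wrapper: remembers the last `event:` line so the
// following `data:` line can be tagged with that event type.
function createSseParser(): (line: string) => ParsedSse | null {
  let pendingType: string | null = null;

  return (line: string): ParsedSse | null => {
    const trimmed = line.trim();
    if (trimmed === "") {
      pendingType = null; // a blank line ends the current SSE event
      return null;
    }
    if (trimmed.startsWith(":")) return null; // SSE comment / keep-alive

    const idx = trimmed.indexOf(":");
    if (idx <= 0) return null;
    const field = trimmed.slice(0, idx);
    const value = trimmed.slice(idx + 1).trim();

    if (field === "event") {
      pendingType = value; // standard SSE: type arrives before data
      return null;
    }
    if (field === "id" || field === "retry") return null; // not payload

    // Standard `data:` line uses the pending type; any other field name is
    // treated as the non-standard single-line `type: json` form.
    const eventType = field === "data" ? (pendingType ?? "message") : field;
    try {
      return { event: JSON.parse(value), eventType };
    } catch {
      return null; // malformed JSON is skipped, not thrown
    }
  };
}
```

A caller would feed the parser one line at a time and act only on non-null results.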

B2: incrementSeq never called -- seq stays 0 -- FIXED

  • Evidence: apps/control/src/services/fleet-state.ts:83-86 (exported), apps/control/src/index.ts:63,88,101,239 (call sites)
  • incrementSeq is exported from fleet-state.ts, imported in index.ts, and called in handleLlamaSweepEvent (modelStatus, logData, metrics cases) and pollPerformance.

B3: WS handler has no delta-publishing mechanism -- FIXED

  • Evidence: apps/control/src/index.ts:14-32 (DeltaEmitter with publish), apps/control/src/routes/ws.ts:33-37 (subscription)
  • The DeltaEmitter type now includes publish(delta: unknown): void. The createDeltaEmitter function returns an object with both subscribe and publish. The WS handler subscribes on connect and unsubscribes on close. All mutation paths (modelStatus, logData, metrics, perf) publish deltas.
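The emitter shape described above can be sketched as follows. The publish(delta: unknown): void signature is taken from this report. The listener type and the choice to have subscribe return an unsubscribe function are assumptions for illustration.

```typescript
type DeltaListener = (delta: unknown) => void;

type DeltaEmitter = {
  // Assumed: subscribe hands back a function that removes the listener.
  subscribe(listener: DeltaListener): () => void;
  publish(delta: unknown): void;
};

function createDeltaEmitter(): DeltaEmitter {
  const listeners = new Set<DeltaListener>();
  return {
    subscribe(listener) {
      listeners.add(listener);
      return () => {
        listeners.delete(listener);
      };
    },
    publish(delta) {
      // Fan the delta out to every currently subscribed listener.
      listeners.forEach((listener) => listener(delta));
    },
  };
}
```

Under this sketch, a WS handler would call subscribe on connect, forward each delta to the socket, and invoke the returned function on close.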

B4: Snapshot wire format mismatch -- FIXED

  • Evidence: apps/control/src/routes/ws.ts:25-31 (server), apps/web/src/hooks/useControlStream.tsx:151-163 (client)
  • Server sends { type: 'control_fleet', seq: maxSeq, hosts: [...] } at the top level, matching the ControlFleetFrame Zod schema. The snapshot seq is the max across all hosts. Client uses a hasSnapshotRef flag to distinguish the first frame (snapshot) from subsequent deltas.
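A minimal sketch of the server-side snapshot frame, assuming each host state carries its own seq counter. The HostState fields and the buildSnapshot name are hypothetical. The frame layout and the max-across-hosts rule are from the description above.

```typescript
// Hypothetical per-host shape; the real host state has more fields.
type HostState = { providerId: string; seq: number };

type ControlFleetFrame = {
  type: "control_fleet";
  seq: number;
  hosts: HostState[];
};

function buildSnapshot(hosts: HostState[]): ControlFleetFrame {
  // Snapshot seq is the highest seq seen on any host (0 for an empty fleet).
  const maxSeq = hosts.reduce((max, host) => Math.max(max, host.seq), 0);
  return { type: "control_fleet", seq: maxSeq, hosts };
}
```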

B5: onEvent drops async errors -- FIXED

  • Evidence: apps/control/src/services/fleet-connector.ts:101 (type), :222-226 (await + catch)
  • onEvent type changed to () => void | Promise<void>. The call site uses await Promise.resolve(deps.onEvent(...)) with a catch block that logs the error. DB failures no longer crash the process.
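The await-and-contain pattern described above might look like the sketch below. The dispatchEvent name, the event parameter, and the injected logError callback are assumptions for the example. The real call site logs through its own logger.

```typescript
// Handler may be sync or async; both are normalized with Promise.resolve.
type OnEvent = (event: unknown) => void | Promise<void>;

async function dispatchEvent(
  onEvent: OnEvent,
  event: unknown,
  logError: (message: string) => void,
): Promise<void> {
  try {
    // Calling inside the try also catches handlers that throw synchronously.
    await Promise.resolve(onEvent(event));
  } catch (err) {
    // Contain the failure: log it and keep the stream loop alive.
    logError(err instanceof Error ? err.message : String(err));
  }
}
```

Because the rejection is awaited inside the try block, a failing database write surfaces as a log line instead of an unhandled rejection.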

B6: pruneRawSamples references non-existent id column -- FIXED

  • Evidence: apps/control/src/services/retention.ts:77-89
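  • Rewritten to use the composite key (provider_id, ts). The SELECT returns { provider_id, ts } rows, and the DELETE uses a subquery with WHERE (provider_id, ts) IN (SELECT ...). This form was replaced in pass 2 (see "Re-review fixes") because sql(objectArray) is not a row-tuple helper.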

B7: onReconcile wired but never called -- FIXED

  • Evidence: apps/control/src/index.ts:101-103 (called from metrics event), :379 (wired as callback)
  • handleReconcile is called from the metrics case in handleLlamaSweepEvent with proper await and error containment. The gap detection logic (detectGap) is extracted to services/reconcile.ts with 7 unit tests.

B8: control_job garbage insert -- FIXED

  • Evidence: apps/web/src/hooks/useControlStream.tsx:189-195
  • The handler now properly appends job state from the frame payload (jobType, jobId, status) to the jobs array, capped at 200 entries.
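A sketch of the capped append, under the assumption that the oldest entries are dropped once the cap is reached (the report states only the 200-entry cap). The JobEntry type and the appendJob name are illustrative.

```typescript
// Fields named in the report; their exact types are assumed to be strings.
type JobEntry = { jobType: string; jobId: string; status: string };

const MAX_JOBS = 200;

function appendJob(jobs: JobEntry[], entry: JobEntry): JobEntry[] {
  const next = [...jobs, entry];
  // Keep only the newest MAX_JOBS entries (assumed eviction order).
  return next.length > MAX_JOBS ? next.slice(next.length - MAX_JOBS) : next;
}
```

Returning a new array instead of mutating the input fits the usual React state-update pattern in a hook.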

Advisory Findings (A1-A10)

A1: No fleet-state rebuild from DB on startup -- FIXED

  • Evidence: apps/control/src/index.ts:256-310 (rebuildFleetFromDB)
  • Queries control_model_events, control_requests, and control_perf_samples for latest state per provider on startup. Wrapped in try-catch so rebuild failure doesn't prevent startup.

A2: pruneActivity/pruneModelEvents not chunked -- UNFIXED

  • Deferred per YAGNI gate. At single-user scale, unbounded DELETE is acceptable.

A3: No Zod validation on incoming WS frames -- UNFIXED

  • Deferred per YAGNI gate. Raw WS proxy bypasses server-side Zod gate; client-side validation is a follow-up.

A4: ECharts instances never disposed on unmount -- FIXED

  • Evidence: apps/web/src/components/control/PerfChart.tsx:100-104, VramGauge.tsx:93-97, TtlRing.tsx:98-103
  • All three chart components call chart.dispose() and null the ref in the cleanup function.

A5: trimCapture size estimation -- UNFIXED

  • Deferred per YAGNI gate. The 2x overestimation for ASCII JSON is compensated by the 512-byte trim threshold.

A6: Fixed 5s reconnect delay -- FIXED

  • Evidence: apps/web/src/hooks/useControlStream.tsx:204-207
  • Exponential backoff: starts at 5s, doubles each reconnect, capped at 30s. Resets to 5s on successful connection.
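The schedule above (5s, doubling, capped at 30s, reset on success) can be sketched as a small helper. The createBackoff name and its next/reset methods are hypothetical. The hook itself may track the delay differently, for example in a ref.

```typescript
const INITIAL_DELAY_MS = 5_000;
const MAX_DELAY_MS = 30_000;

function createBackoff() {
  let delay = INITIAL_DELAY_MS;
  return {
    // Returns the delay to wait before this reconnect, then doubles it
    // (up to the cap) for the next attempt.
    next(): number {
      const current = delay;
      delay = Math.min(delay * 2, MAX_DELAY_MS);
      return current;
    },
    // Called after a successful connection.
    reset(): void {
      delay = INITIAL_DELAY_MS;
    },
  };
}
```

With these constants, successive failed reconnects wait 5s, 10s, 20s, then 30s for every later attempt until a connection succeeds.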

A7: Perf poller no fetch timeout -- FIXED

  • Evidence: apps/control/src/index.ts:224
  • AbortSignal.timeout(10_000) on the fetch call.

A8: Perf poller swallows errors -- FIXED

  • Evidence: apps/control/src/index.ts:253-255
  • Errors logged via console.warn with provider ID and error message.

A9: Response header forwarding -- UNFIXED

  • Deferred per YAGNI gate. Internal dashboard behind Authelia.

A10: SSRF via ssh_host -- UNFIXED

  • Deferred per YAGNI gate. No user-facing host-edit UI in P1.

Validation Findings (F1-F4)

F1: Hardcoded oklch colors in ECharts components -- FIXED

  • Evidence: apps/web/src/components/control/VramGauge.tsx:36-38, TtlRing.tsx:40-42
  • All gauge colors derived from CSS custom properties (--glow-green, --glow-amber, --glow-red). No oklch literals remain.

F2: Snapshot rebuild from DB not implemented -- FIXED

  • Same as A1.

F3: Reconcile test is a placeholder -- FIXED

  • Evidence: apps/control/src/services/__tests__/reconcile.test.ts (7 tests)
  • detectGap extracted to services/reconcile.ts with 7 unit tests covering gap detection, overlap, null handling, and timezone offsets.

F4: SSE event parsing fragile -- FIXED

  • Evidence: apps/control/src/services/fleet-connector.ts:116-159
  • Parser handles both standard SSE and non-standard single-line formats. JSON parsing errors return null (silently skipped).

Nit Findings (N1-N5)

N1: Duplicate createFleetState -- FIXED

  • Evidence: apps/control/src/services/fleet-state.ts:60 (single source), apps/control/src/index.ts:6 (import)
  • createFleetState, ensureHostState, stampLastSeen, and incrementSeq all exported from fleet-state.ts and imported in index.ts. No local duplicates.

N2: theme as any cast -- UNFIXED (no change needed)

  • The as any casts are not present in the current tree; the components pass the theme object directly to echarts.init().

N3: matchMedia in render body -- UNFIXED (no change needed)

  • The useReducedMotion hook already handles this; components call the hook rather than matchMedia directly.

N4: SSE error logging drops error object -- FIXED

  • Evidence: apps/control/src/services/fleet-connector.ts:239-242
  • Error message included in log fields: err: (err as Error).message.

N5: Sequential N+1 DB inserts -- FIXED

  • Evidence: apps/control/src/index.ts:229-236
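  • Perf poller uses batch insert: builds all INSERT statements, joins them, executes via sql.unsafe() in a single round-trip. This approach was replaced in pass 2 with a for-of loop of awaited inserts (see "Re-review fixes"), because calling toString() on the query objects failed at runtime.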

Type Breakage (from cancelled agents)

DeltaEmitter.publish missing from type -- FIXED

  • Added publish(delta: unknown): void to the DeltaEmitter type. Exported from index.ts for ws.ts consumption.

ws.ts wrong import paths -- FIXED

  • Changed ./services/fleet-state.js to ../services/fleet-state.js and ./index.js to ../index.js.

parseSseLine duplicate identifier -- FIXED

  • Return type was { event, event } (duplicate key). Fixed to { event, eventType }.

buildEChartsTheme non-existent type -- FIXED

  • Changed return type from echarts.ThemeSetOptionOpts (non-existent) to Record<string, unknown>.

Test Coverage

Test file                  Tests   Status
fleet-connector.test.ts    10      PASS (jitter, reconnect, backoff)
fleet-state.test.ts        5       PASS (create, ensure, stamp)
liveness.test.ts           7       PASS (state machine transitions)
seq-logic.test.ts          6       PASS (buffer-then-filter, updated wire format)
retention.test.ts          4       PASS (trimCapture)
reconcile.test.ts          7       PASS (gap detection, NEW -- was placeholder)
pipeline.test.ts           12      PASS (SSE parse, real chain, 2-host merge, NEW)
Total                      51      ALL PASS

Files Changed

  • apps/control/src/index.ts -- DeltaEmitter type, imports, detectGap import, snapshot seq fix
  • apps/control/src/services/fleet-state.ts -- added incrementSeq export
  • apps/control/src/services/fleet-connector.ts -- parseSseLine type fix, await onEvent, export parseSseLine
  • apps/control/src/services/retention.ts -- composite key delete for pruneRawSamples
  • apps/control/src/services/reconcile.ts -- NEW: detectGap extracted for testability
  • apps/control/src/routes/ws.ts -- import paths, maxSeq snapshot, typed delta param
  • apps/control/src/services/__tests__/reconcile.test.ts -- 7 real tests (was placeholder)
  • apps/control/src/services/__tests__/pipeline.test.ts -- NEW: 10 end-to-end pipeline tests (12 after the pass 2 merge tests)
  • apps/control/src/services/__tests__/seq-logic.test.ts -- updated wire format
  • apps/web/src/hooks/useControlStream.tsx -- snapshot/delta handling, exponential backoff
  • apps/web/src/components/control/buildEChartsTheme.ts -- return type fix

Re-review fixes (pass 2)

B9: Delta replaces entire hosts array -- FIXED

  • apps/web/src/hooks/useControlStream.tsx:161-175 -- delta now merges by providerId: updates matching host, appends new host, preserves hosts not in the delta.
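The merge rule above can be sketched as follows. The Host fields other than providerId are placeholders, and the shallow field-level merge for a matching host is an assumption. The hook may replace the matching entry wholesale instead.

```typescript
// Hypothetical host shape; only providerId is named in the report.
type Host = { providerId: string; liveness?: string; seq?: number };

function mergeHosts(current: Host[], delta: Host[]): Host[] {
  const updates = new Map<string, Host>();
  delta.forEach((host) => updates.set(host.providerId, host));

  // Update hosts present in the delta; keep the rest untouched.
  const merged = current.map((host) => {
    const update = updates.get(host.providerId);
    return update ? { ...host, ...update } : host;
  });

  // Append hosts that appear only in the delta.
  const known = new Set(current.map((host) => host.providerId));
  const added = delta.filter((host) => !known.has(host.providerId));
  return [...merged, ...added];
}
```

For example, merging a delta for hosts "b" and "c" into a state holding "a" and "b" yields "a" unchanged, "b" updated, and "c" appended.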

Runtime bomb: toString() on porsager query objects -- FIXED

  • apps/control/src/index.ts:224-229 -- replaced sql.unsafe(inserts.map(s => s.toString()).join(';')) with a simple for-of loop awaiting each insert. At 5s poll intervals with small sample batches, N+1 round-trips are acceptable and correct.

Runtime bomb: sql(objectArray) not a row-tuple helper -- FIXED

  • apps/control/src/services/retention.ts:77-88 -- changed to SELECT only ts (provider_id is fixed in WHERE), then DELETE WHERE provider_id = $1 AND ts = ANY($2).

A1 liveness: rebuilt hosts start connected -- FIXED

  • apps/control/src/index.ts:269 -- changed from state.liveness = 'connected' to state.liveness = 'down'. Connectors flip to connected when SSE actually attaches.

HostCard double-cast -- FIXED

  • apps/web/src/components/control/HostCard.tsx:56 -- removed (host as unknown as Record<string, unknown>)['gpu']. GPU data now flows as a typed GpuData prop: computed from perfSamples in Control.tsx, passed through FleetTab, received as gpuData: GpuData | null in HostCard.

pipeline.test: inline simulation -- FIXED

  • apps/control/src/services/__tests__/pipeline.test.ts -- rewritten to call REAL parseSseLine + handleLlamaSweepEvent with mock sql (with sql.json and sql.unsafe stubs) and real createDeltaEmitter. Asserts DB insert calls AND emitted deltas with incrementing seq. Added 2-host delta-merge test for B9.

Test count

  • Tests: 51 (was 49 after pass 1) -- added 2 merge tests to pipeline.test.ts
  • All 7 test files pass