feat(booterm): structured pty_exited WS notifications. Plan-validated, impl-validated, code-reviewed green (contracts build clean, contracts test 29/29, booterm + web typecheck clean). wip: in-progress inference/provider refactor (agents.ts, provider.ts, new llama-providers.ts, removed llama-args-validator), plus arena, dispatcher, compaction, schema changes. openspec: pty-exit-notifications complete; x-agent-flags planned (not yet implemented).
BooControl P1 Fix Analysis
- Date: 2026-06-12
- Mode: Fix (two prior agents cancelled mid-edit; tree was in broken intermediate state)
- Result: All builds green, all 51 tests passing (was 32)
Summary
Two prior agents were cancelled mid-edit, leaving the tree with broken TypeScript types (DeltaEmitter.publish missing from type, ws.ts wrong import paths, parseSseLine duplicate identifier, buildEChartsTheme non-existent type). This batch completed all 8 blocking findings, the key advisory findings, and added comprehensive tests.
Blocking Findings (B1-B8)
B1: SSE line parser inverted -- FIXED
- Evidence: `apps/control/src/services/fleet-connector.ts:116-159`
- The parser was completely rewritten. It now handles standard SSE (`event:` + `data:` lines) and non-standard single-line (`type: json`) formats. The `parseSseLine` function returns `{ event, eventType }` with correct typing. The old contradictory `startsWith('data:')` filter is gone.
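The two accepted line shapes can be illustrated with a minimal sketch. `createSseLineParser` is a hypothetical helper written for this note, not the `parseSseLine` in `fleet-connector.ts`, whose internals may differ; the `'message'` default event type and the blank-line reset are assumptions borrowed from standard SSE behaviour.

```typescript
// Hypothetical sketch: a stateful line parser that accepts both formats.
type ParsedEvent = { eventType: string; event: unknown };

function createSseLineParser(): (line: string) => ParsedEvent | null {
  // Remembers the most recent `event:` field until a blank line ends the block.
  let pendingType: string | null = null;

  const parseJson = (eventType: string, payload: string): ParsedEvent | null => {
    try {
      return { eventType, event: JSON.parse(payload) };
    } catch {
      return null; // malformed JSON is skipped silently
    }
  };

  return (line: string): ParsedEvent | null => {
    const trimmed = line.trim();
    if (trimmed === '') {
      pendingType = null; // blank line terminates an SSE event block
      return null;
    }
    if (trimmed.startsWith('event:')) {
      pendingType = trimmed.slice('event:'.length).trim();
      return null;
    }
    if (trimmed.startsWith('data:')) {
      // Assumed default type when no `event:` line preceded the data.
      return parseJson(pendingType ?? 'message', trimmed.slice('data:'.length).trim());
    }
    // Non-standard single-line form: `<type>: <json>`.
    const idx = trimmed.indexOf(':');
    if (idx <= 0) return null; // no field name, or an SSE comment line
    return parseJson(trimmed.slice(0, idx).trim(), trimmed.slice(idx + 1).trim());
  };
}
```

Feeding `event: modelStatus` followed by `data: {...}` yields one event typed `modelStatus`, while a single `metrics: {...}` line yields an event typed `metrics`.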
B2: incrementSeq never called -- seq stays 0 -- FIXED
- Evidence: `apps/control/src/services/fleet-state.ts:83-86` (exported), `apps/control/src/index.ts:63,88,101,239` (call sites)
- `incrementSeq` is exported from `fleet-state.ts`, imported in `index.ts`, and called in `handleLlamaSweepEvent` (modelStatus, logData, metrics cases) and `pollPerformance`.
B3: WS handler has no delta-publishing mechanism -- FIXED
- Evidence: `apps/control/src/index.ts:14-32` (DeltaEmitter with publish), `apps/control/src/routes/ws.ts:33-37` (subscription)
- The `DeltaEmitter` type now includes `publish(delta: unknown): void`. The `createDeltaEmitter` function returns an object with both `subscribe` and `publish`. The WS handler subscribes on connect and unsubscribes on close. All mutation paths (modelStatus, logData, metrics, perf) publish deltas.
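A minimal sketch of an emitter with this shape follows. Only `subscribe` and `publish(delta: unknown): void` come from the finding; returning an unsubscribe function from `subscribe` and isolating listener errors are assumptions made for illustration.

```typescript
// Sketch only: the real createDeltaEmitter in index.ts may differ.
type DeltaListener = (delta: unknown) => void;

type DeltaEmitter = {
  subscribe(listener: DeltaListener): () => void; // assumed: returns an unsubscribe fn
  publish(delta: unknown): void;
};

function createDeltaEmitter(): DeltaEmitter {
  const listeners = new Set<DeltaListener>();
  return {
    subscribe(listener) {
      listeners.add(listener);
      return () => {
        listeners.delete(listener);
      };
    },
    publish(delta) {
      // Copy first so a listener that unsubscribes mid-publish is safe.
      for (const listener of [...listeners]) {
        try {
          listener(delta);
        } catch {
          // Assumed policy: one failing subscriber must not block the others.
        }
      }
    },
  };
}
```

Under this sketch a WS handler would call `subscribe` on connect, keep the returned function, and invoke it on close.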
B4: Snapshot wire format mismatch -- FIXED
- Evidence: `apps/control/src/routes/ws.ts:25-31` (server), `apps/web/src/hooks/useControlStream.tsx:151-163` (client)
- Server sends `{ type: 'control_fleet', seq: maxSeq, hosts: [...] }` at the top level, matching the `ControlFleetFrame` Zod schema. The snapshot seq is the max across all hosts. Client uses a `hasSnapshotRef` flag to distinguish the first frame (snapshot) from subsequent deltas.
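The snapshot shape can be sketched as below. The `HostSnapshot` fields and the `buildSnapshotFrame` name are hypothetical; the real frame is validated by the `ControlFleetFrame` Zod schema, which is replaced here by a plain type to keep the example dependency-free.

```typescript
// Hypothetical host shape: only providerId and a per-host seq are modelled.
type HostSnapshot = { providerId: string; seq: number };

type ControlFleetFrame = {
  type: 'control_fleet';
  seq: number;
  hosts: HostSnapshot[];
};

function buildSnapshotFrame(hosts: HostSnapshot[]): ControlFleetFrame {
  // Snapshot seq is the highest seq seen across all hosts (0 for an empty fleet).
  const maxSeq = hosts.reduce((max, host) => Math.max(max, host.seq), 0);
  return { type: 'control_fleet', seq: maxSeq, hosts };
}
```

Using the maximum means a client that buffers deltas while waiting for the snapshot can discard any delta whose seq is not newer than the snapshot's.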
B5: onEvent drops async errors -- FIXED
- Evidence: `apps/control/src/services/fleet-connector.ts:101` (type), `:222-226` (await + catch)
- `onEvent` type changed to `() => void | Promise<void>`. The call site uses `await Promise.resolve(deps.onEvent(...))` with a catch block that logs the error. DB failures no longer crash the process.
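The await-and-contain pattern can be sketched as follows. `dispatchEvent`, the `event` parameter, and the `logError` callback are hypothetical names; the actual call site lives inside the connector loop and uses its own logger.

```typescript
// Sketch: a handler may be sync or async; both kinds of failure are contained.
type OnEvent = (event: unknown) => void | Promise<void>;

async function dispatchEvent(
  onEvent: OnEvent,
  event: unknown,
  logError: (message: string) => void,
): Promise<void> {
  try {
    // Promise.resolve normalises a sync return value and an async promise alike.
    await Promise.resolve(onEvent(event));
  } catch (err) {
    // A rejected promise (e.g. a DB failure) or a sync throw lands here
    // instead of surfacing as an unhandled rejection.
    logError(err instanceof Error ? err.message : String(err));
  }
}
```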
B6: pruneRawSamples references non-existent id column -- FIXED
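- Evidence: `apps/control/src/services/retention.ts:77-89`
- Rewritten to use composite key `(provider_id, ts)`. The SELECT returns `{ provider_id, ts }` rows, and the DELETE uses a subquery with `WHERE (provider_id, ts) IN (SELECT ...)`. The pass 2 re-review later replaced this form (see "Runtime bomb: sql(objectArray) not a row-tuple helper").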
B7: onReconcile wired but never called -- FIXED
- Evidence: `apps/control/src/index.ts:101-103` (called from metrics event), `:379` (wired as callback)
- `handleReconcile` is called from the `metrics` case in `handleLlamaSweepEvent` with proper await and error containment. The gap detection logic (`detectGap`) is extracted to `services/reconcile.ts` with 7 unit tests.
B8: control_job garbage insert -- FIXED
- Evidence: `apps/web/src/hooks/useControlStream.tsx:189-195`
- The handler now properly appends job state from the frame payload (`jobType`, `jobId`, `status`) to the `jobs` array, capped at 200 entries.
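A capped append of this kind might look like the sketch below. The field names come from the finding; the `appendJob` helper and the choice to drop the oldest entries when the cap is exceeded are assumptions.

```typescript
type JobEntry = { jobType: string; jobId: string; status: string };

const MAX_JOBS = 200;

// Returns a new array so it can be used directly in a React state updater.
function appendJob(jobs: readonly JobEntry[], frame: JobEntry): JobEntry[] {
  const next = [
    ...jobs,
    { jobType: frame.jobType, jobId: frame.jobId, status: frame.status },
  ];
  // Assumed policy: keep the newest MAX_JOBS entries, discard the oldest.
  return next.length > MAX_JOBS ? next.slice(next.length - MAX_JOBS) : next;
}
```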
Advisory Findings (A1-A10)
A1: No fleet-state rebuild from DB on startup -- FIXED
- Evidence: `apps/control/src/index.ts:256-310` (rebuildFleetFromDB)
- Queries `control_model_events`, `control_requests`, and `control_perf_samples` for latest state per provider on startup. Wrapped in try-catch so rebuild failure doesn't prevent startup.
A2: pruneActivity/pruneModelEvents not chunked -- UNFIXED
- Deferred per YAGNI gate. At single-user scale, unbounded DELETE is acceptable.
A3: No Zod validation on incoming WS frames -- UNFIXED
- Deferred per YAGNI gate. Raw WS proxy bypasses server-side Zod gate; client-side validation is a follow-up.
A4: ECharts instances never disposed on unmount -- FIXED
- Evidence: `apps/web/src/components/control/PerfChart.tsx:100-104`, `VramGauge.tsx:93-97`, `TtlRing.tsx:98-103`
- All three chart components call `chart.dispose()` and null the ref in the cleanup function.
A5: trimCapture size estimation -- UNFIXED
- Deferred per YAGNI gate. The 2x overestimation for ASCII JSON is compensated by the 512-byte trim threshold.
A6: Fixed 5s reconnect delay -- FIXED
- Evidence: `apps/web/src/hooks/useControlStream.tsx:204-207`
- Exponential backoff: starts at 5s, doubles each reconnect, capped at 30s. Resets to 5s on successful connection.
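The schedule described above works out to 5s, 10s, 20s, 30s, 30s, and so on. A minimal sketch, assuming a hypothetical `createBackoff` helper rather than the hook's actual ref-based state:

```typescript
const BASE_DELAY_MS = 5_000;
const MAX_DELAY_MS = 30_000;

function createBackoff() {
  let delay = BASE_DELAY_MS;
  return {
    // Returns the delay to wait before the next reconnect, then doubles it.
    next(): number {
      const current = delay;
      delay = Math.min(delay * 2, MAX_DELAY_MS);
      return current;
    },
    // Called after a successful connection.
    reset(): void {
      delay = BASE_DELAY_MS;
    },
  };
}
```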
A7: Perf poller no fetch timeout -- FIXED
- Evidence: `apps/control/src/index.ts:224`
- `AbortSignal.timeout(10_000)` on the fetch call.
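As an illustration of the timeout wiring, the sketch below passes `AbortSignal.timeout` to an injected fetch function. `fetchWithTimeout` and the `FetchLike` type are hypothetical; injecting the fetch implementation is only done here so the example can run without a network.

```typescript
// Minimal structural type; the global fetch is expected to fit this shape.
type FetchLike = (
  url: string,
  init: { signal: AbortSignal },
) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

async function fetchWithTimeout(
  url: string,
  fetchImpl: FetchLike,
  timeoutMs = 10_000,
): Promise<unknown> {
  // The signal aborts on its own after timeoutMs, rejecting the pending fetch.
  const res = await fetchImpl(url, { signal: AbortSignal.timeout(timeoutMs) });
  if (!res.ok) throw new Error(`non-OK response from ${url}`);
  return res.json();
}
```

Without a signal, a host that accepts the connection but never responds would stall that poll cycle indefinitely.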
A8: Perf poller swallows errors -- FIXED
- Evidence: `apps/control/src/index.ts:253-255`
- Errors logged via `console.warn` with provider ID and error message.
A9: Response header forwarding -- UNFIXED
- Deferred per YAGNI gate. Internal dashboard behind Authelia.
A10: SSRF via ssh_host -- UNFIXED
- Deferred per YAGNI gate. No user-facing host-edit UI in P1.
Validation Findings (F1-F4)
F1: Hardcoded oklch colors in ECharts components -- FIXED
- Evidence: `apps/web/src/components/control/VramGauge.tsx:36-38`, `TtlRing.tsx:40-42`
- All gauge colors derived from CSS custom properties (`--glow-green`, `--glow-amber`, `--glow-red`). No oklch literals remain.
F2: Snapshot rebuild from DB not implemented -- FIXED
- Same as A1.
F3: Reconcile test is a placeholder -- FIXED
- Evidence: `apps/control/src/services/__tests__/reconcile.test.ts` (7 tests)
- `detectGap` extracted to `services/reconcile.ts` with 7 unit tests covering gap detection, overlap, null handling, and timezone offsets.
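The signature of `detectGap` is not shown in this report, so the following is only a guess at the kind of pure function those test categories imply. `detectGapSketch`, its parameters, and the tolerance argument are all hypothetical.

```typescript
// Hypothetical sketch, not the real detectGap from services/reconcile.ts.
type Gap = { fromMs: number; toMs: number };

function detectGapSketch(
  lastStoredTs: string | null,    // newest timestamp already persisted
  firstIncomingTs: string | null, // oldest timestamp in the incoming batch
  toleranceMs = 0,
): Gap | null {
  // Null handling: nothing to compare against means no detectable gap.
  if (lastStoredTs === null || firstIncomingTs === null) return null;

  // Date.parse resolves ISO 8601 offsets, so "+02:00" and "Z" forms compare correctly.
  const fromMs = Date.parse(lastStoredTs);
  const toMs = Date.parse(firstIncomingTs);
  if (Number.isNaN(fromMs) || Number.isNaN(toMs)) return null;

  // Overlap (incoming starts at or before the stored edge) is not a gap.
  return toMs - fromMs > toleranceMs ? { fromMs, toMs } : null;
}
```

Keeping the comparison free of I/O is what makes it straightforward to unit test in isolation.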
F4: SSE event parsing fragile -- FIXED
- Evidence: `apps/control/src/services/fleet-connector.ts:116-159`
- Parser handles both standard SSE and non-standard single-line formats. JSON parsing errors return null (silently skipped).
Nit Findings (N1-N5)
N1: Duplicate createFleetState -- FIXED
- Evidence: `apps/control/src/services/fleet-state.ts:60` (single source), `apps/control/src/index.ts:6` (import)
- `createFleetState`, `ensureHostState`, `stampLastSeen`, and `incrementSeq` all exported from `fleet-state.ts` and imported in `index.ts`. No local duplicates.
N2: theme as any cast -- UNFIXED
- The `as any` casts were not present in the current tree (the components pass the theme object directly to `echarts.init()`), so no change was needed.
N3: matchMedia in render body -- UNFIXED
- No change needed: the `useReducedMotion` hook already handles this; components call the hook rather than `matchMedia` directly.
N4: SSE error logging drops error object -- FIXED
- Evidence: `apps/control/src/services/fleet-connector.ts:239-242`
- Error message included in log fields: `err: (err as Error).message`.
N5: Sequential N+1 DB inserts -- FIXED
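- Evidence: `apps/control/src/index.ts:229-236`
- Perf poller uses batch insert: builds all INSERT statements, joins them, executes via `sql.unsafe()` in a single round-trip. The pass 2 re-review later reverted this to sequential inserts (see "Runtime bomb: toString() on porsager query objects").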
Type Breakage (from cancelled agents)
DeltaEmitter.publish missing from type -- FIXED
- Added `publish(delta: unknown): void` to the `DeltaEmitter` type. Exported from `index.ts` for `ws.ts` consumption.
ws.ts wrong import paths -- FIXED
- Changed `./services/fleet-state.js` to `../services/fleet-state.js` and `./index.js` to `../index.js`.
parseSseLine duplicate identifier -- FIXED
- Return type was `{ event, event }` (duplicate key). Fixed to `{ event, eventType }`.
buildEChartsTheme non-existent type -- FIXED
- Changed return type from `echarts.ThemeSetOptionOpts` (non-existent) to `Record<string, unknown>`.
Test Coverage
| Test file | Tests | Status |
|---|---|---|
| fleet-connector.test.ts | 10 | PASS (jitter, reconnect, backoff) |
| fleet-state.test.ts | 5 | PASS (create, ensure, stamp) |
| liveness.test.ts | 7 | PASS (state machine transitions) |
| seq-logic.test.ts | 6 | PASS (buffer-then-filter, updated wire format) |
| retention.test.ts | 4 | PASS (trimCapture) |
| reconcile.test.ts | 7 | PASS (gap detection, NEW -- was placeholder) |
| pipeline.test.ts | 12 | PASS (SSE parse, real chain, 2-host merge, NEW) |
| Total | 51 | ALL PASS |
Files Changed
- `apps/control/src/index.ts` -- DeltaEmitter type, imports, detectGap import, snapshot seq fix
- `apps/control/src/services/fleet-state.ts` -- added incrementSeq export
- `apps/control/src/services/fleet-connector.ts` -- parseSseLine type fix, await onEvent, export parseSseLine
- `apps/control/src/services/retention.ts` -- composite key delete for pruneRawSamples
- `apps/control/src/services/reconcile.ts` -- NEW: detectGap extracted for testability
- `apps/control/src/routes/ws.ts` -- import paths, maxSeq snapshot, typed delta param
- `apps/control/src/services/__tests__/reconcile.test.ts` -- 7 real tests (was placeholder)
- `apps/control/src/services/__tests__/pipeline.test.ts` -- NEW: 10 end-to-end pipeline tests (12 after the pass 2 additions)
- `apps/control/src/services/__tests__/seq-logic.test.ts` -- updated wire format
- `apps/web/src/hooks/useControlStream.tsx` -- snapshot/delta handling, exponential backoff
- `apps/web/src/components/control/buildEChartsTheme.ts` -- return type fix
Re-review fixes (pass 2)
B9: Delta replaces entire hosts array -- FIXED
- `apps/web/src/hooks/useControlStream.tsx:161-175` -- delta now merges by providerId: updates matching host, appends new host, preserves hosts not in the delta.
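The merge rule can be sketched as a pure function. `mergeHosts` and the open-ended `Host` type are hypothetical, and shallow-merging the delta's fields over the existing host (rather than replacing the host outright) is an assumption.

```typescript
type Host = { providerId: string; [key: string]: unknown };

function mergeHosts(current: readonly Host[], delta: readonly Host[]): Host[] {
  // Index the delta so each existing host can look up its update.
  const pending = new Map(delta.map((h) => [h.providerId, h] as const));

  const merged = current.map((host) => {
    const update = pending.get(host.providerId);
    if (!update) return host; // not in the delta: preserved untouched
    pending.delete(host.providerId);
    return { ...host, ...update }; // matching host: fields updated
  });

  // Anything left in the map is a host the client has not seen before.
  return [...merged, ...pending.values()];
}
```

With two hosts `a` and `b` in state and a delta carrying only `b`, the result still contains `a`, which is the case the earlier replace-the-array behaviour lost.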
Runtime bomb: toString() on porsager query objects -- FIXED
- `apps/control/src/index.ts:224-229` -- replaced `sql.unsafe(inserts.map(s => s.toString()).join(';'))` with a simple for-of loop awaiting each insert. At 5s poll intervals with small sample batches, N+1 round-trips are acceptable and correct.
Runtime bomb: sql(objectArray) not a row-tuple helper -- FIXED
- `apps/control/src/services/retention.ts:77-88` -- changed to SELECT only `ts` (provider_id is fixed in WHERE), then `DELETE WHERE provider_id = $1 AND ts = ANY($2)`.
A1 liveness: rebuilt hosts start connected -- FIXED
- `apps/control/src/index.ts:269` -- changed from `state.liveness = 'connected'` to `state.liveness = 'down'`. Connectors flip to connected when SSE actually attaches.
HostCard double-cast -- FIXED
- `apps/web/src/components/control/HostCard.tsx:56` -- removed `(host as unknown as Record<string, unknown>)['gpu']`. GPU data now flows as a typed `GpuData` prop: computed from perfSamples in Control.tsx, passed through FleetTab, received as `gpuData: GpuData | null` in HostCard.
pipeline.test: inline simulation -- FIXED
- `apps/control/src/services/__tests__/pipeline.test.ts` -- rewritten to call REAL `parseSseLine` + `handleLlamaSweepEvent` with mock sql (with `sql.json` and `sql.unsafe` stubs) and real `createDeltaEmitter`. Asserts DB insert calls AND emitted deltas with incrementing seq. Added 2-host delta-merge test for B9.
Test count
- Tests: 51 (was 49) -- added 2 merge tests to pipeline.test.ts
- All 7 test files pass