# BooControl P1 Fix Analysis

**Date:** 2026-06-12

**Mode:** Fix (two prior agents were cancelled mid-edit, leaving the tree in a broken intermediate state)

**Result:** All builds green; all 51 tests passing (up from 32)

## Summary

Two prior agents were cancelled mid-edit, leaving the tree with broken TypeScript types: `DeltaEmitter.publish` was missing from the type, `ws.ts` had wrong import paths, `parseSseLine` had a duplicate identifier, and `buildEChartsTheme` referenced a non-existent type. This batch fixed all 8 blocking findings and the key advisory findings, and added comprehensive tests.
|
## Blocking Findings (B1-B8)

### B1: SSE line parser inverted -- FIXED

- **Evidence:** `apps/control/src/services/fleet-connector.ts:116-159`
- The parser was completely rewritten. It now handles standard SSE (`event:` + `data:` lines) and non-standard single-line (`type: json`) formats. The `parseSseLine` function returns `{ event, eventType }` with correct typing. The old contradictory `startsWith('data:')` filter is gone.
|
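The following line-at-a-time sketch illustrates the two formats the rewritten parser accepts. It is not the code in `fleet-connector.ts`. Several choices here are hypothetical: the factory name `createSseLineParser`, the `message` fallback type for a `data:` line with no preceding `event:` line (the default the SSE specification uses), and the assumption of one `data:` line per event.

```typescript
// Hypothetical sketch; the real parseSseLine in fleet-connector.ts may differ.
type ParsedEvent = { eventType: string; event: unknown };

type JsonResult = { ok: true; value: unknown } | { ok: false };

function tryJson(text: string): JsonResult {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch {
    return { ok: false };
  }
}

// Returns a stateful parser. A standard `event:` line sets the type that the
// following `data:` line is tagged with; a blank line ends the event.
function createSseLineParser(): (line: string) => ParsedEvent | null {
  let pendingType: string | null = null;

  return (rawLine) => {
    const line = rawLine.replace(/\r$/, "");
    if (line === "") {
      pendingType = null; // blank line terminates a standard SSE event
      return null;
    }
    if (line.startsWith(":")) return null; // SSE comment / keep-alive

    const colon = line.indexOf(":");
    if (colon <= 0) return null;
    const field = line.slice(0, colon).trim();
    const value = line.slice(colon + 1).trim();

    if (field === "event") {
      pendingType = value;
      return null;
    }
    if (field === "id" || field === "retry") return null; // standard fields without a JSON payload

    const parsed = tryJson(value);
    if (!parsed.ok) return null; // malformed JSON is skipped, not thrown

    if (field === "data") {
      // Standard form: `event: <type>` followed by `data: <json>`.
      return { eventType: pendingType ?? "message", event: parsed.value };
    }
    // Non-standard single-line form: `<type>: <json>`.
    return { eventType: field, event: parsed.value };
  };
}
```

The pending `event:` type is kept in a closure, so a reader that sees one line at a time can still handle the standard two-line form. The non-standard `type: json` form needs no state.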
### B2: incrementSeq never called -- seq stays 0 -- FIXED

- **Evidence:** `apps/control/src/services/fleet-state.ts:83-86` (exported), `apps/control/src/index.ts:63,88,101,239` (call sites)
- `incrementSeq` is exported from `fleet-state.ts`, imported in `index.ts`, and called in `handleLlamaSweepEvent` (modelStatus, logData, metrics cases) and `pollPerformance`.
|
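Below is a minimal sketch of per-host sequence bookkeeping. It assumes each host's state carries its own `seq` counter, which is consistent with B4's snapshot taking the max across hosts. The `HostState` and `FleetState` shapes are hypothetical; only the helper names come from this report.

```typescript
// Hypothetical shapes; the real fleet-state.ts types may differ.
type HostState = { providerId: string; seq: number };
type FleetState = Map<string, HostState>;

function createFleetState(): FleetState {
  return new Map();
}

// Returns the host's state, creating it with seq 0 on first sight.
function ensureHostState(fleet: FleetState, providerId: string): HostState {
  let host = fleet.get(providerId);
  if (host === undefined) {
    host = { providerId, seq: 0 };
    fleet.set(providerId, host);
  }
  return host;
}

// Bumps the host's sequence number and returns the new value for the caller
// to stamp onto the delta it publishes. If this is never called, every delta
// carries seq 0, which was the B2 bug.
function incrementSeq(fleet: FleetState, providerId: string): number {
  const host = ensureHostState(fleet, providerId);
  host.seq += 1;
  return host.seq;
}
```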
### B3: WS handler has no delta-publishing mechanism -- FIXED

- **Evidence:** `apps/control/src/index.ts:14-32` (DeltaEmitter with publish), `apps/control/src/routes/ws.ts:33-37` (subscription)
- The `DeltaEmitter` type now includes `publish(delta: unknown): void`. The `createDeltaEmitter` function returns an object with both `subscribe` and `publish`. The WS handler subscribes on connect and unsubscribes on close. All mutation paths (modelStatus, logData, metrics, perf) publish deltas.
|
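The following minimal sketch shows an emitter with the `subscribe`/`publish` shape described above. Having `subscribe` return an unsubscribe function is an assumption, chosen so the WS handler can detach cleanly on close. The `DeltaListener` type is hypothetical.

```typescript
// Hypothetical sketch of the DeltaEmitter shape; the real type lives in index.ts.
type DeltaListener = (delta: unknown) => void;

type DeltaEmitter = {
  subscribe(listener: DeltaListener): () => void; // returns an unsubscribe function
  publish(delta: unknown): void;
};

function createDeltaEmitter(): DeltaEmitter {
  const listeners = new Set<DeltaListener>();
  return {
    subscribe(listener) {
      listeners.add(listener);
      return () => {
        listeners.delete(listener);
      };
    },
    publish(delta) {
      // Iterate over a copy so a listener that unsubscribes during delivery
      // does not disturb iteration for the remaining listeners.
      for (const listener of Array.from(listeners)) {
        listener(delta);
      }
    },
  };
}
```

Because `publish` iterates over a copy of the listener set, it stays safe when a socket's close handler unsubscribes while a delta is being delivered.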
### B4: Snapshot wire format mismatch -- FIXED

- **Evidence:** `apps/control/src/routes/ws.ts:25-31` (server), `apps/web/src/hooks/useControlStream.tsx:151-163` (client)
- Server sends `{ type: 'control_fleet', seq: maxSeq, hosts: [...] }` at the top level, matching the `ControlFleetFrame` Zod schema. The snapshot seq is the max across all hosts. Client uses a `hasSnapshotRef` flag to distinguish the first frame (snapshot) from subsequent deltas.
|
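The snapshot/delta split can be sketched as follows, assuming each host object carries its own numeric `seq`. `buildSnapshotFrame`, `createFrameClassifier`, and the `SnapshotFrame` type are hypothetical. The first helper mirrors the server-side max-seq rule. The second stands in for the client's `hasSnapshotRef` flag outside React; resetting it on reconnect is an assumption.

```typescript
// Hypothetical host shape; only `seq` matters for the snapshot rule.
type HostSnapshot = { providerId: string; seq: number };

type SnapshotFrame = { type: "control_fleet"; seq: number; hosts: HostSnapshot[] };

// Server side: the snapshot's seq is the highest seq across all hosts
// (0 when no hosts are known yet).
function buildSnapshotFrame(hosts: HostSnapshot[]): SnapshotFrame {
  const maxSeq = hosts.reduce((max, h) => Math.max(max, h.seq), 0);
  return { type: "control_fleet", seq: maxSeq, hosts };
}

// Client side: the first frame on a connection is the snapshot and every later
// frame is a delta. reset() would be called when the socket reconnects.
function createFrameClassifier() {
  let hasSnapshot = false;
  return {
    classify(): "snapshot" | "delta" {
      if (!hasSnapshot) {
        hasSnapshot = true;
        return "snapshot";
      }
      return "delta";
    },
    reset(): void {
      hasSnapshot = false;
    },
  };
}
```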
### B5: onEvent drops async errors -- FIXED

- **Evidence:** `apps/control/src/services/fleet-connector.ts:101` (type), `:222-226` (await + catch)
- `onEvent` type changed to `() => void | Promise<void>`. The call site uses `await Promise.resolve(deps.onEvent(...))` with a catch block that logs the error. DB failures no longer crash the process.
|
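Here is a sketch of the call-site pattern. A hypothetical `runOnEvent` wrapper and `Log` callback stand in for the connector's own code. Because the `await` sits inside the `try`, the block catches both a rejected promise and a synchronous throw from the handler.

```typescript
// Hypothetical sketch of the B5 call-site pattern.
type OnEvent = (eventType: string, event: unknown) => void | Promise<void>;
type Log = (msg: string, fields: Record<string, unknown>) => void;

async function runOnEvent(onEvent: OnEvent, eventType: string, event: unknown, log: Log): Promise<void> {
  try {
    // Promise.resolve() lets one await cover both sync and async handlers;
    // a synchronous throw inside onEvent is also caught by this try block.
    await Promise.resolve(onEvent(eventType, event));
  } catch (err) {
    // Log and continue: a failed DB write must not become an unhandled rejection.
    log("onEvent failed", { eventType, err: err instanceof Error ? err.message : String(err) });
  }
}
```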
### B6: pruneRawSamples references non-existent id column -- FIXED

- **Evidence:** `apps/control/src/services/retention.ts:77-89`
- Rewritten to use composite key `(provider_id, ts)`. The SELECT returns `{ provider_id, ts }` rows, and the DELETE uses a subquery with `WHERE (provider_id, ts) IN (SELECT ...)`.

### B7: onReconcile wired but never called -- FIXED

- **Evidence:** `apps/control/src/index.ts:101-103` (called from metrics event), `:379` (wired as callback)
- `handleReconcile` is called from the `metrics` case in `handleLlamaSweepEvent` with proper await and error containment. The gap detection logic (`detectGap`) is extracted to `services/reconcile.ts` with 7 unit tests.
|
### B8: control_job garbage insert -- FIXED

- **Evidence:** `apps/web/src/hooks/useControlStream.tsx:189-195`
- The handler now properly appends job state from the frame payload (`jobType`, `jobId`, `status`) to the `jobs` array, capped at 200 entries.
|
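Below is a minimal sketch of the capped append. It assumes the cap keeps the most recent 200 entries. `JobEntry` uses the payload fields named above; `appendJob` and `MAX_JOBS` are hypothetical names.

```typescript
// Hypothetical entry shape built from the frame payload fields.
type JobEntry = { jobType: string; jobId: string; status: string };

const MAX_JOBS = 200;

// Returns a new array (React state should not be mutated in place) with the
// entry appended and only the most recent MAX_JOBS entries retained.
function appendJob(jobs: readonly JobEntry[], entry: JobEntry): JobEntry[] {
  const next = [...jobs, entry];
  return next.length > MAX_JOBS ? next.slice(next.length - MAX_JOBS) : next;
}
```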
## Advisory Findings (A1-A10)

### A1: No fleet-state rebuild from DB on startup -- FIXED

- **Evidence:** `apps/control/src/index.ts:256-310` (`rebuildFleetFromDB`)
- On startup, queries `control_model_events`, `control_requests`, and `control_perf_samples` for the latest state per provider. The rebuild is wrapped in try/catch so that a rebuild failure does not prevent startup.
|
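The "latest state per provider" step can be illustrated as an in-memory reduction over rows already fetched from the three tables. This is a hypothetical analogue, not `rebuildFleetFromDB`, which queries SQL directly. The `Row` shape, its `kind` tag, and `rebuildLatest` are invented for the sketch. Hosts start with liveness `'down'`, matching the later "A1 liveness" fix, so that only an attached SSE stream marks them connected.

```typescript
// Hypothetical row/state shapes; the real rebuild works on SQL query results.
type Row = { provider_id: string; ts: string; kind: "model" | "request" | "perf"; data: unknown };

type RebuiltHost = {
  providerId: string;
  liveness: "connected" | "down";
  latest: Partial<Record<Row["kind"], { ts: number; data: unknown }>>;
};

// Keeps, per provider, the newest row of each kind. Timestamps are compared as
// epoch milliseconds so ISO strings with different UTC offsets order correctly.
function rebuildLatest(rows: Row[]): Map<string, RebuiltHost> {
  const hosts = new Map<string, RebuiltHost>();
  for (const row of rows) {
    const ts = Date.parse(row.ts);
    if (Number.isNaN(ts)) continue; // skip unparseable timestamps
    let host = hosts.get(row.provider_id);
    if (host === undefined) {
      // Rebuilt hosts start 'down'; a connector flips them once SSE attaches.
      host = { providerId: row.provider_id, liveness: "down", latest: {} };
      hosts.set(row.provider_id, host);
    }
    const current = host.latest[row.kind];
    if (current === undefined || ts > current.ts) {
      host.latest[row.kind] = { ts, data: row.data };
    }
  }
  return hosts;
}
```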
### A2: pruneActivity/pruneModelEvents not chunked -- UNFIXED

- Deferred per YAGNI gate. At single-user scale, an unbounded DELETE is acceptable.

### A3: No Zod validation on incoming WS frames -- UNFIXED

- Deferred per YAGNI gate. The raw WS proxy bypasses the server-side Zod gate; client-side validation is a follow-up.

### A4: ECharts instances never disposed on unmount -- FIXED

- **Evidence:** `apps/web/src/components/control/PerfChart.tsx:100-104`, `VramGauge.tsx:93-97`, `TtlRing.tsx:98-103`
- All three chart components call `chart.dispose()` and null the ref in the cleanup function.

### A5: trimCapture size estimation -- UNFIXED

- Deferred per YAGNI gate. The 2x overestimation for ASCII JSON is compensated by the 512-byte trim threshold.
|
### A6: Fixed 5s reconnect delay -- FIXED

- **Evidence:** `apps/web/src/hooks/useControlStream.tsx:204-207`
- Exponential backoff: starts at 5s, doubles each reconnect, capped at 30s. Resets to 5s on successful connection.
|
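The A6 schedule is shown below as a small sketch. `reconnectDelay` and `createBackoff` are hypothetical names, and the delays are the ones given above. No jitter is added, since none is described for the client.

```typescript
// Delays in milliseconds, per the A6 description.
const BASE_DELAY_MS = 5_000;
const MAX_DELAY_MS = 30_000;

// Delay before reconnect attempt `attempt` (0-based): 5s, 10s, 20s, then 30s.
function reconnectDelay(attempt: number): number {
  return Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS);
}

// Stateful wrapper: next() returns the current delay and advances the attempt
// counter; reset() is called once a connection opens successfully.
function createBackoff() {
  let attempt = 0;
  return {
    next(): number {
      const delay = reconnectDelay(attempt);
      attempt += 1;
      return delay;
    },
    reset(): void {
      attempt = 0;
    },
  };
}
```

For example, five consecutive failures produce delays of 5000, 10000, 20000, 30000, and 30000 ms. A successful connection then restarts the schedule at 5000.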
### A7: Perf poller no fetch timeout -- FIXED

- **Evidence:** `apps/control/src/index.ts:224`
- `AbortSignal.timeout(10_000)` on the fetch call.

### A8: Perf poller swallows errors -- FIXED

- **Evidence:** `apps/control/src/index.ts:253-255`
- Errors logged via `console.warn` with provider ID and error message.

### A9: Response header forwarding -- UNFIXED

- Deferred per YAGNI gate. Internal dashboard behind Authelia.

### A10: SSRF via ssh_host -- UNFIXED

- Deferred per YAGNI gate. No user-facing host-edit UI in P1.
|
## Validation Findings (F1-F4)

### F1: Hardcoded oklch colors in ECharts components -- FIXED

- **Evidence:** `apps/web/src/components/control/VramGauge.tsx:36-38`, `TtlRing.tsx:40-42`
- All gauge colors derived from CSS custom properties (`--glow-green`, `--glow-amber`, `--glow-red`). No oklch literals remain.

### F2: Snapshot rebuild from DB not implemented -- FIXED

- Same as A1.
|
### F3: Reconcile test is a placeholder -- FIXED

- **Evidence:** `apps/control/src/services/__tests__/reconcile.test.ts` (7 tests)
- `detectGap` extracted to `services/reconcile.ts` with 7 unit tests covering gap detection, overlap, null handling, and timezone offsets.
|
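The test coverage listed for `detectGap` (gap detection, overlap, null handling, timezone offsets) fits a pure function over two timestamps. A minimal sketch under that assumption follows. The parameters, the `toleranceMs` option, and the `Gap` return shape are hypothetical and are not the `services/reconcile.ts` API.

```typescript
// Epoch milliseconds: end of stored data and start of the incoming batch.
type Gap = { from: number; to: number };

// Hypothetical signature. Returns null when there is nothing to backfill:
// no stored history yet, an unparseable timestamp, or an overlapping batch.
function detectGap(
  lastStoredTs: string | null,
  batchStartTs: string | null,
  toleranceMs = 0,
): Gap | null {
  if (lastStoredTs === null || batchStartTs === null) return null;
  // Date.parse honours explicit offsets, so "+02:00" and "Z" values compare correctly.
  const last = Date.parse(lastStoredTs);
  const start = Date.parse(batchStartTs);
  if (Number.isNaN(last) || Number.isNaN(start)) return null;
  return start - last > toleranceMs ? { from: last, to: start } : null;
}
```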
### F4: SSE event parsing fragile -- FIXED

- **Evidence:** `apps/control/src/services/fleet-connector.ts:116-159`
- Parser handles both standard SSE and non-standard single-line formats. Lines with malformed JSON return `null` and are silently skipped.
|
## Nit Findings (N1-N5)

### N1: Duplicate createFleetState -- FIXED

- **Evidence:** `apps/control/src/services/fleet-state.ts:60` (single source), `apps/control/src/index.ts:6` (import)
- `createFleetState`, `ensureHostState`, `stampLastSeen`, and `incrementSeq` are all exported from `fleet-state.ts` and imported in `index.ts`. No local duplicates.

### N2: theme as any cast -- UNFIXED

- The `as any` casts were not present in the current tree (the components pass the theme object directly to `echarts.init()`).

### N3: matchMedia in render body -- UNFIXED

- The `useReducedMotion` hook already handles this; the hook is called, not `matchMedia` directly.
|
### N4: SSE error logging drops error object -- FIXED

- **Evidence:** `apps/control/src/services/fleet-connector.ts:239-242`
- Error message included in log fields: `err: (err as Error).message`.

### N5: Sequential N+1 DB inserts -- FIXED

- **Evidence:** `apps/control/src/index.ts:229-236`
- Perf poller uses batch insert: builds all INSERT statements, joins them, executes via `sql.unsafe()` in a single round-trip.
|
## Type Breakage (from cancelled agents)

### DeltaEmitter.publish missing from type -- FIXED

- Added `publish(delta: unknown): void` to the `DeltaEmitter` type. Exported from `index.ts` for ws.ts consumption.

### ws.ts wrong import paths -- FIXED

- Changed `./services/fleet-state.js` to `../services/fleet-state.js` and `./index.js` to `../index.js`.

### parseSseLine duplicate identifier -- FIXED

- Return type was `{ event, event }` (duplicate key). Fixed to `{ event, eventType }`.

### buildEChartsTheme non-existent type -- FIXED

- Changed return type from `echarts.ThemeSetOptionOpts` (non-existent) to `Record<string, unknown>`.
|
## Test Coverage

| Test file | Tests | Coverage | Status |
|-----------|-------|----------|--------|
| fleet-connector.test.ts | 10 | jitter, reconnect, backoff | PASS |
| fleet-state.test.ts | 5 | create, ensure, stamp | PASS |
| liveness.test.ts | 7 | state machine transitions | PASS |
| seq-logic.test.ts | 6 | buffer-then-filter, updated wire format | PASS |
| retention.test.ts | 4 | trimCapture | PASS |
| reconcile.test.ts | 7 | gap detection (NEW -- was placeholder) | PASS |
| pipeline.test.ts | 12 | SSE parse, real chain, 2-host merge (NEW) | PASS |
| **Total** | **51** | | **ALL PASS** |
|
## Files Changed

- `apps/control/src/index.ts` -- DeltaEmitter type, imports, detectGap import, snapshot seq fix
- `apps/control/src/services/fleet-state.ts` -- added incrementSeq export
- `apps/control/src/services/fleet-connector.ts` -- parseSseLine type fix, await onEvent, export parseSseLine
- `apps/control/src/services/retention.ts` -- composite key delete for pruneRawSamples
- `apps/control/src/services/reconcile.ts` -- NEW: detectGap extracted for testability
- `apps/control/src/routes/ws.ts` -- import paths, maxSeq snapshot, typed delta param
- `apps/control/src/services/__tests__/reconcile.test.ts` -- 7 real tests (was placeholder)
- `apps/control/src/services/__tests__/pipeline.test.ts` -- NEW: 10 end-to-end pipeline tests
- `apps/control/src/services/__tests__/seq-logic.test.ts` -- updated wire format
- `apps/web/src/hooks/useControlStream.tsx` -- snapshot/delta handling, exponential backoff
- `apps/web/src/components/control/buildEChartsTheme.ts` -- return type fix
|
## Re-review fixes (pass 2)

### B9: Delta replaces entire hosts array -- FIXED

- `apps/web/src/hooks/useControlStream.tsx:161-175` -- the delta now merges by `providerId`: it updates the matching host, appends new hosts, and preserves hosts not present in the delta.
|
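Below is a minimal sketch of the merge rule. It assumes a delta host may carry only the fields that changed, so a matching host is shallow-merged rather than replaced. `Host` and `mergeHosts` are hypothetical names.

```typescript
// Hypothetical host shape; only providerId is needed for the merge.
type Host = { providerId: string; [key: string]: unknown };

function mergeHosts(current: readonly Host[], delta: readonly Host[]): Host[] {
  // Index the delta by providerId; the last entry wins if an id repeats.
  const updates = new Map<string, Host>();
  for (const h of delta) updates.set(h.providerId, h);

  // Shallow-merge matching hosts in place, preserving order; hosts absent
  // from the delta pass through unchanged (same object reference).
  const merged: Host[] = current.map((h): Host => {
    const update = updates.get(h.providerId);
    return update === undefined ? h : { ...h, ...update };
  });

  // Append hosts the client has not seen before, in delta order.
  const known = new Set(current.map((h) => h.providerId));
  updates.forEach((h, id) => {
    if (!known.has(id)) merged.push(h);
  });
  return merged;
}
```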
### Runtime bomb: toString() on porsager query objects -- FIXED

- `apps/control/src/index.ts:224-229` -- replaced the N5 batch insert, `sql.unsafe(inserts.map(s => s.toString()).join(';'))`, with a simple for-of loop awaiting each insert. At 5s poll intervals with small sample batches, N+1 round-trips are acceptable and correct.

### Runtime bomb: sql(objectArray) not a row-tuple helper -- FIXED

- `apps/control/src/services/retention.ts:77-88` -- reworked the B6 delete to SELECT only `ts` (`provider_id` is already fixed in the WHERE clause), then `DELETE ... WHERE provider_id = $1 AND ts = ANY($2)`.

### A1 liveness: rebuilt hosts start connected -- FIXED

- `apps/control/src/index.ts:269` -- changed from `state.liveness = 'connected'` to `state.liveness = 'down'`. Connectors flip a host to connected only when SSE actually attaches.

### HostCard double-cast -- FIXED

- `apps/web/src/components/control/HostCard.tsx:56` -- removed `(host as unknown as Record<string, unknown>)['gpu']`. GPU data now flows as a typed `GpuData` prop: it is computed from perfSamples in Control.tsx, passed through FleetTab, and received as `gpuData: GpuData | null` in HostCard.
|
### pipeline.test: inline simulation -- FIXED

- `apps/control/src/services/__tests__/pipeline.test.ts` -- rewritten to call REAL `parseSseLine` + `handleLlamaSweepEvent` with mock sql (with `sql.json` and `sql.unsafe` stubs) and real `createDeltaEmitter`. Asserts DB insert calls AND emitted deltas with incrementing seq. Added 2-host delta-merge test for B9.
|
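A test double of this kind can be sketched as a tagged-template function that records every call, with `json` and `unsafe` attached as properties. `createMockSql`, the `$n` placeholder rendering, and the `{ __json }` wrapper are hypothetical. The real porsager client has a much larger surface and talks to a database, which this mock never does.

```typescript
// Hypothetical test double for a porsager-style tagged-template `sql`.
type RecordedCall = { text: string; values: unknown[] };

function createMockSql() {
  const calls: RecordedCall[] = [];

  // Tagged-template entry point: sql`INSERT ... ${a}, ${b}` is recorded with
  // $1, $2, ... placeholders so assertions can match on query text and values.
  const tag = (strings: TemplateStringsArray, ...values: unknown[]): Promise<unknown[]> => {
    const text = strings.map((s, i) => (i === 0 ? s : `$${i}${s}`)).join("");
    calls.push({ text, values });
    return Promise.resolve([]);
  };

  const sql = Object.assign(tag, {
    // Stub for sql.json(value): wraps the value so tests can see it was JSON-encoded.
    json: (value: unknown) => ({ __json: value }),
    // Stub for sql.unsafe(text): records raw SQL with no parameters.
    unsafe: (text: string): Promise<unknown[]> => {
      calls.push({ text, values: [] });
      return Promise.resolve([]);
    },
  });

  return { sql, calls };
}
```

In a test, pass `sql` wherever the handler expects the real client. You can then assert on `calls` for DB writes while collecting emitted deltas from a subscription.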
### Test count

- Tests: 51 (was 49) -- added 2 merge tests to pipeline.test.ts
- All 7 test files pass