# Security Analysis: BooControl P1 Implementation

## Scope

**Files analyzed:**
- `apps/control/src/index.ts` — Fastify host service, SSE event handlers, perf poller, retention job
- `apps/control/src/db.ts` — Database connection, schema application
- `apps/control/src/config.ts` — Zod-validated env config
- `apps/control/src/schema.sql` — DDL for fleet cockpit tables
- `apps/control/src/routes/ws.ts` — WebSocket endpoint serving fleet state
- `apps/control/src/services/fleet-connector.ts` — SSE client with reconnect logic
- `apps/control/src/services/fleet-state.ts` — In-memory fleet state types
- `apps/control/src/services/retention.ts` — Rollup and prune jobs
- `apps/control/src/services/host-access.ts` — No-op host access seam (P8 placeholder)
- `apps/server/src/routes/control-proxy.ts` — HTTP/WS reverse proxy in BooChat
- `apps/server/src/routes/coder-proxy.ts` — Reference proxy (keep-in-sync comparison)
- `apps/server/src/index.ts` — Server bootstrap (proxy registration)
- `apps/web/src/hooks/useControlStream.tsx` — Client WS hook + React context
- `apps/web/src/components/control/` — All UI components (FleetTab, ActivityTab, HostCard, PerfChart, TtlRing, VramGauge, buildEChartsTheme)

**Dependency manifests reviewed:**
- `apps/control/package.json`, `apps/server/package.json`, root `package.json`, `docker-compose.yml`

**No branch specified; analysis performed on working tree.**

## Summary

The BooControl P1 implementation has a critical functional bug in the SSE fleet connector that renders the entire fleet monitoring system non-functional — no events are ever parsed or persisted. Beyond that, the SQL injection concern raised about `::jsonb` patterns is a false positive (the `postgres` tagged-template library parameterizes these correctly), but real vulnerabilities exist in SSRF via unvalidated `ssh_host`, missing WebSocket origin validation on the proxy, unbounded in-memory state growth, and response header forwarding without filtering.

| Severity | Count |
|----------|-------|
| Critical | 1     |
| High     | 2     |
| Medium   | 3     |

Full analysis written to: /home/samkintop/opt/boocode/security-analysis.md

## Findings

### OWASP A01 - Broken Access Control

**SEC-001: WebSocket Proxy Has No Origin Validation**
- **OWASP:** A01 - Broken Access Control
- **Location:** `apps/server/src/routes/control-proxy.ts:20`
- **Evidence:**
  ```typescript
  app.get('/api/control/ws', { websocket: true }, (clientSocket, _req) => {
    const target = boocontrolWsUrl(boocontrolOrigin, '/api/ws/control');
    const upstream = new WebSocket(target);
  ```
- **EXPLOIT:** A malicious page on any domain can open a WebSocket to `/api/control/ws` on the BooChat origin. WebSocket connections are not subject to same-origin policy. If the user has an active Authelia session, the browser will include session cookies automatically. The malicious page receives fleet state snapshots (host IPs, model names, liveness status, GPU metrics) and can relay messages between client and upstream. The `@fastify/websocket` plugin supports an `origin` option that is not used. The sibling `coder-proxy.ts` has the same pattern.
- **Severity:** High

**Fix sketch:**
```typescript
// In control-proxy.ts, add origin validation:
app.get('/api/control/ws', { websocket: true }, (clientSocket, req) => {
  const origin = req.headers.origin;
  const allowed = process.env.ALLOWED_ORIGIN ?? 'https://code.indifferentketchup.com';
  if (origin && origin !== allowed) {
    clientSocket.close(1008, 'origin not allowed');
    return;
  }
  // ... rest of proxy
});
```

---

**SEC-002: Control Service Has No Application-Layer Authentication**
- **OWASP:** A01 - Broken Access Control
- **Location:** `apps/control/src/index.ts:197-306` (entire server); `apps/control/src/config.ts:6` (`HOST` defaults to `100.114.205.53`)
- **Evidence:**
  ```typescript
  // No auth middleware registered. No CORS. No origin checks.
  app.get('/api/health', async (_req: unknown, reply) => { ... });
  registerControlWebSocket(app, () => fleet);
  await app.listen({ port: config.PORT, host: config.HOST });
  ```
- **EXPLOIT:** The control service binds to the Tailscale IP with zero authentication. Any device on the Tailscale network can connect to port 9503 and read fleet state, GPU metrics, and model activity. If the Tailscale network is shared or a device is compromised, the attacker gains full read access to the inference fleet dashboard. The service has no CORS headers, no auth middleware, and no request validation. This is architecturally acceptable *only* if the Tailscale network is single-user and the service is never exposed beyond it — a fragile assumption.
- **Severity:** Medium

**Fix sketch:** Add a shared-secret bearer token or mTLS between BooChat proxy and BooControl, so the control service rejects requests not originating from the authenticated proxy.

---

### OWASP A02 - Cryptographic Failures

> **A02 - Cryptographic Failures:** No proven vulnerability found. Checked: No secrets, API keys, or credentials hardcoded in the control service code. `DATABASE_URL` comes from env vars validated by Zod. `config.ts:7` reads `DATABASE_URL` from `process.env`. No sensitive data appears in log output (the `onnotice: () => {}` in `db.ts:20` suppresses PostgreSQL notices).

---

### OWASP A03 - Injection

**SEC-003: JSONB Interpolation Uses String Interpolation Instead of sql.json()**
- **OWASP:** A03 - Injection
- **Location:** `apps/control/src/index.ts:68`, `apps/control/src/index.ts:83`, `apps/control/src/index.ts:137`, `apps/control/src/index.ts:184`
- **Evidence:**
  ```typescript
  // Line 68: JSON.stringify → tagged template parameter (SAFE but fragile)
  ${JSON.stringify({ ttl })}::jsonb

  // Line 83: captureTrimmed quoted inside tagged template (SAFE but looks dangerous)
  ${captureTrimmed ? sql`'${captureTrimmed}'::jsonb` : sql`NULL::jsonb`}

  // Line 137: JSON.stringify → tagged template parameter
  ${JSON.stringify({ oldestReconcile: oldestReconcileTs, newestPersisted: newestRow.ts })}::jsonb

  // Line 184: JSON.stringify → tagged template parameter
  ${JSON.stringify(sample.gpu)}::jsonb, ${JSON.stringify(sample.sys)}::jsonb
  ```
- **EXPLOIT (false positive confirmed):** After tracing the `postgres` library's tagged-template behavior, all four patterns are safe. The `postgres` library parameterizes each `${}` interpolation — `JSON.stringify(...)` produces a JSON string that becomes a bound parameter, and `::jsonb` is raw SQL text appended after the parameter placeholder. Line 83's nested `` sql`'${captureTrimmed}'::jsonb` `` is also safe: the inner tagged template parameterizes `captureTrimmed` and the outer template includes the resulting SQL fragment. **However**, the line 83 pattern is a maintenance hazard — it *looks like* SQL injection and would trip every static analysis tool. The project's own `CLAUDE.md` documents this exact anti-pattern: "use `sql.json(value as never)` — NOT `${JSON.stringify(value)}::jsonb` which double-serializes."
- **Severity:** Medium (maintenance/correctness, not exploitable injection)

**Fix sketch (all four locations):**
```typescript
// Before (fragile, trips linters):
${JSON.stringify({ ttl })}::jsonb
${captureTrimmed ? sql`'${captureTrimmed}'::jsonb` : sql`NULL::jsonb`}

// After (idiomatic, defense-in-depth):
${sql.json({ ttl })}
${captureTrimmed ? sql.json(JSON.parse(captureTrimmed)) : sql`NULL::jsonb`}
// Or for raw JSON strings:
${sql.json(JSON.parse(captureTrimmed))}
```

---

### OWASP A04 - Insecure Design

**SEC-004: Unbounded In-Memory Fleet State — Maps Never Pruned**
- **OWASP:** A04 - Insecure Design
- **Location:** `apps/control/src/index.ts:14-31` (`createFleetState`, `ensureHostState`); `apps/control/src/services/fleet-state.ts:7-17`
- **Evidence:**
  ```typescript
  function createFleetState(): FleetState {
    return { hosts: new Map() };
  }

  function ensureHostState(fleet: FleetState, providerId: string): HostState {
    let state = fleet.hosts.get(providerId);
    if (!state) {
      state = { providerId, liveness: 'down', lastSeenAt: null, seq: 0, models: new Map() };
      fleet.hosts.set(providerId, state);
    }
    return state;
  }
  ```
  The `fleet.hosts` Map and each host's `models` Map grow without bound. `handleLlamaSweepEvent` (line 58) sets models: `state.models.set(model, ...)`. There is no eviction, TTL, or size cap. The database retention job (`retention.ts`) prunes old *database* rows but never touches in-memory state. Over time, if models are renamed or rotated, the Maps accumulate stale entries.
- **EXPLOIT:** An attacker who can send crafted SSE events (or a misconfigured llama-swap instance) with rapidly changing model names can grow the `models` Map without bound, eventually exhausting the control service's heap. With the default 5-second poll interval and no cap, this is a slow but real memory leak under normal operation.
- **Severity:** Medium

**Fix sketch:**
```typescript
// Cap the models map per host (e.g., 100 entries, LRU eviction):
const MAX_MODELS_PER_HOST = 100;
if (state.models.size >= MAX_MODELS_PER_HOST && !state.models.has(model)) {
  const oldest = state.models.keys().next().value;
  if (oldest) state.models.delete(oldest);
}
state.models.set(model, { ... });
```

---

### OWASP A05 - Security Misconfiguration

**SEC-005: HTTP Proxy Forwards All Upstream Response Headers Without Filtering**
- **OWASP:** A05 - Security Misconfiguration
- **Location:** `apps/server/src/routes/control-proxy.ts:78-81`
- **Evidence:**
  ```typescript
  reply.code(res.status);
  for (const [key, value] of res.headers) {
    if (key === 'transfer-encoding') continue;
    reply.header(key, value);
  }
  ```
  Every response header from the BooControl upstream (except `transfer-encoding`) is forwarded verbatim to the client. If the upstream sets `Set-Cookie`, `X-Frame-Options`, `Content-Security-Policy`, or `Content-Type: text/html` with attacker-influenced body, those reach the browser unfiltered. The same pattern exists in `coder-proxy.ts:83-86` (keep-in-sync clone).
- **EXPLOIT:** Currently low-impact because BooControl's only HTTP endpoint (`/api/health`) returns `application/json`. But the pattern is dangerous by default — if any future BooControl endpoint returns HTML or sets cookies, the proxy will forward them. A compromised BooControl instance could inject arbitrary response headers into the BooChat origin.
- **Severity:** Medium

**Fix sketch:**
```typescript
const HOP_BY_HOP = new Set(['transfer-encoding', 'connection', 'keep-alive', 'upgrade']);
const BLOCKED_HEADERS = new Set(['set-cookie', 'content-security-policy', 'x-frame-options']);
for (const [key, value] of res.headers) {
  if (HOP_BY_HOP.has(key) || BLOCKED_HEADERS.has(key)) continue;
  reply.header(key, value);
}
```

---

### OWASP A06 - Vulnerable and Outdated Components

> **A06 - Vulnerable and Outdated Components:** No proven vulnerability found. Checked: `apps/control/package.json` dependencies — fastify ^4.28.1, @fastify/websocket ^10.0.1, postgres ^3.4.4, ws ^8.18.0, zod ^3.23.8. All are current major versions with no known CVEs at these version ranges. The `postgres` library (porsager/postgres) at ^3.4.4 is actively maintained.

---

### OWASP A07 - Identification and Authentication Failures

> **A07 - Identification and Authentication Failures:** No proven vulnerability beyond what's covered in SEC-001 (WS origin) and SEC-002 (no auth middleware). The architecture delegates authentication to Authelia at the reverse proxy, which is a valid pattern for single-user deployments. No hardcoded credentials, no bypass mechanisms, no session fixation.

---

### OWASP A08 - Software and Data Integrity Failures

**SEC-006: Critical SSE Parsing Bug — Fleet Connector Never Processes Events**
- **OWASP:** A08 - Software and Data Integrity Failures (data flow integrity)
- **Location:** `apps/control/src/services/fleet-connector.ts:158`
- **Evidence:**
  ```typescript
  const trimmed = line.trim();
  if (!trimmed || trimmed.startsWith('data:')) continue;  // ← BUG: skips ALL data lines

  const dataMatch = trimmed.match(/^data:\s*(.+)$/);       // ← unreachable for data lines
  if (!dataMatch) continue;
  ```
  Line 158 skips every line starting with `data:` — which is *every SSE data line*. The regex on line 160 can therefore never match. The `handleLlamaSweepEvent` callback is never invoked. No model events, metrics, or log data are ever parsed or persisted.
- **EXPLOIT:** This is not an attacker-exploitable vulnerability, but it renders the entire fleet monitoring system non-functional. The control dashboard shows no hosts, no activity, no perf data. The retention job runs but operates on an empty table. The perf poller (line 164-192) does work independently (it uses HTTP, not SSE), so perf samples are ingested — but the primary event stream is dead.
- **Severity:** Critical (functional — the core feature is broken)

**Fix sketch:**
```typescript
// Line 158: remove the incorrect `startsWith('data:')` guard
// The intent was probably to skip lines that are JUST "data:" (SSE end-of-event marker)
// but the current code skips all data lines including "data: {...}"
- if (!trimmed || trimmed.startsWith('data:')) continue;
+ if (!trimmed) continue;
+ if (trimmed === 'data:') continue; // SSE end-of-event marker (empty data)

// Also fix line 167: event type extraction is wrong for SSE format.
// In SSE, event type comes from a preceding "event:" line, not from the data line.
// The current code extracts "data" as the event type for every line.
// This needs a proper SSE parser that tracks the current event type across lines.
```

---

### OWASP A09 - Security Logging and Monitoring Failures

> **A09 - Security Logging and Monitoring Failures:** No proven vulnerability found. Checked: The control service logs reconnect attempts, SSE parse errors, and upstream proxy errors at appropriate levels. No sensitive data (passwords, tokens, PII) appears in log output. The `onnotice: () => {}` in `db.ts:20` suppresses PostgreSQL notices (which can contain query text). However, note that the fleet connector's critical parsing bug (SEC-006) means *no* fleet events are logged, which is a monitoring blind spot.

---

### OWASP A10 - Server-Side Request Forgery (SSRF)

**SEC-007: SSRF via Unvalidated ssh_host in URL Construction**
- **OWASP:** A10 - Server-Side Request Forgery
- **Location:** `apps/control/src/index.ts:247-248` and `apps/control/src/index.ts:269-270`
- **Evidence:**
  ```typescript
  // Line 247-248 (fleet connector startup):
  const sshHost = host.ssh_host;
  if (!sshHost) continue;
  const baseUrl = `http://${sshHost}:8401`;

  // Line 269-270 (perf poller):
  const sshHost = host.ssh_host;
  if (!sshHost) continue;
  const baseUrl = `http://${sshHost}:8401`;
  ```
  The `ssh_host` value from the `control_hosts` database table is interpolated directly into a URL with no validation. This URL is passed to `fetch()` in `fleet-connector.ts:136` and `index.ts:176`. If an attacker can write to the `control_hosts` table (via SQL injection elsewhere, compromised DB credentials, or a future admin endpoint), they can set `ssh_host` to:
  - `169.254.169.254` → AWS/GCP metadata service (credential theft)
  - `localhost:5432` → PostgreSQL (connection probing)
  - `internal-service.corp` → internal network scanning
  - `evil.com` → data exfiltration via DNS
- **EXPLOIT:** Requires write access to `control_hosts`. Currently no exposed write path exists in the control service (no POST/PUT/PATCH endpoints). The seed data in `schema.sql:19-23` sets known hosts. But the table schema (`ssh_host TEXT`) accepts any string, and the code trusts it completely. This is a latent SSRF — safe today, dangerous the moment an admin API or migration writes to this table.
- **Severity:** High (latent — no current write path, but no validation either)

**Fix sketch:**
```typescript
// Validate ssh_host is a bare hostname or IP before constructing URLs:
function validateSshHost(host: string): string {
  // Allow hostnames and IPv4/IPv6, reject URLs and special addresses
  if (/^https?:\/\//.test(host)) throw new Error(`ssh_host must not contain protocol: ${host}`);
  if (/[@#?]/.test(host)) throw new Error(`ssh_host contains invalid characters: ${host}`);
  // Block link-local and metadata addresses
  if (/^(169\.254\.|0\.|127\.|::1|fe80:)/.test(host)) throw new Error(`ssh_host is a blocked address: ${host}`);
  return host;
}

const sshHost = validateSshHost(host.ssh_host);
const baseUrl = `http://${sshHost}:8401`;
```

---

### Attack-Angle Protocol Results

#### Protocol 1: Input-to-Sink Tracing

| Input Source | Sink | Path | Verdict |
|---|---|---|---|
| SSE event data (`fleet-connector.ts:166`) | SQL INSERT (`index.ts:66-70`) | `JSON.parse(dataStr)` → `onEvent` → `handleLlamaSweepEvent` → `sql\`INSERT...\`` | **Dead path** — line 158 skips all data lines (SEC-006) |
| Perf poller HTTP response (`index.ts:178`) | SQL INSERT (`index.ts:182-186`) | `res.json()` → `sample.gpu`/`sample.sys` → `JSON.stringify()` → `sql\`...\`` | **Safe** — JSON.stringify produces valid JSON, parameterized via tagged template |
| `ssh_host` from DB (`index.ts:247`) | `fetch()` (`fleet-connector.ts:136`) | SQL SELECT → string interpolation → URL construction → `fetch(url)` | **Vulnerable** — no validation (SEC-007) |
| Client WS message (`control-proxy.ts:46`) | Upstream WS (`control-proxy.ts:48`) | `clientSocket.on('message')` → `upstream.send(data)` | **Relay only** — no processing, no injection vector |
| HTTP request path (`control-proxy.ts:65`) | `fetch()` (`control-proxy.ts:72`) | `req.url.replace(...)` → string concat → `fetch(targetUrl)` | **Safe** — Fastify normalizes paths; `req.url` is the matched route |
| WS frame data from upstream (`ws.ts:31`) | Client socket (`ws.ts:32`) | `JSON.stringify(delta)` → `socket.send()` | **Safe** — server-generated JSON |

#### Protocol 2: Auth/Authz Decision Audit

| Decision Point | Location | Bypass? |
|---|---|---|
| BooChat → BooControl proxy | `control-proxy.ts:19-88` | No auth forwarded (no headers to forward). Authelia is the sole gate. |
| BooControl WS endpoint | `routes/ws.ts:19` | No auth check. Any TCP connection to port 9503 gets fleet state. |
| BooControl health endpoint | `index.ts:227` | No auth. Intentional (health checks). |
| BooChat WS proxy origin | `control-proxy.ts:20` | No origin validation. SEC-001. |
| HTTP proxy catch-all | `control-proxy.ts:64` | Forwards `authorization` header if present (line 69). No additional validation. |

#### Protocol 3: Secret and PII Pattern Search

| Pattern | Files Searched | Findings |
|---|---|---|
| `password`, `secret`, `api_key`, `token`, `credential` | All control source files | None found in code. `DATABASE_URL` in config includes password but is env-var sourced. |
| `ssn`, `credit_card`, `private_key` | All files | None found. |
| `BEGIN RSA`, `Bearer `, `Authorization:` | All files | None found in control service. `control-proxy.ts:69` forwards `authorization` header (expected proxy behavior). |
| PII in fleet state | `fleet-state.ts`, `ws.ts` | `providerId`, model names, GPU metrics — operational data, not PII. |

#### Protocol 4: Dependency Vulnerability Check

| Dependency | Version | Known CVEs |
|---|---|---|
| `fastify` | ^4.28.1 | None at this version |
| `@fastify/websocket` | ^10.0.1 | None at this version |
| `postgres` (porsager) | ^3.4.4 | None at this version |
| `ws` | ^8.18.0 | None at this version |
| `zod` | ^3.23.8 | None at this version |

No known-vulnerable dependency versions detected.

---

## Security Improvement Summary

### What Was Found

The BooControl P1 implementation contains one critical functional defect (SEC-006: the SSE parser skips all data lines, rendering fleet monitoring completely non-functional), two high-severity architectural issues (SEC-001: no WebSocket origin validation on the proxy; SEC-007: SSRF via unvalidated `ssh_host` database values flowing into `fetch()`), and three medium-severity findings (SEC-002: no application-layer auth; SEC-003: JSONB interpolation uses fragile `::jsonb` string patterns instead of `sql.json()`; SEC-004: unbounded in-memory Maps; SEC-005: unfiltered response header forwarding). The SQL injection concern about `::jsonb` patterns was investigated thoroughly and confirmed as a false positive — the `postgres` tagged-template library correctly parameterizes all four interpolation sites. However, the patterns are a maintenance hazard and violate the project's own documented convention.

### How to Improve

1. **Fix the SSE parser** (SEC-006): Remove the `trimmed.startsWith('data:')` guard on line 158 of `fleet-connector.ts`. Add a proper SSE parser that tracks `event:` type lines across the stream and pairs them with their `data:` payloads. The current code always extracts `"data"` as the event type.

2. **Add WebSocket origin validation** (SEC-001): Use `@fastify/websocket`'s `origin` option or check `req.headers.origin` in the WS handler. Apply to both `control-proxy.ts` and `coder-proxy.ts` (keep-in-sync).

3. **Validate `ssh_host` before URL construction** (SEC-007): Add a `validateSshHost()` function that rejects URLs, protocol prefixes, special characters, and link-local/metadata IP addresses. Call it before constructing `baseUrl` in both the fleet connector startup and perf poller.

4. **Replace `::jsonb` string patterns with `sql.json()`** (SEC-003): All four interpolation sites in `index.ts` should use the `sql.json()` helper documented in the project's own `CLAUDE.md`. This eliminates linter false positives and follows the established codebase convention.

5. **Cap in-memory fleet state** (SEC-004): Add LRU eviction to the `models` Map per host (e.g., 100 entries max). Consider periodic pruning of hosts that have been `down` for more than the retention window.

6. **Filter forwarded response headers** (SEC-005): In `control-proxy.ts`, skip hop-by-hop headers AND security-sensitive headers (`set-cookie`, `content-security-policy`) when forwarding upstream responses. Apply the same filter to `coder-proxy.ts`.

### How to Prevent This Going Forward

1. **Tagged-template lint rule**: Add an ESLint custom rule or pre-commit check that flags `${JSON.stringify(...)}::jsonb` patterns and suggests `sql.json()`. The project's `CLAUDE.md` already documents this convention — enforcement should be automated.

2. **SSE parser library**: Replace the hand-rolled SSE parser with a tested library (e.g., `eventsource-parser` or `@microsoft/fetch-event-source`). Hand-rolled SSE parsing has a long history of off-by-one and state-tracking bugs — the line 158 bug is a textbook example.

3. **Proxy header allowlist**: Establish a pattern where proxy routes maintain an explicit allowlist of response headers to forward, rather than forwarding everything except a blocklist. This is defense-in-depth against compromised upstream services.

4. **Network boundary documentation**: Document the trust boundary between BooChat (port 9500) and BooControl (port 9503). If the control service will ever be exposed beyond Tailscale, add bearer-token auth before that happens. The current "no auth, rely on network isolation" pattern is acceptable for single-user Tailscale but should be explicitly documented as a known limitation.

5. **Integration test for SSE parsing**: Add a test that feeds a realistic SSE stream (with `event:` and `data:` lines) to `runFleetConnector` and asserts that `onEvent` is called with the correct event types. This would have caught SEC-006 immediately.