chore: snapshot working tree - pty_exited notifications + in-flight inference WIP

feat(booterm): structured pty_exited WS notifications. Plan-validated, impl-validated, code-reviewed green (contracts build clean, contracts test 29/29, booterm + web typecheck clean). wip: in-progress inference/provider refactor (agents.ts, provider.ts, new llama-providers.ts, removed llama-args-validator), plus arena, dispatcher, compaction, schema changes. openspec: pty-exit-notifications complete; x-agent-flags planned (not yet implemented).
2026-06-14 12:48:47 +00:00
parent 0ed506f1da
commit b18de2a331
204 changed files with 25344 additions and 867 deletions
--- a/openspec/changes/fleet-coordination-lease/proposal.md
+++ b/openspec/changes/fleet-coordination-lease/proposal.md
@@ -0,0 +1,92 @@
+# Fleet coordination lease — proposal
+
+**Status:** OUTLINE (not yet ready to build). Spun out of BooControl P8 (see
+`openspec/changes/boocontrol/`). This folder is the separate design pass that the
+BooControl program deferred; it is not yet an implementation plan that
+`boo-implementing-changes` can pick up. Promote to READY only after the open
+questions below are resolved.
+
+## Why
+
+Four independent processes dispatch inference to the same llama-swap hosts with
+no coordination:
+
+- **BooChat** (`apps/server`) — interactive chat turns.
+- **BooCoder** (`apps/coder`) — agent dispatches (opencode / ACP / PTY / Claude-SDK).
+- **Arena** (`apps/coder`) — head-to-head battles.
+- **BooControl** (`apps/control`) — bench + eval runs.
+
+Each host (`sam-desktop`, `embedding`) runs ONE model at a time on a single GPU;
+llama-swap evicts the loaded model to serve a request for a different one. So an
+unattended BooControl bench can evict a model mid-chat, and a chat can pollute a
+bench mid-run. BooControl P3 made this safe by construction for *manual* runs
+(a human clicks "run", confirms the takeover, and `concurrent_foreign_requests`
+is recorded), but the underlying `inflight == 0` check is only a courtesy gate:
+nothing stops another writer from dispatching after the check passes, a TOCTOU
+race against the other three writers (design §8, risk table). That race is the
+single blocker for **unattended bench scheduling and reproducible concurrency
+sweeps**, and removing it is the reason this batch exists.
+
+The proper fix is a per-host advisory lease in the shared `boochat` DB that
+BooControl's scheduler *requires* and the other three writers *honor*.
+
+## What ships (scope)
+
+1. **`control_host_leases` table**, keyed by `provider_id`, with holder id,
+   purpose, `expires_at`, and heartbeat timestamp. It lives in the BooControl
+   schema because BooControl is the only *required* holder; the other three
+   writers only honor leases.
+2. **Lease lifecycle service** in `apps/control`: acquire (atomic, conditional
+   insert/update), heartbeat (extend `expires_at`), release, and expiry sweep
+   (a crashed holder's lease lapses without manual cleanup).
+3. **The honor protocol in all four writers**: before dispatching to a host,
+   check for an active *exclusive* lease held by someone else; if one exists,
+   queue behind it or fail fast with a clear "host leased for <purpose>" signal.
+   Bench/eval take an exclusive lease; ordinary interactive traffic defaults to
+   a shared (non-exclusive) lease, pending the first open question below.
+4. **BooControl consumes it through the existing seam.** P3 left
+   `acquireHostAccess(providerId, purpose): Promise<HostGrant>` in
+   `apps/control/src/services/host-access.ts` as a no-op returning `{ok: true}`.
+   This batch swaps its body for a real lease acquire+heartbeat WITHOUT touching
+   the bench engine (which already gates every run through the seam, design §8).
+5. **Unattended bench scheduling + reproducible concurrency sweeps** unlock once
+   the lease exists (the deferred half of BooControl P3).
+
+## Out of scope
+
+- Cross-host scheduling / global GPU arbitration beyond per-host leases
+  (YAGNI: reopen if per-host leases prove insufficient; see the Deferred
+  section of the implementation plan).
+- Frontier-provider coordination (no single-GPU contention there).
+- Replacing llama-swap's own on-demand eviction; the lease coordinates *callers*,
+  not the swap engine.
+
+## Open questions (resolve before READY)
+
+- **Exclusive vs shared semantics for interactive traffic.** Do BooChat/BooCoder
+  take a shared lease per turn (heavyweight: a write on every turn) or only
+  *read* for an active exclusive lease before dispatch (lightweight, but racy at
+  the boundary: a lease acquired just after the read is missed)? Leaning
+  lightweight: interactive writers read before dispatch; only bench/eval take
+  exclusive holds.
+- **Honor enforcement granularity.** Per-request check vs per-session hold. A
+  per-request check is cheap, but a long chat turn could still straddle a lease
+  acquisition. Acceptable for v1?
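+- **Heartbeat interval + lease TTL.** A short TTL gives fast crash recovery but
+  more DB chatter; a long TTL lets a crashed bench block the host until expiry.
+  Proposed: TTL 60s, heartbeat 20s, so each TTL window spans three heartbeat
+  intervals and the lease survives one missed heartbeat.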
+- **Failure mode when the DB is unreachable.** Fail-open (dispatch anyway,
+  current behavior) or fail-closed (refuse)? Fail-open preserves chat
+  availability; document the residual race.
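+
+The lightweight read-before-dispatch option, combined with a fail-open default,
+could look like the hypothetical helper below. `honorCheck`, `ActiveLease`, and
+`HonorDecision` are illustrative names, not existing code; `lookup` stands in
+for the single indexed SELECT by `provider_id`, and a real implementation would
+be async.
+
+```typescript
+// Hypothetical read-before-dispatch honor check for interactive writers.
+// Assumptions: the "lightweight" option above, a fail-open default, and a
+// synchronous lookup (a real DB query would be async).
+
+interface ActiveLease {
+  holderId: string;
+  purpose: string; // e.g. "bench" or "eval"
+  exclusive: boolean;
+  expiresAt: number; // ms since epoch
+}
+
+type HonorDecision =
+  | { dispatch: true; failOpen: boolean }
+  | { dispatch: false; reason: string };
+
+// `lookup` returns null when no lease row exists for the host and throws
+// when the DB is unreachable.
+function honorCheck(
+  lookup: (providerId: string) => ActiveLease | null,
+  providerId: string,
+  callerId: string,
+  now: number,
+): HonorDecision {
+  let lease: ActiveLease | null;
+  try {
+    lease = lookup(providerId);
+  } catch {
+    // Fail-open: keep interactive traffic available and accept the residual race.
+    return { dispatch: true, failOpen: true };
+  }
+  // Only a live exclusive lease held by someone else blocks dispatch;
+  // shared leases, lapsed leases, and the caller's own lease do not.
+  if (lease !== null && lease.exclusive && lease.expiresAt > now && lease.holderId !== callerId) {
+    return { dispatch: false, reason: `host leased for ${lease.purpose}` };
+  }
+  return { dispatch: true, failOpen: false };
+}
+```
+
+Because the read and the dispatch are separate steps, this helper narrows the
+race rather than closing it: a lease acquired between `lookup` and the dispatch
+is still missed, which is the boundary race the first open question weighs.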
+
+## Risks
+
+| Risk | Mitigation |
+|---|---|
+| A crashed exclusive holder blocks a host | TTL + heartbeat; expiry sweep reclaims lapsed leases |
+| Honor-protocol drift across four services | a single lease-check helper, in shared code adjacent to `@boocode/contracts`, consumed by all four; one integration test per writer |
+| DB unreachable mid-dispatch | documented fail-open default; lease is advisory, never a hard dependency for interactive chat |
+| Lease check adds latency to every chat turn | lightweight read-before-dispatch (one indexed SELECT by `provider_id`); no per-turn write on the interactive path |
+
+## References
+
+- BooControl design `§8 Fleet coordination lease (P8 — cross-service)` and the
+  P3 seam contract (`acquireHostAccess`).
+- `apps/control/src/services/host-access.ts` — the seam to swap.
+- `apps/control/src/schema.sql` — where `control_host_leases` lands.