boocode/openspec/changes/fleet-coordination-lease/proposal.md
indifferentketchup b18de2a331 chore: snapshot working tree - pty_exited notifications + in-flight inference WIP
2026-06-14 12:48:47 +00:00

Fleet coordination lease — proposal

Status: OUTLINE (not yet ready to build). Spun out of BooControl P8 (see openspec/changes/boocontrol/). This folder is the separate design pass the BooControl program deferred; it is an outline, not an implementation plan ready for boo-implementing-changes. Promote to READY only after the open questions below are resolved.

Why

Four independent processes dispatch inference to the same llama-swap hosts with no coordination:

  • BooChat (apps/server) — interactive chat turns.
  • BooCoder (apps/coder) — agent dispatches (opencode / ACP / PTY / Claude-SDK).
  • Arena (apps/coder) — head-to-head battles.
  • BooControl (apps/control) — bench + eval runs.

Each host (sam-desktop, embedding) runs ONE model at a time on a single GPU; llama-swap evicts the loaded model to serve a request for a different one. So an unattended BooControl bench can evict a model mid-chat, and a chat can pollute a bench mid-run. BooControl P3 made this safe-by-construction for manual runs (human clicks "run", takeover confirmation, concurrent_foreign_requests recorded), but the underlying inflight == 0 check is a courtesy gate with a TOCTOU race against the other three writers (design §8, risk table). That race is the single blocker for unattended bench scheduling and reproducible concurrency sweeps — the reason this batch exists.

The proper fix is a per-host advisory lease in the shared boochat DB that BooControl's scheduler requires and the other three writers honor.
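To make the TOCTOU window concrete, here is a minimal sketch of a check-then-dispatch gate. It is a single-process model, not BooControl's code: the `Host` shape, `courtesyDispatch`, and `demoRace` are hypothetical names, and the `await` stands in for any I/O between the check and the dispatch.

```typescript
// Hypothetical model of one host; `inflight` stands in for the counter the
// courtesy gate reads. None of these names come from the real codebase.
interface Host {
  inflight: number;
  dispatched: string[];
}

// Check-then-act: the check and the dispatch are separate steps, so another
// writer can pass the same check before either one records its dispatch.
async function courtesyDispatch(host: Host, writer: string): Promise<boolean> {
  if (host.inflight !== 0) return false; // time of check
  await Promise.resolve();               // any I/O between check and use yields here
  host.inflight += 1;                    // time of use
  host.dispatched.push(writer);
  return true;
}

// Two writers start at the same moment; both see inflight === 0.
async function demoRace(): Promise<string[]> {
  const host: Host = { inflight: 0, dispatched: [] };
  await Promise.all([
    courtesyDispatch(host, "bench"),
    courtesyDispatch(host, "chat"),
  ]);
  return host.dispatched;
}
```

Both calls pass the gate, so the bench and the chat turn land on the host together. A lease closes the window only if the check and the claim happen in one atomic step.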

What ships (scope)

  1. control_host_leases table (owned by the BooControl schema, since it is the only required holder; the others are voluntary honorers): holder id, purpose, expires_at, heartbeat timestamp, keyed by provider_id.
  2. Lease lifecycle service in apps/control: acquire (atomic, conditional insert/update), heartbeat (extend expires_at), release, and expiry sweep (a crashed holder's lease lapses without manual cleanup).
  3. The honor-protocol in all four writers: before dispatching to a host, check for an active exclusive lease held by someone else; if present, queue behind it or fail fast with a clear "host leased for " signal. A shared (non-exclusive) lease for ordinary interactive traffic is the default; bench/eval take an exclusive lease.
  4. BooControl consumes it through the existing seam. P3 left acquireHostAccess(providerId, purpose): Promise<HostGrant> in apps/control/src/services/host-access.ts as a no-op returning {ok: true}. This batch swaps its body for a real lease acquire+heartbeat WITHOUT touching the bench engine (which already gates every run through the seam, design §8).
  5. Unattended bench scheduling + reproducible concurrency sweeps unlock once the lease exists (the deferred half of BooControl P3).
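The lifecycle in item 2 can be sketched as an in-memory model. This is an illustration under stated assumptions, not the planned service: `HostLease`, `LeaseStore`, and the method signatures are hypothetical, the 60s TTL is the value proposed (not settled) in the open questions, and the clock is passed in so the behavior is deterministic. In the real design each method would have to be a single atomic statement against the shared DB; here atomicity comes for free because the code is synchronous.

```typescript
// Field names mirror the columns listed for control_host_leases; the
// camelCase shape itself is an assumption.
interface HostLease {
  providerId: string;
  holderId: string;
  purpose: string;
  expiresAt: number;   // epoch milliseconds
  heartbeatAt: number; // epoch milliseconds
}

const TTL_MS = 60 * 1000; // proposed TTL, still an open question

class LeaseStore {
  private leases = new Map<string, HostLease>();

  // Conditional acquire: fails only if someone else holds an unexpired lease.
  acquire(providerId: string, holderId: string, purpose: string, now: number): boolean {
    const current = this.leases.get(providerId);
    if (current && current.expiresAt > now && current.holderId !== holderId) {
      return false;
    }
    this.leases.set(providerId, {
      providerId, holderId, purpose, expiresAt: now + TTL_MS, heartbeatAt: now,
    });
    return true;
  }

  // Heartbeat: only the current holder of a still-live lease may extend it.
  heartbeat(providerId: string, holderId: string, now: number): boolean {
    const current = this.leases.get(providerId);
    if (!current || current.holderId !== holderId || current.expiresAt <= now) {
      return false;
    }
    current.heartbeatAt = now;
    current.expiresAt = now + TTL_MS;
    return true;
  }

  // Release: only the holder can drop its own lease.
  release(providerId: string, holderId: string): boolean {
    const current = this.leases.get(providerId);
    if (!current || current.holderId !== holderId) return false;
    this.leases.delete(providerId);
    return true;
  }

  // Expiry sweep: removes lapsed leases and returns how many were reclaimed.
  sweep(now: number): number {
    let removed = 0;
    this.leases.forEach((lease, id) => {
      if (lease.expiresAt <= now) {
        this.leases.delete(id);
        removed += 1;
      }
    });
    return removed;
  }
}
```

In this model a crashed holder stops heartbeating, so its lease lapses one TTL after the last successful heartbeat, and the next `acquire` or `sweep` reclaims the host without manual cleanup.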

Out of scope

  • Cross-host scheduling / global GPU arbitration beyond per-host leases (YAGNI: reopen if per-host leases prove insufficient — implementation-plan Deferred section).
  • Frontier-provider coordination (no single-GPU contention there).
  • Replacing llama-swap's own on-demand eviction; the lease coordinates callers, not the swap engine.

Open questions (resolve before READY)

  • Exclusive vs shared semantics for interactive traffic. Do BooChat/BooCoder take a shared lease per turn (heavyweight) or only read the exclusive-lease flag before dispatch (lightweight, racy on the boundary)? Leaning lightweight: interactive writers read-before-dispatch; only bench/eval take exclusive holds.
  • Honor enforcement granularity. Per-request check vs per-session hold. A per-request check is cheap but a long chat turn could still straddle a lease acquisition. Acceptable for v1?
  • Heartbeat interval + lease TTL. Short TTL = fast crash recovery but more DB chatter; long TTL = a crashed bench blocks the host until expiry. Proposed: TTL 60s, heartbeat 20s.
  • Failure mode when the DB is unreachable. Fail-open (dispatch anyway, current behavior) or fail-closed (refuse)? Fail-open preserves chat availability; document the residual race.
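If the lightweight and fail-open leanings above are adopted, the interactive path could look roughly like the following sketch. Everything here is hypothetical: `ActiveLease`, `DispatchDecision`, `checkBeforeDispatch`, and the injected `readLease` lookup are illustration names, and the real lookup would be an asynchronous DB read rather than a synchronous callback.

```typescript
interface ActiveLease {
  holderId: string;
  purpose: string;
  expiresAt: number; // epoch milliseconds
}

type DispatchDecision =
  | { allowed: true; degraded: boolean }
  | { allowed: false; holderId: string; purpose: string };

// Read-before-dispatch with a fail-open default. `readLease` is a hypothetical
// lookup that returns the exclusive lease for a host (or null) and throws when
// the DB is unreachable.
function checkBeforeDispatch(
  providerId: string,
  callerId: string,
  now: number,
  readLease: (providerId: string) => ActiveLease | null,
): DispatchDecision {
  try {
    const lease = readLease(providerId);
    // No lease, a lapsed lease, or our own lease: dispatch normally.
    if (lease === null || lease.expiresAt <= now || lease.holderId === callerId) {
      return { allowed: true, degraded: false };
    }
    // Someone else holds a live exclusive lease: the caller queues or fails fast.
    return { allowed: false, holderId: lease.holderId, purpose: lease.purpose };
  } catch (_err) {
    // Fail-open: the lease is advisory, so an unreachable DB must not block chat.
    // `degraded` lets the caller log that the residual race was accepted.
    return { allowed: true, degraded: true };
  }
}
```

The check performs no write, so it cannot stop a lease from being acquired a moment after the read; that boundary race is the one the first two open questions ask whether v1 can accept.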

Risks

| Risk | Mitigation |
| --- | --- |
| A crashed exclusive holder blocks a host | TTL + heartbeat; expiry sweep reclaims lapsed leases |
| Honor-protocol drift across four services | Single shared lease-check helper in @boocode/contracts-adjacent shared code, consumed by all four; integration test per writer |
| DB unreachable mid-dispatch | Documented fail-open default; lease is advisory, never a hard dependency for interactive chat |
| Lease check adds latency to every chat turn | Lightweight read-before-dispatch (one indexed SELECT by provider_id); no per-turn write on the interactive path |

References

  • BooControl design §8 Fleet coordination lease (P8 — cross-service) and the P3 seam contract (acquireHostAccess).
  • apps/control/src/services/host-access.ts — the seam to swap.
  • apps/control/src/schema.sql — where control_host_leases lands.