Fleet coordination lease — proposal

Status: OUTLINE (not yet ready to build). Spun out of BooControl P8 (see openspec/changes/boocontrol/). This folder is the separate design pass that the BooControl program deferred. It is not yet an implementation plan suitable for boo-implementing-changes; promote it to READY only after the open questions below are resolved.

Why

Four independent processes dispatch inference to the same llama-swap hosts with no coordination:

BooChat (apps/server) — interactive chat turns.
BooCoder (apps/coder) — agent dispatches (opencode / ACP / PTY / Claude-SDK).
Arena (apps/coder) — head-to-head battles.
BooControl (apps/control) — bench + eval runs.

Each host (sam-desktop, embedding) runs one model at a time on a single GPU; llama-swap evicts the loaded model to serve a request for a different one. An unattended BooControl bench can therefore evict a model mid-chat, and a chat can pollute a bench mid-run. BooControl P3 made this safe-by-construction for manual runs (a human clicks "run", confirms any takeover, and concurrent_foreign_requests is recorded), but the underlying inflight == 0 check is only a courtesy gate: it has a time-of-check-to-time-of-use (TOCTOU) race against the other three writers, any of which can dispatch between the check and the start of the run (design §8, risk table). That race is the single blocker for unattended bench scheduling and reproducible concurrency sweeps, which is why this batch exists.

The proper fix is a per-host advisory lease in the shared boochat DB that BooControl's scheduler requires and the other three writers honor.

What ships (scope)

control_host_leases table (owned by the BooControl schema, since it is the only required holder; the others are voluntary honorers): holder id, purpose, expires_at, heartbeat timestamp, keyed by provider_id.
Lease lifecycle service in apps/control: acquire (atomic, conditional insert/update), heartbeat (extend expires_at), release, and expiry sweep (a crashed holder's lease lapses without manual cleanup).
The honor-protocol in all four writers: before dispatching to a host, check for an active exclusive lease held by someone else; if present, queue behind it or fail fast with a clear "host leased for " signal. Bench/eval take an exclusive lease. Whether ordinary interactive traffic takes a shared (non-exclusive) lease by default or only reads the exclusive-lease flag before dispatch is still open (see the open questions below).
BooControl consumes it through the existing seam. P3 left acquireHostAccess(providerId, purpose): Promise<HostGrant> in apps/control/src/services/host-access.ts as a no-op returning {ok: true}. This batch swaps its body for a real lease acquire+heartbeat WITHOUT touching the bench engine (which already gates every run through the seam, design §8).
Unattended bench scheduling + reproducible concurrency sweeps unlock once the lease exists (the deferred half of BooControl P3).
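
The lifecycle described above (atomic acquire, heartbeat, release, expiry sweep) can be sketched as a small in-memory model. This is an illustration under stated assumptions, not the proposed implementation: the LeaseRow shape, the InMemoryLeaseStore class, and its method names are hypothetical, and only the field meanings follow the proposed control_host_leases columns. The real service would presumably get its atomicity from a conditional insert/update in the shared DB rather than from a single-threaded Map.

```typescript
// Hypothetical in-memory model of the per-host lease lifecycle (illustration only).
interface LeaseRow {
  providerId: string;
  holderId: string;
  purpose: string;
  expiresAt: number; // epoch milliseconds
  heartbeatAt: number; // epoch milliseconds
}

type AcquireResult =
  | { ok: true; lease: LeaseRow }
  | { ok: false; heldBy: string; purpose: string; expiresAt: number };

class InMemoryLeaseStore {
  private rows: Map<string, LeaseRow> = new Map();
  private ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Acquire succeeds when there is no row, the row has lapsed, or the caller already holds it.
  acquire(providerId: string, holderId: string, purpose: string, now: number): AcquireResult {
    const current = this.rows.get(providerId);
    if (current && current.expiresAt > now && current.holderId !== holderId) {
      return { ok: false, heldBy: current.holderId, purpose: current.purpose, expiresAt: current.expiresAt };
    }
    const lease: LeaseRow = { providerId, holderId, purpose, expiresAt: now + this.ttlMs, heartbeatAt: now };
    this.rows.set(providerId, lease);
    return { ok: true, lease: { ...lease } };
  }

  // Heartbeat extends expiresAt, but only for the current holder of an unexpired lease.
  heartbeat(providerId: string, holderId: string, now: number): boolean {
    const current = this.rows.get(providerId);
    if (!current || current.holderId !== holderId || current.expiresAt <= now) return false;
    current.heartbeatAt = now;
    current.expiresAt = now + this.ttlMs;
    return true;
  }

  // Release removes the row only when the caller is the holder.
  release(providerId: string, holderId: string): boolean {
    const current = this.rows.get(providerId);
    if (!current || current.holderId !== holderId) return false;
    return this.rows.delete(providerId);
  }

  // Expiry sweep: drop lapsed rows so a crashed holder needs no manual cleanup.
  sweep(now: number): number {
    let removed = 0;
    this.rows.forEach((row, id) => {
      if (row.expiresAt <= now) {
        this.rows.delete(id);
        removed += 1;
      }
    });
    return removed;
  }
}
```

In this model an expired row is also reclaimed inside acquire, so the sweep is an optimization rather than a correctness requirement.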

Out of scope

Cross-host scheduling / global GPU arbitration beyond per-host leases (YAGNI: reopen if per-host leases prove insufficient — implementation-plan Deferred section).
Frontier-provider coordination (no single-GPU contention there).
Replacing llama-swap's own on-demand eviction; the lease coordinates callers, not the swap engine.

Open questions (resolve before READY)

Exclusive vs shared semantics for interactive traffic. Do BooChat/BooCoder take a shared lease per turn (heavyweight) or only read the exclusive-lease flag before dispatch (lightweight, racy on the boundary)? Leaning lightweight: interactive writers read-before-dispatch; only bench/eval take exclusive holds.
Honor enforcement granularity. Per-request check vs per-session hold. A per-request check is cheap but a long chat turn could still straddle a lease acquisition. Acceptable for v1?
Heartbeat interval + lease TTL. Short TTL = fast crash recovery but more DB chatter; long TTL = a crashed bench blocks the host until expiry. Proposed: TTL 60s, heartbeat 20s.
Failure mode when the DB is unreachable. Fail-open (dispatch anyway, current behavior) or fail-closed (refuse)? Fail-open preserves chat availability; document the residual race.
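
The TTL/heartbeat trade-off above can be made concrete with a little arithmetic. The helper below is a hypothetical sketch (leaseTiming and its field names are not part of the design); it assumes heartbeats fire at a fixed interval after the last successful renewal and that a lease lapses exactly TTL after that renewal.

```typescript
// Hypothetical helper: derives the consequences of a (TTL, heartbeat interval) choice.
interface LeaseTiming {
  // Heartbeat attempts that land strictly before the lease lapses after a successful renewal.
  attemptsBeforeExpiry: number;
  // Longest time a crashed holder can keep the host leased (crash right after a renewal).
  worstCaseBlockMs: number;
  // Heartbeat writes per minute for each held lease.
  writesPerMinute: number;
}

function leaseTiming(ttlMs: number, heartbeatMs: number): LeaseTiming {
  if (heartbeatMs <= 0 || ttlMs <= heartbeatMs) {
    throw new Error("TTL must be longer than a positive heartbeat interval");
  }
  return {
    attemptsBeforeExpiry: Math.ceil(ttlMs / heartbeatMs) - 1,
    worstCaseBlockMs: ttlMs,
    writesPerMinute: 60000 / heartbeatMs,
  };
}

// The proposed values: TTL 60s, heartbeat 20s.
const proposed = leaseTiming(60000, 20000);
```

With the proposed values, two heartbeat attempts (at 20s and 40s) land before the lease lapses, so one can be lost without dropping the lease; a crashed bench blocks the host for at most 60s, and each held lease costs three writes per minute.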

Risks

Risk	Mitigation
A crashed exclusive holder blocks a host	TTL + heartbeat; expiry sweep reclaims lapsed leases
Honor-protocol drift across four services	single shared lease-check helper in `@boocode/contracts`-adjacent shared code, consumed by all four; integration test per writer
DB unreachable mid-dispatch	documented fail-open default; lease is advisory, never a hard dependency for interactive chat
Lease check adds latency to every chat turn	lightweight read-before-dispatch (one indexed SELECT by `provider_id`); no per-turn write on the interactive path
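
The single shared lease-check helper named in the table could look roughly like the following. This is a sketch only: honorLease, ActiveLease, and HonorDecision are hypothetical names, and the row is assumed to have already been fetched by the one indexed SELECT on provider_id, so the decision itself is a pure function that every writer can share and test.

```typescript
// Hypothetical shared honor check (names are illustrative, not from the design).
interface ActiveLease {
  holderId: string;
  purpose: string;
  expiresAt: number; // epoch milliseconds
}

type HonorDecision =
  | { dispatch: true }
  | { dispatch: false; reason: string; retryAfterMs: number };

// Decide whether a writer may dispatch, given the lease row read for the target host (or null).
function honorLease(lease: ActiveLease | null, callerId: string, now: number): HonorDecision {
  // No row, a lapsed row, or the caller's own lease: nothing to honor.
  if (lease === null || lease.expiresAt <= now || lease.holderId === callerId) {
    return { dispatch: true };
  }
  // A live lease held by someone else: the caller can queue or fail fast using this information.
  return {
    dispatch: false,
    reason: `host held by ${lease.holderId} (${lease.purpose})`,
    retryAfterMs: lease.expiresAt - now,
  };
}
```

Keeping the decision pure means the per-writer integration tests only need to cover the lookup and the reaction to the decision, not the comparison logic itself.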

References

BooControl design §8 Fleet coordination lease (P8 — cross-service) and the P3 seam contract (acquireHostAccess).
apps/control/src/services/host-access.ts — the seam to swap.
apps/control/src/schema.sql — where control_host_leases lands.

Recommended resolutions (draft)

These are draft recommendations for operator ratification before this change is promoted to READY.

Exclusive vs shared semantics for interactive traffic: Use exclusive leases only for bench/eval holders in v1; BooChat, BooCoder, and Arena should read-before-dispatch and avoid writing shared leases. Rationale: this keeps interactive latency and availability close to current behavior while still giving scheduled control work a clear isolation signal.
Honor enforcement granularity: Use a per-request honor check in v1, not a per-session hold. Rationale: it is the smallest cross-service contract and keeps long-lived chats from pinning a host across unrelated turns; document the residual boundary race.
Heartbeat interval and lease TTL: Use a 60s TTL with a 20s heartbeat, with expired rows reclaimed during acquire plus an opportunistic sweep. Rationale: this bounds crash recovery to about one minute while keeping write traffic low.
DB-unreachable failure mode: Fail open for interactive honorers, but fail closed for BooControl work that requires acquiring an exclusive lease. Rationale: chat availability should not depend on the advisory lease table, while unattended bench/eval work should not claim reproducible isolation when the lease cannot be acquired.
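
The split failure mode recommended above (fail open for interactive honorers, fail closed for BooControl) can be expressed as one small policy function. This is a hedged sketch: WriterRole, LookupOutcome, DispatchDecision, and applyFailurePolicy are hypothetical names, and the lease lookup or acquire attempt is assumed to have been reduced to an outcome value before the policy is applied.

```typescript
// Hypothetical failure-mode policy (illustrative names only).
type WriterRole = "interactive" | "control";

type LookupOutcome =
  | { kind: "ok"; hostFree: boolean } // lease table was reachable
  | { kind: "db_unreachable" };       // lease table could not be read or written

interface DispatchDecision {
  proceed: boolean;
  // True when the caller dispatched without knowing the lease state (the documented residual race).
  residualRace: boolean;
}

function applyFailurePolicy(role: WriterRole, outcome: LookupOutcome): DispatchDecision {
  if (outcome.kind === "ok") {
    return { proceed: outcome.hostFree, residualRace: false };
  }
  if (role === "interactive") {
    // Fail open: chat availability must not depend on the advisory lease table.
    return { proceed: true, residualRace: true };
  }
  // Fail closed: control work must not claim isolation it could not acquire.
  return { proceed: false, residualRace: false };
}
```

Surfacing residualRace in the decision gives interactive writers a place to log or count the degraded dispatches, which is one way to document the race rather than hide it.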
