boocode/openspec/changes/fleet-coordination-lease/proposal.md

# Fleet coordination lease — proposal

**Status:** OUTLINE (not yet ready to build). Spun out of BooControl P8 (see
`openspec/changes/boocontrol/`). This folder is the separate design pass the
BooControl program deferred; it is an outline, not an implementation plan ready
for `boo-implementing-changes`. Promote to READY only after the open questions
below are resolved.

## Why

Four independent processes dispatch inference to the same llama-swap hosts with
no coordination:

- **BooChat** (`apps/server`) — interactive chat turns.
- **BooCoder** (`apps/coder`) — agent dispatches (opencode / ACP / PTY / Claude-SDK).
- **Arena** (`apps/coder`) — head-to-head battles.
- **BooControl** (`apps/control`) — bench + eval runs.

Each host (`sam-desktop`, `embedding`) runs ONE model at a time on a single GPU;
llama-swap evicts the loaded model to serve a request for a different one. So an
unattended BooControl bench can evict a model mid-chat, and a chat can pollute a
bench mid-run. BooControl P3 made this safe-by-construction for *manual* runs
(human clicks "run", takeover confirmation, `concurrent_foreign_requests`
recorded), but the underlying `inflight == 0` check is a courtesy gate with a
TOCTOU race against the other three writers (design §8, risk table). That race
is the single blocker for **unattended bench scheduling and reproducible
concurrency sweeps** — the reason this batch exists.

The proper fix is a per-host advisory lease in the shared `boochat` DB that
BooControl's scheduler *requires* and the other three writers *honor*.

## What ships (scope)

1. **`control_host_leases` table** (owned by the BooControl schema, since it is
   the only *required* holder; the others are voluntary honorers): holder id,
   purpose, `expires_at`, heartbeat timestamp, keyed by `provider_id`.
2. **Lease lifecycle service** in `apps/control`: acquire (atomic, conditional
   insert/update), heartbeat (extend `expires_at`), release, and expiry sweep
   (a crashed holder's lease lapses without manual cleanup).
3. **The honor protocol in all four writers**: before dispatching to a host,
   each writer checks for an active *exclusive* lease held by someone else; if
   one is present, it queues behind the lease or fails fast with a clear "host
   leased for <purpose>" signal. A shared (non-exclusive) lease for ordinary
   interactive traffic is the default; bench/eval take an exclusive lease.
4. **BooControl consumes it through the existing seam.** P3 left
   `acquireHostAccess(providerId, purpose): Promise<HostGrant>` in
   `apps/control/src/services/host-access.ts` as a no-op returning `{ok: true}`.
   This batch swaps its body for a real lease acquire+heartbeat WITHOUT touching
   the bench engine (which already gates every run through the seam, design §8).
5. **Unattended bench scheduling + reproducible concurrency sweeps** unlock once
   the lease exists (the deferred half of BooControl P3).
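
The lifecycle in item 2 can be sketched as a small in-memory model. This is an
illustration only, not the proposed implementation: `LeaseStore`, its method
names, and the `Lease` field names are hypothetical, a `Map` stands in for the
`control_host_leases` table, and the clock is passed in explicitly so the
behavior is easy to follow. In the real service, `acquire` would presumably be
a single conditional insert/update so that the check and the write cannot
interleave with another writer.

```typescript
// Hypothetical in-memory model of the lease lifecycle. The real store would be
// the `control_host_leases` table; every name here is illustrative.
type Lease = {
  providerId: string;
  holderId: string;
  purpose: string;
  heartbeatAt: number; // epoch milliseconds
  expiresAt: number; // epoch milliseconds
};

class LeaseStore {
  private leases = new Map<string, Lease>();
  private ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Acquire succeeds unless a different holder has a lease that is still live.
  // A lapsed lease is simply overwritten, so a crashed holder needs no cleanup.
  acquire(providerId: string, holderId: string, purpose: string, now: number): boolean {
    const current = this.leases.get(providerId);
    if (current && current.expiresAt > now && current.holderId !== holderId) {
      return false;
    }
    this.leases.set(providerId, {
      providerId,
      holderId,
      purpose,
      heartbeatAt: now,
      expiresAt: now + this.ttlMs,
    });
    return true;
  }

  // Heartbeat extends `expiresAt`, but only for the current, still-live holder.
  heartbeat(providerId: string, holderId: string, now: number): boolean {
    const current = this.leases.get(providerId);
    if (!current || current.holderId !== holderId || current.expiresAt <= now) {
      return false;
    }
    current.heartbeatAt = now;
    current.expiresAt = now + this.ttlMs;
    return true;
  }

  // Release removes the lease only if the caller is the holder.
  release(providerId: string, holderId: string): boolean {
    const current = this.leases.get(providerId);
    if (!current || current.holderId !== holderId) {
      return false;
    }
    this.leases.delete(providerId);
    return true;
  }

  // Expiry sweep: drop every lapsed lease and report how many were removed.
  sweep(now: number): number {
    let removed = 0;
    this.leases.forEach((lease, id) => {
      if (lease.expiresAt <= now) {
        this.leases.delete(id);
        removed += 1;
      }
    });
    return removed;
  }
}
```

Under this model a second holder is refused while the first keeps
heartbeating, and succeeds on the first acquire attempt at or after
`expiresAt`, with or without a prior sweep.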

## Out of scope

- Cross-host scheduling / global GPU arbitration beyond per-host leases
  (YAGNI: reopen if per-host leases prove insufficient; see the Deferred
  section of the implementation plan).
- Frontier-provider coordination (no single-GPU contention there).
- Replacing llama-swap's own on-demand eviction; the lease coordinates *callers*,
  not the swap engine.

## Open questions (resolve before READY)

- **Exclusive vs shared semantics for interactive traffic.** Do BooChat/BooCoder
  take a shared lease per turn (heavyweight) or only *read* the exclusive-lease
  flag before dispatch (lightweight, racy on the boundary)? Leaning lightweight:
  interactive writers read-before-dispatch; only bench/eval take exclusive holds.
- **Honor enforcement granularity.** Per-request check vs per-session hold. A
  per-request check is cheap, but a long chat turn could still straddle a lease
  acquisition. Is that acceptable for v1?
- **Heartbeat interval + lease TTL.** Short TTL = fast crash recovery but more DB
  chatter; long TTL = a crashed bench blocks the host until expiry. Proposed:
  TTL 60s, heartbeat 20s.
- **Failure mode when the DB is unreachable.** Fail-open (dispatch anyway,
  current behavior) or fail-closed (refuse)? Fail-open preserves chat
  availability; document the residual race.
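
The TTL/heartbeat trade-off above comes down to simple arithmetic, sketched
here. `leaseTiming` and its field names are hypothetical, and the sketch
assumes a lapsed lease is reclaimable the moment `expires_at` passes (that is,
reclaimed during acquire, with no extra sweep delay).

```typescript
// Hypothetical helper: how long a crashed holder can block a host, and how
// many heartbeat writes one held lease costs, for a given TTL and interval.
type LeaseTiming = {
  minRecoverySec: number; // crash just before the next heartbeat was due
  maxRecoverySec: number; // crash immediately after a heartbeat
  heartbeatWritesPerMinute: number;
};

function leaseTiming(ttlSec: number, heartbeatSec: number): LeaseTiming {
  if (heartbeatSec <= 0 || ttlSec <= heartbeatSec) {
    throw new RangeError("TTL must exceed a positive heartbeat interval");
  }
  return {
    minRecoverySec: ttlSec - heartbeatSec,
    maxRecoverySec: ttlSec,
    heartbeatWritesPerMinute: 60 / heartbeatSec,
  };
}

// The proposed values: blocked for 40-60s after a crash, 3 writes per minute.
const proposed = leaseTiming(60, 20);
// A longer, quieter alternative: blocked for 240-300s, 1 write per minute.
const slower = leaseTiming(300, 60);
console.log(proposed, slower);
```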

## Risks

| Risk | Mitigation |
|---|---|
| A crashed exclusive holder blocks a host | TTL + heartbeat; expiry sweep reclaims lapsed leases |
| Honor protocol drift across four services | single shared lease-check helper, living in shared code alongside `@boocode/contracts` and consumed by all four; integration test per writer |
| DB unreachable mid-dispatch | documented fail-open default; lease is advisory, never a hard dependency for interactive chat |
| Lease check adds latency to every chat turn | lightweight read-before-dispatch (one indexed SELECT by `provider_id`); no per-turn write on the interactive path |

## References

- BooControl design `§8 Fleet coordination lease (P8 — cross-service)` and the
  P3 seam contract (`acquireHostAccess`).
- `apps/control/src/services/host-access.ts` — the seam to swap.
- `apps/control/src/schema.sql` — where `control_host_leases` lands.


## Recommended resolutions (draft)

These are draft recommendations for operator ratification before this change is
promoted to READY.

- **Exclusive vs shared semantics for interactive traffic:** Use exclusive
  leases only for bench/eval holders in v1; BooChat, BooCoder, and Arena should
  read-before-dispatch and avoid writing shared leases. Rationale: this keeps
  interactive latency and availability close to current behavior while still
  giving scheduled control work a clear isolation signal.
- **Honor enforcement granularity:** Use a per-request honor check in v1, not a
  per-session hold. Rationale: it is the smallest cross-service contract and
  keeps long-lived chats from pinning a host across unrelated turns; document
  the residual boundary race.
- **Heartbeat interval and lease TTL:** Use a 60s TTL with a 20s heartbeat, with
  expired rows reclaimed during acquire plus an opportunistic sweep. Rationale:
  this bounds crash recovery to about one minute while keeping write traffic low.
- **DB-unreachable failure mode:** Fail open for interactive honorers, but fail
  closed for BooControl work that requires acquiring an exclusive lease.
  Rationale: chat availability should not depend on the advisory lease table,
  while unattended bench/eval work should not claim reproducible isolation when
  the lease cannot be acquired.
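
As a rough illustration of how the per-request check and the split failure
mode could combine, the decision can be written as a pure function over the
outcome of the lease interaction. All type and function names below are
hypothetical. For interactive honorers the outcome stands for the
read-before-dispatch lookup; for BooControl it stands for the exclusive
acquire attempt.

```typescript
// Hypothetical decision table for the draft resolutions above.
type LeaseOutcome =
  | { kind: "clear" } // no live exclusive lease held by someone else
  | { kind: "held"; holderId: string; purpose: string }
  | { kind: "unreachable" }; // the lease table could not be queried

type CallerRole = "interactive" | "control";

type Decision =
  | { dispatch: true; degraded: boolean }
  | { dispatch: false; reason: string };

function decideDispatch(
  outcome: LeaseOutcome,
  role: CallerRole,
  callerId: string,
): Decision {
  switch (outcome.kind) {
    case "clear":
      return { dispatch: true, degraded: false };
    case "held":
      // A holder is never blocked by its own lease.
      if (outcome.holderId === callerId) {
        return { dispatch: true, degraded: false };
      }
      return { dispatch: false, reason: `host leased for ${outcome.purpose}` };
    case "unreachable":
      // Fail open for chat-style callers, fail closed for bench/eval work.
      return role === "interactive"
        ? { dispatch: true, degraded: true }
        : { dispatch: false, reason: "lease store unreachable" };
  }
}
```

The `degraded` flag is one possible way to record that a turn was dispatched
without a lease check, so the residual race stays visible in logs or run
metadata rather than being silently absorbed.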