boocode/openspec/changes/fleet-coordination-lease/proposal.md

# Fleet coordination lease — proposal
**Status:** OUTLINE (not yet ready to build). Spun out of BooControl P8 (see
`openspec/changes/boocontrol/`). This folder is the separate design pass the
BooControl program deferred; it is an outline, not an implementation plan ready
for `boo-implementing-changes`. Promote to READY only after the open questions
below are resolved.
## Why
Four independent processes dispatch inference to the same llama-swap hosts with
no coordination:
- **BooChat** (`apps/server`): interactive chat turns.
- **BooCoder** (`apps/coder`): agent dispatches (opencode / ACP / PTY / Claude-SDK).
- **Arena** (`apps/coder`): head-to-head battles.
- **BooControl** (`apps/control`): bench + eval runs.

Each host (`sam-desktop`, `embedding`) runs ONE model at a time on a single GPU;
llama-swap evicts the loaded model to serve a request for a different one. So an
unattended BooControl bench can evict a model mid-chat, and a chat can pollute a
bench mid-run. BooControl P3 made this safe-by-construction for *manual* runs
(human clicks "run", takeover confirmation, `concurrent_foreign_requests`
recorded), but the underlying `inflight == 0` check is a courtesy gate with a
TOCTOU race against the other three writers (design §8, risk table). That race
is the single blocker for **unattended bench scheduling and reproducible
concurrency sweeps**, which is the reason this batch exists.

The proper fix is a per-host advisory lease in the shared `boochat` DB that
BooControl's scheduler *requires* and the other three writers *honor*.
## What ships (scope)
1. **`control_host_leases` table** (owned by the BooControl schema, since it is
the only *required* holder; the others are voluntary honorers): holder id,
purpose, `expires_at`, heartbeat timestamp, keyed by `provider_id`.
2. **Lease lifecycle service** in `apps/control`: acquire (atomic, conditional
insert/update), heartbeat (extend `expires_at`), release, and expiry sweep
(a crashed holder's lease lapses without manual cleanup).
3. **The honor-protocol in all four writers**: before dispatching to a host,
check for an active *exclusive* lease held by someone else; if present, queue
behind it or fail fast with a clear `host leased for <purpose>` signal. A
shared (non-exclusive) lease for ordinary interactive traffic is the default;
bench/eval take an exclusive lease.
4. **BooControl consumes it through the existing seam.** P3 left
`acquireHostAccess(providerId, purpose): Promise<HostGrant>` in
`apps/control/src/services/host-access.ts` as a no-op returning `{ok: true}`.
This batch swaps its body for a real lease acquire+heartbeat WITHOUT touching
the bench engine (which already gates every run through the seam, design §8).
5. **Unattended bench scheduling + reproducible concurrency sweeps** unlock once
the lease exists (the deferred half of BooControl P3).
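
The lifecycle in items 1 and 2 can be sketched in memory as follows. This is a rough illustration, not the planned implementation: `LeaseStore`, its method names, and the `HostLease` field names are hypothetical, since the proposal fixes only the column list. In the real service the check-and-set inside `acquire` would presumably be one conditional insert/update against `control_host_leases`, so that it stays atomic across processes.

```typescript
// In-memory stand-in for the proposed `control_host_leases` table.
// All names here are illustrative assumptions, not the shipped schema.
type LeasePurpose = "bench" | "eval";

interface HostLease {
  providerId: string; // key: at most one lease row per host
  holderId: string;
  purpose: LeasePurpose;
  expiresAt: number; // epoch milliseconds
  heartbeatAt: number; // epoch milliseconds
}

class LeaseStore {
  private rows = new Map<string, HostLease>();
  private ttlMs: number;

  constructor(ttlMs: number = 60 * 1000) {
    this.ttlMs = ttlMs;
  }

  // Succeeds unless an unexpired lease is held by a different holder.
  // A lapsed row is overwritten, so expiry is also reclaimed on acquire.
  acquire(providerId: string, holderId: string, purpose: LeasePurpose, now: number): boolean {
    const current = this.rows.get(providerId);
    if (current && current.expiresAt > now && current.holderId !== holderId) {
      return false;
    }
    this.rows.set(providerId, {
      providerId,
      holderId,
      purpose,
      expiresAt: now + this.ttlMs,
      heartbeatAt: now,
    });
    return true;
  }

  // Extends `expiresAt`; fails if the caller no longer holds a live lease.
  heartbeat(providerId: string, holderId: string, now: number): boolean {
    const current = this.rows.get(providerId);
    if (!current || current.holderId !== holderId || current.expiresAt <= now) {
      return false;
    }
    current.expiresAt = now + this.ttlMs;
    current.heartbeatAt = now;
    return true;
  }

  // Only the current holder may release.
  release(providerId: string, holderId: string): boolean {
    const current = this.rows.get(providerId);
    if (!current || current.holderId !== holderId) {
      return false;
    }
    this.rows.delete(providerId);
    return true;
  }

  // Removes lapsed rows and returns how many were reclaimed.
  sweep(now: number): number {
    let removed = 0;
    this.rows.forEach((lease, providerId) => {
      if (lease.expiresAt <= now) {
        this.rows.delete(providerId);
        removed += 1;
      }
    });
    return removed;
  }
}
```

With a 60s TTL, a holder that acquires at `t = 0` and heartbeats at `t = 20s` keeps the host until `t = 80s`. A second holder is refused until then and succeeds afterwards without any manual cleanup.
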
## Out of scope
- Cross-host scheduling / global GPU arbitration beyond per-host leases
(YAGNI; reopen if per-host leases prove insufficient, per the Deferred section
of the implementation plan).
- Frontier-provider coordination (no single-GPU contention there).
- Replacing llama-swap's own on-demand eviction; the lease coordinates *callers*,
not the swap engine.
## Open questions (resolve before READY)
- **Exclusive vs shared semantics for interactive traffic.** Do BooChat/BooCoder
take a shared lease per turn (heavyweight) or only *read* the exclusive-lease
flag before dispatch (lightweight, racy on the boundary)? Leaning lightweight:
interactive writers read-before-dispatch; only bench/eval take exclusive holds.
- **Honor enforcement granularity.** Per-request check vs per-session hold. A
per-request check is cheap but a long chat turn could still straddle a lease
acquisition. Acceptable for v1?
- **Heartbeat interval + lease TTL.** A short TTL gives fast crash recovery but
needs a proportionally short heartbeat interval, hence more DB chatter; a long
TTL lets a crashed bench block the host until expiry. Proposed: TTL 60s,
heartbeat 20s.
- **Failure mode when the DB is unreachable.** Fail-open (dispatch anyway,
current behavior) or fail-closed (refuse)? Fail-open preserves chat
availability; document the residual race.
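
To make the TTL/heartbeat trade-off concrete, the arithmetic for the proposed values can be worked through in a few lines. The helper `leaseTiming` and its field names are hypothetical. It assumes the holder heartbeats on schedule until it crashes and that a lapsed lease is reclaimed as soon as it expires; any sweep delay would add to these bounds.

```typescript
interface LeaseTiming {
  worstCaseBlockSec: number; // holder crashes immediately after a heartbeat
  bestCaseBlockSec: number; // holder crashes just before its next heartbeat
  heartbeatWritesPerHour: number; // DB writes per held lease
}

// Hypothetical helper: derives crash-recovery bounds and write rate
// from a TTL and heartbeat interval, both in seconds.
function leaseTiming(ttlSec: number, heartbeatSec: number): LeaseTiming {
  if (heartbeatSec <= 0 || ttlSec <= heartbeatSec) {
    throw new Error("TTL must be longer than a positive heartbeat interval");
  }
  return {
    worstCaseBlockSec: ttlSec,
    bestCaseBlockSec: ttlSec - heartbeatSec,
    heartbeatWritesPerHour: 3600 / heartbeatSec,
  };
}

const proposed = leaseTiming(60, 20);
console.log(proposed);
```

For TTL 60s and heartbeat 20s this gives a host blocked for between 40s and 60s after a crash, at a cost of 180 heartbeat writes per hour for each held lease.
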
## Risks
| Risk | Mitigation |
|---|---|
| A crashed exclusive holder blocks a host | TTL + heartbeat; expiry sweep reclaims lapsed leases |
| Honor-protocol drift across four services | a single lease-check helper in shared code adjacent to `@boocode/contracts`, consumed by all four writers; one integration test per writer |
| DB unreachable mid-dispatch | documented fail-open default; lease is advisory, never a hard dependency for interactive chat |
| Lease check adds latency to every chat turn | lightweight read-before-dispatch (one indexed SELECT by `provider_id`); no per-turn write on the interactive path |
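
The shared lease-check helper named in the mitigations might look roughly like the sketch below. `honorLease`, `ActiveLease`, and `HonorDecision` are hypothetical names, and the `readLease` callback stands in for the single indexed SELECT by `provider_id`. It is synchronous here only to keep the example short; a real read would be asynchronous. A thrown error models an unreachable DB and triggers the fail-open default.

```typescript
// Hypothetical shapes; the proposal does not define these types.
interface ActiveLease {
  holderId: string;
  purpose: string;
  expiresAt: number; // epoch milliseconds
}

type HonorDecision =
  | { dispatch: true; degraded: boolean } // degraded: lease state was unknown
  | { dispatch: false; reason: string };

// Read-before-dispatch honor check for interactive writers.
// It never writes, so the interactive path adds one read and no write.
function honorLease(
  providerId: string,
  callerId: string,
  readLease: (providerId: string) => ActiveLease | undefined,
  now: number,
): HonorDecision {
  let lease: ActiveLease | undefined;
  try {
    lease = readLease(providerId);
  } catch {
    // Fail open: the lease is advisory, so an unreachable DB must not block chat.
    return { dispatch: true, degraded: true };
  }
  const heldByOther =
    lease !== undefined && lease.expiresAt > now && lease.holderId !== callerId;
  if (lease !== undefined && heldByOther) {
    return { dispatch: false, reason: `host leased for ${lease.purpose}` };
  }
  return { dispatch: true, degraded: false };
}
```

Returning a `degraded` flag rather than silently dispatching lets each writer log or record the residual race when the lease state could not be read.
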
## References
- BooControl design `§8 Fleet coordination lease (P8 — cross-service)` and the
P3 seam contract (`acquireHostAccess`).
- `apps/control/src/services/host-access.ts` — the seam to swap.
- `apps/control/src/schema.sql` — where `control_host_leases` lands.
## Recommended resolutions (draft)
These are draft recommendations for operator ratification before this change is
promoted to READY.
- **Exclusive vs shared semantics for interactive traffic:** Use exclusive
leases only for bench/eval holders in v1; BooChat, BooCoder, and Arena should
read-before-dispatch and avoid writing shared leases. Rationale: this keeps
interactive latency and availability close to current behavior while still
giving scheduled control work a clear isolation signal.
- **Honor enforcement granularity:** Use a per-request honor check in v1, not a
per-session hold. Rationale: it is the smallest cross-service contract and
keeps long-lived chats from pinning a host across unrelated turns; document
the residual boundary race.
- **Heartbeat interval and lease TTL:** Use a 60s TTL with a 20s heartbeat, with
expired rows reclaimed during acquire plus an opportunistic sweep. Rationale:
this bounds crash recovery to about one minute while keeping write traffic low.
- **DB-unreachable failure mode:** Fail open for interactive honorers, but fail
closed for BooControl work that requires acquiring an exclusive lease.
Rationale: chat availability should not depend on the advisory lease table,
while unattended bench/eval work should not claim reproducible isolation when
the lease cannot be acquired.
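
The fail-closed half of the last recommendation could sit behind the existing seam roughly as sketched here. Only the `acquireHostAccess(providerId, purpose): Promise<HostGrant>` signature and the `{ok: true}` value come from the proposal. The `{ok: false, reason}` variant, the `LeaseBackend` interface, and the `makeAcquireHostAccess` factory are assumptions made for illustration, and heartbeating after a successful acquire is omitted.

```typescript
// `{ ok: true }` matches the current no-op; the failure variant is assumed.
type HostGrant = { ok: true } | { ok: false; reason: string };

// Hypothetical backend: resolves true if the exclusive lease was taken,
// false if another holder has it, and rejects if the DB is unreachable.
interface LeaseBackend {
  tryAcquire(providerId: string, holderId: string, purpose: string): Promise<boolean>;
}

// Builds a seam implementation that fails closed: BooControl work is refused
// whenever exclusive isolation cannot be confirmed.
function makeAcquireHostAccess(backend: LeaseBackend, holderId: string) {
  return async function acquireHostAccess(
    providerId: string,
    purpose: string,
  ): Promise<HostGrant> {
    try {
      const acquired = await backend.tryAcquire(providerId, holderId, purpose);
      if (acquired) {
        return { ok: true };
      }
      return { ok: false, reason: `host ${providerId} is leased by another holder` };
    } catch {
      return { ok: false, reason: "lease store unreachable; refusing to run without a lease" };
    }
  };
}
```

Because the bench engine already gates every run through the seam, a refusal here would stop an unattended run before it dispatches, while interactive honorers keep their separate fail-open path.
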