indifferentketchup b18de2a331 chore: snapshot working tree - pty_exited notifications + in-flight inference WIP

feat(booterm): structured pty_exited WS notifications. Plan-validated, impl-validated, code-reviewed green (contracts build clean, contracts test 29/29, booterm + web typecheck clean).

wip: in-progress inference/provider refactor (agents.ts, provider.ts, new llama-providers.ts, removed llama-args-validator), plus arena, dispatcher, compaction, schema changes.

openspec: pty-exit-notifications complete; x-agent-flags planned (not yet implemented).

2026-06-14 12:48:47 +00:00


Fleet coordination lease — proposal

Status: OUTLINE (not yet ready to build). Spun out of BooControl P8 (see openspec/changes/boocontrol/). This folder is the separate design pass the BooControl program deferred; it is an outline, not an implementation plan ready for boo-implementing-changes. Promote to READY only after the open questions below are resolved.

Why

Four independent processes dispatch inference to the same llama-swap hosts with no coordination:

BooChat (apps/server) — interactive chat turns.
BooCoder (apps/coder) — agent dispatches (opencode / ACP / PTY / Claude-SDK).
Arena (apps/coder) — head-to-head battles.
BooControl (apps/control) — bench + eval runs.

Each host (sam-desktop, embedding) runs ONE model at a time on a single GPU; llama-swap evicts the loaded model to serve a request for a different one. So an unattended BooControl bench can evict a model mid-chat, and a chat can pollute a bench mid-run. BooControl P3 made this safe-by-construction for manual runs (human clicks "run", takeover confirmation, concurrent_foreign_requests recorded), but the underlying inflight == 0 check is a courtesy gate with a TOCTOU race against the other three writers (design §8, risk table). That race is the single blocker for unattended bench scheduling and reproducible concurrency sweeps — the reason this batch exists.
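The race can be replayed deterministically. The sketch below is illustrative only: `HostState`, `looksIdle`, `dispatch`, and `replayRace` are hypothetical names, and the model strings are placeholders. It assumes nothing about how the real gate is coded beyond the `inflight == 0` check named above.

```typescript
// Hypothetical host state; `inflight` mirrors the counter named in the text.
interface HostState {
  inflight: number;
  loadedModel: string;
}

// Courtesy gate: the check and the dispatch are two separate steps.
function looksIdle(host: HostState): boolean {
  return host.inflight === 0;
}

// Dispatching loads the requested model, evicting whatever was loaded.
function dispatch(host: HostState, model: string): void {
  host.inflight += 1;
  host.loadedModel = model;
}

// Replays the bad interleaving: both writers check before either one acts.
function replayRace(): HostState {
  const host: HostState = { inflight: 0, loadedModel: "none" };
  const benchSawIdle = looksIdle(host); // bench checks: idle
  const chatSawIdle = looksIdle(host); // chat checks before bench dispatches: still idle
  if (benchSawIdle) dispatch(host, "bench-model");
  if (chatSawIdle) dispatch(host, "chat-model"); // evicts the bench model mid-run
  return host;
}
```

Both writers pass the gate, so the host ends up with two in-flight requests and the chat model loaded while the bench is still running.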

The proper fix is a per-host advisory lease in the shared boochat DB that BooControl's scheduler requires and the other three writers honor.

What ships (scope)

control_host_leases table, keyed by provider_id, with columns for holder id, purpose, expires_at, and heartbeat timestamp. It is owned by the BooControl schema because BooControl is the only required holder; the other three writers are voluntary honorers.
Lease lifecycle service in apps/control: acquire (atomic, conditional insert/update), heartbeat (extend expires_at), release, and expiry sweep (a crashed holder's lease lapses without manual cleanup).
The honor-protocol in all four writers: before dispatching to a host, check for an active exclusive lease held by someone else; if present, queue behind it or fail fast with a clear "host leased for " signal. A shared (non-exclusive) lease for ordinary interactive traffic is the default; bench/eval take an exclusive lease.
BooControl consumes it through the existing seam. P3 left acquireHostAccess(providerId, purpose): Promise<HostGrant> in apps/control/src/services/host-access.ts as a no-op returning {ok: true}. This batch swaps its body for a real lease acquire+heartbeat WITHOUT touching the bench engine (which already gates every run through the seam, design §8).
Unattended bench scheduling + reproducible concurrency sweeps unlock once the lease exists (the deferred half of BooControl P3).
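A minimal in-memory sketch of the lease lifecycle described above, under stated assumptions: the plain object stands in for the control_host_leases table, the `LeaseStore` class and its method names are hypothetical, and the failure branch of `HostGrant` is an assumed shape (the source only specifies `{ok: true}`). A real implementation would express each method as a single conditional statement against the shared DB so that the check and the write cannot be separated.

```typescript
// Columns follow the proposal: holder id, purpose, expires_at, heartbeat timestamp,
// keyed by provider_id. Times are epoch milliseconds.
interface HostLease {
  providerId: string;
  holderId: string;
  purpose: string;
  expiresAt: number;
  heartbeatAt: number;
}

// The failure variant is an assumption made for illustration.
type HostGrant = { ok: true } | { ok: false; heldBy: string; purpose: string };

class LeaseStore {
  private rows: Record<string, HostLease> = {};
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Succeeds when the host has no lease, the lease has lapsed, or the caller already holds it.
  acquire(providerId: string, holderId: string, purpose: string): HostGrant {
    const t = this.now();
    const row: HostLease | undefined = this.rows[providerId];
    if (row && row.expiresAt > t && row.holderId !== holderId) {
      return { ok: false, heldBy: row.holderId, purpose: row.purpose };
    }
    this.rows[providerId] = { providerId, holderId, purpose, expiresAt: t + this.ttlMs, heartbeatAt: t };
    return { ok: true };
  }

  // Extends expiresAt; refuses if the caller is not the holder or the lease already lapsed.
  heartbeat(providerId: string, holderId: string): boolean {
    const t = this.now();
    const row: HostLease | undefined = this.rows[providerId];
    if (!row || row.holderId !== holderId || row.expiresAt <= t) return false;
    row.heartbeatAt = t;
    row.expiresAt = t + this.ttlMs;
    return true;
  }

  // Only the current holder may release.
  release(providerId: string, holderId: string): boolean {
    const row: HostLease | undefined = this.rows[providerId];
    if (!row || row.holderId !== holderId) return false;
    delete this.rows[providerId];
    return true;
  }

  // Expiry sweep: drops lapsed leases and returns how many were reclaimed.
  sweep(): number {
    const t = this.now();
    let reclaimed = 0;
    for (const id of Object.keys(this.rows)) {
      if (this.rows[id].expiresAt <= t) {
        delete this.rows[id];
        reclaimed += 1;
      }
    }
    return reclaimed;
  }
}
```

The injectable `now` clock is a design choice for testability: lease expiry can be exercised without sleeping. Note that `acquire` already treats a lapsed lease as free, so the sweep is housekeeping rather than a correctness requirement in this model.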

Out of scope

Cross-host scheduling / global GPU arbitration beyond per-host leases (YAGNI: reopen if per-host leases prove insufficient — implementation-plan Deferred section).
Frontier-provider coordination (no single-GPU contention there).
Replacing llama-swap's own on-demand eviction; the lease coordinates callers, not the swap engine.

Open questions (resolve before READY)

Exclusive vs shared semantics for interactive traffic. Do BooChat/BooCoder take a shared lease per turn (heavyweight) or only read the exclusive-lease flag before dispatch (lightweight, racy on the boundary)? Leaning lightweight: interactive writers read-before-dispatch; only bench/eval take exclusive holds.
Honor enforcement granularity. Per-request check vs per-session hold. A per-request check is cheap but a long chat turn could still straddle a lease acquisition. Acceptable for v1?
Heartbeat interval + lease TTL. Short TTL = fast crash recovery but more DB chatter; long TTL = a crashed bench blocks the host until expiry. Proposed: TTL 60s, heartbeat 20s.
Failure mode when the DB is unreachable. Fail-open (dispatch anyway, current behavior) or fail-closed (refuse)? Fail-open preserves chat availability; document the residual race.
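One way the lightweight, fail-open leaning could look in a shared helper. This is a sketch, not a settled design: `checkBeforeDispatch`, `LeaseLookup`, `ActiveLease`, and `DispatchDecision` are hypothetical names, and the lookup is synchronous here for brevity where a real DB read would be asynchronous.

```typescript
// Hypothetical view of an active exclusive lease row.
interface ActiveLease {
  holderId: string;
  purpose: string;
  expiresAt: number; // epoch ms
}

// Stand-in for the one indexed SELECT by provider_id; it may throw if the DB is unreachable.
type LeaseLookup = (providerId: string) => ActiveLease | null;

type DispatchDecision =
  | { dispatch: true; degraded: boolean }
  | { dispatch: false; heldBy: string; purpose: string };

// Read-before-dispatch: refuse only when someone else holds a live lease.
function checkBeforeDispatch(
  providerId: string,
  callerId: string,
  lookup: LeaseLookup,
  nowMs: number,
): DispatchDecision {
  let lease: ActiveLease | null;
  try {
    lease = lookup(providerId);
  } catch {
    // Fail-open: the lease is advisory, so a DB outage must not block dispatch.
    return { dispatch: true, degraded: true };
  }
  if (lease && lease.expiresAt > nowMs && lease.holderId !== callerId) {
    return { dispatch: false, heldBy: lease.holderId, purpose: lease.purpose };
  }
  return { dispatch: true, degraded: false };
}
```

The `degraded` flag lets a caller record that it dispatched without a lease check, which keeps the residual race visible in logs or run metadata. Because this is a per-request read, a turn that starts just before an exclusive acquire can still overlap it, as the second question notes.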

Risks

| Risk | Mitigation |
| --- | --- |
| A crashed exclusive holder blocks a host | TTL + heartbeat; expiry sweep reclaims lapsed leases |
| Honor-protocol drift across four services | Single shared lease-check helper in `@boocode/contracts`-adjacent shared code, consumed by all four; integration test per writer |
| DB unreachable mid-dispatch | Documented fail-open default; lease is advisory, never a hard dependency for interactive chat |
| Lease check adds latency to every chat turn | Lightweight read-before-dispatch (one indexed SELECT by `provider_id`); no per-turn write on the interactive path |
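For the first risk, the time a crashed holder can keep blocking a host follows from the TTL and heartbeat interval. The helper below is a hypothetical back-of-envelope calculation. It assumes every heartbeat before the crash succeeded and that each one resets expires_at to the heartbeat time plus the TTL.

```typescript
interface StaleHoldWindow {
  minMs: number; // crash just before the next heartbeat was due
  maxMs: number; // crash immediately after a successful heartbeat
}

// How long a lease can outlive its crashed holder, given TTL and heartbeat interval.
function staleHoldWindow(ttlMs: number, heartbeatMs: number): StaleHoldWindow {
  if (heartbeatMs <= 0 || heartbeatMs >= ttlMs) {
    throw new Error("heartbeat interval must be positive and shorter than the TTL");
  }
  return { minMs: ttlMs - heartbeatMs, maxMs: ttlMs };
}
```

With the proposed TTL of 60s and heartbeat of 20s, `staleHoldWindow(60000, 20000)` gives a window of 40s to 60s during which a crashed bench would still block the host.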

References

BooControl design §8 Fleet coordination lease (P8 — cross-service) and the P3 seam contract (acquireHostAccess).
apps/control/src/services/host-access.ts — the seam to swap.
apps/control/src/schema.sql — where control_host_leases lands.
