chore: snapshot working tree - pty_exited notifications + in-flight inference WIP

feat(booterm): structured pty_exited WS notifications. Plan-validated, impl-validated, code-reviewed green (contracts build clean, contracts test 29/29, booterm + web typecheck clean).

wip: in-progress inference/provider refactor (agents.ts, provider.ts, new llama-providers.ts, removed llama-args-validator), plus arena, dispatcher, compaction, schema changes.

openspec: pty-exit-notifications complete; x-agent-flags planned (not yet implemented).
2026-06-14 12:48:47 +00:00
parent 0ed506f1da
commit b18de2a331
204 changed files with 25344 additions and 867 deletions


@@ -0,0 +1,92 @@
# Fleet coordination lease — proposal
**Status:** OUTLINE (not yet ready to build). Spun out of BooControl P8 (see
`openspec/changes/boocontrol/`). This folder is the separate design pass the
BooControl program deferred; it is an outline, not an implementation plan ready
for `boo-implementing-changes`. Promote to READY only after the open questions
below are resolved.
## Why
Four independent processes dispatch inference to the same llama-swap hosts with
no coordination:
- **BooChat** (`apps/server`) — interactive chat turns.
- **BooCoder** (`apps/coder`) — agent dispatches (opencode / ACP / PTY / Claude-SDK).
- **Arena** (`apps/coder`) — head-to-head battles.
- **BooControl** (`apps/control`) — bench + eval runs.
Each host (`sam-desktop`, `embedding`) runs ONE model at a time on a single GPU;
llama-swap evicts the loaded model to serve a request for a different one. So an
unattended BooControl bench can evict a model mid-chat, and a chat can pollute a
bench mid-run. BooControl P3 made this safe-by-construction for *manual* runs
(human clicks "run", takeover confirmation, `concurrent_foreign_requests`
recorded), but the underlying `inflight == 0` check is a courtesy gate with a
TOCTOU race against the other three writers (design §8, risk table). That race
is the single blocker for **unattended bench scheduling and reproducible
concurrency sweeps** — the reason this batch exists.
The proper fix is a per-host advisory lease in the shared `boochat` DB that
BooControl's scheduler *requires* and the other three writers *honor*.
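
The check-then-dispatch race described above can be reproduced in a few lines. This is a minimal sketch, not the BooControl code: the `Host` shape, `checkIdle`, and `dispatch` are hypothetical names invented for illustration, and the model names are placeholders. It only shows why an `inflight == 0` check that is separate from the dispatch cannot protect a bench run.

```typescript
// Hypothetical model of one llama-swap host; none of these names come from the codebase.
interface Host {
  inflight: number;
  loadedModel: string;
  evictions: number;
}

// Time of check: the courtesy gate only reads state.
function checkIdle(host: Host): boolean {
  return host.inflight === 0;
}

// Time of use: dispatching a different model evicts whatever is loaded.
function dispatch(host: Host, model: string): void {
  if (host.loadedModel !== model) {
    host.loadedModel = model;
    host.evictions += 1;
  }
  host.inflight += 1;
}

const host: Host = { inflight: 0, loadedModel: "chat-model", evictions: 0 };

// Interleaving: both writers pass the check before either one dispatches.
const benchSawIdle = checkIdle(host);
const chatSawIdle = checkIdle(host);
if (benchSawIdle) dispatch(host, "bench-model"); // evicts chat-model
if (chatSawIdle) dispatch(host, "chat-model"); // evicts bench-model mid-run
```

Both writers saw an idle host, both dispatched, and the bench model was evicted while its request was still counted as in flight. Closing the gap requires the check and the claim to be a single atomic step, which is what a conditional lease write provides.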
## What ships (scope)
1. **`control_host_leases` table** (owned by the BooControl schema, since it is
the only *required* holder; the others are voluntary honorers): holder id,
purpose, `expires_at`, heartbeat timestamp, keyed by `provider_id`.
2. **Lease lifecycle service** in `apps/control`: acquire (atomic, conditional
insert/update), heartbeat (extend `expires_at`), release, and expiry sweep
(a crashed holder's lease lapses without manual cleanup).
3. **The honor protocol in all four writers**: before dispatching to a host,
check for an active *exclusive* lease held by someone else; if present, queue
behind it or fail fast with a clear `host leased for <purpose>` signal. A
shared (non-exclusive) lease for ordinary interactive traffic is the default;
bench/eval take an exclusive lease.
4. **BooControl consumes it through the existing seam.** P3 left
`acquireHostAccess(providerId, purpose): Promise<HostGrant>` in
`apps/control/src/services/host-access.ts` as a no-op returning `{ok: true}`.
This batch swaps its body for a real lease acquire+heartbeat WITHOUT touching
the bench engine (which already gates every run through the seam, design §8).
5. **Unattended bench scheduling + reproducible concurrency sweeps** unlock once
the lease exists (the deferred half of BooControl P3).
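
The lifecycle in item 2 (acquire, heartbeat, release, expiry sweep) can be sketched with an in-memory stand-in for the table. Everything here is an assumption made for illustration: `LeaseStore`, its method names, and the field names are hypothetical, the TTL uses the 60s value that is still an open question, and a plain object replaces `control_host_leases`. In the real service the acquire would need to be one conditional insert/update statement so the check and the write are atomic in the database; this sketch is atomic only because it runs in a single thread.

```typescript
// Hypothetical row shape, mirroring the columns listed in item 1.
interface Lease {
  holderId: string;
  purpose: string;
  expiresAt: number; // epoch milliseconds
  heartbeatAt: number; // epoch milliseconds
}

const TTL_MS = 60000; // proposed TTL, not yet settled

class LeaseStore {
  // Keyed by provider_id; stands in for the control_host_leases table.
  private leases: Record<string, Lease | undefined> = {};

  // Succeeds if the host is free, the existing lease has lapsed, or the caller already holds it.
  acquire(providerId: string, holderId: string, purpose: string, now: number): boolean {
    const current = this.leases[providerId];
    if (current !== undefined && current.expiresAt > now && current.holderId !== holderId) {
      return false;
    }
    this.leases[providerId] = { holderId, purpose, expiresAt: now + TTL_MS, heartbeatAt: now };
    return true;
  }

  // Extends expiresAt, but only for the current holder of a still-live lease.
  heartbeat(providerId: string, holderId: string, now: number): boolean {
    const current = this.leases[providerId];
    if (current === undefined || current.holderId !== holderId || current.expiresAt <= now) {
      return false;
    }
    current.expiresAt = now + TTL_MS;
    current.heartbeatAt = now;
    return true;
  }

  // Only the holder may release its own lease.
  release(providerId: string, holderId: string): boolean {
    const current = this.leases[providerId];
    if (current === undefined || current.holderId !== holderId) return false;
    delete this.leases[providerId];
    return true;
  }

  // Removes lapsed leases and returns how many were reclaimed.
  sweep(now: number): number {
    let reclaimed = 0;
    for (const id of Object.keys(this.leases)) {
      const lease = this.leases[id];
      if (lease !== undefined && lease.expiresAt <= now) {
        delete this.leases[id];
        reclaimed += 1;
      }
    }
    return reclaimed;
  }
}

const store = new LeaseStore();
const benchGot = store.acquire("sam-desktop", "bench-1", "bench", 0);
const evalGot = store.acquire("sam-desktop", "eval-1", "eval", 10000); // host is held
const beat = store.heartbeat("sam-desktop", "bench-1", 20000); // expiry moves to 80000
// bench-1 crashes here and sends no further heartbeats.
const swept = store.sweep(80000);
const evalRetry = store.acquire("sam-desktop", "eval-1", "eval", 80000);
const lateBeat = store.heartbeat("sam-desktop", "bench-1", 90000); // no longer the holder
```

In this run the crashed holder blocks the host until 60s after its last heartbeat, then the sweep reclaims the lease and the second caller acquires it. Passing `now` explicitly keeps the sketch deterministic; a real service would more likely use the database clock so that all four writers agree on expiry.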
## Out of scope
- Cross-host scheduling / global GPU arbitration beyond per-host leases
(YAGNI: reopen if per-host leases prove insufficient — implementation-plan
Deferred section).
- Frontier-provider coordination (no single-GPU contention there).
- Replacing llama-swap's own on-demand eviction; the lease coordinates *callers*,
not the swap engine.
## Open questions (resolve before READY)
- **Exclusive vs shared semantics for interactive traffic.** Do BooChat/BooCoder
take a shared lease per turn (heavyweight) or only *read* the exclusive-lease
flag before dispatch (lightweight, racy on the boundary)? Leaning lightweight:
interactive writers read-before-dispatch; only bench/eval take exclusive holds.
- **Honor enforcement granularity.** Per-request check vs per-session hold. A
per-request check is cheap but a long chat turn could still straddle a lease
acquisition. Acceptable for v1?
- **Heartbeat interval + lease TTL.** Short TTL = fast crash recovery but more DB
chatter; long TTL = a crashed bench blocks the host until expiry. Proposed:
TTL 60s, heartbeat 20s.
- **Failure mode when the DB is unreachable.** Fail-open (dispatch anyway,
current behavior) or fail-closed (refuse)? Fail-open preserves chat
availability; document the residual race.
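
To make the lightweight, fail-open option concrete, here is a minimal sketch of a read-before-dispatch check. It assumes the leanings stated above rather than any decided design: `decideDispatch`, `ExclusiveLease`, and `Decision` are hypothetical names, and the injected `readLease` function stands in for whatever single read against the lease table the writers end up using. It is synchronous only to keep the example short; a real database read would be asynchronous.

```typescript
// Hypothetical shapes; the proposal names the columns but not a TypeScript API.
interface ExclusiveLease {
  holderId: string;
  purpose: string;
  expiresAt: number; // epoch milliseconds
}

interface Decision {
  dispatch: boolean;
  reason: string;
}

function decideDispatch(
  readLease: () => ExclusiveLease | null, // stand-in for one read keyed by provider_id
  selfId: string,
  now: number
): Decision {
  let lease: ExclusiveLease | null;
  try {
    lease = readLease();
  } catch {
    // Fail-open: an unreachable lease store must not take interactive chat down.
    return { dispatch: true, reason: "fail-open: lease store unreachable" };
  }
  if (lease === null || lease.expiresAt <= now) {
    return { dispatch: true, reason: "no active exclusive lease" };
  }
  if (lease.holderId === selfId) {
    return { dispatch: true, reason: "own lease" };
  }
  // Fail fast with the holder's purpose so the caller can surface it.
  return { dispatch: false, reason: "host leased for " + lease.purpose };
}

const benchLease: ExclusiveLease = { holderId: "bench-1", purpose: "bench", expiresAt: 60000 };

const blocked = decideDispatch(() => benchLease, "boochat", 30000);
const lapsed = decideDispatch(() => benchLease, "boochat", 60000);
const dbDown = decideDispatch(() => {
  throw new Error("connection refused");
}, "boochat", 30000);
```

The residual race is visible in the shape of the function: the lease is read once, so an exclusive lease acquired immediately after the read is not seen by a turn that is already dispatching. That is the boundary case the first two questions ask whether v1 can accept.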
## Risks
| Risk | Mitigation |
|---|---|
| A crashed exclusive holder blocks a host | TTL + heartbeat; expiry sweep reclaims lapsed leases |
| Honor-protocol drift across four services | single shared lease-check helper in `@boocode/contracts`-adjacent shared code, consumed by all four; integration test per writer |
| DB unreachable mid-dispatch | documented fail-open default; lease is advisory, never a hard dependency for interactive chat |
| Lease check adds latency to every chat turn | lightweight read-before-dispatch (one indexed SELECT by `provider_id`); no per-turn write on the interactive path |
## References
- BooControl design `§8 Fleet coordination lease (P8 — cross-service)` and the
P3 seam contract (`acquireHostAccess`).
- `apps/control/src/services/host-access.ts` — the seam to swap.
- `apps/control/src/schema.sql` — where `control_host_leases` lands.