chore: snapshot working tree - pty_exited notifications + in-flight inference WIP
feat(booterm): structured pty_exited WS notifications. Plan-validated, impl-validated, code-reviewed green (contracts build clean, contracts test 29/29, booterm + web typecheck clean). wip: in-progress inference/provider refactor (agents.ts, provider.ts, new llama-providers.ts, removed llama-args-validator), plus arena, dispatcher, compaction, schema changes. openspec: pty-exit-notifications complete; x-agent-flags planned (not yet implemented).
92
openspec/changes/fleet-coordination-lease/proposal.md
Normal file
@@ -0,0 +1,92 @@
# Fleet coordination lease — proposal

**Status:** OUTLINE (not yet ready to build). Spun out of BooControl P8 (see
`openspec/changes/boocontrol/`). This folder is the separate design pass the
BooControl program deferred; it is an outline, not an implementation plan ready
for `boo-implementing-changes`. Promote to READY only after the open questions
below are resolved.

## Why

Four independent processes dispatch inference to the same llama-swap hosts with
no coordination:

- **BooChat** (`apps/server`) — interactive chat turns.
- **BooCoder** (`apps/coder`) — agent dispatches (opencode / ACP / PTY / Claude-SDK).
- **Arena** (`apps/coder`) — head-to-head battles.
- **BooControl** (`apps/control`) — bench + eval runs.

Each host (`sam-desktop`, `embedding`) runs ONE model at a time on a single GPU;
llama-swap evicts the loaded model to serve a request for a different one. So an
unattended BooControl bench can evict a model mid-chat, and a chat can pollute a
bench mid-run. BooControl P3 made this safe-by-construction for *manual* runs
(human clicks "run", takeover confirmation, `concurrent_foreign_requests`
recorded), but the underlying `inflight == 0` check is a courtesy gate with a
TOCTOU race against the other three writers (design §8, risk table). That race
is the single blocker for **unattended bench scheduling and reproducible
concurrency sweeps** — the reason this batch exists.

The proper fix is a per-host advisory lease in the shared `boochat` DB that
BooControl's scheduler *requires* and the other three writers *honor*.

## What ships (scope)

1. **`control_host_leases` table** (owned by the BooControl schema, since it is
   the only *required* holder; the others are voluntary honorers): holder id,
   purpose, `expires_at`, heartbeat timestamp, keyed by `provider_id`.
2. **Lease lifecycle service** in `apps/control`: acquire (atomic, conditional
   insert/update), heartbeat (extend `expires_at`), release, and expiry sweep
   (a crashed holder's lease lapses without manual cleanup).
3. **The honor-protocol in all four writers**: before dispatching to a host,
   check for an active *exclusive* lease held by someone else; if present, queue
   behind it or fail fast with a clear "host leased for <purpose>" signal. A
   shared (non-exclusive) lease for ordinary interactive traffic is the default;
   bench/eval take an exclusive lease.
4. **BooControl consumes it through the existing seam.** P3 left
   `acquireHostAccess(providerId, purpose): Promise<HostGrant>` in
   `apps/control/src/services/host-access.ts` as a no-op returning `{ok: true}`.
   This batch swaps its body for a real lease acquire+heartbeat WITHOUT touching
   the bench engine (which already gates every run through the seam, design §8).
5. **Unattended bench scheduling + reproducible concurrency sweeps** unlock once
   the lease exists (the deferred half of BooControl P3).

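The lifecycle in item 2 (acquire, heartbeat, release, expiry sweep) can be sketched with an in-memory stand-in for the table. This is an illustration only: `InMemoryLeaseStore`, its method names, and the millisecond timestamps are assumptions, and the real service would do the same steps against `control_host_leases`.

```typescript
// Hypothetical in-memory model of the lease lifecycle. Not the real service:
// it only mirrors the four operations the proposal lists.
type Lease = {
  providerId: string;
  holder: string;
  purpose: string;
  expiresAt: number; // epoch milliseconds
  heartbeatAt: number; // epoch milliseconds
};

class InMemoryLeaseStore {
  private leases = new Map<string, Lease>();
  private ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Succeeds when the host is free, the previous lease has lapsed, or the
  // caller already holds it. Fails when someone else holds a live lease.
  acquire(providerId: string, holder: string, purpose: string, now: number): boolean {
    const current = this.leases.get(providerId);
    if (current && current.expiresAt > now && current.holder !== holder) return false;
    this.leases.set(providerId, {
      providerId,
      holder,
      purpose,
      expiresAt: now + this.ttlMs,
      heartbeatAt: now,
    });
    return true;
  }

  // Extends expiry, but only for the holder of a still-live lease.
  heartbeat(providerId: string, holder: string, now: number): boolean {
    const current = this.leases.get(providerId);
    if (!current || current.holder !== holder || current.expiresAt <= now) return false;
    current.heartbeatAt = now;
    current.expiresAt = now + this.ttlMs;
    return true;
  }

  release(providerId: string, holder: string): boolean {
    const current = this.leases.get(providerId);
    if (!current || current.holder !== holder) return false;
    return this.leases.delete(providerId);
  }

  // Expiry sweep: drops lapsed leases and reports how many were reclaimed.
  sweep(now: number): number {
    let reclaimed = 0;
    this.leases.forEach((lease, id) => {
      if (lease.expiresAt <= now) {
        this.leases.delete(id);
        reclaimed++;
      }
    });
    return reclaimed;
  }
}
```

In this model a crashed holder simply stops calling `heartbeat`, so its lease lapses on its own and either the next `acquire` or the next `sweep` reclaims the host.
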
## Out of scope

- Cross-host scheduling / global GPU arbitration beyond per-host leases
  (YAGNI: reopen if per-host leases prove insufficient — implementation-plan
  Deferred section).
- Frontier-provider coordination (no single-GPU contention there).
- Replacing llama-swap's own on-demand eviction; the lease coordinates *callers*,
  not the swap engine.

## Open questions (resolve before READY)

- **Exclusive vs shared semantics for interactive traffic.** Do BooChat/BooCoder
  take a shared lease per turn (heavyweight) or only *read* the exclusive-lease
  flag before dispatch (lightweight, racy on the boundary)? Leaning lightweight:
  interactive writers read-before-dispatch; only bench/eval take exclusive holds.
- **Honor enforcement granularity.** Per-request check vs per-session hold. A
  per-request check is cheap but a long chat turn could still straddle a lease
  acquisition. Acceptable for v1?
- **Heartbeat interval + lease TTL.** Short TTL = fast crash recovery but more DB
  chatter; long TTL = a crashed bench blocks the host until expiry. Proposed:
  TTL 60s, heartbeat 20s.
- **Failure mode when the DB is unreachable.** Fail-open (dispatch anyway,
  current behavior) or fail-closed (refuse)? Fail-open preserves chat
  availability; document the residual race.

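If the last question resolves to fail-open, the interactive path could wrap its lease read roughly as follows. This is a minimal sketch under that assumption: `checkLeaseFailOpen`, `LeaseVerdict`, and the injected `lookup` function are hypothetical names, and `lookup` stands in for whatever DB read the design settles on.

```typescript
// Hypothetical verdict type: `degraded` marks a dispatch that went ahead
// without a lease answer, i.e. the documented residual race.
type LeaseVerdict =
  | { allowed: true; degraded: boolean }
  | { allowed: false; purpose: string };

// `lookup` is an assumed stand-in for the DB read. It resolves to the purpose
// of an active exclusive lease held by someone else, or null if the host is free.
async function checkLeaseFailOpen(
  lookup: () => Promise<string | null>,
): Promise<LeaseVerdict> {
  try {
    const purpose = await lookup();
    if (purpose === null) return { allowed: true, degraded: false };
    return { allowed: false, purpose };
  } catch {
    // DB unreachable: dispatch anyway (fail-open) and flag the result so the
    // caller can log or count the unguarded dispatch.
    return { allowed: true, degraded: true };
  }
}
```

Returning a `degraded` flag instead of silently allowing keeps the race observable, which is one way to "document the residual race" at runtime as well as in prose.
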
## Risks

| Risk | Mitigation |
|---|---|
| A crashed exclusive holder blocks a host | TTL + heartbeat; expiry sweep reclaims lapsed leases |
| Honor-protocol drift across four services | single shared lease-check helper in `@boocode/contracts`-adjacent shared code, consumed by all four; integration test per writer |
| DB unreachable mid-dispatch | documented fail-open default; lease is advisory, never a hard dependency for interactive chat |
| Lease check adds latency to every chat turn | lightweight read-before-dispatch (one indexed SELECT by `provider_id`); no per-turn write on the interactive path |

## References

- BooControl design `§8 Fleet coordination lease (P8 — cross-service)` and the
  P3 seam contract (`acquireHostAccess`).
- `apps/control/src/services/host-access.ts` — the seam to swap.
- `apps/control/src/schema.sql` — where `control_host_leases` lands.
46
openspec/changes/fleet-coordination-lease/tasks.md
Normal file
@@ -0,0 +1,46 @@
# Fleet coordination lease — tasks

**Status:** OUTLINE. Do not start until the proposal's open questions are
resolved and this folder is promoted to READY. Task granularity here is
deliberately coarse; a full implementation plan (per `boo-planning-changes`) is
the first step once READY.

## L0 — design pass (gate)
- [ ] Resolve the four open questions in `proposal.md` (exclusive vs shared,
  enforcement granularity, TTL/heartbeat, DB-unreachable failure mode).
- [ ] Write `design.md`: lease state machine, the atomic acquire SQL (conditional
  upsert, no check-then-act), the honor-protocol contract shared by all four
  writers, and the integration-test matrix.

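As a starting point for the "atomic acquire SQL" item, one possible shape of a conditional upsert is sketched below. It assumes PostgreSQL and a unique constraint on `provider_id`, neither of which this outline confirms; the constant and helper names are hypothetical, and only the column names come from the L1 task list. The point is that the conflict check and the write happen in one statement, so there is no check-then-act window.

```typescript
// ASSUMPTIONS: PostgreSQL dialect, and a UNIQUE or PRIMARY KEY constraint on
// provider_id. The statement takes over the row only when the existing lease
// has lapsed or already belongs to the caller.
const ACQUIRE_EXCLUSIVE_SQL = `
INSERT INTO control_host_leases
  (provider_id, holder, purpose, mode, expires_at, heartbeat_at)
VALUES ($1, $2, $3, 'exclusive', now() + make_interval(secs => $4), now())
ON CONFLICT (provider_id) DO UPDATE
  SET holder       = EXCLUDED.holder,
      purpose      = EXCLUDED.purpose,
      mode         = EXCLUDED.mode,
      expires_at   = EXCLUDED.expires_at,
      heartbeat_at = EXCLUDED.heartbeat_at
  WHERE control_host_leases.expires_at <= now()
     OR control_host_leases.holder = EXCLUDED.holder
RETURNING holder
`;

// Positional parameters in the order the statement expects ($1..$4).
function acquireParams(
  providerId: string,
  holder: string,
  purpose: string,
  ttlSeconds: number,
): [string, string, string, number] {
  return [providerId, holder, purpose, ttlSeconds];
}

// One returned row means the caller now holds the lease. Zero rows means the
// DO UPDATE's WHERE clause rejected the takeover: a live foreign lease exists.
function acquired(returnedRowCount: number): boolean {
  return returnedRowCount === 1;
}
```

Whatever driver the service uses would execute `ACQUIRE_EXCLUSIVE_SQL` with `acquireParams(...)` and pass the returned row count to `acquired`. A shared-mode variant would need a different key, since this sketch allows one row per host.
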
## L1 — schema + lease service (apps/control)
- [ ] `control_host_leases` in `apps/control/src/schema.sql`: `provider_id`,
  `holder`, `purpose`, `mode` (shared|exclusive), `expires_at`, `heartbeat_at`,
  idempotent DDL. Index for the hot read path (active lease by `provider_id`).
- [ ] Lease service: `acquire` (atomic conditional upsert), `heartbeat`,
  `release`, and an expiry sweep timer (reclaim lapsed leases) following the
  retention-timer pattern.
- [ ] Pure helpers unit-tested (lease-conflict decision, expiry check) per the
  `turn-guard.ts` pattern; DB-gated integration tests `describe.runIf(DATABASE_URL)`.

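The two pure helpers named above (expiry check, lease-conflict decision) might look like the following. This is a sketch, not the `turn-guard.ts` code: the `LeaseRow` shape and function names are assumptions, and the conflict rule shown (any live foreign lease blocks an exclusive request; only a live foreign exclusive lease blocks a shared one) is one reading of the proposal that the open questions may still change.

```typescript
type LeaseMode = "shared" | "exclusive";

// Hypothetical row shape; field names mirror the columns listed in the task.
interface LeaseRow {
  providerId: string;
  holder: string;
  purpose: string;
  mode: LeaseMode;
  expiresAt: Date;
}

// A lease is treated as lapsed at the instant it reaches expires_at.
function isExpired(lease: LeaseRow, now: Date): boolean {
  return lease.expiresAt.getTime() <= now.getTime();
}

// Pure decision: would `requester` asking for `wanted` mode conflict with the
// lease currently on record for the host?
function conflicts(
  existing: LeaseRow | null,
  requester: string,
  wanted: LeaseMode,
  now: Date,
): boolean {
  if (existing === null || isExpired(existing, now)) return false; // host is free
  if (existing.holder === requester) return false; // re-entrant for the holder
  return existing.mode === "exclusive" || wanted === "exclusive";
}
```

Because both functions take `now` as an argument and touch no I/O, they can be unit-tested without a database or fake timers.
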
## L2 — swap the BooControl seam
- [ ] Replace the body of `acquireHostAccess(providerId, purpose)` in
  `apps/control/src/services/host-access.ts` with a real exclusive-lease
  acquire + heartbeat for bench/eval purposes. Do NOT touch the bench engine
  (it already gates through the seam).
- [ ] Return a `HostGrant` that carries a release handle/heartbeat lifecycle the
  bench runner can drive in its `finally`.

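A grant with a release handle, driven from the runner's `finally`, could be shaped like this. The source only confirms that `HostGrant` currently allows `{ok: true}`; the `release` and `reason` fields and the `runWithGrant` helper are hypothetical additions for illustration.

```typescript
// Hypothetical extension of HostGrant. The real type lives in host-access.ts
// and may differ; only `ok: true` is confirmed by the proposal.
type HostGrant =
  | { ok: true; release: () => Promise<void> }
  | { ok: false; reason: string };

// Runs `run` under a grant and always releases it, even when `run` throws.
// `acquire` stands in for acquireHostAccess(providerId, purpose).
async function runWithGrant<T>(
  acquire: () => Promise<HostGrant>,
  run: () => Promise<T>,
): Promise<T> {
  const grant = await acquire();
  if (!grant.ok) throw new Error(`host unavailable: ${grant.reason}`);
  try {
    return await run();
  } finally {
    // In a real implementation this would also stop the heartbeat timer
    // before deleting the lease row.
    await grant.release();
  }
}
```

Keeping the heartbeat lifecycle behind `release` means the bench engine never needs to know a timer exists, which fits the "do not touch the bench engine" constraint.
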
## L3 — honor-protocol in the other three writers
- [ ] BooChat (`apps/server`): read-before-dispatch active-exclusive-lease check
  on the inference path; clear "host leased for <purpose>" surfacing.
- [ ] BooCoder (`apps/coder`): same check at the dispatch fetch sites.
- [ ] Arena (`apps/coder`): same check at the battle fetch sites.
- [ ] A single shared lease-check helper consumed by all four (avoid drift); one
  integration test per writer proving it honors an exclusive lease.

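One way to keep the check identical across writers is a small factory that each service wires to its own lease lookup. The sketch below shows the fail-fast variant only; `makeLeaseGuard`, `ActiveLease`, and `LeaseLookup` are hypothetical names, and the queue-behind option from the proposal is not modeled.

```typescript
// Hypothetical view of an active lease as seen by a writer.
interface ActiveLease {
  holder: string;
  purpose: string;
  mode: "shared" | "exclusive";
  expiresAtMs: number; // epoch milliseconds
}

// Assumed stand-in for the indexed SELECT by provider_id.
type LeaseLookup = (providerId: string) => Promise<ActiveLease | null>;

// Builds the shared read-before-dispatch guard. Each writer (BooChat,
// BooCoder, Arena, BooControl) would call the returned function at its
// dispatch site instead of re-implementing the check.
function makeLeaseGuard(findLease: LeaseLookup, nowMs: () => number = Date.now) {
  return async function guard<T>(
    providerId: string,
    self: string,
    dispatch: () => Promise<T>,
  ): Promise<T> {
    const lease = await findLease(providerId);
    if (
      lease !== null &&
      lease.mode === "exclusive" &&
      lease.holder !== self &&
      lease.expiresAtMs > nowMs()
    ) {
      // Fail fast with the signal the proposal asks for.
      throw new Error(`host leased for ${lease.purpose}`);
    }
    return dispatch();
  };
}
```

The check is still read-before-dispatch, so a lease acquired between the lookup and the dispatch is not seen; that is the boundary race the proposal's open questions already call out.
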
## L4 — unlock unattended scheduling
- [ ] Unattended bench scheduling (the deferred half of BooControl P3): a
  scheduler that acquires the exclusive lease, runs, releases.
- [ ] Reproducible concurrency sweeps behind the lease (no foreign traffic).
- [ ] Smoke: schedule an overnight bench; confirm it never evicts a live model
  and that `concurrent_foreign_requests` is 0 for leased runs.