boocode/docs/adr/0001-arena-two-lane-scheduling.md

# Arena schedules contestants in a local lane (serial) and a cloud lane (parallel)

A Battle runs the same prompt against 2–6 Contestants. The local llama-swap
server can hold only one model in memory at a time, so llama-swap-backed
Contestants are placed in a **local lane** and run strictly one at a time.
Cloud-backed Contestants (Claude Code, OpenCode-on-cloud) are placed in a
**cloud lane** and all run in parallel. The two lanes run concurrently with
each other.

We chose this over two alternatives: running everything serially, which is too
slow for cloud Contestants, and running everything in parallel, which is
impossible for local Contestants and would corrupt the speed Benchmark. The
single-model constraint is physical, and the serial local lane also gives each
local model an uncontended, fair tokens/sec measurement.
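
The scheduling shape described above can be sketched in a few lines. This is an
illustration only, not the Arena's actual code: it assumes an `asyncio`-based
runner, and `Contestant`, `run_local_lane`, `run_cloud_lane`, and `run_battle`
are hypothetical names introduced for the example.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, List


@dataclass
class Contestant:
    # Hypothetical shape: a name plus the lane it is scheduled in.
    name: str
    lane: str  # "local" (llama-swap-backed) or "cloud"


# Hypothetical runner: takes a Contestant, returns its answer text.
Runner = Callable[[Contestant], Awaitable[str]]


async def run_local_lane(contestants: List[Contestant], run: Runner) -> Dict[str, str]:
    """Run local Contestants strictly one at a time (one model in memory)."""
    results: Dict[str, str] = {}
    for contestant in contestants:
        results[contestant.name] = await run(contestant)
    return results


async def run_cloud_lane(contestants: List[Contestant], run: Runner) -> Dict[str, str]:
    """Run all cloud Contestants in parallel."""
    outputs = await asyncio.gather(*(run(c) for c in contestants))
    return {c.name: out for c, out in zip(contestants, outputs)}


async def run_battle(contestants: List[Contestant], run: Runner) -> Dict[str, str]:
    """Run both lanes concurrently and merge their results."""
    local = [c for c in contestants if c.lane == "local"]
    cloud = [c for c in contestants if c.lane == "cloud"]
    local_results, cloud_results = await asyncio.gather(
        run_local_lane(local, run),
        run_cloud_lane(cloud, run),
    )
    return {**local_results, **cloud_results}
```

The serial `for` loop in the local lane is what enforces the single-model
constraint; the outer `gather` is what lets the cloud lane make progress while
the local lane works through its queue.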

## Consequences

- A Battle's wall-clock time is roughly `max(slowest cloud contestant, sum of
  local contestants)`. Deep local lanes (especially all-local Q&A battles) are
  slow by design; the launcher warns when the local lane is deep.
- The speed Benchmark (tokens/sec) is only meaningful for local-lane Contestants.
  This is acceptable because external CLI agents don't report token usage anyway.
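
To make the wall-clock estimate concrete, here is a small worked sketch. The
function name `estimate_wall_clock` and the per-Contestant durations are
made up for illustration; the estimate also ignores model-swap overhead in the
local lane, which is one reason the figure is only approximate.

```python
from typing import Sequence


def estimate_wall_clock(
    local_seconds: Sequence[float], cloud_seconds: Sequence[float]
) -> float:
    """Approximate Battle duration: the lanes overlap, so the slower lane wins.

    The local lane is serial, so its duration is the sum of its Contestants.
    The cloud lane is parallel, so its duration is its slowest Contestant.
    """
    local_lane = sum(local_seconds)
    cloud_lane = max(cloud_seconds, default=0.0)
    return max(cloud_lane, local_lane)


# Illustrative numbers only: three local models and two cloud agents.
mixed = estimate_wall_clock([40.0, 55.0, 30.0], [90.0, 20.0])  # 125.0, local lane dominates
cloud_only = estimate_wall_clock([], [90.0, 20.0])  # 90.0, slowest cloud Contestant
```

In the mixed case the local lane (40 + 55 + 30 = 125 s) outlasts the slowest
cloud Contestant (90 s), so adding a fourth local model lengthens the Battle
while adding another cloud Contestant usually does not.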