boocode/openspec/changes/archived/2026-06-07-eval-sandbox-agent-runtime/specs/multi-turn-simulation/spec.md

## ADDED Requirements

### Requirement: Multi-turn conversation simulation

The system SHALL provide `run_multiturn_simulation()` that simulates a multi-turn conversation between an app and a simulated user.

Parameters:
- `app: Callable[[ChatCompletionMessage], ChatCompletionMessage]` — the application under test
- `user: Callable | string[]` — simulated user (dynamic or static responses)
- `max_turns?: number` — maximum conversation turns
- `trajectory_evaluators?: EvalFunction[]` — evaluators that assess the final trajectory
- `stopping_condition?: Callable[[Message[], number], boolean]` — early termination
- `reference_outputs?: unknown` — passed to evaluators

#### Scenario: Static user responses drive conversation

- **WHEN** `user=["Hello", "Tell me more", "Goodbye"]` with `max_turns=3`
- **THEN** the simulation SHALL alternate between user responses and app responses for 3 turns

#### Scenario: Dynamic simulated user adapts to context

- **WHEN** `user` is a `Callable` receiving the current trajectory
- **THEN** the user function SHALL receive the current conversation history and return the next message

#### Scenario: Trajectory evaluators run after simulation

- **WHEN** `trajectory_evaluators` are provided
- **THEN** each evaluator SHALL receive the full conversation trajectory as `outputs`
- **THEN** the simulation result SHALL include `evaluator_results` from each evaluator

#### Scenario: Stopping condition terminates early

- **WHEN** `stopping_condition` returns `true` before `max_turns`
- **THEN** the simulation SHALL terminate immediately

#### Scenario: Async simulation is supported

- **WHEN** `run_multiturn_simulation_async()` is called with async `app` and `user` functions
- **THEN** the simulation SHALL await each turn and return the same result structure