chore(openspec): drop 9 superseded proposals + 11 stub archive files

Drop 9 batch proposals that are superseded by the boocode-lift-analysis (boocontext-audit, conductor upgrades, self-healing/verify-gate skills): add-3tier-memory, import-llm-evaluator, import-pregel-engine, plugin-platform, conductor-evolution, code-intelligence-upgrade, dev-workflow, ui-overhaul, agent-reliability. Delete 11 stub archive files (49-66B each, 'Status: Shipped. Archived.' only) that provide zero documentation value over the existing CHANGELOG.md + git tags.
2026-06-07 22:15:38 +00:00
parent 0d6e9a2413
commit c935687725
119 changed files with 4897 additions and 45 deletions
--- a/openspec/changes/archived/2026-06-07-eval-sandbox-agent-runtime/specs/multi-turn-simulation/spec.md
+++ b/openspec/changes/archived/2026-06-07-eval-sandbox-agent-runtime/specs/multi-turn-simulation/spec.md
@@ -0,0 +1,39 @@
+## ADDED Requirements
+
+### Requirement: Multi-turn conversation simulation
+
+The system SHALL provide `run_multiturn_simulation()` that simulates a multi-turn conversation between an app and a simulated user.
+
+Parameters:
+- `app: Callable[[ChatCompletionMessage], ChatCompletionMessage]` — the application under test
+- `user: Callable | string[]` — simulated user (dynamic or static responses)
+- `max_turns?: number` — maximum conversation turns
+- `trajectory_evaluators?: EvalFunction[]` — evaluators that assess the final trajectory
+- `stopping_condition?: Callable[[Message[], number], boolean]` — early termination
+- `reference_outputs?: unknown` — passed to evaluators
+
+#### Scenario: Static user responses drive conversation
+
+- **WHEN** `user=["Hello", "Tell me more", "Goodbye"]` with `max_turns=3`
+- **THEN** the simulation SHALL alternate between user responses and app responses for 3 turns
+
+#### Scenario: Dynamic simulated user adapts to context
+
+- **WHEN** `user` is a `Callable` receiving the current trajectory
+- **THEN** the user function SHALL receive the current conversation history and return the next message
+
+#### Scenario: Trajectory evaluators run after simulation
+
+- **WHEN** `trajectory_evaluators` are provided
+- **THEN** each evaluator SHALL receive the full conversation trajectory as `outputs`
+- **THEN** the simulation result SHALL include `evaluator_results` from each evaluator
+
+#### Scenario: Stopping condition terminates early
+
+- **WHEN** `stopping_condition` returns `true` before `max_turns`
+- **THEN** the simulation SHALL terminate immediately
+
+#### Scenario: Async simulation is supported
+
+- **WHEN** `run_multiturn_simulation_async()` is called with async `app` and `user` functions
+- **THEN** the simulation SHALL await each turn and return the same result structure