chore(openspec): drop 9 superseded proposals + 11 stub archive files

Drop 9 batch proposals that are superseded by the boocode-lift-analysis
(boocontext-audit, conductor upgrades, self-healing/verify-gate skills):
add-3tier-memory, import-llm-evaluator, import-pregel-engine, plugin-platform,
conductor-evolution, code-intelligence-upgrade, dev-workflow, ui-overhaul,
agent-reliability.

Delete 11 stub archive files (49-66B each, 'Status: Shipped. Archived.' only)
that provide zero documentation value over the existing CHANGELOG.md + git tags.
2026-06-07 22:15:38 +00:00
parent 0d6e9a2413
commit c935687725
119 changed files with 4897 additions and 45 deletions


@@ -0,0 +1,51 @@
## ADDED Requirements
### Requirement: Trajectory match evaluator
The system SHALL provide `create_trajectory_match_evaluator()` that compares agent tool-call trajectories against reference trajectories.
Parameters:
- `trajectory_match_mode: "strict" | "unordered" | "subset" | "superset"` — matching strategy
- `tool_args_match_mode: "exact" | "ignore" | "subset" | "superset"` — tool argument comparison
- `tool_args_match_overrides?: Record<string, ToolArgsMatchMode | Callable | string[]>` — per-tool custom matching
#### Scenario: Strict mode requires exact order
- **WHEN** output trajectory has tool calls `[A, B]` and reference is `[A, B]`
- **THEN** strict mode SHALL return score `true`
- **WHEN** output trajectory has tool calls `[B, A]` and reference is `[A, B]`
- **THEN** strict mode SHALL return score `false`
#### Scenario: Unordered mode ignores order
- **WHEN** output trajectory has tool calls `[B, A]` and reference is `[A, B]`
- **THEN** unordered mode SHALL return score `true`
#### Scenario: Subset mode accepts partial trajectory
- **WHEN** output trajectory has tool calls `[A]` and reference is `[A, B]`
- **THEN** subset mode SHALL return score `true`
#### Scenario: Superset mode allows extra tool calls
- **WHEN** output trajectory has tool calls `[A, B, C]` and reference is `[A, B]`
- **THEN** superset mode SHALL return score `true`
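The four trajectory match modes above can be sketched as follows. This is a minimal illustration, not the evaluator itself: `trajectory_matches` is a hypothetical helper, trajectories are reduced to lists of tool names, and treating repeated calls as a multiset (so calling `A` twice differs from calling it once) is an assumption the scenarios do not settle.

```python
from collections import Counter
from typing import Literal

TrajectoryMatchMode = Literal["strict", "unordered", "subset", "superset"]


def trajectory_matches(
    output: list[str],
    reference: list[str],
    mode: TrajectoryMatchMode,
) -> bool:
    """Compare two tool-call trajectories by tool name only (hypothetical helper)."""
    out_counts, ref_counts = Counter(output), Counter(reference)
    if mode == "strict":
        # Same calls in the same order.
        return output == reference
    if mode == "unordered":
        # Same calls, any order (ASSUMPTION: multiset comparison).
        return out_counts == ref_counts
    if mode == "subset":
        # Every output call is covered by the reference.
        return all(ref_counts[name] >= n for name, n in out_counts.items())
    if mode == "superset":
        # Every reference call is covered by the output; extras are allowed.
        return all(out_counts[name] >= n for name, n in ref_counts.items())
    raise ValueError(f"unknown trajectory_match_mode: {mode!r}")
```

Each branch corresponds to one scenario: `trajectory_matches(["B", "A"], ["A", "B"], "strict")` is false while the same inputs pass in `"unordered"` mode.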
#### Scenario: Tool args ignore mode skips argument comparison
- **WHEN** `tool_args_match_mode="ignore"` is set
- **THEN** tool calls SHALL match regardless of their arguments
#### Scenario: Custom tool arg matcher is used
- **WHEN** `tool_args_match_overrides` contains a `Callable` for a tool name
- **THEN** that callable SHALL be invoked to compare the tool's arguments
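One way the argument modes and per-tool overrides could fit together is sketched below. `args_match` is a hypothetical helper, and the callable signature `(output_args, reference_args) -> bool` is an assumption. The spec also leaves the meaning of a `string[]` override open; this sketch assumes it lists the argument keys to compare exactly.

```python
from typing import Any, Callable, Literal, Optional, Union

ToolArgsMatchMode = Literal["exact", "ignore", "subset", "superset"]
# ASSUMPTION: a custom matcher receives (output_args, reference_args).
ArgMatcher = Callable[[dict[str, Any], dict[str, Any]], bool]
Override = Union[ToolArgsMatchMode, ArgMatcher, list[str]]


def args_match(
    tool_name: str,
    output_args: dict[str, Any],
    reference_args: dict[str, Any],
    mode: ToolArgsMatchMode = "exact",
    overrides: Optional[dict[str, Override]] = None,
) -> bool:
    """Compare one tool call's arguments (hypothetical helper)."""
    # A per-tool override, when present, replaces the global mode.
    rule = (overrides or {}).get(tool_name, mode)
    if callable(rule):
        return bool(rule(output_args, reference_args))
    if isinstance(rule, list):
        # ASSUMPTION: a list names the only keys that must be equal.
        return all(output_args.get(k) == reference_args.get(k) for k in rule)
    if rule == "ignore":
        return True
    if rule == "exact":
        return output_args == reference_args
    if rule == "subset":
        # Every output argument appears in the reference with the same value.
        return all(
            k in reference_args and reference_args[k] == v
            for k, v in output_args.items()
        )
    if rule == "superset":
        # Every reference argument appears in the output with the same value.
        return all(
            k in output_args and output_args[k] == v
            for k, v in reference_args.items()
        )
    raise ValueError(f"unknown tool_args_match_mode: {rule!r}")
```

For example, passing `overrides={"search": lambda out, ref: out["q"].lower() == ref["q"].lower()}` would make the `search` tool compare its query case-insensitively while every other tool keeps the global mode.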
### Requirement: Trajectory LLM-as-judge
The system SHALL provide `create_trajectory_llm_as_judge()` that uses an LLM to grade trajectory quality and accuracy.
#### Scenario: Trajectory is formatted as XML for LLM
- **WHEN** an LLM trajectory evaluator is invoked
- **THEN** the trajectory SHALL be formatted as XML with `<role>`, `<tool_call>`, and `<tool_result>` elements
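A rough sketch of such a formatter is shown below. Only the three element names come from the requirement. The `<trajectory>`, `<message>`, and `<content>` wrappers, the `name` attribute, the JSON-encoded arguments, and the message dictionary shape (`role`, `content`, `tool_calls`) are all assumptions made for illustration, and `format_trajectory_xml` is a hypothetical name.

```python
import json
import xml.etree.ElementTree as ET
from typing import Any


def format_trajectory_xml(messages: list[dict[str, Any]]) -> str:
    """Serialize a trajectory to XML for an LLM judge prompt (hypothetical helper)."""
    root = ET.Element("trajectory")  # ASSUMPTION: wrapper element
    for msg in messages:
        node = ET.SubElement(root, "message")  # ASSUMPTION: wrapper element
        ET.SubElement(node, "role").text = msg["role"]
        if msg["role"] == "tool":
            # Tool outputs become <tool_result> elements.
            ET.SubElement(node, "tool_result").text = str(msg.get("content", ""))
            continue
        if msg.get("content"):
            ET.SubElement(node, "content").text = msg["content"]
        for call in msg.get("tool_calls", []):
            # ASSUMPTION: tool name as an attribute, arguments as JSON text.
            tool_call = ET.SubElement(node, "tool_call", name=call["name"])
            tool_call.text = json.dumps(call.get("args", {}), sort_keys=True)
    # ElementTree escapes &, <, and > in text, so the result stays well-formed.
    return ET.tostring(root, encoding="unicode")
```

Building the tree with `xml.etree.ElementTree` rather than string concatenation keeps tool output containing markup characters from breaking the prompt structure.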