indifferentketchup c935687725 chore(openspec): drop 9 superseded proposals + 11 stub archive files
Drop 9 batch proposals that are superseded by the boocode-lift-analysis
(boocontext-audit, conductor upgrades, self-healing/verify-gate skills):
add-3tier-memory, import-llm-evaluator, import-pregel-engine, plugin-platform,
conductor-evolution, code-intelligence-upgrade, dev-workflow, ui-overhaul,
agent-reliability.

Delete 11 stub archive files (49-66B each, 'Status: Shipped. Archived.' only)
that provide zero documentation value over the existing CHANGELOG.md + git tags.
2026-06-07 22:15:38 +00:00

ADDED Requirements

Requirement: Trajectory match evaluator

The system SHALL provide create_trajectory_match_evaluator() that compares agent tool-call trajectories against reference trajectories.

Parameters:

  • trajectory_match_mode: "strict" | "unordered" | "subset" | "superset": how the output's sequence of tool calls is compared with the reference sequence
  • tool_args_match_mode: "exact" | "ignore" | "subset" | "superset": how the arguments of tool calls with the same name are compared
  • tool_args_match_overrides?: Record<string, ToolArgsMatchMode | Callable | string[]>: per-tool override of tool_args_match_mode, keyed by tool name

Scenario: Strict mode requires exact order

  • WHEN output trajectory has tool calls [A, B] and reference is [A, B]
  • THEN strict mode SHALL return score true
  • WHEN output trajectory has tool calls [B, A] and reference is [A, B]
  • THEN strict mode SHALL return score false

Scenario: Unordered mode ignores order

  • WHEN output trajectory has tool calls [B, A] and reference is [A, B]
  • THEN unordered mode SHALL return score true

Scenario: Subset mode accepts partial trajectory

  • WHEN output trajectory has tool calls [A] and reference is [A, B]
  • THEN subset mode SHALL return score true

Scenario: Superset mode allows extra tool calls

  • WHEN output trajectory has tool calls [A, B, C] and reference is [A, B]
  • THEN superset mode SHALL return score true
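
The four trajectory matching scenarios above can be sketched as a comparison over tool names alone. This is a minimal illustration, not the system's implementation: the helper name trajectory_names_match is hypothetical, and treating repeated tool calls as a multiset (so counts matter) is an assumption the requirement does not spell out.

```python
from collections import Counter
from typing import Sequence


def trajectory_names_match(
    output: Sequence[str], reference: Sequence[str], mode: str
) -> bool:
    """Compare two tool-call name sequences under one of the four modes.

    Hypothetical helper: arguments are ignored here, and repeated calls
    are counted (multiset semantics), which is an assumption.
    """
    out_counts, ref_counts = Counter(output), Counter(reference)
    if mode == "strict":
        # Same calls in the same order.
        return list(output) == list(reference)
    if mode == "unordered":
        # Same calls, any order.
        return out_counts == ref_counts
    if mode == "subset":
        # Every output call is covered by the reference.
        return all(ref_counts[name] >= n for name, n in out_counts.items())
    if mode == "superset":
        # Every reference call is covered by the output.
        return all(out_counts[name] >= n for name, n in ref_counts.items())
    raise ValueError(f"unknown trajectory_match_mode: {mode!r}")
```

Each scenario then maps to one call, for example trajectory_names_match(["B", "A"], ["A", "B"], "unordered").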

Scenario: Tool args ignore mode skips argument comparison

  • WHEN tool_args_match_mode="ignore" is set
  • THEN tool calls with the same name SHALL match regardless of their arguments

Scenario: Custom tool arg matcher is used

  • WHEN tool_args_match_overrides contains a Callable for a tool name
  • THEN that callable SHALL be invoked to compare the tool's arguments
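
One way the argument modes and per-tool overrides could fit together is sketched below. The function name args_match, the (output_args, reference_args) callable signature, and the reading of a string list as "only these argument names are compared" are all assumptions made for illustration; the requirement fixes only the parameter names and the fact that a Callable override is invoked.

```python
from typing import Any, Callable, Mapping, Optional, Sequence, Union

Args = Mapping[str, Any]
# Hypothetical override type: a mode name, a comparison callable,
# or a list of argument names to compare.
Override = Union[str, Callable[[Args, Args], bool], Sequence[str]]


def args_match(
    tool_name: str,
    output_args: Args,
    reference_args: Args,
    mode: str = "exact",
    overrides: Optional[Mapping[str, Override]] = None,
) -> bool:
    """Compare one tool call's arguments, honoring a per-tool override."""
    rule = (overrides or {}).get(tool_name, mode)
    if callable(rule):
        # Custom matcher decides (assumed signature: output, reference).
        return bool(rule(output_args, reference_args))
    if isinstance(rule, (list, tuple)):
        # Only the listed argument names are compared; a name missing
        # from both sides counts as equal in this sketch.
        return all(output_args.get(k) == reference_args.get(k) for k in rule)
    if rule == "ignore":
        return True
    if rule == "exact":
        return dict(output_args) == dict(reference_args)
    if rule == "subset":
        # Assumed direction: output arguments are contained in the reference.
        return all(
            k in reference_args and reference_args[k] == v
            for k, v in output_args.items()
        )
    if rule == "superset":
        # Assumed direction: output arguments contain the reference.
        return all(
            k in output_args and output_args[k] == v
            for k, v in reference_args.items()
        )
    raise ValueError(f"unknown tool_args_match_mode: {rule!r}")
```

For instance, overrides={"search": lambda out, ref: out["q"].lower() == ref["q"].lower()} would make the hypothetical search tool compare queries case-insensitively while every other tool keeps the global mode.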

Requirement: Trajectory LLM-as-judge

The system SHALL provide create_trajectory_llm_as_judge() that uses an LLM to grade trajectory quality and accuracy.

Scenario: Trajectory is formatted as XML for LLM

  • WHEN an LLM trajectory evaluator is invoked
  • THEN the trajectory SHALL be formatted as XML with <role>, <tool_call>, <tool_result> elements
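
A possible shape for the XML formatting step is sketched below. The requirement names only the <role>, <tool_call>, and <tool_result> elements; the <trajectory>, <message>, and <content> wrappers, the name attribute, the message dictionary layout, and the function name format_trajectory_as_xml are assumptions made for illustration.

```python
import json
import xml.etree.ElementTree as ET
from typing import Any, Mapping, Sequence


def format_trajectory_as_xml(messages: Sequence[Mapping[str, Any]]) -> str:
    """Render a trajectory as an XML string for inclusion in a judge prompt.

    Assumed message layout: {"role": ..., "content": ..., "tool_calls":
    [{"name": ..., "args": {...}}]}; messages with role "tool" carry a
    tool result in "content".
    """
    root = ET.Element("trajectory")
    for msg in messages:
        node = ET.SubElement(root, "message")
        ET.SubElement(node, "role").text = msg["role"]
        if msg["role"] == "tool":
            ET.SubElement(node, "tool_result").text = str(msg.get("content", ""))
            continue
        if msg.get("content"):
            ET.SubElement(node, "content").text = str(msg["content"])
        for call in msg.get("tool_calls", []):
            call_node = ET.SubElement(node, "tool_call", name=call["name"])
            # Arguments are serialized as JSON text inside the element.
            call_node.text = json.dumps(call.get("args", {}), sort_keys=True)
    # ElementTree escapes &, <, and > in text, so the output stays well-formed.
    return ET.tostring(root, encoding="unicode")
```

Building the tree with ElementTree rather than string concatenation keeps tool output that contains markup characters from breaking the prompt structure.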