chore(openspec): drop 9 superseded proposals + 11 stub archive files

Drop 9 batch proposals that are superseded by the boocode-lift-analysis
(boocontext-audit, conductor upgrades, self-healing/verify-gate skills):
add-3tier-memory, import-llm-evaluator, import-pregel-engine, plugin-platform,
conductor-evolution, code-intelligence-upgrade, dev-workflow, ui-overhaul,
agent-reliability.

Delete 11 stub archive files (49 to 66 bytes each, containing only 'Status: Shipped. Archived.')
that add no documentation value beyond the existing CHANGELOG.md and git tags.
2026-06-07 22:15:38 +00:00
parent 0d6e9a2413
commit c935687725
119 changed files with 4897 additions and 45 deletions


@@ -0,0 +1,49 @@
## ADDED Requirements
### Requirement: Built-in evaluation prompt templates
The system SHALL ship with a library of prompt templates, organized by domain, that are ready for use with `create_llm_as_judge()`.

Domains and included prompts:

**Quality:**

- `CORRECTNESS_PROMPT` — factual accuracy and completeness
- `CONCISENESS_PROMPT` — concise responses without hedging or fluff
- `HALLUCINATION_PROMPT` — claims verifiable from context
- `ANSWER_RELEVANCE_PROMPT` — output addresses the input question
- `PLAN_ADHERENCE_PROMPT` — agent actions match declared plan
- `LAZINESS_PROMPT` — detects blank or low-effort responses

**RAG:**

- `RAG_GROUNDEDNESS_PROMPT` — output claims supported by retrieved context
- `RAG_HELPFULNESS_PROMPT` — output addresses the core question of the input
- `RAG_RETRIEVAL_RELEVANCE_PROMPT` — retrieved context is relevant to input

**Safety:**

- `TOXICITY_PROMPT` — personal attacks, hate speech
- `FAIRNESS_PROMPT` — stereotyping, discrimination

**Security:**

- `PII_LEAKAGE_PROMPT` — names, contact info, credentials in output
- `PROMPT_INJECTION_PROMPT` — delimiter manipulation, roleplay bypass
- `CODE_INJECTION_PROMPT` — SQL injection, XSS, path traversal

**Trajectory:**

- `TRAJECTORY_ACCURACY_PROMPT` — logical progression, goal alignment
- `TRAJECTORY_ACCURACY_PROMPT_WITH_REFERENCE` — semantically equivalent to reference
- `TOOL_SELECTION_PROMPT` — right tools, right order, no redundant calls

**Conversation:**

- `USER_SATISFACTION_PROMPT` — gratitude, resolution, engagement
- `TASK_COMPLETION_PROMPT` — whether the user's goal was achieved
- `AGENT_TONE_PROMPT` — appropriate tone and professionalism
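
As a rough illustration of what "ready for use with `create_llm_as_judge()`" implies, the following sketch binds a template to a model callable and renders it with the three placeholder names. `make_judge` and `call_model` are hypothetical stand-ins, not the real `create_llm_as_judge()` signature, and the example "model" simply echoes the rendered prompt.

```python
from string import Formatter
from typing import Callable

ALLOWED_FIELDS = {"inputs", "outputs", "reference_outputs"}


def make_judge(prompt: str, call_model: Callable[[str], str]) -> Callable[..., str]:
    """Hypothetical stand-in for create_llm_as_judge(): bind a template to a model."""
    # Reject templates that reference placeholders the judge cannot supply.
    fields = {field for _, field, _, _ in Formatter().parse(prompt) if field}
    unknown = fields - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unsupported placeholders: {sorted(unknown)}")

    def evaluate(*, outputs: str, inputs: str = "", reference_outputs: str = "") -> str:
        # str.format() ignores keyword arguments the template does not use,
        # so every template can be rendered with the same three names.
        rendered = prompt.format(
            inputs=inputs, outputs=outputs, reference_outputs=reference_outputs
        )
        return call_model(rendered)

    return evaluate


# Usage with an echo "model" so the rendered prompt is visible.
judge = make_judge("<input>{inputs}</input>\n<output>{outputs}</output>", lambda p: p)
print(judge(inputs="What is 2 + 2?", outputs="4"))
```

Validating placeholders when the judge is created, rather than at evaluation time, surfaces a malformed template before any model call is made.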
#### Scenario: Each prompt is a format string using `{inputs}`, `{outputs}`, and/or `{reference_outputs}` placeholders
- **WHEN** a prompt template is inspected
- **THEN** it SHALL be a string compatible with `str.format()` containing at least `{outputs}`
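
This scenario could be checked mechanically. The sketch assumes that "compatible with `str.format()`" means the template renders when given exactly the three named placeholders; `check_template` is a hypothetical helper, not part of the proposal.

```python
from string import Formatter

# Assumed placeholder names, taken from the scenario title.
ALLOWED = ("inputs", "outputs", "reference_outputs")


def check_template(template: object) -> list[str]:
    """Return a list of problems; an empty list means the template passes."""
    if not isinstance(template, str):
        return ["not a string"]
    try:
        # Unmatched braces raise ValueError, positional fields raise IndexError,
        # and unknown named fields raise KeyError.
        template.format(**{name: "" for name in ALLOWED})
    except (KeyError, IndexError, ValueError, AttributeError) as exc:
        return [f"not str.format()-compatible: {exc!r}"]
    fields = {field for _, field, _, _ in Formatter().parse(template) if field}
    if "outputs" not in fields:
        return ["missing {outputs} placeholder"]
    return []
```

Escaped braces such as `{{` and `}}` are treated as literal text by both `str.format()` and `Formatter.parse()`, so templates that embed JSON examples still pass as long as they escape them.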
#### Scenario: Prompt templates follow rubric structure
- **WHEN** a prompt template is read
- **THEN** it SHALL contain `<Rubric>`, `<Instructions>`, and `<Reminder>` XML sections
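
A minimal check for the rubric structure, assuming that a "section" means a matching opening and closing tag pair and that section order is not constrained; `missing_sections` is a hypothetical helper name.

```python
import re

# Section names taken from the scenario above.
REQUIRED_SECTIONS = ("Rubric", "Instructions", "Reminder")


def missing_sections(template: str) -> list[str]:
    """Return the required sections that lack an <X>...</X> pair, in spec order."""
    missing = []
    for tag in REQUIRED_SECTIONS:
        # DOTALL lets a section body span multiple lines; .*? allows an empty body.
        if re.search(rf"<{tag}>.*?</{tag}>", template, flags=re.DOTALL) is None:
            missing.append(tag)
    return missing
```

A regex is enough here because the sections are simple, non-nested tags; a full XML parser would reject the templates anyway, since their surrounding text is not well-formed XML.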