boocode/openspec/changes/archived/2026-06-07-eval-sandbox-agent-runtime/specs/eval-prompt-library/spec.md

## ADDED Requirements

### Requirement: Built-in evaluation prompt templates

The system SHALL ship with a library of prompt templates organized by domain, ready for use with `create_llm_as_judge()`.

Domains and included prompts:

**Quality:**
- `CORRECTNESS_PROMPT` — factual accuracy and completeness
- `CONCISENESS_PROMPT` — concise responses without hedging or fluff
- `HALLUCINATION_PROMPT` — claims verifiable from context
- `ANSWER_RELEVANCE_PROMPT` — output addresses the input question
- `PLAN_ADHERENCE_PROMPT` — agent actions match declared plan
- `LAZINESS_PROMPT` — detects blank or low-effort responses

**RAG:**
- `RAG_GROUNDEDNESS_PROMPT` — output claims supported by retrieved context
- `RAG_HELPFULNESS_PROMPT` — output addresses core question
- `RAG_RETRIEVAL_RELEVANCE_PROMPT` — retrieved context is relevant to input

**Safety:**
- `TOXICITY_PROMPT` — personal attacks, hate speech
- `FAIRNESS_PROMPT` — stereotyping, discrimination

**Security:**
- `PII_LEAKAGE_PROMPT` — names, contact info, credentials in output
- `PROMPT_INJECTION_PROMPT` — delimiter manipulation, roleplay bypass
- `CODE_INJECTION_PROMPT` — SQL injection, XSS, path traversal

**Trajectory:**
- `TRAJECTORY_ACCURACY_PROMPT` — logical progression, goal alignment
- `TRAJECTORY_ACCURACY_PROMPT_WITH_REFERENCE` — semantically equivalent to reference
- `TOOL_SELECTION_PROMPT` — right tools, right order, no redundant calls

**Conversation:**
- `USER_SATISFACTION_PROMPT` — gratitude, resolution, engagement
- `TASK_COMPLETION_PROMPT` — was the user's goal achieved
- `AGENT_TONE_PROMPT` — appropriate tone and professionalism
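
The domain grouping above can be sketched as a plain mapping from domain to prompt names. This is only an illustration: the spec does not say how the library is laid out in code, so `PROMPTS_BY_DOMAIN` and `domain_of` are hypothetical names, and the mapping holds names only, not the template text.

```python
# Hypothetical registry mirroring the domain list in this spec.
# The real module layout is not specified; this only records the grouping.
PROMPTS_BY_DOMAIN: dict[str, tuple[str, ...]] = {
    "quality": (
        "CORRECTNESS_PROMPT",
        "CONCISENESS_PROMPT",
        "HALLUCINATION_PROMPT",
        "ANSWER_RELEVANCE_PROMPT",
        "PLAN_ADHERENCE_PROMPT",
        "LAZINESS_PROMPT",
    ),
    "rag": (
        "RAG_GROUNDEDNESS_PROMPT",
        "RAG_HELPFULNESS_PROMPT",
        "RAG_RETRIEVAL_RELEVANCE_PROMPT",
    ),
    "safety": ("TOXICITY_PROMPT", "FAIRNESS_PROMPT"),
    "security": (
        "PII_LEAKAGE_PROMPT",
        "PROMPT_INJECTION_PROMPT",
        "CODE_INJECTION_PROMPT",
    ),
    "trajectory": (
        "TRAJECTORY_ACCURACY_PROMPT",
        "TRAJECTORY_ACCURACY_PROMPT_WITH_REFERENCE",
        "TOOL_SELECTION_PROMPT",
    ),
    "conversation": (
        "USER_SATISFACTION_PROMPT",
        "TASK_COMPLETION_PROMPT",
        "AGENT_TONE_PROMPT",
    ),
}


def domain_of(prompt_name: str) -> str:
    """Return the domain a prompt name is listed under, or raise KeyError."""
    for domain, names in PROMPTS_BY_DOMAIN.items():
        if prompt_name in names:
            return domain
    raise KeyError(prompt_name)
```

A registry like this makes it cheap to assert in a test that every listed prompt is actually exported and that no name appears under two domains.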

#### Scenario: Each prompt is a string with `{inputs}`, `{outputs}`, `{reference_outputs}` placeholders

- **WHEN** a prompt template is inspected
- **THEN** it SHALL be a string compatible with `str.format()` containing at least `{outputs}`
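
One way to check this scenario is to parse the template with the standard library's `string.Formatter` and look at the field names it reports. The sketch below is a minimal illustration, not the project's actual test; `EXAMPLE_PROMPT`, `placeholder_names`, and `is_valid_template` are hypothetical names introduced here.

```python
from string import Formatter

# Hypothetical template for illustration only; not one of the shipped prompts.
EXAMPLE_PROMPT = (
    "Grade the answer below.\n"
    "Question: {inputs}\n"
    "Answer: {outputs}\n"
    "Reference: {reference_outputs}\n"
)


def placeholder_names(template: str) -> set[str]:
    """Return the named replacement fields found in a str.format() template."""
    # Formatter.parse yields (literal_text, field_name, format_spec, conversion);
    # field_name is None when a chunk is literal text with no field after it.
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def is_valid_template(template: object) -> bool:
    """True if template is a str.format()-compatible string containing {outputs}."""
    if not isinstance(template, str):
        return False
    try:
        names = placeholder_names(template)
    except ValueError:  # raised for malformed braces such as a lone "{"
        return False
    return "outputs" in names
```

Parsing rather than calling `format()` directly keeps the check independent of which placeholders a given prompt uses. Note that any literal braces in a template (for example, a JSON snippet inside the rubric) would need to be doubled as `{{` and `}}` to stay `str.format()`-compatible.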

#### Scenario: Prompt templates follow rubric structure

- **WHEN** a prompt template is read
- **THEN** it SHALL contain `<Rubric>`, `<Instructions>`, and `<Reminder>` XML-style sections
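
The rubric-structure scenario could be verified with a small regular-expression check. The sketch below assumes each section is written as a matching open and close tag pair (for example `<Rubric>` ... `</Rubric>`), which the spec implies but does not state outright; `EXAMPLE_PROMPT` and `missing_sections` are hypothetical names used only for illustration.

```python
import re

REQUIRED_SECTIONS = ("Rubric", "Instructions", "Reminder")

# Hypothetical template for illustration only; not one of the shipped prompts.
EXAMPLE_PROMPT = """\
<Rubric>
A good answer is accurate and complete.
</Rubric>

<Instructions>
Compare the output against the input question.
</Instructions>

<Reminder>
Judge only what is written in the output.
</Reminder>

Input: {inputs}
Output: {outputs}
"""


def missing_sections(template: str) -> list[str]:
    """Return the required section names that lack an open/close tag pair."""
    missing = []
    for tag in REQUIRED_SECTIONS:
        # Assumption: a section counts only if both <Tag> and </Tag> appear,
        # in that order; DOTALL lets the section body span multiple lines.
        pattern = rf"<{tag}>.*?</{tag}>"
        if not re.search(pattern, template, flags=re.DOTALL):
            missing.append(tag)
    return missing
```

Returning the list of missing names, rather than a bare boolean, gives a more useful failure message when the check is run over every template in the library.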