## ADDED Requirements ### Requirement: Built-in evaluation prompt templates The system SHALL ship with a library of prompt templates organized by domain, ready for use with `create_llm_as_judge()`. Domains and included prompts: **Quality:** - `CORRECTNESS_PROMPT` — factual accuracy and completeness - `CONCISENESS_PROMPT` — concise responses without hedging or fluff - `HALLUCINATION_PROMPT` — claims verifiable from context - `ANSWER_RELEVANCE_PROMPT` — output addresses the input question - `PLAN_ADHERENCE_PROMPT` — agent actions match declared plan - `LAZINESS_PROMPT` — detects blank or low-effort responses **RAG:** - `RAG_GROUNDEDNESS_PROMPT` — output claims supported by retrieved context - `RAG_HELPFULNESS_PROMPT` — output addresses core question - `RAG_RETRIEVAL_RELEVANCE_PROMPT` — retrieved context is relevant to input **Safety:** - `TOXICITY_PROMPT` — personal attacks, hate speech - `FAIRNESS_PROMPT` — stereotyping, discrimination **Security:** - `PII_LEAKAGE_PROMPT` — names, contact info, credentials in output - `PROMPT_INJECTION_PROMPT` — delimiter manipulation, roleplay bypass - `CODE_INJECTION_PROMPT` — SQL injection, XSS, path traversal **Trajectory:** - `TRAJECTORY_ACCURACY_PROMPT` — logical progression, goal alignment - `TRAJECTORY_ACCURACY_PROMPT_WITH_REFERENCE` — semantically equivalent to reference - `TOOL_SELECTION_PROMPT` — right tools, right order, no redundant calls **Conversation:** - `USER_SATISFACTION_PROMPT` — gratitude, resolution, engagement - `TASK_COMPLETION_PROMPT` — was the user's goal achieved - `AGENT_TONE_PROMPT` — appropriate tone and professionalism #### Scenario: Each prompt is a string with {inputs}, {outputs}, {reference_outputs} placeholders - **WHEN** a prompt template is inspected - **THEN** it SHALL be a string compatible with `str.format()` containing at least `{outputs}` #### Scenario: Prompt templates follow rubric structure - **WHEN** a prompt template is read - **THEN** it SHALL contain ``, ``, and `` XML sections