Drop 9 batch proposals that are superseded by the boocode-lift-analysis (boocontext-audit, conductor upgrades, self-healing/verify-gate skills): add-3tier-memory, import-llm-evaluator, import-pregel-engine, plugin-platform, conductor-evolution, code-intelligence-upgrade, dev-workflow, ui-overhaul, agent-reliability. Delete 11 stub archive files (49-66B each, 'Status: Shipped. Archived.' only) that provide zero documentation value over the existing CHANGELOG.md + git tags.
50 lines
2.0 KiB
Markdown
50 lines
2.0 KiB
Markdown
## ADDED Requirements
|
|
|
|
### Requirement: Built-in evaluation prompt templates
|
|
|
|
The system SHALL ship with a library of prompt templates organized by domain, ready for use with `create_llm_as_judge()`.
|
|
|
|
Domains and included prompts:
|
|
|
|
**Quality:**
|
|
- `CORRECTNESS_PROMPT` — factual accuracy and completeness
|
|
- `CONCISENESS_PROMPT` — concise responses without hedging or fluff
|
|
- `HALLUCINATION_PROMPT` — claims verifiable from context
|
|
- `ANSWER_RELEVANCE_PROMPT` — output addresses the input question
|
|
- `PLAN_ADHERENCE_PROMPT` — agent actions match declared plan
|
|
- `LAZINESS_PROMPT` — detects blank or low-effort responses
|
|
|
|
**RAG:**
|
|
- `RAG_GROUNDEDNESS_PROMPT` — output claims supported by retrieved context
|
|
- `RAG_HELPFULNESS_PROMPT` — output addresses core question
|
|
- `RAG_RETRIEVAL_RELEVANCE_PROMPT` — retrieved context is relevant to input
|
|
|
|
**Safety:**
|
|
- `TOXICITY_PROMPT` — personal attacks, hate speech
|
|
- `FAIRNESS_PROMPT` — stereotyping, discrimination
|
|
|
|
**Security:**
|
|
- `PII_LEAKAGE_PROMPT` — names, contact info, credentials in output
|
|
- `PROMPT_INJECTION_PROMPT` — delimiter manipulation, roleplay bypass
|
|
- `CODE_INJECTION_PROMPT` — SQL injection, XSS, path traversal
|
|
|
|
**Trajectory:**
|
|
- `TRAJECTORY_ACCURACY_PROMPT` — logical progression, goal alignment
|
|
- `TRAJECTORY_ACCURACY_PROMPT_WITH_REFERENCE` — semantically equivalent to reference
|
|
- `TOOL_SELECTION_PROMPT` — right tools, right order, no redundant calls
|
|
|
|
**Conversation:**
|
|
- `USER_SATISFACTION_PROMPT` — gratitude, resolution, engagement
|
|
- `TASK_COMPLETION_PROMPT` — was the user's goal achieved
|
|
- `AGENT_TONE_PROMPT` — appropriate tone and professionalism
|
|
|
|
#### Scenario: Each prompt is a string with {inputs}, {outputs}, {reference_outputs} placeholders
|
|
|
|
- **WHEN** a prompt template is inspected
|
|
- **THEN** it SHALL be a string compatible with `str.format()` containing at least `{outputs}`
|
|
|
|
#### Scenario: Prompt templates follow rubric structure
|
|
|
|
- **WHEN** a prompt template is read
|
|
- **THEN** it SHALL contain `<Rubric>`, `<Instructions>`, and `<Reminder>` XML sections
|