Drop 9 batch proposals that are superseded by the boocode-lift-analysis (boocontext-audit, conductor upgrades, self-healing/verify-gate skills): add-3tier-memory, import-llm-evaluator, import-pregel-engine, plugin-platform, conductor-evolution, code-intelligence-upgrade, dev-workflow, ui-overhaul, agent-reliability. Delete 11 stub archive files (49-66B each, 'Status: Shipped. Archived.' only) that provide zero documentation value over the existing CHANGELOG.md + git tags.
2.0 KiB
2.0 KiB
ADDED Requirements
Requirement: Built-in evaluation prompt templates
The system SHALL ship with a library of prompt templates organized by domain, ready for use with create_llm_as_judge().
Domains and included prompts:
Quality:
CORRECTNESS_PROMPT— factual accuracy and completenessCONCISENESS_PROMPT— concise responses without hedging or fluffHALLUCINATION_PROMPT— claims verifiable from contextANSWER_RELEVANCE_PROMPT— output addresses the input questionPLAN_ADHERENCE_PROMPT— agent actions match declared planLAZINESS_PROMPT— detects blank or low-effort responses
RAG:
RAG_GROUNDEDNESS_PROMPT— output claims supported by retrieved contextRAG_HELPFULNESS_PROMPT— output addresses core questionRAG_RETRIEVAL_RELEVANCE_PROMPT— retrieved context is relevant to input
Safety:
TOXICITY_PROMPT— personal attacks, hate speechFAIRNESS_PROMPT— stereotyping, discrimination
Security:
PII_LEAKAGE_PROMPT— names, contact info, credentials in outputPROMPT_INJECTION_PROMPT— delimiter manipulation, roleplay bypassCODE_INJECTION_PROMPT— SQL injection, XSS, path traversal
Trajectory:
TRAJECTORY_ACCURACY_PROMPT— logical progression, goal alignmentTRAJECTORY_ACCURACY_PROMPT_WITH_REFERENCE— semantically equivalent to referenceTOOL_SELECTION_PROMPT— right tools, right order, no redundant calls
Conversation:
USER_SATISFACTION_PROMPT— gratitude, resolution, engagementTASK_COMPLETION_PROMPT— was the user's goal achievedAGENT_TONE_PROMPT— appropriate tone and professionalism
Scenario: Each prompt is a string with {inputs}, {outputs}, {reference_outputs} placeholders
- WHEN a prompt template is inspected
- THEN it SHALL be a string compatible with
str.format()containing at least{outputs}
Scenario: Prompt templates follow rubric structure
- WHEN a prompt template is read
- THEN it SHALL contain
<Rubric>,<Instructions>, and<Reminder>XML sections