v1.13.12: skills audit + token-tracking fix + codecontext + cap50 + UI cleanups

Multi-topic batch. The big-ticket item is the skills audit; the rest are smaller patches that compounded during the audit work.

## Skills audit (rules→recipes split)

Vendored all 26 skills from /home/samkintop/opt/skills/ into data/skills/ (the boocode-repo-local skill library; see docker-compose change below). Audited via 5 parallel Claude Code agent-teams running the mgechev/skills-best-practices 4-step protocol (Discovery → Logic → Edge Case → self-Architecture-Refinement) per skill, ~2 min wall-clock vs the ~3.7-hour serial estimate.

Result: 14 skills surviving (renamed to gerund form, frontmatter matched), 11 deleted (duplicates, BooCode-irrelevant patterns, Claude-already-does-natively), 1 migrated to BOOCHAT.md/BOOCODER.md as an always-true rule (verification-before-completion). Each surviving skill had its description refined to fix specific trigger gaps surfaced by the protocol; 4 real-bug findings landed (dead refs, stale tags, broken sub-file references in the original vendored content).

Audit decisions documented in openspec/changes/v1.13.12-skills-audit/audit-notes.md. Convention codified in BOOCHAT.md/BOOCODER.md "rules vs recipes" sections: future workflow rules go to those files (100% present), recipes stay in data/skills/ (~6% invoke rate in multi-turn per the Codeminer42 measurement).

## Token tracking + stale-stream banner fix (same root cause)

ws-frames.ts IsoTimestamp was z.string().min(1) but postgres returns timestamp columns as JS Date objects. Every message_complete / session_updated / chat_updated frame was failing the v1.13.11 Zod gate and being silently dropped.

Symptoms: token tracking blank in the UI (no usage frames landed); the 60s no-token-activity timer tripped the stale-stream banner because the frontend's local message state never saw status='streaming' flip to 'complete'.

Fix: z.preprocess(v => v instanceof Date ? v.toISOString() : v, z.string().min(1)) applied to the IsoTimestamp primitive. Centralized, no publisher changes, works identically server + web (the parity test still passes).

## Codecontext .codecontextignore auto-install

services/codecontext_client.ts now copies the codecontext/.codecontextignore.template into any project's root on the first call to that project if no .codecontextignore exists. One file written per project, idempotent (in-memory Set guard + access-check), silent fallback on read-only project. Stops the upstream empty-source-file parser crash on foreign projects' node_modules, which previously required manually copying the template per project.

## Tool-call budget cap 30 → 50

services/inference/budget.ts: BUDGET_READ_ONLY and BUDGET_NO_AGENT bumped to 50 (from 30). BUDGET_NON_READ_ONLY stays at 10 (no write tools landed yet). Real recon sessions were hitting 30 with ~3 turns wasted on codecontext parse failures; legitimate need was ~27, and Architect-class system overviews want deeper recon. Headroom of 20 absorbs failure-retry turns without changing the safety floor: the doom-loop guard (3 identical calls → abort) catches the actual failure mode this cap was guarding against.

v1.14 (Phase C outer agent loop) will supersede this via per-agent agent.steps. Throwaway-ish patch but unblocks deeper recon today.

## UI cleanups

- ChatPane queued-message dropdown removed. Each queued message now has three buttons: edit (pop back into ChatInput via sendToChat event), force-send (was the dropdown's only useful action), and cancel. Default behavior (send when streaming completes) needs no UI; it's the implicit do-nothing path.
- ChatThroughput removed from desktop tab strip (ChatTabBar.tsx). Mobile tab switcher still shows it.

## Plumbing

- .gitignore: data/* + !data/AGENTS.md + !data/skills/ negation patterns so the vendored skill library + agent registry become git-tracked while session DB state stays out.
- docker-compose.yml: removed /opt/skills:/data/skills override mount. Skills now live in the boocode repo at data/skills/, auditable per-batch. The host-level /opt/skills/ is preserved untouched for any other tools that read from it.
- .codecontextignore at repo root: auto-installed when codecontext was first called against /opt/boocode itself; matches the template.
- CLAUDE.md: updated to document the v1.13.11 publishFrame wrapper + message_parts table + tool_cost_stats view + DB-integration test pattern + host-side smoke endpoint quirk. (Pre-existing in working tree before this batch; shipped here for completeness.)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
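
Illustrative note on the IsoTimestamp fix described in the commit message: the sketch below shows the normalization step in isolation, without Zod, so it can be run on its own. The helper name normalizeIsoTimestamp is hypothetical and does not appear in the repo; the commit itself applies the same Date-to-string conversion through z.preprocess in ws-frames.ts.

```typescript
// Dependency-free sketch of the normalization step. The commit applies the
// same idea through z.preprocess on the IsoTimestamp primitive.
// normalizeIsoTimestamp is a hypothetical helper name, not taken from the repo.
function normalizeIsoTimestamp(value: unknown): string {
  // Date objects (what the postgres driver returns for timestamp columns)
  // are converted to ISO 8601 strings; everything else passes through.
  const candidate = value instanceof Date ? value.toISOString() : value;

  // Mirror z.string().min(1): only non-empty strings are accepted.
  if (typeof candidate !== "string" || candidate.length < 1) {
    throw new TypeError("IsoTimestamp must be a non-empty string or a Date");
  }
  return candidate;
}

// Usage: a Date and an already-serialized string both yield a string.
const fromDate = normalizeIsoTimestamp(new Date(0));
const fromString = normalizeIsoTimestamp("2026-05-22T18:58:30Z");
console.log(fromDate, fromString);
```

Doing the conversion at the schema primitive rather than in each publisher matches the "centralized, no publisher changes" point: every frame type that uses the primitive picks up the fix at once.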
2026-05-22 18:58:30 +00:00
parent bc376c878d
commit 0fa46cd06c
80 changed files with 6950 additions and 39 deletions
--- a/data/skills/anthropics/developing-agents/references/system-prompt-design.md
+++ b/data/skills/anthropics/developing-agents/references/system-prompt-design.md
@@ -0,0 +1,411 @@
+# System Prompt Design Patterns
+
+Complete guide to writing effective agent system prompts that enable autonomous, high-quality operation.
+
+## Core Structure
+
+Every agent system prompt should follow this proven structure:
+
+```markdown
+You are [specific role] specializing in [specific domain].
+
+**Your Core Responsibilities:**
+1. [Primary responsibility - the main task]
+2. [Secondary responsibility - supporting task]
+3. [Additional responsibilities as needed]
+
+**[Task Name] Process:**
+1. [First concrete step]
+2. [Second concrete step]
+3. [Continue with clear steps]
+[...]
+
+**Quality Standards:**
+- [Standard 1 with specifics]
+- [Standard 2 with specifics]
+- [Standard 3 with specifics]
+
+**Output Format:**
+Provide results structured as:
+- [Component 1]
+- [Component 2]
+- [Include specific formatting requirements]
+
+**Edge Cases:**
+Handle these situations:
+- [Edge case 1]: [Specific handling approach]
+- [Edge case 2]: [Specific handling approach]
+```
+
+## Pattern 1: Analysis Agents
+
+For agents that analyze code, PRs, or documentation:
+
+```markdown
+You are an expert [domain] analyzer specializing in [specific analysis type].
+
+**Your Core Responsibilities:**
+1. Thoroughly analyze [what] for [specific issues]
+2. Identify [patterns/problems/opportunities]
+3. Provide actionable recommendations
+
+**Analysis Process:**
+1. **Gather Context**: Read [what] using available tools
+2. **Initial Scan**: Identify obvious [issues/patterns]
+3. **Deep Analysis**: Examine [specific aspects]:
+   - [Aspect 1]: Check for [criteria]
+   - [Aspect 2]: Verify [criteria]
+   - [Aspect 3]: Assess [criteria]
+4. **Synthesize Findings**: Group related issues
+5. **Prioritize**: Rank by [severity/impact/urgency]
+6. **Generate Report**: Format according to output template
+
+**Quality Standards:**
+- Every finding includes file:line reference
+- Issues categorized by severity (critical/major/minor)
+- Recommendations are specific and actionable
+- Positive observations included for balance
+
+**Output Format:**
+## Summary
+[2-3 sentence overview]
+
+## Critical Issues
+- [file:line] - [Issue description] - [Recommendation]
+
+## Major Issues
+[...]
+
+## Minor Issues
+[...]
+
+## Recommendations
+[...]
+
+**Edge Cases:**
+- No issues found: Provide positive feedback and validation
+- Too many issues: Group and prioritize top 10
+- Unclear code: Request clarification rather than guessing
+```
+
+## Pattern 2: Generation Agents
+
+For agents that create code, tests, or documentation:
+
+```markdown
+You are an expert [domain] engineer specializing in creating high-quality [output type].
+
+**Your Core Responsibilities:**
+1. Generate [what] that meets [quality standards]
+2. Follow [specific conventions/patterns]
+3. Ensure [correctness/completeness/clarity]
+
+**Generation Process:**
+1. **Understand Requirements**: Analyze what needs to be created
+2. **Gather Context**: Read existing [code/docs/tests] for patterns
+3. **Design Structure**: Plan [architecture/organization/flow]
+4. **Generate Content**: Create [output] following:
+   - [Convention 1]
+   - [Convention 2]
+   - [Best practice 1]
+5. **Validate**: Verify [correctness/completeness]
+6. **Document**: Add comments/explanations as needed
+
+**Quality Standards:**
+- Follows project conventions (check CLAUDE.md)
+- [Specific quality metric 1]
+- [Specific quality metric 2]
+- Includes error handling
+- Well-documented and clear
+
+**Output Format:**
+Create [what] with:
+- [Structure requirement 1]
+- [Structure requirement 2]
+- Clear, descriptive naming
+- Comprehensive coverage
+
+**Edge Cases:**
+- Insufficient context: Ask user for clarification
+- Conflicting patterns: Follow most recent/explicit pattern
+- Complex requirements: Break into smaller pieces
+```
+
+## Pattern 3: Validation Agents
+
+For agents that validate, check, or verify:
+
+```markdown
+You are an expert [domain] validator specializing in ensuring [quality aspect].
+
+**Your Core Responsibilities:**
+1. Validate [what] against [criteria]
+2. Identify violations and issues
+3. Provide clear pass/fail determination
+
+**Validation Process:**
+1. **Load Criteria**: Understand validation requirements
+2. **Scan Target**: Read the [what] that needs validation
+3. **Check Rules**: For each rule:
+   - [Rule 1]: [Validation method]
+   - [Rule 2]: [Validation method]
+4. **Collect Violations**: Document each failure with details
+5. **Assess Severity**: Categorize issues
+6. **Determine Result**: Pass only if [criteria met]
+
+**Quality Standards:**
+- All violations include specific locations
+- Severity clearly indicated
+- Fix suggestions provided
+- No false positives
+
+**Output Format:**
+## Validation Result: [PASS/FAIL]
+
+## Summary
+[Overall assessment]
+
+## Violations Found: [count]
+### Critical ([count])
+- [Location]: [Issue] - [Fix]
+
+### Warnings ([count])
+- [Location]: [Issue] - [Fix]
+
+## Recommendations
+[How to fix violations]
+
+**Edge Cases:**
+- No violations: Confirm validation passed
+- Too many violations: Group by type, show top 20
+- Ambiguous rules: Document uncertainty, request clarification
+```
+
+## Pattern 4: Orchestration Agents
+
+For agents that coordinate multiple tools or steps:
+
+```markdown
+You are an expert [domain] orchestrator specializing in coordinating [complex workflow].
+
+**Your Core Responsibilities:**
+1. Coordinate [multi-step process]
+2. Manage [resources/tools/dependencies]
+3. Ensure [successful completion/integration]
+
+**Orchestration Process:**
+1. **Plan**: Understand full workflow and dependencies
+2. **Prepare**: Set up prerequisites
+3. **Execute Phases**:
+   - Phase 1: [What] using [tools]
+   - Phase 2: [What] using [tools]
+   - Phase 3: [What] using [tools]
+4. **Monitor**: Track progress and handle failures
+5. **Verify**: Confirm successful completion
+6. **Report**: Provide comprehensive summary
+
+**Quality Standards:**
+- Each phase completes successfully
+- Errors handled gracefully
+- Progress reported to user
+- Final state verified
+
+**Output Format:**
+## Workflow Execution Report
+
+### Completed Phases
+- [Phase]: [Result]
+
+### Results
+- [Output 1]
+- [Output 2]
+
+### Next Steps
+[If applicable]
+
+**Edge Cases:**
+- Phase failure: Attempt retry, then report and stop
+- Missing dependencies: Request from user
+- Timeout: Report partial completion
+```
+
+## Writing Style Guidelines
+
+### Tone and Voice
+
+**Use second person (addressing the agent):**
+```
+✅ You are responsible for...
+✅ You will analyze...
+✅ Your process should...
+
+❌ The agent is responsible for...
+❌ This agent will analyze...
+❌ I will analyze...
+```
+
+### Clarity and Specificity
+
+**Be specific, not vague:**
+```
+✅ Check for SQL injection by examining all database queries for parameterization
+❌ Look for security issues
+
+✅ Provide file:line references for each finding
+❌ Show where issues are
+
+✅ Categorize as critical (security), major (bugs), or minor (style)
+❌ Rate the severity of issues
+```
+
+### Actionable Instructions
+
+**Give concrete steps:**
+```
+✅ Read the file using the Read tool, then search for patterns using Grep
+❌ Analyze the code
+
+✅ Generate test file at test/path/to/file.test.ts
+❌ Create tests
+```
+
+## Common Pitfalls
+
+### ❌ Vague Responsibilities
+
+```markdown
+**Your Core Responsibilities:**
+1. Help the user with their code
+2. Provide assistance
+3. Be helpful
+```
+
+**Why bad:** Not specific enough to guide behavior.
+
+### ✅ Specific Responsibilities
+
+```markdown
+**Your Core Responsibilities:**
+1. Analyze TypeScript code for type safety issues
+2. Identify missing type annotations and improper 'any' usage
+3. Recommend specific type improvements with examples
+```
+
+### ❌ Missing Process Steps
+
+```markdown
+Analyze the code and provide feedback.
+```
+
+**Why bad:** Agent doesn't know HOW to analyze.
+
+### ✅ Clear Process
+
+```markdown
+**Analysis Process:**
+1. Read code files using Read tool
+2. Scan for type annotations on all functions
+3. Check for 'any' type usage
+4. Verify generic type parameters
+5. List findings with file:line references
+```
+
+### ❌ Undefined Output
+
+```markdown
+Provide a report.
+```
+
+**Why bad:** Agent doesn't know what format to use.
+
+### ✅ Defined Output Format
+
+```markdown
+**Output Format:**
+## Type Safety Report
+
+### Summary
+[Overview of findings]
+
+### Issues Found
+- `file.ts:42` - Missing return type on `processData`
+- `utils.ts:15` - Unsafe 'any' usage in parameter
+
+### Recommendations
+[Specific fixes with examples]
+```
+
+## Length Guidelines
+
+### Minimum Viable Agent
+
+**~500 words minimum:**
+- Role description
+- 3 core responsibilities
+- 5-step process
+- Output format
+
+### Standard Agent
+
+**~1,000-2,000 words:**
+- Detailed role and expertise
+- 5-8 responsibilities
+- 8-12 process steps
+- Quality standards
+- Output format
+- 3-5 edge cases
+
+### Comprehensive Agent
+
+**~2,000-5,000 words:**
+- Complete role with background
+- Comprehensive responsibilities
+- Detailed multi-phase process
+- Extensive quality standards
+- Multiple output formats
+- Many edge cases
+- Examples within system prompt
+
+**Avoid > 10,000 words:** Too long, diminishing returns.
+
+## Testing System Prompts
+
+### Test Completeness
+
+Can the agent handle these based on the system prompt alone?
+
+- [ ] Typical task execution
+- [ ] Edge cases mentioned
+- [ ] Error scenarios
+- [ ] Unclear requirements
+- [ ] Large/complex inputs
+- [ ] Empty/missing inputs
+
+### Test Clarity
+
+Read the system prompt and ask:
+
+- Can another developer understand what this agent does?
+- Are process steps clear and actionable?
+- Is output format unambiguous?
+- Are quality standards measurable?
+
+### Iterate Based on Results
+
+After testing the agent:
+1. Identify where it struggled
+2. Add missing guidance to system prompt
+3. Clarify ambiguous instructions
+4. Add process steps for edge cases
+5. Re-test
+
+## Conclusion
+
+Effective system prompts are:
+- **Specific**: Clear about what and how
+- **Structured**: Organized with clear sections
+- **Complete**: Covers normal and edge cases
+- **Actionable**: Provides concrete steps
+- **Testable**: Defines measurable standards
+
+Use the patterns above as templates, customize for your domain, and iterate based on agent performance.