Files

indifferentketchup 0fa46cd06c v1.13.12: skills audit + token-tracking fix + codecontext + cap50 + UI cleanups

Multi-topic batch. The big-ticket item is the skills audit; the rest are
smaller patches that compounded during the audit work.

## Skills audit (rules→recipes split)

Vendored all 26 skills from /home/samkintop/opt/skills/ into data/skills/
(the boocode-repo-local skill library — see docker-compose change below).
Audited via 5 parallel Claude Code agent-teams running the
mgechev/skills-best-practices 4-step protocol (Discovery → Logic → Edge
Case → self-Architecture-Refinement) per skill, ~2 min wall-clock vs the
~3.7-hour serial estimate.

Result: 14 skills surviving (renamed to gerund form, frontmatter matched),
11 deleted (duplicates, BooCode-irrelevant patterns, Claude-already-does-
natively), 1 migrated to BOOCHAT.md/BOOCODER.md as an always-true rule
(verification-before-completion). Each surviving skill had its description
refined to fix specific trigger gaps surfaced by the protocol — 4
real-bug findings landed (dead refs, stale tags, broken sub-file
references in the original vendored content).

Audit decisions documented in openspec/changes/v1.13.12-skills-audit/
audit-notes.md. Convention codified in BOOCHAT.md/BOOCODER.md "rules vs
recipes" sections — future workflow rules go to those files (100%
present), recipes stay in data/skills/ (~6% invoke rate in multi-turn
per the Codeminer42 measurement).

## Token tracking + stale-stream banner fix (same root cause)

ws-frames.ts IsoTimestamp was z.string().min(1) but postgres returns
timestamp columns as JS Date objects. Every message_complete /
session_updated / chat_updated frame was failing the v1.13.11 Zod gate
and being silently dropped. Symptoms: token tracking blank in the UI
(no usage frames landed); the 60s no-token-activity timer tripped the
stale-stream banner because the frontend's local message state never
saw status='streaming' flip to 'complete'.

Fix: z.preprocess(v => v instanceof Date ? v.toISOString() : v,
z.string().min(1)) applied to the IsoTimestamp primitive. Centralized,
no publisher changes, works identically server + web (the parity test
still passes).

## Codecontext .codecontextignore auto-install

services/codecontext_client.ts now copies the
codecontext/.codecontextignore.template into any project's root on the
first call to that project if no .codecontextignore exists. One file
written per project, idempotent (in-memory Set guard + access-check),
silent fallback on read-only project. Stops the upstream empty-source-
file parser crash on foreign projects' node_modules — previously
required manually copying the template per project.

## Tool-call budget cap 30 → 50

services/inference/budget.ts: BUDGET_READ_ONLY and BUDGET_NO_AGENT
bumped to 50 (from 30). BUDGET_NON_READ_ONLY stays at 10 (no write
tools landed yet). Real recon sessions were hitting 30 with ~3 turns
wasted on codecontext parse failures; legitimate need was ~27, and
Architect-class system overviews want deeper recon. Headroom of 20
absorbs failure-retry turns without changing the safety floor — the
doom-loop guard (3 identical calls → abort) catches the actual
failure mode this cap was guarding against.

v1.14 (Phase C outer agent loop) will supersede this via per-agent
agent.steps. Throwaway-ish patch but unblocks deeper recon today.

## UI cleanups

- ChatPane queued-message dropdown removed. Each queued message now
  has three buttons: edit (pop back into ChatInput via sendToChat
  event), force-send (was the dropdown's only useful action), and
  cancel. Default behavior (send when streaming completes) needs no
  UI — it's the implicit do-nothing path.
- ChatThroughput removed from desktop tab strip (ChatTabBar.tsx).
  Mobile tab switcher still shows it.

## Plumbing

- .gitignore: data/* + !data/AGENTS.md + !data/skills/ negation
  patterns so the vendored skill library + agent registry become
  git-tracked while session DB state stays out.
- docker-compose.yml: removed /opt/skills:/data/skills override
  mount. Skills now live in the boocode repo at data/skills/,
  auditable per-batch. The host-level /opt/skills/ is preserved
  untouched for any other tools that read from it.
- .codecontextignore at repo root: auto-installed when codecontext
  was first called against /opt/boocode itself; matches the template.
- CLAUDE.md: updated to document the v1.13.11 publishFrame wrapper +
  message_parts table + tool_cost_stats view + DB-integration test
  pattern + host-side smoke endpoint quirk. (Pre-existing in working
  tree before this batch; shipped here for completeness.)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-22 18:58:30 +00:00

9.8 KiB

Raw Blame History

System Prompt Design Patterns

Complete guide to writing effective agent system prompts that enable autonomous, high-quality operation.

Core Structure

Every agent system prompt should follow this proven structure:

You are [specific role] specializing in [specific domain].

**Your Core Responsibilities:**
1. [Primary responsibility - the main task]
2. [Secondary responsibility - supporting task]
3. [Additional responsibilities as needed]

**[Task Name] Process:**
1. [First concrete step]
2. [Second concrete step]
3. [Continue with clear steps]
[...]

**Quality Standards:**
- [Standard 1 with specifics]
- [Standard 2 with specifics]
- [Standard 3 with specifics]

**Output Format:**
Provide results structured as:
- [Component 1]
- [Component 2]
- [Include specific formatting requirements]

**Edge Cases:**
Handle these situations:
- [Edge case 1]: [Specific handling approach]
- [Edge case 2]: [Specific handling approach]

Pattern 1: Analysis Agents

For agents that analyze code, PRs, or documentation:

You are an expert [domain] analyzer specializing in [specific analysis type].

**Your Core Responsibilities:**
1. Thoroughly analyze [what] for [specific issues]
2. Identify [patterns/problems/opportunities]
3. Provide actionable recommendations

**Analysis Process:**
1. **Gather Context**: Read [what] using available tools
2. **Initial Scan**: Identify obvious [issues/patterns]
3. **Deep Analysis**: Examine [specific aspects]:
   - [Aspect 1]: Check for [criteria]
   - [Aspect 2]: Verify [criteria]
   - [Aspect 3]: Assess [criteria]
4. **Synthesize Findings**: Group related issues
5. **Prioritize**: Rank by [severity/impact/urgency]
6. **Generate Report**: Format according to output template

**Quality Standards:**
- Every finding includes file:line reference
- Issues categorized by severity (critical/major/minor)
- Recommendations are specific and actionable
- Positive observations included for balance

**Output Format:**
## Summary
[2-3 sentence overview]

## Critical Issues
- [file:line] - [Issue description] - [Recommendation]

## Major Issues
[...]

## Minor Issues
[...]

## Recommendations
[...]

**Edge Cases:**
- No issues found: Provide positive feedback and validation
- Too many issues: Group and prioritize top 10
- Unclear code: Request clarification rather than guessing

Pattern 2: Generation Agents

For agents that create code, tests, or documentation:

You are an expert [domain] engineer specializing in creating high-quality [output type].

**Your Core Responsibilities:**
1. Generate [what] that meets [quality standards]
2. Follow [specific conventions/patterns]
3. Ensure [correctness/completeness/clarity]

**Generation Process:**
1. **Understand Requirements**: Analyze what needs to be created
2. **Gather Context**: Read existing [code/docs/tests] for patterns
3. **Design Structure**: Plan [architecture/organization/flow]
4. **Generate Content**: Create [output] following:
   - [Convention 1]
   - [Convention 2]
   - [Best practice 1]
5. **Validate**: Verify [correctness/completeness]
6. **Document**: Add comments/explanations as needed

**Quality Standards:**
- Follows project conventions (check CLAUDE.md)
- [Specific quality metric 1]
- [Specific quality metric 2]
- Includes error handling
- Well-documented and clear

**Output Format:**
Create [what] with:
- [Structure requirement 1]
- [Structure requirement 2]
- Clear, descriptive naming
- Comprehensive coverage

**Edge Cases:**
- Insufficient context: Ask user for clarification
- Conflicting patterns: Follow most recent/explicit pattern
- Complex requirements: Break into smaller pieces

Pattern 3: Validation Agents

For agents that validate, check, or verify:

You are an expert [domain] validator specializing in ensuring [quality aspect].

**Your Core Responsibilities:**
1. Validate [what] against [criteria]
2. Identify violations and issues
3. Provide clear pass/fail determination

**Validation Process:**
1. **Load Criteria**: Understand validation requirements
2. **Scan Target**: Read [what] needs validation
3. **Check Rules**: For each rule:
   - [Rule 1]: [Validation method]
   - [Rule 2]: [Validation method]
4. **Collect Violations**: Document each failure with details
5. **Assess Severity**: Categorize issues
6. **Determine Result**: Pass only if [criteria met]

**Quality Standards:**
- All violations include specific locations
- Severity clearly indicated
- Fix suggestions provided
- No false positives

**Output Format:**
## Validation Result: [PASS/FAIL]

## Summary
[Overall assessment]

## Violations Found: [count]
### Critical ([count])
- [Location]: [Issue] - [Fix]

### Warnings ([count])
- [Location]: [Issue] - [Fix]

## Recommendations
[How to fix violations]

**Edge Cases:**
- No violations: Confirm validation passed
- Too many violations: Group by type, show top 20
- Ambiguous rules: Document uncertainty, request clarification

Pattern 4: Orchestration Agents

For agents that coordinate multiple tools or steps:

You are an expert [domain] orchestrator specializing in coordinating [complex workflow].

**Your Core Responsibilities:**
1. Coordinate [multi-step process]
2. Manage [resources/tools/dependencies]
3. Ensure [successful completion/integration]

**Orchestration Process:**
1. **Plan**: Understand full workflow and dependencies
2. **Prepare**: Set up prerequisites
3. **Execute Phases**:
   - Phase 1: [What] using [tools]
   - Phase 2: [What] using [tools]
   - Phase 3: [What] using [tools]
4. **Monitor**: Track progress and handle failures
5. **Verify**: Confirm successful completion
6. **Report**: Provide comprehensive summary

**Quality Standards:**
- Each phase completes successfully
- Errors handled gracefully
- Progress reported to user
- Final state verified

**Output Format:**
## Workflow Execution Report

### Completed Phases
- [Phase]: [Result]

### Results
- [Output 1]
- [Output 2]

### Next Steps
[If applicable]

**Edge Cases:**
- Phase failure: Attempt retry, then report and stop
- Missing dependencies: Request from user
- Timeout: Report partial completion

Writing Style Guidelines

Tone and Voice

Use second person (addressing the agent):

✅ You are responsible for...
✅ You will analyze...
✅ Your process should...

❌ The agent is responsible for...
❌ This agent will analyze...
❌ I will analyze...

Clarity and Specificity

Be specific, not vague:

✅ Check for SQL injection by examining all database queries for parameterization
❌ Look for security issues

✅ Provide file:line references for each finding
❌ Show where issues are

✅ Categorize as critical (security), major (bugs), or minor (style)
❌ Rate the severity of issues

Actionable Instructions

Give concrete steps:

✅ Read the file using the Read tool, then search for patterns using Grep
❌ Analyze the code

✅ Generate test file at test/path/to/file.test.ts
❌ Create tests

Common Pitfalls

❌ Vague Responsibilities

**Your Core Responsibilities:**
1. Help the user with their code
2. Provide assistance
3. Be helpful

Why bad: Not specific enough to guide behavior.

✅ Specific Responsibilities

**Your Core Responsibilities:**
1. Analyze TypeScript code for type safety issues
2. Identify missing type annotations and improper 'any' usage
3. Recommend specific type improvements with examples

❌ Missing Process Steps

Analyze the code and provide feedback.

Why bad: Agent doesn't know HOW to analyze.

✅ Clear Process

**Analysis Process:**
1. Read code files using Read tool
2. Scan for type annotations on all functions
3. Check for 'any' type usage
4. Verify generic type parameters
5. List findings with file:line references

❌ Undefined Output

Provide a report.

Why bad: Agent doesn't know what format to use.

✅ Defined Output Format

**Output Format:**
## Type Safety Report

### Summary
[Overview of findings]

### Issues Found
- `file.ts:42` - Missing return type on `processData`
- `utils.ts:15` - Unsafe 'any' usage in parameter

### Recommendations
[Specific fixes with examples]

Length Guidelines

Minimum Viable Agent

~500 words minimum:

Role description
3 core responsibilities
5-step process
Output format

Standard Agent

~1,000-2,000 words:

Detailed role and expertise
5-8 responsibilities
8-12 process steps
Quality standards
Output format
3-5 edge cases

Comprehensive Agent

~2,000-5,000 words:

Complete role with background
Comprehensive responsibilities
Detailed multi-phase process
Extensive quality standards
Multiple output formats
Many edge cases
Examples within system prompt

Avoid > 10,000 words: Too long, diminishing returns.

Testing System Prompts

Test Completeness

Can the agent handle these based on system prompt alone?

Typical task execution
Edge cases mentioned
Error scenarios
Unclear requirements
Large/complex inputs
Empty/missing inputs

Test Clarity

Read the system prompt and ask:

Can another developer understand what this agent does?
Are process steps clear and actionable?
Is output format unambiguous?
Are quality standards measurable?

Iterate Based on Results

After testing agent:

Identify where it struggled
Add missing guidance to system prompt
Clarify ambiguous instructions
Add process steps for edge cases
Re-test

Conclusion

Effective system prompts are:

Specific: Clear about what and how
Structured: Organized with clear sections
Complete: Covers normal and edge cases
Actionable: Provides concrete steps
Testable: Defines measurable standards

Use the patterns above as templates, customize for your domain, and iterate based on agent performance.

9.8 KiB Raw Blame History