# v1.13.12 — skills audit pass

Audit of 26 skills vendored from `/home/samkintop/opt/skills/` into `/opt/boocode/data/skills/`. Each skill was sorted into one of four buckets, following the Codeminer42 rules→recipes split.

## Deviations from the batch spec

| Spec said | Reality | Resolution |
|---|---|---|
| `/opt/boocode/skills/` is the audit target | Skills directory is `/opt/boocode/data/skills/` (per `services/skills.ts:19` `SKILLS_ROOT = '/data/skills'`) | Vendored to the correct path |
| `/opt/boocode/AGENTS.md` for bucket-(a) rule additions | `data/AGENTS.md` is an agent registry (`## H2` per agent with frontmatter), not a rules file | Bucket-(a) rules go to `BOOCHAT.md` (the container guidance file the chat agent reads) instead |
| "7 vendored v1.12 skills" exist to audit | Zero SKILL.md ever committed; `data/skills/` was empty | Vendored all 26 from `/home/samkintop/opt/skills/` in this batch (vendor + audit combined) |
| `data/` content tracked in git | `.gitignore` excluded all of `data/` | Added negation patterns (`data/*` + `!data/AGENTS.md` + `!data/skills/`) so audit work shows up in git |
| Container reads `data/skills/` from the boocode repo | `docker-compose.yml:18` had a `- /opt/skills:/data/skills` override mount, so the container actually read from host-level `/opt/skills/` and ignored repo `data/skills/` | Removed the override mount. The skill library now lives in `data/skills/` (repo-tracked, per-batch auditable). Host `/opt/skills/` is preserved untouched for other tools (Claude Code, etc.). This is a 1-line deviation from the spec's "zero code change" claim, needed for the spec's intent to actually take effect |

## Bucket tally

| Bucket | Action | Count |
|---|---|---|
| (a) | Move to BOOCHAT.md as always-true rule | 1 |
| (b) | Keep as recipe, apply Anthropic conventions | 14 |
| (c) | Keep + move bulk to `references/` (SKILL.md > 500 lines) | 0 |
| (d) | Delete (duplicates Claude native capability or doesn't fit BooCode) | 11 |
| **Total** | | **26** |

No skill exceeded the 500-line ceiling — bucket (c) is empty. Longest survivor: `systematic-debugging` at 296 lines.
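
The ceiling check and the tally can be expressed as a small sketch. The type and function names below are hypothetical (they are not taken from the BooCode codebase); the sample rows are copied from the per-skill decisions table.

```typescript
// Hypothetical sketch: names are illustrative, not from the BooCode codebase.
type Bucket = "a" | "b" | "c" | "d";

interface SkillDecision {
  path: string;
  lines: number; // SKILL.md line count
  bucket: Bucket;
}

const MAX_SKILL_LINES = 500; // ceiling stated in the bucket (c) row

// A skill over the ceiling should move its bulk to references/ (bucket c).
function exceedsCeiling(decision: SkillDecision): boolean {
  return decision.lines > MAX_SKILL_LINES;
}

// Count how many skills landed in each bucket.
function tally(decisions: SkillDecision[]): Record<Bucket, number> {
  const counts: Record<Bucket, number> = { a: 0, b: 0, c: 0, d: 0 };
  for (const decision of decisions) counts[decision.bucket] += 1;
  return counts;
}

// Three rows copied from the per-skill decisions table.
const sample: SkillDecision[] = [
  { path: "superpowers/systematic-debugging", lines: 296, bucket: "b" },
  { path: "superpowers/verification-before-completion", lines: 139, bucket: "a" },
  { path: "mattpocock/zoom-out", lines: 7, bucket: "d" },
];

// Recipes filed under (b) that should have been (c).
const misfiled = sample.filter((d) => d.bucket === "b" && exceedsCeiling(d));
console.log(tally(sample), misfiled.length);
```

Feeding all 26 rows through `tally` should reproduce the counts in the tally table, and an empty `misfiled` list corresponds to bucket (c) being empty.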

## Per-skill decisions

| Skill (path) | Lines | Bucket | Disposition | Rationale |
|---|---:|:---:|---|---|
| `anthropics/agent-development` | 196 | (b) | Keep; rename → `developing-agents` | BooCode-specific value (manages `data/AGENTS.md` tier-2 registry) |
| `anthropics/claude-md-improver` | 180 | (d) | Delete | Overlaps `boocode-guidance-improver` (more specific) |
| `anthropics/frontend-design` | 42 | (b) | Keep; rename → `designing-frontends` | Concise UI design guidance, no overlap |
| `anthropics-knowledge-work/code-review` | 118 | (b) | Keep; rename → `reviewing-code` | Generic code review process distinct from `receiving-` / `requesting-code-review` |
| `anthropics-knowledge-work/task-management` | 91 | (d) | Delete | `user-invocable: false`; duplicates BooCode's TodoWrite/TaskCreate native capability |
| `asyrafhussin/react-vite-best-practices` | 182 | (b) | Keep; rename → `optimizing-react-vite` | Matches BooCode's stack (Vite, not Next.js) |
| `boocode/boocode-guidance-improver` | 167 | (b) | Keep; rename → `improving-boocode-guidance` | BooCode-specific 10-dimension rubric for `CLAUDE.md`/`BOOCHAT.md`/`BOOCODER.md`/`AGENTS.md` |
| `mattpocock/diagnose` | 117 | (b) | Keep; rename → `diagnosing-bugs` | Complement to `systematic-debugging`: focus on building a feedback loop |
| `mattpocock/grill-me` | 20 | (b) | Keep; rename → `grilling-plans` | Plan stress-testing |
| `mattpocock/grill-with-docs` | 98 | (d) | Delete | Requires `CONTEXT.md` and `docs/adr/` that BooCode doesn't have |
| `mattpocock/handoff` | 17 | (d) | Delete | BooCode is single-user; no agent handoff scenario |
| `mattpocock/improve-codebase-architecture` | 71 | (d) | Delete | Requires `CONTEXT.md` and `docs/adr/` |
| `mattpocock/to-issues` | 83 | (d) | Delete | BooCode uses `openspec/changes/`, not an issue tracker |
| `mattpocock/to-prd` | 76 | (d) | Delete | Same — no issue tracker |
| `mattpocock/write-a-skill` | 121 | (b) | Keep; rename → `writing-skills` | Authoring new skills for this very system |
| `mattpocock/zoom-out` | 7 | (d) | Delete | Claude does this natively when asked; 7-line skill is overhead |
| `superpowers/brainstorming` | 164 | (b) | Keep (already gerund) | Before-features creative-work process |
| `superpowers/receiving-code-review` | 213 | (b) | Keep (already gerund) | Sam reviews everything; process for handling feedback |
| `superpowers/requesting-code-review` | 103 | (b) | Keep (already gerund) | Before-merge verification |
| `superpowers/systematic-debugging` | 296 | (b) | Keep (already gerund) | Comprehensive bug-fix discipline (root-cause-first) |
| `superpowers/using-superpowers` | 117 | (d) | Delete | Meta-skill about skill discovery; Claude does discovery natively |
| `superpowers/verification-before-completion` | 139 | (a) | Migrate rule to `BOOCHAT.md`, delete skill dir | Always-true rule: evidence before assertions. Belongs in guidance that is 100% present, not in a skill that is invoked 6% of the time |
| `superpowers/writing-plans` | 152 | (b) | Keep (already gerund) | Maps to BooCode's `openspec/changes/` workflow |
| `vercel-labs/find-skills` | 142 | (d) | Delete | Skill discovery — Claude does this natively |
| `vercel-labs/react-best-practices` | 149 | (d) | Delete | Next.js focus; BooCode uses Vite (asyrafhussin's version is the fit) |
| `vercel-labs/web-design-guidelines` | 39 | (b) | Keep; rename → `reviewing-web-design` | UI compliance review |

## Bucket-(a) migration text

Single rule extracted from `superpowers/verification-before-completion` (139 lines → ~3 lines in BOOCHAT.md):

> **Don't claim work is complete without verifying.** Run the relevant command (test, build, smoke) and confirm the expected output before reporting success. Evidence before assertions catches regressions you'd otherwise miss.

The 139-line process content does not move to BOOCHAT.md — the rule itself is what needs to be 100% present. Process detail is recoverable from the upstream repo if anyone wants to read it later.

## Verification protocol coverage

| Step | Owner | Status |
|---|---|---|
| 1. Discovery (paste SKILL.md, check first-200-char triggering) | Sam — fresh Claude.ai chat per skill | Pending |
| 2. Logic (paste realistic task, check skill recognizes) | Sam — fresh Claude.ai chat | Pending |
| 3. Edge Case (paste boundary task, check correct invoke/decline) | Sam — fresh Claude.ai chat | Pending |
| 4. Architecture Refinement (paste skill + chats, ask for critique) | Sam — fresh Claude.ai chat | Pending |
| 5. `skillgrade --smoke` (5 trials per skill) | Sam — host install `npm i -g skillgrade` first | Pending |

An `eval.yaml` file was written for each surviving skill (14 files) so that the `skillgrade --smoke` runs are mechanical once `skillgrade` is installed.

## skillgrade scope correction

The `eval.yaml` stubs authored in the prior session use a flat `tasks: [{prompt, grader: [list]}]` shape that does not validate against skillgrade's canonical schema. The canonical format needs `name`, `instruction`, `workspace`, structured `graders` with `type: deterministic | llm_rubric`, a `run` shell script, a `weight`, and a Docker `provider` block. Rewriting all 14 in the canonical format is out of scope for this batch: each needs Docker workspace setup and grader scripts that capture skill-output correctness. Filed as a follow-up: **v1.13.13: skillgrade eval.yaml canonical rewrite + first quantitative pass**.
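
To make the gap concrete, here is a minimal sketch of a structural check, assuming the YAML has already been parsed into a plain object. The field names follow the description above, but the types and the `missingCanonicalFields` helper are hypothetical; this is not skillgrade's actual schema or validator. The placement of `run` and `provider` is not specified here, so the sketch leaves them out.

```typescript
// Hypothetical sketch: NOT skillgrade's real schema, only the fields named above.
type GraderType = "deterministic" | "llm_rubric";

interface Grader {
  type: GraderType;
  weight: number;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function isGrader(value: unknown): value is Grader {
  return (
    isRecord(value) &&
    (value.type === "deterministic" || value.type === "llm_rubric") &&
    typeof value.weight === "number"
  );
}

// Report which of the canonical task fields are absent or malformed.
function missingCanonicalFields(task: unknown): string[] {
  if (!isRecord(task)) return ["<task is not an object>"];
  const missing: string[] = [];
  if (typeof task.name !== "string") missing.push("name");
  if (typeof task.instruction !== "string") missing.push("instruction");
  if (!("workspace" in task)) missing.push("workspace");
  if (!Array.isArray(task.graders) || !task.graders.every(isGrader)) {
    missing.push("graders");
  }
  return missing;
}

// A task in the flat stub shape (prompt text is made up for illustration).
const flatStub = { prompt: "Review this diff", grader: ["mentions severity"] };
console.log(missingCanonicalFields(flatStub));
```

A flat-shape stub fails all four checks, which is why the rewrite touches every task rather than only adding a wrapper.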

For that reason the table below reports no smoke results. The 4-step qualitative protocol still ran (via the agent team in this batch) and surfaced the structural issues that quantitative trials would have caught anyway.

## 4-step protocol findings (agent-team batch)

Each surviving skill was assessed by one of 5 parallel teammates (alpha / bravo / charlie / delta / echo) running the mgechev/skills-best-practices 4-step protocol: Discovery → Logic → Edge Case → self-Architecture-Refinement. Teammates wrote per-agent findings to `/tmp/audit-<name>.md`; the table here aggregates.

Ratings shorthand: D=Discovery, L=Logic, E=Edge Case (each 1-5).

| Skill | Auditor | D / L / E | 4-step verdict | Fix applied |
|---|---|:---:|---|---|
| `anthropics/designing-frontends` | alpha | 5 / 5 / 3 | Strong primary triggers; over-broad with "artifacts, posters" (not code targets) | Removed "artifacts, posters" from description trigger list |
| `anthropics/developing-agents` | alpha | 5 / 5 / 4 | Sharp triggering; stale "(as of v1.11.x)" tag and broken `inference.ts:721-731` reference (actual code is at `stream-phase.ts:403-406`) | Updated stale version tag + cross-reference |
| `anthropics-knowledge-work/reviewing-code` | alpha | 4 / 4 / 3 | Good for explicit PR/diff triggers; dead `CONNECTORS.md` cross-reference (file doesn't exist) | Removed broken cross-reference |
| `mattpocock/diagnosing-bugs` | bravo | 5 / 5 / 4 | Strong; missing colloquial phrasings like "not working" / "something wrong" | Added informal trigger phrases |
| `mattpocock/grilling-plans` | bravo | 3 / 4 / 3 | Trigger coverage too narrow — only "grill me" reliably fires; structural risk: mandatory `ask_user_input` tool may not exist in BooCode's tool registry (flagged but not patched — needs env verification) | Added "poke holes", "challenge my design", "play devil's advocate", "what am I missing" triggers |
| `mattpocock/writing-skills` | bravo | 4 / 5 / 3 | Missing "create" phrasing; description-length rule (≤1024 chars) not in Review Checklist | Added "create" trigger + checklist item |
| `superpowers/brainstorming` | charlie | 4 / 4 / 3 | Vague "modifying behavior" causes both over- and under-firing; HARD-GATE wording could be clearer about writing-plans being permitted | Tightened to "non-trivial modifications" + added "refactoring" |
| `superpowers/receiving-code-review` | charlie | 3 / 4 / 3 | Conditional qualifier "especially if feedback seems unclear or technically questionable" mis-frames as edge-case skill rather than default protocol | Removed conditional + broadened to informal channels |
| `superpowers/requesting-code-review` | charlie | 3 / 4 / 2 | Scope collision with built-in `code-review` skill: near-identical surface language but different execution (subagent dispatch vs inline). Lowest edge-case score of the batch | Added "dispatches a separate subagent reviewer" differentiator |
| `superpowers/systematic-debugging` | delta | 5 / 5 / 4 | Strong; build/compile failures appear in body but missing from frontmatter trigger | Extended description with "build failure, compile error" + "debug/investigate/diagnose" |
| `superpowers/writing-plans` | delta | 3 / 4 / 3 | Spec-centric framing gatekeeps on pre-existing spec doc; colloquial "write me a plan" misses | Added colloquial planning trigger phrases |
| `asyrafhussin/optimizing-react-vite` | delta | 4 / 5 / 3 | Over-broad "Vite configuration" triggers on non-perf tasks; body references `rules/*.md` and `AGENTS.md` files that don't exist in the skill dir | Narrowed scope + added broken-reference warnings |
| `boocode/improving-boocode-guidance` | echo | 5 / 5 / 4 | Strong; "critique" in description prose but missing from examples list | Added `"critique my BOOCODER.md"` to examples |
| `vercel-labs/reviewing-web-design` | echo | 3 / 4 / 3 | Generic triggers collide with general code-review; delegates substance to external GitHub URL with no fallback on fetch failure | Named "Vercel's live web-interface-guidelines" as differentiator + added 404 fallback |
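
Two of the recurring findings above (missing trigger phrases and the ≤1024-character description rule) lend themselves to a mechanical pre-check. The sketch below is one possible shape for it; `checkDescription` and its report type are hypothetical, and the sample description text is invented for illustration. Only the trigger phrases come from the `grilling-plans` row.

```typescript
// Hypothetical sketch: helper names are illustrative.
const MAX_DESCRIPTION_CHARS = 1024; // limit cited in the writing-skills row

interface DescriptionReport {
  tooLong: boolean;
  missingTriggers: string[];
}

// Case-insensitive check that each expected trigger phrase appears in the description.
function checkDescription(
  description: string,
  expectedTriggers: string[],
): DescriptionReport {
  const haystack = description.toLowerCase();
  const missingTriggers = expectedTriggers.filter(
    (trigger) => haystack.indexOf(trigger.toLowerCase()) === -1,
  );
  return {
    // Note: .length counts UTF-16 code units, which is an approximation
    // for descriptions containing characters outside the BMP.
    tooLong: description.length > MAX_DESCRIPTION_CHARS,
    missingTriggers,
  };
}

// Invented description text; trigger phrases are those added to grilling-plans.
const description =
  'Stress-tests a plan. Use when the user says "grill me", "poke holes", or "challenge my design".';

const report = checkDescription(description, [
  "grill me",
  "poke holes",
  "challenge my design",
  "play devil's advocate",
  "what am I missing",
]);
console.log(report);
```

A check like this only confirms that phrases are present in the description; whether the model actually triggers on them still needs the Discovery and Edge Case steps.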

## Aggregate notes

**Trigger-quality stats (qualitative, n=14):**
- Discovery 5/5: 5 skills | 4/5: 4 skills | 3/5: 5 skills (avg ~4.0)
- Edge case is the weakest dimension across the batch: 9 of 14 skills scored 3/5 (borderline invoke/decline), 4 scored 4/5, and 1 scored 2/5 (avg ~3.2). This suggests skills are over- or under-triggering on adjacent-but-different tasks.
- Every skill had at least one fix applied. None were judged "clean" with zero issues.
- Zero skills were flagged for retroactive bucket-(a) reclassification — all 14 remain (b) recipes.
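
The figures above can be recomputed directly from the findings table. The sketch below copies the D / L / E ratings in row order; the helper names are illustrative.

```typescript
// [D, L, E] ratings copied from the 4-step findings table, in row order.
const ratings: Array<[number, number, number]> = [
  [5, 5, 3], [5, 5, 4], [4, 4, 3], [5, 5, 4], [3, 4, 3], [4, 5, 3], [4, 4, 3],
  [3, 4, 3], [3, 4, 2], [5, 5, 4], [3, 4, 3], [4, 5, 3], [5, 5, 4], [3, 4, 3],
];

type Dimension = 0 | 1 | 2; // 0 = Discovery, 1 = Logic, 2 = Edge Case

// Arithmetic mean of one dimension across all 14 skills.
function mean(dim: Dimension): number {
  let sum = 0;
  for (const row of ratings) sum += row[dim];
  return sum / ratings.length;
}

// Number of skills at each score for one dimension.
function histogram(dim: Dimension): Record<number, number> {
  const counts: Record<number, number> = {};
  for (const row of ratings) counts[row[dim]] = (counts[row[dim]] || 0) + 1;
  return counts;
}

console.log("Discovery", mean(0).toFixed(2), histogram(0));
console.log("Logic", mean(1).toFixed(2), histogram(1));
console.log("Edge Case", mean(2).toFixed(2), histogram(2));
```

With the table's values the means come out to 4.00 (Discovery), 4.50 (Logic), and 3.21 (Edge Case), which matches the observation that Edge Case is the weakest dimension.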

**Real bugs surfaced (not just polish):**
- `anthropics/developing-agents`: stale code reference (inference.ts:721-731 → stream-phase.ts:403-406). Real dead link.
- `anthropics-knowledge-work/reviewing-code`: dead CONNECTORS.md cross-reference.
- `asyrafhussin/optimizing-react-vite`: references `rules/*.md` and `AGENTS.md` subfiles that don't exist in the skill directory.
- `superpowers/requesting-code-review`: scope collision with built-in `code-review` (skills and auto-routing need review; Sam may want to drop one of these).
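
The dead-reference bugs in this list are the kind a small lint pass could catch before a manual audit. The following is a rough sketch under stated assumptions: it only looks at backtick-quoted `*.md` paths, it takes the list of files present in the skill directory as an argument rather than reading the disk, and `findMissingReferences` is a hypothetical helper. The sample body and the `references/checklist.md` path are invented for illustration.

```typescript
// Hypothetical sketch: scans a SKILL.md body for backtick-quoted *.md paths
// and returns those that are not in the list of files present in the skill dir.
function findMissingReferences(
  body: string,
  existingFiles: string[],
): string[] {
  const pattern = /`([\w./-]+\.md)`/g;
  const seen: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(body)) !== null) {
    const ref = match[1];
    if (seen.indexOf(ref) === -1) seen.push(ref); // de-duplicate
  }
  return seen.filter((ref) => existingFiles.indexOf(ref) === -1).sort();
}

// Invented body text; CONNECTORS.md mirrors the dead cross-reference noted above.
const body =
  "See `CONNECTORS.md` for tool setup and `references/checklist.md` for the full list.";

console.log(findMissingReferences(body, ["references/checklist.md"]));
```

Glob-style mentions such as `rules/*.md` would need separate handling, and source-code references with line ranges (like the `inference.ts` one) cannot be validated by existence alone, so this covers only part of what the audit found.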

**Structural flags requiring environment verification (not patched):**
- `mattpocock/grilling-plans`: mandatory `ask_user_input` tool call assumed available. Confirm BooCode's tool registry exposes this to the chat-surface model. If not, the skill body's MANDATORY instruction deadlocks.

**Skillgrade gap remains:**
- Quantitative trigger rates (the original v1.13.12 N/5 column) require skillgrade with canonical-format eval.yaml. Filed as **v1.13.13** follow-up. The qualitative 4-step protocol catches the same class of issue, and arguably more: the broken-reference bugs above would not have shown up in skillgrade's invoke/decline trials.

**Per-agent artifacts (working files, not part of repo):**
- `/tmp/audit-alpha.md` — designing-frontends, developing-agents, reviewing-code
- `/tmp/audit-bravo.md` — diagnosing-bugs, grilling-plans, writing-skills
- `/tmp/audit-charlie.md` — brainstorming, receiving-code-review, requesting-code-review
- `/tmp/audit-delta.md` — systematic-debugging, writing-plans, optimizing-react-vite
- `/tmp/audit-echo.md` — improving-boocode-guidance, reviewing-web-design