boocode/openspec/changes/v1.13.12-skills-audit/audit-notes.md
indifferentketchup 0fa46cd06c v1.13.12: skills audit + token-tracking fix + codecontext + cap50 + UI cleanups
Multi-topic batch. The big-ticket item is the skills audit; the rest are
smaller patches that compounded during the audit work.

## Skills audit (rules→recipes split)

Vendored all 26 skills from /home/samkintop/opt/skills/ into data/skills/
(the boocode-repo-local skill library — see docker-compose change below).
Audited via 5 parallel Claude Code agent-teams running the
mgechev/skills-best-practices 4-step protocol (Discovery → Logic → Edge
Case → self-Architecture-Refinement) per skill, ~2 min wall-clock vs the
~3.7-hour serial estimate.

Result: 14 skills surviving (renamed to gerund form, frontmatter matched),
11 deleted (duplicates, BooCode-irrelevant patterns, Claude-already-does-
natively), 1 migrated to BOOCHAT.md/BOOCODER.md as an always-true rule
(verification-before-completion). Each surviving skill had its description
refined to fix specific trigger gaps surfaced by the protocol — 4
real-bug findings landed (dead refs, stale tags, broken sub-file
references in the original vendored content).

Audit decisions documented in openspec/changes/v1.13.12-skills-audit/
audit-notes.md. Convention codified in BOOCHAT.md/BOOCODER.md "rules vs
recipes" sections — future workflow rules go to those files (100%
present), recipes stay in data/skills/ (~6% invoke rate in multi-turn
per the Codeminer42 measurement).

## Token tracking + stale-stream banner fix (same root cause)

ws-frames.ts IsoTimestamp was z.string().min(1) but postgres returns
timestamp columns as JS Date objects. Every message_complete /
session_updated / chat_updated frame was failing the v1.13.11 Zod gate
and being silently dropped. Symptoms: token tracking blank in the UI
(no usage frames landed); the 60s no-token-activity timer tripped the
stale-stream banner because the frontend's local message state never
saw status='streaming' flip to 'complete'.

Fix: z.preprocess(v => v instanceof Date ? v.toISOString() : v,
z.string().min(1)) applied to the IsoTimestamp primitive. Centralized,
no publisher changes, works identically server + web (the parity test
still passes).
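
The normalization step in the fix can be sketched without the Zod dependency. In the sketch below, `normalizeTimestamp` mirrors the callback passed to `z.preprocess` and `isIsoTimestamp` stands in for `z.string().min(1)`. Both names and the sample value are illustrative assumptions, not the contents of ws-frames.ts.

```typescript
// Dependency-free sketch of the IsoTimestamp fix. In the real schema the
// first function would be the callback handed to z.preprocess; the names
// used here are hypothetical.

// Mirrors: v => v instanceof Date ? v.toISOString() : v
function normalizeTimestamp(v: unknown): unknown {
  return v instanceof Date ? v.toISOString() : v;
}

// Stand-in for z.string().min(1): any non-empty string passes.
function isIsoTimestamp(v: unknown): v is string {
  return typeof v === "string" && v.length >= 1;
}

// Before the fix: a Date handed back by the database driver fails the gate,
// which is why the frame was dropped.
const fromDb: unknown = new Date(Date.UTC(2026, 4, 22, 18, 58, 30));
const passesOldGate = isIsoTimestamp(fromDb); // false

// After the fix: normalize first, then validate.
const normalized = normalizeTimestamp(fromDb);
const passesNewGate = isIsoTimestamp(normalized); // true

// Strings pass through untouched, so publishers that already send strings
// are unaffected.
const passthrough = normalizeTimestamp("2026-05-22T18:58:30.000Z");
```

Putting the conversion in the shared primitive rather than in each publisher is what keeps the change to a single place, as the commit text notes.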

## Codecontext .codecontextignore auto-install

services/codecontext_client.ts now copies the
codecontext/.codecontextignore.template into any project's root on the
first call to that project if no .codecontextignore exists. One file
written per project, idempotent (in-memory Set guard + access-check),
silent fallback on read-only project. Stops the upstream empty-source-
file parser crash on foreign projects' node_modules — previously
required manually copying the template per project.
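
The install-once behaviour described above could look roughly like the following. The function name, return values, and use of synchronous `fs` calls are assumptions made for brevity; the actual logic in services/codecontext_client.ts may differ.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Project roots already handled in this process (the in-memory guard).
const installedRoots = new Set<string>();

type InstallResult = "installed" | "already-present" | "skipped";

// Hypothetical helper: copy the template into projectRoot unless a
// .codecontextignore already exists there. Never throws.
function ensureCodecontextIgnore(projectRoot: string, templatePath: string): InstallResult {
  if (installedRoots.has(projectRoot)) return "already-present";

  const target = path.join(projectRoot, ".codecontextignore");
  try {
    fs.accessSync(target, fs.constants.F_OK); // throws if the file is missing
    installedRoots.add(projectRoot);
    return "already-present";
  } catch {
    // No ignore file yet: fall through and try to create one.
  }

  try {
    // COPYFILE_EXCL refuses to overwrite a file created in the meantime.
    fs.copyFileSync(templatePath, target, fs.constants.COPYFILE_EXCL);
    installedRoots.add(projectRoot);
    return "installed";
  } catch {
    // Read-only project or missing template: fall back silently.
    return "skipped";
  }
}
```

The Set avoids repeated filesystem checks within one process, while the access check keeps the helper idempotent across restarts and preserves any ignore file a project already has.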

## Tool-call budget cap 30 → 50

services/inference/budget.ts: BUDGET_READ_ONLY and BUDGET_NO_AGENT
bumped to 50 (from 30). BUDGET_NON_READ_ONLY stays at 10 (no write
tools landed yet). Real recon sessions were hitting 30 with ~3 turns
wasted on codecontext parse failures; legitimate need was ~27, and
Architect-class system overviews want deeper recon. Headroom of 20
absorbs failure-retry turns without changing the safety floor — the
doom-loop guard (3 identical calls → abort) catches the actual
failure mode this cap was guarding against.

v1.14 (Phase C outer agent loop) will supersede this via per-agent
agent.steps. Throwaway-ish patch but unblocks deeper recon today.
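
To make the interaction between the cap and the doom-loop guard concrete, here is a minimal sketch. The cap values come from this commit; the `ToolCall` shape, the function name, the tool names, and the reading of "3 identical calls" as three consecutive calls are assumptions, not the code in services/inference/budget.ts.

```typescript
// Cap values as stated in this commit; everything else is illustrative.
const BUDGETS = { readOnly: 50, noAgent: 50, nonReadOnly: 10 } as const;
const OLD_READ_ONLY_CAP = 30;
const DOOM_LOOP_THRESHOLD = 3; // identical consecutive calls before abort

interface ToolCall {
  tool: string;
  args: unknown;
}

type Verdict = "continue" | "abort-doom-loop" | "abort-budget";

// Hypothetical guard: check for a doom loop first, then the call budget.
function checkToolCalls(calls: ToolCall[], budget: number): Verdict {
  const recent = calls
    .slice(-DOOM_LOOP_THRESHOLD)
    .map((c) => JSON.stringify([c.tool, c.args]));
  if (recent.length === DOOM_LOOP_THRESHOLD && recent.every((k) => k === recent[0])) {
    return "abort-doom-loop";
  }
  return calls.length >= budget ? "abort-budget" : "continue";
}

// Thirty distinct recon calls: blocked under the old cap, fine under the new.
const recon: ToolCall[] = Array.from({ length: 30 }, (_, i) => ({
  tool: "read_file",
  args: { path: `src/file${i}.ts` },
}));
const underOldCap = checkToolCalls(recon, OLD_READ_ONLY_CAP); // "abort-budget"
const underNewCap = checkToolCalls(recon, BUDGETS.readOnly); // "continue"

// Three identical calls trip the guard long before any budget is reached.
const stuckCall: ToolCall = { tool: "search_code", args: { query: "foo" } };
const stuck = checkToolCalls([stuckCall, stuckCall, stuckCall], BUDGETS.readOnly); // "abort-doom-loop"
```

Because the loop guard fires independently of the count, raising the cap only affects sessions that are making distinct calls.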

## UI cleanups

- ChatPane queued-message dropdown removed. Each queued message now
  has three buttons: edit (pop back into ChatInput via sendToChat
  event), force-send (was the dropdown's only useful action), and
  cancel. Default behavior (send when streaming completes) needs no
  UI — it's the implicit do-nothing path.
- ChatThroughput removed from desktop tab strip (ChatTabBar.tsx).
  Mobile tab switcher still shows it.

## Plumbing

- .gitignore: data/* + !data/AGENTS.md + !data/skills/ negation
  patterns so the vendored skill library + agent registry become
  git-tracked while session DB state stays out.
- docker-compose.yml: removed /opt/skills:/data/skills override
  mount. Skills now live in the boocode repo at data/skills/,
  auditable per-batch. The host-level /opt/skills/ is preserved
  untouched for any other tools that read from it.
- .codecontextignore at repo root: auto-installed when codecontext
  was first called against /opt/boocode itself; matches the template.
- CLAUDE.md: updated to document the v1.13.11 publishFrame wrapper +
  message_parts table + tool_cost_stats view + DB-integration test
  pattern + host-side smoke endpoint quirk. (Pre-existing in working
  tree before this batch; shipped here for completeness.)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 18:58:30 +00:00


# v1.13.12: skills audit pass

Audit of 26 skills vendored from /home/samkintop/opt/skills/ into /opt/boocode/data/skills/. Each skill was sorted into one of four buckets per the Codeminer42 rules→recipes split.

## Deviations from the batch spec

| Spec said | Reality | Resolution |
| --- | --- | --- |
| `/opt/boocode/skills/` is the audit target | Skills directory is `/opt/boocode/data/skills/` (per `services/skills.ts:19` `SKILLS_ROOT = '/data/skills'`) | Vendored to the correct path |
| `/opt/boocode/AGENTS.md` for bucket-(a) rule additions | `data/AGENTS.md` is an agent registry (`## H2` per agent with frontmatter), not a rules file | Bucket-(a) rules go to BOOCHAT.md (the container guidance file the chat agent reads) instead |
| "7 vendored v1.12 skills" exist to audit | Zero SKILL.md ever committed; `data/skills/` was empty | Vendored all 26 from `/home/samkintop/opt/skills/` in this batch (vendor + audit combined) |
| `data/` content tracked in git | `.gitignore` excluded all of `data/` | Added negation patterns (`data/*` + `!data/AGENTS.md` + `!data/skills/`) so audit work shows up in git |
| Container reads `data/skills/` from the boocode repo | `docker-compose.yml:18` had a `- /opt/skills:/data/skills` override mount, so the container actually read from host-level `/opt/skills/`, ignoring repo `data/skills/` | Removed the override mount. Skill library now lives in `data/skills/` (repo-tracked, per-batch auditable). Host `/opt/skills/` preserved untouched for other tools (Claude Code, etc.). 1-line deviation from the spec's "zero code change" claim, necessary to make the spec's intent actually take effect |

## Bucket tally

| Bucket | Action | Count |
| --- | --- | --- |
| (a) | Move to BOOCHAT.md as always-true rule | 1 |
| (b) | Keep as recipe, apply Anthropic conventions | 14 |
| (c) | Keep + move bulk to references/ (SKILL.md > 500 lines) | 0 |
| (d) | Delete (duplicates Claude native capability or doesn't fit BooCode) | 11 |
| Total | | 26 |

No skill exceeded the 500-line ceiling — bucket (c) is empty. Longest survivor: systematic-debugging at 296 lines.

## Per-skill decisions

| Skill (path) | Lines | Bucket | Disposition | Rationale |
| --- | --- | --- | --- | --- |
| anthropics/agent-development | 196 | (b) | Keep; rename → developing-agents | BooCode-specific value (manages `data/AGENTS.md` tier-2 registry) |
| anthropics/claude-md-improver | 180 | (d) | Delete | Overlaps boocode-guidance-improver (more specific) |
| anthropics/frontend-design | 42 | (b) | Keep; rename → designing-frontends | Concise UI design guidance, no overlap |
| anthropics-knowledge-work/code-review | 118 | (b) | Keep; rename → reviewing-code | Generic code review process distinct from receiving- / requesting-code-review |
| anthropics-knowledge-work/task-management | 91 | (d) | Delete | `user-invocable: false`; duplicates BooCode's TodoWrite/TaskCreate native capability |
| asyrafhussin/react-vite-best-practices | 182 | (b) | Keep; rename → optimizing-react-vite | Matches BooCode's stack (Vite, not Next.js) |
| boocode/boocode-guidance-improver | 167 | (b) | Keep; rename → improving-boocode-guidance | BooCode-specific 10-dimension rubric for CLAUDE.md/BOOCHAT.md/BOOCODER.md/AGENTS.md |
| mattpocock/diagnose | 117 | (b) | Keep; rename → diagnosing-bugs | Complement to systematic-debugging: focus on building a feedback loop |
| mattpocock/grill-me | 20 | (b) | Keep; rename → grilling-plans | Plan stress-testing |
| mattpocock/grill-with-docs | 98 | (d) | Delete | Requires CONTEXT.md and `docs/adr/` that BooCode doesn't have |
| mattpocock/handoff | 17 | (d) | Delete | BooCode is single-user; no agent handoff scenario |
| mattpocock/improve-codebase-architecture | 71 | (d) | Delete | Requires CONTEXT.md and `docs/adr/` |
| mattpocock/to-issues | 83 | (d) | Delete | BooCode uses `openspec/changes/`, not an issue tracker |
| mattpocock/to-prd | 76 | (d) | Delete | Same: no issue tracker |
| mattpocock/write-a-skill | 121 | (b) | Keep; rename → writing-skills | Authoring new skills for this very system |
| mattpocock/zoom-out | 7 | (d) | Delete | Claude does this natively when asked; 7-line skill is overhead |
| superpowers/brainstorming | 164 | (b) | Keep (already gerund) | Before-features creative-work process |
| superpowers/receiving-code-review | 213 | (b) | Keep (already gerund) | Sam reviews everything; process for handling feedback |
| superpowers/requesting-code-review | 103 | (b) | Keep (already gerund) | Before-merge verification |
| superpowers/systematic-debugging | 296 | (b) | Keep (already gerund) | Comprehensive bug-fix discipline (root-cause-first) |
| superpowers/using-superpowers | 117 | (d) | Delete | Meta-skill about skill discovery; Claude does discovery natively |
| superpowers/verification-before-completion | 139 | (a) | Migrate rule to BOOCHAT.md, delete skill dir | Always-true rule: evidence before assertions. Belongs 100% present, not 6% invoked |
| superpowers/writing-plans | 152 | (b) | Keep (already gerund) | Maps to BooCode's `openspec/changes/` workflow |
| vercel-labs/find-skills | 142 | (d) | Delete | Skill discovery; Claude does this natively |
| vercel-labs/react-best-practices | 149 | (d) | Delete | Next.js focus; BooCode uses Vite (asyrafhussin's version is the fit) |
| vercel-labs/web-design-guidelines | 39 | (b) | Keep; rename → reviewing-web-design | UI compliance review |

## Bucket-(a) migration text

Single rule extracted from superpowers/verification-before-completion (139 lines → ~3 lines in BOOCHAT.md):

> Don't claim work is complete without verifying. Run the relevant command (test, build, smoke) and confirm the expected output before reporting success. Evidence before assertions catches regressions you'd otherwise miss.

The 139-line process content does not move to BOOCHAT.md — the rule itself is what needs to be 100% present. Process detail is recoverable from the upstream repo if anyone wants to read it later.

## Verification protocol coverage

| Step | Owner | Status |
| --- | --- | --- |
| 1. Discovery (paste SKILL.md, check first-200-char triggering) | Sam (fresh Claude.ai chat per skill) | Pending |
| 2. Logic (paste realistic task, check skill recognizes) | Sam (fresh Claude.ai chat) | Pending |
| 3. Edge Case (paste boundary task, check correct invoke/decline) | Sam (fresh Claude.ai chat) | Pending |
| 4. Architecture Refinement (paste skill + chats, ask for critique) | Sam (fresh Claude.ai chat) | Pending |
| 5. `skillgrade --smoke` (5 trials per skill) | Sam (host install `npm i -g skillgrade` first) | Pending |

An eval.yaml file was written for each surviving skill (14 files) so that the skillgrade --smoke runs are mechanical once skillgrade is installed.

## skillgrade scope correction

The eval.yaml stubs authored in the prior session use a flat tasks: [{prompt, grader: [list]}] shape that does not validate against skillgrade's canonical schema (canonical needs name, instruction, workspace, structured graders with type: deterministic | llm_rubric, run shell script, weight, plus a Docker provider block). Rewriting all 14 in the canonical format is out of scope for this batch — each needs Docker workspace setup and grader scripts that capture skill-output correctness. Filed as a follow-up: v1.13.13 — skillgrade eval.yaml canonical rewrite + first quantitative pass.
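
The mismatch between the two shapes can be sketched as a small check. The field names below are taken only from the description in the paragraph above and have not been verified against skillgrade itself; the helper names and the sample task are hypothetical, and the Docker provider block is deliberately not modelled because its location in the file is not stated here.

```typescript
// Task-level fields the canonical format is described as needing.
// Assumption: these names come from the prose above, not from skillgrade docs.
const CANONICAL_TASK_FIELDS = ["name", "instruction", "workspace", "graders"] as const;

type LooseTask = Record<string, unknown>;

// True for the flat stub shape: { prompt, grader: [...] }.
function isFlatStub(task: LooseTask): boolean {
  return typeof task["prompt"] === "string" && Array.isArray(task["grader"]);
}

// Lists which of the canonical task fields are absent from a task.
function missingCanonicalFields(task: LooseTask): string[] {
  return CANONICAL_TASK_FIELDS.filter((field) => !(field in task));
}

// Invented example of one of the prior-session stubs.
const flatStub: LooseTask = {
  prompt: "Review this diff for correctness",
  grader: ["mentions missing tests"],
};
const stubIsFlat = isFlatStub(flatStub); // true
const stubMissing = missingCanonicalFields(flatStub); // all four fields
```

A check like this only shows that the stubs need restructuring; it says nothing about the workspace setup and grader scripts that make up the bulk of the deferred work.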

The smoke-results column in the table below is n/a* for that reason. The 4-step qualitative protocol still runs (via the agent team in this batch) and surfaces the structural issues that quantitative trials would have caught anyway.

## 4-step protocol findings (agent-team batch)

Each surviving skill was assessed by one of 5 parallel teammates (alpha / bravo / charlie / delta / echo) running the mgechev/skills-best-practices 4-step protocol: Discovery → Logic → Edge Case → self-Architecture-Refinement. Teammates wrote per-agent findings to /tmp/audit-<name>.md; the table here aggregates.

Ratings shorthand: D=Discovery, L=Logic, E=Edge Case (each 1-5).

| Skill | Auditor | D / L / E | 4-step verdict | Fix applied |
| --- | --- | --- | --- | --- |
| anthropics/designing-frontends | alpha | 5 / 5 / 3 | Strong primary triggers; over-broad with "artifacts, posters" (not code targets) | Removed "artifacts, posters" from description trigger list |
| anthropics/developing-agents | alpha | 5 / 5 / 4 | Sharp triggering; stale "(as of v1.11.x)" tag and broken `inference.ts:721-731` reference (actual code is at `stream-phase.ts:403-406`) | Updated stale version tag + cross-reference |
| anthropics-knowledge-work/reviewing-code | alpha | 4 / 4 / 3 | Good for explicit PR/diff triggers; dead CONNECTORS.md cross-reference (file doesn't exist) | Removed broken cross-reference |
| mattpocock/diagnosing-bugs | bravo | 5 / 5 / 4 | Strong; missing colloquial phrasings like "not working" / "something wrong" | Added informal trigger phrases |
| mattpocock/grilling-plans | bravo | 3 / 4 / 3 | Trigger coverage too narrow: only "grill me" reliably fires; structural risk: mandatory `ask_user_input` tool may not exist in BooCode's tool registry (flagged but not patched; needs env verification) | Added "poke holes", "challenge my design", "play devil's advocate", "what am I missing" triggers |
| mattpocock/writing-skills | bravo | 4 / 5 / 3 | Missing "create" phrasing; description-length rule (≤1024 chars) not in Review Checklist | Added "create" trigger + checklist item |
| superpowers/brainstorming | charlie | 4 / 4 / 3 | Vague "modifying behavior" causes both over- and under-firing; HARD-GATE wording could be clearer about writing-plans being permitted | Tightened to "non-trivial modifications" + added "refactoring" |
| superpowers/receiving-code-review | charlie | 3 / 4 / 3 | Conditional qualifier "especially if feedback seems unclear or technically questionable" mis-frames as edge-case skill rather than default protocol | Removed conditional + broadened to informal channels |
| superpowers/requesting-code-review | charlie | 3 / 4 / 2 | Scope collision with built-in code-review skill: near-identical surface language but different execution (subagent dispatch vs inline). LOWEST EDGE-CASE SCORE OF THE BATCH | Added "dispatches a separate subagent reviewer" differentiator |
| superpowers/systematic-debugging | delta | 5 / 5 / 4 | Strong; build/compile failures appear in body but missing from frontmatter trigger | Extended description with "build failure, compile error" + "debug/investigate/diagnose" |
| superpowers/writing-plans | delta | 3 / 4 / 3 | Spec-centric framing gatekeeps on pre-existing spec doc; colloquial "write me a plan" misses | Added colloquial planning trigger phrases |
| asyrafhussin/optimizing-react-vite | delta | 4 / 5 / 3 | Over-broad "Vite configuration" triggers on non-perf tasks; body references `rules/*.md` and AGENTS.md files that don't exist in the skill dir | Narrowed scope + added broken-reference warnings |
| boocode/improving-boocode-guidance | echo | 5 / 5 / 4 | Strong; "critique" in description prose but missing from examples list | Added "critique my BOOCODER.md" to examples |
| vercel-labs/reviewing-web-design | echo | 3 / 4 / 3 | Generic triggers collide with general code-review; delegates substance to external GitHub URL with no fallback on fetch failure | Named "Vercel's live web-interface-guidelines" as differentiator + added 404 fallback |

## Aggregate notes

Trigger-quality stats (qualitative, n=14):

  • Discovery 5/5: 5 skills | 4/5: 4 skills | 3/5: 5 skills (avg ~4.0)
  • Edge Case is the weakest dimension across the batch: 9 of 14 skills scored 3/5 (borderline invoke/decline) and one scored 2/5. This suggests skills are over- or under-triggering on adjacent-but-different tasks.
  • Every skill had at least one fix applied. None were judged "clean" with zero issues.
  • Zero skills were flagged for retroactive bucket-(a) reclassification — all 14 remain (b) recipes.
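
The Discovery distribution and average quoted above can be recomputed from the D / L / E ratings in the findings table. The sketch below copies those ratings by hand; the helper names are illustrative.

```typescript
// [skill, Discovery, Logic, Edge Case], copied from the findings table.
const ratings: Array<[string, number, number, number]> = [
  ["designing-frontends", 5, 5, 3],
  ["developing-agents", 5, 5, 4],
  ["reviewing-code", 4, 4, 3],
  ["diagnosing-bugs", 5, 5, 4],
  ["grilling-plans", 3, 4, 3],
  ["writing-skills", 4, 5, 3],
  ["brainstorming", 4, 4, 3],
  ["receiving-code-review", 3, 4, 3],
  ["requesting-code-review", 3, 4, 2],
  ["systematic-debugging", 5, 5, 4],
  ["writing-plans", 3, 4, 3],
  ["optimizing-react-vite", 4, 5, 3],
  ["improving-boocode-guidance", 5, 5, 4],
  ["reviewing-web-design", 3, 4, 3],
];

function average(values: number[]): number {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Counts how many skills received each score.
function histogram(values: number[]): Record<number, number> {
  const counts: Record<number, number> = {};
  for (const v of values) counts[v] = (counts[v] ?? 0) + 1;
  return counts;
}

const discovery = ratings.map((r) => r[1]);
const logic = ratings.map((r) => r[2]);
const edgeCase = ratings.map((r) => r[3]);

const discoveryAvg = average(discovery); // 4
const discoveryHist = histogram(discovery); // { 3: 5, 4: 4, 5: 5 }
const logicAvg = average(logic); // 4.5
const edgeHist = histogram(edgeCase); // { 2: 1, 3: 9, 4: 4 }
```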

Real bugs surfaced (not just polish):

  • anthropics/developing-agents: stale code reference (inference.ts:721-731 → stream-phase.ts:403-406). Real dead link.
  • anthropics-knowledge-work/reviewing-code: dead CONNECTORS.md cross-reference.
  • asyrafhussin/optimizing-react-vite: references rules/*.md and AGENTS.md subfiles that don't exist in the skill directory.
  • superpowers/requesting-code-review: scope collision with built-in code-review (review skills/auto-routing — Sam may want to drop one of these).
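
Dead references of the kind listed above lend themselves to a mechanical check. The following is a sketch only: it assumes the references are written as markdown links, which may not match how the vendored skills cite their sub-files, and the function name and sample body are invented for illustration.

```typescript
// Hypothetical checker: report relative markdown link targets in a skill body
// that are not among the files present in the skill directory.
function findDeadReferences(skillBody: string, existingFiles: ReadonlySet<string>): string[] {
  const linkPattern = /\[[^\]]*\]\(([^)\s]+)\)/g;
  const dead: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = linkPattern.exec(skillBody)) !== null) {
    const target = (match[1] ?? "").split("#")[0] ?? ""; // drop any #fragment
    const isExternal = /^[a-z][a-z0-9+.-]*:/i.test(target); // http:, https:, mailto:, ...
    if (target === "" || isExternal) continue;
    if (!existingFiles.has(target) && !dead.includes(target)) dead.push(target);
  }
  return dead;
}

// Invented sample: two dead relative links, one external link, one live link.
const sampleBody = [
  "See [connectors](CONNECTORS.md) and [bundle rules](rules/bundle-size.md).",
  "Upstream docs: [Vite](https://vite.dev/guide/).",
  "Details in [the checklist](references/checklist.md#usage).",
].join("\n");
const presentFiles = new Set(["SKILL.md", "references/checklist.md"]);
const deadRefs = findDeadReferences(sampleBody, presentFiles);
// deadRefs is ["CONNECTORS.md", "rules/bundle-size.md"]
```

A check of this shape would not catch the stale `inference.ts` line-range reference, since that points outside the skill directory.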

Structural flags requiring environment verification (not patched):

  • mattpocock/grilling-plans: mandatory ask_user_input tool call assumed available. Confirm BooCode's tool registry exposes this to the chat-surface model. If not, the skill body's MANDATORY instruction deadlocks.
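
The environment verification asked for here amounts to a preflight check of a skill's mandatory tools against the registry. A minimal sketch, assuming the registry can be read as a set of tool names; the helper name and the registry contents are invented.

```typescript
// Hypothetical preflight: names of required tools that the registry lacks.
function missingTools(required: string[], registry: ReadonlySet<string>): string[] {
  return required.filter((tool) => !registry.has(tool));
}

// Invented registry contents, for illustration only.
const chatSurfaceTools = new Set(["read_file", "search_code"]);

const unmet = missingTools(["ask_user_input"], chatSurfaceTools);
const canLoadSkill = unmet.length === 0; // false with the registry above
```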

Skillgrade gap remains:

  • Quantitative trigger rates (the original v1.13.12 N/5 column) require skillgrade with canonical-format eval.yaml. Filed as v1.13.13 follow-up. The qualitative 4-step protocol catches the same class of issue (and arguably more — the broken-reference bugs above would not have shown up in skillgrade's invoke/decline trials).

Per-agent artifacts (working files, not part of repo):

  • /tmp/audit-alpha.md — designing-frontends, developing-agents, reviewing-code
  • /tmp/audit-bravo.md — diagnosing-bugs, grilling-plans, writing-skills
  • /tmp/audit-charlie.md — brainstorming, receiving-code-review, requesting-code-review
  • /tmp/audit-delta.md — systematic-debugging, writing-plans, optimizing-react-vite
  • /tmp/audit-echo.md — improving-boocode-guidance, reviewing-web-design