Multi-topic batch. The big-ticket item is the skills audit; the rest are smaller patches that compounded during the audit work. ## Skills audit (rules→recipes split) Vendored all 26 skills from /home/samkintop/opt/skills/ into data/skills/ (the boocode-repo-local skill library — see docker-compose change below). Audited via 5 parallel Claude Code agent-teams running the mgechev/skills-best-practices 4-step protocol (Discovery → Logic → Edge Case → self-Architecture-Refinement) per skill, ~2 min wall-clock vs the ~3.7-hour serial estimate. Result: 14 skills surviving (renamed to gerund form, frontmatter matched), 11 deleted (duplicates, BooCode-irrelevant patterns, Claude-already-does- natively), 1 migrated to BOOCHAT.md/BOOCODER.md as an always-true rule (verification-before-completion). Each surviving skill had its description refined to fix specific trigger gaps surfaced by the protocol — 4 real-bug findings landed (dead refs, stale tags, broken sub-file references in the original vendored content). Audit decisions documented in openspec/changes/v1.13.12-skills-audit/ audit-notes.md. Convention codified in BOOCHAT.md/BOOCODER.md "rules vs recipes" sections — future workflow rules go to those files (100% present), recipes stay in data/skills/ (~6% invoke rate in multi-turn per the Codeminer42 measurement). ## Token tracking + stale-stream banner fix (same root cause) ws-frames.ts IsoTimestamp was z.string().min(1) but postgres returns timestamp columns as JS Date objects. Every message_complete / session_updated / chat_updated frame was failing the v1.13.11 Zod gate and being silently dropped. Symptoms: token tracking blank in the UI (no usage frames landed); the 60s no-token-activity timer tripped the stale-stream banner because the frontend's local message state never saw status='streaming' flip to 'complete'. Fix: z.preprocess(v => v instanceof Date ? v.toISOString() : v, z.string().min(1)) applied to the IsoTimestamp primitive. Centralized, no publisher changes, works identically server + web (the parity test still passes). ## Codecontext .codecontextignore auto-install services/codecontext_client.ts now copies the codecontext/.codecontextignore.template into any project's root on the first call to that project if no .codecontextignore exists. One file written per project, idempotent (in-memory Set guard + access-check), silent fallback on read-only project. Stops the upstream empty-source- file parser crash on foreign projects' node_modules — previously required manually copying the template per project. ## Tool-call budget cap 30 → 50 services/inference/budget.ts: BUDGET_READ_ONLY and BUDGET_NO_AGENT bumped to 50 (from 30). BUDGET_NON_READ_ONLY stays at 10 (no write tools landed yet). Real recon sessions were hitting 30 with ~3 turns wasted on codecontext parse failures; legitimate need was ~27, and Architect-class system overviews want deeper recon. Headroom of 20 absorbs failure-retry turns without changing the safety floor — the doom-loop guard (3 identical calls → abort) catches the actual failure mode this cap was guarding against. v1.14 (Phase C outer agent loop) will supersede this via per-agent agent.steps. Throwaway-ish patch but unblocks deeper recon today. ## UI cleanups - ChatPane queued-message dropdown removed. Each queued message now has three buttons: edit (pop back into ChatInput via sendToChat event), force-send (was the dropdown's only useful action), and cancel. Default behavior (send when streaming completes) needs no UI — it's the implicit do-nothing path. - ChatThroughput removed from desktop tab strip (ChatTabBar.tsx). Mobile tab switcher still shows it. ## Plumbing - .gitignore: data/* + !data/AGENTS.md + !data/skills/ negation patterns so the vendored skill library + agent registry become git-tracked while session DB state stays out. - docker-compose.yml: removed /opt/skills:/data/skills override mount. Skills now live in the boocode repo at data/skills/, auditable per-batch. The host-level /opt/skills/ is preserved untouched for any other tools that read from it. - .codecontextignore at repo root: auto-installed when codecontext was first called against /opt/boocode itself; matches the template. - CLAUDE.md: updated to document the v1.13.11 publishFrame wrapper + message_parts table + tool_cost_stats view + DB-integration test pattern + host-side smoke endpoint quirk. (Pre-existing in working tree before this batch; shipped here for completeness.) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
13 KiB
v1.13.12 — skills audit pass
Audit of 26 skills vendored from /home/samkintop/opt/skills/ into /opt/boocode/data/skills/. Each sorted into one of four buckets per the Codeminer42 rules→recipes split.
Deviations from the batch spec
| Spec said | Reality | Resolution |
|---|---|---|
/opt/boocode/skills/ is the audit target |
Skills directory is /opt/boocode/data/skills/ (per services/skills.ts:19 SKILLS_ROOT = '/data/skills') |
Vendored to the correct path |
/opt/boocode/AGENTS.md for bucket-(a) rule additions |
data/AGENTS.md is an agent registry (## H2 per agent with frontmatter), not a rules file |
Bucket-(a) rules go to BOOCHAT.md (the container guidance file the chat agent reads) instead |
| "7 vendored v1.12 skills" exist to audit | Zero SKILL.md ever committed; data/skills/ was empty |
Vendored all 26 from /home/samkintop/opt/skills/ in this batch (vendor + audit combined) |
data/ content tracked in git |
.gitignore excluded all of data/ |
Added negation patterns (data/* + !data/AGENTS.md + !data/skills/) so audit work shows up in git |
Container reads data/skills/ from the boocode repo |
docker-compose.yml:18 had - /opt/skills:/data/skills override mount — container actually read from host-level /opt/skills/, ignoring repo data/skills/ |
Removed the override mount. Skill library now lives in data/skills/ (repo-tracked, per-batch auditable). Host /opt/skills/ preserved untouched for other tools (Claude Code, etc.). 1-line deviation from spec's "zero code change" claim — necessary to make the spec's intent actually take effect |
Bucket tally
| Bucket | Action | Count |
|---|---|---|
| (a) | Move to BOOCHAT.md as always-true rule | 1 |
| (b) | Keep as recipe, apply Anthropic conventions | 14 |
| (c) | Keep + move bulk to references/ (SKILL.md > 500 lines) |
0 |
| (d) | Delete (duplicates Claude native capability or doesn't fit BooCode) | 11 |
| Total | 26 |
No skill exceeded the 500-line ceiling — bucket (c) is empty. Longest survivor: systematic-debugging at 296 lines.
Per-skill decisions
| Skill (path) | Lines | Bucket | Disposition | Rationale |
|---|---|---|---|---|
anthropics/agent-development |
196 | (b) | Keep; rename → developing-agents |
BooCode-specific value (manages data/AGENTS.md tier-2 registry) |
anthropics/claude-md-improver |
180 | (d) | Delete | Overlaps boocode-guidance-improver (more specific) |
anthropics/frontend-design |
42 | (b) | Keep; rename → designing-frontends |
Concise UI design guidance, no overlap |
anthropics-knowledge-work/code-review |
118 | (b) | Keep; rename → reviewing-code |
Generic code review process distinct from receiving- / requesting-code-review |
anthropics-knowledge-work/task-management |
91 | (d) | Delete | user-invocable: false; duplicates BooCode's TodoWrite/TaskCreate native capability |
asyrafhussin/react-vite-best-practices |
182 | (b) | Keep; rename → optimizing-react-vite |
Matches BooCode's stack (Vite, not Next.js) |
boocode/boocode-guidance-improver |
167 | (b) | Keep; rename → improving-boocode-guidance |
BooCode-specific 10-dimension rubric for CLAUDE.md/BOOCHAT.md/BOOCODER.md/AGENTS.md |
mattpocock/diagnose |
117 | (b) | Keep; rename → diagnosing-bugs |
Complement to systematic-debugging: focus on building a feedback loop |
mattpocock/grill-me |
20 | (b) | Keep; rename → grilling-plans |
Plan stress-testing |
mattpocock/grill-with-docs |
98 | (d) | Delete | Requires CONTEXT.md and docs/adr/ that BooCode doesn't have |
mattpocock/handoff |
17 | (d) | Delete | BooCode is single-user; no agent handoff scenario |
mattpocock/improve-codebase-architecture |
71 | (d) | Delete | Requires CONTEXT.md and docs/adr/ |
mattpocock/to-issues |
83 | (d) | Delete | BooCode uses openspec/changes/, not an issue tracker |
mattpocock/to-prd |
76 | (d) | Delete | Same — no issue tracker |
mattpocock/write-a-skill |
121 | (b) | Keep; rename → writing-skills |
Authoring new skills for this very system |
mattpocock/zoom-out |
7 | (d) | Delete | Claude does this natively when asked; 7-line skill is overhead |
superpowers/brainstorming |
164 | (b) | Keep (already gerund) | Before-features creative-work process |
superpowers/receiving-code-review |
213 | (b) | Keep (already gerund) | Sam reviews everything; process for handling feedback |
superpowers/requesting-code-review |
103 | (b) | Keep (already gerund) | Before-merge verification |
superpowers/systematic-debugging |
296 | (b) | Keep (already gerund) | Comprehensive bug-fix discipline (root-cause-first) |
superpowers/using-superpowers |
117 | (d) | Delete | Meta-skill about skill discovery; Claude does discovery natively |
superpowers/verification-before-completion |
139 | (a) | Migrate rule to BOOCHAT.md, delete skill dir |
Always-true rule: evidence before assertions. Belongs 100% present, not 6% invoked |
superpowers/writing-plans |
152 | (b) | Keep (already gerund) | Maps to BooCode's openspec/changes/ workflow |
vercel-labs/find-skills |
142 | (d) | Delete | Skill discovery — Claude does this natively |
vercel-labs/react-best-practices |
149 | (d) | Delete | Next.js focus; BooCode uses Vite (asyrafhussin's version is the fit) |
vercel-labs/web-design-guidelines |
39 | (b) | Keep; rename → reviewing-web-design |
UI compliance review |
Bucket-(a) migration text
Single rule extracted from superpowers/verification-before-completion (139 lines → ~3 lines in BOOCHAT.md):
Don't claim work is complete without verifying. Run the relevant command (test, build, smoke) and confirm the expected output before reporting success. Evidence before assertions catches regressions you'd otherwise miss.
The 139-line process content does not move to BOOCHAT.md — the rule itself is what needs to be 100% present. Process detail is recoverable from the upstream repo if anyone wants to read it later.
Verification protocol coverage
| Step | Owner | Status |
|---|---|---|
| 1. Discovery (paste SKILL.md, check first-200-char triggering) | Sam — fresh Claude.ai chat per skill | Pending |
| 2. Logic (paste realistic task, check skill recognizes) | Sam — fresh Claude.ai chat | Pending |
| 3. Edge Case (paste boundary task, check correct invoke/decline) | Sam — fresh Claude.ai chat | Pending |
| 4. Architecture Refinement (paste skill + chats, ask for critique) | Sam — fresh Claude.ai chat | Pending |
5. skillgrade --smoke (5 trials per skill) |
Sam — host install npm i -g skillgrade first |
Pending |
Eval.yaml files written per surviving skill (14 files) so the skillgrade --smoke runs are mechanical once skillgrade is installed.
skillgrade scope correction
The eval.yaml stubs authored in the prior session use a flat tasks: [{prompt, grader: [list]}] shape that does not validate against skillgrade's canonical schema (canonical needs name, instruction, workspace, structured graders with type: deterministic | llm_rubric, run shell script, weight, plus a Docker provider block). Rewriting all 14 in the canonical format is out of scope for this batch — each needs Docker workspace setup and grader scripts that capture skill-output correctness. Filed as a follow-up: v1.13.13 — skillgrade eval.yaml canonical rewrite + first quantitative pass.
The smoke-results column in the table below is n/a* for that reason. The 4-step qualitative protocol still runs (via the agent team in this batch) and surfaces the structural issues that quantitative trials would have caught anyway.
4-step protocol findings (agent-team batch)
Each surviving skill was assessed by one of 5 parallel teammates (alpha / bravo / charlie / delta / echo) running the mgechev/skills-best-practices 4-step protocol: Discovery → Logic → Edge Case → self-Architecture-Refinement. Teammates wrote per-agent findings to /tmp/audit-<name>.md; the table here aggregates.
Ratings shorthand: D=Discovery, L=Logic, E=Edge Case (each 1-5).
| Skill | Auditor | D / L / E | 4-step verdict | Fix applied |
|---|---|---|---|---|
anthropics/designing-frontends |
alpha | 5 / 5 / 3 | Strong primary triggers; over-broad with "artifacts, posters" (not code targets) | Removed "artifacts, posters" from description trigger list |
anthropics/developing-agents |
alpha | 5 / 5 / 4 | Sharp triggering; stale "(as of v1.11.x)" tag and broken inference.ts:721-731 reference (actual code is at stream-phase.ts:403-406) |
Updated stale version tag + cross-reference |
anthropics-knowledge-work/reviewing-code |
alpha | 4 / 4 / 3 | Good for explicit PR/diff triggers; dead CONNECTORS.md cross-reference (file doesn't exist) |
Removed broken cross-reference |
mattpocock/diagnosing-bugs |
bravo | 5 / 5 / 4 | Strong; missing colloquial phrasings like "not working" / "something wrong" | Added informal trigger phrases |
mattpocock/grilling-plans |
bravo | 3 / 4 / 3 | Trigger coverage too narrow — only "grill me" reliably fires; structural risk: mandatory ask_user_input tool may not exist in BooCode's tool registry (flagged but not patched — needs env verification) |
Added "poke holes", "challenge my design", "play devil's advocate", "what am I missing" triggers |
mattpocock/writing-skills |
bravo | 4 / 5 / 3 | Missing "create" phrasing; description-length rule (≤1024 chars) not in Review Checklist | Added "create" trigger + checklist item |
superpowers/brainstorming |
charlie | 4 / 4 / 3 | Vague "modifying behavior" causes both over- and under-firing; HARD-GATE wording could be clearer about writing-plans being permitted | Tightened to "non-trivial modifications" + added "refactoring" |
superpowers/receiving-code-review |
charlie | 3 / 4 / 3 | Conditional qualifier "especially if feedback seems unclear or technically questionable" mis-frames as edge-case skill rather than default protocol | Removed conditional + broadened to informal channels |
superpowers/requesting-code-review |
charlie | 3 / 4 / 2 | Scope collision with built-in code-review skill — near-identical surface language but different execution (subagent dispatch vs inline). LOWEST EDGE-CASE SCORE OF THE BATCH |
Added "dispatches a separate subagent reviewer" differentiator |
superpowers/systematic-debugging |
delta | 5 / 5 / 4 | Strong; build/compile failures appear in body but missing from frontmatter trigger | Extended description with "build failure, compile error" + "debug/investigate/diagnose" |
superpowers/writing-plans |
delta | 3 / 4 / 3 | Spec-centric framing gatekeeps on pre-existing spec doc; colloquial "write me a plan" misses | Added colloquial planning trigger phrases |
asyrafhussin/optimizing-react-vite |
delta | 4 / 5 / 3 | Over-broad "Vite configuration" triggers on non-perf tasks; body references rules/*.md and AGENTS.md files that don't exist in the skill dir |
Narrowed scope + added broken-reference warnings |
boocode/improving-boocode-guidance |
echo | 5 / 5 / 4 | Strong; "critique" in description prose but missing from examples list | Added "critique my BOOCODER.md" to examples |
vercel-labs/reviewing-web-design |
echo | 3 / 4 / 3 | Generic triggers collide with general code-review; delegates substance to external GitHub URL with no fallback on fetch failure | Named "Vercel's live web-interface-guidelines" as differentiator + added 404 fallback |
Aggregate notes
Trigger-quality stats (qualitative, n=14):
- Discovery 5/5: 5 skills | 4/5: 4 skills | 3/5: 5 skills (avg ~4.0)
- Edge case is the weakest dimension across the batch — most skills hit 3/5 (borderline invoke/decline). Suggests skills are over- or under-triggering on adjacent-but-different tasks.
- Every skill had at least one fix applied. None were judged "clean" with zero issues.
- Zero skills were flagged for retroactive bucket-(a) reclassification — all 14 remain (b) recipes.
Real bugs surfaced (not just polish):
anthropics/developing-agents: stale code reference (inference.ts:721-731 → stream-phase.ts:403-406). Real dead link.anthropics-knowledge-work/reviewing-code: dead CONNECTORS.md cross-reference.asyrafhussin/optimizing-react-vite: referencesrules/*.mdandAGENTS.mdsubfiles that don't exist in the skill directory.superpowers/requesting-code-review: scope collision with built-incode-review(review skills/auto-routing — Sam may want to drop one of these).
Structural flags requiring environment verification (not patched):
mattpocock/grilling-plans: mandatoryask_user_inputtool call assumed available. Confirm BooCode's tool registry exposes this to the chat-surface model. If not, the skill body's MANDATORY instruction deadlocks.
Skillgrade gap remains:
- Quantitative trigger rates (the original v1.13.12 N/5 column) require skillgrade with canonical-format eval.yaml. Filed as v1.13.13 follow-up. The qualitative 4-step protocol catches the same class of issue (and arguably more — the broken-reference bugs above would not have shown up in skillgrade's invoke/decline trials).
Per-agent artifacts (working files, not part of repo):
/tmp/audit-alpha.md— designing-frontends, developing-agents, reviewing-code/tmp/audit-bravo.md— diagnosing-bugs, grilling-plans, writing-skills/tmp/audit-charlie.md— brainstorming, receiving-code-review, requesting-code-review/tmp/audit-delta.md— systematic-debugging, writing-plans, optimizing-react-vite/tmp/audit-echo.md— improving-boocode-guidance, reviewing-web-design