indifferentketchup 0fa46cd06c v1.13.12: skills audit + token-tracking fix + codecontext + cap50 + UI cleanups

Multi-topic batch. The big-ticket item is the skills audit; the rest are
smaller patches that compounded during the audit work.

## Skills audit (rules→recipes split)

Vendored all 26 skills from /home/samkintop/opt/skills/ into data/skills/
(the boocode-repo-local skill library — see docker-compose change below).
Audited via 5 parallel Claude Code agent-teams running the
mgechev/skills-best-practices 4-step protocol (Discovery → Logic → Edge
Case → self-Architecture-Refinement) per skill, ~2 min wall-clock vs the
~3.7-hour serial estimate.

Result: 14 skills surviving (renamed to gerund form, frontmatter matched),
11 deleted (duplicates, BooCode-irrelevant patterns, Claude-already-does-
natively), 1 migrated to BOOCHAT.md/BOOCODER.md as an always-true rule
(verification-before-completion). Each surviving skill had its description
refined to fix specific trigger gaps surfaced by the protocol — 4
real-bug findings landed (dead refs, stale tags, broken sub-file
references in the original vendored content).

Audit decisions documented in openspec/changes/v1.13.12-skills-audit/
audit-notes.md. Convention codified in BOOCHAT.md/BOOCODER.md "rules vs
recipes" sections — future workflow rules go to those files (100%
present), recipes stay in data/skills/ (~6% invoke rate in multi-turn
per the Codeminer42 measurement).

## Token tracking + stale-stream banner fix (same root cause)

ws-frames.ts IsoTimestamp was z.string().min(1) but postgres returns
timestamp columns as JS Date objects. Every message_complete /
session_updated / chat_updated frame was failing the v1.13.11 Zod gate
and being silently dropped. Symptoms: token tracking blank in the UI
(no usage frames landed); the 60s no-token-activity timer tripped the
stale-stream banner because the frontend's local message state never
saw status='streaming' flip to 'complete'.

Fix: z.preprocess(v => v instanceof Date ? v.toISOString() : v,
z.string().min(1)) applied to the IsoTimestamp primitive. Centralized,
no publisher changes, works identically server + web (the parity test
still passes).
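
To make the failure mode concrete, here is a dependency-free sketch of the before/after behaviour. It is a stand-in, not the real schema: the actual primitive is a Zod schema in ws-frames.ts, and the names `parseStrict` and `parseIsoTimestamp` are hypothetical helpers that only mimic `z.string().min(1)` and the `z.preprocess` wrapper.

```typescript
// Dependency-free stand-in for the IsoTimestamp primitive (the real one is a Zod schema).
type ParseResult = { ok: true; value: string } | { ok: false; error: string };

// Old behaviour: accept only non-empty strings, like z.string().min(1).
function parseStrict(v: unknown): ParseResult {
  if (typeof v !== "string") return { ok: false, error: `expected string, got ${typeof v}` };
  if (v.length < 1) return { ok: false, error: "string must not be empty" };
  return { ok: true, value: v };
}

// New behaviour: normalise Date -> ISO string first, then run the same check,
// mirroring z.preprocess(v => v instanceof Date ? v.toISOString() : v, ...).
function parseIsoTimestamp(v: unknown): ParseResult {
  const normalised = v instanceof Date ? v.toISOString() : v;
  return parseStrict(normalised);
}

// A Date object, as the postgres client returns for timestamp columns.
const fromDriver = new Date(Date.UTC(2026, 4, 22, 18, 58, 30));

console.log(parseStrict(fromDriver));       // rejected: the silently dropped frame case
console.log(parseIsoTimestamp(fromDriver)); // accepted, value "2026-05-22T18:58:30.000Z"
console.log(parseIsoTimestamp("2026-05-22T18:58:30.000Z")); // strings pass through unchanged
```

Because the normalisation lives in the primitive, every frame schema that reuses it picks up the fix without touching the publishers.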

## Codecontext .codecontextignore auto-install

services/codecontext_client.ts now copies the
codecontext/.codecontextignore.template into any project's root on the
first call to that project if no .codecontextignore exists. One file
written per project, idempotent (in-memory Set guard + access-check),
silent fallback on read-only project. Stops the upstream empty-source-
file parser crash on foreign projects' node_modules — previously
required manually copying the template per project.
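
A minimal sketch of the install-once behaviour described above, assuming synchronous Node `fs` calls for brevity. The function name `ensureCodecontextIgnore` and its parameters are hypothetical; the real code in services/codecontext_client.ts may be structured differently.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Project roots already handled during this process lifetime (the in-memory guard).
const checkedRoots = new Set<string>();

// Copy the template into projectRoot unless a .codecontextignore already exists.
// Returns true only when this call wrote the file.
function ensureCodecontextIgnore(projectRoot: string, templatePath: string): boolean {
  if (checkedRoots.has(projectRoot)) return false; // already checked this project
  checkedRoots.add(projectRoot);

  const target = path.join(projectRoot, ".codecontextignore");
  if (fs.existsSync(target)) return false; // never overwrite an existing file

  try {
    fs.copyFileSync(templatePath, target);
    return true;
  } catch {
    // Read-only or otherwise unwritable project: stay silent and carry on.
    return false;
  }
}

// Usage against a throwaway directory.
const demoRoot = fs.mkdtempSync(path.join(os.tmpdir(), "codecontext-demo-"));
const demoTemplate = path.join(demoRoot, "template.txt");
fs.writeFileSync(demoTemplate, "node_modules/\n");
const projectRoot = path.join(demoRoot, "project");
fs.mkdirSync(projectRoot);

console.log(ensureCodecontextIgnore(projectRoot, demoTemplate)); // first call writes the file
console.log(ensureCodecontextIgnore(projectRoot, demoTemplate)); // second call is a no-op
```

The Set avoids repeated filesystem checks within one process, while the existence check keeps the operation idempotent across restarts.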

## Tool-call budget cap 30 → 50

services/inference/budget.ts: BUDGET_READ_ONLY and BUDGET_NO_AGENT
bumped to 50 (from 30). BUDGET_NON_READ_ONLY stays at 10 (no write
tools landed yet). Real recon sessions were hitting 30 with ~3 turns
wasted on codecontext parse failures; legitimate need was ~27, and
Architect-class system overviews want deeper recon. Headroom of 20
absorbs failure-retry turns without changing the safety floor — the
doom-loop guard (3 identical calls → abort) catches the actual
failure mode this cap was guarding against.

v1.14 (Phase C outer agent loop) will supersede this via per-agent
agent.steps. Throwaway-ish patch but unblocks deeper recon today.
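
A rough sketch of how the cap and the doom-loop guard interact. The budget values come from the text above; the helper `checkToolCall`, the `ToolCall` shape, and the reading of "identical" as "same name and same JSON-serialised arguments, back to back" are assumptions for illustration.

```typescript
// Caps as described for services/inference/budget.ts.
const BUDGETS = {
  readOnly: 50,    // BUDGET_READ_ONLY (was 30)
  noAgent: 50,     // BUDGET_NO_AGENT (was 30)
  nonReadOnly: 10, // BUDGET_NON_READ_ONLY (unchanged)
} as const;

const DOOM_LOOP_THRESHOLD = 3; // identical calls in a row before aborting

interface ToolCall {
  name: string;
  args: unknown;
}

type Verdict = "continue" | "abort:doom-loop" | "abort:budget";

// Hypothetical helper: decide whether the next tool call may proceed.
function checkToolCall(history: ToolCall[], next: ToolCall, budget: number): Verdict {
  const key = (c: ToolCall): string => `${c.name}:${JSON.stringify(c.args)}`;

  // Doom loop: the last DOOM_LOOP_THRESHOLD calls, including this one, are identical.
  const recent = [...history, next].slice(-DOOM_LOOP_THRESHOLD);
  const looping =
    recent.length === DOOM_LOOP_THRESHOLD && recent.every((c) => key(c) === key(next));
  if (looping) return "abort:doom-loop";

  // Budget: this call would exceed the cap for the session class.
  if (history.length + 1 > budget) return "abort:budget";
  return "continue";
}

const retry: ToolCall = { name: "read_file", args: { path: "a.ts" } };
console.log(checkToolCall([retry, retry], retry, BUDGETS.readOnly)); // "abort:doom-loop"
```

Under this reading, a repeating failure is stopped on its third attempt regardless of the cap, which is why raising the cap does not weaken the guard.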

## UI cleanups

- ChatPane queued-message dropdown removed. Each queued message now
  has three buttons: edit (pop back into ChatInput via sendToChat
  event), force-send (was the dropdown's only useful action), and
  cancel. Default behavior (send when streaming completes) needs no
  UI — it's the implicit do-nothing path.
- ChatThroughput removed from desktop tab strip (ChatTabBar.tsx).
  Mobile tab switcher still shows it.

## Plumbing

- .gitignore: data/* + !data/AGENTS.md + !data/skills/ negation
  patterns so the vendored skill library + agent registry become
  git-tracked while session DB state stays out.
- docker-compose.yml: removed /opt/skills:/data/skills override
  mount. Skills now live in the boocode repo at data/skills/,
  auditable per-batch. The host-level /opt/skills/ is preserved
  untouched for any other tools that read from it.
- .codecontextignore at repo root: auto-installed when codecontext
  was first called against /opt/boocode itself; matches the template.
- CLAUDE.md: updated to document the v1.13.11 publishFrame wrapper +
  message_parts table + tool_cost_stats view + DB-integration test
  pattern + host-side smoke endpoint quirk. (Pre-existing in working
  tree before this batch; shipped here for completeness.)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-22 18:58:30 +00:00

v1.13.12 — skills audit pass

Audit of 26 skills vendored from /home/samkintop/opt/skills/ into /opt/boocode/data/skills/. Each sorted into one of four buckets per the Codeminer42 rules→recipes split.

Deviations from the batch spec

| Spec said | Reality | Resolution |
| --- | --- | --- |
| `/opt/boocode/skills/` is the audit target | Skills directory is `/opt/boocode/data/skills/` (per `services/skills.ts:19` `SKILLS_ROOT = '/data/skills'`) | Vendored to the correct path |
| `/opt/boocode/AGENTS.md` for bucket-(a) rule additions | `data/AGENTS.md` is an agent registry (`## H2` per agent with frontmatter), not a rules file | Bucket-(a) rules go to `BOOCHAT.md` (the container guidance file the chat agent reads) instead |
| "7 vendored v1.12 skills" exist to audit | Zero SKILL.md ever committed; `data/skills/` was empty | Vendored all 26 from `/home/samkintop/opt/skills/` in this batch (vendor + audit combined) |
| `data/` content tracked in git | `.gitignore` excluded all of `data/` | Added negation patterns (`data/*` + `!data/AGENTS.md` + `!data/skills/`) so audit work shows up in git |
| Container reads `data/skills/` from the boocode repo | `docker-compose.yml:18` had `- /opt/skills:/data/skills` override mount; the container actually read from host-level `/opt/skills/`, ignoring repo `data/skills/` | Removed the override mount. Skill library now lives in `data/skills/` (repo-tracked, per-batch auditable). Host `/opt/skills/` preserved untouched for other tools (Claude Code, etc.). 1-line deviation from spec's "zero code change" claim, necessary to make the spec's intent actually take effect |

Bucket tally

| Bucket | Action | Count |
| --- | --- | --- |
| (a) | Move to BOOCHAT.md as always-true rule | 1 |
| (b) | Keep as recipe, apply Anthropic conventions | 14 |
| (c) | Keep + move bulk to `references/` (SKILL.md > 500 lines) | 0 |
| (d) | Delete (duplicates Claude native capability or doesn't fit BooCode) | 11 |
| Total | | 26 |

No skill exceeded the 500-line ceiling — bucket (c) is empty. Longest survivor: systematic-debugging at 296 lines.
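
The size limits used in this audit are easy to check mechanically. The sketch below is one possible lint, assuming a skill is available as a name, a frontmatter description, and a body string. The `SkillFile` shape and `lintSkill` helper are hypothetical. The 500-line ceiling is the bucket-(c) threshold above, and the 1024-character limit is the description-length rule cited in the writing-skills finding.

```typescript
const MAX_SKILL_LINES = 500;        // bucket (c) threshold used in this audit
const MAX_DESCRIPTION_CHARS = 1024; // description-length rule from the writing-skills finding

// Hypothetical in-memory shape for a parsed SKILL.md.
interface SkillFile {
  name: string;
  description: string;
  body: string;
}

// Return a list of human-readable problems; an empty list means the skill passes.
function lintSkill(skill: SkillFile): string[] {
  const problems: string[] = [];

  const lineCount = skill.body.split("\n").length;
  if (lineCount > MAX_SKILL_LINES) {
    problems.push(
      `${skill.name}: ${lineCount} lines exceeds ${MAX_SKILL_LINES}; move bulk to references/`,
    );
  }

  if (skill.description.length > MAX_DESCRIPTION_CHARS) {
    problems.push(
      `${skill.name}: description is ${skill.description.length} chars, limit is ${MAX_DESCRIPTION_CHARS}`,
    );
  }

  return problems;
}

// A 296-line body, matching the longest survivor, passes the line ceiling.
const longestSurvivor: SkillFile = {
  name: "systematic-debugging",
  description: "placeholder description",
  body: Array.from({ length: 296 }, () => "line").join("\n"),
};
console.log(lintSkill(longestSurvivor)); // []
```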

Per-skill decisions

| Skill (path) | Lines | Bucket | Disposition | Rationale |
| --- | --- | --- | --- | --- |
| `anthropics/agent-development` | 196 | (b) | Keep; rename → `developing-agents` | BooCode-specific value (manages `data/AGENTS.md` tier-2 registry) |
| `anthropics/claude-md-improver` | 180 | (d) | Delete | Overlaps `boocode-guidance-improver` (more specific) |
| `anthropics/frontend-design` | 42 | (b) | Keep; rename → `designing-frontends` | Concise UI design guidance, no overlap |
| `anthropics-knowledge-work/code-review` | 118 | (b) | Keep; rename → `reviewing-code` | Generic code review process distinct from `receiving-` / `requesting-code-review` |
| `anthropics-knowledge-work/task-management` | 91 | (d) | Delete | `user-invocable: false`; duplicates BooCode's TodoWrite/TaskCreate native capability |
| `asyrafhussin/react-vite-best-practices` | 182 | (b) | Keep; rename → `optimizing-react-vite` | Matches BooCode's stack (Vite, not Next.js) |
| `boocode/boocode-guidance-improver` | 167 | (b) | Keep; rename → `improving-boocode-guidance` | BooCode-specific 10-dimension rubric for `CLAUDE.md`/`BOOCHAT.md`/`BOOCODER.md`/`AGENTS.md` |
| `mattpocock/diagnose` | 117 | (b) | Keep; rename → `diagnosing-bugs` | Complement to `systematic-debugging`: focus on building a feedback loop |
| `mattpocock/grill-me` | 20 | (b) | Keep; rename → `grilling-plans` | Plan stress-testing |
| `mattpocock/grill-with-docs` | 98 | (d) | Delete | Requires `CONTEXT.md` and `docs/adr/` that BooCode doesn't have |
| `mattpocock/handoff` | 17 | (d) | Delete | BooCode is single-user; no agent handoff scenario |
| `mattpocock/improve-codebase-architecture` | 71 | (d) | Delete | Requires `CONTEXT.md` and `docs/adr/` |
| `mattpocock/to-issues` | 83 | (d) | Delete | BooCode uses `openspec/changes/`, not an issue tracker |
| `mattpocock/to-prd` | 76 | (d) | Delete | Same: no issue tracker |
| `mattpocock/write-a-skill` | 121 | (b) | Keep; rename → `writing-skills` | Authoring new skills for this very system |
| `mattpocock/zoom-out` | 7 | (d) | Delete | Claude does this natively when asked; 7-line skill is overhead |
| `superpowers/brainstorming` | 164 | (b) | Keep (already gerund) | Before-features creative-work process |
| `superpowers/receiving-code-review` | 213 | (b) | Keep (already gerund) | Sam reviews everything; process for handling feedback |
| `superpowers/requesting-code-review` | 103 | (b) | Keep (already gerund) | Before-merge verification |
| `superpowers/systematic-debugging` | 296 | (b) | Keep (already gerund) | Comprehensive bug-fix discipline (root-cause-first) |
| `superpowers/using-superpowers` | 117 | (d) | Delete | Meta-skill about skill discovery; Claude does discovery natively |
| `superpowers/verification-before-completion` | 139 | (a) | Migrate rule to `BOOCHAT.md`, delete skill dir | Always-true rule: evidence before assertions. Belongs 100% present, not 6% invoked |
| `superpowers/writing-plans` | 152 | (b) | Keep (already gerund) | Maps to BooCode's `openspec/changes/` workflow |
| `vercel-labs/find-skills` | 142 | (d) | Delete | Skill discovery; Claude does this natively |
| `vercel-labs/react-best-practices` | 149 | (d) | Delete | Next.js focus; BooCode uses Vite (asyrafhussin's version is the fit) |
| `vercel-labs/web-design-guidelines` | 39 | (b) | Keep; rename → `reviewing-web-design` | UI compliance review |

Bucket-(a) migration text

Single rule extracted from superpowers/verification-before-completion (139 lines → ~3 lines in BOOCHAT.md):

Don't claim work is complete without verifying. Run the relevant command (test, build, smoke) and confirm the expected output before reporting success. Evidence before assertions catches regressions you'd otherwise miss.

The 139-line process content does not move to BOOCHAT.md — the rule itself is what needs to be 100% present. Process detail is recoverable from the upstream repo if anyone wants to read it later.

Verification protocol coverage

| Step | Owner | Status |
| --- | --- | --- |
| 1. Discovery (paste SKILL.md, check first-200-char triggering) | Sam (fresh Claude.ai chat per skill) | Pending |
| 2. Logic (paste realistic task, check skill recognizes) | Sam (fresh Claude.ai chat) | Pending |
| 3. Edge Case (paste boundary task, check correct invoke/decline) | Sam (fresh Claude.ai chat) | Pending |
| 4. Architecture Refinement (paste skill + chats, ask for critique) | Sam (fresh Claude.ai chat) | Pending |
| 5. `skillgrade --smoke` (5 trials per skill) | Sam (host install `npm i -g skillgrade` first) | Pending |

An eval.yaml file was written for each surviving skill (14 files) so that the skillgrade --smoke runs become mechanical once skillgrade is installed.

skillgrade scope correction

The eval.yaml stubs authored in the prior session use a flat tasks: [{prompt, grader: [list]}] shape that does not validate against skillgrade's canonical schema. The canonical format needs name, instruction, workspace, structured graders with type: deterministic | llm_rubric, run shell script, and weight, plus a Docker provider block. Rewriting all 14 in the canonical format is out of scope for this batch: each needs Docker workspace setup and grader scripts that capture skill-output correctness. Filed as a follow-up: v1.13.13, skillgrade eval.yaml canonical rewrite + first quantitative pass.

The smoke-results column in the table below is n/a* for that reason. The 4-step qualitative protocol still runs (via the agent team in this batch) and surfaces the structural issues that quantitative trials would have caught anyway.

4-step protocol findings (agent-team batch)

Each surviving skill was assessed by one of 5 parallel teammates (alpha / bravo / charlie / delta / echo) running the mgechev/skills-best-practices 4-step protocol: Discovery → Logic → Edge Case → self-Architecture-Refinement. Teammates wrote per-agent findings to /tmp/audit-<name>.md; the table here aggregates.

Ratings shorthand: D=Discovery, L=Logic, E=Edge Case (each 1-5).

| Skill | Auditor | D / L / E | 4-step verdict | Fix applied |
| --- | --- | --- | --- | --- |
| `anthropics/designing-frontends` | alpha | 5 / 5 / 3 | Strong primary triggers; over-broad with "artifacts, posters" (not code targets) | Removed "artifacts, posters" from description trigger list |
| `anthropics/developing-agents` | alpha | 5 / 5 / 4 | Sharp triggering; stale "(as of v1.11.x)" tag and broken `inference.ts:721-731` reference (actual code is at `stream-phase.ts:403-406`) | Updated stale version tag + cross-reference |
| `anthropics-knowledge-work/reviewing-code` | alpha | 4 / 4 / 3 | Good for explicit PR/diff triggers; dead `CONNECTORS.md` cross-reference (file doesn't exist) | Removed broken cross-reference |
| `mattpocock/diagnosing-bugs` | bravo | 5 / 5 / 4 | Strong; missing colloquial phrasings like "not working" / "something wrong" | Added informal trigger phrases |
| `mattpocock/grilling-plans` | bravo | 3 / 4 / 3 | Trigger coverage too narrow: only "grill me" reliably fires; structural risk: mandatory `ask_user_input` tool may not exist in BooCode's tool registry (flagged but not patched; needs env verification) | Added "poke holes", "challenge my design", "play devil's advocate", "what am I missing" triggers |
| `mattpocock/writing-skills` | bravo | 4 / 5 / 3 | Missing "create" phrasing; description-length rule (≤1024 chars) not in Review Checklist | Added "create" trigger + checklist item |
| `superpowers/brainstorming` | charlie | 4 / 4 / 3 | Vague "modifying behavior" causes both over- and under-firing; HARD-GATE wording could be clearer about writing-plans being permitted | Tightened to "non-trivial modifications" + added "refactoring" |
| `superpowers/receiving-code-review` | charlie | 3 / 4 / 3 | Conditional qualifier "especially if feedback seems unclear or technically questionable" mis-frames as edge-case skill rather than default protocol | Removed conditional + broadened to informal channels |
| `superpowers/requesting-code-review` | charlie | 3 / 4 / 2 | Scope collision with built-in `code-review` skill: near-identical surface language but different execution (subagent dispatch vs inline). LOWEST EDGE-CASE SCORE OF THE BATCH | Added "dispatches a separate subagent reviewer" differentiator |
| `superpowers/systematic-debugging` | delta | 5 / 5 / 4 | Strong; build/compile failures appear in body but missing from frontmatter trigger | Extended description with "build failure, compile error" + "debug/investigate/diagnose" |
| `superpowers/writing-plans` | delta | 3 / 4 / 3 | Spec-centric framing gatekeeps on pre-existing spec doc; colloquial "write me a plan" misses | Added colloquial planning trigger phrases |
| `asyrafhussin/optimizing-react-vite` | delta | 4 / 5 / 3 | Over-broad "Vite configuration" triggers on non-perf tasks; body references `rules/*.md` and `AGENTS.md` files that don't exist in the skill dir | Narrowed scope + added broken-reference warnings |
| `boocode/improving-boocode-guidance` | echo | 5 / 5 / 4 | Strong; "critique" in description prose but missing from examples list | Added `"critique my BOOCODER.md"` to examples |
| `vercel-labs/reviewing-web-design` | echo | 3 / 4 / 3 | Generic triggers collide with general code-review; delegates substance to external GitHub URL with no fallback on fetch failure | Named "Vercel's live web-interface-guidelines" as differentiator + added 404 fallback |

Aggregate notes

Trigger-quality stats (qualitative, n=14):

- Discovery: 5/5 for 5 skills, 4/5 for 4 skills, 3/5 for 5 skills (average 4.0).
- Edge Case is the weakest dimension across the batch: 9 of 14 skills scored 3/5 (borderline invoke/decline), which suggests skills are over- or under-triggering on adjacent-but-different tasks.
- Every skill had at least one fix applied. None were judged "clean" with zero issues.
- Zero skills were flagged for retroactive bucket-(a) reclassification; all 14 remain (b) recipes.
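
The stats above can be recomputed from the D / L / E column of the findings table. The sketch below copies the 14 rows by hand; the `mean` and `histogram` helpers are illustrative, not part of the audit tooling.

```typescript
// [skill, Discovery, Logic, Edge Case], copied from the findings table.
const ratings: Array<[string, number, number, number]> = [
  ["designing-frontends", 5, 5, 3],
  ["developing-agents", 5, 5, 4],
  ["reviewing-code", 4, 4, 3],
  ["diagnosing-bugs", 5, 5, 4],
  ["grilling-plans", 3, 4, 3],
  ["writing-skills", 4, 5, 3],
  ["brainstorming", 4, 4, 3],
  ["receiving-code-review", 3, 4, 3],
  ["requesting-code-review", 3, 4, 2],
  ["systematic-debugging", 5, 5, 4],
  ["writing-plans", 3, 4, 3],
  ["optimizing-react-vite", 4, 5, 3],
  ["improving-boocode-guidance", 5, 5, 4],
  ["reviewing-web-design", 3, 4, 3],
];

const mean = (xs: number[]): number => xs.reduce((a, b) => a + b, 0) / xs.length;

// Count how many skills received each score.
function histogram(xs: number[]): Record<number, number> {
  const counts: Record<number, number> = {};
  for (const x of xs) counts[x] = (counts[x] ?? 0) + 1;
  return counts;
}

const discovery = ratings.map((r) => r[1]);
const logic = ratings.map((r) => r[2]);
const edgeCase = ratings.map((r) => r[3]);

console.log("Discovery", mean(discovery).toFixed(2), histogram(discovery)); // 4.00
console.log("Logic    ", mean(logic).toFixed(2), histogram(logic));         // 4.50
console.log("Edge Case", mean(edgeCase).toFixed(2), histogram(edgeCase));   // 3.21
```

Edge Case averages about 3.2 against 4.0 for Discovery and 4.5 for Logic, which is the gap the note on the weakest dimension refers to.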

Real bugs surfaced (not just polish):

- anthropics/developing-agents: stale code reference (inference.ts:721-731 → stream-phase.ts:403-406). Real dead link.
- anthropics-knowledge-work/reviewing-code: dead CONNECTORS.md cross-reference.
- asyrafhussin/optimizing-react-vite: references rules/*.md and AGENTS.md subfiles that don't exist in the skill directory.
- superpowers/requesting-code-review: scope collision with built-in code-review (review skills/auto-routing; Sam may want to drop one of these).
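
Broken sub-file references of the kind listed above could be caught with a small check. This is a sketch under stated assumptions: it only inspects markdown link targets (references written in prose or backticks, such as a `file.ts:line` pointer, would need a separate pattern), and `findDeadReferences`, the sample body, and the file name `rules/example.md` are hypothetical.

```typescript
// Report relative markdown link targets that are not present in the skill directory.
// existingFiles holds paths relative to the skill directory, e.g. "references/details.md".
function findDeadReferences(body: string, existingFiles: Set<string>): string[] {
  const linkPattern = /\[[^\]]*\]\(([^)\s]+)\)/g;
  const dead: string[] = [];
  let match: RegExpExecArray | null;

  while ((match = linkPattern.exec(body)) !== null) {
    const target = match[1].split("#")[0]; // drop any #fragment
    // Skip pure in-page anchors and anything with a URL scheme (https:, mailto:, ...).
    if (target === "" || /^[a-z][a-z0-9+.-]*:/i.test(target)) continue;
    if (!existingFiles.has(target) && !dead.includes(target)) dead.push(target);
  }
  return dead;
}

const sampleBody = [
  "See [connectors](CONNECTORS.md) for integrations.",
  "Rules live in [the rules folder](rules/example.md#usage).",
  "Upstream docs: [guide](https://example.com/guide).",
  "Details in [reference](references/details.md).",
].join("\n");

const skillDirFiles = new Set(["SKILL.md", "references/details.md"]);
console.log(findDeadReferences(sampleBody, skillDirFiles));
// [ "CONNECTORS.md", "rules/example.md" ]
```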

Structural flags requiring environment verification (not patched):

mattpocock/grilling-plans: mandatory ask_user_input tool call assumed available. Confirm BooCode's tool registry exposes this to the chat-surface model. If not, the skill body's MANDATORY instruction deadlocks.

Skillgrade gap remains:

Quantitative trigger rates (the original v1.13.12 N/5 column) require skillgrade with canonical-format eval.yaml. Filed as v1.13.13 follow-up. The qualitative 4-step protocol catches the same class of issue (and arguably more — the broken-reference bugs above would not have shown up in skillgrade's invoke/decline trials).

Per-agent artifacts (working files, not part of repo):

- /tmp/audit-alpha.md: designing-frontends, developing-agents, reviewing-code
- /tmp/audit-bravo.md: diagnosing-bugs, grilling-plans, writing-skills
- /tmp/audit-charlie.md: brainstorming, receiving-code-review, requesting-code-review
- /tmp/audit-delta.md: systematic-debugging, writing-plans, optimizing-react-vite
- /tmp/audit-echo.md: improving-boocode-guidance, reviewing-web-design
