boocode/data/skills/boocode/improving-boocode-guidance/SKILL.md

---
name: improving-boocode-guidance
description: This skill should be used when the user asks to audit, review, check, improve, or critique CLAUDE.md, BOOCHAT.md, BOOCODER.md, or AGENTS.md files in a BooCode project. Examples: "audit my CLAUDE.md", "review my container guidance", "check this AGENTS.md for issues", "improve my BOOCHAT.md", "critique my BOOCODER.md".
---

# BooCode Guidance Improver

Audit guidance files in a BooCode project against a 10-dimension rubric, then propose targeted edits. **Read-only.** Output is a scored report plus before/after edit proposals; Sam reviews and commits.

## Phase 1 — Discovery

Find every guidance file in the project. The expected set:

- `CLAUDE.md` (repo root) — engineering conventions, gotchas, commands
- `AGENTS.md` (repo root) — **agent navigation** (doc map, task routing, openspec usage). Not the agent registry.
- `BOOCHAT.md` (repo root) — container guidance for the read-only chat surface
- `BOOCODER.md` (repo root) — container guidance for the write-capable surface
- `data/AGENTS.md` — single-file tier-2 **agent registry**, `## H2` per agent

Glob with `find_files`, then load each with `view_file`:

```
find_files: pattern="{CLAUDE,BOOCHAT,BOOCODER,AGENTS}.md", path="."
find_files: pattern="data/AGENTS.md", path="."
```

If a file expected by the project's architecture is missing (e.g. BOOCHAT.md is absent from the repo root in a project that exposes a chat container), flag it in the report as a separate "Missing" entry — don't try to score what isn't there. Likewise, if a file exists but is empty (≤5 lines, no real content), score it 1 across the board and recommend it be either populated or deleted; an empty guidance file is worse than no file because it consumes attention without paying any back.
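
The missing/empty triage above can be sketched in a few lines of Python. This is an illustration only, not part of the skill's tool calls: `classify_guidance_files` is a hypothetical helper, it assumes all five files from the list are expected by the project, and it reads "empty" as five or fewer non-blank lines.

```python
from pathlib import Path

# Expected guidance files, taken from the list above (repo-relative paths).
# Assumption: every one of them is expected by the project's architecture.
EXPECTED = ["CLAUDE.md", "AGENTS.md", "BOOCHAT.md", "BOOCODER.md", "data/AGENTS.md"]


def classify_guidance_files(root: Path, empty_max_lines: int = 5) -> dict[str, str]:
    """Label each expected file as 'missing', 'empty', or 'present'."""
    status = {}
    for rel in EXPECTED:
        path = root / rel
        if not path.is_file():
            status[rel] = "missing"
            continue
        # Assumption: "empty" means at most `empty_max_lines` non-blank lines.
        lines = [ln for ln in path.read_text(encoding="utf-8").splitlines() if ln.strip()]
        status[rel] = "empty" if len(lines) <= empty_max_lines else "present"
    return status
```

Files labelled `missing` go in the report's "Missing" entry; files labelled `empty` get a 1 on every dimension. Whether a short file has "real content" still needs a human or model judgment that a line count cannot make.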

## Phase 2 — Score against the rubric

For each file, score each of the 10 dimensions on a 1–5 scale (1 = absent or actively misleading; 5 = exemplary). Use the rubric below verbatim. Cite a representative line range for each score.

### a. Refusal rails up front

The first ~10 lines name explicit "do not" directives — what the agent must not do, ideally with a one-line reason. Surfacing refusals early prevents the model from acting on a hopeful misread later.

- **5** — first 10 lines contain ≥3 explicit refusals (e.g. *"Do not commit"*, *"Do not push"*, *"Do not write files"*) with brief reasons or contexts
- **3** — refusals exist but are buried below line 30, or stated only once without context
- **1** — no refusals anywhere; the agent has to infer constraints from positive instructions only
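
As a rough aid for gathering evidence on this dimension, the sketch below locates candidate refusal lines and splits them into those inside and outside the first 10 lines. The `REFUSAL` pattern and the `refusal_lines` helper are assumptions made for illustration; real files may phrase refusals differently, and whether a refusal carries a reason still has to be judged by reading it.

```python
import re

# Assumed phrasings for a refusal; extend as needed for a given project.
REFUSAL = re.compile(r"\b(do not|don't|never|must not|cannot)\b", re.IGNORECASE)


def refusal_lines(text: str, window: int = 10) -> tuple[list[int], list[int]]:
    """Return 1-based line numbers of refusals inside and outside the first `window` lines."""
    early, late = [], []
    for n, line in enumerate(text.splitlines(), start=1):
        if REFUSAL.search(line):
            (early if n <= window else late).append(n)
    return early, late
```

Three or more hits in the first list point toward a 5; hits that appear only in the second list, below line 30, point toward a 3.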

### b. Version anchor

A concrete version, tag, or date is mentioned near the top so a stale memory becomes obvious to a future reader. Pure "current" / "latest" claims rot silently.

- **5** — version/tag in the first 20 lines, plus a "last meaningful update" date inline somewhere
- **3** — a version tag exists but only deep in the file (e.g. inside a commit-history block)
- **1** — no version, no date, no anchor; nothing to detect staleness against
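
A minimal sketch of how the anchor positions could be located, assuming version tags look like `v1.2.3` or `1.2.3` and dates are written as ISO `YYYY-MM-DD`. Both patterns and the `find_anchors` helper are hypothetical; a project that uses other formats (named tags, written-out dates) would need different patterns.

```python
import re
from typing import Optional

# Assumed shapes: semver-like tags (v1.2.3 or 1.2.3) and ISO dates (2024-05-01).
VERSION = re.compile(r"\bv?\d+\.\d+\.\d+\b")
ISO_DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def find_anchors(text: str) -> dict[str, Optional[int]]:
    """Return the 1-based line of the first version tag and the first ISO date, or None."""
    found: dict[str, Optional[int]] = {"version": None, "date": None}
    for n, line in enumerate(text.splitlines(), start=1):
        if found["version"] is None and VERSION.search(line):
            found["version"] = n
        if found["date"] is None and ISO_DATE.search(line):
            found["date"] = n
    return found
```

A version line number of 20 or less together with a date suggests a 5; a version found only deep in the file suggests a 3; two `None` values suggest a 1.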

### c. Why-with-what

Every non-obvious convention or rule is followed by a one-line justification (`Why:` / `Reason:` / dash). Rules without reasons can't be reasoned about at the edges; they get either blindly followed or quietly violated.

- **5** — every non-trivial rule has a sentence-level "why" inline
- **3** — most rules have reasons, but a few load-bearing ones (e.g. "use overflowWrap not wordWrap") are bare
- **1** — rules read as commandments with no rationale

### d. Authoritative vs misleading sources

Places where a tool can lie (e.g. *"root `tsc --noEmit` uses project references and can miss errors that the per-app tsconfig catches"*) are called out, and the authoritative path is named. Without this, the agent picks the most convenient signal and ships a regression.

- **5** — at least one explicit "X can lie; use Y instead" pair, named with file paths
- **3** — implicit hints ("CLI is authoritative") without naming what the misleading signal is
- **1** — no acknowledgement that any tool can lie

### e. Resolution order

For any stacked configuration (system prompts, env vars, agent definitions, schemas), the precedence is documented end-to-end with what wins on conflict. Missing precedence rules force the agent to guess at boundaries.

- **5** — explicit ordered list (e.g. *"base → container guidance → agent.system_prompt → user prompt"*) with "last wins" or "first wins" stated
- **3** — order is implied by section sequence but not stated; precedence on conflict is unclear
- **1** — multiple sources mentioned, no order, no winner

### f. Failure modes

Each subsystem has a "what happens when this fails" note — fallbacks, defaults, swallow vs propagate decisions. Without this the agent assumes the happy path and writes brittle code.

- **5** — every major subsystem (DB, broker, LLM call, tool execution) names its failure behavior
- **3** — some failure paths documented, others implicit
- **1** — failure modes invisible; reader can't tell what's defensive and what isn't

### g. Don't / refusals (deep)

Beyond the top-of-file refusal rails, the body contains a sustained "don't" thread: anti-patterns the project has been burned by. Each "don't" should name what triggered it (PR, incident, refactor) so it can be re-evaluated.

- **5** — multiple "don't" entries scattered through the file, each with a hint at the triggering context
- **3** — a handful of "don't"s, no context — reader can't tell what's still load-bearing
- **1** — pure positive instructions; no anti-pattern surface

### h. Concrete call sites

Specific file paths and symbol names are used (e.g. `apps/server/src/services/inference.ts:209-225 buildSystemPrompt`), not vague pointers ("in the service layer", "somewhere in tools"). Vague pointers force the agent into an extra search round-trip per claim.

- **5** — claims about code consistently cite file:line or file:symbol (e.g. *"buildSystemPrompt at apps/server/src/services/system-prompt.ts:42"*)
- **3** — some claims cite paths but not lines or symbols (*"in apps/server/src/services/inference.ts"*)
- **1** — claims read like "the broker handles pub/sub" with no path at all

A reliable test for this dimension: pick three random claims about behaviour, and try to land at the named code in two clicks. If you can't, the score drops.
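
To sort sampled claims into the three bands, a sketch along these lines may help. The two patterns are assumptions about what a citation looks like (a path with an extension, optionally followed by `:line` or `:start-end`); `citation_kind` is a hypothetical helper and does not detect file:symbol citations that carry no line number.

```python
import re

# Assumed citation shape: a path with a file extension, then ":" and a line or line range.
FILE_LINE = re.compile(r"[\w./-]+\.\w+:\d+(?:-\d+)?")
# A bare path: at least one "/" and a trailing file extension.
FILE_ONLY = re.compile(r"[\w.-]+(?:/[\w.-]+)+\.\w+")


def citation_kind(claim: str) -> str:
    """Classify a claim as citing 'file:line', 'file', or 'none'."""
    if FILE_LINE.search(claim):
        return "file:line"
    if FILE_ONLY.search(claim):
        return "file"
    return "none"
```

Applied to the three rubric examples above, the sketch returns `file:line`, `file`, and `none` respectively. It only checks that a citation is present; confirming that the cited code actually exists at that location is still a manual lookup.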

### i. Convention drift guards

Pairs of files that must stay in sync are named explicitly (e.g. *"CHECK constraints in schema.sql ↔ `*_STATUSES` const arrays in `apps/server/src/types/api.ts`"*). Without these guards, one half drifts and the test that would catch it doesn't exist.

- **5** — every cross-file invariant in the project has a "keep in sync" callout
- **3** — one or two such guards present; obvious sibling files (frontend type ↔ backend type) not mentioned
- **1** — invariants are invisible; every edit risks silent divergence

### j. No theater

Every line earns its keep. No "be helpful", no "remember to think step by step", no "as an AI assistant" preamble. Theater wastes tokens and trains the model to skim.

- **5** — every line carries either a fact, a rule, or a pointer; reads tight
- **3** — a few filler sentences ("strive for excellence", "remember to think carefully") but mostly substantive
- **1** — heavy preamble, motivational platitudes, or restated framework defaults

Worth a separate pass: re-read the file and ask "would removing this line confuse a future reader?" — if the honest answer is no, the line is theater and should go.

## Phase 3 — Propose one concrete edit per dimension scoring ≤3

For every dimension scoring 3 or lower, generate one specific edit proposal. Each proposal must be:

- **File**: full repo-relative path
- **Anchor**: a quoted existing string of roughly one line, or `(new section after L<n>)`
- **Before**: existing text (or `(none)`)
- **After**: proposed text
- **Why**: one sentence linking back to the rubric dimension and what the change unlocks

Example proposal:

```
### Proposed edit 1 — dimension (a) Refusal rails up front

File: BOOCHAT.md
Anchor: "## Capabilities" (L3)
Before:
  ## Capabilities
After:
  ## You cannot
  - Write, edit, or delete files
  - Run shell commands
  - Make commits, push, or pull

  ## Capabilities
Why: the upstream rubric requires explicit "do not" rails in the first 10 lines so the
model can't reach for a write tool and self-justify after the fact.
```

Keep proposals minimal. One edit per dimension scoring ≤3 — don't pad. If a single edit would lift two dimensions at once, say so and don't double-count.

Do not propose more than ~10 edits per file. If a file scores ≤3 on all or nearly all of the 10 dimensions (rare), the file needs a rewrite, not patches. Say that instead, and propose a high-level outline rather than a flood of line-level edits.
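
The one-edit-per-dimension rule and the no-double-counting rule can be checked mechanically. The sketch below assumes scores are kept as a dict keyed by dimension letter and that each proposal is a dict with a hypothetical `lifts` field listing the dimensions it improves; neither structure is prescribed by this skill.

```python
def dimensions_needing_edits(scores: dict[str, int], threshold: int = 3) -> list[str]:
    """Return dimension letters scoring at or below the threshold, in rubric order."""
    return sorted(dim for dim, score in scores.items() if score <= threshold)


def uncovered_dimensions(scores: dict[str, int], proposals: list[dict]) -> list[str]:
    """Return dimensions that still need an edit because no proposal lifts them.

    Each proposal carries a hypothetical "lifts" list of dimension letters, so a
    single edit that lifts two dimensions covers both without being counted twice.
    """
    covered = {dim for proposal in proposals for dim in proposal["lifts"]}
    return [dim for dim in dimensions_needing_edits(scores) if dim not in covered]
```

An empty result from `uncovered_dimensions` means every weak dimension has exactly the edits it needs and nothing further should be added.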

## Phase 4 — Output

Output as a single numbered list, in this order:

1. Per-file score table: 10 rows (one per dimension), each with a score column and a one-line evidence column
2. Per-file aggregate (sum out of 50) and overall grade band: A (≥45), B (35–44), C (25–34), D (15–24), F (<15)
3. Proposed edits, numbered globally across all files
4. Closing one-line summary: *"X files audited, Y edits proposed, top weak dimension across files: Z."*
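
The aggregate and grade band in item 2 amount to simple arithmetic. A minimal sketch, assuming scores are held in a dict keyed by the dimension letters a through j (the function names are illustrative):

```python
DIMENSIONS = "abcdefghij"  # the 10 rubric dimensions, by letter


def aggregate(scores: dict[str, int]) -> int:
    """Sum the 10 dimension scores (out of 50), rejecting incomplete or out-of-range input."""
    missing = set(DIMENSIONS) - set(scores)
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    if any(not 1 <= scores[d] <= 5 for d in DIMENSIONS):
        raise ValueError("each score must be between 1 and 5")
    return sum(scores[d] for d in DIMENSIONS)


def grade_band(total: int) -> str:
    """Map an aggregate to the bands above: A >= 45, B 35-44, C 25-34, D 15-24, F < 15."""
    if total >= 45:
        return "A"
    if total >= 35:
        return "B"
    if total >= 25:
        return "C"
    if total >= 15:
        return "D"
    return "F"
```

Because every dimension scores at least 1, the lowest possible aggregate is 10, so band F covers totals from 10 to 14.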

Do not edit any file. Do not call any write tool. Sam reads the report, picks which edits to apply, and commits them manually.

## Anti-patterns this skill explicitly avoids

- Auto-generating CLAUDE.md from scratch (different problem — that's `claude-md-improver`'s domain)
- Scoring the *project's* code quality (out of scope — this rubric is about guidance files only)
- Padding the report with generic "best practices" not tied to one of the 10 dimensions
- Restating the rubric in every per-file section (state it once at the top, reference dimensions by letter throughout)