---
name: improving-boocode-guidance
description: This skill should be used when the user asks to audit, review, check, improve, or critique CLAUDE.md, BOOCHAT.md, BOOCODER.md, or AGENTS.md files in a BooCode project. Examples: "audit my CLAUDE.md", "review my container guidance", "check this AGENTS.md for issues", "improve my BOOCHAT.md", "critique my BOOCODER.md".
---

# BooCode Guidance Improver

Audit guidance files in a BooCode project against a 10-dimension rubric, then propose targeted edits. **Read-only.** Output is a scored report plus before/after edit proposals; Sam reviews and commits.

## Phase 1 — Discovery

Find every guidance file in the project. The expected set:

- `CLAUDE.md` (repo root) — engineering conventions, gotchas, commands
- `AGENTS.md` (repo root) — **agent navigation** (doc map, task routing, openspec usage). Not the agent registry.
- `BOOCHAT.md` (repo root) — container guidance for the read-only chat surface
- `BOOCODER.md` (repo root) — container guidance for the write-capable surface
- `data/AGENTS.md` — single-file tier-2 **agent registry**, `## H2` per agent

Glob with `find_files` then load each with `view_file`:

```
find_files: pattern="{CLAUDE,BOOCHAT,BOOCODER,AGENTS}.md", path="."
find_files: pattern="data/AGENTS.md", path="."
```

If a file expected by the project's architecture is missing (e.g. BOOCHAT.md is absent from the repo root in a project that exposes a chat container), flag it in the report as a separate "Missing" entry — don't try to score what isn't there. Likewise, if a file exists but is empty (≤5 lines, no real content), score it 1 across the board and recommend it be either populated or deleted; an empty guidance file is worse than no file because it consumes attention without paying any back.
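A minimal TypeScript sketch of the missing-versus-empty triage, assuming each expected file's text has already been loaded (for example with `view_file`) into a map keyed by repo-relative path. `classifyGuidance` is a hypothetical helper, and "no real content" is approximated here as five or fewer non-blank lines; the real judgement stays with the auditor.

```typescript
type Triage = "missing" | "empty" | "score";

// Hypothetical helper: decide how one expected guidance file is reported.
// `contents` maps repo-relative paths to file text; an absent key means the
// file was not found during discovery.
function classifyGuidance(path: string, contents: Map<string, string>): Triage {
  const text = contents.get(path);
  if (text === undefined) return "missing"; // separate "Missing" entry, not scored
  // Assumption: "empty" means at most 5 non-blank lines.
  const nonBlank = text.split("\n").filter((line) => line.trim() !== "").length;
  return nonBlank <= 5 ? "empty" : "score"; // "empty" scores 1 on every dimension
}

const expected = ["CLAUDE.md", "AGENTS.md", "BOOCHAT.md", "BOOCODER.md", "data/AGENTS.md"];
const found = new Map<string, string>([
  ["CLAUDE.md", "# Conventions\n\n- Do not push.\n- Run tests.\n- Use overflowWrap.\n- Keep schema in sync.\n- Cite file paths.\n"],
  ["AGENTS.md", "# Agents\n"],
]);

for (const path of expected) {
  console.log(path, classifyGuidance(path, found));
}
```

In this sample, `CLAUDE.md` is scored, `AGENTS.md` is reported as empty, and the other three are reported as missing.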

## Phase 2 — Score against the rubric

For each file, score each of the 10 dimensions on 1–5 (1 = absent or actively misleading; 5 = exemplary). Use the rubric below verbatim; it describes the 5, 3, and 1 anchors, so give a 2 or 4 when a file falls between two of them. Cite a representative line range for each score.

### a. Refusal rails up front

The first ~10 lines name explicit "do not" directives — what the agent must not do, ideally with a one-line reason. Surfacing refusals early prevents the model from acting on a hopeful misread later.

- **5** — first 10 lines contain ≥3 explicit refusals (e.g. *"Do not commit"*, *"Do not push"*, *"Do not write files"*) with brief reasons or contexts
- **3** — refusals exist but are buried below line 30, or stated only once without context
- **1** — no refusals anywhere; the agent has to infer constraints from positive instructions only
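A rough pre-check for this dimension can count refusal-style lines in the head of the file before the manual read. This is a sketch, not part of the skill: the keyword list and the `refusalHint` mapping onto the 5/3/1 anchors are assumptions, and it cannot judge whether each refusal carries a reason.

```typescript
// Assumed refusal vocabulary; extend it to match the project's phrasing.
const REFUSAL = /\b(do not|don't|never|must not|cannot)\b/i;

// Lines within the first `headLines` lines that read like explicit refusals.
function refusalsInHead(text: string, headLines = 10): string[] {
  return text
    .split("\n")
    .slice(0, headLines)
    .filter((line) => REFUSAL.test(line));
}

// Rough mapping onto the anchors above; reasons and context still need a human read.
function refusalHint(text: string): 5 | 3 | 1 {
  if (refusalsInHead(text).length >= 3) return 5;
  const anywhere = text.split("\n").some((line) => REFUSAL.test(line));
  return anywhere ? 3 : 1;
}

const boochat = [
  "# BooChat",
  "Do not write files: this surface is read-only.",
  "Do not run shell commands.",
  "Never commit or push; Sam commits manually.",
  "## Capabilities",
].join("\n");

console.log(refusalHint(boochat)); // 5
```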

### b. Version anchor

A concrete version, tag, or date is mentioned near the top so a stale memory becomes obvious to a future reader. Pure "current" / "latest" claims rot silently.

- **5** — version/tag in the first 20 lines, plus a "last meaningful update" date inline somewhere
- **3** — a version tag exists but only deep in the file (e.g. inside a commit-history block)
- **1** — no version, no date, no anchor; nothing to detect staleness against
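A similar sketch for the version anchor, assuming versions look like `v2.2` or `2.2.1` and dates are ISO `YYYY-MM-DD`. Both patterns are assumptions to adapt to the project's own tagging scheme, and the score-4 case (version up top, no date) is an interpolation between the anchors.

```typescript
// Assumed formats: "v2.2" / "2.2.1" for versions, ISO "YYYY-MM-DD" for dates.
const VERSION = /\bv?\d+\.\d+(?:\.\d+)?\b/;
const ISO_DATE = /\b\d{4}-\d{2}-\d{2}\b/;

function versionAnchorHint(text: string): 5 | 4 | 3 | 1 {
  const head = text.split("\n").slice(0, 20).join("\n");
  if (VERSION.test(head)) {
    // Version near the top; a dated "last meaningful update" lifts it to 5.
    return ISO_DATE.test(text) ? 5 : 4;
  }
  // Some anchor exists, but only deeper in the file.
  if (VERSION.test(text) || ISO_DATE.test(text)) return 3;
  return 1; // nothing to detect staleness against
}

console.log(versionAnchorHint("# Guide (v2.2, updated 2025-03-01)")); // 5
console.log(versionAnchorHint("# Guide\nNo anchor here.")); // 1
```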

### c. Why-with-what

Every non-obvious convention or rule is followed by a one-line justification (`Why:` / `Reason:` / dash). Rules without reasons can't be reasoned about at the edges; they get either blindly followed or quietly violated.

- **5** — every non-trivial rule has a sentence-level "why" inline
- **3** — most rules have reasons, but a few load-bearing ones (e.g. "use overflowWrap not wordWrap") are bare
- **1** — rules read as commandments with no rationale

### d. Authoritative vs misleading sources

Places where a tool can lie (e.g. *"root `tsc --noEmit` uses project references and can miss errors that the per-app tsconfig catches"*) are called out, and the authoritative path is named. Without this, the agent picks the most convenient signal and ships a regression.

- **5** — at least one explicit "X can lie; use Y instead" pair, named with file paths
- **3** — implicit hints ("CLI is authoritative") without naming what the misleading signal is
- **1** — no acknowledgement that any tool can lie

### e. Resolution order

For any stacked configuration (system prompts, env vars, agent definitions, schemas), the precedence is documented end-to-end with what wins on conflict. Missing precedence rules force the agent to guess at boundaries.

- **5** — explicit ordered list (e.g. *"base → container guidance → agent.system_prompt → user prompt"*) with "last wins" or "first wins" stated
- **3** — order is implied by section sequence but not stated; precedence on conflict is unclear
- **1** — multiple sources mentioned, no order, no winner
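For resolution order, a cheap signal is whether the file states a chain of at least three layers on one line (`A → B → C`) and names the winner. This sketch assumes `→` or `->` arrows and the literal phrases "last wins" / "first wins"; a result of 1 may also mean the file has no stacked configuration at all, which should be checked before scoring.

```typescript
// Two arrows on one line: an explicit chain of at least three layers.
const CHAIN = /(?:→|->)[^\n]*(?:→|->)/;
// Assumed phrasing for the conflict rule.
const WINNER = /\b(?:last|first)\s+wins\b/i;

function resolutionOrderHint(text: string): 5 | 3 | 1 {
  const hasChain = CHAIN.test(text);
  const hasWinner = WINNER.test(text);
  if (hasChain && hasWinner) return 5;
  // Order without a winner (or a winner without a stated order) is partial.
  return hasChain || hasWinner ? 3 : 1;
}

const claudeMd =
  "System prompt order: base → container guidance → agent.system_prompt → user prompt (last wins).";
console.log(resolutionOrderHint(claudeMd)); // 5
```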

### f. Failure modes

Each subsystem has a "what happens when this fails" note — fallbacks, defaults, swallow vs propagate decisions. Without this the agent assumes the happy path and writes brittle code.

- **5** — every major subsystem (DB, broker, LLM call, tool execution) names its failure behavior
- **3** — some failure paths documented, others implicit
- **1** — failure modes invisible; reader can't tell what's defensive and what isn't

### g. Don't / refusals (deep)

Beyond the top-of-file refusal rails, the body contains a sustained "don't" thread — anti-patterns the project has burned on. Each "don't" should name what triggered it (PR, incident, refactor) so it can be re-evaluated.

- **5** — multiple "don't" entries scattered through the file, each with a hint at the triggering context
- **3** — a handful of "don't"s, no context — reader can't tell what's still load-bearing
- **1** — pure positive instructions; no anti-pattern surface

### h. Concrete call sites

Specific file paths and symbol names are used (e.g. `apps/server/src/services/inference.ts:209-225 buildSystemPrompt`), not vague pointers ("in the service layer", "somewhere in tools"). Vague pointers force the agent into an extra search round-trip per claim.

- **5** — claims about code consistently cite file:line or file:symbol (e.g. *"buildSystemPrompt at apps/server/src/services/inference.ts:209"*)
- **3** — some claims cite paths but not lines or symbols (*"in apps/server/src/services/inference.ts"*)
- **1** — claims read like "the broker handles pub/sub" with no path at all

A reliable test for this dimension: pick three random claims about behavior and try to reach the named code in two clicks. If you can't, the score drops.
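The two-click test can be partly automated by bucketing inline code spans by how precisely they point at code. A sketch, assuming citations are written inside backticks; both regular expressions are assumptions and will miss citations written as plain prose.

```typescript
// `path/file.ext:209-225 symbol`, `path/file.ext:42`, or `path/file.ext:symbol`
const FILE_LINE_OR_SYMBOL = /^[\w.\/-]+\.\w+:(?:\d+(?:-\d+)?|\w+)(?:\s+\w+)?$/;
// `path/file.ext` with no line or symbol
const FILE_ONLY = /^[\w.\/-]+\.\w+$/;

function citationPrecision(text: string): { precise: number; pathOnly: number } {
  let precise = 0;
  let pathOnly = 0;
  const spans = /`([^`]+)`/g; // every inline code span
  let match: RegExpExecArray | null;
  while ((match = spans.exec(text)) !== null) {
    const span = match[1] ?? "";
    if (FILE_LINE_OR_SYMBOL.test(span)) precise++;
    else if (FILE_ONLY.test(span)) pathOnly++;
  }
  return { precise, pathOnly };
}

const sample =
  "Prompt assembly: `apps/server/src/services/inference.ts:209-225 buildSystemPrompt`. " +
  "Status enums live in `apps/server/src/types/api.ts`. Prefer `overflowWrap`.";
console.log(citationPrecision(sample)); // { precise: 1, pathOnly: 1 }
```

A high `pathOnly` count relative to `precise` suggests the 3 anchor rather than the 5.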

### i. Convention drift guards

Pairs of files that must stay in sync are named explicitly (e.g. *"CHECK constraints in schema.sql ↔ `*_STATUSES` const arrays in `apps/server/src/types/api.ts`"*). Without these guards, one half drifts and the test that would catch it doesn't exist.

- **5** — every cross-file invariant in the project has a "keep in sync" callout
- **3** — one or two such guards present; obvious sibling files (frontend type ↔ backend type) not mentioned
- **1** — invariants are invisible; every edit risks silent divergence
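The `schema.sql` ↔ `*_STATUSES` pair above is the kind of guard that is cheap to back with a test. A sketch under stated assumptions: `TASK_STATUSES`, the `status` column, and the inline schema string are hypothetical stand-ins, and the regular expression only handles the simple `CHECK (col IN ('a', 'b'))` form.

```typescript
// Hypothetical const array standing in for a `*_STATUSES` export in api.ts.
const TASK_STATUSES = ["pending", "running", "done"] as const;

// Pull the quoted values out of `CHECK (<column> IN ('a', 'b', ...))`.
// Assumes `column` is a plain identifier (it is not regex-escaped).
function checkValues(schemaSql: string, column: string): string[] {
  const re = new RegExp(`CHECK\\s*\\(\\s*${column}\\s+IN\\s*\\(([^)]*)\\)`, "i");
  const match = re.exec(schemaSql);
  if (!match) return [];
  return (match[1] ?? "")
    .split(",")
    .map((value) => value.trim().replace(/^'|'$/g, ""));
}

// Order-insensitive comparison: both sides must list exactly the same values.
function sameValues(a: readonly string[], b: readonly string[]): boolean {
  const left = new Set(a);
  const right = new Set(b);
  return left.size === right.size && b.every((value) => left.has(value));
}

const schema = "status TEXT NOT NULL CHECK (status IN ('pending', 'running', 'done'))";
console.log(sameValues(checkValues(schema, "status"), TASK_STATUSES)); // true
```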

### j. No theater

Every line earns its keep. No "be helpful", no "remember to think step by step", no "as an AI assistant" preamble. Theater wastes tokens and trains the model to skim.

- **5** — every line carries either a fact, a rule, or a pointer; reads tight
- **3** — a few filler sentences ("strive for excellence", "remember to think carefully") but mostly substantive
- **1** — heavy preamble, motivational platitudes, or restated framework defaults

Worth a separate pass: re-read the file and ask "would removing this line confuse a future reader?" — if the honest answer is no, the line is theater and should go.
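The removal test can be seeded with a phrase scan that lists candidate lines for the manual pass. The phrase list below is an assumption built from the examples in this dimension; a hit is a candidate for deletion, not a verdict.

```typescript
// Assumed filler phrases, taken from the examples above.
const FILLER: RegExp[] = [
  /\bbe helpful\b/i,
  /\bthink step by step\b/i,
  /\bas an ai assistant\b/i,
  /\bstrive for excellence\b/i,
  /\bthink carefully\b/i,
];

// 1-based line numbers of lines that match any filler phrase.
function theaterCandidates(text: string): number[] {
  const hits: number[] = [];
  text.split("\n").forEach((line, index) => {
    if (FILLER.some((re) => re.test(line))) hits.push(index + 1);
  });
  return hits;
}

const draft = [
  "# BooCoder guidance",
  "As an AI assistant, be helpful.",
  "Run the per-app tsc, not the root one.",
  "Remember to think step by step.",
].join("\n");
console.log(theaterCandidates(draft)); // [ 2, 4 ]
```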

## Phase 3 — Propose one concrete edit per dimension scoring ≤3

For every dimension scoring 3 or lower, generate one specific edit proposal. Each proposal must include:

- **File**: full repo-relative path
- **Anchor**: a quoted ~one-line existing string or `(new section after L<n>)`
- **Before**: existing text (or `(none)`)
- **After**: proposed text
- **Why**: one sentence linking back to the rubric dimension and what the change unlocks

Example proposal:

```
### Proposed edit 1 — dimension (a) Refusal rails up front

File: BOOCHAT.md
Anchor: "## Capabilities" (L3)
Before:
## Capabilities
After:
## You cannot
- Write, edit, or delete files
- Run shell commands
- Make commits, push, or pull

## Capabilities
Why: the upstream rubric requires explicit "do not" rails in the first 10 lines so the
model can't reach for a write tool and self-justify after the fact.
```
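When proposals are assembled programmatically, a small record type keeps the five required fields from being dropped. A hypothetical sketch: `EditProposal` and `renderProposal` are not part of the skill, and the rendered layout simply mirrors the example above.

```typescript
type Dimension = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j";

interface EditProposal {
  n: number; // numbered globally across all files
  dimension: Dimension;
  dimensionName: string;
  file: string; // full repo-relative path
  anchor: string; // quoted existing line, or "(new section after L<n>)"
  before: string | null; // null renders as "(none)"
  after: string;
  why: string; // one sentence tied back to the rubric dimension
}

function renderProposal(p: EditProposal): string {
  return [
    // "\u2014" is the em dash used in the example heading.
    `### Proposed edit ${p.n} \u2014 dimension (${p.dimension}) ${p.dimensionName}`,
    "",
    `File: ${p.file}`,
    `Anchor: ${p.anchor}`,
    "Before:",
    p.before ?? "(none)",
    "After:",
    p.after,
    `Why: ${p.why}`,
  ].join("\n");
}

const rendered = renderProposal({
  n: 1,
  dimension: "b",
  dimensionName: "Version anchor",
  file: "BOOCODER.md",
  anchor: "(new section after L1)",
  before: null,
  after: "Last meaningful update: <date>",
  why: "Gives readers an anchor to detect staleness against (dimension b).",
});
console.log(rendered);
```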

Keep proposals minimal. One edit per dimension scoring ≤3 — don't pad. If a single edit would lift two dimensions at once, say so and don't double-count.

Do not propose more than 10 edits per file; one edit per weak dimension already caps it there. If a file scores ≤3 on nearly all 10 dimensions (rare), the file needs a rewrite, not patches: say that instead, and propose a high-level outline rather than a flood of line-level edits.
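The rubric has exactly 10 dimensions, so the patch-or-rewrite decision needs a concrete cutoff. This sketch assumes 9 or more weak dimensions triggers a rewrite outline; that threshold and the `planEdits` name are assumptions, not part of the skill.

```typescript
type Scores = Record<string, number>; // dimension letter -> score 1..5

type Plan =
  | { kind: "patch"; dimensions: string[] } // one proposal per listed dimension
  | { kind: "rewrite"; weak: string[] }; // propose a high-level outline instead

function planEdits(scores: Scores, rewriteThreshold = 9): Plan {
  const weak = Object.entries(scores)
    .filter(([, score]) => score <= 3)
    .map(([dimension]) => dimension)
    .sort();
  return weak.length >= rewriteThreshold
    ? { kind: "rewrite", weak }
    : { kind: "patch", dimensions: weak };
}

const boochat: Scores = { a: 2, b: 5, c: 3, d: 4, e: 4, f: 4, g: 5, h: 4, i: 4, j: 5 };
console.log(planEdits(boochat)); // { kind: 'patch', dimensions: [ 'a', 'c' ] }
```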

## Phase 4 — Output

Produce the report in this order:

1. Per-file score table: one row per dimension (10 rows), each with its 1–5 score and one line of evidence citing the representative line range
2. Per-file aggregate (sum out of 50) and overall grade band: A (≥45), B (35–44), C (25–34), D (15–24), F (<15)
3. Proposed edits, numbered globally across all files
4. Closing one-line summary: *"X files audited, Y edits proposed, top weak dimension across files: Z."*

Do not edit any file. Do not call any write tool. Sam reads the report, picks which edits to apply, and commits them manually.
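Steps 2 and 4 of the report are simple arithmetic. A sketch of both, assuming per-file scores are kept as ten numbers in dimension order a–j; the helper names are hypothetical, and "top weak dimension" is read here as the dimension with the lowest summed score across files, which is an interpretation rather than something the skill defines.

```typescript
type Grade = "A" | "B" | "C" | "D" | "F";

// Sum of ten 1-5 scores, so the total always falls within 10..50.
function aggregate(scores: number[]): number {
  if (scores.length !== 10 || scores.some((s) => !Number.isInteger(s) || s < 1 || s > 5)) {
    throw new Error("expected ten integer scores between 1 and 5");
  }
  return scores.reduce((sum, s) => sum + s, 0);
}

// Grade bands: A (>=45), B (35-44), C (25-34), D (15-24), F (<15).
function gradeBand(total: number): Grade {
  if (total >= 45) return "A";
  if (total >= 35) return "B";
  if (total >= 25) return "C";
  if (total >= 15) return "D";
  return "F";
}

// Dimension with the lowest summed score across files; ties go to the earlier letter.
function topWeakDimension(perFile: number[][]): string {
  const letters = "abcdefghij";
  let worst = 0;
  let worstSum = Infinity;
  for (let d = 0; d < letters.length; d++) {
    const sum = perFile.reduce((acc, scores) => acc + (scores[d] ?? 0), 0);
    if (sum < worstSum) {
      worst = d;
      worstSum = sum;
    }
  }
  return letters.charAt(worst);
}

const claudeMd = [4, 3, 5, 4, 2, 3, 4, 5, 3, 4];
const boochat = [3, 5, 3, 4, 4, 4, 5, 4, 4, 5];
const total = aggregate(claudeMd);
console.log(`${total}/50 ${gradeBand(total)}`); // 37/50 B
console.log(topWeakDimension([claudeMd, boochat])); // e
```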

## Anti-patterns this skill explicitly avoids

- Auto-generating CLAUDE.md from scratch (different problem — that's `claude-md-improver`'s domain)
- Scoring the *project's* code quality (out of scope — this rubric is about guidance files only)
- Padding the report with generic "best practices" not tied to one of the 10 dimensions
- Restating the rubric in every per-file section (state it once at the top, reference dimensions by letter throughout)