verification discipline rules + chat naming from assistant response
BOOCHAT.md + BOOCODER.md: 4 verification rules added to both — verify against running container not source files, never count dist/, run commands before claiming success, derive counts from commands. auto_name.ts: chat titles now derived from the assistant's first response only (user message dropped from naming input). System prompt updated to "summarize the topic or outcome — do NOT copy the first few words verbatim." Produces titles like "Fastify Route Setup" instead of echoing the assistant's opening sentence. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -39,6 +39,13 @@
|
|||||||
|
|
||||||
Always-true rules (process discipline, refusals, behavior contracts) live here in `BOOCHAT.md` — and in `BOOCODER.md` / `CLAUDE.md` per their scopes — where they are 100% present in every turn. On-demand recipes (specific procedures, scaffolds, checklists) live in `/data/skills/` and invoke roughly 6% of the time in clean multi-turn flow (Codeminer42 measurement, 2026). Don't file workflow rules as skills — they silently misfire. See Anthropic agent-skills best-practices (platform.claude.com/docs/en/agents-and-tools/agent-skills/best-practices) for the canonical conventions.
|
Always-true rules (process discipline, refusals, behavior contracts) live here in `BOOCHAT.md` — and in `BOOCODER.md` / `CLAUDE.md` per their scopes — where they are 100% present in every turn. On-demand recipes (specific procedures, scaffolds, checklists) live in `/data/skills/` and invoke roughly 6% of the time in clean multi-turn flow (Codeminer42 measurement, 2026). Don't file workflow rules as skills — they silently misfire. See Anthropic agent-skills best-practices (platform.claude.com/docs/en/agents-and-tools/agent-skills/best-practices) for the canonical conventions.
|
||||||
|
|
||||||
|
## Verification discipline
|
||||||
|
|
||||||
|
- When assessing implementation status, verify against the running container (`curl /api/health`) and latest git commit (`git log --oneline -3`), not just source file contents. Source files can be mid-edit. The deployed state is the truth.
|
||||||
|
- Never count `dist/` directory sizes as source lines. Only count `src/**/*.ts` files. Compiled output is inflated by inlined types and transpilation artifacts.
|
||||||
|
- Before claiming a feature works, run the actual command and show the output. "Should work" is not verification. Acceptable evidence: test output (`pnpm test`), build output (`pnpm build`), curl response, docker logs, `\d tablename` output. If you can't run it, say so explicitly — don't assert success without evidence.
|
||||||
|
- When reporting counts (tools, tests, files, routes, lines), derive the number from a command (`grep -c`, `wc -l`, test runner output) — not from memory or approximation.
|
||||||
|
|
||||||
## Known limitations
|
## Known limitations
|
||||||
|
|
||||||
- Codecontext re-analyzes the project graph on each call against a different target_dir. First call to a new project may take 1-3 seconds; subsequent calls to the same project return in ~10ms.
|
- Codecontext re-analyzes the project graph on each call against a different target_dir. First call to a new project may take 1-3 seconds; subsequent calls to the same project return in ~10ms.
|
||||||
|
|||||||
@@ -30,3 +30,10 @@ Every file modification queues in `pending_changes` before touching disk. The us
|
|||||||
- If uncertain about scope, use smaller edits and verify between steps.
|
- If uncertain about scope, use smaller edits and verify between steps.
|
||||||
- Cite file paths + line numbers for context.
|
- Cite file paths + line numbers for context.
|
||||||
- Verify before reporting work complete: run the relevant test/build/smoke and confirm output matches the claim. Evidence first, assertion second.
|
- Verify before reporting work complete: run the relevant test/build/smoke and confirm output matches the claim. Evidence first, assertion second.
|
||||||
|
|
||||||
|
## Verification discipline
|
||||||
|
|
||||||
|
- When assessing implementation status, verify against the running container (`curl /api/health`) and latest git commit (`git log --oneline -3`), not just source file contents. Source files can be mid-edit. The deployed state is the truth.
|
||||||
|
- Never count `dist/` directory sizes as source lines. Only count `src/**/*.ts` files. Compiled output is inflated by inlined types and transpilation artifacts.
|
||||||
|
- Before claiming a feature works, run the actual command and show the output. "Should work" is not verification. Acceptable evidence: test output (`pnpm test`), build output (`pnpm build`), curl response, docker logs, `\d tablename` output. If you can't run it, say so explicitly — don't assert success without evidence.
|
||||||
|
- When reporting counts (tools, tests, files, routes, lines), derive the number from a command (`grep -c`, `wc -l`, test runner output) — not from memory or approximation.
|
||||||
|
|||||||
@@ -1,7 +1,7 @@
|
|||||||
import type { InferenceContext } from './inference/index.js';
|
import type { InferenceContext } from './inference/index.js';
|
||||||
|
|
||||||
const NAMING_SYSTEM_PROMPT =
|
const NAMING_SYSTEM_PROMPT =
|
||||||
'You name chat sessions. Reply directly with no thinking, reasoning, or explanation. Output ONLY the title, 4 words max, no quotes, no punctuation, no prefix like "Title:".';
|
'You name chat sessions based on what the assistant did. Summarize the topic or outcome — do NOT copy the first few words verbatim. Reply directly with no thinking, reasoning, or explanation. Output ONLY the title, 4 words max, no quotes, no punctuation, no prefix like "Title:".';
|
||||||
|
|
||||||
const MAX_TITLE_CHARS = 60;
|
const MAX_TITLE_CHARS = 60;
|
||||||
|
|
||||||
@@ -70,12 +70,6 @@ export async function maybeAutoNameChat(
|
|||||||
const model = sessionRows[0]?.model;
|
const model = sessionRows[0]?.model;
|
||||||
if (!model) return;
|
if (!model) return;
|
||||||
|
|
||||||
const userMsg = await ctx.sql<{ content: string }[]>`
|
|
||||||
SELECT content FROM messages
|
|
||||||
WHERE chat_id = ${chatId} AND role = 'user'
|
|
||||||
ORDER BY created_at ASC
|
|
||||||
LIMIT 1
|
|
||||||
`;
|
|
||||||
const assistantMsg = await ctx.sql<{ content: string }[]>`
|
const assistantMsg = await ctx.sql<{ content: string }[]>`
|
||||||
SELECT content FROM messages
|
SELECT content FROM messages
|
||||||
WHERE chat_id = ${chatId}
|
WHERE chat_id = ${chatId}
|
||||||
@@ -85,9 +79,8 @@ export async function maybeAutoNameChat(
|
|||||||
ORDER BY created_at ASC
|
ORDER BY created_at ASC
|
||||||
LIMIT 1
|
LIMIT 1
|
||||||
`;
|
`;
|
||||||
if (!userMsg[0] || !assistantMsg[0]) return;
|
if (!assistantMsg[0]) return;
|
||||||
|
|
||||||
const userText = userMsg[0].content.slice(0, 2000);
|
|
||||||
const assistantText = assistantMsg[0].content.slice(0, 2000);
|
const assistantText = assistantMsg[0].content.slice(0, 2000);
|
||||||
|
|
||||||
const body = {
|
const body = {
|
||||||
@@ -96,7 +89,7 @@ export async function maybeAutoNameChat(
|
|||||||
{ role: 'system', content: NAMING_SYSTEM_PROMPT },
|
{ role: 'system', content: NAMING_SYSTEM_PROMPT },
|
||||||
{
|
{
|
||||||
role: 'user',
|
role: 'user',
|
||||||
content: `First user message: ${userText}\nFirst assistant reply: ${assistantText}`,
|
content: assistantText,
|
||||||
},
|
},
|
||||||
],
|
],
|
||||||
max_tokens: 30,
|
max_tokens: 30,
|
||||||
|
|||||||
Reference in New Issue
Block a user