v1.13.15-tools: tiered tool loading via BOOCODE_TOOLS env var

Pattern lift from eyaltoledano/claude-task-master (MIT + Commons Clause; pattern only, no code lifted). Adds a BOOCODE_TOOLS env var with three tiers:

- core (4 tools): view_file, list_dir, grep, find_files. ~2k token schema cost.
- standard (15 tools): core + web_search, web_fetch, git_status, and all 8 codecontext tools. ~10k token schema cost.
- all (default; current behavior): every tool in ALL_TOOLS (20). ~21k token schema cost.

The env var is a CEILING. It narrows agent whitelists and never expands them. Default behavior is unchanged when the var is unset. resolveToolTier is case-insensitive and falls back to 'all' on unknown values.

CORE_TOOL_NAMES and STANDARD_TOOL_NAMES are validated against TOOLS_BY_NAME at module load. Two top-level for-loops throw on the first missing name, so the module fails to import if a tier references a tool that doesn't exist in the registry. This catches typos and stale tier definitions at boot, instead of silently filtering valid tools out of agent whitelists.

Wiring: agents.ts parseAgentBlock now reads BOOCODE_TOOLS from process.env on every parse. It intersects the resolved tier with the agent's declared frontmatter tools, or with DEFAULT_TOOLS when the frontmatter omits the field. Reading per parse is fine because agents are already re-parsed on the existing 60s cache TTL.

Tests: tools.test.ts grows from 1 to 10 tests. They cover resolveToolTier across tiers, case, and unknown values, plus the CORE-subset-of-STANDARD invariant and TOOLS_BY_NAME existence for both tier sets. 204/204 pass (was 195; +9 new).

Deviation from the brief: the codecontext tools in the actual registry have NO codecontext_* prefix, although the brief's STANDARD list assumed one. I used the actual names (get_codebase_overview, search_symbols, etc.). With the prefixed names, module-load validation would have failed boot.

Smoke: with BOOCODE_TOOLS unset, agents return their full 12-tool whitelists. With BOOCODE_TOOLS=core in .env plus a container restart, the same agents narrow to 4 tools (find_files, grep, list_dir, view_file). That is the intersection of each agent's declared whitelist with the core tier. Reverted after confirmation.

CLAUDE.md now lists BOOCODE_TOOLS in the Optional list of the Environment section. .env.example gained a commented BOOCODE_TOOLS=all line with the per-tier token-cost table.

~110 LoC across 5 files (4 modified + 1 test expansion). The non-test code came in under the brief's ~30 LoC estimate; the test suite expansion drove most of the growth.
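The tier resolution and module-load validation described above can be sketched as follows. This is a minimal illustration, not the code in services/tools.ts. The registry here lists only the tool names that appear in this commit. STANDARD_TOOL_NAMES is deliberately abbreviated because the other six codecontext names are not given. The `{ description }` registry shape is an assumption.

```typescript
// Assumed registry shape; the real TOOLS_BY_NAME lives in services/tools.ts.
// Only names mentioned in the commit message are listed here.
const TOOLS_BY_NAME: Readonly<Record<string, { description: string }>> = {
  find_files: { description: 'Find files by pattern' },
  get_codebase_overview: { description: 'codecontext overview' },
  git_status: { description: 'git status of the workspace' },
  grep: { description: 'Search file contents' },
  list_dir: { description: 'List a directory' },
  search_symbols: { description: 'codecontext symbol search' },
  view_file: { description: 'Read a file' },
  web_fetch: { description: 'Fetch a URL' },
  web_search: { description: 'Search the web' },
};

// Alpha-sorted, mirroring the ALL_TOOLS ordering invariant.
const ALL_TOOL_NAMES: readonly string[] = Object.keys(TOOLS_BY_NAME).sort();

const CORE_TOOL_NAMES = ['view_file', 'list_dir', 'grep', 'find_files'] as const;

// Abbreviated for the sketch: the real tier also includes the remaining codecontext tools.
const STANDARD_TOOL_NAMES: readonly string[] = [
  ...CORE_TOOL_NAMES,
  'web_search',
  'web_fetch',
  'git_status',
  'get_codebase_overview',
  'search_symbols',
];

// Module-load validation: throw on the first tier entry that is missing from
// the registry, so a typo fails the import instead of shrinking whitelists.
for (const name of CORE_TOOL_NAMES) {
  if (!Object.prototype.hasOwnProperty.call(TOOLS_BY_NAME, name)) {
    throw new Error(`CORE tier references unknown tool: ${name}`);
  }
}
for (const name of STANDARD_TOOL_NAMES) {
  if (!Object.prototype.hasOwnProperty.call(TOOLS_BY_NAME, name)) {
    throw new Error(`STANDARD tier references unknown tool: ${name}`);
  }
}

// Case-insensitive; undefined and unknown values both fall back to 'all'.
function resolveToolTier(raw: string | undefined): readonly string[] {
  switch (raw?.toLowerCase()) {
    case 'core':
      return CORE_TOOL_NAMES;
    case 'standard':
      return STANDARD_TOOL_NAMES;
    default:
      return ALL_TOOL_NAMES;
  }
}
```

Returning the tier arrays by reference, instead of copies, keeps the sketch cheap. It assumes callers treat the result as read-only, which the `readonly` return type enforces at compile time.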
v1.13.15-openspec: reformat batch docs to OpenSpec directory structure
2026-05-22 14:59:01 +00:00 · 2026-05-22 14:54:17 +00:00 · 2026-05-22 14:52:37 +00:00 · 2026-05-22 14:42:09 +00:00 · 2026-05-22 14:07:11 +00:00
19 changed files with 1079 additions and 32 deletions
--- a/.env.example
+++ b/.env.example
@@ -10,3 +10,12 @@ POSTGRES_PASSWORD=CHANGE_ME
 # Internal Tailscale address that bypasses Authelia. Override if you
 # point BooCode at a different SearXNG instance.
 SEARXNG_URL=http://100.114.205.53:8888
 # v1.13.15-tools: BOOCODE_TOOLS narrows the tool whitelist sent to the LLM.
 # Unset (default) → all tools (~21k schema). Useful primarily for single-purpose
 # sessions where the model only needs read-only filesystem access.
 #
 # core      → view_file, list_dir, grep, find_files                       (~2k)
 # standard  → core + web_*, git_status, all 8 codecontext_* tools         (~10k)
 # all       → every tool in ALL_TOOLS                                     (~21k)
 # BOOCODE_TOOLS=all
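The "narrows, never expands" rule in this comment amounts to a set intersection. Here is a minimal sketch under stated assumptions: the helper name `applyToolCeiling` is hypothetical, and `defaultTools` stands in for DEFAULT_TOOLS. The caller is assumed to have already resolved `process.env.BOOCODE_TOOLS` into a tier list, as parseAgentBlock does. This sketch keeps the agent's declared order; the real agents.ts ordering is not shown here.

```typescript
// Hypothetical helper: apply the BOOCODE_TOOLS ceiling to one agent.
// `declared` is the agent's frontmatter `tools` list, or undefined when omitted.
function applyToolCeiling(
  declared: readonly string[] | undefined,
  defaultTools: readonly string[],
  tier: readonly string[],
): string[] {
  const allowed = new Set(tier);
  const base = declared ?? defaultTools;
  // Filtering the agent's own list (not the tier) is what makes the tier a
  // ceiling: a tool the agent never declared can't appear in the result.
  return base.filter((name) => allowed.has(name));
}

const EXAMPLE_CORE_TIER = ['view_file', 'list_dir', 'grep', 'find_files'];
const narrowed = applyToolCeiling(
  ['find_files', 'grep', 'list_dir', 'view_file', 'web_fetch', 'web_search'],
  [],
  EXAMPLE_CORE_TIER,
);
// narrowed → ['find_files', 'grep', 'list_dir', 'view_file']
```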
--- a/.gitignore
+++ b/.gitignore
@@ -1,6 +1,7 @@
 node_modules
 dist
 .env
 CLAUDE.local.md
 *.log
 .DS_Store
 .vite
--- a/BOOCHAT.md
+++ b/BOOCHAT.md
@@ -1,7 +1,5 @@
 # BooChat
 You are the assistant running inside BooChat — a self-hosted developer chat app.
 ## Capabilities
 - Read-only file tools: `view_file`, `list_dir`, `grep`, `find_files`
--- a/BOOCODER.md
+++ b/BOOCODER.md
@@ -2,8 +2,6 @@
 > (Stub. v2.0 implementation pending. This file documents the intended contract.)
 You are the assistant running inside BooCoder — the write-capable companion to BooChat.
 ## Capabilities
 - Everything in `BOOCHAT.md`
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -47,10 +47,12 @@ Tests: `pnpm -C apps/server test` runs the vitest suite. No test harness on `app
 Key services:
 - **`services/inference/`** — Public surface re-exported via `inference/index.ts`; callers import from `./services/inference/index.js` explicitly (NodeNext doesn't honor directory-index resolution). Layout: `turn.ts` (runAssistantTurn / runInference / createInferenceRunner; exports `InferenceFrame`, `InferenceContext`, `TurnArgs`, `StreamResult`), `stream-phase.ts` (streamCompletion as a v1.13.1-A AI SDK adapter + executeStreamPhase), `provider.ts` (`upstreamModel(baseURL, modelId)` wrapping `createOpenAICompatible` against llama-swap), `tool-phase.ts` (executeToolPhase; value back-edges into turn.ts for the runAssistantTurn recursion — cycle safe because deref at call time, not module top-level), `sentinel-summaries.ts` (runCapHitSummary + runDoomLoopSummary + their sentinel inserters), `error-handler.ts` (handleAbortOrError, finalizeCompletion), `payload.ts` (buildMessagesPayload, loadContext, maybeFlagForCompaction, `OpenAiMessage`), `sentinels.ts` (`detectDoomLoop`, `DOOM_LOOP_THRESHOLD`, sentinel predicates), `budget.ts` (resolveToolBudget), `xml-parser.ts` (qwen3.6 XML tool-call fallback — KEEP, AI SDK doesn't handle inline-XML tool calls), `parts.ts` (v1.13.0 dual-write helpers: `partsFromAssistantMessage`, `partsFromToolMessage`, `insertParts`), `prune.ts` (v1.13.4 two-tier compaction; `selectPruneTargets` is the pure decision helper), `types.ts` (`StreamPhaseState`, `DB_FLUSH_INTERVAL_MS`). **`TurnArgs`** is the per-turn state envelope threaded through the `executeToolPhase → runAssistantTurn` recursion; reset in `runInference` at user-message boundary. Add new per-turn state to `TurnArgs`, not module-level closures.
- **AI SDK v6 streamCompletion adapter** (v1.13.1-A; `services/inference/stream-phase.ts`). `streamText` is the underlying call; the BooCode layer above (executeStreamPhase, finalize, dual-write) is shape-preserved via an adapter. Three gotchas the LSP/test suite won't catch:
+- **AI SDK v6 streamCompletion adapter** (v1.13.1-A; `services/inference/stream-phase.ts`). `streamText` is the underlying call; the BooCode layer above (executeStreamPhase, finalize, dual-write) is shape-preserved via an adapter. Five gotchas the LSP/test suite won't catch:
  - **Abort signals are swallowed.** `streamText`'s `fullStream` iterator exits cleanly when `abortSignal` fires — no throw. Post-iteration `if (signal?.aborted) throw <AbortError>` is required; without it the row finalizes as `complete` instead of `cancelled`. Comment in stream-phase.ts pins this; don't refactor it away.
  - **Usage lands only at stream end** via `await result.usage` (`inputTokens` / `outputTokens` v6 names → mapped to `promptTokens` / `completionTokens` for the existing onUsage callback). Mid-stream live tok/s is gone vs v1.12.2; ChatThroughput shows a single value at stream end.
  - **Tools have NO `execute` field.** BooCode dispatches tools in tool-phase.ts, not the AI SDK loop. Only `description` + `inputSchema: jsonSchema(parameters)` — surfacing tool-call parts via `fullStream` and stopping is what we want.
  - **`includeUsage: true` MUST be set on `createOpenAICompatible`** in `services/inference/provider.ts`. The adapter defaults it false, omitting `stream_options.include_usage` from the request body; llama-swap then never emits the usage block and `result.usage.inputTokens/outputTokens` resolve to `undefined`. Latent regression from v1.13.1-A through v1.13.7 — every assistant row in that window has `tokens_used`/`ctx_used` NULL. Don't remove this flag during refactor.
  - **Tool-call-only turns may emit a leading `\n` text-delta** as the assistant content. `MessageList.flatten`'s `hasText` and `MessageBubble`'s `hasContent` both `.trim()` before the length check — otherwise whitespace-only content renders an empty bubble + ActionRow between every tool call (v1.13.7 fix). `payload.ts:buildMessagesPayload` also skips `status='failed'` AND complete-but-empty (no content, no tool_calls) assistant rows to avoid "Cannot have 2 or more assistant messages at the end of the list" upstream rejections after cap-hit + Continue.
 - **AI SDK ModelMessage conversion** (`toModelMessages` in stream-phase.ts). Tool messages need a `toolName` for `ToolResultPart` — BooCode's OpenAI-shape history doesn't carry it, so a forward-scan builds a `tool_call_id → toolName` map from prior assistant `tool_calls`. Tool outputs wrapped as `{ type: 'json' | 'text', value }` matching the v6 `ToolResultOutput` union. Assistant messages with reasoning emit a `ReasoningPart` first in the content array (v1.13.1-C).
 - **`experimental_repairToolCall`** (v1.13.3) wired into `streamText` to keep the stream alive when qwen3.6 emits malformed tool args. Pass-through implementation — logs the bad call and returns it unmodified; `executeToolPhase`'s existing zod-reject error path routes it to the model on the next turn.
 - **`chat_status` frame shape** (published via `broker.publishUser`) — `status: 'streaming' | 'tool_running' | 'waiting_for_input' | 'idle' | 'error'` (widened from `working|idle|error` in v1.12.1). Frontend `useChatStatus` derives `idle_warm` (<30s since idle) vs `idle_cold`. `ChatThroughput` renders inline beside `StatusDot` only when streaming or tool_running, fed by 500ms-throttled `'usage'` WS frames (`completion_tokens` + `ctx_used` + `ctx_max`). The `POST /api/chats/:id/discard_stale` endpoint exists to mark a stuck-streaming row as `failed` when the frontend's 60s no-token-activity timer (`ChatPane` content-length watcher) gives up.
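The forward-scan in `toModelMessages` can be sketched in isolation: remember each assistant `tool_calls` id, then attach the name when the matching tool message appears. The message and result types below are local stand-ins, not imported from the AI SDK or from `payload.ts`. Three details are assumptions of this sketch: the `unknown_tool` fallback name, the JSON-then-text wrapping heuristic, and the helper names.

```typescript
// Minimal OpenAI-shape history (assumed; the real OpenAiMessage lives in payload.ts).
interface OpenAiToolCall {
  id: string;
  type: 'function';
  function: { name: string; arguments: string };
}
type OpenAiMessage =
  | { role: 'assistant'; content: string | null; tool_calls?: OpenAiToolCall[] }
  | { role: 'tool'; tool_call_id: string; content: string }
  | { role: 'user' | 'system'; content: string };

// Local mirror of the fields a v6 ToolResultPart needs.
interface ToolResultLike {
  type: 'tool-result';
  toolCallId: string;
  toolName: string;
  output: { type: 'json'; value: unknown } | { type: 'text'; value: string };
}

// Heuristic wrapper: structured when the content parses as JSON, text otherwise.
function wrapOutput(content: string): ToolResultLike['output'] {
  try {
    return { type: 'json', value: JSON.parse(content) as unknown };
  } catch {
    return { type: 'text', value: content };
  }
}

function toToolResults(history: readonly OpenAiMessage[]): ToolResultLike[] {
  const nameById = new Map<string, string>();
  const results: ToolResultLike[] = [];
  for (const msg of history) {
    if (msg.role === 'assistant') {
      // Record ids as they are seen; tool messages always follow their call.
      for (const call of msg.tool_calls ?? []) nameById.set(call.id, call.function.name);
    } else if (msg.role === 'tool') {
      results.push({
        type: 'tool-result',
        toolCallId: msg.tool_call_id,
        toolName: nameById.get(msg.tool_call_id) ?? 'unknown_tool',
        output: wrapOutput(msg.content),
      });
    }
  }
  return results;
}
```

A single pass works because OpenAI-shape history always places the assistant `tool_calls` before the tool messages that answer them.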
@@ -58,7 +60,9 @@ Key services:
 - **Periodic 60s sweeper** in `apps/server/src/index.ts` (v1.13.3 + v1.13.5). Same `setInterval` runs `sweepStaleStreaming` (marks `messages.status='streaming'` older than 5 min as `failed`, publishes `chat_status='idle'` so the UI dot drops) and `cleanupTruncations` (TTL + orphan reap of tmpfs truncation files). `app.addHook('onClose')` clears the timer. No-op when nothing to reap.
 - **`services/broker.ts`** — In-memory pub/sub with two channel types: per-session (message streaming) and per-user (sidebar updates). No persistence; clients reconnect on restart.
 - **`services/tools.ts`** — Tool registry (`ALL_TOOLS`, `READ_ONLY_TOOL_NAMES`, `TOOLS_BY_NAME`). Filesystem tools (view_file/list_dir/grep/find_files) go through three guard layers: `path_guard.ts` (workspace scope), `secret_guard.ts` (filename deny list), `url_guard.ts` (SSRF/private-IP block for web_fetch). v1.11.8+ web tools (`web_search`, `web_fetch`) are opt-in per chat via `session.web_search_enabled` (resolved with `project.default_web_search_enabled` fallback) and filtered out of the LLM's tool schema when false. v1.13.5 truncation: when a tool slice cuts content, `services/truncate.ts` stashes the full text on tmpfs at `BOOCODE_TRUNCATION_DIR` (default `/tmp/boocode-truncations`, 0o700) keyed by an opaque `tr_<12 base32 chars>` id, and the `view_truncated_output(id)` tool retrieves it. 5MB cap (matches `view_file`'s `MAX_FILE_BYTES`), 7-day TTL, reaped by the periodic sweeper. Tmpfs path means container restart loses retrieval — acceptable, the model usually has moved on.
- **`services/compaction.ts`** + **`services/model-context.ts`** — v1.11.0 anchored rolling summary (single `summary=true` assistant row per chat, supersedes itself on each compaction). Triggered when `chats.needs_compaction` is set after an inference turn exceeds `usable(ctx_max) = ctx_max - 20k`. **`ctx_max` comes from `model-context.getModelContext()` which fetches `${LLAMA_SWAP_URL}/upstream/<model>/props`** — NOT from `parsed.timings.n_ctx` (the stream completion's `timings` doesn't carry n_ctx; that read was dead code until v1.11.3 ripped it out). v1.13.6: `buildHeadPayload` embeds `reasoning_parts` as a `<reasoning>...</reasoning>` prose prefix on the assistant `content` (OpenAI wire shape has no structured reasoning field; the summarizer reads text). Standalone tag when content is empty (tool-call-only turn). `buildHeadPayload` + `OpenAiMessage` exported for test access — keep them exported.
+- **`services/compaction.ts`** + **`services/model-context.ts`** — v1.11.0 anchored rolling summary (single `summary=true` assistant row per chat, supersedes itself on each compaction). Triggered when `chats.needs_compaction` is set after an inference turn exceeds `usable(ctx_max) = floor(0.85 × ctx_max)` (v1.13.9 opencode-pattern early trigger; was `ctx_max - 20k` pre-v1.13.9, which gave only 7.6% headroom at 262k and 0 budget for ≤20k contexts). **`ctx_max` comes from `model-context.getModelContext()` which fetches `${LLAMA_SWAP_URL}/upstream/<model>/props`** — NOT from `parsed.timings.n_ctx` (the stream completion's `timings` doesn't carry n_ctx; that read was dead code until v1.11.3 ripped it out). First inferences after a boocode boot may have `ctx_max=NULL` if llama-swap hasn't loaded the model yet; negative cache TTL is 60s, recovers on next turn. v1.13.6: `buildHeadPayload` embeds `reasoning_parts` as a `<reasoning>...</reasoning>` prose prefix on the assistant `content` (OpenAI wire shape has no structured reasoning field; the summarizer reads text). Standalone tag when content is empty (tool-call-only turn). `buildHeadPayload` + `OpenAiMessage` exported for test access — keep them exported.
 - **`services/system-prompt.ts`** — `buildSystemPrompt` is the string-returning shim; `buildSystemPromptWithFingerprint` is the canonical impl returning `{prompt, fingerprint, drift}`. v1.13.8 instrumentation: SHA-256 of the assembled prefix is logged per `buildMessagesPayload` call (msg `prefix-fingerprint`, level=info); a `Map<sessionId, lastHash>` observer fires `prefix-drift` (level=warn) on hash change with a field-level `changed_inputs` diff. Smoke proved the prefix is byte-stable across turns in steady-state — the originally-planned `system_prompt_cache` DB table was dropped as redundant against the v1.12.0 input-layer mtime caches (BOOCHAT.md here + AGENTS.md global+per-project in `agents.ts:safeStat`).
 - **`services/inference/budget.ts`** — tool-call budgets: `BUDGET_READ_ONLY = 30`, `BUDGET_NON_READ_ONLY = 10` (forward-looking; no write tools yet), `BUDGET_NO_AGENT = 30` (v1.13.7; was 15 — every tool in `ALL_TOOLS` is read-only today, so no-agent mode shares the read-only-agent cap). Per-agent `max_tool_calls` from AGENTS.md frontmatter overrides.
 - **`messages_with_parts` view** (v1.13.1-B; `schema.sql`). Read sites that need `tool_calls` / `tool_results` / `reasoning_parts` SELECT from this view, NOT `messages` directly. `COALESCE`s parts-table rows over the legacy JSON columns, so pre-v1.13.0 history still resolves. Writes still target `messages`; the v1.13.0 dual-write into `message_parts` keeps both halves in sync. New payload-assembly code must use the view — calling `messages.tool_calls` directly will miss anything written post-v1.13.1-B if the JSON column ever drifts (and dual-write makes that easy to miss). Shapes: `tool_calls jsonb[]`, `tool_results jsonb` single object, `reasoning_parts jsonb[]` of `{text}`.
 - **`services/file_ops.ts`** — Shared file operation implementations used by both inference tools and HTTP routes.
 - **`services/auto_name.ts`** — Non-streaming LLM call to generate 4-word session titles after first assistant reply.
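Here is a quick arithmetic check of the compaction trigger change noted above, from `ctx_max - 20k` to `floor(0.85 × ctx_max)`. The function names are hypothetical and only the two formulas come from the text. The sketch makes three further assumptions. It reads "262k" as 262,144 tokens. It clamps the legacy formula at zero. And it skips compaction when `ctx_max` is still NULL after boot.

```typescript
const LEGACY_RESERVE_TOKENS = 20_000;

// Pre-v1.13.9: fixed 20k reserve (clamped at zero in this sketch).
function usableLegacy(ctxMax: number): number {
  return Math.max(0, ctxMax - LEGACY_RESERVE_TOKENS);
}

// v1.13.9+: trigger at 85% of the context window.
function usableEarly(ctxMax: number): number {
  return Math.floor(0.85 * ctxMax);
}

// Percentage of the window left above the trigger point.
function headroomPct(ctxMax: number, usable: (n: number) => number): number {
  return ((ctxMax - usable(ctxMax)) / ctxMax) * 100;
}

// ctx_max can be NULL until llama-swap has loaded the model; this sketch
// simply does not flag compaction in that case.
function needsCompaction(ctxUsed: number, ctxMax: number | null): boolean {
  if (ctxMax === null) return false;
  return ctxUsed > usableEarly(ctxMax);
}

// headroomPct(262_144, usableLegacy).toFixed(1) → '7.6'
// headroomPct(262_144, usableEarly).toFixed(1)  → '15.0'
// usableLegacy(16_384) → 0, usableEarly(16_384) → 13926
```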
@@ -108,11 +112,12 @@ Schema CHECK migration order when renaming allowed values: (1) `ALTER TABLE ...
 ## Environment
-Required: `DATABASE_URL`, `LLAMA_SWAP_URL`. Optional: `PORT` (3000), `HOST` (0.0.0.0), `PROJECT_ROOT_WHITELIST` (/opt, read-only scope for add-existing path resolution), `BOOTSTRAP_ROOT` (/opt/projects, writable scope for create-new-project bootstrap mkdir target — host must `mkdir -p /opt/projects` before container start), `DEFAULT_MODEL`, `LOG_LEVEL`, `SEARXNG_URL` (default `http://100.114.205.53:8888` — internal Tailscale Fathom; the public `search.indifferentketchup.com` is behind Authelia and unusable from server context).
+Required: `DATABASE_URL`, `LLAMA_SWAP_URL`. Optional: `PORT` (3000), `HOST` (0.0.0.0), `PROJECT_ROOT_WHITELIST` (/opt, read-only scope for add-existing path resolution), `BOOTSTRAP_ROOT` (/opt/projects, writable scope for create-new-project bootstrap mkdir target — host must `mkdir -p /opt/projects` before container start), `DEFAULT_MODEL`, `LOG_LEVEL`, `SEARXNG_URL` (default `http://100.114.205.53:8888` — internal Tailscale Fathom; the public `search.indifferentketchup.com` is behind Authelia and unusable from server context), `BOOCODE_TOOLS` (`core` | `standard` | `all`, default `all`; v1.13.15-tools tier filter — ceiling, never expands an agent's whitelist).
 ## Workflow
 - Sam reviews all diffs and commits manually. Do not commit unless explicitly asked.
 - Per-batch docs live under `openspec/changes/<slug>/{proposal,tasks,design}.md`. Already-shipped batches are snapshots in `openspec/changes/archived/`. New batches follow the proposal+tasks shape; see `openspec/README.md` for the convention.
 - Deploy: `cd /opt/boocode && docker compose up --build -d` (or `docker compose build --no-cache boocode && docker compose up -d` if you suspect a layer-cache issue).
 - Git push to Gitea: `GIT_SSH_COMMAND="ssh -i /opt/boocode/secrets/boocode_gitea -o IdentitiesOnly=yes" git push origin <branch>`. The default agent identity is rejected; the in-repo deploy key (`secrets/`, gitignored) is the working one. Transient `Connection reset by peer` retries cleanly after `sleep 5`.
 - Don't accumulate `.bak-*` files. Clean them up in the same batch or immediately after merge.
--- a/apps/server/src/index.ts
+++ b/apps/server/src/index.ts
@@ -16,6 +16,7 @@ import { registerWebSocket } from './routes/ws.js';
 import { registerModelRoutes } from './routes/models.js';
 import { registerAgentRoutes } from './routes/agents.js';
 import { registerSkillsRoutes } from './routes/skills.js';
 import { registerToolsRoutes } from './routes/tools.js';
 import { createInferenceRunner } from './services/inference/index.js';
 import { createBroker } from './services/broker.js';
 import { listSkills } from './services/skills.js';
@@ -83,6 +84,7 @@ async function main() {
  registerAgentRoutes(app, sql);
  registerSidebarRoutes(app, sql);
  registerChatRoutes(app, sql, broker);
  registerToolsRoutes(app, sql);
  // Batch 9.6: warm the skills cache at boot and surface the count. Empty or
  // missing /data/skills is non-fatal — the skill tools just return empty.
--- a/apps/server/src/routes/tools.ts
+++ b/apps/server/src/routes/tools.ts
@@ -0,0 +1,40 @@
 import type { FastifyInstance } from 'fastify';
 import type { Sql } from '../db.js';
 export interface ToolCostStat {
  tool_name: string;
  mean_prompt_tokens: number;
  mean_completion_tokens: number;
  n_calls: number;
  updated_at: string;
 }
 // v1.13.10: per-tool token cost rolling window read endpoint. Backed by the
 // tool_cost_stats view in schema.sql (last 100 calls per tool, equal-split
 // attribution across multi-tool turns, sentinel/failed-turn excluded).
 // Consumed by AgentPicker for at-a-glance per-agent cost hints.
 export function registerToolsRoutes(app: FastifyInstance, sql: Sql): void {
  app.get('/api/tools/cost_stats', async () => {
    const rows = await sql<
      {
        tool_name: string;
        prompt_tokens_sum: number;
        completion_tokens_sum: number;
        n_calls: number;
        updated_at: string;
      }[]
    >`
      SELECT tool_name, prompt_tokens_sum, completion_tokens_sum, n_calls, updated_at
      FROM tool_cost_stats
      ORDER BY tool_name ASC
    `;
    const stats: ToolCostStat[] = rows.map((r) => ({
      tool_name: r.tool_name,
      mean_prompt_tokens: Math.round(r.prompt_tokens_sum / r.n_calls),
      mean_completion_tokens: Math.round(r.completion_tokens_sum / r.n_calls),
      n_calls: r.n_calls,
      updated_at: r.updated_at,
    }));
    return { stats };
  });
 }
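This endpoint is consumed by AgentPicker, whose aggregation is not shown in this commit. The sketch below is a hypothetical per-agent hint built from the `stats` array. It computes a call-weighted mean of `mean_completion_tokens` over the agent's tools that have data, and it lists the tools that have none. Both the helper name and the choice of aggregate are illustrative assumptions. Summing `mean_prompt_tokens` across tools is avoided because each value is an equal-split share of a whole turn's prompt context.

```typescript
// Copied from the ToolCostStat interface in routes/tools.ts.
interface ToolCostStat {
  tool_name: string;
  mean_prompt_tokens: number;
  mean_completion_tokens: number;
  n_calls: number;
  updated_at: string;
}

interface AgentCostHint {
  toolsWithData: number;
  toolsWithoutData: string[];
  // Call-weighted mean completion tokens; null when no tool has data yet.
  meanCompletionTokens: number | null;
}

// Hypothetical helper: summarize cost stats for one agent's whitelist.
function agentCostHint(agentTools: readonly string[], stats: readonly ToolCostStat[]): AgentCostHint {
  const byName = new Map(stats.map((s) => [s.tool_name, s] as const));
  let weighted = 0;
  let calls = 0;
  const missing: string[] = [];
  for (const tool of agentTools) {
    const s = byName.get(tool);
    if (!s) {
      missing.push(tool);
      continue;
    }
    weighted += s.mean_completion_tokens * s.n_calls;
    calls += s.n_calls;
  }
  return {
    toolsWithData: agentTools.length - missing.length,
    toolsWithoutData: missing,
    meanCompletionTokens: calls > 0 ? Math.round(weighted / calls) : null,
  };
}
```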
--- a/apps/server/src/schema.sql
+++ b/apps/server/src/schema.sql
@@ -119,6 +119,68 @@ SELECT
    WHERE p.message_id = m.id AND p.kind = 'reasoning' AND p.hidden_at IS NULL) AS reasoning_parts
 FROM messages m;
 -- v1.13.10: per-tool token cost rolling window. Derives from
 -- messages_with_parts (the v1.13.1-B view that COALESCEs message_parts over
 -- the legacy JSON column) so this works whether the chat predates v1.13.0
 -- or postdates v1.13.2 (column drop). No new write site — all source data
 -- already lands via the existing tool-phase.ts:94-95 UPDATE.
 --
 -- Attribution model: equal split. A turn emitting N tool calls divides its
 -- prompt/completion tokens by N before attribution. See v1.13.10 dispatch
 -- brief for rationale + rejected alternatives.
 --
 -- Column mapping: messages.ctx_used = prompt (input), messages.tokens_used
 -- = completion (output). Non-obvious naming; pinned via canonical writes at
 -- tool-phase.ts:94-95 et al.
 --
 -- Filtering rationale:
 --   status='complete'                — exclude failed/cancelled (defense in
 --                                      depth; failed-path doesn't write
 --                                      tokens_used so they're filtered
 --                                      indirectly too).
 --   metadata->>'kind' exclusions     — exclude cap_hit / doom_loop sentinels
 --                                      (defense in depth; sentinels are
 --                                      role='system' with tool_calls=NULL
 --                                      so they're filtered indirectly too).
 --   experimental_repairToolCall      — no special handling; retries flow
 --                                      as normal next-turn tool_result
 --                                      errors and count naturally.
 --
 -- Rolling window: last 100 calls per tool_name, ordered by created_at DESC.
 -- Aggregate-on-read is microseconds at BooCode scale (single user, ~30
 -- tools, < 100 calls each). DROP VIEW + recreate to change window size.
 CREATE OR REPLACE VIEW tool_cost_stats AS
 WITH per_call AS (
  SELECT
    (tc->>'name')::text AS tool_name,
    (m.ctx_used::float / NULLIF(jsonb_array_length(m.tool_calls), 0)) AS prompt_tokens,
    (m.tokens_used::float / NULLIF(jsonb_array_length(m.tool_calls), 0)) AS completion_tokens,
    m.created_at,
    ROW_NUMBER() OVER (
      PARTITION BY (tc->>'name')::text
      ORDER BY m.created_at DESC
    ) AS rn
  FROM messages_with_parts m,
    LATERAL jsonb_array_elements(m.tool_calls) AS tc
  WHERE m.tool_calls IS NOT NULL
    AND jsonb_array_length(m.tool_calls) > 0
    AND m.tokens_used IS NOT NULL
    AND m.ctx_used IS NOT NULL
    AND m.status = 'complete'
    AND (m.metadata IS NULL
         OR m.metadata->>'kind' IS NULL
         OR m.metadata->>'kind' NOT IN ('cap_hit', 'doom_loop'))
 )
 SELECT
  tool_name,
  ROUND(SUM(prompt_tokens))::int AS prompt_tokens_sum,
  ROUND(SUM(completion_tokens))::int AS completion_tokens_sum,
  COUNT(*)::int AS n_calls,
  MAX(created_at) AS updated_at
 FROM per_call
 WHERE rn <= 100
 GROUP BY tool_name;
 ALTER TABLE messages ADD COLUMN IF NOT EXISTS tokens_used INTEGER;
 ALTER TABLE messages ADD COLUMN IF NOT EXISTS ctx_used INTEGER;
 ALTER TABLE messages ADD COLUMN IF NOT EXISTS ctx_max INTEGER;
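To reason about the `tool_cost_stats` view without a database, here is an in-memory mirror of its logic. It applies the same filters, splits each turn's tokens equally across its N tool calls, and keeps the last 100 calls per tool by `created_at`. The flattened `AssistantRow` shape is an assumption, not the real `messages_with_parts` type. `Math.round` stands in for SQL `ROUND` and may break exact .5 ties differently.

```typescript
// Simplified stand-in for a messages_with_parts row (assumed shape).
interface AssistantRow {
  toolNames: string[]; // tool_calls[*].name, in order
  tokensUsed: number | null; // completion (output) tokens
  ctxUsed: number | null; // prompt (input) tokens
  status: 'streaming' | 'complete' | 'failed' | 'cancelled';
  metadataKind: string | null; // metadata->>'kind'
  createdAt: number; // epoch ms
}

interface ToolCostRow {
  tool_name: string;
  prompt_tokens_sum: number;
  completion_tokens_sum: number;
  n_calls: number;
  updated_at: number;
}

const WINDOW_SIZE = 100;

function toolCostStats(rows: readonly AssistantRow[]): ToolCostRow[] {
  const perTool = new Map<string, { prompt: number; completion: number; createdAt: number }[]>();
  for (const r of rows) {
    // Same WHERE clause as the view: non-empty calls, both token columns set,
    // complete status, and not a cap_hit / doom_loop sentinel.
    if (r.toolNames.length === 0 || r.tokensUsed === null || r.ctxUsed === null) continue;
    if (r.status !== 'complete') continue;
    if (r.metadataKind === 'cap_hit' || r.metadataKind === 'doom_loop') continue;
    const n = r.toolNames.length; // equal-split attribution
    for (const name of r.toolNames) {
      const calls = perTool.get(name) ?? [];
      calls.push({ prompt: r.ctxUsed / n, completion: r.tokensUsed / n, createdAt: r.createdAt });
      perTool.set(name, calls);
    }
  }
  const out: ToolCostRow[] = [];
  for (const [tool_name, calls] of perTool) {
    // ROW_NUMBER() ... ORDER BY created_at DESC, then rn <= 100.
    const recent = [...calls].sort((a, b) => b.createdAt - a.createdAt).slice(0, WINDOW_SIZE);
    out.push({
      tool_name,
      prompt_tokens_sum: Math.round(recent.reduce((s, c) => s + c.prompt, 0)),
      completion_tokens_sum: Math.round(recent.reduce((s, c) => s + c.completion, 0)),
      n_calls: recent.length,
      updated_at: Math.max(...recent.map((c) => c.createdAt)),
    });
  }
  return out.sort((a, b) => a.tool_name.localeCompare(b.tool_name));
}
```

The expectations in the integration tests below can be re-derived with this mirror. For example, three tools sharing 300 completion / 15000 prompt tokens each get 100 / 5000, and 110 calls collapse to the 100 most recent.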
--- a/apps/server/src/services/tests/tool_cost_stats.test.ts
+++ b/apps/server/src/services/tests/tool_cost_stats.test.ts
@@ -0,0 +1,228 @@
 import { describe, it, expect, beforeAll, afterAll } from 'vitest';
 import postgres from 'postgres';
 import { readFileSync } from 'node:fs';
 import { resolve } from 'node:path';
 import { fileURLToPath } from 'node:url';
 // v1.13.10: integration tests for the tool_cost_stats view. Skipped unless
 // DATABASE_URL is set so they don't break `pnpm test` on a fresh checkout.
 // Run with:
 //   DATABASE_URL=postgres://boocode:<pw>@localhost:5500/boocode pnpm -C apps/server test
 //
 // Isolation: each test builds a unique tool_name from the per-run TEST_RUN_ID
 // prefix plus a per-test suffix. The view aggregates globally across all chats,
 // so without unique tool names parallel test runs would interfere. Cleanup in
 // afterAll deletes the fixture project; FK CASCADE removes its sessions, chats,
 // messages, and parts.
 const DB_URL = process.env.DATABASE_URL;
 const describeFn = DB_URL ? describe : describe.skip;
 const TEST_RUN_ID = `v13_10_${Date.now()}`;
 const tname = (suffix: string) => `${TEST_RUN_ID}_${suffix}`;
 describeFn('tool_cost_stats view (v1.13.10)', () => {
  let sql: ReturnType<typeof postgres>;
  let projectId: string;
  let sessionId: string;
  let chatId: string;
  beforeAll(async () => {
    if (!DB_URL) return;
    sql = postgres(DB_URL, { max: 2, idle_timeout: 5, connect_timeout: 5, onnotice: () => {} });
    // Apply the schema before fixtures so the view exists. Idempotent via
    // CREATE OR REPLACE VIEW + CREATE TABLE IF NOT EXISTS; safe to run on a
    // pre-populated DB. Mirrors apps/server/src/db.ts:applySchema.
    const here = fileURLToPath(import.meta.url);
    const schemaPath = resolve(here, '../../../schema.sql');
    const ddl = readFileSync(schemaPath, 'utf8');
    await sql.unsafe(ddl);
    // Fixture project + session + chat for all inserts in this file.
    const proj = await sql<{ id: string }[]>`
      INSERT INTO projects (name, path)
      VALUES (${`tool_cost_stats_test_${TEST_RUN_ID}`}, ${`/tmp/${TEST_RUN_ID}`})
      RETURNING id
    `;
    projectId = proj[0]!.id;
    const sess = await sql<{ id: string }[]>`
      INSERT INTO sessions (project_id, name, model)
      VALUES (${projectId}, ${'test'}, ${'test-model'})
      RETURNING id
    `;
    sessionId = sess[0]!.id;
    const chat = await sql<{ id: string }[]>`
      INSERT INTO chats (session_id, name) VALUES (${sessionId}, ${'test'}) RETURNING id
    `;
    chatId = chat[0]!.id;
  });
  afterAll(async () => {
    if (!DB_URL) return;
    // Project FK CASCADE cleans sessions/chats/messages/parts in one shot.
    await sql`DELETE FROM projects WHERE id = ${projectId}`;
    await sql.end({ timeout: 5 });
  });
  async function insertAssistantTurn(opts: {
    toolNames: string[];
    tokensUsed: number | null;
    ctxUsed: number | null;
    status?: 'streaming' | 'complete' | 'failed' | 'cancelled';
    metadata?: { kind: string } | null;
    createdAt?: Date;
  }): Promise<string> {
    const toolCalls = opts.toolNames.map((name, i) => ({
      id: `call_${TEST_RUN_ID}_${name}_${i}`,
      name,
      args: {},
    }));
    const created = opts.createdAt ?? new Date();
    const rows = await sql<{ id: string }[]>`
      INSERT INTO messages (
        session_id, chat_id, role, content, kind, status,
        tool_calls, tokens_used, ctx_used,
        metadata, created_at
      )
      VALUES (
        ${sessionId}, ${chatId}, 'assistant', '', 'message',
        ${opts.status ?? 'complete'},
        ${sql.json(toolCalls as never)},
        ${opts.tokensUsed},
        ${opts.ctxUsed},
        ${opts.metadata ? sql.json(opts.metadata as never) : null},
        ${created}
      )
      RETURNING id
    `;
    return rows[0]!.id;
  }
  it('returns empty when no tool calls exist for a tool name', async () => {
    const t = tname('absent');
    const stats = await sql<{ tool_name: string }[]>`
      SELECT * FROM tool_cost_stats WHERE tool_name = ${t}
    `;
    expect(stats).toEqual([]);
  });
  it('attributes single-tool turn fully to that tool', async () => {
    const t = tname('single');
    await insertAssistantTurn({ toolNames: [t], tokensUsed: 300, ctxUsed: 15000 });
    const stats = await sql<{
      tool_name: string;
      prompt_tokens_sum: number;
      completion_tokens_sum: number;
      n_calls: number;
    }[]>`SELECT * FROM tool_cost_stats WHERE tool_name = ${t}`;
    expect(stats[0]).toMatchObject({
      tool_name: t,
      prompt_tokens_sum: 15000,
      completion_tokens_sum: 300,
      n_calls: 1,
    });
  });
  it('splits multi-tool turn equally across tools', async () => {
    const a = tname('multi_a');
    const b = tname('multi_b');
    const c = tname('multi_c');
    // 3 tools, 300 completion / 15000 prompt → each gets 100 / 5000
    await insertAssistantTurn({ toolNames: [a, b, c], tokensUsed: 300, ctxUsed: 15000 });
    const stats = await sql<{
      tool_name: string;
      prompt_tokens_sum: number;
      completion_tokens_sum: number;
      n_calls: number;
    }[]>`
      SELECT * FROM tool_cost_stats
      WHERE tool_name IN (${a}, ${b}, ${c})
      ORDER BY tool_name
    `;
    expect(stats).toHaveLength(3);
    for (const s of stats) {
      expect(s.completion_tokens_sum).toBe(100);
      expect(s.prompt_tokens_sum).toBe(5000);
      expect(s.n_calls).toBe(1);
    }
  });
  it('limits to last 100 calls per tool (FIFO window)', async () => {
    const t = tname('window');
    // Insert 110 turns with monotonically-increasing created_at and tokensUsed.
    // Expect view to keep only the most recent 100.
    const base = Date.now() + 1_000_000; // distant future to avoid colliding with other tests
    for (let i = 1; i <= 110; i++) {
      await insertAssistantTurn({
        toolNames: [t],
        tokensUsed: i, // 1..110
        ctxUsed: i * 10,
        createdAt: new Date(base + i),
      });
    }
    const [stat] = await sql<{
      n_calls: number;
      completion_tokens_sum: number;
    }[]>`SELECT n_calls, completion_tokens_sum FROM tool_cost_stats WHERE tool_name = ${t}`;
    expect(stat!.n_calls).toBe(100);
    // Last 100 are tokensUsed=11..110, sum = (11+110)*100/2 = 6050.
    expect(stat!.completion_tokens_sum).toBe(6050);
  });
  it('excludes turns with NULL tokens_used (pre-v1.13.7 latent regression)', async () => {
    const t = tname('null_tokens');
    await insertAssistantTurn({ toolNames: [t], tokensUsed: null, ctxUsed: 1000 });
    await insertAssistantTurn({ toolNames: [t], tokensUsed: 100, ctxUsed: null });
    const stats = await sql`SELECT * FROM tool_cost_stats WHERE tool_name = ${t}`;
    expect(stats).toEqual([]);
  });
  it('excludes failed/cancelled turns and cap_hit/doom_loop sentinel rows', async () => {
    const t = tname('filtered');
    // A: status='failed'                              — excluded
    // B: status='cancelled'                           — excluded
    // C: status='complete', metadata={kind:'cap_hit'} — excluded
    // D: status='complete', metadata={kind:'doom_loop'} — excluded
    // E: status='complete', metadata=null             — included
    await insertAssistantTurn({ toolNames: [t], tokensUsed: 100, ctxUsed: 1000, status: 'failed' });
    await insertAssistantTurn({ toolNames: [t], tokensUsed: 100, ctxUsed: 1000, status: 'cancelled' });
    await insertAssistantTurn({ toolNames: [t], tokensUsed: 100, ctxUsed: 1000, metadata: { kind: 'cap_hit' } });
    await insertAssistantTurn({ toolNames: [t], tokensUsed: 100, ctxUsed: 1000, metadata: { kind: 'doom_loop' } });
    await insertAssistantTurn({ toolNames: [t], tokensUsed: 100, ctxUsed: 1000, metadata: null });
    const [stat] = await sql<{ n_calls: number }[]>`
      SELECT n_calls FROM tool_cost_stats WHERE tool_name = ${t}
    `;
    expect(stat!.n_calls).toBe(1);
  });
  it('reads tool_calls via messages_with_parts (parts-authoritative)', async () => {
    const t = tname('parts');
    // Insert an assistant row with messages.tool_calls=NULL but a
    // message_parts row carrying the tool_call. The view reads via
    // messages_with_parts, which COALESCEs the parts table over the legacy
    // column — so this row should still aggregate.
    const rows = await sql<{ id: string }[]>`
      INSERT INTO messages (
        session_id, chat_id, role, content, kind, status,
        tool_calls, tokens_used, ctx_used
      )
      VALUES (
        ${sessionId}, ${chatId}, 'assistant', '', 'message', 'complete',
        NULL, 200, 5000
      )
      RETURNING id
    `;
    const messageId = rows[0]!.id;
    await sql`
      INSERT INTO message_parts (message_id, sequence, kind, payload)
      VALUES (
        ${messageId}, 0, 'tool_call',
        ${sql.json({ id: `tc_parts_${TEST_RUN_ID}`, name: t, args: {} } as never)}
      )
    `;
    const [stat] = await sql<{ n_calls: number }[]>`
      SELECT n_calls FROM tool_cost_stats WHERE tool_name = ${t}
    `;
    expect(stat!.n_calls).toBe(1);
  });
 });
--- a/apps/server/src/services/__tests__/tools.test.ts
+++ b/apps/server/src/services/__tests__/tools.test.ts
@@ -1,5 +1,11 @@
 import { describe, it, expect } from 'vitest';
-import { ALL_TOOLS } from '../tools.js';
+import {
  ALL_TOOLS,
  CORE_TOOL_NAMES,
  STANDARD_TOOL_NAMES,
  TOOLS_BY_NAME,
  resolveToolTier,
 } from '../tools.js';
 describe('ALL_TOOLS registry', () => {
  // v1.13.3: tools must be alpha-sorted at module load. llama.cpp's prompt
@@ -12,3 +18,59 @@ describe('ALL_TOOLS registry', () => {
    expect(names).toEqual([...names].sort((a, b) => a.localeCompare(b)));
  });
 });
 describe('resolveToolTier (v1.13.15-tools)', () => {
  it('returns CORE tools for tier=core', () => {
    expect(resolveToolTier('core')).toEqual(CORE_TOOL_NAMES);
  });
  it('returns STANDARD tools for tier=standard', () => {
    const result = resolveToolTier('standard');
    expect(result.length).toBe(STANDARD_TOOL_NAMES.length);
    expect(result.length).toBeGreaterThan(CORE_TOOL_NAMES.length);
    // STANDARD is a strict superset of CORE.
    expect(result).toEqual(expect.arrayContaining([...CORE_TOOL_NAMES]));
  });
  it('returns ALL tool names for tier=all', () => {
    expect(resolveToolTier('all').length).toBe(ALL_TOOLS.length);
  });
  it('defaults to all when env var is undefined', () => {
    expect(resolveToolTier(undefined).length).toBe(ALL_TOOLS.length);
  });
  it('is case-insensitive', () => {
    expect(resolveToolTier('CORE')).toEqual(CORE_TOOL_NAMES);
    expect(resolveToolTier('Standard').length).toBe(STANDARD_TOOL_NAMES.length);
  });
  it('falls back to all for unknown tier strings', () => {
    expect(resolveToolTier('bogus').length).toBe(ALL_TOOLS.length);
  });
 });
 describe('CORE_TOOL_NAMES + STANDARD_TOOL_NAMES validation', () => {
  // The module-load validation in tools.ts throws if a tier references a
  // tool that doesn't exist in TOOLS_BY_NAME. These tests double-check that
  // invariant from the consumer side so a future tier-list edit can't smuggle
  // in a typo without a test failure.
  it('every CORE name exists in TOOLS_BY_NAME', () => {
    for (const name of CORE_TOOL_NAMES) {
      expect(TOOLS_BY_NAME[name], `CORE references unknown tool '${name}'`).toBeDefined();
    }
  });
  it('every STANDARD name exists in TOOLS_BY_NAME', () => {
    for (const name of STANDARD_TOOL_NAMES) {
      expect(TOOLS_BY_NAME[name], `STANDARD references unknown tool '${name}'`).toBeDefined();
    }
  });
  it('CORE is a subset of STANDARD', () => {
    const standardSet = new Set<string>(STANDARD_TOOL_NAMES);
    for (const name of CORE_TOOL_NAMES) {
      expect(standardSet.has(name), `'${name}' is in CORE but not STANDARD`).toBe(true);
    }
  });
 });
--- a/apps/server/src/services/agents.ts
+++ b/apps/server/src/services/agents.ts
@@ -1,7 +1,7 @@
 import { promises as fs } from 'node:fs';
 import { join } from 'node:path';
 import type { Agent, AgentsResponse, AgentParseError } from '../types/api.js';
-import { ALL_TOOLS } from './tools.js';
+import { ALL_TOOLS, resolveToolTier } from './tools.js';
 // v1.8.1: global agents live at /data/AGENTS.md inside the container
 // (./data:/data:ro mount on the host). Per-project AGENTS.md at the project
@@ -186,11 +186,14 @@ function parseAgentSection(section: RawSection): Omit<Agent, 'source'> {
    throw new Error(fmErrors.join('; '));
  }
  // v1.13.15-tools: intersect with BOOCODE_TOOLS tier (ceiling, not expansion).
  // Unset → resolveToolTier returns ALL tool names → no narrowing.
  const tierAllowed = new Set(resolveToolTier(process.env.BOOCODE_TOOLS));
  const filteredTools = Array.isArray(fm.tools)
    ? fm.tools.filter((t): t is string =>
-        (ALL_TOOL_NAMES as readonly string[]).includes(t),
+        (ALL_TOOL_NAMES as readonly string[]).includes(t) && tierAllowed.has(t),
      )
-    : DEFAULT_TOOLS;
+    : DEFAULT_TOOLS.filter((t) => tierAllowed.has(t));
  return {
    id: slugify(section.name),
--- a/apps/server/src/services/tools.ts
+++ b/apps/server/src/services/tools.ts
@@ -700,6 +700,64 @@ export const TOOLS_BY_NAME: Record<string, ToolDef<unknown>> = Object.fromEntrie
  ALL_TOOLS.map((t) => [t.name, t])
 );
 // v1.13.15-tools: tiered tool loading. BOOCODE_TOOLS env var (`core` |
 // `standard` | `all`) filters the agent's tool whitelist before LLM dispatch.
 // Daily-driver token win on qwen3.6-35b-a3b — the 35B-A3B MoE benefits from
 // any prompt-cache stability win (fewer tools = shorter, more stable tool
 // schemas in the system prompt). Pattern lift from eyaltoledano/claude-task-
 // master (MIT + Commons Clause — pattern only, no code lift).
 //
 // The env var is a CEILING. It only narrows; never expands an agent's
 // declared whitelist. Default behavior (var unset) is unchanged: all tools.
 export const CORE_TOOL_NAMES = [
  'view_file',
  'list_dir',
  'grep',
  'find_files',
 ] as const;
 export const STANDARD_TOOL_NAMES = [
  ...CORE_TOOL_NAMES,
  'web_search',
  'web_fetch',
  'git_status',
  'get_codebase_overview',
  'get_file_analysis',
  'get_symbol_info',
  'search_symbols',
  'get_dependencies',
  'watch_changes',
  'get_semantic_neighborhoods',
  'get_framework_analysis',
 ] as const;
 // Module-load validation: every name in CORE / STANDARD must exist in
 // TOOLS_BY_NAME. Catches typos and stale tier definitions before they reach
 // production; server boot fails loudly rather than silently filtering valid
 // tools out of agent whitelists.
 for (const name of CORE_TOOL_NAMES) {
  if (!TOOLS_BY_NAME[name]) {
    throw new Error(`CORE_TOOL_NAMES references unknown tool: '${name}'`);
  }
 }
 for (const name of STANDARD_TOOL_NAMES) {
  if (!TOOLS_BY_NAME[name]) {
    throw new Error(`STANDARD_TOOL_NAMES references unknown tool: '${name}'`);
  }
 }
 export function resolveToolTier(tier: string | undefined): readonly string[] {
  switch ((tier ?? 'all').toLowerCase()) {
    case 'core':
      return CORE_TOOL_NAMES;
    case 'standard':
      return STANDARD_TOOL_NAMES;
    case 'all':
    default:
      return ALL_TOOLS.map((t) => t.name);
  }
 }
 export function toolJsonSchemas(): ToolJsonSchema[] {
  return ALL_TOOLS.map((t) => t.jsonSchema);
 }
--- a/apps/web/src/api/client.ts
+++ b/apps/web/src/api/client.ts
@@ -12,6 +12,7 @@ import type {
  GitMeta,
  Skill,
  AskUserAnswer,
  ToolCostStat,
 } from './types';
 export class ApiError extends Error {
@@ -262,6 +263,14 @@ export const api = {
    list: () => request<{ skills: Skill[] }>('/api/skills'),
  },
  // v1.13.10: per-tool cost rolling-window stats (last 100 calls per tool,
  // equal-split attribution across multi-tool turns). Read endpoint backed by
  // the tool_cost_stats view. AgentPicker consumes this for per-agent cost
  // hints.
  tools: {
    costStats: () => request<{ stats: ToolCostStat[] }>('/api/tools/cost_stats'),
  },
  settings: {
    get: () => request<Record<string, unknown>>('/api/settings'),
    patch: (body: Record<string, unknown>) =>
--- a/apps/web/src/api/types.ts
+++ b/apps/web/src/api/types.ts
@@ -1,6 +1,18 @@
 export const PROJECT_STATUSES = ['open', 'archived'] as const;
 export type ProjectStatus = typeof PROJECT_STATUSES[number];
 // v1.13.10: per-tool cost rolling-window stat. Returned by
 // GET /api/tools/cost_stats — one entry per tool with mean prompt/completion
 // tokens over the last 100 invocations. AgentPicker sums across an agent's
 // whitelisted tools for per-agent cost hints.
 export interface ToolCostStat {
  tool_name: string;
  mean_prompt_tokens: number;
  mean_completion_tokens: number;
  n_calls: number;
  updated_at: string;
 }
 export interface Project {
  id: string;
  name: string;
--- a/apps/web/src/components/AgentPicker.tsx
+++ b/apps/web/src/components/AgentPicker.tsx
@@ -1,8 +1,8 @@
-import { useEffect, useState } from 'react';
+import { useEffect, useMemo, useState } from 'react';
 import { Check, ChevronDown } from 'lucide-react';
 import { toast } from 'sonner';
 import { api } from '@/api/client';
-import type { Agent, AgentParseError } from '@/api/types';
+import type { Agent, AgentParseError, ToolCostStat } from '@/api/types';
 import {
  DropdownMenu,
  DropdownMenuContent,
@@ -22,6 +22,10 @@ export function AgentPicker({ projectId, value, onChange }: Props) {
  const [parseErrors, setParseErrors] = useState<AgentParseError[]>([]);
  const [error, setError] = useState<string | null>(null);
  const [open, setOpen] = useState(false);
  // v1.13.10: per-tool cost rolling window. Fetched once on mount; would
  // refresh on remount or page reload. Acceptable for a decision aid — the
  // 100-call rolling mean doesn't shift fast.
  const [costStats, setCostStats] = useState<ToolCostStat[]>([]);
  // v1.8.1: per-agent parse errors are non-blocking. Silent if any agents
  // loaded successfully; a gray warning toast fires only when EVERY agent
@@ -52,6 +56,29 @@ export function AgentPicker({ projectId, value, onChange }: Props) {
    };
  }, [projectId]);
  // v1.13.10: cost stats are project-independent — the 100-call rolling
  // window is global across all chats. Fetch once per mount; tolerate failure
  // silently (cost line hides).
  useEffect(() => {
    let cancelled = false;
    api.tools
      .costStats()
      .then((r) => {
        if (!cancelled) setCostStats(r.stats);
      })
      .catch(() => {
        if (!cancelled) setCostStats([]);
      });
    return () => {
      cancelled = true;
    };
  }, []);
  const costByTool = useMemo(
    () => Object.fromEntries(costStats.map((s) => [s.tool_name, s])),
    [costStats],
  );
  const selectedAgent = agents?.find((a) => a.id === value) ?? null;
  const triggerLabel = value === null
    ? 'No agent'
@@ -86,25 +113,33 @@ export function AgentPicker({ projectId, value, onChange }: Props) {
              <span className="font-medium">No agent</span>
            </DropdownMenuItem>
            {agents.length > 0 && <DropdownMenuSeparator />}
-            {agents.map((a) => (
+            {agents.map((a) => {
-              <DropdownMenuItem
+              const cost = agentCost(a, costByTool);
-                key={a.id}
+              return (
-                onSelect={() => void onChange(a.id)}
+                <DropdownMenuItem
-                className="text-xs flex-col items-start gap-0.5"
+                  key={a.id}
-              >
+                  onSelect={() => void onChange(a.id)}
-                <div className="flex items-center gap-1.5">
+                  className="text-xs flex-col items-start gap-0.5"
-                  <Check
+                >
-                    className={`size-3 ${a.id === value ? 'opacity-100' : 'opacity-0'}`}
+                  <div className="flex items-center gap-1.5">
-                  />
+                    <Check
-                  <span className="font-medium">{a.name}</span>
+                      className={`size-3 ${a.id === value ? 'opacity-100' : 'opacity-0'}`}
-                </div>
+                    />
-                {a.description && (
+                    <span className="font-medium">{a.name}</span>
-                  <span className="text-muted-foreground pl-[18px] truncate w-full">
+                  </div>
-                    {a.description}
+                  {a.description && (
-                  </span>
+                    <span className="text-muted-foreground pl-[18px] truncate w-full">
-                )}
+                      {a.description}
-              </DropdownMenuItem>
+                    </span>
-            ))}
+                  )}
                  {cost.nWithData > 0 && (
                    <span className="text-muted-foreground/70 pl-[18px] truncate w-full">
                      ~{formatK(cost.prompt)} prompt / {cost.completion} completion · {cost.nWithData}/{cost.nTools} tools{cost.mostRecent ? ` · last call ${formatAgo(cost.mostRecent)}` : ''}
                    </span>
                  )}
                </DropdownMenuItem>
              );
            })}
            {parseErrors.length > 0 && (
              <div
                className="px-2 py-1.5 mt-1 text-xs text-amber-500 border-t border-border"
@@ -119,3 +154,49 @@ export function AgentPicker({ projectId, value, onChange }: Props) {
    </DropdownMenu>
  );
 }
 // v1.13.10: sum the per-tool means across an agent's whitelisted tools.
 // Sum-of-means, not mean-of-sums — we're combining independent rolling
 // averages. nWithData reflects how many of the agent's tools have any
 // history yet; the line hides entirely when zero so a fresh deploy doesn't
 // render "0k / 0 / 0 tools".
 function agentCost(
  agent: Agent,
  costByTool: Record<string, ToolCostStat>,
 ): {
  prompt: number;
  completion: number;
  nTools: number;
  nWithData: number;
  mostRecent: string | null;
 } {
  let prompt = 0;
  let completion = 0;
  let nWithData = 0;
  let mostRecent: string | null = null;
  for (const t of agent.tools) {
    const s = costByTool[t];
    if (!s) continue;
    prompt += s.mean_prompt_tokens;
    completion += s.mean_completion_tokens;
    nWithData++;
    if (!mostRecent || s.updated_at > mostRecent) mostRecent = s.updated_at;
  }
  return { prompt, completion, nTools: agent.tools.length, nWithData, mostRecent };
 }
 function formatK(n: number): string {
  if (n < 1000) return String(n);
  if (n < 10_000) return `${(n / 1000).toFixed(1)}k`;
  return `${Math.round(n / 1000)}k`;
 }
 function formatAgo(iso: string): string {
  const then = new Date(iso).getTime();
  if (Number.isNaN(then)) return '—';
  const diff = Date.now() - then;
  if (diff < 60_000) return 'just now';
  if (diff < 3_600_000) return `${Math.round(diff / 60_000)}m ago`;
  if (diff < 86_400_000) return `${Math.round(diff / 3_600_000)}h ago`;
  return `${Math.round(diff / 86_400_000)}d ago`;
 }
--- a/openspec/README.md
+++ b/openspec/README.md
@@ -0,0 +1,38 @@
 # openspec
 Per-batch documentation convention adopted in v1.13.15-openspec.
 Lift source: Fission-AI/OpenSpec directory layout. **No CLI dependency** — just
 the folder shape. Full OpenSpec lifecycle adoption is a future v1.14+ batch.
 ## Layout
 ```
 openspec/
  changes/
    <slug>/                          # one folder per shipped or planned batch
      proposal.md                    # Why + scope summary
      tasks.md                       # implementation step list
      design.md                      # architecture / data-model decisions (optional)
      specs/                         # reserved for future OpenSpec CLI adoption
    archived/                        # snapshots of pre-v1.13.15 batch docs
      <original-filename>.md
  specs/                             # global specs, future v1.14+ use
 ```
 ## Conventions
 - Slugs are lowercase and hyphenated, derived from the batch title
  (e.g. `v1-13-10-per-tool-cost`, `file-attachments-v3-5`).
 - Already-shipped pre-v1.13.15 batches live in `changes/archived/` as
  single-file snapshots. They were not split into proposal/tasks because
  the work was already complete; archiving preserves git history.
 - New v1.13.15+ batches should land directly in
  `changes/<slug>/proposal.md` (+ tasks.md, + design.md when applicable).
 - `proposal.md` carries the "Why" and scope. `tasks.md` is the action list
  (numbered or checkbox). `design.md` is for non-trivial architectural
  decisions worth recording separately.
 - A canonical dispatch brief (matching the v1.13.9 / v1.13.10 format)
  is most naturally split as proposal.md (Where we are, Why this matters,
  rationale sections) + tasks.md (Scope items, Build + smoke) + design.md
  (Attribution model, Filtering, Canonical mapping).
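The slug rule can be sketched as a small TypeScript function. `batchSlug` is a hypothetical name (the README states the convention in prose only), and the sketch assumes the batch title has already been shortened to its key words before slugging.

```typescript
// Hypothetical helper: the README states the slug rule in prose only.
function batchSlug(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse each run of non-alphanumerics into one hyphen
    .replace(/^-+|-+$/g, ""); // drop leading/trailing hyphens
}

console.log(batchSlug("v1.13.10 per-tool cost")); // v1-13-10-per-tool-cost
console.log(batchSlug("File attachments v3.5")); // file-attachments-v3-5
```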
--- a/openspec/changes/archived/boocode_batch10.md
+++ b/openspec/changes/archived/boocode_batch10.md
--- a/openspec/changes/archived/handoff_v1.13.10_per_tool_cost.md
+++ b/openspec/changes/archived/handoff_v1.13.10_per_tool_cost.md
@@ -0,0 +1,441 @@
 ```
 #careful #boocode #nofluff
 v1.13.10 — per-tool token cost accounting (rolling 100-call window)
 Goal: surface per-tool prompt/completion-token rolling averages in AgentPicker for at-a-glance agent-cost hints. Implementation is a SQL view on top of `messages_with_parts` (no new table, no new write site) + a read endpoint + AgentPicker tooltip extension. Estimated ~240 LoC, mostly UI.
 ## Where we are
 - Last tag: v1.13.9 (compaction overflow trigger — `floor(0.85 × ctx_max)` early-trigger). Branch clean.
 - v1.13.x cleanup line ✅ through v1.13.9. Queued: v1.13.10 (this) → v1.13.11 (WS Zod) → v1.13.12 (skills audit) → v1.13.2 (column drop, last).
 - Dependency (satisfied since v1.13.7 commit `ff29b48`): `includeUsage: true` on `createOpenAICompatible` in `apps/server/src/services/inference/provider.ts`. Without it, `messages.tokens_used`/`ctx_used` were NULL for v1.13.1-A → v1.13.7 (latent regression). Now populated.
 ## Why this matters
 Today: AgentPicker lists agents by name + description. No cost signal. Users pick the architect agent (full tool whitelist, ~21k tokens of tool schema) for one-liner questions a refactorer (3 tools, ~4k tokens of schema) could answer.
 Tomorrow: each agent listing shows the sum of its tools' mean prompt + completion costs, each mean taken over that tool's last 100 invocations across all chats. Decision aid, not a hard gate.
 Why a SQL view instead of a denormalized stats table:
 - All the source data already lands in `messages` (tool_calls JSON + tokens_used + ctx_used) and `message_parts` (read via the `messages_with_parts` view). Zero new write sites.
 - Rolling 100-call window is a `ROW_NUMBER() OVER (PARTITION BY tool_name ORDER BY created_at DESC) <= 100` — natural fit for a view.
 - View is rollback-safe. If the math is wrong, `DROP VIEW` and re-deploy; no orphan rows, no backfill.
 - At BooCode scale (single user, ~30 tools, ~100 calls/tool), aggregate-on-read is microseconds. Premature to denormalize.
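To make the window clause concrete, here is an in-memory TypeScript equivalent of `ROW_NUMBER() OVER (PARTITION BY tool_name ORDER BY created_at DESC) <= 100`. It is a sketch for reasoning about the semantics (or for use as a test oracle), not code from the repo; `ToolCall` and `lastNPerTool` are illustrative names.

```typescript
// Illustrative row shape; not a type from the codebase.
interface ToolCall {
  toolName: string;
  createdAt: string; // ISO-8601 timestamp
}

// In-memory equivalent of:
//   ROW_NUMBER() OVER (PARTITION BY tool_name ORDER BY created_at DESC) <= n
function lastNPerTool(calls: ToolCall[], n = 100): Map<string, ToolCall[]> {
  // PARTITION BY tool_name
  const byTool = new Map<string, ToolCall[]>();
  for (const call of calls) {
    const bucket = byTool.get(call.toolName);
    if (bucket) bucket.push(call);
    else byTool.set(call.toolName, [call]);
  }
  for (const bucket of byTool.values()) {
    // ORDER BY created_at DESC
    bucket.sort((a, b) => Date.parse(b.createdAt) - Date.parse(a.createdAt));
    // WHERE rn <= n (rn is 1-based, so keep the first n rows)
    if (bucket.length > n) bucket.length = n;
  }
  return byTool;
}
```

Because the cutoff is per partition, a rarely used tool keeps all of its (fewer than 100) calls while a hot tool is trimmed to its newest 100.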
 The roadmap schema row (`tool_cost_stats (tool_name, prompt_tokens_sum, completion_tokens_sum, n_calls, updated_at)`) matches both a table and a view. View is the lighter implementation.
 ## Canonical column mapping (pinned)
 The `messages` columns are named non-obviously. Pinned mapping, confirmed across 5 write sites + 1 read site:
 | Column        | Semantic meaning           | AI SDK v6 source name |
 |---------------|----------------------------|-----------------------|
 | `ctx_used`    | prompt / input tokens      | `usage.inputTokens`   |
 | `tokens_used` | completion / output tokens | `usage.outputTokens`  |
 Write sites confirmed: `tool-phase.ts:94-95`, `error-handler.ts:109-110`, `sentinel-summaries.ts:130-131`, `sentinel-summaries.ts:387-388`, `stream-phase.ts:319-320`. Canonical read at `payload.ts:190-191` reverses: `const promptTokens = updated.ctx_used; const completionTokens = updated.tokens_used`.
 `tokens_used` reads like "total" but is completion only. Project convention since the columns predate v1.13.x. Do not "fix" the naming inside this batch — out of scope; downstream consumers depend on the current mapping.
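A minimal TypeScript sketch of reading the pinned mapping, mirroring the canonical `payload.ts` read described above. `MessageUsageColumns` and `readTurnUsage` are hypothetical names; the point is that `ctx_used` is the prompt count, `tokens_used` is the completion count, and either may be NULL on rows from the pre-v1.13.7 regression window.

```typescript
// Nullable in practice: rows written between v1.13.1-A and v1.13.7 have NULLs.
interface MessageUsageColumns {
  ctx_used: number | null; // prompt / input tokens (usage.inputTokens)
  tokens_used: number | null; // completion / output tokens (usage.outputTokens), not a total
}

interface TurnUsage {
  promptTokens: number;
  completionTokens: number;
}

// Hypothetical helper mirroring the canonical read in payload.ts.
function readTurnUsage(row: MessageUsageColumns): TurnUsage | null {
  if (row.ctx_used === null || row.tokens_used === null) return null;
  return { promptTokens: row.ctx_used, completionTokens: row.tokens_used };
}
```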
 ## Attribution model
 A single assistant turn can emit N tool calls in parallel. llama-swap returns ONE (prompt_tokens, completion_tokens) per turn, not per tool. Attribution requires a split.
 **Chosen approach: equal split.** For an assistant turn that emits N tool calls with prompt P and completion C, each tool is attributed P/N prompt + C/N completion. The 100-call rolling mean smooths split noise. Implementation: `tokens_used::float / jsonb_array_length(tool_calls)` at the unnest site.
 **Alternatives rejected:**
 - "Full turn cost to every tool" (no division). Over-states: each tool in a 5-tool turn is charged the whole turn, so the turn's cost is counted five times.
 - "Result-size only" (`length(JSON.stringify(output)) / 4`). Loses the LLM's actual usage signal; doesn't capture how expensive a tool's output is to the next prompt.
 - "Consuming-turn delta" (next turn prompt_tokens − this turn prompt_tokens, attribute to the tool that emitted the result). Most accurate but requires bubble-back math through the `executeToolPhase → runAssistantTurn` recursion. Over-engineered for the rolling-average use case.
 **If Sam wants a different split, change one line in the view definition (the divisor).**
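The chosen equal split and the rejected no-division alternative, side by side as a TypeScript sketch. The type and function names are illustrative; the SQL view performs the same division per unnested tool call. Equal split conserves the turn's total across its N calls, while charging every tool the full turn inflates the total N-fold.

```typescript
// Illustrative types; the view derives these from messages_with_parts.
interface TurnCost {
  toolNames: string[]; // one entry per tool call in the turn
  promptTokens: number; // ctx_used
  completionTokens: number; // tokens_used
}

interface Attribution {
  toolName: string;
  promptTokens: number;
  completionTokens: number;
}

// Chosen model: each of the N calls is attributed P/N prompt and C/N completion.
function equalSplit(turn: TurnCost): Attribution[] {
  const n = turn.toolNames.length;
  if (n === 0) return []; // the view's WHERE clause excludes empty tool_calls arrays
  return turn.toolNames.map((toolName) => ({
    toolName,
    promptTokens: turn.promptTokens / n,
    completionTokens: turn.completionTokens / n,
  }));
}

// Rejected alternative: every tool is charged the whole turn (no division).
function fullCostToEveryTool(turn: TurnCost): Attribution[] {
  return turn.toolNames.map((toolName) => ({
    toolName,
    promptTokens: turn.promptTokens,
    completionTokens: turn.completionTokens,
  }));
}

const totalPrompt = (xs: Attribution[]) => xs.reduce((sum, a) => sum + a.promptTokens, 0);
```

For a 3-tool turn with 15000 prompt tokens, `equalSplit` attributes 5000 to each tool (15000 in total), whereas `fullCostToEveryTool` attributes 45000 in total.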
 ## Filtering — sentinel, failure, repair-call semantics
 The view excludes rows that aren't real tool-cost signal:
 - **Failed and cancelled turns** (`status != 'complete'`). The `error-handler.ts` failed/cancelled paths don't write `tokens_used`/`ctx_used`, so the existing `tokens_used IS NOT NULL` clause already filters these. Adding `status='complete'` is defense in depth and makes intent explicit.
 - **Cap-hit and doom-loop sentinel rows** (`metadata->>'kind' IN ('cap_hit', 'doom_loop')`). Sentinels are `role='system'` rows with `tool_calls=NULL`, so the existing `tool_calls IS NOT NULL` clause already filters them. The explicit metadata filter is defense in depth — it survives future schema drift where someone might INSERT a sentinel with a non-null tool_calls.
 - **`experimental_repairToolCall` retries.** No special handling needed. Our impl (per `CLAUDE.md`) is pass-through — malformed calls flow to zod-reject → tool_result error → next normal turn handles. No separate rows; the next turn's tokens count naturally.
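The exclusion rules reduce to a single row predicate. This TypeScript sketch mirrors the view's WHERE clause; the row shape and names are assumptions. The explicit null checks on `metadata` and `kind` correspond to the SQL's `IS NULL` branches, which are needed because `NULL NOT IN (...)` evaluates to NULL and WHERE treats that as false.

```typescript
// Illustrative row shape for messages_with_parts; field names follow the SQL columns.
interface MessageRow {
  status: string; // e.g. 'complete' | 'failed' | 'cancelled'
  tool_calls: unknown[] | null;
  tokens_used: number | null;
  ctx_used: number | null;
  metadata: { kind?: string } | null;
}

const SENTINEL_KINDS = new Set(["cap_hit", "doom_loop"]);

// Mirrors the view's WHERE clause, one condition group per line.
function isToolCostSignal(m: MessageRow): boolean {
  if (m.tool_calls === null || m.tool_calls.length === 0) return false;
  if (m.tokens_used === null || m.ctx_used === null) return false;
  if (m.status !== "complete") return false;
  const kind = m.metadata?.kind;
  // No metadata, or metadata without a kind, counts as a normal turn.
  return kind == null || !SENTINEL_KINDS.has(kind);
}
```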
 ## Recon (already done; paste for reference)
 ```
 cd /opt/boocode
 grep -n "tokens_used\|ctx_used\|inputTokens\|outputTokens" apps/server/src/services/inference/*.ts | head -30
 grep -n "metadata\|cap_hit\|doom_loop" apps/server/src/services/inference/sentinels.ts apps/server/src/schema.sql | head -10
 psql -h localhost -p 5432 -U postgres -d boocode -c "\d messages_with_parts" | head -30
 ```
 Expected: confirms the canonical mapping in the table above; confirms `messages.metadata jsonb` exists at `schema.sql:259`; confirms `messages_with_parts` exposes `m.metadata` at `schema.sql:92`.
 ## Scope
 ### 1. schema.sql — `tool_cost_stats` view (~35 LoC)
 Append after the `messages_with_parts` view (after line 120):
 ```sql
 -- v1.13.10: per-tool token cost rolling window. Derives from
 -- messages_with_parts (the v1.13.1-B view that COALESCEs message_parts over
 -- the legacy JSON column) so this works whether the chat predates v1.13.0
 -- or postdates v1.13.2 (column drop). No new write site — all source data
 -- already lands via the existing tool-phase.ts:94-95 UPDATE.
 --
 -- Attribution model: equal split. A turn emitting N tool calls divides its
 -- prompt/completion tokens by N before attribution. See v1.13.10 dispatch
 -- brief for rationale + rejected alternatives.
 --
 -- Column mapping: messages.ctx_used = prompt (input), messages.tokens_used
 -- = completion (output). Non-obvious naming; pinned via canonical writes at
 -- tool-phase.ts:94-95 et al.
 --
 -- Filtering rationale:
 --   status='complete'                — exclude failed/cancelled (defense in
 --                                      depth; failed-path doesn't write
 --                                      tokens_used so they're also filtered
 --                                      indirectly).
 --   metadata->>'kind' exclusions     — exclude cap_hit / doom_loop sentinels
 --                                      (defense in depth; sentinels are
 --                                      role='system' with tool_calls=NULL
 --                                      so they're filtered indirectly too).
 --   experimental_repairToolCall      — no special handling; retries flow
 --                                      as normal next-turn tool_result
 --                                      errors and count naturally.
 --
 -- Rolling window: last 100 calls per tool_name, ordered by created_at DESC.
 -- Aggregate-on-read is microseconds at BooCode scale (single user, ~30
 -- tools, < 100 calls each). DROP VIEW + recreate to change window size.
 CREATE OR REPLACE VIEW tool_cost_stats AS
 WITH per_call AS (
  SELECT
    (tc->>'name')::text AS tool_name,
    (m.ctx_used::float / NULLIF(jsonb_array_length(m.tool_calls), 0)) AS prompt_tokens,
    (m.tokens_used::float / NULLIF(jsonb_array_length(m.tool_calls), 0)) AS completion_tokens,
    m.created_at,
    ROW_NUMBER() OVER (
      PARTITION BY (tc->>'name')::text
      ORDER BY m.created_at DESC
    ) AS rn
  FROM messages_with_parts m,
    LATERAL jsonb_array_elements(m.tool_calls) AS tc
  WHERE m.tool_calls IS NOT NULL
    AND jsonb_array_length(m.tool_calls) > 0
    AND m.tokens_used IS NOT NULL
    AND m.ctx_used IS NOT NULL
    AND m.status = 'complete'
    AND (m.metadata IS NULL
         OR m.metadata->>'kind' IS NULL
         OR m.metadata->>'kind' NOT IN ('cap_hit', 'doom_loop'))
 )
 SELECT
  tool_name,
  ROUND(SUM(prompt_tokens))::int AS prompt_tokens_sum,
  ROUND(SUM(completion_tokens))::int AS completion_tokens_sum,
  COUNT(*)::int AS n_calls,
  MAX(created_at) AS updated_at
 FROM per_call
 WHERE rn <= 100
 GROUP BY tool_name;
 ```
 Notes:
 - `NULLIF(..., 0)` guards against div-by-zero on `jsonb_array_length=0` (should never happen given the WHERE clause, but defensive).
 - `ROUND(SUM(...))::int` — frontend doesn't want decimals; sum-then-round is more accurate than per-row round-then-sum.
 - View is read from `messages_with_parts` not `messages`, so legacy pre-v1.13.0 rows and post-v1.13.2 rows both resolve.
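 - No index needed at BooCode scale; the underlying `idx_messages_chat` covers the JOIN. Note that the `rn <= 100` filter is applied after the LATERAL unnest, so the view still scans and unnests every qualifying row; the 100-row partition bounds the aggregate, not the scan.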
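A worked example of the sum-then-round note: one tool that appeared in three separate 3-tool turns, each with a 100-token completion, so each per-call row carries a third of 100. `Math.round` stands in for SQL `ROUND` here (their tie-breaking can differ, which does not matter for these inputs).

```typescript
// One tool, three calls, each attributed 100/3 completion tokens by the equal split.
const perRow: number[] = [100 / 3, 100 / 3, 100 / 3];

function sumThenRound(xs: number[]): number {
  return Math.round(xs.reduce((acc, x) => acc + x, 0));
}

function roundThenSum(xs: number[]): number {
  return xs.reduce((acc, x) => acc + Math.round(x), 0);
}

console.log(sumThenRound(perRow)); // 100
console.log(roundThenSum(perRow)); // 99 (one token lost to per-row rounding)
```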
 ### 2. apps/server/src/routes/tools.ts (NEW, ~40 LoC)
 New route file. Register in `apps/server/src/index.ts` next to the other `register*Routes(app, sql, ...)` calls.
 ```ts
 import type { FastifyInstance } from 'fastify';
 import type { Sql } from '../db.js';
 export interface ToolCostStat {
  tool_name: string;
  mean_prompt_tokens: number;
  mean_completion_tokens: number;
  n_calls: number;
  updated_at: string;
 }
 export function registerToolsRoutes(app: FastifyInstance, sql: Sql) {
  app.get('/api/tools/cost_stats', async () => {
    const rows = await sql<{
      tool_name: string;
      prompt_tokens_sum: number;
      completion_tokens_sum: number;
      n_calls: number;
      updated_at: string;
    }[]>`
      SELECT tool_name, prompt_tokens_sum, completion_tokens_sum, n_calls, updated_at
      FROM tool_cost_stats
      ORDER BY tool_name ASC
    `;
    const stats: ToolCostStat[] = rows.map(r => ({
      tool_name: r.tool_name,
      mean_prompt_tokens: Math.round(r.prompt_tokens_sum / r.n_calls),
      mean_completion_tokens: Math.round(r.completion_tokens_sum / r.n_calls),
      n_calls: r.n_calls,
      updated_at: r.updated_at,
    }));
    return { stats };
  });
 }
 ```
 Route is bodyless, idempotent, cheap. No pagination (≤30 tools).
 ### 3. apps/server/src/services/__tests__/tool_cost_stats.test.ts (NEW, ~95 LoC)
 Integration test against real Postgres (matches `inference.test.ts` pattern). Fixtures:
 ```ts
 import { describe, it, expect, beforeEach } from 'vitest';
 import { connect } from '../../db.js';
 describe('tool_cost_stats view (v1.13.10)', () => {
  // ... session + chat + project setup helpers ...
  it('returns empty when no tool calls exist', async () => {
    // fresh chat, only user/assistant text turns
    const stats = await sql`SELECT * FROM tool_cost_stats`;
    expect(stats).toEqual([]);
  });
  it('attributes single-tool turn fully to that tool', async () => {
    // insert one assistant message with tool_calls=[{name: 'view_file', ...}],
    // tokens_used=300, ctx_used=15000, status='complete'
    const stats = await sql`SELECT * FROM tool_cost_stats WHERE tool_name='view_file'`;
    expect(stats[0]).toMatchObject({
      tool_name: 'view_file',
      prompt_tokens_sum: 15000,
      completion_tokens_sum: 300,
      n_calls: 1,
    });
  });
  it('splits multi-tool turn equally across tools', async () => {
    // insert one assistant turn with 3 tool calls (view_file, grep, list_dir),
    // tokens_used=300, ctx_used=15000 → each tool gets 100 completion, 5000 prompt
    const stats = await sql`SELECT * FROM tool_cost_stats ORDER BY tool_name`;
    expect(stats).toHaveLength(3);
    for (const s of stats) {
      expect(s.completion_tokens_sum).toBe(100);
      expect(s.prompt_tokens_sum).toBe(5000);
      expect(s.n_calls).toBe(1);
    }
  });
  it('limits to last 100 calls per tool (FIFO window)', async () => {
    // insert 150 turns each calling view_file once with monotonically
    // increasing tokens_used; expect only the most recent 100 to count
    const stats = await sql`SELECT * FROM tool_cost_stats WHERE tool_name='view_file'`;
    expect(stats[0]!.n_calls).toBe(100);
    // mean should reflect the most recent 100 calls (51..150), not all of 1..150
  });
  it('excludes turns with NULL tokens_used (pre-v1.13.7 latent regression)', async () => {
    // insert a turn with tool_calls but tokens_used=NULL → must not appear
    const stats = await sql`SELECT * FROM tool_cost_stats WHERE tool_name='view_file'`;
    expect(stats).toEqual([]);
  });
  it('excludes failed and cancelled turns + sentinel metadata rows', async () => {
    // insert five rows for tool_name='view_file', all with tokens_used+ctx_used
    // populated:
    //   row A: status='failed'                            — excluded
    //   row B: status='cancelled'                         — excluded
    //   row C: status='complete', metadata={kind:'cap_hit'}   — excluded
    //   row D: status='complete', metadata={kind:'doom_loop'} — excluded
    //   row E: status='complete', metadata=null               — included
    // Expect n_calls=1, attributable to row E only.
    const stats = await sql`SELECT * FROM tool_cost_stats WHERE tool_name='view_file'`;
    expect(stats[0]!.n_calls).toBe(1);
  });
  it('reads tool_calls via messages_with_parts (parts-authoritative)', async () => {
    // insert a v1.13.0+ row with messages.tool_calls=NULL but
    // message_parts rows containing the tool_call → must still aggregate
    const stats = await sql`SELECT * FROM tool_cost_stats WHERE tool_name='grep'`;
    expect(stats[0]!.n_calls).toBe(1);
  });
 });
 ```
 Pattern: each test resets the messages table for the fixture chat (TRUNCATE not DELETE — Postgres `messages` has FK CASCADE) and inserts hand-crafted rows. The view is recomputed on every SELECT.
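Expected values for the FIFO-window test, assuming turn i writes `tokens_used = i` and `created_at` increases with i (the fixture comment says "monotonically increasing" but doesn't pin the values). `expectedWindow` is a hypothetical helper for deriving the assertions.

```typescript
// Assumption: turn i (1..totalTurns) writes tokens_used = i, and created_at increases with i.
function expectedWindow(totalTurns: number, windowSize: number) {
  const first = Math.max(1, totalTurns - windowSize + 1); // oldest turn still in the window
  let completionSum = 0;
  for (let i = first; i <= totalTurns; i++) completionSum += i;
  const nCalls = totalTurns - first + 1;
  return { nCalls, completionSum, mean: completionSum / nCalls };
}

const windowed = expectedWindow(150, 100); // turns 51..150
const unwindowed = expectedWindow(150, 150); // turns 1..150, for contrast
console.log(windowed); // { nCalls: 100, completionSum: 10050, mean: 100.5 }
console.log(unwindowed.mean); // 75.5
```

So the view should report `completion_tokens_sum = 10050` with `n_calls = 100`; the route's `Math.round(10050 / 100)` would surface a mean of 101.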
 ### 4. apps/web/src/api/types.ts + client.ts (~10 LoC)
 Add to `types.ts`:
 ```ts
 export interface ToolCostStat {
  tool_name: string;
  mean_prompt_tokens: number;
  mean_completion_tokens: number;
  n_calls: number;
  updated_at: string;
 }
 ```
 Add to `client.ts` under the existing `api.*` namespace structure:
 ```ts
 tools: {
  costStats: () => request<{ stats: ToolCostStat[] }>('/api/tools/cost_stats'),
 },
 ```
 Match the casing convention of the existing namespaces (`api.agents.list`, `api.chats.archive`, etc.).
 ### 5. apps/web/src/components/AgentPicker.tsx — tooltip extension (~80 LoC delta)
 Currently (line 67): `title={selectedAgent?.description}` — native HTML title attribute on the trigger button.
 Replacement: dropdown items get a per-agent cost line in muted text below the description. Format:
 ```
 [Agent name]
 [Agent description]
 ~5.2k prompt / 280 completion · 6/6 tools · last call 3h ago
 ```
 Implementation steps:
 1. Fetch `api.tools.costStats()` once on mount (alongside the existing `api.agents.list()`). Keep the result for the lifetime of the mounted picker; with an empty dependency array, `useEffect` does not re-fetch until the picker remounts.
 2. Compute per-agent aggregate: for each agent, sum the means of its whitelisted tools. Sum-of-means, not mean-of-sums — we're combining independent rolling averages.
 3. Render below description (one line, muted, truncated). Omit the line entirely if none of the agent's tools has recorded calls yet.
 4. Don't break the existing native `title=` for backward compat; layer the cost line additively.
 ```tsx
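 const [costStats, setCostStats] = useState<ToolCostStat[]>([]);
 useEffect(() => {
  // Guard against setState after unmount if the fetch resolves late.
  let cancelled = false;
  api.tools
    .costStats()
    .then((r) => {
      if (!cancelled) setCostStats(r.stats);
    })
    .catch(() => {
      if (!cancelled) setCostStats([]);
    });
  return () => {
    cancelled = true;
  };
 }, []);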
 const costByTool = useMemo(
  () => Object.fromEntries(costStats.map(s => [s.tool_name, s])),
  [costStats],
 );
 function agentCost(agent: Agent): { prompt: number; completion: number; nTools: number; nWithData: number; mostRecent: string | null } {
  let prompt = 0, completion = 0, nWithData = 0;
  let mostRecent: string | null = null;
  for (const t of agent.tools) {
    const s = costByTool[t];
    if (!s) continue;
    prompt += s.mean_prompt_tokens;
    completion += s.mean_completion_tokens;
    nWithData++;
    if (!mostRecent || s.updated_at > mostRecent) mostRecent = s.updated_at;
  }
  return { prompt, completion, nTools: agent.tools.length, nWithData, mostRecent };
 }
 ```
 For the line render: `~${formatK(prompt)} prompt / ${Math.round(completion)} completion · ${nWithData}/${nTools} tools · ${formatAgo(mostRecent)}`. Round `completion`, because it is a sum of fractional means. Skip the line entirely when `nWithData === 0` to avoid showing "0k / 0 / 0 tools" in the fresh-from-deploy state.
 **`formatK` / `formatAgo`:** colocate at the bottom of `AgentPicker.tsx`. Don't extract to a util file in this batch — single use site.
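A minimal sketch of the two helpers plus a line builder follows. The formatting rules are assumptions inferred from the example cost line, not taken from existing code:

- `formatK` abbreviates thousands to one decimal place.
- `formatAgo` returns the whole "last call 3h ago" phrase.
- `costLine` and the `AgentCost` shape are hypothetical. The shape is copied from `agentCost`'s return type.
- `now` is injectable so the output is deterministic under test.

```typescript
// Hypothetical helpers for AgentPicker.tsx. Names match the handoff; the
// formatting rules are assumptions inferred from the example cost line.

interface AgentCost {
  prompt: number;
  completion: number;
  nTools: number;
  nWithData: number;
  mostRecent: string | null;
}

// 280 -> "280", 5200 -> "5.2k", 5000 -> "5k".
function formatK(n: number): string {
  const r = Math.round(n);
  if (r < 1000) return String(r);
  // One decimal place, dropping a trailing ".0" so 5000 reads "5k".
  return `${(r / 1000).toFixed(1).replace(/\.0$/, "")}k`;
}

// ISO timestamp -> "last call 3h ago". `now` is injectable for tests.
function formatAgo(iso: string | null, now: number = Date.now()): string {
  if (iso === null) return "no calls yet";
  const then = Date.parse(iso);
  if (Number.isNaN(then)) return "last call unknown";
  const seconds = Math.max(0, Math.floor((now - then) / 1000));
  if (seconds < 60) return "last call just now";
  if (seconds < 3600) return `last call ${Math.floor(seconds / 60)}m ago`;
  if (seconds < 86400) return `last call ${Math.floor(seconds / 3600)}h ago`;
  return `last call ${Math.floor(seconds / 86400)}d ago`;
}

// Returns null when the agent has no tool history, so the caller renders nothing.
function costLine(c: AgentCost, now: number = Date.now()): string | null {
  if (c.nWithData === 0) return null;
  return (
    `~${formatK(c.prompt)} prompt / ${Math.round(c.completion)} completion` +
    ` · ${c.nWithData}/${c.nTools} tools · ${formatAgo(c.mostRecent, now)}`
  );
}
```

Returning `null` rather than an empty string lets the JSX use a plain `{line && <span>…</span>}` guard.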
 ## What NOT to do
 - **Don't add a new write site at `tool-phase.ts` or `finalizeCompletion`.** All source data is already there via existing UPDATEs.
 - **Don't denormalize.** The view is sufficient and rollback-safe at BooCode's single-user scale.
 - **Don't add per-tool cost to the message bubble.** Out of scope. AgentPicker tooltip only.
 - **Don't fold per-call rows into a moving sum via triggers.** Aggregate on read; 100 rows × 30 tools is microseconds in Postgres.
 - **Don't track `result_chars` (the size of `tool_results.output`).** Tempting as a second cost signal but out of scope here. Future batch if Sam wants it.
 - **Don't add a session-scoped or chat-scoped filter to `tool_cost_stats`.** The rolling window is GLOBAL across all chats — the agent picker is a project-level decision aid. Per-chat surfacing is a future v1.14+ design.
 - **Don't change the attribution model post-deployment** without dropping the view first. Mid-flight semantic changes give bogus historical means.
 - **Don't "fix" the `ctx_used`/`tokens_used` naming inside this batch.** Non-obvious but pinned across 5 write sites. Renaming is its own batch.
 - **Don't rely solely on `tool_calls IS NOT NULL` for sentinel exclusion.** It works today (sentinels are role='system' with tool_calls=NULL) but the explicit `status='complete'` + `metadata->>'kind'` filters are defense in depth and survive future schema drift.
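The attribution and filter rules are easier to reason about in a hypothetical in-memory model of what the view is described as computing:

- an equal N-way token split per turn;
- `status='complete'` turns only;
- `cap_hit` and `doom_loop` sentinels excluded;
- NULL-token turns skipped;
- the newest 100 calls kept per tool.

This is not the SQL view and is not meant to ship, since the list above rules out write-time aggregation. The `Turn` fields are illustrative stand-ins for `messages_with_parts` columns. Taking `updated_at` from the newest kept call is an assumption.

```typescript
// Hypothetical in-memory model of the tool_cost_stats semantics, for
// reasoning and unit tests only. Turn fields are illustrative stand-ins.
export interface Turn {
  createdAt: string; // ISO-8601 in one consistent format, so string order == time order
  status: string; // only "complete" turns count
  kind?: string; // metadata->>'kind'; sentinels carry "cap_hit" / "doom_loop"
  promptTokens: number | null;
  completionTokens: number | null;
  toolNames: string[]; // one entry per tool call in the turn
}

export interface Stat {
  tool_name: string;
  mean_prompt_tokens: number;
  mean_completion_tokens: number;
  n_calls: number;
  updated_at: string;
}

const SENTINEL_KINDS = new Set(["cap_hit", "doom_loop"]);

export function toolCostStats(turns: Turn[], window = 100): Stat[] {
  const calls = new Map<string, { at: string; p: number; c: number }[]>();
  for (const t of turns) {
    // Filters described in the handoff: failed turns, sentinels, NULL tokens.
    if (t.status !== "complete") continue;
    if (t.kind !== undefined && SENTINEL_KINDS.has(t.kind)) continue;
    if (t.promptTokens === null || t.completionTokens === null) continue;
    const n = t.toolNames.length;
    if (n === 0) continue;
    // Equal split: each call in an N-call turn gets 1/N of the turn's tokens.
    for (const name of t.toolNames) {
      const list = calls.get(name) ?? [];
      list.push({ at: t.createdAt, p: t.promptTokens / n, c: t.completionTokens / n });
      calls.set(name, list);
    }
  }
  const out: Stat[] = [];
  for (const [tool_name, list] of calls) {
    // Newest first, then keep only the most recent `window` calls (FIFO eviction).
    list.sort((a, b) => (a.at < b.at ? 1 : a.at > b.at ? -1 : 0));
    const kept = list.slice(0, window);
    const sum = (f: (x: { p: number; c: number }) => number) =>
      kept.reduce((acc, x) => acc + f(x), 0);
    out.push({
      tool_name,
      mean_prompt_tokens: sum((x) => x.p) / kept.length,
      mean_completion_tokens: sum((x) => x.c) / kept.length,
      n_calls: kept.length,
      updated_at: kept[0]!.at, // assumption: timestamp of the newest call in the window
    });
  }
  // Mirrors the endpoint's tool_name ASC ordering.
  return out.sort((a, b) => (a.tool_name < b.tool_name ? -1 : a.tool_name > b.tool_name ? 1 : 0));
}
```

A model like this can double as an oracle: feed the same hand-crafted rows to it and to the integration tests, then compare means.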
 ## Backup before edits
 ```
 cd /opt/boocode
 cp apps/server/src/schema.sql{,.bak-$(date +%Y%m%d-%H%M%S)}
 cp apps/web/src/components/AgentPicker.tsx{,.bak-$(date +%Y%m%d-%H%M%S)}
 ```
 (No backup needed for new files in items 2, 3, 4.)
 ## Verify
 ```
 pnpm -C apps/server test
 ```
 Expected: all existing tests pass + 7 new in `tool_cost_stats.test.ts`. Total moves from 195 → 202.
 ```
 cd /opt/boocode
 docker compose exec boocode_db psql -U postgres -d boocode -c \
  "SELECT * FROM tool_cost_stats ORDER BY n_calls DESC LIMIT 10;"
 ```
 Expected: in any live deployment with v1.13.7+ history, this returns real rows for `view_file`, `grep`, `list_dir`, etc. If empty: `messages.tool_calls` was NULL for the v1.13.1-A → v1.13.7 latent regression window and recovery only begins with v1.13.7+ traffic.
 ## Build + smoke
 ```
 cd /opt/boocode
 docker compose up --build -d boocode
 docker compose logs --since=30s boocode | tail -20
 ```
 Smoke A — view recompiles on schema apply:
 ```
 docker compose logs boocode | grep -i "tool_cost_stats\|applySchema"
 ```
 Expected: clean schema apply, view registered idempotently.
 Smoke B — endpoint returns data:
 ```
 curl -s http://localhost:3000/api/tools/cost_stats | jq '.stats | length, .stats[0]'
 ```
 Expected: nonzero length if any v1.13.7+ tool calls exist; one stat object with all 5 fields populated.
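The `jq` check above is manual. A hypothetical TypeScript guard can assert the same "all 5 fields populated" expectation against the parsed JSON body, for example from a scratch script. The interface mirrors `ToolCostStat` from `types.ts`. The guard functions are illustrative and not part of this batch's file list. The number checks are deliberately strict: several Node Postgres drivers return `numeric`/`bigint` values as strings by default, and a mean or count that arrives as a JSON string fails here.

```typescript
// Hypothetical runtime check for the Smoke B payload.
interface ToolCostStat {
  tool_name: string;
  mean_prompt_tokens: number;
  mean_completion_tokens: number;
  n_calls: number;
  updated_at: string;
}

function isToolCostStat(x: unknown): x is ToolCostStat {
  if (typeof x !== "object" || x === null) return false;
  const o = x as Partial<Record<keyof ToolCostStat, unknown>>;
  return (
    typeof o.tool_name === "string" && o.tool_name.length > 0 &&
    // Strict: a number serialized as a JSON string is rejected.
    typeof o.mean_prompt_tokens === "number" && Number.isFinite(o.mean_prompt_tokens) &&
    typeof o.mean_completion_tokens === "number" && Number.isFinite(o.mean_completion_tokens) &&
    typeof o.n_calls === "number" && Number.isInteger(o.n_calls) && o.n_calls > 0 &&
    typeof o.updated_at === "string" && !Number.isNaN(Date.parse(o.updated_at))
  );
}

// Validates the { stats: ToolCostStat[] } envelope returned by the endpoint.
function isCostStatsResponse(body: unknown): body is { stats: ToolCostStat[] } {
  if (typeof body !== "object" || body === null) return false;
  const stats = (body as { stats?: unknown }).stats;
  return Array.isArray(stats) && stats.every(isToolCostStat);
}
```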
 Smoke C — UI:
 1. Open browser to `boocode.indifferentketchup.com`.
 2. Open AgentPicker dropdown on any session.
 3. Each agent row shows a muted cost line below its description: `~5.2k prompt / 280 completion · 6/8 tools · last call 2h ago`.
 4. Agents with no tool history show just description (no cost line).
 5. Confirm the cost line truncates with the existing `text-muted-foreground` / `truncate` pattern and doesn't break the layout at mobile widths (open Vivaldi devtools, set an iPhone 13 viewport).
 ## Files expected to touch
 - `apps/server/src/schema.sql` — ~35 LoC delta (view definition + filter comments)
 - `apps/server/src/routes/tools.ts` — NEW, ~40 LoC
 - `apps/server/src/index.ts` — 1 line (`registerToolsRoutes(app, sql)`)
 - `apps/server/src/services/__tests__/tool_cost_stats.test.ts` — NEW, ~95 LoC
 - `apps/web/src/api/types.ts` — ~7 LoC (interface)
 - `apps/web/src/api/client.ts` — ~3 LoC (namespace + method)
 - `apps/web/src/components/AgentPicker.tsx` — ~80 LoC delta (cost line + fetch hook + helpers)
 Total ~260 LoC. Matches roadmap estimate.
 ## Workflow conventions
 - Backups before destructive edits (above) on the two MODIFIED files. New files don't need backups.
 - Sam reviews diffs. Never `git add` / `git commit` / `git push` / `git pull` on Sam's behalf.
 - Build: `docker compose up --build -d boocode`. No `--no-cache` unless layer-cache trap surfaces.
 - Tests authoritative: `pnpm -C apps/server test`.
 - View definition lives in `schema.sql` (idempotent via `CREATE OR REPLACE VIEW`); no migration shim needed.
 ## Don't repeat past mistakes
 - v1.13.7 stability bundle (`includeUsage:true`, trim guards, payload filter, `BUDGET_NO_AGENT=30`): all live. This batch depends on `includeUsage:true`. If unset, `tool_cost_stats` returns empty rows.
 - v1.13.8 prefix instrumentation: untouched.
 - v1.13.9 ratio-only `usable()`: untouched.
 - v1.13.4 two-tier prune: untouched.
 - v1.13.5 truncate.ts opaque-id pattern: untouched.
 - v1.13.1-B `messages_with_parts` view: this view is the source. Don't reach past it to raw `messages`.
 - v1.13.2 will DROP `messages.tool_calls`/`tool_results` columns. The `tool_cost_stats` view reads from `messages_with_parts` not `messages`, so it survives. Verify after v1.13.2 ships.
 ## Source files to read in project knowledge
 - `boocode_roadmap.md` (v1.13.10 row at line 114; schema row at line 474)
 - `boocode_code_review.md` (cost-tracking design background)
 - `CLAUDE.md` (project conventions; messages_with_parts invariant at L80; v1.13.7 includeUsage invariant)
 ```
--- a/openspec/changes/archived/handoff_v1.13.8_prefix_verify.md
+++ b/openspec/changes/archived/handoff_v1.13.8_prefix_verify.md
Author	SHA1	Message	Date
indifferentketchup	34cbecf975	v1.13.15-tools: tiered tool loading via BOOCODE_TOOLS env var Pattern lift from eyaltoledano/claude-task-master (MIT + Commons Clause; pattern only, no code lift). Adds BOOCODE_TOOLS env var with three tiers: - core (4 tools): view_file, list_dir, grep, find_files. ~2k token schema cost. - standard (15 tools): core + web_search, web_fetch, git_status, all 8 codecontext_* tools. ~10k token schema cost. - all (default; current behavior): every tool in ALL_TOOLS (20). ~21k token schema cost. The env var is a CEILING: it narrows agent whitelists, never expands them. Default behavior unchanged when var is unset. resolveToolTier is case-insensitive and falls back to 'all' on unknown values. CORE_TOOL_NAMES + STANDARD_TOOL_NAMES validated at module load against TOOLS_BY_NAME via two top-level for-loops that throw on the first missing name. Module fails to import if a tier references a tool that doesn't exist in the registry, which catches typos and stale tier definitions at boot rather than silently filtering valid tools out of agent whitelists. Wiring: agents.ts parseAgentBlock now reads BOOCODE_TOOLS from process.env per parse, intersects with the agent's declared frontmatter tools (or DEFAULT_TOOLS when frontmatter omits the field). Per-parse read is fine: agents are re-parsed on the existing 60s cache TTL. Tests: tools.test.ts grows from 1 to 10 tests. Covers resolveToolTier across tiers/case/unknown values + the CORE-subset-of-STANDARD invariant + TOOLS_BY_NAME existence for both tier sets. 204/204 pass (was 195; +9 new). Deviation from the brief: the codecontext tools in the actual registry have NO codecontext_* prefix (the brief's STANDARD list assumed it). Used the actual names (get_codebase_overview, search_symbols, etc.). Module-load validation would have failed boot with the prefixed names. Smoke: with BOOCODE_TOOLS unset, agents return their full 12-tool whitelists. With BOOCODE_TOOLS=core in .env + container restart, the same agents narrow to 4 tools (find_files, grep, list_dir, view_file): the intersection of declared whitelist ∩ core tier. Reverted after confirmation. CLAUDE.md updated with BOOCODE_TOOLS in the Environment section's Optional list. .env.example gained a commented BOOCODE_TOOLS=all line with the per-tier token-cost table. ~110 LoC across 5 files (4 modified + 1 test expansion). Under the brief's ~30 LoC estimate for code; the test suite expansion drove most of the growth.	2026-05-22 14:59:01 +00:00
indifferentketchup	5a3f357ce9	v1.13.15-openspec: reformat batch docs to OpenSpec directory structure Adopt Fission-AI/OpenSpec's openspec/changes/<change-name>/{proposal,specs,design,tasks}.md shape for BooCode's own batch docs. Zero-dep documentation reformat; replaces ad-hoc boocode_batchN.md / handoff_vN.N.N.md convention. Existing batch docs moved into openspec/changes/archived/ via git mv (preserves history): - boocode_batch10.md - handoff_v1.13.8_prefix_verify.md - handoff_v1.13.10_per_tool_cost.md Pre-v1.13.15 docs were NOT split into proposal/tasks/design files. The work was already shipped; the originals are preserved as archived snapshots. New v1.13.15+ batches land directly in openspec/changes/<slug>/proposal.md (+ tasks.md, + design.md when applicable) per the convention documented in openspec/README.md. CLAUDE.md gained a one-line pointer to the convention (workflow section). File grew from 153 → 154 lines, 27,682 → 27,925 chars; both remain well under the AgentLint hard caps. specs/ directory is reserved for future OpenSpec CLI adoption (v1.14+). No CLI dep added in this batch; directory structure only. If/when the full OpenSpec lifecycle is adopted, that lands as a separate batch.	2026-05-22 14:54:17 +00:00
indifferentketchup	fc11e8dc91	v1.13.15-agentlint: instruction-file audit against AgentLint 31-check standard Manual audit pass against 0xmariowu/AgentLint's evidence-backed checks (MIT, drawn from 265 versions of Anthropic's internal Claude Code system prompt). Findings and fixes: - Identity sections ("You are the assistant running inside ...") removed from BOOCHAT.md (line 3) and BOOCODER.md (line 5). The model already knows where it's running; the openers were emphatic decoration. - CLAUDE.local.md added to .gitignore (.env was already covered). Claude Code's Glob tool ignores .gitignore by default, which means any local override file was otherwise readable by any agent walking the workspace. - CLAUDE.md unchanged; already passes all 10 checks. Emphasis density 0.58/1000 words (under Anthropic's 1.4/1000 endpoint); two IMPORTANT/MUST references are load-bearing (tsc-noEmit footgun, v1.13.7 includeUsage invariant); zero identity sections; zero --no-verify references; 27,682 chars (under the 40,000-char silent-drop limit). Line count (153) is over the 60-120 target band, but the brief explicitly forbids structural rewrites in the audit pass. Targets not in scope: - /opt/boocode/AGENTS.md does not exist in this repo (removed in v1.12, per CLAUDE.md:152). The global agent registry lives at /data/AGENTS.md (bind-mounted from outside the repo); can't be touched by this batch. - No .github/workflows/ directory; SHA-pin audit (step 8) skipped. Cumulative effect: model spends fewer tokens parsing instruction-file ceremony in BOOCHAT/BOOCODER and receives sharper priority signal per Anthropic's measured-evolution data. Zero code changes.	2026-05-22 14:52:37 +00:00
indifferentketchup	9ce638c916	v1.13.10: per-tool token cost accounting (rolling 100-call view) Surfaces per-tool prompt/completion-token rolling averages in AgentPicker for at-a-glance agent-cost hints. Implementation is a SQL view on top of messages_with_parts plus a read endpoint and AgentPicker tooltip extension. No new write site; all source data already lands via the existing tool-phase.ts:94-95 / error-handler.ts:109-110 / sentinel-summaries.ts UPDATEs that v1.13.7's includeUsage:true fix made non-NULL. (1) schema.sql: new tool_cost_stats view. Window-functions over messages_with_parts.tool_calls with LATERAL jsonb_array_elements. Attribution: equal split; a multi-tool turn divides tokens N-ways, and the 100-call rolling mean absorbs split noise. Filters: status='complete' + metadata.kind NOT IN ('cap_hit','doom_loop') exclude failed turns and sentinels respectively; tool_calls IS NOT NULL is defense-in-depth since sentinels are role='system' rows. CREATE OR REPLACE means schema apply is idempotent. (2) routes/tools.ts NEW + index.ts wire-in. GET /api/tools/cost_stats returns { stats: ToolCostStat[] } with mean_prompt_tokens / mean_completion_tokens computed at read time (sum / n_calls). Sorted by tool_name ASC. No pagination (≤30 tools). (3) __tests__/tool_cost_stats.test.ts NEW: 7 integration tests keyed off DATABASE_URL env var. Tests skip gracefully when unset (no-DB default). beforeAll applies the schema via sql.unsafe(readFileSync(schema.sql)) for self-contained runs. Helper insertAssistantTurn shared across cases. Covers: empty state, single-tool attribution, multi-tool equal split, 100-call FIFO window, NULL-tokens exclusion, parts-authoritative read via messages_with_parts, failed/sentinel exclusion. (4) web/api/types.ts + client.ts: ToolCostStat interface + api.tools.costStats() method binding. (5) AgentPicker.tsx: fetch costStats on mount, compute per-agent sum-of-means across whitelisted tools, render muted cost line below description: "~5.2k prompt / 280 completion · 6/8 tools · last call 3h ago". Skips line entirely when no tool history; preserves existing native title= for layout backward-compat. formatK/formatAgo colocated. Tests: 202/202 pass (195 prior + 7 new view-integration). Server + web tsc clean. Smoke: schema applied cleanly; GET /api/tools/cost_stats returns canonical JSON; view + endpoint agree. Single-row result expected given the v1.13.1-A → v1.13.7 NULL latent regression window; new traffic populates organically. Roadmap row at boocode_roadmap.md:114 plus schema row at :474 both match. View vs table decision documented in handoff_v1.13.10_per_tool_cost.md (rollback-safe, microsecond-fast at BooCode scale). ~270 LoC across 8 files (5 modified + 3 new).	2026-05-22 14:42:09 +00:00
indifferentketchup	8126d78b34	docs: capture v1.13.7-v1.13.9 invariants in CLAUDE.md Five additions surfacing session-discovered constraints future Claude sessions need: - AI SDK v6 includeUsage:true requirement (avoids re-introducing the v1.13.1-A→v1.13.7 NULL-tokens regression) - \n text-delta trim guards in MessageList/MessageBubble + payload.ts failed/empty-assistant skip rules (avoid undoing v1.13.7) - 0.85 × ctx_max overflow formula (v1.13.9) replacing the stale ctx_max - 20k line - New services/system-prompt.ts bullet documenting the v1.13.8 fingerprint instrumentation surface - New services/inference/budget.ts bullet with current BUDGET_NO_AGENT=30 and read-only-tools rationale	2026-05-22 14:07:11 +00:00
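The head commit (`34cbecf975`) describes the tier logic only in prose. Below is a minimal TypeScript sketch of that behavior, assuming a `Set`-based registry:

- `resolveToolTier` is case-insensitive and falls back to `'all'`.
- Two top-level loops throw on a missing tier entry at module load.
- The tier is a ceiling that only ever narrows the declared (or default) whitelist.

`TOOLS_BY_NAME` here is a stand-in set, and `effectiveTools` is a hypothetical name. The standard tier is abbreviated because the commit names only two of the eight codecontext tools.

```typescript
// Hypothetical sketch of the BOOCODE_TOOLS tiers; not the shipped agents.ts/tools.ts code.
export type ToolTier = "core" | "standard" | "all";

// Stand-in registry. The real TOOLS_BY_NAME maps names to tool definitions.
const TOOLS_BY_NAME = new Set<string>([
  "view_file", "list_dir", "grep", "find_files",
  "web_search", "web_fetch", "git_status",
  "get_codebase_overview", "search_symbols",
]);

const CORE_TOOL_NAMES = ["view_file", "list_dir", "grep", "find_files"] as const;
// Abbreviated: the commit lists 8 codecontext tools but names only two of them.
const STANDARD_TOOL_NAMES = [
  ...CORE_TOOL_NAMES,
  "web_search", "web_fetch", "git_status",
  "get_codebase_overview", "search_symbols",
] as const;

// Module-load validation: importing this file throws on a stale or misspelled tier entry.
for (const name of CORE_TOOL_NAMES) {
  if (!TOOLS_BY_NAME.has(name)) throw new Error(`core tier: unknown tool ${name}`);
}
for (const name of STANDARD_TOOL_NAMES) {
  if (!TOOLS_BY_NAME.has(name)) throw new Error(`standard tier: unknown tool ${name}`);
}

const CORE_SET: ReadonlySet<string> = new Set<string>(CORE_TOOL_NAMES);
const STANDARD_SET: ReadonlySet<string> = new Set<string>(STANDARD_TOOL_NAMES);

// Case-insensitive; unset or unrecognized values mean "all".
export function resolveToolTier(raw: string | undefined): ToolTier {
  const v = raw?.toLowerCase();
  return v === "core" || v === "standard" ? v : "all";
}

// The tier is a ceiling: it filters the agent's whitelist and never adds tools.
export function effectiveTools(
  declared: readonly string[] | undefined,
  defaults: readonly string[],
  envValue: string | undefined,
): string[] {
  const base = declared ?? defaults; // frontmatter omitted -> DEFAULT_TOOLS
  const tier = resolveToolTier(envValue);
  if (tier === "all") return [...base];
  const ceiling = tier === "core" ? CORE_SET : STANDARD_SET;
  return base.filter((t) => ceiling.has(t));
}
```

Under these assumptions, the smoke result in the commit falls out directly. A 12-tool whitelist filtered by `core` keeps only the four core names the agent actually declares.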