v1.13.15-tools: tiered tool loading via BOOCODE_TOOLS env var

Pattern lift from eyaltoledano/claude-task-master (MIT + Commons Clause; pattern only, no code lift).

Adds BOOCODE_TOOLS env var with three tiers:
- core (4 tools): view_file, list_dir, grep, find_files. ~2k token schema cost.
- standard (15 tools): core + web_search, web_fetch, git_status, all 8 codecontext tools. ~10k token schema cost.
- all (default; current behavior): every tool in ALL_TOOLS (20). ~21k token schema cost.

The env var is a CEILING: it narrows agent whitelists and never expands them. Default behavior is unchanged when the var is unset. resolveToolTier is case-insensitive and falls back to 'all' on unknown values.

CORE_TOOL_NAMES + STANDARD_TOOL_NAMES are validated at module load against TOOLS_BY_NAME via two top-level for-loops that throw on the first missing name. The module fails to import if a tier references a tool that doesn't exist in the registry, which catches typos and stale tier definitions at boot rather than silently filtering valid tools out of agent whitelists.

Wiring: agents.ts parseAgentBlock now reads BOOCODE_TOOLS from process.env per parse and intersects it with the agent's declared frontmatter tools (or DEFAULT_TOOLS when frontmatter omits the field). A per-parse read is fine because agents are re-parsed on the existing 60s cache TTL.

Tests: tools.test.ts grows from 1 to 10 tests. Covers resolveToolTier across tiers/case/unknown values + the CORE-subset-of-STANDARD invariant + TOOLS_BY_NAME existence for both tier sets. 204/204 pass (was 195; +9 new).

Deviation from the brief: the codecontext tools in the actual registry have NO codecontext_* prefix (the brief's STANDARD list assumed it). Used the actual names (get_codebase_overview, search_symbols, etc.). Module-load validation would have failed boot with the prefixed names.

Smoke: with BOOCODE_TOOLS unset, agents return their full 12-tool whitelists. With BOOCODE_TOOLS=core in .env + container restart, the same agents narrow to 4 tools (find_files, grep, list_dir, view_file), i.e. the intersection of declared whitelist ∩ core tier. Reverted after confirmation.

CLAUDE.md updated with BOOCODE_TOOLS in the Environment section's Optional list. .env.example gained a commented BOOCODE_TOOLS=all line with the per-tier token-cost table.

~110 LoC across 5 files (4 modified + 1 test expansion). Over the brief's ~30 LoC estimate for code; the test suite expansion drove most of the growth.
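
The tier ceiling described in the commit message can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in services/tools.ts or agents.ts: the registry and tier lists are abbreviated, and `assertKnown`, `resolveTierSketch`, and `applyCeiling` are hypothetical names chosen for this example.

```typescript
// Hypothetical, abbreviated registry; the real one is ALL_TOOLS in services/tools.ts.
const REGISTRY: readonly string[] = [
  'find_files', 'get_codebase_overview', 'git_status', 'grep',
  'list_dir', 'search_symbols', 'view_file', 'view_truncated_output',
  'web_fetch', 'web_search',
];
const CORE: readonly string[] = ['view_file', 'list_dir', 'grep', 'find_files'];
const STANDARD: readonly string[] = [
  ...CORE, 'web_search', 'web_fetch', 'git_status',
  'get_codebase_overview', 'search_symbols',
];

// Fail fast: throw on the first tier entry that is missing from the registry.
function assertKnown(tier: string, names: readonly string[]): void {
  for (const name of names) {
    if (REGISTRY.indexOf(name) === -1) {
      throw new Error(`tier "${tier}" references unknown tool "${name}"`);
    }
  }
}
assertKnown('core', CORE);
assertKnown('standard', STANDARD);

// Case-insensitive; unset or unknown values fall back to the full registry.
function resolveTierSketch(raw: string | undefined): readonly string[] {
  switch ((raw ?? 'all').toLowerCase()) {
    case 'core': return CORE;
    case 'standard': return STANDARD;
    default: return REGISTRY;
  }
}

// The tier is a ceiling: intersect with the declared list, never add to it.
function applyCeiling(declared: readonly string[], raw: string | undefined): string[] {
  const allowed = resolveTierSketch(raw);
  return declared.filter((name) => allowed.indexOf(name) !== -1);
}
```

In this sketch, `applyCeiling(['grep', 'view_file', 'web_search'], 'core')` keeps only `grep` and `view_file`, while an unset or unrecognized value leaves the declared list untouched.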
v1.13.15-openspec: reformat batch docs to OpenSpec directory structure
2026-05-22 14:59:01 +00:00 · 2026-05-22 14:54:17 +00:00 · 2026-05-22 14:52:37 +00:00 · 2026-05-22 14:42:09 +00:00 · 2026-05-22 14:07:11 +00:00
19 changed files with 1079 additions and 32 deletions
--- a/.env.example
+++ b/.env.example
@@ -10,3 +10,12 @@ POSTGRES_PASSWORD=CHANGE_ME
 # Internal Tailscale address that bypasses Authelia. Override if you
 # point BooCode at a different SearXNG instance.
 SEARXNG_URL=http://100.114.205.53:8888
+
+# v1.13.15-tools: BOOCODE_TOOLS narrows the tool whitelist sent to the LLM.
+# Unset (default) → all tools (~21k schema). Useful primarily for single-purpose
+# sessions where the model only needs read-only filesystem access.
+#
+# core      → view_file, list_dir, grep, find_files                       (~2k)
+# standard  → core + web_*, git_status, all 8 codecontext tools           (~10k)
+# all       → every tool in ALL_TOOLS                                     (~21k)
+# BOOCODE_TOOLS=all
--- a/.gitignore
+++ b/.gitignore
@@ -1,6 +1,7 @@
 node_modules
 dist
 .env
+CLAUDE.local.md
 *.log
 .DS_Store
 .vite
--- a/BOOCHAT.md
+++ b/BOOCHAT.md
@@ -1,7 +1,5 @@
 # BooChat

-You are the assistant running inside BooChat — a self-hosted developer chat app.
-
 ## Capabilities

 - Read-only file tools: `view_file`, `list_dir`, `grep`, `find_files`
--- a/BOOCODER.md
+++ b/BOOCODER.md
@@ -2,8 +2,6 @@

 > (Stub. v2.0 implementation pending. This file documents the intended contract.)

-You are the assistant running inside BooCoder — the write-capable companion to BooChat.
-
 ## Capabilities

 - Everything in `BOOCHAT.md`
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -47,10 +47,12 @@ Tests: `pnpm -C apps/server test` runs the vitest suite. No test harness on `app

 Key services:
 - **`services/inference/`** — Public surface re-exported via `inference/index.ts`; callers import from `./services/inference/index.js` explicitly (NodeNext doesn't honor directory-index resolution). Layout: `turn.ts` (runAssistantTurn / runInference / createInferenceRunner; exports `InferenceFrame`, `InferenceContext`, `TurnArgs`, `StreamResult`), `stream-phase.ts` (streamCompletion as a v1.13.1-A AI SDK adapter + executeStreamPhase), `provider.ts` (`upstreamModel(baseURL, modelId)` wrapping `createOpenAICompatible` against llama-swap), `tool-phase.ts` (executeToolPhase; value back-edges into turn.ts for the runAssistantTurn recursion — cycle safe because deref at call time, not module top-level), `sentinel-summaries.ts` (runCapHitSummary + runDoomLoopSummary + their sentinel inserters), `error-handler.ts` (handleAbortOrError, finalizeCompletion), `payload.ts` (buildMessagesPayload, loadContext, maybeFlagForCompaction, `OpenAiMessage`), `sentinels.ts` (`detectDoomLoop`, `DOOM_LOOP_THRESHOLD`, sentinel predicates), `budget.ts` (resolveToolBudget), `xml-parser.ts` (qwen3.6 XML tool-call fallback — KEEP, AI SDK doesn't handle inline-XML tool calls), `parts.ts` (v1.13.0 dual-write helpers: `partsFromAssistantMessage`, `partsFromToolMessage`, `insertParts`), `prune.ts` (v1.13.4 two-tier compaction; `selectPruneTargets` is the pure decision helper), `types.ts` (`StreamPhaseState`, `DB_FLUSH_INTERVAL_MS`). **`TurnArgs`** is the per-turn state envelope threaded through the `executeToolPhase → runAssistantTurn` recursion; reset in `runInference` at user-message boundary. Add new per-turn state to `TurnArgs`, not module-level closures.
- **AI SDK v6 streamCompletion adapter** (v1.13.1-A; `services/inference/stream-phase.ts`). `streamText` is the underlying call; the BooCode layer above (executeStreamPhase, finalize, dual-write) is shape-preserved via an adapter. Three gotchas the LSP/test suite won't catch:
+- **AI SDK v6 streamCompletion adapter** (v1.13.1-A; `services/inference/stream-phase.ts`). `streamText` is the underlying call; the BooCode layer above (executeStreamPhase, finalize, dual-write) is shape-preserved via an adapter. Five gotchas the LSP/test suite won't catch:
  - **Abort signals are swallowed.** `streamText`'s `fullStream` iterator exits cleanly when `abortSignal` fires — no throw. Post-iteration `if (signal?.aborted) throw <AbortError>` is required; without it the row finalizes as `complete` instead of `cancelled`. Comment in stream-phase.ts pins this; don't refactor it away.
  - **Usage lands only at stream end** via `await result.usage` (`inputTokens` / `outputTokens` v6 names → mapped to `promptTokens` / `completionTokens` for the existing onUsage callback). Mid-stream live tok/s is gone vs v1.12.2; ChatThroughput shows a single value at stream end.
  - **Tools have NO `execute` field.** BooCode dispatches tools in tool-phase.ts, not the AI SDK loop. Only `description` + `inputSchema: jsonSchema(parameters)` — surfacing tool-call parts via `fullStream` and stopping is what we want.
+  - **`includeUsage: true` MUST be set on `createOpenAICompatible`** in `services/inference/provider.ts`. The adapter defaults it false, omitting `stream_options.include_usage` from the request body; llama-swap then never emits the usage block and `result.usage.inputTokens/outputTokens` resolve to `undefined`. Latent regression from v1.13.1-A through v1.13.7 — every assistant row in that window has `tokens_used`/`ctx_used` NULL. Don't remove this flag during refactor.
+  - **Tool-call-only turns may emit a leading `\n` text-delta** as the assistant content. `MessageList.flatten`'s `hasText` and `MessageBubble`'s `hasContent` both `.trim()` before the length check — otherwise whitespace-only content renders an empty bubble + ActionRow between every tool call (v1.13.7 fix). `payload.ts:buildMessagesPayload` also skips `status='failed'` AND complete-but-empty (no content, no tool_calls) assistant rows to avoid "Cannot have 2 or more assistant messages at the end of the list" upstream rejections after cap-hit + Continue.
 - **AI SDK ModelMessage conversion** (`toModelMessages` in stream-phase.ts). Tool messages need a `toolName` for `ToolResultPart` — BooCode's OpenAI-shape history doesn't carry it, so a forward-scan builds a `tool_call_id → toolName` map from prior assistant `tool_calls`. Tool outputs wrapped as `{ type: 'json' | 'text', value }` matching the v6 `ToolResultOutput` union. Assistant messages with reasoning emit a `ReasoningPart` first in the content array (v1.13.1-C).
 - **`experimental_repairToolCall`** (v1.13.3) wired into `streamText` to keep the stream alive when qwen3.6 emits malformed tool args. Pass-through implementation — logs the bad call and returns it unmodified; `executeToolPhase`'s existing zod-reject error path routes it to the model on the next turn.
 - **`chat_status` frame shape** (published via `broker.publishUser`) — `status: 'streaming' | 'tool_running' | 'waiting_for_input' | 'idle' | 'error'` (widened from `working|idle|error` in v1.12.1). Frontend `useChatStatus` derives `idle_warm` (<30s since idle) vs `idle_cold`. `ChatThroughput` renders inline beside `StatusDot` only when streaming or tool_running, fed by 500ms-throttled `'usage'` WS frames (`completion_tokens` + `ctx_used` + `ctx_max`). The `POST /api/chats/:id/discard_stale` endpoint exists to mark a stuck-streaming row as `failed` when the frontend's 60s no-token-activity timer (`ChatPane` content-length watcher) gives up.
@@ -58,7 +60,9 @@ Key services:
 - **Periodic 60s sweeper** in `apps/server/src/index.ts` (v1.13.3 + v1.13.5). Same `setInterval` runs `sweepStaleStreaming` (marks `messages.status='streaming'` older than 5 min as `failed`, publishes `chat_status='idle'` so the UI dot drops) and `cleanupTruncations` (TTL + orphan reap of tmpfs truncation files). `app.addHook('onClose')` clears the timer. No-op when nothing to reap.
 - **`services/broker.ts`** — In-memory pub/sub with two channel types: per-session (message streaming) and per-user (sidebar updates). No persistence; clients reconnect on restart.
 - **`services/tools.ts`** — Tool registry (`ALL_TOOLS`, `READ_ONLY_TOOL_NAMES`, `TOOLS_BY_NAME`). Filesystem tools (view_file/list_dir/grep/find_files) go through three guard layers: `path_guard.ts` (workspace scope), `secret_guard.ts` (filename deny list), `url_guard.ts` (SSRF/private-IP block for web_fetch). v1.11.8+ web tools (`web_search`, `web_fetch`) are opt-in per chat via `session.web_search_enabled` (resolved with `project.default_web_search_enabled` fallback) and filtered out of the LLM's tool schema when false. v1.13.5 truncation: when a tool slice cuts content, `services/truncate.ts` stashes the full text on tmpfs at `BOOCODE_TRUNCATION_DIR` (default `/tmp/boocode-truncations`, 0o700) keyed by an opaque `tr_<12 base32 chars>` id, and the `view_truncated_output(id)` tool retrieves it. 5MB cap (matches `view_file`'s `MAX_FILE_BYTES`), 7-day TTL, reaped by the periodic sweeper. Tmpfs path means container restart loses retrieval — acceptable, the model usually has moved on.
- **`services/compaction.ts`** + **`services/model-context.ts`** — v1.11.0 anchored rolling summary (single `summary=true` assistant row per chat, supersedes itself on each compaction). Triggered when `chats.needs_compaction` is set after an inference turn exceeds `usable(ctx_max) = ctx_max - 20k`. **`ctx_max` comes from `model-context.getModelContext()` which fetches `${LLAMA_SWAP_URL}/upstream/<model>/props`** — NOT from `parsed.timings.n_ctx` (the stream completion's `timings` doesn't carry n_ctx; that read was dead code until v1.11.3 ripped it out). v1.13.6: `buildHeadPayload` embeds `reasoning_parts` as a `<reasoning>...</reasoning>` prose prefix on the assistant `content` (OpenAI wire shape has no structured reasoning field; the summarizer reads text). Standalone tag when content is empty (tool-call-only turn). `buildHeadPayload` + `OpenAiMessage` exported for test access — keep them exported.
+- **`services/compaction.ts`** + **`services/model-context.ts`** — v1.11.0 anchored rolling summary (single `summary=true` assistant row per chat, supersedes itself on each compaction). Triggered when `chats.needs_compaction` is set after an inference turn exceeds `usable(ctx_max) = floor(0.85 × ctx_max)` (v1.13.9 opencode-pattern early trigger; was `ctx_max - 20k` pre-v1.13.9, which gave only 7.6% headroom at 262k and 0 budget for ≤20k contexts). **`ctx_max` comes from `model-context.getModelContext()` which fetches `${LLAMA_SWAP_URL}/upstream/<model>/props`** — NOT from `parsed.timings.n_ctx` (the stream completion's `timings` doesn't carry n_ctx; that read was dead code until v1.11.3 ripped it out). First inferences after a boocode boot may have `ctx_max=NULL` if llama-swap hasn't loaded the model yet; negative cache TTL is 60s, recovers on next turn. v1.13.6: `buildHeadPayload` embeds `reasoning_parts` as a `<reasoning>...</reasoning>` prose prefix on the assistant `content` (OpenAI wire shape has no structured reasoning field; the summarizer reads text). Standalone tag when content is empty (tool-call-only turn). `buildHeadPayload` + `OpenAiMessage` exported for test access — keep them exported.
+- **`services/system-prompt.ts`** — `buildSystemPrompt` is the string-returning shim; `buildSystemPromptWithFingerprint` is the canonical impl returning `{prompt, fingerprint, drift}`. v1.13.8 instrumentation: SHA-256 of the assembled prefix is logged per `buildMessagesPayload` call (msg `prefix-fingerprint`, level=info); a `Map<sessionId, lastHash>` observer fires `prefix-drift` (level=warn) on hash change with a field-level `changed_inputs` diff. Smoke proved the prefix is byte-stable across turns in steady-state — the originally-planned `system_prompt_cache` DB table was dropped as redundant against the v1.12.0 input-layer mtime caches (BOOCHAT.md here + AGENTS.md global+per-project in `agents.ts:safeStat`).
+- **`services/inference/budget.ts`** — tool-call budgets: `BUDGET_READ_ONLY = 30`, `BUDGET_NON_READ_ONLY = 10` (forward-looking; no write tools yet), `BUDGET_NO_AGENT = 30` (v1.13.7; was 15 — every tool in `ALL_TOOLS` is read-only today, so no-agent mode shares the read-only-agent cap). Per-agent `max_tool_calls` from AGENTS.md frontmatter overrides.
 - **`messages_with_parts` view** (v1.13.1-B; `schema.sql`). Read sites that need `tool_calls` / `tool_results` / `reasoning_parts` SELECT from this view, NOT `messages` directly. `COALESCE`s parts-table rows over the legacy JSON columns, so pre-v1.13.0 history still resolves. Writes still target `messages`; the v1.13.0 dual-write into `message_parts` keeps both halves in sync. New payload-assembly code must use the view — calling `messages.tool_calls` directly will miss anything written post-v1.13.1-B if the JSON column ever drifts (and dual-write makes that easy to miss). Shapes: `tool_calls jsonb[]`, `tool_results jsonb` single object, `reasoning_parts jsonb[]` of `{text}`.
 - **`services/file_ops.ts`** — Shared file operation implementations used by both inference tools and HTTP routes.
 - **`services/auto_name.ts`** — Non-streaming LLM call to generate 4-word session titles after first assistant reply.
@@ -108,11 +112,12 @@ Schema CHECK migration order when renaming allowed values: (1) `ALTER TABLE ...

 ## Environment

-Required: `DATABASE_URL`, `LLAMA_SWAP_URL`. Optional: `PORT` (3000), `HOST` (0.0.0.0), `PROJECT_ROOT_WHITELIST` (/opt, read-only scope for add-existing path resolution), `BOOTSTRAP_ROOT` (/opt/projects, writable scope for create-new-project bootstrap mkdir target — host must `mkdir -p /opt/projects` before container start), `DEFAULT_MODEL`, `LOG_LEVEL`, `SEARXNG_URL` (default `http://100.114.205.53:8888` — internal Tailscale Fathom; the public `search.indifferentketchup.com` is behind Authelia and unusable from server context).
+Required: `DATABASE_URL`, `LLAMA_SWAP_URL`. Optional: `PORT` (3000), `HOST` (0.0.0.0), `PROJECT_ROOT_WHITELIST` (/opt, read-only scope for add-existing path resolution), `BOOTSTRAP_ROOT` (/opt/projects, writable scope for create-new-project bootstrap mkdir target — host must `mkdir -p /opt/projects` before container start), `DEFAULT_MODEL`, `LOG_LEVEL`, `SEARXNG_URL` (default `http://100.114.205.53:8888` — internal Tailscale Fathom; the public `search.indifferentketchup.com` is behind Authelia and unusable from server context), `BOOCODE_TOOLS` (`core` | `standard` | `all`, default `all`; v1.13.15-tools tier filter — ceiling, never expands an agent's whitelist).

 ## Workflow

 - Sam reviews all diffs and commits manually. Do not commit unless explicitly asked.
+- Per-batch docs live under `openspec/changes/<slug>/{proposal,tasks,design}.md`. Already-shipped batches are snapshots in `openspec/changes/archived/`. New batches follow the proposal+tasks shape; see `openspec/README.md` for the convention.
 - Deploy: `cd /opt/boocode && docker compose up --build -d` (or `docker compose build --no-cache boocode && docker compose up -d` if you suspect a layer-cache issue).
 - Git push to Gitea: `GIT_SSH_COMMAND="ssh -i /opt/boocode/secrets/boocode_gitea -o IdentitiesOnly=yes" git push origin <branch>`. The default agent identity is rejected; the in-repo deploy key (`secrets/`, gitignored) is the working one. Transient `Connection reset by peer` retries cleanly after `sleep 5`.
 - Don't accumulate `.bak-*` files. Clean them up in the same batch or immediately after merge.
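
The compaction-threshold change recorded in the CLAUDE.md hunk above (`ctx_max - 20k` replaced by `floor(0.85 × ctx_max)`) can be checked with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, "20k" is assumed to mean 20,000 tokens, "262k" is assumed to mean 262,144, and the legacy formula is assumed to clamp at zero.

```typescript
// Assumption: "20k" in the pre-v1.13.9 formula means 20_000 tokens, clamped at zero.
const LEGACY_RESERVE = 20_000;

// Pre-v1.13.9 threshold: fixed reserve subtracted from the context size.
function usableLegacy(ctxMax: number): number {
  return Math.max(0, ctxMax - LEGACY_RESERVE);
}

// v1.13.9 threshold: proportional, 85% of the context size.
function usableCurrent(ctxMax: number): number {
  return Math.floor(0.85 * ctxMax);
}

// Share of the context left above the compaction trigger, in percent.
function headroomPct(ctxMax: number, usable: number): number {
  return ((ctxMax - usable) / ctxMax) * 100;
}

for (const ctxMax of [262_144, 16_384]) {
  const legacy = usableLegacy(ctxMax);
  const current = usableCurrent(ctxMax);
  console.log(
    `ctx_max=${ctxMax} legacy=${legacy} (${headroomPct(ctxMax, legacy).toFixed(1)}% headroom) ` +
    `current=${current} (${headroomPct(ctxMax, current).toFixed(1)}% headroom)`,
  );
}
```

Under these assumptions a 262,144-token context had about 7.6% headroom with the fixed reserve and has about 15% with the proportional rule, while a 16,384-token context goes from a usable budget of 0 to 13,926.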
--- a/apps/server/src/index.ts
+++ b/apps/server/src/index.ts
@@ -16,6 +16,7 @@ import { registerWebSocket } from './routes/ws.js';
 import { registerModelRoutes } from './routes/models.js';
 import { registerAgentRoutes } from './routes/agents.js';
 import { registerSkillsRoutes } from './routes/skills.js';
+import { registerToolsRoutes } from './routes/tools.js';
 import { createInferenceRunner } from './services/inference/index.js';
 import { createBroker } from './services/broker.js';
 import { listSkills } from './services/skills.js';
@@ -83,6 +84,7 @@ async function main() {
  registerAgentRoutes(app, sql);
  registerSidebarRoutes(app, sql);
  registerChatRoutes(app, sql, broker);
+  registerToolsRoutes(app, sql);

  // Batch 9.6: warm the skills cache at boot and surface the count. Empty or
  // missing /data/skills is non-fatal — the skill tools just return empty.
--- a/apps/server/src/routes/tools.ts
+++ b/apps/server/src/routes/tools.ts
@@ -0,0 +1,40 @@
+import type { FastifyInstance } from 'fastify';
+import type { Sql } from '../db.js';
+
+export interface ToolCostStat {
+  tool_name: string;
+  mean_prompt_tokens: number;
+  mean_completion_tokens: number;
+  n_calls: number;
+  updated_at: string;
+}
+
+// v1.13.10: per-tool token cost rolling window read endpoint. Backed by the
+// tool_cost_stats view in schema.sql (last 100 calls per tool, equal-split
+// attribution across multi-tool turns, sentinel/failed-turn excluded).
+// Consumed by AgentPicker for at-a-glance per-agent cost hints.
+export function registerToolsRoutes(app: FastifyInstance, sql: Sql): void {
+  app.get('/api/tools/cost_stats', async () => {
+    const rows = await sql<
+      {
+        tool_name: string;
+        prompt_tokens_sum: number;
+        completion_tokens_sum: number;
+        n_calls: number;
+        updated_at: string;
+      }[]
+    >`
+      SELECT tool_name, prompt_tokens_sum, completion_tokens_sum, n_calls, updated_at
+      FROM tool_cost_stats
+      ORDER BY tool_name ASC
+    `;
+    const stats: ToolCostStat[] = rows.map((r) => ({
+      tool_name: r.tool_name,
+      mean_prompt_tokens: Math.round(r.prompt_tokens_sum / r.n_calls),
+      mean_completion_tokens: Math.round(r.completion_tokens_sum / r.n_calls),
+      n_calls: r.n_calls,
+      updated_at: r.updated_at,
+    }));
+    return { stats };
+  });
+}
--- a/apps/server/src/schema.sql
+++ b/apps/server/src/schema.sql
@@ -119,6 +119,68 @@ SELECT
    WHERE p.message_id = m.id AND p.kind = 'reasoning' AND p.hidden_at IS NULL) AS reasoning_parts
 FROM messages m;

+-- v1.13.10: per-tool token cost rolling window. Derives from
+-- messages_with_parts (the v1.13.1-B view that COALESCEs message_parts over
+-- the legacy JSON column) so this works whether the chat predates v1.13.0
+-- or postdates v1.13.2 (column drop). No new write site — all source data
+-- already lands via the existing tool-phase.ts:94-95 UPDATE.
+--
+-- Attribution model: equal split. A turn emitting N tool calls divides its
+-- prompt/completion tokens by N before attribution. See v1.13.10 dispatch
+-- brief for rationale + rejected alternatives.
+--
+-- Column mapping: messages.ctx_used = prompt (input), messages.tokens_used
+-- = completion (output). Non-obvious naming; pinned via canonical writes at
+-- tool-phase.ts:94-95 et al.
+--
+-- Filtering rationale:
+--   status='complete'                — exclude failed/cancelled (defense in
+--                                      depth; failed-path doesn't write
+--                                      tokens_used so they're filtered
+--                                      indirectly too).
+--   metadata->>'kind' exclusions     — exclude cap_hit / doom_loop sentinels
+--                                      (defense in depth; sentinels are
+--                                      role='system' with tool_calls=NULL
+--                                      so they're filtered indirectly too).
+--   experimental_repairToolCall      — no special handling; retries flow
+--                                      as normal next-turn tool_result
+--                                      errors and count naturally.
+--
+-- Rolling window: last 100 calls per tool_name, ordered by created_at DESC.
+-- Aggregate-on-read is microseconds at BooCode scale (single user, ~30
+-- tools, < 100 calls each). DROP VIEW + recreate to change window size.
+CREATE OR REPLACE VIEW tool_cost_stats AS
+WITH per_call AS (
+  SELECT
+    (tc->>'name')::text AS tool_name,
+    (m.ctx_used::float / NULLIF(jsonb_array_length(m.tool_calls), 0)) AS prompt_tokens,
+    (m.tokens_used::float / NULLIF(jsonb_array_length(m.tool_calls), 0)) AS completion_tokens,
+    m.created_at,
+    ROW_NUMBER() OVER (
+      PARTITION BY (tc->>'name')::text
+      ORDER BY m.created_at DESC
+    ) AS rn
+  FROM messages_with_parts m,
+    LATERAL jsonb_array_elements(m.tool_calls) AS tc
+  WHERE m.tool_calls IS NOT NULL
+    AND jsonb_array_length(m.tool_calls) > 0
+    AND m.tokens_used IS NOT NULL
+    AND m.ctx_used IS NOT NULL
+    AND m.status = 'complete'
+    AND (m.metadata IS NULL
+         OR m.metadata->>'kind' IS NULL
+         OR m.metadata->>'kind' NOT IN ('cap_hit', 'doom_loop'))
+)
+SELECT
+  tool_name,
+  ROUND(SUM(prompt_tokens))::int AS prompt_tokens_sum,
+  ROUND(SUM(completion_tokens))::int AS completion_tokens_sum,
+  COUNT(*)::int AS n_calls,
+  MAX(created_at) AS updated_at
+FROM per_call
+WHERE rn <= 100
+GROUP BY tool_name;
+
 ALTER TABLE messages ADD COLUMN IF NOT EXISTS tokens_used INTEGER;
 ALTER TABLE messages ADD COLUMN IF NOT EXISTS ctx_used INTEGER;
 ALTER TABLE messages ADD COLUMN IF NOT EXISTS ctx_max INTEGER;
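
The equal-split attribution and 100-call rolling window implemented by the `tool_cost_stats` view above can be mirrored in a few lines of TypeScript. This is a sketch for reasoning about the view, not code from the repository: `Turn`, `ToolStat`, and `aggregateToolCosts` are hypothetical names, the status/NULL/sentinel filters are omitted, and `Math.round` may treat exact .5 ties differently from PostgreSQL's `ROUND`.

```typescript
interface Turn {
  toolNames: string[];
  promptTokens: number;     // corresponds to messages.ctx_used
  completionTokens: number; // corresponds to messages.tokens_used
  createdAt: number;        // epoch milliseconds
}

interface ToolStat {
  promptTokensSum: number;
  completionTokensSum: number;
  nCalls: number;
}

interface CallRow {
  prompt: number;
  completion: number;
  createdAt: number;
}

function aggregateToolCosts(turns: Turn[], windowSize = 100): Map<string, ToolStat> {
  // Step 1 (per_call CTE): one row per tool call, with the turn's tokens split equally.
  const rowsByTool = new Map<string, CallRow[]>();
  for (const turn of turns) {
    const n = turn.toolNames.length;
    if (n === 0) continue; // turns without tool calls contribute nothing
    for (const name of turn.toolNames) {
      const rows = rowsByTool.get(name) ?? [];
      rows.push({
        prompt: turn.promptTokens / n,
        completion: turn.completionTokens / n,
        createdAt: turn.createdAt,
      });
      rowsByTool.set(name, rows);
    }
  }

  // Step 2 (rn <= 100 + GROUP BY): keep the newest rows per tool, then sum and round.
  const stats = new Map<string, ToolStat>();
  rowsByTool.forEach((rows, name) => {
    const recent = rows
      .slice()
      .sort((a, b) => b.createdAt - a.createdAt)
      .slice(0, windowSize);
    let prompt = 0;
    let completion = 0;
    for (const row of recent) {
      prompt += row.prompt;
      completion += row.completion;
    }
    stats.set(name, {
      promptTokensSum: Math.round(prompt),
      completionTokensSum: Math.round(completion),
      nCalls: recent.length,
    });
  });
  return stats;
}
```

The route handler in routes/tools.ts then divides each sum by `n_calls` to get the per-call means; keeping sums in the view means the window size can change without touching that division.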
--- a/apps/server/src/services/tests/tool_cost_stats.test.ts
+++ b/apps/server/src/services/tests/tool_cost_stats.test.ts
@@ -0,0 +1,228 @@
+import { describe, it, expect, beforeAll, afterAll } from 'vitest';
+import postgres from 'postgres';
+import { readFileSync } from 'node:fs';
+import { resolve } from 'node:path';
+import { fileURLToPath } from 'node:url';
+
+// v1.13.10: integration tests for the tool_cost_stats view. Skipped unless
+// DATABASE_URL is set so they don't break `pnpm test` on a fresh checkout.
+// Run with:
+//   DATABASE_URL=postgres://boocode:<pw>@localhost:5500/boocode pnpm -C apps/server test
+//
+// Isolation: every tool name is prefixed with a per-run TEST_RUN_ID (see
+// tname below). The view aggregates globally across all chats, so without
+// unique tool names parallel test runs would interfere. Cleanup deletes the
+// fixture project in afterAll; FK CASCADE removes its messages and parts.
+
+const DB_URL = process.env.DATABASE_URL;
+const describeFn = DB_URL ? describe : describe.skip;
+
+const TEST_RUN_ID = `v13_10_${Date.now()}`;
+const tname = (suffix: string) => `${TEST_RUN_ID}_${suffix}`;
+
+describeFn('tool_cost_stats view (v1.13.10)', () => {
+  let sql: ReturnType<typeof postgres>;
+  let projectId: string;
+  let sessionId: string;
+  let chatId: string;
+
+  beforeAll(async () => {
+    if (!DB_URL) return;
+    sql = postgres(DB_URL, { max: 2, idle_timeout: 5, connect_timeout: 5, onnotice: () => {} });
+
+    // Apply the schema before fixtures so the view exists. Idempotent via
+    // CREATE OR REPLACE VIEW + CREATE TABLE IF NOT EXISTS; safe to run on a
+    // pre-populated DB. Mirrors apps/server/src/db.ts:applySchema.
+    const here = fileURLToPath(import.meta.url);
+    const schemaPath = resolve(here, '../../../schema.sql');
+    const ddl = readFileSync(schemaPath, 'utf8');
+    await sql.unsafe(ddl);
+
+    // Fixture project + session + chat for all inserts in this file.
+    const proj = await sql<{ id: string }[]>`
+      INSERT INTO projects (name, path)
+      VALUES (${`tool_cost_stats_test_${TEST_RUN_ID}`}, ${`/tmp/${TEST_RUN_ID}`})
+      RETURNING id
+    `;
+    projectId = proj[0]!.id;
+    const sess = await sql<{ id: string }[]>`
+      INSERT INTO sessions (project_id, name, model)
+      VALUES (${projectId}, ${'test'}, ${'test-model'})
+      RETURNING id
+    `;
+    sessionId = sess[0]!.id;
+    const chat = await sql<{ id: string }[]>`
+      INSERT INTO chats (session_id, name) VALUES (${sessionId}, ${'test'}) RETURNING id
+    `;
+    chatId = chat[0]!.id;
+  });
+
+  afterAll(async () => {
+    if (!DB_URL) return;
+    // Project FK CASCADE cleans sessions/chats/messages/parts in one shot.
+    await sql`DELETE FROM projects WHERE id = ${projectId}`;
+    await sql.end({ timeout: 5 });
+  });
+
+  async function insertAssistantTurn(opts: {
+    toolNames: string[];
+    tokensUsed: number | null;
+    ctxUsed: number | null;
+    status?: 'streaming' | 'complete' | 'failed' | 'cancelled';
+    metadata?: { kind: string } | null;
+    createdAt?: Date;
+  }): Promise<string> {
+    const toolCalls = opts.toolNames.map((name, i) => ({
+      id: `call_${TEST_RUN_ID}_${name}_${i}`,
+      name,
+      args: {},
+    }));
+    const created = opts.createdAt ?? new Date();
+    const rows = await sql<{ id: string }[]>`
+      INSERT INTO messages (
+        session_id, chat_id, role, content, kind, status,
+        tool_calls, tokens_used, ctx_used,
+        metadata, created_at
+      )
+      VALUES (
+        ${sessionId}, ${chatId}, 'assistant', '', 'message',
+        ${opts.status ?? 'complete'},
+        ${sql.json(toolCalls as never)},
+        ${opts.tokensUsed},
+        ${opts.ctxUsed},
+        ${opts.metadata ? sql.json(opts.metadata as never) : null},
+        ${created}
+      )
+      RETURNING id
+    `;
+    return rows[0]!.id;
+  }
+
+  it('returns empty when no tool calls exist for a tool name', async () => {
+    const t = tname('absent');
+    const stats = await sql<{ tool_name: string }[]>`
+      SELECT * FROM tool_cost_stats WHERE tool_name = ${t}
+    `;
+    expect(stats).toEqual([]);
+  });
+
+  it('attributes single-tool turn fully to that tool', async () => {
+    const t = tname('single');
+    await insertAssistantTurn({ toolNames: [t], tokensUsed: 300, ctxUsed: 15000 });
+    const stats = await sql<{
+      tool_name: string;
+      prompt_tokens_sum: number;
+      completion_tokens_sum: number;
+      n_calls: number;
+    }[]>`SELECT * FROM tool_cost_stats WHERE tool_name = ${t}`;
+    expect(stats[0]).toMatchObject({
+      tool_name: t,
+      prompt_tokens_sum: 15000,
+      completion_tokens_sum: 300,
+      n_calls: 1,
+    });
+  });
+
+  it('splits multi-tool turn equally across tools', async () => {
+    const a = tname('multi_a');
+    const b = tname('multi_b');
+    const c = tname('multi_c');
+    // 3 tools, 300 completion / 15000 prompt → each gets 100 / 5000
+    await insertAssistantTurn({ toolNames: [a, b, c], tokensUsed: 300, ctxUsed: 15000 });
+    const stats = await sql<{
+      tool_name: string;
+      prompt_tokens_sum: number;
+      completion_tokens_sum: number;
+      n_calls: number;
+    }[]>`
+      SELECT * FROM tool_cost_stats
+      WHERE tool_name IN (${a}, ${b}, ${c})
+      ORDER BY tool_name
+    `;
+    expect(stats).toHaveLength(3);
+    for (const s of stats) {
+      expect(s.completion_tokens_sum).toBe(100);
+      expect(s.prompt_tokens_sum).toBe(5000);
+      expect(s.n_calls).toBe(1);
+    }
+  });
+
+  it('limits to last 100 calls per tool (FIFO window)', async () => {
+    const t = tname('window');
+    // Insert 110 turns with monotonically-increasing created_at and tokensUsed.
+    // Expect view to keep only the most recent 100.
+    const base = Date.now() + 1_000_000; // distant future to avoid colliding with other tests
+    for (let i = 1; i <= 110; i++) {
+      await insertAssistantTurn({
+        toolNames: [t],
+        tokensUsed: i, // 1..110
+        ctxUsed: i * 10,
+        createdAt: new Date(base + i),
+      });
+    }
+    const [stat] = await sql<{
+      n_calls: number;
+      completion_tokens_sum: number;
+    }[]>`SELECT n_calls, completion_tokens_sum FROM tool_cost_stats WHERE tool_name = ${t}`;
+    expect(stat!.n_calls).toBe(100);
+    // Last 100 are tokensUsed=11..110, sum = (11+110)*100/2 = 6050.
+    expect(stat!.completion_tokens_sum).toBe(6050);
+  });
+
+  it('excludes turns with NULL tokens_used (pre-v1.13.7 latent regression)', async () => {
+    const t = tname('null_tokens');
+    await insertAssistantTurn({ toolNames: [t], tokensUsed: null, ctxUsed: 1000 });
+    await insertAssistantTurn({ toolNames: [t], tokensUsed: 100, ctxUsed: null });
+    const stats = await sql`SELECT * FROM tool_cost_stats WHERE tool_name = ${t}`;
+    expect(stats).toEqual([]);
+  });
+
+  it('excludes failed/cancelled turns and cap_hit/doom_loop sentinel rows', async () => {
+    const t = tname('filtered');
+    // A: status='failed'                              — excluded
+    // B: status='cancelled'                           — excluded
+    // C: status='complete', metadata={kind:'cap_hit'} — excluded
+    // D: status='complete', metadata={kind:'doom_loop'} — excluded
+    // E: status='complete', metadata=null             — included
+    await insertAssistantTurn({ toolNames: [t], tokensUsed: 100, ctxUsed: 1000, status: 'failed' });
+    await insertAssistantTurn({ toolNames: [t], tokensUsed: 100, ctxUsed: 1000, status: 'cancelled' });
+    await insertAssistantTurn({ toolNames: [t], tokensUsed: 100, ctxUsed: 1000, metadata: { kind: 'cap_hit' } });
+    await insertAssistantTurn({ toolNames: [t], tokensUsed: 100, ctxUsed: 1000, metadata: { kind: 'doom_loop' } });
+    await insertAssistantTurn({ toolNames: [t], tokensUsed: 100, ctxUsed: 1000, metadata: null });
+    const [stat] = await sql<{ n_calls: number }[]>`
+      SELECT n_calls FROM tool_cost_stats WHERE tool_name = ${t}
+    `;
+    expect(stat!.n_calls).toBe(1);
+  });
+
+  it('reads tool_calls via messages_with_parts (parts-authoritative)', async () => {
+    const t = tname('parts');
+    // Insert an assistant row with messages.tool_calls=NULL but a
+    // message_parts row carrying the tool_call. The view reads via
+    // messages_with_parts, which COALESCEs the parts table over the legacy
+    // column — so this row should still aggregate.
+    const rows = await sql<{ id: string }[]>`
+      INSERT INTO messages (
+        session_id, chat_id, role, content, kind, status,
+        tool_calls, tokens_used, ctx_used
+      )
+      VALUES (
+        ${sessionId}, ${chatId}, 'assistant', '', 'message', 'complete',
+        NULL, 200, 5000
+      )
+      RETURNING id
+    `;
+    const messageId = rows[0]!.id;
+    await sql`
+      INSERT INTO message_parts (message_id, sequence, kind, payload)
+      VALUES (
+        ${messageId}, 0, 'tool_call',
+        ${sql.json({ id: `tc_parts_${TEST_RUN_ID}`, name: t, args: {} } as never)}
+      )
+    `;
+    const [stat] = await sql<{ n_calls: number }[]>`
+      SELECT n_calls FROM tool_cost_stats WHERE tool_name = ${t}
+    `;
+    expect(stat!.n_calls).toBe(1);
+  });
+});
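As a reading aid for the filtering tests above, here is a minimal TypeScript model of what the `tool_cost_stats` view appears to compute: keep only complete, non-sentinel turns that carry token counts, split each turn's tokens equally across its tool calls, and cap each tool at its most recent 100 calls. The `Turn` shape and the `toolCostStats` function are hypothetical illustrations, not code from this patch; the SQL view remains the authoritative definition.

```typescript
// Hypothetical row shape; fields mirror the messages columns used in the tests.
interface Turn {
  toolNames: string[];
  tokensUsed: number | null; // completion tokens
  ctxUsed: number | null;    // prompt tokens
  status: 'complete' | 'failed' | 'cancelled';
  kind: string | null;       // stands in for metadata->>'kind'
  createdAt: number;         // epoch milliseconds
}

interface Stat { promptSum: number; completionSum: number; nCalls: number }

const SENTINEL_KINDS = ['cap_hit', 'doom_loop'];

function toolCostStats(turns: Turn[], windowSize = 100): Map<string, Stat> {
  // Newest first, so the first `windowSize` calls seen per tool form the window.
  const ordered = turns.slice().sort((a, b) => b.createdAt - a.createdAt);
  const raw = new Map<string, { prompt: number; completion: number; n: number }>();
  for (const t of ordered) {
    if (t.status !== 'complete') continue;
    if (t.tokensUsed === null || t.ctxUsed === null) continue;
    if (t.kind !== null && SENTINEL_KINDS.includes(t.kind)) continue;
    const n = t.toolNames.length;
    if (n === 0) continue;
    for (const toolName of t.toolNames) {
      const acc = raw.get(toolName) ?? { prompt: 0, completion: 0, n: 0 };
      if (acc.n >= windowSize) continue; // older than the rolling window
      acc.prompt += t.ctxUsed / n;       // equal split across the turn's calls
      acc.completion += t.tokensUsed / n;
      acc.n += 1;
      raw.set(toolName, acc);
    }
  }
  const out = new Map<string, Stat>();
  raw.forEach((acc, toolName) => {
    // Sum first, round once, matching ROUND(SUM(...)) in the view.
    out.set(toolName, {
      promptSum: Math.round(acc.prompt),
      completionSum: Math.round(acc.completion),
      nCalls: acc.n,
    });
  });
  return out;
}

const base = { tokensUsed: 300, ctxUsed: 15000, createdAt: 1 };
const demo = toolCostStats([
  { ...base, toolNames: ['view_file', 'grep', 'list_dir'], status: 'complete', kind: null },
  { ...base, toolNames: ['grep'], status: 'failed', kind: null },
  { ...base, toolNames: ['grep'], status: 'complete', kind: 'cap_hit' },
]);
console.log(demo.get('grep')); // { promptSum: 5000, completionSum: 100, nCalls: 1 }
```

Only the three-tool turn survives the filters, so `grep` is credited with one third of that turn's tokens and nothing from the failed or sentinel rows.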
--- a/apps/server/src/services/__tests__/tools.test.ts
+++ b/apps/server/src/services/__tests__/tools.test.ts
@@ -1,5 +1,11 @@
 import { describe, it, expect } from 'vitest';
-import { ALL_TOOLS } from '../tools.js';
+import {
+  ALL_TOOLS,
+  CORE_TOOL_NAMES,
+  STANDARD_TOOL_NAMES,
+  TOOLS_BY_NAME,
+  resolveToolTier,
+} from '../tools.js';

 describe('ALL_TOOLS registry', () => {
  // v1.13.3: tools must be alpha-sorted at module load. llama.cpp's prompt
@@ -12,3 +18,59 @@ describe('ALL_TOOLS registry', () => {
    expect(names).toEqual([...names].sort((a, b) => a.localeCompare(b)));
  });
 });
+
+describe('resolveToolTier (v1.13.15-tools)', () => {
+  it('returns CORE tools for tier=core', () => {
+    expect(resolveToolTier('core')).toEqual(CORE_TOOL_NAMES);
+  });
+
+  it('returns STANDARD tools for tier=standard', () => {
+    const result = resolveToolTier('standard');
+    expect(result.length).toBe(STANDARD_TOOL_NAMES.length);
+    expect(result.length).toBeGreaterThan(CORE_TOOL_NAMES.length);
+    // STANDARD is a strict superset of CORE.
+    expect(result).toEqual(expect.arrayContaining([...CORE_TOOL_NAMES]));
+  });
+
+  it('returns ALL tool names for tier=all', () => {
+    expect(resolveToolTier('all').length).toBe(ALL_TOOLS.length);
+  });
+
+  it('defaults to all when env var is undefined', () => {
+    expect(resolveToolTier(undefined).length).toBe(ALL_TOOLS.length);
+  });
+
+  it('is case-insensitive', () => {
+    expect(resolveToolTier('CORE')).toEqual(CORE_TOOL_NAMES);
+    expect(resolveToolTier('Standard').length).toBe(STANDARD_TOOL_NAMES.length);
+  });
+
+  it('falls back to all for unknown tier strings', () => {
+    expect(resolveToolTier('bogus').length).toBe(ALL_TOOLS.length);
+  });
+});
+
+describe('CORE_TOOL_NAMES + STANDARD_TOOL_NAMES validation', () => {
+  // The module-load validation in tools.ts throws if a tier references a
+  // tool that doesn't exist in TOOLS_BY_NAME. These tests double-check that
+  // invariant from the consumer side so a future tier-list edit can't smuggle
+  // in a typo without a test failure.
+  it('every CORE name exists in TOOLS_BY_NAME', () => {
+    for (const name of CORE_TOOL_NAMES) {
+      expect(TOOLS_BY_NAME[name], `CORE references unknown tool '${name}'`).toBeDefined();
+    }
+  });
+
+  it('every STANDARD name exists in TOOLS_BY_NAME', () => {
+    for (const name of STANDARD_TOOL_NAMES) {
+      expect(TOOLS_BY_NAME[name], `STANDARD references unknown tool '${name}'`).toBeDefined();
+    }
+  });
+
+  it('CORE is a subset of STANDARD', () => {
+    const standardSet = new Set<string>(STANDARD_TOOL_NAMES);
+    for (const name of CORE_TOOL_NAMES) {
+      expect(standardSet.has(name), `'${name}' is in CORE but not STANDARD`).toBe(true);
+    }
+  });
+});
--- a/apps/server/src/services/agents.ts
+++ b/apps/server/src/services/agents.ts
@@ -1,7 +1,7 @@
 import { promises as fs } from 'node:fs';
 import { join } from 'node:path';
 import type { Agent, AgentsResponse, AgentParseError } from '../types/api.js';
-import { ALL_TOOLS } from './tools.js';
+import { ALL_TOOLS, resolveToolTier } from './tools.js';

 // v1.8.1: global agents live at /data/AGENTS.md inside the container
 // (./data:/data:ro mount on the host). Per-project AGENTS.md at the project
@@ -186,11 +186,14 @@ function parseAgentSection(section: RawSection): Omit<Agent, 'source'> {
    throw new Error(fmErrors.join('; '));
  }

+  // v1.13.15-tools: intersect with BOOCODE_TOOLS tier (ceiling, not expansion).
+  // Unset → resolveToolTier returns ALL tool names → no narrowing.
+  const tierAllowed = new Set(resolveToolTier(process.env.BOOCODE_TOOLS));
  const filteredTools = Array.isArray(fm.tools)
    ? fm.tools.filter((t): t is string =>
-        (ALL_TOOL_NAMES as readonly string[]).includes(t),
+        (ALL_TOOL_NAMES as readonly string[]).includes(t) && tierAllowed.has(t),
      )
-    : DEFAULT_TOOLS;
+    : DEFAULT_TOOLS.filter((t) => tierAllowed.has(t));

  return {
    id: slugify(section.name),
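The "ceiling, not expansion" behaviour in the hunk above can be sketched in isolation. This is a simplified model under stated assumptions: `applyTierCeiling`, `ALL_NAMES`, and `CORE_NAMES` are hypothetical stand-ins for the real lists in `tools.ts`, and `run_shell` is an invented tool name used only to show a tool that sits outside the core tier.

```typescript
// Hypothetical stand-ins: the real lists live in tools.ts, and 'run_shell'
// is an invented name used only to show a tool outside the core tier.
const ALL_NAMES: readonly string[] = ['find_files', 'grep', 'list_dir', 'run_shell', 'view_file'];
const CORE_NAMES: readonly string[] = ['view_file', 'list_dir', 'grep', 'find_files'];

function applyTierCeiling(declared: readonly string[], tierAllowed: readonly string[]): string[] {
  const allowed = new Set(tierAllowed);
  // Keep a declared tool only if it is registered AND permitted by the tier.
  return declared.filter((t) => ALL_NAMES.includes(t) && allowed.has(t));
}

const declaredTools = ['grep', 'run_shell', 'view_file', 'not_a_tool'];
console.log(applyTierCeiling(declaredTools, CORE_NAMES)); // [ 'grep', 'view_file' ]
console.log(applyTierCeiling(declaredTools, ALL_NAMES));  // [ 'grep', 'run_shell', 'view_file' ]
console.log(applyTierCeiling(['grep'], ALL_NAMES));       // [ 'grep' ]
```

Because the result is always a filter over the declared list, a wider tier can never add a tool the agent did not declare, and the declared order is preserved.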
--- a/apps/server/src/services/tools.ts
+++ b/apps/server/src/services/tools.ts
@@ -700,6 +700,64 @@ export const TOOLS_BY_NAME: Record<string, ToolDef<unknown>> = Object.fromEntrie
  ALL_TOOLS.map((t) => [t.name, t])
 );

+// v1.13.15-tools: tiered tool loading. BOOCODE_TOOLS env var (`core` |
+// `standard` | `all`) filters the agent's tool whitelist before LLM dispatch.
+// Daily-driver token win on qwen3.6-35b-a3b — the 35B-A3B MoE benefits from
+// any prompt-cache stability win (fewer tools = shorter, more stable tool
+// schemas in the system prompt). Pattern lift from eyaltoledano/claude-task-
+// master (MIT + Commons Clause — pattern only, no code lift).
+//
+// The env var is a CEILING. It only narrows; never expands an agent's
+// declared whitelist. Default behavior (var unset) is unchanged: all tools.
+export const CORE_TOOL_NAMES = [
+  'view_file',
+  'list_dir',
+  'grep',
+  'find_files',
+] as const;
+
+export const STANDARD_TOOL_NAMES = [
+  ...CORE_TOOL_NAMES,
+  'web_search',
+  'web_fetch',
+  'git_status',
+  'get_codebase_overview',
+  'get_file_analysis',
+  'get_symbol_info',
+  'search_symbols',
+  'get_dependencies',
+  'watch_changes',
+  'get_semantic_neighborhoods',
+  'get_framework_analysis',
+] as const;
+
+// Module-load validation: every name in CORE / STANDARD must exist in
+// TOOLS_BY_NAME. Catches typos and stale tier definitions before they reach
+// production; server boot fails loudly rather than silently filtering valid
+// tools out of agent whitelists.
+for (const name of CORE_TOOL_NAMES) {
+  if (!TOOLS_BY_NAME[name]) {
+    throw new Error(`CORE_TOOL_NAMES references unknown tool: '${name}'`);
+  }
+}
+for (const name of STANDARD_TOOL_NAMES) {
+  if (!TOOLS_BY_NAME[name]) {
+    throw new Error(`STANDARD_TOOL_NAMES references unknown tool: '${name}'`);
+  }
+}
+
+export function resolveToolTier(tier: string | undefined): readonly string[] {
+  switch ((tier ?? 'all').toLowerCase()) {
+    case 'core':
+      return CORE_TOOL_NAMES;
+    case 'standard':
+      return STANDARD_TOOL_NAMES;
+    case 'all':
+    default:
+      return ALL_TOOLS.map((t) => t.name);
+  }
+}
+
 export function toolJsonSchemas(): ToolJsonSchema[] {
  return ALL_TOOLS.map((t) => t.jsonSchema);
 }
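One edge case in the module-load validation above may be worth a look: a truthiness check on a plain object also succeeds for inherited keys such as `toString`. That is very unlikely to matter for real tool names, so treat the following as an optional hardening sketch rather than a required change. `registrySketch`, `assertTierNamesExist`, and `firstTierError` are hypothetical names, not part of this patch.

```typescript
// Hypothetical miniature registry; the real TOOLS_BY_NAME maps names to ToolDef objects.
const registrySketch: Record<string, { name: string }> = {
  find_files: { name: 'find_files' },
  grep: { name: 'grep' },
  list_dir: { name: 'list_dir' },
  view_file: { name: 'view_file' },
};

function assertTierNamesExist(
  label: string,
  names: readonly string[],
  reg: Record<string, unknown>,
): void {
  for (const toolName of names) {
    // Own-property check: inherited keys such as 'toString' do not count.
    if (!Object.prototype.hasOwnProperty.call(reg, toolName)) {
      throw new Error(`${label} references unknown tool: '${toolName}'`);
    }
  }
}

// Helper for the demo: returns the error message, or null if validation passes.
function firstTierError(label: string, names: readonly string[]): string | null {
  try {
    assertTierNamesExist(label, names, registrySketch);
    return null;
  } catch (e) {
    return e instanceof Error ? e.message : String(e);
  }
}

console.log(firstTierError('CORE', ['grep', 'view_file'])); // null
console.log(firstTierError('CORE', ['grep', 'veiw_file'])); // CORE references unknown tool: 'veiw_file'
console.log(firstTierError('CORE', ['toString']));          // CORE references unknown tool: 'toString'
```

As in the patch, the check throws on the first missing name, so a typo in a tier list surfaces at import time instead of silently narrowing agent whitelists.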
--- a/apps/web/src/api/client.ts
+++ b/apps/web/src/api/client.ts
@@ -12,6 +12,7 @@ import type {
  GitMeta,
  Skill,
  AskUserAnswer,
+  ToolCostStat,
 } from './types';

 export class ApiError extends Error {
@@ -262,6 +263,14 @@ export const api = {
    list: () => request<{ skills: Skill[] }>('/api/skills'),
  },

+  // v1.13.10: per-tool cost rolling-window stats (last 100 calls per tool,
+  // equal-split attribution across multi-tool turns). Read endpoint backed by
+  // the tool_cost_stats view. AgentPicker consumes this for per-agent cost
+  // hints.
+  tools: {
+    costStats: () => request<{ stats: ToolCostStat[] }>('/api/tools/cost_stats'),
+  },
+
  settings: {
    get: () => request<Record<string, unknown>>('/api/settings'),
    patch: (body: Record<string, unknown>) =>
--- a/apps/web/src/api/types.ts
+++ b/apps/web/src/api/types.ts
@@ -1,6 +1,18 @@
 export const PROJECT_STATUSES = ['open', 'archived'] as const;
 export type ProjectStatus = typeof PROJECT_STATUSES[number];

+// v1.13.10: per-tool cost rolling-window stat. Returned by
+// GET /api/tools/cost_stats — one entry per tool with mean prompt/completion
+// tokens over the last 100 invocations. AgentPicker sums across an agent's
+// whitelisted tools for per-agent cost hints.
+export interface ToolCostStat {
+  tool_name: string;
+  mean_prompt_tokens: number;
+  mean_completion_tokens: number;
+  n_calls: number;
+  updated_at: string;
+}
+
 export interface Project {
  id: string;
  name: string;
--- a/apps/web/src/components/AgentPicker.tsx
+++ b/apps/web/src/components/AgentPicker.tsx
@@ -1,8 +1,8 @@
-import { useEffect, useState } from 'react';
+import { useEffect, useMemo, useState } from 'react';
 import { Check, ChevronDown } from 'lucide-react';
 import { toast } from 'sonner';
 import { api } from '@/api/client';
-import type { Agent, AgentParseError } from '@/api/types';
+import type { Agent, AgentParseError, ToolCostStat } from '@/api/types';
 import {
  DropdownMenu,
  DropdownMenuContent,
@@ -22,6 +22,10 @@ export function AgentPicker({ projectId, value, onChange }: Props) {
  const [parseErrors, setParseErrors] = useState<AgentParseError[]>([]);
  const [error, setError] = useState<string | null>(null);
  const [open, setOpen] = useState(false);
+  // v1.13.10: per-tool cost rolling window. Fetched once on mount, so it
+  // refreshes only on remount or page reload. Acceptable for a decision aid:
+  // the 100-call rolling mean doesn't shift fast.
+  const [costStats, setCostStats] = useState<ToolCostStat[]>([]);

  // v1.8.1: per-agent parse errors are non-blocking. Silent if any agents
  // loaded successfully; a gray warning toast fires only when EVERY agent
@@ -52,6 +56,29 @@ export function AgentPicker({ projectId, value, onChange }: Props) {
    };
  }, [projectId]);

+  // v1.13.10: cost stats are project-independent — the 100-call rolling
+  // window is global across all chats. Fetch once per mount; tolerate failure
+  // silently (cost line hides).
+  useEffect(() => {
+    let cancelled = false;
+    api.tools
+      .costStats()
+      .then((r) => {
+        if (!cancelled) setCostStats(r.stats);
+      })
+      .catch(() => {
+        if (!cancelled) setCostStats([]);
+      });
+    return () => {
+      cancelled = true;
+    };
+  }, []);
+
+  const costByTool = useMemo(
+    () => Object.fromEntries(costStats.map((s) => [s.tool_name, s])),
+    [costStats],
+  );
+
  const selectedAgent = agents?.find((a) => a.id === value) ?? null;
  const triggerLabel = value === null
    ? 'No agent'
@@ -86,25 +113,33 @@ export function AgentPicker({ projectId, value, onChange }: Props) {
              <span className="font-medium">No agent</span>
            </DropdownMenuItem>
            {agents.length > 0 && <DropdownMenuSeparator />}
-            {agents.map((a) => (
-              <DropdownMenuItem
-                key={a.id}
-                onSelect={() => void onChange(a.id)}
-                className="text-xs flex-col items-start gap-0.5"
-              >
-                <div className="flex items-center gap-1.5">
-                  <Check
-                    className={`size-3 ${a.id === value ? 'opacity-100' : 'opacity-0'}`}
-                  />
-                  <span className="font-medium">{a.name}</span>
-                </div>
-                {a.description && (
-                  <span className="text-muted-foreground pl-[18px] truncate w-full">
-                    {a.description}
-                  </span>
-                )}
-              </DropdownMenuItem>
-            ))}
+            {agents.map((a) => {
+              const cost = agentCost(a, costByTool);
+              return (
+                <DropdownMenuItem
+                  key={a.id}
+                  onSelect={() => void onChange(a.id)}
+                  className="text-xs flex-col items-start gap-0.5"
+                >
+                  <div className="flex items-center gap-1.5">
+                    <Check
+                      className={`size-3 ${a.id === value ? 'opacity-100' : 'opacity-0'}`}
+                    />
+                    <span className="font-medium">{a.name}</span>
+                  </div>
+                  {a.description && (
+                    <span className="text-muted-foreground pl-[18px] truncate w-full">
+                      {a.description}
+                    </span>
+                  )}
+                  {cost.nWithData > 0 && (
+                    <span className="text-muted-foreground/70 pl-[18px] truncate w-full">
+                      ~{formatK(cost.prompt)} prompt / {cost.completion} completion · {cost.nWithData}/{cost.nTools} tools{cost.mostRecent ? ` · last call ${formatAgo(cost.mostRecent)}` : ''}
+                    </span>
+                  )}
+                </DropdownMenuItem>
+              );
+            })}
            {parseErrors.length > 0 && (
              <div
                className="px-2 py-1.5 mt-1 text-xs text-amber-500 border-t border-border"
@@ -119,3 +154,49 @@ export function AgentPicker({ projectId, value, onChange }: Props) {
    </DropdownMenu>
  );
 }
+
+// v1.13.10: sum the per-tool means across an agent's whitelisted tools.
+// Sum-of-means, not mean-of-sums — we're combining independent rolling
+// averages. nWithData reflects how many of the agent's tools have any
+// history yet; the line hides entirely when zero so a fresh deploy doesn't
+// render "0k / 0 / 0 tools".
+function agentCost(
+  agent: Agent,
+  costByTool: Record<string, ToolCostStat>,
+): {
+  prompt: number;
+  completion: number;
+  nTools: number;
+  nWithData: number;
+  mostRecent: string | null;
+} {
+  let prompt = 0;
+  let completion = 0;
+  let nWithData = 0;
+  let mostRecent: string | null = null;
+  for (const t of agent.tools) {
+    const s = costByTool[t];
+    if (!s) continue;
+    prompt += s.mean_prompt_tokens;
+    completion += s.mean_completion_tokens;
+    nWithData++;
+    if (!mostRecent || s.updated_at > mostRecent) mostRecent = s.updated_at;
+  }
+  return { prompt, completion, nTools: agent.tools.length, nWithData, mostRecent };
+}
+
+function formatK(n: number): string {
+  if (n < 1000) return String(n);
+  const tenths = Math.round(n / 100); // tenths of a thousand, e.g. 5234 -> 52
+  return tenths < 100 ? `${(tenths / 10).toFixed(1)}k` : `${Math.round(n / 1000)}k`;
+}
+
+function formatAgo(iso: string): string {
+  const then = new Date(iso).getTime();
+  if (Number.isNaN(then)) return '—';
+  const diff = Date.now() - then;
+  if (diff < 60_000) return 'just now';
+  if (diff < 3_600_000) return `${Math.floor(diff / 60_000)}m ago`;
+  if (diff < 86_400_000) return `${Math.floor(diff / 3_600_000)}h ago`;
+  return `${Math.floor(diff / 86_400_000)}d ago`;
+}
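A note on the `s.updated_at > mostRecent` comparison in `agentCost` above: comparing ISO-8601 strings lexicographically gives the right answer only when every timestamp uses the same format and zone, which holds if the server serializes them as UTC (an assumption about the API, not something this patch states). The sketch below shows a variant that compares parsed instants instead; `laterIso` is a hypothetical helper, not part of the patch.

```typescript
// Returns whichever timestamp denotes the later instant.
// Unparseable values lose to parseable ones; null means "nothing seen yet".
function laterIso(a: string | null, b: string): string {
  if (a === null) return b;
  const ta = Date.parse(a);
  const tb = Date.parse(b);
  if (Number.isNaN(ta)) return b;
  if (Number.isNaN(tb)) return a;
  return tb > ta ? b : a;
}

// Mixed offsets: the first value is 22:00 UTC on Dec 31, one hour BEFORE the second.
const withOffset = '2024-01-01T00:00:00+02:00';
const utc = '2023-12-31T23:00:00Z';

console.log(withOffset > utc);          // true  (string order disagrees with time order)
console.log(laterIso(withOffset, utc)); // 2023-12-31T23:00:00Z
```

If all `updated_at` values are uniform UTC strings, both approaches agree and the simpler string comparison is fine.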
--- a/openspec/README.md
+++ b/openspec/README.md
@@ -0,0 +1,38 @@
+# openspec
+
+Per-batch documentation convention, adopted in v1.13.15-openspec.
+
+Lift source: Fission-AI/OpenSpec directory layout. **No CLI dependency** — just
+the folder shape. Full OpenSpec lifecycle adoption is a future v1.14+ batch.
+
+## Layout
+
+```
+openspec/
+  changes/
+    <slug>/                          # one folder per shipped or planned batch
+      proposal.md                    # Why + scope summary
+      tasks.md                       # implementation step list
+      design.md                      # architecture / data-model decisions (optional)
+      specs/                         # reserved for future OpenSpec CLI adoption
+    archived/                        # snapshots of pre-v1.13.15 batch docs
+      <original-filename>.md
+  specs/                             # global specs, future v1.14+ use
+```
+
+## Conventions
+
+- Slugs are lowercase, hyphen-separated, and derived from the batch title
+  (e.g. `v1-13-10-per-tool-cost`, `file-attachments-v3-5`).
+- Already-shipped pre-v1.13.15 batches live in `changes/archived/` as
+  single-file snapshots. They were not split into proposal/tasks because
+  the work was already complete; archiving preserves git history.
+- New v1.13.15+ batches should land directly in
+  `changes/<slug>/proposal.md` (+ tasks.md, + design.md when applicable).
+- `proposal.md` carries the "Why" and scope. `tasks.md` is the action list
+  (numbered or checkbox). `design.md` is for non-trivial architectural
+  decisions worth recording separately.
+- A canonical dispatch brief (matching the v1.13.9 / v1.13.10 format)
+  is most naturally split as proposal.md (Where we are, Why this matters,
+  rationale sections) + tasks.md (Scope items, Build + smoke) + design.md
+  (Attribution model, Filtering, Canonical mapping).
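The slug convention in the README above could be mechanized roughly as follows. This is a sketch under assumptions: `batchSlug` is a hypothetical helper, and the two input titles are guesses chosen so that the output reproduces the README's example slugs; the README itself does not prescribe an algorithm.

```typescript
// Lowercase the title, collapse every run of non-alphanumeric characters
// into a single hyphen, then trim hyphens from both ends.
function batchSlug(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '');
}

console.log(batchSlug('v1.13.10 per-tool cost')); // v1-13-10-per-tool-cost
console.log(batchSlug('File attachments v3.5'));  // file-attachments-v3-5
```

Collapsing runs rather than single characters keeps titles with punctuation followed by spaces from producing doubled hyphens.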
--- a/openspec/changes/archived/boocode_batch10.md
+++ b/openspec/changes/archived/boocode_batch10.md
--- a/openspec/changes/archived/handoff_v1.13.10_per_tool_cost.md
+++ b/openspec/changes/archived/handoff_v1.13.10_per_tool_cost.md
@@ -0,0 +1,441 @@
+```
+#careful #boocode #nofluff
+
+v1.13.10 — per-tool token cost accounting (rolling 100-call window)
+
+Goal: surface per-tool prompt/completion-token rolling averages in AgentPicker for at-a-glance agent-cost hints. Implementation is a SQL view on top of `messages_with_parts` (no new table, no new write site) + a read endpoint + AgentPicker tooltip extension. Estimated ~240 LoC, mostly UI.
+
+## Where we are
+
+- Last tag: v1.13.9 (compaction overflow trigger — `floor(0.85 × ctx_max)` early-trigger). Branch clean.
+- v1.13.x cleanup line ✅ through v1.13.9. Queued: v1.13.10 (this) → v1.13.11 (WS Zod) → v1.13.12 (skills audit) → v1.13.2 (column drop, last).
+- Dependency (satisfied since v1.13.7 commit `ff29b48`): `includeUsage: true` on `createOpenAICompatible` in `apps/server/src/services/inference/provider.ts`. Without it, `messages.tokens_used`/`ctx_used` were NULL for v1.13.1-A → v1.13.7 (latent regression). Now populated.
+
+## Why this matters
+
+Today: AgentPicker lists agents by name + description. No cost signal. Users pick the architect agent (full tool whitelist, 21k of tool schema) for one-liner questions a refactorer (3 tools, 4k schema) could answer.
+
+Tomorrow: each agent listing shows the summed mean prompt + completion cost of its whitelisted tools, each mean derived from that tool's last 100 invocations across all chats. Decision aid, not a hard gate.
+
+Why a SQL view instead of a denormalized stats table:
+- All the source data already lands in `messages` (tool_calls JSON + tokens_used + ctx_used) and `message_parts` (read via the `messages_with_parts` view). Zero new write sites.
+- Rolling 100-call window is a `ROW_NUMBER() OVER (PARTITION BY tool_name ORDER BY created_at DESC) <= 100` — natural fit for a view.
+- View is rollback-safe. If the math is wrong, `DROP VIEW` and re-deploy; no orphan rows, no backfill.
+- At BooCode scale (single user, ~30 tools, ~100 calls/tool), aggregate-on-read is microseconds. Premature to denormalize.
+
+The roadmap schema row (`tool_cost_stats (tool_name, prompt_tokens_sum, completion_tokens_sum, n_calls, updated_at)`) matches both a table and a view. View is the lighter implementation.
+
+## Canonical column mapping (pinned)
+
+The `messages` columns are named non-obviously. Pinned mapping, confirmed across 5 write sites + 1 read site:
+
+| Column          | Semantic meaning   | AI SDK v6 source name |
+|-----------------|--------------------|-----------------------|
+| `ctx_used`      | prompt / input tokens   | `usage.inputTokens`   |
+| `tokens_used`   | completion / output tokens | `usage.outputTokens`  |
+
+Write sites confirmed: `tool-phase.ts:94-95`, `error-handler.ts:109-110`, `sentinel-summaries.ts:130-131`, `sentinel-summaries.ts:387-388`, `stream-phase.ts:319-320`. Canonical read at `payload.ts:190-191` reverses: `const promptTokens = updated.ctx_used; const completionTokens = updated.tokens_used`.
+
+`tokens_used` reads like "total" but is completion only. Project convention since the columns predate v1.13.x. Do not "fix" the naming inside this batch — out of scope; downstream consumers depend on the current mapping.
+
+## Attribution model
+
+A single assistant turn can emit N tool calls in parallel. llama-swap returns ONE (prompt_tokens, completion_tokens) per turn, not per tool. Attribution requires a split.
+
+**Chosen approach: equal split.** For an assistant turn that emits N tool calls with prompt P and completion C, each tool is attributed P/N prompt + C/N completion. The 100-call rolling mean smooths split noise. Implementation: `tokens_used::float / jsonb_array_length(tool_calls)` at the unnest site.
+
+**Alternatives rejected:**
+- "Full turn cost to every tool" (no division). Over-states; a 5-tool turn would 5×-count every tool's cost.
+- "Result-size only" (`length(JSON.stringify(output)) / 4`). Loses the LLM's actual usage signal; doesn't capture how expensive a tool's output is to the next prompt.
+- "Consuming-turn delta" (next turn prompt_tokens − this turn prompt_tokens, attribute to the tool that emitted the result). Most accurate but requires bubble-back math through the `executeToolPhase → runAssistantTurn` recursion. Over-engineered for the rolling-average use case.
+
+**If Sam wants a different split, change one line in the view definition (the divisor).**
+
+## Filtering — sentinel, failure, repair-call semantics
+
+The view excludes rows that aren't real tool-cost signal:
+
+- **Failed and cancelled turns** (`status != 'complete'`). The `error-handler.ts` failed/cancelled paths don't write `tokens_used`/`ctx_used`, so the existing `tokens_used IS NOT NULL` clause already filters these. Adding `status='complete'` is defense in depth and makes intent explicit.
+- **Cap-hit and doom-loop sentinel rows** (`metadata->>'kind' IN ('cap_hit', 'doom_loop')`). Sentinels are `role='system'` rows with `tool_calls=NULL`, so the existing `tool_calls IS NOT NULL` clause already filters them. The explicit metadata filter is defense in depth — it survives future schema drift where someone might INSERT a sentinel with a non-null tool_calls.
+- **`experimental_repairToolCall` retries.** No special handling needed. Our impl (per `CLAUDE.md`) is pass-through — malformed calls flow to zod-reject → tool_result error → next normal turn handles. No separate rows; the next turn's tokens count naturally.
+
+## Recon (already done; paste for reference)
+
+```
+cd /opt/boocode
+grep -n "tokens_used\|ctx_used\|inputTokens\|outputTokens" apps/server/src/services/inference/*.ts | head -30
+grep -n "metadata\|cap_hit\|doom_loop" apps/server/src/services/inference/sentinels.ts apps/server/src/schema.sql | head -10
+psql -h localhost -p 5432 -U postgres -d boocode -c "\d messages_with_parts" | head -30
+```
+
+Expected: confirms the canonical mapping in the table above; confirms `messages.metadata jsonb` exists at `schema.sql:259`; confirms `messages_with_parts` exposes `m.metadata` at `schema.sql:92`.
+
+## Scope
+
+### 1. schema.sql — `tool_cost_stats` view (~35 LoC)
+
+Append after the `messages_with_parts` view (after line 120):
+
+```sql
+-- v1.13.10: per-tool token cost rolling window. Derives from
+-- messages_with_parts (the v1.13.1-B view that COALESCEs message_parts over
+-- the legacy JSON column) so this works whether the chat predates v1.13.0
+-- or postdates v1.13.2 (column drop). No new write site — all source data
+-- already lands via the existing tool-phase.ts:94-95 UPDATE.
+--
+-- Attribution model: equal split. A turn emitting N tool calls divides its
+-- prompt/completion tokens by N before attribution. See v1.13.10 dispatch
+-- brief for rationale + rejected alternatives.
+--
+-- Column mapping: messages.ctx_used = prompt (input), messages.tokens_used
+-- = completion (output). Non-obvious naming; pinned via canonical writes at
+-- tool-phase.ts:94-95 et al.
+--
+-- Filtering rationale:
+--   status='complete'                — exclude failed/cancelled (defense in
+--                                      depth; failed-path doesn't write
+--                                      tokens_used so they're also filtered
+--                                      indirectly).
+--   metadata->>'kind' exclusions     — exclude cap_hit / doom_loop sentinels
+--                                      (defense in depth; sentinels are
+--                                      role='system' with tool_calls=NULL
+--                                      so they're filtered indirectly too).
+--   experimental_repairToolCall      — no special handling; retries flow
+--                                      as normal next-turn tool_result
+--                                      errors and count naturally.
+--
+-- Rolling window: last 100 calls per tool_name, ordered by created_at DESC.
+-- Aggregate-on-read is microseconds at BooCode scale (single user, ~30
+-- tools, ~100 calls each). DROP VIEW + recreate to change window size.
+CREATE OR REPLACE VIEW tool_cost_stats AS
+WITH per_call AS (
+  SELECT
+    (tc->>'name')::text AS tool_name,
+    (m.ctx_used::float / NULLIF(jsonb_array_length(m.tool_calls), 0)) AS prompt_tokens,
+    (m.tokens_used::float / NULLIF(jsonb_array_length(m.tool_calls), 0)) AS completion_tokens,
+    m.created_at,
+    ROW_NUMBER() OVER (
+      PARTITION BY (tc->>'name')::text
+      ORDER BY m.created_at DESC
+    ) AS rn
+  FROM messages_with_parts m,
+    LATERAL jsonb_array_elements(m.tool_calls) AS tc
+  WHERE m.tool_calls IS NOT NULL
+    AND jsonb_array_length(m.tool_calls) > 0
+    AND m.tokens_used IS NOT NULL
+    AND m.ctx_used IS NOT NULL
+    AND m.status = 'complete'
+    AND (m.metadata IS NULL
+         OR m.metadata->>'kind' IS NULL
+         OR m.metadata->>'kind' NOT IN ('cap_hit', 'doom_loop'))
+)
+SELECT
+  tool_name,
+  ROUND(SUM(prompt_tokens))::int AS prompt_tokens_sum,
+  ROUND(SUM(completion_tokens))::int AS completion_tokens_sum,
+  COUNT(*)::int AS n_calls,
+  MAX(created_at) AS updated_at
+FROM per_call
+WHERE rn <= 100
+GROUP BY tool_name;
+```
+
+Notes:
+- `NULLIF(..., 0)` guards against div-by-zero on `jsonb_array_length=0` (should never happen given the WHERE clause, but defensive).
+- `ROUND(SUM(...))::int` — frontend doesn't want decimals; sum-then-round is more accurate than per-row round-then-sum.
+- View is read from `messages_with_parts` not `messages`, so legacy pre-v1.13.0 rows and post-v1.13.2 rows both resolve.
+- No index needed; the underlying `idx_messages_chat` covers the JOIN; the LATERAL unnest is bounded by the 100-row partition.
+
+### 2. apps/server/src/routes/tools.ts (NEW, ~40 LoC)
+
+New route file. Register in `apps/server/src/index.ts` next to the other `register*Routes(app, sql, ...)` calls.
+
+```ts
+import type { FastifyInstance } from 'fastify';
+import type { Sql } from '../db.js';
+
+export interface ToolCostStat {
+  tool_name: string;
+  mean_prompt_tokens: number;
+  mean_completion_tokens: number;
+  n_calls: number;
+  updated_at: string;
+}
+
+export function registerToolsRoutes(app: FastifyInstance, sql: Sql) {
+  app.get('/api/tools/cost_stats', async () => {
+    const rows = await sql<{
+      tool_name: string;
+      prompt_tokens_sum: number;
+      completion_tokens_sum: number;
+      n_calls: number;
+      updated_at: string;
+    }[]>`
+      SELECT tool_name, prompt_tokens_sum, completion_tokens_sum, n_calls, updated_at
+      FROM tool_cost_stats
+      ORDER BY tool_name ASC
+    `;
+    const stats: ToolCostStat[] = rows.map(r => ({
+      tool_name: r.tool_name,
+      mean_prompt_tokens: Math.round(r.prompt_tokens_sum / r.n_calls),
+      mean_completion_tokens: Math.round(r.completion_tokens_sum / r.n_calls),
+      n_calls: r.n_calls,
+      updated_at: r.updated_at,
+    }));
+    return { stats };
+  });
+}
+```
+
+Route is bodyless, idempotent, cheap. No pagination (≤30 tools).
+
+### 3. apps/server/src/services/__tests__/tool_cost_stats.test.ts (NEW, ~95 LoC)
+
+Integration test against real Postgres (matches `inference.test.ts` pattern). Fixtures:
+
+```ts
+import { describe, it, expect, beforeEach } from 'vitest';
+import { connect } from '../../db.js';
+
+describe('tool_cost_stats view (v1.13.10)', () => {
+  // ... session + chat + project setup helpers ...
+
+  it('returns empty when no tool calls exist', async () => {
+    // fresh chat, only user/assistant text turns
+    const stats = await sql`SELECT * FROM tool_cost_stats`;
+    expect(stats).toEqual([]);
+  });
+
+  it('attributes single-tool turn fully to that tool', async () => {
+    // insert one assistant message with tool_calls=[{name: 'view_file', ...}],
+    // tokens_used=300, ctx_used=15000, status='complete'
+    const stats = await sql`SELECT * FROM tool_cost_stats WHERE tool_name='view_file'`;
+    expect(stats[0]).toMatchObject({
+      tool_name: 'view_file',
+      prompt_tokens_sum: 15000,
+      completion_tokens_sum: 300,
+      n_calls: 1,
+    });
+  });
+
+  it('splits multi-tool turn equally across tools', async () => {
+    // insert one assistant turn with 3 tool calls (view_file, grep, list_dir),
+    // tokens_used=300, ctx_used=15000 → each tool gets 100 completion, 5000 prompt
+    const stats = await sql`SELECT * FROM tool_cost_stats ORDER BY tool_name`;
+    expect(stats).toHaveLength(3);
+    for (const s of stats) {
+      expect(s.completion_tokens_sum).toBe(100);
+      expect(s.prompt_tokens_sum).toBe(5000);
+      expect(s.n_calls).toBe(1);
+    }
+  });
+
+  it('limits to last 100 calls per tool (FIFO window)', async () => {
+    // insert 150 turns each calling view_file once with monotonically
+    // increasing tokens_used; expect only the most recent 100 to count
+    const stats = await sql`SELECT * FROM tool_cost_stats WHERE tool_name='view_file'`;
+    expect(stats[0]!.n_calls).toBe(100);
+    // mean should reflect the most recent 100 calls (51..150), not 1..150
+  });
+
+  it('excludes turns with NULL tokens_used (pre-v1.13.7 latent regression)', async () => {
+    // insert a turn with tool_calls but tokens_used=NULL → must not appear
+    const stats = await sql`SELECT * FROM tool_cost_stats WHERE tool_name='view_file'`;
+    expect(stats).toEqual([]);
+  });
+
+  it('excludes failed and cancelled turns + sentinel metadata rows', async () => {
+    // insert four rows for tool_name='view_file', all with tokens_used+ctx_used
+    // populated:
+    //   row A: status='failed'                            — excluded
+    //   row B: status='cancelled'                         — excluded
+    //   row C: status='complete', metadata={kind:'cap_hit'}   — excluded
+    //   row D: status='complete', metadata={kind:'doom_loop'} — excluded
+    //   row E: status='complete', metadata=null               — included
+    // Expect n_calls=1, attributable to row E only.
+    const stats = await sql`SELECT * FROM tool_cost_stats WHERE tool_name='view_file'`;
+    expect(stats[0]!.n_calls).toBe(1);
+  });
+
+  it('reads tool_calls via messages_with_parts (parts-authoritative)', async () => {
+    // insert a v1.13.0+ row with messages.tool_calls=NULL but
+    // message_parts rows containing the tool_call → must still aggregate
+    const stats = await sql`SELECT * FROM tool_cost_stats WHERE tool_name='grep'`;
+    expect(stats[0]!.n_calls).toBe(1);
+  });
+});
+```
+
+Pattern: each test resets the messages table for the fixture chat (TRUNCATE not DELETE — Postgres `messages` has FK CASCADE) and inserts hand-crafted rows. The view is recomputed on every SELECT.
+
+### 4. apps/web/src/api/types.ts + client.ts (~10 LoC)
+
+Add to `types.ts`:
+
+```ts
+export interface ToolCostStat {
+  tool_name: string;
+  mean_prompt_tokens: number;
+  mean_completion_tokens: number;
+  n_calls: number;
+  updated_at: string;
+}
+```
+
+Add to `client.ts` under the existing `api.*` namespace structure:
+
+```ts
+tools: {
+  costStats: () => request<{ stats: ToolCostStat[] }>('/api/tools/cost_stats'),
+},
+```
+
+Match the casing convention of the existing namespaces (`api.agents.list`, `api.chats.archive`, etc.).
+
+### 5. apps/web/src/components/AgentPicker.tsx — tooltip extension (~80 LoC delta)
+
+Currently (line 67): `title={selectedAgent?.description}` — native HTML title attribute on the trigger button.
+
+Replacement: dropdown items get a per-agent cost line in muted text below the description. Format:
+
+```
+[Agent name]
+[Agent description]
+~5.2k prompt / 280 completion · 6 tools · last call 3h ago
+```
+
+Implementation steps:
+1. Fetch `api.tools.costStats()` once on mount (alongside the existing `api.agents.list()`) and hold the result in component state. The effect's dependency array is empty, so the fetch runs once per mount and does not re-fire while the picker stays mounted.
+2. Compute per-agent aggregate: for each agent, sum the means of its whitelisted tools. Sum-of-means, not mean-of-sums — we're combining independent rolling averages.
+3. Render below description (one line, muted, truncated). Omit the line entirely if no calls are recorded yet for any of the agent's tools.
+4. Don't break the existing native `title=` for backward compat; layer the cost line additively.
+
+```tsx
+const [costStats, setCostStats] = useState<ToolCostStat[]>([]);
+useEffect(() => {
+  api.tools.costStats().then(r => setCostStats(r.stats)).catch(() => setCostStats([]));
+}, []);
+const costByTool = useMemo(
+  () => Object.fromEntries(costStats.map(s => [s.tool_name, s])),
+  [costStats],
+);
+function agentCost(agent: Agent): { prompt: number; completion: number; nTools: number; nWithData: number; mostRecent: string | null } {
+  let prompt = 0, completion = 0, nWithData = 0;
+  let mostRecent: string | null = null;
+  for (const t of agent.tools) {
+    const s = costByTool[t];
+    if (!s) continue;
+    prompt += s.mean_prompt_tokens;
+    completion += s.mean_completion_tokens;
+    nWithData++;
+    if (!mostRecent || s.updated_at > mostRecent) mostRecent = s.updated_at;
+  }
+  return { prompt, completion, nTools: agent.tools.length, nWithData, mostRecent };
+}
+```
+
+For the line render: `~${formatK(prompt)} prompt / ${Math.round(completion)} completion · ${nWithData}/${nTools} tools · last call ${formatAgo(mostRecent)}`. `completion` is a sum of means and is rarely an integer, hence the rounding. Skip the line entirely when `nWithData === 0` to avoid showing "0k / 0 / 0 tools" on a fresh deploy.
+
+**`formatK` / `formatAgo`:** colocate at the bottom of `AgentPicker.tsx`. Don't extract to a util file in this batch — single use site.
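A minimal sketch of the two helpers, assuming `formatK` takes a raw token count and `formatAgo` takes the ISO timestamp held in `mostRecent`. Neither signature is specified in this handoff, so the `now` parameter, the rounding rules, and the `just now` / `unknown` fallbacks are illustrative assumptions. The sketch returns `3h ago` without a `last call` prefix and leaves that wording to the caller.

```typescript
// Hypothetical helpers; signatures and thresholds are assumptions, not the shipped code.

/** Compact token count: 280 -> "280", 5200 -> "5.2k". */
function formatK(n: number): string {
  const rounded = Math.round(n);
  if (rounded < 1000) return String(rounded);
  return `${(rounded / 1000).toFixed(1)}k`;
}

/** Coarse relative age of an ISO timestamp, e.g. "3h ago". */
function formatAgo(iso: string | null, now: Date = new Date()): string {
  if (iso === null) return "unknown";
  const then = Date.parse(iso);
  if (isNaN(then)) return "unknown";
  // Whole minutes elapsed; future timestamps fall through to "just now".
  const minutes = Math.floor((now.getTime() - then) / 60000);
  if (minutes < 1) return "just now";
  if (minutes < 60) return `${minutes}m ago`;
  const hours = Math.floor(minutes / 60);
  if (hours < 24) return `${hours}h ago`;
  return `${Math.floor(hours / 24)}d ago`;
}
```

Passing `now` explicitly keeps the helper deterministic under test; in the component the default `new Date()` would apply.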
+
+## What NOT to do
+
+- **Don't add a new write site at `tool-phase.ts` or `finalizeCompletion`.** All source data is already there via existing UPDATEs.
+- **Don't denormalize.** The view is sufficient and rollback-safe at BooCode's single-user scale.
+- **Don't add per-tool cost to the message bubble.** Out of scope. AgentPicker tooltip only.
+- **Don't fold per-call rows into a moving sum via triggers.** Aggregate on read; 100 rows × 30 tools is microseconds in Postgres.
+- **Don't track `result_chars` (the size of `tool_results.output`).** Tempting as a second cost signal but out of scope here. Future batch if Sam wants it.
+- **Don't add a session-scoped or chat-scoped filter to `tool_cost_stats`.** The rolling window is GLOBAL across all chats — the agent picker is a project-level decision aid. Per-chat surfacing is a future v1.14+ design.
+- **Don't change the attribution model post-deployment** without dropping the view first. Mid-flight semantic changes give bogus historical means.
+- **Don't "fix" the `ctx_used`/`tokens_used` naming inside this batch.** Non-obvious but pinned across 5 write sites. Renaming is its own batch.
+- **Don't rely solely on `tool_calls IS NOT NULL` for sentinel exclusion.** It works today (sentinels are role='system' with tool_calls=NULL) but the explicit `status='complete'` + `metadata->>'kind'` filters are defense in depth and survive future schema drift.
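The aggregate-on-read behaviour described above (status and sentinel filters, equal-split attribution, a 100-call window per tool) can be mirrored in plain TypeScript to sanity-check expected means by hand. This is a sketch only: the `Turn` shape, `toolCostStats`, and the constant names are hypothetical stand-ins, and the real aggregation lives in the SQL view.

```typescript
// Hypothetical in-memory stand-in for one assistant row; field names are assumptions.
interface Turn {
  status: string;
  kind: string | null;          // stands in for metadata->>'kind'
  toolNames: string[] | null;   // stands in for the tool_calls array
  promptTokens: number | null;
  completionTokens: number | null;
}

interface Stat {
  tool_name: string;
  mean_prompt_tokens: number;
  mean_completion_tokens: number;
  n_calls: number;
}

const EXCLUDED_KINDS = ["cap_hit", "doom_loop"];
const WINDOW = 100; // rolling window size per tool

// `turns` is assumed to be ordered oldest to newest.
function toolCostStats(turns: Turn[]): Stat[] {
  const perTool = new Map<string, { p: number; c: number }[]>();
  for (const t of turns) {
    if (t.status !== "complete") continue;                                 // failed or cancelled turns
    if (t.kind !== null && EXCLUDED_KINDS.indexOf(t.kind) !== -1) continue; // sentinel rows
    if (t.toolNames === null || t.toolNames.length === 0) continue;        // no tool calls
    if (t.promptTokens === null || t.completionTokens === null) continue;  // NULL tokens
    const n = t.toolNames.length; // equal split across the tools in this turn
    for (const name of t.toolNames) {
      const rows = perTool.get(name) ?? [];
      rows.push({ p: t.promptTokens / n, c: t.completionTokens / n });
      perTool.set(name, rows);
    }
  }
  const out: Stat[] = [];
  perTool.forEach((rows, tool_name) => {
    const recent = rows.slice(-WINDOW); // keep only the newest WINDOW calls
    const n_calls = recent.length;
    out.push({
      tool_name,
      mean_prompt_tokens: recent.reduce((acc, r) => acc + r.p, 0) / n_calls,
      mean_completion_tokens: recent.reduce((acc, r) => acc + r.c, 0) / n_calls,
      n_calls,
    });
  });
  return out.sort((a, b) => (a.tool_name < b.tool_name ? -1 : 1));
}
```

A two-tool turn with 1000 prompt tokens contributes 500 to each tool, so a later single-tool `view_file` turn at 2000 tokens gives `view_file` a mean of 1250 over two calls.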
+
+## Backup before edits
+
+```
+cd /opt/boocode
+cp apps/server/src/schema.sql{,.bak-$(date +%Y%m%d-%H%M%S)}
+cp apps/web/src/components/AgentPicker.tsx{,.bak-$(date +%Y%m%d-%H%M%S)}
+```
+
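+(No backup needed for the new files in items 2 and 3. The item 4 edits to `types.ts` and `client.ts` and the one-line `index.ts` wire-in are small additive changes and are not backed up here.)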
+
+## Verify
+
+```
+pnpm -C apps/server test
+```
+
+Expected: all existing tests pass + 7 new in `tool_cost_stats.test.ts`. Total moves from 195 → 202.
+
+```
+cd /opt/boocode
+docker compose exec boocode_db psql -U postgres -d boocode -c \
+  "SELECT * FROM tool_cost_stats ORDER BY n_calls DESC LIMIT 10;"
+```
+
+Expected: in any live deployment with v1.13.7+ history, this returns real rows for `view_file`, `grep`, `list_dir`, etc. If empty: `messages.tool_calls` was NULL for the v1.13.1-A → v1.13.7 latent regression window and recovery only begins with v1.13.7+ traffic.
+
+## Build + smoke
+
+```
+cd /opt/boocode
+docker compose up --build -d boocode
+docker compose logs --since=30s boocode | tail -20
+```
+
+Smoke A — view recompiles on schema apply:
+```
+docker compose logs boocode | grep -i "tool_cost_stats\|applySchema"
+```
+Expected: clean schema apply, view registered idempotently.
+
+Smoke B — endpoint returns data:
+```
+curl -s http://localhost:3000/api/tools/cost_stats | jq '.stats | length, .stats[0]'
+```
+Expected: nonzero length if any v1.13.7+ tool calls exist; one stat object with all 5 fields populated.
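The "all 5 fields populated" expectation can also be checked programmatically. The guard below is a hedged sketch: `isToolCostStat` and `isCostStatsResponse` are hypothetical names that do not appear in this batch's file list, and the field types simply follow the `ToolCostStat` interface from item 4.

```typescript
interface ToolCostStat {
  tool_name: string;
  mean_prompt_tokens: number;
  mean_completion_tokens: number;
  n_calls: number;
  updated_at: string;
}

// Hypothetical runtime guard: true only when all five fields have the expected types.
function isToolCostStat(v: unknown): v is ToolCostStat {
  if (typeof v !== "object" || v === null) return false;
  const o = v as Record<string, unknown>;
  return (
    typeof o["tool_name"] === "string" &&
    typeof o["mean_prompt_tokens"] === "number" &&
    typeof o["mean_completion_tokens"] === "number" &&
    typeof o["n_calls"] === "number" &&
    typeof o["updated_at"] === "string"
  );
}

// True when the payload is { stats: [...] } and every entry passes the guard.
function isCostStatsResponse(v: unknown): v is { stats: ToolCostStat[] } {
  if (typeof v !== "object" || v === null) return false;
  const stats = (v as Record<string, unknown>)["stats"];
  return Array.isArray(stats) && stats.every(isToolCostStat);
}
```

An empty `stats` array passes the guard, which matches the fresh-deploy case where no qualifying tool calls exist yet.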
+
+Smoke C — UI:
+1. Open browser to `boocode.indifferentketchup.com`.
+2. Open AgentPicker dropdown on any session.
+3. Each agent row shows a muted cost line below its description: `~5.2k prompt / 280 completion · 6/8 tools · last call 2h ago`.
+4. Agents with no tool history show just description (no cost line).
+5. Confirm cost line truncates with the existing text-muted-foreground / truncate pattern; doesn't break the layout at mobile widths (open Vivaldi devtools, set iPhone-13 viewport).
+
+## Files expected to touch
+
+- `apps/server/src/schema.sql` — ~35 LoC delta (view definition + filter comments)
+- `apps/server/src/routes/tools.ts` — NEW, ~40 LoC
+- `apps/server/src/index.ts` — 1 line (`registerToolsRoutes(app, sql)`)
+- `apps/server/src/services/__tests__/tool_cost_stats.test.ts` — NEW, ~95 LoC
+- `apps/web/src/api/types.ts` — ~7 LoC (interface)
+- `apps/web/src/api/client.ts` — ~3 LoC (namespace + method)
+- `apps/web/src/components/AgentPicker.tsx` — ~80 LoC delta (cost line + fetch hook + helpers)
+
+Total ~260 LoC. Matches roadmap estimate.
+
+## Workflow conventions
+
+- Backups before destructive edits (above) on `schema.sql` and `AgentPicker.tsx`, the two files with non-trivial modifications. New files don't need backups.
+- Sam reviews diffs. Never `git add` / `git commit` / `git push` / `git pull` on Sam's behalf.
+- Build: `docker compose up --build -d boocode`. No `--no-cache` unless layer-cache trap surfaces.
+- Tests authoritative: `pnpm -C apps/server test`.
+- View definition lives in `schema.sql` (idempotent via `CREATE OR REPLACE VIEW`); no migration shim needed.
+
+## Don't repeat past mistakes
+
+- v1.13.7 stability bundle (`includeUsage:true`, trim guards, payload filter, `BUDGET_NO_AGENT=30`): all live. This batch depends on `includeUsage:true`. If it is unset, token counts stay NULL and `tool_cost_stats` returns no rows.
+- v1.13.8 prefix instrumentation: untouched.
+- v1.13.9 ratio-only `usable()`: untouched.
+- v1.13.4 two-tier prune: untouched.
+- v1.13.5 truncate.ts opaque-id pattern: untouched.
+- v1.13.1-B `messages_with_parts` view: this view is the source. Don't reach past it to raw `messages`.
+- v1.13.2 will DROP `messages.tool_calls`/`tool_results` columns. The `tool_cost_stats` view reads from `messages_with_parts` not `messages`, so it survives. Verify after v1.13.2 ships.
+
+## Source files to read in project knowledge
+
+- `boocode_roadmap.md` (v1.13.10 row at line 114; schema row at line 474)
+- `boocode_code_review.md` (cost-tracking design background)
+- `CLAUDE.md` (project conventions; messages_with_parts invariant at L80; v1.13.7 includeUsage invariant)
+```
--- a/openspec/changes/archived/handoff_v1.13.8_prefix_verify.md
+++ b/openspec/changes/archived/handoff_v1.13.8_prefix_verify.md
Author	SHA1	Message	Date
indifferentketchup	34cbecf975	v1.13.15-tools: tiered tool loading via BOOCODE_TOOLS env var Pattern lift from eyaltoledano/claude-task-master (MIT + Commons Clause — pattern only, no code lift). Adds BOOCODE_TOOLS env var with three tiers: - core (4 tools): view_file, list_dir, grep, find_files. ~2k token schema cost. - standard (15 tools): core + web_search, web_fetch, git_status, all 8 codecontext_* tools. ~10k token schema cost. - all (default; current behavior): every tool in ALL_TOOLS (20). ~21k token schema cost. The env var is a CEILING — narrows agent whitelists, never expands. Default behavior unchanged when var is unset. resolveToolTier is case-insensitive and falls back to 'all' on unknown values. CORE_TOOL_NAMES + STANDARD_TOOL_NAMES validated at module load against TOOLS_BY_NAME via two top-level for-loops that throw on the first missing name. Module fails to import if a tier references a tool that doesn't exist in the registry — catches typos and stale tier definitions at boot rather than silently filtering valid tools out of agent whitelists. Wiring: agents.ts parseAgentBlock now reads BOOCODE_TOOLS from process.env per parse, intersects with the agent's declared frontmatter tools (or DEFAULT_TOOLS when frontmatter omits the field). Per-parse read is fine — agents are re-parsed on the existing 60s cache TTL. Tests: tools.test.ts grows from 1 to 10 tests. Covers resolveToolTier across tiers/case/unknown values + the CORE-subset-of-STANDARD invariant + TOOLS_BY_NAME existence for both tier sets. 204/204 pass (was 195; +9 new). Deviation from the brief: the codecontext tools in the actual registry have NO codecontext_* prefix (the brief's STANDARD list assumed it). Used the actual names (get_codebase_overview, search_symbols, etc.). Module-load validation would have failed boot with the prefixed names. Smoke: with BOOCODE_TOOLS unset, agents return their full 12-tool whitelists. 
With BOOCODE_TOOLS=core in .env + container restart, the same agents narrow to 4 tools (find_files, grep, list_dir, view_file) — intersection of declared whitelist ∩ core tier. Reverted after confirmation. CLAUDE.md updated with BOOCODE_TOOLS in the Environment section's Optional list. .env.example gained a commented BOOCODE_TOOLS=all line with the per-tier token-cost table. ~110 LoC across 5 files (4 modified + 1 test expansion). Under the brief's ~30 LoC estimate for code; the test suite expansion drove most of the growth.	2026-05-22 14:59:01 +00:00
indifferentketchup	5a3f357ce9	v1.13.15-openspec: reformat batch docs to OpenSpec directory structure Adopt Fission-AI/OpenSpec's openspec/changes/<change-name>/{proposal,specs,design,tasks}.md shape for BooCode's own batch docs. Zero-dep documentation reformat; replaces ad-hoc boocode_batchN.md / handoff_vN.N.N.md convention. Existing batch docs moved into openspec/changes/archived/ via git mv (preserves history): - boocode_batch10.md - handoff_v1.13.8_prefix_verify.md - handoff_v1.13.10_per_tool_cost.md Pre-v1.13.15 docs were NOT split into proposal/tasks/design files. The work was already shipped; the originals are preserved as archived snapshots. New v1.13.15+ batches land directly in openspec/changes/<slug>/proposal.md (+ tasks.md, + design.md when applicable) per the convention documented in openspec/README.md. CLAUDE.md gained a one-line pointer to the convention (workflow section). File grew from 153 → 154 lines, 27,682 → 27,925 chars; both remain well under the AgentLint hard caps. specs/ directory is reserved for future OpenSpec CLI adoption (v1.14+). No CLI dep added in this batch — directory structure only. If/when the full OpenSpec lifecycle is adopted, that lands as a separate batch.	2026-05-22 14:54:17 +00:00
indifferentketchup	fc11e8dc91	v1.13.15-agentlint: instruction-file audit against AgentLint 31-check standard Manual audit pass against 0xmariowu/AgentLint's evidence-backed checks (MIT, drawn from 265 versions of Anthropic's internal Claude Code system prompt). Findings and fixes: - Identity sections ("You are the assistant running inside ...") removed from BOOCHAT.md (line 3) and BOOCODER.md (line 5). The model already knows where it's running; the openers were emphatic decoration. - CLAUDE.local.md added to .gitignore (.env was already covered). Claude Code's Glob tool ignores .gitignore by default, which means any local override file was otherwise readable by any agent walking the workspace. - CLAUDE.md unchanged — already passes all 10 checks. Emphasis density 0.58/1000 words (under Anthropic's 1.4/1000 endpoint); two IMPORTANT/MUST references are load-bearing (tsc-noEmit footgun, v1.13.7 includeUsage invariant); zero identity sections; zero --no-verify references; 27,682 chars (under the 40,000-char silent-drop limit). Line count (153) is over the 60-120 target band, but the brief explicitly forbids structural rewrites in the audit pass. Targets not in scope: - /opt/boocode/AGENTS.md does not exist in this repo (removed in v1.12, per CLAUDE.md:152). The global agent registry lives at /data/AGENTS.md (bind-mounted from outside the repo); can't be touched by this batch. - No .github/workflows/ directory — SHA-pin audit (step 8) skipped. Cumulative effect: model spends fewer tokens parsing instruction-file ceremony in BOOCHAT/BOOCODER and receives sharper priority signal per Anthropic's measured-evolution data. Zero code changes.	2026-05-22 14:52:37 +00:00
indifferentketchup	9ce638c916	v1.13.10: per-tool token cost accounting (rolling 100-call view) Surfaces per-tool prompt/completion-token rolling averages in AgentPicker for at-a-glance agent-cost hints. Implementation is a SQL view on top of messages_with_parts plus a read endpoint and AgentPicker tooltip extension. No new write site; all source data already lands via the existing tool-phase.ts:94-95 / error-handler.ts:109-110 / sentinel-summaries.ts UPDATEs that v1.13.7's includeUsage: true fix made non-NULL. (1) schema.sql — new tool_cost_stats view. Window-functions over messages_with_parts.tool_calls with LATERAL jsonb_array_elements. Attribution: equal split — multi-tool turn divides tokens N-ways; the 100-call rolling mean absorbs split noise. Filters: status='complete' + metadata.kind NOT IN ('cap_hit','doom_loop') exclude failed turns and sentinels respectively; tool_calls IS NOT NULL is defense-in-depth since sentinels are role='system' rows. CREATE OR REPLACE means schema apply is idempotent. (2) routes/tools.ts NEW + index.ts wire-in. GET /api/tools/cost_stats returns { stats: ToolCostStat[] } with mean_prompt_tokens / mean_completion_tokens computed at read time (sum / n_calls). Sorted by tool_name ASC. No pagination — ≤30 tools. (3) __tests__/tool_cost_stats.test.ts NEW — 7 integration tests keyed off DATABASE_URL env var. Tests skip gracefully when unset (no-DB default). beforeAll applies the schema via sql.unsafe(readFileSync(schema.sql)) for self-contained runs. Helper insertAssistantTurn shared across cases. Covers: empty state, single-tool attribution, multi-tool equal split, 100-call FIFO window, NULL-tokens exclusion, parts-authoritative read via messages_with_parts, failed/sentinel exclusion. (4) web/api/types.ts + client.ts — ToolCostStat interface + api.tools.costStats() method binding. 
(5) AgentPicker.tsx — fetch costStats on mount, compute per-agent sum-of-means across whitelisted tools, render muted cost line below description: "~5.2k prompt / 280 completion · 6/8 tools · last call 3h ago". Skips line entirely when no tool history; preserves existing native title= for layout backward-compat. formatK/formatAgo colocated. Tests: 202/202 pass (195 prior + 7 new view-integration). Server + web tsc clean. Smoke: schema applied cleanly; GET /api/tools/cost_stats returns canonical JSON; view + endpoint agree. Single-row result expected given the v1.13.1-A → v1.13.7 NULL latent regression window; new traffic populates organically. Roadmap row at boocode_roadmap.md:114 plus schema row at :474 both match. View vs table decision documented in handoff_v1.13.10_per_tool_cost.md (rollback-safe, microsecond-fast at BooCode scale). ~270 LoC across 8 files (5 modified + 3 new).	2026-05-22 14:42:09 +00:00
indifferentketchup	8126d78b34	docs: capture v1.13.7-v1.13.9 invariants in CLAUDE.md Five additions surfacing session-discovered constraints future Claude sessions need: - AI SDK v6 includeUsage:true requirement (avoids re-introducing the v1.13.1-A→v1.13.7 NULL-tokens regression) - \n text-delta trim guards in MessageList/MessageBubble + payload.ts failed/empty-assistant skip rules (avoid undoing v1.13.7) - 0.85 × ctx_max overflow formula (v1.13.9) replacing the stale ctx_max - 20k line - New services/system-prompt.ts bullet documenting the v1.13.8 fingerprint instrumentation surface - New services/inference/budget.ts bullet with current BUDGET_NO_AGENT=30 and read-only-tools rationale	2026-05-22 14:07:11 +00:00