v1.13.10: per-tool token cost accounting (rolling 100-call view)

Surfaces per-tool prompt/completion-token rolling averages in AgentPicker for at-a-glance agent-cost hints. Implementation is a SQL view on top of messages_with_parts plus a read endpoint and an AgentPicker tooltip extension. No new write site; all source data already lands via the existing tool-phase.ts:94-95 / error-handler.ts:109-110 / sentinel-summaries.ts UPDATEs that v1.13.7's includeUsage: true fix made non-NULL.

(1) schema.sql: new tool_cost_stats view. Window functions over messages_with_parts.tool_calls with LATERAL jsonb_array_elements. Attribution: equal split; a multi-tool turn divides its tokens N ways, and the 100-call rolling mean absorbs the split noise. Filters: status='complete' + metadata.kind NOT IN ('cap_hit','doom_loop') exclude failed turns and sentinels respectively; tool_calls IS NOT NULL is defense-in-depth since sentinels are role='system' rows. CREATE OR REPLACE means schema apply is idempotent.

(2) routes/tools.ts NEW + index.ts wire-in. GET /api/tools/cost_stats returns { stats: ToolCostStat[] } with mean_prompt_tokens / mean_completion_tokens computed at read time (sum / n_calls). Sorted by tool_name ASC. No pagination: ≤30 tools.

(3) __tests__/tool_cost_stats.test.ts NEW: 7 integration tests keyed off the DATABASE_URL env var. Tests skip gracefully when it is unset (no-DB default). beforeAll applies the schema via sql.unsafe(readFileSync(schema.sql)) for self-contained runs. Helper insertAssistantTurn is shared across cases. Covers: empty state, single-tool attribution, multi-tool equal split, 100-call FIFO window, NULL-tokens exclusion, parts-authoritative read via messages_with_parts, failed/sentinel exclusion.

(4) web/api/types.ts + client.ts: ToolCostStat interface + api.tools.costStats() method binding.

(5) AgentPicker.tsx: fetch costStats on mount, compute per-agent sum-of-means across whitelisted tools, render a muted cost line below the description: "~5.2k prompt / 280 completion · 6/8 tools · last call 3h ago". Skips the line entirely when there is no tool history; preserves the existing native title= for layout backward-compat. formatK/formatAgo colocated.

Tests: 202/202 pass (195 prior + 7 new view-integration). Server + web tsc clean. Smoke: schema applied cleanly; GET /api/tools/cost_stats returns canonical JSON; view + endpoint agree. Single-row result expected given the v1.13.1-A → v1.13.7 NULL latent regression window; new traffic populates organically. Roadmap row at boocode_roadmap.md:114 plus schema row at :474 both match. View vs table decision documented in handoff_v1.13.10_per_tool_cost.md (rollback-safe, microsecond-fast at BooCode scale). ~270 LoC across 8 files (5 modified + 3 new).
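To illustrate the read-time mean (sum / n_calls) and the per-agent sum-of-means cost line described above, here is a minimal sketch. It is not the committed code: the row and stat shapes are assumed from the view columns and the endpoint description, the bodies of formatK and formatAgo are guesses at their behavior, and toStat and costLine are hypothetical helper names.

```typescript
// Assumed row shape, mirroring the tool_cost_stats view columns.
interface ToolCostRow {
  tool_name: string;
  prompt_tokens_sum: number;
  completion_tokens_sum: number;
  n_calls: number;
  updated_at: string; // timestamp of the newest call in the window
}

// Assumed response shape for GET /api/tools/cost_stats entries.
interface ToolCostStat {
  tool_name: string;
  n_calls: number;
  mean_prompt_tokens: number;
  mean_completion_tokens: number;
  updated_at: string;
}

// Means are derived at read time; n_calls is at least 1 for any row the view emits.
function toStat(row: ToolCostRow): ToolCostStat {
  return {
    tool_name: row.tool_name,
    n_calls: row.n_calls,
    mean_prompt_tokens: row.prompt_tokens_sum / row.n_calls,
    mean_completion_tokens: row.completion_tokens_sum / row.n_calls,
    updated_at: row.updated_at,
  };
}

// Assumed formatting: values of 1000 or more render as one-decimal "k".
function formatK(n: number): string {
  return n >= 1000 ? `${(n / 1000).toFixed(1)}k` : String(Math.round(n));
}

// Assumed formatting: coarse minute/hour/day buckets.
function formatAgo(thenMs: number, nowMs: number): string {
  const minutes = Math.max(0, Math.floor((nowMs - thenMs) / 60000));
  if (minutes < 60) return `${minutes}m ago`;
  const hours = Math.floor(minutes / 60);
  if (hours < 24) return `${hours}h ago`;
  return `${Math.floor(hours / 24)}d ago`;
}

// Sum of per-tool means over the agent's whitelisted tools.
// Returns null when none of the tools has history, so the caller can skip the line.
function costLine(whitelist: string[], stats: ToolCostStat[], now: Date): string | null {
  const byName = new Map(stats.map((s) => [s.tool_name, s] as const));
  const known = whitelist
    .map((name) => byName.get(name))
    .filter((s): s is ToolCostStat => s !== undefined);
  if (known.length === 0) return null;
  const prompt = known.reduce((acc, s) => acc + s.mean_prompt_tokens, 0);
  const completion = known.reduce((acc, s) => acc + s.mean_completion_tokens, 0);
  const latest = Math.max(...known.map((s) => Date.parse(s.updated_at)));
  return (
    `~${formatK(prompt)} prompt / ${formatK(completion)} completion` +
    ` · ${known.length}/${whitelist.length} tools` +
    ` · last call ${formatAgo(latest, now.getTime())}`
  );
}
```

The "6/8 tools" fragment in the commit's sample line corresponds to known.length / whitelist.length here: tools without history contribute nothing to the sum, so the count tells the reader how complete the estimate is.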
2026-05-22 14:42:09 +00:00
parent 8126d78b34
commit 9ce638c916
8 changed files with 896 additions and 21 deletions
--- a/apps/server/src/schema.sql
+++ b/apps/server/src/schema.sql
@@ -119,6 +119,68 @@ SELECT
    WHERE p.message_id = m.id AND p.kind = 'reasoning' AND p.hidden_at IS NULL) AS reasoning_parts
 FROM messages m;

+-- v1.13.10: per-tool token cost rolling window. Derives from
+-- messages_with_parts (the v1.13.1-B view that COALESCEs message_parts over
+-- the legacy JSON column) so this works whether the chat predates v1.13.0
+-- or postdates v1.13.2 (column drop). No new write site — all source data
+-- already lands via the existing tool-phase.ts:94-95 UPDATE.
+--
+-- Attribution model: equal split. A turn emitting N tool calls divides its
+-- prompt/completion tokens by N before attribution. See v1.13.10 dispatch
+-- brief for rationale + rejected alternatives.
+--
+-- Column mapping: messages.ctx_used = prompt (input), messages.tokens_used
+-- = completion (output). Non-obvious naming; pinned via canonical writes at
+-- tool-phase.ts:94-95 et al.
+--
+-- Filtering rationale:
+--   status='complete'                — exclude failed/cancelled (defense in
+--                                      depth; failed-path doesn't write
+--                                      tokens_used so they're filtered
+--                                      indirectly too).
+--   metadata->>'kind' exclusions     — exclude cap_hit / doom_loop sentinels
+--                                      (defense in depth; sentinels are
+--                                      role='system' with tool_calls=NULL
+--                                      so they're filtered indirectly too).
+--   experimental_repairToolCall      — no special handling; retries flow
+--                                      as normal next-turn tool_result
+--                                      errors and count naturally.
+--
+-- Rolling window: last 100 calls per tool_name, ordered by created_at DESC.
+-- Aggregate-on-read is microseconds at BooCode scale (single user, ~30
+-- tools, < 100 calls each). DROP VIEW + recreate to change window size.
+CREATE OR REPLACE VIEW tool_cost_stats AS
+WITH per_call AS (
+  SELECT
+    (tc->>'name')::text AS tool_name,
+    (m.ctx_used::float / NULLIF(jsonb_array_length(m.tool_calls), 0)) AS prompt_tokens,
+    (m.tokens_used::float / NULLIF(jsonb_array_length(m.tool_calls), 0)) AS completion_tokens,
+    m.created_at,
+    ROW_NUMBER() OVER (
+      PARTITION BY (tc->>'name')::text
+      ORDER BY m.created_at DESC
+    ) AS rn
+  FROM messages_with_parts m,
+    LATERAL jsonb_array_elements(m.tool_calls) AS tc
+  WHERE m.tool_calls IS NOT NULL
+    AND jsonb_array_length(m.tool_calls) > 0
+    AND m.tokens_used IS NOT NULL
+    AND m.ctx_used IS NOT NULL
+    AND m.status = 'complete'
+    AND (m.metadata IS NULL
+         OR m.metadata->>'kind' IS NULL
+         OR m.metadata->>'kind' NOT IN ('cap_hit', 'doom_loop'))
+)
+SELECT
+  tool_name,
+  ROUND(SUM(prompt_tokens))::int AS prompt_tokens_sum,
+  ROUND(SUM(completion_tokens))::int AS completion_tokens_sum,
+  COUNT(*)::int AS n_calls,
+  MAX(created_at) AS updated_at
+FROM per_call
+WHERE rn <= 100
+GROUP BY tool_name;
+
 ALTER TABLE messages ADD COLUMN IF NOT EXISTS tokens_used INTEGER;
 ALTER TABLE messages ADD COLUMN IF NOT EXISTS ctx_used INTEGER;
 ALTER TABLE messages ADD COLUMN IF NOT EXISTS ctx_max INTEGER;
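
As a reading aid for the tool_cost_stats view added in the hunk above, the following is an in-memory TypeScript model of its semantics: the same filters, equal-split attribution, and a newest-first window per tool. It is a sketch, not part of the commit. The Turn shape and the toolCostStats name are hypothetical, the sort by tool name mirrors the endpoint rather than the view (which has no ORDER BY), and Math.round may break exact .5 ties differently from PostgreSQL's ROUND on double precision.

```typescript
// Hypothetical in-memory stand-in for a messages_with_parts row.
interface Turn {
  createdAt: number;                     // epoch ms
  status: string;
  kind?: string;                         // metadata->>'kind'
  ctxUsed: number | null;                // prompt (input) tokens
  tokensUsed: number | null;             // completion (output) tokens
  toolCalls: { name: string }[] | null;
}

interface ToolCostModelRow {
  toolName: string;
  promptTokensSum: number;
  completionTokensSum: number;
  nCalls: number;
  updatedAt: number;
}

interface AttributedCall {
  prompt: number;
  completion: number;
  createdAt: number;
}

function toolCostStats(turns: Turn[], windowSize = 100): ToolCostModelRow[] {
  const perTool = new Map<string, AttributedCall[]>();
  for (const t of turns) {
    // Same filters as the view's WHERE clause.
    if (t.toolCalls === null || t.toolCalls.length === 0) continue;
    if (t.ctxUsed === null || t.tokensUsed === null) continue;
    if (t.status !== "complete") continue;
    if (t.kind === "cap_hit" || t.kind === "doom_loop") continue;

    // Equal split: each of the N calls in a turn gets 1/N of the turn's tokens.
    const n = t.toolCalls.length;
    for (const tc of t.toolCalls) {
      const calls = perTool.get(tc.name) ?? [];
      calls.push({ prompt: t.ctxUsed / n, completion: t.tokensUsed / n, createdAt: t.createdAt });
      perTool.set(tc.name, calls);
    }
  }

  const rows: ToolCostModelRow[] = [];
  perTool.forEach((calls, toolName) => {
    // Rolling window: keep only the newest `windowSize` calls (rn <= 100 in the view).
    const recent = calls.slice().sort((a, b) => b.createdAt - a.createdAt).slice(0, windowSize);
    rows.push({
      toolName,
      promptTokensSum: Math.round(recent.reduce((acc, c) => acc + c.prompt, 0)),
      completionTokensSum: Math.round(recent.reduce((acc, c) => acc + c.completion, 0)),
      nCalls: recent.length,
      updatedAt: Math.max(...recent.map((c) => c.createdAt)),
    });
  });
  return rows.sort((a, b) => a.toolName.localeCompare(b.toolName));
}
```

One consequence of the equal split that the model makes visible: a tool that is usually called alongside others accrues a lower per-call cost than one that is called alone, which is the split noise the rolling mean is meant to smooth out.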