v1.13.10: per-tool token cost accounting (rolling 100-call view)

Surfaces per-tool prompt/completion-token rolling averages in
AgentPicker for at-a-glance agent-cost hints. Implementation is a
SQL view on top of messages_with_parts plus a read endpoint and
AgentPicker tooltip extension. No new write site; all source data
already lands via the existing tool-phase.ts:94-95 /
error-handler.ts:109-110 / sentinel-summaries.ts UPDATEs that
v1.13.7's includeUsage: true fix made non-NULL.

(1) schema.sql: new tool_cost_stats view. Window functions over
messages_with_parts.tool_calls with LATERAL jsonb_array_elements.
Attribution: equal split; a multi-tool turn divides its tokens N ways,
and the 100-call rolling mean absorbs split noise. Filters:
status='complete' excludes failed turns and metadata.kind NOT IN
('cap_hit','doom_loop') excludes sentinels; tool_calls IS NOT NULL is
defense-in-depth since sentinels are role='system' rows. CREATE OR
REPLACE means schema apply is idempotent.
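
The attribution and windowing rules in (1) can be modeled in memory as a
rough sketch. This is not the shipped implementation (that is the SQL
view); the Turn shape, the toolCostStats name and the field names are
assumptions made for illustration.

```typescript
// Hypothetical in-memory model of the tool_cost_stats view.
interface Turn {
  toolNames: string[];
  promptTokens: number | null;     // stands in for messages.ctx_used
  completionTokens: number | null; // stands in for messages.tokens_used
  status: string;
  kind?: string;                   // stands in for metadata.kind
  createdAt: number;               // epoch milliseconds
}

interface ToolSums {
  promptTokensSum: number;
  completionTokensSum: number;
  nCalls: number;
}

const WINDOW = 100;
const SENTINEL_KINDS = new Set<string>(['cap_hit', 'doom_loop']);

function toolCostStats(turns: Turn[]): Map<string, ToolSums> {
  const perTool = new Map<string, { prompt: number; completion: number; createdAt: number }[]>();
  turns.forEach((t) => {
    // Same filters as the view: complete turns, no sentinels, non-NULL tokens.
    if (t.status !== 'complete') return;
    if (t.kind !== undefined && SENTINEL_KINDS.has(t.kind)) return;
    if (t.promptTokens === null || t.completionTokens === null) return;
    const n = t.toolNames.length;
    if (n === 0) return;
    const prompt = t.promptTokens / n;         // equal split across N tool calls
    const completion = t.completionTokens / n;
    t.toolNames.forEach((name) => {
      const calls = perTool.get(name) ?? [];
      calls.push({ prompt, completion, createdAt: t.createdAt });
      perTool.set(name, calls);
    });
  });

  const out = new Map<string, ToolSums>();
  perTool.forEach((calls, name) => {
    // Keep only the newest WINDOW calls per tool, like rn <= 100 in the view.
    const recent = calls.slice().sort((a, b) => b.createdAt - a.createdAt).slice(0, WINDOW);
    out.set(name, {
      promptTokensSum: Math.round(recent.reduce((s, c) => s + c.prompt, 0)),
      completionTokensSum: Math.round(recent.reduce((s, c) => s + c.completion, 0)),
      nCalls: recent.length,
    });
  });
  return out;
}
```

Math.round and Postgres ROUND on double precision may break exact .5
ties differently, so the sketch is only expected to agree with the view
away from ties.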

(2) routes/tools.ts NEW + index.ts wire-in. GET /api/tools/cost_stats
returns { stats: ToolCostStat[] } with mean_prompt_tokens /
mean_completion_tokens computed at read time (sum / n_calls, rounded
to the nearest integer). Sorted by tool_name ASC. No pagination
(≤30 tools).

(3) __tests__/tool_cost_stats.test.ts NEW: 7 integration tests
keyed off DATABASE_URL env var. Tests skip gracefully when unset
(no-DB default). beforeAll applies the schema via
sql.unsafe(readFileSync(schema.sql)) for self-contained runs. Helper
insertAssistantTurn shared across cases. Covers: empty state,
single-tool attribution, multi-tool equal split, 100-call FIFO window,
NULL-tokens exclusion, parts-authoritative read via
messages_with_parts, failed/sentinel exclusion.

(4) web/api/types.ts + client.ts: ToolCostStat interface +
api.tools.costStats() method binding.
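
The client.ts binding is not shown in this commit view, so the following
is only a sketch of what such a binding could look like. createToolsApi,
FetchLike and the injected fetch function are hypothetical names; only
the endpoint path and the { stats } response shape come from (2).

```typescript
interface ToolCostStat {
  tool_name: string;
  mean_prompt_tokens: number;
  mean_completion_tokens: number;
  n_calls: number;
  updated_at: string;
}

// Minimal structural type so the sketch does not depend on DOM or Node typings.
type FetchLike = (url: string) => Promise<{
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}>;

// Hypothetical factory; the real client wiring may differ.
function createToolsApi(fetchImpl: FetchLike, baseUrl = '') {
  return {
    // Reads the rolling per-tool stats and unwraps the { stats } envelope.
    async costStats(): Promise<ToolCostStat[]> {
      const res = await fetchImpl(`${baseUrl}/api/tools/cost_stats`);
      if (!res.ok) {
        throw new Error(`cost_stats request failed with HTTP ${res.status}`);
      }
      const body = (await res.json()) as { stats: ToolCostStat[] };
      return body.stats;
    },
  };
}
```

In a browser the global fetch could be passed as fetchImpl; injecting it
keeps the binding easy to exercise with a stub.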

(5) AgentPicker.tsx — fetch costStats on mount, compute per-agent
sum-of-means across whitelisted tools, render muted cost line below
description: "~5.2k prompt / 280 completion · 6/8 tools · last call
3h ago". Skips line entirely when no tool history; preserves existing
native title= for layout backward-compat. formatK/formatAgo colocated.
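
The sum-of-means line in (5) can be sketched as below. The formatK and
formatAgo bodies are not shown in this commit view, so these versions are
assumptions chosen to reproduce the example string; agentCostLine is a
hypothetical helper name.

```typescript
interface ToolCostStat {
  tool_name: string;
  mean_prompt_tokens: number;
  mean_completion_tokens: number;
  n_calls: number;
  updated_at: string;
}

// Assumed behavior: plain integer below 1000, one decimal plus "k" otherwise.
function formatK(n: number): string {
  return n < 1000 ? String(Math.round(n)) : `${(n / 1000).toFixed(1)}k`;
}

// Assumed behavior: largest whole unit among seconds, minutes, hours, days.
function formatAgo(thenMs: number, nowMs: number): string {
  const sec = Math.max(0, Math.floor((nowMs - thenMs) / 1000));
  if (sec < 60) return `${sec}s ago`;
  const min = Math.floor(sec / 60);
  if (min < 60) return `${min}m ago`;
  const hr = Math.floor(min / 60);
  if (hr < 24) return `${hr}h ago`;
  return `${Math.floor(hr / 24)}d ago`;
}

// Returns null when none of the agent's whitelisted tools has history,
// mirroring "skips line entirely when no tool history".
function agentCostLine(agentTools: string[], stats: ToolCostStat[], nowMs: number): string | null {
  const byName = new Map<string, ToolCostStat>();
  stats.forEach((s) => byName.set(s.tool_name, s));

  const known: ToolCostStat[] = [];
  agentTools.forEach((name) => {
    const s = byName.get(name);
    if (s !== undefined) known.push(s);
  });
  if (known.length === 0) return null;

  // Sum of per-tool means across the tools that have data.
  const prompt = known.reduce((sum, s) => sum + s.mean_prompt_tokens, 0);
  const completion = known.reduce((sum, s) => sum + s.mean_completion_tokens, 0);
  const latestMs = Math.max(...known.map((s) => Date.parse(s.updated_at)));

  return (
    `~${formatK(prompt)} prompt / ${formatK(completion)} completion · ` +
    `${known.length}/${agentTools.length} tools · last call ${formatAgo(latestMs, nowMs)}`
  );
}
```

The "6/8 tools" fraction reads as tools with history over whitelisted
tools, which is how the sketch interprets it.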

Tests: 202/202 pass (195 prior + 7 new view-integration). Server +
web tsc clean.

Smoke: schema applied cleanly; GET /api/tools/cost_stats returns
canonical JSON; view + endpoint agree. Only a single row is expected
for now: turns written during the v1.13.1-A → v1.13.7 latent
regression have NULL token counts and are excluded by the view; new
traffic populates the rest organically.

Roadmap row at boocode_roadmap.md:114 plus schema row at :474 both
match. View vs table decision documented in
handoff_v1.13.10_per_tool_cost.md (rollback-safe, microsecond-fast at
BooCode scale).

~270 LoC across 8 files (5 modified + 3 new).
2026-05-22 14:42:09 +00:00
parent 8126d78b34
commit 9ce638c916
8 changed files with 896 additions and 21 deletions


@@ -16,6 +16,7 @@ import { registerWebSocket } from './routes/ws.js';
import { registerModelRoutes } from './routes/models.js';
import { registerAgentRoutes } from './routes/agents.js';
import { registerSkillsRoutes } from './routes/skills.js';
import { registerToolsRoutes } from './routes/tools.js';
import { createInferenceRunner } from './services/inference/index.js';
import { createBroker } from './services/broker.js';
import { listSkills } from './services/skills.js';
@@ -83,6 +84,7 @@ async function main() {
registerAgentRoutes(app, sql);
registerSidebarRoutes(app, sql);
registerChatRoutes(app, sql, broker);
registerToolsRoutes(app, sql);
// Batch 9.6: warm the skills cache at boot and surface the count. Empty or
// missing /data/skills is non-fatal — the skill tools just return empty.


@@ -0,0 +1,40 @@
import type { FastifyInstance } from 'fastify';
import type { Sql } from '../db.js';

export interface ToolCostStat {
  tool_name: string;
  mean_prompt_tokens: number;
  mean_completion_tokens: number;
  n_calls: number;
  updated_at: string;
}

// v1.13.10: per-tool token cost rolling window read endpoint. Backed by the
// tool_cost_stats view in schema.sql (last 100 calls per tool, equal-split
// attribution across multi-tool turns, sentinel/failed-turn excluded).
// Consumed by AgentPicker for at-a-glance per-agent cost hints.
export function registerToolsRoutes(app: FastifyInstance, sql: Sql): void {
  app.get('/api/tools/cost_stats', async () => {
    const rows = await sql<
      {
        tool_name: string;
        prompt_tokens_sum: number;
        completion_tokens_sum: number;
        n_calls: number;
        updated_at: string;
      }[]
    >`
      SELECT tool_name, prompt_tokens_sum, completion_tokens_sum, n_calls, updated_at
      FROM tool_cost_stats
      ORDER BY tool_name ASC
    `;
    const stats: ToolCostStat[] = rows.map((r) => ({
      tool_name: r.tool_name,
      mean_prompt_tokens: Math.round(r.prompt_tokens_sum / r.n_calls),
      mean_completion_tokens: Math.round(r.completion_tokens_sum / r.n_calls),
      n_calls: r.n_calls,
      updated_at: r.updated_at,
    }));
    return { stats };
  });
}


@@ -119,6 +119,68 @@ SELECT
WHERE p.message_id = m.id AND p.kind = 'reasoning' AND p.hidden_at IS NULL) AS reasoning_parts
FROM messages m;
-- v1.13.10: per-tool token cost rolling window. Derives from
-- messages_with_parts (the v1.13.1-B view that COALESCEs message_parts over
-- the legacy JSON column) so this works whether the chat predates v1.13.0
-- or postdates v1.13.2 (column drop). No new write site — all source data
-- already lands via the existing tool-phase.ts:94-95 UPDATE.
--
-- Attribution model: equal split. A turn emitting N tool calls divides its
-- prompt/completion tokens by N before attribution. See v1.13.10 dispatch
-- brief for rationale + rejected alternatives.
--
-- Column mapping: messages.ctx_used = prompt (input), messages.tokens_used
-- = completion (output). Non-obvious naming; pinned via canonical writes at
-- tool-phase.ts:94-95 et al.
--
-- Filtering rationale:
-- status='complete' — exclude failed/cancelled (defense in
-- depth; failed-path doesn't write
-- tokens_used so they're filtered
-- indirectly too).
-- metadata->>'kind' exclusions — exclude cap_hit / doom_loop sentinels
-- (defense in depth; sentinels are
-- role='system' with tool_calls=NULL
-- so they're filtered indirectly too).
-- experimental_repairToolCall — no special handling; retries flow
-- as normal next-turn tool_result
-- errors and count naturally.
--
-- Rolling window: last 100 calls per tool_name, ordered by created_at DESC.
-- Aggregate-on-read is microseconds at BooCode scale (single user, ~30
-- tools, < 100 calls each). DROP VIEW + recreate to change window size.
CREATE OR REPLACE VIEW tool_cost_stats AS
WITH per_call AS (
  SELECT
    (tc->>'name')::text AS tool_name,
    (m.ctx_used::float / NULLIF(jsonb_array_length(m.tool_calls), 0)) AS prompt_tokens,
    (m.tokens_used::float / NULLIF(jsonb_array_length(m.tool_calls), 0)) AS completion_tokens,
    m.created_at,
    ROW_NUMBER() OVER (
      PARTITION BY (tc->>'name')::text
      ORDER BY m.created_at DESC
    ) AS rn
  FROM messages_with_parts m,
       LATERAL jsonb_array_elements(m.tool_calls) AS tc
  WHERE m.tool_calls IS NOT NULL
    AND jsonb_array_length(m.tool_calls) > 0
    AND m.tokens_used IS NOT NULL
    AND m.ctx_used IS NOT NULL
    AND m.status = 'complete'
    AND (m.metadata IS NULL
         OR m.metadata->>'kind' IS NULL
         OR m.metadata->>'kind' NOT IN ('cap_hit', 'doom_loop'))
)
SELECT
  tool_name,
  ROUND(SUM(prompt_tokens))::int AS prompt_tokens_sum,
  ROUND(SUM(completion_tokens))::int AS completion_tokens_sum,
  COUNT(*)::int AS n_calls,
  MAX(created_at) AS updated_at
FROM per_call
WHERE rn <= 100
GROUP BY tool_name;
ALTER TABLE messages ADD COLUMN IF NOT EXISTS tokens_used INTEGER;
ALTER TABLE messages ADD COLUMN IF NOT EXISTS ctx_used INTEGER;
ALTER TABLE messages ADD COLUMN IF NOT EXISTS ctx_max INTEGER;


@@ -0,0 +1,228 @@
import { describe, it, expect, beforeAll, afterAll } from 'vitest';
import postgres from 'postgres';
import { readFileSync } from 'node:fs';
import { resolve } from 'node:path';
import { fileURLToPath } from 'node:url';
// v1.13.10: integration tests for the tool_cost_stats view. Skipped unless
// DATABASE_URL is set so they don't break `pnpm test` on a fresh checkout.
// Run with:
// DATABASE_URL=postgres://boocode:<pw>@localhost:5500/boocode pnpm -C apps/server test
//
// Isolation: each test uses tool names built from a per-run TEST_RUN_ID
// prefix plus a per-test suffix. The view aggregates globally across all
// chats, so without unique tool names parallel test runs would interfere.
// Cleanup deletes the fixture project in afterAll; FK CASCADE removes its rows.
const DB_URL = process.env.DATABASE_URL;
const describeFn = DB_URL ? describe : describe.skip;
const TEST_RUN_ID = `v13_10_${Date.now()}`;
const tname = (suffix: string) => `${TEST_RUN_ID}_${suffix}`;
describeFn('tool_cost_stats view (v1.13.10)', () => {
let sql: ReturnType<typeof postgres>;
let projectId: string;
let sessionId: string;
let chatId: string;
beforeAll(async () => {
if (!DB_URL) return;
sql = postgres(DB_URL, { max: 2, idle_timeout: 5, connect_timeout: 5, onnotice: () => {} });
// Apply the schema before fixtures so the view exists. Idempotent via
// CREATE OR REPLACE VIEW + CREATE TABLE IF NOT EXISTS; safe to run on a
// pre-populated DB. Mirrors apps/server/src/db.ts:applySchema.
const here = fileURLToPath(import.meta.url);
const schemaPath = resolve(here, '../../../schema.sql');
const ddl = readFileSync(schemaPath, 'utf8');
await sql.unsafe(ddl);
// Fixture project + session + chat for all inserts in this file.
const proj = await sql<{ id: string }[]>`
INSERT INTO projects (name, path)
VALUES (${`tool_cost_stats_test_${TEST_RUN_ID}`}, ${`/tmp/${TEST_RUN_ID}`})
RETURNING id
`;
projectId = proj[0]!.id;
const sess = await sql<{ id: string }[]>`
INSERT INTO sessions (project_id, name, model)
VALUES (${projectId}, ${'test'}, ${'test-model'})
RETURNING id
`;
sessionId = sess[0]!.id;
const chat = await sql<{ id: string }[]>`
INSERT INTO chats (session_id, name) VALUES (${sessionId}, ${'test'}) RETURNING id
`;
chatId = chat[0]!.id;
});
afterAll(async () => {
if (!DB_URL) return;
// Project FK CASCADE cleans sessions/chats/messages/parts in one shot.
await sql`DELETE FROM projects WHERE id = ${projectId}`;
await sql.end({ timeout: 5 });
});
async function insertAssistantTurn(opts: {
toolNames: string[];
tokensUsed: number | null;
ctxUsed: number | null;
status?: 'streaming' | 'complete' | 'failed' | 'cancelled';
metadata?: { kind: string } | null;
createdAt?: Date;
}): Promise<string> {
const toolCalls = opts.toolNames.map((name, i) => ({
id: `call_${TEST_RUN_ID}_${name}_${i}`,
name,
args: {},
}));
const created = opts.createdAt ?? new Date();
const rows = await sql<{ id: string }[]>`
INSERT INTO messages (
session_id, chat_id, role, content, kind, status,
tool_calls, tokens_used, ctx_used,
metadata, created_at
)
VALUES (
${sessionId}, ${chatId}, 'assistant', '', 'message',
${opts.status ?? 'complete'},
${sql.json(toolCalls as never)},
${opts.tokensUsed},
${opts.ctxUsed},
${opts.metadata ? sql.json(opts.metadata as never) : null},
${created}
)
RETURNING id
`;
return rows[0]!.id;
}
it('returns empty when no tool calls exist for a tool name', async () => {
const t = tname('absent');
const stats = await sql<{ tool_name: string }[]>`
SELECT * FROM tool_cost_stats WHERE tool_name = ${t}
`;
expect(stats).toEqual([]);
});
it('attributes single-tool turn fully to that tool', async () => {
const t = tname('single');
await insertAssistantTurn({ toolNames: [t], tokensUsed: 300, ctxUsed: 15000 });
const stats = await sql<{
tool_name: string;
prompt_tokens_sum: number;
completion_tokens_sum: number;
n_calls: number;
}[]>`SELECT * FROM tool_cost_stats WHERE tool_name = ${t}`;
expect(stats[0]).toMatchObject({
tool_name: t,
prompt_tokens_sum: 15000,
completion_tokens_sum: 300,
n_calls: 1,
});
});
it('splits multi-tool turn equally across tools', async () => {
const a = tname('multi_a');
const b = tname('multi_b');
const c = tname('multi_c');
// 3 tools, 300 completion / 15000 prompt → each gets 100 / 5000
await insertAssistantTurn({ toolNames: [a, b, c], tokensUsed: 300, ctxUsed: 15000 });
const stats = await sql<{
tool_name: string;
prompt_tokens_sum: number;
completion_tokens_sum: number;
n_calls: number;
}[]>`
SELECT * FROM tool_cost_stats
WHERE tool_name IN (${a}, ${b}, ${c})
ORDER BY tool_name
`;
expect(stats).toHaveLength(3);
for (const s of stats) {
expect(s.completion_tokens_sum).toBe(100);
expect(s.prompt_tokens_sum).toBe(5000);
expect(s.n_calls).toBe(1);
}
});
it('limits to last 100 calls per tool (FIFO window)', async () => {
const t = tname('window');
// Insert 110 turns with monotonically-increasing created_at and tokensUsed.
// Expect view to keep only the most recent 100.
const base = Date.now() + 1_000_000; // ~17 minutes ahead of now, so these rows sort after any earlier fixtures
for (let i = 1; i <= 110; i++) {
await insertAssistantTurn({
toolNames: [t],
tokensUsed: i, // 1..110
ctxUsed: i * 10,
createdAt: new Date(base + i),
});
}
const [stat] = await sql<{
n_calls: number;
completion_tokens_sum: number;
}[]>`SELECT n_calls, completion_tokens_sum FROM tool_cost_stats WHERE tool_name = ${t}`;
expect(stat!.n_calls).toBe(100);
// Last 100 are tokensUsed=11..110, sum = (11+110)*100/2 = 6050.
expect(stat!.completion_tokens_sum).toBe(6050);
});
it('excludes turns with NULL tokens_used or ctx_used (pre-v1.13.7 latent regression)', async () => {
const t = tname('null_tokens');
await insertAssistantTurn({ toolNames: [t], tokensUsed: null, ctxUsed: 1000 });
await insertAssistantTurn({ toolNames: [t], tokensUsed: 100, ctxUsed: null });
const stats = await sql`SELECT * FROM tool_cost_stats WHERE tool_name = ${t}`;
expect(stats).toEqual([]);
});
it('excludes failed/cancelled turns and cap_hit/doom_loop sentinel rows', async () => {
const t = tname('filtered');
// A: status='failed' — excluded
// B: status='cancelled' — excluded
// C: status='complete', metadata={kind:'cap_hit'} — excluded
// D: status='complete', metadata={kind:'doom_loop'} — excluded
// E: status='complete', metadata=null — included
await insertAssistantTurn({ toolNames: [t], tokensUsed: 100, ctxUsed: 1000, status: 'failed' });
await insertAssistantTurn({ toolNames: [t], tokensUsed: 100, ctxUsed: 1000, status: 'cancelled' });
await insertAssistantTurn({ toolNames: [t], tokensUsed: 100, ctxUsed: 1000, metadata: { kind: 'cap_hit' } });
await insertAssistantTurn({ toolNames: [t], tokensUsed: 100, ctxUsed: 1000, metadata: { kind: 'doom_loop' } });
await insertAssistantTurn({ toolNames: [t], tokensUsed: 100, ctxUsed: 1000, metadata: null });
const [stat] = await sql<{ n_calls: number }[]>`
SELECT n_calls FROM tool_cost_stats WHERE tool_name = ${t}
`;
expect(stat!.n_calls).toBe(1);
});
it('reads tool_calls via messages_with_parts (parts-authoritative)', async () => {
const t = tname('parts');
// Insert an assistant row with messages.tool_calls=NULL but a
// message_parts row carrying the tool_call. The view reads via
// messages_with_parts, which COALESCEs the parts table over the legacy
// column — so this row should still aggregate.
const rows = await sql<{ id: string }[]>`
INSERT INTO messages (
session_id, chat_id, role, content, kind, status,
tool_calls, tokens_used, ctx_used
)
VALUES (
${sessionId}, ${chatId}, 'assistant', '', 'message', 'complete',
NULL, 200, 5000
)
RETURNING id
`;
const messageId = rows[0]!.id;
await sql`
INSERT INTO message_parts (message_id, sequence, kind, payload)
VALUES (
${messageId}, 0, 'tool_call',
${sql.json({ id: `tc_parts_${TEST_RUN_ID}`, name: t, args: {} } as never)}
)
`;
const [stat] = await sql<{ n_calls: number }[]>`
SELECT n_calls FROM tool_cost_stats WHERE tool_name = ${t}
`;
expect(stat!.n_calls).toBe(1);
});
});